# Python Basics II

In this notebook, we will work with the following:

- Functions and methods.
- Mutability.
- Control structures.
- Convenience syntax.

In [1]:
# imports

import pandas as pd
import numpy as np

# Functions and methods

Functions are objects that take input and map it to an output.
Methods are functions that are bound to particular classes of objects.

Like other basic topics, we're staying at a cursory, familiarization level, but the [documentation](https://docs.python.org/3/tutorial/controlflow.html#defining-functions) has the details.

In [2]:
def f_to_c(temp_f):
    return (temp_f - 32) * 5/9

In [3]:
print(f'Freezing:   {f_to_c(32)}')
print(f'Just right: {f_to_c(72)}')
print(f'Oklahoma:   {f_to_c(117)}')

Freezing:   0.0
Just right: 22.22222222222222
Oklahoma:   47.22222222222222
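
Because functions are objects, a function can be given a second name or passed to another function. The sketch below illustrates this with `f_to_c()`; the names `convert` and `temps_c` are only for illustration and are not used elsewhere in the notebook.

```python
def f_to_c(temp_f):
    return (temp_f - 32) * 5/9

# A function is an object, so a second name can point to it.
convert = f_to_c
print(convert is f_to_c)   # True

# It can also be handed to another function, such as the built-in map().
temps_c = list(map(f_to_c, [32, 212]))
print(temps_c)             # [0.0, 100.0]
```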


Methods are like functions, except that we call them on an object (`some_object.method()`), and they generally operate on that object.

In [4]:
c_string = 'MY CAPS LOCK KEY IS BROKEN, APPARENTLY.'
print(c_string)

MY CAPS LOCK KEY IS BROKEN, APPARENTLY.


In [5]:
# Using the lower method.
c_string.lower()

'my caps lock key is broken, apparently.'

In [6]:
# Look at the attributes and methods of c_string.
dir(c_string)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']


In [7]:
# Let's try using .replace() chained with .lower().
c_string.lower().replace('broken', 'fixed')

'my caps lock key is fixed, apparently.'
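
One way to see that a method is a function bound to a class is to call it through the class and pass the object explicitly. This is a small sketch; the names `via_object` and `via_class` are made up for the comparison.

```python
c_string = 'MY CAPS LOCK KEY IS BROKEN, APPARENTLY.'

# Calling the method on the object...
via_object = c_string.lower()

# ...gives the same result as calling the function stored on the str class
# and passing the object as the first argument.
via_class = str.lower(c_string)

print(via_object == via_class)   # True
```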

# Mutability

As we talked about before, objects can be mutable or immutable.
Basic types tend to be immutable, and data structures tend to be mutable.

## String example

In [8]:
d_string = 'I am IMMUTABLE!'
d_string.lower()

'i am immutable!'

In [9]:
# Notice that it did not actually change.
d_string

'I am IMMUTABLE!'

In [10]:
# Because it's immutable, we'd need to assign the name d_string to the new lowercase object.
d_string = d_string.lower()

In [11]:
d_string

'i am immutable!'
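
A more direct way to see the difference, sketched here with the hypothetical names `demo_string` and `demo_list`, is to try changing a single element in place. The string refuses with a `TypeError`, while the list accepts the change.

```python
demo_string = 'I am IMMUTABLE!'
raised = False
try:
    demo_string[0] = 'i'     # strings do not support item assignment
except TypeError as err:
    raised = True
    print(f'TypeError: {err}')

demo_list = ['I', 'am', 'MUTABLE!']
demo_list[0] = 'i'           # lists do
print(demo_list)             # ['i', 'am', 'MUTABLE!']
```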

## List example

In [12]:
c_list = [1, 2, 3, 4]
d_list = c_list

In [13]:
c_list

[1, 2, 3, 4]

In [14]:
d_list

[1, 2, 3, 4]

In [15]:
# Appending a new list element.
c_list.append(5)

In [16]:
# And there it is.
c_list

[1, 2, 3, 4, 5]

In [17]:
# But it's here, too. Both names point to the same object.
d_list

[1, 2, 3, 4, 5]

In [18]:
c_list is d_list

True

In [19]:
# If we want to avoid this, we can assign a name to a copy.
d_list = c_list.copy()

In [20]:
c_list.append(6)

In [21]:
c_list

[1, 2, 3, 4, 5, 6]

In [22]:
# Notice that they're now pointing to different objects.
d_list

[1, 2, 3, 4, 5]

In [23]:
c_list is d_list

False
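
Note that `is` tests whether two names point to the same object, while `==` tests whether two objects hold equal values. A copy is equal to the original without being the same object, as this short sketch shows (`same_value` and `same_object` are illustrative names).

```python
c_list = [1, 2, 3]
d_list = c_list.copy()

same_value = c_list == d_list    # compares contents
same_object = c_list is d_list   # compares identity

print(same_value)    # True
print(same_object)   # False
```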

# Control structures

## If-then

The `if` statement allows us to test some condition and then do something if it is true.

In [24]:
a = 5

In [25]:
# Note that this does not print anything.
if a > 5:
    print('a is greater than five.')

In [26]:
a > 5

False

In [27]:
# We can use an else clause to do something when the condition is False.
if a > 5:
    print('a is greater than five.')
else:
    print('a is not greater than five')

a is not greater than five


In [28]:
# We can also specify alternative tests using elif. 
# This is equivalent to having another if statement in the else clause.
if a > 5:
    print('a is greater than five.')
elif a == 5:
    print('a is exactly five.')
else:
    print('a is neither greater than nor exactly five')

a is exactly five.
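
To make the `elif` equivalence concrete, here is the same test written with a nested `if` inside the `else` clause. The variable `message` is introduced only so the result can be printed once at the end.

```python
a = 5

if a > 5:
    message = 'a is greater than five.'
else:
    # This nested if/else does the same job as the elif branch.
    if a == 5:
        message = 'a is exactly five.'
    else:
        message = 'a is neither greater than nor exactly five'

print(message)   # a is exactly five.
```

The `elif` form avoids the extra level of indentation, which matters more as the number of alternatives grows.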


In the example below, note the indentation of the second `print()` call, the one that follows the `if` block.
Because it is not indented, it is not part of the block that is executed when the `if` condition is true.

In some other programming languages, code blocks like these are identified using braces (`{}`).
The Python version looks nicer, but it is important to remember that spacing/indenting is part of the syntax, not just a visual convention.

In [29]:
if a > 4:
    print('a is greater than four.')
print("This prints either way, because it's not in the if statement's code block.")

a is greater than four.
This prints either way, because it's not in the if statement's code block.


## While loops

A `while` loop keeps repeating its block as long as a condition is true.
As a consequence, we need to provide logic for the loop to end.
We typically use one of two approaches: a counter that eventually makes the condition false, or a `break` statement.

In [30]:
b = 1
while b < 5:
    print(f'b is {b}')
    b = b + 1

b is 1
b is 2
b is 3
b is 4


In [31]:
c = 1
while True:
    print(f'c is {c}')
    c = c + 1
    if c >= 5:
        break

c is 1
c is 2
c is 3
c is 4


In [32]:
while False:
    print('This is never printed.')

## For loops

A `for` loop lets us do something for every item in a sequence.
If you think about it, you'll notice that anything you can do with a `for` loop can be done with a `while` loop, but the `for` syntax is more convenient.

In practice, I use `for` loops (or avoid explicit looping altogether by vectorizing) much more often than `while` loops.

In [33]:
d = [1, 2, 3, 4, 5]

In [34]:
for num in d:
    print(num + 1)

2
3
4
5
6


In [35]:
# We don't have to use the items of the sequence at all.
# By convention, naming the loop variable _ signals that it is ignored (the "I don't care" underscore).
for _ in range(1, 5):
    print('Hi!')

Hi!
Hi!
Hi!
Hi!


In [36]:
# Note that ranges in Python do not include the end of the range.
for i in range(1, 5):
    print(i)

1
2
3
4
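
As mentioned at the start of this section, a `for` loop can be rewritten as a `while` loop. The sketch below does the same work both ways; `for_results`, `while_results`, and the counter `i` are illustrative names.

```python
d = [1, 2, 3, 4, 5]

# The for version: Python hands us each item in turn.
for_results = []
for num in d:
    for_results.append(num + 1)

# The while version: we manage the position counter ourselves.
while_results = []
i = 0
while i < len(d):
    while_results.append(d[i] + 1)
    i = i + 1

print(for_results == while_results)   # True
```

The `while` version needs three extra pieces (the counter, the condition, and the increment), which is why the `for` syntax is more convenient.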


# Convenience syntax

Python has a number of language conveniences, but I'll share two.

A simple one is the `+=` operator.
For example, when `x` is a number, `x += 1` is equivalent to `x = x + 1`: take the value that `x` points to, add one, and then assign the name `x` to that new number.

In [37]:
x = 5
print(x)
x += 1
print(x)

5
6
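
One caveat ties back to mutability: for a mutable object such as a list, `+=` changes the existing object in place instead of creating a new one, so every name pointing to that object sees the change. A minimal sketch, using the made-up names `g_list` and `h_list`:

```python
g_list = [1, 2]
h_list = g_list            # both names point to the same list

g_list += [3]              # in place: the shared list grows
print(h_list)              # [1, 2, 3]
print(g_list is h_list)    # True

g_list = g_list + [4]      # builds a new list and rebinds g_list
print(h_list)              # [1, 2, 3]
print(g_list is h_list)    # False
```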


One of my favorites is the Python list comprehension (see the [tutorial](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions)).
Consider the following code that transforms every element in a list.

In [38]:
e_list = ['THIS', 'LIST', 'HAS', 'ITEMS', 'IN', 'CAPS']

In [39]:
new_list = []
for item in e_list:
    lower_item = item.lower()
    new_list.append(lower_item)
print(new_list)

['this', 'list', 'has', 'items', 'in', 'caps']


It may occur to you that this kind of pattern is something that we may do a lot.
It is, and this is an ideal case for some special syntax that makes it easier to accomplish.

In [40]:
[item.lower() for item in e_list]

['this', 'list', 'has', 'items', 'in', 'caps']

In [41]:
e_list

['THIS', 'LIST', 'HAS', 'ITEMS', 'IN', 'CAPS']
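
A list comprehension can also carry an optional `if` clause that filters items, as described in the tutorial linked above. This sketch keeps only the words of three letters or fewer; the name `short_items` and the length cutoff are arbitrary choices for the example.

```python
e_list = ['THIS', 'LIST', 'HAS', 'ITEMS', 'IN', 'CAPS']

# Transform each item, but only keep those that pass the test.
short_items = [item.lower() for item in e_list if len(item) <= 3]
print(short_items)   # ['has', 'in']
```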

One of my favorite list comprehension snippets is this:

```python
import glob

df = (pd.concat([pd.read_csv(i, index_col=False)
                 for i in glob.glob('dir/*.csv')]
               ).reset_index(drop=True))
```

This code:

1. Gets a list of files that match `*.csv` in the directory `dir`.
1. For each item in that list (a path to a file), it reads it from the csv format into a pandas dataframe.
1. After processing each item, we have a list of pandas dataframes.
1. Then, `pd.concat()` concatenates them into a single bigger dataframe.
1. Finally, it resets the index column and drops the old index values.

# Breakout Exercises

Let's do two exercises to reinforce the concepts we learned above.


1. functions
1. loops

## EX1: functions

Let's make a function to do the temperature calculation in the other direction.

1. Create a function named `c_to_f()` that takes a temperature in Celsius and returns the temperature in Fahrenheit. You may need to google the formula (I did).
1. Try your function by converting `100` from C to F.

In [42]:
# 1-1 code


In [43]:
# 1-2 code


## EX2: loops

Loops let us take a procedure and repeat it.
We'll use a `for` loop here to apply a simple computation.

1. Create a list, named `x_list`, that contains the integers `1` through `5`.
1. Create a new list, named `y_list` where each element is `2` times the corresponding element in `x_list`. Use a `for` loop.
1. Create a new list, named `z_list`, that matches what you did for `y_list`, but use a list comprehension to construct it.

**Note:** This is the first exercise where some cells will have multiple lines of code, but that will be the norm going forward.

In [44]:
# 2-1 code


In [45]:
# 2-2 code


In [46]:
# 2-3 code


# Bonus content

## Dynamic typing

Python does not require us to specify the types that our functions accept and return.
As a result, some things work that we might not expect.
For example, if we give our function, `f_to_c()`, a NumPy array (basically a vector), it gives us back an array with the calculation applied to each element.

Notice that the function doesn't have any special rules for handling a vector instead of a scalar.
This works because the NumPy array object (`np.ndarray`) has its own definitions of what subtraction, multiplication, and division mean for that object (more precisely, the `-`, `*`, and `/` operators use the object's `__sub__`, `__mul__`, and `__truediv__` methods).
Those definitions are used when performing these operations.

In [47]:
a_array = np.array([32, 72, 117])

In [48]:
f_to_c(a_array)

array([ 0.        , 22.22222222, 47.22222222])
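
The same idea holds for other types. As a sketch using only the standard library, `fractions.Fraction` defines the arithmetic operators, so `f_to_c()` accepts it unchanged, while a plain list does not define subtraction with a number and raises a `TypeError`. The name `boiling` is illustrative.

```python
from fractions import Fraction

def f_to_c(temp_f):
    return (temp_f - 32) * 5/9

# Fraction defines -, *, and /, so the function works and stays exact.
boiling = f_to_c(Fraction(212))
print(boiling)   # 100

# A plain list does not define list - int, so the same call fails.
list_failed = False
try:
    f_to_c([32, 72, 117])
except TypeError as err:
    list_failed = True
    print(f'TypeError: {err}')
```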

In [49]:
dir(a_array)

['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_finalize__',
 '__array_function__',
 '__array_interface__',
 '__array_prepare__',
 '__array_priority__',
 '__array_struct__',
 '__array_ufunc__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__complex__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__ilshift__',
 '__imatmul__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 '__