# 3.2: Modules, ".csv" files, and List Comprehensions

- Learn about Python modules and how to load and use them
- Learn about comma separated value (.csv) files and how to load them with Python
- Practice list comprehensions, a powerful Python syntax

## Intro to python modules

You have likely already encountered some Python modules. Modules extend the core language with ready-made functions and classes (objects), so you don't have to write everything yourself.

We can import a module by typing **```import```** and then the module name.

In [1]:
import numpy

Now we have access to numpy's functions and classes. For example, we can use its **```mean()```** function to calculate the mean instead of writing it ourselves.

In [2]:
numpy.mean([1,2,3,4,5])

3.0

Modules can also be given a shorter nickname (an alias) with the ```as``` syntax, which makes them easier to type.

In [3]:
import numpy as np

And now we can shorten our previous code to get the mean.

In [4]:
np.mean([1,2,3,4,5])

3.0

If we just want to pull out specific functions from a module, we can use the ```from``` syntax.

In [5]:
from numpy import mean, median

And we can now use these functions without having to prepend ```numpy``` or ```np```.

In [6]:
print(mean([4,5,6,7,8,9,9]))
print(median([2,5,3,7,2,3,4]))

6.857142857142857
3.0
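The same three import styles work for modules in Python's standard library, which ship with Python and need no installation. A minimal sketch using the standard `statistics` and `math` modules (chosen here only for illustration; the examples above use numpy):

```python
import statistics                # plain import: call as statistics.mean(...)
import math as m                 # import with an alias: call as m.sqrt(...)
from statistics import median    # import one function by name

values = [4, 5, 6, 7, 8, 9, 9]

avg = statistics.mean(values)         # arithmetic mean (a float here)
mid = median([2, 5, 3, 7, 2, 3, 4])   # middle value of the sorted data
root = m.sqrt(16)                     # square root, returned as a float

print(avg, mid, root)
```

Whichever style you pick, the module's functions are the same; only the name you type to reach them changes.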


## Comma separated value files (.csv) and python

Say we have a file named ```mini_data.csv```. The ```.csv``` extension indicates that this is a **comma separated value** file. These files typically contain data in rows and columns, like a spreadsheet.

As the name implies, the columns are separated by commas and rows are separated by newlines.

Opened in a text editor, **```mini_data.csv```** looks like this:

    column1,column2,column3
    bird,1,2.3
    cat,3,4.6
    dog,120,0.001
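Splitting each line on commas is enough for this file, but the csv format also allows a field to be wrapped in double quotes when the field itself contains a comma. A small sketch, using made-up text held in memory with `io.StringIO` so no file is needed, shows why a real csv parser is safer than `str.split(',')`:

```python
import csv
import io

# Hypothetical csv text: the second field of the data row contains a comma
text = 'name,description\nbird,"small, feathered"\n'

# Naive approach: split the data row on every comma
naive_row = text.splitlines()[1].split(',')

# csv.reader understands quoting, so the quoted comma stays inside one field
parsed = list(csv.reader(io.StringIO(text)))

print(naive_row)   # ['bird', '"small', ' feathered"']
print(parsed[1])   # ['bird', 'small, feathered']
```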

### Loading csvs

To load the **```mini_data.csv```** file, we are going to introduce a few things:

 - the **```open()```** function
 - the **```csv```** module
 - the **```with```** syntax

The block of code for loading the **```mini_data.csv```** is below. Try it out yourself! 

I will go over each section of the code and what it does.

In [4]:
import csv

mini_data_path = '../datasets/mini_data.csv'
csv_rows = []

with open(mini_data_path, 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        csv_rows.append(row)

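- The **```csv```** module, as you might expect, has convenience functions such as **```reader()```** that help parse csv files. **```csv.reader(f)```** wraps the open file and yields one list per row, and the **```for```** loop appends each of those lists to **```csv_rows```**.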

- **```with open(path, 'r') as f:```** indicates that we will assign an open file buffer to **```f```**. After the code in the **```with```** statement is completed, the file buffer is automatically closed.
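To see the automatic closing in action, the sketch below writes the sample rows to a temporary file (using the standard `tempfile` module, an assumption so the example does not depend on `../datasets/mini_data.csv` existing), reads it inside a `with` block, and then checks the file object's `closed` attribute:

```python
import csv
import os
import tempfile

# Write the mini_data.csv contents to a temporary file for this demo
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('column1,column2,column3\nbird,1,2.3\ncat,3,4.6\ndog,120,0.001\n')
    path = tmp.name

rows = []
with open(path, 'r', newline='') as f:
    print(f.closed)          # False: the file is open inside the block
    for row in csv.reader(f):
        rows.append(row)

print(f.closed)              # True: closed automatically when the block ended
os.remove(path)              # clean up the temporary file
```

The `newline=''` argument is what the Python 3 `csv` documentation recommends when opening a file for `csv.reader`; the simpler `open(path, 'r')` used above also works for a small file like this one.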

In [6]:
for row in csv_rows:
    print(row)

['column1', 'column2', 'column3']
['bird', '1', '2.3']
['cat', '3', '4.6']
['dog', '120', '0.001']


Printing the rows stored in **```csv_rows```**, we can see that the **```csv.reader()```** function returns the rows in lists. The reader also reads each "cell" as a string.
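Because every cell arrives as a string, numeric columns usually need converting before you can do math with them. A minimal sketch, assuming the `csv_rows` layout printed above (a header row, then a name, a whole number, and a decimal number per row); the rows are written out literally so the block runs on its own:

```python
# Same structure as csv_rows after reading mini_data.csv
csv_rows = [
    ['column1', 'column2', 'column3'],
    ['bird', '1', '2.3'],
    ['cat', '3', '4.6'],
    ['dog', '120', '0.001'],
]

header = csv_rows[0]                     # the first row holds the column names
typed_rows = []
for name, count, value in csv_rows[1:]:  # skip the header row
    typed_rows.append([name, int(count), float(value)])

print(header)      # ['column1', 'column2', 'column3']
print(typed_rows)  # [['bird', 1, 2.3], ['cat', 3, 4.6], ['dog', 120, 0.001]]
```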

Since data is very often stored in csv files, it is useful to know how to load them into python!

Later we will use the **```pandas```** package to do most of the heavy lifting for this.

## List Comprehensions

### What are list comprehensions?

List comprehensions are expressions that build a new list by applying an operation to each element of an existing list (or any other iterable). Let's start with a simple list of numbers.

In [8]:
numbers = [0,1,2,3,4,5,6,7,8,9]

Imagine that we want to add 1 to every element of the list. We could do this a couple of ways without the use of list comprehensions.

In [10]:
nums_plus_one = []

for num in numbers:
    nums_plus_one.append(num+1)


In [11]:
print(nums_plus_one)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


We could also use Python's built-in ```map()``` with a lambda function. ```map()``` applies a function to each element of a list (or other iterable). In Python 3 it returns a lazy iterator rather than a list, so we wrap it in ```list()```.

In [None]:
nums_plus_one = list(map(lambda x: x+1, numbers))

In [None]:
# For loop:
nums_plus_one = []
for num in numbers:
    nums_plus_one.append(num+1)

# Map & lambda
nums_plus_one = list(map(lambda x: x+1, numbers))

These solutions each have pros and cons. The for loop is more readable and explicit (at least if you aren't familiar with how map and lambda work), while map with lambda is concise but arcane.

Luckily, list comprehensions combine the best of both worlds.

In [12]:
nums_plus_one = [x+1 for x in numbers]

Let's go over how that works in more granular detail.

- Like the map statement, nums_plus_one is assigned on the left as a new variable.
- A list comprehension evaluates to a list, so the whole expression is wrapped in list brackets: ```[...]```
- Within the brackets, the parts mirror a for loop:
  1. The **operation per element** comes first: ```x+1```
  2. Next is the **for loop variable assignment**: ```for x```
  3. Last comes the **list of elements to iterate over**: ```in numbers```

In [13]:
print(nums_plus_one)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
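Because the three parts mirror a for loop, any list comprehension can be unrolled back into one, and the iterable after `in` does not have to be a list: a `range` or a string works too. A short sketch checking both points:

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Comprehension: operation, loop variable, iterable
comp = [x + 1 for x in numbers]

# The same three parts written as an explicit for loop
loop = []
for x in numbers:        # loop variable and iterable
    loop.append(x + 1)   # operation per element

print(comp == loop)                 # True
print([x * x for x in range(5)])    # [0, 1, 4, 9, 16]
print([c.upper() for c in 'cat'])   # ['C', 'A', 'T']
```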


### Conditional logic in list comprehensions

List comprehensions can be extended to cover more of the functionality of a for loop than just an operation over elements. 

Let's say we want to "binarize" a list: each element becomes 1 if it is greater than or equal to the mean of all elements, and 0 otherwise. The for loop could look something like this.

In [14]:
import numpy as np
n = [1, 2, 7, 21, 3, 1, 62, 3, 34, 12, 73, 44, 12, 11, 9]
n_bin = []
n_mean = np.mean(n)
for x in n:
    if x >= n_mean:
        n_bin.append(1)
    else:
        n_bin.append(0)

In [15]:
print(n_bin)

[0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0]


A list comprehension can do the same thing much more concisely, without losing clarity.

In [17]:
n_bin = [1 if x >= n_mean else 0 for x in n]

In [18]:
print(n_bin)

[0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0]


We can even do chained conditionals! 

This list comprehension:

- maps 1s to 0s
- maps 0s to 1s
- sets every other item to ```None```

In [19]:
n = [0, 1, 0, 1, 2, 3, 5, 2, 1, 0]

bin_or_none = [0 if x == 1 else 1 if x == 0 else None for x in n]

In [20]:
print(bin_or_none)

[1, 0, 1, 0, None, None, None, None, 0, 1]
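A chained conditional expression is evaluated left to right, much like an `if`/`elif`/`else` chain: the first condition that holds decides the value. A sketch of the equivalent for loop over the same `n`:

```python
n = [0, 1, 0, 1, 2, 3, 5, 2, 1, 0]

bin_or_none = [0 if x == 1 else 1 if x == 0 else None for x in n]

# Equivalent loop: the first matching condition decides the value
loop_result = []
for x in n:
    if x == 1:
        loop_result.append(0)
    elif x == 0:
        loop_result.append(1)
    else:
        loop_result.append(None)

print(loop_result == bin_or_none)  # True
```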


### Nested list comprehensions

As some of you may have suspected by now, we can even embed list comprehensions within other list comprehensions for extra power.

For example, let's say we want the square and the square root for every non-negative element in a list.

In [21]:
import numpy as np
n = [0, 1, 50, -23, -1, 75, -3]

In [22]:
math_pairs = [[x**2, np.sqrt(x)] for x in [y for y in n if y >= 0]]

In [23]:
print(math_pairs)

[[0, 0.0], [1, 1.0], [2500, 7.0710678118654755], [5625, 8.6602540378443873]]
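Here the inner comprehension only drops the negative numbers, so the same pairs can also be built by a single comprehension with the filter placed at the end. The sketch below uses the standard `math.sqrt` instead of `np.sqrt` (a swap so the example needs only the standard library) and contrasts a filtering `if` with a value-choosing `if ... else`:

```python
import math

n = [0, 1, 50, -23, -1, 75, -3]

# Filter: the `if` comes after the iterable and drops non-matching elements
math_pairs = [[x**2, math.sqrt(x)] for x in n if x >= 0]

# Conditional expression: `if ... else` comes before `for` and keeps every element
clipped = [x if x >= 0 else 0 for x in n]

print(len(math_pairs))  # 4: one pair per non-negative element
print(clipped)          # [0, 1, 50, 0, 0, 75, 0]
```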


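Note that in **```math_pairs```**, the ```if``` in the inner comprehension comes _after_ the ```in``` clause. When a condition acts as a filter (dropping elements rather than choosing a value for each one), it goes at the end; compare this with the ```x if ... else ...``` form used earlier, which comes before the ```for```.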

### List comprehensions with functions

We can also do operations on multiple lists. I often use the **```zip()```** and **```enumerate()```** functions in combination with list comprehensions.

- **zip** steps through two (or more) lists in parallel, yielding one tuple of matching elements at a time
- **enumerate** yields each element of a list together with its index

In [None]:
a = ['a','b','c','d']
z = ['z','y','x','w']

zipped = []
for a_i, z_i in zip(a, z):
    zipped.append([a_i, z_i])
print(zipped)

[['a', 'z'], ['b', 'y'], ['c', 'x'], ['d', 'w']]

In [None]:
enumerated = []
for i, a_i in enumerate(a):
    enumerated.append([i, a_i])
print(enumerated)

[[0, 'a'], [1, 'b'], [2, 'c'], [3, 'd']]
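Both loops collapse naturally into list comprehensions. One behavior worth knowing: `zip` stops at the end of the shortest input, so unequal lists are silently truncated. A sketch using the same `a` and `z` lists (the shorter list `short` is made up for the demo):

```python
a = ['a', 'b', 'c', 'd']
z = ['z', 'y', 'x', 'w']

zipped = [[a_i, z_i] for a_i, z_i in zip(a, z)]
enumerated = [[i, a_i] for i, a_i in enumerate(a)]

print(zipped)      # [['a', 'z'], ['b', 'y'], ['c', 'x'], ['d', 'w']]
print(enumerated)  # [[0, 'a'], [1, 'b'], [2, 'c'], [3, 'd']]

# zip stops at the shortest input: only two pairs here
short = ['z', 'y']
print(list(zip(a, short)))  # [('a', 'z'), ('b', 'y')]
```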

Using a list comprehension and two lists, let's:

1. iterate through both lists element-by-element
2. multiply the element of the first list by the current index
3. then divide that by the element of the second list

Remember: with enumerate the index is returned first and the element second.
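When `enumerate` wraps `zip`, each item is an index paired with a tuple produced by `zip`, which is why the loop variables are written as `i, (x, y)`. Listing the first items makes the nesting visible:

```python
list_one = [10, 15, 20, 25, 40]
list_two = [1, 2, 3, 4, 5]

pairs = list(enumerate(zip(list_one, list_two)))
print(pairs[:2])   # [(0, (10, 1)), (1, (15, 2))]

# Unpacking the nested structure: index first, then the zipped tuple
for i, (x, y) in pairs[:2]:
    print(i, x, y)
```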

In [24]:
list_one = [10, 15, 20, 25, 40]
list_two = [1, 2, 3, 4, 5]

In [26]:
math_comp = [(x*i)/y for i, (x, y) in enumerate(zip(list_one, list_two))]

print(math_comp)

[0.0, 7.5, 13.333333333333334, 18.75, 32.0]


## Dictionary comprehensions

Comprehensions are not limited to lists. You can also use comprehensions to create dictionaries with key:value pairs.
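A dictionary comprehension uses curly braces and a `key: value` expression in place of a single element. A short sketch (the word list and scores are made up for illustration) that maps each word to its length, then builds a dictionary from two parallel lists with `zip`:

```python
words = ['dog', 'cat', 'horse']

# One key: value pair per element
lengths = {w: len(w) for w in words}
print(lengths)     # {'dog': 3, 'cat': 3, 'horse': 5}

# Two parallel lists become keys and values
names = ['a', 'b', 'c']
scores = [1, 2, 3]
score_map = {k: v for k, v in zip(names, scores)}
print(score_map)   # {'a': 1, 'b': 2, 'c': 3}
```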

For example, the comprehension below uses each animal name as a key and, as the value, a list of the integer code of each character in that name.

In [27]:
keys = ['dog', 'cat', 'bird', 'horse']

# ord() returns the integer (Unicode code point) of a single character:
animal_dict = {k:[ord(c) for c in k] for k in keys}

In [28]:
print(animal_dict)

{'dog': [100, 111, 103], 'cat': [99, 97, 116], 'bird': [98, 105, 114, 100], 'horse': [104, 111, 114, 115, 101]}
