## Iterables

Iterables, objects we can loop over one item at a time, are fundamental to data processing.

## The Python `list`

A list is written with square brackets: `[]`

We can use the Python built-in function `range` to make a list of numbers:

In [1]:
x = list(range(100000))

x[:10]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

We can see how many items are in our list using the Python built-in `len`:

In [2]:
len(x)

100000

We can see if an item is in our list:

In [3]:
42 in x

True

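The Python list can hold objects of different types:
- this gives the programmer flexibility
- the cost is memory, because a list stores a reference to each object rather than the raw values

Lists also reserve space for more items than they currently hold, so that appending stays fast.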
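
The reserved space can be observed by watching the size of a list as it grows. This is a minimal sketch that assumes CPython; the exact byte counts are an implementation detail and vary between versions. The names `items` and `sizes` are illustrative only.

```python
import sys

items = []
sizes = []
for i in range(20):
    # Record the list's own size (in bytes) before each append
    sizes.append(sys.getsizeof(items))
    items.append(i)

# The size grows in occasional jumps rather than on every append,
# because each resize reserves room for several future items
distinct_sizes = sorted(set(sizes))
print(len(sizes), "appends,", len(distinct_sizes), "distinct sizes")
print(distinct_sizes)
```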

Another common Python iterable is the NumPy array, which stores values of a single type in one contiguous block and so uses less memory:

https://webcourses.ucf.edu/courses/1249560/pages/python-lists-vs-numpy-arrays-what-is-the-difference

In [4]:
import sys

sys.getsizeof(x)

900112

In [5]:
import numpy as np

sys.getsizeof(np.array(x))

800096
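
Note that `sys.getsizeof` reports only the size of the list object itself (its array of references), not the integer objects those references point to. The sketch below gives a rough estimate of the full footprint; the names `shallow` and `deep` are illustrative, and the total is an upper bound because CPython shares some small integers between objects.

```python
import sys

x = list(range(100000))

# Size of the list object itself: a header plus one reference per item
shallow = sys.getsizeof(x)

# Add the size of every integer object the list refers to
# (a rough upper bound, since some small integers are shared)
deep = shallow + sum(sys.getsizeof(item) for item in x)

print(shallow, deep)
```

The NumPy array stores the raw values directly, so there is no comparable hidden cost for it.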

NumPy arrays are also faster for numerical work, because their operations are implemented in C:
- NumPy is essentially a C library with Python bindings

### Iterating over a list

In [6]:
x = [2, 4, 6]

y = []
for item in x:
    y.append(item * 2)
    
y

[4, 8, 12]

## List comprehensions

List comprehensions are another example of Pythonic code. A list comprehension **returns a new list**, doing the same work as the `for` loop above in a single line.

Don't worry if the list comprehension syntax isn't immediately intuitive - you will get it eventually :)

In [7]:
y = [item * 2 for item in x]
y

[4, 8, 12]

We can put a conditional inside the list comprehension to filter items:

In [8]:
y = [item * 2 for item in x if item == 4]
y

[8]
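
To make the filtering explicit, here is a small sketch that writes the same logic as a loop and as a comprehension. The condition `item > 2` and the variable names are chosen for illustration only.

```python
x = [2, 4, 6]

# Loop form: filter with `if`, then transform and append
doubled_loop = []
for item in x:
    if item > 2:
        doubled_loop.append(item * 2)

# Comprehension form: the same filter sits at the end
doubled_comp = [item * 2 for item in x if item > 2]

print(doubled_loop)  # [8, 12]
print(doubled_comp)  # [8, 12]
```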

## Iteration turns an integral into a sum

In [9]:
from scipy.integrate import quad

def integrand(x):
    return x**2

ans, err = quad(integrand, 0, 1)
ans

0.33333333333333337

In [10]:
import numpy as np

x = np.linspace(0, 1, 1000000)

step = x[1] - x[0]

f = [integrand(v) for v in x]

area = sum([step*v for v in f])
area

0.3333338333339938
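
The idea is that the area under the curve is approximated by many thin rectangles, each with height `f(x)` and width `step`. Below is a pure-Python sketch of a left Riemann sum; the helper name `riemann_sum` and its parameters are hypothetical, and the grid differs slightly from the `np.linspace` version above, so the numbers will not match exactly.

```python
def riemann_sum(func, lower, upper, n_steps):
    """Approximate the integral of func on [lower, upper] with a left Riemann sum."""
    step = (upper - lower) / n_steps
    # Sum the rectangle areas: height func(x_i) times width step
    return sum(func(lower + i * step) * step for i in range(n_steps))


def integrand(x):
    return x**2


# The estimate approaches 1/3 as the rectangles get thinner
for n_steps in (10, 100, 1000):
    print(n_steps, riemann_sum(integrand, 0, 1, n_steps))
```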

## Common patterns with looping

Appending to an empty list:

In [11]:
from random import gauss

data = []
for _ in range(5):
    data.append(gauss(0, 1))

data

[0.6943915249393937,
 -1.520931631844586,
 -1.617491592858226,
 -0.9277210976381901,
 1.5653768053402017]

Appending dicts to a list and making a pandas `DataFrame`:

In [12]:
from random import random as uniform

import pandas as pd

data = []
for _ in range(5):
    data.append(
        {'standard-normal': gauss(0, 1),
         'uniform': uniform()}
    )

for d in data:
    print(d)
pd.DataFrame(data)

{'standard-normal': 0.29236867467118666, 'uniform': 0.31832302793394174}
{'standard-normal': -0.5640117928942517, 'uniform': 0.631913210464561}
{'standard-normal': 0.642356172313028, 'uniform': 0.37192294490665034}
{'standard-normal': 1.440631980329184, 'uniform': 0.23278583708553602}
{'standard-normal': -0.2452601649800075, 'uniform': 0.6605524986491615}


| | standard-normal | uniform |
|---|---|---|
| 0 | 0.292369 | 0.318323 |
| 1 | -0.564012 | 0.631913 |
| 2 | 0.642356 | 0.371923 |
| 3 | 1.440632 | 0.232786 |
| 4 | -0.24526 | 0.660552 |

We can also append to lists stored in a dict, then build the `DataFrame` from the dict of columns:


In [13]:
data = {'standard-normal': [], 'uniform': []}
for _ in range(5):
    data['standard-normal'].append(gauss(0, 1))
    data['uniform'].append(uniform())

    print(data)
    
#pd.DataFrame(data)
data = pd.DataFrame(data, index=list(range(5)))

{'standard-normal': [0.13561859800202083], 'uniform': [0.2914517359049731]}
{'standard-normal': [0.13561859800202083, -0.07721586800269041], 'uniform': [0.2914517359049731, 0.7192035379222816]}
{'standard-normal': [0.13561859800202083, -0.07721586800269041, -2.166360640475556], 'uniform': [0.2914517359049731, 0.7192035379222816, 0.02146111411366558]}
{'standard-normal': [0.13561859800202083, -0.07721586800269041, -2.166360640475556, -0.7988495901941858], 'uniform': [0.2914517359049731, 0.7192035379222816, 0.02146111411366558, 0.9899612183414147]}
{'standard-normal': [0.13561859800202083, -0.07721586800269041, -2.166360640475556, -0.7988495901941858, 0.2880612603078178], 'uniform': [0.2914517359049731, 0.7192035379222816, 0.02146111411366558, 0.9899612183414147, 0.6520053708043135]}


In [14]:
data

| | standard-normal | uniform |
|---|---|---|
| 0 | 0.135619 | 0.291452 |
| 1 | -0.077216 | 0.719204 |
| 2 | -2.166361 | 0.021461 |
| 3 | -0.79885 | 0.989961 |
| 4 | 0.288061 | 0.652005 |
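
A list of dicts (one per row) and a dict of lists (one per column) hold the same information. As a rough sketch, the conversion between the two layouts can be written in plain Python; the values and the names `rows`, `columns` and `rows_again` are hypothetical.

```python
# One dict per row (hypothetical values, rounded for readability)
rows = [
    {'standard-normal': 0.1, 'uniform': 0.3},
    {'standard-normal': -0.5, 'uniform': 0.6},
]

# Rows to columns: collect each key's values into one list
columns = {key: [row[key] for row in rows] for key in rows[0]}
print(columns)

# Columns to rows: rebuild one dict per position
n_rows = len(rows)
rows_again = [{key: columns[key][i] for key in columns} for i in range(n_rows)]
print(rows_again)
```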


In [15]:
data.shape

(5, 2)

In [16]:
for n in range(data.shape[1]):
    print(data.iloc[:, n])
    print(' ')

0    0.135619
1   -0.077216
2   -2.166361
3   -0.798850
4    0.288061
Name: standard-normal, dtype: float64
 
0    0.291452
1    0.719204
2    0.021461
3    0.989961
4    0.652005
Name: uniform, dtype: float64
 


## `zip()`

Looping over two things at the same time:

In [19]:
f = list(range(0, 6))
s = list(range(6, 12))

assert len(f) == len(s)

for first, second in zip(f, s):
    print(first, second)

0 6
1 7
2 8
3 9
4 10
5 11
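
The length check matters because `zip` stops at the shortest input without raising an error. The sketch below uses two deliberately mismatched lists to show this, along with `itertools.zip_longest` from the standard library as one alternative. From Python 3.10 onward, `zip(f, s, strict=True)` raises an error on a mismatch instead.

```python
from itertools import zip_longest

f = [0, 1, 2]
s = [6, 7]

# zip stops at the shortest input, silently dropping the extra item in f
truncated = list(zip(f, s))
print(truncated)  # [(0, 6), (1, 7)]

# zip_longest keeps going and fills the gaps (with None by default)
padded = list(zip_longest(f, s))
print(padded)  # [(0, 6), (1, 7), (2, None)]
```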


## `enumerate()`

`enumerate` gives us an integer index alongside each item as we loop:

In [20]:
x = list(range(100, 105))

for idx, item in enumerate(x):
    print(idx, item)

0 100
1 101
2 102
3 103
4 104


We can also start the index at a value other than zero:

In [21]:
for idx, item in enumerate(x, 2):
    print(idx, item)

2 100
3 101
4 102
5 103
6 104


## List algebra

In Python, multiplying a list by an integer repeats it, and adding two lists concatenates them:

In [22]:
data = [
    0, 1, 0, 1, 1
]

data * 2

[0, 1, 0, 1, 1, 0, 1, 0, 1, 1]

In [23]:
data + data

[0, 1, 0, 1, 1, 0, 1, 0, 1, 1]
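
One caveat worth knowing: multiplication repeats references to the same objects rather than copying them. This is harmless for integers, but it can surprise us with nested lists. The sketch below uses illustrative names `shared_grid` and `separate_grid`.

```python
# Multiplying copies references, not objects:
# all three rows below are the same inner list
shared_grid = [[0, 0]] * 3
shared_grid[0][0] = 1
print(shared_grid)  # [[1, 0], [1, 0], [1, 0]]

# A list comprehension builds a separate inner list for each row
separate_grid = [[0, 0] for _ in range(3)]
separate_grid[0][0] = 1
print(separate_grid)  # [[1, 0], [0, 0], [0, 0]]
```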

## Exercise

Create a **Cartesian product** - every combination of one item from each of two lists:

In [25]:
colors = ['white', 'black']
sizes = ['small', 'medium', 'large']

white small
white medium
white large
black small
black medium
black large
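
One possible solution, offered as a sketch rather than the only answer, uses two `for` clauses in a list comprehension. The standard library's `itertools.product` produces the same pairs; the name `pairs` is illustrative.

```python
from itertools import product

colors = ['white', 'black']
sizes = ['small', 'medium', 'large']

# Two for clauses in one comprehension: the outer loop (colors) comes first
pairs = [(color, size) for color in colors for size in sizes]

# itertools.product yields the same pairs in the same order
assert pairs == list(product(colors, sizes))

for color, size in pairs:
    print(color, size)
```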


## Indexing

Python uses **zero-based indexing**.

Index the first element at `0`:

In [27]:
x = list(range(100000))

x[0]

0

And the last at `-1`:

In [28]:
x[-1]

99999

## Slicing

We can select slices using similar notation - the start index is included and the stop index is excluded:

In [29]:
x[4:8]

[4, 5, 6, 7]
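
Slices also accept omitted bounds and an optional third value, the step. A short sketch on a smaller list, with illustrative variable names:

```python
x = list(range(10))

first_three = x[:3]     # start defaults to the beginning
last_three = x[7:]      # stop defaults to the end
every_second = x[::2]   # a step of 2 takes every second item
reversed_copy = x[::-1] # a step of -1 walks backwards

print(first_three)    # [0, 1, 2]
print(last_three)     # [7, 8, 9]
print(every_second)   # [0, 2, 4, 6, 8]
print(reversed_copy)  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```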

## Strings are iterables

We can slice them:

In [30]:
gita = 'The ignorant work for their own profit, Arjuna the wise work for the welfare of the world, without thought for themselves - KRISHNA'

gita[:38]

'The ignorant work for their own profit'

We can also add them together:

In [31]:
bohr = 'Prediction is very difficult, especially if it is about the future - NEILS BOHR'

quotes = gita + ', ' + bohr
quotes

'The ignorant work for their own profit, Arjuna the wise work for the welfare of the world, without thought for themselves - KRISHNA, Prediction is very difficult, especially if it is about the future - NEILS BOHR'

The above is a CSV (comma-separated values) string:

In [32]:
import csv

list(csv.reader([quotes]))

[['The ignorant work for their own profit',
  ' Arjuna the wise work for the welfare of the world',
  ' without thought for themselves - KRISHNA',
  ' Prediction is very difficult',
  ' especially if it is about the future - NEILS BOHR']]

Above we can see a problem - the quotes themselves contain commas, so each quote is split into several fields.
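
The usual way around this is to quote any field that contains a comma. As a minimal sketch, `csv.writer` does this automatically and `csv.reader` undoes it. The second field and the names `fields`, `buffer`, `line` and `parsed` are hypothetical.

```python
import csv
import io

fields = [
    'Prediction is very difficult, especially if it is about the future',
    'a quote without commas',  # hypothetical second field
]

# csv.writer wraps any field containing a comma in double quotes
buffer = io.StringIO()
csv.writer(buffer).writerow(fields)
line = buffer.getvalue()
print(line)

# csv.reader then recovers the two original fields intact
parsed = next(csv.reader([line]))
print(parsed)
```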

## Writing data to files

We can use Python's `open` built-in to write to a file:

In [33]:
quotes = [bohr, gita]
with open('./quotes.txt', 'w') as dump:
    for line in quotes:
        dump.write(line)
        dump.write('\n')

Run shell commands from the notebook (the `!` prefix) to print the file (`cat`) and then remove it (`rm`):

In [34]:
!cat quotes.txt
!rm quotes.txt

Prediction is very difficult, especially if it is about the future - NEILS BOHR
The ignorant work for their own profit, Arjuna the wise work for the welfare of the world, without thought for themselves - KRISHNA
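
The `!` commands only work inside a notebook or IPython. As a sketch of the same round trip in plain Python, the example below writes to a temporary directory, reads the file back by iterating over it, and then deletes it. The placeholder lines and the names `path`, `lines_read` and `still_exists` are hypothetical.

```python
import tempfile
from pathlib import Path

lines_to_write = ['first line', 'second line']  # hypothetical stand-ins for the quotes

# Work in a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'quotes.txt'

    with open(path, 'w') as dump:
        for line in lines_to_write:
            dump.write(line + '\n')

    # A file object is itself an iterable: looping yields one line at a time
    with open(path) as source:
        lines_read = [line.rstrip('\n') for line in source]

    path.unlink()  # the Python equivalent of `rm`
    still_exists = path.exists()

print(lines_read)
```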
