## Iterables

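Iterables are fundamental in data processing. An iterable is any object that can hand back its items one at a time, which is exactly what a `for` loop needs.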

## The Python `list`

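A list is written with square brackets: `[]` is an empty list, and `[1, 2, 3]` is a list of three integers.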

We can use the built-in `range` to generate a sequence of numbers and `list` to collect them into a list:

In [1]:
x = list(range(100000))

x[:10]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

We can count the items in our list with the built-in `len`:

In [2]:
len(x)

100000

We can see if an item is in our list:

In [3]:
42 in x

True

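A Python list can hold objects of different types:
- this gives the programmer flexibility
- the cost is memory: the list stores a pointer to each item, and every item is a full Python object with its own overhead

Lists also over-allocate: they reserve space for more items than they currently hold, so that repeated `append` calls stay fast.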

Another common iterable is the NumPy array. It stores items of a single type in one contiguous block of memory, so it uses less memory:

https://webcourses.ucf.edu/courses/1249560/pages/python-lists-vs-numpy-arrays-what-is-the-difference

In [4]:
import sys

sys.getsizeof(x)

900112

In [5]:
import numpy as np

sys.getsizeof(np.array(x))

800096
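`sys.getsizeof` is shallow: for a list it counts the list object and its array of pointers, but not the integer objects those pointers refer to, so the comparison above understates the list's real footprint. The sketch below adds the element sizes as a rough estimate; exact numbers vary by Python version and platform.

```python
import sys

import numpy as np

x = list(range(100000))
arr = np.array(x)

# the list object alone: a header plus one pointer per slot (8 bytes each on a 64-bit build)
list_only = sys.getsizeof(x)

# add the int objects the pointers refer to; this is a rough upper bound,
# because small ints (-5 to 256) are cached and shared by the interpreter
list_total = list_only + sum(sys.getsizeof(item) for item in x)

# the array's raw data buffer: one fixed-size integer per element
array_data = arr.nbytes

print(list_only, list_total, array_data)
```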

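NumPy arrays are also faster for numerical work, because their operations run as compiled C loops over that contiguous memory instead of the Python interpreter handling one item at a time:
- NumPy = C with Python bindings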
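A small sketch of the difference: the same doubling done with a list comprehension (one interpreted step per element) and with a single NumPy expression (the loop runs in compiled code). The timings are only indicative and depend on the machine.

```python
import timeit

import numpy as np

values = list(range(100_000))
arr = np.array(values)

# pure Python: the interpreter executes `v * 2` once per element
doubled_list = [v * 2 for v in values]

# NumPy: a single call; the element-wise loop runs in compiled code
doubled_arr = arr * 2

# same numbers either way
print(doubled_arr.tolist() == doubled_list)

# rough timing; absolute numbers depend on the machine
t_list = timeit.timeit(lambda: [v * 2 for v in values], number=20)
t_arr = timeit.timeit(lambda: arr * 2, number=20)
print(f'list comprehension: {t_list:.4f} s, numpy: {t_arr:.4f} s')
```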

### Iterating over a list

In [6]:
x = [2, 4, 6]

y = []
for item in x:
    y.append(item * 2)
    
y

[4, 8, 12]
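Under the hood, a `for` loop asks the iterable for an iterator with `iter()` and then calls `next()` on it until `StopIteration` is raised. Spelling that out by hand for the loop above:

```python
x = [2, 4, 6]

# what `for item in x:` does behind the scenes
iterator = iter(x)
y = []
while True:
    try:
        item = next(iterator)
    except StopIteration:
        break  # the iterator is exhausted, so the loop ends
    y.append(item * 2)

print(y)  # [4, 8, 12]
```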

## List comprehensions

List comprehensions are another example of Pythonic code. A list comprehension always **returns a new list**.

Don't worry if the list comprehension syntax isn't immediately intuitive - you will get it eventually :)

In [7]:
y = [item * 2 for item in x]
y

[4, 8, 12]
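To make "returns a new list" concrete: the original list is left untouched, and the same bracket-and-`for` syntax also builds dicts and sets.

```python
x = [2, 4, 6]

y = [item * 2 for item in x]
# the comprehension built a new list; the original is unchanged
print(x, y)    # [2, 4, 6] [4, 8, 12]
print(y is x)  # False

# the same syntax with braces and key: value builds a dict
squares = {item: item ** 2 for item in x}
print(squares)  # {2: 4, 4: 16, 6: 36}
```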

We can add a condition to the list comprehension to filter items:

In [8]:
y = [item * 2 for item in x if item == 4]
y

[8]
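Where the `if` goes matters. At the end it filters (items that fail are dropped); before the `for`, combined with `else`, it is a conditional expression that transforms every item and keeps the length:

```python
x = [2, 4, 6]

# trailing `if` filters: the 2 is dropped
filtered = [item * 2 for item in x if item > 2]
print(filtered)     # [8, 12]

# `a if cond else b` before `for` maps every item
transformed = [item * 2 if item > 2 else 0 for item in x]
print(transformed)  # [0, 8, 12]
```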

## Iteration turns an integral into a sum

In [15]:
from scipy.integrate import quad

def integrand(x):
    return x**2

ans, err = quad(integrand, 0, 1)
ans

0.33333333333333337
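`quad` matches the exact answer, $\int_0^1 x^2\,dx = 1/3$. The next cell replaces the integral by a sum: it iterates over the $N = 10^6$ evenly spaced points produced by `np.linspace(0, 1, N)` and adds up rectangles of width $\Delta x$ (a Riemann sum). Worked through for this integrand, assuming that grid:

$$
\Delta x = \frac{1}{N-1}, \qquad x_i = i\,\Delta x, \qquad
\sum_{i=0}^{N-1} x_i^2\,\Delta x = \Delta x^3 \sum_{i=0}^{N-1} i^2 = \frac{N(2N-1)}{6(N-1)^2} = \frac{1}{3} + \frac{1}{2(N-1)} + \frac{1}{6(N-1)^2}
$$

For $N = 10^6$ the excess over $1/3$ is about $5 \times 10^{-7}$, which is the gap between the sum printed by the next cell and the exact value.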

In [26]:
import numpy as np

x = np.linspace(0, 1, 1000000)

step = x[1] - x[0]

f = [integrand(v) for v in x]

area = sum([step*v for v in f])
area

0.3333338333339938
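The same Riemann sum can be written without a Python-level loop by applying the integrand to the whole array and summing with NumPy. A sketch, also checking how the error grows on coarser grids:

```python
import numpy as np

def integrand(x):
    return x**2

x = np.linspace(0, 1, 1_000_000)
step = x[1] - x[0]

# same sum as the list-comprehension cell, but integrand() is applied to the
# whole array at once and np.sum adds the rectangles in compiled code
area = np.sum(integrand(x) * step)
print(area)

# coarser grids give a larger error, which shrinks roughly like 1/N
for n in (11, 101, 1001):
    grid = np.linspace(0, 1, n)
    approx = np.sum(integrand(grid) * (grid[1] - grid[0]))
    print(n, approx - 1 / 3)
```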

## Common patterns with looping

Appending to an empty list:

In [27]:
from random import gauss

data = []
for _ in range(5):
    data.append(gauss(0, 1))

data

[1.3355007953147866,
 -1.988926441344213,
 0.9293730817110423,
 -0.8670057582230315,
 0.07967774984068092]
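The append-to-an-empty-list loop is exactly the pattern a list comprehension replaces. A sketch of the same five draws, with a fixed seed added only so reruns give the same numbers:

```python
from random import gauss, seed

seed(0)  # fixed seed so the example is reproducible

# the append loop above, written as a list comprehension
data = [gauss(0, 1) for _ in range(5)]
print(data)
```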

Appending dicts to a list, then building a pandas `DataFrame` from that list of dicts:

In [28]:
from random import random as uniform

import pandas as pd

data = []
for _ in range(5):
    data.append(
        {'standard-normal': gauss(0, 1),
         'uniform': uniform()}
    )

for d in data:
    print(d)
pd.DataFrame(data)

{'standard-normal': -0.17168203353882658, 'uniform': 0.4934184989017}
{'standard-normal': -0.6186764641285678, 'uniform': 0.5877900695056111}
{'standard-normal': -0.8422580309860632, 'uniform': 0.14045602520043643}
{'standard-normal': 3.32113764640708, 'uniform': 0.7184423799362724}
{'standard-normal': -0.4041579099706633, 'uniform': 0.4461563683046349}


|   | standard-normal | uniform |
|---|---|---|
| 0 | -0.171682 | 0.493418 |
| 1 | -0.618676 | 0.58779 |
| 2 | -0.842258 | 0.140456 |
| 3 | 3.321138 | 0.718442 |
| 4 | -0.404158 | 0.446156 |
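With a list of dicts, the dict keys become the columns. A small sketch with made-up values shows what happens when a row is missing a key: pandas fills the gap with `NaN`.

```python
import pandas as pd

rows = [
    {'standard-normal': 0.5, 'uniform': 0.1},
    {'standard-normal': -1.2},  # this row has no 'uniform' key
]

df = pd.DataFrame(rows)
print(df)

# the missing value is filled with NaN
print(df['uniform'].isna().tolist())  # [False, True]
```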


Alternatively, we can keep one list per column in a dict and append to each list in turn:

In [29]:
data = {'standard-normal': [], 'uniform': []}
for _ in range(5):
    data['standard-normal'].append(gauss(0, 1))
    data['uniform'].append(uniform())
    # printing inside the loop shows each column growing by one value per pass
    print(data)

data = pd.DataFrame(data)

{'standard-normal': [0.29553033461099465], 'uniform': [0.4650859381121234]}
{'standard-normal': [0.29553033461099465, 1.3569183359524357], 'uniform': [0.4650859381121234, 0.6906422157301993]}
{'standard-normal': [0.29553033461099465, 1.3569183359524357, -0.5283909601569745], 'uniform': [0.4650859381121234, 0.6906422157301993, 0.7577814827010518]}
{'standard-normal': [0.29553033461099465, 1.3569183359524357, -0.5283909601569745, 2.1226101195729505], 'uniform': [0.4650859381121234, 0.6906422157301993, 0.7577814827010518, 0.4468833701075213]}
{'standard-normal': [0.29553033461099465, 1.3569183359524357, -0.5283909601569745, 2.1226101195729505, 0.904398270361308], 'uniform': [0.4650859381121234, 0.6906422157301993, 0.7577814827010518, 0.4468833701075213, 0.17092896572205585]}


In [30]:
data

|   | standard-normal | uniform |
|---|---|---|
| 0 | 0.29553 | 0.465086 |
| 1 | 1.356918 | 0.690642 |
| 2 | -0.528391 | 0.757781 |
| 3 | 2.12261 | 0.446883 |
| 4 | 0.904398 | 0.170929 |
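The two layouts (a list of row dicts, or a dict of column lists) describe the same table, and `DataFrame.to_dict` converts back to either one through its `orient` argument. A sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'standard-normal': [0.3, 1.4], 'uniform': [0.5, 0.7]})

# column-oriented: a dict of lists, like the cell above
columns = df.to_dict(orient='list')
print(columns)

# row-oriented: a list of dicts, like the earlier cell
records = df.to_dict(orient='records')
print(records)
```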


In [31]:
data.shape

(5, 2)

In [32]:
for n in range(data.shape[1]):
    print(data.iloc[:, n])
    print(' ')

0    0.295530
1    1.356918
2   -0.528391
3    2.122610
4    0.904398
Name: standard-normal, dtype: float64
 
0    0.465086
1    0.690642
2    0.757781
3    0.446883
4    0.170929
Name: uniform, dtype: float64
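Looping over column positions with `range(data.shape[1])` works, but `DataFrame.items()` yields `(column name, Series)` pairs directly, so no integer position is needed. A sketch on a small made-up frame:

```python
import pandas as pd

data = pd.DataFrame({'standard-normal': [0.3, 1.4], 'uniform': [0.5, 0.7]})

names = []
for name, column in data.items():
    # `column` is a pandas Series holding that column's values
    names.append(name)
    print(name, column.max())
```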
 


## `zip()`

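`zip` loops over two (or more) iterables in parallel, pairing up their items. It stops at the end of the shortest input, which is why we `assert` that the lengths match first: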

In [33]:
f = list(range(0, 6))
s = list(range(6, 12))

assert len(f) == len(s)

for first, second in zip(f, s):
    print(first, second)

0 6
1 7
2 8
3 9
4 10
5 11


## `enumerate()`

`enumerate` yields an integer index alongside each item as we loop:

In [34]:
x = list(range(100, 105))

for idx, item in enumerate(x):
    print(idx, item)

0 100
1 101
2 102
3 103
4 104


We can also start the index at a value other than zero:

In [35]:
for idx, item in enumerate(x, start=2):
    print(idx, item)

2 100
3 101
4 102
5 103
6 104
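`enumerate` replaces the common `range(len(...))` pattern, where the index is produced first and the item is looked up by hand. Both give the same pairs:

```python
x = list(range(100, 105))

# index-based loop: works, but the item is fetched manually with x[idx]
pairs_by_index = []
for idx in range(len(x)):
    pairs_by_index.append((idx, x[idx]))

# enumerate yields the (index, item) pairs directly
pairs_enumerated = list(enumerate(x))

print(pairs_by_index == pairs_enumerated)  # True
```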


## List algebra

Lists support `*` and `+`: multiplying a list by an integer repeats it, and adding two lists concatenates them. Both return a new list:

In [37]:
data = [
    0, 1, 0, 1, 1
]

data * 2

[0, 1, 0, 1, 1, 0, 1, 0, 1, 1]

In [38]:
data + data

[0, 1, 0, 1, 1, 0, 1, 0, 1, 1]
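One caveat with `*`: it repeats references, not copies. For immutable items like the ints above that is harmless, but repeating a list that contains a list makes every "row" the same object:

```python
# repeating a list that contains a list copies the reference, not the inner list
grid = [[0] * 3] * 2
grid[0][0] = 1
print(grid)       # [[1, 0, 0], [1, 0, 0]]: both rows changed

# a list comprehension builds independent inner lists
safe_grid = [[0] * 3 for _ in range(2)]
safe_grid[0][0] = 1
print(safe_grid)  # [[1, 0, 0], [0, 0, 0]]
```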

## Exercise

Create a **Cartesian product**: every combination that pairs one item from the first list with one item from the second:

In [40]:
colors = ['white', 'black']
sizes = ['small', 'medium', 'large']

## Indexing

Python uses **zero-based indexing**.

The first element is at index `0`:

In [62]:
x = list(range(100000))

x[0]

0

Negative indices count from the end, so the last element is at `-1`:

In [63]:
x[-1]

99999
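A negative index `i` is shorthand for `len(x) + i`. Indexing past either end raises an `IndexError`:

```python
x = list(range(100000))

# x[-1] is the same element as x[len(x) - 1]
print(x[-1] == x[len(x) - 1])  # True
print(x[-2])                   # 99998

# indexing one past the end raises an IndexError
try:
    x[len(x)]
except IndexError as err:
    print('IndexError:', err)  # IndexError: list index out of range
```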

## Slicing

We can select a slice with `start:stop` notation. The slice includes `start` but stops just before `stop`:

In [64]:
x[4:8]

[4, 5, 6, 7]
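The full form is `start:stop:step`, and any part can be left out. A few common variations on a short list:

```python
x = list(range(10))

print(x[:3])    # [0, 1, 2]: omitted start means "from the beginning"
print(x[7:])    # [7, 8, 9]: omitted stop means "to the end"
print(x[-3:])   # [7, 8, 9]: negative positions count from the end
print(x[::2])   # [0, 2, 4, 6, 8]: the third number is the step
print(x[::-1])  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]: a negative step walks backwards
print(x[8:20])  # [8, 9]: unlike indexing, slicing past the end does not raise
```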

## Strings are iterables

We can slice them:

In [65]:
gita = 'The ignorant work for their own profit, Arjuna the wise work for the welfare of the world, without thought for themselves - KRISHNA'

gita[:38]

'The ignorant work for their own profit'
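Because a string is an iterable of characters, everything from the list sections applies: looping, `len`, `in`, and the full slice notation.

```python
word = 'Arjuna'

# iterating over a string yields one character at a time
letters = [ch for ch in word]
print(letters)      # ['A', 'r', 'j', 'u', 'n', 'a']
print(len(word))    # 6
print('j' in word)  # True

# the same slice notation as lists, including a negative step
print(word[::-1])   # anujrA
```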

We can also concatenate strings with `+`:

In [66]:
bohr = 'Prediction is very difficult, especially if it is about the future - NEILS BOHR'

quotes = gita + ', ' + bohr
quotes

'The ignorant work for their own profit, Arjuna the wise work for the welfare of the world, without thought for themselves - KRISHNA, Prediction is very difficult, especially if it is about the future - NEILS BOHR'
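`+` is fine for a few strings. For many pieces, `str.join` places a separator between every item, and `str.split` reverses it, as long as the separator never appears inside the pieces themselves:

```python
parts = ['first', 'second', 'third']

joined = ', '.join(parts)
print(joined)              # first, second, third
print(joined.split(', '))  # ['first', 'second', 'third']
```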

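Because the quotes are joined with `', '`, the result looks like a single row of CSV (comma-separated values), so we can try parsing it with the `csv` module: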

In [67]:
import csv

list(csv.reader([quotes]))

[['The ignorant work for their own profit',
  ' Arjuna the wise work for the welfare of the world',
  ' without thought for themselves - KRISHNA',
  ' Prediction is very difficult',
  ' especially if it is about the future - NEILS BOHR']]

This exposes a problem: the quotes themselves contain commas, so the parser splits the two quotes into five fields.
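The CSV format handles this by quoting fields that contain the delimiter. A sketch, using shortened stand-in quotes: writing the row with `csv.writer` (whose default quoting wraps such fields in double quotes) and reading it back with `csv.reader` recovers the original two fields.

```python
import csv
import io

quotes = [
    'Prediction is very difficult, especially if it is about the future',
    'The ignorant work for their own profit, Arjuna',
]

# csv.writer quotes any field that contains a comma
buffer = io.StringIO()
csv.writer(buffer).writerow(quotes)
line = buffer.getvalue()
print(line)

# parsing the quoted row gives back two fields, not five
parsed = next(csv.reader([line]))
print(parsed == quotes)  # True
```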

## Writing data to files

We can use Python's built-in `open` to write to a file:

In [68]:
quotes = [bohr, gita]
with open('./quotes.txt', 'w') as dump:
    for line in quotes:
        dump.write(line)
        dump.write('\n')

In Jupyter, a leading `!` runs a shell command. Here we print the file with `cat` and then remove it with `rm`:

In [69]:
!cat quotes.txt
!rm quotes.txt

Prediction is very difficult, especially if it is about the future - NEILS BOHR
The ignorant work for their own profit, Arjuna the wise work for the welfare of the world, without thought for themselves - KRISHNA
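The same round trip can be done in plain Python, which also works outside Jupyter. File objects are iterables too: looping over one yields a line at a time. A sketch with placeholder lines, using `pathlib.Path.unlink` in place of `rm`:

```python
from pathlib import Path

quotes = ['first line', 'second line']
path = Path('quotes.txt')

# write one item per line
with open(path, 'w') as dump:
    for line in quotes:
        dump.write(line + '\n')

# read it back: iterating over the file yields one line (with its newline) at a time
with open(path) as handle:
    lines = [line.rstrip('\n') for line in handle]
print(lines)  # ['first line', 'second line']

# pure-Python equivalent of `rm`
path.unlink()
```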
