# Iterables and types
Python handles sequences differently from many other languages. 
The traditional counting loop ("for i from 1 to n") is gone. 
Instead, loops are expressed in terms of *iterables*. 

An *iterable* is any object that can be looped over via: 

```
for item in iterable: 
    ... do something with item ... 
```

# First, a few examples of iteration to try

In [None]:
foo = ['cats', 'are', 'fun']
print("in original order:")
for f in foo:
    print(f)
foo.sort()
print("in sorted order:")
for f in foo:
    print(f)

## List observations
* Iteration order is the order of the elements in the list.
* A list can be sorted in place via `foo.sort()`, or copied into a new sorted list via `sorted(foo)`.
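
The two ways of sorting behave differently, which is easy to miss. The sketch below uses a hypothetical list `words` (not part of the original notebook) to show that `sorted` builds a new list while `list.sort` reorders the existing one and returns `None`.

```python
# Hypothetical list, separate from `foo` above.
words = ['cats', 'are', 'fun']

# sorted() returns a new list and leaves the original untouched.
ordered = sorted(words)
print(words)    # still in the original order
print(ordered)  # alphabetical order

# list.sort() reorders the list in place and returns None.
result = words.sort()
print(words)    # now alphabetical
print(result)   # None
```

A common mistake is writing `words = words.sort()`, which replaces the list with `None`.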

In [None]:
tup = ('cat', 10, 'Frank', 'Furter')
for t in tup: 
    print(t)

## Tuple observations
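* Iteration order is the order of the elements in the tuple. 
* The meaning of each element is positional. 
* Sorting rarely makes sense: positions carry meaning, and `sorted(tup)` fails here because strings and integers cannot be compared. 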

In [None]:
names = set(["Frank", "George", "Selma", "Selma", "Rick"])
for n in names: 
    print(n)

## Set observations
* Iteration order is arbitrary; do not rely on it. 
* Each element appears once, even if it was added more than once. 
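
As a small illustration of both points, the sketch below uses hypothetical names `pet_owners` and `unique_owners` (assumptions, not from the notebook). It shows duplicates collapsing, and uses `sorted` to get a repeatable order when one is needed.

```python
# Hypothetical data: "Selma" appears twice in the list.
pet_owners = ["Frank", "George", "Selma", "Selma", "Rick"]
unique_owners = set(pet_owners)

print(len(pet_owners))     # 5
print(len(unique_owners))  # 4

# sorted() accepts any iterable and returns a list in a repeatable order.
for name in sorted(unique_owners):
    print(name)
```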

In [None]:
pets = {'cats': 10, 'snakes': 5, 'dogs': 20}
for k in pets:
    print(k)
    print(pets[k])

## Observations about dictionaries
* Since Python 3.7, iteration follows insertion order; older versions gave no ordering guarantee. 
* Iteration is over *keys*. 
* Values are looked up by key, e.g. `pets[k]`.
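
When both the key and the value are needed, looking up each value inside the loop is not the only option. The sketch below, using a hypothetical dict `stock`, shows the dict methods `items()` and `values()`.

```python
# Hypothetical dict, separate from `pets` above.
stock = {'cats': 10, 'snakes': 5, 'dogs': 20}

# items() yields (key, value) pairs, which can be unpacked in the loop header.
for kind, count in stock.items():
    print(kind, count)

# values() yields only the values.
counts = list(stock.values())
pairs = list(stock.items())
print(counts)
```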

# Some generalizations
For every one of these patterns: 
* the thing being iterated over has to be *iterable*; there is no single "iterable" class, only a common behavior that many types share. 
* things are processed in an order that depends upon the type. 
* sorting may or may not make sense.
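
One way to test whether an object can appear in a `for` loop is to hand it to the built-in `iter`, which raises `TypeError` for objects that are not iterable. The helper name `is_iterable` below is hypothetical; this is a sketch, not a standard library function.

```python
def is_iterable(obj):
    """Return True if a for loop could iterate over obj (hypothetical helper)."""
    try:
        iter(obj)          # raises TypeError if obj is not iterable
    except TypeError:
        return False
    return True

print(is_iterable(['cats', 'are', 'fun']))  # True: a list
print(is_iterable({'cats': 10}))            # True: a dict (iterates over keys)
print(is_iterable(42))                      # False: an int
```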

# Ranges

Let's play with a rather unusual iterable called `range`. Here are some demonstrations: 

In [None]:
for i in range(5):
    print(i)

In [None]:
for i in range(1,5): 
    print(i)

In [None]:
for i in range(1,5,2): 
    print(i)

## Some observations about ranges: 
* with one argument, iterates over values from 0 up to, but not including, the argument.
* with two arguments, starts at the first and stops before reaching the second.
* with three arguments, starts at the first, stops before reaching the second, and advances by the third at each step. 
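
A quick way to see exactly which values a range produces is to convert it to a list. The variable names below are only for illustration; the last line adds a negative step, which is not covered above and counts downward.

```python
one_arg = list(range(5))
two_args = list(range(1, 5))
three_args = list(range(1, 5, 2))
countdown = list(range(5, 0, -1))   # a negative step counts down

print(one_arg)     # [0, 1, 2, 3, 4]
print(two_args)    # [1, 2, 3, 4]
print(three_args)  # [1, 3]
print(countdown)   # [5, 4, 3, 2, 1]
```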

# Things that work with all iterables. 
* casting to a list returns the list of all values. For example: 

In [None]:
print(list(pets))
print(list(names))

Note particularly that `list(some_dict)` returns a list of the *keys* to the dict. A dict has no `sort` method, but its keys can be sorted to fix an order. One can write: 

In [None]:
keys = sorted(pets)  # sorted() accepts any iterable, so list() is not needed
for k in keys: 
    print("I have {} {}".format(pets[k], k))

# Why are ranges so weird? 
The conventions of `range` are a holdover from the old, index-based way of processing lists. 

Note that the clause: 

In [None]:
for f in foo: 
    print(f)

does exactly the same thing as: 

In [None]:
for i in range(len(foo)): 
    print(foo[i])

Making this work is the reason that: 
* `range(n)` ranges from `0` to `n-1`, which are exactly the valid indices for a list with `n` elements. 
* Consequently, `range(len(x))` (for `x` a list) ranges over all valid indices `i` for which `x[i]` makes sense. 
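
When both the index and the element are needed, the built-in `enumerate` is a common alternative to `range(len(...))`. The sketch below uses a hypothetical list `words` and shows that the two approaches visit the same (index, element) pairs.

```python
# Hypothetical list, separate from `foo` above.
words = ['cats', 'are', 'fun']

# Index-based loop: look up each element by position.
by_index = []
for i in range(len(words)):
    by_index.append((i, words[i]))

# enumerate() yields (index, element) pairs directly.
by_enumerate = []
for i, word in enumerate(words):
    by_enumerate.append((i, word))

print(by_index)
print(by_enumerate)
```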


# Parallel iteration
A very common but non-intuitive form of iteration is *parallel iteration*: stepping through several sequences in lockstep. 

Suppose we have tuples of data: 

In [None]:
data = [('dogs', 10), ('cats', 20), ('gerbils', 5), ('geese', 4)]
data

Often we translate these into what is usually called a *frame*: a collection of named columns, where each name records what the column's data means. E.g., we can write: 

In [None]:
d1 = []
d2 = []
for d in data:
    d1.append(d[0])
    d2.append(d[1])
frame = { 'kind of pet':d1, 'number of pets': d2 }
frame
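
The same split into columns can also be written with the built-in `zip`, which groups the first elements of every tuple together, then the second elements, and so on. This is a sketch of an alternative, not a replacement for the loop above; the names `records`, `kinds`, `counts`, and `table` are hypothetical.

```python
# Hypothetical copy of the data above.
records = [('dogs', 10), ('cats', 20), ('gerbils', 5), ('geese', 4)]

# zip(*records) passes each tuple as a separate argument to zip,
# which then yields one tuple per position: all kinds, then all counts.
kinds, counts = zip(*records)

table = {'kind of pet': list(kinds), 'number of pets': list(counts)}
print(table)
```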

# Facts about frames
1. The number of entries is the number of columns. 
2. Each entry is a named column. 
3. The name is the name of the column. 
4. The name documents the meaning of the column. 
5. Thus, *a frame is a dict of lists.*
6. This is the most natural representation for an Excel spreadsheet. 
7. Frames exhibit *parallel structure*: each column has the same number of rows.
8. The element at position i in two columns represents attributes of one object.
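
The parallel-structure property in fact 7 can be checked mechanically. The sketch below assumes a small hypothetical frame named `inventory` and verifies that every column has the same length.

```python
# Hypothetical frame: a dict mapping column names to lists.
inventory = {'kind of pet': ['dogs', 'cats'], 'number of pets': [10, 20]}

# Record the length of each column by name.
lengths = {name: len(column) for name, column in inventory.items()}

# The frame has parallel structure if all column lengths agree.
is_parallel = len(set(lengths.values())) == 1

print(lengths)
print(is_parallel)  # True
```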

Let's play with frames. 

1. Using the variable 'frame' as input, print lines for each row based upon data in the columns. Hints: 
   
  a. Use `len` to figure out how many rows.
  
  b. You may assume both columns have the same number of rows. 
  

In [None]:
# fill in details here

2. Write a function `fetch` that takes as arguments a frame and a kind of pet, and returns the row for that kind of pet as a tuple. 

In [None]:
def fetch(frame, pet_kind):
    ...  # fill in the details here
    
print(fetch(frame, 'dogs'))
print(fetch(frame, 'cats'))

You should get back the tuples we created before. 

3. Part of the reason that we organize things this way is that columns can be manipulated easily. Write code that computes the total number of pets documented in `frame`. Hint: `sum(x)`, for `x` a list of numbers, returns the sum of all elements of `x`. 

In [None]:
# { fill in the details here, replace ... } 
num_pets = ...
num_pets

4. It is very often necessary to convert a frame into something from which data can be looked up quickly. Write code that turns `frame` into a dictionary `indexed`, indexed by pet kind. You should get something similar to this (order may differ): 
```
{ 'cats':20, 'dogs':10, 'gerbils':5, 'geese':4 }
```

In [None]:
# { fill in details here }
indexed = ...
print(indexed['cats'])
print(indexed['gerbils'])
indexed

# An afterword on the importance of frames

Frames are the fundamental structures upon which Data Science algorithms operate, for better or worse. The reason for this is that they closely mimic and map to and from Excel spreadsheets, which contain a majority of the data that we analyze. This reality is a bit lamentable, but the crowd dominates the individual, and most data remains in this relatively primitive format. However, there is often a need to move from frames to other data representations in order to speed up access or perform specialized functions. 

We will see that the object `DataFrame` is in fact much more sophisticated than the simple frame we implemented here. But the principles are the same.     

# When you're done, submit the notebook

You can submit a notebook by saving it as PDF. In the cluster environment, use File | Print (Save as PDF); on other versions, it may be File | Download As (PDF). Then submit the PDF to Gradescope: https://www.gradescope.com/courses/182658

To submit to Gradescope, log into the [website](https://www.gradescope.com/courses/182658), add course **9W7PW3** (if not already added) and submit. The assignment name should match the name of this notebook.