# Iterators

## Fundamentals

This section goes over the basics. Iterables in Python typically implement the `__iter__()` method, which the built-in `iter()` calls to return an iterator. You can then call `next()` repeatedly on that iterator to pull out consecutive values. `for` loops create these iterators behind the scenes.

In [None]:
cobra_kai = ["Johnny Lawrence", "Daniel LaRusso", "John Kreese", "Terry Silver"]

In [None]:
# create an iterator from the list
cobra_iter = iter(cobra_kai)
cobra_iter

In [None]:
print(next(cobra_iter))
print(next(cobra_iter))
print(next(cobra_iter))
print(next(cobra_iter))

In [None]:
# calling next() once more would raise a StopIteration exception,
# because the iterator is now exhausted
# print(next(cobra_iter))
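
As a rough sketch of what a `for` loop does behind the scenes, the loop can be written out by hand with `iter()`, `next()` and a `StopIteration` handler. The helper name `manual_for` and the sample list are made up for illustration.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next()."""
    iterator = iter(iterable)  # calls iterable.__iter__()
    items = []
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break  # the iterator is exhausted, so the loop ends
        items.append(item)
    return items


names = ["Johnny Lawrence", "Daniel LaRusso"]  # illustrative sample data
print(manual_for(names))
```

The `for` statement handles `StopIteration` for you, which is why the exception only surfaces when `next()` is called directly.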

## `range()`

This function is useful for looping a set number of times. It returns a `range` object, an iterable that yields a sequence of integers.

In [None]:
for num in range(10):
    print(num)

In [None]:
for num in range(1, 11):
    print(num)

The `range()` function doesn't build the whole sequence in memory. Rather, it produces each value on demand during iteration. To see this, ask for an iterator over an absurdly large range and then pull out only the first few values.

In [None]:
silly = iter(range(10**100000))
print(next(silly))
print(next(silly))
print(next(silly))
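
Another way to see the same laziness, sketched here on the assumption that you are running CPython (where `sys.getsizeof()` is available): a `range` object answers `len()`, indexing and membership tests from its start, stop and step alone, and its size does not grow with the number of values it describes. The variable names are illustrative.

```python
import sys

big = range(10**12)  # a trillion values, but nothing is stored up front

# len(), indexing and membership are computed from start, stop and step
print(len(big))
print(big[-1])
print(999_999_999_999 in big)

# compare the size of the huge range with a list of only 1000 integers,
# which has to hold a reference to every element
big_range_size = sys.getsizeof(big)
list_size = sys.getsizeof(list(range(1000)))
print(big_range_size, list_size)
```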

In [None]:
# functions like sum() accept any iterable as an argument
sum(range(5, 11))
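
`sum()` is not the only built-in that consumes an iterable. The short sketch below passes the same `range` object to `max()` and `sorted()` as well, and feeds `sum()` a generator expression. The variable names are illustrative.

```python
evens = range(0, 10, 2)  # 0, 2, 4, 6, 8

total = sum(evens)
largest = max(evens)
descending = sorted(evens, reverse=True)
# a generator expression is also an iterable, evaluated one item at a time
sum_of_squares = sum(n * n for n in evens)
print(total, largest, descending, sum_of_squares)
```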

## `enumerate()`

`enumerate()` has already been used to unpack index values from iterables. This section looks at the enumerate object itself and how to splat its values.

In [None]:
turtles = ["Leonardo", "Donatello", "Raphael", "Michelangelo"]
numbered_turtles = list(enumerate(turtles))
numbered_turtles
# a list of tuples is returned

In [None]:
# splat the enumerate object values
print(*enumerate(turtles))
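
Two further details of `enumerate()`, sketched with a shortened sample list: the optional `start` argument changes the first index, and the enumerate object is itself an iterator, so it can only be consumed once.

```python
turtles = ["Leonardo", "Donatello"]  # shortened sample list

# start=1 begins the count at 1 instead of 0
for position, name in enumerate(turtles, start=1):
    print(position, name)

# an enumerate object is an iterator, so a second pass finds nothing left
numbered = enumerate(turtles)
first_pass = list(numbered)
second_pass = list(numbered)
print(first_pass)
print(second_pass)
```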

## `zip()`

Use this to iterate over several iterables in parallel, taking one item from each on every step.

In [None]:
weapons = ["sword", "bo", "sai", "nunchuks"]
colours = ["blue", "purple", "red", "orange"]
turtle_list = list(zip(turtles, weapons, colours))
print(turtle_list)

In [None]:
# or unpack within a for loop

for x, y, z in zip(turtles, weapons, colours):
    print(x, y, z)
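
One thing to watch for when the iterables differ in length: `zip()` stops at the shortest one without warning. The sketch below shows that behaviour alongside `itertools.zip_longest()`, which pads the gaps instead. The lists and the `"unknown"` fill value are made up for illustration.

```python
from itertools import zip_longest

names = ["Leonardo", "Donatello", "Raphael"]
weapons = ["sword", "bo"]  # deliberately one item short

# zip() stops as soon as the shortest iterable runs out
truncated = list(zip(names, weapons))
print(truncated)

# zip_longest() keeps going and pads the gaps with fillvalue
padded = list(zip_longest(names, weapons, fillvalue="unknown"))
print(padded)
```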

## Splat with `*`

In [None]:
turtle_zip = zip(turtles, weapons, colours)
print(*turtle_zip)

In [None]:
# using zip with splat on a zip object gives the equivalent of unzipping
# regenerate the zip object first, because print(*turtle_zip) consumed the previous one
turtle_zip = zip(turtles, weapons, colours)
x, y, z = zip(*turtle_zip)
print(x, y, z)
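
The reason the zip object has to be regenerated is that it is a one-shot iterator. A minimal sketch with made-up lists:

```python
letters = ["a", "b", "c"]
numbers = [1, 2, 3]

pairs = zip(letters, numbers)
first_pass = list(pairs)   # consumes the zip object
second_pass = list(pairs)  # nothing is left to yield
print(first_pass)
print(second_pass)

# a fresh zip object can be unzipped back into the original groupings
unzipped_letters, unzipped_numbers = zip(*zip(letters, numbers))
print(unzipped_letters, unzipped_numbers)
```

Note that unzipping returns tuples rather than the original lists.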

## Reading files in chunks

Large files can be read and processed one chunk at a time, rather than being loaded into memory all at once. Let's write a dataset to file and explore.

In [None]:
import os

import pandas as pd
import seaborn as sns
from pyprojroot import here

In [None]:
penguins = sns.load_dataset("penguins")
penguins.to_csv(os.path.join(here(), "data", "penguins.csv"), index=False)
del penguins

In [None]:
# tally the species column, reading the file 20 rows at a time
val_counts = {}
for chunk in pd.read_csv(os.path.join(here(), "data", "penguins.csv"), chunksize=20):
    for obs in chunk["species"]:
        # dict.get() supplies 0 the first time a value is seen
        val_counts[obs] = val_counts.get(obs, 0) + 1

val_counts

The same logic can be turned into a function.

In [None]:
def file_table(file_csv, chunksz, column):
    """Count the values in `column` of a CSV file, reading `chunksz` rows at a time."""
    val_counts = {}
    for chunk in pd.read_csv(file_csv, chunksize=chunksz):
        for obs in chunk[column]:
            # dict.get() supplies 0 the first time a value is seen
            val_counts[obs] = val_counts.get(obs, 0) + 1
    return val_counts

In [None]:
file_table(os.path.join(here(), "data", "penguins.csv"), 10, "island")

In [None]:
file_table(os.path.join(here(), "data", "penguins.csv"), 10, "sex")
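
To tie chunked reading back to iterators, here is a sketch of the same counting idea using only the standard library. It is not how pandas implements `chunksize`. The helper names `read_in_chunks` and `count_column` are hypothetical, and the in-memory CSV is made-up sample data standing in for a file on disk.

```python
import csv
import io
from collections import Counter
from itertools import islice


def read_in_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any iterable of rows."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return  # nothing left to read
        yield chunk


def count_column(file_obj, chunk_size, column):
    """Count the values in one column, holding only one chunk at a time."""
    counts = Counter()
    reader = csv.DictReader(file_obj)
    for chunk in read_in_chunks(reader, chunk_size):
        counts.update(row[column] for row in chunk)
    return dict(counts)


# made-up sample data standing in for a CSV file on disk
sample_csv = io.StringIO(
    "species,island\n"
    "Adelie,Torgersen\n"
    "Adelie,Biscoe\n"
    "Gentoo,Biscoe\n"
    "Chinstrap,Dream\n"
    "Gentoo,Biscoe\n"
)
island_counts = count_column(sample_csv, 2, "island")
print(island_counts)
```

Because `islice()` pulls rows from the reader on demand, only `chunk_size` rows are held in memory at once.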