# Chapter 1: Iterators
An iterable is an object that can be passed to 'iter()' (it defines an '__iter__()' method) <br>
Applying 'iter()' to an iterable creates an iterator <br>
Iterables include lists, strings, dictionaries, file objects, and more <br>
An iterator produces its next value each time it is passed to the 'next()' function
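To make the iterable/iterator relationship concrete, here is roughly what a `for` loop does behind the scenes: it calls `iter()` once, then calls `next()` until the iterator raises `StopIteration`. This is a simplified sketch; the names `colors`, `collected` and `iterator` are illustrative only.

```python
colors = ['red', 'green', 'blue']   # a list is an iterable

collected = []
iterator = iter(colors)             # iter() turns the iterable into an iterator
while True:
  try:
    item = next(iterator)           # ask the iterator for its next value
  except StopIteration:             # raised once there are no values left
    break
  collected.append(item)

print(collected)                    # ['red', 'green', 'blue']

# An iterator is itself iterable: iter() on it returns the same object
print(iter(iterator) is iterator)   # True
```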


In [None]:
#Unpack all remaining values of the iterator at once with *
word = 'Data'
it = iter(word)
print(*it)

#Iterating through key-value pairs
sample_dict = {'sky': 'blue', 'forest': 'green'}
for key, value in sample_dict.items():
  print(key, value)

D a t a
sky blue
forest green


Once an iterator has been fully iterated through (exhausted), it cannot be reused: any further call to 'next()' raises a 'StopIteration' error.

In [29]:
small_value = iter(range(3))

for num in range(3):
  print(next(small_value))

#The following line calls next() again on the exhausted iterator,
#which raises a StopIteration error
print(next(small_value))

0
1
2


StopIteration: 
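An exhausted iterator cannot be rewound, but the underlying iterable can usually hand out a fresh iterator, and `next()` accepts an optional default that is returned instead of raising `StopIteration`. A short sketch; `numbers`, `it` and `fresh` are illustrative names.

```python
numbers = range(3)       # a range is an iterable, not an iterator

it = iter(numbers)
print(list(it))          # [0, 1, 2]: this exhausts the iterator
print(list(it))          # []: nothing left to produce

# With a default argument, next() returns it instead of raising StopIteration
print(next(it, 'done'))  # done

# Calling iter() on the original iterable creates a brand-new iterator
fresh = iter(numbers)
print(next(fresh))       # 0
```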

The 'enumerate()' function takes an iterable as an argument and returns a special enumerate object <br>
The enumerate object is an iterator that yields '(index, element)' tuples, pairing each element of the original iterable with its index <br>



In [None]:
sample_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']

#enumerate() yields (index, element), so the index is unpacked first
for index, color in enumerate(sample_list):
  print(f'{index}: {color}')

0: red
1: orange
2: yellow
3: green
4: blue
5: indigo
6: violet
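`enumerate()` also takes an optional `start` argument for the first index, and because the enumerate object is an iterator, `list()` can materialize all of its pairs at once. A small sketch using a shortened version of the color list:

```python
sample_list = ['red', 'orange', 'yellow']

# Start counting at 1 instead of 0
for index, color in enumerate(sample_list, start=1):
  print(f'{index}: {color}')   # 1: red, 2: orange, 3: yellow

# list() collects every (index, element) pair
pairs = list(enumerate(sample_list))
print(pairs)  # [(0, 'red'), (1, 'orange'), (2, 'yellow')]
```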


The 'zip()' function takes any number of iterables as input and returns a zip object, which is an iterator of tuples <br>
Each tuple groups the elements that share the same index in the inputs; calling 'list()' on the zip object produces a list of these tuples.

In [None]:
sample_list = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
#reversed() returns an iterator, not a list
reversed_colors = reversed(sample_list)

z = zip(sample_list, reversed_colors)
z_list = list(z)

print(z_list)

[('red', 'violet'), ('orange', 'indigo'), ('yellow', 'blue'), ('green', 'green'), ('blue', 'yellow'), ('indigo', 'orange'), ('violet', 'red')]
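Two further properties of `zip()` are worth seeing: it accepts more than two iterables and stops as soon as the shortest one runs out, and `zip(*...)` reverses the operation ("unzipping"). The lists below, including the hex codes, are illustrative data.

```python
colors = ['red', 'orange', 'yellow']
hex_codes = ['#FF0000', '#FFA500', '#FFFF00']
ranks = [1, 2]                      # deliberately shorter than the others

# zip() takes any number of iterables and stops at the shortest one
triples = list(zip(colors, hex_codes, ranks))
print(triples)  # [('red', '#FF0000', 1), ('orange', '#FFA500', 2)]

# Unpacking the pairs with * passes each tuple as a separate argument,
# so zip regroups the first elements together and the second elements together
pairs = [('red', 'violet'), ('orange', 'indigo')]
firsts, seconds = zip(*pairs)
print(firsts)   # ('red', 'orange')
print(seconds)  # ('violet', 'indigo')
```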


### Loading large datasets
When working with data too large to fit in memory, you can use an iterator to load the data in chunks. <br>
In pandas, for example, passing 'chunksize' to 'read_csv()' makes it return an iterator (a 'TextFileReader') instead of a single DataFrame. <br>
Each chunk produced by that iterator is a DataFrame with up to 'chunksize' rows.

In [28]:
import pandas as pd

def count_entries(csv_file: str, c_size: int, colname: str) -> dict:
  """This reads 'csv_file' in chunks of 'c_size' rows and returns a dict
     of the number of occurrences of each entry in the column 'colname'"""

  counts_dict = {}
  #With chunksize set, read_csv() returns an iterator of DataFrames
  reader = pd.read_csv(csv_file, chunksize=c_size)

  for chunk in reader:

    for entry in chunk[colname]:
      if entry in counts_dict:
        counts_dict[entry] += 1
      else:
        counts_dict[entry] = 1

  return counts_dict

result = count_entries('sample_data/tweets.csv', 10, 'lang')
print(result)


{'en': 97, 'et': 1, 'und': 2}
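The same chunked-loading idea can be sketched without pandas, using the standard library's `csv` module and `itertools.islice`. This is a swapped-in technique, not how `read_csv()` works internally; `read_in_chunks` is a hypothetical helper, and the tiny in-memory CSV is made-up data standing in for a large file.

```python
import csv
import io
from collections import Counter
from itertools import islice

def read_in_chunks(file_obj, chunk_size):
  """Hypothetical helper: yield lists of up to chunk_size rows (as dicts)."""
  reader = csv.DictReader(file_obj)
  while True:
    chunk = list(islice(reader, chunk_size))  # take at most chunk_size rows
    if not chunk:                             # empty list: the file is exhausted
      return
    yield chunk

# Made-up stand-in for a large CSV file
data = io.StringIO('id,lang\n1,en\n2,en\n3,et\n4,und\n5,en\n')

counts = Counter()
for chunk in read_in_chunks(data, chunk_size=2):
  counts.update(row['lang'] for row in chunk)  # only one chunk is held at a time

print(dict(counts))  # {'en': 3, 'et': 1, 'und': 1}
```

Because `read_in_chunks` is a generator, rows are read from the file only as each chunk is requested, which is what keeps memory use bounded.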


# Chapter 2: List comprehensions and generators

## List comprehensions
Building a list with a for loop and 'append()' is more verbose than an equivalent list comprehension and, in many cases, less efficient

In [15]:
nums = [1, 3, 5, 4]

#Using a for loop to add 1 to each number
new_nums = []
for num in nums:
  new_nums.append(num + 1)
print(new_nums)

#Using a nested for loop to create a list of pairs
pairs = []

for num1 in range(0, 2):
  for num2 in range(6, 8):
    pairs.append((num1, num2))
print(pairs)

[2, 4, 6, 5]
[(0, 6), (0, 7), (1, 6), (1, 7)]


In [16]:
nums = [1, 3, 5, 4]

#Using a list comprehension consolidates the above code into one line
new_nums = [num + 1 for num in nums]
print(new_nums)

#Can also use list comprehensions to replace nested for loops
pairs = [(num1, num2) for num1 in range(0,2) for num2 in range(6,8)]
print(pairs)

[2, 4, 6, 5]
[(0, 6), (0, 7), (1, 6), (1, 7)]
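One rough way to check the efficiency claim on a particular machine is to time both versions with `timeit`. This is a sketch: the function names are illustrative, and the size of the gap (usually in the comprehension's favor) depends on the Python version and the workload, so no timings are quoted here.

```python
import timeit

nums = list(range(1000))

def with_loop():
  new_nums = []
  for num in nums:
    new_nums.append(num + 1)
  return new_nums

def with_comprehension():
  return [num + 1 for num in nums]

# Both approaches build the same list
print(with_loop() == with_comprehension())  # True

# Run each function 1000 times; results vary from machine to machine
loop_time = timeit.timeit(with_loop, number=1000)
comp_time = timeit.timeit(with_comprehension, number=1000)
print(f'for loop: {loop_time:.4f}s, comprehension: {comp_time:.4f}s')
```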


Nested list comprehensions can build matrices (lists of lists): the inner comprehension builds one row, and the outer comprehension repeats it for each row

In [24]:
matrix = [[col for col in range(0,5)] for row in range(0,5)]
print(matrix)

[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
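A nested comprehension creates a separate inner list for every row, which is not the case for the tempting shortcut `[[0] * 3] * 3`: that repeats references to one shared inner list. The sketch below contrasts the two and also uses a nested comprehension to transpose a matrix; all variable names are illustrative.

```python
# Nested comprehension: each row is a new, independent list
grid = [[0 for col in range(3)] for row in range(3)]
grid[0][0] = 1
print(grid)        # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]

# List multiplication repeats references to the SAME inner list
aliased = [[0] * 3] * 3
aliased[0][0] = 1
print(aliased)     # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

# Transpose: the outer loop picks a column index, the inner loop walks the rows
matrix = [[1, 2, 3], [4, 5, 6]]
transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```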


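## Advanced List Comprehensions
List comprehensions work with any iterable (ranges, strings, dictionaries, and so on) and can include conditionals: an 'if' clause after the 'for' filters items, while an 'if'-'else' expression before the 'for' transforms every item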

In [25]:
#For numbers 0-9 return the num squared if num is even, else return 0
num_squared = [num**2 if num % 2 == 0 else 0 for num in range(0,10)]
print(num_squared)

[0, 0, 4, 0, 16, 0, 36, 0, 64, 0]
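The position of the conditional changes its meaning: an `if` after the `for` drops items, while an `if`-`else` before the `for` keeps every item but changes its value. A short sketch, also iterating over a string to show that any iterable works:

```python
# if-clause AFTER the for: filters, so odd numbers are dropped entirely
evens_squared = [num**2 for num in range(10) if num % 2 == 0]
print(evens_squared)     # [0, 4, 16, 36, 64]

# if-else BEFORE the for: transforms every item, so the length is unchanged
num_squared = [num**2 if num % 2 == 0 else 0 for num in range(10)]
print(len(num_squared))  # 10

# Any iterable works, for example the characters of a string
vowels = [ch for ch in 'iterators' if ch in 'aeiou']
print(vowels)            # ['i', 'e', 'a', 'o']
```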


Dict comprehensions function similarly, with slightly different syntax

In [30]:
#Dict comprehensions use curly brackets with a colon between the key and value
pos_neg = {num: -num for num in range(0,9)}
print(pos_neg)

{0: 0, 1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: -6, 7: -7, 8: -8}
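Dict comprehensions combine naturally with the iterator tools from Chapter 1: iterating over `.items()` unpacks key-value pairs, `zip()` pairs two parallel lists, and `if` clauses filter just as in list comprehensions. The data here is illustrative.

```python
sample_dict = {'sky': 'blue', 'forest': 'green'}

# Swap keys and values by iterating over .items()
inverted = {value: key for key, value in sample_dict.items()}
print(inverted)     # {'blue': 'sky', 'green': 'forest'}

# Conditionals filter entries, as in list comprehensions
even_neg = {num: -num for num in range(9) if num % 2 == 0}
print(even_neg)     # {0: 0, 2: -2, 4: -4, 6: -6, 8: -8}

# Build a dict from two parallel lists with zip()
colors = ['red', 'green']
codes = ['#FF0000', '#00FF00']
color_codes = {color: code for color, code in zip(colors, codes)}
print(color_codes)  # {'red': '#FF0000', 'green': '#00FF00'}
```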


## Generator expressions
A generator expression looks like a list comprehension but uses parentheses instead of square brackets. <br>
It doesn't construct the list or store it in memory; instead, it creates a generator object that produces values one at a time as they are requested. <br>
Generator expressions support the same features as list comprehensions, such as filtering and conditionals.

In [39]:
result = (num for num in range(1,11))
print(type(result))

#Lists can be created by using the 'list()' function
res_list = list(result)
print(type(res_list))

#Creating a new 'result' object, because the previous 'list(result)' call
#exhausted the first 'result' generator object

result = (num for num in range(1,11))

#'Lazy evaluation': each value is computed only when next() requests it
print(next(result))
print(next(result))


<class 'generator'>
<class 'list'>
1
2
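The memory difference can be observed with `sys.getsizeof`, which reports the size of the container object itself: a list holds a reference to every element, while a generator only keeps its current state. The sketch also shows filtering inside a generator expression and passing one directly to `sum()`; exact byte counts vary by Python version, so only the comparison is printed.

```python
import sys

# The list holds a reference to every element; the generator holds only its state
big_list = [num for num in range(1_000_000)]
big_gen = (num for num in range(1_000_000))
print(sys.getsizeof(big_list) > sys.getsizeof(big_gen))  # True

# Filtering works as in list comprehensions
evens = (num for num in range(10) if num % 2 == 0)
print(list(evens))  # [0, 2, 4, 6, 8]

# As the only argument to a function, the extra parentheses can be dropped
total = sum(num * num for num in range(10))
print(total)        # 285
```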


In [42]:
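from typing import Iterator

lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

#A generator function returns an iterator of values, not a single int
def get_lengths(input_list: list) -> Iterator[int]: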
  """Generator function that yields the length
     of the strings in the input list"""

  for person in input_list:
    yield len(person)

for value in get_lengths(lannister):
  print(value)

6
5
5
6
7
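Calling a generator function does not run its body; it returns a generator object, and the body runs up to each `yield` only when a value is requested. The hypothetical `countdown` function below prints a message so the lazy start is visible.

```python
from typing import Iterator

def countdown(start: int) -> Iterator[int]:
  """Hypothetical example: yield start, start - 1, ..., 1."""
  print('Starting countdown')
  while start > 0:
    yield start          # pause here and hand back a value
    start -= 1           # execution resumes here on the next request

gen = countdown(3)       # nothing is printed yet: the body has not started
print(next(gen))         # prints 'Starting countdown', then 3
print(list(gen))         # [2, 1]: the remaining values
print(next(gen, None))   # None: the generator is exhausted
```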
