# Important tools

## Iterators and iterables

* **Iterable**: an object that can return an iterator when passed to `iter()`. Examples: lists, strings, dictionaries, file objects.
* **Iterator**: an object that keeps state and produces the next value each time you call `next()` on it.
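
The difference between the two definitions can be made concrete with a small sketch. The variable names below are illustrative and not part of the original notebook. It shows that `iter()` produces an iterator from an iterable, and that `next()` raises `StopIteration` once the values run out.

```python
numbers = [1, 2, 3]          # a list is an iterable

it = iter(numbers)           # iter() asks the iterable for a fresh iterator
first = next(it)             # 1
second = next(it)            # 2
third = next(it)             # 3

# The iterator is now exhausted, so another next() raises StopIteration
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)

# An iterator returns itself from iter(); a list returns a new iterator object
print(iter(it) is it)
print(iter(numbers) is numbers)
```

A for loop performs these steps for you: it calls `iter()` once, then `next()` repeatedly until `StopIteration` is raised.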

### Iterating over iterables

In [2]:
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# You can accomplish the same thing with a for loop
# for person in flash:
#     print(person)

# Create an iterator for flash: superhero
superhero = iter(flash)

# Print each item from the iterator
print(next(superhero))
print(next(superhero))
print(next(superhero))
print(next(superhero))

jay garrick
barry allen
wally west
bart allen


In [3]:
# Create an iterator for range(3): small_value
small_value = iter(range(3))

# Print the values in small_value
print(next(small_value))
print(next(small_value))
print(next(small_value))

0
1
2


In [4]:
# Create an iterator for range(10 ** 100): googol
googol = iter(range(10 ** 100))

# Print the first 2 values from googol
print(next(googol))
print(next(googol))

0
1


In [5]:
# Create a range object: values
values = range(10, 21)

# Print the range object
print(values)

# Create a list of integers: values_list
values_list = list(values)

# Print values_list
print(values_list)

# Get the sum of values: values_sum
values_sum = sum(values)

# Print values_sum
print(values_sum)

range(10, 21)
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
165


## Enumerate

`enumerate()` takes an iterable and returns an enumerate object that produces a sequence of tuples, each of which is an index-value pair.

In [6]:
# Create a list of strings: mutants
mutants = ['charles xavier', 'bobby drake', 'kurt wagner', 'max eisenhardt', 'kitty pryde']

# Create a list of tuples: mutant_list
mutant_list = list(enumerate(mutants))

# Print the list of tuples
print(mutant_list)

# Unpack and print the tuple pairs
for index1, value1 in enumerate(mutants):
    print(index1, value1)

# Change the start index
for index2, value2 in enumerate(mutants, start=1):
    print(index2, value2)

[(0, 'charles xavier'), (1, 'bobby drake'), (2, 'kurt wagner'), (3, 'max eisenhardt'), (4, 'kitty pryde')]
0 charles xavier
1 bobby drake
2 kurt wagner
3 max eisenhardt
4 kitty pryde
1 charles xavier
2 bobby drake
3 kurt wagner
4 max eisenhardt
5 kitty pryde


## Zip and argument unpacking

`zip()` takes any number of iterables and returns a zip object, which is an iterator of tuples.

In [8]:
mutants = ['charles xavier', 'bobby drake', 'kurt wagner', 'max eisenhardt', 'kitty pryde']
aliases = ['prof x', 'iceman', 'nightcrawler', 'magneto', 'shadowcat']
powers = ['telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility']

# Create a list of tuples: mutant_data
mutant_data = list(zip(mutants, aliases, powers))

# Print the list of tuples
print(mutant_data)

# Create a zip object using the three lists: mutant_zip
mutant_zip = zip(mutants, aliases, powers)

# Print the zip object
print(mutant_zip)

# Unpack the zip object and print the tuple values
for value1, value2, value3 in mutant_zip:
    print(value1, value2, value3)

[('charles xavier', 'prof x', 'telepathy'), ('bobby drake', 'iceman', 'thermokinesis'), ('kurt wagner', 'nightcrawler', 'teleportation'), ('max eisenhardt', 'magneto', 'magnetokinesis'), ('kitty pryde', 'shadowcat', 'intangibility')]
<zip object at 0x7f26dd39ec40>
charles xavier prof x telepathy
bobby drake iceman thermokinesis
kurt wagner nightcrawler teleportation
max eisenhardt magneto magnetokinesis
kitty pryde shadowcat intangibility
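
Two properties of zip objects are easy to overlook: `zip()` stops as soon as its shortest input is exhausted, and the zip object itself can be consumed only once. A minimal sketch, using made-up lists rather than the mutant data:

```python
letters = ['a', 'b', 'c']
digits = [1, 2]

pairs = zip(letters, digits)

first_pass = list(pairs)    # stops at the shorter input, so 'c' is dropped
second_pass = list(pairs)   # the zip object has already been consumed

print(first_pass)   # [('a', 1), ('b', 2)]
print(second_pass)  # []
```

This is why a zip object has to be re-created before it can be looped over or unpacked a second time.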


### Unpacking zip with `*`

In [13]:
mutants = ['charles xavier', 'bobby drake', 'kurt wagner', 'max eisenhardt', 'kitty pryde']
powers = ['telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility']

# Create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# Print the tuples in z1 by unpacking with *
print(*z1)

# Re-create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)

# Check if the unzipped tuples match the original lists
print(list(result1) == mutants)
print(list(result2) == powers)

('charles xavier', 'telepathy') ('bobby drake', 'thermokinesis') ('kurt wagner', 'teleportation') ('max eisenhardt', 'magnetokinesis') ('kitty pryde', 'intangibility')
True
True


### Using iterators to load large files into memory

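When a file is too large to fit in memory, read it in chunks and process one chunk at a time. Passing `chunksize` to `pd.read_csv()` returns an iterator that yields one DataFrame per chunk.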

In [15]:
import pandas as pd

counts_dict = {}

# Iterate over the file chunk by chunk
for chunk in pd.read_csv('files/tweets.csv', chunksize=10):

    # Iterate over the column in DataFrame
    for entry in chunk['lang']:
        if entry in counts_dict:
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)


{'en': 97, 'et': 1, 'und': 2}


In [16]:
import pandas as pd

# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict:
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('files/tweets.csv', 10, 'lang')

# Print result_counts
print(result_counts)


{'en': 97, 'et': 1, 'und': 2}
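
The same count-as-you-go pattern can be sketched with the standard library alone. This is an illustration under stated assumptions, not a replacement for the pandas version: the CSV text is made-up sample data (not `tweets.csv`), and `count_column` is a hypothetical helper name. `collections.Counter` takes the place of the if/else bookkeeping.

```python
import csv
import io
from collections import Counter

# Made-up sample data standing in for a CSV file on disk
sample_csv = io.StringIO(
    "id,lang\n"
    "1,en\n"
    "2,en\n"
    "3,et\n"
    "4,und\n"
    "5,en\n"
)

def count_column(file_obj, colname):
    """Count how often each value appears in one column, reading row by row."""
    counts = Counter()
    # csv.DictReader is an iterator: it yields one row at a time
    for row in csv.DictReader(file_obj):
        counts[row[colname]] += 1
    return dict(counts)

sample_counts = count_column(sample_csv, 'lang')
print(sample_counts)  # {'en': 3, 'et': 1, 'und': 1}
```

Because the reader yields rows lazily, only one row needs to be held in memory at a time, which is the same idea as reading a DataFrame chunk by chunk.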


## List comprehensions

List comprehensions replace the loop-and-append pattern shown below with a single expression.

In [18]:
# This works, but it is verbose and typically slower than a list comprehension

nums = [12, 8, 21, 3, 16]
new_nums = []

for num in nums:
    new_nums.append(num + 1)

print(new_nums)

[13, 9, 22, 4, 17]


List comprehensions are a concise way to create lists. They are a combination of a for loop and a list append.

In [19]:
nums = [12, 8, 21, 3, 16]

new_nums = [num + 1 for num in nums]

print(new_nums)

[13, 9, 22, 4, 17]


List comprehensions can be used with any iterable, not just lists.

In [20]:
result = [num for num in range(11)]
print(result)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


**List comprehensions**

* Collapse for loops for building lists into a single line
* Components
  * Iterable
  * Iterator variable (represent members of iterable)
  * Output expression
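
The three components can be labeled directly on a comprehension. A small annotated sketch, where the variable name `doubled` is chosen here for illustration:

```python
nums = [12, 8, 21, 3, 16]

doubled = [
    num * 2        # output expression: evaluated once per element
    for num        # iterator variable: bound to each member in turn
    in nums        # iterable: the source of the values
]
print(doubled)  # [24, 16, 42, 6, 32]
```

In practice this is written on one line as `[num * 2 for num in nums]`; the line breaks here exist only to make room for the comments.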

List comprehensions can also express nested loops. The `for` clauses appear in the same order as in the equivalent nested for loops.

In [25]:
# Using for loop
pairs_not_nested = []
for num1 in range(0, 2):
    for num2 in range(6, 8):
        pairs_not_nested.append((num1, num2))
print(pairs_not_nested)

# Using nested list comprehension
pairs_nested = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]
print(pairs_nested)

assert pairs_nested == pairs_not_nested, "Nested and non-nested lists should be equal!"

# Trade-off: a nested comprehension is more compact but can be harder to read

[(0, 6), (0, 7), (1, 6), (1, 7)]
[(0, 6), (0, 7), (1, 6), (1, 7)]


In [28]:
# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(5)] for row in range(5)]

# Print the matrix
for row in matrix:
    print(row)

[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]


## Advanced comprehensions

### Conditionals in comprehensions

In [29]:
# Conditionals on the iterable
[num ** 2 for num in range(10) if num % 2 == 0]

[0, 4, 16, 36, 64]

In [35]:
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship, filtering by len
new_fellowship = [member for member in fellowship if len(member) >= 7]

new_fellowship

['samwise', 'aragorn', 'legolas', 'boromir']

In [32]:
# Conditionals on the output expression
[num ** 2 if num % 2 == 0 else 0 for num in range(10)]

[0, 0, 4, 0, 16, 0, 36, 0, 64, 0]

In [36]:
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member if len(member) >= 7 else "" for member in fellowship]

new_fellowship

['', 'samwise', '', 'aragorn', 'legolas', 'boromir', '']

### Dict comprehensions

In [37]:
pos_neg = {num: -num for num in range(9)}

pos_neg

{0: 0, 1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: -6, 7: -7, 8: -8}

In [38]:
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create dict comprehension: new_fellowship
new_fellowship = {member: len(member) for member in fellowship}

print(new_fellowship)

{'frodo': 5, 'samwise': 7, 'merry': 5, 'aragorn': 7, 'legolas': 7, 'boromir': 7, 'gimli': 5}


## Generator expressions

A generator expression is like a list comprehension, except that it does not build the list in memory. Instead, it returns a generator object that produces the elements one at a time as you iterate over it.

To write one, replace the square brackets of a list comprehension with parentheses.

In [40]:
type([0, 2, 4, 6, 8, 10, 12, 14, 16, 18])

list

In [41]:
type((2 * num for num in range(10)))

generator

### List comprehensions vs. generators
* List comprehension: returns a list
* Generator expression: returns a generator object
* Both can be iterated over
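
A rough way to see the difference is to compare the two objects side by side. The sketch below uses `sys.getsizeof`; exact byte counts depend on the Python build, so it only compares them. It also shows that a generator, unlike a list, can be consumed only once.

```python
import sys

squares_list = [num ** 2 for num in range(1000)]   # builds all 1000 values now
squares_gen = (num ** 2 for num in range(1000))    # builds nothing yet

# The list's size grows with its length; the generator object stays small
gen_is_smaller = sys.getsizeof(squares_gen) < sys.getsizeof(squares_list)
print(gen_is_smaller)

# Both can be iterated over and yield the same values
list_total = sum(squares_list)
gen_total = sum(squares_gen)
print(list_total == gen_total)

# The list can be reused, but the generator is now exhausted
list_total_again = sum(squares_list)
gen_total_again = sum(squares_gen)
print(list_total_again, gen_total_again)
```

The second `sum(squares_gen)` returns 0 because no values are left to produce.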

#### Lazy evaluation

In [42]:
result = (num for num in range(6))

result

<generator object <genexpr> at 0x7f26a3f8af10>

In [43]:
print(next(result))

0


### Conditionals in generator expressions
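
Generator expressions accept the same conditional forms as list comprehensions. A minimal sketch, with variable names chosen here for illustration:

```python
# Condition on the iterable: keep only even numbers
even_nums = (num for num in range(10) if num % 2 == 0)
even_list = list(even_nums)
print(even_list)  # [0, 2, 4, 6, 8]

# Condition on the output expression: replace odd numbers with 0
even_or_zero = (num if num % 2 == 0 else 0 for num in range(10))
even_or_zero_list = list(even_or_zero)
print(even_or_zero_list)  # [0, 0, 2, 0, 4, 0, 6, 0, 8, 0]
```

Calling `list()` here forces every value to be produced; iterating with a for loop or `next()` would evaluate the condition lazily, one element at a time.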

### Generator functions
* Produce generator objects when called
* Defined like a regular function, with `def`
* Yield a sequence of values instead of returning a single value
* Generate each value with the `yield` keyword

In [45]:
def num_sequence(n):
    """Generate values from 0 up to, but not including, n."""
    i = 0
    while i < n:
        yield i
        i += 1
    
result = num_sequence(5)
print(type(result))

<class 'generator'>


In [46]:
for item in result:
    print(item)

0
1
2
3
4
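
Because a generator function produces values only on demand, it can describe a sequence with no end. The sketch below assumes a hypothetical helper `count_up` (not part of the original notebook) and uses `itertools.islice` to take a finite number of values from it.

```python
from itertools import islice

def count_up(start=0):
    """Yield start, start + 1, start + 2, ... without end (hypothetical helper)."""
    value = start
    while True:
        yield value
        value += 1

# islice stops after 5 values, so the infinite loop never runs away
first_five = list(islice(count_up(10), 5))
print(first_five)  # [10, 11, 12, 13, 14]
```

Calling `list(count_up())` directly would never finish, which is why the consumer, not the generator, decides how many values to request.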


In [47]:
import pandas as pd

df = pd.read_csv('files/tweets.csv')

# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time]

# Print the extracted times
print(tweet_clock_time)

['23:40:17', '23:40:17', '23:40:17', '23:40:17', '23:40:17', '23:40:17', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:19', '23:40:18', '23:40:18', '23:40:18', '23:40:19', '23:40:19', '23:40:19', '23:40:18', '23:40:19', '23:40:19', '23:40:19', '23:40:18', '23:40:19', '23:40:19', '23:40:19', '23:40:18', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23

Practice:

**World bank data**
* Data on world economies for over half a century
* Indicators
  * Population
  * Electricity consumption
  * CO2 emissions
  * Literacy rates
  * Unemployment
  * Mortality rates