# Iterators

## Iterating with a for loop

- We can iterate over a list using a for loop:

In [4]:
employees = ['Nick', 'Lore', 'Hugo']

for employee in employees:
    print(employee)

Nick
Lore
Hugo


- We can iterate over a string using a for loop:

In [6]:
for letter in 'DataCamp':
    print(letter)

D
a
t
a
C
a
m
p


- We can iterate over a range object using a for loop:

In [7]:
for i in range(4):
    print(i)

0
1
2
3


## Iterators vs. iterables

- Iterable
  - Examples: lists, strings, dictionaries, file connections
  - An object that can be passed to <b>iter()</b>
  - Applying iter() to an iterable creates an iterator
- Iterator
  - Produces the next value with <b>next()</b>
  - Raises StopIteration when no values remain
- Summary: An iterable is an object that can return an iterator, while an iterator is an object that keeps state and produces the next value each time you call next() on it.
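
The distinction can be sketched with two small classes. This is only an illustration: `Countdown` and `CountdownIterator` are hypothetical names, not part of Python or of these notes.

```python
class Countdown:
    """Iterable: each call to iter() hands back a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: remembers its position and produces one value per next()."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # An iterator returns itself when passed to iter()
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
first_pass = list(countdown)   # the iterable can be looped over...
second_pass = list(countdown)  # ...more than once

it = iter(countdown)
next(it)                       # consumes the first value
remaining = list(it)           # the iterator kept its state

print(first_pass, second_pass, remaining)
```

The iterable can be traversed repeatedly because every traversal gets a new iterator, while a single iterator can only move forward.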

### Iterating over iterables: next()

In [1]:
word = 'Da' # Iterable

it = iter(word) # Iterator

print(it) 

<str_iterator object at 0x000002B992CFC7C8>


In [2]:
print(word) # Iterable

Da


In [3]:
next(it)

'D'

In [4]:
next(it)

'a'

In [5]:
next(it)

StopIteration: 
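
The StopIteration above is how an iterator signals that it is exhausted. A for loop relies on the same signal; the sketch below spells out roughly what a for loop does, and also shows the optional default argument of next(). The variable names are illustrative.

```python
word = 'Da'
it = iter(word)

# Roughly what "for letter in word" does behind the scenes
collected = []
while True:
    try:
        letter = next(it)
    except StopIteration:
        break  # the iterator is exhausted, so the loop ends
    collected.append(letter)

print(collected)

# Passing a default to next() avoids the exception on an exhausted iterator
exhausted_value = next(it, None)
print(exhausted_value)
```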

In [12]:
word = 'Da'

it = iter(word)

print(*it) 
print(it)
print(*word) 
print(word) 

D a
<str_iterator object at 0x000002B994DB0488>
D a
Da


#### Example 1: 

In [51]:
# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Print each list item in flash using a for loop
for i in flash:
    print(i)


# Create an iterator for flash: superspeed
superspeed = iter(flash)
print('\n')
# Print each item from the iterator
print(next(superspeed))
print(next(superspeed))
print(next(superspeed))
print(next(superspeed))

jay garrick
barry allen
wally west
bart allen


jay garrick
barry allen
wally west
bart allen


#### Example 2:

range() does not build a list in memory. It returns a range object whose iterator produces the values one at a time until it reaches the limit, so even range(10 ** 100) can be created and iterated over.

In [13]:

# Create an iterator for range(3): small_value
small_value = iter(range(3))

# Print the values in small_value
print(next(small_value))
print(next(small_value))
print(next(small_value))
print('\n')
# Loop over range(3) and print the values
for i in range(3):
    print(i)

print('\n')
# Create an iterator for range(10 ** 100): googol
googol = iter(range(10 ** 100))

# Print the first 5 values from googol (works in Python 3 because range() is lazy)
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))


0
1
2


0
1
2


0
1
2
3
4
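
The googol example works because a range object computes its values on demand. A minimal sketch of that behaviour, assuming CPython 3 (the names `big`, `first` and `second` are illustrative):

```python
big = range(10 ** 100)

# Indexing and integer membership tests are computed arithmetically,
# without stepping through the values
fifth = big[5]
contains = 10 ** 99 in big
print(fifth, contains)

# A range is an iterable, not an iterator: every iter() call starts over
first = iter(big)
second = iter(big)
from_first = [next(first), next(first)]
from_second = next(second)
print(from_first, from_second)
```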


#### Example 3:

In [14]:
# Create a range object: values
values = range(10,21)

# Print the range object
print(values)

# Create a list of integers: values_list
values_list = list(values)

# Print values_list
print(values_list)

# Get the sum of values: values_sum
values_sum = sum(iter(values))

# Print values_sum
print(values_sum)

range(10, 21)
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
165


### Iterating at once with * - Splat/unpack operator

In [46]:
word = 'Data'

it = iter(word)

print(*it) # The single star * unpacks the iterable into positional arguments
# In Python 2, print is a statement, so print(*it) raises the SyntaxError shown below.
# In Python 3, print() is a function and this prints: D a t a

SyntaxError: invalid syntax (<ipython-input-46-c077c1d1584c>, line 5)

In [27]:
# Example: Unpacking with *
def add(a, b): # named add() to avoid shadowing the built-in sum()
    return a + b

values = (1, 2)

s = add(*values)

print(s)

3


In [32]:
# Example: Unpacking with ** (dictionary)
# The double star ** does the same with a dictionary, passing its items as keyword arguments:

def add(a, b, c, d): # named add() to avoid shadowing the built-in sum()
    return a + b + c + d

values1 = (1, 2)
values2 = { 'c': 10, 'd': 15 }

s = add(*values1, **values2)

print(s)

28


### Iterating over dictionaries

In [35]:
pythonistas = {'hugo': 'bowne-anderson', 'francis':'castro'}

for key, value in pythonistas.items():
    print(key, value)


hugo bowne-anderson
francis castro
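
Besides items(), a dictionary can be iterated over directly (which yields its keys) or through keys() and values(). A short sketch using the same dictionary; the result variable names are illustrative:

```python
pythonistas = {'hugo': 'bowne-anderson', 'francis': 'castro'}

# Looping over the dictionary itself yields its keys
keys_direct = [key for key in pythonistas]

keys = list(pythonistas.keys())      # same keys, stated explicitly
values = list(pythonistas.values())  # only the values
items = list(pythonistas.items())    # (key, value) tuples

print(keys_direct)
print(values)
print(items)
```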


### Iterating over file connections

In [38]:
file = open('datasets/tweets.csv')

it = iter(file)

print(next(it)) # Prints first line

contributors,coordinates,created_at,entities,extended_entities,favorite_count,favorited,filter_level,geo,id,id_str,in_reply_to_screen_name,in_reply_to_status_id,in_reply_to_status_id_str,in_reply_to_user_id,in_reply_to_user_id_str,is_quote_status,lang,place,possibly_sensitive,quoted_status,quoted_status_id,quoted_status_id_str,retweet_count,retweeted,retweeted_status,source,text,timestamp_ms,truncated,user



In [39]:
print(next(it)) # Prints second line

,,Tue Mar 29 23:40:17 +0000 2016,"{'hashtags': [], 'user_mentions': [{'screen_name': 'bpolitics', 'name': 'Bloomberg Politics', 'id': 564111558, 'id_str': '564111558', 'indices': [3, 13]}, {'screen_name': 'krollbondrating', 'name': 'Kroll Bond Ratings', 'id': 1963523857, 'id_str': '1963523857', 'indices': [16, 32]}], 'symbols': [], 'media': [{'sizes': {'large': {'w': 1024, 'h': 691, 'resize': 'fit'}, 'medium': {'w': 600, 'h': 405, 'resize': 'fit'}, 'small': {'w': 340, 'h': 229, 'resize': 'fit'}, 'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}}, 'expanded_url': 'http://twitter.com/bpolitics/status/714950482930896897/photo/1', 'url': 'https://t.co/lJcw0N8EZf', 'media_url_https': 'https://pbs.twimg.com/media/CewDrPtWAAMerOm.jpg', 'source_user_id': 564111558, 'media_url': 'http://pbs.twimg.com/media/CewDrPtWAAMerOm.jpg', 'type': 'photo', 'indices': [139, 140], 'source_status_id': 714950482930896897, 'id_str': '714950482331041795', 'source_user_id_str': '564111558', 'id': 714950482331041795
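
A file object is its own iterator, so next() and a for loop can be mixed on the same file. The sketch below uses a with block so the file is closed automatically. To stay self-contained it writes a small temporary file with made-up contents; it is not the tweets dataset.

```python
import os
import tempfile

# Hypothetical sample data, written to a temporary file for illustration
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('lang,text\nen,hello\net,tere\n')
    path = tmp.name

with open(path) as file:
    is_own_iterator = iter(file) is file
    header = next(file)                          # first line
    rows = [line.rstrip('\n') for line in file]  # continues after the header

os.remove(path)

print(header)
print(rows)
```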

## Using enumerate()

- enumerate() takes any iterable as an argument and returns an enumerate object that produces (index, value) tuples, one for each item in the iterable.
- The enumerate object is itself an iterator, so it can be looped over or converted to a list.
- By default the index starts at 0; the start argument changes the first index.

In [69]:
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']

e = enumerate(avengers)

e_list = list(e)

print(type(avengers))

print(type(e))

print(type(e_list))

print(e_list) # prints the list of tuples

<class 'list'>
<class 'enumerate'>
<class 'list'>
[(0, 'hawkeye'), (1, 'iron man'), (2, 'thor'), (3, 'quicksilver')]


In [66]:
for index, value in enumerate(avengers):
    print(index, value)

0 hawkeye
1 iron man
2 thor
3 quicksilver


In [67]:
for index, value in enumerate(avengers, start=10):
    print(index, value)

10 hawkeye
11 iron man
12 thor
13 quicksilver
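
One detail worth keeping in mind: an enumerate object is a single-pass iterator. A minimal sketch (the variable names are illustrative):

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']

e = enumerate(avengers)
first_pass = list(e)   # consumes every (index, value) pair
second_pass = list(e)  # nothing is left the second time

print(first_pass)
print(second_pass)

# Calling enumerate() again gives a fresh iterator
first_pair = next(enumerate(avengers, start=1))
print(first_pair)
```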


#### Example 1: enumerate()

In [83]:
# Create a list of strings: mutants
mutants = ['charles xavier', 
            'bobby drake', 
            'kurt wagner', 
            'max eisenhardt', 
            'kitty pride']

# Create a list of tuples: mutant_list
mutant_list = list(enumerate(mutants))

# Print the list of tuples
print(mutant_list)

print('\n')

# Unpack and print the tuple pairs
for index1,value1 in mutant_list:
    print(index1, value1)

print('\n')
# Change the start index
for index2,value2 in list(enumerate(mutants, start=1)):
    print(index2, value2)

[(0, 'charles xavier'), (1, 'bobby drake'), (2, 'kurt wagner'), (3, 'max eisenhardt'), (4, 'kitty pride')]


0 charles xavier
1 bobby drake
2 kurt wagner
3 max eisenhardt
4 kitty pride


1 charles xavier
2 bobby drake
3 kurt wagner
4 max eisenhardt
5 kitty pride


## Using zip()

- zip() accepts an arbitrary number of iterables and returns an iterator of tuples, where the i-th tuple contains the i-th element from each iterable.
- Printing a zip object directly shows only its representation, not its values. To see the values, convert it to a list or unpack it with * first.

In [17]:
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']

z = zip(avengers, names)

print(type(z)) # a zip object in Python 3 (in Python 2, zip() returned a list)

z_list = list(z)

print(z_list)

<class 'zip'>
[('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odinson'), ('quicksilver', 'maximoff')]


In [85]:
# You can also unpack each tuple in a for loop
for z1, z2 in zip(avengers, names):
    print(z1, z2)

hawkeye barton
iron man stark
thor odinson
quicksilver maximoff
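
Two behaviours of zip() in Python 3 are easy to miss: it stops at the shortest input, and the zip object can be consumed only once. A sketch, assuming a deliberately shortened `names` list that differs from the one used above:

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark']  # deliberately shorter, for illustration only

# zip() stops as soon as the shortest iterable runs out
pairs = list(zip(avengers, names))
print(pairs)

# A zip object is a single-pass iterator
z = zip(avengers, names)
first = list(z)
second = list(z)
print(first, second)

# Pairs of (key, value) can be fed straight into dict()
lookup = dict(zip(avengers, names))
print(lookup)
```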


In [77]:
# You can also use the splat operator (*) to unpack the zip object into its tuples

z = zip(avengers, names)

print(*z) # Python 3 only; in Python 2, print is a statement, so this raises the SyntaxError below

SyntaxError: invalid syntax (<ipython-input-77-be76e05d012b>, line 5)

#### Example 1: zip()

In [19]:
mutants = ['charles xavier', 'bobby drake','kurt wagner','max eisenhardt','kitty pride']
aliases = ['prof x', 'iceman', 'nightcrawler', 'magneto','shadowcat']
powers = ['telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis','intangibility']

# Create a list of tuples: mutant_data
mutant_data = list(zip(mutants, aliases, powers))

# Print the list of tuples
print(mutant_data)

# Create a zip object using the three lists: mutant_zip
mutant_zip = zip(mutants, aliases, powers)

# Print the zip object
print(mutant_zip) # in Python 3 this prints the zip object's representation, not its values

# Unpack the zip object and print the tuple values
for value1, value2, value3 in mutant_zip:
    print(value1, value2, value3)

[('charles xavier', 'prof x', 'telepathy'), ('bobby drake', 'iceman', 'thermokinesis'), ('kurt wagner', 'nightcrawler', 'teleportation'), ('max eisenhardt', 'magneto', 'magnetokinesis'), ('kitty pride', 'shadowcat', 'intangibility')]
<zip object at 0x000002B994DB08C8>
charles xavier prof x telepathy
bobby drake iceman thermokinesis
kurt wagner nightcrawler teleportation
max eisenhardt magneto magnetokinesis
kitty pride shadowcat intangibility


#### Example 2: zip() with * (requires Python 3 as written)

There is no unzip function for doing the reverse of what zip() does. We can, however, reverse what has been zipped together by using zip() with a little help from * which unpacks an iterable such as a list or a tuple into positional arguments in a function call.

In [34]:
# Create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# Print the tuples in z1 by unpacking with *
print(*z1)

# Re-create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)

# Check that the unzipped tuples match the original lists (convert each tuple to a list first)
print(list(result1) == mutants)
print(list(result2) == powers)

('charles xavier', 'telepathy') ('bobby drake', 'thermokinesis') ('kurt wagner', 'teleportation') ('max eisenhardt', 'magnetokinesis') ('kitty pride', 'intangibility')
True
True


## Using iterators for big data

### Loading data in chunks

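- A dataset can be too large to hold in memory.
- Solution: load the data in chunks and process one chunk at a time.
- pandas function: read_csv()
  - Set the number of rows per chunk with the chunksize argument; read_csv() then returns an iterable that yields one DataFrame per chunk.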
In [None]:
# Sum a column of numbers from a large dataset, one chunk at a time
import pandas as pd
result = []

for chunk in pd.read_csv('data.csv', chunksize=1000):
    result.append(sum(chunk['x'])) # 'x' is a column of numbers

total = sum(result)

print(total)

4252532

In [None]:
# Alternatively, keep a running total instead of a list of partial sums:
import pandas as pd
total = 0
for chunk in pd.read_csv('data.csv', chunksize=1000):
    total += sum(chunk['x'])
print(total)

4252532
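
The chunking idea itself does not depend on pandas. As a rough, standard-library-only sketch (this is not how read_csv() is implemented), itertools.islice can pull a fixed number of rows at a time from a csv reader. The in-memory data below is a made-up stand-in for a large file with a numeric 'x' column.

```python
import csv
import io
from itertools import islice

# Hypothetical stand-in for a large CSV file: a header plus the numbers 1..10
data = io.StringIO('x\n' + '\n'.join(str(n) for n in range(1, 11)) + '\n')

reader = csv.DictReader(data)  # an iterator over rows (as dictionaries)
chunk_sizes = []
total = 0

while True:
    # Take at most 4 rows; the reader remembers where it stopped
    chunk = list(islice(reader, 4))
    if not chunk:
        break
    chunk_sizes.append(len(chunk))
    total += sum(int(row['x']) for row in chunk)

print(chunk_sizes)
print(total)
```

Only one chunk is held in memory at a time, which is the same property the chunksize argument provides.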

#### Example 1: Processing and extracting large amounts of Twitter data

In [96]:
import pandas as pd

# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Iterate over the file chunk by chunk
for chunk in pd.read_csv('datasets/tweets.csv', chunksize=10):

    # Iterate over the 'lang' column of the DataFrame chunk
    for entry in chunk['lang']:
        if entry in counts_dict:
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)

{'et': 1, 'en': 97, 'und': 2}


In [98]:
# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary that maps each value in column colname
    of csv_file to its number of occurrences, reading c_size rows at a time."""
    
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict:
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('datasets/tweets.csv', 3, 'lang')

# Print result_counts
print(result_counts)


{'et': 1, 'en': 97, 'und': 2}
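
The if/else counting pattern can also be written with collections.Counter from the standard library. The sketch below uses made-up lists in place of the chunk[colname] columns, so the data and the name `chunks` are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical chunks standing in for chunk['lang'] from each DataFrame chunk
chunks = [['en', 'en', 'et'], ['en', 'und'], ['und', 'en']]

counts = Counter()
for chunk in chunks:
    # update() adds one count per entry, creating missing keys as needed
    counts.update(chunk)

counts_dict = dict(counts)
print(counts_dict)
```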
