# Using iterators in PythonLand

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction-to-iterators" data-toc-modified-id="Introduction-to-iterators-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction to iterators</a></span></li><li><span><a href="#Playing-with-iterators" data-toc-modified-id="Playing-with-iterators-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Playing with iterators</a></span></li><li><span><a href="#Using-iterators-to-load-large-files-into-memory" data-toc-modified-id="Using-iterators-to-load-large-files-into-memory-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Using iterators to load large files into memory</a></span></li></ul></div>

## Introduction to iterators

- Iterators vs. iterables
    - Iterable
        - Examples: lists, strings, dictionaries, file connections
        - An object that can be passed to iter() to obtain an iterator
    - Iterator
        - Applying iter() to an iterable creates an iterator
        - Produces the next value each time next() is called on it
- Iterating over iterables
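    - next() returns one value per call
            name = "Ellick"
            it = iter(name)
            next(it)    # 'E'
- Iterating at once
    - The * operator unpacks all remaining values
            print(*it)    # l l i c k
- Iterating over dictionaries
        for key, value in dict_name.items():
            print(key, value)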
- Iterating over file connections
        In [1]: file = open('file.txt')
        In [2]: it = iter(file)
        In [3]: print(next(it))
                    This is the first line.
        In [4]: print(next(it))
                    This is the second line.
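The notes above describe iter() and next() separately. As a minimal sketch, the two can be combined to mimic what a for loop does. The helper name `manual_for` and the `ages` dictionary are hypothetical, introduced only for illustration.

```python
def manual_for(iterable):
    """Collect the items of `iterable` roughly the way a for loop visits them."""
    collected = []
    it = iter(iterable)           # ask the iterable for an iterator
    while True:
        try:
            item = next(it)       # request the next value
        except StopIteration:     # raised once the iterator is exhausted
            break
        collected.append(item)
    return collected


letters = manual_for("Ellick")
print(letters)

# .items() yields (key, value) pairs, which is why the loop can unpack two names
ages = {"ann": 31, "bob": 27}     # hypothetical example data
pairs = manual_for(ages.items())
print(pairs)
```

The StopIteration exception is what signals the end of the data. A for loop handles it silently, while a bare next() call on an exhausted iterator raises it.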

In [25]:
# Create an iterator for range(5): small_value
small_value = iter(range(5))

# Print the first value, then unpack the remaining values with *
print(next(small_value))
print(*small_value)
# The iterator is now exhausted, so list() returns an empty list
print("ya", list(small_value))
# Loop over range(3) and print the values
for num in range(3):
    print(num)


# Create an iterator for range(10 ** 100): googol
googol = iter(range(10**100))

# Print the first 5 values from googol
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))


0
1 2 3 4
ya []
0
1
2
0
1
2
3
4


In [20]:
# Create a range object: values
values = range(10, 21)

# Print the range object
print(values)

# Create a list of integers: values_list
values_list = list(values)

# Print values_list
print(values_list)

# Get the sum of values: values_sum
values_sum = sum(values)

# Print values_sum
print(values_sum)


range(10, 21)
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
165
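The cell above uses `values` twice, once for list() and once for sum(). That works because a range is an iterable, not an iterator. The sketch below contrasts the two. The variable names are chosen only for this illustration.

```python
values = range(10, 21)

# A range is an iterable, so each call below starts a fresh pass over it
first_total = sum(values)
second_total = sum(values)

# An iterator made from the range is consumed by the first pass
values_it = iter(values)
iterator_total = sum(values_it)
leftover = list(values_it)        # nothing remains after sum() consumed it

print(first_total, second_total, iterator_total, leftover)
```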


## Playing with iterators

- enumerate()
    - Returns an enumerate object that produces a sequence of tuples, each of which is an index-value pair.
    - Can be unpacked in a for loop
            for index, value in enumerate(something):
    - Use the start argument to set the first index
            enumerate(something, start=1)
- zip()
    - Takes any number of iterables and returns a zip object, which is an iterator of tuples.
    - Can also be unpacked in a for loop
            for value1, value2, ... in zip(something1, something2, ...):
    - Using * and zip() to 'unzip'
        - Reverse what has been zipped together by using zip() with *. The * operator unpacks an iterable such as a list or a tuple into positional arguments in a function call.
                ex:
                print(*z1)
                or
                zip(*z1)
- Any operation that consumes an iterator uses up its elements, so a second pass yields nothing.
        ex: 
        In [1]: m = iter(range(3))
        In [2]: print(list(m))
                [0, 1, 2]
        In [3]: print(list(m))
                []
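The same exhaustion applies to zip objects. One way around it, sketched here under the assumption that the data fits in memory, is to store the tuples in a list before reusing or unzipping them. The sample lists are hypothetical.

```python
names = ["prof x", "iceman", "magneto"]                      # hypothetical sample data
powers = ["telepathy", "thermokinesis", "magnetokinesis"]    # hypothetical sample data

pairs = zip(names, powers)
first_pass = list(pairs)      # consumes the zip object
second_pass = list(pairs)     # the zip object has nothing left to produce

# 'Unzip' from the stored list, which can be reused as often as needed
unzipped_names, unzipped_powers = zip(*first_pass)

print(first_pass)
print(second_pass)
print(unzipped_names, unzipped_powers)
```

Storing the tuples trades memory for reusability, so it suits small data rather than the large files discussed later.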


In [16]:
# Create a list of strings: mutants
mutants = ['charles xavier', 
            'bobby drake', 
            'kurt wagner', 
            'max eisenhardt', 
            'kitty pryde']

# Create an enumerate object: mutant_list
mutant_list = enumerate(mutants)

# Print the enumerate object as a list of tuples
print(list(mutant_list))
# The enumerate object is now exhausted, so this prints []
print(list(mutant_list))

# Unpack and print the tuple pairs
for index1, value1 in enumerate(mutants):
    print(index1, value1)

# Change the start index
for index2, value2 in enumerate(mutants, start = 1):
    print(index2, value2)


[(0, 'charles xavier'), (1, 'bobby drake'), (2, 'kurt wagner'), (3, 'max eisenhardt'), (4, 'kitty pryde')]
[]
0 charles xavier
1 bobby drake
2 kurt wagner
3 max eisenhardt
4 kitty pryde
1 charles xavier
2 bobby drake
3 kurt wagner
4 max eisenhardt
5 kitty pryde


In [2]:
mutants = ['charles xavier', 'bobby drake', 'kurt wagner', 'max eisenhardt', 'kitty pryde']
aliases = ['prof x', 'iceman', 'nightcrawler', 'magneto', 'shadowcat']
powers = ['telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility']
# Create a list of tuples: mutant_data
mutant_data = list(zip(mutants, aliases, powers))

# Print the list of tuples
print(mutant_data)

# Create a zip object using the three lists: mutant_zip
mutant_zip = zip(mutants, aliases, powers)

# Print the zip object
print(mutant_zip)

# Unpack the zip object and print the tuple values
for value1, value2, value3 in mutant_zip:
    print(value1, value2, value3)

[('charles xavier', 'prof x', 'telepathy'), ('bobby drake', 'iceman', 'thermokinesis'), ('kurt wagner', 'nightcrawler', 'teleportation'), ('max eisenhardt', 'magneto', 'magnetokinesis'), ('kitty pryde', 'shadowcat', 'intangibility')]
<zip object at 0x11c74b550>
charles xavier prof x telepathy
bobby drake iceman thermokinesis
kurt wagner nightcrawler teleportation
max eisenhardt magneto magnetokinesis
kitty pryde shadowcat intangibility


In [12]:
# Create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# Print the tuples in z1 by unpacking with *
print(*z1)

# Re-create z1 from mutants and powers, stored as a list this time
z1 = list(zip(mutants, powers))

# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)
# Check if unpacked tuples are equivalent to original tuples
print(list(result1) == mutants, result1)
print(list(result2) == powers, result2)


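('charles xavier', 'telepathy') ('bobby drake', 'thermokinesis') ('kurt wagner', 'teleportation') ('max eisenhardt', 'magnetokinesis') ('kitty pryde', 'intangibility')
True ('charles xavier', 'bobby drake', 'kurt wagner', 'max eisenhardt', 'kitty pryde')
True ('telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility')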



## Using iterators to load large files into memory

- Loading data in chunks
    - There can be too much data to hold in memory 
    - Solution: load data in chunks!
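    - pandas function: read_csv()
        - Specify the chunk size with the chunksize argument
                total = 0
                for chunk in pd.read_csv('data.csv', chunksize=1000):
                    total += sum(chunk['x'])
        - chunksize determines how many records are loaded at a time
        - In other words, how many rows each pd.DataFrame chunk contains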
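To make the chunking idea concrete without depending on a real file, here is a standard-library sketch of the same pattern. It is not how pandas implements chunksize. The helper `iter_chunks`, the in-memory `sample_csv`, and its column names are all assumptions made for this illustration.

```python
import csv
import io
from collections import Counter
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most `chunk_size` rows from any iterable of rows."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))   # pull the next few rows only
        if not chunk:                            # an empty chunk means no data is left
            return
        yield chunk


# Hypothetical in-memory CSV standing in for a large file on disk
sample_csv = io.StringIO("lang,x\nen,1\nen,2\net,3\nund,4\nen,5\n")

counts = Counter()
total = 0
chunk_lengths = []
for chunk in iter_chunks(csv.DictReader(sample_csv), chunk_size=2):
    chunk_lengths.append(len(chunk))                  # rows held in memory right now
    counts.update(row["lang"] for row in chunk)       # running count per language
    total += sum(int(row["x"]) for row in chunk)      # running sum of column x

print(chunk_lengths, dict(counts), total)
```

Only one chunk is held in memory at a time, while the running totals carry the result across chunks. This is the same accumulation pattern as the `total += sum(chunk['x'])` loop in the notes.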

In [6]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import jupyterthemes.jtplot as jtplot
%matplotlib inline
jtplot.style(theme='onedork')

tweets_df = pd.read_csv('exercise/tweets.csv')

# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):
        # Print each chunk to inspect it (debugging aid)
        print(chunk)
        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict:
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('exercise/tweets.csv', 10, 'lang')

# Print result_counts
print(result_counts)


   contributors  coordinates                      created_at  \
0           NaN          NaN  Tue Mar 29 23:40:17 +0000 2016   
1           NaN          NaN  Tue Mar 29 23:40:17 +0000 2016   
2           NaN          NaN  Tue Mar 29 23:40:17 +0000 2016   
3           NaN          NaN  Tue Mar 29 23:40:17 +0000 2016   
4           NaN          NaN  Tue Mar 29 23:40:17 +0000 2016   
5           NaN          NaN  Tue Mar 29 23:40:17 +0000 2016   
6           NaN          NaN  Tue Mar 29 23:40:18 +0000 2016   
7           NaN          NaN  Tue Mar 29 23:40:17 +0000 2016   
8           NaN          NaN  Tue Mar 29 23:40:18 +0000 2016   
9           NaN          NaN  Tue Mar 29 23:40:18 +0000 2016   

                                            entities  \
0  {'hashtags': [], 'user_mentions': [{'screen_na...   
1  {'hashtags': [{'text': 'cruzsexscandal', 'indi...   
2  {'hashtags': [], 'user_mentions': [], 'symbols...   
3  {'hashtags': [], 'user_mentions': [], 'symbols...   
4  {'hashtags':