# List comprehensions and generators


## List comprehensions

- Collapse a for loop that builds a list into a single line
- Components
    - Iterable (range(), a list, etc.)
    - Iterator variable (represents each member of the iterable)
    - Output expression
            something = [<output expression> for <iterator variable> in <iterable>]
    - Tradeoff: readability
- Comprehensions can be nested
            ex: generate a matrix
            matrix = [[col for col in range(0, 5)] for row in range(0, 5)]
            The inner comprehension (for col ...) builds one row; the outer one (for row ...) repeats it for every row.
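
The components listed above map directly onto an ordinary for loop. The following is a minimal sketch of that correspondence; the list `nums` and the squaring expression are assumptions chosen only for illustration.

```python
# Hypothetical input data, chosen only for illustration
nums = [1, 2, 3, 4]

# Building the list with an explicit for loop
squares_loop = []
for num in nums:                   # num: iterator variable, nums: iterable
    squares_loop.append(num ** 2)  # num ** 2: output expression

# The same list written as a single comprehension
squares_comp = [num ** 2 for num in nums]

print(squares_loop)  # [1, 4, 9, 16]
print(squares_comp)  # [1, 4, 9, 16]
```

Both forms produce the same list; the comprehension simply moves the output expression to the front and drops the explicit `append`.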

In [1]:
# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(0, 5)] for row in range(0, 5)]

# Build the same matrix with nested for loops: matrix2
matrix2 = []
for row in range(0, 5):
    matrix2.append([])
    for col in range(0, 5):
        matrix2[row].append(col)

# Print both matrices
print(matrix)
print(matrix2)


[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]


## Advanced comprehensions

- Conditionals on the iterable
        In [1]: [num ** 2 for num in range(10) if num % 2 == 0]
        Out[1]: [0, 4, 16, 36, 64]
- Conditionals on the output expression
         In [2]: [num ** 2 if num % 2 == 0 else 0 for num in range(10)]
         Out[2]: [0, 0, 4, 0, 16, 0, 36, 0, 64, 0]
- Dict comprehensions
        pos_neg = {num: -num for num in range(9)}
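
As a rough illustration of the dict comprehension bullet above, the sketch below prints `pos_neg` and then adds a trailing filter. The name `even_pos_neg` is an assumption introduced only for this example.

```python
# Map each number to its negative, as in the bullet above
pos_neg = {num: -num for num in range(9)}
print(pos_neg)

# A dict comprehension accepts the same trailing filter as a list comprehension.
# even_pos_neg is a hypothetical name used only for this sketch.
even_pos_neg = {num: -num for num in range(9) if num % 2 == 0}
print(even_pos_neg)  # {0: 0, 2: -2, 4: -4, 6: -6, 8: -8}
```

The part before the colon becomes the key and the part after it becomes the value.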

In [2]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension with a conditional on the iterable: new_fellowship
new_fellowship = [member for member in fellowship if len(member) >= 7]

# Print the new list
print(new_fellowship)

['samwise', 'aragorn', 'legolas', 'boromir']


In [3]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension with a conditional on the output expression: new_fellowship
new_fellowship = [member if len(member) >= 7 else '' for member in fellowship]

# Print the new list
print(new_fellowship)


['', 'samwise', '', 'aragorn', 'legolas', 'boromir', '']


In [4]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create dict comprehension: new_fellowship
new_fellowship = {member : len(member) for member in fellowship}

# Print the new dictionary
print(new_fellowship)


{'frodo': 5, 'samwise': 7, 'merry': 5, 'aragorn': 7, 'legolas': 7, 'boromir': 7, 'gimli': 5}


## Introduction to generator expressions

- List comprehensions vs. generators
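    - List comprehension - returns a list
    - Generator expression - returns a generator object
    - Both can be iterated over
            result = (num for num in range(10 ** 10000))
- A generator object takes up almost no memory, because its values are not created until they are requested; it can still be iterated over, which makes it suitable for data too large to hold in a list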
- Conditionals in generator expressions
        In [1]: even_nums = (num for num in range(10) if num % 2 == 0)
        In [2]: print(list(even_nums))
        [0, 2, 4, 6, 8]
- Generator functions
    - Produces generator objects when called
    - Defined like a regular function - def
    - Yields a sequence of values instead of returning a single value
    - Generates a value with yield keyword
            def num_sequence(n):
                """Generate values from 0 up to, but not including, n."""
                i = 0
                while i < n:
                    yield i
                    i += 1
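
The two ideas in the list above, lazy evaluation and generator functions, can be sketched as follows. The size `n = 100_000` and the names `squares_list` and `squares_gen` are assumptions chosen for illustration. `sys.getsizeof` reports only the size of the container object itself, so treat the comparison as a rough indication rather than a full memory measurement.

```python
import sys

n = 100_000  # arbitrary size, chosen only for illustration

squares_list = [num ** 2 for num in range(n)]  # every value is built immediately
squares_gen = (num ** 2 for num in range(n))   # no value is computed yet

# Compare the size of the list object with the size of the generator object
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True


def num_sequence(n):
    """Generate values from 0 up to, but not including, n."""
    i = 0
    while i < n:
        yield i
        i += 1


gen = num_sequence(3)  # calling the function only creates the generator object
first = next(gen)      # the body runs until the first yield
rest = list(gen)       # list() pulls the remaining values
print(first, rest)     # 0 [1, 2]
```

The generator's size stays constant regardless of `n`, while the list grows with the number of elements it holds.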

In [5]:
import numpy as np
# Create generator object: result
result = (num for num in range(0, 7))

# Print the first 3 values
print(next(result))
print(next(result))
print(next(result))

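# Alternative ways to consume the remaining values. They are left disabled
# because running either one would exhaust the generator before the lines below:
# print(np.array(list(result)) < 5)
# for value in result:
#     if value < 5:
#         print(value)

# The first list() call consumes everything that is left,
# so the second call has nothing to return
r1 = list(result)
r2 = list(result)
print(r1, r2)
print("Like any iterator, a generator object can be consumed only once: "
      "a for loop or list() steps through every remaining value, "
      "and each value counts as used whether or not it is printed.")

0
1
2
[3, 4, 5, 6] []
Like any iterator, a generator object can be consumed only once: a for loop or list() steps through every remaining value, and each value counts as used whether or not it is printed.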


In [6]:
# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Create a generator object: lengths
lengths = (len(person) for person in lannister)

# Iterate over and print the values in lengths
for value in lengths:
    print(value)


6
5
5
6
7


In [7]:
# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""

    # Yield the length of a string
    for person in input_list:
        yield len(person)

# Print the values generated by get_lengths()
for value in get_lengths(lannister):
    print(value)

6
5
5
6
7
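
One practical difference from a stored generator object is that every call to a generator function returns a fresh generator. The sketch below reuses `get_lengths` from the cell above; the names `first_pass`, `second_pass`, `fresh_pass`, and `leftover` are assumptions introduced only for illustration.

```python
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']


def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""
    for person in input_list:
        yield len(person)


lengths = get_lengths(lannister)
first_pass = list(lengths)    # consumes every value
second_pass = list(lengths)   # same object, already exhausted
leftover = next(lengths, None)  # the default avoids a StopIteration error

# Calling the function again creates a new, unconsumed generator
fresh_pass = list(get_lengths(lannister))

print(first_pass)   # [6, 5, 5, 6, 7]
print(second_pass)  # []
print(leftover)     # None
print(fresh_pass)   # [6, 5, 5, 6, 7]
```

To iterate more than once, call the function again instead of reusing the exhausted object.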


## Wrapping up comprehensions and generators
- Basic
        [<output expression> for <iterator variable> in <iterable>] 
- Advanced
        [<output expression + conditional on output> for <iterator variable> in <iterable + conditional on iterable>]
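
The advanced pattern above can be sketched with both kinds of conditional in one comprehension. The thresholds and the names `result` and `result_loop` are assumptions chosen for illustration; the equivalent loop is included to show which conditional does what.

```python
# Conditional on the iterable (trailing if): keep only numbers greater than 4
# Conditional on the output (if/else): square even numbers, replace odd ones with 0
result = [num ** 2 if num % 2 == 0 else 0 for num in range(10) if num > 4]
print(result)  # [0, 36, 0, 64, 0]

# The same logic written as an explicit loop
result_loop = []
for num in range(10):
    if num > 4:                  # filter: decides whether an item is produced at all
        if num % 2 == 0:         # output conditional: decides which value is produced
            result_loop.append(num ** 2)
        else:
            result_loop.append(0)
print(result_loop)  # [0, 36, 0, 64, 0]
```

The trailing `if` changes how many items the result has; the `if`/`else` in the output expression only changes their values.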

In [10]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import jupyterthemes.jtplot as jtplot
%matplotlib inline
jtplot.style(theme='onedork')

df = pd.read_csv('exercise/tweets.csv')

# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time (characters 11-18), keeping only entries whose seconds field is '19': tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == '19']

# Print the extracted times
print(tweet_clock_time)



['23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19']
