# List comprehensions and generators

## List comprehensions

- Collapse for loops for building lists into a single line
- Components
  - Iterable
  - Iterator variable (represents members of the iterable)
  - Output expression
- Syntax: [output expression for iterator variable in iterable]

#### Populate a list with a for loop:

In [1]:
nums = [12, 8, 21, 3, 16]

new_nums = []

for num in nums:
    new_nums.append(num + 1)
    
print(new_nums)

[13, 9, 22, 4, 17]


#### List comprehension:

- Put the output expression (num + 1) first, then the for clause referring to the original list

In [2]:
nums = [12, 8, 21, 3, 16]

new_nums = [num + 1 for num in nums]

print(new_nums)

[13, 9, 22, 4, 17]


#### Syntax
new_nums = [num + 1 for num in nums]

- for num in nums: becomes the for clause <b>for num in nums</b>
- new_nums.append(num + 1) becomes the output expression <b>num + 1</b>
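
The same mapping applies to any loop that only appends to a list. As a minimal sketch (the names `celsius`, `loop_result`, and `comp_result` are made up for illustration and do not come from the notebook), the appended expression moves to the front and the loop header follows it:

```python
# Hypothetical data, used only to illustrate the mapping
celsius = [0, 25, 100]

# Loop version: header first, appended expression inside the body
loop_result = []
for temp in celsius:
    loop_result.append(temp * 9 / 5 + 32)

# Comprehension version: appended expression first, loop header second
comp_result = [temp * 9 / 5 + 32 for temp in celsius]

print(loop_result == comp_result)  # True
```

If the loop body does more than append one expression, a plain loop is usually the clearer choice.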

### List comprehension with range()

In [3]:
result = [num for num in range(11)]

result

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

### Nested Loops (1)

In [4]:
pairs_1 = []

for num1 in range(0, 2):
    for num2 in range(6, 8):
        pairs_1.append((num1,num2)) # you need to pass this as a tuple

print(pairs_1)

[(0, 6), (0, 7), (1, 6), (1, 7)]


### Nested Loops (2)

In [5]:
pairs_2 = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)] # Tradeoff: readability

print(pairs_2)

[(0, 6), (0, 7), (1, 6), (1, 7)]
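
One way to keep the clause order straight: the for clauses in a comprehension are written left to right in the same order as the nested loops are written top to bottom. The sketch below flattens a nested list both ways; `grid`, `flat_loop`, and `flat_comp` are hypothetical names chosen for this example.

```python
# Hypothetical nested list used only for illustration
grid = [[1, 2, 3], [4, 5, 6]]

# Nested loop version
flat_loop = []
for row in grid:          # outer loop
    for value in row:     # inner loop
        flat_loop.append(value)

# Comprehension: the for clauses appear in the same order as the loops above
flat_comp = [value for row in grid for value in row]

print(flat_comp)  # [1, 2, 3, 4, 5, 6]
```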


#### Example 1:
Write a list comprehension that creates a list of the first character of each string in the list doctor.

In [10]:
doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']

new_doctor = [doc[0] for doc in doctor]

print(new_doctor)

['h', 'c', 'c', 't', 'w']


#### Example 2:

In [11]:
# Create list comprehension: squares
squares = [i**2 for i in range(0,10)]
print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


#### Example 3: Creating Matrices

Lists can also represent multi-dimensional objects such as matrices: in Python, a matrix can be written as a list of lists. For example, a 5 x 5 matrix with the values 0 to 4 in each row can be written as:

matrix = <br>[[0, 1, 2, 3, 4],<br>
          [0, 1, 2, 3, 4],<br>
          [0, 1, 2, 3, 4],<br>
          [0, 1, 2, 3, 4],<br>
          [0, 1, 2, 3, 4]]

In [17]:
# To create the list of lists, you simply have to supply the list comprehension as the output expression 
# of the overall list comprehension:

# [[output expression] for iterator variable in iterable] --- the output expression here is itself a list comprehension.


# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(0,5)] for row in range(0,5)]
print(matrix)
print('\n')

# Print the matrix
for row in matrix:
    print(row)

[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]


[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
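
The inner comprehension can also use the outer iterator variable, so each row can differ. A small sketch, assuming a hypothetical `size` of 4, that builds a multiplication table:

```python
size = 4  # hypothetical size, chosen for illustration

# The inner comprehension builds one row; the outer one repeats it for each row index
table = [[row * col for col in range(size)] for row in range(size)]

for row in table:
    print(row)
# [0, 0, 0, 0]
# [0, 1, 2, 3]
# [0, 2, 4, 6]
# [0, 3, 6, 9]
```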


## Advanced List comprehensions

### Conditionals in comprehensions 
- Conditionals on the iterable:
To filter the iterable, add an if clause (the optional predicate expression) after the for clause. Only items for which the predicate is true reach the output expression:
[output expression for iterator variable in iterable if predicate expression]

In [9]:
[num ** 2 for num in range(10) if num % 2 == 0]

[0, 4, 16, 36, 64]

In [10]:
# The % (modulo) operator returns the remainder of a division:
print(5 % 2)

print(6 % 2)

1
0


- Conditionals on the output expression

In [15]:
[num ** 2 if num % 2 == 0 else 0 for num in range(10)]

[0, 0, 4, 0, 16, 0, 36, 0, 64, 0]
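
The two kinds of conditional behave differently: an if clause after the for clause drops items, while an if/else in the output expression keeps every item and only changes its value. The sketch below contrasts them and then combines both; the variable names are illustrative only.

```python
nums = list(range(10))

# Filter: the if clause after the for clause drops items, so the result can be shorter
filtered = [num ** 2 for num in nums if num % 2 == 0]

# Conditional expression: if/else before the for clause maps every item, so the length is unchanged
mapped = [num ** 2 if num % 2 == 0 else 0 for num in nums]

# Both can appear in the same comprehension
combined = ['big' if num > 5 else 'small' for num in nums if num % 2 == 0]

print(len(filtered), len(mapped))  # 5 10
print(combined)                    # ['small', 'small', 'small', 'big', 'big']
```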

#### Example 1:

In [14]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member for member in fellowship if len(member) >= 7]

# Print the new list
print(new_fellowship)

['samwise', 'aragorn', 'legolas', 'boromir']


#### Example 2:

In [15]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member if len(member) >= 7 else '' for member in fellowship]

# Print the new list
print(new_fellowship)

['', 'samwise', '', 'aragorn', 'legolas', 'boromir', '']


### Dict comprehensions 
- Create dictionaries
- Use curly braces {} instead of brackets []

In [10]:
pos_neg = {num: -num for num in range(9)}

print(pos_neg)

print(type(pos_neg))

{0: 0, 1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: -6, 7: -7, 8: -8}
<type 'dict'>


#### Example 1:

Create a dictionary with the members of the list as the keys and the length of each string as the corresponding values.

In [16]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create dict comprehension: new_fellowship
new_fellowship = {member: len(member) for member in fellowship}

# Print the new dictionary
print(new_fellowship)

{'aragorn': 7, 'frodo': 5, 'samwise': 7, 'merry': 5, 'gimli': 5, 'boromir': 7, 'legolas': 7}
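
Dict comprehensions accept the same optional if clause as list comprehensions, and they can iterate over the key-value pairs of another dictionary. A brief sketch; the `initials` dictionary is hypothetical sample data:

```python
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Keep only members whose names have at least 7 characters
long_names = {member: len(member) for member in fellowship if len(member) >= 7}

# Swap keys and values of an existing dictionary (hypothetical example data)
initials = {'f': 'frodo', 's': 'samwise'}
by_name = {name: initial for initial, name in initials.items()}

print(long_names)
print(by_name)
```

Swapping keys and values this way is only safe when the values are unique and hashable; duplicate values would overwrite each other.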


## Introduction to generators

If you have ever iterated over a dictionary with .items() or used the range() function, you have already worked with objects that behave much like generators. In Python 3, both return lazy iterables that produce their values on demand instead of building a full list in memory.

### Generator expressions 
- Recall list comprehension

In [1]:
[2 * num for num in range(10)]

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

- Use ( ) instead of [ ]

In [2]:
(2 * num for num in range(10))

<generator object <genexpr> at 0x00000000063E2750>
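
A generator expression is often passed straight to a function that consumes an iterable. When it is the only argument, the extra pair of parentheses can be omitted. A minimal sketch using the built-ins sum() and max():

```python
# Sum of the doubled values, computed without building an intermediate list
total = sum(2 * num for num in range(10))

# Largest doubled value, again consumed one item at a time
largest = max(2 * num for num in range(10))

print(total, largest)  # 90 18
```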

### List comprehensions vs. generators 
- List comprehension - returns a list
- Generator expression - returns a generator object
- Both can be iterated over
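
One difference worth keeping in mind is that a list can be iterated any number of times, while a generator is exhausted after a single pass. The sketch below illustrates this; the variable names are chosen for the example only.

```python
squares_list = [num ** 2 for num in range(4)]
squares_gen = (num ** 2 for num in range(4))

# A list can be iterated repeatedly
first_pass_list = list(squares_list)
second_pass_list = list(squares_list)

# A generator is exhausted after one pass
first_pass_gen = list(squares_gen)
second_pass_gen = list(squares_gen)

print(first_pass_gen)   # [0, 1, 4, 9]
print(second_pass_gen)  # []
```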

### Printing values from generators (1)

In [4]:
result = (num for num in range(6))

for num in result:
    print(num)

0
1
2
3
4
5


In [6]:
result = (num for num in range(6))

print(list(result))

[0, 1, 2, 3, 4, 5]


### Printing values from generators (2)

In [7]:
# Lazy evaluation
result = (num for num in range(6))

print(next(result))
print(next(result))
print(next(result))
print(next(result))

0
1
2
3
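
Each call to next() computes exactly one more value. Once the generator has nothing left, next() raises StopIteration, which a for loop handles for you. A small sketch of both the exception and the optional default argument of next():

```python
result = (num for num in range(2))

print(next(result))  # 0
print(next(result))  # 1

# A third call raises StopIteration because the generator is exhausted
try:
    next(result)
    exhausted = False
except StopIteration:
    exhausted = True

# next() accepts an optional default that is returned instead of raising
fallback = next(result, 'done')

print(exhausted, fallback)  # True done
```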


### Generators vs list comprehensions

In [9]:
[num for num in range(10**10000)]

OverflowError: range() result has too many items

In [10]:
(num for num in range(10**10000))

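OverflowError: range() result has too many items

Both cells fail here because the notebook was run under Python 2, where range() builds the complete list before the comprehension or generator expression starts. In Python 3, range() is lazy, so the generator expression in In [10] is created immediately and produces values on demand, while the list comprehension in In [9] still tries to store every value and runs out of memory.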
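
The practical difference between the two forms is memory use. As a rough sketch, assuming Python 3 and a made-up size `n` that is small enough for the list to fit in memory, sys.getsizeof() shows that the list grows with the number of items while the generator object stays small:

```python
import sys

n = 10 ** 6  # hypothetical size, small enough for the list to fit in memory

as_list = [num for num in range(n)]
as_gen = (num for num in range(n))

# The list holds references to all n items; the generator holds only its current state
print(sys.getsizeof(as_list))
print(sys.getsizeof(as_gen))
```

The exact byte counts depend on the Python version and platform, so none are shown here.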

### Conditionals in generator expressions

In [12]:
even_nums = (num for num in range(10) if num % 2 == 0)

print(even_nums)
print(list(even_nums))

<generator object <genexpr> at 0x0000000006524F30>
[0, 2, 4, 6, 8]


## Generator functions

Generator functions are functions that, like generator expressions, yield a series of values instead of returning a single value. A generator function is defined like a regular function, but it produces each value with the keyword yield instead of return.
- Produces generator objects when called
- Defined like a regular function - def
- Yields a sequence of values instead of returning a single value
- Generates a value with yield keyword

### Build a generator function

In [13]:
def num_sequence(n):
    """Generate values from 0 up to, but not including, n."""
    i = 0
    while i < n:
        yield i
        i += 1

### Use a generator function

In [22]:
result = num_sequence(5)
print(type(result))

<type 'generator'>


In [23]:
for item in result:
    print(item)

0
1
2
3
4
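
Because a generator function pauses at each yield and resumes only when the next value is requested, it can describe a sequence that never ends. The sketch below uses a hypothetical function `count_up` together with itertools.islice() to take a bounded number of values from it:

```python
from itertools import islice

def count_up(start=0):
    """Yield start, start + 1, start + 2, ... without ever finishing."""
    value = start
    while True:
        yield value   # execution pauses here until the next value is requested
        value += 1

# islice() takes a bounded number of values from the otherwise endless sequence
first_five = list(islice(count_up(10), 5))

print(first_five)  # [10, 11, 12, 13, 14]
```

Calling list() directly on count_up() would never return, so an unbounded generator always needs a consumer that stops on its own.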


#### Example 1: 

In [26]:
# Create generator object: result
result = (num for num in range(0, 12))

# Print the first 5 values
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))

# Print the rest of the values
for value in result:
    print(value)

0
1
2
3
4
5
6
7
8
9
10
11


#### Example 2: 

In [29]:
# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Create a generator object: lengths
lengths = (len(person) for person in lannister)

# Iterate over and print the values in lengths
for value in lengths:
    print(value)

6
5
5
6
7


#### Example 3: Generator Function

In [30]:
# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the length of the strings in input_list."""

    # Yield the length of a string
    for person in input_list:
        yield len(person)

# Print the values generated by get_lengths()
for value in get_lengths(lannister):
    print(value)

6
5
5
6
7


## Summary: list comprehensions

- Basic

[output expression for iterator variable in iterable]
- Advanced

[output expression + conditional on output for iterator variable
in iterable + conditional on iterable]
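
As an illustration of the advanced form, the sketch below labels each part of a single comprehension. The list `words` is a sample taken from the fellowship list, and the specific conditions are made up for the example.

```python
words = ['frodo', 'samwise', 'merry', 'aragorn']  # sample from the fellowship list

result = [
    word.upper() if len(word) >= 7 else word   # output expression + conditional on output
    for word in words                          # iterator variable and iterable
    if word != 'merry'                         # conditional on iterable
]

print(result)  # ['frodo', 'SAMWISE', 'ARAGORN']
```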

### Exercise: List comprehensions for time-stamped data (Data Extraction w/ Pandas)

In [38]:
# Import pandas
import pandas as pd

# Import Twitter data as DataFrame: df
df = pd.read_csv('datasets/tweets.csv')

# Extract the created_at column from df: tweet_time - the extracted column in tweet_time here is a Series data structure
tweet_time = df['created_at']

print(tweet_time)


# Extract the clock time: tweet_clock_time
# Each row of tweet_time is a timestamp string; characters 12 through 19
# (slice [11:19]) hold the clock time.
tweet_clock_time = [entry[11:19] for entry in tweet_time]

# Print the extracted times
print(tweet_clock_time)


0     Tue Mar 29 23:40:17 +0000 2016
1     Tue Mar 29 23:40:17 +0000 2016
2     Tue Mar 29 23:40:17 +0000 2016
3     Tue Mar 29 23:40:17 +0000 2016
4     Tue Mar 29 23:40:17 +0000 2016
5     Tue Mar 29 23:40:17 +0000 2016
6     Tue Mar 29 23:40:18 +0000 2016
7     Tue Mar 29 23:40:17 +0000 2016
8     Tue Mar 29 23:40:18 +0000 2016
9     Tue Mar 29 23:40:18 +0000 2016
10    Tue Mar 29 23:40:18 +0000 2016
11    Tue Mar 29 23:40:17 +0000 2016
12    Tue Mar 29 23:40:18 +0000 2016
13    Tue Mar 29 23:40:18 +0000 2016
14    Tue Mar 29 23:40:17 +0000 2016
15    Tue Mar 29 23:40:18 +0000 2016
16    Tue Mar 29 23:40:18 +0000 2016
17    Tue Mar 29 23:40:17 +0000 2016
18    Tue Mar 29 23:40:18 +0000 2016
19    Tue Mar 29 23:40:17 +0000 2016
20    Tue Mar 29 23:40:18 +0000 2016
21    Tue Mar 29 23:40:18 +0000 2016
22    Tue Mar 29 23:40:18 +0000 2016
23    Tue Mar 29 23:40:18 +0000 2016
24    Tue Mar 29 23:40:17 +0000 2016
25    Tue Mar 29 23:40:18 +0000 2016
26    Tue Mar 29 23:40:18 +0000 2016

In [39]:
# Conditional list comprehensions for time-stamped data
# Extract the clock time of entries whose seconds field is '19': tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == '19']

# Print the extracted times
print(tweet_clock_time)

['23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19']
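
Slicing by position works only while every timestamp has exactly the same layout. One possible alternative, sketched below without pandas, parses each string with datetime.strptime() instead. The three sample strings are hypothetical stand-ins for the created_at column, and the %a and %b codes assume English day and month names in the current locale.

```python
from datetime import datetime

# Hypothetical sample in the same format as the created_at column
tweet_time = [
    'Tue Mar 29 23:40:17 +0000 2016',
    'Tue Mar 29 23:40:18 +0000 2016',
    'Tue Mar 29 23:40:19 +0000 2016',
]

# Slicing approach from the exercise
sliced = [entry[11:19] for entry in tweet_time]

# Parsing approach: does not depend on character positions
fmt = '%a %b %d %H:%M:%S %z %Y'
parsed = [datetime.strptime(entry, fmt).strftime('%H:%M:%S') for entry in tweet_time]

# Conditional comprehension on the parsed seconds field
at_19 = [
    datetime.strptime(entry, fmt).strftime('%H:%M:%S')
    for entry in tweet_time
    if datetime.strptime(entry, fmt).second == 19
]

print(sliced == parsed)  # True
print(at_19)             # ['23:40:19']
```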
