# List comprehensions and generators
In this chapter, you'll build on your knowledge of iterators and be introduced to list comprehensions, which allow you to create complicated lists, including lists of lists, in a single line of code! List comprehensions can make your code considerably more concise, are often faster than the equivalent for loop, and will become a vital part of your Python data science toolbox. You'll then learn about generators, which are extremely helpful when working with large sequences of data that you may not want to store in memory, but instead generate on the fly.

In [10]:
nums = [12,8,21,3,16]
new_nums = [num + 1 for num in nums]
result = [num for num in range(11)]
print(result)
print(new_nums)

pair_1 = []
for num1 in range(0,2):
    for num2 in range(6,8):
        pair_1.append((num1,num2))
print(pair_1)        

pair_2 = [(num1,num2) for num1 in range(0,2) for num2 in range(6,8)]
print(pair_2)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[13, 9, 22, 4, 17]
[(0, 6), (0, 7), (1, 6), (1, 7)]
[(0, 6), (0, 7), (1, 6), (1, 7)]


                                            Writing list comprehensions
You now have all the knowledge necessary to begin writing list comprehensions! Your job in this exercise is to write a list comprehension that produces a list of the squares of the numbers ranging from 0 to 9.

#1 Instructions:
    1. Using the range of numbers from 0 to 9 as your iterable and i as your iterator variable, write a list comprehension that produces a list of numbers consisting of the squared values of i.

In [11]:
squares = [i**2 for i in range(0,10)]
print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


                                            Nested list comprehensions
Great! At this point, you have a good grasp of the basic syntax of list comprehensions. Let's push your code-writing skills a little further. In this exercise, you will be writing a list comprehension within another list comprehension, or nested list comprehensions. It sounds a little tricky, but you can do it!

Let's step aside from strings for a while. One of the ways in which lists can be used is to represent multi-dimensional objects such as matrices. Matrices can be represented as a list of lists in Python. For example, a 5 x 5 matrix with values 0 to 4 in each row can be written as:

---------------------------------

matrix = [[0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4]]

---------------------------------

Your task is to recreate this matrix by using nested list comprehensions. Recall that you can create one of the rows of the matrix with a single list comprehension. To create the list of lists, you simply have to supply the list comprehension as the output expression of the overall list comprehension:

[[output expression] for iterator variable in iterable]

Note that here, the output expression is itself a list comprehension.
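
As a rough sketch of this pattern, the cell below builds a small multiplication table. The names n_rows, n_cols and table are illustrative assumptions and not part of the exercise; unlike the exercise matrix, the inner comprehension here does use the outer variable row.

```python
# Hypothetical sizes, chosen only for illustration.
n_rows, n_cols = 3, 4

# The inner comprehension builds one row;
# the outer comprehension repeats it once per value of row.
table = [[row * col for col in range(n_cols)] for row in range(n_rows)]

for line in table:
    print(line)
# [0, 0, 0, 0]
# [0, 1, 2, 3]
# [0, 2, 4, 6]
```

Reading the comprehension from the outside in (outer for first, then the bracketed inner expression) usually makes the nesting easier to follow.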

#1 Instructions:
    1. In the inner list comprehension - that is, the output expression of the nested list comprehension - create a list of values from 0 to 4 using range(). Use col as the iterator variable.
    2. In the iterable part of your nested list comprehension, use range() to count 5 rows - that is, create a list of values from 0 to 4. Use row as the iterator variable; note that you won't be needing this to create values in the list of lists.

In [16]:
matrix = [[col for col in range(0,5)] for row in range(0,5)]
print(matrix)
for row in matrix:
    print(row)

[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]


In [20]:
print([num**2 for num in range(10) if num%2==0])
print([num if num%2==0 else 0 for num in range(10)])


[0, 4, 16, 36, 64]
[0, 0, 2, 0, 4, 0, 6, 0, 8, 0]


In [21]:
pos_neg = {num:-num for num in range(9)}
print(pos_neg)

{0: 0, 1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: -6, 7: -7, 8: -8}


                                    Using conditionals in comprehensions (1)
You've been using list comprehensions to build lists of values, sometimes using operations to create these values.

An interesting mechanism in list comprehensions is that you can also create lists with values that meet only a certain condition. One way of doing this is by using conditionals on iterator variables. In this exercise, you will do exactly that!

Recall from the video that you can test the iterator variable by adding an if clause, the optional predicate expression, after the for clause in the comprehension:

[ output expression for iterator variable in iterable if predicate expression ].

You will use this recipe to write a list comprehension for this exercise. You are given a list of strings fellowship and, using a list comprehension, you will create a list that only includes the members of fellowship that have 7 characters or more.       
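
The recipe above can be sketched on a small, made-up list. The list words and the length threshold are assumptions for illustration only; the explicit loop is included to show what the comprehension is shorthand for.

```python
# Illustrative data, not taken from the exercise.
words = ["apple", "kiwi", "banana", "fig", "cherry"]

# Comprehension with a predicate: keep only words longer than 4 characters.
long_words = [word for word in words if len(word) > 4]

# The same logic written as an explicit loop.
long_words_loop = []
for word in words:
    if len(word) > 4:
        long_words_loop.append(word)

print(long_words)                      # ['apple', 'banana', 'cherry']
print(long_words == long_words_loop)   # True
```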

#1 Instructions:
    1. Use member as the iterator variable in the list comprehension. For the conditional, use len() to evaluate the iterator variable. Note that you only want strings with 7 characters or more.

In [6]:
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']
new_fellowship = [member for member in fellowship if len(member)>=7]
print(new_fellowship)

['samwise', 'aragorn', 'legolas', 'boromir']


                                    Using conditionals in comprehensions (2)
In the previous exercise, you used an if clause in the predicate expression part of a list comprehension to test an iterator variable. In this exercise, you will use an if-else conditional expression in the output expression of the list comprehension.

You will work on the same list, fellowship and, using a list comprehension and an if-else conditional statement in the output expression, create a list that keeps members of fellowship with 7 or more characters and replaces others with an empty string. Use member as the iterator variable in the list comprehension.

#1 Instructions:
    1. In the output expression, keep the string as-is if the number of characters is >= 7, else replace it with an empty string - that is, '' or "".



In [8]:
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']
new_fellowship = [member if len(member)>=7 else "" for member in fellowship]
print(new_fellowship)


['', 'samwise', '', 'aragorn', 'legolas', 'boromir', '']
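
To make the difference between the two forms concrete, here is a small sketch on a hypothetical list values (not part of the exercise). A trailing if filters items out, so the result can be shorter; an if-else in the output expression transforms every item, so the result keeps the original length.

```python
# Hypothetical input values for illustration.
values = [3, 10, 7, 12]

# Predicate after the for clause: items failing the test are dropped.
filtered = [v for v in values if v >= 7]

# Conditional expression in the output: every item yields exactly one result.
replaced = [v if v >= 7 else 0 for v in values]

print(filtered)  # [10, 7, 12]
print(replaced)  # [0, 10, 7, 12]
print(len(filtered), len(replaced))  # 3 4
```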


                                                Dict comprehensions
Comprehensions aren't restricted to the world of lists. There are many other objects you can build using comprehensions, such as dictionaries, which are pervasive in data science. You will create a dictionary using the comprehension syntax for this exercise. In this case, the comprehension is called a dict comprehension.

Recall that the main difference between a list comprehension and a dict comprehension is the use of curly braces {} instead of []. Additionally, members of the dictionary are created using a colon :, as in <key> : <value>.
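
A minimal sketch of the syntax, using made-up lists names and years that are not part of the exercise. It also shows one detail worth knowing: curly braces without a key : value pair produce a set comprehension rather than a dict.

```python
# Illustrative data only.
names = ["ada", "grace", "alan"]
years = [1815, 1906, 1912]

# key : value pairs inside curly braces build a dictionary.
birth_year = {name: year for name, year in zip(names, years)}

# A predicate works the same way as in a list comprehension.
after_1900 = {name: year for name, year in birth_year.items() if year >= 1900}

# Without the colon, curly braces build a set instead.
name_lengths = {len(name) for name in names}

print(birth_year)    # {'ada': 1815, 'grace': 1906, 'alan': 1912}
print(after_1900)    # {'grace': 1906, 'alan': 1912}
print(type(name_lengths))  # <class 'set'>
```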

You are given a list of strings fellowship and, using a dict comprehension, create a dictionary with the members of the list as the keys and the length of each string as the corresponding values.
    
#1 Instructions:
    1. Create a dict comprehension where the key is a string in fellowship and the value is the length of the string. Remember to use the syntax <key> : <value> in the output expression part of the comprehension to create the members of the dictionary. Use member as the iterator variable.    

In [9]:
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']
new_fellowship = {member:len(member) for member in fellowship}
print(new_fellowship)

{'frodo': 5, 'samwise': 7, 'merry': 5, 'aragorn': 7, 'legolas': 7, 'boromir': 7, 'gimli': 5}


                                Introduction to generator expressions
                                

In [28]:
a = [num * 2 for num in range(0,10)]
b = (num * 2 for num in range(0,10))
print(a)
print(type(b))


print(next(b))

def num_sequence(n):
    i = 0
    while i <n:
        yield i
        i += 1
result = num_sequence(3)
print(result)
print(next(result))
print(next(result))
print(next(result))

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
<class 'generator'>
0
<generator object num_sequence at 0x000002123B4CB048>
0
1
2
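
The cell above stops after exactly three calls to next(). As a small sketch of what happens beyond that point, the example below redefines the same num_sequence function and calls next() once more than there are values; the variable names gen, exhausted and fallback are illustrative.

```python
def num_sequence(n):
    """Yield the integers 0, 1, ..., n - 1 one at a time."""
    i = 0
    while i < n:
        yield i
        i += 1

gen = num_sequence(2)
first = next(gen)    # 0
second = next(gen)   # 1

# A third call has nothing left to yield, so Python raises StopIteration.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

# Passing a default to next() avoids the exception on an exhausted generator.
fallback = next(gen, None)

print(first, second, exhausted, fallback)  # 0 1 True None
```

A for loop handles StopIteration for you, which is why looping over a generator ends cleanly.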


                                            Write your own generator expressions
You are familiar with what generators and generator expressions are, as well as how they differ from list comprehensions. In this exercise, you will practice building generator expressions on your own.

Recall that a generator expression has essentially the same syntax as a list comprehension, except that it uses parentheses () instead of brackets []; this should make things feel familiar! Furthermore, if you have ever iterated over a dictionary with .items() or used the range() function, you have already worked with objects that produce their values lazily, on demand, rather than building a full list in memory. Strictly speaking, range() returns a range object and .items() returns a view object rather than a generator, but they follow the same idea of handing out values one at a time during iteration.
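
To illustrate why this matters for memory, here is a hedged sketch comparing a list comprehension with the equivalent generator expression. The size n is an arbitrary assumption, and the exact byte counts reported by sys.getsizeof vary between Python versions, so none are shown.

```python
import sys

n = 100_000  # arbitrary size, for illustration only

squares_list = [i * i for i in range(n)]  # every value is computed and stored now
squares_gen = (i * i for i in range(n))   # nothing is computed yet

# The list's container grows with n; the generator object stays small.
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_gen))

# Values are produced only as sum() asks for them.
total = sum(squares_gen)

# A generator can be consumed only once: a second pass finds nothing.
leftover = list(squares_gen)
print(leftover)  # []
```

Note that sys.getsizeof measures only the container itself, not the integers it references, so the real gap in memory use is larger than the two printed numbers suggest.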

Now, you will start simple by creating a generator object that produces numeric values.

#1 Instructions:
    1. Create a generator object that will produce values from 0 to 30. Assign the result to result and use num as the iterator variable in the generator expression.
    2. Print the first 5 values by using next() appropriately in print().
    3. Print the rest of the values by using a for loop to iterate over the generator object.
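
As an alternative sketch of the "first few, then the rest" pattern in these steps, itertools.islice from the standard library can take a fixed number of values without repeating next(). This is not the approach the exercise asks for; the names first_five and rest are illustrative.

```python
from itertools import islice

result = (num for num in range(0, 31))

# Take the first 5 values; the generator remembers where it stopped.
first_five = list(islice(result, 5))

# Iterating again continues from the next unconsumed value.
rest = list(result)

print(first_five)         # [0, 1, 2, 3, 4]
print(rest[0], rest[-1])  # 5 30
```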

In [30]:
result = (num for num in range(0,31))
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))
for value in result:
    print(value)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30


                                    Changing the output in generator expressions
Great! At this point, you already know how to write a basic generator expression. In this exercise, you will push this idea a little further by adding to the output expression of a generator expression. Because generator expressions and list comprehensions are so alike in syntax, this should be a familiar task for you!

You are given a list of strings lannister and, using a generator expression, create a generator object that you will iterate over to print its values.

#1 Instructions:
    1. Write a generator expression that will generate the lengths of each string in lannister. Use person as the iterator variable. Assign the result to lengths.
    2. Supply the correct iterable in the for loop for printing the values in the generator object.

In [32]:
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']
lengths = (len(person) for person in lannister)
for value in lengths:
    print(value)

6
5
5
6
7


                                                Build a generator
In previous exercises, you've dealt mainly with writing generator expressions, which use comprehension syntax. Being able to use comprehension syntax for generator expressions made your work so much easier!

Now, recall from the video that not only are there generator expressions, there are generator functions as well. Generator functions are functions that, like generator expressions, yield a series of values instead of returning a single value. A generator function is defined like a regular function, but whenever it generates a value, it uses the keyword yield instead of return.
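
A minimal sketch of how yield differs from return, using a hypothetical function running_total that is not part of the exercise. The function pauses at each yield and resumes from the same point on the next request, so the local variable total keeps its value between calls.

```python
def running_total(numbers):
    """Yield the cumulative sum after each element of numbers."""
    total = 0
    for number in numbers:
        total += number
        yield total  # pause here; resume on the next request

totals = running_total([1, 2, 3, 4])

print(next(totals))   # 1
print(next(totals))   # 3
print(list(totals))   # [6, 10]  (the remaining values)
```

Calling running_total(...) does not run the function body; it only creates the generator object. The body starts executing on the first next() call or the first iteration of a for loop.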

In this exercise, you will create a generator function that works in a similar way to the generator expression you defined in the previous exercise:

--------------------------------------

lengths = (len(person) for person in lannister)

--------------------------------------

#1 Instructions:
    1. Complete the function header for the function get_lengths() that has a single parameter, input_list.
    2. In the for loop in the function definition, yield the length of the strings in input_list.
    3. Complete the iterable part of the for loop for printing the values generated by the get_lengths() generator function. Supply the call to get_lengths(), passing in the list lannister.


In [34]:
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

def get_lengths(input_list):
    for person in input_list:
        yield len(person)

for value in get_lengths(lannister):
    print(value)

6
5
5
6
7


                                    List comprehensions for time-stamped data
                                    
You will now make use of what you've learned in this chapter to solve a simple data extraction problem. You will also be introduced to a data structure, the pandas Series, in this exercise. We won't elaborate on it much here, but what you should know is that it is a data structure you will work with frequently when analyzing data from pandas DataFrames. You can think of DataFrame columns as one-dimensional arrays called Series.

In this exercise, you will be using a list comprehension to extract the time from time-stamped Twitter data. The pandas package has been imported as pd and the file 'tweets.csv' has been imported as the df DataFrame for your use.

#1 Instructions:
    1. Extract the column 'created_at' from df and assign the result to tweet_time. Fun fact: the extracted column in tweet_time here is a Series data structure!
    2. Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp, and you will access the 12th to 19th characters in the string to extract the time. Use entry as the iterator variable and assign the result to tweet_clock_time. Remember that Python uses 0-based indexing!
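
To check the slice positions in step 2 without needing 'tweets.csv', here is a small sketch on a single timestamp string in the format shown in this exercise. The comparison against datetime.strptime is an addition for illustration, not part of the exercise, and assumes the default English day and month names.

```python
from datetime import datetime

# One timestamp in the same format as the 'created_at' column.
stamp = "Tue Mar 29 23:40:17 +0000 2016"

# Characters 12 through 19 (1-based) are indices 11 through 18 (0-based),
# so the slice is [11:19] because the end index is exclusive.
clock_by_slice = stamp[11:19]

# Cross-check by actually parsing the timestamp.
parsed = datetime.strptime(stamp, "%a %b %d %H:%M:%S %z %Y")
clock_by_parse = parsed.strftime("%H:%M:%S")

print(clock_by_slice)                    # 23:40:17
print(clock_by_slice == clock_by_parse)  # True
```

Slicing is fast and simple but relies on every string having exactly this layout; parsing is slower but fails loudly on malformed input.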

In [40]:
import pandas as pd
df = pd.read_csv('tweets.csv')
tweet_time = df['created_at']
print(tweet_time.head())
tweet_clock_time = [entry[11:19] for entry in tweet_time]
print(tweet_clock_time)

0    Tue Mar 29 23:40:17 +0000 2016
1    Tue Mar 29 23:40:17 +0000 2016
2    Tue Mar 29 23:40:17 +0000 2016
3    Tue Mar 29 23:40:17 +0000 2016
4    Tue Mar 29 23:40:17 +0000 2016
Name: created_at, dtype: object
['23:40:17', '23:40:17', '23:40:17', '23:40:17', '23:40:17', '23:40:17', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:19', '23:40:18', '23:40:18', '23:40:18', '23:40:19', '23:40:

                        Conditional list comprehensions for time-stamped data
Great, you've successfully extracted the data of interest, the time, from a pandas DataFrame! Let's tweak your work further by adding a conditional that further specifies which entries to select.

In this exercise, you will be using a list comprehension to extract the time from time-stamped Twitter data. You will add a conditional expression to the list comprehension so that you only select the times in which entry[17:19] is equal to '19'. The pandas package has been imported as pd and the file 'tweets.csv' has been imported as the df DataFrame for your use.

#1 Instructions:
    1. Extract the column 'created_at' from df and assign the result to tweet_time.
    2. Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp, and you will access the 12th to 19th characters in the string to extract the time. Use entry as the iterator variable and assign the result to tweet_clock_time. Additionally, add a conditional expression that checks whether entry[17:19] is equal to '19'.
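
The filtering step can be sketched on a short hand-written list of timestamps, so it runs without the CSV file. The list stamps is made-up sample data in the same format as 'created_at'; the counting line with a generator expression is an extra illustration, not something the exercise requires.

```python
# Hand-written sample timestamps (not real rows from tweets.csv).
stamps = [
    "Tue Mar 29 23:40:17 +0000 2016",
    "Tue Mar 29 23:40:19 +0000 2016",
    "Tue Mar 29 23:40:18 +0000 2016",
    "Tue Mar 29 23:40:19 +0000 2016",
]

# entry[17:19] holds the two-digit seconds field of each timestamp.
clock_times_at_19 = [entry[11:19] for entry in stamps if entry[17:19] == "19"]

# A generator expression can count the matches without building a list.
count_at_19 = sum(1 for entry in stamps if entry[17:19] == "19")

print(clock_times_at_19)  # ['23:40:19', '23:40:19']
print(count_at_19)        # 2
```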

In [41]:
# Extract the created_at column from df: tweet_time
tweet_time = df["created_at"]

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == "19"]

# Print the extracted times
print(tweet_clock_time)


['23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19']
