# List comprehensions vs. generators

You've seen from the videos that list comprehensions and generator expressions look very similar in their syntax, except for the use of parentheses () in generator expressions and brackets [] in list comprehensions.

In this exercise, you will recall the difference between list comprehensions and generators. To help with that task, the following code has been pre-loaded in the environment:



In [2]:
# List of strings
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# List comprehension
fellow1 = [member for member in fellowship if len(member) >= 7]

# Generator expression
fellow2 = (member for member in fellowship if len(member) >= 7)

print(fellow1)
print(fellow2)

['samwise', 'aragorn', 'legolas', 'boromir']
<generator object <genexpr> at 0x78f0b8b7c510>
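The printed output above hints at the practical difference: the list holds all of its values at once, while the generator object only produces them when asked. A minimal sketch of that behavior, reusing the fellowship list from the cell above (the names fellow_list, fellow_gen, first_pass, and second_pass are illustrative, not part of the exercise):

```python
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# The list comprehension builds the whole list immediately
fellow_list = [member for member in fellowship if len(member) >= 7]

# The generator expression builds nothing until it is iterated
fellow_gen = (member for member in fellowship if len(member) >= 7)

# A list can be indexed and iterated as often as needed
print(fellow_list[0])

# A generator can be consumed only once
first_pass = list(fellow_gen)   # pulls every value out of the generator
second_pass = list(fellow_gen)  # the generator is now exhausted

print(first_pass)
print(second_pass)
```

The second pass comes back empty because a generator does not store its values; to iterate again, the generator expression has to be created again.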


**Write your own generator expressions**

You are familiar with what generators and generator expressions are, as well as how they differ from list comprehensions. In this exercise, you will practice building generator expressions on your own.

Recall that generator expressions have essentially the same syntax as list comprehensions, except that they use parentheses () instead of brackets []; this should make things feel familiar! Furthermore, if you have ever iterated over a dictionary with .items() or used the range() function, you have already worked with lazy iterables that behave much like generators: they produce values on demand instead of building a full list in memory. Strictly speaking, .items() returns a view object and range() returns a range object rather than a generator, but the idea of producing values only when asked is the same.

Now, you will start simple by creating a generator object that produces numeric values.


* Create a generator object that will produce values from 0 to 30. Assign the result to result and use num as the iterator variable in the generator expression.
* Print the first 5 values by using next() appropriately in print().
* Print the rest of the values by using a for loop to iterate over the generator object.

In [6]:
# Create generator object: result
result = (num for num in range(31))

# Print the first 5 values
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print('----------------------')
# Print the rest of the values
for value in result:
    print(value)


0
1
2
3
4
----------------------
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
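Once the for loop has run, the generator is exhausted. As a small sketch of what happens next (using a shorter range than the exercise to keep it brief; the names consumed, exhausted, and fallback are illustrative), calling next() on an exhausted generator raises StopIteration unless a default value is supplied:

```python
# A short generator, analogous to result in the exercise
result = (num for num in range(3))

# Consume every value with next()
consumed = [next(result), next(result), next(result)]
print(consumed)

# One more call to next() raises StopIteration
try:
    next(result)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)

# Passing a default as the second argument avoids the exception
fallback = next(result, None)
print(fallback)
```

The for loop in the exercise relies on the same mechanism: it keeps calling next() behind the scenes and stops quietly when StopIteration is raised.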


**Changing the output in generator expressions**

Great! At this point, you already know how to write a basic generator expression. In this exercise, you will push this idea a little further by adding to the output expression of a generator expression. Because generator expressions and list comprehensions are so alike in syntax, this should be a familiar task for you!

You are given a list of strings, lannister. Using a generator expression, you will create a generator object and iterate over it to print its values.


* Write a generator expression that will generate the length of each string in lannister. Use person as the iterator variable. Assign the result to lengths.
* Supply the correct iterable in the for loop for printing the values in the generator object.

In [8]:
# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey', 'ami']

# Create a generator object: lengths
lengths = (len(person) for person in lannister)

# Iterate over and print the values in lengths
for value in lengths:
    print(value)


6
5
5
6
7
3
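A generator expression does not have to be assigned to a variable first. As a brief sketch (the names total_length and longest are illustrative), it can be passed directly to functions that consume an iterable, such as sum() or max(), in which case the extra pair of parentheses can be dropped:

```python
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey', 'ami']

# The generator expression is consumed by sum() one value at a time
total_length = sum(len(person) for person in lannister)

# A fresh generator expression is needed for each consumer
longest = max(len(person) for person in lannister)

print(total_length)
print(longest)
```

No intermediate list of lengths is ever built here, which is the main reason to prefer this form when only an aggregate is needed.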


**Build a generator**

In previous exercises, you've dealt mainly with writing generator expressions, which use comprehension syntax. Being able to use comprehension syntax for generator expressions made your work so much easier!

Now, recall from the video that not only are there generator expressions, there are generator functions as well. Generator functions are functions that, like generator expressions, yield a series of values instead of returning a single value. A generator function is defined like a regular function, but whenever it generates a value, it uses the keyword yield instead of return.

In this exercise, you will create a generator function that works much like the generator expression you defined in the previous exercise:

lengths = (len(person) for person in lannister)

* Complete the function header for the function get_lengths() that has a single parameter, input_list.
* In the for loop in the function definition, yield the length of the strings in input_list.
* Complete the iterable part of the for loop for printing the values generated by the get_lengths() generator function. Supply the call to get_lengths(), passing in the list lannister.

In [9]:
# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""

    # Yield the length of a string
    for person in input_list:
        yield len(person)

# Print the values generated by get_lengths()
for value in get_lengths(lannister):
    print(value)

6
5
5
6
7
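To see what yield changes compared with return, the following sketch adds a log to a variant of get_lengths(). The function get_lengths_logged and its log parameter are hypothetical additions for illustration only. Calling the function does not run its body; the body runs step by step as values are requested:

```python
def get_lengths_logged(input_list, log):
    """Variant of get_lengths() that records when its body runs."""
    log.append('started')
    for person in input_list:
        yield len(person)
    log.append('finished')

log = []
gen = get_lengths_logged(['cersei', 'jaime'], log)

# Nothing has run yet: calling the function only creates the generator
log_after_call = list(log)

# The body runs up to the first yield
first = next(gen)
log_after_first = list(log)

# Consuming the remaining values runs the body to completion
rest = list(gen)
log_at_end = list(log)

print(log_after_call, first, log_after_first, rest, log_at_end)
```

The function pauses at each yield and resumes from the same spot on the next request, which is why a generator function can produce a long series of values without holding them all in memory.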


**List comprehensions for time-stamped data**

You will now use what you've learned in this chapter to solve a simple data extraction problem. You will also be introduced to a data structure, the pandas Series, in this exercise. We won't elaborate on it much here, but you should know that it is a data structure you will work with often when analyzing data from pandas DataFrames. You can think of DataFrame columns as one-dimensional arrays called Series.

In this exercise, you will be using a list comprehension to extract the time from time-stamped Twitter data. The pandas package has been imported as pd and the file 'tweets.csv' has been imported as the df DataFrame for your use.


* Extract the column 'created_at' from df and assign the result to tweet_time. Fun fact: the extracted column in tweet_time here is a Series data structure!
* Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp, and you will access the 12th to 19th characters in the string to extract the time. Use entry as the iterator variable and assign the result to tweet_clock_time. Remember that Python uses 0-based indexing!

In [2]:
import pandas as pd

df = pd.read_csv('/kaggle/input/trump-tweets/trumptweets.csv')

# Extract the timestamp column from df: tweet_time
# (the exercise text calls it 'created_at'; in this dataset it is named 'date')
tweet_time = df['date']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time]

# Print the extracted times
print(tweet_clock_time)


['20:54:25', '03:00:10', '15:38:08', '22:40:15', '16:07:28', '21:21:55', '19:38:28', '18:30:40', '16:13:13', '00:22:45', '17:00:03', '16:26:00', '19:43:39', '15:25:39', '00:29:47', '04:59:39', '18:28:34', '18:11:19', '16:42:01', '16:18:52', '20:03:34', '20:19:49', '20:21:37', '22:15:29', '16:25:36', '01:13:05', '15:26:53', '16:47:41', '15:40:38', '00:09:19', '18:51:16', '15:32:59', '01:19:56', '16:03:38', '15:40:35', '17:32:52', '17:50:31', '15:50:58', '16:50:47', '16:33:46', '15:43:22', '16:32:45', '23:12:37', '18:23:56', '15:55:34', '17:50:14', '17:28:23', '16:37:38', '16:13:17', '15:57:04', '16:31:48', '15:57:56', '22:06:10', '20:55:38', '20:39:09', '18:38:18', '19:05:08', '17:28:02', '17:58:43', '21:51:00', '17:54:42', '16:17:56', '20:57:36', '22:18:26', '15:51:32', '21:30:52', '19:21:52', '16:14:13', '19:17:47', '15:08:33', '17:26:23', '18:13:37', '21:31:25', '15:41:15', '20:51:10', '15:33:50', '16:34:20', '20:08:19', '21:23:18', '18:36:36', '19:37:43', '15:49:15', '22:49:24', '17
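The slice entry[11:19] works only because every timestamp string has the clock time at the same position. A minimal sketch on made-up data illustrates this; the sample strings and the 'YYYY-MM-DD HH:MM:SS' layout are assumptions for illustration, not values taken from the dataset. It also shows parsing with datetime.strptime as a possible alternative that does not depend on character positions:

```python
from datetime import datetime

# Hypothetical timestamps; the layout is an assumption for this sketch
sample_times = ['2016-03-23 20:54:25', '2016-03-24 03:00:10']

# Slicing: characters at indices 11 through 18 hold HH:MM:SS
sliced = [entry[11:19] for entry in sample_times]

# Parsing: convert to datetime objects, then format only the clock time
parsed = [
    datetime.strptime(entry, '%Y-%m-%d %H:%M:%S').strftime('%H:%M:%S')
    for entry in sample_times
]

print(sliced)
print(parsed)
```

Slicing is shorter and faster to write, while parsing fails loudly if a string does not match the expected layout; which is preferable depends on how uniform the data is.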

**Conditional list comprehensions for time-stamped data**

Great, you've successfully extracted the data of interest, the time, from a pandas DataFrame! Let's tweak your work further by adding a conditional that further specifies which entries to select.

In this exercise, you will be using a list comprehension to extract the time from time-stamped Twitter data. You will add a conditional expression to the list comprehension so that you only select the times in which entry[17:19] is equal to '19'. The pandas package has been imported as pd and the file 'tweets.csv' has been imported as the df DataFrame for your use.


* Extract the column 'created_at' from df and assign the result to tweet_time.
* Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp, and you will access the 12th to 19th characters in the string to extract the time. Use entry as the iterator variable and assign the result to tweet_clock_time. Additionally, add a conditional expression that checks whether entry[17:19] is equal to '19'.

In [3]:
# Extract the timestamp column from df: tweet_time
# (the exercise text calls it 'created_at'; in this dataset it is named 'date')
tweet_time = df['date']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == '19']

# Print the extracted times
print(tweet_clock_time)


['18:11:19', '00:09:19', '20:08:19', '20:55:19', '21:40:19', '21:03:19', '20:52:19', '21:07:19', '20:45:19', '21:00:19', '20:46:19', '20:51:19', '20:50:19', '18:55:19', '20:27:19', '23:10:19', '22:17:19', '23:08:19', '16:04:19', '19:14:19', '21:55:19', '21:39:19', '16:53:19', '19:20:19', '22:27:19', '19:31:19', '16:45:19', '19:22:19', '22:15:19', '18:22:19', '18:10:19', '22:50:19', '03:03:19', '23:39:19', '21:32:19', '22:40:19', '00:32:19', '16:32:19', '19:34:19', '17:07:19', '21:03:19', '18:13:19', '15:21:19', '16:29:19', '20:16:19', '15:02:19', '22:30:19', '17:42:19', '21:37:19', '22:38:19', '21:37:19', '18:00:19', '19:55:19', '15:14:19', '20:33:19', '16:40:19', '19:00:19', '21:29:19', '21:08:19', '18:59:19', '19:26:19', '22:15:19', '17:58:19', '22:18:19', '16:47:19', '16:12:19', '02:46:19', '20:47:19', '16:57:19', '22:54:19', '18:07:19', '19:08:19', '23:07:19', '22:40:19', '23:06:19', '14:57:19', '15:13:19', '16:46:19', '21:40:19', '05:43:19', '18:16:19', '23:05:19', '17:55:19', '18
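The conditional at the end of the comprehension acts as a filter: an entry contributes to the output only if the test is true. As a sketch on made-up timestamps (the sample strings and the names filtered and loop_result are illustrative assumptions), the comprehension is equivalent to a for loop with an if statement inside it:

```python
# Hypothetical timestamps in an assumed 'YYYY-MM-DD HH:MM:SS' layout
sample_times = [
    '2016-03-23 18:11:19',
    '2016-03-23 20:54:25',
    '2016-03-24 00:09:19',
]

# Conditional list comprehension: keep times whose seconds equal '19'
filtered = [entry[11:19] for entry in sample_times if entry[17:19] == '19']

# The same logic written as an explicit loop
loop_result = []
for entry in sample_times:
    if entry[17:19] == '19':
        loop_result.append(entry[11:19])

print(filtered)
print(loop_result)
```

Note that the condition is checked against the full entry, while the output expression entry[11:19] decides what is stored for each entry that passes.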