# Iterators vs Iterables
Let's do a quick recall of what you've learned about iterables and iterators. Recall from the video that an iterable is an object that can return an iterator, while an iterator is an object that keeps state and produces the next value when you call next() on it. In this exercise, you will identify which object is an iterable and which is an iterator.

The environment has been pre-loaded with the variables flash1 and flash2. Try printing out their values with print() and next() to figure out which is an iterable and which is an iterator.

In [None]:
# Possible Answers
# -Both flash1 and flash2 are iterators.
# -Both flash1 and flash2 are iterables.
# -flash1 is an iterable and flash2 is an iterator.

'''
ANSWER
flash1 is an iterable and flash2 is an iterator.
'''
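One way to check this yourself is sketched below. The real flash1 and flash2 exist only in the exercise environment, so the values here are stand-ins chosen for illustration. The check relies on the fact that calling iter() on an iterator returns that same iterator, while calling it on a plain iterable such as a list returns a new object.

```python
# Stand-ins for the pre-loaded variables (hypothetical values, not the real ones).
flash1 = ['jay garrick', 'barry allen', 'wally west', 'bart allen']
flash2 = iter(flash1)

# iter() on an iterator returns the iterator itself;
# iter() on a list returns a fresh list iterator.
print(iter(flash1) is flash1)
print(iter(flash2) is flash2)

# next() works on the iterator ...
print(next(flash2))

# ... but raises TypeError on the list, because a list is not an iterator.
try:
    next(flash1)
except TypeError as err:
    print('TypeError:', err)
```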

# Iterating over iterables (1)
Great, you're familiar with what iterables and iterators are! In this exercise, you will reinforce your knowledge about these by iterating over and printing from iterables and iterators.

You are provided with a list of strings flash. You will practice iterating over the list by using a for loop. You will also create an iterator for the list and access the values from the iterator.

In [None]:
# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Print each list item in flash using a for loop
for name in flash:
    print(name)


# Create an iterator for flash: superhero
superhero = iter(flash)

# Print each item from the iterator
print(next(superhero))
print(next(superhero))
print(next(superhero))
print(next(superhero))
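The four next() calls above consume the whole list. As a small sketch of what happens on a fifth call: next() raises StopIteration, unless you pass a default value as the second argument.

```python
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']
superhero = iter(flash)

# Consume all four items.
for _ in range(4):
    next(superhero)

# A fifth call has nothing left to return and raises StopIteration.
try:
    next(superhero)
except StopIteration:
    print('iterator exhausted')

# Passing a default avoids the exception.
leftover = next(superhero, None)
print(leftover)
```

A for loop catches StopIteration for you, which is why looping over flash never shows this error.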

In [1]:
'''Iterating over iterables (2)

One of the things you learned about in this chapter is that not all iterables are actual lists.
A couple of examples that we looked at are strings and the use of the range() function. In this
exercise, we will focus on the range() function.
You can use range() in a for loop as if it's a list to be iterated over:
for i in range(5):
    print(i)
Recall that range() doesn't actually create the list; instead, it creates a range object whose
iterator produces the values one at a time until it reaches the limit (in the example, until the
value 4). If range() created the actual list, calling it with a value of 10 ** 100 would fail,
because a list that long could not fit in any computer's memory. (In Python, powers are written
with **; the ^ operator is bitwise XOR.) The value 10 ** 100 is called a googol: a 1 followed by a
hundred 0s. That's a huge number!
Your task for this exercise is to show that calling range() with 10 ** 100 won't actually
pre-create the list.
-Instructions
-Create an iterator object small_value over range(3) using the function iter().
-Using a for loop, iterate over range(3), printing the value for every iteration. Use num as the
loop variable.
-Create an iterator object googol over range(10 ** 100).
'''
# Create an iterator for range(3): small_value
small_value = iter(range(3))

# Print the values in small_value
print(next(small_value))
print(next(small_value))
print(next(small_value))

# Loop over range(3) and print the values
for num in range(3):
    print(num)

# Create an iterator for range(10 ** 100): googol
googol = iter(range(10 ** 100))

# Print the first 5 values from googol
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))

0
1
2
0
1
2
0
1
2
3
4
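To make the memory argument concrete, the sketch below compares the size of a range object with the size of the equivalent list using sys.getsizeof(). The exact byte counts depend on the Python build, so none are shown here; the point is only that the range object stays small however long the range is.

```python
import sys

small_range = range(1000)
small_list = list(small_range)

# The range object stores only start, stop and step;
# the list stores a reference to every element.
range_size = sys.getsizeof(small_range)
list_size = sys.getsizeof(small_list)
print(range_size, list_size)

# A googol-sized range is still a small object, because nothing is pre-created.
googol_range = range(10 ** 100)
googol_size = sys.getsizeof(googol_range)
print(googol_size)

# Membership tests on a range of integers are computed arithmetically.
last_value_present = (10 ** 100 - 1) in googol_range
print(last_value_present)
```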


# Iterators as function arguments
You've been using the iter() function to get an iterator object, as well as the next() function to retrieve the values one by one from the iterator object.

There are also functions that take iterators and iterables as arguments. For example, the list() and sum() functions return a list and the sum of elements, respectively.

In this exercise, you will use these functions by passing an iterable from range() and then printing the results of the function calls.

In [None]:
# Create a range object: values
values = range(10, 21)

# Print the range object
print(values)

# Create a list of integers: values_list
values_list = list(values)

# Print values_list
print(values_list)

# Get the sum of values: values_sum
values_sum = sum(values)

# Print values_sum
print(values_sum)
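The code above can call both list() and sum() on values because a range object is an iterable that can be traversed repeatedly. The sketch below shows how this differs for an iterator, which is used up by the first function that consumes it. The names values_iter, first_pass and second_pass are introduced here for illustration.

```python
values = range(10, 21)

# A range can be traversed as many times as you like.
print(sum(values), sum(values))

# An iterator is exhausted after one full traversal.
values_iter = iter(values)
first_pass = list(values_iter)   # consumes every value
second_pass = sum(values_iter)   # nothing left, so the sum is 0

print(first_pass)
print(second_pass)
```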

# Using enumerate
You're really getting the hang of using iterators, great job!

You've just gained several new ideas on iterators from the last video and one of them is the enumerate() function. Recall that enumerate() returns an enumerate object that produces a sequence of tuples, and each of the tuples is an index-value pair.

In this exercise, you are given a list of strings mutants and you will practice using enumerate() on it by printing out a list of tuples and unpacking the tuples using a for loop.

In [None]:
# Create a list of strings: mutants
mutants = ['charles xavier', 
            'bobby drake', 
            'kurt wagner', 
            'max eisenhardt', 
            'kitty pryde']

# Create a list of tuples: mutant_list
mutant_list = list(enumerate(mutants))

# Print the list of tuples
print(mutant_list)

# Unpack and print the tuple pairs
for index1, value1 in enumerate(mutants):
    print(index1, value1)

# Change the start index
for index2, value2 in enumerate(mutants, start=1):
    print(index2, value2)
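For comparison, here is a short sketch of the index-based loop that enumerate() replaces, followed by a check that both approaches give the same pairs. The names indexed_manual, indexed_enumerate and numbered are introduced here for illustration.

```python
mutants = ['charles xavier', 'bobby drake', 'kurt wagner',
           'max eisenhardt', 'kitty pryde']

# Manual version: loop over positions and look each value up.
indexed_manual = []
for i in range(len(mutants)):
    indexed_manual.append((i, mutants[i]))

# enumerate() yields the same (index, value) tuples directly.
indexed_enumerate = list(enumerate(mutants))
print(indexed_manual == indexed_enumerate)

# Because enumerate() yields pairs, dict() can turn it into a lookup table.
numbered = dict(enumerate(mutants, start=1))
print(numbered[1])
```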


# Using zip
Another interesting function that you've learned is zip(), which takes any number of iterables and returns a zip object that is an iterator of tuples. To print the values of a zip object, you can convert it into a list and then print it. Printing the zip object itself shows only its type and memory address, not its values; you have to unpack or iterate over it first. In this exercise, you will explore this for yourself.

Three lists of strings are pre-loaded: mutants, aliases, and powers. First, you will use list() and zip() on these lists to generate a list of tuples. Then, you will create a zip object using zip(). Finally, you will unpack this zip object in a for loop to print the values in each tuple. Observe the different output generated by printing the list of tuples, then the zip object, and finally, the tuple values in the for loop.

In [None]:
# Create a list of tuples: mutant_data
mutant_data = list(zip(mutants, aliases, powers))

# Print the list of tuples
print(mutant_data)

# Create a zip object using the three lists: mutant_zip
mutant_zip = zip(mutants, aliases, powers)

# Print the zip object
print(mutant_zip)

# Unpack the zip object and print the tuple values
for value1, value2, value3 in mutant_zip:
    print(value1, value2, value3)
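Two properties of zip objects are easy to miss, and the sketch below illustrates both: a zip object can be traversed only once, and zip() stops at the shortest input. The lists here are shortened, made-up stand-ins for the pre-loaded ones, and powers is deliberately one item short.

```python
# Illustrative stand-ins for the pre-loaded lists (hypothetical values).
mutants = ['charles xavier', 'bobby drake', 'kurt wagner']
aliases = ['prof x', 'iceman', 'nightcrawler']
powers = ['telepathy', 'thermokinesis']  # one item short on purpose

mutant_zip = zip(mutants, aliases, powers)

# The first traversal yields the tuples; zip() stops at the shortest list.
first_pass = list(mutant_zip)
print(first_pass)

# The zip object is now exhausted, so a second traversal is empty.
second_pass = list(mutant_zip)
print(second_pass)
```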


# Using * and zip to 'unzip'
You know how to use zip() as well as how to print out values from a zip object. Excellent!

Let's play around with zip() a little more. There is no unzip function for doing the reverse of what zip() does. We can, however, reverse what has been zipped together by using zip() with a little help from *! * unpacks an iterable such as a list or a tuple into positional arguments in a function call.

In this exercise, you will use * in a call to zip() to unpack the tuples produced by zip().

Two tuples of strings, mutants and powers, have been pre-loaded.

In [None]:
# Create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# Print the tuples in z1 by unpacking with *
print(*z1)

# Re-create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)

print(result1)
print(result2)

# Check if unpacked tuples are equivalent to original tuples
print(result1 == mutants)
print(result2 == powers)
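The equality checks above succeed because mutants and powers are tuples, and zip(*z1) also produces tuples. As a sketch of what changes when the originals are lists (the names mutants_list and powers_list are hypothetical stand-ins), the comparison fails until the results are converted back:

```python
# Hypothetical list versions of the pre-loaded tuples.
mutants_list = ['charles xavier', 'bobby drake']
powers_list = ['telepathy', 'thermokinesis']

# 'Unzipping' always yields tuples, whatever the input types were.
result1, result2 = zip(*zip(mutants_list, powers_list))
print(type(result1).__name__)

# A tuple never compares equal to a list, even with the same items.
print(result1 == mutants_list)

# Converting back to a list restores equality.
print(list(result1) == mutants_list)
```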


# Processing large amounts of Twitter data
Sometimes, the data we have to process is too large to fit in a computer's memory. This is a common problem faced by data scientists. One solution is to process the data source chunk by chunk instead of loading it all in a single go.

In this exercise, you will do just that. You will process a large csv file of Twitter data in the same way that you processed 'tweets.csv' in the Bringing it all together exercises of the prequel course, but this time working on it in chunks of 10 entries at a time.

If you are interested in learning how to access Twitter data so you can work with it on your own system, refer to Part 2 of the DataCamp course on Importing Data in Python.

The pandas package has been imported as pd and the file 'tweets.csv' is in your current directory for your use.

In [None]:
# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Iterate over the file chunk by chunk
for chunk in pd.read_csv('tweets.csv', chunksize=10):

    # Iterate over the column in DataFrame
    for entry in chunk['lang']:
        if entry in counts_dict:
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)
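The counting pattern above can also be written with collections.Counter from the standard library. The sketch below is an alternative, not the exercise's solution, and to stay runnable without the data file it uses small made-up lists in place of the chunk['lang'] columns.

```python
from collections import Counter

# Each inner list stands in for chunk['lang'] from one chunk (made-up values).
chunks = [['en', 'en', 'et'], ['und', 'en'], ['et']]

counts = Counter()

# update() adds the occurrences from each chunk to the running totals,
# so only one chunk needs to be in memory at a time.
for chunk in chunks:
    counts.update(chunk)

counts_dict = dict(counts)
print(counts_dict)
```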


# Extracting information for large amounts of Twitter data
Great job chunking out that file in the previous exercise. You now know how to deal with situations where you need to process a very large file and that's a very useful skill to have!

It's good to know how to process a file in smaller, more manageable chunks, but it can become very tedious having to write and rewrite the same code for the same task each time. In this exercise, you will make your code more reusable by wrapping your work from the last exercise in a function definition.

The pandas package has been imported as pd and the file 'tweets.csv' is in your current directory for your use.

In [None]:
# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize = c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict:
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('tweets.csv', c_size=10, colname='lang')

# Print result_counts
print(result_counts)

In [2]:
# Write a basic list comprehension

# In this exercise, you will practice what you've learned from the video about writing
# list comprehensions. You will write a list comprehension and identify the output that
# will be produced.

# The following list has been pre-loaded in the environment.

# doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']
# What would a list comprehension that produces a list of the first character of each string
# in doctor look like? Note that the list comprehension uses doc as the iterator variable.
# What will the output be?

# Possible Answers
# -The list comprehension is [for doc in doctor: doc[0]] and produces the list ['h', 'c', 'c', 't', 'w'].
# -The list comprehension is [doc[0] for doc in doctor] and produces the list ['h', 'c', 'c', 't', 'w'].
# -The list comprehension is [doc[0] in doctor] and produces the list ['h', 'c', 'c', 't', 'w'].

'''
ANSWER
The list comprehension is [doc[0] for doc in doctor] and produces the list ['h', 'c', 'c', 't', 'w'].
'''


In [3]:
# List comprehension over iterables

# You know that list comprehensions can be built over iterables. Given the following
# objects below, which of these can we build list comprehensions over?

# doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']

# range(50)

# underwood = 'After all, we are nothing more or less than what we choose to reveal.'

# jean = '24601'

# flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# valjean = 24601
# Possible Answers
# -You can build list comprehensions over all the objects except the string of number characters jean.
# -You can build list comprehensions over all the objects except the string lists doctor and flash.
# -You can build list comprehensions over all the objects except range(50).
# -You can build list comprehensions over all the objects except the integer object valjean.

'''
ANSWER
You can build list comprehensions over all the objects except the integer object valjean.
'''


# Writing list comprehensions
You now have all the knowledge necessary to begin writing list comprehensions! Your job in this exercise is to write a list comprehension that produces a list of the squares of the numbers ranging from 0 to 9.

In [4]:
# Create list comprehension: squares
squares = [i ** 2 for i in range(10)]

print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
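For reference, this is the for loop that the list comprehension condenses into one line. The name squares_loop is introduced here for illustration.

```python
# Loop version: start with an empty list and append each square.
squares_loop = []
for i in range(10):
    squares_loop.append(i ** 2)

# Comprehension version: output expression first, then the for clause.
squares = [i ** 2 for i in range(10)]

print(squares_loop == squares)
```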


In [5]:
'''
Nested list comprehensions

Great! At this point, you have a good grasp of the basic syntax of list comprehensions.
Let's push your code-writing skills a little further. In this exercise, you will be writing
a list comprehension within another list comprehension, or nested list comprehensions. It
sounds a little tricky, but you can do it!
Let's step aside for a while from strings. One of the ways in which lists can be used is
in representing multi-dimensional objects such as matrices. Matrices can be represented as a
list of lists in Python. For example a 5 x 5 matrix with values 0 to 4 in each row can be
written as:
matrix = [[0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4]]
Your task is to recreate this matrix by using nested list comprehensions. Recall that
you can create one of the rows of the matrix with a single list comprehension. To create
the list of lists, you simply have to supply the list comprehension as the output expression
of the overall list comprehension:
[[output expression] for iterator variable in iterable]
Note that here, the output expression is itself a list comprehension.
Instructions
-In the inner list comprehension - that is, the output expression of the nested list
comprehension - create a list of values from 0 to 4 using range(). Use col as the iterator
variable.
-In the iterable part of your nested list comprehension, use range() to count 5 rows - that is,
create a list of values from 0 to 4. Use row as the iterator variable; note that you won't be
needing this to create values in the list of lists.
'''


In [6]:
# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(5)] for row in range(5)]

# Print the matrix
for row in matrix:
    print(row)


[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
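In the matrix above, the row variable only counts repetitions. As a small variation (a sketch, not part of the exercise), the inner comprehension can use both variables, for example to build a multiplication table. The name table is introduced here for illustration.

```python
# The inner comprehension builds one row; the outer one repeats it per row value.
table = [[row * col for col in range(5)] for row in range(5)]

for line in table:
    print(line)

# Equivalent nested loops, for comparison.
table_loop = []
for row in range(5):
    current = []
    for col in range(5):
        current.append(row * col)
    table_loop.append(current)

print(table == table_loop)
```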


# Using conditionals in comprehensions (1)
You've been using list comprehensions to build lists of values, sometimes using operations to create these values.

An interesting mechanism in list comprehensions is that you can also create lists that include only the values that meet a certain condition. One way of doing this is by using conditionals on iterator variables. In this exercise, you will do exactly that!

Recall from the video that you can test the iterator variable by adding an if clause, the optional predicate expression, after the for clause in the comprehension:

[ output expression for iterator variable in iterable if predicate expression ].

You will use this recipe to write a list comprehension for this exercise. You are given a list of strings fellowship and, using a list comprehension, you will create a list that only includes the members of fellowship that have 7 characters or more.

In [None]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member for member in fellowship if len(member) >= 7]

# Print the new list
print(new_fellowship)


# Using conditionals in comprehensions (2)
In the previous exercise, you used an if conditional in the predicate expression part of a list comprehension to evaluate an iterator variable. In this exercise, you will use an if-else conditional expression in the output expression of the list comprehension.

You will work on the same list, fellowship and, using a list comprehension and an if-else conditional statement in the output expression, create a list that keeps members of fellowship with 7 or more characters and replaces others with an empty string. Use member as the iterator variable in the list comprehension.

In [7]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry',
              'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member if len(member) >= 7 else ''
                  for member in fellowship]

# Print the new list
print(new_fellowship)


['', 'samwise', '', 'aragorn', 'legolas', 'boromir', '']
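The two kinds of conditional behave differently, and a side-by-side sketch may help: an if clause after the for clause filters items out, so the result can be shorter, while an if-else in the output expression transforms every item, so the result keeps its length. The names filtered and replaced are introduced here for illustration.

```python
fellowship = ['frodo', 'samwise', 'merry',
              'aragorn', 'legolas', 'boromir', 'gimli']

# Predicate after the for clause: short names are dropped.
filtered = [member for member in fellowship if len(member) >= 7]

# if-else in the output expression: short names are replaced, not dropped.
replaced = [member if len(member) >= 7 else '' for member in fellowship]

print(len(filtered), filtered)
print(len(replaced), replaced)
```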


# Dict comprehensions
Comprehensions aren't relegated merely to the world of lists. There are many other objects you can build using comprehensions, such as dictionaries, pervasive objects in Data Science. You will create a dictionary using the comprehension syntax for this exercise. In this case, the comprehension is called a dict comprehension.

Recall that the main difference between a list comprehension and a dict comprehension is the use of curly braces {} instead of []. Additionally, members of the dictionary are created using a colon :, as in <key> : <value>.

You are given a list of strings fellowship and, using a dict comprehension, create a dictionary with the members of the list as the keys and the length of each string as the corresponding values.

In [8]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create dict comprehension: new_fellowship
new_fellowship = {member: len(member) for member in fellowship}

# Print the new dictionary
print(new_fellowship)

{'frodo': 5, 'samwise': 7, 'merry': 5, 'aragorn': 7, 'legolas': 7, 'boromir': 7, 'gimli': 5}
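A dict comprehension accepts the same optional predicate as a list comprehension. The sketch below keeps only the longer names and also shows the equivalent loop for the unfiltered case. The names long_names and by_loop are introduced here for illustration.

```python
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Dict comprehension with a predicate: only names of 7 or more characters.
long_names = {member: len(member) for member in fellowship if len(member) >= 7}
print(long_names)

# Loop equivalent of the unfiltered dict comprehension.
by_loop = {}
for member in fellowship:
    by_loop[member] = len(member)

print(by_loop == {member: len(member) for member in fellowship})
```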


In [9]:
# List comprehensions vs generators


# You've seen from the videos that list comprehensions and generator expressions look
# very similar in their syntax, except for the use of parentheses () in generator expressions
# and brackets [] in list comprehensions.

# In this exercise, you will recall the difference between list comprehensions and generators.
# To help with that task, the following code has been pre-loaded in the environment:

# # List of strings
# fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# # List comprehension
# fellow1 = [member for member in fellowship if len(member) >= 7]

# # Generator expression
# fellow2 = (member for member in fellowship if len(member) >= 7)

# Try to play around with fellow1 and fellow2 by figuring out their types and printing
# out their values. Based on your observations and what you can recall from the video,
# select from the options below the best description for the difference between list
# comprehensions and generators.

# Possible Answers
# -List comprehensions and generators are not different at all; they are just different ways
# of writing the same thing.
# -A list comprehension produces a list as output, a generator produces a generator object.
# -A list comprehension produces a list as output that can be iterated over, a generator
# produces a generator object that can't be iterated over.


'''
ANSWER
A list comprehension produces a list as output, a generator produces a generator object.
'''
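A quick way to see the difference for yourself is sketched below, using the pre-loaded code from the exercise description. It prints the type of each object and then shows that the generator, unlike the list, can be traversed only once. The names first_pass and second_pass are introduced here for illustration.

```python
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# List comprehension: all values are computed and stored immediately.
fellow1 = [member for member in fellowship if len(member) >= 7]

# Generator expression: values are computed only when requested.
fellow2 = (member for member in fellowship if len(member) >= 7)

print(type(fellow1).__name__)
print(type(fellow2).__name__)

# Both can be iterated over, but the generator is exhausted after one pass.
first_pass = list(fellow2)
second_pass = list(fellow2)
print(first_pass)
print(second_pass)
```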


# Write your own generator expressions
You are familiar with what generators and generator expressions are, as well as how they differ from list comprehensions. In this exercise, you will practice building generator expressions on your own.

Recall that generator expressions have basically the same syntax as list comprehensions, except that they use parentheses () instead of brackets []; this should make things feel familiar! Furthermore, if you have ever iterated over a dictionary with .items(), or used the range() function, you have already worked with objects that behave in a similar lazy way. Strictly speaking they are not generators (.items() returns a view object and range() returns a range object), but like generators they produce their values on demand instead of building a full list in memory.

Now, you will start simple by creating a generator object that produces numeric values.

In [None]:
# Create generator object: result
result = (num for num in range(31))

# Print the first 5 values
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))

# Print the rest of the values
for value in result:
    print(value)

# Changing the output in generator expressions
Great! At this point, you already know how to write a basic generator expression. In this exercise, you will push this idea a little further by adding to the output expression of a generator expression. Because generator expressions and list comprehensions are so alike in syntax, this should be a familiar task for you!

You are given a list of strings lannister and, using a generator expression, create a generator object that you will iterate over to print its values.

In [None]:
# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Create a generator object: lengths
lengths = (len(person) for person in lannister)

# Iterate over and print the values in lengths
for value in lengths:
    print(value)
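A generator expression does not have to be stored in a variable first. As a short sketch, it can be passed directly to a function that consumes an iterable, such as sum() or max(); when it is the only argument, the extra pair of parentheses can be dropped. The names total and longest are introduced here for illustration.

```python
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# The generator expression is consumed by sum() one value at a time.
total = sum(len(person) for person in lannister)

# The same works for any function that accepts an iterable.
longest = max(len(person) for person in lannister)

print(total, longest)
```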


# Build a generator
In previous exercises, you've dealt mainly with writing generator expressions, which use comprehension syntax. Being able to use comprehension syntax for generator expressions made your work so much easier!

Now, recall from the video that not only are there generator expressions, there are generator functions as well. Generator functions are functions that, like generator expressions, yield a series of values, instead of returning a single value. A generator function is defined like a regular function, but whenever it generates a value, it uses the keyword yield instead of return.

In this exercise, you will create a generator function with a similar mechanism as the generator expression you defined in the previous exercise:

In [None]:
# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""

    # Yield the length of a string
    for person in input_list:
        yield len(person)
        
# Print the values generated by get_lengths()
for value in get_lengths(lannister):
    print(value)
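To see how a generator function runs, the sketch below adds a print() call inside the function body (an addition for illustration only). Calling the function does not run the body; it returns a generator object, and each next() call runs the body up to the next yield.

```python
def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""
    for person in input_list:
        # Added for illustration: shows when the body actually executes.
        print('computing length of', person)
        yield len(person)

# Nothing is printed here: the body has not started running yet.
gen = get_lengths(['cersei', 'jaime'])
print(type(gen).__name__)

# Each next() call resumes the body until the following yield.
first = next(gen)
second = next(gen)
print(first, second)

# No values remain, so converting the rest to a list gives an empty list.
rest = list(gen)
print(rest)
```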

# List comprehensions for time-stamped data
You will now make use of what you've learned from this chapter to solve a simple data extraction problem. You will also be introduced to a data structure, the pandas Series, in this exercise. We won't elaborate on it much here, but what you should know is that it is a data structure that you will work with often when analyzing data from pandas DataFrames. You can think of DataFrame columns as one-dimensional arrays called Series.

In this exercise, you will be using a list comprehension to extract the time from time-stamped Twitter data. The pandas package has been imported as pd and the file 'tweets.csv' has been imported as the df DataFrame for your use.

In [None]:
# Import packages
import pandas as pd

df = pd.read_csv('../_datasets/tweets.csv')

# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time]

# Print the extracted times
print(tweet_clock_time)
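The slice entry[11:19] only makes sense once you see the layout of a timestamp. The sketch below uses a single made-up value and assumes the created_at column follows the classic Twitter format 'Tue Mar 29 23:40:17 +0000 2016'; check a few rows of your own data before relying on these positions.

```python
# Made-up timestamp in the assumed created_at format.
entry = 'Tue Mar 29 23:40:17 +0000 2016'

# Characters 11 to 18 hold the clock time HH:MM:SS.
clock_time = entry[11:19]

# Narrower slices pick out the individual parts.
hours = entry[11:13]
minutes = entry[14:16]
seconds = entry[17:19]

print(clock_time)
print(hours, minutes, seconds)
```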



# Conditional list comprehensions for time-stamped data
Great, you've successfully extracted the data of interest, the time, from a pandas DataFrame! Let's tweak your work further by adding a conditional that further specifies which entries to select.

In this exercise, you will be using a list comprehension to extract the time from time-stamped Twitter data. You will add a conditional expression to the list comprehension so that you only select the times in which entry[17:19] is equal to '19'. The pandas package has been imported as pd and the file 'tweets.csv' has been imported as the df DataFrame for your use.

In [None]:
# Import packages
import pandas as pd

df = pd.read_csv('../_datasets/tweets.csv')

# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == '19']

# Print the extracted times
print(tweet_clock_time)
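Fixed slice positions work only while every timestamp has exactly the same layout. As a hedged alternative, the sketch below parses each value with datetime.strptime() and filters on the seconds field. The sample timestamps are made up, and the format string assumes the classic Twitter created_at layout.

```python
from datetime import datetime

# Made-up timestamps in the assumed created_at format.
samples = ['Tue Mar 29 23:40:17 +0000 2016',
           'Tue Mar 29 23:40:19 +0000 2016',
           'Tue Mar 29 23:41:19 +0000 2016']

# Slice-based filter, as in the exercise.
by_slice = [entry[11:19] for entry in samples if entry[17:19] == '19']

# Parse-based filter: the format string is an assumption about the data.
fmt = '%a %b %d %H:%M:%S %z %Y'
parsed = (datetime.strptime(entry, fmt) for entry in samples)
by_parse = [stamp.strftime('%H:%M:%S') for stamp in parsed if stamp.second == 19]

print(by_slice)
print(by_parse)
```

Parsing is slower than slicing, but it raises a ValueError on a malformed timestamp instead of silently returning the wrong characters.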