# Python Data Science Toolbox (Part 2)

## Course Description
In this second course in the Python Data Science Toolbox, you'll continue to build your Python Data Science skills. First you'll enter the wonderful world of iterators, objects that you have already encountered in the context of for loops without having necessarily known it. You'll then learn about list comprehensions, which are extremely handy tools that form a basic component in the toolbox of all modern Data Scientists working in Python. You'll end the course by working through a case study in which you'll apply all of the techniques you learned in both this course and the prequel. If you're looking to make it as a Pythonista Data Science ninja, you have come to the right place.

# Iterators in Pythonland

Iterators, iterables, list comprehensions, and generators are essential components of the Pythonista Data Science toolbox.

An iterable is an object that can return an iterator, while an iterator is an object that keeps state and produces the next value when you call next() on it. In this exercise, you will identify which object is an iterable and which is an iterator.



### Iterators vs. Iterables

Iterable
- Examples: lists, strings, dictionaries, file connections
- An object with an associated iter() method
- Applying iter() to an iterable creates an iterator

Iterator
- Produces the next value with next()
- Can be unpacked all at once with the splat operator *
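
A quick way to tell the two apart is sketched below, using a shortened version of the flash list that appears later in these notes. The hasattr() checks are only an illustrative way to inspect the objects, not something the course itself uses: an iterator responds to next(), a plain iterable does not, and calling iter() on an iterator hands back the iterator itself.

```python
flash = ['jay garrick', 'barry allen']   # a list is an iterable
superspeed = iter(flash)                 # iter() gives back an iterator

# An iterable such as a list has no __next__; an iterator does
list_has_next = hasattr(flash, '__next__')
iterator_has_next = hasattr(superspeed, '__next__')

# Calling iter() on an iterator returns the very same object
same_object = iter(superspeed) is superspeed

print(list_has_next, iterator_has_next, same_object)
```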


#### Iterators in for loop

In [5]:
#iterator for loop

employees = ['nick', 'lore', 'huge']

for employee in employees:
    print(employee)

nick
lore
huge


In [6]:
for letter in 'Datacamp':
    print(letter)

D
a
t
a
c
a
m
p


In [7]:
for i in range(4):
    print(i)

0
1
2
3


Iter() and next()

To create an iterator from an iterable, all we need to do is call the function iter() and pass the iterable to it.

Once the iterator is defined, we pass it to the function next(), which returns the first value.

Calling next() again returns the next value, until no values remain and next() raises a StopIteration exception.


In [13]:
word = 'Da'
it = iter(word)

In [14]:
next(it)

'D'

In [15]:
next(it)

'a'

In [16]:
next(it)

StopIteration: 
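
When you step through an iterator by hand, the end of the data has to be handled somehow. The sketch below shows two common options on the same two-letter word: catching StopIteration, or giving next() a default value to return instead of raising. The variable names collected and leftover are made up for illustration.

```python
word = 'Da'
it = iter(word)

collected = []
while True:
    try:
        collected.append(next(it))
    except StopIteration:
        # The iterator is exhausted, so stop asking for values
        break

# next() also accepts a default to return instead of raising
leftover = next(it, None)

print(collected)
print(leftover)
```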

### * star or splat iterator

Iterating at once with *

The * operator unpacks all elements of an iterator or an iterable.

In [18]:
word = 'Data'
it = iter(word)
print(*it)

D a t a
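
One thing worth knowing about unpacking an iterator with * is that it uses the iterator up. The short sketch below unpacks the same iterator twice into lists (first_pass and second_pass are illustrative names); the second attempt finds nothing left.

```python
word = 'Data'
it = iter(word)

first_pass = [*it]    # unpacks every remaining element
second_pass = [*it]   # the iterator is already exhausted

print(first_pass)
print(second_pass)
```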


### Iterating over Dictionaries

To iterate over the key-value pairs of a dictionary, call its items() method and unpack each pair in the for clause.

In [22]:
pythonistas = {'huge': 'bowne-anderson', 
               'francis': 'castro'}

for key, value in pythonistas.items():
    print(key, value)

huge bowne-anderson
francis castro
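
For comparison, here is a small sketch of what the other ways of iterating over the same dictionary produce. Iterating over the dictionary directly yields only its keys, values() yields only the values, and items() yields the pairs.

```python
pythonistas = {'huge': 'bowne-anderson',
               'francis': 'castro'}

keys = list(pythonistas)             # iterating a dict yields its keys
values = list(pythonistas.values())  # values only
pairs = list(pythonistas.items())    # (key, value) tuples

print(keys)
print(values)
print(pairs)
```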


### Iterating over File Connections

With files, you can use the iter() and next() functions to return the lines of a file, here file.txt, one at a time.

In [23]:
file = open('file.txt')

it = iter(file)

print(next(it))

print(next(it))

FileNotFoundError: [Errno 2] No such file or directory: 'file.txt'
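
The cell above fails only because file.txt does not exist on this machine. As a minimal sketch that can run anywhere, the version below first writes a small temporary file (its contents are made up for illustration) and then reads it back line by line with iter() and next().

```python
import os
import tempfile

# Write a small throwaway file so the example does not depend on file.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond line\n')
    path = tmp.name

# Iterating over an open file yields its lines one at a time
with open(path) as file:
    it = iter(file)
    line1 = next(it)
    line2 = next(it)

os.remove(path)  # clean up the temporary file

print(line1, end='')
print(line2, end='')
```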

In [2]:
# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Print each list item in flash using a for loop
for person in flash:
    print(person)


jay garrick
barry allen
wally west
bart allen


In [3]:
# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Print each list item in flash using a for loop
for person in flash:
    print(person)


# Create an iterator for flash: superspeed
superspeed = iter(flash)

# Print each item from the iterator
print(next(superspeed))
print(next(superspeed))
print(next(superspeed))
print(next(superspeed))

jay garrick
barry allen
wally west
bart allen
jay garrick
barry allen
wally west
bart allen


#### Iter for Range()

One of the things you learned about in this chapter is that not all iterables are actual lists. A couple of examples that we looked at are strings and the use of the range() function. In this exercise, we will focus on the range() function.

You can use range() in a for loop as if it's a list to be iterated over:

for i in range(5):
    print(i)

Recall that range() doesn't actually create the list; instead, it creates a range object with an iterator that produces the values until it reaches the limit (in the example, until the value 4). If range() created the actual list, calling it with a value of 10 ** 100 may not work, especially since a number as big as that may go over a regular computer's memory. The value 10 ** 100 is actually what's called a googol, which is a 1 followed by a hundred 0s. That's a huge number!

Your task for this exercise is to show that calling range() with 10 ** 100 won't actually pre-create the list.

In [4]:
# Create an iterator for range(3): small_value
small_value = iter(range(3))

# Print the values in small_value
print(next(small_value))
print(next(small_value))
print(next(small_value))

# Loop over range(3) and print the values
for num in range(3):
    print(num)


# Create an iterator for range(10 ** 100): googol
googol = iter(range(10 ** 100))

# Print the first 5 values from googol
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))

0
1
2
0
1
2
0
1
2
3
4
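
Another way to see that no list is built is to look at the range object itself. The sketch below is an illustration under CPython, where sys.getsizeof() reports only the size of the object it is given: the googol-sized range stays tiny, and a membership test is computed arithmetically rather than by scanning stored values.

```python
import sys

googol_range = range(10 ** 100)

# The range object stores only start, stop and step
size_in_bytes = sys.getsizeof(googol_range)

# Membership for integers is computed, not looked up in a stored list
contains_big = (10 ** 99) in googol_range

print(size_in_bytes)
print(contains_big)
```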


#### Iters for List() and Sum() arguments
Iterators as function arguments

You've been using the iter() function to get an iterator object, as well as the next() function to retrieve the values one by one from the iterator object.

There are also functions that take iterators and other iterables as arguments. For example, the list() and sum() functions return a list and the sum of the elements, respectively.

In this exercise, you will use these functions by passing them a range object and then printing the results of the function calls.

In [8]:
values = list(range(10, 20))
print(values)

[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]


In [9]:
list(values)

[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [10]:
# Create a range object: values
values = range(10, 21)

# Print the range object
print(values)

# Create a list of integers: values_list
values_list = list(values)

# Print values_list
print(values_list)

# Get the sum of values: values_sum
values_sum = sum(values)

# Print values_sum
print(values_sum)

range(10, 21)
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
165
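
The cell above works twice on values because a range object can be traversed repeatedly. An iterator created from it cannot. The sketch below, with illustrative variable names, shows that once list() has consumed an iterator, sum() has nothing left to add.

```python
values = range(10, 21)

# A range object can be traversed as many times as you like
total_from_range = sum(values)

# An iterator created from it can be traversed only once
values_iter = iter(values)
values_list = list(values_iter)            # consumes the iterator
total_from_spent_iter = sum(values_iter)   # nothing left to add

print(total_from_range)
print(total_from_spent_iter)
```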


# Enumerate()

enumerate() is a function that takes any iterable, such as a list, as its argument and returns a special enumerate object.

The enumerate object produces a sequence of tuples, and each of the tuples is an index-value pair.

By default, enumerate() begins indexing at 0.

In [23]:
# Returns a special enumerate object,
# which consists of pairs containing the elements
# of the original iterable, along with their index
# within the iterable

avengers = ['hawkeye', 'iron', 'thor', 'quicksilver']

e = enumerate(avengers)

print(type(e))

<class 'enumerate'>


In [24]:
# use the function list to turn this enumerate object
# into a list of tuples and print it

e_list = list(e)
print(e_list)

[(0, 'hawkeye'), (1, 'iron'), (2, 'thor'), (3, 'quicksilver')]


In [25]:
# the enumerate object itself is also an iterable
# and we can loop over it
# while unpacking its elements
# using the clause for index, value in
# enumerate(avengers)

avengers = ['hawkeye', 'iron', 'thor', 'quicksilver']

for index, value in enumerate(avengers):
    print(index, value)

0 hawkeye
1 iron
2 thor
3 quicksilver


In [26]:
# change the index value from 0 to something else
# use the start argument in the function

avengers = ['hawkeye', 'iron', 'thor', 'quicksilver']

for index, value in enumerate(avengers, start=10):
    print(index, value)
    
    

10 hawkeye
11 iron
12 thor
13 quicksilver


In [1]:
# Create a list of strings: mutants
mutants = ['charles xavier', 
            'bobby drake', 
            'kurt wagner', 
            'max eisenhardt', 
            'kitty pryde']

# Create a list of tuples: mutant_list
mutant_list = list(enumerate(mutants))

# Print the list of tuples
print(mutant_list)

# Unpack and print the tuple pairs
for index1, value1 in enumerate(mutants):
    print(index1, value1)

# Change the start index
for index2, value2 in enumerate(mutants, start=1):
    print(index2, value2)

[(0, 'charles xavier'), (1, 'bobby drake'), (2, 'kurt wagner'), (3, 'max eisenhardt'), (4, 'kitty pryde')]
0 charles xavier
1 bobby drake
2 kurt wagner
3 max eisenhardt
4 kitty pryde
1 charles xavier
2 bobby drake
3 kurt wagner
4 max eisenhardt
5 kitty pryde


# Using Zip()

zip() accepts an arbitrary number of iterables and returns an iterator of tuples.

Using 2 Lists

Another interesting function that you've learned is zip(), which takes any number of iterables and returns a zip object that is an iterator of tuples. If you wanted to print the values of a zip object, you can convert it into a list and then print it. Printing just a zip object will not return the values unless you unpack it first. In this exercise, you will explore this for yourself



In [29]:
# Zipping two lists creates a zip object
# which is an iterator of tuples

avengers = ['hawkeye', 'iron', 'thor', 'quicksilver']

names = ['barton', 'stark', 'odinson', 'maximoff']

z = zip(avengers, names)

print(type(z))

<class 'zip'>


In [30]:
# turn the zip object into a list
# and then print the list

z_list = list(z)
print(z_list)

[('hawkeye', 'barton'), ('iron', 'stark'), ('thor', 'odinson'), ('quicksilver', 'maximoff')]


In [31]:
# use for loop to iterate over the zip object 
# and print the tuples

avengers = ['hawkeye', 'iron', 'thor', 'quicksilver']

names = ['barton', 'stark', 'odinson', 'maximoff']

for z1, z2 in zip(avengers, names):
    print(z1, z2)

hawkeye barton
iron stark
thor odinson
quicksilver maximoff


In [33]:
# use *splat operator to print all the elements

avengers = ['hawkeye', 'iron', 'thor', 'quicksilver']

names = ['barton', 'stark', 'odinson', 'maximoff']

z = zip(avengers, names)

print(*z)

('hawkeye', 'barton') ('iron', 'stark') ('thor', 'odinson') ('quicksilver', 'maximoff')


In [2]:
mutants = ['charles xavier', 'bobby drake', 'kurt wagner', 'max eisenhardt', 'kitty pryde']
aliases = ['prof x', 'iceman', 'nightcrawler', 'magneto', 'shadowcat']
powers = ['telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility']

# Create a list of tuples: mutant_data
mutant_data = list(zip(mutants, aliases, powers))

# Print the list of tuples
print(mutant_data)

# Create a zip object using the three lists: mutant_zip
mutant_zip = zip(mutants, aliases, powers)

# Print the zip object
print(mutant_zip)

# Unpack the zip object and print the tuple values
for value1, value2, value3 in zip(mutants, aliases, powers):
    print(value1, value2, value3)

[('charles xavier', 'prof x', 'telepathy'), ('bobby drake', 'iceman', 'thermokinesis'), ('kurt wagner', 'nightcrawler', 'teleportation'), ('max eisenhardt', 'magneto', 'magnetokinesis'), ('kitty pryde', 'shadowcat', 'intangibility')]
<zip object at 0x112904908>
charles xavier prof x telepathy
bobby drake iceman thermokinesis
kurt wagner nightcrawler teleportation
max eisenhardt magneto magnetokinesis
kitty pryde shadowcat intangibility
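
All the lists zipped above have the same length. When they do not, zip() stops as soon as the shortest iterable runs out. The sketch below uses deliberately shortened versions of the lists to show this.

```python
mutants = ['charles xavier', 'bobby drake', 'kurt wagner']
aliases = ['prof x', 'iceman']   # deliberately one item short

# zip() stops at the end of the shortest iterable
pairs = list(zip(mutants, aliases))

print(pairs)
```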


#### "Unzip()"
Let's play around with zip() a little more. There is no unzip function for doing the reverse of what zip() does. We can, however, reverse what has been zipped together by using zip() with a little help from *! * unpacks an iterable such as a list or a tuple into positional arguments in a function call.

In this exercise, you will use * in a call to zip() to unpack the tuples produced by zip().

'Unzip' the tuples in z1 by unpacking them into positional arguments using the * operator in a zip() call. Assign the results to result1 and result2, in that order.

The last print() statements print the result of comparing result1 to mutants and result2 to powers, to check whether the unpacked result1 and result2 are equivalent to mutants and powers, respectively.

In [3]:
# Create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# Print the tuples in z1 by unpacking with *
print(*z1)

# Re-create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)

# Check if unpacked tuples are equivalent to original tuples
print(result1 == mutants)
print(result2 == powers)

('charles xavier', 'telepathy') ('bobby drake', 'thermokinesis') ('kurt wagner', 'teleportation') ('max eisenhardt', 'magnetokinesis') ('kitty pryde', 'intangibility')
False
False


In [5]:
z1 = zip(mutants, powers)
result1, result2 = zip(*z1)

# Print the unpacked tuples
print(result1)
print(result2)

('charles xavier', 'bobby drake', 'kurt wagner', 'max eisenhardt', 'kitty pryde')
('telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility')
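
The two False results printed earlier come from a type mismatch, not from a failed round trip: zip() produces tuples, while mutants and powers are defined as lists in these notes, and a tuple never compares equal to a list. A short sketch with two-item versions of the lists shows that converting to a list first gives the expected answer.

```python
mutants = ['charles xavier', 'bobby drake']
powers = ['telepathy', 'thermokinesis']

# Zip, then 'unzip' again with *
result1, result2 = zip(*zip(mutants, powers))

tuple_vs_list = result1 == mutants          # tuple compared with list
list_vs_list_1 = list(result1) == mutants   # list compared with list
list_vs_list_2 = list(result2) == powers

print(tuple_vs_list, list_vs_list_1, list_vs_list_2)
```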


# Using Iterators to Load Large Files into Memory

Dealing with large data: load the data in chunks. Perform the operation on a chunk, store the result, discard the chunk, then load the next chunk of data. Iterators will be helpful!

- Use the pandas read_csv() function to read the file in chunks
- Specify the chunk size with the chunksize argument

Iterators to the rescue!

Things get really cool: you're going to use an iterator to load Twitter data in chunks and perform a computation similar to the one you did in the prequel to this course. Then you'll write a function that does the same.

In [11]:
# The object created by the read_csv() call with chunksize is an iterable,
# so we can iterate over it using a for loop
# in which each chunk will be a DataFrame.
# Compute the operation, in this case the sum of column x,
# and append it to the list results


In [4]:
import pandas as pd

results = []

for chunk in pd.read_csv('./data/tweets.csv', chunksize=1000):
    results.append(sum(chunk['x']))
    
total = sum(results)

print(total)

KeyError: 'x'

In [13]:
import pandas as pd

total = 0

for chunk in pd.read_csv('data.csv', chunksize=1000):
    total += sum(chunk['x'])
    
print(total)

FileNotFoundError: File b'data.csv' does not exist

In this exercise, you will do just that. You will process a large csv file of Twitter data in the same way that you processed 'tweets.csv' in Bringing it all together exercises of the prequel course, but this time, working on it in chunks of 10 entries at a time.

If you are interested in learning how to access Twitter data so you can work with it on your own system, refer to Part 2 of the DataCamp course on Importing Data in Python.

In [3]:
import pandas as pd

# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Iterate over the file chunk by chunk
for chunk in pd.read_csv('./data/tweets.csv', chunksize=10):

    # Iterate over the column in DataFrame
    for entry in chunk['lang']:
        if entry in counts_dict:
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)

{'en': 97, 'et': 1, 'und': 2}


#### Extracting information for large amounts of Twitter data

Great job chunking out that file in the previous exercise. You now know how to deal with situations where you need to process a very large file and that's a very useful skill to have!

It's good to know how to process a file in smaller, more manageable chunks, but it can become very tedious having to write and rewrite the same code for the same task each time. In this exercise, you will be making your code more reusable by putting your work in the last exercise in a function definition.

The pandas package has been imported as pd and the file 'tweets.csv' is in your current directory for your use.

In [4]:
# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict:
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('./data/tweets.csv', 10, 'lang')

# Print result_counts
print(result_counts)

{'en': 97, 'et': 1, 'und': 2}
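
To make the chunking idea concrete without needing pandas or the tweets file, here is a standard-library analogue. It is not what pandas does internally, only a sketch of the same pattern: pull a fixed number of rows, count, discard, repeat. The function name count_entries_stdlib, the in-memory CSV text, and its four rows are all made up for illustration.

```python
import csv
import io
from collections import Counter
from itertools import islice

# Hypothetical stand-in for tweets.csv: four rows with a 'lang' column
csv_text = "lang,text\nen,hello\nen,world\net,tere\nund,???\n"

def count_entries_stdlib(csv_file, c_size, colname):
    """Count the values in colname, reading c_size rows at a time."""
    counts = Counter()
    reader = csv.DictReader(csv_file)
    while True:
        # islice pulls at most c_size rows from the reader per chunk
        chunk = list(islice(reader, c_size))
        if not chunk:
            break
        counts.update(row[colname] for row in chunk)
    return dict(counts)

result_counts = count_entries_stdlib(io.StringIO(csv_text), 2, 'lang')
print(result_counts)
```

Because the reader is an iterator, each call to islice() picks up where the previous chunk ended, so only c_size rows are held in memory at a time.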


# List Comprehensions

Creating a new list of numbers:
- for loops are inefficient
- A list comprehension does this in one line of code

The more you use them, the more you get used to reading list comprehensions.

List Comprehension

The power of list comprehensions isn't limited to lists. You can write a list comprehension over any iterable, such as a range object.

List comprehensions collapse for loops for building lists into a single line. The required components are:
- an iterable,
- an iterator variable that represents the members of the iterable, and
- an output expression

You know that list comprehensions can be built over iterables. All of the objects below can be used in list comprehensions except for the integer valjean.

doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']

range(50)

underwood = 'After all, we are nothing more or less than what we choose to reveal.'

jean = '24601'

flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

valjean = 24601
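
A minimal sketch of why valjean is the odd one out: the string jean can be iterated over character by character, while the integer raises a TypeError as soon as the comprehension tries to iterate over it. The names digits and error_message are illustrative.

```python
jean = '24601'      # a string is iterable
valjean = 24601     # an integer is not

digits = [char for char in jean]

try:
    [item for item in valjean]
    error_message = None
except TypeError as err:
    # Raised because an int cannot produce an iterator
    error_message = str(err)

print(digits)
print(error_message)
```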

#### General Formula for List Comprehensions

[output expression for iterator variable in iterable]

In [28]:
nums = [12, 8, 21, 3, 16]

new_nums = [num + 1 for num in nums]

print(new_nums)

[13, 9, 22, 4, 17]
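
For reference, the sketch below writes the same computation as a for loop and as a list comprehension and checks that the two agree. The names new_nums_loop and new_nums_comp are illustrative.

```python
nums = [12, 8, 21, 3, 16]

# for loop version: start with an empty list and append to it
new_nums_loop = []
for num in nums:
    new_nums_loop.append(num + 1)

# list comprehension version: the same work in one line
new_nums_comp = [num + 1 for num in nums]

print(new_nums_loop == new_nums_comp)
```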


In [29]:
# List Comprehension with Range()

result = [num for num in range(11)]

print(result)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [30]:
# Nested loops using For Loops

pairs_1 = []

for num1 in range(0, 2):
    for num2 in range(6, 8):
        pairs_1.append([num1, num2])

        
print(pairs_1)

[[0, 6], [0, 7], [1, 6], [1, 7]]


In [31]:
# Using List Comprehension
# Tradeoff is readability


pairs_2 = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]

print(pairs_2)

[(0, 6), (0, 7), (1, 6), (1, 7)]


What would a list comprehension that produces a list of the first character of each string in doctor look like? Note that the list comprehension uses doc as the iterator variable. What will the output be?

In [34]:
doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']

[doc[0] for doc in doctor]

['h', 'c', 'c', 't', 'w']

In [37]:
squares = [i**2 for i in range(0, 10)]

print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


#### Nested List Comprehension - Matrix Example

One of the ways in which lists can be used is in representing multi-dimensional objects such as matrices. Matrices can be represented as a list of lists in Python. For example, a 5 x 5 matrix with values 0 to 4 in each row can be written as:

matrix = [[0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4]]

Your task is to recreate this matrix by using a nested list comprehension. Recall that you can create one of the rows of the matrix with a single list comprehension. To create the list of lists, you simply have to supply the list comprehension as the output expression of the overall list comprehension:

1. In the inner list comprehension - that is, the output expression of the nested list comprehension - create a list of values from 0 to 4 using range(). Use col as the iterator variable.

2. In the iterable part of your nested list comprehension, use range() to count 5 rows - that is, create a list of values from 0 to 4. Use row as the iterator variable; note that you won't be needing this to create values in the list of lists.


### [[output expression] for iterator variable in iterable]

In [38]:
[col for col in range(0,5)]

[0, 1, 2, 3, 4]

In [39]:
# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(0,5)] for row in range(0,5)]

# Print the matrix
for row in matrix:
    print(row)


[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]


## Conditions in Comprehensions

#### Conditions on the Iterable

An interesting mechanism in list comprehensions is that you can also create lists with values that meet only a certain condition. One way of doing this is by using conditionals on iterator variables. 

Apply a conditional statement to test the iterator variable by adding an if clause in the optional predicate expression part after the for clause in the comprehension:

[ output expression for iterator variable in iterable if predicate expression ]

In [5]:
# Filter output of a list using a condition

[num ** 2 for num in range(10) if num % 2 == 0]

[0, 4, 16, 36, 64]

In [24]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member for member in fellowship if len(member)>=7]

# Print the new list
print(new_fellowship)


['samwise', 'aragorn', 'legolas', 'boromir']


#### Conditions on the Output Expression

Here, you will use an if-else conditional expression in the output expression of the list comprehension.



In [6]:
# Using If/Else clause on the output

[num ** 2 if num % 2 == 0 else 0 for num in range(10)]

[0, 0, 4, 0, 16, 0, 36, 0, 64, 0]

Example: using an if-else conditional expression in the output expression, create a list that keeps the members of fellowship with 7 or more characters and replaces the others with an empty string. Use member as the iterator variable in the list comprehension.

In [26]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member if len(member)>=7 else '' for member in fellowship]

# Print the new list
print(new_fellowship)

['', 'samwise', '', 'aragorn', 'legolas', 'boromir', '']


#### Create Dictionary Comprehensions

Comprehensions aren't relegated merely to the world of lists. There are many other objects you can build using comprehensions, such as dictionaries, pervasive objects in Data Science. You will create a dictionary using the comprehension syntax

- Uses Curly Braces {}
- key and value separated by ':'

Recall that the main difference between a list comprehension and a dict comprehension is the use of curly braces {} instead of []. Additionally, members of the dictionary are created using a colon :, as in <key> : <value>.



In [10]:
# create dictionary of positive:negative integers

pos_neg = {num: -num for num in range(9)}

print(pos_neg)

{0: 0, 1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: -6, 7: -7, 8: -8}


Create a dict comprehension where the key is a string in fellowship and the value is the length of the string. Remember to use the syntax <key> : <value> in the output expression part of the comprehension to create the members of the dictionary. Use member as the iterator variable.

In [28]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create dict comprehension: new_fellowship
new_fellowship = {member: len(member) for member in fellowship}

# Print the new dictionary
print(new_fellowship)

{'frodo': 5, 'samwise': 7, 'merry': 5, 'aragorn': 7, 'legolas': 7, 'boromir': 7, 'gimli': 5}
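
The filtering condition from list comprehensions carries over to dict comprehensions unchanged. As a small sketch, the version below keeps only the members with 7 or more characters; the name long_names is illustrative.

```python
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Same key: value output expression, plus a condition on the iterable
long_names = {member: len(member) for member in fellowship if len(member) >= 7}

print(long_names)
```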


# Generators

- A list comprehension returns a list and is written with []
- A generator expression returns a generator object and is written with ()
- Both can be iterated over

A generator is like a list comprehension, except that:
- it does not store the list in memory
- it does not construct the list
- it is an object we can iterate over to produce the elements of the list as required
- it uses () instead of []



In [40]:
# List Comprehension
[num * 2 for num in range(10)]

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [41]:
# Generator expression
(num * 2 for num in range(10))

<generator object <genexpr> at 0x10617be08>

In [42]:
# looping over a generator expression produces the elements of the analogous list

result = (num for num in range(10))

for num in result:
    print(num)

0
1
2
3
4
5
6
7
8
9


#### Passing a generator to the function **list()**

In [43]:
result = (num for num in range(6))

print(list(result))

[0, 1, 2, 3, 4, 5]


#### Passing a generator to the function **next()** in order to iterate through its elements

This is a form of lazy evaluation, whereby the evaluation of the expression is delayed until its value is needed.

This can help with large sequences, as you don't want to store the entire list in memory, which is what a list comprehension would do. You want to generate elements of the sequence on the fly.

In [44]:
result = (num for num in range(6))

print(next(result))

0


In [45]:
print(next(result))

1


In [46]:
print(next(result))

2
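
To put a rough number on the memory difference, the sketch below compares a list comprehension with the equivalent generator expression using sys.getsizeof(). This is an illustration under CPython and measures only the container objects themselves, but it shows the generator staying small however long the sequence is.

```python
import sys

squares_list = [num ** 2 for num in range(100000)]   # all values stored
squares_gen = (num ** 2 for num in range(100000))    # values made on demand

list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)

print(list_bytes > gen_bytes)

# The generator still produces the same values, one at a time
first_three = [next(squares_gen) for _ in range(3)]
print(first_three)
```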


Suppose you wanted to iterate over a very large sequence of numbers, such as from 0 to 10 raised to the power of a million, or at least wanted to do so until another condition was satisfied.

In [47]:
#  This would break the server
# [num for num in range(10**10000000)]

In [49]:
# this won't break the computer since generators don't store lists to memory
# But it does take a while to generate
# (num for num in range(10**10000000))

<generator object <genexpr> at 0x1061bb3b8>
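
Here is a minimal sketch of the "until another condition is satisfied" case. It uses a googol-sized range rather than the even larger one above, and the stopping condition (the first number whose square exceeds 50) is made up for illustration. Only the values actually visited are ever produced.

```python
huge = (num for num in range(10 ** 100))

for num in huge:
    # Stop as soon as the condition is met; the rest is never generated
    if num ** 2 > 50:
        break

print(num)
```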

You can apply filters and conditionals to generator expressions, just as in list comprehensions.

In [50]:
even_nums = (num for num in range(10) if num % 2 == 0)
print(list(even_nums))

[0, 2, 4, 6, 8]


## Generator Functions
Generator functions are a powerful and customizable way to create generators.


Functions that produce generator objects:
- Produce a generator object when called
- Are defined like a regular function, with def
- Yield a sequence of values instead of returning a single value
- Generate each value with the yield keyword instead of return


Generator functions are functions that, like generator expressions, yield a series of values, instead of returning a single value. A generator function is defined just like a regular function, but whenever it generates a value, it uses the keyword yield instead of return.

In [59]:
# the while loop runs as long as i < n;
# once i == n, the generator ceases to yield values

def num_sequences(n):
    '''Generate values from 0 up to, but not including, n'''
    i = 0
    while i < n:
        yield i
        i += 1

In [53]:
test = num_sequences(3)

In [54]:
print(next(test))

0


In [55]:
print(next(test))

1


In [56]:
print(next(test))

2


In [60]:
print(next(test))

StopIteration: 

In [64]:
result = num_sequences(5)

print(type(result))
for item in result:
    print(item)

<class 'generator'>
0
1
2
3
4
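
A generator object can be traversed only once, which is why next(test) eventually raised StopIteration above. The sketch below repeats the num_sequences() definition so that it runs on its own, and shows that a spent generator yields nothing, while calling the function again gives a fresh one.

```python
def num_sequences(n):
    '''Generate values from 0 up to, but not including, n'''
    i = 0
    while i < n:
        yield i
        i += 1

gen = num_sequences(3)

first_pass = list(gen)    # consumes the generator
second_pass = list(gen)   # already exhausted, so nothing comes out

# Calling the function again creates a new generator object
fresh_pass = list(num_sequences(3))

print(first_pass, second_pass, fresh_pass)
```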


In [14]:
# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""

    # Yield the length of a string
    for person in input_list:
        yield len(person)

# Print the values generated by get_lengths()
for value in get_lengths(lannister):
    print(value)


6
5
5
6
7


#### Exercise for Generator Expressions



In this exercise, you will recall the difference between list comprehensions and generators. To help with that task, the code in the cell below has been pre-loaded in the environment.

Answer: a list comprehension produces a list as output, while a generator expression produces a generator object.


In [1]:

# List of strings
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# List comprehension
fellow1 = [member for member in fellowship if len(member) >= 7]

# Generator expression
fellow2 = (member for member in fellowship if len(member) >= 7)


- Create a generator object that will produce values from 0 to 30. Assign the result to result and use num as the iterator variable in the generator expression.
- Print the first 5 values by using next() appropriately in print().
- Print the rest of the values by using a for loop to iterate over the generator object.

In [10]:
result = (num for num in range(31))

# Print the first 5 values
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))

# Print the rest of the values
for value in result:
    print(value)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30


In [13]:
# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Create a generator object: lengths
lengths = (len(person) for person in lannister)

# Iterate over and print the values in lengths
for value in lengths:
    print(value)


6
5
5
6
7


## Wrap up List Comprehensions

Use comprehensions to work with real-world tweets

Structure and syntax:

###### Basic:

**[output expression for iterator variable in iterable]**

###### Advanced:

**[output expression + conditional on output for iterator variable in iterable + conditional on iterable]**
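
As a small sketch of the advanced form, the comprehension below combines both kinds of condition on the numbers 0 to 9. The filter num > 3 is made up for illustration: it decides which numbers are considered at all, and the if-else in the output expression then decides what each surviving number contributes.

```python
# Condition on the iterable: keep only num > 3
# Condition on the output: square even numbers, replace odd ones with 0
result = [num ** 2 if num % 2 == 0 else 0 for num in range(10) if num > 3]

print(result)
```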



In this exercise, you will be using a list comprehension to extract the time from time-stamped Twitter data. The pandas package has been imported as pd and the file 'tweets.csv' has been imported as the df DataFrame for your use.


In [33]:

# import pandas and read tweets.csv file
import pandas as pd

df = pd.read_csv('./data/tweets.csv')

In [34]:
# Extract the created_at column from df: tweet_time
tweet_time = df.loc[:, 'created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time]

# Print the extracted times
print(tweet_clock_time)

['23:40:17', '23:40:17', '23:40:17', '23:40:17', '23:40:17', '23:40:17', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:17', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:18', '23:40:19', '23:40:18', '23:40:18', '23:40:18', '23:40:19', '23:40:19', '23:40:19', '23:40:18', '23:40:19', '23:40:19', '23:40:19', '23:40:18', '23:40:19', '23:40:19', '23:40:19', '23:40:18', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23

In [26]:
tweet_time.head()

0    Tue Mar 29 23:40:17 +0000 2016
1    Tue Mar 29 23:40:17 +0000 2016
2    Tue Mar 29 23:40:17 +0000 2016
3    Tue Mar 29 23:40:17 +0000 2016
4    Tue Mar 29 23:40:17 +0000 2016
Name: created_at, dtype: object
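
The slice indices used in these cells can be checked on a single timestamp. The sketch below takes the first value shown by tweet_time.head() as a plain string, so it runs without the tweets file.

```python
entry = 'Tue Mar 29 23:40:17 +0000 2016'

clock_time = entry[11:19]   # characters 11 to 18: hours, minutes, seconds
seconds = entry[17:19]      # characters 17 and 18: the seconds only

print(clock_time)
print(seconds)
```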

In [36]:
[entry[11:19] for entry in tweet_time]

['23:40:17',
 '23:40:17',
 '23:40:17',
 '23:40:17',
 '23:40:17',
 '23:40:17',
 '23:40:18',
 '23:40:17',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:17',
 '23:40:18',
 '23:40:18',
 '23:40:17',
 '23:40:18',
 '23:40:18',
 '23:40:17',
 '23:40:18',
 '23:40:17',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:17',
 '23:40:18',
 '23:40:18',
 '23:40:17',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:19',
 '23:40:18',
 '23:40:18',
 '23:40:18',
 '23:40:19',
 '23:40:19',
 '23:40:19',
 '23:40:18',
 '23:40:19',
 '23:40:19',
 '23:40:19',
 '23:40:18',
 '23:40:19',
 '23:40:19',
 '23:40:19',
 '23:40:18',
 '23:40:19',

In this exercise, you will be using a list comprehension to extract the time from time-stamped Twitter data. You will add a conditional expression to the list comprehension so that you only select the times in which entry[17:19] is equal to '19'. The pandas package has been imported as pd and the file 'tweets.csv' has been imported as the df DataFrame for your use.

In [37]:
# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == '19']

# Print the extracted times
print(tweet_clock_time)

['23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19', '23:40:19']
