# Iterators vs. iterables: 

$\underline{Iterables}$

Examples: lists, strings, dictionaries, file connections

Def: An object that **iter()** can be applied to, typically because it defines an `__iter__()` method.

*Applying **iter()** to an iterable creates an iterator*

$\underline{Iterators}$

Produces next value with **next()**

We create an iterator by calling the function **iter()** on an iterable. From there, each call to **next()** returns the "next" value in the iterator, until we reach the end, at which point it raises a StopIteration exception.

We can also unpack all of an iterator's remaining elements at once with the `*` operator. See below.







In [2]:
# Playing with a quick example: a string is an iterable.

word = "DATA"

it = iter(word)

print(next(it))

print(*it)

D
A T A
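A minimal sketch of the protocol described above, using a small illustrative list: `iter()` turns an iterable into an iterator, `next()` pulls values until `StopIteration` is raised, `next()` can take a default instead of raising, and calling `iter()` on an iterator hands back the very same object.

```python
letters = ['a', 'b']          # a list is an iterable
it = iter(letters)            # iter() returns an iterator over it

print(next(it))               # a
print(next(it))               # b

# The iterator is now exhausted, so next() raises StopIteration
try:
    next(it)
except StopIteration:
    print("StopIteration raised")

# next() also accepts a default that is returned instead of raising
print(next(it, "done"))       # done

# An iterator is its own iterator: iter() returns the same object
print(iter(it) is it)         # True

# A list is not an iterator, so each iter() call creates a fresh one
print(iter(letters) is iter(letters))   # False
```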


In [3]:
#More fun: 

# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Print each list item in flash using a for loop
for person in flash: 
    print(person)


# Create an iterator for flash: superspeed
superspeed = iter(flash)



# Print each item from the iterator
print(next(superspeed))
print(next(superspeed))
print(next(superspeed))
print(next(superspeed))

jay garrick
barry allen
wally west
bart allen
jay garrick
barry allen
wally west
bart allen


# Range Objects: 

Recall that range() doesn't actually create the list; instead, it creates a range object, a lazy iterable whose iterator produces the values one at a time, up to (but not including) the stop value. If range() created the actual list, calling it with a value of $10^{100}$ would fail, since a list that big would far exceed a regular computer's memory. The value $10^{100}$ is what's called a googol: a 1 followed by a hundred 0s. That's a huge number!

Your task for this exercise is to show that calling range() with $10^{100}$ won't actually pre-create the list.

In [4]:
# Create an iterator for range(3): small_value
small_value = iter(range(3))

# Print the values in small_value
print(next(small_value))
print(next(small_value))
print(next(small_value))

# Loop over range(3) and print the values
for num in range(3):
    print(num)

# Create an iterator for range(10 ** 100): googol
googol = iter(range(10 ** 100))

# Print the first 5 values from googol
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))


0
1
2
0
1
2
0
1
2
3
4
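A small sketch of why the googol range above is cheap, assuming CPython: the range object only stores its start, stop and step, so values, indexing and membership are all computed on demand. `len()` is an exception, because CPython needs the length to fit in a C-level integer.

```python
import sys
from itertools import islice

googol = range(10 ** 100)

# The range object stays small; the exact byte count is implementation-specific
print(sys.getsizeof(googol))

# islice() lazily takes just the first five values
print(list(islice(googol, 5)))       # [0, 1, 2, 3, 4]

# Indexing and membership are computed arithmetically, not by scanning a list
print(googol[-1] == 10 ** 100 - 1)   # True
print(12345 in googol)               # True

# len(), however, overflows because the length is too big for a C integer
try:
    len(googol)
except OverflowError:
    print("len() overflows, but the range object itself is fine")
```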


In [None]:
# Create a range object: values
values = range(10,21)

# Print the range object
print(values)

# Create a list of integers: values_list
values_list = list(values)

# Print values_list
print(values_list)

# Get the sum of values: values_sum
values_sum = sum(values)

# Print values_sum
print(values_sum)
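The cell above calls `list(values)` and then `sum(values)` on the same range object, and both see every number. That works because a range is re-iterable: each loop asks it for a fresh iterator. An iterator obtained with `iter()` is one-shot, as this short sketch shows (`values_iter` is an illustrative name).

```python
values = range(10, 21)

# A range can be consumed repeatedly: every pass gets a new iterator
print(sum(values))        # 165
print(sum(values))        # 165

# An iterator over the same range is used up after one pass
values_iter = iter(values)
print(sum(values_iter))   # 165
print(sum(values_iter))   # 0, nothing left to consume
```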

# Playing with iterators: 

$\underline{enumerate()}$

**enumerate()** is a function that takes any iterable as input and returns an enumerate object, which produces (index, element) pairs for the elements of the iterable.

You can turn an enumerate object into a list, but keep in mind that it is itself an iterator: you can loop over it directly, but only once.

The default starting index of enumerate(), by the way, is 0. However, you can change it with the keyword argument (kwarg) `start`.

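$\underline{zip()}$

**zip()** takes any number of iterables and creates an iterator of tuples, where the i-th tuple holds the i-th element of each iterable. It stops at the shortest iterable, and the result can be turned into a list with **list()**.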

Let's practice a bit below 

In [7]:
# Create a list of strings: mutants
mutants = ['charles xavier', 
            'bobby drake', 
            'kurt wagner', 
            'max eisenhardt', 
            'kitty pryde']


# Create a list of tuples: mutant_list
mutant_list = list(enumerate(mutants))

# Print the list of tuples
print(mutant_list)

print("---------")

# Unpack and print the tuple pairs
for index1, value1 in enumerate(mutants):
    print(index1, value1)

print("---------")
# Change the start index
for index2, value2 in enumerate(mutants, start=1):
    print(index2, value2)

[(0, 'charles xavier'), (1, 'bobby drake'), (2, 'kurt wagner'), (3, 'max eisenhardt'), (4, 'kitty pryde')]
---------
0 charles xavier
1 bobby drake
2 kurt wagner
3 max eisenhardt
4 kitty pryde
---------
1 charles xavier
2 bobby drake
3 kurt wagner
4 max eisenhardt
5 kitty pryde
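Because an enumerate object is an iterator, it behaves like the `superspeed` iterator earlier: values are consumed as you pull them. A short sketch with a shortened copy of `mutants`:

```python
mutants = ['charles xavier', 'bobby drake', 'kurt wagner']

pairs = enumerate(mutants, start=1)
# enumerate() returns an iterator, so iter() gives back the same object
print(iter(pairs) is pairs)   # True

print(next(pairs))            # (1, 'charles xavier')
# list() consumes whatever is left
print(list(pairs))            # [(2, 'bobby drake'), (3, 'kurt wagner')]
print(list(pairs))            # [] because the iterator is exhausted
```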


In [None]:
# #You can zip together as many iterables as you want, by the way. Check this out, even though
# #the lists are only 

# # Create a list of tuples: mutant_data
# mutant_data = list(zip(mutants,aliases,powers))

# # Print the list of tuples
# print(mutant_data)

# # Create a zip object using the three lists: mutant_zip
# mutant_zip = zip(mutants,aliases,powers)

# # Print the zip object
# print(mutant_zip)

# # Unpack the zip object and print the tuple values
# for value1,value2,value3 in mutant_zip:
#     print(value1, value2, value3)
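The cell above is commented out because `aliases` and `powers` were pre-loaded in the original exercise environment. Here is a self-contained sketch with hypothetical stand-in lists (the values are illustrative only, and `powers` is deliberately one item short). It shows that `zip()` accepts several iterables, returns a one-shot iterator of tuples, and stops at the shortest input; `itertools.zip_longest` pads instead.

```python
from itertools import zip_longest

# Hypothetical stand-in data for the pre-loaded lists
mutants = ['charles xavier', 'bobby drake', 'kurt wagner']
aliases = ['prof x', 'iceman', 'nightcrawler']
powers = ['telepathy', 'thermokinesis']      # one item shorter on purpose

mutant_zip = zip(mutants, aliases, powers)
print(mutant_zip)                  # a zip object, not the tuples themselves

# zip() stops as soon as the shortest iterable runs out
mutant_data = list(mutant_zip)
print(mutant_data)
# [('charles xavier', 'prof x', 'telepathy'), ('bobby drake', 'iceman', 'thermokinesis')]

# The zip object is an iterator, so it is now empty
print(list(mutant_zip))            # []

# zip_longest() pads the shorter inputs with None by default
print(list(zip_longest(mutants, aliases, powers))[-1])
# ('kurt wagner', 'nightcrawler', None)
```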


# Unzipping?

Let's play around with zip() a little more. There is no unzip function for doing the reverse of what zip() does. We can, however, reverse what has been zipped together by using zip() with a little help from `*`! The `*` operator unpacks an iterable such as a list or a tuple into positional arguments in a function call.

In this exercise, you will use `*` in a call to zip() to unpack the tuples produced by zip().

Two tuples of strings, `mutants` and `powers`, have been pre-loaded.

In [None]:
# # Create a zip object from mutants and powers: z1
# z1 = zip(mutants, powers)

# # Print the tuples in z1 by unpacking with *
# print(*z1)

# # Re-create a zip object from mutants and powers: z1
# z1 = zip(mutants, powers)

# # 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
# result1, result2 = zip(*z1)

# # Check if unpacked tuples are equivalent to original tuples
# print(result1 == mutants)
# print(result2 == powers)
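A runnable version of the unzipping idea, using hypothetical stand-in tuples for the pre-loaded `mutants` and `powers`. One detail worth noting: `zip()` always produces tuples, so the round trip only compares equal when the originals are tuples too.

```python
# Hypothetical stand-in tuples for the pre-loaded data
mutants = ('charles xavier', 'bobby drake', 'kurt wagner')
powers = ('telepathy', 'thermokinesis', 'teleportation')

z1 = zip(mutants, powers)
# Each pair becomes a separate positional argument to print()
print(*z1)
# ('charles xavier', 'telepathy') ('bobby drake', 'thermokinesis') ('kurt wagner', 'teleportation')

# zip(*pairs) regroups all first elements together and all second elements together
z1 = zip(mutants, powers)
result1, result2 = zip(*z1)
print(result1 == mutants)         # True
print(result2 == powers)          # True

# The results are tuples, so comparing against a list gives False
print(result1 == list(mutants))   # False
```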


# Using iterators to load large files into memory: 

Sometimes, the data we have to process is too large for a computer's memory to handle. This is a common problem faced by data scientists. One solution is to process the data source chunk by chunk, instead of all at once.

In this exercise, you will do just that. You will process a large csv file of Twitter data in the same way that you processed 'tweets.csv' in the 'Bringing it all together' exercises of the prequel course, but this time, working on it in chunks of 10 entries at a time.

If you are interested in learning how to access Twitter data so you can work with it on your own system, refer to Part 2 of the DataCamp course on Importing Data in Python.

The pandas package has been imported as pd and the file 'tweets.csv' is in your current directory for your use. Go for it!

In [None]:
# # Initialize an empty dictionary: counts_dict
# counts_dict = {}

# # Iterate over the file chunk by chunk
# for chunk in pd.read_csv('tweets.csv',chunksize=10):

#     # Iterate over the column in DataFrame
#     for entry in chunk['lang']:
#         if entry in counts_dict.keys():
#             counts_dict[entry] += 1
#         else:
#             counts_dict[entry] = 1

# # Print the populated dictionary
# print(counts_dict)


In [None]:
# #Do the same as above, but with a function definition: 

# # Define count_entries()
# def count_entries(csv_file, c_size, colname):
#     """Return a dictionary with counts of
#     occurrences as value for each key."""
    
#     # Initialize an empty dictionary: counts_dict
#     counts_dict = {}

#     # Iterate over the file chunk by chunk
#     for chunk in pd.read_csv(csv_file, chunksize=c_size):

#         # Iterate over the column in DataFrame
#         for entry in chunk[colname]:
#             if entry in counts_dict.keys():
#                 counts_dict[entry] += 1
#             else:
#                 counts_dict[entry] = 1

#     # Return counts_dict
#     return counts_dict

# # Call count_entries(): result_counts
# result_counts = count_entries('tweets.csv', 10, 'lang')

# # Print result_counts
# print(result_counts)


#Out: {'und':2,'en':97,'et':1}
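The pandas cells above are commented out because they need `tweets.csv`. The same chunk-by-chunk idea can be sketched with the standard library alone, assuming a small in-memory CSV (the rows below are made up) and a hypothetical helper `read_in_chunks()`. `itertools.islice` pulls at most `chunk_size` rows from the reader at a time, so only one chunk is held in memory, and `collections.Counter` replaces the manual if/else counting.

```python
import csv
import io
from collections import Counter
from itertools import islice

# Made-up sample data standing in for tweets.csv
sample_csv = io.StringIO(
    "id,lang\n"
    "1,en\n2,en\n3,und\n4,et\n5,en\n6,und\n7,en\n"
)

def read_in_chunks(file_obj, chunk_size):
    """Hypothetical helper: yield lists of at most chunk_size rows (as dicts)."""
    reader = csv.DictReader(file_obj)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:          # the reader is exhausted
            return
        yield chunk

def count_entries(file_obj, chunk_size, colname):
    """Count occurrences of each value in colname, one chunk at a time."""
    counts = Counter()
    for chunk in read_in_chunks(file_obj, chunk_size):
        counts.update(row[colname] for row in chunk)
    return dict(counts)

print(count_entries(sample_csv, 3, 'lang'))   # {'en': 4, 'und': 2, 'et': 1}
```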

# List comprehensions: 

List comprehensions can replace for loops that build up a list, collapsing them into a single line.

It is important to know that you can write a list comprehension over any iterable! The components are as follows:

1) Iterable

2) Iterator variable (represents the members of the iterable)

3) Output expression

See below: 



In [9]:
nums  = [12,8,21,3,16]
new_nums = [num+1 for num in nums]
print(new_nums)

[13, 9, 22, 4, 17]
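To connect the three components to the cell above, here is `new_nums` written as an explicit loop, followed by comprehensions over a few other kinds of iterable (the `scores` dictionary is illustrative).

```python
nums = [12, 8, 21, 3, 16]

# Same result as new_nums: iterable = nums, iterator variable = num,
# output expression = num + 1
new_nums_loop = []
for num in nums:
    new_nums_loop.append(num + 1)
print(new_nums_loop)                        # [13, 9, 22, 4, 17]

# Any iterable works after "in"
print([char.upper() for char in 'data'])    # ['D', 'A', 'T', 'A']
print([n * 10 for n in range(3)])           # [0, 10, 20]

scores = {'a': 1, 'b': 2}                   # illustrative dictionary
print([key for key in scores])              # ['a', 'b'] (iterating a dict yields its keys)
print([key * value for key, value in scores.items()])  # ['a', 'bb']
```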


# You can also use list comprehensions in place of nested for loops:

Check it out below. The main drawback, especially when starting off, is readability.



In [12]:
#Nested for loop way of doing things: 

pairs_1 = []
for num1 in range(0,2):
    for num2 in range(6,8):
        pairs_1.append((num1,num2))
        
#OR, with a list comprehension!

pairs_2 = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]
            
print(pairs_1)
            
print(pairs_2)

[(0, 6), (0, 7), (1, 6), (1, 7)]
[(0, 6), (0, 7), (1, 6), (1, 7)]
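One way to make nested comprehensions easier to read: the `for` clauses run left to right exactly like the nested loops they replace, outermost first. This sketch shows that swapping the clauses changes the order of the result, and that a later clause can use an earlier variable, just as an inner loop can.

```python
# The for clauses read like nested loops, outermost first
outer_first = [(a, b) for a in range(2) for b in 'xy']
print(outer_first)   # [(0, 'x'), (0, 'y'), (1, 'x'), (1, 'y')]

# Swapping the clauses swaps which loop is the outer one
inner_first = [(a, b) for b in 'xy' for a in range(2)]
print(inner_first)   # [(0, 'x'), (1, 'x'), (0, 'y'), (1, 'y')]

# A later clause may depend on the variable of an earlier one
triangle = [(a, b) for a in range(3) for b in range(a)]
print(triangle)      # [(1, 0), (2, 0), (2, 1)]
```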


In [14]:
test = [col for col in range(5)]
print(test)

[0, 1, 2, 3, 4]


Let's step aside from strings for a while. One of the ways lists can be used is to represent multi-dimensional objects such as matrices. Matrices can be represented as a list of lists in Python. For example, a 5 x 5 matrix with values 0 to 4 in each row can be written as:

matrix = [[0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4]]

Your task is to recreate this matrix by using nested list comprehensions. Recall that you can create one of the rows of the matrix with a single list comprehension. To create the list of lists, you simply have to supply the list comprehension as the output expression of the overall list comprehension:

[[output expression] for iterator variable in iterable]

Note that here, the output expression is itself a list comprehension.

In [17]:

matrix = [[col for col in range(5)] for row in range(5)]
#TRANSLATION: for each of my five rows, spit out the list [0,1,2,3,4]

# Print the matrix
for row in matrix:
    print(row)

[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
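In the matrix above the outer variable `row` is never used, so every row is identical. A short sketch of variations: using `row` inside the inner comprehension, contrasting a nested comprehension (list of lists) with two `for` clauses (one flat list), and transposing. The name `numbered` is illustrative.

```python
matrix = [[col for col in range(5)] for row in range(5)]

# The outer variable can be used inside the inner comprehension
numbered = [[row * 5 + col for col in range(5)] for row in range(5)]
print(numbered[1])      # [5, 6, 7, 8, 9]

# Two for clauses in ONE comprehension flatten instead of nesting
flat = [value for row in matrix for value in row]
print(len(flat))        # 25
print(flat[:6])         # [0, 1, 2, 3, 4, 0]

# Transpose: row i of the result is column i of numbered
transposed = [[r[i] for r in numbered] for i in range(5)]
print(transposed[0])    # [0, 5, 10, 15, 20]
```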


# Advanced Comprehensions: 

We can apply conditionals to the iterator variable or to the output expression, and we can also write dictionary comprehensions. Let's check these out below:

In [18]:
#conditional 1: on the iterator variable (filters which items are kept)

[num**2 for num in range(10) if num % 2 ==0]

[0, 4, 16, 36, 64]

In [21]:
#conditional 2: on the output expression

[num**2 if num %2 ==0 else 0 for num in range(10)]



[0, 0, 4, 0, 16, 0, 36, 0, 64, 0]
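The two positions can be combined in one comprehension: a trailing `if` filters, and an `if ... else` before `for` chooses the output for each item that survives. The filter position cannot take an `else`; this sketch checks that by compiling the invalid form as a string.

```python
nums = range(10)

# Filter: only odd numbers get through.
# Transform: keep n if it is greater than 4, otherwise negate it.
result = [n if n > 4 else -n for n in nums if n % 2 == 1]
print(result)   # [-1, -3, 5, 7, 9]

# An "else" in the filter position is a syntax error
try:
    compile("[n for n in nums if n > 4 else 0]", "<example>", "eval")
except SyntaxError:
    print("SyntaxError: the filter clause cannot have an else")
```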

In [22]:
# A dictionary comprehension: 

#The differences here are as follows: 
#1) We use curly braces {} instead of brackets [] 
#2) We separate keys and values with a colon. Eg: 

pos_neg = {num:-num for num in range(9)}
print(pos_neg)

{0: 0, 1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: -6, 7: -7, 8: -8}
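A few more dictionary-comprehension patterns, sketched with illustrative data (`ranks` is made up): building keys and values from two zipped lists, filtering with a trailing `if`, and what happens when two items produce the same key (the last value written wins).

```python
names = ['frodo', 'samwise', 'merry']
ranks = [1, 2, 3]   # illustrative values

# Keys and values can come from two iterables zipped together
name_to_rank = {name: rank for name, rank in zip(names, ranks)}
print(name_to_rank)    # {'frodo': 1, 'samwise': 2, 'merry': 3}

# A trailing if filters entries, just as in a list comprehension
short_names = {name: len(name) for name in names if len(name) <= 5}
print(short_names)     # {'frodo': 5, 'merry': 5}

# Duplicate keys: later values overwrite earlier ones
print({n % 3: n for n in range(6)})   # {0: 3, 1: 4, 2: 5}
```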


You've been using list comprehensions to build lists of values, sometimes using operations to create these values.

An interesting mechanism in list comprehensions is that you can also create lists with values that meet only a certain condition. One way of doing this is by using conditionals on iterator variables. In this exercise, you will do exactly that!

Recall from the video that you can apply a conditional statement to test the iterator variable by adding an if statement in the optional predicate expression part after the for statement in the comprehension:

[ output expression for iterator variable in iterable if predicate expression ].

You will use this recipe to write a list comprehension for this exercise. You are given a list of strings fellowship and, using a list comprehension, you will create a list that only includes the members of fellowship that have 7 characters or more.

In [23]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member for member in fellowship if len(member)>=7]

# Print the new list
print(new_fellowship)

['samwise', 'aragorn', 'legolas', 'boromir']


In the previous exercise, you used an if conditional statement in the predicate expression part of a list comprehension to evaluate an iterator variable. In this exercise, you will use an if-else statement on the output expression of the list.

You will work on the same list, fellowship and, using a list comprehension and an if-else conditional statement in the output expression, create a list that keeps members of fellowship with 7 or more characters and replaces others with an empty string. Use member as the iterator variable in the list comprehension.

In [24]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member if len(member)>=7 else '' for member in fellowship]

# Print the new list
print(new_fellowship)

['', 'samwise', '', 'aragorn', 'legolas', 'boromir', '']


Dict comprehensions

Comprehensions aren't relegated merely to the world of lists. There are many other objects you can build using comprehensions, such as dictionaries, pervasive objects in Data Science. You will create a dictionary using the comprehension syntax for this exercise. In this case, the comprehension is called a dict comprehension.

Recall that the main difference between a list comprehension and a dict comprehension is the use of curly braces {} instead of []. Additionally, members of the dictionary are created using a colon :, as in <key> : <value>.

You are given a list of strings fellowship and, using a dict comprehension, create a dictionary with the members of the list as the keys and the length of each string as the corresponding values.

In [25]:
# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create dict comprehension: new_fellowship
new_fellowship = {member:len(member) for member in fellowship}

# Print the new dictionary
print(new_fellowship)


{'frodo': 5, 'samwise': 7, 'merry': 5, 'aragorn': 7, 'legolas': 7, 'boromir': 7, 'gimli': 5}


# Introduction to Generator Expressions:

If you simply replace the square brackets in your list comprehension with round parentheses, you create something called a generator object. 

What the hell is this doing? A generator is like a list comprehension, but it does not store the list in memory. Instead of constructing the whole list up front, it produces each value lazily, one at a time, as we loop over it.

Let's say that we wanted to loop over an absolutely enormous sequence. You may not be able to actually create and store it as a list, but with a generator you can still iterate over it, since each value is only produced when it is requested.

**Generator functions**

These produce generator objects when called and are defined like a regular function. **BUT**, they yield a sequence of values instead of returning a single value. And they generate each successive value with the keyword **yield**. 

Let's check out an example of a generator function below: 



In [30]:
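def num_sequence(n):
    """Generate values from 0 to n - 1"""
    i = 0
    while i < n:
        yield i
        i += 1
result = num_sequence(5)
print(type(result))

print("----------")

# The list comprehension is used here only for its side effect (printing).
# print() returns None, which is why the cell also displays [None, None, None, None, None].
# A plain `for item in result: print(item)` loop is the more idiomatic choice.
[print(item) for item in result]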



<class 'generator'>
----------
0
1
2
3
4


[None, None, None, None, None]
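After the cell above, `result` has been fully consumed. A generator, like any iterator, runs only once; to iterate again you call the generator function again. This sketch re-defines `num_sequence` so it runs on its own, and also shows that the body pauses at each `yield` until the next value is requested.

```python
def num_sequence(n):
    """Generate values from 0 to n - 1."""
    i = 0
    while i < n:
        yield i
        i += 1

result = num_sequence(3)
print(list(result))           # [0, 1, 2]
print(list(result))           # [] because the generator is exhausted
print(next(result, 'done'))   # done

# Calling the function again creates a brand-new generator
print(list(num_sequence(3)))  # [0, 1, 2]

# Execution pauses at each yield and resumes on the next next() call
gen = num_sequence(2)
print(next(gen))              # 0
print(next(gen))              # 1
```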

In [32]:
# List of strings
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# List comprehension
fellow1 = [member for member in fellowship if len(member) >= 7]

# Generator expression
fellow2 = (member for member in fellowship if len(member) >= 7)

print(type(fellow1))
print(type(fellow2))

<class 'list'>
<class 'generator'>


In [36]:
# Create generator object: result
result = (num for num in range(0,31))

# Print the first 5 values
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))

print("-------------")

# Print the rest of the values
for value in result:
    print(value)

0
1
2
3
4
-------------
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
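To make the memory claim about generators concrete, this sketch compares a list comprehension with the equivalent generator expression over a million values. `sys.getsizeof()` reports only the object's own footprint (for the list, its array of references, not the integers themselves), and exact numbers vary by Python version, so no byte counts are asserted here.

```python
import sys

n = 1_000_000
as_list = [x * 2 for x in range(n)]   # builds all one million values up front
as_gen = (x * 2 for x in range(n))    # builds nothing until it is iterated

print(sys.getsizeof(as_list))   # megabytes
print(sys.getsizeof(as_gen))    # a small size that does not depend on n

# Both produce the same values when consumed
print(sum(as_gen) == sum(as_list))   # True
```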


In [37]:
# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Create a generator object: lengths
lengths = (len(person) for person in lannister)

# Iterate over and print the values in lengths
for value in lengths:
    print(value)

6
5
5
6
7
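Generator expressions are often passed straight into functions that consume an iterable, such as `sum()`, `max()` or `any()`. When the generator expression is the only argument, its own parentheses can be dropped, and no intermediate list is built.

```python
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# The call's parentheses double as the generator expression's parentheses
total = sum(len(person) for person in lannister)
print(total)                                          # 29

print(max(len(person) for person in lannister))       # 7

# any() stops pulling values as soon as it finds a True one
print(any(len(person) > 6 for person in lannister))   # True
```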


In [39]:
#Playing with generator functions:

# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""

    # Yield the length of a string
    for person in input_list:
        yield len(person)

# Print the values generated by get_lengths()

for value in get_lengths(lannister):
    print(value)

6
5
5
6
7


