## Importing modules

In [1]:
import pandas as pd

# Introduction to Iterators

## Iterators vs Iterables

* An iterator is an object that you can call next() on: it produces its values one at a time as you cycle through it.
* An iterable is an object that can be turned into an iterator, such as a list or a dictionary.


* Iterable
    * Examples: lists, strings, dictionaries, file connections
    * An object with an associated `__iter__()` method
    * Applying iter() to an iterable creates an iterator
* Iterator
    * Produces the next value with next()
    * Raises StopIteration once there are no values left
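The protocol behind these bullet points can be sketched with a small class. `Countdown` below is a hypothetical example, not part of the course: it implements `__iter__()`, which returns the object itself (so every iterator is also an iterable), and `__next__()`, which raises StopIteration when it runs out. Built-in iterators follow the same protocol.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself, which makes it an iterable too
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal exhaustion the same way built-in iterators do
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
print(next(countdown))               # 3
print(list(countdown))               # [2, 1], only the values that are left
print(iter(countdown) is countdown)  # True
```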

## Iterating over iterables (1)

Great, you're familiar with what iterables and iterators are! In this exercise, you will reinforce your knowledge about these by iterating over and printing from iterables and iterators.

You are provided with a list of strings flash. You will practice iterating over the list by using a for loop. You will also create an iterator for the list and access the values from the iterator.

In [4]:
# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Print each list item in flash using a for loop
for item in flash:
    print(item)

print('\n')

superspeed = iter(flash) # Create an iterator for flash: superspeed
print(next(superspeed))
print(next(superspeed))
print(next(superspeed))
print(next(superspeed))


jay garrick
barry allen
wally west
bart allen


jay garrick
barry allen
wally west
bart allen
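A for loop makes these iter() and next() calls for you. The sketch below is a simplified approximation of that behavior (not CPython's actual implementation): a while loop that keeps calling next() until StopIteration is raised.

```python
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Roughly what `for item in flash: print(item)` does behind the scenes
superspeed = iter(flash)
while True:
    try:
        item = next(superspeed)
    except StopIteration:
        # The iterator is exhausted, so the loop ends
        break
    print(item)

# A plain next() call would now raise StopIteration;
# next() with a default returns the default instead
print(next(superspeed, 'no more speedsters'))
```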


## Iterating over iterables (2)

One of the things you learned about in this chapter is that not all iterables are actual lists. A couple of examples that we looked at are strings and the use of the range() function. In this exercise, we will focus on the range() function.

You can use range() in a for loop as if it's a list to be iterated over:

    for i in range(5):
        print(i)

Recall that range() doesn't actually create the list; instead, it creates a range object with an iterator that produces the values until it reaches the limit (in the example, up to the value 4). If range() created the actual list, calling it with a value of 10<sup>100</sup> would not work, because a list with that many elements would far exceed any computer's memory. The value 10<sup>100</sup> is what's called a googol, a 1 followed by a hundred 0s. That's a huge number!

Your task for this exercise is to show that calling range() with 10<sup>100</sup> won't actually pre-create the list.
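Besides iterating lazily, a range object answers many questions arithmetically from its start, stop and step, which is another way to see that no list is stored. A short sketch (the name googol_range is just for illustration; the constant-time membership test applies to integers in CPython):

```python
googol_range = range(10**100)

# Indexing is computed from start/stop/step, not looked up in a stored list
print(googol_range[-1] == 10**100 - 1)   # True

# Integer membership is also computed, so this returns immediately
print(10**99 in googol_range)             # True

# Slicing a range returns another lazy range object
print(googol_range[10:15])                # range(10, 15)
print(list(googol_range[10:15]))          # [10, 11, 12, 13, 14]
```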

In [13]:
# Create an iterator for range(3): small_value
small_value = iter(range(3))

# Print the values in small_value
print(next(small_value))
print(next(small_value))
print(next(small_value))

# Loop over range(3) and print the values
for i in range(3):
    print(i)


# Create an iterator for range(10 ** 100): googol
googol = iter(range(10**100))

# Print the first 5 values from googol
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))

0
1
2
0
1
2
0
1
2
3
4



## Iterators as function arguments

You've been using the iter() function to get an iterator object, as well as the next() function to retrieve the values one by one from the iterator object.

There are also functions that accept iterables, including iterators, as arguments. For example, list() and sum() return a list and the sum of the elements, respectively.

In this exercise, you will use these functions by passing them a range object and then printing the results of the function calls.

In [6]:
# Create a range object that would produce the values from 10 to 20: values
values = range(10,21)

# Print the range object
print(values)

# Use the list() function to create a list of values from the range object values: values_list
values_list = list(values)

# Print values_list
print(values_list)

# Use the sum() function to get the sum of the values from 10 to 20 from the range object values: values_sum
values_sum = sum(values)

# Print values_sum
print(values_sum)

range(10, 21)
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
165
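In the exercise, values is a range object, which is an iterable rather than an iterator, so it can be passed to list() and then to sum() without running out. A minimal sketch contrasting that with a true iterator, which list() consumes completely:

```python
# Passing the range object itself: it can be iterated over again and again
values = range(10, 21)
print(sum(values), sum(values))       # 165 165

# Passing an iterator: list() consumes it, leaving nothing for sum()
values_iter = iter(range(10, 21))
values_list = list(values_iter)
print(sum(values_iter))               # 0, because the iterator is exhausted
print(sum(values_list))               # 165
```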


# Playing with iterators

## Using enumerate

You're really getting the hang of using iterators, great job!

You've just picked up several new ideas about iterators from the last video, and one of them is the enumerate() function. Recall that enumerate() returns an enumerate object that produces a sequence of tuples, and each of the tuples is an index-value pair.

In this exercise, you are given a list of strings mutants and you will practice using enumerate() on it by printing out a list of tuples and unpacking the tuples using a for loop.

* [enumerate()](https://docs.python.org/3/library/functions.html#enumerate)

In [7]:
# Create a list of strings: mutants
mutants = ['charles xavier', 
            'bobby drake', 
            'kurt wagner', 
            'max eisenhardt', 
            'kitty pride']

mutant_list = list(enumerate(mutants)) # Create a list of tuples: mutant_list
print(mutant_list)
print()

for index1, value1 in enumerate(mutants): # Unpack and print the tuple pairs
    print(index1, value1)
print()

for index2, value2 in enumerate(mutants, start=1):
    print(index2, value2)

[(0, 'charles xavier'), (1, 'bobby drake'), (2, 'kurt wagner'), (3, 'max eisenhardt'), (4, 'kitty pride')]

0 charles xavier
1 bobby drake
2 kurt wagner
3 max eisenhardt
4 kitty pride

1 charles xavier
2 bobby drake
3 kurt wagner
4 max eisenhardt
5 kitty pride
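To see what enumerate() does under the hood, here is a generator that behaves like it, close to the equivalent given in the Python documentation. my_enumerate is a hypothetical name used only for illustration; in real code, use the built-in.

```python
def my_enumerate(iterable, start=0):
    """Hypothetical re-implementation of enumerate() as a generator."""
    index = start
    for value in iterable:
        # Yield an (index, value) pair, just like enumerate() does
        yield index, value
        index += 1


mutants = ['charles xavier', 'bobby drake', 'kurt wagner']
print(list(my_enumerate(mutants, start=1)))
# [(1, 'charles xavier'), (2, 'bobby drake'), (3, 'kurt wagner')]
print(list(my_enumerate(mutants)) == list(enumerate(mutants)))  # True
```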


## Using zip

Another useful function you've learned is zip(), which takes any number of iterables and returns a zip object, an iterator of tuples. To see the values in a zip object, you can convert it into a list and print that list. Printing the zip object directly shows only its type and memory address, not its values; to get at the values, you have to unpack it or iterate over it. In this exercise, you will explore this for yourself.

Three lists of strings are pre-loaded: mutants, aliases, and powers. First, you will use list() and zip() on these lists to generate a list of tuples. Then, you will create a zip object using zip(). Finally, you will unpack this zip object in a for loop to print the values in each tuple. Observe the different output generated by printing the list of tuples, then the zip object, and finally, the tuple values in the for loop.

* [zip()](https://docs.python.org/3/library/functions.html#zip)

In [8]:
aliases = ['prof x', 'iceman', 'nightcrawler', 'magneto', 'shadowcat']
mutants = ['charles xavier', 'bobby drake', 'kurt wagner', 'max eisenhardt', 'kitty pride']
powers = ['telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility']

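mutant_data = list(zip(mutants, aliases, powers)) # Create a list of tuples: mutant_data
print(mutant_data)
print()

mutant_zip = zip(mutants, aliases, powers) # Create a zip object: mutant_zip
print(mutant_zip)
print()

for mutant, alias, power in mutant_zip: # Unpack and print the tuple values
    print(mutant, alias, power)

[('charles xavier', 'prof x', 'telepathy'), ('bobby drake', 'iceman', 'thermokinesis'), ('kurt wagner', 'nightcrawler', 'teleportation'), ('max eisenhardt', 'magneto', 'magnetokinesis'), ('kitty pride', 'shadowcat', 'intangibility')]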

<zip object at 0x000001A32379D7C8>

charles xavier prof x telepathy
bobby drake iceman thermokinesis
kurt wagner nightcrawler teleportation
max eisenhardt magneto magnetokinesis
kitty pride shadowcat intangibility
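The three lists above all have five items. When the inputs have different lengths, zip() silently stops at the shortest one. A short sketch with deliberately uneven lists, also showing itertools.zip_longest(), which pads the missing values instead (Python 3.10 and later also offer zip(..., strict=True), which raises ValueError on a length mismatch):

```python
from itertools import zip_longest

mutants = ['charles xavier', 'bobby drake', 'kurt wagner']
powers = ['telepathy', 'thermokinesis']  # deliberately one item shorter

# zip() stops as soon as the shortest iterable is exhausted
print(list(zip(mutants, powers)))
# [('charles xavier', 'telepathy'), ('bobby drake', 'thermokinesis')]

# zip_longest() continues and fills the gaps with fillvalue
print(list(zip_longest(mutants, powers, fillvalue='unknown')))
```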


## Using * and zip to 'unzip'

You know how to use zip() as well as how to print out values from a zip object. Excellent!

Let's play around with zip() a little more. There is no unzip function for doing the reverse of what zip() does. We can, however, reverse what has been zipped together by using zip() with a little help from the * operator, which unpacks an iterable such as a list or a tuple into positional arguments in a function call.

In this exercise, you will use * in a call to zip() to unpack the tuples produced by zip().

Two tuples of strings, mutants and powers, have been pre-loaded.

In [13]:
mutants = ('charles xavier',  'bobby drake',  'kurt wagner',  'max eisenhardt',  'kitty pride')
powers = ('telepathy',  'thermokinesis',  'teleportation',  'magnetokinesis',  'intangibility')

# Create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# Print the tuples in z1 by unpacking with *
print(*z1)
print()

# Re-create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)
print(result1)
print(result2)
print()

# Check if unpacked tuples are equivalent to original tuples
print(result1 == mutants)
print(result2 == powers)


('charles xavier', 'telepathy') ('bobby drake', 'thermokinesis') ('kurt wagner', 'teleportation') ('max eisenhardt', 'magnetokinesis') ('kitty pride', 'intangibility')

('charles xavier', 'bobby drake', 'kurt wagner', 'max eisenhardt', 'kitty pride')
('telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility')

True
True
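The 'unzip' trick works because zip(*pairs) passes each tuple in pairs as a separate positional argument, so zip() pairs up all the first elements, then all the second elements. A small sketch making that equivalence explicit:

```python
pairs = [('charles xavier', 'telepathy'), ('bobby drake', 'thermokinesis')]

# zip(*pairs) passes each tuple as its own positional argument...
unzipped_star = list(zip(*pairs))

# ...so it is the same call as writing the arguments out by hand
unzipped_manual = list(zip(pairs[0], pairs[1]))

print(unzipped_star == unzipped_manual)  # True
print(unzipped_star)
# [('charles xavier', 'bobby drake'), ('telepathy', 'thermokinesis')]
```

One edge case: if pairs is empty, zip(*pairs) produces nothing, so an assignment like result1, result2 = zip(*pairs) raises ValueError.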


# Using iterators to load large files into memory

## Processing large amounts of Twitter data

Sometimes, the data we have to process is too large for a computer's memory to handle. This is a common problem faced by data scientists. One solution is to process the data source chunk by chunk instead of all at once.

In this exercise, you will do just that. You will process a large CSV file of Twitter data in the same way that you processed 'tweets.csv' in the _[Bringing it all together](https://campus.datacamp.com/courses/python-data-science-toolbox-part-1/writing-your-own-functions?ex=12)_ exercises of the prequel course, but this time working on it in chunks of 10 entries at a time.

If you are interested in learning how to access Twitter data so you can work with it on your own system, refer to [Part 2](https://www.datacamp.com/courses/importing-data-in-python-part-2) of the DataCamp course on Importing Data in Python.

The pandas package has been imported as pd and the file 'tweets.csv' is in your current directory for your use. Go for it!

In [16]:
# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Iterate over the file chunk by chunk
for chunk in pd.read_csv('tweets.csv', chunksize=10):

    # Iterate over the column 'lang' in DataFrame
    for entry in chunk['lang']:
        if entry in counts_dict:
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)

{'en': 97, 'et': 1, 'und': 2}
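The manual if/else counting above can also be written with collections.Counter, whose update() method accepts any iterable, including a pandas Series. This is a swapped-in alternative rather than the course's solution, and the small CSV string below is made-up sample data standing in for 'tweets.csv'.

```python
import io
from collections import Counter

import pandas as pd

# Hypothetical in-memory CSV standing in for 'tweets.csv' (not the real data)
csv_text = "id,lang\n1,en\n2,en\n3,et\n4,en\n5,und\n6,en\n7,und\n"

counts = Counter()
# read_csv with chunksize yields DataFrames of at most 3 rows each
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=3):
    # Counter.update() counts every value in the 'lang' column of this chunk
    counts.update(chunk['lang'])

print(dict(counts))  # {'en': 4, 'et': 1, 'und': 2}
```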


## Extracting information for large amounts of Twitter data

Great job chunking out that file in the previous exercise. You now know how to deal with situations where you need to process a very large file, and that's a very useful skill to have!

It's good to know how to process a file in smaller, more manageable chunks, but it can become very tedious having to write and rewrite the same code for the same task each time. In this exercise, you will make your code more _reusable_ by putting your work from the last exercise into a _function definition_.

In [18]:
# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict:
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('tweets.csv', 10, 'lang')

# Print result_counts
print(result_counts)

{'en': 97, 'et': 1, 'und': 2}
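The same chunked pattern can be expressed with only the standard library, which shows that the reusable part is really the iterator over chunks. This is a swapped-in approach, not what the exercise uses: iter_chunks and count_entries_stdlib are hypothetical helpers built on itertools.islice and csv.DictReader, and the sample CSV text is made up.

```python
import csv
import io
from collections import Counter
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Hypothetical helper: yield lists of at most chunk_size items from rows."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            # islice() returned nothing, so the source is exhausted
            return
        yield chunk


def count_entries_stdlib(file_obj, c_size, colname):
    """Count values in colname, reading file_obj c_size rows at a time."""
    counts = Counter()
    for chunk in iter_chunks(csv.DictReader(file_obj), c_size):
        counts.update(row[colname] for row in chunk)
    return dict(counts)


# Hypothetical sample data standing in for 'tweets.csv'
sample = io.StringIO("id,lang\n1,en\n2,und\n3,en\n4,et\n5,en\n")
print(count_entries_stdlib(sample, 2, 'lang'))  # {'en': 3, 'und': 1, 'et': 1}
```

Because csv.DictReader and iter_chunks are both lazy, only one chunk of rows is held in memory at a time, which is the same property that chunksize gives pd.read_csv.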


## Appendix: Methods
* [enumerate()](https://docs.python.org/3/library/functions.html#enumerate)
* [zip()](https://docs.python.org/3/library/functions.html#zip)