<h1>Iterators</h1>

Iterables are objects that can be passed to the iter() function (they implement the __iter__() method).
Applying iter() to an iterable creates an iterator.


<h2>Iterator</h2>
Iterators are objects that work with the next() function, which returns the next value on each call.
We can retrieve the values one at a time with next(), or unpack all the remaining values at once with the '*' operator.

Example:

    word = 'Data'
    it = iter(word)
    print(*it) #it will print D a t a

Once we reach the end of an iterator it is exhausted. To traverse the values again, we have to create a new iterator by calling iter() on the iterable once more.
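A minimal sketch of what exhaustion looks like in practice, reusing the `word` example above. The names `first_pass`, `second_pass`, `sentinel` and `restarted` are only illustrative.

```python
word = 'Data'
it = iter(word)

first_pass = list(it)    # consumes every value: ['D', 'a', 't', 'a']
second_pass = list(it)   # the iterator is now exhausted: []

# next() on an exhausted iterator raises StopIteration,
# unless a default value is supplied as the second argument.
sentinel = next(it, None)

# Calling iter() on the original iterable gives a fresh iterator.
it = iter(word)
restarted = next(it)     # 'D'

print(first_pass, second_pass, sentinel, restarted)
```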

<h2>Difference between an iterable and an Iterator</h2>

Iterable --> Object that can return an iterator when passed to iter()

Iterator --> Object that keeps state and produces the next value each time next() is called on it.
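The distinction can be checked directly. This is a small sketch with illustrative variable names; it relies only on the built-in `iter()`, `next()` and `hasattr()`.

```python
example = ['test1', 'test2']

# A list is an iterable: it has __iter__ but no __next__.
list_has_next = hasattr(example, '__next__')         # False

iterator = iter(example)
# The iterator has __next__, and calling iter() on it returns itself.
iterator_has_next = hasattr(iterator, '__next__')    # True
same_object = iter(iterator) is iterator             # True

# Each call to iter() on the list gives an independent iterator
# that keeps its own position.
other = iter(example)
next(iterator)                       # advances only `iterator`
print(next(iterator), next(other))   # test2 test1
```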

In [1]:
example = ['test1', 'test2', 'test3', 'test4', 'test5', 'test6']
iterator = iter(example)
for i in range(5):
    print(next(iterator))

test1
test2
test3
test4
test5


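range() does not build a list of values. It only creates a range object, an iterable whose iterator produces the values one at a time until it reaches the limit. In the cell above, range(5) only controls how many times next(iterator) is called.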
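As a short illustration, a range object can be stepped through with an explicit iterator. Because the range is an iterable rather than an iterator, it can also be traversed more than once. The variable names are illustrative.

```python
values = range(3)            # no list is built here

it = iter(values)
print(next(it), next(it), next(it))   # 0 1 2

# A range is an iterable, not an iterator, so it can be traversed repeatedly.
first = list(values)
second = list(values)
print(first == second)       # True
```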

<hr>

<h1>Enumerate function</h1>

enumerate() adds a counter to any iterable. It takes any iterable as an argument.
It returns pairs containing each element of the original iterable along with its index within the iterable.
The enumerate object is itself an iterable.
We can change the starting value of the index with the start argument.
Each pair is an (index, value) tuple.

    Example:
    enumerate(avengers, start=10)
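A short sketch of the call above in context. It assumes the same `avengers` list that appears in the zip section of these notes.

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']

# Default counter starts at 0.
pairs = list(enumerate(avengers))
print(pairs)
# [(0, 'hawkeye'), (1, 'iron man'), (2, 'thor'), (3, 'quicksilver')]

# With start=10 the counter begins at 10; each tuple can be unpacked in the loop.
shifted = []
for index, name in enumerate(avengers, start=10):
    shifted.append((index, name))
    print(index, name)
```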

<h1>Zip function</h1>
zip() lets us stitch together an arbitrary number of iterables.
It accepts an arbitrary number of iterables and returns an iterator of tuples.
The first tuple contains the first element of each iterable passed to zip(), the second tuple contains the second elements, and so on.

In [2]:
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']
zipExample = zip(avengers, names)
print(type(zipExample))

exampleToList = list(zipExample)
print(exampleToList)

for x1, x2 in zip(avengers, names):
    print(x1, x2)

z = zip(avengers, names)
print(*z)

<class 'zip'>
[('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odinson'), ('quicksilver', 'maximoff')]
hawkeye barton
iron man stark
thor odinson
quicksilver maximoff
('hawkeye', 'barton') ('iron man', 'stark') ('thor', 'odinson') ('quicksilver', 'maximoff')


In [4]:
mutants = ('charles xavier', 'bobby drake', 'kurt wagner', 'max eisenhardt', 'kitty pryde')
powers = ('telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis', 'intangibility')

# Create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# Print the tuples in z1 by unpacking with *
print(*z1)
# Re-create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)

# Check if unpacked tuples are equivalent to original tuples
print(result1 == mutants)
print(result2 == powers)

('charles xavier', 'telepathy') ('bobby drake', 'thermokinesis') ('kurt wagner', 'teleportation') ('max eisenhardt', 'magnetokinesis') ('kitty pryde', 'intangibility')
True
True


<h1>Using iterators to load large files into memory</h1>

Load the data in chunks when it is too big to hold in memory at once.
To read a file in chunks, we can use the pandas function read_csv() with the chunksize argument: read_csv(csv_file, chunksize=desired_size).
When chunksize is given, the object returned by read_csv is an iterable that yields one DataFrame of up to chunksize rows at a time.
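To make the idea of chunked iteration concrete without a file on disk, here is a standard-library sketch. It is not how pandas implements chunksize; the helper `read_in_chunks` and the in-memory `csv_text` are hypothetical stand-ins used only for illustration.

```python
import csv
import io
from collections import Counter
from itertools import islice

# Hypothetical in-memory stand-in for a CSV file on disk.
csv_text = "lang\nen\nen\net\nund\nen\n"

def read_in_chunks(file_obj, chunk_size):
    """Yield lists of at most chunk_size rows (as dicts) from a CSV file object."""
    reader = csv.DictReader(file_obj)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield chunk

counts = Counter()
chunk_sizes = []
# Only one chunk of rows is held in memory at a time.
for chunk in read_in_chunks(io.StringIO(csv_text), 2):
    chunk_sizes.append(len(chunk))
    counts.update(row['lang'] for row in chunk)

print(chunk_sizes)    # [2, 2, 1]
print(dict(counts))   # {'en': 3, 'et': 1, 'und': 1}
```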

In [21]:
import pandas as pd
# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict:
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('tweets.csv', 10, 'lang')
# Print result_counts
print(result_counts)


{'en': 97, 'et': 1, 'und': 2}


<hr>

<h1>Optimizing list usage</h1>

List comprehensions let us build a new list from an iterable in a single line instead of a multi-line for loop.

Syntax for list comprehension:

[output expression for iterator variable in iterable]

    output expression --> the value we want to create for each element; it is usually written in terms of the iterator variable

<h2>Components for list comprehension</h2>
<ul>
 <li>Iterable</li>
 <li>Iterator variable (members of iterable)</li>
 <li>Output expression</li>
</ul>
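The three components can be pointed out in a small sketch. The `doubled` name and the doubling operation are only illustrative.

```python
nums = [12, 8, 29, 4, 13]      # iterable

# num      -> iterator variable (takes each member of nums in turn)
# num * 2  -> output expression (the value stored in the new list)
doubled = [num * 2 for num in nums]

print(doubled)   # [24, 16, 58, 8, 26]
```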

In [4]:
#Example with the traditional form
nums = [12, 8, 29, 4, 13]
newNums = []

#Add 1 to every element in the nums list and append it to newNums
for num in nums:
    newNums.append(num+1)

print(newNums)

#Example with list comprehension

optimizedNums = [num + 1 for num in nums]

print(optimizedNums)
print(newNums == optimizedNums) #the output is the same


[13, 9, 30, 5, 14]
[13, 9, 30, 5, 14]
True


<h1>List comprehension and nested loops</h1>

We can use nested for loops in a list comprehension, but the code becomes harder to read.

In [6]:
#Example with the traditional form
newNums = []

#Build every (num, num2) pair and append it to newNums
for num in range(0, 2):
    for num2 in range(6, 8):
        newNums.append((num, num2))
print(newNums)

#Example with list comprehension

optimizedNums = [(num, num2) for num in range(0,2) for num2 in range(6,8)]

print(optimizedNums)
print(newNums == optimizedNums) #the output is the same


[(0, 6), (0, 7), (1, 6), (1, 7)]
[(0, 6), (0, 7), (1, 6), (1, 7)]
True


<h1>Creating a matrix as a list of lists with a list comprehension</h1>

In [2]:
# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(0, 5)] for row in range(0,5)]

# Print the matrix
for row in matrix:
    print(row)


[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
