# Before you start:
- Read the README.md file
- Comment as much as you can and use the resources in the README.md file
- Happy learning!

In [3]:
# importing libraries
import numpy as np
import pandas as pd

# Challenge 1 - Iterators, Generators and `yield`. 

An iterator in Python is an object that represents a stream of data. Iterators produce their values one at a time, in order, and a value that has been consumed cannot be revisited. All iterators support the built-in `next` function, which returns the next value in the stream. We can create an iterator from any iterable using the built-in `iter` function. Below is an example of an iterator.

In [4]:
# We first define our iterator:

iterator = iter([1,2,3])

# We can now iterate through the object using the next function

print(next(iterator))

1


In [5]:
# We continue to iterate through the iterator.

print(next(iterator))

2


In [6]:
print(next(iterator))

3


In [7]:
# After we have iterated through all elements, calling next again raises a StopIteration exception

print(next(iterator))

StopIteration: 
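
If running off the end of an iterator should not be an error, `next` accepts an optional second argument that is returned once the iterator is exhausted. A minimal sketch (the names `numbers`, `sentinel`, `consumed` and `after_end` are illustrative and not part of the exercise):

```python
# Illustrative names; only the built-ins iter and next are used.
numbers = iter([1, 2, 3])
sentinel = None  # value returned instead of raising StopIteration

consumed = []
while True:
    value = next(numbers, sentinel)
    if value is sentinel:
        break  # the iterator is exhausted
    consumed.append(value)

# An exhausted iterator stays exhausted, so next keeps returning the default.
after_end = next(numbers, sentinel)
print(consumed, after_end)
```

A default only makes sense when it cannot be confused with a real element, which is why `None` works here for a list of integers.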

In [8]:
# We can also iterate through an iterator using a for loop like this:
# Note: we cannot go back directly in an iterator once we have traversed through the elements. 
# This is why we are redefining the iterator below

iterator = iter([1,2,3])

for i in iterator:
    print(i)

1
2
3


In the cell below, write a function that takes an iterator and returns the first element that is divisible by 2. Assume that all iterators contain only numeric data. If no element is divisible by 2, return zero.

In [9]:
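def divisible2(iterator):
    # This function takes an iterator and returns the first element that is divisible by 2, or zero if there is none
    # Input: Iterator
    # Output: Integer
    
    # Sample Input: iter([1,2,3])
    # Sample Output: 2
    
    # Your code here:
    # The generator expression filters lazily, so iteration stops at the first even element.
    # The second argument to next is the default returned when no even element exists.
    return next((i for i in iterator if i % 2 == 0), 0)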
    

In [10]:
divisible2(iter([1,2,3]))

2

### Generators

Writing your own iterator class takes some boilerplate, since you would have to implement the `__iter__` and `__next__` methods. Generators are functions that create iterators for us. The difference between a regular function and a generator is that instead of using `return`, we use `yield`. For example, below we have a generator that produces the numbers 0 through n - 1:

In [11]:
def firstn(n):
    number = 0
    while number < n:
        yield number
        number = number + 1

If we pass 5 to the function, we will see that we have an iterator containing the numbers 0 through 4.

In [12]:
iterator = firstn(5)

for i in iterator:
    print(i)

0
1
2
3
4
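
For comparison, here is a sketch of what the same iterator looks like when written by hand as a class. The class name `FirstN` is hypothetical; `firstn` is repeated from the cell above so the block runs on its own.

```python
class FirstN:
    """Hand-written iterator producing 0, 1, ..., n - 1 (hypothetical class name)."""

    def __init__(self, n):
        self.n = n
        self.number = 0

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Signal exhaustion explicitly once the limit is reached.
        if self.number >= self.n:
            raise StopIteration
        current = self.number
        self.number += 1
        return current


def firstn(n):
    # Generator version, as defined in the notebook.
    number = 0
    while number < n:
        yield number
        number = number + 1


class_result = list(FirstN(5))
generator_result = list(firstn(5))
print(class_result == generator_result)
```

The generator keeps its state (`number`) in a local variable between `yield`s, while the class has to store it on `self` and raise `StopIteration` itself.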


In the cell below, create a generator that takes a number and returns an iterator containing all even numbers between 0 and the number you passed to the generator.

In [13]:
def even_iterator(n):
    # This generator yields all even numbers between 0 and n (n itself excluded)
    # Input: integer
    # Output: iterator
    
    # Sample Input: 5
    # Sample Output: iter([0, 2, 4])
    
    # Your code here:
    # Reuse firstn and yield only the even values, one at a time
    for number in firstn(n):
        if number % 2 == 0:
            yield number

In [14]:
for i in even_iterator(5):
    print(i)

0
2
4
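
Because a generator computes values only when they are requested, it can even describe an unbounded sequence. The sketch below uses a hypothetical `all_evens` generator together with `itertools.islice` from the standard library to take just the first few values:

```python
from itertools import islice


def all_evens():
    # Hypothetical unbounded generator: 0, 2, 4, ... with no upper limit.
    number = 0
    while True:
        yield number
        number += 2


# islice stops requesting values after three items, so the loop above never runs away.
first_three = list(islice(all_evens(), 3))
print(first_three)
```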


# Challenge 2 - Applying Functions to DataFrames

In this challenge, we will look at how to transform cells or entire columns at once.

First, let's load a dataset. We will download the famous Iris classification dataset in the cell below.

In [15]:
# note that the data is stored on a website, not locally
file_path = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

In [16]:
# the file_path is a string, let's check it
type(file_path)

str

In [17]:
# the columns object is a list!
columns = ['sepal_length', 'sepal_width', 'petal_length',
           'petal_width','iris_type']

In [18]:
# let's check it out
type(columns)

list

In [19]:
# import the iris object using the read_csv function from pandas
iris = pd.read_csv(
    # the first argument receives the data to be imported
    file_path, 
    # the second argument receives the columns' list
    names=columns)

In [20]:
# checking the type of iris
type(iris)

pandas.core.frame.DataFrame

In [21]:
# pd.read_csv returns a pandas DataFrame object

Let's look at the dataset using the `head` function.

In [22]:
# Your code here:
iris.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,iris_type
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


In [23]:
# the head function can also receive an argument
# let's check only the first row of iris
iris.head(1)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,iris_type
0,5.1,3.5,1.4,0.2,Iris-setosa


In [24]:
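# we can also import the same data using different methods, such as genfromtxt from numpy
# note: names=True makes numpy read the first line as column names; this file has no header,
# so the first observation (5.1, 3.5, 1.4, 0.2) is used up as the header and is missing below
iris_np = np.genfromtxt(file_path,
                        names=True,
                        delimiter=',',
                        dtype=None)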

  


In [25]:
# have you noticed that the format is different?
iris_np

array([(4.9, 3. , 1.4, 0.2, b'Iris-setosa'),
       (4.7, 3.2, 1.3, 0.2, b'Iris-setosa'),
       (4.6, 3.1, 1.5, 0.2, b'Iris-setosa'),
       (5. , 3.6, 1.4, 0.2, b'Iris-setosa'),
       (5.4, 3.9, 1.7, 0.4, b'Iris-setosa'),
       (4.6, 3.4, 1.4, 0.3, b'Iris-setosa'),
       (5. , 3.4, 1.5, 0.2, b'Iris-setosa'),
       (4.4, 2.9, 1.4, 0.2, b'Iris-setosa'),
       (4.9, 3.1, 1.5, 0.1, b'Iris-setosa'),
       (5.4, 3.7, 1.5, 0.2, b'Iris-setosa'),
       (4.8, 3.4, 1.6, 0.2, b'Iris-setosa'),
       (4.8, 3. , 1.4, 0.1, b'Iris-setosa'),
       (4.3, 3. , 1.1, 0.1, b'Iris-setosa'),
       (5.8, 4. , 1.2, 0.2, b'Iris-setosa'),
       (5.7, 4.4, 1.5, 0.4, b'Iris-setosa'),
       (5.4, 3.9, 1.3, 0.4, b'Iris-setosa'),
       (5.1, 3.5, 1.4, 0.3, b'Iris-setosa'),
       (5.7, 3.8, 1.7, 0.3, b'Iris-setosa'),
       (5.1, 3.8, 1.5, 0.3, b'Iris-setosa'),
       (5.4, 3.4, 1.7, 0.2, b'Iris-setosa'),
       (5.1, 3.7, 1.5, 0.4, b'Iris-setosa'),
       (4.6, 3.6, 1. , 0.2, b'Iris-setosa'),
       (5.

In [26]:
# let's check this data type
type(iris_np)

numpy.ndarray

In [27]:
# importing data in different ways also means that we need different methods to explore the object
# in the case below, we can't call .head(), a pandas method, on a numpy array
iris_np.head()

AttributeError: 'numpy.ndarray' object has no attribute 'head'
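
A NumPy array has no `head`, but slicing does the same job. The sketch below inlines three rows of the dataset so it runs without network access; the names `sample`, `sample_np`, `first_two` and `sepal_lengths` are illustrative, and passing an explicit list to `names` is a choice made here, not what the cell above does.

```python
import io

import numpy as np

# Three rows copied from the iris output above, inlined so the sketch runs offline.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "iris_type"]

# Explicit names keep the first data row as data instead of treating it as a header.
sample_np = np.genfromtxt(sample, delimiter=",", dtype=None, names=columns, encoding="utf-8")

first_two = sample_np[:2]                  # slicing plays the role of .head(2)
sepal_lengths = sample_np["sepal_length"]  # fields of a structured array are selected by name
print(first_two)
print(sepal_lengths)
```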

Let's start off by using built-in functions. Try to apply the numpy mean function and describe what happens in the comments of the code.

In [28]:
# although we can't call the pandas method head on a numpy array,
# we can still call the numpy function np.mean on a pandas dataframe;
# here it returns the mean of each numeric column
np.mean(iris)

sepal_length    5.843333
sepal_width     3.054000
petal_length    3.758667
petal_width     1.198667
dtype: float64

Next, we'll apply the standard deviation function in numpy (`np.std`). Describe what happened in the comments.

In [29]:
# Your code here:
np.std(iris)

sepal_length    0.825301
sepal_width     0.432147
petal_length    1.758529
petal_width     0.760613
dtype: float64

In [30]:
# np.mean is useful, but we can also use describe to get more metrics, such as the mean,
# standard deviation, median, and quartiles, in a single line of code
iris.describe()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
count,150.0,150.0,150.0,150.0
mean,5.843333,3.054,3.758667,1.198667
std,0.828066,0.433594,1.76442,0.763161
min,4.3,2.0,1.0,0.1
25%,5.1,2.8,1.6,0.3
50%,5.8,3.0,4.35,1.3
75%,6.4,3.3,5.1,1.8
max,7.9,4.4,6.9,2.5
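
The `std` row from `describe` (0.828066 for `sepal_length`) differs slightly from the `np.std` result above (0.825301). The two numbers are consistent with `np.std` dividing by n (`ddof=0`, the population standard deviation) while pandas divides by n - 1 (`ddof=1`, the sample standard deviation) by default. A small sketch on made-up values (the data and variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Made-up values whose population standard deviation is exactly 2.
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_std = np.std(values.to_numpy())          # divides by n (ddof=0)
sample_std = values.std()                           # pandas default divides by n - 1 (ddof=1)
numpy_sample_std = np.std(values.to_numpy(), ddof=1)  # numpy matches pandas when ddof=1

print(population_std, sample_std, numpy_sample_std)
```

With 150 rows the two conventions differ by a factor of sqrt(149/150), which accounts for the gap between the two outputs.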


The measurements are in centimeters. Let's convert them all to inches. First, we will create a dataframe that contains only the numeric columns. Assign this new dataframe to `iris_numeric`.

In [31]:
# first method: import data again, passing the numeric columns as an argument 
numeric_columns = ['sepal_length', 'sepal_width', 'petal_length','petal_width']


In [32]:
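# import the iris object using the read_csv function from pandas
# caution: the file has five columns but we pass only four names, so pandas uses the
# first column as the index and the remaining values shift one column to the left
iris_numeric = pd.read_csv(
    # the first argument receives the data to be imported
    file_path,
    # the second argument receives the columns' list -- this time only the numeric columns
    names=numeric_columns)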

In [33]:
iris_numeric.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
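
In the output above the values are shifted: the species name lands in `petal_width` because only four names were given for a file with five columns. One way to read only the numeric columns, sketched here on a few inlined rows rather than the real URL, is to name all five columns and select the wanted ones with the `usecols` parameter of `pd.read_csv` (the names `sample` and `sample_numeric` are illustrative):

```python
import io

import pandas as pd

# Three rows copied from the dataset, inlined so the sketch runs offline.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "iris_type"]
numeric_columns = columns[:4]

# Name every column in the file, then keep only the numeric ones.
sample_numeric = pd.read_csv(sample, names=columns, usecols=numeric_columns)
print(sample_numeric)
```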


In [34]:
# second method: drop the categorical column 'iris_type' from the already imported data 'iris'
iris_numeric = iris.drop(
    # select the column to be dropped
    'iris_type',
    # axis=1 means we drop a column -- the default, axis=0, drops rows
    axis=1
)

In [35]:
iris_numeric.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2


In [36]:
# third method: select the numeric columns with brackets
iris_numeric = iris[numeric_columns]

In [37]:
iris_numeric.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2


Next, we will write a function that converts centimeters to inches in the cell below. Recall that 1cm = 0.393701in.

In [38]:
def cm_to_in(x):
    # This function takes in a numeric value in centimeters and converts it to inches
    # Input: numeric value
    # Output: float
    
    # Sample Input: 1.0
    # Sample Output: 0.393701
    
    # Your code here:
    return x * 0.393700787

Now convert all columns in `iris_numeric` to inches in the cell below. We like to think of functional transformations as immutable. Therefore, save the transformed data in a dataframe called `iris_inch`.


#### WARNING: the following 'for loop' code is not good practice; we are only using it as a pedagogical way to connect the dots with what we have learned about for loops.

##### So, let's try to iterate through the values of each column of the pandas object using a for loop:


In [40]:
# first method: iterate through each column using a for loop
# note that each column of a pandas data frame is array-like (a Series), so we can use a for loop to transform each element

iris_inch_sepal_length = []
for i in iris.sepal_length:
    iris_inch_sepal_length.append(cm_to_in(i))

# we can also write the for loop as a list comprehension
iris_inch_sepal_width = [cm_to_in(i) for i in iris.sepal_width]

iris_inch_petal_length = [cm_to_in(i) for i in iris.petal_length]

iris_inch_petal_width = [cm_to_in(i) for i in iris.petal_width]

##### So we had to iterate through each array using a for loop.
Now, we're going to create a list of lists from the output of each for loop.

In [43]:
# then we should create a list of lists with the converted results
iris_lists = [iris_inch_sepal_length, 
               iris_inch_sepal_width, 
               iris_inch_petal_length, 
               iris_inch_petal_width]


##### Let's then create a dataframe with iris_lists

In [46]:
# after that transform the list of lists into a data frame again
df_iris = pd.DataFrame(iris_lists)

In [47]:
df_iris.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,140,141,142,143,144,145,146,147,148,149
0,2.007874,1.929134,1.850394,1.811024,1.968504,2.125984,1.811024,1.968504,1.732283,1.929134,...,2.637795,2.716535,2.283465,2.677165,2.637795,2.637795,2.480315,2.559055,2.440945,2.322835
1,1.377953,1.181102,1.259843,1.220472,1.417323,1.535433,1.338583,1.338583,1.141732,1.220472,...,1.220472,1.220472,1.062992,1.259843,1.299213,1.181102,0.984252,1.181102,1.338583,1.181102
2,0.551181,0.551181,0.511811,0.590551,0.551181,0.669291,0.551181,0.590551,0.551181,0.590551,...,2.204724,2.007874,2.007874,2.322835,2.244094,2.047244,1.968504,2.047244,2.125984,2.007874
3,0.07874,0.07874,0.07874,0.07874,0.07874,0.15748,0.11811,0.07874,0.07874,0.03937,...,0.944882,0.905512,0.748031,0.905512,0.984252,0.905512,0.748031,0.787402,0.905512,0.708661


##### As we can see, the resulting dataframe has a different shape from the original dataset

* To correct the problem, we have to transpose the values, as we can do in Excel
* To transpose the data we use the attribute `T`

In [49]:
# transpose the data 
df_iris = df_iris.T

In [51]:
# showing the data transposed
df_iris.head()

Unnamed: 0,0,1,2,3
0,2.007874,1.377953,0.551181,0.07874
1,1.929134,1.181102,0.551181,0.07874
2,1.850394,1.259843,0.511811,0.07874
3,1.811024,1.220472,0.590551,0.07874
4,1.968504,1.417323,0.551181,0.07874


In [52]:
# and finally rename the columns
df_iris.columns = numeric_columns

In [53]:
df_iris.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
0,2.007874,1.377953,0.551181,0.07874
1,1.929134,1.181102,0.551181,0.07874
2,1.850394,1.259843,0.511811,0.07874
3,1.811024,1.220472,0.590551,0.07874
4,1.968504,1.417323,0.551181,0.07874
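
The transpose and rename steps can be avoided by handing pandas a mapping from column name to list of values. The sketch below uses the first two converted rows from the output above; `converted_lists`, `df_from_dict` and `df_transposed` are illustrative names.

```python
import pandas as pd

numeric_columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# One list per column: inch values of the first two rows, copied from the output above.
converted_lists = [
    [2.007874, 1.929134],
    [1.377953, 1.181102],
    [0.551181, 0.551181],
    [0.07874, 0.07874],
]

# A dict of column name -> values builds the frame in the right orientation directly.
df_from_dict = pd.DataFrame(dict(zip(numeric_columns, converted_lists)))

# The route taken in the notebook: list of lists, transpose, then rename.
df_transposed = pd.DataFrame(converted_lists).T
df_transposed.columns = numeric_columns

print(df_from_dict)
```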


#### The following method is simpler and is good practice: `apply` runs the function on each column of the pandas dataframe

In [120]:
# method 2: use the apply function
iris_inch = iris_numeric.apply(cm_to_in)

In [121]:
iris_inch.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
0,2.007874,1.377953,0.551181,0.07874
1,1.929134,1.181102,0.551181,0.07874
2,1.850394,1.259843,0.511811,0.07874
3,1.811024,1.220472,0.590551,0.07874
4,1.968504,1.417323,0.551181,0.07874
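
It may help to see what `apply` actually hands to the function. In the sketch below, a hypothetical wrapper `cm_to_in_logged` records the type of each argument it receives; the small `sample` frame and the constant name `CM_TO_IN` are also illustrative. Since the function is plain arithmetic, multiplying the whole dataframe gives the same values.

```python
import pandas as pd

CM_TO_IN = 0.393700787  # conversion factor used by cm_to_in in the notebook


def cm_to_in(x):
    return x * CM_TO_IN


sample = pd.DataFrame({"sepal_length": [5.1, 4.9], "sepal_width": [3.5, 3.0]})

received_types = []


def cm_to_in_logged(column):
    # Record what apply passes in: a whole column at a time, not a single cell.
    received_types.append(type(column).__name__)
    return cm_to_in(column)


via_apply = sample.apply(cm_to_in_logged)
via_arithmetic = sample * CM_TO_IN  # vectorized arithmetic on the whole frame

print(set(received_types))
print(via_apply)
```

Both routes return a new dataframe and leave `sample` unchanged, which matches the idea of treating the transformation as immutable.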


We have just found that the original measurements were off by a constant. Define the global constant `error` and set it to 2. Write a function that uses the global constant and adds it to each cell in the dataframe. Apply this function to `iris_numeric` and save the result in `iris_constant`.

In [122]:
# Define constant below:
error = 2

def add_constant(x):
    # This function adds a global constant to our input.
    # Input: numeric value
    # Output: numeric value
    
    # Your code here:
    return x + error

In [125]:
# now that you know that the 'apply' function can iterate through each dataframe column,
# would you prefer to use the first or the second method from the previous exercise?


In [126]:
iris_constant = iris_numeric.apply(add_constant)

In [127]:
iris_constant.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
0,7.1,5.5,3.4,2.2
1,6.9,5.0,3.4,2.2
2,6.7,5.2,3.3,2.2
3,6.6,5.1,3.5,2.2
4,7.0,5.6,3.4,2.2
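
The exercise asks for a global constant, but `apply` can also forward extra keyword arguments to the function, which avoids relying on global state. A sketch with a hypothetical `add_offset` function and a small made-up `sample` frame:

```python
import pandas as pd


def add_offset(x, offset):
    # Hypothetical variant of add_constant that takes the offset as a parameter.
    return x + offset


sample = pd.DataFrame({"sepal_length": [5.1, 4.9], "petal_width": [0.2, 0.2]})

# Keyword arguments that apply does not recognise are passed on to the function.
sample_constant = sample.apply(add_offset, offset=2)
print(sample_constant)
```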


## What is a generator?

### df.iterrows()
### df.iteritems()
### df.itertuples()

* https://medium.com/@rtjeannier/pandas-101-cont-9d061cb73bfc
* https://stackoverflow.com/questions/7837722/what-is-the-most-efficient-way-to-loop-through-dataframes-with-pandas
* https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6
* https://realpython.com/fast-flexible-pandas/
* https://towardsdatascience.com/different-ways-to-iterate-over-rows-in-a-pandas-dataframe-performance-comparison-dc0d5dcef8fe
* https://towardsdatascience.com/how-to-use-pandas-the-right-way-to-speed-up-your-code-4a19bd89926d
* https://data36.com/python-for-loops-explained-data-science-basics-5/

In [57]:
# note: iteritems was removed in pandas 2.0; on recent versions use iris.items() instead
list_methods = [iris.iteritems(),
                iris.iterrows(),
                iris.itertuples()
               ]

In [60]:
type_list_methods = [type(i) for i in list_methods]

In [61]:
type_list_methods

[generator, generator, map]
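
To see what each of these iteration methods produces, here is a sketch on a two-row made-up frame. It uses `items()`, which yields the same pairs as `iteritems()` and is still available in current pandas; the variable names are illustrative.

```python
import pandas as pd

sample = pd.DataFrame(
    {"sepal_length": [5.1, 4.9], "iris_type": ["Iris-setosa", "Iris-setosa"]}
)

# items() yields (column name, Series) pairs, one per column.
column_names = [name for name, column in sample.items()]

# iterrows() yields (index label, Series) pairs, one per row.
row_labels = [index for index, row in sample.iterrows()]

# itertuples() yields one namedtuple per row; fields are read as attributes.
lengths = [row.sepal_length for row in sample.itertuples()]

print(column_names, row_labels, lengths)
```

All three are lazy in the same sense as the generators from Challenge 1: nothing is produced until a loop or `next` asks for it.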

# Bonus Challenge - Applying Functions to Columns

Read more about applying functions to either rows or columns [here](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html) and write a function that computes the maximum value for each row of `iris_numeric`

In [137]:
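# Your code here:
# note: apply works column by column by default (axis=0), so this returns the maximum of each column;
# the per-row maximum asked for above needs axis=1, i.e. iris_numeric.apply(max, axis=1)
iris_numeric.apply(max)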

sepal_length    7.9
sepal_width     4.4
petal_length    6.9
petal_width     2.5
dtype: float64
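
The output above has one value per column, whereas the exercise asks for one value per row. The `axis` argument of `apply` controls this. A sketch on the first three rows of the dataset (the names `sample_numeric`, `column_max`, `row_max` and `row_max_builtin` are illustrative):

```python
import pandas as pd

# First three rows of the numeric iris columns, copied from the head() output.
sample_numeric = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 4.7],
        "sepal_width": [3.5, 3.0, 3.2],
        "petal_length": [1.4, 1.4, 1.3],
        "petal_width": [0.2, 0.2, 0.2],
    }
)

column_max = sample_numeric.apply(max)           # axis=0 (default): one value per column
row_max = sample_numeric.apply(max, axis=1)      # axis=1: one value per row
row_max_builtin = sample_numeric.max(axis=1)     # built-in equivalent of the line above

print(row_max)
```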

Compute the combined lengths for each row and the combined widths for each row using a function. Assign these values to new columns `total_length` and `total_width`.

In [162]:
# Your code here:
# note: len counts the values in each column (150 rows); it does not add up the measurements
total_length = iris_numeric.apply(len)

In [163]:
total_length

sepal_length    150
sepal_width     150
petal_length    150
petal_width     150
dtype: int64

In [164]:
# note: with axis=1, len counts the values in each row (4 columns); it does not add up the widths
total_width = iris_numeric.apply(len, axis=1)

In [165]:
total_width

0      4
1      4
2      4
3      4
4      4
5      4
6      4
7      4
8      4
9      4
10     4
11     4
12     4
13     4
14     4
15     4
16     4
17     4
18     4
19     4
20     4
21     4
22     4
23     4
24     4
25     4
26     4
27     4
28     4
29     4
      ..
120    4
121    4
122    4
123    4
124    4
125    4
126    4
127    4
128    4
129    4
130    4
131    4
132    4
133    4
134    4
135    4
136    4
137    4
138    4
139    4
140    4
141    4
142    4
143    4
144    4
145    4
146    4
147    4
148    4
149    4
Length: 150, dtype: int64
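
The outputs above count elements rather than combine measurements. One possible reading of the exercise, sketched on the first three rows of the dataset, adds the sepal and petal values of each row with a function applied along `axis=1`. The helper `row_total` and the frame `sample_numeric` are hypothetical names.

```python
import pandas as pd

# First three rows of the numeric iris columns, copied from the head() output.
sample_numeric = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 4.7],
        "sepal_width": [3.5, 3.0, 3.2],
        "petal_length": [1.4, 1.4, 1.3],
        "petal_width": [0.2, 0.2, 0.2],
    }
)


def row_total(row, first, second):
    # Hypothetical helper: add two named measurements of a single row.
    return row[first] + row[second]


# axis=1 passes one row at a time; args forwards the two column names to row_total.
sample_numeric["total_length"] = sample_numeric.apply(
    row_total, axis=1, args=("sepal_length", "petal_length")
)
sample_numeric["total_width"] = sample_numeric.apply(
    row_total, axis=1, args=("sepal_width", "petal_width")
)

print(sample_numeric[["total_length", "total_width"]])
```

The same result can be had without `apply` by adding the columns directly, for example `sample_numeric["sepal_length"] + sample_numeric["petal_length"]`.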