# Before you start:
- Read the README.md file
- Comment as much as you can and use the resources in the README.md file
- Happy learning!

In [1]:
# importing libraries
import numpy as np
import pandas as pd

# Challenge 1 - Iterators, Generators and `yield`. 

An iterator in Python is an object that represents a stream of data. We traverse the iterator one value at a time, and once a value has been consumed we cannot go back to it. All iterators support the built-in `next` function, which returns the next value in the stream. We can create an iterator from any iterable using the built-in `iter` function. Below is an example of an iterator.

In [2]:
# We first define our iterator:

iterator = iter([1,2,3])

# We can now iterate through the object using the next function

print(next(iterator))

1


In [3]:
# We continue to iterate through the iterator.

print(next(iterator))

2


In [4]:
print(next(iterator))

3


In [5]:
# After we have iterated through all elements, calling next again raises a StopIteration exception

print(next(iterator))

StopIteration: 
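
As a minimal sketch, there are two common ways to deal with an exhausted iterator: catching `StopIteration`, or passing a default value as the second argument to `next`. The variable names below are illustrative only.

```python
# Build a fresh iterator over a short list.
numbers = iter([1, 2, 3])

collected = []
while True:
    try:
        collected.append(next(numbers))
    except StopIteration:
        # Raised once the iterator has no more values to give.
        break

print(collected)

# next() accepts an optional default that is returned instead of raising.
exhausted_value = next(numbers, None)
print(exhausted_value)
```

A `for` loop performs the same `try`/`except` handling internally, which is why it ends quietly when the iterator runs out.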

In [6]:
# We can also iterate through an iterator using a for loop like this:
# Note: we cannot go back directly in an iterator once we have traversed through the elements. 
# This is why we are redefining the iterator below

iterator = iter([1,2,3])

for i in iterator:
    print(i)

1
2
3


In the cell below, write a function that takes an iterator and returns the first element in the iterator that is divisible by 2. Assume that all iterators contain only numeric data. If no element is divisible by 2, return zero.

In [17]:
# This function takes an iterable and returns the first element that is divisible by 2 and zero otherwise
# Input: Iterable
# Output: Integer
    
# Sample Input: iter([1,2,3])
# Sample Output: 2
    
# Your code here:
    
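def divisible2(iterator):
    # Walk through the iterator and return the first even element
    for i in iterator:
        if i % 2 == 0:
            return i
    # No element was divisible by 2
    return 0

iterator = iter([1,2,3])
print(divisible2(iterator))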

2
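
The same search can also be sketched with a generator expression and the default argument of `next`. This is one possible alternative rather than the expected solution, and `first_even` is a hypothetical name.

```python
def first_even(iterator):
    """Return the first element divisible by 2, or 0 if there is none."""
    # The generator expression filters lazily; next() stops at the first match
    # and falls back to the default (0) when nothing matches.
    return next((value for value in iterator if value % 2 == 0), 0)


first_found = first_even(iter([1, 2, 3]))
none_found = first_even(iter([1, 3, 5]))
print(first_found, none_found)
```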


### Generators

Writing your own iterator class takes some boilerplate, since you have to implement the `__iter__` and `__next__` methods. Generators are functions that create iterators for us. The difference between a regular function and a generator is that instead of using `return`, we use `yield`. For example, below we have a generator function that returns an iterator containing the numbers 0 through n - 1:

In [23]:
def firstn(n):
    number = 0
    while number < n:
        yield number
        number = number + 1

If we pass 5 to the function, we will see that we have an iterator containing the numbers 0 through 4.

In [24]:
iterator = firstn(5)

for i in iterator:
    print(i)

0
1
2
3
4
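
For comparison, here is a rough sketch of what a class-based iterator with the same behavior as `firstn` could look like. The class name `FirstN` is made up for illustration; the point is how much bookkeeping `yield` saves.

```python
class FirstN:
    """Class-based counterpart of the firstn generator (illustrative name)."""

    def __init__(self, n):
        self.n = n
        self.number = 0

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Signal exhaustion explicitly; a generator does this automatically.
        if self.number >= self.n:
            raise StopIteration
        current = self.number
        self.number += 1
        return current


class_based = list(FirstN(5))
print(class_based)
```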


In the cell below, create a generator that takes a number and returns an iterator containing all even numbers between 0 and the number you passed to the generator.

In [31]:
def even_iterator(n):
    # This function produces an iterator containing all even numbers between 0 and n
    # Input: integer
    # Output: iterator
    
    # Sample Input: 5
    # Sample Output: iter([0, 2, 4])
    
    # Your code here:
    number = 0
    while number < n:
        if number%2==0:
            yield number
        number = number + 1

In [32]:
iterator = even_iterator(5)

for i in iterator:
    print(i)

0
2
4
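
Generators are lazy: the function body only runs when a value is requested. The sketch below shows this with a hypothetical variant, `even_step`, that advances two at a time instead of testing every number.

```python
def even_step(n):
    """Yield the even numbers from 0 up to (but not including) n."""
    number = 0
    while number < n:
        yield number
        number += 2


evens = even_step(5)
first = next(evens)      # the body runs only up to the first yield
remaining = list(evens)  # consumes whatever is left
print(first, remaining)
```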


# Challenge 2 - Applying Functions to DataFrames

In this challenge, we will look at how to transform cells or entire columns at once.

First, let's load a dataset. We will download the famous Iris classification dataset in the cell below.

In [45]:
# note: the data is stored on a website, not locally
file_path = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

In [46]:
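# import the iris data using the read_csv function from pandas
# note: iris.data has no header row, so with the default settings read_csv uses
# the first record as column names; the outputs below therefore show 149 rows
data = pd.read_csv(file_path)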
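
The effect of a missing header row can be sketched without downloading anything. The example below assumes two sample records in the same headerless layout as `iris.data` and reads them from an in-memory string; `header=None` and `names=` are standard `read_csv` parameters, while the variable names are illustrative.

```python
import io

import pandas as pd

# Two sample records in the headerless layout of iris.data.
raw = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"
columns = ['sepal_length', 'sepal_width', 'petal_length',
           'petal_width', 'iris_type']

# Default settings: the first record is consumed as the header row.
with_default = pd.read_csv(io.StringIO(raw))

# header=None keeps every record as data; names= supplies the column labels.
with_names = pd.read_csv(io.StringIO(raw), header=None, names=columns)

print(len(with_default), len(with_names))
```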

In [48]:
# pd.read_csv returns a pandas DataFrame

In [49]:
data.columns = ['sepal_length', 'sepal_width', 'petal_length',
           'petal_width','iris_type']

In [59]:
# checking the type of iris
data['iris_type'].head()

0    Iris-setosa
1    Iris-setosa
2    Iris-setosa
3    Iris-setosa
4    Iris-setosa
Name: iris_type, dtype: object

Let's look at the dataset using the `head` function.

In [60]:
# Your code here:
data.head()

| | sepal_length | sepal_width | petal_length | petal_width | iris_type |
|---|---|---|---|---|---|
| 0 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 2 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 3 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 4 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |


Next, we will write a function that converts centimeters to inches in the cell below. Recall that 1cm = 0.393701in.

In [61]:
def cm_to_in(cm):
    # This function takes in a numeric value in centimeters and converts it to inches
    # Input: numeric value
    # Output: float
    
    # Sample Input: 1.0
    # Sample Output: 0.393701
    
    # Your code here:
    inches=cm*0.393701
    return inches

In [65]:
cm_to_in(1.0)

0.393701

Now convert all columns in `iris_numeric` to inches in the cell below. We like to think of functional transformations as immutable. Therefore, save the transformed data in a dataframe called `iris_inch`.

In [93]:
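# Select the numeric columns and convert every value from centimeters to inches.
# Note: the exercise asks for `iris_inch`; this solution stores the converted
# values in `iris_numeric`, which the later cells reuse.
iris_numeric = data[['sepal_length', 'sepal_width', 'petal_length',
                     'petal_width']].apply(cm_to_in)
iris_numeric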
    

| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 1.929135 | 1.181103 | 0.551181 | 0.078740 |
| 1 | 1.850395 | 1.259843 | 0.511811 | 0.078740 |
| 2 | 1.811025 | 1.220473 | 0.590552 | 0.078740 |
| 3 | 1.968505 | 1.417324 | 0.551181 | 0.078740 |
| 4 | 2.125985 | 1.535434 | 0.669292 | 0.157480 |
| ... | ... | ... | ... | ... |
| 144 | 2.637797 | 1.181103 | 2.047245 | 0.905512 |
| 145 | 2.480316 | 0.984253 | 1.968505 | 0.748032 |
| 146 | 2.559057 | 1.181103 | 2.047245 | 0.787402 |
| 147 | 2.440946 | 1.338583 | 2.125985 | 0.905512 |
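
To illustrate the idea of an immutable transformation, the sketch below converts a small stand-in DataFrame (made-up column subset, values in centimeters) and leaves the input untouched. It also shows that plain arithmetic on the frame should give the same values as `apply`; the names `sample_cm` and `sample_inch` are assumptions for this example.

```python
import pandas as pd

CM_TO_IN = 0.393701


def cm_to_in(cm):
    # Works for a single number and for a whole pandas Series alike.
    return cm * CM_TO_IN


# Small stand-in for the numeric iris columns (values in centimeters).
sample_cm = pd.DataFrame({'sepal_length': [4.9, 4.7],
                          'sepal_width': [3.0, 3.2]})

# apply calls cm_to_in once per column, passing each column as a Series,
# and returns a new DataFrame.
sample_inch = sample_cm.apply(cm_to_in)

# Plain arithmetic broadcasts over the whole frame.
sample_inch_vectorized = sample_cm * CM_TO_IN

print(sample_inch)
print(sample_cm)  # the original frame still holds centimeters
```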


## What is a generator?

### df.iterrows()
### df.iteritems()
### df.itertuples()

* https://medium.com/@rtjeannier/pandas-101-cont-9d061cb73bfc
* https://stackoverflow.com/questions/7837722/what-is-the-most-efficient-way-to-loop-through-dataframes-with-pandas
* https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6
* https://realpython.com/fast-flexible-pandas/
* https://towardsdatascience.com/different-ways-to-iterate-over-rows-in-a-pandas-dataframe-performance-comparison-dc0d5dcef8fe
* https://towardsdatascience.com/how-to-use-pandas-the-right-way-to-speed-up-your-code-4a19bd89926d
* https://data36.com/python-for-loops-explained-data-science-basics-5/

# Challenge 3 - Applying Functions to Columns

Read more about applying functions to either rows or columns [here](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html) and write a function that computes the maximum value for each row of `iris_numeric`.

In [95]:
# for each row:
def maximum_value(iris_numeric):
    # axis=1 takes the maximum across the columns of each row
    iris_numeric2 = iris_numeric.max(axis=1)
    return iris_numeric2
maximum_value(iris_numeric)

0      1.929135
1      1.850395
2      1.811025
3      1.968505
4      2.125985
         ...   
144    2.637797
145    2.480316
146    2.559057
147    2.440946
148    2.322836
Length: 149, dtype: float64

In [86]:
# for each column
iris_numeric3 = iris_numeric.max()
print(iris_numeric3)

sepal_length    3.110238
sepal_width     1.732284
petal_length    2.716537
petal_width     0.984253
dtype: float64
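
The row-wise and column-wise results above can also be produced with `DataFrame.apply` and its `axis` parameter. The following is a small sketch on made-up data, not the iris values.

```python
import pandas as pd

sample = pd.DataFrame({'a': [1.0, 5.0], 'b': [4.0, 2.0]})

# axis=1 passes each row to the function; the result is indexed by row label.
row_max = sample.apply(lambda row: row.max(), axis=1)

# axis=0 (the default) passes each column; the result is indexed by column name.
col_max = sample.apply(lambda col: col.max(), axis=0)

print(row_max.tolist(), col_max.tolist())
```

For a built-in reduction such as the maximum, calling `sample.max(axis=1)` directly is shorter; `apply` is mainly useful when the per-row logic has no ready-made method.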


Compute the combined lengths for each row and the combined widths for each row using a function. Assign these values to new columns `total_length` and `total_width`.

In [97]:
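# Your code here:
def funcao(iris_numeric):
    # Sum the two length columns and the two width columns row by row (axis=1).
    # Note: this adds the new columns to the DataFrame that is passed in.
    iris_numeric['total_length'] = iris_numeric[['sepal_length', 'petal_length']].sum(axis=1)
    iris_numeric['total_width'] = iris_numeric[['sepal_width', 'petal_width']].sum(axis=1)
    return iris_numeric

funcao(iris_numeric)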

| | sepal_length | sepal_width | petal_length | petal_width | total_length | total_width |
|---|---|---|---|---|---|---|
| 0 | 1.929135 | 1.181103 | 0.551181 | 0.078740 | 2.480316 | 1.259843 |
| 1 | 1.850395 | 1.259843 | 0.511811 | 0.078740 | 2.362206 | 1.338583 |
| 2 | 1.811025 | 1.220473 | 0.590552 | 0.078740 | 2.401576 | 1.299213 |
| 3 | 1.968505 | 1.417324 | 0.551181 | 0.078740 | 2.519686 | 1.496064 |
| 4 | 2.125985 | 1.535434 | 0.669292 | 0.157480 | 2.795277 | 1.692914 |
| ... | ... | ... | ... | ... | ... | ... |
| 144 | 2.637797 | 1.181103 | 2.047245 | 0.905512 | 4.685042 | 2.086615 |
| 145 | 2.480316 | 0.984253 | 1.968505 | 0.748032 | 4.448821 | 1.732284 |
| 146 | 2.559057 | 1.181103 | 2.047245 | 0.787402 | 4.606302 | 1.968505 |
| 147 | 2.440946 | 1.338583 | 2.125985 | 0.905512 | 4.566932 | 2.244096 |
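
If the input DataFrame should stay unchanged, one option is `DataFrame.assign`, which returns a new frame with the extra columns. The sketch below uses made-up values, and `add_totals` is a hypothetical helper name.

```python
import pandas as pd

sample = pd.DataFrame({
    'sepal_length': [1.9, 1.8], 'petal_length': [0.5, 0.6],
    'sepal_width': [1.2, 1.3], 'petal_width': [0.1, 0.2],
})


def add_totals(df):
    """Return a new frame with total_length and total_width columns."""
    return df.assign(
        total_length=df['sepal_length'] + df['petal_length'],
        total_width=df['sepal_width'] + df['petal_width'],
    )


with_totals = add_totals(sample)
print(list(sample.columns))       # the input frame keeps its original columns
print(list(with_totals.columns))
```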
