# Building a script with functions

#### The Iris dataset (a well-known test dataset) has five columns:

1. sepal length in cm 
2. sepal width in cm 
3. petal length in cm 
4. petal width in cm 
5. class names
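
For orientation, here is a small sketch that parses one sample line in the same comma-separated layout and shows which index holds which column. The variable names are illustrative only.

```python
import csv
import io

# one sample line in the same comma-separated layout as the Iris file
sample = io.StringIO("5.1,3.5,1.4,0.2,Iris-setosa\n")

row = next(csv.reader(sample))

sepal_length = float(row[0])  # column 1: sepal length in cm
class_name = row[4]           # column 5: class name

print(sepal_length, class_name)
```

Note that `csv.reader` returns every field as a string, so numeric columns need a `float()` conversion before any arithmetic.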

#### Our task: 
- Read in the [Iris dataset](https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data)
- Figure out how many classes of Iris are found in the file
- Figure out the class name and count of each Iris in the file
- Figure out the average sepal length for each class

## Pseudocode
Use commented blocks called pseudocode to outline what you need to do.
- Start at a high level
- Then refine to be closer and closer to programming syntax
- Add more and more detail as you go
- Can be used as comments on your code, saving you a step later on

### Step 1: High level outline

In [1]:
# read in file
# find unique count of iris class column
# track the name of each iris class and count occurrences
# calculate average sepal length

### Step 2: If easy, solve

In [39]:
# set the iris file path (hard-coded for now; a script could take it from sys.argv)
import sys

in_file = "./iris.csv"

In [40]:
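# read in file
import csv

# csv.reader is a one-shot iterator, so store the rows in a list
# that the cells below can loop over more than once
with open(in_file, newline='') as f:
    my_file = list(csv.reader(f, delimiter=','))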

### Step 3: If it's not easy, break it into smaller chunks until you can solve it

In [None]:
# find unique count of iris class column
    # iris class = row[4]
    # loop through each row
    # add row[4] to a list
    # take set of that list
    
# track the name of each iris class and count occurrences
    # use a dictionary to store name and count info
    # loop through each row to track name and count

# calculate average sepal length
    # average = sum of list/length of list
    # sepal length = row[0]
    # pull out sepal length and create as list
    # get list sum
    # get list length
    # divide them
    
# turn all the above into functions

### Step 4: Solve Each Piece

In [7]:
# find unique count of iris class column
# iris class = row[4]

class_list = []
# loop through each row
for row in my_file:
    # add row[4] to a list
    class_list.append(row[4])

# take set of that list
unique_classes = set(class_list)
print(len(unique_classes))

3


In [11]:
# track the name of each iris class and count occurrences
# use a dictionary to store name and count info
iris_class_dict = {}

# loop through each row to track name and count
for row in my_file:
    class_name = row[4]

    # for every row, check if the class_name is in the dictionary
    if class_name in iris_class_dict:
        iris_class_dict[class_name] = iris_class_dict[class_name] + 1

    # if not, add it, and set the count to 1
    else:
        iris_class_dict[class_name] = 1

In [12]:
print(iris_class_dict)

{'Iris-virginica': 50, 'Iris-setosa': 50, 'Iris-versicolor': 50}
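
The same tally can also be produced with `collections.Counter` from the standard library. This is a small sketch using a few sample rows in place of the file; the `rows` list is a stand-in, not the full dataset.

```python
from collections import Counter

# a few sample rows standing in for the parsed file
rows = [
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["7.0", "3.2", "4.7", "1.4", "Iris-versicolor"],
    ["6.3", "3.3", "6.0", "2.5", "Iris-virginica"],
    ["4.9", "3.0", "1.4", "0.2", "Iris-setosa"],
]

# Counter does the "is it in the dictionary yet?" check for us
class_counts = Counter(row[4] for row in rows)

print(dict(class_counts))
```

Writing the dictionary loop by hand first is still worthwhile, since it makes clear what `Counter` is doing.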


In [41]:
# calculate average sepal length
# average = sum of list/length of list
# sepal length = row[0]
# pull out sepal length and create as list
sepal_length_list = []
# loop through each row
for row in my_file:
    # add row[0] to a list
    sepal_length_list.append(float(row[0]))

# get list sum
sepal_sum = sum(sepal_length_list)
# get list length
sepal_list_length = len(sepal_length_list)
# divide them
sepal_avg = sepal_sum/sepal_list_length

In [44]:
print(sepal_sum)
print(sepal_list_length)
print(sepal_avg)

876.5000000000002
150
5.843333333333335
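
The task list asks for the average per class, while the cell above computes one average over all rows. Below is a minimal sketch of a per-class version, assuming the rows have already been parsed into lists of strings. The sample `rows` and the names `lengths_by_class` and `avg_by_class` are illustrative, not from the original notebook.

```python
from collections import defaultdict

# a few sample rows standing in for the parsed file
rows = [
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["4.9", "3.0", "1.4", "0.2", "Iris-setosa"],
    ["7.0", "3.2", "4.7", "1.4", "Iris-versicolor"],
]

# group the sepal lengths (row[0]) by class name (row[4])
lengths_by_class = defaultdict(list)
for row in rows:
    lengths_by_class[row[4]].append(float(row[0]))

# average = sum of list / length of list, once per class
avg_by_class = {}
for class_name, lengths in lengths_by_class.items():
    avg_by_class[class_name] = sum(lengths) / len(lengths)

print(avg_by_class)
```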


### Step 5: Turn into functions
- General Rule: Each function should perform one distinct task
    - Easier to troubleshoot
    - Easier to re-use
- Don't forget: functions need to get input parameters and return values
- Later, we will see how you can create a class to hold functions that all work for a similar purpose
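
To illustrate the point about input parameters and return values, here is a sketch of the sepal-length average from Step 4 written as a function. The name `column_average` and its parameters are hypothetical; it takes already-parsed rows rather than a file path.

```python
def column_average(rows, col_number):
    """Return the average of one numeric column (hypothetical helper)."""
    # input parameters: the rows and which column to average
    values = [float(row[col_number]) for row in rows]
    # return value: the result is handed back to the caller, not printed
    return sum(values) / len(values)


# tiny made-up rows, just to exercise the function
rows = [["5.0", "3.0"], ["6.0", "4.0"]]
print(column_average(rows, 0))
print(column_average(rows, 1))
```

Because the column number is a parameter, the same function works for sepal width or petal length without any changes.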

In [None]:
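# set the iris file path (hard-coded for now; a script could take it from sys.argv)
import sys

in_file = "./iris.csv"

# read in file
import csv

# csv.reader is a one-shot iterator, so store the rows in a list
# that can be looped over more than once
with open(in_file, newline='') as f:
    my_file = list(csv.reader(f, delimiter=','))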

In [54]:
import sys
import csv

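def read_in_csv(file_path):
    # read all rows into a list so the file can be closed
    # and the rows can be looped over more than once
    with open(file_path, newline='') as f:
        my_file = list(csv.reader(f))
    return my_file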

In [55]:
now_my_file = read_in_csv("./iris.csv")

In [None]:
# find unique count of iris class column
# iris class = row[4]

class_list = []
# loop through each row
for row in my_file:
    # add row[4] to a list
    class_list.append(row[4])

# take set of that list
unique_classes = set(class_list)
print(len(unique_classes))

In [52]:
def get_column_unique(in_file, col_number):
    col_list = []
    my_file = read_in_csv(in_file)
    for row in my_file:
        col_list.append(row[col_number])
    unique_vals = set(col_list)
    unique_length = len(unique_vals)
    return unique_length    

In [64]:
get_column_unique("./iris.csv", 4)

3

In [56]:
# track the name of each iris class and count occurrences
# use a dictionary to store name and count info
iris_class_dict = {}

# loop through each row to track name and count
for row in my_file:
    class_name = row[4]

    # for every row, check if the class_name is in the dictionary
    if class_name in iris_class_dict:
        iris_class_dict[class_name] = iris_class_dict[class_name] + 1

    # if not, add it, and set the count to 1
    else:
        iris_class_dict[class_name] = 1

In [62]:
def count_things(in_file, col_number):
    my_dict = {}
    my_file = read_in_csv(in_file)
    
    # loop through each row to track name and count
    for row in my_file:
        key_name = row[col_number]

        # for every row, check if the key_name is in the dictionary
        if key_name in my_dict:
            my_dict[key_name] = my_dict[key_name] + 1
        
        # if not, add it, and set the count to 1
        else:
            my_dict[key_name] = 1
    
    return my_dict

In [63]:
count_things("./iris.csv", 4)

{'Iris-setosa': 50, 'Iris-versicolor': 50, 'Iris-virginica': 50}

## Put it all together
[Let's see how this all fits together in a script!](iris_script.py)
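
The linked script is not reproduced here. As a rough sketch of the general shape such a script could take (an assumption, not the contents of `iris_script.py`), the pieces might be arranged as below. The function names `read_rows`, `count_column`, `column_average`, and `main` are hypothetical, and skipping empty rows is an added precaution in case the file ends with blank lines.

```python
import csv
import sys


def read_rows(file_path):
    """Read the CSV file into a list of rows, skipping empty lines."""
    with open(file_path, newline="") as f:
        return [row for row in csv.reader(f) if row]


def count_column(rows, col_number):
    """Count how often each value appears in one column."""
    counts = {}
    for row in rows:
        key = row[col_number]
        counts[key] = counts.get(key, 0) + 1
    return counts


def column_average(rows, col_number):
    """Return the average of one numeric column."""
    values = [float(row[col_number]) for row in rows]
    return sum(values) / len(values)


def main(in_file):
    rows = read_rows(in_file)
    class_counts = count_column(rows, 4)
    print("number of classes:", len(class_counts))
    print("count per class:", class_counts)
    print("average sepal length:", column_average(rows, 0))


if __name__ == "__main__":
    # hypothetical command line: python iris_script.py ./iris.csv
    if len(sys.argv) != 2:
        print("usage: python iris_script.py <path-to-iris-file>")
    else:
        main(sys.argv[1])
```

Reading the file once in `main` and passing the rows to each function avoids reopening the file for every calculation.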