## Penguins Predictor

Welcome to your first ipynb (interactive Python notebook).

In this ipynb, you'll learn how to load data from a dataset, visualize it, and make some predictions about it.
Then, we will measure how accurate your predictions are and you can try to improve your algorithm from there.

TODO 1: Fill out your name and period
- Name: Tanner Young
- Period: 2

### Setup
To get started, we will need some libraries. Click the play button below to import our libraries. Type `pip install pandas` and `pip install plotly` in the terminal if they are not installed yet. 

In [364]:
import pandas as pd # pandas is a Python library for data analysis and manipulation
import plotly.express as px # plotly is a Python library for creating interactive and customizable data visualizations

Next, we will load our dataset. Update the dataset name below. Your dataset must be in the same directory as your ipynb.

In [365]:
data = pd.read_csv("penguins.csv") #loads our dataset
print(data.head()) #shows the first 5 rows of our data

  species     island  culmen_length_mm  culmen_depth_mm  flipper_length_mm  \
0  Adelie  Torgersen              39.1             18.7              181.0   
1  Adelie  Torgersen              39.5             17.4              186.0   
2  Adelie  Torgersen              40.3             18.0              195.0   
3  Adelie  Torgersen               NaN              NaN                NaN   
4  Adelie  Torgersen              36.7             19.3              193.0   

   body_mass_g     sex  
0       3750.0    MALE  
1       3800.0  FEMALE  
2       3250.0  FEMALE  
3          NaN     NaN  
4       3450.0  FEMALE  


# The Challenge
Predicting Penguins

Our problem: given some unidentified penguin, can we determine which species it is from the data we have?

For example, we find an unknown penguin from Dream island with a body_mass_g of 4000 g and a culmen_length_mm of 45 mm. Which species is it?

## Step 1: Analyzing the Data

Let's look at some information about our data. Here are some useful commands to try:

- `data.columns` → list of column names
- `data.head()` → shows the first 5 rows
- `data.tail()` → shows the last 5 rows
- `data.info()` → summary of columns, types, and missing values
- `data.describe()` → quick stats (mean, std, min, quartiles, max) for numeric columns
- `data.shape` → number of rows and columns
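
To see what a few of these commands return without needing `penguins.csv`, here is a minimal sketch that rebuilds the five rows printed by `data.head()` above as a small DataFrame. The name `sample` is an assumption made for this example, and `isna().sum()` is an extra pandas call that is not in the list above; it counts the missing values in each column.

```python
import pandas as pd

# The five rows printed by data.head() above, rebuilt by hand so this runs
# without penguins.csv. None becomes a missing value (NaN) in numeric columns.
sample = pd.DataFrame({
    "species": ["Adelie"] * 5,
    "island": ["Torgersen"] * 5,
    "culmen_length_mm": [39.1, 39.5, 40.3, None, 36.7],
    "culmen_depth_mm": [18.7, 17.4, 18.0, None, 19.3],
    "flipper_length_mm": [181.0, 186.0, 195.0, None, 193.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0, None, 3450.0],
    "sex": ["MALE", "FEMALE", "FEMALE", None, "FEMALE"],
})

print(sample.shape)         # (number of rows, number of columns)
print(sample.dtypes)        # the type of each column
print(sample.isna().sum())  # how many values are missing in each column
print(sample["body_mass_g"].min(), sample["body_mass_g"].max())
```

Running the same calls on the full `data` DataFrame should help answer the questions in the next cell.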


In [366]:
# TODO 3 Try out the commands above
print(data.columns)


Index(['species', 'island', 'culmen_length_mm', 'culmen_depth_mm',
       'flipper_length_mm', 'body_mass_g', 'sex'],
      dtype='object')


In [367]:
#TODO 6: Find the following information about our dataset using the commands listed above 

#1. Total number of features? (aka # columns)
num_features = 7
#2. What types of data are there? (int, float, null, etc)
data_types = ["float", "string", "null"]  # the numeric columns hold floats (e.g. 39.1, 3750.0), not ints
#3. Minimum and maximum values?
maximum_body_mass_g = 6300
minimum_body_mass_g = 2700

maximum_culmen_depth_mm = 21.5
minimum_culmen_depth_mm = 13.1

#4. How much data is there? (aka # rows)
num_samples = 344

# Finding patterns in our data

Data visualization helps us understand the full picture of our data by representing it visually and showing us patterns that we might not be able to see by reading through each row.

Fill in the missing code below to generate a scatter plot of our penguins dataset.

In [368]:
#TODO 7 choose features (such as body_mass_g, island, etc.) for x, y, color, and symbol
fig = px.scatter(
    data, x="body_mass_g", y="flipper_length_mm", color="species"
)
fig.show()

Plotly supports a large variety of graphs. Check out their [documentation](https://plotly.com/python/) and choose another graph to visualize below.

In [369]:
# TODO 8 Look at the Basic and Statistical Charts in the plotly docs https://plotly.com/python/ and choose a new graph to display. Display it below. 


fig = px.scatter(data, x="culmen_length_mm", y="culmen_depth_mm", color="species")
fig.show()

# Step 2 Choosing an Algorithm (your HW)

Now it's time to create our "naive" algorithm to predict the species of an unknown penguin.

In [370]:
# TODO 9: Implement the function predict_penguin, which takes in features of a penguin and returns a predicted species.
# You do not have to use ALL of the features included in the parameters. Just choose a couple to start with.
# You will be able to test your algorithm in the next code block.
def predict_penguin(island, culmen_length_mm, culmen_depth_mm, flipper_length_mm, body_mass_g, sex):
    # Gentoo rule: Biscoe island, long flippers, heavy body, shallow but long culmen.
    if island == "Biscoe" and flipper_length_mm > 206 and body_mass_g > 3950 and culmen_depth_mm < 17.3 and culmen_length_mm > 40.9:
        return "Gentoo"
    # Chinstrap rule: Dream island with a long culmen.
    elif island == "Dream" and culmen_length_mm > 45:
        return "Chinstrap"
    # Everything else defaults to Adelie.
    else:
        return "Adelie"
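
One detail worth knowing when writing threshold rules like `flipper_length_mm > 206`: the dataset contains rows with missing measurements (row 3 in the `data.head()` output is all NaN), and any ordering comparison with NaN evaluates to False. The sketch below shows that behavior in plain Python. The helper `has_missing` is hypothetical and not part of the assignment; it is only one possible way to detect such rows.

```python
import math

nan = float("nan")  # how pandas represents a missing numeric measurement

# Every ordering comparison with NaN is False, so a rule such as
# "flipper_length_mm > 206" can never match a row with missing data.
print(nan > 206)   # False
print(nan < 17.3)  # False


def has_missing(*values):
    """Hypothetical helper: True if any numeric argument is NaN."""
    return any(isinstance(v, float) and math.isnan(v) for v in values)


print(has_missing(39.1, 18.7, 181.0))  # False
print(has_missing(nan, nan, nan))      # True
```

In a rule set that ends with a default `return`, a penguin whose measurements are all missing fails every threshold test and receives the default species.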





# Step 3 Training (skip this time)

In this version of our naive penguins predictor, we will skip this step because we are using your algorithm instead of a real ML algorithm.
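
For contrast, here is a rough sketch of the idea behind training: letting code choose a threshold instead of picking it by hand. It is not a real ML algorithm and is not part of the assignment. The `samples` list holds made-up values for illustration only (they are not rows from `penguins.csv`), and the names `accuracy_for` and `best_threshold` are assumptions made for this example.

```python
# Made-up (culmen_length_mm, species) pairs for illustration only;
# these are NOT rows from penguins.csv.
samples = [
    (38.0, "Adelie"), (39.5, "Adelie"), (41.0, "Adelie"), (46.0, "Adelie"),
    (44.0, "Chinstrap"), (47.5, "Chinstrap"), (49.0, "Chinstrap"), (51.0, "Chinstrap"),
]


def accuracy_for(threshold):
    """Percent correct for the rule: longer than threshold -> Chinstrap."""
    correct = 0
    for length, species in samples:
        guess = "Chinstrap" if length > threshold else "Adelie"
        if guess == species:
            correct += 1
    return round(correct / len(samples) * 100, 2)


# Try every whole-number threshold from 38 to 51 and keep the first best one.
best_threshold = max(range(38, 52), key=accuracy_for)
print(best_threshold, accuracy_for(best_threshold))  # 41 87.5
```

The loop does the same guess-and-check a person would do by hand, just faster. Real training algorithms search over many features and thresholds at once.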

# Step 4 Testing and Tuning

In the next step we will compute our accuracy. To compute accuracy, we divide the number of correct predictions by the number of total predictions. 

In [371]:
"""
TODO 10 Implement compute_accuracy, which takes in a list of predictions and a list of correct answers.
Compute the total number of correct predictions.
Then compute accuracy by dividing the correct predictions by the total number of samples.
Return the accuracy as a percentage rounded to 2 decimal places.

You may assume that the length of predictions matches the length of answers.

For example: compute_accuracy(["Gentoo", "Gentoo"], ["Gentoo", "Chinstrap"]) -> 50.00
"""

def compute_accuracy(predictions, answers):
    # Count the positions where the prediction matches the correct answer.
    correct = 0
    for prediction, answer in zip(predictions, answers):
        if prediction == answer:
            correct += 1

    # Accuracy = correct predictions / total predictions, as a percentage.
    accuracy = (correct / len(predictions)) * 100
    return round(accuracy, 2)

    


In [372]:
# Let's make sure compute_accuracy is implemented correctly. If this block passes, you got it correct.
assert compute_accuracy(["Gentoo", "Gentoo"], ["Gentoo", "Chinstrap"]) == 50.00
assert compute_accuracy(["Gentoo", "Adelie", "Chinstrap"], ["Chinstrap", "Adelie", "Chinstrap"]) == 66.67

In [373]:
# Run this final block to compute the final accuracy of your algorithm
predicted_species = [predict_penguin(
            row["island"], 
            row["culmen_length_mm"], 
            row["culmen_depth_mm"], 
            row["flipper_length_mm"], 
            row["body_mass_g"], 
            row["sex"]
        ) for _, row in data.iterrows()]
actual_species = data["species"]


print(f"Your algorithm is {compute_accuracy(predicted_species, actual_species)} % accurate")

Your algorithm is 96.51 % accurate
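
A single overall number does not show which species the rules get wrong. As a possible aid for tuning, the sketch below breaks accuracy down by the true species. The helper `accuracy_by_species` is hypothetical (it is not part of the assignment), and the two short lists are made up just to show the shape of the result.

```python
from collections import Counter


def accuracy_by_species(predictions, answers):
    """Hypothetical helper: percent correct for each true species."""
    totals = Counter()   # how many penguins of each species there are
    correct = Counter()  # how many of those were predicted correctly
    for prediction, answer in zip(predictions, answers):
        totals[answer] += 1
        if prediction == answer:
            correct[answer] += 1
    return {
        species: round(correct[species] / totals[species] * 100, 2)
        for species in totals
    }


# Tiny made-up lists, just to show the shape of the result.
predictions = ["Gentoo", "Adelie", "Adelie", "Chinstrap"]
answers = ["Gentoo", "Adelie", "Chinstrap", "Chinstrap"]
print(accuracy_by_species(predictions, answers))
# {'Gentoo': 100.0, 'Adelie': 100.0, 'Chinstrap': 50.0}
```

In the notebook, calling it as `accuracy_by_species(predicted_species, actual_species)` should show which species' rule is worth adjusting first.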


# Reflection
Answer the three questions below in this markdown block.

Question 0: What was your final accuracy?
96.51%



Question 1: What was your strategy for predicting the penguin species?
I used the minimum and maximum values of each feature to sort the penguins into ranges by species. I then tweaked each threshold individually to find the best values.




Question 2: What are the strengths and limitations of your strategy?
The strength is that it generally finds good values to use for each variable. The limitation is that the process takes a while because it relies on repeated guessing and checking.


