# Zybooks includes a case study on the Palmer Penguin dataset. Be sure to complete the interactive reading in your Zybooks before completing this notebook.

### Steps: <br>
1. Complete the Palmer Penguin Case Study (Interactive) in your Zybooks.
2. Follow the instructions in this notebook. Click the execution arrow to the left of a code cell to run it.
3. Questions that you need to answer will appear in a markdown or text cell. Place your answer in the cell (double click the cell to open).
4. Questions that require code have a code cell immediately below the markdown or text cell. Enter and run your code in that cell, adding more code cells if needed. Draw on what you have learned in DataCamp and in Zybooks to complete the code.
5. Save your work to your Google Drive (File . . . Save a copy in Drive), or download the notebook (File . . . Download .ipynb). Notebooks use the .ipynb extension. You can also download just the Python code as a .py file, but the markdown cells will be lost.
6. TURN IN A PDF: Generate a PDF by selecting File . . . Print . . . and changing the destination to Save as PDF.

NOTE: Students can experiment with generating code using AI, a feature provided in Google Colab. Be careful! AI-generated code is not always accurate, so you need to be able to verify it. Be sure to leave in the documentation showing that the code was generated.

Reference:

https://pypi.org/project/palmerpenguins/

https://github.com/allisonhorst/palmerpenguins

Pandas for Python Cheat Sheet:

https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf


### Import Necessary Libraries

In [None]:
%matplotlib inline
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import warnings
warnings.filterwarnings('ignore')
sns.set()
sns.set_style('whitegrid')

## Load the Penguins Data

In [None]:
penguins = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-28/penguins.csv')

In [None]:
# view the shape
penguins.shape

(344, 8)

In [None]:
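# write the DataFrame to a CSV file
# click the folder icon in the left sidebar to see the file
# select the three dots next to the file to download it locally
# index=False keeps the row index from being written as an extra, unnamed column
penguins.to_csv('penguins2.csv', index=False)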

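By default, `to_csv` also writes the row index as an extra first column, which comes back as `Unnamed: 0` when the file is read again. A minimal sketch on a small made-up frame (the values are illustrative, not the real penguins data) shows the difference `index=False` makes on a round trip.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for `penguins` (values are made up)
df = pd.DataFrame({"species": ["Adelie", "Gentoo"], "body_mass_g": [3750.0, 5000.0]})

# With no path, to_csv returns the CSV text as a string
csv_with_index = df.to_csv()                # index written as an unnamed first column
csv_without_index = df.to_csv(index=False)  # only the data columns

# Read each version back and compare the column names
cols_with_index = pd.read_csv(io.StringIO(csv_with_index)).columns.tolist()
cols_without_index = pd.read_csv(io.StringIO(csv_without_index)).columns.tolist()
print(cols_with_index)
print(cols_without_index)
```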
In [None]:
# view the first 10 rows
penguins.head(10)

In [None]:
# use value counts to count the number of rows with each unique value
penguins.species.value_counts()

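`value_counts` skips missing values unless told otherwise. A minimal sketch, using a made-up stand-in for the `sex` column (which has missing entries in the real data), shows how `dropna=False` adds a count for the missing rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the "sex" column, including missing values (NaN)
sex = pd.Series(["male", "female", np.nan, "male", np.nan], name="sex")

# Default: missing values are left out of the counts
default_counts = sex.value_counts()
print(default_counts)

# dropna=False adds a row that counts the missing values
counts = sex.value_counts(dropna=False)
print(counts)
```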
# Question 1:
Use value_counts() to count the number of rows for each unique value of the "island" column.

In [None]:
#Question 1 code


In [None]:
# count missing values
print(penguins.isna().sum())

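Raw counts of missing values are easier to judge as percentages. A minimal sketch on made-up rows that reuse a few of the penguins column names: `isna().mean()` gives the fraction missing per column, and `dropna` removes incomplete rows. Whether to drop rows at all is an analysis choice, not a requirement of this notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows using some penguins column names (values are illustrative)
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap"],
    "bill_length_mm": [39.1, np.nan, 46.1, 46.5],
    "sex": ["male", np.nan, np.nan, "female"],
})

# Fraction of missing values per column, scaled to a percentage
pct_missing = df.isna().mean() * 100
print(pct_missing)

# Drop rows that have a missing value in any column
complete = df.dropna()

# Drop rows only when a specific column is missing
has_bill = df.dropna(subset=["bill_length_mm"])
print(complete.shape, has_bill.shape)
```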
In [None]:
# use info to count missing values
penguins.info()

In [None]:
# use describe to get basic statistical information on the dataframe
penguins.describe()

In [None]:
# select a subset of the dataframe
island_sex = penguins[["island", "sex"]]
island_sex.head()

In [None]:
# select rows 3 and 4 (the slice end, 5, is excluded), just the bill_length_mm and bill_depth_mm columns
penguins[['bill_length_mm','bill_depth_mm']][3:5]

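The chained selection in the cell above works, but pandas also offers `.loc` (label-based, slice end included) and `.iloc` (position-based, slice end excluded), which select rows and columns in one step. A minimal sketch on a hypothetical five-row frame with the default integer index shows that all three forms pick the same rows here.

```python
import pandas as pd

# Hypothetical five-row frame with the default 0..4 integer index
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "bill_length_mm": [39.1, 39.5, 46.1, 50.0, 46.5],
    "bill_depth_mm": [18.7, 17.4, 13.2, 16.3, 17.9],
})
cols = ["bill_length_mm", "bill_depth_mm"]

# Chained form: column list first, then a positional row slice (end excluded)
chained = df[cols][3:5]

# .loc uses index labels, and the end label IS included
by_label = df.loc[3:4, cols]

# .iloc uses positions for rows and columns, and the end is excluded
by_position = df.iloc[3:5, 1:3]

print(by_label)
```

With the default integer index, labels and positions coincide; after filtering or sorting they can differ, which is when the choice between `.loc` and `.iloc` matters.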
# Question 2:
Enter code below to select rows 10, 11, and 12, showing just the island and sex columns.

In [None]:
# Question 2 Code


In [None]:
# Filter records based on a condition
penguins[penguins['body_mass_g'] > 6000]

# Question 3:
Enter code below to filter just the rows where island is equal to Biscoe.

In [None]:
# Question 3 code


In [None]:
# Filter by combining two conditions (built with < and ==) using the & (element-wise AND) operator
bodymass = penguins["body_mass_g"] < 3400
sexm = penguins["sex"] == "male"
penguins[bodymass & sexm]

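Pandas combines boolean masks with `&` (AND), `|` (OR), and `~` (NOT); Python's `and`/`or` raise a "truth value of a Series is ambiguous" error instead. When conditions are written inline, each one needs parentheses because `&` and `|` bind more tightly than `<` and `==`. A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up rows using some penguins column names (values are illustrative)
df = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Adelie", "Chinstrap"],
    "body_mass_g": [3200.0, 5000.0, 3900.0, 3300.0],
    "sex": ["male", "male", "female", "male"],
})

# Inline version of the cell above: parentheses around each condition
light_males = df[(df["body_mass_g"] < 3400) & (df["sex"] == "male")]

# | keeps rows where either condition is true
adelie_or_heavy = df[(df["species"] == "Adelie") | (df["body_mass_g"] > 4500)]

# ~ inverts a mask
not_adelie = df[~(df["species"] == "Adelie")]
print(light_males)
```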
In [None]:
# What percentage of penguins are on each island? (The "Dream" row answers the question for Dream.)
penguins['island'].value_counts(normalize=True)*100

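With `normalize=True`, `value_counts` returns proportions that sum to 1, so multiplying by 100 gives percentages; a single island's share can then be read off by its label. A minimal sketch with made-up island labels (not the real counts), plus an equivalent shortcut that takes the mean of a boolean mask:

```python
import pandas as pd

# Made-up island labels (not the real penguins counts)
island = pd.Series(["Biscoe", "Dream", "Dream", "Torgersen",
                    "Biscoe", "Dream", "Biscoe", "Biscoe"])

# Proportions per island, scaled to percentages
pct = island.value_counts(normalize=True) * 100
print(pct)

# Select one island's percentage by label
dream_pct = pct["Dream"]

# Same number: the mean of a True/False mask is the fraction of True values
dream_share = (island == "Dream").mean() * 100
```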
In [None]:
# use "Group by" to get the mean flipper_length_mm by sex and species
penguins.groupby(["sex", "species"])["flipper_length_mm"].mean()

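Two details of `groupby` are worth knowing here. Rows whose grouping key is missing (such as penguins with no recorded sex) are dropped from the result by default, and `.unstack()` turns the inner grouping level into columns for a more readable table. A minimal sketch on made-up rows:

```python
import numpy as np
import pandas as pd

# Made-up rows using some penguins column names (values are illustrative)
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Adelie"],
    "sex": ["male", "female", "male", "female", np.nan],
    "flipper_length_mm": [190.0, 186.0, 221.0, 211.0, 180.0],
})

# Default dropna=True: the row with a missing "sex" is excluded
means = df.groupby(["sex", "species"])["flipper_length_mm"].mean()

# dropna=False keeps a separate group for the missing key
with_missing = df.groupby(["sex", "species"], dropna=False)["flipper_length_mm"].mean()

# unstack() moves "species" from the row index into the columns
table = means.unstack()
print(table)
```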
# Question 4:
Enter code below that uses "Group by" to get the mean bill_length_mm by island and species.

In [None]:
# Question 4 Code


In [None]:
# use Group By with describe
penguins.groupby(['sex','island']).describe()

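`describe` on a whole grouped DataFrame produces eight statistics for every numeric column, which makes a very wide table. Selecting one column before calling `describe` keeps the summary to those eight statistics per group. A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up rows using some penguins column names (values are illustrative)
df = pd.DataFrame({
    "island": ["Biscoe", "Biscoe", "Dream", "Dream", "Dream"],
    "sex": ["male", "male", "female", "female", "male"],
    "body_mass_g": [5000.0, 5400.0, 3400.0, 3600.0, 3900.0],
})

# One row per (sex, island) group; columns: count, mean, std, min, quartiles, max
summary = df.groupby(["sex", "island"])["body_mass_g"].describe()
print(summary)
```

Groups with a single row get a missing (NaN) standard deviation, since the spread of one value is undefined.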
# Question 5:
In the cell below, use "Group by" with describe to gain insight into the data by year and island.

In [None]:
# Question 5 code


# DATA VISUALIZATION
Examine the examples below for data visualization of the penguins data. Review examples in DataCamp as well. Question 6 will ask you to generate your own interesting data visualizations for the penguin data.

In [None]:
plt.figure(figsize=[10, 7])
g = sns.boxplot(x='island',
                y='body_mass_g',
                hue='species',
                data=penguins,
                palette=['#FF8C00', '#159090', '#A034F0'],
                linewidth=0.3)
g.set_xlabel('Island')
g.set_ylabel('Body Mass (g)')
plt.show()



In [None]:
g = sns.lmplot(x="flipper_length_mm",
               y="body_mass_g",
               hue="species",
               height=7,
               data=penguins,
               palette=['#FF8C00','#159090','#A034F0'])
g.set_xlabels('Flipper Length (mm)')
g.set_ylabels('Body Mass (g)')

In [None]:
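# heat map of the pairwise correlations between the numeric columns
# numeric_only=True skips the text columns (species, island, sex); pandas 2.0 and later raise a ValueError without it
sns.heatmap(penguins.corr(numeric_only=True), annot=True)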

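A correlation matrix can only be computed from numeric columns. A minimal sketch on made-up rows shows two equivalent ways to restrict `corr` to them: the `numeric_only=True` argument, or selecting the numeric columns explicitly with `select_dtypes`.

```python
import pandas as pd

# Made-up rows mixing a text column with numeric ones (values are illustrative)
df = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap", "Gentoo"],
    "flipper_length_mm": [181.0, 217.0, 195.0, 230.0],
    "body_mass_g": [3750.0, 4875.0, 3500.0, 5700.0],
})

# Option 1: let corr skip non-numeric columns such as "species"
corr = df.corr(numeric_only=True)

# Option 2: keep only the numeric columns first, then correlate
corr_explicit = df.select_dtypes(include="number").corr()
print(corr)
```

Either result can be passed to `sns.heatmap(corr, annot=True)`. Note that in the real data the numeric `year` column is included too, unless it is dropped first.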

# Question 6:
Using the code cells below labeled Visualization 1 through Visualization 5, create five additional visualizations of the Penguin data. The final question asks for a summary of your findings about the Penguin data, based on the exploration in this notebook and your visualizations. Present five findings in narrative form, for example: "Based on body mass and flipper length, Adelie and Chinstrap are similar, whereas Gentoo tends to have a larger body mass and flipper length."

In [None]:
# Question 6 Visualization 1


In [None]:
# Question 6 Visualization 2


In [None]:
# Question 6 Visualization 3


In [None]:
# Question 6 Visualization 4


In [None]:
# Question 6 Visualization 5


#### Question 6 Narrative:
In this markdown or text cell, explain what you have learned about the Penguin data based on the exploration in this notebook.




# Finishing Up and Submitting Your Work:
1) Save your work: download the .ipynb file (which can be reopened later) and save a copy to your Google Drive.
2) Use File . . . Print . . . to generate a PDF version of your notebook (make sure all cells have been executed and show their output). Turn in the PDF version of your notebook for the class assignment.

This notebook can be added to a GitHub repository that showcases your work for class.