# Palmer Penguins

The Palmer penguins dataset by Allison Horst, Alison Hill, and Dr. Kristen Gorman contains measurements for three species of penguins found in the Palmer Archipelago, Antarctica. Dr. Kristen Gorman collected the data between 2007 and 2009 with the Palmer Station Long Term Ecological Research Program. More information about the dataset is available in its [official documentation](https://allisonhorst.github.io/palmerpenguins/index.html).

This is my analysis of the Palmer penguins dataset using Python.

![Palmer Station, Antarctica](https://storage.googleapis.com/plos-corpus-prod/10.1371/journal.pone.0090081/1/pone.0090081.g001.PNG_M?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=wombat-sa%40plos-prod.iam.gserviceaccount.com%2F20240427%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240427T182357Z&X-Goog-Expires=86400&X-Goog-SignedHeaders=host&X-Goog-Signature=accc3d97e65db9ffb837f0283e2db3adfe6c5a4b13c2785475d476095451ecc9a4079a0fee22797464d184907552e72fae0361067207d5e8ebd037cb98b2025f714567505de1a96830f488a433f78e084c0f11d3c82109e2c15e85d09c8c33d0794846bf868cff83e93512c7c884740d532c2621d2a67872e52dd42a41bad8f79fd73d4a237a7f5b6d97c998d762131f943c25a85511b1297767ec7099d4e9b3e14bd15b5536f1e3ff1218ea9241c10675341ab4178d9a237850cf0ba4bb8eb4e5044abf81a1d43f34a97a1e918652894666cae58b62d9be7cbb2664d5eda93cc9270b91a5bed0b2f2c4c7ac25e1f1598bff1e875ec0fa24791bb1b756243b8d)

## Import the Python Modules

I have used Python along with the modules [pandas](https://pandas.pydata.org/), [numpy](https://numpy.org/), [matplotlib.pyplot](https://matplotlib.org/stable/tutorials/pyplot.html) and [seaborn](https://seaborn.pydata.org/) to analyse and plot the data.

In [None]:
# Let's import the Python modules.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

: 

## Load the Data
The dataset I am using for my analysis is sourced from [Michael Waskom's seaborn-data repository on GitHub](https://github.com/mwaskom/seaborn-data/blob/master/penguins.csv).

In [None]:
# Let's load the penguins dataset.
penguins = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")

: 

## First Look at the Data

Now that the data is loaded, we can take our first look at what it contains.

The output below shows that the dataset consists of 344 rows and 7 columns. Each row represents an individual penguin and each column represents a different variable.

There are three categorical variables: the species (Adelie, Chinstrap, Gentoo), the island (Torgersen, Biscoe, Dream) and the sex. There are also four numerical variables: the bill length, bill depth and flipper length, each measured in millimeters, and the body mass, measured in grams.

In [None]:
# Let's take a look at the dataset.
penguins

: 
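
The split into categorical and numerical variables can also be checked programmatically. The following is a minimal sketch, assuming that text columns are loaded with a non-numeric dtype; `split_columns_by_kind` is a hypothetical helper name of my own, not a pandas function.

```python
import pandas as pd


def split_columns_by_kind(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (categorical, numerical) column names based on dtype."""
    # Hypothetical helper: numeric dtypes count as numerical,
    # everything else (e.g. text) counts as categorical.
    numerical = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return categorical, numerical
```

Calling `split_columns_by_kind(penguins)` should list the three categorical and four numerical columns described above, and `penguins.shape` gives the number of rows and columns.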

In [None]:
# Let's take a look at the different species and islands.
species_names = penguins['species'].unique()
island_names = penguins['island'].unique()

# Show.
print(f"Species: {species_names}\nIslands: {island_names}")

: 

The table below shows descriptive statistics for the numerical variables in the dataset.

In [None]:
# Describe the dataset
penguins.describe()

: 
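
By default, `describe()` summarises only the numerical columns of a mixed dataset. As a small sketch, the categorical columns could be summarised by excluding the numeric dtypes instead; `describe_categorical` is a hypothetical helper name used only for illustration.

```python
import numpy as np
import pandas as pd


def describe_categorical(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise non-numeric columns (count, unique, top, freq)."""
    # Hypothetical helper: exclude numeric dtypes so that only the
    # categorical columns are described.
    return df.describe(exclude=[np.number])
```

For this dataset, `describe_categorical(penguins)` would report, for the species, island and sex columns, how many values are present, how many distinct values there are, and which value is the most frequent.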

## Data Types

In [None]:
# Data types
penguins.dtypes

: 

In [None]:
# Count the number of penguins in each species.
penguins['species'].value_counts()

: 

In [None]:
# Bar chart of the count of penguins in each species.
species = penguins['species'].value_counts()

colours = ['tab:blue', 'tab:green', 'tab:orange']

fig, ax = plt.subplots()
ax.bar(species.index, species, color=colours)

plt.show()

: 
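
The bar charts in this notebook all follow the same pattern: count the values in a column, then draw one bar per category. As a rough sketch, that pattern could be wrapped in a small function that also labels the axes; `plot_counts` and its parameters are hypothetical names, not part of pandas or matplotlib.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_counts(values: pd.Series, xlabel: str, title: str):
    """Draw a bar chart of how often each category occurs."""
    # Hypothetical helper: value_counts() ignores missing values by default.
    counts = values.value_counts()
    fig, ax = plt.subplots()
    ax.bar(counts.index.tolist(), counts.to_numpy())
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Number of penguins")
    ax.set_title(title)
    return ax
```

For example, `plot_counts(penguins['species'], "Species", "Number of penguins per species")` followed by `plt.show()` would draw a chart similar to the one above with labelled axes, and the same call could be reused for the island and sex columns.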

In [None]:
# Count the number of penguins on each island.
penguins['island'].value_counts()

: 

In [None]:
# Bar chart of the count of penguins on each island.
island = penguins['island'].value_counts()

fig, ax = plt.subplots()
ax.bar(island.index, island)

plt.show()

: 

In [None]:
# Count the number of penguins of each sex.
penguins['sex'].value_counts()

: 

In [None]:
# Bar chart of the count of penguins of each sex.
sex = penguins['sex'].value_counts()

fig, ax = plt.subplots()
ax.bar(sex.index, sex)

plt.show()

: 

In [None]:
# Count the penguins of each species on each island.
species_per_island = penguins.groupby('island')['species'].value_counts()

# Species counts for each island separately (used in the plots below).
biscoe = penguins[penguins['island'] == "Biscoe"]['species'].value_counts()
dream = penguins[penguins['island'] == "Dream"]['species'].value_counts()
torgersen = penguins[penguins['island'] == "Torgersen"]['species'].value_counts()

print(species_per_island, biscoe, dream, torgersen, sep="\n\n")

: 
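
The same island-by-species counts can be arranged as a single table. This is a minimal sketch using `pd.crosstab`; the function name `species_by_island` is hypothetical and chosen only for this example.

```python
import pandas as pd


def species_by_island(df: pd.DataFrame) -> pd.DataFrame:
    """Cross-tabulate islands (rows) against species (columns)."""
    # pd.crosstab counts the rows for every island/species pair and
    # fills combinations that never occur with 0.
    return pd.crosstab(df["island"], df["species"])
```

Calling `species_by_island(penguins)` would show at a glance which species were recorded on which island, with a zero wherever a species does not appear.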

In [None]:
# One bar chart per island, side by side, with shared axes for easy comparison.
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, sharey=True, sharex=True)

ax1.bar(biscoe.index, biscoe)
ax1.set_ylabel("Number of penguins")
ax1.set_title("Biscoe")

ax2.bar(dream.index, dream)
ax2.set_xlabel("Species")
ax2.set_title("Dream")

ax3.bar(torgersen.index, torgersen)
ax3.set_title("Torgersen")

fig.set_figwidth(8)
#fig.tight_layout()
fig.align_xlabels()
fig.suptitle("Number of penguins of each species per island")

plt.show()

: 

In [None]:
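# Compare the body mass of male and female penguins.
# The sex labels are upper-cased first in case their capitalisation differs in the CSV.
sex_upper = penguins['sex'].str.upper()
body_mass_male = penguins[sex_upper == "MALE"]['body_mass_g'].dropna().to_numpy()
body_mass_female = penguins[sex_upper == "FEMALE"]['body_mass_g'].dropna().to_numpy()

fig, ax = plt.subplots()
ax.hist(body_mass_male, edgecolor='black', alpha=0.6, label="Male")
ax.hist(body_mass_female, edgecolor='black', alpha=0.6, label="Female")
ax.set_xlabel("Body mass (g)")
ax.legend()
plt.show()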


: 

In [None]:
# Mean body mass (in grams) of each species.
body_mass_species = penguins.groupby('species')['body_mass_g'].mean()

body_mass_species

: 

In [None]:
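# Compare the body mass distribution of each species (missing values dropped).
body_mass_adelie = penguins[penguins['species'] == "Adelie"]['body_mass_g'].dropna().to_numpy()
body_mass_chinstrap = penguins[penguins['species'] == "Chinstrap"]['body_mass_g'].dropna().to_numpy()
body_mass_gentoo = penguins[penguins['species'] == "Gentoo"]['body_mass_g'].dropna().to_numpy()

fig, ax = plt.subplots()
ax.hist(body_mass_adelie, edgecolor='black', alpha=0.6, label="Adelie")
ax.hist(body_mass_chinstrap, edgecolor='black', alpha=0.6, label="Chinstrap")
ax.hist(body_mass_gentoo, edgecolor='black', alpha=0.6, label="Gentoo")
ax.set_xlabel("Body mass (g)")
ax.legend()
plt.show()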

: 

In [None]:
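# Extract the numerical variables as numpy arrays.
bill_length = penguins["bill_length_mm"].to_numpy()
bill_depth = penguins["bill_depth_mm"].to_numpy()
flipper_length = penguins["flipper_length_mm"].to_numpy()
body_mass = penguins["body_mass_g"].to_numpy()

fig, ax = plt.subplots()

# Scatter plot of bill depth against bill length.
ax.plot(bill_length, bill_depth, "x")
ax.set_xlabel("Bill length (mm)")
ax.set_ylabel("Bill depth (mm)")

plt.show()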


: 

In [None]:
# Pair plot of all numerical variables, coloured by species.
sns.pairplot(penguins, hue='species')

: 
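
The pair plot gives a visual impression of how the numerical variables relate to one another. As a rough complement, and only as a sketch, the pairwise Pearson correlations could be computed as follows; `numeric_correlations` is a hypothetical helper name. Note that correlations over all penguins can differ from those within a single species.

```python
import pandas as pd


def numeric_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Return Pearson correlations between the numeric columns."""
    # Select the numeric columns explicitly so text columns are ignored;
    # corr() skips missing values pair by pair.
    return df.select_dtypes(include="number").corr()
```

`numeric_correlations(penguins)` covers all penguins together, while `numeric_correlations(penguins[penguins['species'] == "Adelie"])` would restrict the calculation to one species.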

***

### End