
# Analysing Palmer Penguins Data Set


***

![Palmer](img/penguins.png)

[Artwork by @allison_horst](https://allisonhorst.github.io/palmerpenguins/articles/art.html)

The data set is available [on GitHub](https://allisonhorst.github.io/palmerpenguins/).



## Introduction
***
This notebook contains my analysis of the well-known Palmer Penguins data set.

The Palmer Penguins data set contains real-world body size measurements for three *Pygoscelis* penguin species that breed throughout the Western Antarctic Peninsula region, made available through the United States Long-Term Ecological Research (US LTER) Network. It is widely used as a resource for case studies in statistics and data science education.[Source: 1]

The dataset contains data on 344 penguins, collected between 2007 and 2009 by [Dr Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php) and [the Palmer Station, Antarctica LTER](https://pallter.marine.rutgers.edu/), a member of the [Long Term Ecological Research Network](https://lternet.edu/).[Source: 2]


## Imports

***

The following libraries are used to explore this dataset:
- pandas for the DataFrame data structure.
- matplotlib for plotting data.
- numpy for working with arrays.

Among other features, these libraries allow us to read and investigate CSV files.

In [6]:
# Data frame.
import pandas as pd
# Plotting.
import matplotlib.pyplot as plt 
# Numerical arrays.
import numpy as np 

## Load Data
***
Load the Palmer Penguins data set from a URL.

In [7]:
# Load the penguins data set from the seaborn-data repository and store it in 'df'.
# Another copy of the CSV is hosted in the palmerpenguins repository:
# https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins.csv
df = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")

The data is now loaded and we can inspect it. First, we check that the file loaded correctly.

In [8]:
# Checking that the file is loaded correctly.
df

|    |species|island   |bill_length_mm|bill_depth_mm|flipper_length_mm|body_mass_g|sex   |
|---:|-------|---------|-------------:|------------:|----------------:|----------:|------|
|   0|Adelie |Torgersen|          39.1|         18.7|            181.0|     3750.0|MALE  |
|   1|Adelie |Torgersen|          39.5|         17.4|            186.0|     3800.0|FEMALE|
|   2|Adelie |Torgersen|          40.3|         18.0|            195.0|     3250.0|FEMALE|
|   3|Adelie |Torgersen|           NaN|          NaN|              NaN|        NaN|NaN   |
|   4|Adelie |Torgersen|          36.7|         19.3|            193.0|     3450.0|FEMALE|
| ...|...    |...      |           ...|          ...|              ...|        ...|...   |
| 339|Gentoo |Biscoe   |           NaN|          NaN|              NaN|        NaN|NaN   |
| 340|Gentoo |Biscoe   |          46.8|         14.3|            215.0|     4850.0|FEMALE|
| 341|Gentoo |Biscoe   |          50.4|         15.7|            222.0|     5750.0|MALE  |
| 342|Gentoo |Biscoe   |          45.2|         14.8|            212.0|     5200.0|FEMALE|


As the output above shows, this dataset has 344 rows and 7 columns. The display contains the header, the first five rows and the last five rows; the dots in the middle indicate that more rows are stored in the file.
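The dimensions can also be read programmatically rather than from the display. The sketch below is illustrative only: `sample` is a hypothetical five-row DataFrame copied from rows 0 to 4 shown above, not the full file, so its shape is smaller than that of `df`.

```python
import numpy as np
import pandas as pd

# Five-row sample copied from the display above (rows 0 to 4 of the full file).
# 'sample' is a stand-in for 'df' so that the example runs without a download.
sample = pd.DataFrame(
    {
        "species": ["Adelie"] * 5,
        "island": ["Torgersen"] * 5,
        "bill_length_mm": [39.1, 39.5, 40.3, np.nan, 36.7],
        "bill_depth_mm": [18.7, 17.4, 18.0, np.nan, 19.3],
        "flipper_length_mm": [181.0, 186.0, 195.0, np.nan, 193.0],
        "body_mass_g": [3750.0, 3800.0, 3250.0, np.nan, 3450.0],
        "sex": ["MALE", "FEMALE", "FEMALE", np.nan, "FEMALE"],
    }
)

# shape is a (rows, columns) tuple.
n_rows, n_cols = sample.shape
print(f"{n_rows} rows x {n_cols} columns")

# head() and tail() return the first and last rows explicitly.
first_two = sample.head(2)
last_two = sample.tail(2)
```

Calling `df.shape` on the full data set should report the 344 rows and 7 columns described above.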

## Inspect Data

***

Let's start by checking which data types are stored in this file.

In [9]:
# checking the data types.
df.dtypes

species               object
island                object
bill_length_mm       float64
bill_depth_mm        float64
flipper_length_mm    float64
body_mass_g          float64
sex                   object
dtype: object
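The output shows four `float64` columns holding measurements and three `object` columns holding text. As a minimal sketch, the columns can be split by type with `select_dtypes`. The two-row `sample` frame is a hypothetical stand-in built from values shown earlier in this notebook.

```python
import pandas as pd

# Two-row stand-in for 'df', using values from rows 0 and 340 of the display.
sample = pd.DataFrame(
    {
        "species": ["Adelie", "Gentoo"],
        "bill_length_mm": [39.1, 46.8],
        "body_mass_g": [3750.0, 4850.0],
        "sex": ["MALE", "FEMALE"],
    }
)

# Columns with a numeric dtype (float64 here).
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# Everything else, which in this data set means the text columns.
text_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(text_cols)
```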

In [5]:
# Look at the first row
df.iloc[0]

species                 Adelie
island               Torgersen
bill_length_mm            39.1
bill_depth_mm             18.7
flipper_length_mm        181.0
body_mass_g             3750.0
sex                       MALE
Name: 0, dtype: object

In [6]:
# Sex of penguins
df['sex']


0        MALE
1      FEMALE
2      FEMALE
3         NaN
4      FEMALE
        ...  
339       NaN
340    FEMALE
341      MALE
342    FEMALE
343      MALE
Name: sex, Length: 344, dtype: object

In [6]:
# Count the number of penguins of each sex.
df['sex'].value_counts()

sex
MALE      168
FEMALE    165
Name: count, dtype: int64
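The two counts add up to 333, while the column has 344 entries, so 11 values are missing. By default, `value_counts()` leaves out `NaN`. The sketch below shows two ways to make the missing values visible, using a hypothetical five-value `sex` Series copied from the first rows of the output above.

```python
import numpy as np
import pandas as pd

# First five values of the 'sex' column as displayed above.
sex = pd.Series(["MALE", "FEMALE", "FEMALE", np.nan, "FEMALE"], name="sex")

# Default behaviour: missing values are not counted.
counts = sex.value_counts()

# dropna=False adds a row for the missing values.
counts_with_nan = sex.value_counts(dropna=False)

# isna().sum() counts the missing values directly.
n_missing = sex.isna().sum()

print(counts_with_nan)
print("missing:", n_missing)
```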

In [7]:
# Describe the data set
df.describe()

|     |bill_length_mm|bill_depth_mm|flipper_length_mm|body_mass_g|
|-----|-------------:|------------:|----------------:|----------:|
|count|         342.0|        342.0|            342.0|      342.0|
|mean |      43.92193|     17.15117|       200.915205|4201.754386|
|std  |      5.459584|     1.974793|        14.061714| 801.954536|
|min  |          32.1|         13.1|            172.0|     2700.0|
|25%  |        39.225|         15.6|            190.0|     3550.0|
|50%  |         44.45|         17.3|            197.0|     4050.0|
|75%  |          48.5|         18.7|            213.0|     4750.0|
|max  |          59.6|         21.5|            231.0|     6300.0|
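The `count` row reports 342 values per column, two fewer than the 344 rows, because `describe()` skips missing values. A minimal sketch of how the affected rows could be located follows. The `sample` frame is a hypothetical subset using rows 0, 1, 3, 339 and 340 from the display earlier in this notebook.

```python
import numpy as np
import pandas as pd

# Subset of the rows shown earlier, keeping their original index labels.
sample = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo"],
        "bill_length_mm": [39.1, 39.5, np.nan, np.nan, 46.8],
        "body_mass_g": [3750.0, 3800.0, np.nan, np.nan, 4850.0],
    },
    index=[0, 1, 3, 339, 340],
)

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Rows where the bill length was not recorded.
rows_missing = sample[sample["bill_length_mm"].isna()]

# describe() computes its statistics on the non-missing values only.
stats = sample.describe()

print(missing_per_column)
print(rows_missing)
```

Running the same two checks on `df` should point to the rows behind the gap between 344 and 342.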




## Tables

***

|Species     | Bill Length (mm)|Body Mass (g)|
|------------|----------------:|------------:|
|Adelie      |             38.8|         3701|
|Chinstrap   |             48.8|         3733|
|Gentoo      |             47.5|         5076|
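The notebook does not say how the table was produced. If the figures are per-species means, which is an assumption, a summary of this kind could be computed with `groupby`. The sketch below uses a hypothetical seven-row `sample` copied from the rows displayed earlier, so its numbers differ from the table.

```python
import pandas as pd

# Rows 0, 1, 2, 4 (Adelie) and 340, 341, 342 (Gentoo) from the earlier display.
sample = pd.DataFrame(
    {
        "species": ["Adelie"] * 4 + ["Gentoo"] * 3,
        "bill_length_mm": [39.1, 39.5, 40.3, 36.7, 46.8, 50.4, 45.2],
        "body_mass_g": [3750.0, 3800.0, 3250.0, 3450.0, 4850.0, 5750.0, 5200.0],
    }
)

# Mean of each measurement per species, rounded to one decimal place.
summary = (
    sample.groupby("species")[["bill_length_mm", "body_mass_g"]]
    .mean()
    .round(1)
)

print(summary)
```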


## Math

***

$f(x) = x^2$

$\sum_{i=0}^{n-1} i$

$\bar{x} = \frac{\sum_{i=0}^{n-1} x_i}{n}$
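The zero-based index in the formulas above matches Python's indexing, so the mean can be written out term by term. This is a small sketch with an illustrative array `x` holding the flipper lengths from rows 0, 1, 2 and 4 of the data displayed earlier.

```python
import numpy as np

# Flipper lengths (mm) from rows 0, 1, 2 and 4 of the display.
x = np.array([181.0, 186.0, 195.0, 193.0])
n = len(x)

# Sum of the indices 0 .. n-1, as in the second formula.
index_sum = sum(range(n))

# Sum of x_i for i = 0 .. n-1, divided by n, as in the third formula.
total = sum(x[i] for i in range(n))
x_bar = total / n

# The explicit sum agrees with NumPy's built-in mean.
assert np.isclose(x_bar, np.mean(x))
print(x_bar)
```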

## References

- [Palmer Archipelago Penguins Data][1]

- [Palmer Penguins - Allison Horst](https://allisonhorst.github.io/palmerpenguins/)

[1]: https://journal.r-project.org/articles/RJ-2022-020/

***
## End
