## Palmer Penguins
***
This notebook contains my analysis of the famous Palmer Penguins dataset.

The data set is available [on GitHub](https://allisonhorst.github.io/palmerpenguins/).

![Penguins](https://allisonhorst.github.io/palmerpenguins/reference/figures/lter_penguins.png)

source: 

***

### <b>1. Import Libraries</b>

We will use pandas for its DataFrame data structure, which lets us load and explore tabular data such as CSV files.

NumPy and Matplotlib are imported alongside it for numerical work and plotting.

In [3]:
# Data frames.
import pandas as pd
# Numerical arrays.
import numpy as np
# Plotting.
import matplotlib.pyplot as plt

***

### <b>2. Load Data</b>

We will now load the Palmer Penguins data set from a URL.

In [4]:
# Load the penguins data set.
df = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")

The data is now loaded and we can inspect it.

In [5]:
# Let's have a look at the first 10 rows of data
df.head(10)

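|   | species | island    | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex    |
|--:|---------|-----------|---------------:|--------------:|------------------:|------------:|--------|
| 0 | Adelie  | Torgersen |           39.1 |          18.7 |             181.0 |      3750.0 | MALE   |
| 1 | Adelie  | Torgersen |           39.5 |          17.4 |             186.0 |      3800.0 | FEMALE |
| 2 | Adelie  | Torgersen |           40.3 |          18.0 |             195.0 |      3250.0 | FEMALE |
| 3 | Adelie  | Torgersen |            NaN |           NaN |               NaN |         NaN | NaN    |
| 4 | Adelie  | Torgersen |           36.7 |          19.3 |             193.0 |      3450.0 | FEMALE |
| 5 | Adelie  | Torgersen |           39.3 |          20.6 |             190.0 |      3650.0 | MALE   |
| 6 | Adelie  | Torgersen |           38.9 |          17.8 |             181.0 |      3625.0 | FEMALE |
| 7 | Adelie  | Torgersen |           39.2 |          19.6 |             195.0 |      4675.0 | MALE   |
| 8 | Adelie  | Torgersen |           34.1 |          18.1 |             193.0 |      3475.0 | NaN    |
| 9 | Adelie  | Torgersen |           42.0 |          20.2 |             190.0 |      4250.0 | NaN    |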


***

### <b>3. Inspect Data</b>

#### <i>3.1. Data Types</i>

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 344 entries, 0 to 343
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   species            344 non-null    object 
 1   island             344 non-null    object 
 2   bill_length_mm     342 non-null    float64
 3   bill_depth_mm      342 non-null    float64
 4   flipper_length_mm  342 non-null    float64
 5   body_mass_g        342 non-null    float64
 6   sex                333 non-null    object 
dtypes: float64(4), object(3)
memory usage: 18.9+ KB


#### <i>3.2. Count Values</i>

In [7]:
df.count()

species              344
island               344
bill_length_mm       342
bill_depth_mm        342
flipper_length_mm    342
body_mass_g          342
sex                  333
dtype: int64

Comparing these counts with the 344 rows in the data set shows which columns have missing values: each of the four measurement columns is missing 2 values, and `sex` is missing 11.
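
The missing values can also be counted directly instead of being inferred from the non-null counts. The sketch below is a minimal illustration, assuming a small hand-built `sample` frame (a hypothetical stand-in for `df`, filled with values from rows 2 to 4 of the `head()` output) so that it runs without downloading anything.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: a few columns from rows 2-4 of the head() output.
sample = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie"],
    "bill_length_mm": [40.3, np.nan, 36.7],
    "sex": ["FEMALE", np.nan, "FEMALE"],
})

# isna() marks each missing cell as True; sum() then counts them per column.
missing = sample.isna().sum()
print(missing)
```

On the full data set the same call would be `df.isna().sum()`.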

#### <i>3.3. First Row</i>

In [8]:
# We can now look at the first row.
df.iloc[0]

species                 Adelie
island               Torgersen
bill_length_mm            39.1
bill_depth_mm             18.7
flipper_length_mm        181.0
body_mass_g             3750.0
sex                       MALE
Name: 0, dtype: object

#### <i>3.4. Sex</i>

In [9]:
# Count the number of penguins of each sex.
df['sex'].value_counts()

sex
MALE      168
FEMALE    165
Name: count, dtype: int64
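
The two counts add up to 333, not 344, because `value_counts()` leaves out missing values by default. A minimal sketch of how to include them, assuming a short hand-built `sex` series (a hypothetical stand-in for `df['sex']`, using the values from rows 0, 1, 8 and 9 of the `head()` output):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['sex']: rows 0, 1, 8 and 9 of the head() output.
sex = pd.Series(["MALE", "FEMALE", np.nan, np.nan], name="sex")

# dropna=False keeps missing values as their own category in the result.
counts = sex.value_counts(dropna=False)
print(counts)
```

On the full data set this would be `df['sex'].value_counts(dropna=False)`.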

#### <i>3.5. Descriptive Statistics</i>

In [10]:
# Describe the data set.
df.describe()

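|       | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
|-------|---------------:|--------------:|------------------:|------------:|
| count |          342.0 |         342.0 |             342.0 |       342.0 |
| mean  |       43.92193 |      17.15117 |        200.915205 | 4201.754386 |
| std   |       5.459584 |      1.974793 |         14.061714 |  801.954536 |
| min   |           32.1 |          13.1 |             172.0 |      2700.0 |
| 25%   |         39.225 |          15.6 |             190.0 |      3550.0 |
| 50%   |          44.45 |          17.3 |             197.0 |      4050.0 |
| 75%   |           48.5 |          18.7 |             213.0 |      4750.0 |
| max   |           59.6 |          21.5 |             231.0 |      6300.0 |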


#### <i>3.6. Counting Unique Values</i>

Now we will count how often each value occurs in every column.

In [11]:
# Iterating over a DataFrame yields its column names.
for col in df:
    print(df[col].value_counts())
    print()

species
Adelie       152
Gentoo       124
Chinstrap     68
Name: count, dtype: int64

island
Biscoe       168
Dream        124
Torgersen     52
Name: count, dtype: int64

bill_length_mm
41.1    7
45.2    6
39.6    5
50.5    5
50.0    5
       ..
35.6    1
36.8    1
43.1    1
38.5    1
49.9    1
Name: count, Length: 164, dtype: int64

bill_depth_mm
17.0    12
18.6    10
17.9    10
15.0    10
18.5    10
        ..
13.2     1
14.9     1
21.5     1
20.2     1
17.4     1
Name: count, Length: 80, dtype: int64

flipper_length_mm
190.0    22
195.0    17
187.0    16
193.0    15
210.0    14
191.0    13
215.0    12
197.0    10
196.0    10
185.0     9
220.0     8
198.0     8
208.0     8
216.0     8
212.0     7
186.0     7
181.0     7
189.0     7
230.0     7
192.0     7
184.0     7
199.0     6
213.0     6
188.0     6
214.0     6
217.0     6
222.0     6
201.0     6
219.0     5
209.0     5
218.0     5
221.0     5
203.0     5
194.0     5
180.0     5
178.0     4
225.0     4
228.0     4
202.0     4
200.
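
For the measurement columns, the full list of value counts gets long. When only the number of distinct values per column is of interest, `nunique()` gives a more compact view. The sketch below assumes a small hand-built `sample` frame (a hypothetical stand-in for `df`, using rows 0, 1, 2 and 4 of the `head()` output).

```python
import pandas as pd

# Hypothetical stand-in for df: a few columns from rows 0, 1, 2 and 4 of head().
sample = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Adelie"],
    "island": ["Torgersen", "Torgersen", "Torgersen", "Torgersen"],
    "flipper_length_mm": [181.0, 186.0, 195.0, 193.0],
})

# nunique() returns the number of distinct non-missing values in each column.
unique_counts = sample.nunique()
print(unique_counts)
```

On the full data set this would be `df.nunique()`.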

***

### <b>4. Sorting Data</b>

We can sort the data by a chosen variable to make it easier to read and scan.

In [15]:
# Sort by 'island' (display the first 5 rows).
df.sort_values("island").head(5)

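|     | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex    |
|----:|---------|--------|---------------:|--------------:|------------------:|------------:|--------|
| 343 | Gentoo  | Biscoe |           49.9 |          16.1 |             213.0 |      5400.0 | MALE   |
| 108 | Adelie  | Biscoe |           38.1 |          17.0 |             181.0 |      3175.0 | FEMALE |
| 109 | Adelie  | Biscoe |           43.2 |          19.0 |             197.0 |      4775.0 | MALE   |
| 110 | Adelie  | Biscoe |           38.1 |          16.5 |             198.0 |      3825.0 | FEMALE |
| 111 | Adelie  | Biscoe |           45.6 |          20.3 |             191.0 |      4600.0 | MALE   |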
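
Sorting can also use more than one column, with a separate direction for each. A minimal sketch, assuming a small hand-built `sample` frame (a hypothetical stand-in for `df`, using rows 108 to 111 from the sorted output):

```python
import pandas as pd

# Hypothetical stand-in for df: rows 108-111 from the sorted output.
sample = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Adelie", "Adelie"],
        "island": ["Biscoe", "Biscoe", "Biscoe", "Biscoe"],
        "body_mass_g": [3175.0, 4775.0, 3825.0, 4600.0],
    },
    index=[108, 109, 110, 111],
)

# Sort by island A-Z, then by body mass from heaviest to lightest within each island.
ranked = sample.sort_values(["island", "body_mass_g"], ascending=[True, False])
print(ranked)
```

`sort_values` returns a new DataFrame and leaves the original unchanged, so the result has to be assigned to a name if it is needed later.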


### <b>5. Summary Statistics</b>

IMPORTANT! This section needs revisiting, as the summary statistics code currently returns an error.
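
The failing cell is not shown, so the cause of the error is unknown. One common cause in recent pandas versions is calling `mean()` on a frame that still contains text columns such as `island` and `sex`. The sketch below assumes that the goal was a mean per species. It uses a small hand-built `sample` frame (a hypothetical stand-in for `df`, using rows 0 to 2 of the `head()` output).

```python
import pandas as pd

# Hypothetical stand-in for df: a few columns from rows 0-2 of the head() output.
sample = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie"],
    "island": ["Torgersen", "Torgersen", "Torgersen"],
    "bill_length_mm": [39.1, 39.5, 40.3],
    "body_mass_g": [3750.0, 3800.0, 3250.0],
    "sex": ["MALE", "FEMALE", "FEMALE"],
})

# numeric_only=True skips the text columns, which cannot be averaged.
means = sample.groupby("species").mean(numeric_only=True)
print(means)
```

On the full data set this would be `df.groupby("species").mean(numeric_only=True)`.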

***

### <b>6. Subset Variables</b>

Select a subset of the columns in the Palmer Penguins data set.

In [18]:
# Keep only the island and sex columns.
island_sex = df[["island", "sex"]]

In [19]:
island_sex

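|     | island    | sex    |
|----:|-----------|--------|
|   0 | Torgersen | MALE   |
|   1 | Torgersen | FEMALE |
|   2 | Torgersen | FEMALE |
|   3 | Torgersen | NaN    |
|   4 | Torgersen | FEMALE |
| ... | ...       | ...    |
| 339 | Biscoe    | NaN    |
| 340 | Biscoe    | FEMALE |
| 341 | Biscoe    | MALE   |
| 342 | Biscoe    | FEMALE |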
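
A subset can pick rows as well as columns. The sketch below is a minimal illustration using `loc` with a boolean mask, assuming a small hand-built `sample` frame (a hypothetical stand-in for `island_sex`, using rows 0 to 4 of the output):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for island_sex: rows 0-4 of the output.
sample = pd.DataFrame({
    "island": ["Torgersen", "Torgersen", "Torgersen", "Torgersen", "Torgersen"],
    "sex": ["MALE", "FEMALE", "FEMALE", np.nan, "FEMALE"],
})

# The mask is True where sex is FEMALE; missing values compare as False.
is_female = sample["sex"] == "FEMALE"

# loc takes the row selector first and the column selector second.
females = sample.loc[is_female, ["island", "sex"]]
print(females)
```

The original row labels are kept, so the selected rows can still be traced back to the full data set.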


***

# Headline 1

*Lorem ipsum dolor sit amet*, consectetur adipiscing elit. Quisque purus lacus, consectetur ut dui ac, porta ultricies nunc. Curabitur vulputate ex eu dictum pretium. 

**Cras dignissim urna et orci tempus gravida vitae non ligula**. Duis ut fringilla massa, quis porta sem. Donec nisl massa, malesuada id elit a, rutrum consequat nunc. Ut sodales, ipsum non lobortis suscipit, ex sapien elementum mauris, ut tempor justo mi vel eros. 

## Bulleted List

- <b>Lorem ipsum dolor sit amet</b>, consectetur adipiscing elit.
- <b>Nulla vel quam vel lectus</b> scelerisque tincidunt eget id lectus.
- <b>Phasellus facilisis</b> sapien ornare ornare lobortis.
- <b>Nulla id elit ultrices</b>, volutpat turpis pellentesque, pulvinar dolor.

### Numbered List

1. <u>Morbi et sem vitae</u> lorem volutpat interdum.
2. <u>Etiam vehicula purus</u> sit amet placerat ullamcorper.

#### Quotations

> Things get done only if the data we gather can inform and inspire those in a position to make a difference. **(Dr. Mike Schmoker, Author)**

## Tables

***

| Species   |Bill Length (mm) | Body Mass (g) |
|-----------|----------------:|--------------:|
| Adelie    |             38.8|          3701 |
| Chinstrap |             48.8|          3733 |
| Gentoo    |             47.5|          5076 |

## Math
***
$f(x) = x^2$

$\sum_{i=0}^{n-1} i$

$\bar{x} = \frac{\sum_{i=0}^{n-1} x_i}{n}$
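
The mean formula can be worked through in plain Python. This is a minimal sketch, where `values` is a hypothetical list holding the first three flipper lengths from the `head()` output:

```python
# Hypothetical input: the first three flipper lengths (mm) from the head() output.
values = [181.0, 186.0, 195.0]
n = len(values)

# Sum x_i for i = 0 .. n-1, matching the zero-based index in the formula.
total = sum(values[i] for i in range(n))

# Divide the sum by the number of values to get the mean.
mean = total / n
print(mean)
```

The sum runs from 0 to n-1 because Python, like the formula, numbers the values starting at zero.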

***

### End
