## Exploring Palmer Penguins Dataset
Records of 344 penguins, collected from 3 islands in the Palmer Archipelago, Antarctica
***
![](https://stihi.ru/pics/2013/04/05/7822.jpg)

"...one more day has passed, but the penguins still couldn’t figure out who was stealing their fish..."
***
### About Dataset

**Palmer Penguins Dataset**

The Palmer Penguins dataset contains records of 344 individual penguins observed on three islands of the Palmer Archipelago, Antarctica (Biscoe, Dream and Torgersen) between 2007 and 2009. It covers 3 species. The records were originally held as 3 separate datasets in the Palmer Station Long-Term Ecological Research data system, one per species: Adélie (152 penguins), Gentoo (124 penguins), and Chinstrap (68 penguins). These were combined into the single Palmer Penguins dataset.

The data were collected as part of research into the natural behavior of Antarctic penguins and its relationship with environmental variability. The dataset is often presented as an alternative to the widely known Iris dataset and is useful for teaching data exploration and visualization.

**Open Source article**

For more in-depth information on this dataset : [Ecological sexual dimorphism and environmental variability within a community of Antarctic penguins (genus Pygoscelis)](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0090081)
By Kristen B. Gorman, Tony D. Williams, William R. Fraser, 2014 - *Published in PLoS ONE 9(3):e90081*

More about Gentoo penguins : [What makes the Gentoo penguin the world's fastest swimming bird?](https://www.earth.com/news/what-makes-the-gentoo-penguin-the-worlds-fastest-swimming-bird/) Chrissy Sexton, staff writer, www.earth.com

**Data citations**

Adélie penguins :
[Palmer Station Antarctica LTER and K. Gorman, 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Adélie penguins (Pygoscelis adeliae) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative.](https://doi.org/10.6073/pasta/98b16d7d563f265cb52372c8ca99e60f)

Gentoo penguins :
[Palmer Station Antarctica LTER and K. Gorman, 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Gentoo penguin (Pygoscelis papua) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative.](https://doi.org/10.6073/pasta/7fca67fb28d56ee2ffa3d9370ebda689)

Chinstrap penguins :
[Palmer Station Antarctica LTER and K. Gorman, 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Chinstrap penguin (Pygoscelis antarcticus) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 6. Environmental Data Initiative.](https://doi.org/10.6073/pasta/c14dfcfada8ea13a17536e73eb6fbe9e)

**Header image**

Photo by unknown author

### Dataset typology

To review the Palmer Penguins dataset I use Python and the pandas library. The dataset is loaded from a CSV file.

In [3]:
# Pandas library
import pandas as pd

In [4]:
# Load the penguins dataset
df = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")

**Variables**

The dataset consists of 7 columns that contain all relevant information about the 344 penguins.

In [5]:
# Dataset typology 
df

| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | MALE |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | FEMALE |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | FEMALE |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | FEMALE |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 339 | Gentoo | Biscoe | NaN | NaN | NaN | NaN | NaN |
| 340 | Gentoo | Biscoe | 46.8 | 14.3 | 215.0 | 4850.0 | FEMALE |
| 341 | Gentoo | Biscoe | 50.4 | 15.7 | 222.0 | 5750.0 | MALE |
| 342 | Gentoo | Biscoe | 45.2 | 14.8 | 212.0 | 5200.0 | FEMALE |


There are 3 categorical variables :
- species : Adélie, Chinstrap, Gentoo
- island : Biscoe, Dream, Torgersen
- sex : FEMALE, MALE (missing for some penguins)

There are 4 numerical variables :
- bill_length_mm : bill length (millimeters)
- bill_depth_mm : bill depth (millimeters)
- flipper_length_mm : flipper length (millimeters)
- body_mass_g : body mass (grams)
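
The split between categorical and numerical variables listed above can also be read from the column dtypes. The following is a minimal sketch, assuming a small hand-made sample copied from rows 0-4 of the table shown earlier; the name `sample` is illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hand-copied sample: rows 0-4 of the table above (row 3 has missing values)
sample = pd.DataFrame({
    "species": ["Adelie"] * 5,
    "island": ["Torgersen"] * 5,
    "bill_length_mm": [39.1, 39.5, 40.3, np.nan, 36.7],
    "bill_depth_mm": [18.7, 17.4, 18.0, np.nan, 19.3],
    "flipper_length_mm": [181.0, 186.0, 195.0, np.nan, 193.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0, np.nan, 3450.0],
    "sex": ["MALE", "FEMALE", "FEMALE", np.nan, "FEMALE"],
})

# Columns with a numeric dtype are the numerical variables
numerical = sample.select_dtypes(include="number").columns.tolist()

# Everything else (text columns here) is treated as categorical
categorical = sample.select_dtypes(exclude="number").columns.tolist()

print(numerical)
print(categorical)
```

The same two `select_dtypes` calls should work unchanged on the full `df` loaded above.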


In [6]:
# Look at the first row
df.iloc[0]

species                 Adelie
island               Torgersen
bill_length_mm            39.1
bill_depth_mm             18.7
flipper_length_mm        181.0
body_mass_g             3750.0
sex                       MALE
Name: 0, dtype: object

In [7]:
# Sex of penguin
df['sex']

0        MALE
1      FEMALE
2      FEMALE
3         NaN
4      FEMALE
        ...  
339       NaN
340    FEMALE
341      MALE
342    FEMALE
343      MALE
Name: sex, Length: 344, dtype: object

In [8]:
# Count the number of penguins of each sex
df['sex'].value_counts()

sex
MALE      168
FEMALE    165
Name: count, dtype: int64
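
The two counts add up to 333, not 344, because `value_counts()` skips missing values by default. The sketch below shows one way to surface them. It uses a short hand-made Series that mimics the first five values of the `sex` column rather than the full dataset, so the numbers it produces are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative Series mimicking rows 0-4 of the 'sex' column
sex = pd.Series(["MALE", "FEMALE", "FEMALE", np.nan, "FEMALE"], name="sex")

# Default behaviour: NaN entries are ignored
counts = sex.value_counts()

# dropna=False adds a separate row for the missing values
counts_with_nan = sex.value_counts(dropna=False)

# Direct count of missing entries
n_missing = sex.isna().sum()

print(counts_with_nan)
print(n_missing)
```

On the full dataset, `df['sex'].isna().sum()` should account for the remaining 11 penguins (344 - 333).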

In [9]:
# Describe the data set
df.describe()

| | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
|---|---|---|---|---|
| count | 342.0 | 342.0 | 342.0 | 342.0 |
| mean | 43.92193 | 17.15117 | 200.915205 | 4201.754386 |
| std | 5.459584 | 1.974793 | 14.061714 | 801.954536 |
| min | 32.1 | 13.1 | 172.0 | 2700.0 |
| 25% | 39.225 | 15.6 | 190.0 | 3550.0 |
| 50% | 44.45 | 17.3 | 197.0 | 4050.0 |
| 75% | 48.5 | 18.7 | 213.0 | 4750.0 |
| max | 59.6 | 21.5 | 231.0 | 6300.0 |
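
The `count` row reports 342 rather than 344 because `describe()` ignores missing values, and two rows (3 and 339 in the table shown earlier) have no measurements. The summary also pools all three species together. A possible next step, sketched here under the assumption that a per-species view is of interest, is to group before describing. The `sample` frame is hand-copied from the rows visible earlier in this notebook and is not the full dataset.

```python
import numpy as np
import pandas as pd

# Hand-copied rows 0-4 (Adelie) and 339-342 (Gentoo) from the table shown earlier
sample = pd.DataFrame({
    "species": ["Adelie"] * 5 + ["Gentoo"] * 4,
    "flipper_length_mm": [181.0, 186.0, 195.0, np.nan, 193.0,
                          np.nan, 215.0, 222.0, 212.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0, np.nan, 3450.0,
                    np.nan, 4850.0, 5750.0, 5200.0],
})

# Overall summary: "count" reflects non-missing values only (7 of 9 rows here)
overall = sample.describe()

# Same statistics for body mass, computed separately for each species
by_species = sample.groupby("species")["body_mass_g"].describe()

print(overall)
print(by_species)
```

Replacing `sample` with the full `df` gives the same breakdown for all three species.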


***

### End