In [1]:
import pandas as pd
import numpy as np
from scipy.stats import f_oneway

In [2]:
df = pd.read_csv("penguins.csv")

<h1>Data Cleaning Procedure</h1>

In [3]:
fill_values = {
    'body_mass_g': df['body_mass_g'].mean(),
    'bill_length_mm': df['bill_length_mm'].mean(),
    'bill_depth_mm': df['bill_depth_mm'].mean(),
    'flipper_length_mm': df['flipper_length_mm'].mean(),
    'sex': df['sex'].mode()[0]
}

In [4]:
cleaned_df = df.fillna(value=fill_values)
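
<p>Before imputing, it can help to see how many values each column is actually missing. The following is a minimal sketch of that check together with the same mean/mode fill strategy; the frame <code>toy</code> and its values are made up for illustration and are not taken from <code>penguins.csv</code>.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for penguins.csv; all values are made up.
toy = pd.DataFrame({
    "body_mass_g": [3700.0, np.nan, 5000.0, 4200.0],
    "sex": ["male", None, "female", "male"],
})

# Count missing values per column before deciding how to fill them.
missing_counts = toy.isna().sum()

# Same strategy as the cells above: mean for numeric columns,
# most frequent value (mode) for the categorical column.
fill_values = {
    "body_mass_g": toy["body_mass_g"].mean(),
    "sex": toy["sex"].mode()[0],
}
toy_cleaned = toy.fillna(value=fill_values)

print(missing_counts)
print(toy_cleaned)
```

<p>Mean and mode imputation keep every row, but they pull the filled values toward the overall average, so group-level results computed afterwards should be read with that in mind.</p>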

<h1>Question No.1: What are the average bill length, bill depth, flipper length, and body mass for each unique species?</h1>

In [5]:
cleaned_df.groupby('species')[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']].mean()

species,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g
Adelie,38.825144,18.338495,190.025758,3703.95891
Chinstrap,48.833824,18.420588,195.823529,3733.088235
Gentoo,47.475983,14.999606,217.055768,5068.965761


<h3>Insight No.1: </h3>
<ul>
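    <li><strong>Gentoo penguins are the largest of the three species, with the highest average body mass and the longest average flippers.</strong></li>
    <li><strong>Chinstrap penguins have the longest average bill length (about 48.8 mm).</strong></li>
    <li><strong>Adelie penguins have the shortest average flipper length and bill length.</strong></li>
    <li><strong>Gentoo penguins have the shallowest bills (average bill depth of about 15.0 mm).</strong></li>
    <li><strong>Adelie and Chinstrap penguins have very similar bill depths (about 18.3 mm and 18.4 mm).</strong></li>
    <li><strong>Gentoo penguins average about 5069 g, more than 1,300 g heavier than Adelie (about 3704 g) and Chinstrap (about 3733 g), which differ from each other by only about 29 g.</strong></li>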
</ul>

<h1>Question No.2: Rank the 3 species' overall body size based on their body mass and flipper length </h1>

In [6]:
averages = cleaned_df.groupby('species')[['flipper_length_mm', 'body_mass_g']].mean()

In [7]:
averages.sort_values(by=['flipper_length_mm', 'body_mass_g'], ascending=False)

species,flipper_length_mm,body_mass_g
Gentoo,217.055768,5068.965761
Chinstrap,195.823529,3733.088235
Adelie,190.025758,3703.95891


<h3>Insight No. 2: </h3>
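<p>Ranked by overall body size, Gentoo penguins come first, with the longest average flippers (about 217 mm) and the highest average body mass (about 5069 g). Chinstrap penguins (about 196 mm, 3733 g) rank second and Adelie penguins (about 190 mm, 3704 g) third, although the gap between these two species is small.</p>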
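
<p>Note that <code>sort_values(by=[...])</code> orders by the first column and uses the second only to break ties. One possible way to let both measures count equally is to rank each one separately and average the ranks. The sketch below rebuilds the species means from the output above; the name <code>size_rank</code> is introduced here for illustration.</p>

```python
import pandas as pd

# Species means copied from the output above.
averages = pd.DataFrame(
    {
        "flipper_length_mm": [190.025758, 195.823529, 217.055768],
        "body_mass_g": [3703.95891, 3733.088235, 5068.965761],
    },
    index=pd.Index(["Adelie", "Chinstrap", "Gentoo"], name="species"),
)

# Rank each measure on its own (1 = largest), then average the two ranks
# so that flipper length and body mass contribute equally.
ranks = averages.rank(ascending=False)
size_rank = ranks.mean(axis=1).sort_values()

print(size_rank)
```

<p>Here both measures agree, so the averaged rank gives the same order as the sort: Gentoo, Chinstrap, Adelie.</p>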

<h1>Question No. 3: What is the average body mass for each sex for each species?</h1>

In [8]:
cleaned_df[['species', 'sex', 'body_mass_g']].groupby(['species', 'sex'])['body_mass_g'].mean().reset_index()

,species,sex,body_mass_g
0,Adelie,female,3368.835616
1,Adelie,male,4013.629802
2,Chinstrap,female,3527.205882
3,Chinstrap,male,3938.970588
4,Gentoo,female,4679.741379
5,Gentoo,male,5411.01143


<h3>Insight No. 3: </h3>
<ul>
    <li>Gentoo penguins are the heaviest species, with both females (about 4680 g) and males (about 5411 g) having the highest average body mass.</li>
    <li>Adelie females are the lightest group (about 3369 g), lighter than Chinstrap females (about 3527 g).</li>
    <li>Among males the order reverses: Chinstrap males (about 3939 g) are lighter than Adelie males (about 4014 g), so neither Adelie nor Chinstrap is the lightest species for both sexes.</li>
</ul>
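
<p>To check which species is lightest or heaviest within each sex, the long table can be pivoted so that each sex becomes a column. This is a small sketch using the means from the output above; <code>wide</code>, <code>lightest</code>, and <code>heaviest</code> are names chosen here for illustration.</p>

```python
import pandas as pd

# Group means copied from the output above.
means = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "sex": ["female", "male", "female", "male", "female", "male"],
    "body_mass_g": [3368.835616, 4013.629802, 3527.205882,
                    3938.970588, 4679.741379, 5411.01143],
})

# One row per species, one column per sex.
wide = means.pivot(index="species", columns="sex", values="body_mass_g")

# For each sex, the species with the lowest and highest average body mass.
lightest = wide.idxmin()
heaviest = wide.idxmax()

print(wide)
print(lightest)
print(heaviest)
```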


<h1>Question No. 4: What is the difference in average flipper length between male and female penguins?</h1>

In [9]:
mean_lengths = cleaned_df.groupby('sex')['flipper_length_mm'].mean().reset_index()
male_mean = mean_lengths[mean_lengths['sex'] == 'male']['flipper_length_mm'].iloc[0]
female_mean = mean_lengths[mean_lengths['sex'] == 'female']['flipper_length_mm'].iloc[0]
male_mean - female_mean



np.float64(6.825360336680546)

<h3>Insight No. 4</h3>
<p>Male penguins have longer flippers on average than females.</p>
<p>The difference (about 6.8 mm in the Palmer Penguins dataset, after imputation) is consistent with biological expectations, since males tend to be slightly larger.</p>
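
<p>The same difference can be computed by indexing the grouped means by label rather than filtering rows, and the idea extends to a per-species comparison. The sketch below uses a made-up frame (<code>toy</code>), so its numbers are illustrative only and do not come from the dataset.</p>

```python
import pandas as pd

# Hypothetical toy data; the flipper lengths are made up.
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "sex": ["female", "male", "female", "male"],
    "flipper_length_mm": [188.0, 192.0, 213.0, 221.0],
})

# Overall gap: the grouped result is indexed by sex, so label lookup is enough.
by_sex = toy.groupby("sex")["flipper_length_mm"].mean()
overall_gap = by_sex["male"] - by_sex["female"]

# Per-species gap: move sex into columns, then subtract the two columns.
by_species = toy.groupby(["species", "sex"])["flipper_length_mm"].mean().unstack("sex")
species_gap = by_species["male"] - by_species["female"]

print(overall_gap)
print(species_gap)
```

<p>A per-species gap is useful because an overall male/female difference can be affected by how many birds of each species and sex are in the sample.</p>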




<h1>Question No.5: What island has the highest average body mass among penguins?</h1>

In [10]:
cleaned_df.groupby('island')['body_mass_g'].mean().reset_index()


,island,body_mass_g
0,Biscoe,4712.956871
1,Dream,3712.903226
2,Torgersen,3715.899123


<h3>Insight No. 5: </h3>
<ul>
    <li>Biscoe has the highest average penguin body mass, at approximately 4712.96 grams.</li>
    <li>Dream and Torgersen have very similar average body masses, both about 1,000 grams lower than Biscoe:
        <ul>
            <li>Dream's average is approximately 3712.90 grams.</li>
            <li>Torgersen's average is approximately 3715.90 grams.</li>
        </ul>
    </li>
</ul>


<h1>Question No. 6: What is the distribution of sex on each island?</h1>

In [11]:
cleaned_df[['sex', 'island', 'species']].groupby(['sex', 'island']).count()

sex,island,species
female,Biscoe,80
female,Dream,61
female,Torgersen,24
male,Biscoe,88
male,Dream,63
male,Torgersen,28


<h3>Insight No.6: </h3>
<ul>
    <li>Biscoe has the highest number of individuals, with 88 males and 80 females.</li>
    <li>Dream has a moderate population, with 63 males and 61 females.</li>
    <li>Torgersen has the lowest population, with 28 males and 24 females.</li>
</ul>
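
<p>Raw counts make the islands hard to compare because their sample sizes differ. As a sketch, the counts from the output above can be turned into per-island totals and male shares; <code>totals</code> and <code>male_share</code> are names introduced here.</p>

```python
import pandas as pd

# Counts copied from the output above (rows: island, columns: sex).
counts = pd.DataFrame(
    {"female": [80, 61, 24], "male": [88, 63, 28]},
    index=pd.Index(["Biscoe", "Dream", "Torgersen"], name="island"),
)

# Total individuals per island and the fraction recorded as male.
totals = counts.sum(axis=1)
male_share = counts["male"] / totals

print(totals)
print(male_share.round(3))
```

<p>Keep in mind that missing <code>sex</code> values were filled with the most frequent value during cleaning, which shifts these counts slightly toward that category.</p>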

<h1>Question No. 7: What species–island combination shows the largest average bill length?</h1>

In [12]:
cleaned_df.groupby(['species', 'island'])['bill_length_mm'].mean().reset_index(name='bill length')

,species,island,bill length
0,Adelie,Biscoe,38.975
1,Adelie,Dream,38.501786
2,Adelie,Torgersen,39.046576
3,Chinstrap,Dream,48.833824
4,Gentoo,Biscoe,47.475983


<h3>Insight No. 7: </h3>
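<p>The Chinstrap-Dream combination shows the largest average bill length, at about 48.83 mm, followed by Gentoo-Biscoe at about 47.48 mm. Because Chinstrap penguins were recorded only on Dream, this comparison cannot separate the effect of species from the effect of island. The three Adelie groups, which span all three islands, differ by only about 0.5 mm (38.50 to 39.05 mm), which suggests that species matters more than island for bill length.</p>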
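
<p>The largest combination can also be picked out programmatically, and counting the islands per species shows how much of the species-island grid is actually observed. This sketch rebuilds the table from the output above; <code>top</code> and <code>islands_per_species</code> are illustrative names.</p>

```python
import pandas as pd

# Values copied from the output above.
bill = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Chinstrap", "Gentoo"],
    "island": ["Biscoe", "Dream", "Torgersen", "Dream", "Biscoe"],
    "bill length": [38.975, 38.501786, 39.046576, 48.833824, 47.475983],
})

# Row with the largest average bill length.
top = bill.loc[bill["bill length"].idxmax()]

# Number of distinct islands on which each species appears.
islands_per_species = bill.groupby("species")["island"].nunique()

print(top)
print(islands_per_species)
```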

<h1>Question No. 8: What is the distribution of penguin species across the three years (2007–2009)?</h1>

In [13]:
cleaned_df[['year', 'species']].groupby(['year', 'species']).size().reset_index(name='count')

,year,species,count
0,2007,Adelie,50
1,2007,Chinstrap,26
2,2007,Gentoo,34
3,2008,Adelie,50
4,2008,Chinstrap,18
5,2008,Gentoo,46
6,2009,Adelie,52
7,2009,Chinstrap,24
8,2009,Gentoo,44


<h3>Insight No. 8:</h3>
<p>The number of penguins recorded per species stays fairly stable from 2007 to 2009. Adelie penguins have the highest counts every year (50, 50, and 52). Chinstrap penguins have the lowest counts, dropping from 26 in 2007 to 18 in 2008 before rising to 24 in 2009. Gentoo counts increase from 34 in 2007 to 46 in 2008, then decrease slightly to 44 in 2009. The yearly totals across all three species are 110, 114, and 120. These figures are numbers of sampled individuals, so they describe the dataset's composition rather than the size of the wild populations.</p>
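
<p>A year-by-species layout with a total column makes these comparisons easier to read. The sketch below reshapes the counts from the output above; <code>table</code> is a name introduced here.</p>

```python
import pandas as pd

# Counts copied from the output above.
counts = pd.DataFrame({
    "year": [2007] * 3 + [2008] * 3 + [2009] * 3,
    "species": ["Adelie", "Chinstrap", "Gentoo"] * 3,
    "count": [50, 26, 34, 50, 18, 46, 52, 24, 44],
})

# One row per year, one column per species, plus a yearly total.
table = counts.pivot(index="year", columns="species", values="count")
table["total"] = table.sum(axis=1)

print(table)
```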

<h1>Question No. 9: What year had the highest recorded average bill depth across all species?</h1>

In [14]:
cleaned_df.groupby('year')['bill_depth_mm'].mean().reset_index()

,year,bill_depth_mm
0,2007,17.425011
1,2008,16.914035
2,2009,17.125426


<h3>Insight No. 9:</h3>
<p>2007 has the highest average bill depth, at about 17.43 mm, compared with about 16.91 mm in 2008 and 17.13 mm in 2009.</p>

<h1>Question No.10: Is there a significant change in average body mass or bill length of penguins across the years 2007–2009?</h1>

In [15]:
df_2007 = cleaned_df[cleaned_df['year'] == 2007]
df_2008 = cleaned_df[cleaned_df['year'] == 2008]
df_2009 = cleaned_df[cleaned_df['year'] == 2009]
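# One-way ANOVA across the three years for each measurement.
# A notebook cell displays only its last expression, so the body-mass result
# is stored in a variable; the displayed output is the bill-length test.
body_mass_anova = f_oneway(df_2007['body_mass_g'], df_2008['body_mass_g'], df_2009['body_mass_g'])
f_oneway(df_2007['bill_length_mm'], df_2008['bill_length_mm'], df_2009['bill_length_mm'])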

F_onewayResult(statistic=np.float64(0.8997826656141361), pvalue=np.float64(0.40762128193132396))

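<h3>Insight No. 10: </h3>
<p>The displayed ANOVA result is for bill length, the last expression in the cell: F ≈ 0.90, p ≈ 0.41, so there is no significant change in average bill length across 2007, 2008, and 2009 at the 0.05 level. The body mass test runs in the same cell, but its result is not displayed, so the same conclusion for body mass should be confirmed by printing that result.</p>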
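
<p>Because a notebook cell displays only its last expression, one way to see both test results is to collect them in a dictionary and print each one. The sketch below uses randomly generated data in place of the penguin measurements (an assumption made so the example is self-contained), so its F and p values are not the notebook's results.</p>

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical synthetic data; it only mimics the shape of the real columns.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "year": np.repeat([2007, 2008, 2009], 30),
    "body_mass_g": rng.normal(4200, 800, 90),
    "bill_length_mm": rng.normal(44, 5, 90),
})

# Run one one-way ANOVA per measurement, with one sample per year.
results = {}
for column in ["body_mass_g", "bill_length_mm"]:
    groups = [group[column].to_numpy() for _, group in toy.groupby("year")]
    results[column] = f_oneway(*groups)

# Print every result explicitly instead of relying on cell display.
for column, res in results.items():
    print(f"{column}: F = {res.statistic:.3f}, p = {res.pvalue:.3f}")
```

<p>One-way ANOVA assumes roughly normal groups with similar variances. It also treats all species as one population, so a shift in the species mix between years could influence the result.</p>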

<h1>Question No. 11: What is the relationship between flipper length and body mass (correlation) for penguins overall?</h1>

In [16]:
cleaned_df['flipper_length_mm'].corr(cleaned_df['body_mass_g'])

np.float64(0.8712017673060114)

<h3>Insight No. 11: </h3>
<p>The correlation between flipper length and body mass is approximately 0.87, indicating a strong positive relationship. This suggests that as penguins' flipper length increases, their body mass also tends to increase.</p>
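
<p>An overall correlation mixes differences between species with the relationship inside each species, so it can be worth computing both. The following sketch uses a made-up frame (<code>toy</code>) with two species; the values are illustrative and do not come from the dataset.</p>

```python
import pandas as pd

# Hypothetical toy data; flipper lengths and masses are made up.
toy = pd.DataFrame({
    "species": ["Adelie"] * 3 + ["Gentoo"] * 3,
    "flipper_length_mm": [185.0, 190.0, 195.0, 215.0, 220.0, 225.0],
    "body_mass_g": [3500.0, 3700.0, 3900.0, 4900.0, 5100.0, 5300.0],
})

# Pearson correlation over all rows, as in the cell above.
overall = toy["flipper_length_mm"].corr(toy["body_mass_g"])

# Pearson correlation computed separately within each species.
per_species = {
    name: group["flipper_length_mm"].corr(group["body_mass_g"])
    for name, group in toy.groupby("species")
}

print(overall)
print(per_species)
```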

<h1>Question No.12: What is the average body mass for each species of penguin, broken down by island and sex?</h1>

In [19]:
cleaned_df.groupby(['species', 'island', 'sex'])['body_mass_g'].mean()

species    island     sex   
Adelie     Biscoe     female    3369.318182
                      male      4050.000000
           Dream      female    3344.444444
                      male      4008.620690
           Torgersen  female    3395.833333
                      male      3990.241228
Chinstrap  Dream      female    3527.205882
                      male      3938.970588
Gentoo     Biscoe     female    4679.741379
                      male      5411.011430
Name: body_mass_g, dtype: float64

<h3>Insight No. 12: </h3>
<p>Male penguins have a higher average body mass than female penguins in every species-island combination. Gentoo penguins are also substantially heavier than Adelie and Chinstrap penguins: even Gentoo females (about 4680 g) outweigh the males of the other two species (about 3939 to 4050 g).</p>