# Bonus: Temperature Analysis I

In [1]:
import pandas as pd
from datetime import datetime as dt

In [2]:
# "tobs" is "temperature observations"
df = pd.read_csv("Resources/hawaii_measurements.csv")
df.head()

,station,date,prcp,tobs
0,USC00519397,2010-01-01,0.08,65
1,USC00519397,2010-01-02,0.0,63
2,USC00519397,2010-01-03,0.0,74
3,USC00519397,2010-01-04,0.0,76
4,USC00519397,2010-01-06,,73


In [3]:
# Convert the date column format from string to datetime
df['date'] = pd.to_datetime(df['date'])

In [4]:
# Set date column as DataFrame index & Drop date column
df.set_index('date', inplace=True, drop=True)
df.head()

date,station,prcp,tobs
2010-01-01,USC00519397,0.08,65
2010-01-02,USC00519397,0.0,63
2010-01-03,USC00519397,0.0,74
2010-01-04,USC00519397,0.0,76
2010-01-06,USC00519397,,73


In [5]:
## Check DatetimeIndex
type(df.index)

pandas.core.indexes.datetimes.DatetimeIndex

### Compare June and December data across all years 

In [6]:
from scipy import stats

In [7]:
# Filter data for desired months
june_loc = df.index.month == 6
dec_loc = df.index.month == 12

In [8]:
# Identify the average temperature for June
tavg_jun = df['tobs'].loc[june_loc].mean()
print(f"Average Temperature (June): {tavg_jun:.2f} °F")

Average Temperature (June): 74.94 °F


In [9]:
# Identify the average temperature for December
tavg_dec = df['tobs'].loc[dec_loc].mean()
print(f"Average Temperature (December): {tavg_dec:.2f} °F")

Average Temperature (December): 71.04 °F


In [10]:
# Create collections of temperature data
june_temp = list(df['tobs'].loc[june_loc].values)
dec_temp = list(df['tobs'].loc[dec_loc].values)

In [16]:
# Welch's t-test: an unpaired t-test on the means of two independent temperature samples
# that does not assume equal population variances (equal_var=False)
statistic, pvalue = stats.ttest_ind(june_temp, dec_temp, equal_var=False)
print(f"p-value: {pvalue:.4e}")

p-value: 4.1935e-187


### Analysis

This test uses sample data to determine whether there is a statistically significant difference between mean June and December temperatures in Hawaii.

#### T-test parameters
- Two-sided test of the null hypothesis.
- Compares the means of two independent samples of temperatures (unpaired t-test).
- Does not assume equal population variances (Welch's t-test).
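
To make these parameters concrete, the sketch below computes the Welch t statistic and its Welch-Satterthwaite degrees of freedom by hand, using only the standard library. The function name `welch_t` and the toy readings are hypothetical and are not taken from `hawaii_measurements.csv`. It is only an illustration of the formula, not a replacement for `stats.ttest_ind`, and it does not compute a p-value.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (variance() divides by n - 1)
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    # The variances are NOT pooled, which is what distinguishes Welch's test
    t_stat = (mean(sample_a) - mean(sample_b)) / (se2_a + se2_b) ** 0.5
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Hypothetical toy readings in °F, for illustration only
june_sample = [75, 77, 74, 76, 78]
dec_sample = [70, 72, 69, 71, 73, 70]

t_stat, dof = welch_t(june_sample, dec_sample)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

The sign of the statistic follows the argument order: a positive value means the first sample has the higher mean. In a two-sided test only its magnitude matters.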

#### Null hypothesis
"June and December temperatures have identical average values."

#### T-test results
Since the p-value (4.19e-187) is far below the 0.01 significance level, the null hypothesis of identical average temperatures is rejected. At the 1% significance level, the mean June and December temperatures in Hawaii are not equal.

The sample means differ by about 3.9 °F (74.94 °F in June versus 71.04 °F in December), and the test gives strong statistical evidence that this difference is not due to sampling variation alone.
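
A very small p-value says that a difference exists, not how large it is. One common way to gauge the size is a standardized effect size such as Cohen's d. The sketch below is an optional complement and not part of the original analysis. The function name `cohens_d` and the toy readings are hypothetical, and the pooled standard deviation it uses assumes the two groups have roughly similar spread.

```python
from statistics import mean, variance


def cohens_d(sample_a, sample_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / pooled_var ** 0.5


# Hypothetical toy readings in °F, for illustration only
june_sample = [75, 77, 74, 76, 78]
dec_sample = [70, 72, 69, 71, 73, 70]

d = cohens_d(june_sample, dec_sample)
print(f"Cohen's d = {d:.2f}")
```

With the notebook's data, the same function could be called as `cohens_d(june_temp, dec_temp)`. That result is not computed here.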