# Bonus: Temperature Analysis I

In [1]:
import pandas as pd
from datetime import datetime as dt

In [2]:
# "tobs" is "temperature observations"
df = pd.read_csv('hawaii_measurements.csv')
df.head()

,station,date,prcp,tobs
0,USC00519397,2010-01-01,0.08,65
1,USC00519397,2010-01-02,0.0,63
2,USC00519397,2010-01-03,0.0,74
3,USC00519397,2010-01-04,0.0,76
4,USC00519397,2010-01-06,,73


In [3]:
# Convert the date column format from string to datetime
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

In [4]:
# Set the date column as the DataFrame index
df_date_index = df.set_index('date')
df_date_index.head()

date,station,prcp,tobs
2010-01-01,USC00519397,0.08,65
2010-01-02,USC00519397,0.0,63
2010-01-03,USC00519397,0.0,74
2010-01-04,USC00519397,0.0,76
2010-01-06,USC00519397,,73


### Compare June and December data across all years 

In [5]:
from scipy import stats
import numpy as np

In [7]:
# Filter data for desired months
december = df.loc[df['date'].dt.month == 12]
june = df.loc[df['date'].dt.month == 6]

# Reset the index of each subset
june = june.reset_index()
december = december.reset_index()

In [8]:
# Identify the average temperature for June
june_avg_temp = round(june['tobs'].mean(),1)
print(f'June average temp = {june_avg_temp} F')

June average temp = 74.9 F


In [9]:
# Identify the average temperature for December
dec_avg_temp = round(december['tobs'].mean(),1)
print(f'December average temp = {dec_avg_temp} F')

December average temp = 71.0 F


In [10]:
# Check whether the December and June DataFrames have the same shape
print(december.shape)
print(june.shape)

(1517, 5)
(1700, 5)


In [11]:
# Randomly remove 183 rows (1700 - 1517) from June so that both DataFrames have the same shape
n = 183
drop_indices = np.random.choice(june.index, n, replace=False)
june_lite = june.drop(drop_indices)

# Confirm that the DataFrame shapes now match
print(f'December df shape: {december.shape}')
print(f'June df shape: {june_lite.shape}')

December df shape: (1517, 5)
June df shape: (1517, 5)


In [12]:
# Compare the variance of tobs between the full June sample and the December sample
june_var = np.var(june['tobs'])
december_var = np.var(december['tobs'])

print(f'June Variance = {june_var}')
print(f'December Variance = {december_var}')

June Variance = 10.604524221453236
December Variance = 14.022665558302293


In [13]:
# Run unpaired t-test
stats.ttest_ind(june_lite['tobs'], december['tobs'])

Ttest_indResult(statistic=30.500141857462513, pvalue=1.9741834576198397e-178)

### Analysis

In Hawaii, the average temperatures for June and December were 74.9 F and 71.0 F, respectively. These averages were computed from data covering 2010 to 2017 for June and 2010 to 2016 for December. An unpaired t-test was run to determine whether the average temperature differed between the two months. The p-value for this test was 1.97 x 10^-178, which allows us to reject the null hypothesis (the mean temperature is the same in both months) in favor of the alternative hypothesis (the mean temperature differs between the two months).
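To make the test statistic less of a black box, the sketch below computes the pooled-variance (Student's) two-sample t statistic by hand and compares it with `stats.ttest_ind`. The arrays are synthetic stand-ins generated with assumed means and spreads, not the real `tobs` columns, and `pooled_t_statistic` is a hypothetical helper name used only for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the June and December 'tobs' columns (assumed values, not the real data)
rng = np.random.default_rng(0)
june_tobs = rng.normal(loc=75, scale=3.3, size=1700)
december_tobs = rng.normal(loc=71, scale=3.7, size=1517)

def pooled_t_statistic(a, b):
    """Two-sample t statistic using a pooled variance estimate (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = a.size, b.size
    # Pool the unbiased sample variances (ddof=1), weighted by degrees of freedom
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    # Difference in means divided by its standard error
    return (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

manual_t = pooled_t_statistic(june_tobs, december_tobs)
scipy_result = stats.ttest_ind(june_tobs, december_tobs)

print(f"manual t = {manual_t:.4f}")
print(f"scipy  t = {scipy_result.statistic:.4f}")
```

The two values should agree to floating-point precision, since `ttest_ind` uses the pooled-variance form by default.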

The unpaired t-test was appropriate for two reasons.
1. The sample variances were approximately equal. June and December variances were 10.6 and 14.0, respectively.
2. The samples are independent (i.e. the June and December observations do not form matched pairs).
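The equal-variance assumption in the first point can also be checked formally rather than by eye. The sketch below, run on synthetic stand-in data rather than the real `tobs` columns, applies Levene's test and then compares the default pooled t-test with Welch's t-test (`equal_var=False`), which does not assume equal variances. The means and spreads used to generate the data are assumptions chosen to resemble the values reported above.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the two 'tobs' samples (assumed values, not the real data)
rng = np.random.default_rng(0)
june_tobs = rng.normal(loc=75, scale=3.3, size=1700)
december_tobs = rng.normal(loc=71, scale=3.7, size=1517)

# Levene's test: the null hypothesis is that both samples have equal variance
levene_result = stats.levene(june_tobs, december_tobs)

# Student's (pooled) t-test, the default behaviour of ttest_ind
pooled_result = stats.ttest_ind(june_tobs, december_tobs)

# Welch's t-test drops the equal-variance assumption
welch_result = stats.ttest_ind(june_tobs, december_tobs, equal_var=False)

print(f"Levene p-value: {levene_result.pvalue:.3g}")
print(f"pooled t = {pooled_result.statistic:.2f}, Welch t = {welch_result.statistic:.2f}")
```

If the two t statistics lead to the same conclusion, the result is not sensitive to the equal-variance assumption.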

The June sample had 1700 temperature observations, compared with 1517 in the December sample. An unpaired t-test does not strictly require equal sample sizes, but the two samples were balanced before testing: 183 temperature observations in the June sample were randomly selected and removed. The unpaired t-test was then run using the reduced June sample. Because the rows are dropped at random without a fixed seed, the exact test statistic can vary slightly between runs.
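The random removal step can be made reproducible by fixing a seed. The sketch below uses `DataFrame.sample` with `random_state` on synthetic stand-in frames (the real CSV is not loaded here, and the seed value 42 is arbitrary). It also runs `ttest_ind` on the full, unbalanced samples, since that function accepts inputs of different lengths.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-ins for the June and December frames (assumed values, not the real data)
rng = np.random.default_rng(1)
june = pd.DataFrame({"tobs": rng.normal(75, 3.3, 1700).round()})
december = pd.DataFrame({"tobs": rng.normal(71, 3.7, 1517).round()})

# Keep a random subset of June the same size as December; the seed makes it repeatable
june_lite = june.sample(n=len(december), random_state=42)

# Unpaired t-test on the balanced samples
balanced = stats.ttest_ind(june_lite["tobs"], december["tobs"])

# Unpaired t-test on the full samples, which have different lengths
unbalanced = stats.ttest_ind(june["tobs"], december["tobs"])

print(june_lite.shape, december.shape)
print(f"balanced t = {balanced.statistic:.2f}, unbalanced t = {unbalanced.statistic:.2f}")
```

Keeping rows with `sample` is equivalent in spirit to dropping a random set of indices, and rerunning the cell with the same `random_state` yields the same subset.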