# Preparation of weather dataset for subsequent analyses

### Import historical weather data obtained from_?_
* "85008_history.csv"

### Reformat data
* Drop columns not required for analysis.
* Reformat date column to display years, e.g., 2012.
* Drop rows not needed for analysis (i.e., not in 2013-2020)

### Prep night-time low data
* Select days where the night-time low was at least 90 F.
* Count number of days with a low of at least 90 F in each year.
* Export dataset: "Nights_above_90_ready_for_analysis.csv"

### Prep precipitation data
* Calculate total precipitation per year, then average the yearly totals over two timespans of interest.
* Export dataset: "Total_precip_per_timespan_ready_for_analysis.csv"

### Import historical weather data

In [7]:
# Dependencies
import matplotlib.pyplot as plt
import pandas as pd

# Path to raw weather data
csv_path = "85008_history.csv"

# Read in and store into Pandas data frame
weather_data = pd.read_csv(csv_path)

weather_data.head()

Unnamed: 0,Name,Date time,Maximum Temperature,Minimum Temperature,Temperature,Wind Chill,Heat Index,Precipitation,Snow Depth,Wind Speed,Wind Gust,Visibility,Cloud Cover,Relative Humidity,Conditions
0,"85008, USA",1/1/12,80.0,47.0,63.3,46.0,,0.0,,15.0,,9.9,17.1,36.14,Clear
1,"85008, USA",1/2/12,79.1,56.9,66.7,,,0.0,,17.2,,9.9,1.3,22.38,Clear
2,"85008, USA",1/3/12,74.9,53.9,63.1,,,0.0,,10.3,,9.9,38.3,29.66,Partially cloudy
3,"85008, USA",1/4/12,79.1,48.8,62.1,47.0,,0.0,,9.2,,9.9,17.5,33.66,Clear
4,"85008, USA",1/5/12,77.1,50.1,62.0,,,0.0,,12.8,,9.9,29.2,28.54,Partially cloudy


### Reformat data

In [8]:
# List all the columns in the table
weather_data.columns

Index(['Name', 'Date time', 'Maximum Temperature', 'Minimum Temperature',
       'Temperature', 'Wind Chill', 'Heat Index', 'Precipitation',
       'Snow Depth', 'Wind Speed', 'Wind Gust', 'Visibility', 'Cloud Cover',
       'Relative Humidity', 'Conditions'],
      dtype='object')

In [31]:
# We only want Date, Minimum Temperature and Precipitation, so create a new table with only those columns
new_data = weather_data[["Date time", "Minimum Temperature", "Precipitation"]]

new_data

Unnamed: 0,Date time,Minimum Temperature,Precipitation
0,1/1/12,47.0,0.0
1,1/2/12,56.9,0.0
2,1/3/12,53.9,0.0
3,1/4/12,48.8,0.0
4,1/5/12,50.1,0.0
...,...,...,...
3206,10/11/20,69.1,0.0
3207,10/12/20,70.0,0.0
3208,10/13/20,70.0,0.0
3209,10/14/20,69.0,0.0


In [33]:
# Parse the two-digit year out of the "Date time" column and prepend "20" to get the full year, e.g., 2012.
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning.
new_data = new_data.assign(Year="20" + new_data["Date time"].str.extract(r"(..$)", expand=False))

# Keep only the columns needed downstream; "Date time" is no longer needed.
year_reformat_df = new_data[["Year", "Minimum Temperature", "Precipitation"]]

year_reformat_df

Unnamed: 0,Year,Minimum Temperature,Precipitation
0,2012,47.0,0.0
1,2012,56.9,0.0
2,2012,53.9,0.0
3,2012,48.8,0.0
4,2012,50.1,0.0
...,...,...,...
3206,2020,69.1,0.0
3207,2020,70.0,0.0
3208,2020,70.0,0.0
3209,2020,69.0,0.0
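As an alternative to slicing the last two characters and prepending "20", the year can be parsed with `pd.to_datetime`. The following is a minimal sketch on a few hand-typed rows, assuming the dates follow the month/day/two-digit-year pattern shown above; `sample` is a hypothetical stand-in for `new_data`.

```python
import pandas as pd

# Hypothetical stand-in for new_data, using the same column names.
sample = pd.DataFrame({
    "Date time": ["1/1/12", "1/2/12", "10/14/20"],
    "Minimum Temperature": [47.0, 56.9, 69.0],
    "Precipitation": [0.0, 0.0, 0.0],
})

# Parse month/day/two-digit-year strings; %y maps 12 to 2012 and 20 to 2020.
parsed = pd.to_datetime(sample["Date time"], format="%m/%d/%y")

# Store the year as a string so it matches the "Year" column built above.
sample_with_year = sample.assign(Year=parsed.dt.year.astype(str))

print(sample_with_year[["Year", "Minimum Temperature", "Precipitation"]])
```

Parsing the full date also keeps month and day available if a later step needs them.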


In [35]:
# Drop rows not needed for analysis; keep 2013-2020.
# Years are four-character strings, so string comparison orders them correctly.
formatted_weather_data_df = year_reformat_df.loc[year_reformat_df["Year"] >= "2013"]

formatted_weather_data_df

Unnamed: 0,Year,Minimum Temperature,Precipitation
366,2013,35.9,0.0
367,2013,38.0,0.0
368,2013,41.1,0.0
369,2013,41.1,0.0
370,2013,38.0,0.0
...,...,...,...
3206,2020,69.1,0.0
3207,2020,70.0,0.0
3208,2020,70.0,0.0
3209,2020,69.0,0.0


### Prep night-time low data

In [47]:
# Select days where the low temperature was at least 90 F and keep only the Year and Minimum Temperature columns.
# Build the mask from the same DataFrame that is being filtered.
nights_above_90 = formatted_weather_data_df.loc[formatted_weather_data_df["Minimum Temperature"] >= 90, ["Year", "Minimum Temperature"]]

nights_above_90

Unnamed: 0,Year,Minimum Temperature
545,2013,91.1
546,2013,91.1
547,2013,91.1
550,2013,91.1
553,2013,91.1
...,...,...
3150,2020,91.1
3152,2020,91.2
3153,2020,91.9
3158,2020,92.0


In [50]:
# Count nights at or above 90 F for each year, convert the result back to a DataFrame, reset the index,
# and rename the Minimum Temperature column to describe the counts it now holds.
count_nights_above_90_gb = nights_above_90.groupby(["Year"])["Minimum Temperature"].count()
count_nights_above_90_df = pd.DataFrame(count_nights_above_90_gb).reset_index()
count_nights_above_90_df.rename(columns = {"Minimum Temperature": "Nights Above 90"}, inplace = True)
count_nights_above_90_df

Unnamed: 0,Year,Nights Above 90
0,2013,13
1,2014,4
2,2015,7
3,2016,6
4,2017,7
5,2018,11
6,2019,10
7,2020,23
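Filtering first and grouping afterwards drops any year that has no qualifying night. Every year in the output above has at least one, so nothing is lost here, but a variant that counts on a boolean column would report such years as 0. A minimal sketch, with `sample` as a hypothetical stand-in for `formatted_weather_data_df`:

```python
import pandas as pd

# Hypothetical stand-in for formatted_weather_data_df.
sample = pd.DataFrame({
    "Year": ["2013", "2013", "2014", "2014", "2015"],
    "Minimum Temperature": [91.1, 88.0, 85.0, 79.0, 92.0],
})

# True where the night-time low reached the 90 F threshold.
is_hot_night = sample["Minimum Temperature"] >= 90

# Summing booleans per year counts the True values; years with none get 0.
counts = (
    is_hot_night.groupby(sample["Year"])
    .sum()
    .rename("Nights Above 90")
    .reset_index()
)

print(counts)
```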


In [51]:
# Export nights above 90 data to csv for subsequent analysis
count_nights_above_90_df.to_csv("Nights_above_90_ready_for_analysis.csv")
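By default `to_csv` also writes the row index as an unnamed first column, which appears as `Unnamed: 0` when the file is read back. Whether that column is wanted depends on the downstream notebook. The sketch below compares the two forms using an in-memory buffer; `counts` is a hypothetical stand-in for `count_nights_above_90_df`.

```python
import io
import pandas as pd

# Hypothetical stand-in for count_nights_above_90_df.
counts = pd.DataFrame({"Year": ["2013", "2014"], "Nights Above 90": [13, 4]})

# Default export: the row index is written as an unnamed first column.
with_index = io.StringIO()
counts.to_csv(with_index)

# index=False writes only the named columns.
without_index = io.StringIO()
counts.to_csv(without_index, index=False)

print(with_index.getvalue().splitlines()[0])     # header starts with a comma
print(without_index.getvalue().splitlines()[0])

# Reading the default export back yields an extra "Unnamed: 0" column.
with_index.seek(0)
print(pd.read_csv(with_index).columns.tolist())
```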

### Prep precipitation data

In [None]:
# List all the columns in the table
weather_data.columns

# We only want Year, Minimum Temperature and Precipitation. weather_data has no Month, Day or Year columns,
# so start from year_reformat_df, which still includes 2012.
new_data = year_reformat_df[["Year", "Minimum Temperature", "Precipitation"]]
new_data

# Total precipitation for each year
total_precip = new_data.groupby(["Year"])
total_precip["Precipitation"].sum()
precip_list = list(total_precip["Precipitation"].sum())
precip_list

# Average the yearly totals over each timespan.
# precip_list[0] is 2012, so [1:3] covers 2013-2014 and [3:9] covers 2015-2020.
from statistics import mean
group_1 = mean(precip_list[1:3])
group_2 = mean(precip_list[3:9])
print(group_1)
print(group_2)
# Create a bar graph of average precipitation for each timespan
import numpy as np

years = ["2013 to 2014", "2015 to 2019"]
rain = [8.36, 6.22]
x_axis = np.arange(0, len(years))
tick_locations = list(x_axis)
plt.title("Average Annual Precipitation by Timespan")
plt.xlabel("Timespan")
plt.ylabel("Amount of Rain")
plt.xlim(-0.75, len(years)-.25)
plt.ylim(0, max(rain) + 5)
plt.bar(x_axis, rain, facecolor="blue", alpha=0.75, align="center")
plt.xticks(tick_locations, years)
plt.show()
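Positional slices depend on which years the list happens to contain. Selecting by year label is one way to make each timespan explicit and to produce the export named in the outline. This is only a sketch: the yearly totals are made-up numbers, the timespan boundaries are assumed from the bar-chart labels, and the output column names are hypothetical.

```python
import pandas as pd

# Made-up yearly totals for illustration only; the index holds the year as a string.
yearly_precip = pd.Series(
    [5.0, 9.0, 7.0, 6.0, 8.0, 4.0, 6.0, 6.0, 3.0],
    index=[str(y) for y in range(2012, 2021)],
    name="Precipitation",
)

# Timespans assumed from the bar-chart labels; .loc slices by label, end inclusive.
timespans = {"2013 to 2014": ("2013", "2014"), "2015 to 2019": ("2015", "2019")}

per_timespan = pd.DataFrame(
    [
        {"Timespan": label, "Mean Annual Precipitation": yearly_precip.loc[start:end].mean()}
        for label, (start, end) in timespans.items()
    ]
)
print(per_timespan)

# Export for subsequent analysis, using the file name from the outline.
per_timespan.to_csv("Total_precip_per_timespan_ready_for_analysis.csv", index=False)
```

With real data, `yearly_precip` would be the result of `new_data.groupby("Year")["Precipitation"].sum()`.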