***
## Import the Required Python Packages and Methods

In [1]:
# Import the required packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import matplotlib.dates as mdates
import seaborn as sns
import plotly as py
import plotly.graph_objs as go
from plotly import tools
from IPython.display import IFrame

py.offline.init_notebook_mode(connected=True)

***
## Finding and Evaluating Historical Regional Weather Data
The National Centers for Environmental Information (NCEI), part of the National Oceanic and Atmospheric Administration (NOAA) and formerly known as the National Climatic Data Center (NCDC), hosts the Climate Data Online website, where I acquired this data using its searchable database:

https://www.ncdc.noaa.gov/cdo-web/

I followed the hyperlink to their historical **Data Tools**: https://www.ncdc.noaa.gov/cdo-web/datatools

I then followed the link to their **Find a Station** tool: https://www.ncdc.noaa.gov/cdo-web/datatools/findstation

Once at this website, I looked for land-based weather stations near the regions shown in the following .png image provided by the USDA/NASS for 2016.

In [2]:
IFrame("NASS_USDA/SC-PR-RGBChor.png", width=760, height=587)

***
## Finding and Evaluating Historical Weather Data for Louisiana
Inside the NOAA Climate Data Online, Data Tools, Find a Station website, I then selected the following categorical data parameters:
    1. Enter Location:    Louisiana, USA
    2. Select Dataset:    Daily Summaries
    3. Select Date Range: 2000-01-01 to 2017-12-31
    4. Data Categories:   [Air Temperature, Precipitation, Wind]
    
I then zoomed in on the website's Google map feature to focus on the region where the majority of Louisiana's sugarcane is farmed, producing the following search area, with weather stations depicted as cell towers.

In [3]:
IFrame("NCDC_NOAA/Louisiana/Capture.png", width=622, height=554)

***
## Evaluating Historical Weather Data for Lafayette Regional Airport, LA
After locating the weather station nearest the sugarcane-growing region, I clicked on its cell tower icon and added the dataset to my cart (free data).

I then selected the output format to be CUSTOM GHCN-Daily CSV and selected the date range to go as far back as possible for the station in question to 2017-12-31.

    1893-01-01 to 2017-12-31

I then selected the available data categories for the station in question:
    1. Station Name
    2. Geographic Location
    3. TMAX - Maximum Air Temperature
    4. TMIN - Minimum Air Temperature
    5. PRCP - Precipitation
    6. SNOW - Snowfall
    7. AWND - Average wind speed
    8. WSF2 - Fastest 2-minute wind speed
    9. WSF5 - Fastest 5-second wind speed
    
Upon this request, the NOAA sent two automated emails. The first to confirm the information request and the second with a download link for the data.

I then downloaded the .csv file and performed the following data integrity checks using pandas DataFrames.

In [4]:
#  Import the historical local weather data for Lafayette Regional Airport, LA
df_la_lra = pd.read_csv("NCDC_NOAA/Louisiana/Lafayette/1340010.csv", header=0)
df_la_lra.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45624 entries, 0 to 45623
Data columns (total 13 columns):
STATION      45624 non-null object
NAME         45624 non-null object
LATITUDE     45624 non-null float64
LONGITUDE    45624 non-null float64
ELEVATION    45624 non-null float64
DATE         45624 non-null object
AWND         7069 non-null float64
PRCP         44878 non-null float64
SNOW         30151 non-null float64
TMAX         45504 non-null float64
TMIN         45496 non-null float64
WSF2         7065 non-null float64
WSF5         7033 non-null float64
dtypes: float64(10), object(3)
memory usage: 4.5+ MB


In [5]:
# Check the wind data available
df_la_wind_lra = df_la_lra[["DATE", "AWND"]].dropna()
print("Wind data ranges from " + str(df_la_wind_lra["DATE"].min()) + " to " + str(df_la_wind_lra["DATE"].max()))

Wind data ranges from 1998-08-21 to 2017-12-31


## Missing Wind Data for Lafayette Regional Airport, LA
The pandas DataFrame summary shows that most of the wind data is missing (for example, only 7,069 of 45,624 rows have an AWND value, with the earliest on 1998-08-21), so this station will not adequately cover our sugarcane production data from the USDA/NASS. Therefore, I moved on to the next nearest weather station.
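
The judgement above can be made more concrete by measuring how much of a target window a wind column actually covers. The following is a minimal sketch and not part of the original analysis: the helper name `wind_coverage` and the toy readings are hypothetical, and the window in real use could be the 2000-01-01 to 2017-12-31 range selected earlier in the Find a Station tool.

```python
import pandas as pd


def wind_coverage(df, column, start, end, date_col="DATE"):
    """Return the fraction of calendar days in [start, end] that have a
    non-null reading in `column`. Hypothetical helper for illustration."""
    dates = pd.to_datetime(df[date_col])
    in_window = (dates >= pd.Timestamp(start)) & (dates <= pd.Timestamp(end))
    # Count distinct days with a value so duplicate rows are not double counted
    days_with_data = dates[in_window & df[column].notna()].dt.normalize().nunique()
    total_days = len(pd.date_range(start, end, freq="D"))
    return days_with_data / total_days


# Toy data: four days requested, two of them carry an AWND reading
toy = pd.DataFrame({
    "DATE": ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04"],
    "AWND": [5.1, None, 7.4, None],
})
coverage = wind_coverage(toy, "AWND", "2000-01-01", "2000-01-04")
print("AWND coverage: {:.0%}".format(coverage))
```

Applied to a station DataFrame such as `df_la_lra`, the same call would give a single number to compare across stations, which is easier to rank than the raw non-null counts.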

***
## Evaluating Historical Weather Data for Crowley, LA
After locating the next nearest weather station to the sugarcane-growing region, I clicked on its cell tower icon and added the dataset to my cart (free data).

I then selected the output format to be CUSTOM GHCN-Daily CSV and selected the date range to go as far back as possible for the station in question to 2017-12-31.

    1906-06-01 to 2017-12-31

I then selected the available data categories for the station in question:
    1. Station Name
    2. Geographic Location
    3. TMAX - Maximum Air Temperature
    4. TMIN - Minimum Air Temperature
    5. PRCP - Precipitation
    6. SNOW - Snowfall
    7. WDMV - Total wind movement
    
Upon this request, the NOAA sent two automated emails. The first to confirm the information request and the second with a download link for the data.

I then downloaded the .csv file and performed the following data integrity checks using pandas DataFrames.

In [6]:
#  Import the historical local weather data for Crowley, LA
df_la_cr = pd.read_csv("NCDC_NOAA/Louisiana/Crowley/1338765.csv", header=0)
df_la_cr.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33604 entries, 0 to 33603
Data columns (total 11 columns):
STATION      33604 non-null object
NAME         33604 non-null object
LATITUDE     33604 non-null float64
LONGITUDE    33604 non-null float64
ELEVATION    33604 non-null float64
DATE         33604 non-null object
PRCP         33470 non-null float64
SNOW         29606 non-null float64
TMAX         27912 non-null float64
TMIN         27977 non-null float64
WDMV         1809 non-null float64
dtypes: float64(8), object(3)
memory usage: 2.8+ MB


In [7]:
# Check the wind data available
df_la_wind_cr = df_la_cr[["DATE", "WDMV"]].dropna()
print("Wind data ranges from " + str(df_la_wind_cr["DATE"].min()) + " to " + str(df_la_wind_cr["DATE"].max()))

Wind data ranges from 1926-08-01 to 1950-05-31


## Missing Wind Data for Crowley, LA
The pandas DataFrame summary shows that most of the wind data is missing (only 1,809 of 33,604 rows have a WDMV value, all between 1926-08-01 and 1950-05-31), so this station will not adequately cover our sugarcane production data from the USDA/NASS. Therefore, I moved on to the next nearest weather station.

***
## Evaluating Historical Weather Data for Jennings, LA
After locating the next nearest weather station to the sugarcane-growing region, I clicked on its cell tower icon and added the dataset to my cart (free data).

I then selected the output format to be CUSTOM GHCN-Daily CSV and selected the date range to go as far back as possible for the station in question to 2017-12-31.

    1897-09-01 to 2017-12-31

I then selected the available data categories for the station in question:
    1. Station Name
    2. Geographic Location
    3. TMAX - Maximum Air Temperature
    4. TMIN - Minimum Air Temperature
    5. PRCP - Precipitation
    6. SNOW - Snowfall
    7. WDMV - Total wind movement
    
Upon this request, the NOAA sent two automated emails. The first to confirm the information request and the second with a download link for the data.

I then downloaded the .csv file and performed the following data integrity checks using pandas DataFrames.

In [8]:
#  Import the historical local weather data for Jennings, LA
df_la_jn = pd.read_csv("NCDC_NOAA/Louisiana/Jennings/1338779.csv", header=0)
df_la_jn.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 43950 entries, 0 to 43949
Data columns (total 11 columns):
STATION      43950 non-null object
NAME         43950 non-null object
LATITUDE     43950 non-null float64
LONGITUDE    43950 non-null float64
ELEVATION    43950 non-null float64
DATE         43950 non-null object
PRCP         43158 non-null float64
SNOW         32049 non-null float64
TMAX         43735 non-null float64
TMIN         43567 non-null float64
WDMV         9811 non-null float64
dtypes: float64(8), object(3)
memory usage: 3.7+ MB


In [9]:
# Check the wind data available
df_la_wind_jn = df_la_jn[["DATE", "WDMV"]].dropna()
print("Wind data ranges from " + str(df_la_wind_jn["DATE"].min()) + " to " + str(df_la_wind_jn["DATE"].max()))

Wind data ranges from 1990-01-01 to 2017-12-31


## Missing Wind Data for Jennings, LA (Combination Candidate)
The pandas DataFrame summary shows that most of the wind data is missing (only 9,811 of 43,950 rows have a WDMV value), so this station on its own will not adequately cover our sugarcane production data from the USDA/NASS. I flagged it as a combination candidate and moved on to the next nearest weather station.

***
## Evaluating Historical Weather Data for Houma, LA
After locating the next nearest weather station to the sugarcane-growing region, I clicked on its cell tower icon and added the dataset to my cart (free data).

I then selected the output format to be CUSTOM GHCN-Daily CSV and selected the date range to go as far back as possible for the station in question to 2017-12-31.

    1893-01-01 to 2013-08-16

I then selected the available data categories for the station in question:
    1. Station Name
    2. Geographic Location
    3. TMAX - Maximum Air Temperature
    4. TMIN - Minimum Air Temperature
    5. PRCP - Precipitation
    6. SNOW - Snowfall
    7. WDMV - Total wind movement
    
Upon this request, the NOAA sent two automated emails. The first to confirm the information request and the second with a download link for the data.

I then downloaded the .csv file and performed the following data integrity checks using pandas DataFrames.

In [10]:
#  Import the historical local weather data for Houma, LA
df_la_hm = pd.read_csv("NCDC_NOAA/Louisiana/Houma/1338701.csv", header=0)
df_la_hm.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41092 entries, 0 to 41091
Data columns (total 11 columns):
STATION      41092 non-null object
NAME         41092 non-null object
LATITUDE     41092 non-null float64
LONGITUDE    41092 non-null float64
ELEVATION    41092 non-null float64
DATE         41092 non-null object
PRCP         40256 non-null float64
SNOW         35714 non-null float64
TMAX         40627 non-null float64
TMIN         40197 non-null float64
WDMV         10496 non-null float64
dtypes: float64(8), object(3)
memory usage: 3.4+ MB


In [11]:
# Check the wind data available
df_la_wind_hm = df_la_hm[["DATE", "WDMV"]].dropna()
print("Wind data ranges from " + str(df_la_wind_hm["DATE"].min()) + " to " + str(df_la_wind_hm["DATE"].max()))

Wind data ranges from 1977-02-01 to 2006-12-31


## Missing Wind Data for Houma, LA (Combination Candidate)
The pandas DataFrame summary shows that most of the wind data is missing (only 10,496 of 41,092 rows have a WDMV value), so this station on its own will not adequately cover our sugarcane production data from the USDA/NASS. I flagged it as a combination candidate and moved on to the next nearest weather station.

***
## Evaluating Historical Weather Data for New Orleans Airport, LA
After locating the next nearest weather station to the sugarcane-growing region, I clicked on its cell tower icon and added the dataset to my cart (free data).

I then selected the output format to be CUSTOM GHCN-Daily CSV and selected the date range to go as far back as possible for the station in question to 2017-12-31.

    1945-10-01 to 2017-12-31

I then selected the available data categories for the station in question:
    1. Station Name
    2. Geographic Location
    3. TMAX - Maximum Air Temperature
    4. TMIN - Minimum Air Temperature
    5. PRCP - Precipitation
    6. SNOW - Snowfall
    7. AWND - Average wind speed
    8. WSF1 - Fastest 1-minute wind speed
    9. WSF2 - Fastest 2-minute wind speed
    10. WSF5 - Fastest 5-second wind speed
    11. WSFG - Peak gust wind speed
    
Upon this request, the NOAA sent two automated emails. The first to confirm the information request and the second with a download link for the data.

I then downloaded the .csv file and performed the following data integrity checks using pandas DataFrames.

In [12]:
#  Import the historical local weather data for New Orleans Airport, LA
df_la_noa = pd.read_csv("NCDC_NOAA/Louisiana/New Orleans/Airport/1338795.csv", header=0)
df_la_noa.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25568 entries, 0 to 25567
Data columns (total 15 columns):
STATION      25568 non-null object
NAME         25568 non-null object
LATITUDE     25568 non-null float64
LONGITUDE    25568 non-null float64
ELEVATION    25568 non-null float64
DATE         25568 non-null object
AWND         12413 non-null float64
PRCP         25567 non-null float64
SNOW         22036 non-null float64
TMAX         25568 non-null int64
TMIN         25568 non-null int64
WSF1         11436 non-null float64
WSF2         7913 non-null float64
WSF5         7869 non-null float64
WSFG         16660 non-null float64
dtypes: float64(10), int64(2), object(3)
memory usage: 2.9+ MB


In [13]:
# Check the wind data available - PEAK GUST WIND SPEED
df_la_wind_noa = df_la_noa[["DATE", "WSFG"]].dropna()
print("Wind data ranges from " + str(df_la_wind_noa["DATE"].min()) + " to " + str(df_la_wind_noa["DATE"].max()))

Wind data ranges from 1949-07-16 to 1996-04-30


In [14]:
# Check the wind data available - AVERAGE WIND SPEED
df_la_wind_noa = df_la_noa[["DATE", "AWND"]].dropna()
print("Wind data ranges from " + str(df_la_wind_noa["DATE"].min()) + " to " + str(df_la_wind_noa["DATE"].max()))

Wind data ranges from 1984-01-01 to 2017-12-31


## Missing Wind Data for New Orleans Airport, LA
The pandas DataFrame summary shows that much of the wind data is missing (for example, only 12,413 of 25,568 rows have an AWND value), so this station will not adequately cover our sugarcane production data from the USDA/NASS. Therefore, I moved on to the next nearest weather station.

***
## Evaluating Historical Weather Data for New Orleans Audubon, LA
After locating the next nearest weather station to the sugarcane-growing region, I clicked on its cell tower icon and added the dataset to my cart (free data).

I then selected the output format to be CUSTOM GHCN-Daily CSV and selected the date range to go as far back as possible for the station in question to 2017-12-31.

    1893-01-01 to 2017-12-31

I then selected the available data categories for the station in question:
    1. Station Name
    2. Geographic Location
    3. TMAX - Maximum Air Temperature
    4. TMIN - Minimum Air Temperature
    5. PRCP - Precipitation
    6. SNOW - Snowfall
    7. AWND - Average wind speed
    8. WSFM - Fastest mile wind speed
    9. WSF1 - Fastest 1-minute wind speed
    10. WSFG - Peak gust wind speed
    
Upon this request, the NOAA sent two automated emails. The first to confirm the information request and the second with a download link for the data.

I then downloaded the .csv file and performed the following data integrity checks using pandas DataFrames.

In [15]:
#  Import the historical local weather data for New Orleans Audubon, LA
df_la_noad = pd.read_csv("NCDC_NOAA/Louisiana/New Orleans/Audubon/1337846.csv", header=0)
df_la_noad.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44260 entries, 0 to 44259
Data columns (total 14 columns):
STATION      44260 non-null object
NAME         44260 non-null object
LATITUDE     44260 non-null float64
LONGITUDE    44260 non-null float64
ELEVATION    44260 non-null float64
DATE         44260 non-null object
AWND         365 non-null float64
PRCP         43427 non-null float64
SNOW         29027 non-null float64
TMAX         43038 non-null float64
TMIN         43045 non-null float64
WSF1         5 non-null float64
WSFG         2 non-null float64
WSFM         1644 non-null float64
dtypes: float64(11), object(3)
memory usage: 4.7+ MB


In [16]:
# Check the wind data available
df_la_wind_noad = df_la_noad[["DATE", "WSFM"]].dropna()
print("Wind data ranges from " + str(df_la_wind_noad["DATE"].min()) + " to " + str(df_la_wind_noad["DATE"].max()))

Wind data ranges from 1/1/1966 to 9/9/1971


## Missing Wind Data for New Orleans Audubon, LA
The pandas DataFrame summary shows that nearly all of the wind data is missing (only 1,644 of 44,260 rows have a WSFM value, and the other wind columns have even fewer), so this station will not adequately cover our sugarcane production data from the USDA/NASS. Therefore, I moved on to the next nearest weather station.
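
The range printed for this station (1/1/1966 to 9/9/1971) suggests that the DATE column in this file holds M/D/YYYY strings, in which case `min()` and `max()` compare text rather than dates. The sketch below, on made-up dates, shows how the two can differ; it assumes nothing about the real file beyond that format.

```python
import pandas as pd

# Made-up M/D/YYYY strings in the style of the printed range above
dates = pd.Series(["1/1/1966", "12/31/1970", "9/9/1968", "10/15/1971"])

# String comparison works character by character, so "9/9/1968" sorts last
text_min, text_max = dates.min(), dates.max()

# Parsing first gives true chronological bounds
parsed = pd.to_datetime(dates, format="%m/%d/%Y")
true_min, true_max = parsed.min(), parsed.max()

print("As text: ", text_min, "to", text_max)
print("As dates:", true_min.date(), "to", true_max.date())
```

Converting the column with `pd.to_datetime` before calling `min()` and `max()` avoids the problem regardless of how the export formatted its dates.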

***
## Evaluating Historical Weather Data for New Orleans Alvin Callender Field, LA
After locating the next nearest weather station to the sugarcane-growing region, I clicked on its cell tower icon and added the dataset to my cart (free data).

I then selected the output format to be CUSTOM GHCN-Daily CSV and selected the date range to go as far back as possible for the station in question to 2017-12-31.

    1956-06-01 to 2017-08-28

I then selected the available data categories for the station in question:
    1. Station Name
    2. Geographic Location
    3. TMAX - Maximum Air Temperature
    4. TMIN - Minimum Air Temperature
    5. PRCP - Precipitation
    6. SNOW - Snowfall
    7. AWND - Average wind speed
    8. WSF2 - Fastest 2-minute wind speed
    9. WSF5 - Fastest 5-second wind speed
    10. WSFG - Peak gust wind speed
    
Upon this request, the NOAA sent two automated emails. The first to confirm the information request and the second with a download link for the data.

I then downloaded the .csv file and performed the following data integrity checks using pandas DataFrames.

In [17]:
#  Import the historical local weather data for New Orleans Alvin Callender Field, LA
df_la_noacf = pd.read_csv("NCDC_NOAA/Louisiana/New Orleans/Alvin Callender Field/1338814.csv", header=0)
df_la_noacf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20284 entries, 0 to 20283
Data columns (total 14 columns):
STATION      20284 non-null object
NAME         20284 non-null object
LATITUDE     20284 non-null float64
LONGITUDE    20284 non-null float64
ELEVATION    20284 non-null float64
DATE         20284 non-null object
AWND         6953 non-null float64
PRCP         20131 non-null float64
SNOW         17053 non-null float64
TMAX         18539 non-null float64
TMIN         18522 non-null float64
WSF2         1617 non-null float64
WSF5         1607 non-null float64
WSFG         16172 non-null float64
dtypes: float64(11), object(3)
memory usage: 2.2+ MB


In [18]:
# Check the wind data available
df_la_wind_noacf = df_la_noacf[["DATE", "WSFG"]].dropna()
print("Wind data ranges from " + str(df_la_wind_noacf["DATE"].min()) + " to " + str(df_la_wind_noacf["DATE"].max()))

Wind data ranges from 1958-01-01 to 2005-07-31


## Missing Wind Data for New Orleans Alvin Callender Field, LA
The pandas DataFrame summary shows that much of the wind data is missing (only 6,953 of 20,284 rows have an AWND value, and the WSFG readings end on 2005-07-31), so this station will not adequately cover our sugarcane production data from the USDA/NASS. Therefore, I moved on to the next nearest weather station.

***
## Evaluating Historical Weather Data for Baton Rouge LSU, LA
After locating the next nearest weather station to the sugarcane-growing region, I clicked on its cell tower icon and added the dataset to my cart (free data).

I then selected the output format to be CUSTOM GHCN-Daily CSV and selected the date range to go as far back as possible for the station in question to 2017-12-31.

    1963-01-01 to 2017-12-31

I then selected the available data categories for the station in question:
    1. Station Name
    2. Geographic Location
    3. TMAX - Maximum Air Temperature
    4. TMIN - Minimum Air Temperature
    5. PRCP - Precipitation
    6. SNOW - Snowfall
    7. WDMV - Total wind movement
    
Upon this request, the NOAA sent two automated emails. The first to confirm the information request and the second with a download link for the data.

I then downloaded the .csv file and performed the following data integrity checks using pandas DataFrames.

In [19]:
#  Import the historical local weather data for Baton Rouge LSU, LA
df_la_brlsu = pd.read_csv("NCDC_NOAA/Louisiana/Baton Rouge/LSU/1338856.csv", header=0)
df_la_brlsu.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19872 entries, 0 to 19871
Data columns (total 11 columns):
STATION      19872 non-null object
NAME         19872 non-null object
LATITUDE     19872 non-null float64
LONGITUDE    19872 non-null float64
ELEVATION    19872 non-null float64
DATE         19872 non-null object
PRCP         19815 non-null float64
SNOW         17243 non-null float64
TMAX         19808 non-null float64
TMIN         19809 non-null float64
WDMV         15597 non-null float64
dtypes: float64(8), object(3)
memory usage: 1.7+ MB


In [20]:
# Check the wind data available
df_la_wind_brlsu = df_la_brlsu[["DATE", "WDMV"]].dropna()
print("Wind data ranges from " + str(df_la_wind_brlsu["DATE"].min()) + " to " + str(df_la_wind_brlsu["DATE"].max()))

Wind data ranges from 1/1/1963 to 9/9/2008


## Missing Wind Data for Baton Rouge LSU, LA (Combination Candidate)
The pandas DataFrame summary shows that part of the wind data is missing (15,597 of 19,872 rows have a WDMV value), so this station on its own will not adequately cover our sugarcane production data from the USDA/NASS. I flagged it as a combination candidate and moved on to the next nearest weather station.

***
## Evaluating Historical Weather Data for Baton Rouge Ryan Airport, LA
After locating the next nearest weather station to the sugarcane-growing region, I clicked on its cell tower icon and added the dataset to my cart (free data).

I then selected the output format to be CUSTOM GHCN-Daily CSV and selected the date range to go as far back as possible for the station in question to 2017-12-31.

    1930-01-01 to 2017-12-31

I then selected the available data categories for the station in question:
    1. Station Name
    2. Geographic Location
    3. TMAX - Maximum Air Temperature
    4. TMIN - Minimum Air Temperature
    5. PRCP - Precipitation
    6. SNOW - Snowfall
    7. AWND - Average wind speed
    8. WSF1 - Fastest 1-minute wind speed
    9. WSF2 - Fastest 2-minute wind speed
    10. WSF5 - Fastest 5-second wind speed
    11. WSFG - Peak gust wind speed
    
Upon this request, the NOAA sent two automated emails. The first to confirm the information request and the second with a download link for the data.

I then downloaded the .csv file and performed the following data integrity checks using pandas DataFrames.

In [21]:
#  Import the historical local weather data for Baton Rouge Ryan Airport, LA
df_la_brra = pd.read_csv("NCDC_NOAA/Louisiana/Baton Rouge/Ryan Airport/1337840.csv", header=0)
df_la_brra.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32142 entries, 0 to 32141
Data columns (total 15 columns):
STATION      32142 non-null object
NAME         32142 non-null object
LATITUDE     32142 non-null float64
LONGITUDE    32142 non-null float64
ELEVATION    32142 non-null float64
DATE         32142 non-null object
AWND         12411 non-null float64
PRCP         32114 non-null float64
SNOW         29623 non-null float64
TMAX         32026 non-null float64
TMIN         32104 non-null float64
WSF1         10378 non-null float64
WSF2         8976 non-null float64
WSF5         8902 non-null float64
WSFG         8007 non-null float64
dtypes: float64(12), object(3)
memory usage: 3.7+ MB


In [22]:
# Check the wind data available
df_la_wind_brra = df_la_brra[["DATE", "AWND"]].dropna()
print("Wind data ranges from " + str(df_la_wind_brra["DATE"].min()) + " to " + str(df_la_wind_brra["DATE"].max()))

Wind data ranges from 1984-01-01 to 2017-12-31


## Missing Wind Data for Baton Rouge Ryan Airport, LA
The pandas DataFrame summary shows that much of the wind data is missing (for example, only 12,411 of 32,142 rows have an AWND value), so this station will not adequately cover our sugarcane production data from the USDA/NASS. Therefore, I set this station aside as well.

***
## Combining Weather Station Information for Louisiana
It looks like the following weather stations' wind data could be combined to provide a relatively complete weather dataset for Louisiana:
1. Jennings, LA
2. Houma, LA
3. Baton Rouge LSU, LA

To be clear, I will combine not only the wind data (the most critical missing element) but also the temperature, precipitation, and snowfall data for these combination candidates.

***
## Evaluating and Combining Wind Movement Data
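
The next cell applies the same reshaping steps to each of the three stations. As a sketch only, those steps could be collected into one function; `prepare_wind_frame` is a hypothetical name that does not appear in the original notebook, and the toy frame stands in for a station DataFrame such as `df_la_jn`.

```python
import pandas as pd


def prepare_wind_frame(df, value_col="WDMV", date_col="DATE"):
    """Keep one wind column, index it by parsed date, drop missing rows,
    and add Year, Month, and YYYY-MM helper columns."""
    out = df[[date_col, value_col]].copy()  # work on an independent copy
    out[date_col] = pd.to_datetime(out[date_col])
    out = out.set_index(date_col).sort_index().dropna()
    out["Year"] = out.index.year
    out["Month"] = out.index.month
    out["YYYY-MM"] = out.index.strftime("%Y-%m")
    return out


# Toy stand-in for a station DataFrame (values are made up)
toy = pd.DataFrame({
    "DATE": ["1990-02-01", "1990-01-01", "1990-01-02"],
    "WDMV": [41.0, 35.5, None],
    "PRCP": [0.0, 0.2, 0.0],
})
wind = prepare_wind_frame(toy)
print(wind)
```

Keeping the steps in one place means a change such as a different date format only has to be made once instead of three times.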

In [23]:
# Reshape the historical weather data for Jennings, LA to extract only the relevant data
# for future analysis. The column subset is copied first so that the assignments below
# act on an independent DataFrame rather than on a slice of the original.
df_la_wind_j = df_la_jn[["DATE", "WDMV"]].copy()
df_la_wind_j["DATE"] = pd.to_datetime(df_la_wind_j["DATE"])
df_la_wind_j = df_la_wind_j.set_index("DATE")
df_la_wind_j = df_la_wind_j.sort_index().dropna()
df_la_wind_j["Year"] = df_la_wind_j.index.year
df_la_wind_j["Month"] = df_la_wind_j.index.month
df_la_wind_j["YYYY-MM"] = df_la_wind_j.index.strftime("%Y-%m")

# Reshape the historical weather data for Houma, LA to extract only the relevant data
# for future analysis
df_la_wind_h = df_la_hm[["DATE", "WDMV"]].copy()
df_la_wind_h["DATE"] = pd.to_datetime(df_la_wind_h["DATE"])
df_la_wind_h = df_la_wind_h.set_index("DATE")
df_la_wind_h = df_la_wind_h.sort_index().dropna()
df_la_wind_h["Year"] = df_la_wind_h.index.year
df_la_wind_h["Month"] = df_la_wind_h.index.month
df_la_wind_h["YYYY-MM"] = df_la_wind_h.index.strftime("%Y-%m")

# Reshape the historical weather data for Baton Rouge LSU, LA to extract only the relevant data
# for future analysis
df_la_wind_b = df_la_brlsu[["DATE", "WDMV"]].copy()
df_la_wind_b["DATE"] = pd.to_datetime(df_la_wind_b["DATE"])
df_la_wind_b = df_la_wind_b.set_index("DATE")
df_la_wind_b = df_la_wind_b.sort_index().dropna()
df_la_wind_b["Year"] = df_la_wind_b.index.year
df_la_wind_b["Month"] = df_la_wind_b.index.month
df_la_wind_b["YYYY-MM"] = df_la_wind_b.index.strftime("%Y-%m")



In [24]:
# Plot the historical weather combination DataFrame candidates to visually inspect their respective
# data integrity
# Generate an interactive plot using the plotly package
trace1 = go.Scatter(x=df_la_wind_j.index, y=df_la_wind_j.WDMV, mode="lines",
                    line=dict(color="rgb(35,122,181)"), name="Jennings, LA")
trace2 = go.Scatter(x=df_la_wind_h.index, y=df_la_wind_h.WDMV, mode="lines",
                    line=dict(color="rgb(255,127,14)"), name="Houma, LA")
trace3 = go.Scatter(x=df_la_wind_b.index, y=df_la_wind_b.WDMV, mode="lines", 
                    line=dict(color="rgb(44,160,44)"), name="LSU, LA")
trace4 = go.Box(y=df_la_wind_j.WDMV, marker=dict(color="rgb(35,122,181)"),
                boxmean="sd", name="Jennings, LA")
trace5 = go.Box(y=df_la_wind_h.WDMV, marker=dict(color="rgb(255,127,14)"),
                boxmean="sd", name="Houma, LA")
trace6 = go.Box(y=df_la_wind_b.WDMV, marker=dict(color="rgb(44,160,44)"),
                boxmean="sd", name="LSU, LA")

fig = tools.make_subplots(rows=3, cols=2, shared_xaxes=True, shared_yaxes=True,
                          vertical_spacing=0.01)
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 2, 1)
fig.append_trace(trace3, 3, 1)
fig.append_trace(trace4, 1, 2)
fig.append_trace(trace5, 2, 2)
fig.append_trace(trace6, 3, 2)

fig["layout"].update(
    plot_bgcolor="rgb(247,247,247)",
    legend=dict(font=dict(family="serif", size=12)),
    height=1000,
    title="<b>Historical Wind Data for Louisiana Station Combination Candidates</b>",
    titlefont=dict(family="serif", size=24),
    yaxis1=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    yaxis2=dict(title="<b>Wind Movement, miles</b>",
                titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14), anchor="x"),
    yaxis3=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    yaxis4=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14), anchor="x2"),
    yaxis5=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    yaxis6=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    xaxis1=dict(title="<b>Year</b>", titlefont=dict(family="serif", size=14), 
                tickfont=dict(family="serif", size=14), domain=[0, 0.85]),
    xaxis2=dict(tickfont=dict(family="serif", size=14), domain=[0.9, 1]),
    xaxis3=dict(title="<b>Year</b>", titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14), domain=[0, 0.85]),
    xaxis4=dict(tickfont=dict(family="serif", size=14), domain=[0.9, 1]),
    xaxis5=dict(title="<b>Year</b>", titlefont=dict(family="serif", size=14), 
                tickfont=dict(family="serif", size=14), domain=[0, 0.85]),
    xaxis6=dict(tickfont=dict(family="serif", size=14), domain=[0.9, 1]))

py.offline.iplot(fig)

This is the format of your plot grid:
[ (1,1) x1,y1 ]  [ (1,2) x2,y1 ]
[ (2,1) x1,y2 ]  [ (2,2) x2,y2 ]
[ (3,1) x1,y3 ]  [ (3,2) x2,y3 ]



## Inspect the High Wind Anomalies and Remove from Dataset if Applicable
The interactive plotly scatter plot above shows faulty wind recordings of 999.2 miles at the Jennings, LA weather station over the following date ranges:
1. 2013-01-26 to 2013-01-27
2. 2013-02-22 to 2013-03-05
3. 2013-03-21 to 2013-03-30

The same plot shows unexplained high wind recordings of 517, 620.1, and 697.8 miles at the Baton Rouge LSU, LA weather station on the following dates, respectively:
1. 1982-01-31
2. 1985-09-14
3. 2000-05-11

I used the following reference to check whether these high wind recordings are related to past hurricanes, but the readings remain inexplicably high.
http://www.wpc.ncep.noaa.gov/research/lahur.pdf

Let's remove these anomalies from the daily summary data, then regenerate the subplots from above.
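
Before removing anything, it can help to list the rows that reach a threshold so their dates can be checked against the plot. A minimal sketch on made-up values follows; the 999 cutoff mirrors the Jennings readings described above, and `flag_at_or_above` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def flag_at_or_above(series, threshold):
    """Return only the readings at or above `threshold`, keeping their dates."""
    return series[series >= threshold]


# Made-up daily wind movement values with two suspect readings
idx = pd.to_datetime(["2013-01-25", "2013-01-26", "2013-01-27", "2013-01-28"])
wdmv = pd.Series([42.0, 999.2, 999.2, 38.5], index=idx, name="WDMV")

suspect = flag_at_or_above(wdmv, 999)
print(suspect)
print("First and last suspect dates:", suspect.index.min().date(), "to", suspect.index.max().date())
```

Listing the flagged dates first makes it easy to confirm that a threshold removes only the intended readings.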

In [25]:
# Remove the daily wind movement anomalies by masking them as NaN
df_la_wind_j["WDMV"] = df_la_wind_j["WDMV"].mask(df_la_wind_j["WDMV"] >= 999)
df_la_wind_b["WDMV"] = df_la_wind_b["WDMV"].mask(df_la_wind_b["WDMV"] >= 517)

# Plot the historical weather combination DataFrame candidates to visually inspect their respective
# data integrity
# Generate an interactive plot using the plotly package
trace1 = go.Scatter(x=df_la_wind_j.index, y=df_la_wind_j.WDMV, mode="lines",
                    line=dict(color="rgb(35,122,181)"), name="Jennings, LA")
trace2 = go.Scatter(x=df_la_wind_h.index, y=df_la_wind_h.WDMV, mode="lines",
                    line=dict(color="rgb(255,127,14)"), name="Houma, LA")
trace3 = go.Scatter(x=df_la_wind_b.index, y=df_la_wind_b.WDMV, mode="lines", 
                    line=dict(color="rgb(44,160,44)"), name="LSU, LA")
trace4 = go.Box(y=df_la_wind_j.WDMV, marker=dict(color="rgb(35,122,181)"),
                boxmean="sd", name="Jennings, LA")
trace5 = go.Box(y=df_la_wind_h.WDMV, marker=dict(color="rgb(255,127,14)"),
                boxmean="sd", name="Houma, LA")
trace6 = go.Box(y=df_la_wind_b.WDMV, marker=dict(color="rgb(44,160,44)"),
                boxmean="sd", name="LSU, LA")

fig = tools.make_subplots(rows=3, cols=2, shared_xaxes=True, shared_yaxes=True,
                          vertical_spacing=0.01)
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 2, 1)
fig.append_trace(trace3, 3, 1)
fig.append_trace(trace4, 1, 2)
fig.append_trace(trace5, 2, 2)
fig.append_trace(trace6, 3, 2)

fig["layout"].update(
    plot_bgcolor="rgb(247,247,247)",
    legend=dict(font=dict(family="serif", size=12)),
    height=1000,
    title="<b>Historical Wind Data for Louisiana Station Combination Candidates</b>",
    titlefont=dict(family="serif", size=24),
    yaxis1=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    yaxis2=dict(title="<b>Wind Movement, miles</b>",
                titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14), anchor="x"),
    yaxis3=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    yaxis4=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14), anchor="x2"),
    yaxis5=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    yaxis6=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    xaxis1=dict(title="<b>Year</b>", titlefont=dict(family="serif", size=14), 
                tickfont=dict(family="serif", size=14), domain=[0, 0.85]),
    xaxis2=dict(tickfont=dict(family="serif", size=14), domain=[0.9, 1]),
    xaxis3=dict(title="<b>Year</b>", titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14), domain=[0, 0.85]),
    xaxis4=dict(tickfont=dict(family="serif", size=14), domain=[0.9, 1]),
    xaxis5=dict(title="<b>Year</b>", titlefont=dict(family="serif", size=14), 
                tickfont=dict(family="serif", size=14), domain=[0, 0.85]),
    xaxis6=dict(tickfont=dict(family="serif", size=14), domain=[0.9, 1]))

py.offline.iplot(fig)




## Combine the Validated Wind Data



In [26]:
# Join the three weather stations' wind data together and take the daily average. Also, prepare
# the new DataFrame for pivoting the data into a Year-by-Month table
df_la_wind = df_la_wind_b[["WDMV"]].join(df_la_wind_h[["WDMV"]], how="outer", lsuffix="_B", rsuffix="_H") \
        .join(df_la_wind_j[["WDMV"]], how="outer").mean(axis=1).to_frame(name="WDMV").dropna()
df_la_wind["Year"] = df_la_wind.index.year
df_la_wind["Month"] = df_la_wind.index.month
df_la_wind["YYYY-MM"] = df_la_wind.index.strftime("%Y-%m")

# Reshape the data so rows are years and columns are months, summing the daily values within each month
df_la_monthly_wind = df_la_wind.pivot_table(index="Year", columns="Month", values="WDMV", aggfunc="sum")
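
The combination step above can be illustrated on toy data. This is only a sketch: the two small stations, their dates, and their values are invented, and only two stations are joined instead of three. It shows how an outer join keeps every date reported by either station, how the row-wise mean skips missing readings, and how `pivot_table` then sums the daily means by month.

```python
import numpy as np
import pandas as pd

# Two hypothetical stations with partially overlapping dates (made-up values)
idx_a = pd.date_range("2001-01-30", periods=4, freq="D")  # Jan 30 to Feb 2
idx_b = pd.date_range("2001-01-31", periods=4, freq="D")  # Jan 31 to Feb 3
station_a = pd.DataFrame({"WDMV": [100.0, 80.0, np.nan, 60.0]}, index=idx_a)
station_b = pd.DataFrame({"WDMV": [120.0, 40.0, 50.0, 70.0]}, index=idx_b)

# Outer join keeps the union of dates; mean(axis=1) ignores NaN by default,
# so a day reported by only one station keeps that station's value
combined = (
    station_a.join(station_b, how="outer", lsuffix="_A", rsuffix="_B")
    .mean(axis=1)
    .to_frame(name="WDMV")
    .dropna()
)
combined["Year"] = combined.index.year
combined["Month"] = combined.index.month

# Rows are years, columns are months, cells are the summed daily means
monthly = combined.pivot_table(index="Year", columns="Month",
                               values="WDMV", aggfunc="sum")
print(monthly)
```

Because the daily values are averaged across stations first and summed afterwards, a month with many missing days will show a lower total even if the wind was typical on the days that were recorded.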

In [27]:
# Generate an interactive plot using the plotly package
trace1 = go.Scatter(x=df_la_wind.index, y=df_la_wind.WDMV, mode="lines", name="Time Line")
trace2 = go.Box(y=df_la_monthly_wind[1], boxmean="sd", name="Jan")
trace3 = go.Box(y=df_la_monthly_wind[2], boxmean="sd", name="Feb")
trace4 = go.Box(y=df_la_monthly_wind[3], boxmean="sd", name="Mar")
trace5 = go.Box(y=df_la_monthly_wind[4], boxmean="sd", name="Apr")
trace6 = go.Box(y=df_la_monthly_wind[5], boxmean="sd", name="May")
trace7 = go.Box(y=df_la_monthly_wind[6], boxmean="sd", name="Jun")
trace8 = go.Box(y=df_la_monthly_wind[7], boxmean="sd", name="Jul")
trace9 = go.Box(y=df_la_monthly_wind[8], boxmean="sd", name="Aug")
trace10 = go.Box(y=df_la_monthly_wind[9], boxmean="sd", name="Sep")
trace11 = go.Box(y=df_la_monthly_wind[10], boxmean="sd", name="Oct")
trace12 = go.Box(y=df_la_monthly_wind[11], boxmean="sd", name="Nov")
trace13 = go.Box(y=df_la_monthly_wind[12], boxmean="sd", name="Dec")

fig = tools.make_subplots(rows=2, cols=1, vertical_spacing=0.05)
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 2, 1)
fig.append_trace(trace3, 2, 1)
fig.append_trace(trace4, 2, 1)
fig.append_trace(trace5, 2, 1)
fig.append_trace(trace6, 2, 1)
fig.append_trace(trace7, 2, 1)
fig.append_trace(trace8, 2, 1)
fig.append_trace(trace9, 2, 1)
fig.append_trace(trace10, 2, 1)
fig.append_trace(trace11, 2, 1)
fig.append_trace(trace12, 2, 1)
fig.append_trace(trace13, 2, 1)

fig["layout"].update(
    plot_bgcolor="rgb(247,247,247)",
    legend=dict(font=dict(family="serif", size=12)),
    height=600,
    title="<b>Historical Wind Data for Louisiana Post Combination</b>",
    titlefont=dict(family="serif", size=24),
    yaxis1=dict(title="<b>Mean Wind Movement, miles</b>", titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    yaxis2=dict(title="<b>Cumulative Wind Movement, miles</b>",
                titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    xaxis1=dict(title="<b>Date</b>", titlefont=dict(family="serif", size=14), 
                tickfont=dict(family="serif", size=14)),
    xaxis2=dict(tickfont=dict(family="serif", size=14)))

py.offline.iplot(fig)




***
## Evaluating and Combining Temperature Data