***
## Import the Required Python Packages and Methods

In [1]:
# Import the required packages
import pandas as pd
import numpy as np
import plotly as py
import plotly.graph_objs as go
from plotly import tools
from IPython.display import IFrame

py.offline.init_notebook_mode(connected=True)

***
## Finding and Evaluating Historical Regional Weather Data
The United States National Centers for Environmental Information (NCEI), part of NOAA and formerly known as the National Climatic Data Center (NCDC), hosts the searchable Climate Data Online database where I acquired this data:

https://www.ncdc.noaa.gov/cdo-web/

I followed the hyperlink to their historical **Data Tools**: https://www.ncdc.noaa.gov/cdo-web/datatools

I then followed the link to their **Find a Station** tool: https://www.ncdc.noaa.gov/cdo-web/datatools/findstation

Once at this website, I tried to find a land-based weather station near the regions shown in the following .png image provided by the USDA/NASS for 2016.

In [2]:
IFrame("NASS_USDA/SC-PR-RGBChor.png", width=760, height=587)

***
## Finding and Evaluating Historical Weather Data for Texas
Inside the NOAA Climate Data Online, Data Tools, Find a Station website, I then selected the following categorical data parameters:
    1. Enter Location:    Texas, USA
    2. Select Dataset:    Daily Summaries
    3. Select Date Range: 2000-01-01 to 2017-12-31
    4. Data Categories:   [Air Temperature, Precipitation, Wind]
    
I then zoomed in on the website's Google map to focus on the region where the majority of Texas's sugarcane is farmed, producing the following search area with weather stations depicted as cell towers.

In [3]:
IFrame("NCDC_NOAA/Texas/Capture.png", width=624, height=556)

***
## Evaluating Historical Weather Data for Weslaco, TX
Once I located the weather station nearest the sugarcane-growing region, I clicked on its cell tower icon and added the dataset to my cart (the data is free).

I then set the output format to CUSTOM GHCN-Daily CSV and selected the widest date range available for the station, ending on 2017-12-31.

    1914-02-01 to 2017-12-31

I then selected the available data categories for the station in question:
    1. Station Name
    2. Geographic Location
    3. TMAX - Maximum Air Temperature
    4. TMIN - Minimum Air Temperature
    5. PRCP - Precipitation
    6. SNOW - Snowfall
    7. WDMV - Total wind movement
    
After I submitted this request, NOAA sent two automated emails: the first confirmed the request and the second provided a download link for the data.

I then downloaded the .csv file and performed the following data integrity checks using pandas DataFrames.

In [4]:
#  Import the historical local weather data for Weslaco, TX
df_tx_ws = pd.read_csv("NCDC_NOAA/Texas/Weslaco/1343909.csv", header=0)
df_tx_ws.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36048 entries, 0 to 36047
Data columns (total 11 columns):
STATION      36048 non-null object
NAME         36048 non-null object
LATITUDE     36048 non-null float64
LONGITUDE    36048 non-null float64
ELEVATION    36048 non-null float64
DATE         36048 non-null object
PRCP         35449 non-null float64
SNOW         22127 non-null float64
TMAX         34682 non-null float64
TMIN         34775 non-null float64
WDMV         9563 non-null float64
dtypes: float64(8), object(3)
memory usage: 3.0+ MB


In [5]:
# Check the wind data available
df_tx_wind_ws = df_tx_ws[["DATE", "WDMV"]].dropna()
print("Wind data ranges from " + str(df_tx_wind_ws["DATE"].min()) + " to " + str(df_tx_wind_ws["DATE"].max()))

Wind data ranges from 1947-09-01 to 2011-02-28


## Missing Wind Data for Weslaco, TX
The pandas DataFrame summary shows that a large share of the wind data is missing, so this station alone will not adequately cover our sugarcane production data from the USDA/NASS. Therefore, I moved on to the next nearest weather station.
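
The share of missing values in each column can also be computed directly instead of being read off the `info()` summary. The following is a minimal sketch, assuming a hypothetical helper `missing_share` and made-up toy data; neither is part of the original notebook.

```python
import numpy as np
import pandas as pd


def missing_share(df):
    """Return the fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


# Hypothetical toy data standing in for a station's daily summaries
toy = pd.DataFrame({
    "DATE": ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04"],
    "PRCP": [0.0, 0.1, np.nan, 0.0],
    "WDMV": [np.nan, np.nan, np.nan, 52.3],
})

print(missing_share(toy))
```

Applied to a station DataFrame such as `df_tx_ws`, the same helper would rank `WDMV` against the other measurements by how much of it is missing.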

***
## Evaluating Historical Weather Data for Santa Rosa, TX
Once I located the weather station nearest the sugarcane-growing region, I clicked on its cell tower icon and added the dataset to my cart (the data is free).

I then set the output format to CUSTOM GHCN-Daily CSV and selected the widest date range available for the station, ending on 2017-12-31.

    1987-03-01 to 2017-12-31

I then selected the available data categories for the station in question:
    1. Station Name
    2. Geographic Location
    3. TMAX - Maximum Air Temperature
    4. TMIN - Minimum Air Temperature
    5. PRCP - Precipitation
    6. SNOW - Snowfall
    7. WDMV - Total wind movement
    
After I submitted this request, NOAA sent two automated emails: the first confirmed the request and the second provided a download link for the data.

I then downloaded the .csv file and performed the following data integrity checks using pandas DataFrames.

In [6]:
#  Import the historical local weather data for Santa Rosa, TX
df_tx_sr = pd.read_csv("NCDC_NOAA/Texas/Santa Rosa/1343944.csv", header=0)
df_tx_sr.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10733 entries, 0 to 10732
Data columns (total 11 columns):
STATION      10733 non-null object
NAME         10733 non-null object
LATITUDE     10733 non-null float64
LONGITUDE    10733 non-null float64
ELEVATION    10733 non-null float64
DATE         10733 non-null object
PRCP         10707 non-null float64
SNOW         8705 non-null float64
TMAX         10667 non-null float64
TMIN         10659 non-null float64
WDMV         1872 non-null float64
dtypes: float64(8), object(3)
memory usage: 922.4+ KB


In [7]:
# Check the wind data available
df_tx_wind_sr = df_tx_sr[["DATE", "WDMV"]].dropna()
print("Wind data ranges from " + str(df_tx_wind_sr["DATE"].min()) + " to " + str(df_tx_wind_sr["DATE"].max()))

Wind data ranges from 2003-04-06 to 2017-09-30


## Missing Wind Data for Santa Rosa, TX
The pandas DataFrame summary shows that a large share of the wind data is missing, so this station alone will not adequately cover our sugarcane production data from the USDA/NASS. Therefore, I moved on to the next nearest weather station.

***
## Evaluating Historical Weather Data for Rio Grande City, TX
Once I located the weather station nearest the sugarcane-growing region, I clicked on its cell tower icon and added the dataset to my cart (the data is free).

I then set the output format to CUSTOM GHCN-Daily CSV and selected the widest date range available for the station, ending on 2017-12-31.

    1892-07-01 to 2017-12-31

I then selected the available data categories for the station in question:
    1. Station Name
    2. Geographic Location
    3. TMAX - Maximum Air Temperature
    4. TMIN - Minimum Air Temperature
    5. PRCP - Precipitation
    6. SNOW - Snowfall
    7. WDMV - Total wind movement
    
After I submitted this request, NOAA sent two automated emails: the first confirmed the request and the second provided a download link for the data.

I then downloaded the .csv file and performed the following data integrity checks using pandas DataFrames.

In [8]:
#  Import the historical local weather data for Rio Grande City, TX
df_tx_rg = pd.read_csv("NCDC_NOAA/Texas/Rio Grande City/1343968.csv", header=0)
df_tx_rg.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39794 entries, 0 to 39793
Data columns (total 11 columns):
STATION      39794 non-null object
NAME         39794 non-null object
LATITUDE     39794 non-null float64
LONGITUDE    39794 non-null float64
ELEVATION    39794 non-null float64
DATE         39794 non-null object
PRCP         39574 non-null float64
SNOW         32318 non-null float64
TMAX         33499 non-null float64
TMIN         33365 non-null float64
WDMV         16875 non-null float64
dtypes: float64(8), object(3)
memory usage: 3.3+ MB


In [9]:
# Check the wind data available
df_tx_wind_rg = df_tx_rg[["DATE", "WDMV"]].dropna()
print("Wind data ranges from " + str(df_tx_wind_rg["DATE"].min()) + " to " + str(df_tx_wind_rg["DATE"].max()))

Wind data ranges from 1962-07-02 to 2017-09-28


## Missing Wind Data for Rio Grande City, TX (Combination Candidate)
The pandas DataFrame summary shows that a large share of the wind data is missing, so this station alone will not adequately cover our sugarcane production data from the USDA/NASS. Therefore, I moved on to the next nearest weather station.

***
## Evaluating Historical Weather Data for McCook, TX
Once I located the weather station nearest the sugarcane-growing region, I clicked on its cell tower icon and added the dataset to my cart (the data is free).

I then set the output format to CUSTOM GHCN-Daily CSV and selected the widest date range available for the station, ending on 2017-12-31.

    1941-08-25 to 2017-12-31

I then selected the available data categories for the station in question:
    1. Station Name
    2. Geographic Location
    3. TMAX - Maximum Air Temperature
    4. TMIN - Minimum Air Temperature
    5. PRCP - Precipitation
    6. SNOW - Snowfall
    7. WDMV - Total wind movement
    
After I submitted this request, NOAA sent two automated emails: the first confirmed the request and the second provided a download link for the data.

I then downloaded the .csv file and performed the following data integrity checks using pandas DataFrames.

In [10]:
#  Import the historical local weather data for McCook, TX
df_tx_mc = pd.read_csv("NCDC_NOAA/Texas/McCook/1344013.csv", header=0)
df_tx_mc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26682 entries, 0 to 26681
Data columns (total 11 columns):
STATION      26682 non-null object
NAME         26682 non-null object
LATITUDE     26682 non-null float64
LONGITUDE    26682 non-null float64
ELEVATION    26682 non-null float64
DATE         26682 non-null object
PRCP         26464 non-null float64
SNOW         24840 non-null float64
TMAX         25900 non-null float64
TMIN         25807 non-null float64
WDMV         14541 non-null float64
dtypes: float64(8), object(3)
memory usage: 2.2+ MB


In [11]:
# Check the wind data available
df_tx_wind_mc = df_tx_mc[["DATE", "WDMV"]].dropna()
print("Wind data ranges from " + str(df_tx_wind_mc["DATE"].min()) + " to " + str(df_tx_wind_mc["DATE"].max()))

Wind data ranges from 1963-09-01 to 2017-09-26


## Missing Wind Data for McCook, TX
The pandas DataFrame summary shows that a large share of the wind data is missing, so this station alone will not adequately cover our sugarcane production data from the USDA/NASS. Therefore, I moved on to the next nearest weather station.

***
## Evaluating Historical Weather Data for Falcon Dam, TX
Once I located the weather station nearest the sugarcane-growing region, I clicked on its cell tower icon and added the dataset to my cart (the data is free).

I then set the output format to CUSTOM GHCN-Daily CSV and selected the widest date range available for the station, ending on 2017-12-31.

    1962-07-01 to 2017-12-31

I then selected the available data categories for the station in question:
    1. Station Name
    2. Geographic Location
    3. TMAX - Maximum Air Temperature
    4. TMIN - Minimum Air Temperature
    5. PRCP - Precipitation
    6. SNOW - Snowfall
    7. WDMV - Total wind movement
    
After I submitted this request, NOAA sent two automated emails: the first confirmed the request and the second provided a download link for the data.

I then downloaded the .csv file and performed the following data integrity checks using pandas DataFrames.

In [12]:
#  Import the historical local weather data for Falcon Dam, TX
df_tx_fd = pd.read_csv("NCDC_NOAA/Texas/Falcon Dam/1344038.csv", header=0)
df_tx_fd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20079 entries, 0 to 20078
Data columns (total 11 columns):
STATION      20079 non-null object
NAME         20079 non-null object
LATITUDE     20079 non-null float64
LONGITUDE    20079 non-null float64
ELEVATION    20079 non-null float64
DATE         20079 non-null object
PRCP         20059 non-null float64
SNOW         17558 non-null float64
TMAX         19919 non-null float64
TMIN         19942 non-null float64
WDMV         5011 non-null float64
dtypes: float64(8), object(3)
memory usage: 1.7+ MB


In [13]:
# Check the wind data available
df_tx_wind_fd = df_tx_fd[["DATE", "WDMV"]].dropna()
print("Wind data ranges from " + str(df_tx_wind_fd["DATE"].min()) + " to " + str(df_tx_wind_fd["DATE"].max()))

Wind data ranges from 2003-05-01 to 2017-09-30


## Missing Wind Data for Falcon Dam, TX (Combination Candidate)
The pandas DataFrame summary shows that a large share of the wind data is missing, so this station alone will not adequately cover our sugarcane production data from the USDA/NASS. Therefore, I moved on to the next nearest weather station.

***
## Combining Weather Station Information for Texas
It looks like the wind data from the following weather stations could be combined to provide a relatively complete weather dataset for Texas:
1. Falcon Dam, TX
2. Rio Grande City, TX

To be clearer, I will be combining not only the wind data (the more critical missing element), but also the temperature, precipitation, and snowfall data for these combination candidates as well.
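
To make the idea of combining two stations concrete, here is a minimal sketch on made-up values. The names `station_a` and `station_b` and all of the numbers are hypothetical. It shows how an outer join followed by a row-wise mean fills each station's gaps with the other's readings and averages the days where both report.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-03"])

# Hypothetical daily wind movement for two stations, each with one gap
station_a = pd.DataFrame({"WDMV": [40.0, np.nan, 60.0]}, index=dates)
station_b = pd.DataFrame({"WDMV": [50.0, 30.0, np.nan]}, index=dates)

combined = (
    station_a.join(station_b, how="outer", lsuffix="_A", rsuffix="_B")
    .mean(axis=1)  # row-wise mean; NaN values are skipped by default
    .to_frame(name="WDMV")
)

print(combined)
```

On days where only one station reports, the mean is simply that station's value, so the combined series has no gap unless both stations are missing.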

***
## Evaluating and Combining Wind Movement Data

In [14]:
# Reshape the historical weather data for Falcon Dam, TX to extract only the relevant data
# for future analysis (.copy() avoids pandas' SettingWithCopyWarning on the assignments below)
df_tx_wind_f = df_tx_fd[["DATE", "WDMV"]].copy()
df_tx_wind_f["DATE"] = pd.to_datetime(df_tx_wind_f["DATE"])
df_tx_wind_f = df_tx_wind_f.set_index("DATE")
df_tx_wind_f = df_tx_wind_f.sort_index().dropna().copy()
df_tx_wind_f["Year"] = df_tx_wind_f.index.year
df_tx_wind_f["Month"] = df_tx_wind_f.index.month
df_tx_wind_f["YYYY-MM"] = df_tx_wind_f.index.strftime("%Y-%m")

# Reshape the historical weather data for Rio Grande City, TX to extract only the relevant data
# for future analysis
df_tx_wind_r = df_tx_rg[["DATE", "WDMV"]].copy()
df_tx_wind_r["DATE"] = pd.to_datetime(df_tx_wind_r["DATE"])
df_tx_wind_r = df_tx_wind_r.set_index("DATE")
df_tx_wind_r = df_tx_wind_r.sort_index().dropna().copy()
df_tx_wind_r["Year"] = df_tx_wind_r.index.year
df_tx_wind_r["Month"] = df_tx_wind_r.index.month
df_tx_wind_r["YYYY-MM"] = df_tx_wind_r.index.strftime("%Y-%m")



In [15]:
# Plot the historical weather combination Dataframe candidates to visually inspect their respective
# data integrity
# Generate an interactive plot using the plotly package
trace1 = go.Scatter(x=df_tx_wind_f.index, y=df_tx_wind_f.WDMV, mode="lines",
                    line=dict(color="rgb(35,122,181)"), name="Falcon Dam, TX")
trace2 = go.Scatter(x=df_tx_wind_r.index, y=df_tx_wind_r.WDMV, mode="lines",
                    line=dict(color="rgb(255,127,14)"), name="Rio Grande City, TX")
trace3 = go.Box(y=df_tx_wind_f.WDMV, marker=dict(color="rgb(35,122,181)"),
                boxmean="sd", name="Falcon Dam, TX")
trace4 = go.Box(y=df_tx_wind_r.WDMV, marker=dict(color="rgb(255,127,14)"),
                boxmean="sd", name="Rio Grande City, TX")

fig = tools.make_subplots(rows=2, cols=2, shared_xaxes=True, shared_yaxes=True,
                          vertical_spacing=0.01)
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 2, 1)
fig.append_trace(trace3, 1, 2)
fig.append_trace(trace4, 2, 2)

fig["layout"].update(
    plot_bgcolor="rgb(247,247,247)",
    legend=dict(font=dict(family="serif", size=12)),
    height=675,
    title="<b>Historical Wind Data for Texas Station Combination Candidates</b>",
    titlefont=dict(family="serif", size=24),
    yaxis1=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    yaxis2=dict(title="<b>Wind Movement, miles</b>",
                titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14), anchor="x"),
    yaxis3=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    yaxis4=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14), anchor="x2"),
    yaxis5=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    yaxis6=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    xaxis1=dict(title="<b>Year</b>", titlefont=dict(family="serif", size=14), 
                tickfont=dict(family="serif", size=14), domain=[0, 0.85]),
    xaxis2=dict(tickfont=dict(family="serif", size=14), domain=[0.9, 1]),
    xaxis3=dict(title="<b>Year</b>", titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14), domain=[0, 0.85]),
    xaxis4=dict(tickfont=dict(family="serif", size=14), domain=[0.9, 1]),
    xaxis5=dict(title="<b>Year</b>", titlefont=dict(family="serif", size=14), 
                tickfont=dict(family="serif", size=14), domain=[0, 0.85]),
    xaxis6=dict(tickfont=dict(family="serif", size=14), domain=[0.9, 1]))

py.offline.iplot(fig)

This is the format of your plot grid:
[ (1,1) x1,y1 ]  [ (1,2) x2,y1 ]
[ (2,1) x1,y2 ]  [ (2,2) x2,y2 ]



## Inspect the Wind Anomalies and Remove from Dataset if Applicable
In the interactive plotly scatter plot above, the Falcon Dam, TX weather station shows unexplainably high wind recordings of 8783.1, 7178.1, and 8451.3 miles on the following dates, respectively:
1. 2014-06-06
2. 2014-08-19
3. 2014-09-01

I used the following websites to check whether these high wind recordings are related to past hurricanes, but the readings remain unexplainably high.
https://www.weather.gov/media/lch/events/txhurricanehistory.pdf
https://en.wikipedia.org/wiki/List_of_Texas_hurricanes_(1980%E2%80%93present)

Let's remove these anomalies from the daily summary data, then reproduce the subplots from above.
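
The removal amounts to replacing any reading at or above a cut-off with a missing value. A minimal sketch of that idea using `Series.mask`, assuming a cut-off of 999 miles and a made-up series: only the 8783.1 reading comes from the data above, and the neighbouring values are hypothetical.

```python
import pandas as pd

# One real anomalous reading (8783.1) surrounded by hypothetical normal days
wind = pd.Series(
    [55.0, 8783.1, 61.2, 47.9],
    index=pd.to_datetime(["2014-06-05", "2014-06-06", "2014-06-07", "2014-06-08"]),
    name="WDMV",
)

THRESHOLD = 999  # assumed cut-off, in miles of daily wind movement

# mask() replaces values where the condition is True with NaN
cleaned = wind.mask(wind >= THRESHOLD)

print(cleaned)
```

Masking keeps the date in the index, so the gap stays visible in a line plot instead of the neighbouring days being joined silently.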

In [16]:
# Remove the daily data anomalies (daily wind movement readings of 999 miles or more)
df_tx_wind_f.WDMV = [np.nan if w >= 999 else w for w in df_tx_wind_f.WDMV]

# Plot the historical weather combination Dataframe candidates to visually inspect their respective
# data integrity
# Generate an interactive plot using the plotly package
trace1 = go.Scatter(x=df_tx_wind_f.index, y=df_tx_wind_f.WDMV, mode="lines",
                    line=dict(color="rgb(35,122,181)"), name="Falcon Dam, TX")
trace2 = go.Scatter(x=df_tx_wind_r.index, y=df_tx_wind_r.WDMV, mode="lines",
                    line=dict(color="rgb(255,127,14)"), name="Rio Grande City, TX")
trace3 = go.Box(y=df_tx_wind_f.WDMV, marker=dict(color="rgb(35,122,181)"),
                boxmean="sd", name="Falcon Dam, TX")
trace4 = go.Box(y=df_tx_wind_r.WDMV, marker=dict(color="rgb(255,127,14)"),
                boxmean="sd", name="Rio Grande City, TX")

fig = tools.make_subplots(rows=2, cols=2, shared_xaxes=True, shared_yaxes=True,
                          vertical_spacing=0.01)
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 2, 1)
fig.append_trace(trace3, 1, 2)
fig.append_trace(trace4, 2, 2)

fig["layout"].update(
    plot_bgcolor="rgb(247,247,247)",
    legend=dict(font=dict(family="serif", size=12)),
    height=675,
    title="<b>Historical Wind Data for Texas Station Combination Candidates</b>",
    titlefont=dict(family="serif", size=24),
    yaxis1=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    yaxis2=dict(title="<b>Wind Movement, miles</b>",
                titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14), anchor="x"),
    yaxis3=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    yaxis4=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14), anchor="x2"),
    yaxis5=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    yaxis6=dict(titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    xaxis1=dict(title="<b>Year</b>", titlefont=dict(family="serif", size=14), 
                tickfont=dict(family="serif", size=14), domain=[0, 0.85]),
    xaxis2=dict(tickfont=dict(family="serif", size=14), domain=[0.9, 1]),
    xaxis3=dict(title="<b>Year</b>", titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14), domain=[0, 0.85]),
    xaxis4=dict(tickfont=dict(family="serif", size=14), domain=[0.9, 1]),
    xaxis5=dict(title="<b>Year</b>", titlefont=dict(family="serif", size=14), 
                tickfont=dict(family="serif", size=14), domain=[0, 0.85]),
    xaxis6=dict(tickfont=dict(family="serif", size=14), domain=[0.9, 1]))

py.offline.iplot(fig)

This is the format of your plot grid:
[ (1,1) x1,y1 ]  [ (1,2) x2,y1 ]
[ (2,1) x1,y2 ]  [ (2,2) x2,y2 ]



## Combine the Validated Wind Data

In [17]:
# Join the two weather stations' wind data together and take the average. Also, prepare
# the new DataFrame for pivoting the data into monthly columns
df_tx_wind = df_tx_wind_r[["WDMV"]].join(df_tx_wind_f[["WDMV"]], how="outer", lsuffix="_R", rsuffix="_F") \
                                   .mean(axis=1).to_frame(name="WDMV").dropna()
df_tx_wind["Year"] = df_tx_wind.index.year
df_tx_wind["Month"] = df_tx_wind.index.month
df_tx_wind["YYYY-MM"] = df_tx_wind.index.strftime("%Y-%m")

# Data reshaped to be indexed by Year and split into Monthly columns with the appropriate aggregation method
df_tx_monthly_wind = df_tx_wind.pivot_table(index="Year", columns="Month", values="WDMV", aggfunc="sum")
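
To illustrate the Year-by-Month reshaping, here is a minimal sketch of `pivot_table` on made-up values. The `toy` frame and its numbers are hypothetical. Daily readings are summed into one cell per year and month, and a month with no readings in a given year shows up as NaN.

```python
import pandas as pd

# Hypothetical daily wind movement readings
idx = pd.to_datetime(["2015-01-10", "2015-01-20", "2015-02-05", "2016-01-15"])
toy = pd.DataFrame({"WDMV": [10.0, 20.0, 5.0, 7.0]}, index=idx)
toy["Year"] = toy.index.year
toy["Month"] = toy.index.month

# Rows become years, columns become month numbers, cells hold monthly totals
monthly = toy.pivot_table(index="Year", columns="Month", values="WDMV", aggfunc="sum")

print(monthly)
```

Each column of the result (for example `monthly[1]` for January) is then a per-year series, which is the shape a per-month box plot needs.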

In [18]:
# Generate an interactive plot using the plotly package
trace1 = go.Scatter(x=df_tx_wind.index, y=df_tx_wind.WDMV, mode="lines", name="Time Line")
trace2 = go.Box(y=df_tx_monthly_wind[1], boxmean="sd", name="Jan")
trace3 = go.Box(y=df_tx_monthly_wind[2], boxmean="sd", name="Feb")
trace4 = go.Box(y=df_tx_monthly_wind[3], boxmean="sd", name="Mar")
trace5 = go.Box(y=df_tx_monthly_wind[4], boxmean="sd", name="Apr")
trace6 = go.Box(y=df_tx_monthly_wind[5], boxmean="sd", name="May")
trace7 = go.Box(y=df_tx_monthly_wind[6], boxmean="sd", name="Jun")
trace8 = go.Box(y=df_tx_monthly_wind[7], boxmean="sd", name="Jul")
trace9 = go.Box(y=df_tx_monthly_wind[8], boxmean="sd", name="Aug")
trace10 = go.Box(y=df_tx_monthly_wind[9], boxmean="sd", name="Sep")
trace11 = go.Box(y=df_tx_monthly_wind[10], boxmean="sd", name="Oct")
trace12 = go.Box(y=df_tx_monthly_wind[11], boxmean="sd", name="Nov")
trace13 = go.Box(y=df_tx_monthly_wind[12], boxmean="sd", name="Dec")

fig = tools.make_subplots(rows=2, cols=1, vertical_spacing=0.05)
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 2, 1)
fig.append_trace(trace3, 2, 1)
fig.append_trace(trace4, 2, 1)
fig.append_trace(trace5, 2, 1)
fig.append_trace(trace6, 2, 1)
fig.append_trace(trace7, 2, 1)
fig.append_trace(trace8, 2, 1)
fig.append_trace(trace9, 2, 1)
fig.append_trace(trace10, 2, 1)
fig.append_trace(trace11, 2, 1)
fig.append_trace(trace12, 2, 1)
fig.append_trace(trace13, 2, 1)

fig["layout"].update(
    plot_bgcolor="rgb(247,247,247)",
    legend=dict(font=dict(family="serif", size=12)),
    height=600,
    title="<b>Historical Wind Data for Texas Post Combination</b>",
    titlefont=dict(family="serif", size=24),
    yaxis1=dict(title="<b>Mean Wind Movement, miles</b>", titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    yaxis2=dict(title="<b>Cumulative Wind Movement, miles</b>",
                titlefont=dict(family="serif", size=14),
                tickfont=dict(family="serif", size=14)),
    xaxis1=dict(tickfont=dict(family="serif", size=14)),
    xaxis2=dict(tickfont=dict(family="serif", size=14)))

py.offline.iplot(fig)

This is the format of your plot grid:
[ (1,1) x1,y1 ]
[ (2,1) x2,y2 ]



***
## Further Data Integrity Checks for Rio Grande City, TX
The wind data appears to cover enough of the Texas sugarcane production data. Therefore, I continued with data integrity validation using more pandas DataFrames as well as a variety of different plotly interactive chart types.

In [19]:
# Reshape the remaining historical weather data for Rio Grande City, TX to extract only the relevant
# data for future analysis
df_tx = df_tx_rg[["DATE", "PRCP", "SNOW", "TMAX", "TMIN"]] \
                .sort_values("DATE")
df_tx.DATE = pd.to_datetime(df_tx.DATE)
df_tx = df_tx.set_index("DATE")
df_tx["Year"] = df_tx.index.year
df_tx["Month"] = df_tx.index.month
df_tx["YYYY-MM"] = df_tx.index.strftime("%Y-%m")

# Data reshaped/resampled to be indexed by YYYY-MM with the appropriate aggregation method
df_tx_lines = df_tx.pivot_table(index="YYYY-MM", values=["TMAX","TMIN","PRCP","SNOW"], aggfunc="mean")

# Data reshaped to be indexed by Year and split into Monthly columns with the appropriate aggregation method
df_tx_tmax = df_tx[["Year","Month","TMAX"]].dropna()
df_tx_tmax = df_tx_tmax.pivot_table(index="Year", columns="Month", values="TMAX", aggfunc="mean").round(2)
df_tx_tmin = df_tx[["Year","Month","TMIN"]].dropna()
df_tx_tmin = df_tx_tmin.pivot_table(index="Year", columns="Month", values="TMIN", aggfunc="mean").round(2)
df_tx_prcp = df_tx[["Year","Month","PRCP"]].dropna()
df_tx_prcp = df_tx_prcp.pivot_table(index="Year", columns="Month", values="PRCP", aggfunc="sum")
df_tx_snow = df_tx[["Year","Month","SNOW"]].dropna()
df_tx_snow = df_tx_snow.pivot_table(index="Year", columns="Month", values="SNOW", aggfunc="sum")
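
As one possible alternative to grouping on a "YYYY-MM" string key, a DataFrame with a DatetimeIndex can be resampled to monthly frequency directly. This is a sketch on made-up values, not what the notebook does: the `daily` frame is hypothetical, and it applies a mean to temperature and a sum to precipitation, mirroring the aggregation choices in the cell above.

```python
import pandas as pd

# Hypothetical daily data spanning the end of January and the start of February
idx = pd.date_range("2001-01-30", periods=4, freq="D")
daily = pd.DataFrame(
    {"TMAX": [70.0, 72.0, 80.0, 84.0], "PRCP": [0.0, 0.5, 0.25, 0.0]},
    index=idx,
)

# "MS" labels each monthly bin by its first day; each column gets its own aggregation
monthly = daily.resample("MS").agg({"TMAX": "mean", "PRCP": "sum"})

print(monthly)
```

The result keeps a true datetime index, which plotting libraries can treat as a continuous time axis instead of a list of category labels.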

In [20]:
# Plot the historical weather Dataframe for Rio Grande City, TX for some quick Exploratory
# Data Analysis (EDA)
# Generate an interactive plot using the plotly package
layout = go.Layout(
    plot_bgcolor="rgb(247,247,247)",
    legend=dict(font=dict(family="serif", size=12)),
    title="<b>Historical Temperature Data for Rio Grande City, TX</b>",
    titlefont=dict(family="serif", size=24),
    yaxis=dict(title="<b>Mean Temperature, {0}F</b>".format(u'\xb0'), 
               titlefont=dict(family="serif", size=14),
               tickfont=dict(family="serif", size=14)),
    xaxis=dict(title="<b>Year</b>", titlefont=dict(family="serif", size=14), 
               tickfont=dict(family="serif", size=14)))

trace1 = go.Scatter(x=df_tx_lines.index, y=df_tx_lines.TMAX, mode="lines", name="tmax")
trace2 = go.Scatter(x=df_tx_lines.index, y=df_tx_lines.TMIN, mode="lines", name="tmin")

fig = go.Figure(data=[trace1, trace2], layout=layout)

py.offline.iplot(fig)

In [21]:
# Plot the historical weather Dataframe for Rio Grande City, TX for some quick Exploratory
# Data Analysis (EDA)
# Generate an interactive plot using the plotly package
layout = go.Layout(
    plot_bgcolor="rgb(247,247,247)",
    legend=dict(font=dict(family="serif", size=12)),
    title="<b>Historical Maximum Temperature Data for Rio Grande City, TX</b>",
    titlefont=dict(family="serif", size=24),
    yaxis=dict(title="<b>Mean Temperature, {0}F</b>".format(u'\xb0'), 
               titlefont=dict(family="serif", size=14),
               tickfont=dict(family="serif", size=14)),
    xaxis=dict(title="<b>Month</b>", titlefont=dict(family="serif", size=14), 
               tickfont=dict(family="serif", size=14)))

trace1 = go.Box(y=df_tx_tmax[1], boxmean="sd", name="Jan")
trace2 = go.Box(y=df_tx_tmax[2], boxmean="sd", name="Feb")
trace3 = go.Box(y=df_tx_tmax[3], boxmean="sd", name="Mar")
trace4 = go.Box(y=df_tx_tmax[4], boxmean="sd", name="Apr")
trace5 = go.Box(y=df_tx_tmax[5], boxmean="sd", name="May")
trace6 = go.Box(y=df_tx_tmax[6], boxmean="sd", name="Jun")
trace7 = go.Box(y=df_tx_tmax[7], boxmean="sd", name="Jul")
trace8 = go.Box(y=df_tx_tmax[8], boxmean="sd", name="Aug")
trace9 = go.Box(y=df_tx_tmax[9], boxmean="sd", name="Sep")
trace10 = go.Box(y=df_tx_tmax[10], boxmean="sd", name="Oct")
trace11 = go.Box(y=df_tx_tmax[11], boxmean="sd", name="Nov")
trace12 = go.Box(y=df_tx_tmax[12], boxmean="sd", name="Dec")

fig = go.Figure(data=[trace1, trace2, trace3, trace4, trace5, trace6, \
                      trace7, trace8, trace9, trace10, trace11, trace12], layout=layout)

py.offline.iplot(fig)

In [22]:
# Plot the historical weather Dataframe for Rio Grande City, TX for some quick Exploratory
# Data Analysis (EDA)
# Generate an interactive plot using the plotly package
layout = go.Layout(
    plot_bgcolor="rgb(247,247,247)",
    legend=dict(font=dict(family="serif", size=12)),
    title="<b>Historical Minimum Temperature Data for Rio Grande City, TX</b>",
    titlefont=dict(family="serif", size=24),
    yaxis=dict(title="<b>Mean Temperature, {0}F</b>".format(u'\xb0'), 
               titlefont=dict(family="serif", size=14),
               tickfont=dict(family="serif", size=14)),
    xaxis=dict(title="<b>Month</b>", titlefont=dict(family="serif", size=14), 
               tickfont=dict(family="serif", size=14)))

trace1 = go.Box(y=df_tx_tmin[1], boxmean="sd", name="Jan")
trace2 = go.Box(y=df_tx_tmin[2], boxmean="sd", name="Feb")
trace3 = go.Box(y=df_tx_tmin[3], boxmean="sd", name="Mar")
trace4 = go.Box(y=df_tx_tmin[4], boxmean="sd", name="Apr")
trace5 = go.Box(y=df_tx_tmin[5], boxmean="sd", name="May")
trace6 = go.Box(y=df_tx_tmin[6], boxmean="sd", name="Jun")
trace7 = go.Box(y=df_tx_tmin[7], boxmean="sd", name="Jul")
trace8 = go.Box(y=df_tx_tmin[8], boxmean="sd", name="Aug")
trace9 = go.Box(y=df_tx_tmin[9], boxmean="sd", name="Sep")
trace10 = go.Box(y=df_tx_tmin[10], boxmean="sd", name="Oct")
trace11 = go.Box(y=df_tx_tmin[11], boxmean="sd", name="Nov")
trace12 = go.Box(y=df_tx_tmin[12], boxmean="sd", name="Dec")

fig = go.Figure(data=[trace1, trace2, trace3, trace4, trace5, trace6, \
                      trace7, trace8, trace9, trace10, trace11, trace12], layout=layout)

py.offline.iplot(fig)

In [23]:
# Plot the historical weather Dataframe for Rio Grande City, TX for some quick Exploratory
# Data Analysis (EDA)
# Generate an interactive plot using the plotly package
layout = go.Layout(
    plot_bgcolor="rgb(247,247,247)",
    legend=dict(font=dict(family="serif", size=12)),
    title="<b>Historical Precipitation Data for Rio Grande City, TX</b>",
    titlefont=dict(family="serif", size=24),
    yaxis=dict(title="<b>Mean Precipitation, inches</b>", 
               titlefont=dict(family="serif", size=14),
               tickfont=dict(family="serif", size=14)),
    xaxis=dict(title="<b>Year</b>", titlefont=dict(family="serif", size=14), 
               tickfont=dict(family="serif", size=14)))

trace1 = go.Scatter(x=df_tx_lines.index, y=df_tx_lines.PRCP, mode="lines", name="prcp")

fig = go.Figure(data=[trace1], layout=layout)

py.offline.iplot(fig)

In [24]:
# Plot the historical weather Dataframe for Rio Grande City, TX for some quick Exploratory
# Data Analysis (EDA)
# Generate an interactive plot using the plotly package
layout = go.Layout(
    plot_bgcolor="rgb(247,247,247)",
    legend=dict(font=dict(family="serif", size=12)),
    title="<b>Historical Precipitation Data for Rio Grande City, TX</b>",
    titlefont=dict(family="serif", size=24),
    yaxis=dict(title="<b>Cumulative Precipitation, inches</b>",
               titlefont=dict(family="serif", size=14),
               tickfont=dict(family="serif", size=14)),
    xaxis=dict(title="<b>Month</b>", titlefont=dict(family="serif", size=14), 
               tickfont=dict(family="serif", size=14)))

trace1 = go.Box(y=df_tx_prcp[1], boxmean="sd", name="Jan")
trace2 = go.Box(y=df_tx_prcp[2], boxmean="sd", name="Feb")
trace3 = go.Box(y=df_tx_prcp[3], boxmean="sd", name="Mar")
trace4 = go.Box(y=df_tx_prcp[4], boxmean="sd", name="Apr")
trace5 = go.Box(y=df_tx_prcp[5], boxmean="sd", name="May")
trace6 = go.Box(y=df_tx_prcp[6], boxmean="sd", name="Jun")
trace7 = go.Box(y=df_tx_prcp[7], boxmean="sd", name="Jul")
trace8 = go.Box(y=df_tx_prcp[8], boxmean="sd", name="Aug")
trace9 = go.Box(y=df_tx_prcp[9], boxmean="sd", name="Sep")
trace10 = go.Box(y=df_tx_prcp[10], boxmean="sd", name="Oct")
trace11 = go.Box(y=df_tx_prcp[11], boxmean="sd", name="Nov")
trace12 = go.Box(y=df_tx_prcp[12], boxmean="sd", name="Dec")

fig = go.Figure(data=[trace1, trace2, trace3, trace4, trace5, trace6, \
                      trace7, trace8, trace9, trace10, trace11, trace12], layout=layout)

py.offline.iplot(fig)

In [25]:
# Plot the historical weather Dataframe for Rio Grande City, TX for some quick Exploratory
# Data Analysis (EDA)
# Generate an interactive plot using the plotly package
layout = go.Layout(
    plot_bgcolor="rgb(247,247,247)",
    legend=dict(font=dict(family="serif", size=12)),
    title="<b>Historical Snowfall Data for Rio Grande City, TX</b>",
    titlefont=dict(family="serif", size=24),
    yaxis=dict(title="<b>Mean Snowfall, inches</b>", 
               titlefont=dict(family="serif", size=14),
               tickfont=dict(family="serif", size=14)),
    xaxis=dict(title="<b>Year</b>", titlefont=dict(family="serif", size=14), 
               tickfont=dict(family="serif", size=14)))

trace1 = go.Scatter(x=df_tx_lines.index, y=df_tx_lines.SNOW, mode="lines", name="snow")

fig = go.Figure(data=[trace1], layout=layout)

py.offline.iplot(fig)

***
## Create the Texas Weather DataFrame from the Combinations and Export

In [26]:
df_tx = df_tx[["TMAX", "TMIN", "PRCP", "SNOW"]].join(df_tx_wind[["WDMV"]], how="outer")

df_tx.to_csv("df_tx_weather.csv")