# AEK201, OpenWeather, and Surface Water
by Mere

OpenWeather data is hourly and well data for AEK201 is hourly, but surface water data for the Spokane River is daily, so if we want to use the surface water data, we'll need to convert the weather and well data to daily.

In this notebook, I'll do two main things:
* Make a df containing total *daily* precipitation (rain + snow), average daily well measurement data, and daily surface water data
    - Plot these three together on another set of axes
* Make a df containing total hourly precipitation (rain + snow) and hourly well measurement data
    - Plot those on the same set of axes
    
### To do:
I still need to handle time zones/Daylight Saving Time if we want to work at an hourly level. I think for the hourly well data, it'll be best to translate everything to UTC and then translate back the way I do for the weather data, but if we're working on the 

I also still need to make the **plots** for the daily data, but I think the df is good (it may have some missing data though, particularly for the river, which we need to make decisions about).

In [55]:
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt

In [149]:
# read in data

well_data = pd.read_csv('./data/EIM-data-AEK201/EIMTimeSeriesResults_2023Oct22_222975.csv',
                        low_memory=False)

weather_data = pd.read_csv('./data/open-weather-spokane.csv')

river_data = pd.read_csv('./data/USGS-Surface-Water-Site-12422500.tsv',
                         low_memory=False,
                         delimiter='\t',
                         comment='#')

In [150]:
# get the *daily* precipitation data

weather_data = weather_data[['dt_iso',
                             'rain_1h', #Rain volume for the last hour, mm
                             'snow_1h']] #Snow volume for the last hour, mm (in liquid state)

# change the dt_iso into actual datetimes
# the to_datetime method doesn't like the current format,
# so drop the last chunk
def trunc(isodt):
    return isodt[0:-10]

weather_data['dt_iso'] = weather_data['dt_iso'].apply(trunc)

weather_data['dt_iso'] = pd.to_datetime(weather_data['dt_iso'],
                                        utc=True)

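# get a window of years near our well data
# (this is just to speed things up for now; we'll reduce further
# later when we combine with the well data)
# note the filter uses the UTC year; .copy() makes this an independent
# frame so adding columns below doesn't warn about writing to a slice
weather_data = weather_data.loc[(weather_data.dt_iso.dt.year > 2006)
                                &
                                (weather_data.dt_iso.dt.year < 2018)].copy()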

# convert to US/Pacific time and get the dates
weather_data['dt_pac'] = weather_data['dt_iso'].dt.tz_convert('US/Pacific')
weather_data['date'] = weather_data.dt_pac.dt.date

# get hourly total precipitation
weather_data[['rain_1h','snow_1h']] = weather_data[['rain_1h','snow_1h']].fillna(value=0)
weather_data['precip'] = weather_data.rain_1h + weather_data.snow_1h
weather_data.reset_index(inplace=True)


# make a df with daily precipitation total
daily_precip = weather_data.groupby('date')[['precip']].sum().reset_index()
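
As an alternative to the `trunc` step above, the timestamp can be parsed in one pass by spelling out the trailing text in the format string. This is only a sketch: it assumes the `dt_iso` values look like `2007-01-01 00:00:00 +0000 UTC` (which is what dropping the last 10 characters suggests), and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample in the assumed OpenWeather dt_iso format
sample = pd.Series(['2007-01-01 00:00:00 +0000 UTC',
                    '2007-01-01 01:00:00 +0000 UTC'])

# %z consumes the "+0000" offset; the literal " UTC" matches the suffix,
# so no string slicing is needed before parsing
parsed = pd.to_datetime(sample,
                        format='%Y-%m-%d %H:%M:%S %z UTC',
                        utc=True)
```

If the real file uses a different suffix, the format string would need to change to match it.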


In [151]:
display(weather_data)
display(daily_precip)

Unnamed: 0,index,dt_iso,rain_1h,snow_1h,dt_pac,date,precip
0,260015,2007-01-01 00:00:00+00:00,0.0,0.0,2006-12-31 16:00:00-08:00,2006-12-31,0.0
1,260016,2007-01-01 01:00:00+00:00,0.0,0.0,2006-12-31 17:00:00-08:00,2006-12-31,0.0
2,260017,2007-01-01 02:00:00+00:00,0.0,0.0,2006-12-31 18:00:00-08:00,2006-12-31,0.0
3,260018,2007-01-01 03:00:00+00:00,0.0,0.0,2006-12-31 19:00:00-08:00,2006-12-31,0.0
4,260019,2007-01-01 04:00:00+00:00,0.0,0.0,2006-12-31 20:00:00-08:00,2006-12-31,0.0
...,...,...,...,...,...,...,...
101934,361949,2017-12-31 19:00:00+00:00,0.0,0.0,2017-12-31 11:00:00-08:00,2017-12-31,0.0
101935,361950,2017-12-31 20:00:00+00:00,0.0,0.0,2017-12-31 12:00:00-08:00,2017-12-31,0.0
101936,361951,2017-12-31 21:00:00+00:00,0.0,0.0,2017-12-31 13:00:00-08:00,2017-12-31,0.0
101937,361952,2017-12-31 22:00:00+00:00,0.0,0.0,2017-12-31 14:00:00-08:00,2017-12-31,0.0


Unnamed: 0,date,precip
0,2006-12-31,0.00
1,2007-01-01,1.03
2,2007-01-02,3.95
3,2007-01-03,9.31
4,2007-01-04,1.20
...,...,...
4014,2017-12-27,2.23
4015,2017-12-28,6.74
4016,2017-12-29,43.81
4017,2017-12-30,2.26


In [152]:
# extract, rename, and combine relevant columns of the well data

data = well_data.loc[well_data['Result_Parameter_Name']==
                     'Water level in well (depth below measuring point)'][['Field_Collection_Date_Time',
                                                                           'Result_Value']].copy()
short_names = {'Field_Collection_Date_Time': 'well_meas_time',
               'Result_Value': 'water_depth'}

data = data.rename(columns=short_names)

# timestamps look like '1/1/2007 12:00:00 AM'; passing the format explicitly
# avoids the format-inference warning
data.well_meas_time = pd.to_datetime(data.well_meas_time,
                                     format='%m/%d/%Y %I:%M:%S %p')

# make a date and average per day
data['date'] = data.well_meas_time.dt.date

daily_data = data.groupby('date')[['water_depth']].mean().reset_index()


In [153]:
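# cleaning via Marcos
# drop the first row, which holds the USGS column-format codes rather than data
river_data = river_data.drop(0, axis=0)

# .copy() so the column conversions below act on an independent frame
daily_river = river_data[['datetime','149640_00060_00003','149641_00065_00003']].copy()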

headers = {'datetime':'date', '149640_00060_00003':'discharge_cfs', '149641_00065_00003':'gage_ht'}
daily_river = daily_river.rename(columns=headers)

daily_river['date'] = pd.to_datetime(daily_river['date']).dt.date
daily_river['discharge_cfs'] = daily_river['discharge_cfs'].astype(float)
daily_river['gage_ht'] = daily_river['gage_ht'].astype(float)

In [155]:
# merge daily well water_depth and daily precip
daily_data = daily_data.merge(daily_precip, how='inner', on='date')

# merge with daily river
daily_data = daily_data.merge(daily_river, how='left', on='date')
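
Because the river data is joined with `how='left'`, any day without a river record shows up as `NaN` in the river columns. Here is a rough sketch of one way to count those gaps, using small made-up frames (`left`, `river`) in place of the real `daily_data` and `daily_river`:

```python
import datetime as dt
import pandas as pd

# Made-up stand-ins for the real frames (values are illustrative only)
left = pd.DataFrame({'date': [dt.date(2007, 1, 1),
                              dt.date(2007, 1, 2),
                              dt.date(2007, 1, 3)],
                     'water_depth': [66.46, 66.46, 66.42]})
river = pd.DataFrame({'date': [dt.date(2007, 1, 1),
                               dt.date(2007, 1, 3)],
                      'discharge_cfs': [5620.0, 5660.0]})

# indicator=True adds a _merge column recording where each row came from
merged = left.merge(river, how='left', on='date', indicator=True)

# days present in the well/precip data but absent from the river data
missing_days = merged.loc[merged['_merge'] == 'left_only', 'date'].tolist()

# per-column NaN counts are another quick check
nan_counts = merged.isna().sum()
```

Knowing which days are missing should make it easier to decide whether to drop or interpolate them.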

In [157]:
daily_data = daily_data.drop(0, axis=0).reset_index()
display(daily_data)

Unnamed: 0,index,date,water_depth,precip,discharge_cfs,gage_ht
0,1,2007-01-01,66.457083,1.03,5620.0,20.24
1,2,2007-01-02,66.462500,3.95,5620.0,20.24
2,3,2007-01-03,66.422083,9.31,5660.0,20.26
3,4,2007-01-04,66.375833,1.20,6370.0,20.55
4,5,2007-01-05,66.245833,3.41,7110.0,20.84
...,...,...,...,...,...,...
3919,3920,2017-09-24,69.101458,0.00,2400.0,18.61
3920,3921,2017-09-25,69.022083,0.00,2400.0,18.61
3921,3922,2017-09-26,68.966333,0.00,2390.0,18.61
3922,3923,2017-09-27,68.887042,0.00,2410.0,18.62


In [None]:
# plots go here!
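
One possible starting point for the daily plots, offered as a sketch rather than the final design: since the three series have different units, this stacks three axes that share the date axis instead of overlaying everything on one. The helper name `plot_daily` and the small `demo` frame are placeholders; the demo values are the first few rows of the `daily_data` output above.

```python
import datetime as dt
import matplotlib.pyplot as plt
import pandas as pd

def plot_daily(df):
    """Plot water depth, precipitation, and discharge on stacked axes."""
    fig, axes = plt.subplots(3, 1, sharex=True, figsize=(10, 8))

    axes[0].plot(df['date'], df['water_depth'])
    # depth is measured downward, so flip the axis to make "up" mean
    # a higher water level (a design choice, easy to remove)
    axes[0].invert_yaxis()
    axes[0].set_ylabel('water depth')

    axes[1].bar(df['date'], df['precip'])
    axes[1].set_ylabel('precip (mm)')

    axes[2].plot(df['date'], df['discharge_cfs'])
    axes[2].set_ylabel('discharge (cfs)')
    axes[2].set_xlabel('date')
    return fig, axes

# a few rows copied from the daily_data output, just to exercise the function
demo = pd.DataFrame({
    'date': [dt.date(2007, 1, d) for d in range(1, 6)],
    'water_depth': [66.457083, 66.462500, 66.422083, 66.375833, 66.245833],
    'precip': [1.03, 3.95, 9.31, 1.20, 3.41],
    'discharge_cfs': [5620.0, 5620.0, 5660.0, 6370.0, 7110.0],
})
fig, axes = plot_daily(demo)
```

With the real frame this would be called as `plot_daily(daily_data)`.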

## Dealing with timezones for hourly measurements

In [91]:
data = well_data.loc[well_data['Result_Parameter_Name']==
                     'Water level in well (depth below measuring point)'][['Field_Collection_Date_Time',
                                                                           'Time_Zone',
                                                                           'Result_Value']].copy()

short_names = {'Field_Collection_Date_Time': 'well_meas_time',
               'Time_Zone': 'timezone',
               'Result_Value': 'water_depth'}

data = data.rename(columns=short_names)

data

Unnamed: 0,well_meas_time,timezone,water_depth
0,1/1/2007 12:00:00 AM,PDT - Pacific Daylight Time (GMT-7),66.43000
2,1/1/2007 1:00:00 AM,PDT - Pacific Daylight Time (GMT-7),66.45000
4,1/1/2007 2:00:00 AM,PDT - Pacific Daylight Time (GMT-7),66.40000
6,1/1/2007 3:00:00 AM,PDT - Pacific Daylight Time (GMT-7),66.43000
8,1/1/2007 4:00:00 AM,PDT - Pacific Daylight Time (GMT-7),66.43000
...,...,...,...
222965,12/31/2016 7:00:00 PM,PDT - Pacific Daylight Time (GMT-7),66.93494
222967,12/31/2016 8:00:00 PM,PDT - Pacific Daylight Time (GMT-7),66.94403
222969,12/31/2016 9:00:00 PM,PDT - Pacific Daylight Time (GMT-7),66.93914
222971,12/31/2016 10:00:00 PM,PDT - Pacific Daylight Time (GMT-7),66.94816
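
Note that the `timezone` column reads "PDT (GMT-7)" even for January readings. If that label is taken literally, every reading sits at a fixed UTC-7 offset year-round, and the conversion to UTC has no Daylight Saving ambiguity. The sketch below works under that assumption, which still needs to be checked against the data source; the two sample timestamps are made up in the same text format as `well_meas_time`.

```python
import datetime as dt
import pandas as pd

# two made-up timestamps in the same text format as well_meas_time
sample = pd.Series(['1/1/2007 12:00:00 AM', '7/1/2007 1:00:00 PM'])

# parse as naive datetimes using an explicit format
naive = pd.to_datetime(sample, format='%m/%d/%Y %I:%M:%S %p')

# ASSUMPTION: the timezone label is literally right, so every reading
# is at a fixed UTC-7 offset regardless of season
fixed_pdt = dt.timezone(dt.timedelta(hours=-7))
well_utc = naive.dt.tz_localize(fixed_pdt).dt.tz_convert('UTC')

# back to Pacific wall-clock time, the same way the weather data is handled
well_pac = well_utc.dt.tz_convert('US/Pacific')
```

Under this assumption a midnight reading on January 1 lands at 11 PM on December 31 in Pacific wall-clock time, which would shift it into the previous day's average. If the readings turn out to be wall-clock Pacific time instead, `tz_localize('US/Pacific', ...)` with its `ambiguous` and `nonexistent` options would be the route to take.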
