For each 24-hour day (midnight to midnight) we'd like to know the following:
* Temperature and barometric pressure values at sunrise and at sunset
* Difference in temperature and in pressure between sunrise and sunset
* Maximum, minimum and average temperature and pressure for each of two time frames: sunset to sunrise, and sunrise to sunset
* Differences between the two time frames' maximum, minimum and average values


# Load libraries

In [1]:
# pandas for data structure
import pandas as pd

# Load data

#### Weather Data

* data collected from [Wunderground](https://www.wunderground.com/weather/api/)
* hosted file: [Google Drive](https://drive.google.com/file/d/1eS0gGM14g7iFulUeqz3XwbKb5OtK9aSI/view)

In [2]:
# local file
filename_wunderground = '../data/wunderground-170701_171101-day_night.csv'

In [3]:
# load data into dataframes
wund = pd.read_csv(filename_wunderground, parse_dates=['utc_date'])

In [4]:
wund['utc_date'] = wund['utc_date'].dt.tz_localize('utc')

In [5]:
# convert the UTC datetimes to US/Mountain and store them in a local_date column
wund['local_date'] = pd.to_datetime(wund.loc[:, 'utc_date']).dt.tz_convert('US/Mountain')
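
The two timezone steps above do different things: `tz_localize` attaches a zone to naive timestamps without changing the clock reading, while `tz_convert` keeps the instant and changes the clock reading. A minimal sketch on a single made-up timestamp (the variable names here are illustrative and not part of the notebook):

```python
import pandas as pd

# A naive timestamp that we know was recorded in UTC (made-up value)
naive = pd.Series(pd.to_datetime(['2017-07-01 06:12:00']))

# tz_localize: attach the UTC zone, clock reading unchanged
utc = naive.dt.tz_localize('UTC')

# tz_convert: same instant, expressed in US/Mountain (UTC-6 in July)
local = utc.dt.tz_convert('US/Mountain')

print(utc.iloc[0])    # 2017-07-01 06:12:00+00:00
print(local.iloc[0])  # 2017-07-01 00:12:00-06:00
```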

In [43]:
wund = wund.set_index('local_date')

In [266]:
wund = wund[['station_id', 'pressurei', 'pressurem', 'tempi', 'tempm', 'utc_date']]

In [267]:
wund.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 727764 entries, 2017-07-01 00:12:00-06:00 to 2017-10-31 23:56:00-06:00
Data columns (total 6 columns):
station_id    727764 non-null object
pressurei     727764 non-null float64
pressurem     727764 non-null float64
tempi         727764 non-null float64
tempm         727764 non-null float64
utc_date      727764 non-null datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1), float64(4), object(1)
memory usage: 58.9+ MB


In [268]:
wund.head(2)

local_date,station_id,pressurei,pressurem,tempi,tempm,utc_date
2017-07-01 00:12:00-06:00,KMTCORVA9,26.0,880.4,58.8,14.9,2017-07-01 06:12:00+00:00
2017-07-01 00:28:00-06:00,KMTCORVA9,26.0,880.4,59.0,15.0,2017-07-01 06:28:00+00:00


#### Sunset Sunrise Data

In [210]:
# Load Sunset Sunrise data
sun_filename = '../data/sunrise_sunset-wunderground-utc.csv'
sun = pd.read_csv(sun_filename, parse_dates=['sunrise', 'sunset'])

In [211]:
# Select a subset of loaded DataFrame
sun = sun[['station_id', 'sunrise', 'sunset']]

In [212]:
# Rename columns
sun.columns = ['station_id', 'sunrise_utc', 'sunset_utc']

In [213]:
# Localize datetime to UTC
sun['sunrise_utc'] = sun['sunrise_utc'].dt.tz_localize('utc')
sun['sunset_utc'] = sun['sunset_utc'].dt.tz_localize('utc')

In [214]:
# Create US/Mountain datetimes
sun['sunrise_local'] = pd.to_datetime(sun.loc[:, 'sunrise_utc']).dt.tz_convert('US/Mountain')
sun['sunset_local'] = pd.to_datetime(sun.loc[:, 'sunset_utc']).dt.tz_convert('US/Mountain')

In [246]:
# Reorder columns
sun = sun[['station_id', 'sunrise_local', 'sunset_local', 'sunrise_utc', 'sunset_utc']]

In [250]:
# Drop the rows for 2017-06-30 (any row whose local sunset falls in June)
sun = sun[sun.sunset_local.dt.month != 6]

In [254]:
# Reset index
sun = sun.reset_index(drop=True)

In [259]:
sun.head(2)

,station_id,sunrise_local,sunset_local,sunrise_utc,sunset_utc
0,KMTCORVA9,2017-07-01 05:48:49-06:00,2017-07-01 21:31:41-06:00,2017-07-01 11:48:49+00:00,2017-07-02 03:31:41+00:00
1,KMTCORVA9,2017-07-02 05:49:27-06:00,2017-07-02 21:31:25-06:00,2017-07-02 11:49:27+00:00,2017-07-03 03:31:25+00:00


# New DataFrame

In [327]:
# Column names
columns = ['rise_tempi','set_tempi','rise_pressurei','set_pressurei', 'rise_set_diff', 'set_rise_diff',
          'set_rise_max', 'set_rise_min', 'set_rise_ave', 'rise_set_max', 'rise_set_min', 'rise_set_ave',
          'set_rise_rise_set_max_diff', 'set_rise_rise_set_min_diff', 'set_rise_rise_set_ave_diff']

In [329]:
calcs = pd.DataFrame(columns=columns)

In [330]:
calcs.info()

<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Data columns (total 15 columns):
rise_tempi                    0 non-null object
set_tempi                     0 non-null object
rise_pressurei                0 non-null object
set_pressurei                 0 non-null object
rise_set_diff                 0 non-null object
set_rise_diff                 0 non-null object
set_rise_max                  0 non-null object
set_rise_min                  0 non-null object
set_rise_ave                  0 non-null object
rise_set_max                  0 non-null object
rise_set_min                  0 non-null object
rise_set_ave                  0 non-null object
set_rise_rise_set_max_diff    0 non-null object
set_rise_rise_set_min_diff    0 non-null object
set_rise_rise_set_ave_diff    0 non-null object
dtypes: object(15)
memory usage: 0.0+ bytes


# Calculations

### Column for values at Sunrise and Sunset
* 'values' refers to temperature and pressure data
* indexed by day
* each date's row covers that day's sunrise to sunset
* columns = ['rise_tempi','set_tempi','rise_pressurei','set_pressurei']
* index = ['2017-07-01', .... '2017-10-31']

* Get the wund.local_date closest to each sun.sunrise_local (and likewise for each sun.sunset_local)
* [query the closest datetime index](https://stackoverflow.com/questions/42264848/pandas-dataframe-how-to-query-the-closest-datetime-index)
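
One vectorized alternative to a per-row lookup is `pandas.merge_asof` with `direction='nearest'`, which pairs each sunrise with the nearest weather sample from the same station. The sketch below is only an illustration: the two small frames and their values are made up, and it assumes both frames can be sorted by their time keys, which `merge_asof` requires.

```python
import pandas as pd

# Hypothetical miniature stand-ins for wund and sun (values are made up)
weather = pd.DataFrame({
    'station_id': ['A', 'A', 'A', 'B', 'B'],
    'local_date': pd.to_datetime([
        '2017-07-01 05:36:00', '2017-07-01 05:51:00', '2017-07-01 06:06:00',
        '2017-07-01 05:40:00', '2017-07-01 06:00:00',
    ]).tz_localize('US/Mountain'),
    'tempi': [51.5, 51.1, 52.0, 49.0, 50.2],
})
sunrise = pd.DataFrame({
    'station_id': ['A', 'B'],
    'sunrise_local': pd.to_datetime([
        '2017-07-01 05:48:49', '2017-07-01 05:49:10',
    ]).tz_localize('US/Mountain'),
})

# merge_asof needs both frames sorted by their time keys
weather = weather.sort_values('local_date')
sunrise = sunrise.sort_values('sunrise_local')

# For each sunrise, take the same-station sample with the closest timestamp
nearest = pd.merge_asof(
    sunrise, weather,
    left_on='sunrise_local', right_on='local_date',
    by='station_id',
    direction='nearest',
)
print(nearest[['station_id', 'sunrise_local', 'local_date', 'tempi']])
```

Matching with `by='station_id'` keeps samples from different stations from being paired with each other.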

#### Query wund for the datetime closest to a sunrise or sunset time

In [319]:
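def find_closest_weather_sample(station, date):
    # positions in wund of this station's rows, so the result is valid for wund.iloc
    station_rows = (wund.station_id == station).values.nonzero()[0]
    # get_indexer replaces get_loc(..., method='nearest'), which pandas 2.0 removed
    nearest = wund.index[station_rows].get_indexer([date], method='nearest')[0]
    return station_rows[nearest]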

In [326]:
# Find the weather sample nearest to sunrise (first row of sun only)
for _, row in sun.head(1).iterrows():
    station_id = row['station_id']
    sunrise_local = row['sunrise_local']
    sunset_local = row['sunset_local']
    print(station_id, 'Sunrise:', sunrise_local)
    print('Nearest Weather Sample:')
    print(wund.iloc[find_closest_weather_sample(station_id, sunrise_local)])
    print('\n')

KMTCORVA9 Sunrise: 2017-07-01 05:48:49-06:00
Nearest Weather Sample:
station_id                    KMTCORVA9
pressurei                            26
pressurem                         880.4
tempi                              51.1
tempm                              10.6
utc_date      2017-07-01 11:51:00+00:00
Name: 2017-07-01 05:51:00-06:00, dtype: object




### Difference sunset, sunrise values

### Difference sunrise, sunset values
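
A minimal sketch of how these two differences might be computed once the sunrise and sunset values are in a per-day frame. The values are made up. The per-variable column suffixes and the sign conventions (sunset minus sunrise for the day, next sunrise minus sunset for the night) are assumptions for illustration, not something the notebook has fixed.

```python
import pandas as pd

# Hypothetical per-day values at sunrise and sunset (made-up numbers)
daily = pd.DataFrame(
    {'rise_tempi': [51.1, 53.0], 'set_tempi': [72.5, 70.0],
     'rise_pressurei': [26.00, 26.05], 'set_pressurei': [25.95, 26.01]},
    index=pd.to_datetime(['2017-07-01', '2017-07-02']),
)

for var in ['tempi', 'pressurei']:
    # change over the day: sunset value minus the same date's sunrise value
    daily['rise_set_diff_' + var] = daily['set_' + var] - daily['rise_' + var]
    # change over the night: next date's sunrise value minus this date's sunset value
    daily['set_rise_diff_' + var] = daily['rise_' + var].shift(-1) - daily['set_' + var]

print(daily[['rise_set_diff_tempi', 'set_rise_diff_tempi']])
```

The last date has no following sunrise, so its night-time difference is `NaN`.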

### Max, min, ave : sunset to sunrise : sunrise to sunset
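
One possible approach is to mask the samples that fall inside each window and aggregate them. The sketch below uses made-up temperature samples for a single station. The helper name `window_stats` and the half-open window (start inclusive, end exclusive) are assumptions for illustration.

```python
import pandas as pd

# Hypothetical samples for one station (made-up values)
samples = pd.DataFrame(
    {'tempi': [50.0, 55.0, 70.0, 65.0, 60.0, 52.0]},
    index=pd.to_datetime([
        '2017-07-01 06:00:00', '2017-07-01 09:00:00', '2017-07-01 15:00:00',
        '2017-07-01 21:00:00', '2017-07-02 00:00:00', '2017-07-02 05:00:00',
    ]).tz_localize('US/Mountain'),
)
sunrise = pd.Timestamp('2017-07-01 05:48:49', tz='US/Mountain')
sunset = pd.Timestamp('2017-07-01 21:31:41', tz='US/Mountain')
next_sunrise = pd.Timestamp('2017-07-02 05:49:27', tz='US/Mountain')


def window_stats(frame, start, end, column):
    """Return max, min and mean of column for samples with start <= time < end."""
    in_window = (frame.index >= start) & (frame.index < end)
    return frame.loc[in_window, column].agg(['max', 'min', 'mean'])


day = window_stats(samples, sunrise, sunset, 'tempi')          # sunrise to sunset
night = window_stats(samples, sunset, next_sunrise, 'tempi')   # sunset to next sunrise
print(day)
print(night)
```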

### Difference max, min, ave sunset to sunrise : sunrise to sunset
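
Given per-day aggregates for both windows, the differences reduce to column subtraction. The sketch below reuses the column names from the `columns` list above with made-up values. Subtracting the day window from the night window is an assumed sign convention.

```python
import pandas as pd

# Hypothetical per-day aggregates for one variable (made-up values)
stats = pd.DataFrame(
    {'rise_set_max': [70.0, 75.0], 'rise_set_min': [50.0, 54.0], 'rise_set_ave': [60.0, 64.5],
     'set_rise_max': [60.0, 66.0], 'set_rise_min': [52.0, 55.0], 'set_rise_ave': [56.0, 60.0]},
    index=pd.to_datetime(['2017-07-01', '2017-07-02']),
)

for stat in ['max', 'min', 'ave']:
    # night window (sunset to sunrise) minus day window (sunrise to sunset)
    stats['set_rise_rise_set_' + stat + '_diff'] = (
        stats['set_rise_' + stat] - stats['rise_set_' + stat]
    )

print(stats[['set_rise_rise_set_max_diff',
             'set_rise_rise_set_min_diff',
             'set_rise_rise_set_ave_diff']])
```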