# Aggregate Sensor Stats
In script A_load_and_combine_data we created a table called df_sensor_phase, which contains the full temperature and humidity sensor readings together with the breeding_year, season_year and breeding_phase.

This script creates a set of aggregates from this data to understand the nest microclimates. The aggregates include:
* `Annual stats             -> df_sensor_stats_annual`
* `Seasonal stats           -> df_sensor_stats_seasonal`
* `Monthly stats            -> df_sensor_stats_monthly`
* `Daily stats              -> df_sensor_stats_daily`
* `Stats by breeding phase  -> df_sensor_stats_breeding_phase`



## 1. Set up the environment
### 1.1 Import the required libraries
The tasks below use a few common libraries, which are imported in the next cell. If an import statement fails, install the missing library from the command line with `pip install <library>`. Note that `helper_functions` is the local file helper_functions.py, not a pip package.
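To find out which libraries are missing before running the import cell, a quick check such as the one below can help. It is a minimal sketch using the standard library's `importlib.util.find_spec`; the `missing_modules` helper is hypothetical, and the module list only covers the third-party libraries imported in this notebook.

```python
import importlib.util

# third-party libraries imported in this notebook; os, datetime and time ship with Python
REQUIRED_MODULES = ['pandas', 'numpy']


def missing_modules(names):
    """Return the names in `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print('Install with: pip install ' + ' '.join(missing))
else:
    print('All required libraries are installed.')
```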

In [1]:
print('Setting up environment and variables...', flush=True)
import pandas as pd
import os
import numpy as np
import datetime
import time

# all the useful and reuseable functions are defined in helper_functions.py
from helper_functions import *

Setting up environment and variables...


### 1.2 Set up the variables
Change the values of the variables below to match the names and locations of the files to be loaded.

In [2]:
# update these with your file paths
sensor_phase = os.path.normpath('./output/A_load_and_combine_data/df_sensor_phase.pkl')

# write intermediate tables to disk for debugging purposes
write_temps = True
df_sensor_data = None

log('Done.')

Mon Apr 10 21:03:01 2017 - Done.


### 1.3 Import the data file

In [4]:
df_sensor_phase = pd.read_pickle(sensor_phase)
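Every calculation below relies on a handful of columns in `df_sensor_phase`. A minimal sketch of a check that fails early if any are missing; the column list is taken from the filter and `groupby` calls in this notebook, and the `check_columns` helper is hypothetical.

```python
import pandas as pd

# columns used by the filter and groupby calls in this notebook
REQUIRED_COLUMNS = ['nest_id', 'breeding_year', 'season_year', 'calendar_year',
                    'breeding_phase', 'season', 'month', 'day', 'temp_c', 'humidity']


def check_columns(df, required=REQUIRED_COLUMNS):
    """Raise a KeyError listing any required columns missing from df."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise KeyError('df_sensor_phase is missing columns: {}'.format(missing))


# usage: check_columns(df_sensor_phase)
```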

## Create the aggregate temperature and humidity calculations

Calculation: **Number of days per month with a temp >= 35C**

In [5]:
log('   1. Days per month >= 35C...')
# keep only the readings at or above 35C
df_temp_above_35C = df_sensor_phase.loc[df_sensor_phase['temp_c'] >= 35].reset_index()
# count distinct days, not readings: keep one row per nest, breeding year, month and day
gb_monthly_days_above_35C = (df_temp_above_35C
                             .drop_duplicates(['nest_id', 'breeding_year', 'month', 'day'])
                             .groupby(['nest_id', 'breeding_year', 'month'])
                             .size())
log('   1. Done')

Mon Apr 10 21:09:15 2017 -    1. Days per month >= 35C...
Mon Apr 10 21:09:15 2017 -    1. Done
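Because a sensor logs many readings per day, counting rows with `.size()` counts readings rather than days, so a count of distinct days needs the rows de-duplicated on the day first. A small illustration on invented readings (the values are made up for the example):

```python
import pandas as pd

# invented readings: nest 1 has three hot readings on 5 Jan, one on 6 Jan, none on 7 Jan
readings = pd.DataFrame({
    'nest_id': [1, 1, 1, 1, 1],
    'breeding_year': [2013] * 5,
    'month': [1, 1, 1, 1, 1],
    'day': [5, 5, 5, 6, 7],
    'temp_c': [36.0, 38.5, 35.2, 41.0, 30.0],
})

hot = readings.loc[readings['temp_c'] >= 35]
keys = ['nest_id', 'breeding_year', 'month']

# rows at or above 35C: 4 readings
readings_per_month = hot.groupby(keys).size()
# distinct days at or above 35C: 2 days (5 and 6 Jan)
days_per_month = hot.drop_duplicates(keys + ['day']).groupby(keys).size()

print(readings_per_month.iloc[0], days_per_month.iloc[0])  # 4 2
```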


Calculation: **Number of days per season with a temp >= 35C**

**Note**: This assumes that Summer 2013 is Jan-Feb 2013 plus Dec 2013, i.e. all the summer months in calendar year 2013, rather than the summer season that starts in 2013 (Dec 2013-Feb 2014).
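The two conventions in the note can be made concrete with a small mapping. This is a minimal sketch, assuming Southern Hemisphere seasons (summer is Dec-Feb, as the note implies) and assuming the season labels `Summer`, `Autumn`, `Winter` and `Spring`; the actual values in the `season` column may differ, and `season_label` is a hypothetical helper.

```python
# month number -> season, assuming Southern Hemisphere seasons
SEASONS = {12: 'Summer', 1: 'Summer', 2: 'Summer',
           3: 'Autumn', 4: 'Autumn', 5: 'Autumn',
           6: 'Winter', 7: 'Winter', 8: 'Winter',
           9: 'Spring', 10: 'Spring', 11: 'Spring'}


def season_label(year, month, by_start_year=False):
    """Return (season_year, season) for a calendar year and month.

    by_start_year=False follows the note: Summer 2013 = Jan-Feb 2013 + Dec 2013.
    by_start_year=True labels a summer by the year it starts: Summer 2013 = Dec 2013-Feb 2014.
    """
    season = SEASONS[month]
    if by_start_year and month in (1, 2):
        return year - 1, season
    return year, season


print(season_label(2014, 1))                      # (2014, 'Summer')
print(season_label(2014, 1, by_start_year=True))  # (2013, 'Summer')
```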

In [6]:
log('   2. Days per season >= 35C...')
# use the >=35C table from #1; count distinct days per nest per season
gb_seasonal_days_above_35C = (df_temp_above_35C
                              .drop_duplicates(['nest_id', 'season_year', 'season', 'month', 'day'])
                              .groupby(['nest_id', 'season_year', 'season'])
                              .size()
                              .reset_index())
log('   2. Done')

Mon Apr 10 21:09:18 2017 -    2. Days per season >= 35C...
Mon Apr 10 21:09:18 2017 -    2. Done


Calculation: **Number of days per year with a temp >= 35C**

In [7]:
log('   3. Days per year >= 35C...')
# use the >=35C table from #1; count distinct days per nest per breeding year
gb_annual_days_above_35C = (df_temp_above_35C
                            .drop_duplicates(['nest_id', 'breeding_year', 'month', 'day'])
                            .groupby(['nest_id', 'breeding_year'])
                            .size())
log('   3. Done')

Mon Apr 10 21:09:19 2017 -    3. Days per year >= 35C...
Mon Apr 10 21:09:19 2017 -    3. Done


Calculation: **Number of days per month with a temp >= 40C**

In [8]:
log('   4. Days per month >= 40C...')
# keep only the readings at or above 40C
df_temp_above_40C = df_sensor_phase.loc[df_sensor_phase['temp_c'] >= 40].reset_index()
# count distinct days, not readings: keep one row per nest, breeding year, month and day
gb_monthly_days_above_40C = (df_temp_above_40C
                             .drop_duplicates(['nest_id', 'breeding_year', 'month', 'day'])
                             .groupby(['nest_id', 'breeding_year', 'month'])
                             .size())
log('   4. Done')

Mon Apr 10 21:09:31 2017 -    4. Days per month >= 40C...
Mon Apr 10 21:09:31 2017 -    4. Done


Calculation: **Number of days per season with a temp >= 40C**

In [9]:
log('   5. Days per season >= 40C...')
# use the >=40C table from #4; count distinct days per nest per season
gb_seasonal_days_above_40C = (df_temp_above_40C
                              .drop_duplicates(['nest_id', 'season_year', 'season', 'month', 'day'])
                              .groupby(['nest_id', 'season_year', 'season'])
                              .size())
log('   5. Done')

Mon Apr 10 21:09:38 2017 -    5. Days per season >= 40C...
Mon Apr 10 21:09:38 2017 -    5. Done


Calculation: **Number of days per year with a temp >= 40C**

In [10]:
log('   6. Days per year >= 40C...')
# use the >=40C table from #4; count distinct days per nest per breeding year
gb_annual_days_above_40C = (df_temp_above_40C
                            .drop_duplicates(['nest_id', 'breeding_year', 'month', 'day'])
                            .groupby(['nest_id', 'breeding_year'])
                            .size())
log('   6. Done')

Mon Apr 10 21:09:39 2017 -    6. Days per year >= 40C...
Mon Apr 10 21:09:39 2017 -    6. Done
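Cells 1-6 repeat the same filter-and-count pattern for two thresholds and three periods. A minimal sketch of a single helper that could cover all six; the name `days_at_or_above` and the `PERIODS` mapping are hypothetical, and the sketch assumes a `(month, day)` pair identifies one date within each period.

```python
import pandas as pd

# period keys used by cells 1-6
PERIODS = {
    'monthly': ['nest_id', 'breeding_year', 'month'],
    'seasonal': ['nest_id', 'season_year', 'season'],
    'annual': ['nest_id', 'breeding_year'],
}


def days_at_or_above(df, threshold, keys, temp_col='temp_c'):
    """Count distinct days per group in `keys` with at least one reading >= threshold."""
    hot = df.loc[df[temp_col] >= threshold]
    # a day is identified by the period keys plus month and day; dict.fromkeys drops repeats
    day_keys = list(dict.fromkeys(keys + ['month', 'day']))
    return hot.drop_duplicates(day_keys).groupby(keys).size()


# usage:
# gb_monthly_days_above_40C = days_at_or_above(df_sensor_phase, 40, PERIODS['monthly'])
```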


Calculation: **Daily temperature and humidity stats**
*To do: work out how to add the daily temperature and humidity range to these stats.*

In [11]:
log('   7. Daily temp and humidity stats...')
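# (output name, function) pairs for each measure; nested renaming dicts such as
# {'humidity': {'max': 'max'}} are not supported from pandas 1.0 onwards
agg = {
    'humidity': [('max', 'max'), ('min', 'min'), ('mean', 'mean'), ('median', 'median'), ('stddev', 'std')],
    'temp_c': [('max', 'max'), ('min', 'min'), ('mean', 'mean'), ('median', 'median'), ('stddev', 'std')],
}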
gb_daily_sensor_stats = df_sensor_phase.groupby(['nest_id', 'breeding_year', 'season_year', 'calendar_year', 'breeding_phase', 'season', 'month', 'day']).agg(agg).reset_index()
gb_daily_sensor_stats.columns = [' '.join(col).strip() for col in gb_daily_sensor_stats.columns.values]
log('   7. Done.')

Mon Apr 10 21:09:40 2017 -    7. Daily temp and humidity stats...
Mon Apr 10 21:09:41 2017 -    7. Done.
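The open question about ranges can be handled after aggregation: the daily range is the daily max minus the daily min. A minimal sketch on invented data, assuming the flattened column names produced by cell 7 (`temp_c max`, `temp_c min`, `humidity max`, `humidity min`):

```python
import pandas as pd

# invented daily stats using the flattened column names from cell 7
daily = pd.DataFrame({
    'nest_id': [1, 1],
    'day': [5, 6],
    'temp_c max': [41.0, 37.5],
    'temp_c min': [22.0, 24.5],
    'humidity max': [80.0, 75.0],
    'humidity min': [30.0, 40.0],
})

# daily range = daily max - daily min, for each measure
for measure in ['temp_c', 'humidity']:
    daily[measure + ' range'] = daily[measure + ' max'] - daily[measure + ' min']

print(daily[['day', 'temp_c range', 'humidity range']])
```

Applied to `gb_daily_sensor_stats`, the same loop would add `temp_c range` and `humidity range` columns.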


Calculation: **Monthly temperature and humidity stats**

In [12]:
log('   8. Monthly temp and humidity stats...')
gb_monthly_sensor_stats = df_sensor_phase.groupby(['nest_id', 'breeding_year', 'season_year', 'calendar_year', 'breeding_phase', 'season', 'month']).agg(agg).reset_index()
gb_monthly_sensor_stats.columns = [' '.join(col).strip() for col in gb_monthly_sensor_stats.columns.values]
log('   8. Done.')

Mon Apr 10 21:09:44 2017 -    8. Monthly temp and humidity stats...
Mon Apr 10 21:09:45 2017 -    8. Done.


Calculation: **Seasonal temperature and humidity stats**

In [13]:
log('   9. Seasonal temp and humidity stats...')
gb_seasonal_sensor_stats = df_sensor_phase.groupby(['nest_id', 'breeding_year', 'season_year', 'calendar_year', 'breeding_phase', 'season']).agg(agg).reset_index()
gb_seasonal_sensor_stats.columns = [' '.join(col).strip() for col in gb_seasonal_sensor_stats.columns.values]
log('   9. Done.')

Mon Apr 10 21:09:47 2017 -    9. Seasonal temp and humidity stats...
Mon Apr 10 21:09:48 2017 -    9. Done.


Calculation: **Annual temperature and humidity stats**

In [14]:
log('   10. Annual temp and humidity stats...')
# one row per nest per breeding year
gb_annual_sensor_stats = df_sensor_phase.groupby(['nest_id', 'breeding_year']).agg(agg).reset_index()
gb_annual_sensor_stats.columns = [' '.join(col).strip() for col in gb_annual_sensor_stats.columns.values]
log('   10. Done.')

Mon Apr 10 21:09:50 2017 -    10. Annual temp and humidity stats...
Mon Apr 10 21:09:52 2017 -    10. Done.
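The list at the top of this notebook also names `df_sensor_stats_breeding_phase`, which none of the cells above calculate yet. A minimal sketch, assuming the phase stats should be grouped by nest, breeding year and breeding phase and use the same five statistics; the `phase_stats` helper and the phase labels in the test data are hypothetical.

```python
import pandas as pd

# (output name, function) pairs, matching the statistics in cells 7-10
STATS = [('max', 'max'), ('min', 'min'), ('mean', 'mean'), ('median', 'median'), ('stddev', 'std')]


def phase_stats(df, keys=('nest_id', 'breeding_year', 'breeding_phase')):
    """Temperature and humidity stats per group, with flattened 'measure stat' column names."""
    out = df.groupby(list(keys)).agg({'humidity': STATS, 'temp_c': STATS}).reset_index()
    out.columns = [' '.join(col).strip() for col in out.columns.values]
    return out


# usage: gb_phase_sensor_stats = phase_stats(df_sensor_phase)
```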


Calculation: **Mean min temp and humidity by month, season, year, phase**

In [None]:
log('   7. Mean min temp and humidity...')
log('   7. Done.')

Calculation: **Mean max temp and humidity by month, season, year, phase**

In [None]:
log('   8. Mean max temp and humidity...')

log('   8. Done.')
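One reading of "mean min" and "mean max" is the average of the daily minimums and maximums over each period. A minimal sketch under that assumption, starting from daily stats with the flattened column names used in cell 7; the values are invented and `mean_of_daily` is a hypothetical helper.

```python
import pandas as pd

# invented daily stats for one nest over two months
daily = pd.DataFrame({
    'nest_id': [1, 1, 1, 1],
    'month': [1, 1, 2, 2],
    'temp_c min': [20.0, 22.0, 18.0, 19.0],
    'temp_c max': [40.0, 36.0, 33.0, 35.0],
    'humidity min': [30.0, 40.0, 45.0, 55.0],
    'humidity max': [80.0, 90.0, 85.0, 95.0],
})


def mean_of_daily(daily_stats, keys, stat):
    """Average the daily `stat` ('min' or 'max') of each measure over the groups in `keys`."""
    cols = ['temp_c ' + stat, 'humidity ' + stat]
    out = daily_stats.groupby(keys)[cols].mean()
    return out.rename(columns={c: 'mean ' + c for c in cols})


print(mean_of_daily(daily, ['nest_id', 'month'], 'min'))
```

Swapping `['nest_id', 'month']` for the seasonal, annual or breeding phase keys would give the other periods.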

Calculation: **Mean temp and humidity range by month, season, year, phase**

In [None]:
log('   9. Mean temp and humidity range...')

log('   9. Done.')
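Similarly, the mean range can be read as the average of the daily ranges (daily max minus daily min) over each period, which is not the same as the period's max minus its min. A minimal sketch under that assumption, on invented daily stats; humidity works the same way.

```python
import pandas as pd

# invented daily temperature stats for one nest
daily = pd.DataFrame({
    'nest_id': [1, 1, 1],
    'season': ['Summer', 'Summer', 'Autumn'],
    'temp_c min': [20.0, 25.0, 15.0],
    'temp_c max': [40.0, 35.0, 27.0],
})

daily['temp_c range'] = daily['temp_c max'] - daily['temp_c min']
mean_range = daily.groupby(['nest_id', 'season'])['temp_c range'].mean()

# Summer: mean of daily ranges = (20 + 10) / 2 = 15, whereas period max - period min = 40 - 20 = 20
print(mean_range)
```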

Calculation: **Hours exceeding 35C by year, season, month and phase**

In [None]:
log('   11. Hours exceeding 35C...')

log('   11. Done.')

Calculation: **Hours exceeding 40C by year, season, month and phase**

In [None]:
log('   12. Hours exceeding 40C...')

log('   12. Done.')
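The two "hours exceeding" cells need a timestamp for each reading. A minimal sketch, assuming a datetime column (called `datetime` here; the actual column name in `df_sensor_phase` may differ) and counting a clock hour once if any reading in it is at or above the threshold. If the sensors log at a fixed interval, multiplying the number of hot readings by that interval is an alternative. The `hours_at_or_above` helper is hypothetical.

```python
import pandas as pd


def hours_at_or_above(df, threshold, keys, time_col='datetime', temp_col='temp_c'):
    """Count distinct clock hours per group with at least one reading >= threshold."""
    hot = df.loc[df[temp_col] >= threshold, list(keys) + [time_col]].copy()
    # truncate each timestamp to the start of its hour, e.g. 13:47 -> 13:00
    hot['hour'] = hot[time_col].dt.floor('h')
    return hot.drop_duplicates(list(keys) + ['hour']).groupby(list(keys)).size()


# usage: hours_at_or_above(df_sensor_phase, 35, ['nest_id', 'breeding_year', 'month'])
```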

*Placeholder: needs BOM data.*

Calculation: **Mean variance from ambient max temperature by year, season, month, phase**

In [None]:
log('   13. Mean variance from ambient max temp...')

log('   13. Done.')

*Placeholder: needs BOM data.*

Calculation: **Mean variance from ambient min temperature by year, season, month, phase**

In [None]:
log('   14. Mean variance from ambient min temp...')

log('   14. Done.')
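Once ambient data are available, the two "variance from ambient" cells could compare each nest's daily max (or min) with the ambient daily max (or min) for the same date. A minimal sketch, assuming a hypothetical ambient table with `calendar_year`, `month`, `day` and `ambient_max_c` columns; "variance" is read here as the mean difference (nest minus ambient), not the statistical variance. All values are invented.

```python
import pandas as pd

# invented nest daily maxima (flattened names as in cell 7) and hypothetical ambient daily maxima
nest_daily = pd.DataFrame({
    'nest_id': [1, 1, 2],
    'calendar_year': [2013] * 3,
    'month': [1, 1, 1],
    'day': [5, 6, 5],
    'temp_c max': [41.0, 38.0, 36.0],
})
ambient_daily = pd.DataFrame({
    'calendar_year': [2013, 2013],
    'month': [1, 1],
    'day': [5, 6],
    'ambient_max_c': [39.0, 35.0],
})

date_keys = ['calendar_year', 'month', 'day']
merged = nest_daily.merge(ambient_daily, on=date_keys, how='inner')
merged['diff_from_ambient_max'] = merged['temp_c max'] - merged['ambient_max_c']

# mean difference per nest and month; positive means the nest ran hotter than ambient
print(merged.groupby(['nest_id', 'month'])['diff_from_ambient_max'].mean())
```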

In [None]:
log('Calculating the sensor stats: Done.')

## Join the aggregates by time period (e.g. all the annual aggregates into one table)
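The annual outputs (days at or above 35C and 40C, and the temperature and humidity stats) share `nest_id` and `breeding_year`, so they can be joined on those keys. A minimal sketch on invented values; nests with no hot days have no row in the `.size()` counts, so after a left join their gaps are filled with 0. The column names `days_above_35C` and `days_above_40C` are assumptions.

```python
import pandas as pd

keys = ['nest_id', 'breeding_year']
stats = pd.DataFrame({'nest_id': [1, 2], 'breeding_year': [2013, 2013], 'temp_c max': [42.0, 33.0]})

# .size() results are Series indexed by the group keys; name them before joining
days_35 = pd.Series([5], index=pd.MultiIndex.from_tuples([(1, 2013)], names=keys), name='days_above_35C')
days_40 = pd.Series([2], index=pd.MultiIndex.from_tuples([(1, 2013)], names=keys), name='days_above_40C')

annual = stats.merge(days_35.reset_index(), on=keys, how='left')
annual = annual.merge(days_40.reset_index(), on=keys, how='left')
# nests missing from the counts had no hot days, so their NaN means 0 days
count_cols = ['days_above_35C', 'days_above_40C']
annual[count_cols] = annual[count_cols].fillna(0).astype(int)
print(annual)
```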

## Join the breeding success data to the aggregates

In [None]:
log('Cleaning up intermediate data tables...')

log('Done.')

**Pickle the two key data files for use in later scripts**

In [None]:
log('Writing the final tables to pickle for future use...')

log('Done.')
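A minimal sketch of writing the final tables to pickle, assuming an output folder that follows the `./output/<script name>/` pattern of the input path. The `write_tables` helper and the folder name in the usage comment are hypothetical, as is the choice of tables in the mapping.

```python
import os

import pandas as pd


def write_tables(tables, out_dir):
    """Pickle each DataFrame in `tables` (name -> DataFrame) to <out_dir>/<name>.pkl."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for name, df in tables.items():
        path = os.path.normpath(os.path.join(out_dir, name + '.pkl'))
        df.to_pickle(path)
        paths.append(path)
    return paths


# usage (hypothetical folder name):
# write_tables({'df_sensor_stats_annual': gb_annual_sensor_stats,
#               'df_sensor_stats_daily': gb_daily_sensor_stats},
#              './output/B_aggregate_sensor_stats')
```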