# Aggregate Sensor Stats
In script A_load_and_combine_data we created a table called df_sensor_phase, which contains the full temperature and humidity sensor readings together with the breeding_year, season_year and breeding_phase.

This script creates a set of aggregates from this data to understand the nest microclimates. The aggregates include:
* `Annual stats             -> df_sensor_stats_annual`
* `Seasonal stats           -> df_sensor_stats_seasonal`
* `Monthly stats            -> df_sensor_stats_monthly`
* `Daily stats              -> df_sensor_stats_daily`
* `Stats by breeding phase  -> df_sensor_stats_breeding_phase`
* `Stats by clutch          -> df_sensor_stats_clutch`



## 1. Set up the environment
### 1.1 Import the required libraries
The tasks below rely on a small set of common libraries, which are imported here. If an import statement raises an error, install the missing library in your environment from the command line with `pip install <library>`.

In [2]:
print('Setting up environment and variables...', flush=True)
import pandas as pd
import os
import numpy as np
import datetime
import time

# all the useful and reusable functions are defined in helper_functions.py
from helper_functions import *

Setting up environment and variables...


### 1.2 Set up the variables
You will need to change the values of the variables below to suit the names and directory location of your files to be loaded.

In [3]:
# update these with your file paths
sensor_phase = os.path.normpath('./output/A_load_and_combine_data/df_sensor_phase.pkl')

# write intermediate tables to disk for debugging purposes
write_temps = True
output_path = os.path.normpath('./output/B_Aggregate_Sensor_Stats')
df_sensor_data = None

log('Done.')

Thu Aug 31 19:46:29 2017 - Done.


In [4]:
def write_temp_file(df, filepath, df_name):
    '''
    If write_temps is true, this function will write the specified Pandas dataframe (df) to csv at the specified location (filepath).
    Variables:
        df: a Pandas dataframe to be written to csv.
        filepath: a string in Unix path format (using / not \) for the csv destination.
        df_name: human readable name or description of the dataframe for logging purposes.
    '''
    if write_temps:
        print('{0} - Writing intermediate table {1} to disk.'.format(str(time.ctime()), df_name), flush=True)
        if not os.path.exists(output_path):
            os.makedirs(output_path)
        df.to_csv(os.path.normpath(filepath))
        if os.path.getsize(filepath) > 0:
            print('{0} - Written {1}: {2:.3f} MB'.format(str(time.ctime()), filepath, os.path.getsize(filepath)/1000000), flush=True)
           

### 1.3 Import the data file

In [5]:
df_sensor_phase = pd.read_pickle(sensor_phase)

### Create the aggregate temperature and humidity calculations

Calculation: **Number of days per year, month, season, breeding phase and clutch with a temp >= 35C**

In [7]:
log('   1. Days >= 35C...')
# get the records >= 35C
df_temp_above_35C = df_sensor_phase.loc[df_sensor_phase['temp_c'] >= 35].reset_index()

# get the count of records per day at or above 35C. One row per day, to be summarised and counted again in each larger unit.
df_days_above_35C = pd.DataFrame({'count': df_temp_above_35C.groupby(['nest_id', 'breeding_year', 'season_year', 'clutch', 'breeding_phase', 'season', 'month', 'day']).size()}).reset_index()

# Summarise the daily version into larger units, counting the records (one per day) in the daily version to get days >= 35C
df_monthly_days_above_35C = pd.DataFrame({'days_above_35C': df_days_above_35C.groupby(['nest_id', 'breeding_year', 'month']).size()}).reset_index()
df_seasonal_days_above_35C = pd.DataFrame({'days_above_35C': df_days_above_35C.groupby(['nest_id', 'season_year', 'season']).size()}).reset_index()
df_annual_days_above_35C = pd.DataFrame({'days_above_35C': df_days_above_35C.groupby(['nest_id', 'breeding_year']).size()}).reset_index()
df_phase_days_above_35C = pd.DataFrame({'days_above_35C': df_days_above_35C.groupby(['nest_id', 'breeding_year', 'clutch', 'breeding_phase']).size()}).reset_index()
df_clutch_days_above_35C = pd.DataFrame({'days_above_35C': df_days_above_35C.groupby(['nest_id', 'breeding_year', 'clutch']).size()}).reset_index()
log('   1. Done')

Thu Aug 31 19:47:12 2017 -    1. Days >= 35C...
Thu Aug 31 19:47:12 2017 -    1. Done
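
The two-step count used above (collapse to one row per hot day, then count those rows per larger unit) can be sketched on a few made-up readings. The `readings` table and its values are hypothetical and only carry the columns needed for a monthly count; this is an illustration of the idea, not a replacement for the cell above.

```python
import pandas as pd

# Hypothetical readings: nest N1 has two hot readings on day 1 and one on day 3.
readings = pd.DataFrame({
    'nest_id': ['N1', 'N1', 'N1', 'N1', 'N2'],
    'month':   [1, 1, 1, 1, 1],
    'day':     [1, 1, 2, 3, 1],
    'temp_c':  [36.0, 35.0, 30.0, 41.0, 34.9],
})

# Keep only the readings at or above the threshold.
hot = readings[readings['temp_c'] >= 35]

# Step 1: one row per nest/day that had at least one hot reading.
hot_days = hot.groupby(['nest_id', 'month', 'day']).size().reset_index(name='count')

# Step 2: count those rows per nest/month to get the number of hot days.
monthly = hot_days.groupby(['nest_id', 'month']).size().reset_index(name='days_above_35C')
print(monthly)
```

Counting rows of the daily table, rather than summing its `count` column, is what turns "hot readings" into "hot days". Nests with no hot readings (N2 here) do not appear in the result at all.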


Calculation: **Number of days per year, month, season, breeding phase and clutch with a temp >= 40C**

In [8]:
log('   2. Days >= 40C...')
# get the records >= 40C
df_temp_above_40C = df_sensor_phase.loc[df_sensor_phase['temp_c'] >= 40].reset_index()

# get the count of records per day at or above 40C. One row per day, to be summarised and counted again in each larger unit.
df_days_above_40C = pd.DataFrame({'count': df_temp_above_40C.groupby(['nest_id', 'breeding_year', 'season_year', 'clutch', 'breeding_phase', 'season', 'month', 'day']).size()}).reset_index()

# Summarise the daily version into larger units, counting the records (one per day) in the daily version to get days >= 40C
df_monthly_days_above_40C = pd.DataFrame({'days_above_40C': df_days_above_40C.groupby(['nest_id', 'breeding_year', 'month']).size()}).reset_index()
df_seasonal_days_above_40C = pd.DataFrame({'days_above_40C': df_days_above_40C.groupby(['nest_id', 'season_year', 'season']).size()}).reset_index()
df_annual_days_above_40C = pd.DataFrame({'days_above_40C': df_days_above_40C.groupby(['nest_id', 'breeding_year']).size()}).reset_index()
df_phase_days_above_40C = pd.DataFrame({'days_above_40C': df_days_above_40C.groupby(['nest_id', 'breeding_year', 'clutch', 'breeding_phase']).size()}).reset_index()
df_clutch_days_above_40C = pd.DataFrame({'days_above_40C': df_days_above_40C.groupby(['nest_id', 'breeding_year', 'clutch']).size()}).reset_index()
log('   2. Done')

Thu Aug 31 19:47:33 2017 -    2. Days >= 40C...
Thu Aug 31 19:47:33 2017 -    2. Done


Calculation: **Daily temperature and humidity stats**

In [9]:
log('   7. Daily temp and humidity stats...')
agg = {
    'humidity': {
        'max': 'max',
        'min': 'min',
        'mean': 'mean',
        'median': 'median',
        'stddev': 'std'
    },
    'temp_c': {
        'max': 'max',
        'min': 'min',
        'mean': 'mean',
        'median': 'median',
        'stddev': 'std'
    }
}
gb_daily_sensor_stats = df_sensor_phase.groupby(['nest_id', 'breeding_year', 'season_year', 'calendar_year', 'breeding_phase', 'season', 'month', 'day', 'clutch']).agg(agg).reset_index()
gb_daily_sensor_stats.columns = [' '.join(col).strip() for col in gb_daily_sensor_stats.columns.values]
# daily range = max-min
gb_daily_sensor_stats['humidity range'] = gb_daily_sensor_stats['humidity max'] - gb_daily_sensor_stats['humidity min']
gb_daily_sensor_stats['temp_c range'] = gb_daily_sensor_stats['temp_c max'] - gb_daily_sensor_stats['temp_c min']
log('   7. Done.')

Thu Aug 31 19:47:58 2017 -    7. Daily temp and humidity stats...
Thu Aug 31 19:47:59 2017 -    7. Done.
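
The nested-dictionary form of `agg` used above (a dictionary of `{output_name: function}` per column) was deprecated in pandas 0.20 and removed in pandas 1.0, where it raises an error. If you are running a newer pandas, the following is a minimal sketch of one equivalent, assuming a toy table `toy` with made-up values: pass a list of functions per column, flatten the column names the same way, then rename `std` to `stddev` to keep the naming used in this notebook.

```python
import pandas as pd

# Hypothetical readings, just enough to produce three nest/day groups.
toy = pd.DataFrame({
    'nest_id':  ['N1', 'N1', 'N1', 'N2'],
    'day':      [1, 1, 2, 1],
    'humidity': [50.0, 70.0, 60.0, 40.0],
    'temp_c':   [20.0, 30.0, 25.0, 35.0],
})

stats = ['max', 'min', 'mean', 'median', 'std']
daily = toy.groupby(['nest_id', 'day']).agg({'humidity': stats, 'temp_c': stats}).reset_index()

# Flatten the two-level column index: ('humidity', 'max') -> 'humidity max'
daily.columns = [' '.join(col).strip() for col in daily.columns.values]

# Keep the 'stddev' naming used elsewhere in this notebook.
daily = daily.rename(columns={'humidity std': 'humidity stddev',
                              'temp_c std': 'temp_c stddev'})

# daily range = max - min
daily['humidity range'] = daily['humidity max'] - daily['humidity min']
daily['temp_c range'] = daily['temp_c max'] - daily['temp_c min']
print(daily[['nest_id', 'day', 'temp_c max', 'temp_c min', 'temp_c range']])
```

Groups with a single reading get `NaN` for the standard deviation, because the sample standard deviation is undefined for one value.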


gb_daily_sensor_stats[['nest_id', 'breeding_year', 'season_year', 'calendar_year',
       'breeding_phase', 'season', 'month', 'day', 'clutch', 'humidity max',
       'humidity min', 'humidity mean', 'humidity median', 'humidity stddev',
       'temp_c max', 'temp_c min', 'temp_c mean', 'temp_c median',
       'temp_c stddev', 'humidity range', 'temp_c range']]

Calculation: **Monthly temperature and humidity stats**

In [10]:
log('   8. Monthly temp and humidity stats...')
gb_monthly_sensor_stats = df_sensor_phase.groupby(['nest_id', 'breeding_year', 'month']).agg(agg).reset_index()
gb_monthly_sensor_stats.columns = [' '.join(col).strip() for col in gb_monthly_sensor_stats.columns.values]
# monthly range = max-min
gb_monthly_sensor_stats['humidity range'] = gb_monthly_sensor_stats['humidity max'] - gb_monthly_sensor_stats['humidity min']
gb_monthly_sensor_stats['temp_c range'] = gb_monthly_sensor_stats['temp_c max'] - gb_monthly_sensor_stats['temp_c min']
log('   8. Done.')

Thu Aug 31 19:48:16 2017 -    8. Monthly temp and humidity stats...
Thu Aug 31 19:48:16 2017 -    8. Done.


Calculation: **Seasonal temperature and humidity stats**

In [11]:
log('   9. Seasonal temp and humidity stats...')
gb_seasonal_sensor_stats = df_sensor_phase.groupby(['nest_id', 'season_year', 'season']).agg(agg).reset_index()
gb_seasonal_sensor_stats.columns = [' '.join(col).strip() for col in gb_seasonal_sensor_stats.columns.values]
# seasonal range = max-min
gb_seasonal_sensor_stats['humidity range'] = gb_seasonal_sensor_stats['humidity max'] - gb_seasonal_sensor_stats['humidity min']
gb_seasonal_sensor_stats['temp_c range'] = gb_seasonal_sensor_stats['temp_c max'] - gb_seasonal_sensor_stats['temp_c min']
log('   9. Done.')

Thu Aug 31 19:48:18 2017 -    9. Seasonal temp and humidity stats...
Thu Aug 31 19:48:19 2017 -    9. Done.


Calculation: **Annual temperature and humidity stats**

In [12]:
log('   10. Annual temp and humidity stats...')
gb_annual_sensor_stats = df_sensor_phase.groupby(['nest_id', 'breeding_year']).agg(agg).reset_index()
gb_annual_sensor_stats.columns = [' '.join(col).strip() for col in gb_annual_sensor_stats.columns.values]
# annual range = max-min
gb_annual_sensor_stats['humidity range'] = gb_annual_sensor_stats['humidity max'] - gb_annual_sensor_stats['humidity min']
gb_annual_sensor_stats['temp_c range'] = gb_annual_sensor_stats['temp_c max'] - gb_annual_sensor_stats['temp_c min']
log('   10. Done.')

Thu Aug 31 19:48:26 2017 -    10. Annual temp and humidity stats...
Thu Aug 31 19:48:26 2017 -    10. Done.


Calculation: **Breeding phase temperature and humidity stats**

In [13]:
log('   11. Seasonal temp and humidity stats...')
gb_phase_sensor_stats = df_sensor_phase.groupby(['nest_id', 'breeding_year', 'clutch', 'breeding_phase']).agg(agg).reset_index()
gb_phase_sensor_stats.columns = [' '.join(col).strip() for col in gb_phase_sensor_stats.columns.values]
# range = max-min
gb_phase_sensor_stats['humidity range'] = gb_phase_sensor_stats['humidity max'] - gb_phase_sensor_stats['humidity min']
gb_phase_sensor_stats['temp_c range'] = gb_phase_sensor_stats['temp_c max'] - gb_phase_sensor_stats['temp_c min']
log('   11. Done.')

Thu Aug 31 19:48:32 2017 -    11. Seasonal temp and humidity stats...
Thu Aug 31 19:48:33 2017 -    11. Done.


Calculation: **Clutch temperature and humidity stats**

In [14]:
log('   10.5 Seasonal temp and humidity stats...')
gb_clutch_sensor_stats = df_sensor_phase.groupby(['nest_id', 'breeding_year', 'clutch']).agg(agg).reset_index()
gb_clutch_sensor_stats.columns = [' '.join(col).strip() for col in gb_clutch_sensor_stats.columns.values]
# range = max-min
gb_clutch_sensor_stats['humidity range'] = gb_clutch_sensor_stats['humidity max'] - gb_clutch_sensor_stats['humidity min']
gb_clutch_sensor_stats['temp_c range'] = gb_clutch_sensor_stats['temp_c max'] - gb_clutch_sensor_stats['temp_c min']
log('   10.5 Done.')

Thu Aug 31 19:52:15 2017 -    10.5 Seasonal temp and humidity stats...
Thu Aug 31 19:52:16 2017 -    10.5 Done.


Calculation: **Mean min temp and humidity by month, season, year, phase**

In [16]:
log('   11. Mean min temp and humidity range by month, season, year, phase...')
agg = {
    'humidity min': {'mean_min': 'mean'},
    'temp_c min': {'mean_min': 'mean'}
}
gb_monthly_min = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year', 'month']).agg(agg).reset_index()
gb_monthly_min.columns = [' '.join(col).strip() for col in gb_monthly_min.columns.values]

gb_seasonal_min = gb_daily_sensor_stats.groupby(['nest_id', 'season_year', 'season']).agg(agg).reset_index()
gb_seasonal_min.columns = [' '.join(col).strip() for col in gb_seasonal_min.columns.values]

gb_annual_min = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year']).agg(agg).reset_index()
gb_annual_min.columns = [' '.join(col).strip() for col in gb_annual_min.columns.values]

gb_phase_min = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year', 'clutch', 'breeding_phase']).agg(agg).reset_index()
gb_phase_min.columns = [' '.join(col).strip() for col in gb_phase_min.columns.values]

gb_clutch_min = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year', 'clutch']).agg(agg).reset_index()
gb_clutch_min.columns = [' '.join(col).strip() for col in gb_clutch_min.columns.values]
log('   11. Done.')

Thu Aug 31 19:53:37 2017 -    11. Mean min temp and humidity range by month, season, year, phase...
Thu Aug 31 19:53:37 2017 -    11. Done.
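
Note that the "mean min" calculated here is the average of the daily minimums, which is a different number from the single lowest reading in the period (the `min` column of the monthly, seasonal and annual stats). A small sketch with made-up daily minimums, assuming a table shaped like `gb_daily_sensor_stats` but with only the columns needed:

```python
import pandas as pd

# Hypothetical daily minimum temperatures for one nest over three days.
daily = pd.DataFrame({
    'nest_id':    ['N1', 'N1', 'N1'],
    'month':      [1, 1, 1],
    'day':        [1, 2, 3],
    'temp_c min': [10.0, 14.0, 12.0],
})

grouped = daily.groupby(['nest_id', 'month'])['temp_c min']
summary = grouped.agg(['min', 'mean']).reset_index()

# 'min' is the lowest reading of the month; 'mean' is the mean of the daily minimums.
summary = summary.rename(columns={'min': 'temp_c min', 'mean': 'temp_c min mean_min'})
print(summary)
```

Here the lowest reading of the month is 10.0, while the mean daily minimum is 12.0.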


Calculation: **Mean max temp and humidity by month, season, year, phase**

In [17]:
log('   8. Mean max temp and humidity by month, season, year, phase...')
agg = {
    'humidity max': {'mean_max': 'mean'},
    'temp_c max': {'mean_max': 'mean'}
}
gb_monthly_max = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year', 'month']).agg(agg).reset_index()
gb_monthly_max.columns = [' '.join(col).strip() for col in gb_monthly_max.columns.values]

gb_seasonal_max = gb_daily_sensor_stats.groupby(['nest_id', 'season_year', 'season']).agg(agg).reset_index()
gb_seasonal_max.columns = [' '.join(col).strip() for col in gb_seasonal_max.columns.values]

gb_annual_max = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year']).agg(agg).reset_index()
gb_annual_max.columns = [' '.join(col).strip() for col in gb_annual_max.columns.values]

gb_phase_max = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year', 'clutch', 'breeding_phase']).agg(agg).reset_index()
gb_phase_max.columns = [' '.join(col).strip() for col in gb_phase_max.columns.values]

gb_clutch_max = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year', 'clutch']).agg(agg).reset_index()
gb_clutch_max.columns = [' '.join(col).strip() for col in gb_clutch_max.columns.values]
log('   8. Done.')

Thu Aug 31 19:54:34 2017 -    8. Mean max temp and humidity by month, season, year, phase...
Thu Aug 31 19:54:34 2017 -    8. Done.


Calculation: **Mean temp and humidity range by month, season, year, phase**

In [18]:
log('   9. Mean temp and humidity range by month, season, year, phase...')
agg = {
    'humidity range': {'mean_range': 'mean'},
    'temp_c range': {'mean_range': 'mean'}
}
gb_monthly_range = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year', 'month']).agg(agg).reset_index()
gb_monthly_range.columns = [' '.join(col).strip() for col in gb_monthly_range.columns.values]

gb_seasonal_range = gb_daily_sensor_stats.groupby(['nest_id', 'season_year', 'season']).agg(agg).reset_index()
gb_seasonal_range.columns = [' '.join(col).strip() for col in gb_seasonal_range.columns.values]

gb_annual_range = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year']).agg(agg).reset_index()
gb_annual_range.columns = [' '.join(col).strip() for col in gb_annual_range.columns.values]

gb_phase_range = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year', 'clutch', 'breeding_phase']).agg(agg).reset_index()
gb_phase_range.columns = [' '.join(col).strip() for col in gb_phase_range.columns.values]

gb_clutch_range = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year', 'clutch']).agg(agg).reset_index()
gb_clutch_range.columns = [' '.join(col).strip() for col in gb_clutch_range.columns.values]
log('   9. Done.')

Thu Aug 31 19:55:23 2017 -    9. Mean temp and humidity range by month, season, year, phase...
Thu Aug 31 19:55:23 2017 -    9. Done.


Calculation: **Hours exceeding 35C by year, season, month and phase**

In [19]:
log('   11. Hours exceeding 35C...')
# Get a version of df_sensor_phase sorted by nest and time, with a fresh sequential index
temp = df_sensor_phase.sort_values(by=['nest_id', 'datetime']).copy().reset_index()
# recnum is the row's position in the sorted order, so recnum + 1 is the next reading in time
temp['recnum'] = temp.index

# Get all records with a temp_c >= 35 C (build the mask from temp so it aligns with temp's index)
df_hours_above_35C = temp[temp['temp_c'] >= 35].copy()
df_hours_above_35C['next_rec'] = df_hours_above_35C['recnum'] + 1

# Get all records with recnum+1 using a merge
df_hours_above_35C = pd.merge(
    left=df_hours_above_35C,
    right=temp,
    how='left',
    left_on='next_rec',
    right_on='recnum',
    sort=True,
    suffixes=('_orig', '_next')
)[['recnum_orig', 'datetime_orig', 'nest_id_orig', 'humidity_orig',
       'temp_c_orig', 'breeding_year_orig', 'temp_bucket_orig',
       'humidity_bucket_orig', 'clutch_1_orig', 'clutch_2_orig',
       'clutch_3_orig', 'clutch_orig', 'egg_lay_date_orig',
       'courting_date_orig', 'hatch_date_orig', 'dead_or_fledge_date_orig',
       'clutch_count_orig', 'calendar_year_orig', 'month_orig', 'day_orig',
       'hour_orig', 'minute_orig', 'season_orig', 'season_year_orig',
       'breeding_phase_orig', 'datetime_next']]

del temp

# Get the Timedelta between recnum-datestamp and next_rec-datestamp
df_hours_above_35C['time_at_temp'] = df_hours_above_35C['datetime_next'] - df_hours_above_35C['datetime_orig']
df_hours_above_35C['hours_above_35C'] = df_hours_above_35C['time_at_temp'].apply(lambda x: x.seconds / 3600)

# Sum the Timedeltas per nest per year, season, month
# monthly
gb_monthly_hours_above_35C = df_hours_above_35C[
    ['nest_id_orig', 'breeding_year_orig', 'month_orig', 'hours_above_35C']
].groupby(
    ['nest_id_orig', 'breeding_year_orig', 'month_orig']
).sum().reset_index()

# annual
gb_annual_hours_above_35C = df_hours_above_35C[
    ['nest_id_orig', 'breeding_year_orig', 'hours_above_35C']
].groupby(
    ['nest_id_orig', 'breeding_year_orig']
).sum().reset_index()

# seasonal
gb_seasonal_hours_above_35C = df_hours_above_35C[
    ['nest_id_orig', 'season_year_orig', 'season_orig', 'hours_above_35C']
].groupby(
    ['nest_id_orig', 'season_year_orig', 'season_orig']
).sum().reset_index()

# breeding phase
gb_phase_hours_above_35C = df_hours_above_35C[
    ['nest_id_orig', 'breeding_year_orig', 'clutch_orig', 'breeding_phase_orig', 'hours_above_35C']
].groupby(
    ['nest_id_orig', 'breeding_year_orig', 'clutch_orig', 'breeding_phase_orig']
).sum().reset_index()

# clutch
gb_clutch_hours_above_35C = df_hours_above_35C[
    ['nest_id_orig', 'breeding_year_orig', 'clutch_orig', 'hours_above_35C']
].groupby(
    ['nest_id_orig', 'breeding_year_orig', 'clutch_orig']
).sum().reset_index()

log('   11. Done.')

Thu Aug 31 19:56:22 2017 -    11. Hours exceeding 35C...
Thu Aug 31 19:56:26 2017 -    11. Done.
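
The idea behind this calculation is that each hot reading is assumed to hold until the next reading from the same nest, and those intervals are summed. The sketch below shows the same idea on made-up data using a per-nest `shift` instead of a self-merge. It is an alternative illustration, not the method used in the cell above, and it differs in two ways: the last reading of each nest gets no interval rather than being paired with another nest's first reading, and `total_seconds()` counts gaps longer than a day in full.

```python
import pandas as pd

# Hypothetical readings for two nests.
toy = pd.DataFrame({
    'nest_id': ['N1', 'N1', 'N1', 'N2', 'N2'],
    'datetime': pd.to_datetime([
        '2017-01-01 12:00', '2017-01-01 12:30', '2017-01-01 13:00',
        '2017-01-01 12:00', '2017-01-01 13:00',
    ]),
    'temp_c': [36.0, 37.0, 30.0, 35.0, 36.0],
})

toy = toy.sort_values(['nest_id', 'datetime']).reset_index(drop=True)

# Timestamp of the next reading from the same nest (NaT for a nest's last reading).
toy['datetime_next'] = toy.groupby('nest_id')['datetime'].shift(-1)

# Interval covered by each reading, in hours.
toy['hours'] = (toy['datetime_next'] - toy['datetime']).dt.total_seconds() / 3600

# Sum the intervals of the readings at or above the threshold.
hot = toy[toy['temp_c'] >= 35]
hours_above_35C = hot.groupby('nest_id')['hours'].sum()
print(hours_above_35C)
```

In this toy data each nest spends one hour at or above 35C: N1 from two half-hour intervals, N2 from a single one-hour interval (its final hot reading has no following record, so it contributes nothing).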


Calculation: **Hours exceeding 40C by year, season, month and phase**

In [20]:
log('   12. Hours exceeding 40C...')
# Get a version of df_sensor_phase sorted by nest and time, with a fresh sequential index
temp = df_sensor_phase.sort_values(by=['nest_id', 'datetime']).copy().reset_index()
# recnum is the row's position in the sorted order, so recnum + 1 is the next reading in time
temp['recnum'] = temp.index

# Get all records with a temp_c >= 40 C (build the mask from temp so it aligns with temp's index)
df_hours_above_40C = temp[temp['temp_c'] >= 40].copy()
df_hours_above_40C['next_rec'] = df_hours_above_40C['recnum'] + 1

# Get all records with recnum+1 using a merge
df_hours_above_40C = pd.merge(
    left=df_hours_above_40C,
    right=temp,
    how='left',
    left_on='next_rec',
    right_on='recnum',
    sort=True,
    suffixes=('_orig', '_next')
)[['recnum_orig', 'datetime_orig', 'nest_id_orig', 'humidity_orig',
       'temp_c_orig', 'breeding_year_orig', 'temp_bucket_orig',
       'humidity_bucket_orig', 'clutch_1_orig', 'clutch_2_orig',
       'clutch_3_orig', 'clutch_orig', 'egg_lay_date_orig',
       'courting_date_orig', 'hatch_date_orig', 'dead_or_fledge_date_orig',
       'clutch_count_orig', 'calendar_year_orig', 'month_orig', 'day_orig',
       'hour_orig', 'minute_orig', 'season_orig', 'season_year_orig',
       'breeding_phase_orig', 'datetime_next']]

del temp

# Get the Timedelta between recnum-datestamp and next_rec-datestamp
df_hours_above_40C['time_at_temp'] = df_hours_above_40C['datetime_next'] - df_hours_above_40C['datetime_orig']
df_hours_above_40C['hours_above_40C'] = df_hours_above_40C['time_at_temp'].apply(lambda x: x.seconds / 3600)

# Sum the Timedeltas per nest per year, season, month
gb_monthly_hours_above_40C = df_hours_above_40C[
    ['nest_id_orig', 'breeding_year_orig', 'month_orig', 'hours_above_40C']
].groupby(
    ['nest_id_orig', 'breeding_year_orig', 'month_orig']
).sum().reset_index()

gb_annual_hours_above_40C = df_hours_above_40C[
    ['nest_id_orig', 'breeding_year_orig', 'hours_above_40C']
].groupby(
    ['nest_id_orig', 'breeding_year_orig']
).sum().reset_index()

gb_seasonal_hours_above_40C = df_hours_above_40C[
    ['nest_id_orig', 'season_year_orig', 'season_orig', 'hours_above_40C']
].groupby(
    ['nest_id_orig', 'season_year_orig', 'season_orig']
).sum().reset_index()

gb_phase_hours_above_40C = df_hours_above_40C[
    ['nest_id_orig', 'breeding_year_orig', 'clutch_orig', 'breeding_phase_orig', 'hours_above_40C']
].groupby(
    ['nest_id_orig', 'breeding_year_orig', 'clutch_orig', 'breeding_phase_orig']
).sum().reset_index()

gb_clutch_hours_above_40C = df_hours_above_40C[
    ['nest_id_orig', 'breeding_year_orig', 'clutch_orig', 'hours_above_40C']
].groupby(
    ['nest_id_orig', 'breeding_year_orig', 'clutch_orig']
).sum().reset_index()

log('   12. Done.')

Thu Aug 31 19:57:31 2017 -    12. Hours exceeding 40C...
Thu Aug 31 19:57:35 2017 -    12. Done.


# Placeholder - need BOM data
Calculation: **Mean variance from ambient max temperature by year, season, month, phase**

In [17]:
# log('   13. Mean monthly humidity...')

# log('   13. Done.')

# Placeholder - need BOM data
Calculation: **Mean variance from ambient min temperature by year, season, month, phase**

In [18]:
# log('   14. Mean monthly humidity...')

# log('   14. Done.')

In [19]:
log('Calculating the sensor stats: Done.')

Sat Jul  8 20:44:59 2017 - Calculating the sensor stats: Done.


## Join the aggregates by their time period (all annual, all monthly etc) and join the breeding success data to the aggregates

### Monthly aggregate data

In [25]:
log('Joining monthly tables...')
df_monthly_microclimate = pd.merge(left=gb_monthly_sensor_stats, 
                                   right=gb_monthly_range, 
                                   how='left', 
                                   on=['nest_id', 'breeding_year', 'month'], 
                                   sort=True)

df_monthly_microclimate = pd.merge(left=df_monthly_microclimate,
                                   right=df_monthly_days_above_35C,
                                   how='left',
                                   on=['nest_id', 'breeding_year', 'month'],
                                   sort=True)

df_monthly_microclimate = pd.merge(left=df_monthly_microclimate,
                                   right=df_monthly_days_above_40C,
                                   how='left',
                                   on=['nest_id', 'breeding_year', 'month'],
                                   sort=True)

df_monthly_microclimate = pd.merge(left=df_monthly_microclimate,
                                   right=gb_monthly_hours_above_35C,
                                   how='left',
                                   left_on=['nest_id', 'breeding_year', 'month'],
                                   right_on=['nest_id_orig', 'breeding_year_orig', 'month_orig'],
                                   sort=True)

df_monthly_microclimate = pd.merge(left=df_monthly_microclimate,
                                   right=gb_monthly_hours_above_40C,
                                   how='left',
                                   left_on=['nest_id', 'breeding_year', 'month'],
                                   right_on=['nest_id_orig', 'breeding_year_orig', 'month_orig'],
                                   sort=True)
df_monthly_microclimate.sort_values(by=['nest_id', 'breeding_year', 'month'], inplace=True)
write_temp_file(df_monthly_microclimate, '{0}/df_monthly_microclimate.csv'.format(output_path), 'df_monthly_microclimate')
log('Done.')

log('Placeholder: Joining breeding success to monthly microclimate data -> df_monthly_microclimate_vs_breeding...')
# PLACEHOLDER
log('Done.')

Thu Aug 31 20:21:20 2017 - Joining monthly tables...
Thu Aug 31 20:21:20 2017 - Writing intermediate table df_monthly_microclimate to disk.
Thu Aug 31 20:21:20 2017 - Written output\B_Aggregate_Sensor_Stats/df_monthly_microclimate.csv: 0.501 MB
Thu Aug 31 20:21:20 2017 - Done.
Thu Aug 31 20:21:20 2017 - Placeholder: Joining breeding success to monthly microclimate data -> df_monthly_microclimate_vs_breeding...
Thu Aug 31 20:21:20 2017 - Done.
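
Because these are left joins onto the full stats table, any nest/month with no readings at or above a threshold ends up with `NaN` in the days and hours columns, not zero. The sketch below shows this on two made-up tables. Filling the gaps with 0 is an assumption about how you want to treat those months, and the `validate` argument is an optional extra check (available in `pd.merge` from pandas 0.21) that the join keys are unique on both sides.

```python
import pandas as pd

keys = ['nest_id', 'breeding_year', 'month']

# Hypothetical monthly stats: two months for one nest.
stats = pd.DataFrame({
    'nest_id': ['N1', 'N1'],
    'breeding_year': [2016, 2016],
    'month': [1, 2],
    'temp_c max': [36.5, 31.0],
})

# Hypothetical hot-day counts: only month 1 had any days at or above 35C.
days = pd.DataFrame({
    'nest_id': ['N1'],
    'breeding_year': [2016],
    'month': [1],
    'days_above_35C': [3],
})

merged = pd.merge(left=stats, right=days, how='left', on=keys,
                  sort=True, validate='one_to_one')

# Month 2 has no match on the right, so its count is NaN after the join.
missing_before_fill = int(merged['days_above_35C'].isna().sum())

# Assumption: a missing count means zero hot days.
merged['days_above_35C'] = merged['days_above_35C'].fillna(0).astype(int)
print(merged)
```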


### Annual aggregate data

In [26]:
log('Joining annual tables...')
df_annual_microclimate = pd.merge(left=gb_annual_sensor_stats, 
                                   right=gb_annual_range, 
                                   how='left', 
                                   on=['nest_id', 'breeding_year'], 
                                   sort=True)

df_annual_microclimate = pd.merge(left=df_annual_microclimate,
                                   right=df_annual_days_above_35C,
                                   how='left',
                                   on=['nest_id', 'breeding_year'],
                                   sort=True)

df_annual_microclimate = pd.merge(left=df_annual_microclimate,
                                   right=df_annual_days_above_40C,
                                   how='left',
                                   on=['nest_id', 'breeding_year'],
                                   sort=True)

df_annual_microclimate = pd.merge(left=df_annual_microclimate,
                                   right=gb_annual_hours_above_35C,
                                   how='left',
                                   left_on=['nest_id', 'breeding_year'],
                                   right_on=['nest_id_orig', 'breeding_year_orig'],
                                   sort=True)

df_annual_microclimate = pd.merge(left=df_annual_microclimate,
                                   right=gb_annual_hours_above_40C,
                                   how='left',
                                   left_on=['nest_id', 'breeding_year'],
                                   right_on=['nest_id_orig', 'breeding_year_orig'],
                                   sort=True)
df_annual_microclimate.sort_values(by=['nest_id', 'breeding_year'], inplace=True)
write_temp_file(df_annual_microclimate, '{0}/df_annual_microclimate.csv'.format(output_path), 'df_annual_microclimate')
log('Done.')

log('Placeholder: Joining breeding success to annual microclimate data -> df_annual_microclimate_vs_breeding...')
# PLACEHOLDER
log('Done.')

Thu Aug 31 20:21:24 2017 - Joining annual tables...
Thu Aug 31 20:21:24 2017 - Writing intermediate table df_annual_microclimate to disk.
Thu Aug 31 20:21:24 2017 - Written output\B_Aggregate_Sensor_Stats/df_annual_microclimate.csv: 0.079 MB
Thu Aug 31 20:21:24 2017 - Done.
Thu Aug 31 20:21:24 2017 - Placeholder: Joining breeding success to annual microclimate data -> df_annual_microclimate_vs_breeding...
Thu Aug 31 20:21:24 2017 - Done.


### Seasonal aggregate data

In [27]:
log('Joining seasonal tables...')
df_seasonal_microclimate = pd.merge(left=gb_seasonal_sensor_stats, 
                                   right=gb_seasonal_range, 
                                   how='left', 
                                   on=['nest_id', 'season_year', 'season'], 
                                   sort=True)

df_seasonal_microclimate = pd.merge(left=df_seasonal_microclimate,
                                   right=df_seasonal_days_above_35C,
                                   how='left',
                                   on=['nest_id', 'season_year', 'season'], 
                                   sort=True)

df_seasonal_microclimate = pd.merge(left=df_seasonal_microclimate,
                                   right=df_seasonal_days_above_40C,
                                   how='left',
                                   on=['nest_id', 'season_year', 'season'], 
                                   sort=True)

df_seasonal_microclimate = pd.merge(left=df_seasonal_microclimate,
                                   right=gb_seasonal_hours_above_35C,
                                   how='left',
                                   left_on=['nest_id', 'season_year', 'season'], 
                                   right_on=['nest_id_orig', 'season_year_orig', 'season_orig'],
                                   sort=True)

df_seasonal_microclimate = pd.merge(left=df_seasonal_microclimate,
                                   right=gb_seasonal_hours_above_40C,
                                   how='left',
                                   left_on=['nest_id', 'season_year', 'season'], 
                                   right_on=['nest_id_orig', 'season_year_orig', 'season_orig'],
                                   sort=True)
df_seasonal_microclimate.sort_values(by=['nest_id', 'season_year', 'season'], inplace=True)
write_temp_file(df_seasonal_microclimate, '{0}/df_seasonal_microclimate.csv'.format(output_path), 'df_seasonal_microclimate')
log('Done.')

log('Placeholder: Joining breeding success to seasonal microclimate data -> df_seasonal_microclimate_vs_breeding...')
# PLACEHOLDER
log('Done.')

Thu Aug 31 20:21:26 2017 - Joining seasonal tables...
Thu Aug 31 20:21:26 2017 - Writing intermediate table df_seasonal_microclimate to disk.
Thu Aug 31 20:21:26 2017 - Written output\B_Aggregate_Sensor_Stats/df_seasonal_microclimate.csv: 0.207 MB
Thu Aug 31 20:21:26 2017 - Done.
Thu Aug 31 20:21:26 2017 - Placeholder: Joining breeding success to seasonal microclimate data -> df_seasonal_microclimate_vs_breeding...
Thu Aug 31 20:21:26 2017 - Done.


### Breeding phase aggregate data

In [28]:
log('Joining breeding phase tables...')

print('\nJoin breeding phase stats to average temp/humidity range')
print('   gb_phase_sensor_stats:', len(gb_phase_sensor_stats))
print('   gb_phase_range:', len(gb_phase_range))
df_breeding_microclimate = pd.merge(left=gb_phase_sensor_stats, # 275
                                   right=gb_phase_range, # 275
                                   how='left', 
                                   on=['nest_id', 'breeding_year', 'clutch', 'breeding_phase'], 
                                   sort=False)
print('   Result: df_breeding_microclimate:', len(df_breeding_microclimate))

print('\nAdd average minimum')
print('   df_breeding_microclimate:', len(df_breeding_microclimate))
print('   gb_phase_min:', len(gb_phase_min))
df_breeding_microclimate = pd.merge(left=df_breeding_microclimate,
                                   right=gb_phase_min,
                                   how='left',
                                   on=['nest_id', 'breeding_year', 'clutch', 'breeding_phase'],
                                   sort=True)
print('   Result: df_breeding_microclimate:', len(df_breeding_microclimate))

print('\nAdd average maximum')
print('   df_breeding_microclimate:', len(df_breeding_microclimate))
print('   gb_phase_max:', len(gb_phase_max))
df_breeding_microclimate = pd.merge(left=df_breeding_microclimate,
                                   right=gb_phase_max,
                                   how='left',
                                   on=['nest_id', 'breeding_year', 'clutch', 'breeding_phase'],
                                   sort=True)
print('   Result: df_breeding_microclimate:', len(df_breeding_microclimate))

print('\nAdd days >35C')
print('   df_breeding_microclimate:', len(df_breeding_microclimate))
print('   df_phase_days_above_35C:', len(df_phase_days_above_35C))
df_breeding_microclimate = pd.merge(left=df_breeding_microclimate,
                                   right=df_phase_days_above_35C,
                                   how='left',
                                   on=['nest_id', 'breeding_year', 'clutch', 'breeding_phase'],
                                   sort=True)
print('   Result: df_breeding_microclimate:', len(df_breeding_microclimate))

print('\nAdd days >40C')
print('   df_breeding_microclimate:', len(df_breeding_microclimate))
print('   df_phase_days_above_40C:', len(df_phase_days_above_40C))
df_breeding_microclimate = pd.merge(left=df_breeding_microclimate,
                                   right=df_phase_days_above_40C,
                                   how='left',
                                   on=['nest_id', 'breeding_year', 'clutch', 'breeding_phase'],
                                   sort=True)
print('   Result: df_breeding_microclimate:', len(df_breeding_microclimate))

print('\nAdd hours >35C')
print('   df_breeding_microclimate:', len(df_breeding_microclimate))
print('   gb_phase_hours_above_35C:', len(gb_phase_hours_above_35C))
df_breeding_microclimate = pd.merge(left=df_breeding_microclimate,
                                   right=gb_phase_hours_above_35C,
                                   how='left',
                                   left_on=['nest_id', 'breeding_year', 'clutch', 'breeding_phase'],
                                   right_on=['nest_id_orig', 'breeding_year_orig', 'clutch_orig', 'breeding_phase_orig'],
                                   sort=True)
print('   Result: df_breeding_microclimate:', len(df_breeding_microclimate))

print('\nAdd hours >40C')
print('   df_breeding_microclimate:', len(df_breeding_microclimate))
print('   gb_phase_hours_above_40C:', len(gb_phase_hours_above_40C))
df_breeding_microclimate = pd.merge(left=df_breeding_microclimate,
                                   right=gb_phase_hours_above_40C,
                                   how='left',
                                   left_on=['nest_id', 'breeding_year', 'clutch', 'breeding_phase'],
                                   right_on=['nest_id_orig', 'breeding_year_orig', 'clutch_orig', 'breeding_phase_orig'],
                                   sort=True)
print('   Result: df_breeding_microclimate:', len(df_breeding_microclimate))

write_temp_file(df_breeding_microclimate, '{0}/df_breeding_phase_microclimate.csv'.format(output_path), 'df_breeding_phase_microclimate')
log('Done.')

# log('Placeholder: Joining breeding success to monthly microclimate data -> df_monthly_microclimate_vs_breeding...')
# # PLACEHOLDER
# log('Done.')

Thu Aug 31 20:21:29 2017 - Joining breeding phase tables...

Join breeding phase stats to average temp/humidity range
   gb_phase_sensor_stats: 275
   gb_phase_range: 275
   Result: df_breeding_microclimate: 275

Add average minimum
   df_breeding_microclimate: 275
   gb_phase_min: 275
   Result: df_breeding_microclimate: 275

Add average maximum
   df_breeding_microclimate: 275
   gb_phase_max: 275
   Result: df_breeding_microclimate: 275

Add days >35C
   df_breeding_microclimate: 275
   df_phase_days_above_35C: 102
   Result: df_breeding_microclimate: 275

Add days >40C
   df_breeding_microclimate: 275
   df_phase_days_above_40C: 72
   Result: df_breeding_microclimate: 275

Add hours >35C
   df_breeding_microclimate: 275
   gb_phase_hours_above_35C: 102
   Result: df_breeding_microclimate: 275

Add hours >40C
   df_breeding_microclimate: 275
   gb_phase_hours_above_40C: 72
   Result: df_breeding_microclimate: 275
Thu Aug 31 20:21:29 2017 - Writing intermediate table df_breeding_phas
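
The row counts above show why every join uses `how='left'`: the threshold tables are smaller (102 and 72 rows) than the 275-row base table, yet the result stays at 275 rows. Phases with no match in the right-hand table are kept and receive `NaN`. The sketch below reproduces this on a tiny, made-up data set. The frames, the `temp_mean` and `days_above_35C` column names, and the choice to fill missing counts with zero are all assumptions for illustration, not the notebook's actual columns or method.

```python
import pandas as pd

keys = ['nest_id', 'breeding_year', 'clutch', 'breeding_phase']

# Hypothetical stand-in for df_breeding_microclimate: one row per nest and phase.
left = pd.DataFrame({
    'nest_id': [1, 1, 2],
    'breeding_year': [2016, 2016, 2016],
    'clutch': [1, 1, 1],
    'breeding_phase': ['incubation', 'brooding', 'incubation'],
    'temp_mean': [28.1, 30.4, 27.6],
})

# Hypothetical stand-in for df_phase_days_above_35C: only phases that had hot days.
right = pd.DataFrame({
    'nest_id': [1],
    'breeding_year': [2016],
    'clutch': [1],
    'breeding_phase': ['brooding'],
    'days_above_35C': [4],
})

# validate='one_to_one' raises a MergeError if the keys are duplicated in either table;
# indicator=True adds a '_merge' column showing which rows found a match.
merged = pd.merge(left=left, right=right, how='left', on=keys, sort=True,
                  validate='one_to_one', indicator=True)

# The left join never drops rows from the left table.
assert len(merged) == len(left)

# Assumption: a phase missing from the threshold table had zero days above 35C.
merged['days_above_35C'] = merged['days_above_35C'].fillna(0).astype(int)
print(merged[keys + ['days_above_35C', '_merge']])
```

Whether a missing value really means "zero days" depends on how the threshold tables were built, so it is worth checking before filling.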

### Clutch aggregate data

In [29]:
log('Joining breeding clutch tables...')

print('\nJoin breeding clutch stats to average temp/humidity range')
print('   gb_clutch_sensor_stats:', len(gb_clutch_sensor_stats))
print('   gb_clutch_range:', len(gb_clutch_range))
df_clutch_microclimate = pd.merge(left=gb_clutch_sensor_stats, # 102 rows
                                   right=gb_clutch_range, # 102 rows
                                   how='left',
                                   on=['nest_id', 'breeding_year', 'clutch'],
                                   sort=False)
print('   Result: df_clutch_microclimate:', len(df_clutch_microclimate))

print('\nAdd average minimum')
print('   df_clutch_microclimate:', len(df_clutch_microclimate))
print('   gb_clutch_min:', len(gb_clutch_min))
df_clutch_microclimate = pd.merge(left=df_clutch_microclimate,
                                   right=gb_clutch_min,
                                   how='left',
                                   on=['nest_id', 'breeding_year', 'clutch'],
                                   sort=True)
print('   Result: df_clutch_microclimate:', len(df_clutch_microclimate))

print('\nAdd average maximum')
print('   df_clutch_microclimate:', len(df_clutch_microclimate))
print('   gb_clutch_max:', len(gb_clutch_max))
df_clutch_microclimate = pd.merge(left=df_clutch_microclimate,
                                   right=gb_clutch_max,
                                   how='left',
                                   on=['nest_id', 'breeding_year', 'clutch'],
                                   sort=True)
print('   Result: df_clutch_microclimate:', len(df_clutch_microclimate))

print('\nAdd days >35C')
print('   df_clutch_microclimate:', len(df_clutch_microclimate))
print('   df_clutch_days_above_35C:', len(df_clutch_days_above_35C))
df_clutch_microclimate = pd.merge(left=df_clutch_microclimate,
                                   right=df_clutch_days_above_35C,
                                   how='left',
                                   on=['nest_id', 'breeding_year', 'clutch'],
                                   sort=True)
print('   Result: df_clutch_microclimate:', len(df_clutch_microclimate))

print('\nAdd days >40C')
print('   df_clutch_microclimate:', len(df_clutch_microclimate))
print('   df_clutch_days_above_40C:', len(df_clutch_days_above_40C))
df_clutch_microclimate = pd.merge(left=df_clutch_microclimate,
                                   right=df_clutch_days_above_40C,
                                   how='left',
                                   on=['nest_id', 'breeding_year', 'clutch'],
                                   sort=True)
print('   Result: df_clutch_microclimate:', len(df_clutch_microclimate))

print('\nAdd hours >35C')
print('   df_clutch_microclimate:', len(df_clutch_microclimate))
print('   gb_clutch_hours_above_35C:', len(gb_clutch_hours_above_35C))
df_clutch_microclimate = pd.merge(left=df_clutch_microclimate,
                                   right=gb_clutch_hours_above_35C,
                                   how='left',
                                   left_on=['nest_id', 'breeding_year', 'clutch'],
                                   right_on=['nest_id_orig', 'breeding_year_orig', 'clutch_orig'],
                                   sort=True)
print('   Result: df_clutch_microclimate:', len(df_clutch_microclimate))

print('\nAdd hours >40C')
print('   df_clutch_microclimate:', len(df_clutch_microclimate))
print('   gb_clutch_hours_above_40C:', len(gb_clutch_hours_above_40C))
df_clutch_microclimate = pd.merge(left=df_clutch_microclimate,
                                   right=gb_clutch_hours_above_40C,
                                   how='left',
                                   left_on=['nest_id', 'breeding_year', 'clutch'],
                                   right_on=['nest_id_orig', 'breeding_year_orig', 'clutch_orig'],
                                   sort=True)
print('   Result: df_clutch_microclimate:', len(df_clutch_microclimate))

write_temp_file(df_clutch_microclimate, '{0}/df_breeding_clutch_microclimate.csv'.format(output_path), 'df_breeding_clutch_microclimate')
log('Done.')

# log('Placeholder: Joining breeding success to monthly microclimate data -> df_monthly_microclimate_vs_breeding...')
# # PLACEHOLDER
# log('Done.')

Thu Aug 31 20:21:33 2017 - Joining breeding clutch tables...

Join breeding clutch stats to average temp/humidity range
   gb_clutch_sensor_stats: 102
   gb_clutch_range: 102
   Result: df_clutch_microclimate: 102

Add average minimum
   df_clutch_microclimate: 102
   gb_clutch_min: 102
   Result: df_clutch_microclimate: 102

Add average maximum
   df_clutch_microclimate: 102
   gb_clutch_max: 102
   Result: df_clutch_microclimate: 102

Add days >35C
   df_clutch_microclimate: 102
   df_clutch_days_above_35C: 86
   Result: df_clutch_microclimate: 102

Add days >40C
   df_clutch_microclimate: 102
   df_clutch_days_above_40C: 66
   Result: df_clutch_microclimate: 102

Add hours >35C
   df_clutch_microclimate: 102
   gb_clutch_hours_above_35C: 86
   Result: df_clutch_microclimate: 102

Add hours >40C
   df_clutch_microclimate: 102
   gb_clutch_hours_above_40C: 66
   Result: df_clutch_microclimate: 102
Thu Aug 31 20:21:33 2017 - Writing intermediate table df_breeding_clutch_microclimate to
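
The clutch cell repeats the same merge-and-print pattern seven times, and the breeding phase cell does the same. One way to shorten this is to fold a list of tables onto the base table in a loop. The sketch below is a possible refactoring, not the notebook's own code: `left_join_all` is a hypothetical helper, the example frames and value columns are made up, and renaming the `*_orig` key columns (rather than using `left_on`/`right_on`) is an assumption about how the hours tables could be aligned.

```python
from functools import reduce

import pandas as pd

keys = ['nest_id', 'breeding_year', 'clutch']


def left_join_all(base, tables, keys):
    '''
    Left-join each (name, table) pair in tables onto base in turn,
    checking after every step that the row count has not changed.
    '''
    def join_one(acc, item):
        name, table = item
        # Tables keyed on '<key>_orig' columns are renamed to the plain key names;
        # rename() leaves tables without those columns unchanged.
        table = table.rename(columns={k + '_orig': k for k in keys})
        out = pd.merge(left=acc, right=table, how='left', on=keys, sort=True)
        assert len(out) == len(acc), 'Join with {0} changed the row count'.format(name)
        print('   Added {0}: {1} rows -> result {2} rows'.format(name, len(table), len(out)))
        return out

    return reduce(join_one, tables, base)


# Hypothetical example tables.
base = pd.DataFrame({'nest_id': [1, 2, 3], 'breeding_year': [2016] * 3,
                     'clutch': [1] * 3, 'temp_mean': [28.0, 29.5, 31.2]})
clutch_min = pd.DataFrame({'nest_id': [1, 2, 3], 'breeding_year': [2016] * 3,
                           'clutch': [1] * 3, 'temp_min_mean': [20.1, 21.0, 22.3]})
hours_above_35C = pd.DataFrame({'nest_id_orig': [3], 'breeding_year_orig': [2016],
                                'clutch_orig': [1], 'hours_above_35C': [12]})

result = left_join_all(base,
                       [('clutch_min', clutch_min), ('hours_above_35C', hours_above_35C)],
                       keys)
```

Note one difference from the cell above: merging with `left_on`/`right_on` keeps the `*_orig` key columns in the result, whereas renaming them first does not.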

**Pickle the five key data files for use in later scripts**

In [30]:
log('Writing the final tables to pickle for future use...')
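# NOTE: several file names below are spelled 'microlimate' (missing the 'c'). The spelling is
# left as-is here because later scripts may load these exact file names.
df_monthly_microclimate.to_pickle('{0}/df_monthly_microlimate.pkl'.format(output_path))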
df_annual_microclimate.to_pickle('{0}/df_annual_microlimate.pkl'.format(output_path))
df_seasonal_microclimate.to_pickle('{0}/df_seasonal_microlimate.pkl'.format(output_path))
df_breeding_microclimate.to_pickle('{0}/df_breeding_microclimate.pkl'.format(output_path))
df_clutch_microclimate.to_pickle('{0}/df_clutch_microlimate.pkl'.format(output_path))

write_temp_file(df_annual_microclimate, '{0}/Microclimate_Annual.csv'.format(output_path), 'Microclimate_Annual')
write_temp_file(df_monthly_microclimate, '{0}/Microclimate_Monthly.csv'.format(output_path), 'Microclimate_Monthly')
write_temp_file(df_seasonal_microclimate, '{0}/Microclimate_Seasonal.csv'.format(output_path), 'Microclimate_Seasonal')
write_temp_file(df_breeding_microclimate, '{0}/Microclimate_BreedingPhase.csv'.format(output_path), 'Microclimate_BreedingPhase')
write_temp_file(df_clutch_microclimate, '{0}/Microclimate_Clutch.csv'.format(output_path), 'Microclimate_Clutch')

log('Done.')

Thu Aug 31 20:21:36 2017 - Writing the final tables to pickle for future use...
Thu Aug 31 20:21:36 2017 - Writing intermediate table Microclimate_Annual to disk.
Thu Aug 31 20:21:36 2017 - Written output\B_Aggregate_Sensor_Stats/Microclimate_Annual.csv: 0.079 MB
Thu Aug 31 20:21:36 2017 - Writing intermediate table Microclimate_Monthly to disk.
Thu Aug 31 20:21:36 2017 - Written output\B_Aggregate_Sensor_Stats/Microclimate_Monthly.csv: 0.501 MB
Thu Aug 31 20:21:36 2017 - Writing intermediate table Microclimate_Seasonal to disk.
Thu Aug 31 20:21:36 2017 - Written output\B_Aggregate_Sensor_Stats/Microclimate_Seasonal.csv: 0.207 MB
Thu Aug 31 20:21:36 2017 - Writing intermediate table Microclimate_BreedingPhase to disk.
Thu Aug 31 20:21:36 2017 - Written output\B_Aggregate_Sensor_Stats/Microclimate_BreedingPhase.csv: 0.106 MB
Thu Aug 31 20:21:36 2017 - Writing intermediate table Microclimate_Clutch to disk.
Thu Aug 31 20:21:36 2017 - Written output\B_Aggregate_Sensor_Stats/Microclimate_C
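
The tables are pickled as well as written to CSV because a pickle preserves the column dtypes and the index, so a later script gets back the same dataframe without re-parsing. The sketch below shows a round trip through `to_pickle` and `pd.read_pickle` using a temporary directory. The frame and the file name `df_example_microclimate.pkl` are made up for illustration, and how the later scripts actually load these files is an assumption.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for one of the microclimate tables.
df = pd.DataFrame({'nest_id': [1, 2], 'temp_mean': [28.1, 30.4]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # os.path.join uses the platform's separator throughout, which avoids the
    # mixed 'output\B_Aggregate_Sensor_Stats/...' paths seen in the log above.
    pkl_path = os.path.join(tmp_dir, 'df_example_microclimate.pkl')
    df.to_pickle(pkl_path)
    df_loaded = pd.read_pickle(pkl_path)

# The loaded frame has the same values, dtypes and index as the original.
round_trip_ok = df.equals(df_loaded)
print('Round trip OK:', round_trip_ok)
```

Pickle files should only be loaded from a trusted source, since unpickling can execute arbitrary code.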