# Aggregate Sensor Stats
In script A_load_and_combine_data we created a table called df_sensor_phase, which contains the full temperature and humidity sensor readings together with the breeding_year, season_year and breeding_phase.

This script creates a set of aggregates from this data to understand the nest microclimates. The aggregates include:
* `Annual stats             -> df_sensor_stats_annual`
* `Seasonal stats           -> df_sensor_stats_seasonal`
* `Monthly stats            -> df_sensor_stats_monthly`
* `Daily stats              -> df_sensor_stats_daily`
* `Stats by breeding phase  -> df_sensor_stats_breeding_phase`



## 1. Set up the environment
### 1.1 Import the required libraries
The tasks in this script rely on a small set of common libraries, imported below. If an import statement raises an error, install the missing library in your environment from the command line with `pip install <library>`.

In [1]:
print('Setting up environment and variables...', flush=True)
import pandas as pd
import os
import numpy as np
import datetime
import time

# all the useful and reuseable functions are defined in helper_functions.py
from helper_functions import *

Setting up environment and variables...


### 1.2 Set up the variables
You will need to change the values of the variables below to suit the names and directory location of your files to be loaded.

In [108]:
# update these with your file paths
sensor_phase = os.path.normpath('./output/A_load_and_combine_data/df_sensor_phase.pkl')

# write intermediate tables to disk for debugging purposes
write_temps = True
output_path = os.path.normpath('./output/B_Aggregate_Sensor_Stats')
df_sensor_data = None

log('Done.')

Tue Apr 18 09:05:52 2017 - Done.


In [109]:
def write_temp_file(df, filepath, df_name):
    '''
    If write_temps is true, this function will write the specified Pandas dataframe (df) to csv at the specified location (filepath).
    Variables:
        df: a Pandas dataframe to be written to csv.
        filepath: a string in Unix path format (using / not \) for the csv destination.
        df_name: human readable name or description of the dataframe for logging purposes.
    '''
    if write_temps:
        print('{0} - Writing intermediate table {1} to disk.'.format(time.ctime(), df_name), flush=True)
        # create the destination directory of this file (not just the default output_path) if needed
        out_dir = os.path.dirname(filepath)
        if out_dir and not os.path.exists(out_dir):
            os.makedirs(out_dir)
        df.to_csv(os.path.normpath(filepath))
        if os.path.getsize(filepath) > 0:
            print('{0} - Written {1}: {2:.3f} MB'.format(time.ctime(), filepath, os.path.getsize(filepath)/1000000), flush=True)
           

### 1.3 Import the data file

In [3]:
df_sensor_phase = pd.read_pickle(sensor_phase)

### Create the aggregate temperature and humidity calculations

Calculation: **Number of days per month with a temp >= 35C**

In [60]:
log('   1. Days per month >= 35C...')
# get the records >= 35C (the month and day columns already exist alongside breeding_year)
df_temp_above_35C = df_sensor_phase.loc[df_sensor_phase['temp_c'] >= 35].reset_index()
# keep one record per nest per calendar day so that days, not individual readings, are counted
# then count the distinct days per nest per year per month
gb_monthly_days_above_35C = (
    df_temp_above_35C
    .drop_duplicates(['nest_id', 'calendar_year', 'month', 'day'])
    .groupby(['nest_id', 'breeding_year', 'month'])
    .size()
    # convert from Series to Dataframe with a flat index and a named count column
    .reset_index(name='days_above_35C')
)
log('   1. Done')

Tue Apr 18 08:47:34 2017 -    1. Days per month >= 35C...
Tue Apr 18 08:47:34 2017 -    1. Done
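
The difference between counting hot readings and counting hot days is easy to miss when a sensor logs several times a day. The following is a minimal sketch on made-up readings (the values and the `records_above_35C` column name are illustrative assumptions, not project data) showing both counts side by side.

```python
import pandas as pd

# Toy readings: nest 'A' has three hot readings on 1 Jan and one on 2 Jan.
readings = pd.DataFrame({
    'nest_id': ['A', 'A', 'A', 'A', 'B'],
    'breeding_year': [2017] * 5,
    'calendar_year': [2017] * 5,
    'month': [1, 1, 1, 1, 1],
    'day': [1, 1, 1, 2, 5],
    'temp_c': [35.5, 36.0, 37.2, 35.1, 41.0],
})

hot = readings.loc[readings['temp_c'] >= 35]

# size() counts readings, so several hot readings on one day are all counted
records_per_month = (
    hot.groupby(['nest_id', 'breeding_year', 'month'])
       .size()
       .reset_index(name='records_above_35C')
)

# Keep one row per nest per calendar day first, then count what is left
days_per_month = (
    hot.drop_duplicates(['nest_id', 'calendar_year', 'month', 'day'])
       .groupby(['nest_id', 'breeding_year', 'month'])
       .size()
       .reset_index(name='days_above_35C')
)

print(records_per_month)
print(days_per_month)
```

In this toy data nest A has four hot readings but only two hot days, which is the figure the "days per month" statistic is meant to report.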


Calculation: **Number of days per season with a temp >= 35C**

**Note**: Assumes that Summer 2013 is Jan-Feb13 and Dec13; i.e. all the summer months in the year 2013 rather than the Summer season that starts in 2013 (which would be Dec13-Feb14).
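
To make the season_year convention concrete, here is a small sketch of how such labels could be derived. Only the Summer definition (December to February) comes from the note above; the month ranges for the other seasons and the `SEASON_BY_MONTH` name are assumptions for illustration, and the actual derivation lives in script A.

```python
import pandas as pd

# Assumed southern-hemisphere mapping; only Summer = Dec-Feb is stated in the note
SEASON_BY_MONTH = {
    12: 'Summer', 1: 'Summer', 2: 'Summer',
    3: 'Autumn', 4: 'Autumn', 5: 'Autumn',
    6: 'Winter', 7: 'Winter', 8: 'Winter',
    9: 'Spring', 10: 'Spring', 11: 'Spring',
}

example = pd.DataFrame({
    'datetime': pd.to_datetime(['2013-01-15', '2013-02-10', '2013-12-20', '2014-01-05'])
})
example['season'] = example['datetime'].dt.month.map(SEASON_BY_MONTH)
# season_year is the calendar year of the reading, so Dec 2013 belongs to
# "Summer 2013" together with Jan-Feb 2013, not with Jan-Feb 2014
example['season_year'] = example['datetime'].dt.year

print(example)
```

A consequence of this convention is that one "Summer" group combines the end of one austral summer with the start of the next.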

In [63]:
log('   2. Days per season >= 35C...')
# use the >=35C table from #1; keep one record per nest per calendar day, then count the days in each season
gb_seasonal_days_above_35C = (
    df_temp_above_35C
    .drop_duplicates(['nest_id', 'calendar_year', 'month', 'day'])
    .groupby(['nest_id', 'season_year', 'season'])
    .size()
    .reset_index(name='days_above_35C')
)
log('   2. Done')

Tue Apr 18 08:49:08 2017 -    2. Days per season >= 35C...
Tue Apr 18 08:49:08 2017 -    2. Done


Calculation: **Number of days per year with a temp >= 35C**

In [66]:
log('   3. Days per year >= 35C...')
# use the >=35C table from #1; keep one record per nest per calendar day, then count the days in each year
gb_annual_days_above_35C = (
    df_temp_above_35C
    .drop_duplicates(['nest_id', 'calendar_year', 'month', 'day'])
    .groupby(['nest_id', 'breeding_year'])
    .size()
    # convert from Series to Dataframe with a flat index and a named count column
    .reset_index(name='days_above_35C')
)
log('   3. Done')

Tue Apr 18 08:49:47 2017 -    3. Days per year >= 35C...
Tue Apr 18 08:49:47 2017 -    3. Done


Calculation: **Number of days per month with a temp >= 40C**

In [74]:
log('   4. Days per month >= 40C...')
# get the records >= 40C (the month and day columns already exist alongside breeding_year)
df_temp_above_40C = df_sensor_phase.loc[df_sensor_phase['temp_c'] >= 40].reset_index()
# keep one record per nest per calendar day so that days, not individual readings, are counted
# then count the distinct days per nest per year per month
gb_monthly_days_above_40C = (
    df_temp_above_40C
    .drop_duplicates(['nest_id', 'calendar_year', 'month', 'day'])
    .groupby(['nest_id', 'breeding_year', 'month'])
    .size()
    # convert from Series to Dataframe with a flat index and a named count column
    .reset_index(name='days_above_40C')
)
log('   4. Done')

Tue Apr 18 08:51:56 2017 -    4. Days per month >= 40C...
Tue Apr 18 08:51:56 2017 -    4. Done


Calculation: **Number of days per season with a temp >= 40C**

In [79]:
log('   5. Days per season >= 40C...')
# use the >=40C table from #4; keep one record per nest per calendar day, then count the days in each season
gb_seasonal_days_above_40C = (
    df_temp_above_40C
    .drop_duplicates(['nest_id', 'calendar_year', 'month', 'day'])
    .groupby(['nest_id', 'season_year', 'season'])
    .size()
    # convert from Series to Dataframe with a flat index and a named count column
    .reset_index(name='days_above_40C')
)
log('   5. Done')

Tue Apr 18 08:52:37 2017 -    5. Days per season >= 40C...
Tue Apr 18 08:52:37 2017 -    5. Done


Calculation: **Number of days per year with a temp >= 40C**

In [83]:
log('   6. Days per year >= 40C...')
# use the >=40C table from #4; keep one record per nest per calendar day, then count the days in each year
gb_annual_days_above_40C = (
    df_temp_above_40C
    .drop_duplicates(['nest_id', 'calendar_year', 'month', 'day'])
    .groupby(['nest_id', 'breeding_year'])
    .size()
    # convert from Series to Dataframe with a flat index and a named count column
    .reset_index(name='days_above_40C')
)
log('   6. Done')

Tue Apr 18 08:53:12 2017 -    6. Days per year >= 40C...
Tue Apr 18 08:53:12 2017 -    6. Done


Calculation: **Daily temperature and humidity stats**
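**Note**: The aggregation dictionary has no entry for the temperature or humidity range, so the range is calculated after aggregation as max minus min.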

In [10]:
log('   7. Daily temp and humidity stats...')
agg = {
    'humidity': {
        'max': 'max',
        'min': 'min',
        'mean': 'mean',
        'median': 'median',
        'stddev': 'std'
    },
    'temp_c': {
        'max': 'max',
        'min': 'min',
        'mean': 'mean',
        'median': 'median',
        'stddev': 'std'
    }
}
gb_daily_sensor_stats = df_sensor_phase.groupby(['nest_id', 'breeding_year', 'season_year', 'calendar_year', 'breeding_phase', 'season', 'month', 'day', 'clutch']).agg(agg).reset_index()
gb_daily_sensor_stats.columns = [' '.join(col).strip() for col in gb_daily_sensor_stats.columns.values]
# daily range = max-min
gb_daily_sensor_stats['humidity range'] = gb_daily_sensor_stats['humidity max'] - gb_daily_sensor_stats['humidity min']
gb_daily_sensor_stats['temp_c range'] = gb_daily_sensor_stats['temp_c max'] - gb_daily_sensor_stats['temp_c min']
log('   7. Done.')

Tue Apr 18 08:19:23 2017 -    7. Daily temp and humidity stats...
Tue Apr 18 08:19:24 2017 -    7. Done.
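
The nested-dictionary renaming used in the `agg` specification above was deprecated in pandas 0.20 and removed in pandas 1.0. On a current pandas release, the same column names can be produced with named aggregation (available from pandas 0.25). The sketch below uses made-up readings and a reduced set of grouping keys; it is one possible equivalent, not the code that produced the logged output.

```python
import pandas as pd

# Toy readings for two nests
readings = pd.DataFrame({
    'nest_id': ['A', 'A', 'A', 'B', 'B'],
    'day': [1, 1, 2, 1, 1],
    'temp_c': [20.0, 30.0, 25.0, 18.0, 22.0],
    'humidity': [40.0, 60.0, 50.0, 55.0, 65.0],
})

# Build {'humidity max': ('humidity', 'max'), ...}; 'std' is labelled 'stddev'
# to match the column names used elsewhere in this notebook
stats = ['max', 'min', 'mean', 'median', 'std']
named = {
    '{0} {1}'.format(col, 'stddev' if stat == 'std' else stat): (col, stat)
    for col in ['humidity', 'temp_c']
    for stat in stats
}

# Output names contain spaces, so they are passed by unpacking a dictionary
daily = readings.groupby(['nest_id', 'day']).agg(**named).reset_index()

# daily range = max - min, computed after aggregation
daily['humidity range'] = daily['humidity max'] - daily['humidity min']
daily['temp_c range'] = daily['temp_c max'] - daily['temp_c min']

print(daily)
```

Because the output names are set directly, the step that flattens the column MultiIndex with `' '.join(col)` is not needed in this variant.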


Calculation: **Monthly temperature and humidity stats**

In [11]:
log('   8. Monthly temp and humidity stats...')
gb_monthly_sensor_stats = df_sensor_phase.groupby(['nest_id', 'breeding_year', 'month']).agg(agg).reset_index()
gb_monthly_sensor_stats.columns = [' '.join(col).strip() for col in gb_monthly_sensor_stats.columns.values]
# monthly range = max-min
# gb_monthly_sensor_stats['humidity range'] = gb_monthly_sensor_stats['humidity max'] - gb_monthly_sensor_stats['humidity min']
# gb_monthly_sensor_stats['temp_c range'] = gb_monthly_sensor_stats['temp_c max'] - gb_monthly_sensor_stats['temp_c min']
log('   8. Done.')

Tue Apr 18 08:19:24 2017 -    8. Monthly temp and humidity stats...
Tue Apr 18 08:19:25 2017 -    8. Done.


Calculation: **Seasonal temperature and humidity stats**

In [12]:
log('   9. Seasonal temp and humidity stats...')
gb_seasonal_sensor_stats = df_sensor_phase.groupby(['nest_id', 'season_year', 'season']).agg(agg).reset_index()
gb_seasonal_sensor_stats.columns = [' '.join(col).strip() for col in gb_seasonal_sensor_stats.columns.values]
# seasonal range = max-min
gb_seasonal_sensor_stats['humidity range'] = gb_seasonal_sensor_stats['humidity max'] - gb_seasonal_sensor_stats['humidity min']
gb_seasonal_sensor_stats['temp_c range'] = gb_seasonal_sensor_stats['temp_c max'] - gb_seasonal_sensor_stats['temp_c min']
log('   9. Done.')

Tue Apr 18 08:19:25 2017 -    9. Seasonal temp and humidity stats...
Tue Apr 18 08:19:25 2017 -    9. Done.


Calculation: **Annual temperature and humidity stats**

In [13]:
log('   10. Annual temp and humidity stats...')
gb_annual_sensor_stats = df_sensor_phase.groupby(['nest_id', 'breeding_year']).agg(agg).reset_index()
gb_annual_sensor_stats.columns = [' '.join(col).strip() for col in gb_annual_sensor_stats.columns.values]
# annual range = max-min
gb_annual_sensor_stats['humidity range'] = gb_annual_sensor_stats['humidity max'] - gb_annual_sensor_stats['humidity min']
gb_annual_sensor_stats['temp_c range'] = gb_annual_sensor_stats['temp_c max'] - gb_annual_sensor_stats['temp_c min']
log('   10. Done.')

Tue Apr 18 08:19:25 2017 -    10. Annual temp and humidity stats...
Tue Apr 18 08:19:26 2017 -    10. Done.


Calculation: **Mean min temp and humidity by month, season, year, phase**

In [14]:
log('   11. Mean min temp and humidity by month, season, year, phase...')
agg = {
    'humidity min': {'mean_min': 'mean'},
    'temp_c min': {'mean_min': 'mean'}
}
gb_monthly_min = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year', 'month']).agg(agg).reset_index()
gb_monthly_min.columns = [' '.join(col).strip() for col in gb_monthly_min.columns.values]

gb_seasonal_min = gb_daily_sensor_stats.groupby(['nest_id', 'season_year', 'season']).agg(agg).reset_index()
gb_seasonal_min.columns = [' '.join(col).strip() for col in gb_seasonal_min.columns.values]

gb_annual_min = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year']).agg(agg).reset_index()
gb_annual_min.columns = [' '.join(col).strip() for col in gb_annual_min.columns.values]

gb_phase_min = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year', 'clutch', 'breeding_phase']).agg(agg).reset_index()
gb_phase_min.columns = [' '.join(col).strip() for col in gb_phase_min.columns.values]
log('   11. Done.')

Tue Apr 18 08:19:26 2017 -    11. Mean min temp and humidity by month, season, year, phase...
Tue Apr 18 08:19:26 2017 -    11. Done.


Calculation: **Mean max temp and humidity by month, season, year, phase**

In [15]:
log('   8. Mean max temp and humidity by month, season, year, phase...')
agg = {
    'humidity max': {'mean_max': 'mean'},
    'temp_c max': {'mean_max': 'mean'}
}
gb_monthly_max = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year', 'month']).agg(agg).reset_index()
gb_monthly_max.columns = [' '.join(col).strip() for col in gb_monthly_max.columns.values]

gb_seasonal_max = gb_daily_sensor_stats.groupby(['nest_id', 'season_year', 'season']).agg(agg).reset_index()
gb_seasonal_max.columns = [' '.join(col).strip() for col in gb_seasonal_max.columns.values]

gb_annual_max = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year']).agg(agg).reset_index()
gb_annual_max.columns = [' '.join(col).strip() for col in gb_annual_max.columns.values]

gb_phase_max = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year', 'clutch', 'breeding_phase']).agg(agg).reset_index()
gb_phase_max.columns = [' '.join(col).strip() for col in gb_phase_max.columns.values]
log('   8. Done.')

Tue Apr 18 08:19:26 2017 -    8. Mean max temp and humidity by month, season, year, phase...
Tue Apr 18 08:19:26 2017 -    8. Done.


Calculation: **Mean temp and humidity range by month, season, year, phase**

In [16]:
log('   9. Mean temp and humidity range by month, season, year, phase...')
agg = {
    'humidity range': {'mean_range': 'mean'},
    'temp_c range': {'mean_range': 'mean'}
}
gb_monthly_range = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year', 'month']).agg(agg).reset_index()
gb_monthly_range.columns = [' '.join(col).strip() for col in gb_monthly_range.columns.values]

gb_seasonal_range = gb_daily_sensor_stats.groupby(['nest_id', 'season_year', 'season']).agg(agg).reset_index()
gb_seasonal_range.columns = [' '.join(col).strip() for col in gb_seasonal_range.columns.values]

gb_annual_range = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year']).agg(agg).reset_index()
gb_annual_range.columns = [' '.join(col).strip() for col in gb_annual_range.columns.values]

gb_phase_range = gb_daily_sensor_stats.groupby(['nest_id', 'breeding_year', 'clutch', 'breeding_phase']).agg(agg).reset_index()
gb_phase_range.columns = [' '.join(col).strip() for col in gb_phase_range.columns.values]
log('   9. Done.')

Tue Apr 18 08:19:26 2017 -    9. Mean temp and humidity range by month, season, year, phase...
Tue Apr 18 08:19:26 2017 -    9. Done.
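
The three "mean ..." calculations above are two-stage aggregates: the daily table is built first, and its daily values are then averaged over the longer period. The sketch below, on made-up daily values and using named aggregation from a current pandas release, shows that the mean daily range is a different quantity from the range over the whole month. The `mean_min` label mirrors the `mean_max` and `mean_range` labels and is an assumption.

```python
import pandas as pd

# Toy daily stats for one nest over three days of one month
daily = pd.DataFrame({
    'nest_id': ['A', 'A', 'A'],
    'breeding_year': [2017, 2017, 2017],
    'month': [1, 1, 1],
    'day': [1, 2, 3],
    'temp_c max': [30.0, 34.0, 28.0],
    'temp_c min': [20.0, 22.0, 21.0],
})
daily['temp_c range'] = daily['temp_c max'] - daily['temp_c min']

# Second stage: average the daily values over the month
keys = ['nest_id', 'breeding_year', 'month']
monthly = daily.groupby(keys).agg(**{
    'temp_c min mean_min': ('temp_c min', 'mean'),
    'temp_c max mean_max': ('temp_c max', 'mean'),
    'temp_c range mean_range': ('temp_c range', 'mean'),
}).reset_index()

# For comparison: the range across the whole month (highest max minus lowest min)
whole_month_range = daily['temp_c max'].max() - daily['temp_c min'].min()

print(monthly)
print(whole_month_range)
```

Here the daily ranges are 10, 12 and 7 degrees, so the mean daily range is about 9.7 degrees, while the range across the whole month is 14 degrees.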


Calculation: **Hours exceeding 35C by year, season, month and phase**

In [17]:
log('   11. Hours exceeding 35C...')
# Get a version of df_sensor_phase sorted by nest and time, with records numbered in that order
temp = df_sensor_phase.sort_values(by=['nest_id', 'datetime']).reset_index(drop=True)
temp['recnum'] = temp.index

# Get all records with a temp_c >= 35 C (the mask is built from temp so that it matches the sorted rows)
df_hours_above_35C = temp[temp['temp_c'] >= 35].copy()
df_hours_above_35C['next_rec'] = df_hours_above_35C['recnum'] + 1

# Get all records with recnum+1 using a merge
df_hours_above_35C = pd.merge(
    left=df_hours_above_35C,
    right=temp,
    how='left',
    left_on='next_rec',
    right_on='recnum',
    sort=True,
    suffixes=('_orig', '_next')
)[['recnum_orig', 'datetime_orig', 'nest_id_orig', 'humidity_orig',
       'temp_c_orig', 'breeding_year_orig', 'temp_bucket_orig',
       'humidity_bucket_orig', 'clutch_1_orig', 'clutch_2_orig',
       'clutch_3_orig', 'clutch_orig', 'egg_lay_date_orig',
       'courting_date_orig', 'hatch_date_orig', 'dead_or_fledge_date_orig',
       'clutch_count_orig', 'calendar_year_orig', 'month_orig', 'day_orig',
       'hour_orig', 'minute_orig', 'season_orig', 'season_year_orig',
       'breeding_phase_orig', 'datetime_next']]

del temp

# Get the Timedelta between recnum-datestamp and next_rec-datestamp
df_hours_above_35C['time_at_temp'] = df_hours_above_35C['datetime_next'] - df_hours_above_35C['datetime_orig']
# total_seconds() keeps the days component, which .seconds discards for gaps of 24 hours or more
df_hours_above_35C['hours_above_35C'] = df_hours_above_35C['time_at_temp'].dt.total_seconds() / 3600

# Sum the Timedeltas per nest per year, season, month
gb_monthly_hours_above_35C = df_hours_above_35C[
    ['nest_id_orig', 'breeding_year_orig', 'month_orig', 'hours_above_35C']
].groupby(
    ['nest_id_orig', 'breeding_year_orig', 'month_orig']
).sum().reset_index()
gb_annual_hours_above_35C = df_hours_above_35C[
    ['nest_id_orig', 'breeding_year_orig', 'hours_above_35C']
].groupby(
    ['nest_id_orig', 'breeding_year_orig']
).sum().reset_index()
gb_seasonal_hours_above_35C = df_hours_above_35C[
    ['nest_id_orig', 'season_year_orig', 'season_orig', 'hours_above_35C']
].groupby(
    ['nest_id_orig', 'season_year_orig', 'season_orig']
).sum().reset_index()
log('   11. Done.')

Tue Apr 18 08:19:26 2017 -    11. Hours exceeding 35C...
Tue Apr 18 08:19:30 2017 -    11. Done.
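
The hours calculation treats each hot reading as lasting until the next reading from the same sensor. As an alternative to the record-number merge, the next timestamp can be fetched with a grouped `shift`, which also avoids pairing the last reading of one nest with the first reading of another. This is a minimal sketch on made-up readings, not the code that produced the logged output.

```python
import pandas as pd

# Toy readings: nest A logs every 30 minutes, nest B every hour
readings = pd.DataFrame({
    'nest_id': ['A', 'A', 'A', 'A', 'B', 'B'],
    'datetime': pd.to_datetime([
        '2017-01-01 12:00', '2017-01-01 12:30',
        '2017-01-01 13:00', '2017-01-01 13:30',
        '2017-01-01 12:00', '2017-01-01 13:00',
    ]),
    'temp_c': [34.0, 35.5, 36.0, 33.0, 36.0, 36.5],
})

ordered = readings.sort_values(['nest_id', 'datetime']).reset_index(drop=True)

# Timestamp of the following reading within the same nest (NaT for the last one)
ordered['datetime_next'] = ordered.groupby('nest_id')['datetime'].shift(-1)

# Duration attributed to each reading, in hours
interval = ordered['datetime_next'] - ordered['datetime']
ordered['hours'] = interval.dt.total_seconds() / 3600

# Sum the durations of the hot readings; sum() skips the NaN of each final reading
hot = ordered[ordered['temp_c'] >= 35]
hours_above_35C = (
    hot.groupby('nest_id')['hours']
       .sum()
       .reset_index(name='hours_above_35C')
)

print(hours_above_35C)
```

The final reading of each nest has no following timestamp, so it contributes no time to the total.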


Calculation: **Hours exceeding 40C by year, season, month and phase**

In [18]:
log('   12. Hours exceeding 40C...')
# Get a version of df_sensor_phase sorted by nest and time, with records numbered in that order
temp = df_sensor_phase.sort_values(by=['nest_id', 'datetime']).reset_index(drop=True)
temp['recnum'] = temp.index

# Get all records with a temp_c >= 40 C (the mask is built from temp so that it matches the sorted rows)
df_hours_above_40C = temp[temp['temp_c'] >= 40].copy()
df_hours_above_40C['next_rec'] = df_hours_above_40C['recnum'] + 1

# Get all records with recnum+1 using a merge
df_hours_above_40C = pd.merge(
    left=df_hours_above_40C,
    right=temp,
    how='left',
    left_on='next_rec',
    right_on='recnum',
    sort=True,
    suffixes=('_orig', '_next')
)[['recnum_orig', 'datetime_orig', 'nest_id_orig', 'humidity_orig',
       'temp_c_orig', 'breeding_year_orig', 'temp_bucket_orig',
       'humidity_bucket_orig', 'clutch_1_orig', 'clutch_2_orig',
       'clutch_3_orig', 'clutch_orig', 'egg_lay_date_orig',
       'courting_date_orig', 'hatch_date_orig', 'dead_or_fledge_date_orig',
       'clutch_count_orig', 'calendar_year_orig', 'month_orig', 'day_orig',
       'hour_orig', 'minute_orig', 'season_orig', 'season_year_orig',
       'breeding_phase_orig', 'datetime_next']]

del temp

# Get the Timedelta between recnum-datestamp and next_rec-datestamp
df_hours_above_40C['time_at_temp'] = df_hours_above_40C['datetime_next'] - df_hours_above_40C['datetime_orig']
# total_seconds() keeps the days component, which .seconds discards for gaps of 24 hours or more
df_hours_above_40C['hours_above_40C'] = df_hours_above_40C['time_at_temp'].dt.total_seconds() / 3600

# Sum the Timedeltas per nest per year, season, month
gb_monthly_hours_above_40C = df_hours_above_40C[
    ['nest_id_orig', 'breeding_year_orig', 'month_orig', 'hours_above_40C']
].groupby(
    ['nest_id_orig', 'breeding_year_orig', 'month_orig']
).sum().reset_index()

gb_annual_hours_above_40C = df_hours_above_40C[
    ['nest_id_orig', 'breeding_year_orig', 'hours_above_40C']
].groupby(
    ['nest_id_orig', 'breeding_year_orig']
).sum().reset_index()

gb_seasonal_hours_above_40C = df_hours_above_40C[
    ['nest_id_orig', 'season_year_orig', 'season_orig', 'hours_above_40C']
].groupby(
    ['nest_id_orig', 'season_year_orig', 'season_orig']
).sum().reset_index()
log('   12. Done.')

Tue Apr 18 08:19:30 2017 -    12. Hours exceeding 40C...
Tue Apr 18 08:19:33 2017 -    12. Done.


**Placeholder**: requires BOM data.
Calculation: **Mean variance from ambient max temperature by year, season, month, phase**

In [19]:
log('   13. Mean monthly humidity...')

log('   13. Done.')

Tue Apr 18 08:19:33 2017 -    13. Mean monthly humidity...
Tue Apr 18 08:19:33 2017 -    13. Done.


**Placeholder**: requires BOM data.
Calculation: **Mean variance from ambient min temperature by year, season, month, phase**

In [20]:
log('   14. Mean monthly humidity...')

log('   14. Done.')

Tue Apr 18 08:19:33 2017 -    14. Mean monthly humidity...
Tue Apr 18 08:19:33 2017 -    14. Done.


In [21]:
log('Calculating the sensor stats: Done.')

Tue Apr 18 08:19:33 2017 - Calculating the sensor stats: Done.


## Join the aggregates by their time period (all annual, all monthly etc) and join the breeding success data to the aggregates

### Monthly aggregate data

In [110]:
log('Joining monthly tables...')
df_monthly_microclimate = pd.merge(left=gb_monthly_sensor_stats, 
                                   right=gb_monthly_range, 
                                   how='left', 
                                   on=['nest_id', 'breeding_year', 'month'], 
                                   sort=True)

df_monthly_microclimate = pd.merge(left=df_monthly_microclimate,
                                   right=gb_monthly_days_above_35C,
                                   how='left',
                                   on=['nest_id', 'breeding_year', 'month'],
                                   sort=True)

df_monthly_microclimate = pd.merge(left=df_monthly_microclimate,
                                   right=gb_monthly_days_above_40C,
                                   how='left',
                                   on=['nest_id', 'breeding_year', 'month'],
                                   sort=True)

df_monthly_microclimate = pd.merge(left=df_monthly_microclimate,
                                   right=gb_monthly_hours_above_35C,
                                   how='left',
                                   left_on=['nest_id', 'breeding_year', 'month'],
                                   right_on=['nest_id_orig', 'breeding_year_orig', 'month_orig'],
                                   sort=True)

# join the hours above 40C (gb_monthly_days_above_35C was already joined above)
df_monthly_microclimate = pd.merge(left=df_monthly_microclimate,
                                   right=gb_monthly_hours_above_40C,
                                   how='left',
                                   left_on=['nest_id', 'breeding_year', 'month'],
                                   right_on=['nest_id_orig', 'breeding_year_orig', 'month_orig'],
                                   sort=True)
df_monthly_microclimate.sort_values(by=['nest_id', 'breeding_year', 'month'], inplace=True)
write_temp_file(df_monthly_microclimate, '{0}/df_monthly_microclimate.csv'.format(output_path), 'df_monthly_microclimate')
log('Done.')

log('Joining breeding success to monthly microclimate data -> df_monthly_microclimate_vs_breeding...')
# PLACEHOLDER
log('Done.')

Tue Apr 18 09:06:46 2017 - Joining monthly tables...
Tue Apr 18 09:06:46 2017 - Writing intermediate table df_monthly_microclimate to disk.
Tue Apr 18 09:06:46 2017 - Written output\B_Aggregate_Sensor_Stats/df_monthly_microclimate.csv: 0.429 MB
Tue Apr 18 09:06:46 2017 - Done.
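
The chain of left joins above can also be written as a fold over a list of tables once every table uses the same key names. The sketch below uses small made-up tables; renaming the `_orig` keys before merging and filling missing counts with zero are both assumptions about what is wanted, not steps taken in this notebook.

```python
from functools import reduce

import pandas as pd

keys = ['nest_id', 'breeding_year', 'month']

# Toy stand-ins for the monthly tables built earlier
stats = pd.DataFrame({
    'nest_id': ['A', 'A'], 'breeding_year': [2017, 2017], 'month': [1, 2],
    'temp_c max': [36.0, 31.0],
})
days_35 = pd.DataFrame({
    'nest_id': ['A'], 'breeding_year': [2017], 'month': [1],
    'days_above_35C': [3],
})
hours_35 = pd.DataFrame({
    'nest_id_orig': ['A'], 'breeding_year_orig': [2017], 'month_orig': [1],
    'hours_above_35C': [4.5],
})

# Give the hours table the same key names so that no duplicate key columns are created
hours_35 = hours_35.rename(columns={k + '_orig': k for k in keys})

# Left-join every table onto the stats table in turn
monthly = reduce(
    lambda left, right: pd.merge(left, right, how='left', on=keys, sort=True),
    [stats, days_35, hours_35],
)

# Months without hot readings have no row in the threshold tables and come out
# as NaN; treating them as zero is an assumed choice
count_cols = ['days_above_35C', 'hours_above_35C']
monthly[count_cols] = monthly[count_cols].fillna(0)

print(monthly)
```

Keeping the left join means that every nest-month present in the stats table survives, whether or not it ever exceeded a threshold.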


### Annual aggregate data

In [None]:
log('Joining annual tables...')
df_annual_microclimate = pd.merge(left=gb_annual_sensor_stats, 
                                   right=gb_annual_range, 
                                   how='left', 
                                   on=['nest_id', 'breeding_year'], 
                                   sort=True)

df_annual_microclimate = pd.merge(left=df_annual_microclimate,
                                   right=gb_annual_days_above_35C,
                                   how='left',
                                   on=['nest_id', 'breeding_year'],
                                   sort=True)

df_annual_microclimate = pd.merge(left=df_annual_microclimate,
                                   right=gb_annual_days_above_40C,
                                   how='left',
                                   on=['nest_id', 'breeding_year'],
                                   sort=True)

df_annual_microclimate = pd.merge(left=df_annual_microclimate,
                                   right=gb_annual_hours_above_35C,
                                   how='left',
                                   left_on=['nest_id', 'breeding_year'],
                                   right_on=['nest_id_orig', 'breeding_year_orig'],
                                   sort=True)

# join the hours above 40C (gb_annual_days_above_35C was already joined above)
df_annual_microclimate = pd.merge(left=df_annual_microclimate,
                                   right=gb_annual_hours_above_40C,
                                   how='left',
                                   left_on=['nest_id', 'breeding_year'],
                                   right_on=['nest_id_orig', 'breeding_year_orig'],
                                   sort=True)
df_annual_microclimate.sort_values(by=['nest_id', 'breeding_year'], inplace=True)
write_temp_file(df_annual_microclimate, '{0}/df_annual_microclimate.csv'.format(output_path), 'df_annual_microclimate')
log('Done.')

log('Joining breeding success to annual microclimate data -> df_annual_microclimate_vs_breeding...')
# PLACEHOLDER
log('Done.')

### Seasonal aggregate data

In [None]:
# log('Joining monthly tables...')
# df_monthly_microclimate = pd.merge(left=gb_monthly_sensor_stats, 
#                                    right=gb_monthly_range, 
#                                    how='left', 
#                                    on=['nest_id', 'breeding_year', 'month'], 
#                                    sort=True)

# df_monthly_microclimate = pd.merge(left=df_monthly_microclimate,
#                                    right=gb_monthly_days_above_35C,
#                                    how='left',
#                                    on=['nest_id', 'breeding_year', 'month'],
#                                    sort=True)

# df_monthly_microclimate = pd.merge(left=df_monthly_microclimate,
#                                    right=gb_monthly_days_above_40C,
#                                    how='left',
#                                    on=['nest_id', 'breeding_year', 'month'],
#                                    sort=True)

# df_monthly_microclimate = pd.merge(left=df_monthly_microclimate,
#                                    right=gb_monthly_hours_above_35C,
#                                    how='left',
#                                    left_on=['nest_id', 'breeding_year', 'month'],
#                                    right_on=['nest_id_orig', 'breeding_year_orig', 'month_orig'],
#                                    sort=True)

# df_monthly_microclimate = pd.merge(left=df_monthly_microclimate,
#                                    right=gb_monthly_days_above_35C,
#                                    how='left',
#                                    left_on=['nest_id', 'breeding_year', 'month'],
#                                    right_on=['nest_id', 'breeding_year', 'month'],
#                                    sort=True)
# df_monthly_microclimate.sort_values(by=['nest_id', 'breeding_year', 'month'], inplace=True)
# write_temp_file(df_monthly_microclimate, '{0}/df_monthly_microclimate.csv'.format(output_path), 'df_monthly_microclimate')
# log('Done.')

# log('Joining breeding success to monthly microclimate data -> df_monthly_microclimate_vs_breeding...')
# # PLACEHOLDER
# log('Done.')

### Breeding phase aggregate data

In [None]:
# log('Joining monthly tables...')
# df_monthly_microclimate = pd.merge(left=gb_monthly_sensor_stats, 
#                                    right=gb_monthly_range, 
#                                    how='left', 
#                                    on=['nest_id', 'breeding_year', 'month'], 
#                                    sort=True)

# df_monthly_microclimate = pd.merge(left=df_monthly_microclimate,
#                                    right=gb_monthly_days_above_35C,
#                                    how='left',
#                                    on=['nest_id', 'breeding_year', 'month'],
#                                    sort=True)

# df_monthly_microclimate = pd.merge(left=df_monthly_microclimate,
#                                    right=gb_monthly_days_above_40C,
#                                    how='left',
#                                    on=['nest_id', 'breeding_year', 'month'],
#                                    sort=True)

# df_monthly_microclimate = pd.merge(left=df_monthly_microclimate,
#                                    right=gb_monthly_hours_above_35C,
#                                    how='left',
#                                    left_on=['nest_id', 'breeding_year', 'month'],
#                                    right_on=['nest_id_orig', 'breeding_year_orig', 'month_orig'],
#                                    sort=True)

# df_monthly_microclimate = pd.merge(left=df_monthly_microclimate,
#                                    right=gb_monthly_days_above_35C,
#                                    how='left',
#                                    left_on=['nest_id', 'breeding_year', 'month'],
#                                    right_on=['nest_id', 'breeding_year', 'month'],
#                                    sort=True)
# df_monthly_microclimate.sort_values(by=['nest_id', 'breeding_year', 'month'], inplace=True)
# write_temp_file(df_monthly_microclimate, '{0}/df_monthly_microclimate.csv'.format(output_path), 'df_monthly_microclimate')
# log('Done.')

# log('Joining breeding success to monthly microclimate data -> df_monthly_microclimate_vs_breeding...')
# # PLACEHOLDER
# log('Done.')

**Pickle the two key data files for use in later scripts**

In [None]:
log('Writing the final tables to pickle for future use...')

log('Done.')
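
The pickling step itself is still a placeholder. As a rough sketch of what it might look like, the example below writes a small made-up dataframe with `DataFrame.to_pickle` and reads it back with `pd.read_pickle`, the same function used to load df_sensor_phase earlier. The temporary directory and the file name are hypothetical; which tables are written, and where, is left to the script.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for one of the aggregate tables
df_monthly_microclimate = pd.DataFrame({
    'nest_id': ['A'], 'breeding_year': [2017], 'month': [1],
    'days_above_35C': [3],
})

# Hypothetical destination; the notebook would use its own output_path instead
output_dir = tempfile.mkdtemp()
pickle_path = os.path.join(output_dir, 'df_monthly_microclimate.pkl')

# Pickle keeps dtypes and the index, unlike a CSV round trip
df_monthly_microclimate.to_pickle(pickle_path)
restored = pd.read_pickle(pickle_path)

print(restored.equals(df_monthly_microclimate))
```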