## This notebook can be used to analyze data collected from HOBO sensors. 

#### Before working through this notebook, export the desired HOBO data files to CSV format and place them all in the directory you want to analyze. Every CSV file in that directory will be read into the dataframe.

#### Also make sure you have downloaded the ENTIRE repository from GitHub. The notebook imports functions from several of its ".py" files, so you will need all of them.

#### To run a cell, click inside it and press Shift+Enter.

#### A word of advice: some cells may take a while to run. If you have any questions, please feel free to reach out to me.
---
This first cell imports the necessary libraries, along with the analysis functions from the repository's ".py" files.

In [None]:
#Import the necessary libraries and functions
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, MonthLocator, YearLocator
import numpy as np
import re
import os
import interactive_file_selection
import glob
from matplotlib import rcParams
from pandas import Series, DataFrame
import seaborn as sb
import csv
from datetime import datetime
import warnings
import tkinter as tk
from tkinter import filedialog
warnings.filterwarnings('ignore')

from degree_days import bug_degree_day, plant_degree_day
from time_of_day import hot_cold_time
from data_readin import data_read_in
from qa_qc_tests import persist_test, range_test, step_test
from subset import subset_variables, subset_site_type
from plots import multi_panel_plot

Next, you can interactively select the folder that contains the files you want analyzed. In the window that opens, select any file inside the desired folder, then click OK. The window will close and return you to this notebook.

In [None]:
#Select the data folder by choosing any single file inside it
fname = interactive_file_selection.gui_fname()
path = os.path.dirname(fname)

If you decide not to use the file selection tool, you can set the path manually in this cell (make sure to uncomment the line path = ... and edit it to point to your data folder).

In [None]:
#You can also manually select the path using this cell
#path = '/Users/mgrover1/Desktop/microclimate/test_data/'

The next step reads in all the CSV files from the selected directory. Be warned: if the directory contains a very large number of files, this will take a long time to run.

In [None]:
extension = 'csv'
#Work inside the data directory; the charts and report CSV files later in the notebook are also written here
os.chdir(path)
result = glob.glob('*.{}'.format(extension))

#Complete the data read in process by pulling the data into a single dataframe
#Exports the dataframe to its own file 
x = data_read_in(result)
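If you want to see what combining the files involves, the read-in step can be sketched with plain pandas. This is a minimal sketch under stated assumptions, not the repository's `data_read_in()`: the helper `read_all_csvs` is hypothetical, and it assumes each CSV has a single header row, which real HOBO exports may not have.

```python
import glob
import os

import pandas as pd


def read_all_csvs(directory, extension="csv"):
    """Read every *.<extension> file in `directory` into one DataFrame.

    Hypothetical helper for illustration only; the notebook's data_read_in()
    may parse HOBO exports differently (header rows, column names, how the
    site and sensor type are extracted).
    """
    pattern = os.path.join(directory, f"*.{extension}")
    frames = []
    # sorted() makes the row order reproducible across operating systems
    for file_path in sorted(glob.glob(pattern)):
        frame = pd.read_csv(file_path)
        # Record which file each row came from so sensors stay distinguishable
        frame["source_file"] = os.path.basename(file_path)
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"No .{extension} files found in {directory}")
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

Building full paths with `os.path.join` means this sketch does not depend on the current working directory, whereas the cell above relies on `os.chdir(path)`.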

This cell drops every row that is missing any value (for example, temperature or light) and reports how many rows were removed. This helps you understand how much incomplete data exists.

In [None]:
#Remove all data entries that are missing any values 
df = x.dropna(how='any')

#Set the correct index value 
df.index = df.datetime

#Outputs a summary of how many data points were removed out of the total number
print('Removed a total of', len(x) - len(df), 'entries out of', len(x))
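The drop-and-report step can also show *which* variables are responsible for the gaps. A minimal sketch: `drop_incomplete_rows` is a hypothetical helper, and the column names `temp` and `light` are assumed for the toy data.

```python
import pandas as pd


def drop_incomplete_rows(frame):
    """Drop rows with any missing value and report what was removed.

    Hypothetical helper; returns the cleaned frame and the number of rows dropped.
    """
    cleaned = frame.dropna(how="any")
    removed = len(frame) - len(cleaned)
    print(f"Removed a total of {removed} entries out of {len(frame)}")
    # Per-column counts of missing values show which variable causes the gaps
    print(frame.isna().sum().to_string())
    return cleaned, removed
```

For example, `drop_incomplete_rows(x)` prints the overall count followed by one line per column with its number of missing values.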

Next, we will do some exploratory data analysis. Charts are created for each sensor in the dataset and saved as files in the current directory (the data directory selected above). These help you see what data you have and where the gaps are.

You can choose which variables to plot:
- max_min_temperature
- max_min_light 
- diurnal_temperature_range

You can also choose the time period over which values are averaged. Acceptable inputs include:
- Daily 
- Weekly
- Monthly 
- Yearly
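To show what "maximum and minimum by week" means in pandas terms, here is a minimal sketch. It is not how `multi_panel_plot` is implemented (that code is not shown here); the helper `max_min_by_period` is hypothetical, and it assumes a `DatetimeIndex` plus `site`, `type`, and `temp` columns, as in `df` after `df.index = df.datetime`.

```python
import pandas as pd


def max_min_by_period(frame, value_col="temp", freq="W"):
    """Per-site, per-sensor-type max and min of `value_col` for each period.

    Hypothetical helper. `freq` is a pandas offset alias such as "D" (daily)
    or "W" (weekly, weeks ending on Sunday). Assumes `frame` has a
    DatetimeIndex and 'site' and 'type' columns.
    """
    # pd.Grouper without a key bins the DatetimeIndex into periods of `freq`
    grouper = pd.Grouper(freq=freq)
    return frame.groupby(["site", "type", grouper])[value_col].agg(["max", "min"])
```

The result has a three-level index (site, type, period start or end) and one column each for `max` and `min`, which is the shape a per-sensor panel plot needs.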


In [None]:
#Plots the maximum and minimum temperature for each site and sensor type by week 
multi_panel_plot(df = df, plot_vars = 'max_min_temperature', frequency = 'weekly')

In [None]:
#Plots the maximum and minimum light for each site and sensor type by week
multi_panel_plot(df = df, plot_vars = 'max_min_light', frequency = 'weekly')

In [None]:
#Plots the diurnal temperature range for each site and sensor type by week
multi_panel_plot(df = df, plot_vars = 'diurnal_temperature_range', frequency = 'weekly')
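The diurnal temperature range (DTR) is conventionally the daily maximum minus the daily minimum temperature. A minimal sketch of that calculation, assuming the same `DatetimeIndex` and `site`/`type`/`temp` columns; the helper name is hypothetical, and whether `multi_panel_plot` computes DTR exactly this way is an assumption.

```python
import pandas as pd


def diurnal_temperature_range(frame, value_col="temp"):
    """Daily max minus daily min of `value_col` per site and sensor type.

    Hypothetical helper; assumes a DatetimeIndex and 'site'/'type' columns.
    """
    daily = frame.groupby(["site", "type", pd.Grouper(freq="D")])[value_col].agg(["max", "min"])
    return (daily["max"] - daily["min"]).rename("dtr")
```

A weekly DTR value can then be taken as the mean of these daily values within each week.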

---
Summaries are also generated to give you descriptive statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each site and sensor type over each time period.

In [None]:
#Generates a daily summary
daily_summary = df.groupby(['site','type','Year','Month','Day']).describe()
daily_summary

In [None]:
#Generates a weekly summary 
weekly_summary = df.groupby(['site','type','Year','Month','Week']).describe()
weekly_summary

In [None]:
#Generates a monthly summary 
monthly_summary = df.groupby(['site','type','Year','Month']).describe()
monthly_summary

In [None]:
#Generates a yearly summary 
yearly_summary = df.groupby(['site', 'type', 'Year']).describe()
yearly_summary
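The `describe()` tables above are wide: every numeric column gets all eight statistics, stored under two-level column labels such as `('temp', 'mean')`. A minimal sketch of pulling out just what you need; the helper names are hypothetical, and the toy data assumes `temp` and `light` columns.

```python
import pandas as pd


def one_statistic(summary, variable, statistic):
    """Select a single (variable, statistic) column from a groupby(...).describe() table."""
    return summary[(variable, statistic)]


def means_only(summary):
    """Keep only the 'mean' statistic for every variable in a describe() table."""
    # Level 1 of the column index holds the statistic names
    return summary.xs("mean", axis=1, level=1)
```

For example, `one_statistic(monthly_summary, 'temp', 'max')` would give the monthly maximum temperature for each site and sensor type.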

---
Quality assurance/quality control (QA/QC) is performed on the data using persistence, range, and step tests. The methodology and part of the code were taken from a project at New Mexico State University.
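The range and step tests can be expressed as vectorized pandas checks. This is a minimal sketch, not the code in `qa_qc_tests.py`: `range_check` and `step_check` are hypothetical names, and the upper-then-lower argument order only mirrors the comment on `range_test(temps, 90, 5)` below.

```python
import pandas as pd


def range_check(values, upper, lower):
    """True where a reading lies within [lower, upper], inclusive.

    Missing readings compare as False, so they fail this check.
    """
    return values.between(lower, upper)


def step_check(values, max_step):
    """True where the change from the previous reading is at most max_step.

    Readings without a valid predecessor (the first one, or one next to a
    gap) have no step to measure and are treated as passing.
    """
    step = values.diff().abs()
    return step.isna() | (step <= max_step)
```

Note that a single spike fails two steps: the jump up and the jump back down. Because the notebook stacks every sensor into one series, a step measured across the boundary between two files compares readings from different sensors; applying the check per sensor, e.g. `x.groupby(['site', 'type'])['temp'].transform(lambda s: step_check(s, 10))`, avoids that.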

In [None]:
#Sets the index to be the datetime
x.index = x.datetime

#QA/QC analysis on the temperature column
temps = x['temp']

#PERSISTENCE TEST
#Checks whether the same value repeats too many times in a row (e.g. a stuck sensor)
persist_df_result = persist_test(temps, 10, 2)
x['persist_test'] = persist_df_result['result']

#RANGE TEST
#Set the maximum and minimum temperature values you would expect from the sensor
range_test_df = range_test(temps, 90, 5)
x['range_test'] = range_test_df['result']

#STEP TEST
#Run the step test, with the second value being the maximum difference you would expect between time steps
step_test_df = step_test(temps, 10)
x['step_test'] = step_test_df['result']

#A timestep passes only if it satisfies all three QA/QC tests
#(assumes each test's 'result' column is boolean, as the original loop did)
passed = x['step_test'] & x['range_test'] & x['persist_test']
x['all_tests'] = np.where(passed, 'Pass', 'Fail')

#Subsets the dataframe for only the data that passed the QA/QC tests
x = x[x['all_tests'] == 'Pass']
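A persistence test flags runs of identical readings, which often indicate a stuck or disconnected sensor. A minimal run-length sketch: the meaning of the two numeric arguments in `persist_test(temps, 10, 2)` is not documented here, so this version uses a single hypothetical `max_repeats` threshold instead, and `persistence_check` is a hypothetical name.

```python
import pandas as pd


def persistence_check(values, max_repeats):
    """True unless a reading belongs to a run of more than max_repeats identical values."""
    # A new run starts wherever a value differs from the one before it;
    # the cumulative sum turns those starts into a run id for every reading
    run_id = (values != values.shift()).cumsum()
    # Look up the length of the run each reading belongs to
    run_length = run_id.map(run_id.value_counts())
    return run_length <= max_repeats
```

Missing values never compare equal to each other, so each one forms its own run of length 1; the range check is what catches them.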

---

## WARNING - THIS SECTION CAN TAKE A VERY LONG TIME 

#### Plant and bug degree days, which accumulate heat above a base temperature, can be helpful for relating temperatures to plant and insect development.
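One common way to compute degree days is the simple average method: for each day, degree days = max(0, (Tmax + Tmin) / 2 - Tbase). The sketch below implements that method only. The base temperature is a parameter you supply; which base temperatures and method (for example, upper cutoffs) `plant_degree_day` and `bug_degree_day` use is not stated here, so treat this as an illustration, not their implementation. The helper name and the `site`/`type`/`temp` columns are assumptions.

```python
import pandas as pd


def average_method_degree_days(frame, base_temp, value_col="temp"):
    """Daily degree days per site and sensor type using the simple average method.

    Hypothetical helper; assumes a DatetimeIndex and 'site'/'type' columns.
    """
    daily = frame.groupby(["site", "type", pd.Grouper(freq="D")])[value_col].agg(["max", "min"])
    mean_temp = (daily["max"] + daily["min"]) / 2
    # Days whose mean is below the base contribute zero, never negative, degree days
    return (mean_temp - base_temp).clip(lower=0).rename("degree_days")
```

Accumulated degree days for each sensor can then be obtained with `dd.groupby(level=['site', 'type']).cumsum()`.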

In [None]:
#Calculate different degree days
#Takes a long time... 
x['Plant_Degree_Day'] = plant_degree_day(x)
x['Bug_Degree_Day'] = bug_degree_day(x)

### Final reports are generated by year, month, and week

In [None]:
#Create reports of the number of temperature observations and the average temperature for each sensor
#Yearly report

#Create the yearly report for both count and average temperature 
yearly_report_count = x.groupby(['site','type','Year'])['temp'].count()
yearly_report_mean = x.groupby(['site','type','Year'])['temp'].mean()

#Exports the yearly reports
yearly_report_count.to_csv('yearly_count_report.csv',header=True)
yearly_report_mean.to_csv('yearly_avgtemp_report.csv',header=True)


#Create the monthly reports for both count and average temperature 
monthly_report_count = x.groupby(['site','type','Year','Month'])['temp'].count()
monthly_report_mean = x.groupby(['site','type','Year','Month'])['temp'].mean()

#Exports the monthly reports 
monthly_report_count.to_csv('monthly_count_report.csv',header=True)
monthly_report_mean.to_csv('monthly_avgtemp_report.csv',header=True)


#Create the weekly report for both count and average temperature 
weekly_report_count = x.groupby(['site','type','Year','Month','Week'])['temp'].count()
weekly_report_mean = x.groupby(['site','type','Year','Month','Week'])['temp'].mean()

#Exports the weekly reports 
weekly_report_count.to_csv('weekly_count_report.csv',header=True)
weekly_report_mean.to_csv('weekly_avgtemp_report.csv',header=True)
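The count and mean reports can also be produced in one pass, as a single table per time period. A minimal sketch using pandas named aggregation; the helper name `temperature_report` is hypothetical, and the output file names in the usage line are only examples.

```python
import pandas as pd


def temperature_report(frame, keys, value_col="temp"):
    """Observation count and mean of `value_col` per group, as one two-column table.

    Hypothetical helper. 'count' excludes missing readings, so a group whose
    readings are all missing shows a count of 0 and a mean of NaN.
    """
    return frame.groupby(keys)[value_col].agg(count="count", mean="mean")
```

For example, `temperature_report(x, ['site', 'type', 'Year']).to_csv('yearly_report.csv')` would write one yearly file containing both columns.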

### Finalized graphs using the quality-controlled data

In [None]:
#Plots the maximum and minimum temperature for each site and sensor type by week 
multi_panel_plot(df = x, plot_vars = 'max_min_temperature', frequency = 'weekly')

#Plots the maximum and minimum light for each site and sensor type by week
multi_panel_plot(df = x, plot_vars = 'max_min_light', frequency = 'weekly')

#Plots the diurnal temperature range for each site and sensor type by week
multi_panel_plot(df = x, plot_vars = 'diurnal_temperature_range', frequency = 'weekly')