<h3>PreProcess: Update the existing USFWS Survey Plotting Data (.csv) file</h3>

This notebook updates the survey data file USFWS_Survey_Plotting_Data.csv. Run it to append each new survey's data as surveys are collected.

**NOTE: The Plot Group 5 series relies on the USFWS_Survey_Plotting_Data.csv file to generate its box plots.**

<br />
**Instructions:**
1. Update the User Inputs cell below with the name of the new survey data spreadsheet and the ID of the new survey
2. From the menu, choose Cell -> Run All

<h4>Import requisite modules and libraries</h4>

In [1]:
import numpy as np
import pandas as pd

<h4>User Inputs:</h4>

In [5]:
####################################################################################
datapath='/Users/paulp/GoogleDrive/projects/PeaIslandBeachMonitoring/data/' # where your existing Survey file is stored...
sspath='/Users/paulp/GoogleDrive/projects/PeaIslandBeachMonitoring/data/xls/' # where your spreadsheet is stored

spdfn='USFWS_Survey_Plotting_Data.csv'     # NOTE this file contains all surveys from 7/2014 thru ???, inclusive

nsfn ='FWSGrainSizeAnalysis2016_10.xlsx'   # name of new survey file whose data are to be added to spdfn
new_survey='201610'  
####################################################################################

<h4>Load data from the existing USFWS Survey Data file and from the new survey spreadsheet</h4>

In [29]:
cols=(['group','transect','sample','survey','phi_-1','phi_-0.5','phi_0','phi_0.5','phi_1','phi_1.25','phi_1.5','phi_1.75','phi_2','phi_2.5','phi_3','phi_3.5','phi_4','remainder'])
spd_df=pd.read_csv(datapath+spdfn, names=cols)

ss_df=pd.read_excel(sspath+nsfn, skiprows=0)

<h4>Data Preprocessing I: (Converting the absolute screen weights from the spreadsheet into weight percentages)</h4>

In [30]:
### Convert the absolute screen weights in the spreadsheet dataframe ss_df to weight percentages:

# columns holding the per-screen weights (names as they appear in the spreadsheet)
weight_cols=['phi_-1','phi_-0,5','phi_0','phi_0,5','phi_1','phi_1,25','phi_1,5','phi_1,75',
             'phi_2','phi_2,5','phi_3','phi_3,5','remainder']

# recompute the total weight of each sample, just in case the summed_weight field is flawed...
ss_df['total_weight']=ss_df[weight_cols].sum(axis=1)

# recast the absolute screen weights to weight percentages: divide each row by its total, in one vectorized step
ss_df[weight_cols]=ss_df[weight_cols].div(ss_df['total_weight'], axis=0)*100
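
As a worked example of the conversion above, here is a minimal sketch on made-up data. The frame `toy_df`, its three columns, and the weights are all hypothetical; the real spreadsheet has more screens. It also shows a simple sanity check: after conversion, each sample's percentages should sum to 100.

```python
import pandas as pd

# Hypothetical screen weights (grams) for two samples; not real survey data.
toy_df = pd.DataFrame({
    'phi_-1': [1.0, 0.0],
    'phi_0': [3.0, 5.0],
    'remainder': [6.0, 5.0],
})
weight_cols = ['phi_-1', 'phi_0', 'remainder']

# Total weight per sample (row-wise sum of the screen weights).
toy_df['total_weight'] = toy_df[weight_cols].sum(axis=1)

# Divide every weight column by its row total and scale to percent.
toy_df[weight_cols] = toy_df[weight_cols].div(toy_df['total_weight'], axis=0) * 100

# Each row of percentages should now sum to 100 (within floating-point tolerance).
row_sums = toy_df[weight_cols].sum(axis=1)
print(toy_df)
print(row_sums)
```

Note that a sample whose total weight is zero would produce NaN percentages, so such rows may be worth checking for before the division.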

<h4>Data Preprocessing II: (Adding the group field, which classifies each record as belonging to either the control or the treatment group)</h4>

In [31]:
## create and populate a new Group column (field) based on the transect id type (i.e., control or treatment):
ss_df.loc[ss_df['transect_id'].str[0] == 'C', 'Group'] ='control'
ss_df.loc[ss_df['transect_id'].str[0] =='T', 'Group'] = 'treatment'
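
The same classification can be sketched with a lookup on the first character of the transect ID. This is only an illustration on made-up IDs: `toy_df`, `prefix_to_group`, and the 'X9' entry are hypothetical. One thing it makes visible is that any ID not starting with 'C' or 'T' ends up with a missing Group value, which is also what happens with the two `.loc` assignments above.

```python
import pandas as pd

# Hypothetical transect IDs; 'X9' stands in for an unexpected prefix.
toy_df = pd.DataFrame({'transect_id': ['C1', 'T3', 'C2', 'X9']})

# Map the first character of each ID to a group label.
prefix_to_group = {'C': 'control', 'T': 'treatment'}
toy_df['Group'] = toy_df['transect_id'].str[0].map(prefix_to_group)

# Rows whose prefix is not in the mapping get NaN, which makes them easy to flag.
unmatched = toy_df[toy_df['Group'].isna()]
print(toy_df)
print(unmatched)
```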

<h4>Data Preprocessing III: (Adding the survey and 'dummy' phi_4 fields to the ss_df dataframe. We add the dummy phi_4 field because it existed in early surveys and is therefore present in the Survey Data File. It's no longer used, but its influence from the early days remains with us...)</h4>

In [44]:
ss_df['survey']=new_survey
ss_df['phi_4']=np.nan

<h4>Data Preprocessing IV: (Re-sculpting the ss_df dataframe to match the layout of spd_df. We create a new dataframe, ss2_df, with just the columns from ss_df that we want, in the order that we want, so that we can then merge (stack vertically) the two: spd_df + ss2_df)</h4>

In [50]:
cols=(['group','transect','sample','survey','phi_-1','phi_-0.5','phi_0','phi_0.5','phi_1','phi_1.25','phi_1.5','phi_1.75','phi_2','phi_2.5','phi_3','phi_3.5','phi_4','remainder'])

ss2_df=ss_df[['Group','transect_id','sample_number','survey','phi_-1','phi_-0,5','phi_0','phi_0,5','phi_1','phi_1,25','phi_1,5','phi_1,75','phi_2','phi_2,5','phi_3','phi_3,5','phi_4','remainder']]
ss2_df.columns=cols
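
An alternative way to express the select-and-rename step is with an explicit old-name to new-name mapping, which keeps each spreadsheet column paired with its target name. The sketch below runs on a made-up one-row frame; `toy_df`, `rename_map`, and the `extra_notes` column are hypothetical and only a few of the real columns are shown.

```python
import pandas as pd

# Hypothetical slice of the spreadsheet frame, using its comma-style phi names.
toy_df = pd.DataFrame({
    'Group': ['control'],
    'transect_id': ['C1'],
    'sample_number': [1],
    'phi_0,5': [12.5],
    'extra_notes': ['not needed'],
})

# Old name -> new name; the dict order also fixes the output column order.
rename_map = {
    'Group': 'group',
    'transect_id': 'transect',
    'sample_number': 'sample',
    'phi_0,5': 'phi_0.5',
}

# Select the wanted columns, copy so the result is independent, then rename.
toy2_df = toy_df[list(rename_map)].copy().rename(columns=rename_map)
print(toy2_df.columns.tolist())
```

With a mapping like this, a misspelled source column raises a KeyError at selection time instead of silently mislabeling a column.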


<h4>Merge the two dataframes, spd_df and ss2_df, into one that will be written out to the filesystem in the next cell, creating the latest iteration of the USFWS_Survey_Plotting_Data.csv file...</h4>