<h3>USFWS EDA Pea_Island_Beach_Grain_Size_Data_byTransect_Analysis</h3>

Computes the overall (global) mean, standard deviation (sorting), skewness, and kurtosis for each of the 27 transects (11 control, 16 treatment) over the entire three-year, twelve-survey duration of the Pea Island Beach Monitoring Project. The project spanned July 2014 to August 2017. Sand samples were collected along these 27 transects, with 4, and later 5, samples per transect. Over the entire project, this amounts to 324 transect surveys (27 transects × 12 surveys).

Each sand sample was processed using standard dry sieving techniques, with the mesh spacing (bin size) set at:

phis=['phi-1','phi-0.5','phi_0','phi_0.5','phi_1','phi_1.25','phi_1.5','phi_1.75','phi_2','phi_2.5','phi_3','phi_3.5','phi_4','remainder']
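
For readers less used to phi units, the short sketch below converts the sieve sizes to millimetres. It assumes the `phi` labels follow the standard Krumbein scale, phi = -log2(d / 1 mm); the helper name `phi_to_mm` is hypothetical and is not part of SedSAS or this notebook.

```python
import numpy as np

# Sieve sizes in phi units, taken from the list above (the 'remainder' pan is omitted)
sieve_phis = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0]

def phi_to_mm(phi):
    """Convert Krumbein phi values to grain diameters in mm (d = 2**-phi)."""
    return 2.0 ** (-np.asarray(phi, dtype=float))

sieve_mm = phi_to_mm(sieve_phis)
for phi, mm in zip(sieve_phis, sieve_mm):
    print(f"phi {phi:>5}: {mm:.4f} mm")
```

Larger phi values correspond to finer sand, so the list runs from the coarsest sieve (2 mm) to the finest (0.0625 mm).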

The resulting individual bin (sieved fraction) weights were recorded in Microsoft Excel spreadsheets, 12 in all. In this notebook we extract those sample weights, bring them together, and then compute, sample by sample, the four statistical moments, recording each of these in a separate pandas dataframe. The resulting statistics are then pooled and a mean value computed for each pooled statistic, yielding the final global values: a pooled mean grain size, standard deviation (sorting), skewness, and kurtosis for the entire three-year project.
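
The per-sample statistics in this notebook come from the SedSAS method `ComputeFWLogarithmicGraphicStats`, whose name suggests the Folk and Ward (1957) logarithmic graphic measures. Assuming that is the case, the sketch below shows how those four measures can be computed from one sample's sieve weights. It is an illustration only, not the SedSAS implementation: the function name `folk_ward_graphic_stats` and the toy weights are made up, and `np.interp` clamps at the end sieves where SedSAS reports extrapolating quantiles.

```python
import numpy as np

def folk_ward_graphic_stats(weights, screens):
    """Return (mean, sorting, skewness, kurtosis) in phi units."""
    weights = np.asarray(weights, dtype=float)
    screens = np.asarray(screens, dtype=float)
    # cumulative weight percent retained down to and including each screen
    cum_pct = 100.0 * np.cumsum(weights) / weights.sum()
    # phi values at the percentiles used by the graphic measures;
    # np.interp clamps at the end screens rather than extrapolating
    p5, p16, p25, p50, p75, p84, p95 = np.interp(
        [5, 16, 25, 50, 75, 84, 95], cum_pct, screens)
    mean = (p16 + p50 + p84) / 3.0
    sorting = (p84 - p16) / 4.0 + (p95 - p5) / 6.6
    skewness = ((p16 + p84 - 2.0 * p50) / (2.0 * (p84 - p16))
                + (p5 + p95 - 2.0 * p50) / (2.0 * (p95 - p5)))
    kurtosis = (p95 - p5) / (2.44 * (p75 - p25))
    return mean, sorting, skewness, kurtosis

# toy sample, symmetric about 0.5 phi (made-up weights, not project data)
toy_screens = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
toy_weights = [0.0, 1.0, 2.0, 4.0, 2.0, 1.0]
mean, sorting, skewness, kurtosis = folk_ward_graphic_stats(toy_weights, toy_screens)
print(mean, sorting, skewness, kurtosis)
```

Because the toy distribution is symmetric, its graphic mean equals its median and its skewness is zero, which makes it a convenient sanity check.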


**SUPPORTS (Not all of this stuff is used/needed):**

transects=['C11','C10','C9','C8','C7','C6','C5','C4','C3','C2','C1','T1','T2','T3','T4','T5','T6','T7','T8','T9','T10','T11','T12','T13','T14','T15','T16']

surveys=[201407, 201409, 201504, 201508, 201510, 201602, 201605, 201608, 201610, 201702, 201704, 201708]


whole phis=['phi_-1','phi_0','phi_1','phi_2','phi_3','remainder']

samples=['S1','S2','S3','S5','S4']
<br /><br />

--Notebook created circa: 1/2018
Author: Paul P


<h3>Load the requisite libraries</h3>

In [1]:
# import requisite Python libraries:
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sys.path.append('/Users/paulp/GoogleDrive/projects/SedSAS/')
import SedSASClass

%matplotlib inline

<h3>Load data from sources into a pandas dataframe</h3>

Then, load each survey dataframe into a single Python dictionary

In [2]:
fp='/Users/paulp/GoogleDrive/projects/PeaIslandBeachMonitoring/data/FinalDataSets/'

surveys=[201407, 201409, 201504, 201508, 201510, 201602, 201605, 201608, 201610, 
         201702, 201704, 201708 ]
transects=['C11','C10','C9','C8','C7','C6','C5','C4','C3','C2','C1','T1','T2','T3','T4',
           'T5','T6','T7','T8','T9','T10','T11','T12','T13','T14','T15','T16']
samples=['S1','S2','S3','S4','S5']  
scrns=[-1.0,-0.5,0.0,0.5,1.0,1.25,1.5,1.75,2.0,2.5,3.0,3.5,4.0,5.0]

###############################################################################

d={}
for survey in surveys:
    fn='PINWR_'+str(survey)+'_GrainSize.xlsx'
    df_=pd.read_excel(fp+fn, header=0)
    
    ### delete all non-essential columns and handle known issues:
    df_=df_.drop(['sheet_code','sample_date','pan_weight','pan+wet_weight',
                'pan+dry_weight'], axis=1).copy()
    df_=df_.rename(columns = {'transect_id':'Transect', 'sample_number':'Sample'})
    df_=df_.replace('-',0).copy()
    df_=df_.replace('.',0).copy()
    
    ### load prepped dataframe to dictionary, d:
    d[survey]=df_
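
The cleaning step above swaps the placeholder strings `'-'` and `'.'` for zero. The toy example below (made-up values and only two weight columns, not project data) sketches the same idea and adds one optional extra step: coercing the weight columns to numeric so that any other stray text would surface as NaN. The assumption that the first two columns are the identifiers mirrors the `iloc[:,2:]` selection used later in this notebook.

```python
import pandas as pd

# toy frame standing in for one survey spreadsheet (hypothetical values)
raw = pd.DataFrame({
    'Transect': ['C1', 'C1'],
    'Sample': ['S1', 'S2'],
    'phi_0': [1.2, '-'],
    'phi_1': ['.', 3.4],
})

# replace() matches whole cell values, so 1.2 is untouched while '.' becomes 0
clean = raw.replace(['-', '.'], 0)

# coerce the weight columns to numbers; unparseable text would become NaN
weight_cols = list(clean.columns[2:])
clean[weight_cols] = clean[weight_cols].apply(pd.to_numeric, errors='coerce')

print(clean.dtypes)
print(clean[weight_cols].isna().sum())
```

A non-zero NaN count after this step would point at a placeholder the two `replace` calls did not anticipate.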
   

### Compute statistics for each transect, across all surveys


In [3]:
####### For each transect, for each survey, compute the mean, sorting, skewness and
####### kurtosis. Then put the output into a Python dictionary--we'll build a new stats
####### dataframe in the next cell.

statsD={}

for transect in transects:
    statsL=[]
    for name, df in d.items():
        df_=df.loc[df['Transect']==transect].iloc[:,2:].dropna(how='all')
        
        for row in df_.iterrows():
            indx, data = row
            dfs=pd.DataFrame(np.array(data)).transpose()
        
            sc = SedSASClass.SedSAS( str(name)+transect, dfs, scrns) 
            gs=sc.ComputeFWLogarithmicGraphicStats()
    
            statsL.append(gs)

    # convert the stats list for the current transect, with all the transect stats inside,
    # to a numpy array, then compute the mean down each column
    # mean: stats[0]; sorting stats[1];  skew: stats[2];  Kurtosis: stats[3]
    statsA=np.mean( np.array(statsL), axis=0 )
        
    statsD[transect]=statsA
    
print('Done')    

Extrapolating 5 percent quantile
----------------------------------------------------------------------
Particle-Size Composition Analysis. Processing Sample ID: 201504C11
----------------------------------------------------------------------
----------------------------------------------------------------------
Particle-Size Composition Analysis. Processing Sample ID: 201504C11
----------------------------------------------------------------------
----------------------------------------------------------------------
Particle-Size Composition Analysis. Processing Sample ID: 201504C11
----------------------------------------------------------------------
----------------------------------------------------------------------
Particle-Size Composition Analysis. Processing Sample ID: 201504C11
----------------------------------------------------------------------
Extrapolating 5 percent quantile
Extrapolating 10 percent quantile
Extrapolating 16 percent quantile
--------------------------
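
The pooling step in the cell above stacks the per-sample statistics for one transect and averages down each column. The sketch below shows that operation on made-up numbers, assuming (as the code comments indicate) that each result holds mean, sorting, skewness, and kurtosis in that order. The use of `np.nanmean` is an optional variation, not something the notebook does: it would ignore a sample whose statistics came back as NaN instead of letting it turn the pooled value into NaN.

```python
import numpy as np

# made-up per-sample results: (mean, sorting, skewness, kurtosis)
stats_list = [
    (1.0, 0.8, -0.1, 1.0),
    (1.2, 0.7, 0.0, 1.1),
    (0.8, 0.9, -0.2, 0.9),
]

# axis=0 averages down the rows, giving one pooled value per statistic
pooled = np.mean(np.array(stats_list), axis=0)

# variation: a NaN row propagates through np.mean but is skipped by np.nanmean
with_gap = stats_list + [(np.nan, np.nan, np.nan, np.nan)]
pooled_strict = np.mean(np.array(with_gap), axis=0)
pooled_lenient = np.nanmean(np.array(with_gap), axis=0)

print(pooled)
print(pooled_strict)
print(pooled_lenient)
```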

### Create a new dataframe with all of the transect statistics inside
### Then, have a look...

Can you capture the df_stats dataframe into some type of external table?

In [5]:

df_stats=pd.DataFrame.from_dict(statsD, orient='index')
df_stats.columns=['Mean','Sorting','Skew','Kurt']

df_stats.to_csv('/Users/paulp/GoogleDrive/projects/PeaIslandBeachMonitoring/data/StatsbyTransect.csv' )
df_stats

Transect,Mean,Sorting,Skew,Kurt
C11,1.208206,0.765057,-0.073346,1.060533
C5,0.727774,0.824161,-0.064467,0.994046
C4,0.85003,0.734794,-0.059211,1.058105
C2,0.644115,0.777739,-0.071026,1.012215
T5,0.842448,0.774825,-0.035918,1.038022
T12,0.959661,0.729996,-0.056577,1.041512
T4,0.817139,0.753545,-0.040432,1.042173
C10,0.84143,0.879137,-0.02647,0.982522
C7,0.805031,0.782472,-0.057895,1.090617
T15,0.881252,0.777761,-0.087568,1.071435
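
When a table written with `to_csv` is read back, an unlabeled index column shows up as `Unnamed: 0` unless it is given a name. The sketch below, using made-up values and an in-memory buffer in place of the project file path, shows one way to label the index on export and restore it on import. The label `'Transect'` is a suggestion, not something the notebook sets.

```python
import io
import numpy as np
import pandas as pd

# made-up pooled statistics keyed by transect
stats = {
    'C11': np.array([1.20, 0.77, -0.07, 1.06]),
    'T5': np.array([0.84, 0.77, -0.04, 1.04]),
}
df = pd.DataFrame.from_dict(stats, orient='index',
                            columns=['Mean', 'Sorting', 'Skew', 'Kurt'])

# write with a named index column, then read it back as the index
buf = io.StringIO()
df.to_csv(buf, index_label='Transect')
buf.seek(0)
back = pd.read_csv(buf, index_col='Transect')

print(back)
```

Swapping the buffer for a file path gives the same round trip on disk.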


<h3>Error checking...</h3>

If something's amiss in the resulting statistics, use this cell to investigate...

In [None]:
# candidate transects to inspect: 'T16', 'C6'
transect='C6'
for name, df_0 in d.items():
    #print(name)
    #print(df_0.loc[df_0['Transect']==transect].iloc[:,2:])
    # select the weight columns for the chosen transect in this survey
    df_1=df_0.loc[df_0['Transect']==transect].iloc[:,2:]
    for index, data in df_1.iterrows():
        sc = SedSASClass.SedSAS( str(name)+transect, pd.DataFrame( np.array(data)).transpose(), scrns)
        # element 0 of the returned statistics is the per-sample mean
        print( sc.ComputeFWLogarithmicGraphicStats()[0] )
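
One quick screening step, sketched here as a suggestion, is to flag per-sample means that are NaN or fall outside the phi range spanned by the sieve screens, since such values may indicate an extrapolated quantile or a data-entry problem. The helper `flag_out_of_range` and the example means are hypothetical; only the `scrns` list is taken from this notebook.

```python
import numpy as np

# screen sizes in phi, as defined earlier in this notebook
scrns = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0]

def flag_out_of_range(means, screens):
    """Return positions of means that are NaN or outside [min(screens), max(screens)]."""
    means = np.asarray(means, dtype=float)
    lo, hi = min(screens), max(screens)
    suspect = (means < lo) | (means > hi) | np.isnan(means)
    return np.flatnonzero(suspect)

# made-up per-sample means for one transect
example_means = [1.2, -2.9, np.nan, 0.8]
suspects = flag_out_of_range(example_means, scrns)
print(suspects)
```

The returned positions can then be used to pull the corresponding rows from the survey dataframe for a closer look.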
        


In [None]:
j=[-2.899845627963392,-2.7996912559267835,-2.6795060094828536,-2.499228139816959, 
-1.9984562796339176, -1.4976844194508763, -1.3174065497849814, -1.1972213033410515,
-1.0970669313044432]

round( sum(j)/len(j), 3 )

In [None]:
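# scratch cell: relies on df_0, name and transect left over from the loop above;
# unlike the cells above, no transect filter or iloc[:,2:] column selection is applied
for indx, data in df_0.iterrows():
    dfs=pd.DataFrame(np.array(data)).transpose()

    sc = SedSASClass.SedSAS( str(name)+transect, dfs, scrns)
    gs=sc.ComputeFWLogarithmicGraphicStats()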

### The End