<h3>USFWS EDA Pea_Island_Beach_Grain_Size_Data_Analysis - by Beach Position</h3>

Computes the overall (global) mean, standard deviation (sorting), skewness, and kurtosis for each of the 4 or 5 beach position sample sites (S1 through S4/S5) over the entire three-year, twelve-survey duration of the Pea Island Beach Monitoring Project. The project spanned July 2014 to August 2017. Sand samples were collected along 27 transects, with 4, and later 5, samples per transect. Over the entire project, this amounts to 324 transect visits (27 transects × 12 surveys).

Each sand sample was processed using standard dry sieving techniques, with the mesh spacing (bin size) set at:

phis=['phi_-1','phi_-0.5','phi_0','phi_0.5','phi_1','phi_1.25','phi_1.5','phi_1.75','phi_2','phi_2.5','phi_3','phi_3.5','phi_4','remainder']
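
For readers less familiar with the phi scale, the short sketch below converts sieve sizes in phi units to grain diameters in millimetres using the standard Krumbein relation, d = 2^(-phi). The function name `phi_to_mm` is illustrative only; it is not part of this notebook or of SedSAS.

```python
import numpy as np

def phi_to_mm(phi):
    """Convert Krumbein phi units to grain diameter in millimetres (d = 2**-phi).

    Hypothetical helper for illustration; not a SedSAS function.
    """
    return np.power(2.0, -np.asarray(phi, dtype=float))

# Sieve sizes used in this project, in phi units (the 'remainder' pan is excluded).
sieve_phi = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0]
sieve_mm = phi_to_mm(sieve_phi)

for p, mm in zip(sieve_phi, sieve_mm):
    print(f"phi {p:>5}: {mm:.4f} mm")
```

Larger phi values correspond to finer sediment, so a higher mean phi at a beach position indicates finer sand there.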

The resulting individual bin (sieved fraction) weights were recorded in Microsoft Excel spreadsheets, 12 in all. In this notebook we extract those sample weights, bring them together, and then compute, sample by sample, the four statistical moments, recording each of these in a separate pandas dataframe. The resulting statistics are then pooled and a mean is computed for each pooled statistic, yielding the final global values: a pooled mean grain size, standard deviation (sorting), skewness, and kurtosis for the entire three-year project.
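
The per-sample calculation is delegated to `SedSASClass` (`ComputeFWLogarithmicGraphicStats`), whose internals are not shown in this notebook. Assuming "FW" refers to the Folk and Ward (1957) logarithmic graphic measures, the sketch below shows roughly what such a calculation involves. The function name, the linear interpolation of percentiles, and the example weights are all assumptions made for illustration; SedSAS may differ in detail.

```python
import numpy as np

def folk_ward_graphic_stats(weights, screens_phi):
    """Folk & Ward logarithmic graphic statistics from sieve weights.

    Illustrative sketch only (not the SedSAS implementation).
    weights     : weight retained on each screen, ordered coarse to fine
    screens_phi : screen sizes in phi units, same order
    Assumes the cumulative curve is strictly increasing and starts below 5%.
    """
    w = np.asarray(weights, dtype=float)
    phi = np.asarray(screens_phi, dtype=float)

    # Cumulative weight percent coarser than each screen.
    cum_pct = 100.0 * np.cumsum(w) / w.sum()

    # Phi values at the percentiles the graphic measures need
    # (linear interpolation along the cumulative curve).
    p5, p16, p25, p50, p75, p84, p95 = np.interp(
        [5, 16, 25, 50, 75, 84, 95], cum_pct, phi)

    mean = (p16 + p50 + p84) / 3.0
    sorting = (p84 - p16) / 4.0 + (p95 - p5) / 6.6
    skew = ((p16 + p84 - 2 * p50) / (2 * (p84 - p16))
            + (p5 + p95 - 2 * p50) / (2 * (p95 - p5)))
    kurt = (p95 - p5) / (2.44 * (p75 - p25))
    return mean, sorting, skew, kurt

# Made-up example: weight spread evenly between -1 and 3 phi.
fw_mean, fw_sorting, fw_skew, fw_kurt = folk_ward_graphic_stats(
    [0, 25, 25, 25, 25], [-1, 0, 1, 2, 3])
print(fw_mean, fw_sorting, fw_skew, fw_kurt)
```

For this evenly spread example the skewness is zero, as expected for a symmetric distribution. Real samples with empty bins or a coarse first fraction above 5% would need extra handling that is omitted here.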


**SUPPORTS (Not all of this stuff is used/needed):**

transects=['C11','C10','C9','C8','C7','C6','C5','C4','C3','C2','C1','T1','T2','T3','T4','T5','T6','T7','T8','T9','T10','T11','T12','T13','T14','T15','T16']

surveys=[201407, 201409, 201504, 201508, 201602, 201605, 201608, 201610 ]


whole phis=['phi_-1','phi_0','phi_1','phi_2','phi_3','remainder']

samples=['S1','S2','S3','S5','S4']
<br /><br />

--Notebook created circa: 1/2018
Author: Paul P


<h3>Load the requisite libraries</h3>

In [1]:
# import requisite Python libraries:
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sys.path.append('/Users/paulp/GoogleDrive/projects/SedSAS/')
import SedSASClass

%matplotlib inline

<h3>Load data from sources into a pandas dataframe</h3>

Then, load each survey dataframe into a single Python dictionary

In [3]:
fp='/Users/paulp/GoogleDrive/projects/PeaIslandBeachMonitoring/data/FinalDataSets/'

surveys=[201407, 201409, 201504, 201508, 201510, 201602, 201605, 201608, 201610, 
         201702, 201704, 201708 ]
transects=['C11','C10','C9','C8','C7','C6','C5','C4','C3','C2','C1','T1','T2','T3','T4',
           'T5','T6','T7','T8','T9','T10','T11','T12','T13','T14','T15','T16']
samples=['S1','S2','S3','S4','S5']  
scrns=[-1.0,-0.5,0.0,0.5,1.0,1.25,1.5,1.75,2.0,2.5,3.0,3.5,4.0,5.0]

###############################################################################

d={}
for survey in surveys:
    fn='PINWR_'+str(survey)+'_GrainSize.xlsx'
    df_=pd.read_excel(fp+fn, header=0)
    
    ### delete all non-essential columns and handle known issues:
    df_=df_.drop(['sheet_code','sample_date','pan_weight','pan+wet_weight',
                'pan+dry_weight'], axis=1).copy()
    df_=df_.rename(columns = {'transect_id':'Transect', 'sample_number':'Sample'})
    df_=df_.replace('-',0).copy()
    df_=df_.replace('.',0).copy()
    
    ### load prepped dataframe to dictionary, d:
    d[survey]=df_
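
The loading cell above replaces the placeholder strings '-' and '.' with zero. As a hedged alternative, the sketch below does the same replacement and then coerces every weight column to a numeric dtype, so that any other stray text becomes NaN instead of silently staying a string. The helper name `clean_weights` and the example frame are hypothetical and not part of the original workflow.

```python
import numpy as np
import pandas as pd

def clean_weights(df, id_cols=('Transect', 'Sample')):
    """Return a copy of df with all non-identifier columns made numeric.

    Hypothetical helper for illustration. '-' and '.' become 0 (as in the
    notebook); any other non-numeric entry becomes NaN.
    """
    out = df.copy()
    weight_cols = [c for c in out.columns if c not in id_cols]
    out[weight_cols] = out[weight_cols].replace({'-': 0, '.': 0})
    out[weight_cols] = out[weight_cols].apply(pd.to_numeric, errors='coerce')
    return out

# Made-up rows that mimic the known spreadsheet issues.
raw = pd.DataFrame({
    'Transect': ['T1', 'T1', 'T2'],
    'Sample':   ['S1', 'S2', 'S1'],
    'phi_0':    ['-', 1.5, 'n/a'],
    'phi_1':    [2.0, '.', 3.0],
})
cleaned = clean_weights(raw)
print(cleaned)
print(cleaned.dtypes)
```

Whether an unexpected entry should become NaN or 0 is a judgment call that depends on how the spreadsheets were filled in; the sketch only makes such entries visible.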
   

In [6]:
d.get(201407)

Unnamed: 0,Transect,Sample,phi_-1,"phi_-0,5",phi_0,"phi_0,5",phi_1,"phi_1,25","phi_1,5","phi_1,75",phi_2,"phi_2,5",phi_3,"phi_3,5",phi_4,remainder
0,T1,S1,34.23,17.65,14.59,9.94,5.97,1.24,1.14,1.04,1.37,0.76,0.06,0.01,,0.01
1,T1,S2,1.37,1.85,4.81,9.56,17.23,6.84,6.39,3.92,3.72,1.31,0.12,0.01,,0.01
2,T1,S3,0.10,0.05,0.45,2.35,12.08,9.17,10.01,8.53,11.16,6.55,0.61,0.12,,0.01
3,T1,S4,5.38,1.70,1.55,3.70,12.08,9.31,13.83,12.59,13.10,6.23,4.81,1.17,,0.01
4,T10,S1,1.76,4.80,11.93,14.29,12.55,4.54,5.08,4.45,6.49,4.22,0.33,0.01,,0.01
5,T10,S2,2.37,4.38,8.20,9.50,10.40,4.44,5.96,5.83,8.61,5.66,0.68,0.10,,0.01
6,T10,S3,0.18,0.29,1.34,3.04,6.69,4.23,6.29,6.89,13.51,16.39,4.12,0.30,,0.05
7,T10,S4,6.05,11.54,13.69,14.42,19.80,8.31,8.39,6.00,5.71,2.08,0.36,0.07,,0.01
8,T7,S1,15.36,24.77,13.44,2.60,0.29,0.05,0.07,0.11,0.09,0.02,0.01,0.01,,0.01
9,T7,S2,2.12,7.67,11.60,9.09,7.72,3.29,3.95,3.47,5.15,3.58,0.40,0.01,,0.01


### Compute statistics for each sample site (S1-S4/S5), across all surveys


In [4]:
####### For each sample position, for each survey, compute the mean, sorting, skewness and
####### kurtosis of every sample. Then put the output into a Python dictionary; we'll build
####### a new stats dataframe in the next cell.

statsD={}

for sample in samples:
    statsL=[]
    for name, df in d.items():
        df_=df.loc[df['Sample']==sample].iloc[:,2:].dropna(how='all')
        
        for indx, data in df_.iterrows():
            dfs=pd.DataFrame(np.array(data)).transpose()
        
            sc = SedSASClass.SedSAS( str(name)+sample, dfs, scrns) 
            gs=sc.ComputeFWLogarithmicGraphicStats()
    
            statsL.append(gs)

    # convert the stats list for the current sample position, with all of its per-sample
    # stats inside, to a numpy array, then compute the mean down each column
    # mean: stats[0]; sorting: stats[1]; skew: stats[2]; kurtosis: stats[3]
    statsA=np.mean( np.array(statsL), axis=0 )
        
    statsD[sample]=statsA
    
print('Done')    

----------------------------------------------------------------------
Particle-Size Composition Analysis. Processing Sample ID: 201504S1
----------------------------------------------------------------------
----------------------------------------------------------------------
Particle-Size Composition Analysis. Processing Sample ID: 201504S1
----------------------------------------------------------------------
----------------------------------------------------------------------
Particle-Size Composition Analysis. Processing Sample ID: 201504S1
----------------------------------------------------------------------
----------------------------------------------------------------------
Particle-Size Composition Analysis. Processing Sample ID: 201504S1
----------------------------------------------------------------------
----------------------------------------------------------------------
Particle-Size Composition Analysis. Processing Sample ID: 201504S1
--------------------------
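
The pooling step in the cell above is a column-wise mean over the per-sample statistics. The sketch below isolates that step with made-up numbers and adds a guard for a position that has no samples at all. The function name `pool_stats` is hypothetical.

```python
import numpy as np

def pool_stats(stats_rows):
    """Column-wise mean of (mean, sorting, skew, kurtosis) tuples.

    Hypothetical helper for illustration. Returns four NaNs when no rows
    are supplied, so the caller can spot an empty beach position.
    """
    arr = np.asarray(stats_rows, dtype=float)
    if arr.size == 0:
        return np.full(4, np.nan)
    return np.mean(arr, axis=0)

# Made-up per-sample statistics for one beach position.
example_rows = [
    (1.0, 0.5, -0.1, 1.0),
    (2.0, 0.7,  0.1, 1.2),
]
pooled = pool_stats(example_rows)
print(pooled)
```

Note that averaging per-sample statistics is not the same as computing statistics on the summed sieve weights; the notebook uses the former.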

### Create a new dataframe with all of the sampling site statistics inside
### Then, have a look...

Can you capture the df_stats dataframe into some type of external table?

In [5]:

df_stats=pd.DataFrame.from_dict(statsD, orient='index')
df_stats.columns=['Mean','Sorting','Skew','Kurt']
df_stats.to_csv('/Users/paulp/GoogleDrive/projects/PeaIslandBeachMonitoring/data/StatsbyBchPosition.csv')
df_stats

Unnamed: 0,Mean,Sorting,Skew,Kurt
S5,1.36473,0.660897,-0.053676,1.119595
S4,1.45199,0.624753,-0.007078,1.130377
S3,1.372001,0.605336,-0.095181,1.090749
S1,-0.353533,1.084008,-0.028772,0.934325
S2,0.744195,0.823889,-0.098603,1.012697
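
One way to check that the exported table can be read back cleanly is a CSV round trip. The sketch below rebuilds the table from the values printed above, sorts the positions S1 to S5, and writes to an in-memory buffer in place of the project path. The `index_label='Sample'` choice is an assumption added here; the original cell writes the index without a label.

```python
import io
import pandas as pd

# Values copied from the output above.
stats = {
    'S5': [1.36473, 0.660897, -0.053676, 1.119595],
    'S4': [1.45199, 0.624753, -0.007078, 1.130377],
    'S3': [1.372001, 0.605336, -0.095181, 1.090749],
    'S1': [-0.353533, 1.084008, -0.028772, 0.934325],
    'S2': [0.744195, 0.823889, -0.098603, 1.012697],
}
table = pd.DataFrame.from_dict(
    stats, orient='index', columns=['Mean', 'Sorting', 'Skew', 'Kurt'])

# Write to an in-memory buffer (stand-in for a file on disk), then read back.
buf = io.StringIO()
table.sort_index().to_csv(buf, index_label='Sample')
buf.seek(0)
restored = pd.read_csv(buf, index_col='Sample')
print(restored)
```

Sorting the index before export puts the positions in beach order, which makes the table easier to compare across runs.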


<h3>Error checking...</h3>

If something's amiss in the resulting statistics, use this cell to investigate...

In [87]:
#'T16'  'C6'
for name, df in d.items():
    print(name)
    print(df.loc[df['Transect']=='T15'].iloc[:,2:])
   


201504
     phi_-1  phi_-0,5  phi_0  phi_0,5  phi_1  phi_1,25  phi_1,5  phi_1,75  \
100   15.89     12.92  17.60    18.62  14.35      2.10     0.86      0.28   
101    0.13      0.68   2.51     5.91  25.27     15.28    11.32      4.86   
102    0.08      0.41   0.91     2.52  10.44      9.76    12.80      9.99   
103    0.01      0.17   1.12     4.48  10.78      5.32     4.99      3.59   

     phi_2  phi_2,5  phi_3  phi_3,5  phi_4  remainder  
100   0.26     0.10   0.06     0.01    NaN       0.06  
101   2.11     0.39   0.01     0.01    NaN       0.01  
102   9.75     4.18   0.84     0.13    NaN       0.02  
103   3.96     2.39   0.63     0.20    NaN       0.01  
201409
     phi_-1  phi_-0,5  phi_0  phi_0,5  phi_1  phi_1,25  phi_1,5  phi_1,75  \
100    4.73      2.37   3.71     6.96  18.47      9.56     9.80      7.59   
101    0.22      0.10   0.54     2.18  12.38     10.61    12.42      8.89   
102    0.00      0.05   0.01     0.14   2.12      3.92     8.72     12.10   
103   12.03 
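
Printing every survey for one transect works, but it relies on spotting problems by eye. As a complementary sketch, the helper below flags rows whose weights sum to zero or less, or that contain a negative weight. The function name `flag_suspect_rows`, the chosen criteria, and the example rows are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def flag_suspect_rows(df, id_cols=('Transect', 'Sample')):
    """Return identifier columns and total weight for rows that look wrong.

    Hypothetical helper for illustration. A row is flagged when its weights
    sum to zero or less (for example, all NaN) or any weight is negative.
    """
    weights = df.drop(columns=list(id_cols)).apply(pd.to_numeric, errors='coerce')
    total = weights.sum(axis=1)  # NaN entries are skipped by default
    bad = (total <= 0) | (weights < 0).any(axis=1)
    return df.loc[bad, list(id_cols)].assign(total_weight=total[bad])

# Made-up rows: one normal, one empty, one with a negative weight.
check = pd.DataFrame({
    'Transect': ['T15', 'T15', 'T15'],
    'Sample':   ['S1', 'S2', 'S3'],
    'phi_0':    [1.0, np.nan, 2.0],
    'phi_1':    [2.0, np.nan, -0.5],
})
suspects = flag_suspect_rows(check)
print(suspects)
```

Applied to each dataframe in `d`, a check like this would list candidate rows per survey before the statistics are computed.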

### The End