### SedSAS - Quantile Extrapolation Testing

This notebook tests the accuracy of the extrapolation approach(es) used to generate quantile values from cumulative weight curves when the data leave one or more of the requisite quantiles undetermined (only partially bounded).

It is common, when dry sieving sediment for geologic particle-size analysis, to capture excess material in the largest-aperture sieve, or to find an excess of sediment falling through the smallest-aperture sieve into the collection pan at the bottom of the stack. By "excess" we mean an amount large enough that the endpoints of the sample's cumulative weight curve do not extend far enough for all quantiles required by the moment computations to be determined by interpolation.
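
As a rough illustration of this condition, the sketch below builds a cumulative weight-percent curve from the sieve weights of one sample examined later in this notebook and flags which quantiles fall outside the measured range. The percentile set used here (5, 10, 16, 25, 50, 75, 84, 90, 95) is an assumption based on the commonly used Folk and Ward graphic measures; the nine quantiles SedSAS actually uses are not listed in this notebook, and this check is not SedSAS's own code.

```python
import numpy as np

# Sieve apertures (phi) and retained weights (g), ordered coarse to fine.
# The final entry stands in for the pan.
screens = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0]
weights = [65.05, 2.33, 0.06, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.0, 0.0]

# ASSUMPTION: a Folk and Ward style percentile set; SedSAS may use a different one.
quantiles = np.array([5, 10, 16, 25, 50, 75, 84, 90, 95])

# Cumulative weight percent retained at or above each aperture.
cum_pct = 100.0 * np.cumsum(weights) / np.sum(weights)

# A quantile below the first cumulative value lies coarser than the coarsest
# sieve; one above the last value lies finer than the finest measurement.
needs_coarse = quantiles < cum_pct[0]
needs_fine = quantiles > cum_pct[-1]

print(f"first cumulative value: {cum_pct[0]:.2f}% at phi {screens[0]}")
print("quantiles needing coarse-end extrapolation:", quantiles[needs_coarse])
print("quantiles needing fine-end extrapolation:", quantiles[needs_fine])
```

Because the coarsest sieve alone retains about 96% of this sample, every quantile in the assumed set lies beyond the coarse end of the curve and cannot be bracketed by two measured points.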

#### Import the required Python libraries and modules:

In [3]:
import sys
import numpy as np
import pandas as pd

sys.path.append( '/Users/paulp/GoogleDrive/projects/SedSAS' )
import SedSASClass

#### The Data:

The data come from two sources. The first is a weights file (./PINWR_Coarse_Sieve_Weights.txt) containing selected Pea Island Beach Monitoring samples that were resieved using a modified stack that includes several gravel-class fractions. The samples were selected based on the amount of extrapolation required to generate the quantile set needed for grain-size moment statistics. For these specific samples, all nine quantiles required extrapolation.

The second source is the set of raw survey weight files compiled during the initial dry sieving operations. These files are located in my GoogleDrive space at: ./projects/PeaIslandBeachMonitoring/data/FinalDataSets/Beach_Sand_Grain_Size (formatted as: PINWR_'survey date'_GrainSizeWeights).

In [55]:
fp='/Users/paulp/GoogleDrive/projects/PeaIslandBeachMonitoring/data/FinalDataSets/Beach_Sand_Grain_Size/'

# read in the coarse sieve fraction data...
dfB=pd.read_csv('./PINWR_Coarse_Sieve_Weights.txt')
dfB

Unnamed: 0,'Survey','Transect','Sample','phi-4.5','phi-4','phi-3','phi-2.25','phi-2','phi-1','pan'
0,201407,C10,S1,0.0,0.0,2.37,12.51,8.14,41.91,2.91
1,201407,T15,S1,0.0,0.0,0.89,6.49,3.42,24.29,34.31
2,201407,C9,S1,0.0,0.0,1.39,3.98,3.38,31.51,32.92


In [57]:
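for index, row in dfB.iterrows():
    # the first three columns identify the sample; the rest are sieve weights
    survey=row.iloc[0]
    transect=row.iloc[1]
    sample=row.iloc[2]
    dataB=list( row.iloc[3:] )
    # load the original survey weights file for this sample's survey date
    dfA=pd.read_csv(fp+'PINWR_'+str(survey)+'_GrainSizeWeights.csv')

    # select the matching record from the original sieving data
    recA=dfA[(dfA['transect_id'] == transect) & (dfA['sample_number']==sample)]

    print(recA)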


      sheet_code sample_date transect_id sample_number  pan_weight  \
36  july2014_C10     7/15/14         C10            S1       11.51   

    pan+wet_weight  pan+dry_weight  phi_-1  phi_-0,5  phi_0    ...      phi_1  \
36           73.42           67.67   65.05      2.33   0.06    ...       0.01   

    phi_1,25  phi_1,5  phi_1,75  phi_2  phi_2,5  phi_3  phi_3,5  phi_4  \
36      0.01     0.01      0.01   0.01     0.01   0.01     0.01    NaN   

    remainder  
36        0.0  

[1 rows x 21 columns]
       sheet_code sample_date transect_id sample_number  pan_weight  \
100  july2014_T15     7/15/14         T15            S1        5.67   

     pan+wet_weight  pan+dry_weight  phi_-1  phi_-0,5  phi_0    ...      \
100           74.45             NaN   35.67     14.86  11.83    ...       

     phi_1  phi_1,25  phi_1,5  phi_1,75  phi_2  phi_2,5  phi_3  phi_3,5  \
100   0.94      0.17     0.22      0.26   0.43     0.12   0.05     0.01   

     phi_4  remainder  
100    NaN       0.03  

In [34]:
# sheet_code,sample_date,transect_id,sample_number,pan_weight,pan+wet_weight,
# pan+dry_weight,phi_-1,"phi_-0,5",phi_0,"phi_0,5",phi_1,"phi_1,25","phi_1,5",
# "phi_1,75",phi_2,"phi_2,5",phi_3,"phi_3,5",phi_4,remainder

hdrA=[ 'phi-1','phi-0.5','phi 0','phi 0.5','phi 1','phi 1.25','phi 1.5',
     'phi 1.75','phi 2','phi 2.5','phi 3','phi 3.5','phi 4','pan' ]
hdrB=['phi-4.5','phi-4','phi-3','phi-2.25','phi-2','phi-1']
screensA=[-1.0,-0.5,0.0,0.5,1.0,1.25,1.50,1.75,2.0,2.5,3.0,3.5,4.0,5.0]
screensB=[-4.5,-4.0,-3.0,-2.25,-2.0]

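# A: original sieve weights for sample C10/S1, survey 201407 (NaN at phi 4 entered as 0.0)
A=[ 65.05,2.33,0.06,0.02,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.00 ]
# B: coarse-stack weights for the same sample; the pan weight (2.91) is left out
B=[ 0.0,0.0,2.37,12.51,8.14,41.91]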

print(sum(A))
#print(sum(B))

dfA=pd.DataFrame( {'Wgt':A} ).T
dfA.columns=hdrA

dfB=pd.DataFrame( {'Wgt':B} ).T
dfB.columns=hdrB
dfB

67.54000000000003


Unnamed: 0,phi-4.5,phi-4,phi-3,phi-2.25,phi-2,phi-1
Wgt,0.0,0.0,2.37,12.51,8.14,41.91


#### Instantiate SedSAS class instance for dataframe dfA:

In [36]:
ssc=SedSASClass.SedSAS('1', dfA, screensA)
qntA=ssc.GetQuantileList()
qntA




----------------------------------------------------------------------
Particle-Size Composition Analysis. Processing Sample ID: 1
----------------------------------------------------------------------

NOTE THAT ONE OR MORE QUANTILE VALUES MUST BE DETERMINED BY EXTRAPOLATION. 



([-1.948, -1.896, -1.834, -1.74, -1.481, -1.221, -1.128, -1.066, -1.014],
 [3.859, 3.722, 3.565, 3.341, 2.791, 2.332, 2.185, 2.093, 2.019])
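
The two lists returned above appear to be the same nine quantiles expressed in phi units and in millimeters. As a quick sanity check, the sketch below applies the standard Krumbein relation, d(mm) = 2^(-phi), to the phi values and compares the result with the second list. That the second list is in millimeters is an inference from the numbers, not something the SedSAS output states.

```python
import numpy as np

# Quantile values copied from the output above.
phi = np.array([-1.948, -1.896, -1.834, -1.74, -1.481, -1.221, -1.128, -1.066, -1.014])
mm_reported = np.array([3.859, 3.722, 3.565, 3.341, 2.791, 2.332, 2.185, 2.093, 2.019])

# Krumbein phi scale: diameter in mm = 2 ** (-phi).
mm = 2.0 ** (-phi)

# Small differences are expected because the phi values are rounded to 3 decimals.
max_abs_diff = np.max(np.abs(mm - mm_reported))
print(np.round(mm, 3))
print("largest absolute difference (mm):", round(float(max_abs_diff), 4))
```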

#### Instantiate SedSAS class instance for dataframe df = dfA + dfB:

In [37]:
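# append the original screens to the coarse screens (extend() modifies screensB in place)
screensB.extend(screensA)
# drop dfA's 'phi-1' column, since dfB already supplies a 'phi-1' weight
df=pd.concat( [dfB,dfA.iloc[:,1:] ], axis=1 )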

ssc2=SedSASClass.SedSAS('2', df, screensB)
qntf=ssc2.GetQuantileList()
qntf




----------------------------------------------------------------------
Particle-Size Composition Analysis. Processing Sample ID: 2
----------------------------------------------------------------------


([-2.94, -2.738, -2.495, -2.189, -1.745, -1.343, -1.198, -1.101, -1.021],
 [7.674, 6.671, 5.639, 4.561, 3.352, 2.536, 2.294, 2.146, 2.029])

In [39]:
np.subtract(qntA,qntf)

array([[ 0.992,  0.842,  0.661,  0.449,  0.264,  0.122,  0.07 ,  0.035,
         0.007],
       [-3.815, -2.949, -2.074, -1.22 , -0.561, -0.204, -0.109, -0.053,
        -0.01 ]])
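
To make these differences easier to read, the sketch below tabulates them per quantile and adds a relative error in millimeters, treating the extended-stack result as the reference. The labels `q1` to `q9` are placeholders for position only, since the percentile each value corresponds to is not shown in the output, and using the extended stack as the reference is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

# Quantiles from the original stack (extrapolated) and the extended stack,
# copied from the two outputs above.
phi_extrap = np.array([-1.948, -1.896, -1.834, -1.74, -1.481, -1.221, -1.128, -1.066, -1.014])
phi_full = np.array([-2.94, -2.738, -2.495, -2.189, -1.745, -1.343, -1.198, -1.101, -1.021])
mm_extrap = np.array([3.859, 3.722, 3.565, 3.341, 2.791, 2.332, 2.185, 2.093, 2.019])
mm_full = np.array([7.674, 6.671, 5.639, 4.561, 3.352, 2.536, 2.294, 2.146, 2.029])

# Placeholder labels: position in the quantile list, not percentile values.
labels = [f"q{i}" for i in range(1, 10)]

summary = pd.DataFrame(
    {
        "phi_extrap": phi_extrap,
        "phi_full": phi_full,
        "phi_diff": phi_extrap - phi_full,
        # Relative error of the extrapolated diameter, in percent of the reference.
        "mm_rel_err_pct": 100.0 * (mm_extrap - mm_full) / mm_full,
    },
    index=labels,
)
print(summary.round(3))
```

For this sample, the discrepancy is largest at the first (coarsest) quantile and shrinks steadily toward the last one, which sits closest to the measured end of the original curve.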