# Sediment Size Analysis by Sieve (SedSAS) Class : Example

### Processing sediment data stored in a Microsoft Excel (xls or xlsx) file:

## Part I: Introduction:

SedSAS is a class written in the Python programming (scripting) language. Its purpose is to provide a basic set of statistical and visualization tools for analyzing unconsolidated sediment size-fraction samples collected in the field and separated using either mechanical sieves or any analogous partition-by-size method.

#### Multi-Sample Processing:

In this notebook we look at how the class is used to compute grain-size statistics for one or more samples, where the subsample weights are contained in a separate Microsoft Excel spreadsheet file. The data from the spreadsheet will be read into this notebook and processed.

You can use this notebook not only as a learning tool or reference but also as a template from which to directly conduct your own analyses. To do the latter, simply download a copy and replace the existing data file path(s) and name(s), then run.

The downloaded notebook can be further modified, or copied and then the copy modified, as you see fit. 

#### What we'll do...

1.) Load data from a Microsoft Excel spreadsheet into a Pandas dataframe <br/>
2.) Rework the contents of the new Pandas dataframe so as to meet the input and format requirements of SedSAS <br/>
3.) Loop thru each sample in the dataframe to compute grain-size statistics <br/>
4.) Write the results to a new Pandas dataframe and display the contents in the notebook <br/>
5.) Save the contents of the new results dataframe to a new csv file <br/>

### Preliminaries:
- The SedSAS class, this notebook, and all supporting materials were developed in a Python 3.x environment. Neither the class nor any of these supporting materials have been tested using a Python 2.x distro. *Backward compatibility is expected, but not assured.*
    
- If you do not have a Python interpreter installed on your computer (Linux, UNIX, and macOS machines typically ship with one; Windows machines likely do not), your best bet for a trouble-free installation is the Anaconda distribution: https://anaconda.org/anaconda/python. If you're more adventurous, try the Python Software Foundation: https://www.python.org/getit/. Even if your OS comes with a Python interpreter already in place, it's probably a good idea to get a more up-to-date release (most OS installs are a version or more behind the current release). Again, I suggest the Anaconda distribution. Be sure to get a copy of Python version 3.5 or later.
- Required external Python Libraries and modules\*\*:
    - sys (part of the Python standard library; gives access to interpreter settings such as the module search path)
    - numpy (numeric Python library for array and matrix operations)
    - pandas (Python Data Analysis library for in-memory data storage and analysis)
    - matplotlib (plotting library for plot generation)
    
    
- Loading SedSAS.py into your script: to use the class it must first be loaded into your script or notebook environment. If the class script file is located in the same directory as your analysis script then to load it you need only do the following:
    
    import SedSAS
    
If the SedSAS.py file is located elsewhere on your computer or server you'll have to point Python to its location by appending that directory to Python's module search path (sys.path)\*\*\*:

    import sys
    sys.path.append('/full/path/to/directory/where/class/file/is/located/')
    
For example, if your copy of SedSAS.py is located in the directory: 

    /Users/Documents/projects/ 
    
enter this into the sys.path.append method as:

    import sys
    sys.path.append('/Users/Documents/projects/')


______________
\*\*Note that these, and many, many other libraries, are included in the default Anaconda Python distribution.

\*\*\* Note that you are not changing your operating system's global PATH variable. sys.path is Python's own module search path, and the change applies only to the running script or notebook session. The global PATH variable is not altered.
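
As a minimal sketch of the path adjustment described above, the helper below appends a directory to sys.path only when it is not already listed. The function name add_to_module_path is an illustrative assumption and is not part of SedSAS; the current directory stands in for wherever your copy of SedSAS.py lives.

```python
import sys
from pathlib import Path


def add_to_module_path(directory):
    """Append a directory to Python's module search path if it is missing.

    Hypothetical helper for illustration only; returns the absolute path used.
    """
    resolved = str(Path(directory).expanduser().resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


# "." is a placeholder; substitute the directory that holds SedSAS.py.
added = add_to_module_path(".")
print(added in sys.path)
```

Calling the helper a second time with the same directory leaves sys.path unchanged, which keeps the search path tidy when a notebook cell is re-run.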

#### Required User Inputs:
Initial data input to SedSAS during the instantiation process consists of:

1. a listing of all the sieve apertures (in $\phi$ units) used in the analysis (sorted descending by size)
2. the weight of sediment material captured by each sieve 

These data must be passed to SedSAS inside a Pandas dataframe where the first column contains the aperture sizes in order as in the actual stack and where the second column contains the sediment weight.
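
A minimal sketch of such an input dataframe might look like the following. The column names 'Aperture' and 'Weight' mirror the labels used later in this notebook, but whether SedSAS depends on particular names is not confirmed here, and the weights are illustrative values only.

```python
import pandas as pd

# Sieve apertures in phi units, coarsest sieve first (phi increases as grain
# size decreases), paired with the weight retained on each sieve.
apertures_phi = [-2.25, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
weights = [2.74, 2.74, 1.01, 1.16, 1.30, 1.89, 1.97, 3.16, 3.92, 3.33]

# First column: aperture; second column: weight. Column names are assumed.
sample_df = pd.DataFrame({"Aperture": apertures_phi, "Weight": weights})

# Basic sanity checks before handing the dataframe to any analysis code.
assert sample_df["Aperture"].is_monotonic_increasing, "apertures out of stack order"
assert (sample_df["Weight"] >= 0).all(), "negative weight found"

print(sample_df.head())
```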

A unique identifier for the sample can optionally be passed to SedSAS at the time of instantiation. SedSAS does not require one, nor is it particular about how you choose to format the id. The identifier serves the user and sample tracking rather than code execution. Nevertheless, it makes sense to provide something that helps to track what's in process, especially if you want to distinguish between samples when multiple samples are run in succession. If you forgo an id, the class assigns the default value '1' as the id each time the class is instantiated.

## Part II: Example: using the class:

#### Computing grain-size statistics from sample data housed in a Microsoft Excel file located in the subdirectory ./Sample_Data/. 

- The spreadsheet file to be processed is: Sample_Data.xlsx

- The sieves used (in stack order) in this example are: -2.25$\phi$, -1.0$\phi$, -0.5$\phi$, 0.0$\phi$, 0.5$\phi$, 1.0$\phi$, 1.5$\phi$, 2.0$\phi$, 2.5$\phi$, 3.0$\phi$, 3.5$\phi$, 4.0$\phi$, and the pan fraction which we'll designate as 5.0$\phi$

- The SedSAS.py script file is located in the same directory as is my (this) Jupyter notebook.
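
The apertures above are given on the phi scale, where phi is the negative base-2 logarithm of the grain diameter in millimetres. The short sketch below converts between the two; the function names are illustrative and not part of SedSAS. The conversion agrees with the results table later in this notebook, where the 2.5 phi mode is listed as 0.176777 mm.

```python
import numpy as np


def phi_to_mm(phi):
    """Convert phi units to grain diameter in millimetres: d = 2 ** (-phi)."""
    return np.power(2.0, -np.asarray(phi, dtype=float))


def mm_to_phi(d_mm):
    """Convert grain diameter in millimetres to phi units: phi = -log2(d)."""
    return -np.log2(np.asarray(d_mm, dtype=float))


stack_phi = [-2.25, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
for phi, mm in zip(stack_phi, phi_to_mm(stack_phi)):
    print(f"{phi:6.2f} phi = {mm:8.4f} mm")
```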

All the required data resides in Sample_Data.xlsx, but the format is not yet ready for direct ingest into SedSAS. Thus, we'll perform some minor preliminary adjustments to get the data in the required format. You will probably have to perform similar adjustment operations in order to prepare your own data for analysis. If you should elect to use SedSAS in your work just remember that Google can be your good friend in finding out how to do something with Python, numpy, and/or Pandas dataframes. Don't be afraid to experiment, and learn. 

Finally, SedSAS is verbose. As you will see, during execution there is potentially going to be a lot of feedback written back to the console (your computer screen). You are advised to at least scan through this, voluminous as it might be, for important warnings, errors, and other information that might have a direct bearing on the integrity of the results. 
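
One possible way to make that scan easier is to capture the console text and filter it for lines that look like warnings. The sketch below assumes the messages are written with ordinary print calls to standard output; noisy_analysis is a stand-in function, not part of SedSAS, and the keyword list is a guess that you would adapt to the messages you actually see.

```python
import contextlib
import io


def noisy_analysis():
    """Stand-in for a verbose analysis call; NOT part of SedSAS."""
    print("Processing sample:  Sample 0")
    print("Values in excess of 5% can introduce significant error in some analyses.")
    return {"ok": True}


# Redirect everything printed inside the block into an in-memory buffer.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = noisy_analysis()

captured = buffer.getvalue().splitlines()

# Hypothetical keywords; adjust them to the wording of the real messages.
keywords = ("exceeds", "error", "unreliable", "warning")
flagged = [line for line in captured if any(k in line.lower() for k in keywords)]

for line in flagged:
    print("CHECK:", line)
```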

With all that said, let's get started...

### Read the contents of the spreadsheet file into a pandas dataframe and reformat as needed for SedSAS ingest:

In this particular example some columns from the source file are dropped, and it was necessary to transpose the new dataframe so that each column represents a single sample and each row a single weight observation in a sample. Other files, with other content and formatting, will require different preparation. The objective, however, is a simple dataframe containing one column that lists all the sieve screen apertures used in your analysis, and a second column that carries the individual subsample weights retained by each corresponding sieve screen.

Note that, as will be demonstrated here, the aperture column can be proxied by the dataframe's index.
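
The reshaping idea can be tried on a small in-memory dataframe before touching a real spreadsheet. The layout below (a 'Sample Name' column followed by one column per sieve and a 'Pan' column) is an assumption modelled on the description above, and all weights are made up for illustration.

```python
import pandas as pd

# Hypothetical spreadsheet layout: one row per sample, one column per sieve.
raw = pd.DataFrame(
    {
        "Sample Name": ["A", "B"],
        -1.0: [2.74, 1.81],
        0.0: [1.16, 1.22],
        "Pan": [0.50, 0.40],
    }
)

# Drop the label column, give the pan fraction a numeric aperture, and
# transpose so that each column is a sample and the index holds the apertures.
tidy = (
    raw.drop(columns="Sample Name")
       .rename(columns={"Pan": 5.0})
       .T
)

print(tidy)
```

After the transpose, the index of tidy carries the aperture values, which is what allows the index to stand in for an aperture column.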


In [44]:

    import pandas as pd
    import numpy as np
    import sys
    import SedSAS

    ## List of the sieve apertures (phi units) used in the analysis, kept here for reference.
    ## After the transpose in step 4 the same values appear as the dataframe index.
    Apertures=[-2.25,-1.0,-0.5,0.0,0.5,1.0,1.5,2.0,2.5,3.0,3.5,4.0,5.0]

    ## 1.) Read the raw file into a pandas dataframe:
    file_path='../Sample_Data/'
    file='Sample_Data.xlsx'
    df=pd.read_excel(file_path+file, header=1)

    ## 2.) Drop the Sample Name column from the dataframe. We don't need it. Since it's in the
    ## first position, we select it by position (df.columns[0]):
    df.drop(columns=df.columns[0], inplace=True)

    ## 3.) Rename the 'Pan' column to 5:
    df.rename(columns={'Pan':5}, inplace=True)

    ## 4.) Then transpose the dataframe to change it from sample-row order into
    ## sample-column order:
    df=df.T.copy()

    df
    ## Each column in df is now a single sample. Each row in df is a single weight observation
    ## in the sample. We'll retrieve the needed aperture data from the dataframe index.

    ## Now the dataframe is ready to go, so let's do the analysis.

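| Aperture ($\phi$) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| -2.25 | 2.74 | 1.16 | 2.16 | 3.46 | 2.61 | 3.09 | 1.94 | 3.47 | 5.09 | 2.69 | 4.66 | 1.62 | 3.44 |
| -1.0 | 2.74 | 1.81 | 1.63 | 2.66 | 2.84 | 2.35 | 2.35 | 1.99 | 1.5 | 1.31 | 1.23 | 1.27 | 1.7 |
| -0.5 | 1.01 | 1.14 | 0.98 | 1.04 | 1.27 | 1.4 | 1.18 | 1.08 | 1.08 | 1.1 | 0.89 | 1.02 | 0.84 |
| 0.0 | 1.16 | 1.22 | 1.02 | 1.31 | 1.61 | 1.58 | 1.23 | 1.34 | 1.16 | 1.35 | 1.12 | 1.41 | 0.96 |
| 0.5 | 1.3 | 1.39 | 1.28 | 1.4 | 1.76 | 1.78 | 1.54 | 1.6 | 1.48 | 1.61 | 1.54 | 1.82 | 1.27 |
| 1.0 | 1.89 | 2.2 | 2.14 | 2.07 | 2.59 | 2.55 | 2.34 | 2.46 | 2.36 | 2.35 | 2.57 | 2.95 | 2.24 |
| 1.5 | 1.97 | 2.29 | 2.35 | 1.99 | 2.45 | 2.54 | 2.26 | 3.14 | 2.44 | 2.25 | 2.54 | 3.02 | 2.23 |
| 2.0 | 3.16 | 3.74 | 3.9 | 3.29 | 4.07 | 4.1 | 4.1 | 4.06 | 3.72 | 3.71 | 3.97 | 4.82 | 3.85 |
| 2.5 | 3.92 | 4.69 | 4.86 | 4.08 | 5.19 | 5.23 | 5.2 | 4.95 | 4.82 | 4.64 | 4.81 | 5.8 | 4.97 |
| 3.0 | 3.33 | 4.02 | 3.16 | 3.33 | 4.12 | 4.11 | 4.01 | 4.21 | 3.66 | 3.71 | 3.8 | 4.6 | 4.15 |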


### Compute grain-size statistical metrics for each sample in the dataframe

Iterate through the new dataframe, column by column (sample by sample), passing each sample to SedSAS for processing. Note that SedSAS is designed to handle only a single sample at a time. The results for all samples in the input data file are collected in a new dataframe. This new dataframe can be the basis for continued analysis, or can be written out to a comma-separated values (csv) file for archiving or for use elsewhere (see "Save results to a Comma-Separated Values (csv) text file" at the end of this notebook for the details).
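
The collect-and-assemble pattern described above can be sketched without SedSAS at all. In the example below, summarize is a stand-in for the per-sample analysis (it is not a SedSAS method), and the two-sample dataframe holds made-up weights.

```python
import pandas as pd

# Hypothetical two-sample dataframe: index = aperture (phi), columns = samples.
df_demo = pd.DataFrame(
    {0: [1.0, 3.0, 4.0, 2.0], 1: [2.0, 2.0, 5.0, 1.0]},
    index=[-1.0, 0.0, 1.0, 2.0],
)


def summarize(sample_id, weights):
    """Stand-in for a per-sample analysis; returns a dictionary of results."""
    percents = 100.0 * weights / weights.sum()
    return {
        "SampleID": sample_id,
        "TotalWeight": float(weights.sum()),
        "ModalAperture_phi": float(percents.idxmax()),
        "ModalWeightPercent": round(float(percents.max()), 2),
    }


# One dictionary per sample, gathered in a list, then turned into a dataframe.
results = [summarize(f"Sample {i}", df_demo[col]) for i, col in enumerate(df_demo)]
df_results = pd.DataFrame(results)
print(df_results)
```

Because each dictionary becomes one row, any key missing from a given sample's results simply shows up as a missing value in that row.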

This script uses a convenience method, ComputeGSStats(), to generate statistics (mean, sorting, skewness, and kurtosis) using four different computational approaches: the logarithmic and geometric inclusive graphic measures of Folk and Ward (1957) and Folk (1980), and the geometric and logarithmic method of moments from Krumbein and Pettijohn (1938). A fifth approach, the arithmetic method of moments, is also implemented in SedSAS, but because it is considered much less robust and is much less commonly used, it is omitted from ComputeGSStats(). If desired, the user can alter ComputeGSStats() to include this fifth method.
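
To give a feel for what the logarithmic Folk and Ward (1957) graphic measures involve, here is a rough sketch that reads the required percentiles off the cumulative curve by linear interpolation. It is not the SedSAS implementation, whose interpolation and edge-case handling may differ; the function name and the sample weights are illustrative assumptions.

```python
import numpy as np


def folk_ward_phi(apertures_phi, weights):
    """Logarithmic Folk and Ward (1957) graphic measures (illustrative sketch)."""
    phi = np.asarray(apertures_phi, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Cumulative percent coarser than each aperture, coarsest sieve first.
    cum_pct = 100.0 * np.cumsum(w) / w.sum()

    # phi value at each required percentile, by linear interpolation.
    p = {q: float(np.interp(q, cum_pct, phi)) for q in (5, 16, 25, 50, 75, 84, 95)}

    mean = (p[16] + p[50] + p[84]) / 3.0
    sorting = (p[84] - p[16]) / 4.0 + (p[95] - p[5]) / 6.6
    skewness = ((p[16] + p[84] - 2 * p[50]) / (2 * (p[84] - p[16]))
                + (p[5] + p[95] - 2 * p[50]) / (2 * (p[95] - p[5])))
    kurtosis = (p[95] - p[5]) / (2.44 * (p[75] - p[25]))
    return {"mean": mean, "sorting": sorting,
            "skewness": skewness, "kurtosis": kurtosis}


# Made-up weights for a nine-sieve stack (no pan fraction).
demo_phi = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
demo_w = [1, 3, 8, 18, 30, 22, 11, 5, 2]
stats = folk_ward_phi(demo_phi, demo_w)
print(stats)
```

Note that np.interp clamps any percentile that falls outside the measured range, so in this sketch a coarsest fraction holding more than 5% of the total weight would leave the 5th percentile pinned to the first aperture rather than properly interpolated.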

In addition to ComputeGSStats(), the user can call the methods that implement each of the five approaches individually (four of which are built into ComputeGSStats()) to customize analyses based on user need and experimental design. See the GitHub readme for more information.
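
For comparison with the graphic measures, a logarithmic method-of-moments calculation can be sketched as weighted moments of the class midpoints in phi units. The function below is an illustration only and is not a SedSAS method; it assumes the class midpoints are supplied directly, since how SedSAS defines them is not shown here.

```python
import numpy as np


def log_moments(midpoints_phi, weights):
    """Weighted moments of a grain-size distribution in phi units (sketch)."""
    m = np.asarray(midpoints_phi, dtype=float)
    f = np.asarray(weights, dtype=float)
    f = f / f.sum()                      # weight fractions summing to 1

    mean = np.sum(f * m)
    sorting = np.sqrt(np.sum(f * (m - mean) ** 2))
    skewness = np.sum(f * (m - mean) ** 3) / sorting ** 3
    kurtosis = np.sum(f * (m - mean) ** 4) / sorting ** 4
    return {"mean": float(mean), "sorting": float(sorting),
            "skewness": float(skewness), "kurtosis": float(kurtosis)}


# Made-up symmetric distribution: the skewness should come out at zero.
demo = log_moments([0.0, 1.0, 2.0, 3.0, 4.0], [1, 2, 4, 2, 1])
print(demo)
```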

Plots (histograms and CDFs) can also be generated for each sample and captured in individual (by sample) Portable Network Graphics (PNG) files. To do this, add the following plotting line inside the for loop:

    ## create a class instance and generate statistics using the convenience method ComputeGSStats():   <-- existing line
    out_list.append( ssc.ComputeGSStats() )              # <-- existing line
    ssc.PLOTDualSampleWeightPercents(printTo='file')     # <-- add this line

In [45]:

    ## set a counter variable used to create and pass a unique identifier to SedSAS for each sample
    c=0

    out_list=[]        # a temporary Python list to hold the dictionary of results returned for each sample in df

    ## loop through the dataframe df, reading it column by column (by sample) and creating dataframe df_
    ## anew with each iteration. df_ will contain, for each iteration, an aperture column and a weight
    ## column. The latter is unique to the sample. This is passed to the SedSAS initializer.

    for sample in df:
        df_= df[sample].to_frame()    # convert the single column from a series to a dataframe called df_
        df_.columns=['Weight']        # label the single column 'Weight'
        df_['Aperture']=df_.index     # copy the dataframe index to a new column 'Aperture'
        df_.reset_index(drop=True, inplace=True)  # reset index to sequential numbers

        ID = 'Sample '+str(c)          # create an arbitrary unique identifier for the current sample
        c=c+1

        ## create a class instance and generate statistics using the convenience method ComputeGSStats().
        ## ComputeGSStats() returns results in a Python dictionary that is then appended onto out_list
        ssc = SedSAS.SedSAS(df_, ID )
        out_list.append( ssc.ComputeGSStats() )

    ## create the output dataframe and write to screen:
    df_out=pd.DataFrame(out_list).replace(np.nan, value='-')
    df_out



 Processing sample:  Sample 0
10.52 percent. This exceeds 5% of total by weight.
Values in excess of 5% can introduce significant error in some analyses.



 Processing sample:  Sample 1
multimodal sample distribution are unreliable or possibly even nonsensical.


 Processing sample:  Sample 2
8.15 percent. This exceeds 5% of total by weight.
Values in excess of 5% can introduce significant error in some analyses.

multimodal sample distribution are unreliable or possibly even nonsensical.


 Processing sample:  Sample 3
12.64 percent. This exceeds 5% of total by weight.
Values in excess of 5% can introduce significant error in some analyses.

multimodal sample distribution are unreliable or possibly even nonsensical.


 Processing sample:  Sample 4
8.29 percent. This exceeds 5% of total by weight.
Values in excess of 5% can introduce significant error in some analyses.

multimodal sample distribution are unreliable or possibly even nonsensical.


 Processing sample:  Sample 5
9.7 pe

| | FWLogKurt | FWLogKurtClass | FWLogMean | FWLogSizeClass | FWLogSkew | FWLogSkewClass | FWLogSort | FWLogSortCLass | McLogKurt | McLogKurtClass | ... | MoMLogSkew | MoMLogSkewClass | MoMLogSort | MoMLogSortCLass | PrimaryMode_mm | PrimaryMode_phi | SecondaryMode_mm | SecondaryMode_phi | TertiaryMode_mm | TertiaryMode_phi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.878 | Platykurtic | 0.911 | Coarse Sand | -0.396 | Strongly coarse-skewed | 2.055 | Very poorly sorted | - | - | ... | -0.552 | Coarse skewed | 1.964 | Poorly sorted | 0.176777 | 2.5 | - | - | - | - |
| 1 | 1.134 | Leptokurtic | 1.422 | Medium Sand | -0.361 | Strongly coarse-skewed | 1.689 | Poorly sorted | - | - | ... | -0.797 | Coarse skewed | 1.671 | Poorly sorted | 0.176777 | 2.5 | 2 | -1 | - | - |
| 2 | 1.192 | Leptokurtic | 1.251 | Medium Sand | -0.399 | Strongly coarse-skewed | 1.832 | Poorly sorted | - | - | ... | -0.823 | Coarse skewed | 1.798 | Poorly sorted | 0.176777 | 2.5 | 4.75683 | -2.25 | - | - |
| 3 | 0.85 | Platykurtic | 0.791 | Coarse Sand | -0.389 | Strongly coarse-skewed | 2.104 | Very poorly sorted | - | - | ... | -0.516 | Coarse skewed | 2.002 | Very poorly sorted | 0.176777 | 2.5 | 4.75683 | -2.25 | 0.5 | 1 |
| 4 | 0.972 | Mesokurtic | 1.046 | Medium Sand | -0.395 | Strongly coarse-skewed | 1.905 | Poorly sorted | - | - | ... | -0.651 | Coarse skewed | 1.841 | Poorly sorted | 0.176777 | 2.5 | 2 | -1 | 0.5 | 1 |
| 5 | 0.991 | Mesokurtic | 1.051 | Medium Sand | -0.393 | Strongly coarse-skewed | 1.929 | Poorly sorted | - | - | ... | -0.663 | Coarse skewed | 1.875 | Poorly sorted | 0.176777 | 2.5 | 4.75683 | -2.25 | 0.5 | 1 |
| 6 | 1.074 | Mesokurtic | 1.203 | Medium Sand | -0.407 | Strongly coarse-skewed | 1.814 | Poorly sorted | - | - | ... | -0.75 | Coarse skewed | 1.774 | Poorly sorted | 0.176777 | 2.5 | 2 | -1 | 0.5 | 1 |
| 7 | 1.043 | Mesokurtic | 1.002 | Medium Sand | -0.403 | Strongly coarse-skewed | 1.961 | Poorly sorted | - | - | ... | -0.719 | Coarse skewed | 1.895 | Poorly sorted | 0.176777 | 2.5 | 4.75683 | -2.25 | - | - |
| 8 | 0.884 | Platykurtic | 0.621 | Coarse Sand | -0.446 | Strongly coarse-skewed | 2.221 | Very poorly sorted | - | - | ... | -0.596 | Coarse skewed | 2.054 | Very poorly sorted | 4.756828 | -2.25 | 0.176777 | 2.5 | - | - |
| 9 | 1.092 | Mesokurtic | 1.176 | Medium Sand | -0.393 | Strongly coarse-skewed | 1.836 | Poorly sorted | - | - | ... | -0.772 | Coarse skewed | 1.829 | Poorly sorted | 0.176777 | 2.5 | 4.75683 | -2.25 | 0.5 | 1 |


#### Save results to a Comma-Separated Values (csv) text file:

In [46]:

    ## saves the contents of the df_out dataframe to a csv file:
    df_out.to_csv('./multi_sample_results.csv')

    ## the file is written to the current working directory. You can change this by modifying the
    ## path string in the .to_csv directive.
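
If you plan to read the results back into pandas later, a round trip along the following lines may be useful. This is only a sketch: the two-row dataframe is a cut-down stand-in for df_out, and the file name and temporary directory are placeholders. Passing index=False avoids writing the unlabeled index column, which would otherwise read back as 'Unnamed: 0', and na_values turns the '-' placeholders back into missing values.

```python
import os
import tempfile

import pandas as pd

# Cut-down stand-in for df_out, with '-' marking a missing value.
demo_out = pd.DataFrame(
    {"FWLogMean": [0.911, 0.791], "TertiaryMode_phi": ["-", 1.0]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "demo_results.csv")   # placeholder name
    demo_out.to_csv(csv_path, index=False)                  # omit the index column
    restored = pd.read_csv(csv_path, na_values=["-"])       # '-' becomes NaN

print(restored)
```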