# Using the GrainSizeTools script through Jupyter notebook: a step-by-step example

> IMPORTANT NOTE: This Jupyter notebook example only applies to GrainSizeTools v3.0+. Please check your script version before using this notebook. You will be able to reproduce all the results shown in this tutorial using the dataset provided with the script, the file ``data_set.txt``.

## Running the script

The first step when working with Jupyter notebooks is to run the script. For this we use the following code snippet:

In [1]:
# Run the script (change the path to GrainSizeTools_script.py!)
%run C:/Users/marco/Documents/GitHub/GrainSizeTools/grain_size_tools/GrainSizeTools_script.py

module plot imported
module averages imported
module stereology imported
module piezometers imported
module template imported

Welcome to GrainSizeTools script v3.0beta1
GrainSizeTools is a free open-source cross-platform script to visualize and characterize
the grain size in polycrystalline materials and estimate differential stress via
paleopizometers.

Get a list of the main methods using: get.function_list()



Note that for the code above to work you must replace the path with the location of the file ``GrainSizeTools_script.py`` on your system. If the script ran correctly, you will see that all GrainSizeTools (GST) modules have been imported, followed by a welcome message.
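
Before running the script, it can help to check that the path really points to the file. The snippet below is a minimal sketch using only the standard library; the path is a hypothetical placeholder, not a real location, and must be replaced with your own.

```python
from pathlib import Path

# Hypothetical location: replace it with the folder where you keep GrainSizeTools
script_path = Path("C:/Users/your_name/Documents/GrainSizeTools/grain_size_tools/GrainSizeTools_script.py")

# is_file() is True only if the path exists and points to a regular file
if script_path.is_file():
    message = f"Script found: {script_path}"
else:
    message = f"Script not found, check the path: {script_path}"
print(message)
```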

## Import the example dataset and create a toy dataset

The second step is to import the file ``data_set.txt`` that comes with the GST script and contains an example dataset to interact with the script. You can use your own dataset if you prefer.

In [2]:
# we will use the pandas method read_csv and store the dataset in a variable named dataset
dataset = pd.read_csv('DATA/data_set.txt', sep='\t')
dataset

FileNotFoundError: [Errno 2] File DATA/data_set.txt does not exist: 'DATA/data_set.txt'

Some important things to note about the code snippet used above are:
- We used the ``read_csv`` method to import the file. By default, this method uses a comma as the separator (csv stands for comma-separated values). Since the imported file is a tab-separated text file, we must set ``'\t'`` as the separator.
- We didn't have to define the full path to the file *data_set.txt*. This is because when we run the script, the working directory is the one that contains the script. Since the file *data_set.txt* is within this directory, specifically within the "DATA" folder, a relative path is enough. If you get a ``FileNotFoundError`` such as the one shown above, the relative path could not be resolved from the current working directory and you should define the full path instead.
- Calling the variable ``dataset`` returns a view of the imported dataset, which is a table with 2661 entries and 11 columns containing different grain properties.
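
The effect of the separator can be seen in a small self-contained sketch. It reads made-up tab-separated text from memory rather than from the real *data_set.txt*, so the column names and values are assumptions used only for illustration.

```python
import io
import pandas as pd

# Hypothetical tab-separated content standing in for a file such as data_set.txt
raw_text = "Area\tPerim.\n157.25\t68.1\n2059.75\t198.2\n1961.5\t190.3\n"

# With the default separator (comma) every line is read as a single column
wrong = pd.read_csv(io.StringIO(raw_text))

# Setting sep='\t' splits the line into the intended columns
right = pd.read_csv(io.StringIO(raw_text), sep='\t')

print(wrong.shape)             # rows and columns with the wrong separator
print(right.shape)             # rows and columns with the tab separator
print(right.columns.tolist())  # column names recovered from the header
```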

In Python, these tabular objects are called (pandas) DataFrames and allow flexible and easy-to-use data analysis. Just to check:

In [None]:
type(dataset)  # show the variable type

If you want to view or interact with one of the columns you can do it as follows:

In [None]:
dataset['Area']  # view the column 'Area'

In [None]:
dataset.head()  # show the first rows of the dataset
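
The same operations can be tried on a tiny table built by hand. This is only a sketch: the DataFrame ``small`` and its values are hypothetical and unrelated to the example dataset.

```python
import pandas as pd

# Hypothetical miniature table; the values are made up for illustration
small = pd.DataFrame({
    "Area": [10.0, 25.0, 40.0, 55.0],
    "Circ.": [0.80, 0.75, 0.90, 0.60],
})

print(type(small))          # the table itself is a DataFrame
print(type(small["Area"]))  # a single column is a Series
print(small.shape)          # (number of rows, number of columns)
print(small.head(2))        # first two rows only
```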

As you can see above, our dataset does not contain any column with the grain diameters, so we have to estimate them. To do so, we are going to estimate the apparent diameters from the section areas using the equivalent circular diameter (ECD) formula, which is

$ECD = 2 \cdot \sqrt{areas / \pi}$
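
As a quick sanity check of the formula, a circle of diameter 10 has an area of $\pi \cdot 5^2$, so its ECD must be 10. The sketch below wraps the formula in a helper named ``equivalent_circular_diameter``, which is a hypothetical name chosen for this illustration and not part of the script.

```python
import numpy as np

def equivalent_circular_diameter(areas):
    """Return the diameter of the circle with the same area as each section."""
    return 2 * np.sqrt(np.asarray(areas, dtype=float) / np.pi)

# A circle of diameter 10 has area pi * 5**2, so its ECD should be 10
circle_area = np.pi * 5**2
print(equivalent_circular_diameter(circle_area))

# The function also works element-wise on several areas at once
print(equivalent_circular_diameter([np.pi, 4 * np.pi, 100.0]))
```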

In [None]:
# Estimate the ECDs and store them in a column named 'diameters'
dataset['diameters'] = 2 * np.sqrt(dataset['Area'] / np.pi)
dataset.head(10)

You can see a new column named ``diameters``. In addition, to play with the script we will create a synthetic lognormal dataset of size 500 with a scale (geometric mean) equal to 20 and a lognormal shape (multiplicative standard deviation, MSD) of 1.5.

In [None]:
scale = np.log(20)  # log of the target geometric mean (20)
shape = np.log(1.5)  # log of the target lognormal shape, MSD (1.5)

# generate a random lognormal population of size 500
np.random.seed(seed=1)  # fix the seed to always generate the same population
toy_dataset = np.random.lognormal(mean=scale, sigma=shape, size=500)
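
The reason for passing logarithms is that ``np.random.lognormal`` takes the mean and standard deviation of the underlying normal distribution, that is, of the log-transformed values. A minimal sketch of how to check this, independent of the script, is to back-transform the mean and standard deviation of the logs. The variable names below are chosen for this illustration only.

```python
import numpy as np

log_mean = np.log(20)   # mean of the log-transformed values
log_std = np.log(1.5)   # standard deviation of the log-transformed values

np.random.seed(seed=1)
sample = np.random.lognormal(mean=log_mean, sigma=log_std, size=500)

# Back-transform the statistics of the logs to recover the input parameters
log_sample = np.log(sample)
geometric_mean = np.exp(log_sample.mean())
msd = np.exp(log_sample.std(ddof=1))

print(f"geometric mean = {geometric_mean:.2f}")  # should be close to 20
print(f"MSD = {msd:.2f}")                        # should be close to 1.5
```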

## Getting a description of the grain size population

We are going to describe the properties of the grain size population. First, we are going to visualize the list of main functions.

In [None]:
get.functions_list()

So the method for getting the properties of the data population is named ``summarize``. To use it, we call ``summarize()`` and pass the dataset. Let's do it with the synthetic data first:

In [None]:
summarize(toy_dataset)

The ``summarize`` method returns different central tendency estimators (averages), the confidence intervals, and several distribution features such as dispersion and shape. It also shows a Shapiro-Wilk warning telling us that our distribution is not normally distributed, which is to be expected since we know that this is a lognormal distribution. Note that the geometric mean and the lognormal shape are very close to the values used to generate the synthetic dataset. Let's do the same using the dataset that comes from a real rock. For this we have to pass the column with the diameters:

In [None]:
summarize(dataset['diameters'])
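
The Shapiro-Wilk warning mentioned above for the synthetic population can be explored outside the script. The sketch below assumes SciPy is installed and uses ``scipy.stats.shapiro`` on the raw values and on their logarithms. It is not necessarily how ``summarize`` performs the check, and the 0.05 threshold is an assumption chosen for illustration.

```python
import numpy as np
from scipy import stats

np.random.seed(seed=1)
sample = np.random.lognormal(mean=np.log(20), sigma=np.log(1.5), size=500)

# Shapiro-Wilk test on the raw values and on their logarithms
w_raw, p_raw = stats.shapiro(sample)
w_log, p_log = stats.shapiro(np.log(sample))

alpha = 0.05  # assumed significance level
print(f"raw data: W = {w_raw:.3f}, p = {p_raw:.3g}, normality rejected: {p_raw < alpha}")
print(f"log data: W = {w_log:.3f}, p = {p_log:.3g}, normality rejected: {p_log < alpha}")
```

A small p-value for the raw data means normality is rejected, which is what one expects for a skewed lognormal sample.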

We will also visualize the grain size population using the ``plot.distribution()`` function of the plot module:

In [None]:
plot.distribution(dataset['diameters'])  # this is just to show the plot in the notebook

In [None]:
# If you want to save the plot to your hard disk
plt.savefig("test_distribution.png", dpi=150)
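
Note that ``plt.savefig`` acts on the current figure. With the default inline backend, notebooks usually close figures at the end of each cell, so saving from a later cell may produce an empty image. A more robust pattern is to keep a handle to the figure and call its ``savefig`` method. The sketch below shows the idea with a plain Matplotlib histogram; it makes no assumption about what ``plot.distribution`` returns, and the output location is a hypothetical temporary folder.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(seed=1)
sample = np.random.lognormal(mean=np.log(20), sigma=np.log(1.5), size=500)

# Keep an explicit handle to the figure instead of relying on the current one
fig, ax = plt.subplots()
ax.hist(sample, bins="auto", density=True)
ax.set_xlabel("apparent diameter")
ax.set_ylabel("density")

# Hypothetical output location: the system temporary folder
out_path = os.path.join(tempfile.gettempdir(), "test_distribution.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
print(out_path)
```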