# General Usage

*py-smps* is a general-purpose library meant for making the analysis of size-resolved aerosol data a bit easier. It is not meant to cover all possible instruments or use cases (at least not yet!). This guide should give you an overview of the capabilities of the software, but it is not comprehensive. If you have any questions, please read through the API documentation or post to the discussions on the GitHub repository.

## Importing Data

The data format differs between sensors and analyzers, depending on the manufacturer's specifications. There are a few helper functions for common sensors (e.g., the SMPS from TSI), but in general the data need to meet the following requirements:

  * the raw data should be a DataFrame (`pd.DataFrame`). If you're unfamiliar with DataFrames in Python, it may be a good idea to read up on the `pandas` library before moving on
  * the index of the DataFrame should be a time series
  * there should be a unique column for every particle size bin
  
So long as the above requirements are met, you should be able to analyze any sensor or instrument!
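
As a rough illustration of the three requirements above, the sketch below builds a small DataFrame by hand and checks it. The column names, the values, and the `check_input` helper are all hypothetical and are not part of *py-smps*; they only show what a conforming input might look like.

```python
import pandas as pd

# Hypothetical example data: three 1-minute records and three size bins.
# The column names ("bin0", "bin1", ...) and values are made up for illustration.
index = pd.date_range("2022-01-01 00:00", periods=3, freq="1min", name="timestamp")
data = pd.DataFrame(
    {
        "bin0": [120.0, 115.0, 130.0],
        "bin1": [40.0, 42.0, 39.0],
        "bin2": [5.0, 4.0, 6.0],
        "temperature": [21.5, 21.6, 21.4],  # an extra (non-bin) column
    },
    index=index,
)


def check_input(df, bin_columns):
    """Return a list of problems found; an empty list means all checks passed."""
    if not isinstance(df, pd.DataFrame):
        return ["data is not a pandas DataFrame"]
    problems = []
    if not isinstance(df.index, pd.DatetimeIndex):
        problems.append("index is not a DatetimeIndex")
    missing = [c for c in bin_columns if c not in df.columns]
    if missing:
        problems.append(f"missing bin columns: {missing}")
    if len(set(bin_columns)) != len(bin_columns):
        problems.append("bin columns are not unique")
    return problems


print(check_input(data, ["bin0", "bin1", "bin2"]))
```

A check like this can be handy before handing data from an unsupported instrument to the library, since it catches a missing datetime index early.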

In [None]:
import warnings
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl
import smps

sns.set("notebook", style="ticks", font_scale=1.25, palette='colorblind')
smps.set()

warnings.simplefilter(action='ignore', category=FutureWarning)

%matplotlib inline

print(smps.__version__)

To get started, we need data! Here, we use the `smps.io.load_sample` function to import a sample SMPS data set:

In [None]:
# Load the 'boston' example
bos = smps.io.load_sample("boston")

If you are trying to analyze data from an SMPS, there is a file loader function available (`smps.io.load_file`); however, each version of the TSI AIM software is different, making it incredibly difficult to make a general-purpose loader. If this function doesn't work for your version of the AIM output, please raise an issue in the GitHub repository and attach your file so we can come up with a solution.

If your data is simply a CSV file, you can use native pandas to load it and ensure it's in the correct format. There are also some sample data files located in the `tests` directory if needed.

In [None]:
# Load a sample data set
df = pd.read_csv("https://raw.githubusercontent.com/quant-aq/py-smps/master/tests/datafiles/MOD-PM-SAMPLE.csv")

# Convert the timestamp column to datetime objects
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Set the index to be a timestamp
df.set_index("timestamp", inplace=True)

df.info()

In the data above, from a QuantAQ MODULAIR-PM sensor, you will see a number of columns labeled in the format "bin<x>". These hold the particle concentrations in each size range and are the raw data we need. You will notice there are quite a few other columns; this is totally fine! These will all be treated as 'meta' columns by the software.
    
Now that we have the data in the proper format, we can go ahead and initiate the `GenericParticleSizer` class.

## The `GenericParticleSizer` Object

The heart of the *py-smps* program is the `GenericParticleSizer` object. The `GenericParticleSizer` is the base class for all available particle sizing instruments. It contains all of the basic functionality and methods used for making calculations and/or figures. To initialize the object, you must provide the `data` and `bins` arguments. `data` must be a pandas DataFrame with a datetime index, and `bins` must be a 3xn array including the left boundary, midpoint, and right boundary for each size bin. **NOTE: There is a helper function to make the 3xn array from a list of endpoints (see `smps.utils.make_bins`)**.
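
To make the bin layout concrete, here is a minimal sketch that turns a list of bin edges into one (left boundary, midpoint, right boundary) triple per bin. It is independent of `smps.utils.make_bins`, whose signature is not shown in this guide. The `bins_from_edges` helper and the edge values are hypothetical, and using the arithmetic mean as the midpoint is an assumption (the geometric mean is another common choice).

```python
# Hypothetical bin edges in micrometers (4 edges -> 3 bins).
edges = [0.35, 0.46, 0.66, 1.0]


def bins_from_edges(edges):
    """Return one [left, midpoint, right] row per bin (arithmetic-mean midpoint)."""
    rows = []
    for left, right in zip(edges[:-1], edges[1:]):
        if right <= left:
            raise ValueError("edges must be strictly increasing")
        rows.append([left, (left + right) / 2, right])
    return rows


for row in bins_from_edges(edges):
    print(row)
```

Check the API documentation for the exact array orientation the library expects before passing a hand-built array as `bins`.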

There are several additional classes that inherit directly from the `GenericParticleSizer` pertaining to individual products. As of February 2022, these include:

  * `SMPS`
  * `Grimm11D`
  * `POPS`
  * `ParticlesPlus`
  * `AlphasenseOPCN2`
  * `AlphasenseOPCN3`
  * `Modulair`
  * `ModulairPM`

We will use the MODULAIR-PM data we loaded above to walk through some of the functionality of the `GenericParticleSizer` class. We will begin by exploring some of the model's attributes before moving on to the methods.

### Attributes

First, we will initialize the object:

In [None]:
obj = smps.models.ModulairPM(data=df)

To start, you can access the bins for the device using the `bins` attribute:

In [None]:
obj.bins

You can access just the midpoints using the `midpoints` attribute:

In [None]:
obj.midpoints

To access any metadata or additional columns that are not part of the raw data, use the `scan_stats` attribute:

In [None]:
obj.scan_stats.head()

Finally, there are a number of dataframes available that should provide access to all sorts of fun data. These include:

  * `dn` - particle number concentration by bin
  * `ds` - surface area by bin
  * `dv` - volume by bin
  * `dndlogdp` - normalized number by bin
  * `dsdlogdp` - normalized surface area by bin
  * `dvdlogdp` - normalized volume by bin
  
Here, we show the number concentration by bin as an example:

In [None]:
obj.dn.head()
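
For readers new to these quantities, the sketch below shows how the per-bin tables conventionally relate to one another for a single bin, assuming spherical particles that all sit at the bin midpoint. These are the standard textbook relationships and not necessarily the library's exact implementation. The function names are hypothetical.

```python
import math


def normalize_by_log_width(dn, left, right):
    """dN/dlogDp: the bin concentration divided by the bin's log10 width."""
    return dn / math.log10(right / left)


def surface_area(dn, dp):
    """Total surface area of dn spheres of diameter dp (pi * Dp^2 each)."""
    return math.pi * dp ** 2 * dn


def volume(dn, dp):
    """Total volume of dn spheres of diameter dp (pi/6 * Dp^3 each)."""
    return math.pi / 6 * dp ** 3 * dn


# Hypothetical bin: 50 particles/cm3 between 1 and 10 um, midpoint 5.5 um
print(normalize_by_log_width(50.0, 1.0, 10.0))
print(surface_area(50.0, 5.5))  # um^2/cm^3
print(volume(50.0, 5.5))        # um^3/cm^3
```

The same normalization by log width applied to the surface area or volume gives the `dsdlogdp` and `dvdlogdp` equivalents.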

### Methods

There are several primary methods available under the `GenericParticleSizer`, including:

  * `copy` - create a copy of the existing model
  * `resample` - resample the data to be on a different time basis
  * `slice` - slice the data between specific start and stop times
  * `stats` - calculate the total number of particles, surface area, volume, and mass
  * `dump` - save a copy of the model to file
  * `integrate` - calculate the total number, surface area, volume, or mass of particles between two diameters

Above, our data is on a 1-minute time base. What if we want it on a 15-minute time base? No problem! Use `resample`:

In [None]:
obj.resample("15min", inplace=True)

# Show the results
obj.dn.head()
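
To see what moving to a coarser time base means, here is a plain-pandas sketch. It assumes that resampling averages the records within each window; that is an assumption about `resample`, not something this guide states, so check the API documentation if the aggregation matters for your analysis. The data are made up.

```python
import pandas as pd

# Hypothetical 1-minute data: 30 records in a single bin, with values 0..29
index = pd.date_range("2022-01-01 00:00", periods=30, freq="1min")
counts = pd.DataFrame({"bin0": [float(i) for i in range(30)]}, index=index)

# Average the records that fall in each 15-minute window
resampled = counts.resample("15min").mean()

print(resampled)
```

Thirty 1-minute records collapse into two 15-minute rows, each holding the mean of its window.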

What if we want to count the total number of particles between 0 µm and 1 µm?

In [None]:
obj.integrate(weight='number', dmin=0., dmax=1.)

What about computing PM2.5?

In [None]:
obj.integrate(weight='mass', dmin=0., dmax=2.5, rho=1.65)
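
The role of `rho` can be illustrated with the usual number-to-mass conversion for one bin. The sketch assumes spherical particles at the bin midpoint, diameters in µm, number concentrations in particles per cm³, and density in g/cm³. It is a textbook calculation, not the library's code, and `bin_mass` is a hypothetical name.

```python
import math


def bin_mass(dn, dp, rho):
    """Mass concentration in ug/m3 for dn particles/cm3 of diameter dp (um), density rho (g/cm3)."""
    volume = math.pi / 6 * dp ** 3 * dn  # um^3 of particle per cm^3 of air
    # um^3 -> cm^3 is 1e-12; g -> ug and cm^-3 -> m^-3 together are 1e12; the factors cancel
    return rho * volume


# Hypothetical bin: 100 particles/cm3 at 1 um with the density used above
print(bin_mass(100.0, 1.0, 1.65))
```

Summing this quantity over all bins below 2.5 µm gives an estimate of PM2.5 under the stated assumptions.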

The `integrate` method can also be incredibly useful if you are trying to calculate the total number of particles between two arbitrary diameters - say, you're trying to compare a low-cost optical particle counter with large bins to an SMPS with small bins. No problem!

In [None]:
obj.integrate(weight='number', dmin=0.54, dmax=1.05)
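
When `dmin` or `dmax` falls inside a bin, only part of that bin should count. One common way to handle this, sketched below, is to assume particles are spread uniformly in log-diameter within each bin and weight the bin by its fractional overlap. Whether `integrate` uses exactly this scheme is an assumption; the function names and data are hypothetical.

```python
import math


def overlap_fraction(left, right, dmin, dmax):
    """Fraction of a bin's log10 width that lies inside [dmin, dmax]."""
    lo = max(left, dmin)
    hi = min(right, dmax)
    if hi <= lo:
        return 0.0
    return math.log10(hi / lo) / math.log10(right / left)


def integrate_number(bins, dn, dmin, dmax):
    """Sum dn over bins, weighting each bin by its overlap with [dmin, dmax]."""
    return sum(
        n * overlap_fraction(left, right, dmin, dmax)
        for (left, _mid, right), n in zip(bins, dn)
    )


# Hypothetical bins as (left, midpoint, right) in um, with counts per bin
bins = [(0.1, 0.55, 1.0), (1.0, 5.5, 10.0)]
dn = [10.0, 20.0]

print(integrate_number(bins, dn, 0.0, 1.0))
```

Because the lower limit is clipped to each bin's left edge, passing `dmin=0` is safe even though the calculation is done in log space.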

Next, let's go ahead and compute the statistics! The `stats` method will compute the total number of particles, total surface area, total volume, total mass, arithmetic mean diameter, geometric mean diameter, mode diameter, and geometric standard deviation. **NOTE: This computation can take a little while if your data set is extremely large.**

In [None]:
obj.stats(weight='number')

You can also weight it differently if you so choose! If you want the volume-weighted geometric mean diameter:

In [None]:
obj.stats(weight='volume')
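
For reference, the geometric mean diameter and geometric standard deviation of a binned distribution are conventionally computed from the log of the bin midpoints, weighted by the quantity in each bin. The sketch below uses the population form (dividing by the total weight); some texts divide by the total minus one instead, and the library's exact convention is not stated in this guide. The function name is hypothetical.

```python
import math


def geometric_mean_and_gsd(midpoints, weights):
    """Weighted geometric mean diameter and geometric standard deviation."""
    total = sum(weights)
    log_gm = sum(w * math.log(d) for d, w in zip(midpoints, weights)) / total
    variance = sum(
        w * (math.log(d) - log_gm) ** 2 for d, w in zip(midpoints, weights)
    ) / total
    return math.exp(log_gm), math.exp(math.sqrt(variance))


# Hypothetical two-bin distribution with equal number in each bin
gm, gsd = geometric_mean_and_gsd([1.0, 100.0], [1.0, 1.0])
print(gm, gsd)
```

Passing per-bin volumes rather than number concentrations as `weights` gives the volume-weighted versions of both statistics.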

Finally, if you want to save your model so that you don't have to re-do your work later, you can use the `dump` method:

In [None]:
obj.dump("obj-modulair-pm.sav")

## Visualization

Making common figures is easy with `py-smps`. There are two primary, out-of-the-box figures that can be made:

  * `smps.plots.histplot`
  * `smps.plots.heatmap`
  
Both plots are made using `matplotlib`, so you can easily modify them or create your own.

### Heatmap

The heatmap function makes it easy to visualize how the particle size distribution is changing over time, allowing you to observe growth/nucleation events, etc. To use the `heatmap` function, you must provide three arguments:

  * `X`: the time axis
  * `Y`: the bin midpoints
  * `Z`: the data you wish to plot, typically `obj.dndlogdp`
  
You may not agree with the default colormap choice (`viridis`), but you can easily change that as you see fit. Please don't use `jet`!

In [None]:
X = bos.dndlogdp.index
Y = bos.midpoints
Z = bos.dndlogdp.T.values

ax = smps.plots.heatmap(
    X, Y, Z,
    cmap='viridis',
    fig_kws=dict(figsize=(14, 6))
)

# Let's make the x-axis look a little nicer
import matplotlib.dates as dates

ax.xaxis.set_minor_locator(dates.HourLocator(byhour=[0, 6, 12, 18]))
ax.xaxis.set_major_formatter(dates.DateFormatter("%d\n%b\n%Y"))

# Go ahead and change things!
ax.set_title("Cambridge, MA Wintertime SMPS Data", y=1.02, fontsize=20);

### Particle Size Distribution

To visualize the particle size distribution, use the `smps.plots.histplot` function. To plot the histogram, you must provide two pieces of information:

  1. `histogram` - your histogram data! It can be provided as an array or as a DataFrame, in which case it will be averaged over time
  2. `bins` - you must provide an array of the bins
  
There are plenty of ways to customize these plots by providing keyword arguments for the matplotlib bar chart (`plot_kws`) or the figure itself (`fig_kws`). You can also plot onto an existing axis by providing the axis as an argument.

Here, we will make a simple plot showing the Boston dataset from the beginning of this tutorial (it's more exciting than a low-cost OPC!):

In [None]:
ax = smps.plots.histplot(
    bos.dndlogdp, 
    bos.bins, 
    plot_kws=dict(linewidth=0.01),
    fig_kws=dict(figsize=(12, 6))
)

# Set the title and y-axis label
ax.set_title("Cambridge, MA Wintertime Particle Size Distribution", y=1.05, fontsize=20)
ax.set_ylabel(r"$dN/dlogD_p \; [cm^{-3}]$")

# Remove the right and top spines
sns.despine()

Next, let's plot the same data, but plot the particle size distribution for each day separately.

In [None]:
import itertools

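# Named `days` so the `matplotlib.dates` module imported earlier as `dates` is not shadowed
days = ["2016-11-23", "2016-11-24", "2016-11-25"]

ax = None
cp = itertools.cycle(sns.color_palette())

for day in days:
    # Select rows with .loc; recent pandas treats df["2016-11-23"] as a column lookup
    ax = smps.plots.histplot(
        bos.dndlogdp.loc[day],
        bos.bins,
        ax=ax,
        plot_kws=dict(alpha=0.6, color=next(cp), linewidth=0.),
        fig_kws=dict(figsize=(12, 6))
    )

# Add a legend
ax.legend(days, loc='best')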

# Set the axis label
ax.set_ylabel(r"$dN/dlogD_p \; [cm^{-3}]$")

# Remove the right and top spines
sns.despine()

That about covers the general overview! Feel free to check out the guide on curve fitting next.