# Displacement Correction for Sensor Compliance

## Last Updated: December 19, 2021

This document was prepared by Leah Ginsberg, a member of the [Ravichandran Research Group](https://www.ravi.caltech.edu/) at [Caltech](http://www.caltech.edu), in collaboration with [Professor Eleftheria Roumeli](https://sites.google.com/uw.edu/roumeli-research-group/) from the [University of Washington](https://www.washington.edu/).

<img src="caltech_uw_logo.png">

*These instructions were generated from a Jupyter notebook.  You can download the notebook [here](microcompress_corr_displ.ipynb).*

In [1]:
import numpy as np               # general math operations
import pandas as pd              # pandas dataframes
import os                        # for opening .csv files in different folders
import altair as alt             # pretty plotting
import altair_catplot as altcat  # pretty box & whisker plots
from scipy import signal, stats, fftpack  # filtering and statistics
import matplotlib.pyplot as plt  # plotting histogram

alt.data_transformers.enable('data_server') # uses a background server for Altair plots (helps with performance for large data sets)

DataTransformerRegistry.enable('data_server')

In this notebook, I will detail the preprocessing of force-indentation data from micro-compression tests on *Nicotiana tabacum* (BY2) cells in various drug treatments. I will explain each step in detail, including the use of the very handy Python package `pandas`. This package allows us to use Pandas `DataFrames`, which are convenient structures for storing data.

The data preprocessing steps executed in this notebook are as follows:
1. Import the raw data from microcompression experiments ($Z$, $F$)
2. Calculate the sensor stiffness from filtered calibration data ($S$)
3. Calculate the corrected displacement
4. Export corrected displacement data for contact finding

### 1. Import the raw data from microcompression experiments ($Z$, $F$)

In this step, we will load data stored on disk into a Python data structure. We will use `pandas` to read in CSV (comma-separated value) files and store the results in a Pandas `DataFrame`. First, we need to organize the data according to the treatment each cell received. If you downloaded our data directly from GitHub, the files should already be organized in folders named for the drug treatment the cell received, and we can use that to organize our data. You'll probably need to change the first line to specify the path where the files are stored on your local computer.

In [2]:
path = 'microcompress_data'
dir_list = os.listdir(path) # returns a list of strings with the names of all entries in the specified path

# folder names for each drug treatment
sorbitol_folderName = path + '/BY2_in_sorbitol/'
C2_folderName = path + '/BY2_in_C2/'
ory_folderName = path + '/BY2_in_oryzalin/'
LatB_folderName = path + '/BY2_in_LatB/'
ory_sorbitol_folderName = path + '/BY2_in_oryzalin_sorbitol/'
LatB_sorbitol_folderName = path + '/BY2_in_LatB_sorbitol/'
Ref_folderName = path + '/Refs/'

# combine all the folder names into one list
folderNames = [C2_folderName, ory_folderName, LatB_folderName, sorbitol_folderName, ory_sorbitol_folderName, LatB_sorbitol_folderName, Ref_folderName]

# print everything to make sure it worked
print(folderNames)

['microcompress_data/BY2_in_C2/', 'microcompress_data/BY2_in_oryzalin/', 'microcompress_data/BY2_in_LatB/', 'microcompress_data/BY2_in_sorbitol/', 'microcompress_data/BY2_in_oryzalin_sorbitol/', 'microcompress_data/BY2_in_LatB_sorbitol/', 'microcompress_data/Refs/']


Next, we need to determine which files are calibration data, which are sample data, and which are neither (images).
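
The naming rule used in the next cell can be tried on its own first. This is a minimal sketch: the helper `classify_file` and the example names are hypothetical, and only the rule itself (prefix `ref` or `cell`, no `.` and no `_` in the name) comes from this notebook.

```python
def classify_file(file_name):
    """Classify a file name as 'calibration', 'sample', or 'other'."""
    # Raw data files have no extension and no underscore in their names
    is_raw_data = '.' not in file_name and '_' not in file_name
    if file_name.startswith('ref') and is_raw_data:
        return 'calibration'
    if file_name.startswith('cell') and is_raw_data:
        return 'sample'
    return 'other'

# Made-up names, for illustration only
example_names = ['ref1', 'cell001', 'cell001.png', 'cell001_corr', 'notes.txt']
for name in example_names:
    print(name, '->', classify_file(name))
```

Note that the underscore check also excludes names ending in `_corr`, which is the suffix given to the files written at the end of this notebook, so rerunning the notebook should not pick up its own output as sample data.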

In [3]:
# Get names of all files to be used as calibration data or sample data
sample_fileNames = []
calib_fileNames = []
for folderName in folderNames:
    dir_list = os.listdir(folderName)
    for fileName in dir_list:
        if fileName.startswith('ref') and not ('.' in fileName) and not ('_' in fileName):
            calib_fileNames.append(folderName + fileName)
        elif fileName.startswith('cell') and not ('.' in fileName) and not ('_' in fileName):
            sample_fileNames.append(folderName + fileName)
            
# Check how many files it found
len(calib_fileNames), len(sample_fileNames)

(16, 98)

If all went right, you should have 16 entries in `calib_fileNames` and 98 entries in `sample_fileNames`. Now we can use `pd.read_csv()` to load the data set.  The data are stored in a **DataFrame**, which is one of the data types that makes `pandas` so convenient for use in data analysis.  `DataFrame`s offer mixed data types, including incomplete columns, and convenient slicing, among many, many other convenient features. They are like spreadsheets, only a lot better.
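
To get a feel for `pd.read_csv()` before applying it to the real files, here is a small sketch that reads whitespace-delimited text from a string. The header lines, column subset, and values are made up for illustration; the real files have their own header layout, which the cells below handle with the `header` argument.

```python
import io
import pandas as pd

# Hypothetical file contents: two header lines followed by whitespace-delimited data
raw_text = """hypothetical header line 1
hypothetical header line 2
1.001   0.000   -9.5
1.001   0.108   -8.8
2.001   0.209   -9.0
"""

df_demo = pd.read_csv(
    io.StringIO(raw_text),
    skiprows=2,      # skip the two header lines
    header=None,     # the remaining text has no header row
    sep=r'\s+',      # one or more whitespace characters between columns
    names=['Index', 'Time (s)', 'Force A (uN)'],
)

# Add a Boolean column derived from an existing column
df_demo['indenting'] = df_demo['Index'] == 1.001
print(df_demo)
```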

Let's load the calibration data into one dataframe, and the sample data into another.

In [4]:
# Organize dataframe by column with additional information about experiment not found in .csv file (date, day of growth, etc.)
df_calib = pd.DataFrame(columns=['Index','Time (s)','Displacement (um)','Pos X (um)','Pos Y (um)','Pos Z (um)','Force A (uN)','Force B (uN)','fileName','indenting'])

# loop through all calibration data
for i, fileName in enumerate(calib_fileNames):
    # read data from file to dataframe
    df = pd.read_csv(fileName, header=3, sep=r'\s+', names=['Index', 'Time (s)', 'Displacement (um)', 'Pos X (um)', 'Pos Y (um)', 'Pos Z (um)', 'Force A (uN)', 'Force B (uN)'])

    df['fileName']=fileName
    df['indenting']=df['Index']==1.001 # in data files, Index 1.001 indicates indenting data, and Index 2.001 indicates retraction data

    # concatenate all calibration dataframes into one dataframe
    df_calib = pd.concat([df_calib, df])

# Take a look at the dataframe we created
df_calib.head()

Unnamed: 0,Index,Time (s),Displacement (um),Pos X (um),Pos Y (um),Pos Z (um),Force A (uN),Force B (uN),fileName,indenting
0,1.001,0.0,-0.00025,-325.623,-458.0295,-4.81675,-9.51622,0.00301,microcompress_data/Refs/ref1,True
1,1.001,0.108,0.03925,-325.623,-458.0295,-4.85625,-8.83212,-0.00463,microcompress_data/Refs/ref1,True
2,1.001,0.209,0.063,-325.623,-458.02975,-4.88,-9.00553,0.00142,microcompress_data/Refs/ref1,True
3,1.001,0.31,0.0765,-325.623,-458.03,-4.8935,-8.93871,-0.00686,microcompress_data/Refs/ref1,True
4,1.001,0.409,0.09125,-325.623,-458.03,-4.90825,-9.00394,-0.00956,microcompress_data/Refs/ref1,True


In [5]:
# Organize dataframe by column with additional information about experiment not found in .csv file (date, day of growth, etc.)
df_sample = pd.DataFrame(columns=['Index','Time (s)','Displacement (um)','Pos X (um)','Pos Y (um)','Pos Z (um)','Force A (uN)','Force B (uN)','fileName','indenting'])

# loop through all sample data
for j, fileName in enumerate(sample_fileNames):
    # read data from file to dataframe
    df = pd.read_csv(fileName, header=4, sep=r'\s+', names=['Index', 'Time (s)', 'Displacement (um)', 'Pos X (um)', 'Pos Y (um)', 'Pos Z (um)', 'Force A (uN)', 'Force B (uN)'])

    df['fileName']=fileName
    df['indenting']=df['Index']==1.001 # in data files, Index 1.001 indicates indenting data, and Index 2.001 indicates retraction data

    # Take information about experiment from filename and put into dataframe
    if 'sorbitol' in fileName:
        df['plasmolyzed'] = True
    else:
        df['plasmolyzed'] = False
    
    if 'LatB' in fileName:
        df['treatment'] = 'LatB'
    elif 'oryzalin' in fileName:
        df['treatment'] = 'oryzalin'
    else:
        df['treatment'] = 'C2'

    # concatenate all sample dataframes into one dataframe
    df_sample = pd.concat([df_sample, df])

# Take a look at the dataframe we created
df_sample.head()

Unnamed: 0,Index,Time (s),Displacement (um),Pos X (um),Pos Y (um),Pos Z (um),Force A (uN),Force B (uN),fileName,indenting,plasmolyzed,treatment
0,1.001,0.123,0.20325,184.00625,-106.001,49.77075,-10.5409,-0.00129,microcompress_data/BY2_in_C2/cell001,True,False,C2
1,1.001,0.228,0.275,184.007,-106.00125,49.699,-10.88703,-0.00336,microcompress_data/BY2_in_C2/cell001,True,False,C2
2,1.001,0.346,0.30325,184.007,-106.00075,49.67075,-10.78654,-0.00192,microcompress_data/BY2_in_C2/cell001,True,False,C2
3,1.001,0.466,0.32,184.00625,-106.001,49.654,-10.74826,0.00253,microcompress_data/BY2_in_C2/cell001,True,False,C2
4,1.001,0.595,0.34275,184.006,-106.00075,49.63125,-10.73709,0.00397,microcompress_data/BY2_in_C2/cell001,True,False,C2


This data is in a format that is known as **tidy**. See this [paper](https://www.jstatsoft.org/article/view/v059i10) by Hadley Wickham for a detailed discussion of tidying data. The basic structure of tidy data is:
1. Each variable is a column
2. Each observation is a row
3. Each type of observational unit is its own table

With our data set up in this way, we can easily find data, say, with a specific treatment using Boolean slicing.
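
As a minimal sketch of Boolean slicing, the example below builds a tiny made-up table with the same `treatment` and `plasmolyzed` columns as `df_sample` and selects rows from it. The name `df_demo` and all values are hypothetical.

```python
import pandas as pd

# Made-up tidy table: one row per observation, one column per variable
df_demo = pd.DataFrame({
    'fileName': ['cell001', 'cell002', 'cell003', 'cell004'],
    'treatment': ['C2', 'LatB', 'oryzalin', 'LatB'],
    'plasmolyzed': [False, False, True, True],
    'Force A (uN)': [-10.5, -9.8, -11.2, -10.1],
})

# A comparison on a column gives a Boolean Series (True/False per row)
is_latb = df_demo['treatment'] == 'LatB'

# Indexing with that Series keeps only the rows where it is True
df_latb = df_demo[is_latb]

# Conditions combine with & (and), | (or), and ~ (not)
df_latb_turgid = df_demo[is_latb & ~df_demo['plasmolyzed']]
print(df_latb_turgid)
```

The same pattern is used later in this notebook to select the rows belonging to one file or one treatment.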

### 2. Calculate the sensor stiffness from calibration ($S$)

To get the corrected displacement, we follow Routier-Kierzkowska et al. (2012). Quoting the methods section of their paper, "Cellular Force Microscopy for in Vivo Measurements of Plant Tissue Mechanics":

>“During measurements on soft samples, the sensor probe indents the sample surface while the sensor’s beam springs bend by an amount depending on the applied load. Thus, for a given position $Z$ of the actuator, the actual probe tip $Z_{corrected}$ position relative to the sample is as follows: $Z_{corr} = Z + F/S$, where $F$ is the load measured by the sensor and $S$ is the sensor stiffness determined by calibration. The sign convention for $Z$ and $F$ vectors is the same (positive is pointing upward).”

First, we need to separate out data from the area of interest. We want an area of steady force increase on the indentation portion of the curve, after the sensor has made contact with the glass slide.
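
To illustrate the idea on data where the answer is known, here is a sketch using a synthetic, noise-free calibration curve: zero force before contact, then a linear rise with an assumed stiffness. All names and numbers in it are hypothetical; only the approach (fit a line to the last 0.5 um of the indentation data) mirrors the cell that follows.

```python
import numpy as np

true_stiffness = 250.0   # N/m, assumed value for the synthetic sensor
contact_displ = 1.0      # um, assumed displacement at which contact occurs

# Synthetic indentation curve: zero force before contact, linear after
displ = np.linspace(0.0, 2.0, 201)                        # um
force = np.where(displ > contact_displ,
                 true_stiffness * (displ - contact_displ),
                 0.0)                                     # uN

# Keep only the last 0.5 um, which lies well inside the linear region
end_displ = displ[-1]
in_window = displ >= end_displ - 0.5

# The slope of a first-order fit is the stiffness (uN/um is the same as N/m)
slope, intercept = np.polyfit(displ[in_window], force[in_window], 1)
print('fitted stiffness: {:.1f} N/m'.format(slope))
```

Restricting the fit to the end of the curve keeps the pre-contact portion, where the force is flat, from lowering the fitted slope.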

In [6]:
i=0
print('           fileName           | kref (N/m) ')
print('-----------------------------------------')
for fileName in df_calib['fileName'].unique():
    df = df_calib[df_calib['fileName']==fileName] # separate dfs by fileName
    df = df[df['indenting']] # only take indentation data

    # sensor stiffness (kref) from contact with the glass slide
    # fit the last 0.5um of data to a line
    end_displ = df['Displacement (um)'].iloc[-1]
    df_glass = df[df['Displacement (um)']>=end_displ-0.5].copy()

    # Now perform a linear fit
    f = np.polyfit(df_glass['Displacement (um)'], df_glass['Force A (uN)'], 1)

    # Add linear fit to dataframe for plotting
    df_glass['linear fit F (uN)'] = df_glass['Displacement (um)']*f[0] + f[1]

    kref = f[0]
    print('{:^30}|{:^12}'.format(fileName,str(int(kref))))

    df_calib.loc[df_calib['fileName']==fileName,'kref']=kref

# plot example calibration data with fit (uses the last file processed in the loop above)
linfit_chart = alt.Chart(df_glass).mark_line().encode(
    x='Displacement (um)',
    y='linear fit F (uN)'
)

calib_chart = alt.Chart(df).mark_circle().encode(
    x='Displacement (um)',
    y='Force A (uN)'
)

(linfit_chart + calib_chart).configure_axis(grid=True,
    labelFontSize=15,
    titleFontSize=15)

           fileName           | kref (N/m) 
-----------------------------------------
 microcompress_data/Refs/ref1 |    217     
microcompress_data/Refs/ref10 |    200     
microcompress_data/Refs/ref11 |    233     
microcompress_data/Refs/ref12 |    251     
microcompress_data/Refs/ref13 |    288     
microcompress_data/Refs/ref14 |    263     
microcompress_data/Refs/ref15 |    215     
microcompress_data/Refs/ref16 |    201     
 microcompress_data/Refs/ref2 |    307     
 microcompress_data/Refs/ref3 |    314     
 microcompress_data/Refs/ref4 |    291     
 microcompress_data/Refs/ref5 |    389     
 microcompress_data/Refs/ref6 |    204     
 microcompress_data/Refs/ref7 |    249     
 microcompress_data/Refs/ref8 |    220     
 microcompress_data/Refs/ref9 |    223     


*Note* - If your Altair chart appears blank when you try running the code on your machine, you can try the following:
1. Update your Anaconda distribution and all installed packages by running `conda update conda` and then `conda update --all` in the command line.
2. Turn off any adblockers.
3. Comment out the last line of the first code block in this notebook, which uses a background Altair server to transform the data.

In [7]:
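# Use average of all reference measurements as sensor stiffness
# (take one kref value per file, so that files with identical values are all counted)
kref = df_calib.groupby('fileName')['kref'].first().mean()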
print('The calculated sensor stiffness is {} N/m.'.format(str(int(kref))))

The calculated sensor stiffness is 254 N/m.
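
The per-file values printed in the table above range from 200 to 389 N/m, so it may be worth looking at the spread alongside the mean. The sketch below uses the values as printed, which were truncated to integers, so its mean can differ slightly from the full-precision result. The variable names are hypothetical.

```python
import numpy as np

# kref values (N/m) copied from the printed table, in the order listed
kref_values = np.array([217, 200, 233, 251, 288, 263, 215, 201,
                        307, 314, 291, 389, 204, 249, 220, 223], dtype=float)

kref_mean = kref_values.mean()
kref_std = kref_values.std(ddof=1)   # sample standard deviation

print('mean = {:.0f} N/m, std = {:.0f} N/m'.format(kref_mean, kref_std))
print('min  = {:.0f} N/m, max = {:.0f} N/m'.format(kref_values.min(), kref_values.max()))
```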


### 3. Calculate the corrected displacement

In our setup, displacement is positive for downward movement and the force increases as we push downward, so the correction term is subtracted rather than added:

\begin{equation}
    Z_\text{corrected} = Z - \frac{F}{S}
\end{equation}

where $Z_\text{corrected}$ is the corrected displacement, $Z$ is the uncorrected displacement, $F$ is the uncorrected load measured by the sensor, and $S$ is the sensor stiffness determined by calibration. See the figure below.
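
As a quick check of units and magnitude: a force in $\mu$N divided by a stiffness in N/m gives a length in $\mu$m, so no conversion factor is needed. Taking an illustrative load of $F = 100\ \mu$N at $Z = 2\ \mu$m (assumed numbers, not taken from the data) with $S = 254$ N/m:

\begin{equation}
    \frac{F}{S} = \frac{100\ \mu\text{N}}{254\ \text{N/m}} \approx 0.39\ \mu\text{m}, \qquad Z_\text{corrected} \approx 2.00\ \mu\text{m} - 0.39\ \mu\text{m} = 1.61\ \mu\text{m}
\end{equation}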

Let's try correcting one of our data sets to illustrate the process. Then, we'll construct a loop to perform the correction for all data sets.

In [8]:
fileName = 'microcompress_data/BY2_in_C2/cell001'
df = df_sample[df_sample['fileName']==fileName].copy()
# calculate corrected displacement
Z = df['Displacement (um)']
F = df['Force A (uN)']
Zcorr = Z-F/kref

# save results in dataframe
df['Corrected Displacement (um)'] = Zcorr
df_sample.loc[df_sample['fileName']==fileName,'Corrected Displacement (um)'] = Zcorr

# Plot results
orig_chart = alt.Chart(df).mark_square().encode(
    x='Displacement (um)',
    y='Force A (uN)',
    color=alt.value('black')
)
corr_chart = alt.Chart(df).mark_circle().encode(
    x='Corrected Displacement (um)',
    y='Force A (uN)',
    color=alt.value('red')
)
(orig_chart + corr_chart).configure_axis(grid=True,
    labelFontSize=15,
    titleFontSize=15).interactive()

The uncorrected data are plotted above in black, and the corrected data in red. The uncorrected (raw) data appear softer than the true response of the cell (a lower force reaches the same displacement) because the sensor is also compliant. The correction subtracts out this sensor compliance.

Let's construct a loop to correct all data sets.

In [9]:
i=0
for fileName in df_sample['fileName'].unique():
    df = df_sample[df_sample['fileName']==fileName].reset_index(drop=True)
    
    # calculate corrected displacement
    Z = df['Displacement (um)']
    F = df['Force A (uN)']
    Zcorr = Z-F/kref
    
    # save results in dataframe
    df_sample.loc[df_sample['fileName']==fileName,'Corrected Displacement (um)'] = Zcorr
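
Because the correction is the same arithmetic for every row, it can also be written as a single column-wise expression without a loop. The following is a minimal sketch on a made-up two-file table; `df_demo` and `kref_demo` are hypothetical stand-ins for `df_sample` and `kref`.

```python
import pandas as pd

kref_demo = 254.0  # N/m, stand-in for the calibrated sensor stiffness

# Made-up data for two files
df_demo = pd.DataFrame({
    'fileName': ['cellA'] * 3 + ['cellB'] * 2,
    'Displacement (um)': [0.0, 1.0, 2.0, 0.0, 1.5],
    'Force A (uN)': [0.0, 127.0, 254.0, 0.0, 50.8],
})

# Z_corrected = Z - F/S, applied to every row at once
df_demo['Corrected Displacement (um)'] = (
    df_demo['Displacement (um)'] - df_demo['Force A (uN)'] / kref_demo
)
print(df_demo)
```

A column-wise expression like this does not depend on the row labels of each per-file slice lining up with those of the full dataframe, which the loop version relies on.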

Now I'll plot all of the corrected data, separated by treatment.

In [10]:
# Data from turgid cells in C2 (growth media)
alt.Chart(df_sample[(~df_sample['plasmolyzed'].astype('bool'))&(df_sample['treatment']=='C2')]).mark_square().encode(
    x='Corrected Displacement (um)',
    y='Force A (uN)',
    color = 'fileName:N'
).configure_axis(grid=True,
    labelFontSize=15,
    titleFontSize=15).interactive()

In [11]:
# Data from turgid cells in LatB (removes actin filaments, AFs)
alt.Chart(df_sample[(~df_sample['plasmolyzed'].astype('bool'))&(df_sample['treatment']=='LatB')]).mark_square().encode(
    x='Corrected Displacement (um)',
    y='Force A (uN)',
    color = 'fileName:N'
).configure_axis(grid=True,
    labelFontSize=15,
    titleFontSize=15).interactive()

In [12]:
# Data from turgid cells in oryzalin (removes microtubules, MTs)
alt.Chart(df_sample[(~df_sample['plasmolyzed'].astype('bool'))&(df_sample['treatment']=='oryzalin')]).mark_square().encode(
    x='Corrected Displacement (um)',
    y='Force A (uN)',
    color = 'fileName:N'
).configure_axis(grid=True,
    labelFontSize=15,
    titleFontSize=15).interactive()

In [13]:
# Data from plasmolyzed cells in C2
alt.Chart(df_sample[(df_sample['plasmolyzed'])&(df_sample['treatment']=='C2')]).mark_square().encode(
    x='Corrected Displacement (um)',
    y='Force A (uN)',
    color = 'fileName:N'
).configure_axis(grid=True,
    labelFontSize=15,
    titleFontSize=15).interactive()

In [14]:
# Data from plasmolyzed cells in LatB
alt.Chart(df_sample[(df_sample['plasmolyzed'])&(df_sample['treatment']=='LatB')]).mark_square().encode(
    x='Corrected Displacement (um)',
    y='Force A (uN)',
    color = 'fileName:N'
).configure_axis(grid=True,
    labelFontSize=15,
    titleFontSize=15).interactive()

In [15]:
# Data from plasmolyzed cells in oryzalin
alt.Chart(df_sample[(df_sample['plasmolyzed'])&(df_sample['treatment']=='oryzalin')]).mark_square().encode(
    x='Corrected Displacement (um)',
    y='Force A (uN)',
    color = 'fileName:N'
).configure_axis(grid=True,
    labelFontSize=15,
    titleFontSize=15).interactive()

Notice that plasmolyzed cells generally have a sharper upturn in force-displacement than turgid cells, and less area between the indentation and retraction curves. The area between the curves corresponds to the energy dissipated by the cells during compression, and turgid cells dissipate considerably more energy than plasmolyzed cells.
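
One way to quantify the area between the curves is to integrate each branch with the trapezoid rule and take the difference. The sketch below does this on synthetic curves. It is not part of the original analysis, the function names are hypothetical, and it assumes displacement in um and force in uN, so the result is in pJ.

```python
import numpy as np

def trapezoid_area(x, y):
    """Integrate y over x with the trapezoid rule (signed by the direction of x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def dissipated_energy(x_indent, f_indent, x_retract, f_retract):
    """Work done during indentation minus work recovered during retraction."""
    work_in = trapezoid_area(x_indent, f_indent)
    # Retraction runs toward decreasing displacement, so its integral is negative
    work_out = -trapezoid_area(x_retract, f_retract)
    return work_in - work_out

# Synthetic curves: linear on indentation, quadratic on retraction
x_indent = np.linspace(0.0, 2.0, 201)   # um
f_indent = 50.0 * x_indent              # uN
x_retract = np.linspace(2.0, 0.0, 201)  # um
f_retract = 25.0 * x_retract**2         # uN

energy = dissipated_energy(x_indent, f_indent, x_retract, f_retract)
print('dissipated energy: {:.2f} pJ'.format(energy))
```

For these synthetic curves the exact areas are 100 pJ and 200/3 pJ, so the result should be close to 33.3 pJ.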

Finally, let's write the corrected data to new tab-separated files so that we can easily use them in the notebook for finding the contact point.

In [16]:
for fileName in df_sample['fileName'].unique():
    df = df_sample[df_sample['fileName']==fileName].reset_index(drop=True)
    df.to_csv(fileName+'_corr', sep='\t')
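
To load one of these files again in a later notebook, something like the following should work. It is a sketch that writes a small made-up dataframe to a temporary directory and reads it back; the file name `cell_demo_corr` and the values are hypothetical. The key points are `sep='\t'`, matching the separator used when writing, and `index_col=0`, because `to_csv` writes the row index as the first column.

```python
import os
import tempfile
import pandas as pd

# Made-up corrected data
df_out = pd.DataFrame({
    'Corrected Displacement (um)': [0.24, 0.32],
    'Force A (uN)': [-10.5, -10.9],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'cell_demo_corr')

    # Write with a tab separator, as in the cell above
    df_out.to_csv(out_path, sep='\t')

    # Read back, treating the first column as the row index
    df_back = pd.read_csv(out_path, sep='\t', index_col=0)

print(df_back)
```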

## Computing Environment

In [17]:
%load_ext watermark

In [18]:
%watermark -v -p numpy,pandas,altair,jupyterlab

Python implementation: CPython
Python version       : 3.8.12
IPython version      : 7.29.0

numpy     : 1.21.2
pandas    : 1.3.4
altair    : 4.1.0
jupyterlab: 3.2.1



*Portions of these instructions were adapted from tutorials originally created by [Professor Justin Bois](http://bois.caltech.edu/) at [Caltech](http://www.caltech.edu) for his excellent course in [Data Analysis in the Biological Sciences](http://bebi103.caltech.edu.s3-website-us-east-1.amazonaws.com/2018/index.html).*