# Tutorial
--------------------------------------------------------------------------------
## Application of Logistic Regression on AFM Data to classify surface contact 

This tutorial shows an example of applying logistic regression to atomic force microscopy (AFM) data, classifying curves into two categories according to whether or not the AFM tip has made contact with the material surface.


# Basic Information

The source code for this tutorial has been posted on GitHub. 
The tutorial uses Jupyter Notebook for publishing its documentation. 

# Prerequisites 

A GitHub account
Anaconda Navigator 
Python 3.0


# Table of Contents

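## 1. Importing Packages
## 2. Reading AFM Dynamic Approach Curve Files
## 3. Organizing Curves into Dictionary
## 4. Visualizing Amplitude vs Z Curves and Phase vs Z Curves
## 5. Training Logistic Regression Model
## 6. Testing Logistic Regression Model
## 7. Results Visualization
## 8. Future Improvements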

## How to use this Tutorial

1. Make sure you have all the prerequisites listed above.
2. Open the README for a basic introduction to the tutorial.
3. Download the code file from GitHub to your PC.
4. Read through the code and understand how each function works. Documentation is provided for a detailed explanation.
5. Run the code and analyze the results.



## Contributing 

Please feel free to contribute ideas to further improve this tutorial. 
Pull requests are welcome.
I'll link the "main fork" from here.

## Authors

Arjun Gupta (gupta568@purdue.edu)
Ryan B Wagner (rbwagner@purdue.edu)





# 1. Importing Packages 


This tutorial relies on the basic libraries used by most numerical algorithms: NumPy, Matplotlib and pandas (the math import is left commented out). The logistic regression tools come from the scikit-learn library.

scikit-learn: machine learning library that provides the LogisticRegression classifier and the train_test_split helper imported below.
igor: library for importing the Igor binary wave (ibw) file format into a Python dictionary.
scipy.interpolate: SciPy module that provides interp1d, used here to resample each curve onto a fixed number of points.



In [1]:
import matplotlib.pyplot as plt 
import numpy as np
import pandas as pd
#import math 
from sklearn.linear_model import LogisticRegression 
from sklearn.model_selection import train_test_split
#from sklearn.metrics import classification_report, confusion_matrix
import scipy.interpolate as interp
import igor.binarywave as ibw           # https://pypi.org/project/igor/
import glob

# 2. Reading AFM Dynamic Approach Curve Files 

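An AFM dynamic approach curve records the amplitude and phase of the cantilever oscillation as a function of the Z position.
We collect the file names of each class of data (On Surface, Off Surface, Unlabeled) using the glob.glob function.
We define a list 'filelist' that holds the three lists of input file names.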


In [2]:
filelist_onsurface = glob.glob('Data\\On_Surface\\*.ibw')      # '\\*' avoids the invalid escape sequence '\*'
filelist_offsurface = glob.glob('Data\\Off_Surface\\*.ibw')
filelist_unlabeled = glob.glob('Data\\Unlabeled\\*.ibw')


filelist = [filelist_onsurface, filelist_offsurface, filelist_unlabeled]
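
The patterns above use Windows-style backslashes. A minimal sketch of a portable alternative, assuming the same Data/On_Surface, Data/Off_Surface and Data/Unlabeled folder layout, is shown here. The helper name `list_ibw_files` is hypothetical, and the demo runs on empty placeholder files in a temporary directory rather than on real AFM data.

```python
import glob
import os
import tempfile


def list_ibw_files(root, classes=("On_Surface", "Off_Surface", "Unlabeled")):
    """Return {class folder name: sorted list of .ibw paths found under root}."""
    return {
        name: sorted(glob.glob(os.path.join(root, name, "*.ibw")))
        for name in classes
    }


# Demo on a throwaway directory with empty placeholder files (not real AFM data).
with tempfile.TemporaryDirectory() as tmp:
    for name, stem in [("On_Surface", "AZ_0001"),
                       ("Off_Surface", "AZ_0101"),
                       ("Unlabeled", "AZ_0000")]:
        os.makedirs(os.path.join(tmp, name))
        open(os.path.join(tmp, name, stem + ".ibw"), "w").close()

    found = list_ibw_files(tmp)
    counts = {name: len(paths) for name, paths in found.items()}

print(counts)
```

Building the pattern with os.path.join lets the same code run on Windows, Linux and macOS, at the cost of paths whose separator depends on the operating system.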



# 3. Organizing Curves into Dictionary 

A dictionary is a Python structure that stores objects referenced by keys, where a key is a descriptor that returns a specific object from the dictionary. 

Keys are specified in dictionaries to reference specific file names. Here we create two keys: key 1 is the class folder of the file (On Surface, Off Surface or Unlabeled), and key 2 is the specific file name. This section also creates a key in the dictionary in case one does not already exist. 

All of this is done to navigate the dictionary easily and to reference a specific object in it when required. 



In [3]:
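data = {}                                           # Master dictionary 

for filetype in filelist:         
    for file in filetype: 
        indata = ibw.load(file)                     # Loads data from ibw file into indata 
        
        parts = file.split('\\')                    # ['Data', '<class folder>', '<file name>.ibw']
        key1 = parts[-2]                            # key1 is the class folder, e.g. 'On_Surface' 
        key2 = parts[-1][0:-4]                      # key2 is specific file name without '.ibw' 
        
        # Creates the dictionary keys if they do not exist 
        curve = data.setdefault(key1, {}).setdefault(key2, {})
        
        # Assumption: each column of wData is one channel and the column labels
        # hold the channel names (e.g. 'ZSnsr', 'Amp'); adjust for other layouts.
        wave = indata['wave']
        labels = [label.decode() for label in wave['labels'][1] if label]
        for i, label in enumerate(labels):
            curve[label] = wave['wData'][:, i]      # Stores each channel under its name 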
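
The two-level key scheme can be illustrated without any AFM files. This is a minimal sketch, assuming paths of the form Data\class folder\file name.ibw; the helper `keys_from_path` and the example file names are hypothetical. `PureWindowsPath` is used so that backslash-separated paths are parsed the same way on any operating system.

```python
from pathlib import PureWindowsPath


def keys_from_path(path):
    """Split 'Data\\<class folder>\\<name>.ibw' into (class folder, name)."""
    p = PureWindowsPath(path)
    return p.parent.name, p.stem


# Hypothetical file names, used only to show the resulting structure.
paths = [
    "Data\\On_Surface\\AZ_0001.ibw",
    "Data\\Off_Surface\\AZ_0101.ibw",
    "Data\\Off_Surface\\AZ_0102.ibw",
]

nested = {}
for path in paths:
    key1, key2 = keys_from_path(path)
    # setdefault creates the inner dictionary only when key1 is new
    nested.setdefault(key1, {})[key2] = {}

print(nested)
```

After the loop, `nested['Off_Surface']['AZ_0101']` is the (still empty) dictionary reserved for that curve.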

# 4. Visualizing Amplitude vs Z Curves and Phase vs Z Curves 

Amplitude is the amplitude of the cantilever oscillation.
Phase is the phase of the cantilever oscillation relative to the periodic signal used to excite the cantilever.
Z is the surface displacement (position of the surface).
Plotting the input curves helps us visualize the shape and patterns of the different classes of data (On Surface, Off Surface, Unlabeled).
Note: The amplitude vs Z (AZ) curves of Off Surface data show no steep change in slope, i.e. no contact, whereas the AZ curves of On Surface data show a sharp dip.


In [4]:
plt.figure()            # Plots single curve 
plt.plot(data['Off_Surface']['AZ_0101']['ZSnsr'],data['Off_Surface']['AZ_0101']['Amp'])    

count = 1 
plt.figure()            # Plots all curves 
for key1 in data.keys(): 
    plt.subplot(3,1,count)
    plt.title(key1)
    for key2 in data[key1].keys():
        plt.plot(data[key1][key2]['ZSnsr'],data[key1][key2]['Amp']) 
    count = count + 1

plt.tight_layout()


# 5. Training Logistic Regression Model 

We use the LogisticRegression class in the scikit-learn library to train the model. 
There are two components to training a model: x_train (the descriptors of each curve) and y_train (the class label of each curve). 
Our choice of descriptors for this code is the amplitudes. 
We regularize the amplitudes by interpolation, resampling every curve onto the same number of points. This is necessary because each curve has a different number of points, while a logistic regression model requires the same number of descriptors for every sample. 
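
The resampling step can be sketched on its own with synthetic curves. This example uses NumPy's `np.interp`, swapped in here as a linear-interpolation stand-in for `interp1d`; the helper name `resample_curve` and the two test curves are assumptions made for illustration, not part of the tutorial code.

```python
import numpy as np


def resample_curve(z, amp, n_points=50):
    """Linearly resample amp(z) onto n_points evenly spaced z values."""
    z = np.asarray(z, dtype=float)
    amp = np.asarray(amp, dtype=float)
    order = np.argsort(z)                      # np.interp needs increasing x values
    z_grid = np.linspace(z.min(), z.max(), n_points)
    return np.interp(z_grid, z[order], amp[order])


# Two synthetic curves with different numbers of points (not real AFM data).
short_curve = resample_curve(np.linspace(0, 1, 120), np.linspace(2, 3, 120))
long_curve = resample_curve(np.linspace(1, 0, 777), np.linspace(5, 4, 777))  # decreasing z

features = np.vstack([short_curve, long_curve])
print(features.shape)
```

Both curves end up with 50 values, so they can be stacked into one array with one row per curve, which is the shape a scikit-learn model expects.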

In [None]:
model = LogisticRegression(solver='liblinear', random_state=0)

x_train = []
y_train = []

Zpts = 50                                                                       # Number of interpolation points per curve 
for key1 in list(data.keys())[0:2]:                                             # Labeled classes only (On and Off surface) 
    for key2 in data[key1].keys(): 
        interp_func = interp.interp1d(data[key1][key2]['ZSnsr'], data[key1][key2]['Amp']*1e9)   # Creates interpolation function
        Zmin = np.min(data[key1][key2]['ZSnsr'])                        
        Zmax = np.max(data[key1][key2]['ZSnsr'])
        Zintp = np.linspace(Zmin, Zmax, Zpts)                                   # Creates interpolation points
        Ampintp = interp_func(Zintp)                                            # Does interpolation 
        if np.isnan(Ampintp).any(): 
            print('Warning: Nan on interp:' + key1 + ', ' + key2)        
        else: 
            x_train.append(Ampintp)                                             # Builds x_train 
            if key1 == 'On_Surface':                                            # Builds y_train (folder name uses an underscore) 
                y_train.append(1)
            else: 
                y_train.append(0)

model.fit(x_train, y_train)                                                     # Trains the model; required before model.predict 

   

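# 6. Testing Logistic Regression Model 

Here we first test the trained model on the curve at a specific index of the training set and print both the prediction and the true label. 
We then predict the class of every Unlabeled curve, interpolating the amplitudes in the same way as for training. 
No true label is available for the Unlabeled curves, so only the prediction is printed for them. 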


In [None]:
test_index = 55                                                                                             # Test index 
pred_data = model.predict([x_train[test_index]])                                                            # Does test 
print('Prediction: ', pred_data[0], ', Truth: ', y_train[test_index])

key1 = 'Unlabeled'
for key2 in data[key1].keys():
    interp_func = interp.interp1d(data[key1][key2]['ZSnsr'] ,data[key1][key2]['Amp']*1e9 )              # Creates interpolation function
    Zmin = np.min(data[key1][key2]['ZSnsr'])                        
    Zmax = np.max(data[key1][key2]['ZSnsr'])
    Zintp = np.linspace(Zmin,Zmax,Zpts)                                                                 # Creates interpolation points
    Ampintp = interp_func(Zintp) 
    
    pred_data = model.predict([Ampintp])                                                            # Does testing 
    print(key2, '  Prediction : ', pred_data[0])                                                    # No true label exists for unlabeled curves
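
The cell above checks the model on a curve it was trained on. A minimal sketch of a held-out evaluation with `train_test_split` (imported in section 1) is given here. It runs on synthetic stand-in curves, not real AFM data; the curve shapes, noise level and split size are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_curves, n_points = 200, 50
z = np.linspace(0.0, 1.0, n_points)

# Synthetic stand-ins for interpolated amplitudes (NOT real AFM data):
# label 0 curves stay flat, label 1 curves dip over the last part of the z range.
labels = np.arange(n_curves) % 2
noise = rng.normal(0.0, 1.0, size=(n_curves, n_points))
dip = 40.0 * np.clip((z - 0.6) / 0.4, 0.0, None)
X = 80.0 + noise - labels[:, None] * dip

# Hold out 25% of the curves; stratify keeps both classes balanced in each split.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)

clf = LogisticRegression(solver="liblinear", random_state=0)
clf.fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)     # fraction of held-out curves classified correctly
print(f"held-out accuracy: {accuracy:.2f}")
```

Scoring on curves the model has never seen gives a more honest estimate of accuracy than re-predicting a training sample.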

# 7. Results Visualization 

This cell plots the specific AZ curve that the algorithm is applied to, so that the user can check the prediction against the shape of the curve. 

In [None]:
plt.figure()            # Plots single curve 
plt.plot(data['Unlabeled']['AZ_0000']['ZSnsr'],data['Unlabeled']['AZ_0000']['Amp'])
plt.title('Unlabeled AZ curve')
plt.xlabel('Z-sensor')
plt.ylabel('Amplitude')
#plt.ylim([8e-8, 9e-8])

# 8. Future Improvements

This code can be made more accurate in two main ways. 
1. The descriptors (currently the amplitudes) can be changed to ones that offer a more accurate prediction for the given data. 
2. The method of regularizing (currently interpolation) can be altered to fit the points of each AZ curve better, which would make the predictions more accurate. 
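
As one possible direction for the first point, a curve could be summarized by a few shape descriptors instead of its raw amplitudes. The sketch below is only an illustration under that assumption: the helper `summary_descriptors` and the two descriptors it returns are hypothetical choices, and the curves are synthetic.

```python
import numpy as np


def summary_descriptors(z, amp):
    """Return [relative amplitude drop, largest absolute slope] for one curve.

    Assumes z has no repeated values, since the slope is computed with respect to z.
    """
    z = np.asarray(z, dtype=float)
    amp = np.asarray(amp, dtype=float)
    order = np.argsort(z)
    z, amp = z[order], amp[order]
    rel_drop = (amp.max() - amp.min()) / amp.max()
    max_slope = np.abs(np.gradient(amp, z)).max()
    return np.array([rel_drop, max_slope])


# Synthetic curves (not real AFM data): one flat, one with a dip after z = 0.6.
z = np.linspace(0.0, 1.0, 101)
flat_curve = np.full_like(z, 80.0)
dip_curve = 80.0 - 100.0 * np.clip(z - 0.6, 0.0, None)

flat_desc = summary_descriptors(z, flat_curve)
dip_desc = summary_descriptors(z, dip_curve)
print(flat_desc, dip_desc)
```

Descriptors like these do not depend on the number of points in a curve, so they would also remove the need to resample before training. Whether they improve accuracy on real data would have to be tested.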

Any further input on how to improve this code is welcome. Please contact the authors at the email addresses listed above. 

We hope this tutorial was helpful.

Thank you!