# Introduction

This notebook is a compact introduction to simple and advanced analysis using KiMoPack.<br>
It was developed to support a live workshop and goes through basic analysis and some more <br>
advanced topics. <br>

## The steps of an analysis

A typical analysis involves the following steps:

1. Data Import
2. Data Shaping + Plotting
3. (Optional) comparative analysis
4. Modelling
5. Reporting

In KiMoPack the analysis is normally guided by the workflow tools that can be downloaded here: 

``` python
    import KiMoPack
    KiMoPack.download_notebooks()
```
## Python Imports
KiMoPack is a Python package that heavily relies on the Python data analysis infrastructure.<br>
Each notebook starts with a series of imports. Here we add an option that recognizes if this notebook<br>
is run on Colab. In this case we download the necessary data files.<br> 

In [None]:
import os,sys
import KiMoPack.plot_func as pf
import pandas as pd
import numpy as np
import matplotlib,lmfit
import matplotlib.pyplot as plt
from importlib import reload
reload(pf)
path_to_files = os.sep.join([os.getcwd(), "Data", "Introduction"])
pf.changefonts()
%matplotlib qt

import sys
if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")

def gauss(t,sigma=0.1,mu=0,scale=1):
	y=np.exp(-0.5*((t-mu)**2)/sigma**2)
	y/=sigma*np.sqrt(2*np.pi)
	return y*scale
FWHM=2.35482

# Data Import

In KiMoPack data is imported and handled in form of a Pandas DataFrame. <br>
The data is either read from disc by one of the import functions or given <br>
to the function in form of a dataframe that is commonly called "ds" for Dataset.<br> 
There are a lot of import options and an import API to adapt to any shape.<br>

Here we use "pf.TA" to create a single transient absorption object. For work with many single scans please <br> 
see Tutorial 4 "Single scan handling", for comparative work please see Tutorial 3 "Compare Fit".<br>

See the documentation with "pf.TA?" or the online documentation under<br>
    https://kimopack.readthedocs.io/en/latest/Opening.html for more details.<br>

## Finding the Filename and path
There are two general ways to provide the name and path to the files that are to be investigated. 
    
1. Either the filename and the path to the file are provided. In this case the path is <br>
        either a single word (e.g. "Data" if all the data is in the relative folder <br>
        "Data") or a path to the files. All the usual ways to handle a path should work. <br>
        I prefer to create the path as a platform-independent string. 
2. Instead of a filename the word **gui** is used, in which case a TKinter GUI opens <br> 
    and allows the user to select a file (and path). I recommend that directly after <br>
    the file is opened the user changes the word **gui** into **recent**, because then the <br> 
    code will reopen the last file that was opened with the GUI (permitting restarting).
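
A platform-independent path as mentioned in option 1 could be built as sketched below. The helper name `build_data_path` is hypothetical and not part of KiMoPack; the folder names are the ones used in this notebook.

```python
import os

def build_data_path(*folders):
    """Join the current working directory and sub-folders with the OS separator."""
    return os.path.join(os.getcwd(), *folders)

# Same folders as used for "path_to_files" in the import cell
path_to_files = build_data_path("Data", "Introduction")
print(path_to_files)
```

The resulting string can then be handed to the "path" argument when opening a file.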

In this example we will use the filename.

In [None]:
ta=pf.TA("con_1.SIA",path=path_to_files)

If no errors appear, the import was most likely successful and we can continue with the inspection and shaping of the data. <br> 
If there are problems during the import or the data does not look like it should, the imported data is stored in the DataFrame **ta.ds_ori**  <br> 
and checking if this looks correct with "ta.ds_ori.head()" is a very good first step for finding the right input parameters.<br>

## Data Inspection, shaping and RAW plotting
The first step is usually to visually inspect the data. <br> 
In KiMoPack we use three plotting functions for all plotting tasks and three functions for comparative plotting (see Tutorial 3)
``` python
ta.Plot_RAW()
ta.Plot_fit_output()
ta.Plot_Interactive()
```
In their standard call (as above) all plot functions produce multiple plots simultaneously. <br> 
The first argument is a list that selects which plots are drawn. For RAW plotting the default is:
``` python
ta.Plot_RAW(range(4))
```

Here we choose to only look at the matrix, which is plot "0" (see the documentation with "ta.Plot_RAW?" or https://kimopack.readthedocs.io/en/latest/Plotting.html).

In [None]:
%matplotlib qt
ta.bordercut=[400,975]
ta.timelimits=[-0.2,500]
#ta.Plot_RAW(0)

Next we set the interesting wavelength and time points at which the code plots kinetics and spectra, respectively.

During the import these values are set automatically:
``` python
ta.rel_wave = np.arange(300,1000,100)                        #standard
ta.rel_time = [0.2,0.3,0.5,1,3,10,30,100,300,1000,3000,9000] #standard
```
GUI (released soon)

The parameter "wavelength_bin" sets the width of the spectral bins. The parameter "time_width_percent" sets a percentage-based binning for the times.
Additional shaping options include rebinning in the spectral range with "wave_nm_bin" or on an energy scale with "equal_energy_bin". See:<br>
https://kimopack.readthedocs.io/en/latest/Plotting.html#plot-shaping-options-without-influence-on-the-fitting and<br>
https://kimopack.readthedocs.io/en/latest/Shaping.html for more details
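
To illustrate what spectral binning means, here is a minimal sketch that averages neighbouring wavelength columns into bins of a fixed width. It is only an illustration of the idea: the function `bin_wavelengths` is hypothetical, and how KiMoPack implements its binning internally is not shown here.

```python
import numpy as np

def bin_wavelengths(wavelengths, data, bin_width):
    """Average the columns of `data` (time x wavelength) into bins of `bin_width` nm."""
    edges = np.arange(wavelengths.min(), wavelengths.max() + bin_width, bin_width)
    idx = np.digitize(wavelengths, edges) - 1        # bin index of each wavelength
    centers, binned = [], []
    for i in np.unique(idx):
        mask = idx == i
        centers.append(wavelengths[mask].mean())     # mean wavelength of the bin
        binned.append(data[:, mask].mean(axis=1))    # mean signal of the bin
    return np.array(centers), np.array(binned).T

wavelengths = np.arange(400.0, 420.0, 1.0)           # 20 pixels, 1 nm apart (made up)
data = np.tile(wavelengths, (3, 1))                  # 3 time points, signal = wavelength
centers, binned = bin_wavelengths(wavelengths, data, bin_width=10)
print(centers)
print(binned.shape)
```

Larger bins reduce noise at the cost of spectral resolution, which is why the bin width is worth adjusting while inspecting the RAW plots.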

In [None]:
ta()

**Note** The filename is the standard title of the plots. The Plot function can take other titles. Alternatively changing the variable "ta.filename" can be used to change the title permanently.

``` python
ta.Save_Powerpoint(save_Fit=False,title='Tutorial plot')
```
This would create these plots and place them all on a powerpoint slide and save this slide.

# Fitting of Data
One of the main purposes of KiMoPack is to perform a global analysis of the data. As can be seen from the results of the SVD (last plot in the RAW plotting), the independent extraction of the spectra using the usual approach $ U\,\times\,\Sigma\,\times\,V^{T} = M$ leads to vectors that are extracted from the data only and are not physically meaningful. <br> 
1. KiMoPack is using a parametric model function **ta.mod** to prepare a "concentration matrix" $C(t)$.
2. In the standard usage the spectral matrix (called "DAC" in KiMoPack) is then calculated with np.linalg.lstsq so that the calculated matrix **AC** is $C(t)\,\times\,DAC(\lambda)=AC$.
3. Then an error matrix **AE** is calculated that is the difference between the (shaped) measured matrix **A** and the calculated matrix **AC**
4. Then the parameters of **ta.mod** are varied to minimize the sum of the squared **AE**
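
The four steps above can be sketched with plain NumPy. This is a simplified illustration under stated assumptions, not KiMoPack's implementation: the model is a set of parallel exponential decays without instrument response, the data is synthetic, and the function names are hypothetical. Only `np.linalg.lstsq` is taken from the description above.

```python
import numpy as np

def concentrations(times, rates):
    """Parallel exponential decays: one column of C(t) per rate constant."""
    return np.exp(-np.outer(times, rates))

def sum_squared_error(rates, times, A):
    """Steps 1-4 for one parameter set: build C, solve DAC, return sum(AE**2)."""
    C = concentrations(times, rates)                 # step 1: C(t)
    DAC, *_ = np.linalg.lstsq(C, A, rcond=None)      # step 2: spectra by linear least squares
    AC = C @ DAC                                     # calculated matrix
    AE = A - AC                                      # step 3: error matrix
    return np.sum(AE ** 2), DAC                      # step 4: quantity to minimize

# Synthetic, noise-free data with two species (all values are made up)
times = np.linspace(0, 10, 50)
true_rates = np.array([2.0, 0.2])
true_DAC = np.array([[1.0, 0.5, 0.0],
                     [0.0, 0.3, 1.0]])               # 2 species x 3 wavelengths
A = concentrations(times, true_rates) @ true_DAC

error_true, DAC_fit = sum_squared_error(true_rates, times, A)
error_wrong, _ = sum_squared_error(np.array([1.0, 0.1]), times, A)
print(error_true, error_wrong)
```

An optimizer only has to vary the few rate constants, because the spectra follow from a linear solve for every trial parameter set. That is what keeps the number of free parameters small.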

This fitting process has a large number of options that include providing external spectra. If a sufficient number of external spectra are provided, this fitting process changes into a linear combination analysis. Named switches like "ext_spectra_scale", "ext_spectra_shift" or "ext_spectra_guide" change how they are handled. <br>
But let's start at the beginning:

## Choosing the model function
KiMoPack can either use one of the 3 **named functions**:
``` python
ta.mod = 'exponential' #or 'paral'
ta.mod = 'consecutive'
ta.mod = 'full_consecutive'
```
or an externally provided **external fitting function** as are provided in the file "function_library.py" and summarized in "Function_library_overview.pdf". The provided external fitting functions include linear and non linear models that cover pretty much any of the possible models with 3 or 4 species including some vibrational models.<br>
Most functions know switches like "background" = fit the background, "infinite" = non-decaying species, "explicit_GS" = make the bleach an explicit species. <br> But in general the external function can be pretty much anything that is needed. It gets a time vector (the points at which the function should return an entry) and a pardf = DataFrame with the parameters of the current fitting step. **It is expected to return the C(t)** and anything else is free. <br> We will look into these functions a bit later in this tutorial. Important to mention is that there is no difference in how named and external functions are handled.
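
As a minimal sketch of this interface, the function below takes a time vector and a parameter container and returns C(t) as a DataFrame. The function `simple_decay` is hypothetical, it ignores the instrument response for brevity, and treating "pardf" as a pandas Series indexed by parameter name is an assumption inferred from how it is indexed in this notebook.

```python
import numpy as np
import pandas as pd

def simple_decay(times, pardf):
    """A -> B with rate k0, starting at t0 (no instrument response, for brevity)."""
    times = np.asarray(times, dtype=float)
    excited = times >= pardf['t0']                     # nothing is excited before t0
    dt = np.clip(times - pardf['t0'], 0, None)         # time since excitation
    a = np.where(excited, np.exp(-pardf['k0'] * dt), 0.0)
    b = np.where(excited, 1.0 - a, 0.0)                # B collects what A loses
    c = pd.DataFrame({'A': a, 'B': b}, index=times)
    c.index.name = 'time'
    if 'background' in list(pardf.index.values):       # optional switch, as in the library functions
        c['background'] = 1
    return c

pardf = pd.Series({'k0': 2.0, 't0': 0.0})              # made-up parameter values
c = simple_decay([-1.0, 0.0, 1.0], pardf)
print(c)
```

The column names become the species names, so descriptive labels pay off later in the plots.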

In this tutorial we start with a **Decay analysis**

In [None]:
ta.mod='exponential'       # Choose a model here 'exponential' to get simple exponential decays

Next we create a parameter object and choose appropriate guess values.<br>
I use the limiting options "min" and "max" and typically freeze with **vary=False** <br>
the starting time "t0" and the instrument response time "resolution".<br>
Usually the RAW plot of the kinetics is a good starting point to read off the initial guesses (as is previous information).<br>
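
The "resolution" parameter is divided by the constant FWHM = 2.35482 before it is used as the width of a Gaussian in the model functions of this notebook. The short sketch below checks that relation; it assumes that "resolution" is meant as the full width at half maximum of the instrument response, and it re-implements the Gaussian helper with the standard library only.

```python
import math

FWHM = 2 * math.sqrt(2 * math.log(2))     # conversion factor, approx. 2.35482

def gauss(t, sigma=0.1, mu=0.0, scale=1.0):
    """Area-normalized Gaussian, same form as the helper in the import cell."""
    return scale * math.exp(-0.5 * (t - mu) ** 2 / sigma ** 2) / (sigma * math.sqrt(2 * math.pi))

resolution = 0.086                        # instrument response guess used in this notebook
sigma = resolution / FWHM                 # width parameter handed to the Gaussian
# Half a FWHM away from the center the pulse should have dropped to half its peak value
ratio = gauss(resolution / 2, sigma=sigma) / gauss(0.0, sigma=sigma)
print(round(FWHM, 5), round(ratio, 6))
```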

We trigger fits with:
```python
ta.Fit_Global()
```
I usually use the loop below to freeze all parameters and check how good my input parameters actually are.<br>
These cells are larger as I prefer to create and use parameter in the same cell.

In [None]:
#ta.Plot_RAW(1)
par=lmfit.Parameters()                                       # create empty parameter object
par.add('k0',value=1/0.14,vary=True)                  
par.add('k1',value=1/2.35,vary=True)             
par.add('k2',value=1/40,vary=True)   
###-------Adding instrument parameter, here frozen---------------
par.add('t0',value=0,min=-2,max=2,vary=False)                       # Allow the arrival time to adjust? (False here)
par.add('resolution',value=0.086,min=0.04,max=0.5,vary=False)       # Allow the instrument response to adjust (False here)

if 0:
    for key in par.keys():
        par[key].vary=False
ta.mod='exponential'
ta.par=par
ta.Fit_Global()                                 # trigger fitting
#ta.Plot_fit_output(4)

The output is stored in **ta.re** (a dictionary) with 
* **"c"** = $c(t)$
* **"DAC"** = $DAC(\lambda)$
* **"A"**, **"AC"**, **"AE"**
* a large number of other things
   
The plots I usually check are:

* **0** Decay associated spectra (called **DAC**)
    * left: normalized
    * middle: as calculated
    * right: spectra multiplied by the maximum of c(t), to check if the species should be visible
* **1** The spectral axis summed
* **4** The fitted matrices
* **5** The c(t) that was actually used

together with the metrics $R^2$ and $\chi^2$.<br>
Other plotting options include the kinetics (2), spectra (3) and residuals (6).
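
For orientation, the two metrics can be computed from a measured matrix **A** and a calculated matrix **AC** as sketched below. This uses the textbook definitions with unweighted residuals; whether KiMoPack applies additional weighting or normalization is not covered here, and `fit_metrics` is a hypothetical helper.

```python
import numpy as np

def fit_metrics(A, AC):
    """Return (R^2, chi^2) for a measured matrix A and a calculated matrix AC."""
    AE = A - AC                                   # residual matrix
    chi2 = np.sum(AE ** 2)                        # unweighted sum of squared residuals
    ss_tot = np.sum((A - A.mean()) ** 2)          # total variation of the data
    return 1.0 - chi2 / ss_tot, chi2

rng = np.random.default_rng(0)
AC = np.outer(np.exp(-np.linspace(0, 5, 40)), np.linspace(0, 1, 30))   # made-up model matrix
A = AC + rng.normal(scale=0.01, size=AC.shape)                         # "measurement" = model + noise
r2, chi2 = fit_metrics(A, AC)
print(r2, chi2)
```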

## What is going on?

**$c(t)$ x $DAC(\lambda)$**=Species Spectra
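
In matrix terms, each species contributes the outer product of its concentration profile and its spectrum, and the sum over all species gives the calculated matrix. A minimal NumPy sketch with made-up numbers (not the output of a real fit):

```python
import numpy as np

times = np.linspace(0, 5, 6)
c = np.column_stack([np.exp(-2.0 * times), np.exp(-0.2 * times)])   # c(t): time x species
DAC = np.array([[1.0, 0.5, 0.0],
                [0.0, 0.3, 1.0]])                                    # DAC: species x wavelength

# One time x wavelength matrix per species: outer product of its c(t) and its spectrum
species = {i: np.outer(c[:, i], DAC[i, :]) for i in range(c.shape[1])}

AC = c @ DAC                                                          # full calculated matrix
print(np.allclose(sum(species.values()), AC))
```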

In [None]:
if 0:
    plt.close('all')
    #plot c(t) and DAC($\lambda$)
    fig,ax=plt.subplots(2,1,figsize=(12,8))
    ta.re['c'].plot(ax=ax[0],lw=3);ta.re['DAC'].plot(ax=ax[1],lw=3)
    ax[0].set_xlim(0.01,500);ax[0].set_xscale('log')
    ax[0].set_ylim(-0.01,1);ax[0].set_yscale('symlog')
    ax[0].legend(title='Notice the intensity \nof species 0')
    ax[1].legend(title='Notice the intensity \nof species 0')
    fig.tight_layout()
    
    species_dictionary=pf.Species_Spectra(ta)
    for key in species_dictionary.keys():
        print(key)
        ta.Plot_RAW(0,ds=species_dictionary[key],title='Species %i'%key)

## Target Analysis
The next model one typically tries is a consecutive model (A->B->C). This only requires changing the model name.
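
To see what a consecutive model computes, the sketch below integrates A->B->C with small explicit Euler steps and compares the result to the closed-form solution. It is a simplified illustration: the population starts fully in A, there is no excitation pulse, and the function names are hypothetical. The rate constants are the guess values used in this notebook.

```python
import math

def euler_consecutive(t_end, k0, k1, n_steps):
    """Integrate A -> B -> C with explicit Euler steps, starting from A=1."""
    a, b, c = 1.0, 0.0, 0.0
    dt = t_end / n_steps
    for _ in range(n_steps):
        da = -k0 * a * dt                 # A decays with k0
        db = (k0 * a - k1 * b) * dt       # B forms with k0 and decays with k1
        dc = k1 * b * dt                  # C forms with k1
        a, b, c = a + da, b + db, c + dc
    return a, b, c

def analytic_consecutive(t, k0, k1):
    """Closed-form solution for A -> B -> C (valid for k0 != k1)."""
    a = math.exp(-k0 * t)
    b = k0 / (k1 - k0) * (math.exp(-k0 * t) - math.exp(-k1 * t))
    return a, b, 1.0 - a - b

k0, k1, t = 1 / 0.14, 1 / 2.35, 1.0
exact = analytic_consecutive(t, k0, k1)
coarse = euler_consecutive(t, k0, k1, n_steps=20)
fine = euler_consecutive(t, k0, k1, n_steps=2000)
print(exact, coarse, fine)
```

Smaller steps bring the numerical result closer to the exact one, which is the reason for taking sub-steps between the measured time points in a numerical model.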

In [None]:
ta.mod='consecutive'
ta.par=par
ta.Fit_Global()
#ta.Plot_fit_output([0,1,4,5])

In [None]:
FWHM=2.35482
def manual_consecutive(times,pardf):								
    c=np.zeros((len(times),4),dtype='float') 						#creation of matrix that will hold the concentrations
    g=gauss(times,sigma=pardf['resolution']/FWHM,mu=pardf['t0'])    #creating the gaussian pulse that will "excite" our sample
    sub_steps=10   #sub_steps=pardf['sub_steps']                    #defining how many extra steps will be taken between the main time_points
    
    #The loop
    for i in range(1,len(times)):                                   #iterate over all timepoints
        dc=np.zeros(4,dtype='float')                                #the initial change for each concentration, the "4" is representative of how many changes there will be
        dt=(times[i]-times[i-1])/(sub_steps)                        # as we are taking smaller steps the time intervals need to be adapted
        c_temp=c[i-1,:]                                             #temporary matrix holding the changes (needed as we have sub steps and need to check for zero in the end)
        for j in range(int(sub_steps)):
            dc[0]=-pardf['k0']*dt*c_temp[0]+g[i]*dt					#excite a small fraction with g[i] and decay with 'k0'
            dc[1]=pardf['k0']*dt*c_temp[0]-pardf['k1']*dt*c_temp[1]	#form with "k0" and decay with "k1"
            dc[2]=pardf['k1']*dt*c_temp[1]-pardf['k2']*dt*c_temp[2] #form with "k1" and decay with "k2"
            dc[3]=pardf['k2']*dt*c_temp[2]
            c_temp=c_temp+dc
            c_temp[c_temp<0]=0
        c[i,:] =c_temp												#store the temporary concentrations into the main matrix
    c=pd.DataFrame(c,index=times)								#write back the right indexes
    c.index.name='time'						
    
    #Shaping options						
    c.columns=['$^3MLCT$','$^3MC$','$^5MC$','Inf']
    if not 'infinite' in list(pardf.index.values):	c.drop('Inf',axis=1,inplace=True)
    if 'explicit_GS' in list(pardf.index.values):   c['GS']=-c.sum(axis=1)
    if 'background' in list(pardf.index.values):    c['background']=1
    return c                                                        #return the concentrations to the global fitting

In [None]:
ta.mod=manual_consecutive     # identical to 'full_consecutive'
ta.Fit_Global()
#ta.Plot_fit_output([0,1,4,5],title='Manually defined function')

Here it is still visible that the "species" have a bleach and a stimulated emission.<br> 
Next I would try to add the ground state explicitly by adding the keyword **explicit_GS**.<br>
I usually keep the fitting cell together as one unit and work on the parameters.<br>

**compare the last plot containing the c(t) to understand the difference** 

## Adding external Spectra (e.g. Ground state)

In [None]:
wave=ta.re['A'].columns.values
df=pd.DataFrame(gauss(t=wave),index=wave)
df.columns=['GS']
df.sort_index(inplace=True)
#df.plot()

We now add it as the ground state spectrum to the modelling.<br>
The switch **par.add('ext_spectra_guide')** converts the spectrum from "must be like this" to "should be somewhat close to this".

In [None]:
ta.mod='consecutive'    
par=lmfit.Parameters()                                       # create empty parameter object
par.add('k0',value=1/0.1,vary=True)                  
par.add('k1',value=1/2.5,vary=True)             
par.add('k2',value=1/40,vary=True)   
par.add('t0',value=0,min=-2,max=2,vary=True)                       # Allow the arrival time to adjust? (True here)
par.add('resolution',value=0.086081,min=0.04,max=0.5,vary=False)       # Allow the instrument response to adjust (False here)
par.add('explicit_GS')

#par.add('ext_spectra_scale',value=1,vary=True)
#par.add('ext_spectra_shift',value=0,vary=False)
#par.add('ext_spectra_guide')

ta.par=par
ta.Fit_Global(ext_spectra=df)
#ta.Fit_Global()
#plt.close('all')
ta.Plot_fit_output([0,4])

In [None]:
ta1=pf.TA('con_5.SIA',path=path_to_files)
chirp=[-1.29781491e-11,4.72546618e-08,-6.36421133e-05,3.77396295e-02,-8.08783621e+00]
ta1.Cor_Chirp(fitcoeff=chirp)
ta1.intensity_range=0.005
ta1.log_scale=True
ta1.timelimits=[-0.2,500]
ta1.bordercut=[390,1150]
ta1.rel_wave=[430,487,525,640,720,820,900,950]
ta1.rel_time=[-0.1,-0.02,0.035,0.2,0.5,2,14,22,92,160]
ta1.ignore_time_region=[-0.15,0.1]                      #ta1.ignore_time_region=[[-0.15,0.1],[10,12]]
ta1.scattercut=[525,580]                                #ta1.scattercut=[[525,580],[750,755]]
#ta1.Plot_RAW(0,title='without laser scatter')

In [None]:
ta1.mod='consecutive'    
ta1.par=ta.par_fit
ta1.Fit_Global(ext_spectra=df)
plt.close('all')
#ta1.Plot_fit_output([0,4])

# Advanced modelling, Multimodal same technique

### Create data

In [None]:
import function_library as func
reload(func)
ta.mod=func.P13

for key in par.keys():
    par[key].vary=False
lines={'GS':(550,50,2),'A':(800,60,1),'B':(600,80,1),'C':(750,70,1)}
ds=pd.concat([pd.DataFrame(gauss(ta.ds.index.values,sigma=lines[key][1],mu=lines[key][0],scale=lines[key][2]),index=ta.ds.index.values,columns=[key]) for key in lines.keys()],axis=1)

par=lmfit.Parameters()                                       # create empty parameter object
par.add('k0',value=1/2,vary=True)                  
par.add('k1',value=1/10,vary=True)             
par.add('k2',value=1/4,vary=True)
par.add('k3',value=1/4,vary=True)                    
par.add('t0',value=0,min=-2,max=2,vary=True)                       # Allow the arrival time to adjust? (True here)
par.add('resolution',value=0.086081,min=0.04,max=0.5,vary=False)       # Allow the instrument response to adjust (False here)
par.add('explicit_GS')

for key in par.keys():
    par[key].vary=False
ta.par=par

ta_list=[]
for a in [1,5,25]:
    ta1=ta.Copy()
    ta1.filename='r2 x %i'%a
    ta1.par['k2'].value=par['k2'].value*a
    ta1.Fit_Global(ext_spectra=ds)
    ta1.ds=ta1.re['AC']+np.random.normal(scale=5e-2,size=ta1.re['AC'].shape)
    ta1.factor=a
    ta_list.append(ta1)

### Compare the data

In [None]:
ta1,ta2,ta3=ta_list                                     # the three simulated datasets created above
ta1.Compare_at_time(other=ta2,rel_time=[1,5])
window=[0.7,1.3,545,555]
ta1.Compare_at_time(other=ta2,rel_time=[1,5],norm_window=window)

In [None]:
ta1.Compare_at_wave(other=[ta2,ta3],rel_wave=[800])
window=[0.1,0.2,780,820]
ta1.Compare_at_wave(other=[ta2,ta3],rel_wave=[800],norm_window=window)

### Fitting a single spectrum

In [None]:
par=lmfit.Parameters()                                    # create empty parameter object
par.add('k0',value=1/2,vary=True)                  
par.add('k1',value=1/10,vary=True)             
par.add('k2',value=1/4,vary=True)
par.add('k3',value=1/4,vary=True)                    
par.add('t0',value=0,min=-2,max=2,vary=True)                       # Allow the arrival time to adjust? (True here)
par.add('resolution',value=0.086081,min=0.04,max=0.5,vary=False)       # Allow the instrument response to adjust (False here)
par.add('explicit_GS')
ta.par=par
ta.mod=func.P13
ta.Fit_Global(ext_spectra=df)
ta.Plot_fit_output()

### Copy paste fit function and make small adjustments

In [None]:
def P13_split(times,pardf):								
	'''P13 with splitting'''
	c=np.zeros((len(times),4),dtype='float') 						#creation of matrix that will hold the concentrations
	g=gauss(times,sigma=pardf['resolution']/FWHM,mu=pardf['t0']) 	#creating the gaussian pulse that will "excite" our sample
	if 'sub_steps' in list(pardf.index.values):
		sub_steps=pardf['sub_steps']
	else:
		sub_steps=10  													#defining how many extra steps will be taken between the main time_points
	for i in range(1,len(times)):									#iterate over all timepoints
		dc=np.zeros(4,dtype='float')							#the initial change for each concentration, the "4" is representative of how many changes there will be
		dt=(times[i]-times[i-1])/(sub_steps)						# as we are taking smaller steps the time intervals need to be adapted
		c_temp=c[i-1,:]												#temporary matrix holding the changes (needed as we have sub steps and need to check for zero in the end)
		for j in range(int(sub_steps)):
			dc[0]=-pardf['k0']*dt*c_temp[0]-pardf['k2']*pardf['f0']*dt*c_temp[0]+g[i]*dt		
			dc[1]=pardf['k0']*dt*c_temp[0]-pardf['k1']*dt*c_temp[1]
			dc[2]=pardf['k2']*pardf['f0']*dt*c_temp[0]-pardf['k3']*dt*c_temp[2]
			dc[3]=pardf['k1']*dt*c_temp[1]+pardf['k3']*dt*c_temp[2]
			c_temp=c_temp+dc
			c_temp[c_temp<0]=0
		c[i,:] =c_temp												#store the temporary concentrations into the main matrix
	c=pd.DataFrame(c,index=times)									#write back the right indexes
	c.index.name='time'												#and give it a name
	c.columns=['A','B','C','Inf']									#this is optional but very useful. The species get names that represent some particular states
	if not 'infinite' in list(pardf.index.values):
		c.drop('Inf',axis=1,inplace=True)
	if 'explicit_GS' in list(pardf.index.values):
		c['GS']=-c.sum(axis=1)
	if 'background' in list(pardf.index.values):					#optional but usefull, allow the keyword "background" to be used to fit the background in the global analysis
		c['background']=1											#background always there (flat)
	return c

### Adding the parameters

In [None]:
par=lmfit.Parameters()                                       # create empty parameter object
par.add('k0',value=1/2,vary=True)                  
par.add('k1',value=1/40,vary=True)             
par.add('k2',value=1/4,vary=True)
par.add('k3',value=1/4,vary=True)                    
par.add('t0',value=0,min=-2,max=2,vary=True)                       # Allow the arrival time to adjust? (True here)
par.add('resolution',value=0.086081,min=0.04,max=0.5,vary=False)       # Allow the instrument response to adjust (False here)
par.add('explicit_GS')
ta.par=par

ds=pd.concat([pd.DataFrame(gauss(ta.ds.index.values,sigma=lines[key][1],mu=lines[key][0],scale=lines[key][2]),index=ta.ds.index.values,columns=[key]) for key in ['GS']],axis=1)

ta_list=[]
for a in [1,5,25]:
    ta1=ta.Copy()
    ta1.par=par
    ta1.par.add('f0',value=a,vary=True)
    ta1.filename='r2 x %i'%a
    ta_list.append(ta1)

ta1=ta_list[0]
plt.close('all')
ta1.Fit_Global(multi_project=ta_list[1:],unique_parameter=['f0'],same_DAS=True,ext_spectra=df)
ta1.Plot_fit_output(0,title='combined fitting')

In [None]:
for t in ta1.multi_projects:
    t.Plot_fit_output(0)                                 # plot each of the jointly fitted projects

## Let's look at some oscillations

In [None]:
ta1=pf.TA('con_6.SIA',path=path_to_files)
chirp=[-1.29781491e-11,4.72546618e-08,-6.36421133e-05,3.77396295e-02,-8.08783621e+00]
ta1.Cor_Chirp(fitcoeff=chirp)

ta1.intensity_range=0.005
ta1.log_scale=False
ta1.timelimits=[-0.2,500]
ta1.bordercut=[390,1150]
ta1.scattercut=[525,580]
ta1.rel_wave=[430,487,525,640,720,820,900,950]
ta1.rel_time=[-0.1,-0.02,0.035,0.2,0.5,2,14,22,92,160]
ta1.ignore_time_region=[-0.15,0.1]
ta1.Plot_RAW(0)

In [None]:
ta1.mod='full_consecutive'    
par=lmfit.Parameters()                                       # create empty parameter object
par.add('k0',value=1/0.100143,vary=True)                  
par.add('k1',value=1/2.496702,vary=True)             
par.add('k2',value=1/39.963222,vary=True)   
par.add('t0',value=0,min=-2,max=2,vary=False)                       # Allow the arrival time to adjust? (False here)
par.add('resolution',value=0.086081,min=0.04,max=0.5,vary=False)
par.add('explicit_GS')
ta1.par=par
 
ta1.Fit_Global()
ta1.Plot_fit_output([0,4])

Now we have the main kinetics and can subtract them. Here we simply use the residuals as the next matrix to be fitted and adjust the shaping parameters.

In [None]:
ta2=ta1.Copy()
ta2.ds=ta1.re['AE']

ta2.intensity_range=3e-4
ta2.rel_wave=[620,700,740,800,830,860]
ta2.timelimits=[0.1,10]
ta2.Plot_RAW(0,scale_type='linear')

As an alternative one could have subtracted all (or some) of the contributions using this approach:
``` python
    dicten=pf.Species_Spectra(ta1) # Extract each of the species as a matrix
    ta3=ta1.Copy()                 # Make a copy of the project to test
    ta3.ds=ta1.re['A']-dicten[1]-dicten[2]-dicten['GS'] #subtract one or multiple of the species.
```

Now we load the function file and select a model from it. <br>
Optimizing follows the same procedure

In [None]:
import function_library as func
ta2.mod=func.oscil_comp   

In [None]:
par=lmfit.Parameters()                                       # create empty parameter object
par.add('f0',value=1.00561,vary=True)                  
par.add('tk0',value=1/2.8725,vary=True,min=1/4,max=4)
par.add('S0',value=0.975956,vary=True ,min=0,max=1)
ta2.par=par
ta2.ignore_time_region=[-0.15,0.25]
#ta2.Fit_Global(other_optimizers='least_squares')
ta2.Fit_Global()

plt.close('all')
ta2.error_matrix_amplification=1
ta2.Plot_fit_output([0,4],scale_type='linear')

As does calculating the errors

In [None]:
#This takes about 2min
#ta2.Fit_Global(confidence_level=0.95)

ta2=pf.TA('Fitted_Oscillations_with_confidence.hdf5',path=path_to_files)
ta2.Print_Results()

Finally we combine the normal model and the oscillation model and make a combined fit.

In [None]:
reload(func)
ta1.mod=func.manconsec_oscil
ta1.par=ta1.par_fit
for key in ['f0','tk0','S0']:
    ta1.par.add(key,value=ta2.par_fit[key].value)
    ta1.par[key].vary=False
ta1.par['S0'].min=0
ta1.par['S0'].max=1
ta1.Fit_Global()
ta1.Plot_fit_output()