# NSORT with Python
- Author: Tyler Martin 
- Contact: tyler.martin@nist.gov
- Last updated: 03/18/19
- Version: 0.3-dev

The goal of this notebook is to allow users to interactively stitch together **reduced** ABS files produced from the NCNR Igor macros. It should be highlighted that this file only does the NSORT portion of the reduction process i.e. the combining of reduced scattering data from multiple configurations into a single curve. 

## Global Instructions

- This notebook should be worked through linearly from top to bottom
- All cells can be run by using the 'play' symbol in the toolbar or by pressing [Shift] + [Enter] simultaneously
- Sections headers denote user interaction
    - !> cells in this section require interaction/modification by user
    - \>\> cell in this section should just be run and output checked


## >> Setting up environment

These two cells may take up to a minute or two to finish running.

In [43]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import ipywidgets
import pathlib
import re
import time

from typySANS.InteractiveTrimPlot import *

In [44]:
sns.set(context='notebook',style='ticks',palette='bright')
%matplotlib widget
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## !> Pick path to .ABS files

The title says it! Write in the path to your reduced ABS files. Use [Tab] to autocomplete the paths as you type them

In [3]:
ABS_path = pathlib.Path('../dev/1804-BottleBrush/2018-11-08 NGBSANS83 - solvents/reduction')
ABS_path = pathlib.Path('../dev/1804-BottleBrush/2018-08-17-Burns-Bottlebrushes/BOTTLEBRUSH/RAW/')

## >> Scan the .ABS files and build label table

In [21]:
dfLabel =[]
for file_path in ABS_path.glob('*ABS'):
    file_name = file_path.parts[-1]
    
    with open(file_path,'r') as f:
        lines = [f.readline() for _ in range(4)]
        
    ## We don't want COMBINED ABS files
    if 'COMBINED' in lines[0]:
        continue
        
    ## Parse the LABEL: row
    raw_label = lines[1].strip().split(':')[-1]
    dfLabel.append([file_name,file_path,raw_label])

dfLabel = pd.DataFrame(dfLabel,columns=['file_name','file_path','label'])
dfLabel = dfLabel.set_index('file_name').squeeze()
dfLabel.head()

Unnamed: 0_level_0,file_path,label
file_name,Unnamed: 1_level_1,Unnamed: 2_level_1
AUG17031.ABS,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,AC5-116-42k dPS 4p7m 5A Scatt T=25
AUG17038.ABS,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,AC5-118-42k dPS 4p7m 16A Scatt T=25
AUG17037.ABS,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,AC5-117-42k dPS 4p7m 16A Scatt T=25
AUG17068.ABS,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,AC5-122-CHXd12 1p15m 5A Offset Scatt T=41.4
AUG17056.ABS,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,AC5-116-42k dPS 1p15m 5A Offset Scatt T=25


## !> Build regex to extract trial label

The goal here is to construct a 'regular expression' (i.e. a regex) that will extract the **non-configuration** portion of the trial label. This entire notebook works by finding the portion of the label that is common between the different instrument configurations and combining them. The key is to precisely construct this regex so that the correct measurements will be combined togeher. 

Some general regex  notes:
- A period "." represents any alphanumeric character
- A star "*" denotes that the **previous** character can be repeated any number of times (including zero times)
- A star "?" denotes that the **previous** character occurs either 0 or 1 times
- Parenthesis () denote 'caputure' groups. These is how we extract substrings
- Brackets [] denote character lists i.e. [mA] is a single character equal to m **or** A

Example:

- Full Sample Label: AC5-116-42k dPS 4p7m 5A Scatt T=25
- Regex: (.*) dPS 
    - Explanation: Capture 0 or more characters which precede the characters dPS
- Captured Groups: (' AC5-116-42k')


**Note**: This code will always use the *first* capture group as the trial label


In [68]:
# this is a general regex that works well for many samples coming off of the 10m
regex_init = '(.*) (.*)[mA] (.*)[mA] (?:Offset)?\s?(?:Scatt|Trans) (.*)'

label = ipywidgets.Dropdown(options=dfLabel['label'].values,description='Label:',layout={'width':'450px'})
regex = ipywidgets.Text(value=regex_init,description='Regex:',layout={'width':'600px'})
output = ipywidgets.Output()

def match(event):
    output.clear_output()
    with output:
        try:
            re_result = re.search(regex.value,label.value)
        except re.error:
            print('Error! Bad regular-expression!')
        else:
            if re_result is None:
                print('Error! No match!')
            else:
                groups = re_result.groups()
                print('\n')
                print('All Groups: {}'.format(groups))
                print()
                print('Extracted Trial Label: {}'.format(groups[0]))

label.observe(match)
regex.observe(match)
match(None)
display(ipywidgets.VBox([label,regex,output]))

VBox(children=(Dropdown(description='Label:', layout=Layout(width='450px'), options=(' AC5-116-42k dPS 4p7m 5A…

## >> Gather trial labels and configuration information

If you created a correct regex for *all* trials above, this cell should correctly produce a table with the extracted trial label along with the sample to detector distance (SDD) and wavelength (LAM) as well.

In [39]:
#This is hopefully a somewhat generic regex
cre = re.compile(regex.value)

## get lambda and SDD 
dfABS =[]
for file_name,sdf in dfLabel.iterrows():
    file_path = sdf['file_path']
    label = sdf['label']
    
    ABS,config = readABS(file_path)
    LAMBDA = float(config['LAMBDA'])
    SDD = float(config['DET DIST'])
    
    ## Parse the LABEL: row
    re_result = cre.search(label)
    if not re_result: #if regex doesn't match, skip
        print('Warning: skipping {} because regex failed!'.format(file_name))
        continue
    label   = re_result.groups()[0].strip()
    
    dfABS.append([label,SDD,LAMBDA,file_name,file_path])

dfABS = pd.DataFrame(dfABS,columns=['label','SDD','LAM','file_name','file_path'])
dfABS = dfABS.sort_values(['label','SDD','LAM'])
dfABS.head()

Unnamed: 0,label,SDD,LAM,file_name,file_path
4,AC5-116-42k dPS,1.15,5.0,AUG17056.ABS,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...
0,AC5-116-42k dPS,4.7,5.0,AUG17031.ABS,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...
10,AC5-116-42k dPS,4.7,10.0,AUG17051.ABS,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...
30,AC5-116-42k dPS,4.7,16.0,AUG17036.ABS,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...
16,AC5-116-CHXd12,1.15,5.0,AUG17062.ABS,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...


## >> Create NSORT Table

Now the real magic: Using the power of [Pandas](https://pandas.pydata.org/), we can automatically group the above table by the label column. If the regex was properly constructed, this cell will output a table which lists all of the individual instrument configurations for each sample label.

In [40]:
dfNSORT = []
for i, sdf in dfABS.groupby('label'):
    dd = {'label':sdf.label.iloc[0]}
    for j,ssdf in sdf.iterrows():
        dd[ssdf.SDD,ssdf.LAM,'fname'] = ssdf.file_name
        dd[ssdf.SDD,ssdf.LAM,'fpath'] = ssdf.file_path
    dfNSORT.append(dd)

dfNSORT = pd.DataFrame(dfNSORT)
dfNSORT.set_index('label',inplace=True)
dfNSORT.columns = pd.MultiIndex.from_tuples(dfNSORT.columns.tolist(),names=['SDD','LAM','datatype'])
dfNSORT.sort_index(axis=0,inplace=True)
dfNSORT.sort_index(axis=1,inplace=True)
dfNSORT.head().T

Unnamed: 0_level_0,Unnamed: 1_level_0,label,AC5-116-42k dPS,AC5-116-CHXd12,AC5-117-42k dPS,AC5-117-CHXd12,AC5-118-42k dPS
SDD,LAM,datatype,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
1.15,5.0,fname,AUG17056.ABS,AUG17062.ABS,AUG17057.ABS,AUG17063.ABS,AUG17058.ABS
1.15,5.0,fpath,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...
4.7,5.0,fname,AUG17031.ABS,AUG17071.ABS,AUG17032.ABS,AUG17072.ABS,AUG17033.ABS
4.7,5.0,fpath,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...
4.7,10.0,fname,AUG17051.ABS,AUG17080.ABS,AUG17052.ABS,AUG17081.ABS,AUG17053.ABS
4.7,10.0,fpath,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...
4.7,16.0,fname,AUG17036.ABS,,AUG17037.ABS,,AUG17038.ABS
4.7,16.0,fpath,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...,,../dev/1804-BottleBrush/2018-08-17-Burns-Bottl...


## !> Choose Global Trim Params

Next, trim and shift parameters need to be chosen to be applied to all trials. Use the widget produced by the cell below to demo trim parameters and shift-factors for different systems. 

In [69]:
#extra out only full path information
dfNSORTPath = dfNSORT.xs('fpath',level='datatype',axis=1)

plt.figure(figsize=(6,3))
tp = InteractiveTrimmingPlot(dfNSORTPath)
tp.run_widget()

FigureCanvasNbAgg()

VBox(children=(HBox(children=(Dropdown(description='System:', options=('AC5-116-42k dPS', 'AC5-116-CHXd12', 'A…

## >> Check Shift Factors

Ensure that the shift factors below make sense for all systems/configurations. Ideally, the factors should be between 0.95-1.05.

In [100]:
df_trim = tp.df_trim
shiftConfig = eval(tp.shift_config.value)

shifts=[]
for label,df in dfNSORTPath.iterrows():
    df_xy = []
    index = []
    for i,(config,fpath) in enumerate(df.iteritems()):
        if pd.isna(fpath):
            continue
        index.append(config)
        sdf = readABS(fpath)[0]
        df_xy.append(sdf.set_index('q',drop=False)[['q','I','dI']])
    df_xy = pd.Series(df_xy,index=pd.MultiIndex.from_tuples(index))
    df_xy = df_xy.sort_index(axis=0)
    
    dfShift = buildShiftTable(df_xy,df_trim,shiftConfig)
    shifts.append(dfShift.values)
    
df_shift = pd.DataFrame(shifts,index=dfNSORTPath.index,columns=dfNSORTPath.columns)
df_shift.sort_values(by=dfNSORTPath.columns.tolist(),axis=0)

SDD,1.15,4.70,4.70,4.70
LAM,5.0,5.0,10.0,16.0
label,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
AC5-119-CHXd12,1.0,0.927453,0.930785,
AC5-123-CHXd12,1.0,0.928436,0.922909,
AC5-117-42k dPS,1.0,0.92871,0.943743,1.043819
AC5-116-42k dPS,1.0,0.931868,0.943168,1.033339
AC5-117-CHXd12,1.0,0.932363,0.921107,
AC5-118-CHXd12,1.0,0.932483,0.923687,
AC5-118-42k dPS,1.0,0.933207,0.947528,1.053908
AC5-116-CHXd12,1.0,0.93367,0.930071,
AC5-122-CHXd12,1.0,0.933753,0.942216,
AC5-120-CHXd12,1.0,0.935275,0.923767,


## Write all ABS Files

In [458]:
AUTONSORT_path = ABS_path / 'AUTONSORTED'
if not AUTONSORT_path.exists():
    AUTONSORT_path.mkdir()
    
for label,sdfABS in dfNSORTPath.iterrows():
    sdfShift = dfShift.loc[label]
    fname = label.strip() + '.ABS'
    print('--> Writing {}'.format(basePath/fname))
    writeABS(fname,sdfABS,sdfShift,shiftConfig,trimLo,trimHi,path=AUTONSORT_path,shift=True)

--> Writing FEB19/AUTONSORTED/4k 6 10.4%B2  dTol T=30C.ABS
--> Writing FEB19/AUTONSORTED/4k 6 10.4%B3  dTol T=30C.ABS
--> Writing FEB19/AUTONSORTED/4k 6 10.4%D6  dTol T=30C.ABS
--> Writing FEB19/AUTONSORTED/4k 6 10.4%D7  dTol T=30C.ABS
--> Writing FEB19/AUTONSORTED/4k 6 10.4%D8  dTol T=30C.ABS
--> Writing FEB19/AUTONSORTED/4k 8 9.3%B2  dTol T=30C.ABS
--> Writing FEB19/AUTONSORTED/4k 8 9.3%B3  dTol T=30C.ABS
--> Writing FEB19/AUTONSORTED/4k 8 9.3%B4  dTol T=30C.ABS
--> Writing FEB19/AUTONSORTED/4k 8 9.3%D6  dTol T=30C.ABS
--> Writing FEB19/AUTONSORTED/4k 8 9.3%D7  dTol T=30C.ABS
--> Writing FEB19/AUTONSORTED/4k 8 9.3%D8  dTol T=30C.ABS
--> Writing FEB19/AUTONSORTED/AC5-116 0.9k dPS T=30C.ABS
--> Writing FEB19/AUTONSORTED/AC5-117 0.9k dPS T=30C.ABS
--> Writing FEB19/AUTONSORTED/AC5-118 0.9k dPS T=30C.ABS
--> Writing FEB19/AUTONSORTED/JMS 10.8%D2  dTol T=30C.ABS
--> Writing FEB19/AUTONSORTED/PS1 10.8%D6  dTol T=30C.ABS
--> Writing FEB19/AUTONSORTED/PS1 10.8%D7  dTol T=30C.ABS
--> Writing 

## Check AUTO-NSORTED ABS Files

In [457]:
AUTO_ABS = list(basePath.glob('*ABS'))
def plotABS(fname=AUTO_ABS[0]):
    df,config = readABS(fname)
    ser = pd.Series(index=df['q'].values,data=df['I'].values,name=fname.parts[-1])
    ser.plot(**plot_kw)

ipywidgets.interact(plotABS,fname=AUTO_ABS);

interactive(children=(Dropdown(description='fname', options=(PosixPath('FEB19/AUTONSORTED/PS1 10.8%D7  dTol T=…