# NSORT with Python
- Author: Tyler Martin 
- Contact: tyler.martin@nist.gov
- Last updated: 03/18/19
- Version: 0.3-dev

The goal of this notebook is to allow users to interactively stitch together **reduced** ABS files produced from the NCNR Igor macros. Note that this notebook performs only the NSORT portion of the reduction process, i.e. combining reduced scattering data from multiple configurations into a single curve.

This notebook works by extracting the trial/sample portion of each measurement label and combining the measurements that share it. For example, if you had the following set of measurement labels

- AC5-116 1p15m 5A Offset Scatt
- AC5-116 4p7m 5A Scatt
- AC5-116 4p7m 12A Scatt
- AC5-117 1p15m 5A Offset Scatt
- AC5-117 4p7m 5A Scatt
- AC5-117 4p7m 12A Scatt

...your goal would be to construct a regular expression (regex) that extracts the "AC5-11x" portion of each measurement label, so that the first three and the last three measurements are each combined into a single curve.
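
As a rough illustration of the idea (not the notebook's actual implementation), the sketch below groups the six example labels by the text captured before the first whitespace. The pattern `(\S+)\s` and the names `pattern` and `groups` are assumptions chosen for this example only.

```python
import re
from collections import defaultdict

labels = [
    'AC5-116 1p15m 5A Offset Scatt',
    'AC5-116 4p7m 5A Scatt',
    'AC5-116 4p7m 12A Scatt',
    'AC5-117 1p15m 5A Offset Scatt',
    'AC5-117 4p7m 5A Scatt',
    'AC5-117 4p7m 12A Scatt',
]

# Hypothetical pattern: capture the run of non-whitespace characters
# that precedes the first whitespace character
pattern = re.compile(r'(\S+)\s')

# Map each captured trial label to the measurement labels that share it
groups = defaultdict(list)
for measurement in labels:
    result = pattern.search(measurement)
    if result is None:  # skip labels the pattern cannot parse
        continue
    groups[result.group(1)].append(measurement)

for trial, members in groups.items():
    print(trial, len(members))
```

Each key of `groups` corresponds to one combined curve, and each value lists the configurations that would be stitched together.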

## Global Instructions

- This notebook should be worked through linearly from top to bottom
- All cells can be run by using the 'play' symbol in the toolbar or by pressing [Shift] + [Enter] simultaneously
- Section headers denote the level of user interaction
    - !> cells in this section require interaction/modification by the user
    - \>\> cells in this section should just be run and their output checked


## >> Setting up environment

The next several cells may take up to a minute or two to finish running.

In [33]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import ipywidgets
import pathlib
import re
import time

from typySANS.InteractiveTrimPlot import *

The next cell is non-essential and can be skipped if it fails (i.e. if Seaborn is not installed) 

In [34]:
#optional plot styling; skip this cell if Seaborn is not installed
import seaborn as sns
sns.set(context='notebook',style='ticks',palette='bright')

If the next cell fails, either:
    
    a) Install ipympl via conda or pip (conda install -c conda-forge ipympl)
    
    b) Change widget --> notebook

In [35]:
%matplotlib widget 

In [36]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [37]:
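#hack: add the parent directory to sys.path so that typySANS can be imported (for now)
#NOTE: in a fresh kernel, run this cell before the typySANS import above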
import sys
sys.path.insert(0,'../')

## !> Pick path to .ABS files

The title says it! Write in the path to your reduced ABS files. Use [Tab] to autocomplete the paths as you type them

In [38]:
#only one path can be active at a time; alternative paths are kept commented out
#ABS_path = pathlib.Path('../dev/1804-BottleBrush/2018-11-08 NGBSANS83 - solvents/reduction')
ABS_path = pathlib.Path('../dev/1804-BottleBrush/2018-08-17-Burns-Bottlebrushes/BOTTLEBRUSH/RAW/')
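
Before moving on, it can help to confirm that the chosen directory exists and actually contains ABS files. The following is a minimal sketch of such a check; the helper `count_abs_files` is hypothetical (it is not part of typySANS) and is demonstrated on a throwaway temporary directory rather than on real data.

```python
import pathlib
import tempfile

def count_abs_files(path):
    """Return the number of entries ending in 'ABS' directly inside `path`."""
    path = pathlib.Path(path)
    if not path.is_dir():
        raise FileNotFoundError('Not a directory: {}'.format(path))
    # Same glob pattern that the scanning cell of this notebook uses
    return len(list(path.glob('*ABS')))

# Demonstration on a temporary directory with two fake ABS files
with tempfile.TemporaryDirectory() as tmp:
    tmp = pathlib.Path(tmp)
    for name in ['A001.ABS', 'A002.ABS', 'notes.txt']:
        (tmp / name).write_text('')
    n_found = count_abs_files(tmp)

print('Found {} ABS files'.format(n_found))
```

In the notebook itself one would call `count_abs_files(ABS_path)`; a result of zero usually means the path is wrong.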

## >> Scan the .ABS files and build label table

In [39]:
dfLabel =[]
for file_path in ABS_path.glob('*ABS'):
    file_name = file_path.parts[-1]
    
    with open(file_path,'r') as f:
        lines = [f.readline() for _ in range(4)]
        
    ## We don't want COMBINED ABS files
    if 'COMBINED' in lines[0]:
        continue
        
    ## Parse the LABEL: row
    raw_label = lines[1].strip().split(':')[-1].strip()
    dfLabel.append([file_name,file_path,raw_label])

dfLabel = pd.DataFrame(dfLabel,columns=['file_name','file_path','label'])
dfLabel = dfLabel.set_index('file_name').squeeze()
dfLabel.head()

| file_name | file_path | label |
|---|---|---|
| AUG17031.ABS | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | AC5-116-42k dPS 4p7m 5A Scatt T=25 |
| AUG17038.ABS | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | AC5-118-42k dPS 4p7m 16A Scatt T=25 |
| AUG17037.ABS | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | AC5-117-42k dPS 4p7m 16A Scatt T=25 |
| AUG17068.ABS | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | AC5-122-CHXd12 1p15m 5A Offset Scatt T=41.4 |
| AUG17056.ABS | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | AC5-116-42k dPS 1p15m 5A Offset Scatt T=25 |
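
The label-parsing step of the cell above can be looked at in isolation. The sketch below assumes a header row of the form `LABEL: <text>` (the example line is made up for illustration) and uses a hypothetical helper `parse_label`. Unlike `split(':')[-1]`, it splits on the first colon only, so a label that itself contains a colon is kept whole.

```python
def parse_label(header_line):
    """Return the text after the first ':' of a 'LABEL: ...' header row."""
    key, sep, value = header_line.partition(':')
    if not sep:
        raise ValueError('No ":" found in header line: {!r}'.format(header_line))
    return value.strip()

# Hypothetical header rows, for illustration only
plain = parse_label('LABEL: AC5-116-42k dPS 4p7m 5A Scatt T=25\n')
with_colon = parse_label('LABEL: sample A: run 2\n')

print(plain)
print(with_colon)
```

Whether this matters depends on your labels; for labels without colons both approaches give the same result.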


## !> Build regex to extract trial label

The goal here is to construct a 'regular expression' (i.e. a regex) that will extract the **non-configuration** portion of the trial label. This entire notebook works by finding the portion of the label that is common between the different instrument configurations and combining them. The key is to precisely construct this regex so that the correct measurements will be combined together. 

Some general regex notes:
- A period "." represents any single character (except a newline)
- A star "\*" denotes that the **previous** character can be repeated any number of times (including zero times). It is greedy by default, i.e. it matches as much as possible; ".\*?" matches as little as possible
- A question mark "?" denotes that the **previous** character occurs either 0 or 1 times
- Parentheses () denote 'capture' groups. This is how we extract substrings
- Brackets [] denote character sets, i.e. [mA] is a single character equal to m **or** A
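
The notes above can be checked directly with Python's `re` module. This is a small sketch on one of the example labels; the specific patterns are chosen for illustration and are not the ones the notebook uses by default.

```python
import re

label = 'AC5-116 4p7m 12A Scatt'

# "." matches any single character, including the "-" in the label
dot_matches_dash = re.search(r'AC5.116', label) is not None

# "*" is greedy: "(.*)\s" runs up to the LAST whitespace,
# while the non-greedy "(.*?)\s" stops at the FIRST whitespace
greedy = re.search(r'(.*)\s', label).group(1)   # 'AC5-116 4p7m 12A'
lazy = re.search(r'(.*?)\s', label).group(1)    # 'AC5-116'

# "?" makes the preceding character optional (here: the space)
optional = [bool(re.search(r'Offset ?Scatt', s))
            for s in ('Offset Scatt', 'OffsetScatt')]

# "[]" is a character set: digits or "p", followed by a single "m" or "A"
configs = re.findall(r'[0-9p]+[mA]', label)     # ['4p7m', '12A']

print(dot_matches_dash, greedy, lazy, optional, configs)
```

The greedy versus non-greedy difference is the most common reason a regex captures more of the label than intended.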

Example 1:

    Consider the following set of measurement labels
    
    - AC5-116 1p15m 5A Offset Scatt
    - AC5-116 4p7m 5A Scatt
    - AC5-116 4p7m 12A Scatt
    - AC5-117 1p15m 5A Offset Scatt
    - AC5-117 4p7m 5A Scatt
    - AC5-117 4p7m 12A Scatt
    
    ...your goal would be to construct a regular expression (regex) that extracts the "AC5-11x" portion of each measurement label, so that the first three and the last three measurements are each combined into a single curve. The following regular expressions would work in this case. Note the non-greedy ".*?": a greedy ".*" followed by \s would capture everything up to the last whitespace in the label.
    
    - (AC5.*?)\s
    - (.*?)\s
    - ([0-9a-zA-Z-]*)
    - (.{7})\s
    
Example 2:

    - Full Sample Label: AC5-116-42k dPS 4p7m 5A Scatt T=25
    - Regex: (.*) dPS 
        - Explanation: Capture 0 or more characters which precede the characters dPS
    - Captured Groups: ('AC5-116-42k',)


**Note**: This code will always use the *first* capture group as the trial label

In [41]:
# this is a general regex that works well for many samples coming off of the 10m
# (raw string, so that \s is passed to the regex engine unchanged)
regex_init = r'(.*)\s*(.*)[mA]\s*(.*)[mA]'

label = ipywidgets.Dropdown(options=dfLabel['label'].values,description='Label:',layout={'width':'450px'})
regex = ipywidgets.Text(value=regex_init,description='Regex:',layout={'width':'600px'})
output = ipywidgets.Output()

def match(event):
    output.clear_output()
    with output:
        try:
            re_result = re.search(regex.value,label.value)
        except re.error:
            print('Error! Bad regular-expression!')
        else:
            if re_result is None:
                print('Error! No match!')
            else:
                groups = re_result.groups()
                print('\n')
                print('All Groups: {}'.format(groups))
                print()
                print('Extracted Trial Label: {}'.format(groups[0]))

label.observe(match)
regex.observe(match)
match(None)
display(ipywidgets.VBox([label,regex,output]))


## >> Gather trial labels and configuration information

If you created a correct regex for *all* trials above, this cell should produce a table with the extracted trial label along with the sample-to-detector distance (SDD) and wavelength (LAM).

In [42]:
#compile the regex currently entered in the widget above
cre = re.compile(regex.value)

## get lambda and SDD 
dfABS =[]
for file_name,sdf in dfLabel.iterrows():
    file_path = sdf['file_path']
    raw_label = sdf['label'] #avoid the name `label`: it refers to the Dropdown widget above
    
    ABS,config = readABS(file_path)
    LAMBDA = float(config['LAMBDA'])
    SDD = float(config['DET DIST'])
    
    ## Extract the trial label using the first capture group
    re_result = cre.search(raw_label)
    if not re_result: #if regex doesn't match, skip
        print('Warning: skipping {} because regex failed!'.format(file_name))
        continue
    trial_label = re_result.groups()[0].strip()
    
    dfABS.append([trial_label,SDD,LAMBDA,file_name,file_path])

dfABS = pd.DataFrame(dfABS,columns=['label','SDD','LAM','file_name','file_path'])
dfABS = dfABS.sort_values(['label','SDD','LAM'])
dfABS.head()

| | label | SDD | LAM | file_name | file_path |
|---|---|---|---|---|---|
| 4 | AC5-116-42 | 1.15 | 5.0 | AUG17056.ABS | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... |
| 0 | AC5-116-42 | 4.7 | 5.0 | AUG17031.ABS | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... |
| 10 | AC5-116-42 | 4.7 | 10.0 | AUG17051.ABS | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... |
| 30 | AC5-116-42 | 4.7 | 16.0 | AUG17036.ABS | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... |
| 16 | AC5-116-CH | 1.15 | 5.0 | AUG17062.ABS | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... |


## >> Create NSORT Table

Now the real magic: Using the power of [Pandas](https://pandas.pydata.org/), we can automatically group the above table by the label column. If the regex was properly constructed, this cell will output a table which lists all of the individual instrument configurations for each sample label.

In [43]:
dfNSORT = []
for i, sdf in dfABS.groupby('label'):
    dd = {'label':sdf.label.iloc[0]}
    for j,ssdf in sdf.iterrows():
        dd[ssdf.SDD,ssdf.LAM,'fname'] = ssdf.file_name
        dd[ssdf.SDD,ssdf.LAM,'fpath'] = ssdf.file_path
    dfNSORT.append(dd)

dfNSORT = pd.DataFrame(dfNSORT)
dfNSORT.set_index('label',inplace=True)
dfNSORT.columns = pd.MultiIndex.from_tuples(dfNSORT.columns.tolist(),names=['SDD','LAM','datatype'])
dfNSORT.sort_index(axis=0,inplace=True)
dfNSORT.sort_index(axis=1,inplace=True)
dfNSORT.head().T

| SDD | LAM | datatype | AC5-116-42 | AC5-116-CH | AC5-117-42 | AC5-117-CH | AC5-118-42 |
|---|---|---|---|---|---|---|---|
| 1.15 | 5.0 | fname | AUG17056.ABS | AUG17062.ABS | AUG17057.ABS | AUG17063.ABS | AUG17058.ABS |
| 1.15 | 5.0 | fpath | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... |
| 4.7 | 5.0 | fname | AUG17031.ABS | AUG17071.ABS | AUG17032.ABS | AUG17072.ABS | AUG17033.ABS |
| 4.7 | 5.0 | fpath | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... |
| 4.7 | 10.0 | fname | AUG17051.ABS | AUG17080.ABS | AUG17052.ABS | AUG17081.ABS | AUG17053.ABS |
| 4.7 | 10.0 | fpath | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... |
| 4.7 | 16.0 | fname | AUG17036.ABS | | AUG17037.ABS | | AUG17038.ABS |
| 4.7 | 16.0 | fpath | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... | | ../dev/1804-BottleBrush/2018-08-17-Burns-Bottl... |
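
The long-to-wide reshaping done by the cell above can also be tried on a toy table. The sketch below is an alternative formulation using `set_index` and `unstack` rather than the explicit loop in the notebook; the sample labels and file names are invented for illustration.

```python
import pandas as pd

# Toy version of the long-format table: one row per measurement (made-up data)
toy = pd.DataFrame(
    [
        ['S1', 1.15, 5.0, 'a.ABS'],
        ['S1', 4.7, 5.0, 'b.ABS'],
        ['S1', 4.7, 16.0, 'c.ABS'],
        ['S2', 1.15, 5.0, 'd.ABS'],
        ['S2', 4.7, 5.0, 'e.ABS'],
    ],
    columns=['label', 'SDD', 'LAM', 'file_name'],
)

# One row per label, one column per (SDD, LAM) configuration
wide = (
    toy.set_index(['label', 'SDD', 'LAM'])['file_name']
       .unstack(['SDD', 'LAM'])
       .sort_index(axis=1)
)
print(wide)
```

A configuration that was not measured for a given label (here 4.7 m / 16 A for `S2`) shows up as NaN, just like the empty cells in the table above.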


## !> Choose Global Trim Params

Next, trim and shift parameters need to be chosen to be applied to all trials. Use the widget produced by the cell below to demo trim parameters and shift-factors for different systems. 
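
To make the terms concrete, here is a minimal sketch of what trimming and shifting could look like on synthetic data. It assumes that trimming drops a fixed number of points from each end of a curve and that the shift factor is the mean intensity ratio in the overlapping q-range; this is an illustration only and not necessarily how `InteractiveTrimmingPlot` or `buildShiftTable` work internally. The helpers `trim` and `shift_factor` are hypothetical.

```python
import numpy as np

def trim(q, I, n_lo, n_hi):
    """Drop n_lo points from the low-q end and n_hi points from the high-q end."""
    stop = len(q) - n_hi
    return q[n_lo:stop], I[n_lo:stop]

def shift_factor(q_ref, I_ref, q, I):
    """Factor that scales (q, I) onto (q_ref, I_ref) in their overlap.

    Assumption: mean ratio of the reference intensity (linearly
    interpolated onto q) to I over the overlapping q-range.
    """
    lo = max(q_ref.min(), q.min())
    hi = min(q_ref.max(), q.max())
    mask = (q >= lo) & (q <= hi)
    if not mask.any():
        raise ValueError('curves do not overlap in q')
    I_ref_interp = np.interp(q[mask], q_ref, I_ref)
    return float(np.mean(I_ref_interp / I[mask]))

# Synthetic power-law curves; the low-q configuration reads 8% too high
q_hi = np.logspace(-1.5, -0.5, 50)
q_lo = np.logspace(-2.5, -1.0, 50)
I_hi = q_hi ** -2.0
I_lo = 1.08 * q_lo ** -2.0

q_lo_t, I_lo_t = trim(q_lo, I_lo, n_lo=3, n_hi=5)
factor = shift_factor(q_hi, I_hi, q_lo_t, I_lo_t)
print('shift factor: {:.3f}'.format(factor))
```

Multiplying `I_lo_t` by `factor` brings the two synthetic curves into agreement in the overlap region, which is the behavior to look for when adjusting the widget.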

In [44]:
#extract only the full path information
dfNSORTPath = dfNSORT.xs('fpath',level='datatype',axis=1)

plt.figure(figsize=(6,3))
tp = InteractiveTrimmingPlot(dfNSORTPath)
tp.run_widget()


## >> Check Shift Factors

Ensure that the shift factors below make sense for all systems/configurations. Ideally, the factors should be between 0.95 and 1.05.

In [45]:
df_trim = tp.df_trim
shiftConfig = eval(tp.shift_config.value)

shifts=[]
for label,df in dfNSORTPath.iterrows():
    df_xy = []
    index = []
    for i,(config,fpath) in enumerate(df.items()): #Series.iteritems() was removed in pandas 2.0
        if pd.isna(fpath):
            continue
        index.append(config)
        sdf = readABS(fpath)[0]
        df_xy.append(sdf.set_index('q',drop=False)[['q','I','dI']])
    df_xy = pd.Series(df_xy,index=pd.MultiIndex.from_tuples(index))
    df_xy = df_xy.sort_index(axis=0)
    
    dfShift = buildShiftTable(df_xy,df_trim,shiftConfig)
    shifts.append(dfShift.values)
    
df_shift = pd.DataFrame(shifts,index=dfNSORTPath.index,columns=dfNSORTPath.columns)
df_shift = df_shift.sort_values(by=dfNSORTPath.columns.tolist(),axis=0)
df_shift

| label | SDD=1.15, LAM=5.0 | SDD=4.70, LAM=5.0 | SDD=4.70, LAM=10.0 | SDD=4.70, LAM=16.0 |
|---|---|---|---|---|
| AC5-119-CH | 1.0 | 0.92177 | 0.925081 | |
| AC5-123-CH | 1.0 | 0.923192 | 0.917696 | |
| AC5-117-42 | 1.0 | 0.925983 | 0.940972 | 1.041399 |
| AC5-117-CH | 1.0 | 0.926491 | 0.915306 | |
| AC5-118-CH | 1.0 | 0.927178 | 0.918431 | |
| AC5-116-CH | 1.0 | 0.92762 | 0.924044 | |
| AC5-122-CH | 1.0 | 0.928999 | 0.937418 | |
| AC5-116-42 | 1.0 | 0.929071 | 0.940337 | 1.033693 |
| AC5-118-42 | 1.0 | 0.929268 | 0.943529 | 1.045986 |
| AC5-120-CH | 1.0 | 0.931825 | 0.92036 | |
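
Rather than scanning the table by eye, the range check can be automated. The sketch below copies two rows from the table above into a small DataFrame (the name `demo` is only for this example) and lists every factor outside the 0.95 to 1.05 window; in the notebook the same logic could be applied to `df_shift`.

```python
import numpy as np
import pandas as pd

# Two rows copied from the shift table above (NaN = configuration not measured)
demo = pd.DataFrame(
    {
        (1.15, 5.0): [1.0, 1.0],
        (4.7, 5.0): [0.92177, 0.925983],
        (4.7, 10.0): [0.925081, 0.940972],
        (4.7, 16.0): [np.nan, 1.041399],
    },
    index=['AC5-119-CH', 'AC5-117-42'],
)

lo, hi = 0.95, 1.05
# Comparisons with NaN evaluate to False, so missing configurations are never flagged
out_of_range = (demo < lo) | (demo > hi)

flagged = [
    (label, config, demo.loc[label, config])
    for label in demo.index
    for config in demo.columns
    if out_of_range.loc[label, config]
]

for label, config, value in flagged:
    print('{} {}: {:.3f}'.format(label, config, value))
```

For these two rows, the 4.7 m / 5 A and 4.7 m / 10 A factors fall below 0.95, which suggests revisiting the trim parameters or checking the reduction for those configurations.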


## >> Write all ABS Files

In [13]:
AUTONSORT_path = ABS_path / 'AUTONSORTED'
if not AUTONSORT_path.exists():
    AUTONSORT_path.mkdir()
    
for label,sdfABS in dfNSORTPath.iterrows():
    sdfShift = df_shift.loc[label]
    fname = label.strip() + '.ABS'
    print('--> Writing {}'.format(AUTONSORT_path/fname))
    writeABS(fname,sdfABS,sdfShift,shiftConfig,df_trim,path=AUTONSORT_path,shift=True)

--> Writing ../dev/1804-BottleBrush/2018-08-17-Burns-Bottlebrushes/BOTTLEBRUSH/RAW/AUTONSORTED/AC5-116-42k dPS.ABS
--> Writing ../dev/1804-BottleBrush/2018-08-17-Burns-Bottlebrushes/BOTTLEBRUSH/RAW/AUTONSORTED/AC5-116-CHXd12.ABS
--> Writing ../dev/1804-BottleBrush/2018-08-17-Burns-Bottlebrushes/BOTTLEBRUSH/RAW/AUTONSORTED/AC5-117-42k dPS.ABS
--> Writing ../dev/1804-BottleBrush/2018-08-17-Burns-Bottlebrushes/BOTTLEBRUSH/RAW/AUTONSORTED/AC5-117-CHXd12.ABS
--> Writing ../dev/1804-BottleBrush/2018-08-17-Burns-Bottlebrushes/BOTTLEBRUSH/RAW/AUTONSORTED/AC5-118-42k dPS.ABS
--> Writing ../dev/1804-BottleBrush/2018-08-17-Burns-Bottlebrushes/BOTTLEBRUSH/RAW/AUTONSORTED/AC5-118-CHXd12.ABS
--> Writing ../dev/1804-BottleBrush/2018-08-17-Burns-Bottlebrushes/BOTTLEBRUSH/RAW/AUTONSORTED/AC5-119-42k dPS.ABS
--> Writing ../dev/1804-BottleBrush/2018-08-17-Burns-Bottlebrushes/BOTTLEBRUSH/RAW/AUTONSORTED/AC5-119-CHXd12.ABS
--> Writing ../dev/1804-BottleBrush/2018-08-17-Burns-Bottlebrushes/BOTTLEBRUSH/RAW/A

## !> Check AUTO-NSORTED ABS Files

In [26]:
AUTO_ABS = list(AUTONSORT_path.glob('*ABS'))

def plotABS(fname=AUTO_ABS[0]):
    df,config = readABS(fname)
    line.set_xdata(df['q'].values)
    line.set_ydata(df['I'].values)
    ax.relim()
    ax.autoscale_view()

fig,ax = plt.subplots()
line = plt.matplotlib.lines.Line2D([],[])
plotABS() 
ax.add_line(line)
ax.set_xscale('log')
ax.set_yscale('log')
line.set(marker='o',ms=3,ls='-',)
ipywidgets.interact(plotABS,fname=AUTO_ABS);
