# Updating models using data from Citrination

To update the models using data from https://slac.citrination.com, **the user must have**:
* access to the SLAC Citrination account
* a Citrination API key

Assume that we have received a new dataset and now want to update our models with it. Training "from scratch" takes a significant amount of time (especially for the regression models), so we will use train_classifiers_partial() and train_regressors_partial() instead.

In [1]:
from time import time
import warnings

from citrination_client import CitrinationClient
from saxskit.saxskit.saxs_models import (
    get_data_from_Citrination,
    train_classifiers_partial,
    train_regressors_partial,
)

# suppress library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

In [2]:
# read the API key from the first line of a local text file
with open("citrination_api_key_ssrl.txt", "r") as key_file:
    a_key = key_file.readline().strip()

# connect to the SLAC Citrination site
cl = CitrinationClient(site="https://slac.citrination.com", api_key=a_key)
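The key-loading step can be wrapped in a small helper that fails loudly when the key file is empty, instead of passing an empty string to CitrinationClient. This is a minimal sketch; load_api_key is a hypothetical helper, not part of saxskit or citrination_client, and it assumes the key sits on the first line of the file, as in the cell above.

```python
def load_api_key(path):
    """Return the API key stored on the first line of a text file.

    Hypothetical helper: assumes the key is on the first line, as in the
    notebook cell above. Raises ValueError if that line is empty.
    """
    with open(path, "r") as key_file:
        key = key_file.readline().strip()  # drop the newline and stray spaces
    if not key:
        raise ValueError("no API key found in {}".format(path))
    return key
```

With it, the connection cell becomes `cl = CitrinationClient(site="https://slac.citrination.com", api_key=load_api_key("citrination_api_key_ssrl.txt"))`.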

SAXSKIT provides two options for training:
* training from scratch
* updating existing models using additional data

"Training from scratch" is useful for the initial training or when a lot of new data is available (more than 30% new data). In this case it is recommended to set hyper_parameters_search = True.

Updating existing models is recommended when only a modest amount of new data is available (less than 30%). Updating takes significantly less time than training from scratch.
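The 30% rule of thumb above can be expressed as a small decision function. This is a sketch under an assumption: the text does not say what the 30% is measured against, so here it is read as the share of new samples in the combined data set. The function name and its return values are hypothetical and not part of saxskit.

```python
def choose_training_mode(n_new, n_existing, threshold=0.30):
    """Suggest "from_scratch" or "partial" training (hypothetical helper).

    Assumption: the 30% threshold is the fraction of new samples in the
    combined data set (new + existing).
    """
    if n_new < 0 or n_existing < 0:
        raise ValueError("sample counts must be non-negative")
    total = n_new + n_existing
    if total == 0:
        raise ValueError("no data to train on")
    new_fraction = n_new / total
    # with no existing data the fraction is 1.0, i.e. initial training from scratch
    return "from_scratch" if new_fraction > threshold else "partial"
```

"from_scratch" corresponds to train_classifiers / train_regressors with hyper_parameters_search = True, and "partial" to train_classifiers_partial / train_regressors_partial. If get_data_from_Citrination returns a pandas DataFrame (an assumption not confirmed in this notebook), the counts could be taken with len().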

#### Step 1. Get new data from Citrination

In [4]:
# dataset_id_list is a list of Citrination dataset ids; here only the new dataset 16
new_data = get_data_from_Citrination(client=cl, dataset_id_list=[16])

#### Step 2 (optional). Get all available data from Citrination

If we want to update not only the models but also the accuracy records, we need to specify the data used to calculate the accuracy. It is recommended to use all available data, including the data that was used for the initial training.

In [3]:
# datasets 1 and 15 plus the new dataset 16
all_data = get_data_from_Citrination(client=cl, dataset_id_list=[1, 15, 16])
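The id list for the accuracy data should always include both the earlier datasets and the new one. A minimal sketch of that bookkeeping, using a hypothetical helper and assuming that datasets 1 and 15 are the ones available before dataset 16 arrived:

```python
def merge_dataset_ids(existing_ids, new_ids):
    """Combine dataset ids, dropping duplicates and keeping first-seen order.

    Hypothetical helper for building the dataset_id_list used for all_data.
    """
    merged = []
    for dataset_id in list(existing_ids) + list(new_ids):
        if dataset_id not in merged:
            merged.append(dataset_id)
    return merged


all_ids = merge_dataset_ids([1, 15], [16])  # -> [1, 15, 16]
```

The result can be passed as dataset_id_list=all_ids to get_data_from_Citrination.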

#### Step 3. Update classifiers

In [6]:
t0 = time()
# update the classifiers with the new data; accuracy is recomputed on all_data
train_classifiers_partial(new_data, yaml_filename=None, all_training_data=all_data)
print("Updating took about", (time()-t0)/60, " minutes.")

Updating took about 0.12016316254933675  minutes.
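The `t0 = time()` ... `(time()-t0)/60` pattern in Steps 3 and 4 can be wrapped in a context manager so it is not repeated. This is a sketch with a hypothetical helper name; it uses `perf_counter` (a monotonic clock) and imports it by name so it does not shadow the `time` function imported in the first cell.

```python
from contextlib import contextmanager
from time import perf_counter


@contextmanager
def timed(label):
    """Time the enclosed block and report the duration in minutes.

    Hypothetical helper. Yields a dict whose "minutes" entry is filled in
    when the block exits, even if the block raises.
    """
    result = {}
    start = perf_counter()
    try:
        yield result
    finally:
        result["minutes"] = (perf_counter() - start) / 60
        print("{} took about {:.2f} minutes.".format(label, result["minutes"]))
```

Usage: `with timed("Updating classifiers"): train_classifiers_partial(new_data, yaml_filename=None, all_training_data=all_data)`.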


In [7]:
# the accuracy file holds one score per model as a single line of text
with open("saxskit/saxskit/modeling_data/accuracy.txt", "r") as g:
    accuracy = g.readline()
accuracy

"{'diffraction_peaks': 0.98447795672704952, 'guinier_porod': 0.74100823366667412, 'spherical_normal': 0.9840544296725201, 'unidentified': 0.98859691602846556}"

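The value read from accuracy.txt is a string, not a dict. Assuming the file always holds a single Python dict literal (as the output above suggests), `ast.literal_eval` turns it into a dict without the risks of `eval`. The helper name is hypothetical; the sample string is the output shown above.

```python
import ast


def parse_scores(text):
    """Convert the one-line dict stored in an accuracy file into a dict."""
    scores = ast.literal_eval(text.strip())  # evaluates literals only
    if not isinstance(scores, dict):
        raise ValueError("expected a dict literal, got {!r}".format(type(scores)))
    return scores


accuracy_text = (
    "{'diffraction_peaks': 0.98447795672704952, 'guinier_porod': 0.74100823366667412, "
    "'spherical_normal': 0.9840544296725201, 'unidentified': 0.98859691602846556}"
)
scores = parse_scores(accuracy_text)
weakest = min(scores, key=scores.get)  # classifier with the lowest accuracy
```

Here `weakest` is `'guinier_porod'`; in the notebook, `parse_scores(accuracy)` gives the same dict.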
#### Step 4. Update regression models

In [8]:
t0 = time()
# update the regression models with the new data; accuracy is recomputed on all_data
train_regressors_partial(new_data, yaml_filename=None, all_training_data=all_data)
print("Updating took about", (time()-t0)/60, " minutes.")

Updating took about 0.683045748869578  minutes.


In [9]:
with open("saxskit/saxskit/modeling_data/accuracy_regression.txt", "r") as g:
    accuracy = g.readline()    
accuracy

"{'r0_sphere': 0.26510273493240721, 'rg_gp': 1.9332349538350049, 'sigma_sphere': 0.55638931402969005}"

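Retraining in Step 5 overwrites the accuracy files (the values read back afterwards differ from the ones above), so it may be worth saving the current scores first. A minimal sketch with a hypothetical helper, assuming each accuracy file holds one Python dict literal on its first line:

```python
import ast
import json
import os


def snapshot_scores(paths, out_path):
    """Save the dicts stored in several accuracy files into one JSON file.

    Hypothetical helper; returns the snapshot as a dict keyed by file name.
    """
    snapshot = {}
    for path in paths:
        with open(path, "r") as f:
            snapshot[os.path.basename(path)] = ast.literal_eval(f.readline().strip())
    with open(out_path, "w") as f:
        json.dump(snapshot, f, indent=2, sort_keys=True)
    return snapshot


# Example call (file paths from this notebook, output name hypothetical):
# snapshot_scores(
#     ["saxskit/saxskit/modeling_data/accuracy.txt",
#      "saxskit/saxskit/modeling_data/accuracy_regression.txt"],
#     "scores_before_retraining.json",
# )
```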
#### Step 5. Compare accuracy and retrain the models if needed

If the accuracy after updating is worse than the accuracy before updating, it is recommended to retrain the models from scratch using all available data:
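The comparison can be automated by listing the models whose score got worse. This is a sketch with a hypothetical helper. Classifier scores are accuracies (higher is better); treating the regression scores as errors (lower is better) is an assumption based on the numbers in this notebook, not something the notebook states.

```python
def compare_scores(before, after, lower_is_better=False):
    """Return the sorted names of models whose score got worse.

    Hypothetical helper. Use lower_is_better=True for error-type metrics
    (assumed here for the regression scores).
    """
    worse = []
    for name in sorted(set(before) & set(after)):  # only models present in both runs
        if lower_is_better:
            got_worse = after[name] > before[name]
        else:
            got_worse = after[name] < before[name]
        if got_worse:
            worse.append(name)
    return worse
```

Here `before` would be the scores recorded before updating (for example, parsed from the accuracy files before running Steps 3 and 4) and `after` the scores read back once the update finishes. A non-empty result is the signal to retrain from scratch.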

In [4]:
from saxskit.saxskit.saxs_models import train_classifiers, train_regressors

# retrain the classifiers from scratch on all available data, with hyperparameter search
train_classifiers(all_data, hyper_parameters_search=True)

In [5]:
with open("saxskit/saxskit/modeling_data/accuracy.txt", "r") as g:
    accuracy = g.readline()    
accuracy

"{'unidentified': 0.98662964095680705, 'spherical_normal': 0.98921031716337415, 'guinier_porod': 0.8234882586415645, 'diffraction_peaks': 0.97733857620106368}"

In [6]:
# retrain the regression models from scratch on all available data, with hyperparameter search
train_regressors(all_data, hyper_parameters_search=True)

In [7]:
with open("saxskit/saxskit/modeling_data/accuracy_regression.txt", "r") as g:
    accuracy = g.readline()    
accuracy

"{'r0_sphere': 0.14301895118612798, 'sigma_sphere': 0.6480025842788546, 'rg_gp': 0.23343661949244066}"