# <span style="color:darkgreen">Tutorial: adding "cleaned" data to the database</span> 

This tutorial shows how to add "cleaned" data to the database. 

For example, existing spike trains may still contain duplicate recordings of the same spike. After running the spike trains through a cleaning algorithm, you may want to add the cleaned data to the database while retaining the original "raw" spike trains. This tutorial walks through an example of how to do so within our framework. 

In [2]:
import os
import numpy as np
import random 
from scipy import signal

from database.db_setup import *

Please enter DataJoint username:  root
Please enter DataJoint password:  ···············


Connecting root@localhost:3306


### <span style="color:darkblue"> Create "Cleaned" data </span> 

Simulate a data-cleaning algorithm by simply downsampling the pre-generated mock data. 

In [14]:
# Set the algorithm name and annotator ID used in the save names
algo_name = "spkclean"
annotator = "p1" # taken from config file

# Load patient ids and session numbers
patient_ids, session_nrs = MovieSession.fetch("patient_id", "session_nr")

# Iterate through the sessions and make a directory for saving the cleaned data later
for index_session in range(0, len(patient_ids)):
    path_binaries = '{}/unit_level_data_cleaning/'.format(config.PATH_TO_DATA)
    folder = path_binaries + str(patient_ids[index_session]) + '/session_' + str(
        session_nrs[index_session]) + "/"
    if not os.path.exists(folder):
        os.makedirs(folder)
    
    # Iterate over each unit for a given patient, apply the "cleaning" algorithm, and save to above directory
    patient_units = get_unit_ids_for_patient(patient_ids[index_session])
    for unit in patient_units: 
        unit_spikes = get_spiking_activity(patient_ids[index_session], session_nrs[index_session], unit)
        
        # Run "cleaning" algo on the spikes
        # Number of spikes to remove ("downsample"); assumes more than 20 spikes per unit.
        # random.randint is inclusive, so cap at len - 1 to keep at least one spike.
        n_remove = random.randint(20, len(unit_spikes) - 1)
        n_samples = len(unit_spikes) - n_remove
        unit_clean = signal.resample(unit_spikes, n_samples)
        
        # Set the save name 
        save_path = os.path.join(folder, "{}_unit{}_{}.npy".format(algo_name, unit, annotator))
        np.save(save_path, unit_clean)
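
Note that `signal.resample` uses Fourier-based resampling, so the values in `unit_clean` are interpolated rather than a subset of the original spikes. If you would prefer the mock "cleaned" train to consist of genuine spike times, you can draw a random subset instead. The following is only a minimal sketch and not part of the tutorial code: the helper name `subsample_spikes`, the fixed seed, and the mock spike values are assumptions for illustration, and it assumes the spikes are stored as a one-dimensional array of spike times.

```python
import numpy as np


def subsample_spikes(spike_times, n_keep, seed=None):
    """Return a random subset of `n_keep` spike times in their original order.

    Hypothetical helper for illustration; not part of the database framework.
    """
    spike_times = np.asarray(spike_times)
    if not 0 < n_keep <= len(spike_times):
        raise ValueError("n_keep must be between 1 and the number of spikes")
    rng = np.random.default_rng(seed)
    # Draw indices without replacement so that no spike is picked twice
    keep_idx = rng.choice(len(spike_times), size=n_keep, replace=False)
    # Sort the indices to preserve the original temporal order
    return spike_times[np.sort(keep_idx)]


# Mock spike train: 100 increasing spike times (made-up values)
mock_spikes = np.cumsum(np.full(100, 12.5))
mock_clean = subsample_spikes(mock_spikes, n_keep=60, seed=0)
```

Every value in `mock_clean` is one of the original spike times, which would not generally be the case after Fourier resampling.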

### <span style="color:darkblue"> Add cleaned data to the database </span> 

In `database/db_setup.py`, there is a class called `UnitLevelDataCleaning`, which sets up the database table of the same name. 

To add the cleaned data to this table, just populate the table. The table expects the data files to follow two conventions:
* filename saved as `<algorithm name or cleaning process>_unit<number>_<annotator id>.npy`
> from this, the table extracts the relevant information and uses it to fill in the table entries 
* files saved in `<PATH_TO_DATA>/unit_level_data_cleaning/<patient_id>/session_<number>/*`
> where `PATH_TO_DATA` is set in the `database/config.py` file 
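
To make the two conventions concrete, here is a minimal sketch of how the table fields could be recovered from a file path. It is an illustration only: the actual parsing in `database/db_setup.py` may differ, the helper `parse_cleaning_path` and the regular expression are hypothetical, and the sketch assumes that patient IDs and session numbers are integers and that annotator IDs contain no underscores.

```python
import re
from pathlib import Path

# Pattern for "<name>_unit<number>_<annotator id>.npy" (assumed, for illustration)
FILENAME_PATTERN = re.compile(
    r"^(?P<name>.+)_unit(?P<unit_id>\d+)_(?P<annotator_id>[^_]+)\.npy$"
)


def parse_cleaning_path(path):
    """Extract table fields from a path that follows the conventions above.

    Hypothetical helper; not the function used by the database framework.
    """
    path = Path(path)
    match = FILENAME_PATTERN.match(path.name)
    if match is None:
        raise ValueError(f"Unexpected filename: {path.name}")
    session_dir = path.parent.name  # e.g. "session_1"
    if not session_dir.startswith("session_"):
        raise ValueError(f"Unexpected session folder: {session_dir}")
    return {
        "patient_id": int(path.parent.parent.name),
        "session_nr": int(session_dir[len("session_"):]),
        "unit_id": int(match["unit_id"]),
        "annotator_id": match["annotator_id"],
        "name": match["name"],
    }


# Example path; the leading "data" folder is a placeholder for PATH_TO_DATA
fields = parse_cleaning_path(
    "data/unit_level_data_cleaning/1/session_1/spkclean_unit3_p1.npy"
)
```

Running a check like this on the saved files before populating the table can help catch misnamed files early.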

In [None]:
UnitLevelDataCleaning.populate()

### <span style="color:darkblue"> Check that the table was correctly filled by printing it </span> 


In [3]:
UnitLevelDataCleaning()

| patient_id (patient ID) | unit_id (unique ID for unit, for respective patient) | session_nr (session ID) | annotator_id (unique ID for each annotator) | name (unique name for data cleaning) | data (actual data) | description (description of data cleaning) |
|---|---|---|---|---|---|---|
| 1 | 0 | 1 | p1 | spkclean | =BLOB= | |
| 1 | 1 | 1 | p1 | spkclean | =BLOB= | |
| 1 | 2 | 1 | p1 | spkclean | =BLOB= | |
| 1 | 3 | 1 | p1 | spkclean | =BLOB= | |
| 1 | 4 | 1 | p1 | spkclean | =BLOB= | |
| 1 | 5 | 1 | p1 | spkclean | =BLOB= | |
| 1 | 6 | 1 | p1 | spkclean | =BLOB= | |
| 1 | 7 | 1 | p1 | spkclean | =BLOB= | |
| 1 | 8 | 1 | p1 | spkclean | =BLOB= | |
| 1 | 9 | 1 | p1 | spkclean | =BLOB= | |
