# Raw Data Pickling Function
This notebook pickles the raw data published by [Chen et al.](https://janelia.figshare.com/articles/Whole-brain_light-sheet_imaging_data/7272617), converting it into pandas, NumPy, and native Python objects.

Before running, place all raw data in [../data/data_raw_from_paper](../data/data_raw_from_paper). Also rename every single-digit subject folder (e.g. subject_1) to its zero-padded two-digit form (e.g. subject_01).
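
The renaming step can be scripted rather than done by hand. The following is a minimal sketch, not part of the original notebook: the function name `pad_subject_folders` is hypothetical, and it assumes the folders are named exactly `subject_<digit>`. Because the loading loop further down reads from a nested `<subject>/<subject>` folder, the sketch also renames an inner folder of the same name when one exists.

```python
import re
from pathlib import Path


def pad_subject_folders(raw_dir):
    """Rename subject_N folders to subject_0N and return (old, new) name pairs.

    Hypothetical helper; assumes folder names of the form subject_<digit>.
    """
    renamed = []
    # sorted() materializes the listing first, so renaming while looping is safe.
    for folder in sorted(Path(raw_dir).iterdir()):
        match = re.fullmatch(r"subject_(\d)", folder.name)
        if not (folder.is_dir() and match):
            continue  # already two digits, or not a subject folder
        target = folder.with_name("subject_0" + match.group(1))
        if target.exists():
            continue  # do not overwrite an existing padded folder
        old_name = folder.name
        folder.rename(target)
        # The notebook reads <subject>/<subject>/..., so pad the inner folder too.
        inner = target / old_name
        if inner.is_dir():
            inner.rename(target / target.name)
        renamed.append((old_name, target.name))
    return renamed
```

Calling `pad_subject_folders('../data/data_raw_from_paper')` once before running the cells below would leave folders such as subject_10 untouched and return the list of renames it performed.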

In [1]:
import pandas as pd
import numpy as np
import scipy.io as scio
import h5py
from util_functions import list_subjects, pickle_file, starting_run, finished_run
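
The helpers imported from `util_functions` are not shown in this notebook. The sketch below is a set of hypothetical stand-ins, written only to make the cells easier to follow: it assumes `pickle_file` wraps the standard `pickle` module and that `starting_run` and `finished_run` print a timestamped message like the ones in the output at the end. The real implementations may differ.

```python
import pickle
from datetime import datetime


def pickle_file(path, obj):
    """Assumed behavior: serialize obj to path with the standard pickle module."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def starting_run(name):
    """Assumed behavior: print 'Starting <name> <time of day>' and return the line."""
    line = f"Starting {name} {datetime.now().time()}"
    print(line)
    return line


def finished_run(name):
    """Assumed behavior: print 'Finished <name> <time of day>' and return the line."""
    line = f"Finished {name} {datetime.now().time()}"
    print(line)
    return line
```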

### Notebook Parameters

In [2]:
output_data_dir = 'data_raw_pickled/'
data_raw_pickled_df = pd.DataFrame(columns=['subject', 'stimulus', 'rel_path', 'timepoints_count', 'timepoints'])

### Notebook Functions

In [3]:
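def save_mat_data(path, subject):
    # Load the MATLAB file and keep only the two fields used downstream.
    mat = scio.loadmat(path)
    data = mat['data'][0][0]  # unwrap the MATLAB struct stored under 'data'
    mat_dict = {
        "cell_XYZ": data[3],       # struct field at position 3
        "stim_full": data[14][0]   # first row of the struct field at position 14
    }
    pickle_file(output_data_dir + subject + '_mat_dict.pickle', mat_dict)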

In [4]:
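def save_h5_data(path, subject):
    # Use a context manager so the HDF5 file is closed once reading is done.
    with h5py.File(path, 'r') as f:
        for key in f.keys():
            save_path = output_data_dir + subject + '_' + key + '.pickle'
            values = np.array(f[key])  # read the dataset into memory once
            pickle_file(save_path, values)
            if key == 'CellRespZ':
                # Record where the response matrix was saved and its first-axis length.
                data_raw_pickled_df.loc[subject] = [subject, None, save_path, values.shape[0], None]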

In [5]:
for subject in list_subjects():
    # Raw files sit in a nested <subject>/<subject> folder.
    subject_input_path = '../data/data_raw_from_paper/' + subject + '/' + subject
    starting_run('save ' + subject)
    save_mat_data(subject_input_path + '/data_full.mat', subject)
    save_h5_data(subject_input_path + '/TimeSeries.h5', subject)
finished_run('saving raw data')
pickle_file('data_meta_pickled/data_raw_pickled_df.pickle', data_raw_pickled_df)

Starting save subject_01 20:16:38.918438
Starting save subject_02 20:16:49.482354
Starting save subject_03 20:17:00.972249
Starting save subject_04 20:17:09.642639
Starting save subject_05 20:17:20.325889
Starting save subject_06 20:17:32.167726
Starting save subject_07 20:17:45.036303
Starting save subject_10 20:17:51.388923
Starting save subject_12 20:18:07.438592
Starting save subject_13 20:18:21.811426
Starting save subject_14 20:18:36.458737
Starting save subject_15 20:18:50.093715
Starting save subject_16 20:19:01.447278
Starting save subject_17 20:19:07.983831
Starting save subject_18 20:19:24.142043
Finished saving raw data 20:19:40.951216
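
To check the results, the pickled objects can be read back into memory. This is a minimal sketch that assumes `pickle_file` writes with the standard `pickle` module; the name `load_pickle` is hypothetical and does not come from `util_functions`.

```python
import pickle


def load_pickle(path):
    """Read back one object written with the standard pickle module."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

For example, `load_pickle('data_raw_pickled/subject_01_mat_dict.pickle')` should return the dictionary with the `cell_XYZ` and `stim_full` entries saved by `save_mat_data`. Unpickling can execute arbitrary code, so only load files you produced yourself.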
