# ETL Workspace

The purpose of this notebook is to document and explore the preliminary ETL (or possibly ELT) processes needed for efficient storage, retrieval, and analysis of Degu Lab ephys data.

In [1]:
import os

from degpy.session import Session
from degpy.scraper import Scraper

## Source Data
This is the data in its original file structure from the Degu Lab. It is organized as follows:
```
data
    080602
        080602_ps01_160614
            2016-06-14_09-39-10
                LFP1.ncs
                LFP2.ncs
                E1.ncs
```

In [2]:
SOURCE_PATH = '/Volumes/Backup Plus/data'
data = Scraper.crawl_files(SOURCE_PATH)  # returns a tuple: (file paths, total data size)

In [10]:
print('Number of datafiles: {}'.format(len(data[0])))
print('Total size of files: {:.2f}GB'.format(data[1]))

Number of datafiles: 2418
Total size of files: 94.41GB
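
For reference, a crawl like this can be sketched with `os.walk`. This is not the actual `Scraper.crawl_files` implementation. The function below is a hypothetical stand-in that assumes only `.ncs` files are counted and that sizes are reported in decimal gigabytes (10^9 bytes); the real function may differ on both points.

```python
import os


def crawl_files(root, extension=".ncs"):
    """Hypothetical stand-in for Scraper.crawl_files.

    Walks `root` recursively and returns (sorted file paths, total size in GB).
    Assumes decimal gigabytes (1e9 bytes); the real scraper may use 2**30.
    """
    paths = []
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Keep only files with the requested extension (case-insensitive).
            if name.lower().endswith(extension):
                full_path = os.path.join(dirpath, name)
                paths.append(full_path)
                total_bytes += os.path.getsize(full_path)
    return sorted(paths), total_bytes / 1e9
```

The return value has the same shape as the tuple used above, so `len(result[0])` gives the file count and `result[1]` the total size.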


The files above were all migrated into a single directory for ease of extraction, using the `Scraper.move_files()` function. The next step is to validate that all files were migrated.

In [11]:
DEST_PATH = 'ephys'

print('Number of datafiles in dest: {}'.format(len(os.listdir(DEST_PATH))))

Number of datafiles in dest: 2418


In [12]:
os.listdir(DEST_PATH)

['270101_270101_ps10_151015_2015-10-15_09-15-40_LFP4.ncs',
 '210402_210402_ps26_160907_2016-09-07_17-00-31_LFP5.ncs',
 '080602_080602_ps01_160614_2016-06-14_09-39-10_LFP6.ncs',
 '270102_270102_ps20_160212_2016-02-12_09-23-39_LFP2.ncs',
 '080602_080602_ps04_160618_2016-06-18_16-10-33_LFP8.ncs',
 '210402_210402_ps02_160806_2016-08-06_15-13-23_LFP10.ncs',
 '080602_080602_ps06_160621_2016-06-21_13-24-49_LFP11.ncs',
 '080602_080602_ps01_160614_2016-06-15_09-57-59_LFP8.ncs',
 '270102_270102_ps14_160206_2016-02-06_11-32-36_Audio.ncs',
 '210402_210402_ps32_160916_2016-09-16_13-01-55_LFP10.ncs',
 '210402_210402_ps08_160812_2016-08-12_12-48-18_E1.ncs',
 '080602_080602_ps09_160624_2016-06-24_12-14-13_LFP9.ncs',
 '270102_270102_ps10_160202_2016-02-02_12-03-16_CRB.ncs',
 '210402_210402_ps22_160901_2016-09-01_09-38-18_LFP8.ncs',
 '210402_210402_ps05_160809_2016-08-09_09-37-25_LFP10.ncs',
 '080602_080602_ps02_160616_2016-06-17_13-33-35_Audio.ncs',
 '210402_210402_ps30_160913_2016-09-13_10-59-21_LFP8.

Comparing file counts isn't a very robust means of validating the migration, but it is fine for this exploration phase. The filenames can't be compared directly because I renamed the files to the following convention:
    
    '270101_270101_ps10_151015_2015-10-15_09-15-40_LFP4.ncs'
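
Names in this convention can be split back into their parts. The sketch below is an illustration, not part of `degpy`: the field names (`animal`, `session`, `recording`, `channel`) are my own labels for the directory levels shown under Source Data, and it assumes channel names never contain an underscore.

```python
import os
from typing import NamedTuple


class FlatName(NamedTuple):
    """Hypothetical container for the parts of a flattened filename."""
    animal: str     # top-level directory, e.g. '270101'
    session: str    # session directory, e.g. '270101_ps10_151015'
    recording: str  # timestamp directory, e.g. '2015-10-15_09-15-40'
    channel: str    # file stem, e.g. 'LFP4'


def parse_flat_name(filename):
    """Split a flattened name into its four directory-level parts."""
    stem, _ext = os.path.splitext(filename)
    parts = stem.split("_")
    # Expected pieces: animal, 3 session pieces, date, time, channel.
    if len(parts) != 7:
        raise ValueError("unexpected filename layout: {}".format(filename))
    return FlatName(
        animal=parts[0],
        session="_".join(parts[1:4]),
        recording="_".join(parts[4:6]),
        channel=parts[6],
    )
```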

For the production migration, I will need to write something more robust.
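
One possible shape for a more robust check is a manifest: record the expected destination name and byte size of every source file before moving anything, then compare the destination directory against that record. The sketch below assumes the renaming rule is "join the path components below the source root with underscores", which is inferred from the listing above rather than taken from `Scraper.move_files()`. All three function names are hypothetical.

```python
import os


def flatten_name(source_path, source_root):
    """Join the path components below source_root with underscores (assumed rule)."""
    relative = os.path.relpath(source_path, source_root)
    return "_".join(relative.split(os.sep))


def build_manifest(source_paths, source_root):
    """Map each expected destination name to its size in bytes. Run before moving."""
    return {flatten_name(p, source_root): os.path.getsize(p) for p in source_paths}


def check_manifest(manifest, dest_dir):
    """Report names that are missing from, unexpected in, or resized in dest_dir."""
    actual = {
        name: os.path.getsize(os.path.join(dest_dir, name))
        for name in os.listdir(dest_dir)
    }
    missing = sorted(manifest.keys() - actual.keys())
    unexpected = sorted(actual.keys() - manifest.keys())
    resized = sorted(
        name for name in manifest.keys() & actual.keys()
        if manifest[name] != actual[name]
    )
    return {"missing": missing, "unexpected": unexpected, "resized": resized}
```

With the names used in this notebook, that would be `build_manifest(data[0], SOURCE_PATH)` before the move and `check_manifest(manifest, DEST_PATH)` afterwards. Comparing sizes catches truncated copies but not corrupted bytes, so a checksum per file would be a stricter option.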

## Testing Session module 

In [16]:
dest_files = [os.path.join(DEST_PATH, x) for x in os.listdir(DEST_PATH)]

In [19]:
from degpy import neuralynx_io

test_ncs = neuralynx_io.load_ncs(dest_files[0])

In [20]:
test_ncs

{'file_path': '/Users/maxcopeland/PycharmProjects/degpy/ephys/270101_270101_ps10_151015_2015-10-15_09-15-40_LFP4.ncs',
 'raw_header': b'######## Neuralynx Data File Header\r\n## File Name C:\\CheetahData\\270101_ps10_151015\\2015-10-15_09-15-40\\LFP4.ncs\r\n## Time Opened (m/d/y): 10/15/2015  (h:m:s.ms) 9:15:40.726\r\n## Time Closed (m/d/y): 10/15/2015  (h:m:s.ms) 11:10:8.441\r\n\r\n-FileType CSC\r\n-FileVersion 3.3.0\r\n-RecordSize 1044\r\n\r\n-CheetahRev 5.6.3 \r\n\r\n-HardwareSubSystemName AcqSystem1\r\n-HardwareSubSystemType DigitalLynx\r\n-SamplingFrequency 2034.75\r\n-ADMaxValue 32767\r\n-ADBitVolts 0.000000061037020770982053\r\n\r\n-AcqEntName LFP4\r\n-NumADChannels 1\r\n-ADChannel 1\r\n-InputRange 2000\r\n-InputInverted True\r\n\r\n-DSPLowCutFilterEnabled True\r\n-DspLowCutFrequency 0.3\r\n-DspLowCutNumTaps 0\r\n-DspLowCutFilterType DCO\r\n-DSPHighCutFilterEnabled True\r\n-DspHighCutFrequency 500\r\n-DspHighCutNumTaps 64\r\n-DspHighCutFilterType FIR\r\n-DspDelayCompensation Dis
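
The `raw_header` bytes shown above follow a simple layout: comment lines start with `#`, and each setting is a line of the form `-Key Value`. A minimal sketch for turning those lines into a dictionary is below. The function name is hypothetical, values are left as strings, and `neuralynx_io` may already expose parsed header fields, in which case this is only useful for understanding the format.

```python
def parse_header_fields(raw_header):
    """Collect '-Key Value' lines from a Neuralynx header into a dict of strings."""
    # latin-1 decodes any byte value, so odd characters cannot raise here.
    text = raw_header.decode("latin-1")
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, '#' comment lines, and anything that is not a setting.
        if not line.startswith("-"):
            continue
        key, _sep, value = line[1:].partition(" ")
        fields[key] = value.strip()
    return fields
```

For example, `float(parse_header_fields(test_ncs['raw_header'])['SamplingFrequency'])` would give the sampling rate as a number, assuming the header contains that line as it does in the output above.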