# Getting Started with CML Readers

In [1]:
import cmlreaders as cml

## Finding Files on RHINO

The PathFinder helper class can be used to locate files on RHINO. Its sole responsibility is to locate a file and return its path. In many cases, a file could be stored in more than one location. In these situations, PathFinder searches the list of possible locations in order and returns the first path where the file is found. This implicitly assumes that the list is ordered by priority, so that the preferred location comes before any fall-back location.
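The prioritized search described above can be sketched in a few lines. This is a minimal illustration of the idea, not the PathFinder implementation: the function name `find_first_existing` and the `new/` and `old/` directories are hypothetical, and a temporary directory stands in for the RHINO root.

```python
from pathlib import Path
import tempfile


def find_first_existing(root, candidates):
    """Return the first candidate (relative to root) that exists on disk.

    `candidates` is assumed to be ordered from most to least preferred.
    """
    for relative in candidates:
        path = Path(root) / relative
        if path.exists():
            return path
    raise FileNotFoundError(f"none of {candidates} exist under {root}")


with tempfile.TemporaryDirectory() as tmp:
    # Only the fall-back location exists in this toy directory tree.
    fallback = Path(tmp) / "old" / "pairs.json"
    fallback.parent.mkdir(parents=True)
    fallback.write_text("{}")

    found = find_first_existing(tmp, ["new/pairs.json", "old/pairs.json"])
    found_name = found.relative_to(tmp).as_posix()

print(found_name)
```

Because the preferred location is missing here, the search falls through to the second entry. If both files existed, the first entry would win, which is why the order of the list matters.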

In [2]:
# If not working on RHINO, specify the mount point
rhino_root = "/Volumes/RHINO/"

# Instantiate the finder object
finder = cml.PathFinder(subject="R1389J", experiment="catFR5", session="1", 
                        localization="0", montage="0", rootdir=rhino_root)

### What can you request?

The PathFinder has a few built-in properties to help you understand what data types are currently supported. Different file types require that the finder be instantiated with different fields. For example, if you are planning to request localization files, there is no need to specify an experiment, session, or montage. However, it is not a problem to specify too many fields: any extraneous ones are simply ignored if the data type does not require them. The following properties are defined:
- requestable_files: All supported data types
- localization_files: Files related to localization
- montage_files: Files associated with a specific montage
- session_files: Files that are specific to a session. These files could be processed events, Ramulator files, etc.
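One way to picture why extra fields are harmless while missing ones are not is to think of each data type as a path template. The sketch below is an assumption made for illustration only: `TEMPLATES`, `build_path`, and the template strings are hypothetical and are not taken from the CML Readers source. It relies on the fact that Python's `str.format` ignores unused keyword arguments and raises `KeyError` for missing ones.

```python
# Hypothetical path templates, invented for this example.
TEMPLATES = {
    "voxel_coordinates": "subjects/{subject}/tal/VOX_coords_mother.txt",
    "task_events": (
        "subjects/{subject}/experiments/{experiment}"
        "/sessions/{session}/task_events.json"
    ),
}


def build_path(data_type, **fields):
    """Fill in the template for data_type; unused fields are ignored."""
    return TEMPLATES[data_type].format(**fields)


# Extra fields (experiment, session) are ignored for a localization-style file.
voxel_path = build_path(
    "voxel_coordinates", subject="R1389J", experiment="catFR5", session="1"
)
print(voxel_path)

# A session-style file with missing fields fails with a KeyError.
try:
    build_path("task_events", subject="R1389J")
except KeyError as missing:
    missing_field = missing.args[0]
    print(f"missing field: {missing_field}")
```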

In [3]:
finder.requestable_files

['protocols_database',
 'voxel_coordinates',
 'prior_stim_results',
 'electrode_coordinates',
 'jacksheet',
 'area',
 'electrode_categories',
 'good_leads',
 'leads',
 'classifier_excluded_leads',
 'localization',
 'pairs',
 'contacts',
 'session_summary',
 'classifier_summary',
 'math_summary',
 'target_selection_table',
 'trained_classifier',
 'all_events',
 'task_events',
 'math_events',
 'ps4_events',
 'sources',
 'experiment_log',
 'session_log',
 'ramulator_session_folder',
 'event_log',
 'experiment_config',
 'raw_eeg',
 'odin_config',
 'used_classifier',
 'excluded_pairs',
 'all_pairs']

In [4]:
finder.localization_files

['voxel_coordinates',
 'prior_stim_results',
 'electrode_coordinates',
 'jacksheet',
 'good_leads',
 'leads',
 'area',
 'classifier_excluded_leads',
 'localization',
 'electrode_categories']

In [5]:
finder.montage_files

['pairs', 'contacts']

In [6]:
finder.session_files

['session_summary',
 'classifier_summary',
 'math_summary',
 'used_classifier',
 'excluded_pairs',
 'all_pairs',
 'experiment_log',
 'session_log',
 'event_log',
 'experiment_config',
 'raw_eeg',
 'odin_config',
 'all_events',
 'task_events',
 'math_events',
 'ps4_events']

### Finding File Paths

In [7]:
# Find some example files
example_data_types = ['pairs', 'task_events', 'voxel_coordinates']
for data_type in example_data_types:
    print(finder.find(data_type=data_type))

/Volumes/RHINO/protocols/r1/subjects/R1389J/localizations/0/montages/0/neuroradiology/current_processed/pairs.json
/Volumes/RHINO/protocols/r1/subjects/R1389J/experiments/catFR5/sessions/1/behavioral/current_processed/task_events.json
/Volumes/RHINO/data10/RAM/subjects/R1389J/tal/VOX_coords_mother.txt


## Loading Data

In most cases, the end goal is to load the data into memory rather than just locate the file. For this, CML Readers provides a handy class that unifies the API for loading data. By default, the location is determined automatically from the data type using the PathFinder class. However, a custom path can be given with the file_path keyword. Each data type can be loaded into one of three common data structures for performing analyses:
1. pandas DataFrame
2. Python dictionary
3. NumPy recarray

Although recarrays are supported, users are **highly** encouraged to switch to pandas dataframes, as they provide a much richer and easier-to-use API. In fact, pandas was created specifically to replace recarrays. If you can find a data-analysis-related use case that a recarray supports and pandas does not, I will personally give you $1.
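To see how the three structures relate, here is a small pandas-only sketch. It does not use CML Readers at all. The two rows are a trimmed-down copy of the electrode coordinates shown later in this notebook, and the conversions are standard pandas calls (`DataFrame.to_dict` and `DataFrame.to_records`), which may or may not be what the reader uses internally.

```python
import pandas as pd

# Two rows with a subset of the electrode coordinate columns.
rows = [
    {"contact_name": "oA1", "contact_type": "D", "x": 0.867173},
    {"contact_name": "oA8", "contact_type": "D", "x": 1.265359},
]

df = pd.DataFrame(rows)

# A list of one dictionary per row.
records = df.to_dict(orient="records")

# A NumPy recarray; the data frame index becomes a field named "index".
recarray = df.to_records()

# The same lookup in both APIs: the x coordinate of contact oA8.
x_from_df = df.loc[df["contact_name"] == "oA8", "x"].iloc[0]
x_from_rec = recarray[recarray["contact_name"] == "oA8"]["x"][0]

print(recarray.dtype.names)
print(x_from_df, x_from_rec)
```

Starting from a data frame, the other two representations are one call away, so little is lost by treating the data frame as the default.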

In [8]:
reader = cml.CMLReader(subject="R1389J", experiment="catFR5", session="1", 
                       localization="0", montage="0", rootdir=rhino_root)

In [9]:
reader.load("electrode_coordinates")

In [10]:
# Data Frame
electrode_coord_df = reader.as_dataframe()
electrode_coord_df.head()

Unnamed: 0,contact_name,contact_type,x,y,z,atlas,orient_to
0,oA1,D,0.867173,-0.3088,-0.077759,monopolar_orig,oA2
1,oA8,D,1.265359,-0.102247,-0.048478,monopolar_orig,
2,oA2,D,0.913253,-0.278048,-0.083519,monopolar_orig,oA3
3,oA3,D,0.984687,-0.248392,-0.071328,monopolar_orig,oA4
4,oA4,D,1.048341,-0.212167,-0.074716,monopolar_orig,oA5


In [11]:
# Dictionary
electrode_coord_dict = reader.as_dict()
electrode_coord_dict[:5]

[{'contact_name': 'oA1',
  'contact_type': 'D',
  'x': 0.867173,
  'y': -0.3088,
  'z': -0.07775927156960215,
  'atlas': 'monopolar_orig',
  'orient_to': 'oA2'},
 {'contact_name': 'oA8',
  'contact_type': 'D',
  'x': 1.2653590000000001,
  'y': -0.1022474,
  'z': -0.048477514623626164,
  'atlas': 'monopolar_orig',
  'orient_to': nan},
 {'contact_name': 'oA2',
  'contact_type': 'D',
  'x': 0.9132530000000001,
  'y': -0.278048,
  'z': -0.08351888482911221,
  'atlas': 'monopolar_orig',
  'orient_to': 'oA3'},
 {'contact_name': 'oA3',
  'contact_type': 'D',
  'x': 0.9846870000000001,
  'y': -0.248392,
  'z': -0.07132760474565615,
  'atlas': 'monopolar_orig',
  'orient_to': 'oA4'},
 {'contact_name': 'oA4',
  'contact_type': 'D',
  'x': 1.0483410000000002,
  'y': -0.2121674,
  'z': -0.07471577238168621,
  'atlas': 'monopolar_orig',
  'orient_to': 'oA5'}]

In [12]:
# DO NOT USE: Recarray
electrode_coord_recarray = reader.as_recarray()
electrode_coord_recarray[:5]

rec.array([(0, 'oA1', 'D', 0.867173, -0.3088   , -0.07775927, 'monopolar_orig', 'oA2'),
           (1, 'oA8', 'D', 1.265359, -0.1022474, -0.04847751, 'monopolar_orig', nan),
           (2, 'oA2', 'D', 0.913253, -0.278048 , -0.08351888, 'monopolar_orig', 'oA3'),
           (3, 'oA3', 'D', 0.984687, -0.248392 , -0.0713276 , 'monopolar_orig', 'oA4'),
           (4, 'oA4', 'D', 1.048341, -0.2121674, -0.07471577, 'monopolar_orig', 'oA5')],
          dtype=[('index', '<i8'), ('contact_name', 'O'), ('contact_type', 'O'), ('x', '<f8'), ('y', '<f8'), ('z', '<f8'), ('atlas', 'O'), ('orient_to', 'O')])

## Saving Data

CML Readers can save data in up to three file formats:
1. CSV
2. JSON
3. HDF5

However, depending on the data type, not all output formats may be supported. In particular, HDF5 output is supported for only a small number of data types. Data that has been saved using one of the built-in methods can always be reloaded using its corresponding reader. Since locations are automatically determined based on the data type, be sure to specify the file_path parameter when loading from a custom location.
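The `Unnamed: 0` column that appears in the data frame output in this notebook is consistent with a CSV whose index was written to disk and then read back as an ordinary column. Whether CML Readers writes its CSV files this way is an assumption. The sketch below reproduces the effect with plain pandas and an in-memory buffer, and shows how `index_col=0` avoids it.

```python
import io

import pandas as pd

df = pd.DataFrame({"contact_name": ["oA1", "oA8"], "x": [0.867173, 1.265359]})

# Write the frame, index included, to an in-memory CSV.
buffer = io.StringIO()
df.to_csv(buffer)

# Reading it back naively turns the saved index into an "Unnamed: 0" column.
buffer.seek(0)
naive = pd.read_csv(buffer)

# Telling pandas that the first column is the index restores the original layout.
buffer.seek(0)
clean = pd.read_csv(buffer, index_col=0)

print(list(naive.columns))
print(list(clean.columns))
```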

In [13]:
reader.to_csv("electrode_coordinates_test.csv")

In [14]:
reader.load(data_type="electrode_coordinates", file_path="./electrode_coordinates_test.csv")

In [15]:
electrode_coord_df = reader.as_dataframe()
electrode_coord_df.head()

Unnamed: 0,contact_name,contact_type,x,y,z,atlas,orient_to
0,oA1,D,0.867173,-0.3088,-0.077759,monopolar_orig,oA2
1,oA8,D,1.265359,-0.102247,-0.048478,monopolar_orig,
2,oA2,D,0.913253,-0.278048,-0.083519,monopolar_orig,oA3
3,oA3,D,0.984687,-0.248392,-0.071328,monopolar_orig,oA4
4,oA4,D,1.048341,-0.212167,-0.074716,monopolar_orig,oA5
