# Data Retrieval

To retrieve the VNC dataset, instantiate a neuprint Client using the API key issued to you when you sign up with a Google account.

As of June 7, 2023, the MANC (male VNC) dataset, manc:v1.0, is publicly available at https://neuprint.janelia.org/. To use this dataset, create a client as follows:

In [None]:
from neuprint import Client

MANC_client = Client('neuprint.janelia.org', dataset='manc:v1.0', token='Your-API-Key')

The client acts as a bridge between a specific dataset and the API, so a separate client object can be created for the hemibrain dataset in the same way:

In [None]:
hemibrain_client = Client('neuprint.janelia.org', dataset='hemibrain:v1.2.1', token='Your-API-Key')
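Hard-coding the API key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the key can instead be read from an environment variable. The helper load_token below is hypothetical (it is not part of neuprint or of this toolbox), and the variable name NEUPRINT_APPLICATION_CREDENTIALS follows the convention used by neuprint-python; treat it as an assumption and adjust it if your setup differs.

```python
import os

# Assumed variable name, following the neuprint-python convention.
TOKEN_ENV_VAR = "NEUPRINT_APPLICATION_CREDENTIALS"


def load_token(env_var: str = TOKEN_ENV_VAR) -> str:
    """Return the API key stored in an environment variable (hypothetical helper)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"No API key found; set the {env_var} environment variable first."
        )
    return token


# Usage (requires neuprint-python and a valid key):
# from neuprint import Client
# MANC_client = Client('neuprint.janelia.org', dataset='manc:v1.0', token=load_token())
```

The same helper can be reused for the hemibrain client, since both clients use the same key.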

# Data Importing/Storage
The wrapper class ConnDF, defined in data_processing.py, provides the core data import, storage, and processing functions.

Separate instances of ConnDF can be created to parallel the clients, for example one instance for the MANC dataset and one for the hemibrain dataset.

In [None]:
from data_processing import ConnDF

# Pass in the MANC_client created above.
MANC_data = ConnDF(client=MANC_client)
# For the hemibrain dataset, pass in the hemibrain_client instead.
hemibrain_data = ConnDF(client=hemibrain_client)

The connectome datasets can be retrieved from the server each time a ConnDF instance is used, but the download takes a long time. Each dataset comes as three CSV files that are not large, so saving them to local disk and reloading them in later sessions is the more practical option. The method for loading/downloading the datasets supports this.

To download the dataset, simply call the extract_full() method of your ConnDF instance with no parameters.

In [None]:
MANC_data.extract_full()

This retrieves all neuron metadata and all 'Traced' connections in the dataset, then saves them to a default path in the current working directory (the default path defined by neuprint.fetch_adjacencies). In future sessions, the dataset can be reloaded from this path by calling extract_full(file_path='default').

In [None]:
MANC_data.extract_full(file_path='default') # Throws 

To update the dataset, call extract_full() again with no parameters. This re-downloads the latest dataset and overwrites the CSV files in the default path.

In [None]:
MANC_data.extract_full()
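The download-once, reload-later workflow described above can be sketched in general terms as follows. This is not the implementation of extract_full; the function ensure_local_copy, its parameters, and the file names in the usage note are all hypothetical and only illustrate the caching pattern: skip the slow download when the files already exist, and force it when an update is wanted.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def ensure_local_copy(
    cache_dir: Path,
    expected_files: Iterable[str],
    download: Callable[[Path], None],
    force: bool = False,
) -> List[Path]:
    """Return paths to the cached files, downloading only when necessary.

    `download` is any callable that writes the expected files into `cache_dir`.
    """
    cache_dir = Path(cache_dir)
    paths = [cache_dir / name for name in expected_files]
    missing = [p for p in paths if not p.exists()]

    if force or missing:
        cache_dir.mkdir(parents=True, exist_ok=True)
        download(cache_dir)  # slow step: reached only on first run or forced update
        still_missing = [p.name for p in paths if not p.exists()]
        if still_missing:
            raise FileNotFoundError(f"Download did not produce: {still_missing}")
    return paths
```

Calling ensure_local_copy(Path('cache'), ['neurons.csv', 'connections.csv'], my_download) twice triggers my_download only on the first call; passing force=True corresponds to the update step, where the existing files are overwritten.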

# Data Processing/Filtering

The available processing and filtering methods are wrappers around pandas queries and operations, provided for convenience and for use in the analyses and simulations performed by other packages of the CAST toolbox. They operate on attributes of a ConnDF instance: the neuron metadata dataframe is stored in neuron_master and the connection edgelist dataframe in conn_master. Every filtering or processing step is performed on a copy of conn_master, stored in conn_filter.

In [None]:
# To access the neuron metadata dataframe:
MANC_data.neuron_master

# To access the connection edgelist dataframe:
MANC_data.conn_master
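The master/filter split can be illustrated with plain pandas. The sketch below is not ConnDF code: it uses a toy edgelist, the column names are assumed to mirror those returned by neuprint.fetch_adjacencies, and the weight threshold is arbitrary. It only shows why filtering a copy leaves the master dataframe intact.

```python
import pandas as pd

# Toy edgelist; column names are assumed to match neuprint.fetch_adjacencies output.
conn_master = pd.DataFrame(
    {
        "bodyId_pre": [1, 1, 2, 3],
        "bodyId_post": [2, 3, 3, 1],
        "weight": [12, 2, 7, 1],
    }
)

# Filter on a copy so the master edgelist is never modified.
conn_filter = conn_master.copy()
conn_filter = conn_filter[conn_filter["weight"] >= 5].reset_index(drop=True)

# The master still holds every connection; only the copy was reduced.
print(len(conn_master), len(conn_filter))
```

Because conn_master is untouched, a filter can be undone at any time by taking a fresh copy of it.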