# How to get data using pyNS!

In this tutorial, we will demonstrate how to get data using pyNS, the Python client for the Neuroscout API. Useful data available through pyNS include pre-extracted predictors from the naturalistic movies and fMRIPrep confounds for all of the subjects.

To access specific predictors, we first need to know which datasets are available and, within them, which tasks, subjects, and runs we want predictors for.

In [18]:
from pyns import Neuroscout
api = Neuroscout()
datasets = api.datasets.get()

At the time of writing, 12 datasets were available in Neuroscout. We can print the dataset count here:

In [19]:
print(f'dataset count = {len(datasets)}\n')  # reuse the list fetched above

dataset count = 12



Next, we can print the names of these datasets and their unique IDs. We can use the dataset name or dataset ID to query the API.

In [20]:
print('Datasets and IDs:\n')
for i in datasets:
    print(i['name'], i['id'])

Datasets and IDs:

studyforrest 11
Raiders 10
SchematicNarrative 20
SherlockMerlin 5
Sherlock 21
narratives 30
Life 9
ParanoiaStory 18
LearningTemporalStructure 19
Budapest 27
NaturalisticNeuroimagingDatabase 28
ReadingBrainProject 29
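
Because queries later in this tutorial use numeric IDs, a small name-to-ID lookup can save some scrolling. The sketch below builds one from a list shaped like the `api.datasets.get()` result. `sample_datasets` is a hand-copied subset of the printout above so that the block runs offline, and `name_to_id` is a name introduced here purely for illustration.

```python
# Hand-copied subset of the name/id pairs printed above, so this runs offline.
# With a live connection, the `datasets` list fetched earlier would take its place.
sample_datasets = [
    {"name": "studyforrest", "id": 11},
    {"name": "SherlockMerlin", "id": 5},
    {"name": "Budapest", "id": 27},
]

# Map each dataset name to its numeric ID.
name_to_id = {d["name"]: d["id"] for d in sample_datasets}

print(name_to_id["Budapest"])  # 27
```

Dataset names are case-sensitive in this lookup, so "Budapest" and "budapest" would be treated as different keys.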


Some datasets have multiple tasks. For example, the SherlockMerlin dataset contains both the Merlin and Sherlock TV episodes. Notice that the "MerlinMovie" task has ID 4 and the "SherlockMovie" task has ID 45.


In [38]:
api.datasets.get(5)['tasks']

[{'TR': 1.5,
  'avg_run_duration': 1543,
  'id': 4,
  'n_runs_subject': 1,
  'n_subjects': 18,
  'name': 'MerlinMovie',
  'summary': 'AV Presentation: Merlin Episode'},
 {'TR': 1.5,
  'avg_run_duration': 1475,
  'id': 45,
  'n_runs_subject': 1,
  'n_subjects': 18,
  'name': 'SherlockMovie',
  'summary': 'AV Presentation: Sherlock Episode'}]

The Budapest dataset, on the other hand, has only one task, called 'movie'.

In [40]:
api.datasets.get(27)['tasks']

[{'TR': 1.0,
  'avg_run_duration': 610,
  'id': 48,
  'n_runs_subject': 5,
  'n_subjects': 25,
  'name': 'movie',
  'summary': 'Movie watching'}]
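
Task IDs such as those in the two outputs above can also be looked up by task name. This is a minimal sketch over a list shaped like the 'tasks' entries; `sample_tasks` is copied from the SherlockMerlin output (fields trimmed) and `find_task_id` is a hypothetical helper, not part of pyNS.

```python
# Task entries copied from the SherlockMerlin output above (fields trimmed).
sample_tasks = [
    {"id": 4, "name": "MerlinMovie", "n_subjects": 18, "TR": 1.5},
    {"id": 45, "name": "SherlockMovie", "n_subjects": 18, "TR": 1.5},
]


def find_task_id(tasks, task_name):
    """Return the ID of the task called `task_name`, or None if it is absent."""
    for task in tasks:
        if task["name"] == task_name:
            return task["id"]
    return None


print(find_task_id(sample_tasks, "SherlockMovie"))  # 45
print(find_task_id(sample_tasks, "movie"))          # None
```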

In [41]:
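# The tasks entry for the Budapest dataset (n_subjects, n_runs_subject) shows that there are 25 subjects, each with 5 runs.

# List the subjects in a dataset

In [23]:
# List the runs. Runs matter because predictors are queried per run (via run_id), and each predictor event records the run it belongs to.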

In [47]:
# List the first five run IDs from the Budapest dataset
api.datasets.get(27)['runs'][:5]

[1435, 1433, 1434, 1436, 1437]

In [45]:
# Get more detail about a specific run
api.runs.get(1435)

{'acquisition': None,
 'dataset_id': 27,
 'duration': 535.0,
 'id': 1435,
 'number': 3,
 'session': None,
 'subject': 'sid000005',
 'task': 48,
 'task_name': 'movie'}

In [50]:
# Or list the first five runs from the Budapest dataset along with their metadata
api.runs.get(dataset_id=27)[:5]

[{'acquisition': None,
  'dataset_id': 27,
  'duration': 535.0,
  'id': 1435,
  'number': 3,
  'session': None,
  'subject': 'sid000005',
  'task': 48,
  'task_name': 'movie'},
 {'acquisition': None,
  'dataset_id': 27,
  'duration': 598.0,
  'id': 1433,
  'number': 1,
  'session': None,
  'subject': 'sid000005',
  'task': 48,
  'task_name': 'movie'},
 {'acquisition': None,
  'dataset_id': 27,
  'duration': 498.0,
  'id': 1434,
  'number': 2,
  'session': None,
  'subject': 'sid000005',
  'task': 48,
  'task_name': 'movie'},
 {'acquisition': None,
  'dataset_id': 27,
  'duration': 618.0,
  'id': 1436,
  'number': 4,
  'session': None,
  'subject': 'sid000005',
  'task': 48,
  'task_name': 'movie'},
 {'acquisition': None,
  'dataset_id': 27,
  'duration': 803.0,
  'id': 1437,
  'number': 5,
  'session': None,
  'subject': 'sid000005',
  'task': 48,
  'task_name': 'movie'}]
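
Each run record carries a 'subject' field, so the subjects in a dataset can be recovered by grouping runs. The sketch below does this for the five runs shown above. `sample_runs` is copied from that output (fields trimmed) and `runs_by_subject` is a name introduced for illustration. With a live connection, the full list from `api.runs.get(dataset_id=27)` could be used in its place.

```python
from collections import defaultdict

# Run metadata copied from the output above (trimmed to the fields used here).
sample_runs = [
    {"id": 1435, "number": 3, "subject": "sid000005", "duration": 535.0},
    {"id": 1433, "number": 1, "subject": "sid000005", "duration": 598.0},
    {"id": 1434, "number": 2, "subject": "sid000005", "duration": 498.0},
    {"id": 1436, "number": 4, "subject": "sid000005", "duration": 618.0},
    {"id": 1437, "number": 5, "subject": "sid000005", "duration": 803.0},
]

# Collect the run IDs of each subject, ordered by run number.
runs_by_subject = defaultdict(list)
for run in sorted(sample_runs, key=lambda r: r["number"]):
    runs_by_subject[run["subject"]].append(run["id"])

subjects = sorted(runs_by_subject)
print(subjects)                      # ['sid000005']
print(runs_by_subject["sid000005"])  # [1433, 1434, 1435, 1436, 1437]
```

Note that the API returned these runs out of order (run number 3 first), which is why the sketch sorts by 'number' before grouping.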

Using api.predictors.get(), we can obtain a list of all the predictors available for a specified run. Each predictor is a dictionary of metadata.

In [84]:
predictors=api.predictors.get(run_id=1435)

Let's take a look at the metadata for one of these predictors. This predictor is called 'csf' and is described as the 'Average signal in CSF mask'. Its source is fmriprep, so it is a subject-specific confound computed by fMRIPrep.

In [85]:
predictors[3]

{'dataset_id': 27,
 'description': 'Average signal in CSF mask',
 'id': 37048,
 'max': 437.444,
 'mean': 369.345,
 'min': 287.63,
 'name': 'csf',
 'num_na': 0,
 'private': False,
 'source': 'fmriprep'}

We can list all of the predictors with the following command:

In [87]:
for p in predictors:
    print(p['name'])


std_dvars
dvars
framewise_displacement
csf
white_matter
global_signal
subtlexusfrequency_FREQcount
subtlexusfrequency_FREQlow
subtlexusfrequency_SUBTLWF
subtlexusfrequency_SUBTLCD
subtlexusfrequency_Dom_PoS_SUBTLEX
subtlexusfrequency_Percentage_dom_PoS
subtlexusfrequency_All_freqs_SUBTLEX
affect_V.Mean.Sum
affect_D.Mean.Sum
aoa_AoA_Kup
massiveauditorylexicaldecision_NumPhones
massiveauditorylexicaldecision_FreqCOCAspok
lancastersensorimotornorms_Auditory.mean
lancastersensorimotornorms_Haptic.mean
lancastersensorimotornorms_Olfactory.mean
lancastersensorimotornorms_Foot_leg.mean
lancastersensorimotornorms_Head.mean
lancastersensorimotornorms_Torso.mean
subtlexusfrequency_CDcount
subtlexusfrequency_Cdlow
subtlexusfrequency_Lg10WF
subtlexusfrequency_Lg10CD
subtlexusfrequency_Freq_dom_PoS_SUBTLEX
subtlexusfrequency_All_PoS_SUBTLEX
subtlexusfrequency_Zipf-value
affect_A.Mean.Sum
concreteness_Conc.M
massiveauditorylexicaldecision_NumSylls
massiveauditorylexicaldecision_Duration
massiveaudit

Wow, there are a lot of features available! But perhaps we do not want to see the confounds from fMRIPrep. Here we will print every predictor that does not come from fmriprep, along with its corresponding predictor ID.

In [88]:
# for p in predictors:
#     if not p['source'] == 'fmriprep' and not p['mean'] == None and str(p['name']).find("bert") < 0:
#         print(p['name'], p['id'])
for p in predictors:
    if p['source'] != 'fmriprep':
        print(p['name'], p['id'])

subtlexusfrequency_FREQcount 38026
subtlexusfrequency_FREQlow 38028
subtlexusfrequency_SUBTLWF 38030
subtlexusfrequency_SUBTLCD 38032
subtlexusfrequency_Dom_PoS_SUBTLEX 38034
subtlexusfrequency_Percentage_dom_PoS 38036
subtlexusfrequency_All_freqs_SUBTLEX 38038
affect_V.Mean.Sum 38040
affect_D.Mean.Sum 38042
aoa_AoA_Kup 38044
massiveauditorylexicaldecision_NumPhones 38046
massiveauditorylexicaldecision_FreqCOCAspok 38048
lancastersensorimotornorms_Auditory.mean 38050
lancastersensorimotornorms_Haptic.mean 38052
lancastersensorimotornorms_Olfactory.mean 38054
lancastersensorimotornorms_Foot_leg.mean 38056
lancastersensorimotornorms_Head.mean 38058
lancastersensorimotornorms_Torso.mean 38060
subtlexusfrequency_CDcount 38027
subtlexusfrequency_Cdlow 38029
subtlexusfrequency_Lg10WF 38031
subtlexusfrequency_Lg10CD 38033
subtlexusfrequency_Freq_dom_PoS_SUBTLEX 38035
subtlexusfrequency_All_PoS_SUBTLEX 38037
subtlexusfrequency_Zipf-value 38039
affect_A.Mean.Sum 38041
concreteness_Conc.M 38043
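
Many of the names above share a prefix before the first underscore, such as subtlexusfrequency or affect. Assuming that prefix identifies a family of related features (a reading of the naming pattern, not something the API is known to guarantee), predictor IDs can be grouped by family as sketched below. `sample_predictors` is a hand-copied subset of the printout and `ids_by_family` is a name introduced for illustration.

```python
from collections import defaultdict

# Hand-copied subset of the name/id pairs printed above.
sample_predictors = [
    {"name": "subtlexusfrequency_FREQcount", "id": 38026},
    {"name": "subtlexusfrequency_Lg10WF", "id": 38031},
    {"name": "affect_V.Mean.Sum", "id": 38040},
    {"name": "affect_A.Mean.Sum", "id": 38041},
    {"name": "aoa_AoA_Kup", "id": 38044},
    {"name": "concreteness_Conc.M", "id": 38043},
]

# Assumption: the text before the first underscore names the feature family.
ids_by_family = defaultdict(list)
for p in sample_predictors:
    family = p["name"].split("_", 1)[0]
    ids_by_family[family].append(p["id"])

for family, ids in sorted(ids_by_family.items()):
    print(family, ids)
```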


In [None]:
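# Collect the IDs, names, and modalities of predictors that are not fMRIPrep
# confounds, have a defined mean, and whose names do not contain "bert".
predictor_ids, predictor_names, predictor_modality = [], [], []
for p in predictors:
    if p['source'] != 'fmriprep' and p['mean'] is not None and 'bert' not in str(p['name']):
        predictor_ids.append(p['id'])
        predictor_names.append(p['name'])
        # 'extracted_feature' may be missing or None for some predictors
        try:
            predictor_modality.append(p['extracted_feature']['modality'])
        except (KeyError, TypeError):
            predictor_modality.append(None)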

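Say you are interested in the "landscape" feature from the Budapest dataset. Note that api.predictors is an object rather than a function, so calling api.predictors(...) directly raises "TypeError: 'Predictors' object is not callable". One option is to filter the list we already fetched for this run by name (the result is an empty list if the run has no predictor with that name):

In [75]:
[p for p in predictors if p['name'] == 'landscape']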

In [76]:
predictors[3] #list a predictor and its metadata

{'dataset_id': 27,
 'description': 'Average signal in CSF mask',
 'id': 37048,
 'max': 437.444,
 'mean': 369.345,
 'min': 287.63,
 'name': 'csf',
 'num_na': 0,
 'private': False,
 'source': 'fmriprep'}

Let's get the events of a specific predictor. Note the stimulus_timing flag. It matters because the start of fMRI acquisition does not always coincide with the predictor's start time. When the flag is set to True, the predictor or confound is returned with its onsets expressed relative to the start of fMRI acquisition.

By default, events are returned as a list of dictionaries, one per event, as shown below. A future version of the API may instead return them as a pandas DataFrame or a TSV file.

In [79]:
an_event = api.predictor_events.get(predictor_id=37048, stimulus_timing=True)
# In the future, with the new API:
# an_event = api.predictor_events.get(dataset_name='budapest', predictor_name='landscape', stimulus_timing=True)

In [82]:
an_event[:5]

[{'duration': 1.0,
  'onset': 0.0,
  'predictor_id': 37048,
  'run_id': 1433,
  'stimulus_id': None,
  'value': '435.07973795343895'},
 {'duration': 1.0,
  'onset': 1.0,
  'predictor_id': 37048,
  'run_id': 1433,
  'stimulus_id': None,
  'value': '429.36287047897736'},
 {'duration': 1.0,
  'onset': 2.0,
  'predictor_id': 37048,
  'run_id': 1433,
  'stimulus_id': None,
  'value': '425.50204278063035'},
 {'duration': 1.0,
  'onset': 3.0,
  'predictor_id': 37048,
  'run_id': 1433,
  'stimulus_id': None,
  'value': '425.1020519568309'},
 {'duration': 1.0,
  'onset': 4.0,
  'predictor_id': 37048,
  'run_id': 1433,
  'stimulus_id': None,
  'value': '423.5720765058304'}]
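
Notice that each 'value' in the output above is a string rather than a number. For a numeric predictor such as csf, the events can be turned into a time series of floats as sketched below. `sample_events` holds the first three events copied from that output, and `to_timeseries` is a hypothetical helper, not part of pyNS. Predictors with non-numeric values would need different handling.

```python
# First three events copied from the output above.
sample_events = [
    {"duration": 1.0, "onset": 0.0, "predictor_id": 37048, "run_id": 1433,
     "stimulus_id": None, "value": "435.07973795343895"},
    {"duration": 1.0, "onset": 1.0, "predictor_id": 37048, "run_id": 1433,
     "stimulus_id": None, "value": "429.36287047897736"},
    {"duration": 1.0, "onset": 2.0, "predictor_id": 37048, "run_id": 1433,
     "stimulus_id": None, "value": "425.50204278063035"},
]


def to_timeseries(events):
    """Return (onset, value) pairs sorted by onset, with values cast to float."""
    return sorted((e["onset"], float(e["value"])) for e in events)


series = to_timeseries(sample_events)
for onset, value in series:
    print(onset, value)
```

Since the full event list may span several runs, filtering on 'run_id' before building the series is likely to be needed in practice.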

In [25]:
# Export the predictor events to a TSV file
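
One possible way to do this export, using only the Python standard library, is sketched here. `events_to_tsv` is a hypothetical helper (not part of pyNS), the column selection and order are a choice made for this example, and `sample_events` holds two events copied from the output above.

```python
import csv
import io

# Two events copied from the output above.
sample_events = [
    {"duration": 1.0, "onset": 0.0, "predictor_id": 37048, "run_id": 1433,
     "stimulus_id": None, "value": "435.07973795343895"},
    {"duration": 1.0, "onset": 1.0, "predictor_id": 37048, "run_id": 1433,
     "stimulus_id": None, "value": "429.36287047897736"},
]


def events_to_tsv(events, columns=("run_id", "onset", "duration", "value")):
    """Serialize event dictionaries to a tab-separated string with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        delimiter="\t",
        extrasaction="ignore",  # skip keys that are not in `columns`
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(events)
    return buffer.getvalue()


tsv_text = events_to_tsv(sample_events)
print(tsv_text)
```

To write to disk instead, the returned text can be saved with a regular file handle, for example `open("events.tsv", "w", newline="")`, where the file name is only a placeholder.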