# Turning PAR/REC data into BIDS format using bidsify
#### written by Eduard Klapwijk (https://github.com/eduardklap), with input from Philip Brandner, Suzanne van de Groep, Myrna van den Berg 

### When to use this notebook
This notebook can be used to convert data with PAR/REC images (MRI image format generated by Philips scanners) into BIDS. The first step of this notebook will show you how to structure your data into pseudoBIDS format. After this, we use the bidsify tool (https://github.com/NILAB-UvA/bidsify) created by Lukas Snoek to convert the pseudoBIDS data into BIDS. 

### Shoutout to bidsify, Neurohackademy, and the BIDS community
All kudos should go to the bidsify creators; this is just a guide that allows the usage of bidsify with an interactive notebook. When I (Eduard) turned my own data into BIDS, I got great help at Neurohackademy 2019 (https://neurohackademy.org).

For more information on BIDS visit https://bids.neuroimaging.io and read the Gorgolewski et al. paper about the BIDS format here: https://doi.org/10.1038/sdata.2016.44
Another useful resource is the BIDS starter kit: https://github.com/bids-standard/bids-starter-kit

### Other useful BIDS tools
There are multiple tools out there to convert datasets into BIDS (one of the main ones being HeuDiConv: https://github.com/nipy/heudiconv). We use bidsify because this is, as far as I am aware, the only converter capable of working with PAR/REC files. Edit May 2020: I recently found out about BIDScoin (https://github.com/Donders-Institute/bidscoin), which should also be able to convert PAR/REC into BIDS.

## Step 1: figure out scan sequence, rename files accordingly

Our situation:
We have a longitudinal dataset with 3 timepoints (T1, T2, T3). For every timepoint we have participant folders with scans in PAR/REC format. The filenames contain the participant ID and a number for the scan sequence. Because scans are sometimes restarted, this number is not always the same for a particular sequence. It looks something like this:

    /projectname/data/
                    ├── T1/
                    |   ├── BT1P0001/
                    |   |   ├── BT1P0001_4_1.PAR
                    |   |   ├── BT1P0001_4_1.REC
                    |   |   ├── BT1P0001_5_1.PAR
                    |   |   ├── BT1P0001_5_1.REC
                    |   |   ├── BT1P0001_12_1.PAR
                    |   |   └── BT1P0001_12_1.REC
                    |   ├── BT1P0002/
                    |   |   ├── BT1P0002_3_1.PAR
                    |   |   ├── BT1P0002_3_1.REC
                    |   |   ├── BT1P0002_4_1.PAR
                    |   |   ├── BT1P0002_4_1.REC
                    |   |   ├── BT1P0002_10_1.PAR
                    |   |   └── BT1P0002_10_1.REC
                    |   └── BT1P0345/
                    |       ├── BT1P0345_3_1.PAR
                    |       ├── BT1P0345_3_1.REC
                    |       ├── BT1P0345_9_1.PAR
                    |       └── BT1P0345_9_1.REC
                    ├── T2/
                    |   ├── BT2P0001/
                    |   |   ├── BT2P0001_4_1.PAR
                    |   |   ├── BT2P0001_4_1.REC
                    |   |   ├── BT2P0001_6_1.PAR
                    |   |   ├── BT2P0001_6_1.REC
                    |   |   ├── BT2P0001_10_1.PAR
                    |   |   └── BT2P0001_10_1.REC
                    |   └── BT2P0345/
                    |       ├── BT2P0345_4_1.PAR
                    |       ├── BT2P0345_4_1.REC
                    |       ├── BT2P0345_11_1.PAR
                    |       └── BT2P0345_11_1.REC
                    |
                    └── T3/
                        ├── BT3P0001/
                        |   ├── et cetera
                        et cetera

For every timepoint we want to rename the files to reflect what kind of scan it is (e.g., T1-weighted, task, rsfMRI). It is also a good idea to copy the files to a new location, so that we do not have to change the original raw data.  

First we have to define the old directory (with the raw data) and the new directory.  
Here they are set for timepoint 1. We can change them later and do the same for timepoint 2 and timepoint 3.

*Note*: you have to change all the directories in the next box that have a red color to make sure that you can use this notebook. 

In [None]:
import os
import sys
import shutil
import glob
from pathlib import Path

rawDir = '/projectname/data/T1'
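# it is useful to have one base directory, in which we have the raw (but renamed) data,
# and later the pseudobids and bids data
# (os.chdir() returns None, so we store the path first and then change into it)
base_dir = '/projectname/data/bidsify'
os.makedirs(base_dir, exist_ok=True)
os.chdir(base_dir)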
rerawDir = '/projectname/data/bidsify/raw/T1'
pseudoDir = '/projectname/data/bidsify/pseudobids'

We check whether the rerawDir already exists. If not, it will be made

In [None]:
newp = Path(rerawDir)
# create rerawDir (and any missing parent directories) only if it does not exist yet
newp.mkdir(parents=True, exist_ok=True)

### make subdirectories in rerawDir

We make a list with all the dirs with data in the rawDir  
Here we restrict it using directory.startswith('BT') --> only dirs starting with BT will be put in the list.  
Using pathlib (which should work on Windows/Mac/Linux systems) we create the subdirectories in the rerawDir

In [None]:
dirlist = []
for root, directories, filenames in os.walk(rawDir):
    for directory in directories:
        if directory.startswith('BT'):
            dirlist.append(directory) 
            
            #create copies of the subdirectories in the rerawDir directory
            if not Path.exists(newp/directory):
                Path(newp/directory).mkdir()
            
print(dirlist)

### Rename and copy files
The loop below walks through the directories and files in the rawDir.   
When a .PAR file is found:  
1. its header is read and the protocol line is checked for the different protocols (e.g., rsfMRI, 3DT1). *Note*: in our case the protocol description is on the 14th line; you might have to change this  
2. the file is copied to the new directory  
3. the copy is renamed if that name does not exist yet (we always put a 5-character identifier such as T1mri, btask, rsfmr at the end of the filename to make it easier to work with the filenames later on). 

*Note*: in our case, different tasks have the same protocol name ('fMRI 210 SENSE'), but in other cases there might be a more useful protocolname that denotes the kind of task. In that case it would be helpful to change this in the code to more specific names.

In [None]:
for root, directories, filenames in os.walk(rawDir):
    if filenames != []:
        for file in filenames:
            # extract the file name and path
            name = root + '/' + file
            name = name.replace("\\", "/")
            
            if file[-4:] == '.PAR':
                # open and read the protocolline needed for renaming
                with open(name, 'r') as f:
                    protocolline = f.readlines()
                
                # check what sort of file it is and setup the rename accordingly
                if '3DT1' in protocolline[13]:
                    rename = rerawDir + '/' + root[-8:] + '/' + file[:-4] + '_T1mri' + file[-4:]
                    # Make sure we copy and rename the files in(to) the correct path
                    WhichtoChange = rerawDir + '/' + root[-8:] + '/' + file
                    print('We found a 3DT1 file: ' + file)
                
                elif 'fMRI 210 SENSE' in protocolline[13]:
                    rename = rerawDir + '/' + root[-8:] + '/' + file[:-4] + '_btask' + file[-4:] 
                    WhichtoChange = rerawDir + '/' + root[-8:] + '/' + file
                    print('We found a btask file: ' + file)
                    
                elif 'rsfMRI' in protocolline[13]:
                    rename = rerawDir + '/' + root[-8:] + '/' + file[:-4] + '_rsfmr' + file[-4:]
                    WhichtoChange = rerawDir + '/' + root[-8:] + '/' + file
                    print('We found a rsfmri file: ' + file)
                
                elif 'hires' in protocolline[13]:
                    rename = rerawDir + '/' + root[-8:] + '/' + file[:-4] + '_T2str' + file[-4:]
                    WhichtoChange = rerawDir + '/' + root[-8:] + '/' + file
                    print('We found a T2* file: ' + file)
                
                else:
                    dontRename = file[:-4]
                    continue
                             
                # copy and rename the files in the new folder 
                # (gives warning if the file name already exists in the newDir)
                if not os.path.isfile(rename):
                    shutil.copy(str(name), str((rerawDir + '/' + root[-8:])))
                    os.rename(WhichtoChange, rename)
                else:
                    print('WARNING: file ' + rename + ' already exists in the folder! This file will therefore be skipped!')                       
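
If the protocol description is not on the 14th line in your PAR files, searching the header for the protocol line is less fragile than a fixed line number. The sketch below assumes that your PAR headers contain a line with the text 'Protocol name' (check this in a text editor first). The names PROTOCOL_LABELS, find_protocol_line and label_for are made up for this example, and the mapping simply repeats the protocols and identifiers used above.

```python
# Hypothetical mapping from protocol substrings to the 5-character identifiers
# used in this notebook; adjust both sides to your own scan protocols.
PROTOCOL_LABELS = {
    '3DT1': 'T1mri',
    'fMRI 210 SENSE': 'btask',
    'rsfMRI': 'rsfmr',
    'hires': 'T2str',
}


def find_protocol_line(par_lines):
    """Return the first header line that mentions 'Protocol name', or None."""
    for line in par_lines:
        if 'Protocol name' in line:
            return line
    return None


def label_for(par_lines):
    """Return the identifier for a PAR header, or None if nothing matches."""
    line = find_protocol_line(par_lines)
    if line is None:
        return None
    # dictionaries keep insertion order, so protocols are checked top to bottom
    for needle, label in PROTOCOL_LABELS.items():
        if needle in line:
            return label
    return None


# made-up header lines, only to show the idea
example_header = [
    '# === DATA DESCRIPTION FILE ===\n',
    '.    Patient name                       :   anonymous\n',
    '.    Protocol name                      :   WIP 3DT1 SENSE\n',
]
print(label_for(example_header))  # T1mri
```

In the loop above you could then call label_for(protocolline) instead of checking protocolline[13] in every branch.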

### Copy and rename REC files

Next, we rename the .REC files based on the corresponding .PAR files (assuming that apart from the extension, they have exactly the same name)   
We will print whether a specific type of file is found, and to where it will be copied

In [None]:
for root, directories, filenames in os.walk(rerawDir):
    if filenames != []:
        for file in filenames:
            #extract the file name and path
            name = root + '\\' + file
            name = name.replace("\\", "/")
             
            if file[-4:] == '.PAR':
                numberPAR = file[-14:-10]
                typePAR = file[-9:-4]
                print('we found a ' + typePAR + ' file')
                nameREC = rawDir + '/' +  file[:8] + '/' + file[:-10] + '.REC'
                renameREC = name[:-4] + '.REC'
                
            else:
                dontRename = file[:-4]
                continue
                
            if not os.path.isfile(renameREC):
                print('REC file will be copied and renamed: ' + renameREC)
                shutil.copy(str(nameREC), str(renameREC))
                
            else:
                print('WARNING: file ' + renameREC + ' already exists in the folder! This file will therefore be skipped!')                   
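
The slicing in the cell above (file[:-10], file[-9:-4]) relies on the identifier being exactly 5 characters long. As a sketch of the same idea without counting characters, the two helper functions below (hypothetical names, not part of bidsify) split the renamed PAR filename at its last underscore. This assumes the filenames follow the pattern produced in the previous step, e.g. BT1P0001_12_1_T1mri.PAR.

```python
def original_rec_name(renamed_par):
    """Name of the REC file in rawDir that belongs to a renamed PAR file."""
    stem = renamed_par[:-len('.PAR')]
    # drop the identifier that was appended after the last underscore
    original_stem, _identifier = stem.rsplit('_', 1)
    return original_stem + '.REC'


def renamed_rec_name(renamed_par):
    """Name the REC file should get in rerawDir (same stem as the PAR file)."""
    return renamed_par[:-len('.PAR')] + '.REC'


print(original_rec_name('BT1P0001_12_1_T1mri.PAR'))  # BT1P0001_12_1.REC
print(renamed_rec_name('BT1P0001_12_1_T1mri.PAR'))   # BT1P0001_12_1_T1mri.REC
```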

#### Let's have a look at the files in our rerawDir:

In [None]:
for root, directories, filenames in os.walk(rerawDir):
    print(root) 
    for filename in filenames:
        print(filename)

## Step 2: Rename participants and put data in pseudoBIDS format

### You can use the code below to make a list with all your participant names in the rerawDir. 

Assuming that your directories contain all the participant names, this can be easily done. 
Note that we strip the name and only use the number (to make it timepoint agnostic) 

In [None]:
dirlist2 = []
for root, directories, filenames in os.walk(rerawDir):
    for directory in directories:
        if directory.startswith('BT'):
            dirlist2.append(directory[4:8]) 

print(dirlist2)

We make the pseudobids dir (if it does not exist yet)  
Next, using this list we do the following:  
For every participant we make a new folder with different sessions underneath (which resembles the bids format). So previously we had T1, T2, T3, we now call these ses-01, ses-02, ses-03 and they are nested within participants.  
After running the code, the structure looks like this:  

    /projectname/data/bidsify/pseudobids/
                                    ├── sub-BTP0001/
                                    |            ├── ses-01/
                                    |            ├── ses-02/
                                    |            └── ses-03/
                                    ├── sub-BTP0002/
                                    |            ├── ses-01/
                                    |            ├── ses-02/
                                    |            └── ses-03/
                                    ├── sub-BTP0345/
                                    |            ├── ses-01/
                                    |            ├── ses-02/
                                    |            └── ses-03/ 
                                    └── sub-etcetera/
                                                 ├── ses-01/
                                                 ├── ses-02/
                                                 └── ses-03/ 

In [None]:
# set prefix (MUST include sub- and you can add some study specific digits, as we did with BTP)
prefix = 'sub-BTP'
newp2 = Path(pseudoDir)

# make pseudoDir
if not Path.exists(newp2):
    Path(newp2).mkdir(parents = True)

# make participant folders, each with the three session folders
for folder in dirlist2:
    folder = folder.strip()
    for session in ['ses-01', 'ses-02', 'ses-03']:
        session_dir = newp2/(prefix + str(folder))/session
        if not session_dir.exists():
            session_dir.mkdir(parents=True)
            print(session_dir)

Let's use the code below to check the directories in our pseudoDir. We should now see the participant directories, each with its session directories (you can also check this in Windows Explorer / Finder if you find that helpful). 

In [None]:
for root, directories, filenames in os.walk(pseudoDir):
    print(root)

### Now use the code below to place every PAR/REC file you want to convert to BIDS format, in the correct sub-directories (e.g. subject/ses-01 dir). 

Note that the code below selects files by the 5-character identifier that we added to the filenames in Step 1 (btask and rsfmr for the functional scans, T1mri and T2str for the anatomical scans). If you used different identifiers, you have to change them in the code below. The session is set to ses-01 because rerawDir points to timepoint 1; change both when you repeat this step for the other timepoints. 

In this step, we also add the 'bold' suffix to all bold scans. This will help bidsify to find the correct files later

In [None]:
prefix = 'sub-BTP'
for root, directories, filenames in os.walk(rerawDir):
    if filenames != []:
        for file in filenames:
            #extract the file name and path
            name = root + '/' + file
            name = name.replace("\\", "/")
            dest_bold = os.path.join(pseudoDir, prefix + file[4:8] + '/' + 'ses-01' + '/' + \
                                     file[:-4] + '_bold' + file[-4:])
            dest = os.path.join(pseudoDir, prefix + file[4:8] + '/' + 'ses-01' + '/' + file)


            if file[-9:-4] == 'btask' or file[-9:-4] == 'rsfmr':
                if not os.path.isfile(dest_bold):
                    print('copied file to:' + dest_bold)
                    shutil.copy(name, dest_bold)
                else:
                    print('WARNING: file ' + dest_bold + ' already exists in the folder! This file will therefore be skipped!')
            elif file[-9:-4] == 'T1mri' or file[-9:-4] == 'T2str':
                if not os.path.isfile(dest):
                    print('copied file to:' + dest)
                    shutil.copy(name, dest)
                else:
                    print('WARNING: file ' + dest + ' already exists in the folder! This file will therefore be skipped!')
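
The cell above always copies into ses-01, so it has to be edited by hand for timepoints 2 and 3. A possible alternative, sketched below, is to read the timepoint from the filename itself. This assumes that all filenames start like BT1P0001_ (BT, one digit for the timepoint, P, four digits for the participant), as in the example tree in Step 1. The function name pseudobids_parts is hypothetical.

```python
import re


def pseudobids_parts(filename, prefix='sub-BTP'):
    """Return (participant folder, session folder) for a renamed file, or None."""
    match = re.match(r'BT(\d)P(\d{4})_', filename)
    if match is None:
        # filename does not follow the assumed naming pattern
        return None
    timepoint, number = match.groups()
    return prefix + number, 'ses-0' + timepoint


print(pseudobids_parts('BT1P0001_4_1_T1mri.PAR'))  # ('sub-BTP0001', 'ses-01')
print(pseudobids_parts('BT2P0345_4_1_btask.PAR'))  # ('sub-BTP0345', 'ses-02')
```

The two parts can be joined with os.path.join(pseudoDir, participant, session) to build the destination folder.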

#### We can check the files in the pseudobids directories to see if this went well:

In [None]:
for root, directories, filenames in os.walk(pseudoDir):
    for directory in directories:
        print(directory)
    for filename in filenames:
        print(filename)   

## Step 3: Install BIDSify to turn pseudoBIDS data into BIDS format

For this step we first have to install bidsify and its dependencies. 

This is what we have to do:
1. Install bidsify: pip install bidsify
2. Install / check dependencies: dcm2niix (release v1.0.20181125 or newer), nibabel, scipy, numpy, joblib (for parallelization), pandas
3. Make a config-file in either the json or YAML format. 

### Step 3.1: Install bidsify + most dependencies (see for more info https://github.com/NILAB-UvA/bidsify)

In [None]:
!pip install bidsify

### Step 3.2: Install dcm2niix

On local machine install using Conda:

In [None]:
!conda install --yes -c conda-forge dcm2niix 

If you work on a high performance cluster you should install dcm2niix on the cluster.
I did the following on the Shark HPC that we use (https://git.lumc.nl/shark/shark-centos-slurm-user-guide/-/wikis/home). Install it in your user or user/jupyter folder (copy code outside of this notebook):

#### installing dcm2niix on Shark:
cd /home/jupyter/bin  
git clone https://github.com/rordenlab/dcm2niix.git  
cd dcm2niix  
mkdir build && cd build  
#### to avoid an error when using cmake .., do: 
cmake -DUSE_STATIC_RUNTIME=OFF ..  
make  
#### finally copy the binary (it is created in the bin subfolder of build) to user/bin:
cp bin/dcm2niix etklapwijk/bin

### Step 3.3: install bids-validator

In [None]:
!pip install bids_validator
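
After the installation steps above, it can help to check from within the notebook whether everything can actually be found. The sketch below uses only the Python standard library. The function name check_dependencies is made up for this example, and it assumes that each package listed in Step 3 is imported under the same name as it is installed.

```python
import importlib.util
import shutil

# the Python packages listed in Step 3
PYTHON_PACKAGES = ['bidsify', 'nibabel', 'scipy', 'numpy', 'joblib', 'pandas']


def check_dependencies(packages=PYTHON_PACKAGES, executable='dcm2niix'):
    """Return a dict that maps each dependency name to True (found) or False."""
    status = {name: importlib.util.find_spec(name) is not None
              for name in packages}
    # dcm2niix is a command line program, so we look for it on the PATH
    status[executable] = shutil.which(executable) is not None
    return status


for name, found in check_dependencies().items():
    print(name, 'found' if found else 'MISSING')
```

Note that this only tells you whether dcm2niix is on the PATH, not whether it is release v1.0.20181125 or newer.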

## Step 4: Run Bidsify command to turn data into BIDS format

### Step 4.1: Make a config-file in either the json or YAML format 

In order to run the bidsify script we have to make a config-file in either the json or YAML format. This file contains information about the experiment, such as the types and names of the anatomical and functional scans. The bidsify script needs this information to structure the data according to the BIDS format. You will have to create this json file yourself, as it is specific to each functional MRI task. You can find an example here: https://github.com/eur-synclab/bidsification/blob/master/config.json.

We'll go for json. We can either use R package 'jsonlite' or use http://jsoneditoronline.org. See https://github.com/NILAB-UvA/bidsify#the-config-file for details.
Place the file in the current directory --> see example ./config.json

In [None]:
with open(base_dir + '/config.json', 'r') as config:
    print(config.read())
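
A hand-edited json file easily ends up with a missing comma or quote. As a small sketch, the hypothetical helper load_config below parses the file with the standard json module, so that such mistakes show up before bidsify is run. It only checks that the file is valid json; it does not check whether the content is what bidsify expects.

```python
import json


def load_config(path):
    """Parse a json config file and return its content as a dict."""
    with open(path, 'r') as handle:
        try:
            return json.load(handle)
        except json.JSONDecodeError as err:
            # report the file name together with the position of the problem
            raise ValueError(f'{path} is not valid json: {err}') from err


# usage, assuming config.json is in the current directory:
# config = load_config('config.json')
# print(list(config.keys()))
```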

### Step 4.2: Run the bidsify command
see https://github.com/NILAB-UvA/bidsify#how-does-it-work for details

To run the bidsify script, you have to replace the path to your .json file and pseudobids directory below with the path to your own json file and pseudobids directory. 

In [None]:
!bidsify -c /projectname/data/bidsify/config.json \
-d /projectname/data/bidsify/pseudobids
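
If you prefer to keep the paths in Python variables instead of typing them into the shell command, the same call can be made with subprocess. This is only a sketch with hypothetical helper names (bidsify_command, run_bidsify); it uses just the -c and -d options shown above.

```python
import shutil
import subprocess


def bidsify_command(config_path, data_dir):
    """Build the bidsify call as a list of arguments."""
    return ['bidsify', '-c', config_path, '-d', data_dir]


def run_bidsify(config_path, data_dir):
    """Run bidsify and raise an error if it is missing or fails."""
    if shutil.which('bidsify') is None:
        raise RuntimeError('bidsify was not found on the PATH, see Step 3')
    return subprocess.run(bidsify_command(config_path, data_dir), check=True)


# show the command that would be run for the example paths of this notebook
print(' '.join(bidsify_command('/projectname/data/bidsify/config.json',
                               '/projectname/data/bidsify/pseudobids')))
```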

We rename the T2w files to T2star (for more clarity, and according to the BIDS specification).
As far as I am aware, the bidsify package currently has no T2star mapping, so we simply rename the files in this step. 

In [None]:
for root, directories, filenames in os.walk(base_dir + '/bids'):
    for file in filenames:
        # only the T2w images and their json sidecars are renamed
        if 'T2w.nii' not in file and 'T2w.json' not in file:
            continue

        # extract the file name and path
        name = root + '/' + file
        name = name.replace("\\", "/")

        # swap only the suffix, so that the extension is kept as it is
        # (this works for .nii, .nii.gz and .json files)
        renameT2 = root + '/' + file.replace('T2w.', 'T2star.')
        renameT2 = renameT2.replace("\\", "/")

        if not os.path.isfile(renameT2):
            print('rename:' + renameT2)
            os.rename(name, renameT2)
        else:
            print('WARNING: file ' + renameT2 + ' already exists in the folder! This file will therefore be skipped!')

### If everything went well, we now have a new directory called 'bids' with our data in BIDS format
Let's have a look:

In [None]:
for root, directories, filenames in os.walk(base_dir + '/bids'):
    for directory in directories:
        print(directory)
    for filename in filenames:
        print(filename)     

### To add here: BIDS validator in the notebook to validate whether everything is correct

For now, the online BIDS validator can be used (https://bids-standard.github.io/bids-validator/) using Chrome or Firefox 
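
As a first rough check inside the notebook, the bids_validator Python package installed in Step 3.3 can test whether filenames follow the BIDS naming rules, via BIDSValidator().is_bids(). The sketch below is not a replacement for the full validator: it only looks at file paths, not at file contents or required metadata. The helper names relative_bids_paths and non_bids_files are made up for this example.

```python
import os


def relative_bids_paths(bids_root):
    """List all files under bids_root as paths relative to it, with a leading '/'."""
    paths = []
    for root, _directories, filenames in os.walk(bids_root):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(root, filename), bids_root)
            paths.append('/' + rel.replace(os.sep, '/'))
    return sorted(paths)


def non_bids_files(bids_root):
    """Return the files whose names are not recognised as BIDS names."""
    # imported here so that the helper above also works without the package
    from bids_validator import BIDSValidator
    validator = BIDSValidator()
    return [path for path in relative_bids_paths(bids_root)
            if not validator.is_bids(path)]


# usage, assuming the bids directory created in Step 4:
# print(non_bids_files('/projectname/data/bidsify/bids'))
```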
