## Capture Data
The goal of this notebook is to **collect all the NIfTI files downloaded from the [ADNI dataset](http://adni.loni.usc.edu), compress them (gzip) and place them in a single folder**.  

The first challenge of working with the ADNI dataset is the total size of all NIfTI files, which is 121.24 GB. The dataset is downloaded as a set of zips for each collection of images. There are three collections in total (ADNI1_Complete_1Yr_1.5T, ADNI1_Complete_2Yr_1.5T and ADNI1_Complete_3Yr_1.5T).

The directory structure after downloading the three collections is shown below.

```
Original_files
├── ADNI1_Complete_1Yr_1.5T
│   ├── Zip1
│   ├── Zip2
│   ├── Zip3
│   ├── Zip4
│   ├── Zip5
│   ├── Zip6
│   ├── Zip7
│   ├── Zip8
│   ├── Zip9
│   └── Zip10
├── ADNI1_Complete_2Yr_1.5T
│   ├── Zip1
│   ├── Zip2
│   ├── Zip3
│   ├── Zip4
│   └── Zip5
└── ADNI1_Complete_3Yr_1.5T
    └── Zip1
```

Each downloaded zip contains a list of folders. Inside each folder there may be further folders or the NIfTI files themselves, so the folder structure varies from one folder to the next. The example below shows two directories from the same zip, each with a different structure.

```
Zip1
├── dir1
│   └── subdir1.1
│       └── subdir1.1.1
│            └── NIfTI file
└── dir2
    ├── subdir2.1
    │   └── NIfTI file
    └── subdir2.2
        └── NIfTI file

``` 
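Because the nesting depth is not fixed, any search for the images has to descend to an arbitrary depth. A minimal sketch of such a search is shown below. It assumes the images carry the `.nii` extension, which the text above does not state, and `find_nifti_files` is a hypothetical helper name, not part of this notebook.

```python
from pathlib import Path


def find_nifti_files(folder):
    """Return every .nii file below `folder`, at any depth, in sorted order.

    Assumption: NIfTI images use the '.nii' extension.
    Hidden files and files inside hidden folders (e.g. '.DS_Store') are skipped.
    """
    folder = Path(folder)
    found = []
    for path in folder.rglob('*.nii'):
        # Inspect only the part of the path that lies below `folder`
        parts = path.relative_to(folder).parts
        if path.is_file() and not any(part.startswith('.') for part in parts):
            found.append(path)
    return sorted(found)
```

Called on the `Zip1` example above, this would return the three NIfTI files regardless of how deeply each one is nested.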

This notebook is structured as follows:
   - Import libraries
   - Define functions
   - Extract NIfTI files: find the NIfTI files in all the directories, compress them and place them in a new directory.

### Import libraries

In [1]:
import os
import gzip
import shutil

### Define functions

In [2]:
def extract_images(root, new_root):
    '''
    Extract the NIfTI images found under a directory, gzip them and save them in a new folder.
    Inputs: original directory (root), new directory (new_root)
    Output: NIfTI images compressed (.gz) and saved in the new directory
    '''

    # Create the new directory if it doesn't exist
    os.makedirs(new_root, exist_ok=True)

    # Collect the paths of the files under each top-level folder
    for folder in sorted(os.listdir(root)):

        # Skip hidden entries such as .DS_Store
        if folder.startswith('.'):
            continue

        print('Extracting NIfTI files from folder:', os.path.join(root, folder))
        count_files = 0

        # Walk the folder tree at any depth, skipping hidden files and folders.
        # An empty folder simply yields no paths.
        directions = []
        for dirpath, dirnames, filenames in os.walk(os.path.join(root, folder)):
            dirnames[:] = sorted(d for d in dirnames if not d.startswith('.'))
            for filename in sorted(filenames):
                if not filename.startswith('.'):
                    directions.append(os.path.join(dirpath, filename))
    
        # Compress each file with gzip and write it to the new folder
        for direction in directions:
            new_direction = os.path.join(new_root, os.path.basename(direction)) + '.gz'

            # Skip files that already exist in the new folder
            if not os.path.exists(new_direction):

                # Stream the image into a gzip-compressed copy in the new folder
                with open(direction, 'rb') as f_in:
                    with gzip.open(new_direction, 'wb') as f_out:
                        shutil.copyfileobj(f_in, f_out)
            else:
                print(f"NIfTI file already exists in the new folder: {new_direction}")
                
            count_files += 1
            
            if count_files % 50 == 0:
                print('[+] Number of NIfTI files processed:', count_files)
              
        print('Total number of NIfTI files processed:', count_files)
        print('*' * 30)
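The compression step can be checked in isolation. The sketch below repeats the same streaming `gzip` copy and then compares the decompressed content with the source, chunk by chunk. Both `gzip_copy` and `gzip_matches` are hypothetical helper names introduced for illustration; they are not part of this notebook.

```python
import gzip
import shutil


def gzip_copy(src, dst):
    """Write a gzip-compressed copy of `src` to `dst` without loading it all into memory."""
    with open(src, 'rb') as f_in, gzip.open(dst, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)


def gzip_matches(src, dst, chunk_size=1024 * 1024):
    """Return True if decompressing `dst` reproduces the bytes of `src`."""
    with open(src, 'rb') as f_in, gzip.open(dst, 'rb') as f_gz:
        while True:
            original = f_in.read(chunk_size)
            restored = f_gz.read(chunk_size)
            if original != restored:
                return False
            if not original:
                # Both streams ended at the same point
                return True
```

A check like this is only worth running on a sample of files, since it reads every byte of both the source and the compressed copy.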

### NIfTI files extraction
Find all NIfTI files under a directory (`root`), compress them and place them in a new directory (`new_root`).

#### Specify origin and new directories
Origin directories are:

   - "../Datasets/Original_files/ADNI1_Complete_1Yr_1.5T"
   - "../Datasets/Original_files/ADNI1_Complete_2Yr_1.5T"
   - "../Datasets/Original_files/ADNI1_Complete_3Yr_1.5T"
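The three origin directories could also be processed in a single pass rather than one at a time. The sketch below is one way to do that; `COLLECTIONS` and `process_collections` are hypothetical names, and the extraction function is passed in as an argument so the snippet does not depend on how it is defined.

```python
import os

# The three origin directories listed above
COLLECTIONS = [
    '../Datasets/Original_files/ADNI1_Complete_1Yr_1.5T',
    '../Datasets/Original_files/ADNI1_Complete_2Yr_1.5T',
    '../Datasets/Original_files/ADNI1_Complete_3Yr_1.5T',
]


def process_collections(extract, collections, new_root):
    """Call `extract(root, new_root)` for every collection directory that exists.

    Returns the list of directories that were actually processed.
    """
    processed = []
    for root in collections:
        if not os.path.isdir(root):
            print('Skipping missing directory:', root)
            continue
        extract(root, new_root)
        processed.append(root)
    return processed
```

With the function defined in this notebook, the call would be `process_collections(extract_images, COLLECTIONS, '../Datasets/Extracted_files')`.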


In [3]:
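# Define the root path that holds the zip folders downloaded from the ADNI dataset
root = '../Datasets/Original_files/ADNI1_Complete_3Yr_1.5T'
# Count the zip folders, ignoring hidden entries such as .DS_Store
zip_folders = [f for f in os.listdir(root) if not f.startswith('.')]
print(f'Number of zip folders in the {root} directory =', len(zip_folders))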

# Define the folder where the compressed NIfTI files will be saved
new_root = '../Datasets/Extracted_files'

Number of zip folders in the ../Datasets/Original_files/ADNI1_Complete_3Yr_1.5T directory = 1


#### Run process

In [5]:
# Run process - extract NIfTI files from folder root, compress them and save them in folder new_root
extract_images(root, new_root)

Extracting NIfTI files from folder: ../Datasets/Original_files/ADNI1_Complete_3Yr_1.5T/Zip_1
[+] Number of NIfTI files processed: 50
[+] Number of NIfTI files processed: 100
[+] Number of NIfTI files processed: 150
[+] Number of NIfTI files processed: 200
[+] Number of NIfTI files processed: 250
[+] Number of NIfTI files processed: 300
[+] Number of NIfTI files processed: 350
Total number of NIfTI files processed: 354
******************************
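
The processed total also counts files that were skipped because a compressed copy already existed. And because every file lands in one flat folder, two source files with the same name would map to the same output file. A rough sanity check for both cases is sketched below; `find_name_collisions` and `count_gz_files` are hypothetical helpers, not part of this notebook.

```python
import os
from collections import Counter


def find_name_collisions(paths):
    """Return the base names that occur more than once in `paths`, sorted."""
    counts = Counter(os.path.basename(p) for p in paths)
    return sorted(name for name, n in counts.items() if n > 1)


def count_gz_files(folder):
    """Count the '.gz' files stored directly in `folder`."""
    return sum(1 for name in os.listdir(folder) if name.endswith('.gz'))
```

If `find_name_collisions` returns an empty list for the source paths, the number reported by `count_gz_files` on the output folder should equal the number of distinct files processed across all runs.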
