In most real-world projects, data files do not sit neatly in a single folder. Instead, they are often grouped under different sub-folders to keep them organized. This is the case for the data provided for the [siim-covid19-detection](https://www.kaggle.com/c/siim-covid19-detection) competition.

The very first step in this project is to load the images efficiently. This notebook presents two methods for doing so.

This is particularly important for automating the loading of large datasets with a few lines of code.

First, let's import some libraries.

In [None]:
import numpy as np 
import pandas as pd 
import os
from os import path
import itertools

**Method 1: Using the file information sheet**:
In the [siim-covid19-detection](https://www.kaggle.com/c/siim-covid19-detection) data, the folder and file names are given in the `train_image_level.csv` file. We can use this information to locate every image file.
 
First, let's define the paths and load the .csv file.

In [None]:
InputPath = "../input/siim-covid19-detection"
TrainPath = f"{InputPath}/train"
TestPath = f"{InputPath}/test"

train_image_level = pd.read_csv(f"{InputPath}/train_image_level.csv")
train_image_level.head()


In [None]:
# Rename the id column to ImageID and strip the trailing '_image' (6 characters) from each value
train_image_level.rename(columns={"id": "ImageID"}, inplace=True)
train_image_level["ImageID"] = train_image_level["ImageID"].str[:-6]

train_image_level.head()
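
The slice `[:-6]` above removes the last six characters whatever they are. As a minimal sketch of a stricter variant, assuming the IDs end in `_image`, the hypothetical helper `strip_suffix` below only strips the suffix when it is actually present. The toy IDs are made up for illustration and are not taken from the competition data.

```python
def strip_suffix(image_id: str, suffix: str = "_image") -> str:
    """Return image_id without the trailing suffix, if it is present."""
    if suffix and image_id.endswith(suffix):
        return image_id[: -len(suffix)]
    return image_id


# Hypothetical toy IDs; the real values come from train_image_level.csv.
toy_ids = ["abc123_image", "def456_image", "no_suffix_here"]
cleaned = [strip_suffix(x) for x in toy_ids]
print(cleaned)
```

The same function could be passed to `.apply()` on the ID column in place of a fixed-length slice.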

In order to save time, let's pick a sample of 5 images to read for now. You can later apply the same method to read all files.

In [None]:
sample_df = train_image_level.head(5).reset_index(drop = True)
print(sample_df)

In [None]:
for _, row in sample_df.iterrows():
    study_dir = os.path.join(TrainPath, row["StudyInstanceUID"])
    # Check each sub-folder of the study folder for the image file
    for series in os.listdir(study_dir):
        ImagePath_1 = os.path.join(study_dir, series, row["ImageID"] + ".dcm")
        if path.exists(ImagePath_1):
            print(ImagePath_1)
            break
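
The lookup above can also be written with a wildcard for the unknown middle folder. This is only a sketch, not part of the original notebook: `find_image` is a hypothetical helper, and the temporary folder tree is made up to mimic a study/sub-folder/image layout so the example runs without the competition data.

```python
import glob
import os
import tempfile


def find_image(train_path, study_id, image_id):
    """Return the first match of <train_path>/<study_id>/*/<image_id>.dcm, or None."""
    # Escape the fixed parts so special characters in them are not read as wildcards.
    study_dir = glob.escape(os.path.join(train_path, study_id))
    pattern = os.path.join(study_dir, "*", glob.escape(image_id) + ".dcm")
    matches = glob.glob(pattern)
    return matches[0] if matches else None


# Build a tiny throwaway tree (hypothetical names) to try the helper on.
with tempfile.TemporaryDirectory() as tmp:
    series_dir = os.path.join(tmp, "study_a", "series_1")
    os.makedirs(series_dir)
    open(os.path.join(series_dir, "img_001.dcm"), "w").close()

    found = find_image(tmp, "study_a", "img_001")
    missing = find_image(tmp, "study_a", "img_999")
    found_rel = os.path.relpath(found, tmp)

print(found_rel)
print(missing)
```

Returning `None` for a missing file makes it easy to spot rows of the sheet that have no matching image on disk.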

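**Method 2: Using Python's os.walk()**:
The [walk()](https://www.tutorialspoint.com/python/os_walk.htm) method generates the file names in a directory tree. For each directory it visits, it yields a tuple of the directory path, the names of its sub-directories, and the names of its files.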

More information can be found [here](https://docs.python.org/3/library/os.html).

In [None]:
AllTrainFiles = os.walk(TrainPath)
# Only look at the first 15 directories visited by walk() to keep the output short
for root, dirs, files in itertools.islice(AllTrainFiles, 15):
    for name in files:
        if name.endswith('.dcm'):
            ImagePath_2 = os.path.join(root, name)
            print(ImagePath_2)
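
Rather than printing each path, the walk can fill a lookup table from image ID to file path. The following is a minimal sketch under two assumptions: `index_dicom_files` is a hypothetical helper, and file names are unique across the whole tree (a later duplicate would overwrite an earlier entry). The temporary folders are made up so the example runs on its own.

```python
import os
import tempfile


def index_dicom_files(top):
    """Map the stem of every .dcm file under `top` to its full path."""
    index = {}
    for root, _dirs, files in os.walk(top):
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext.lower() == ".dcm":
                index[stem] = os.path.join(root, name)
    return index


# Hypothetical layout: two studies, one non-DICOM file that should be skipped.
layout = [("s1", "a", "img1.dcm"), ("s2", "b", "img2.dcm"), ("s2", "b", "notes.txt")]

with tempfile.TemporaryDirectory() as tmp:
    for study, series, filename in layout:
        folder = os.path.join(tmp, study, series)
        os.makedirs(folder, exist_ok=True)
        open(os.path.join(folder, filename), "w").close()

    index = index_dicom_files(tmp)
    image_ids = sorted(index)
    rel_img1 = os.path.relpath(index["img1"], tmp)

print(image_ids)
print(rel_img1)
```

With such an index, the tree is walked once and every later lookup by image ID is a plain dictionary access.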