# OS Module in Python

<p>The <b>os</b> module of Python is widely used in Machine Learning to locate data files on disk before loading them into memory, and in preprocessing tasks. Preprocessing steps differ depending on what one intends to do, so it helps to be familiar with the ways in which the <b>os</b> module can be used to suit one's needs.</p>

<h3>A few of the functions that are used:</h3>

- *os.listdir*
- *os.walk*
- *os.mkdir*
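*os.mkdir* is listed above but is not used later in this notebook. A minimal sketch of it, assuming a throwaway directory from `tempfile` so nothing in the working folder is touched; the folder name `dummyFolder2` is hypothetical. It also shows `os.makedirs`, the related call that creates missing intermediate folders.

```python
import os
import tempfile

# Work inside a temporary directory that is deleted automatically afterwards.
with tempfile.TemporaryDirectory() as base:
    new_folder = os.path.join(base, 'dummyFolder2')  # hypothetical folder name
    os.mkdir(new_folder)  # creates exactly one new directory
    after_mkdir = os.listdir(base)

    # os.mkdir raises FileExistsError if the directory already exists.
    try:
        os.mkdir(new_folder)
        second_call_failed = False
    except FileExistsError:
        second_call_failed = True

    # os.makedirs also creates the missing intermediate folder 'a';
    # exist_ok=True makes a repeated call a no-op instead of an error.
    os.makedirs(os.path.join(base, 'a', 'b'), exist_ok=True)
    os.makedirs(os.path.join(base, 'a', 'b'), exist_ok=True)
    after_makedirs = sorted(os.listdir(base))

print(after_mkdir)         # ['dummyFolder2']
print(second_call_failed)  # True
print(after_makedirs)      # ['a', 'dummyFolder2']
```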

<h3>A few of the attributes that are used:</h3>

- *os.path.sep*
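A short sketch of what *os.path.sep* holds and how it relates to `os.path.join`; the path components below are simply the folder names used later in this notebook.

```python
import os

# os.path.sep is the separator of the current platform:
# '/' on Linux/macOS, '\\' on Windows. os.sep holds the same string.
print(repr(os.path.sep), os.path.sep == os.sep)

# os.path.join puts os.path.sep between the components it is given...
p = os.path.join('datasets', 'animals', 'cats', 'cat1.jpg')
print(p)

# ...so splitting on os.path.sep gives the components back.
parts = p.split(os.path.sep)
print(parts)  # ['datasets', 'animals', 'cats', 'cat1.jpg']
```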

```python
import os
```

```python
# A single dot refers to the CURRENT folder/directory.
# In UNIX terms, '.' is a hard link to the current directory.
folder_path = '.'
```
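To see which directory `'.'` actually stands for, `os.path.abspath` resolves it against the current working directory returned by `os.getcwd()`. A small sketch:

```python
import os

folder_path = '.'
# abspath joins the relative path onto os.getcwd() and normalises it,
# so the trailing '.' disappears.
resolved = os.path.abspath(folder_path)
print(resolved)
print(resolved == os.getcwd())  # True

# '..' is resolved the same way and points at the parent directory.
parent = os.path.abspath('..')
print(parent == os.path.dirname(os.getcwd()))  # True
```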

<h3>Print the list of names of all the files and folders/directories present in the CURRENT folder/directory only</h3>



```python
print(os.listdir(folder_path))
```

```
['dummyFolder1', 'filename1.txt', 'filename3.csv', 'Week2Task3(UnixCommands).pdf', '.ipynb_checkpoints', 'datasets', 'filename2.txt', 'Basic usage of os module in Python.ipynb']
```
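`os.listdir` returns bare names, in no particular order, and mixes files with folders (including hidden entries such as `.ipynb_checkpoints`). A sketch that separates the two by joining each name back onto `folder_path` before testing it:

```python
import os

folder_path = '.'
entries = sorted(os.listdir(folder_path))  # sort for a stable, readable order

# The names must be joined onto folder_path; testing a bare name only works
# when folder_path happens to be the current directory.
folders = [e for e in entries if os.path.isdir(os.path.join(folder_path, e))]
files = [e for e in entries if os.path.isfile(os.path.join(folder_path, e))]

print('folders:', folders)
print('files:', files)
```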


<h3>Observe the output of os.walk</h3>

```python
for pathName, folderNames, fileNames in os.walk('./datasets/animals'):
    print(pathName, folderNames, fileNames)
    print()
```

```
('./datasets/animals', ['cats', 'dogs', 'pandas'], [])
()
('./datasets/animals/cats', [], ['README.MD', 'cat3.jpg', 'cat4.bmp', 'cat2.jpg', 'cat1.jpg'])
()
('./datasets/animals/dogs', [], ['labels.csv', 'dog2.jpg', 'dog1.jpg', 'dog4.jpeg', 'dog3.jpg', 'dog5.png'])
()
('./datasets/animals/pandas', [], ['panda6.png', 'panda2.jpg', 'panda4.bmp', 'panda3.csv', 'panda1.jpg', 'panda5.txt'])
()
```
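Each step of `os.walk` yields a `(pathName, folderNames, fileNames)` tuple, starting at the top folder and then visiting its sub-folders. Because the walk is top-down by default, removing names from `folderNames` *in place* stops it from descending into those folders. A sketch on a small hypothetical tree built in a temporary directory, skipping hidden folders such as `.ipynb_checkpoints`:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical tree: root/cats/cat1.jpg and root/.ipynb_checkpoints/x.ipynb
    for sub, name in (('cats', 'cat1.jpg'), ('.ipynb_checkpoints', 'x.ipynb')):
        os.mkdir(os.path.join(root, sub))
        open(os.path.join(root, sub, name), 'w').close()

    visited = []
    for pathName, folderNames, fileNames in os.walk(root):
        # Slice assignment edits the list os.walk will use next;
        # rebinding folderNames to a new list would have no effect.
        folderNames[:] = [d for d in folderNames if not d.startswith('.')]
        visited.append(os.path.relpath(pathName, root))

print(sorted(visited))  # ['.', 'cats']
```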


<h3>Print all the path names, folder names and file names present in the datasets folder (recursively)</h3>


```python
print(os.listdir('./datasets'))
for pathName, folderNames, fileNames in os.walk('./datasets'):
    print(pathName, folderNames, fileNames)
```

```
['animals', 'df2_file1.csv']
('./datasets', ['animals'], ['df2_file1.csv'])
('./datasets/animals', ['cats', 'dogs', 'pandas'], [])
('./datasets/animals/cats', [], ['README.MD', 'cat3.jpg', 'cat4.bmp', 'cat2.jpg', 'cat1.jpg'])
('./datasets/animals/dogs', [], ['labels.csv', 'dog2.jpg', 'dog1.jpg', 'dog4.jpeg', 'dog3.jpg', 'dog5.png'])
('./datasets/animals/pandas', [], ['panda6.png', 'panda2.jpg', 'panda4.bmp', 'panda3.csv', 'panda1.jpg', 'panda5.txt'])
```
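`os.walk` is top-down by default, which is why `./datasets` is printed before its sub-folders. Passing `topdown=False` reverses this so that sub-folders are yielded before their parents. A sketch on a hypothetical one-branch tree in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'animals', 'cats'))  # hypothetical tree

    # relpath makes the output independent of where the temp folder lives.
    top_down = [os.path.relpath(p, root) for p, _, _ in os.walk(root)]
    bottom_up = [os.path.relpath(p, root)
                 for p, _, _ in os.walk(root, topdown=False)]

print(top_down)   # parents first: '.', 'animals', then the cats folder
print(bottom_up)  # children first: the same three paths, reversed
```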


<h3>Print the path name along with the file name for every file present in the datasets/animals folder (recursively)</h3>


```python
imagePaths = []
# See: http://www.bogotobogo.com/python/python_traversing_directory_tree_recursively_os_walk.php
for pathName, folderNames, fileNames in os.walk('./datasets/animals'):
    for fileName in fileNames:
        # os.path.join inserts the platform's separator (os.path.sep) between the parts
        imagePaths.append(os.path.join(pathName, fileName))

print(imagePaths)
```

```
['./datasets/animals/cats/README.MD', './datasets/animals/cats/cat3.jpg', './datasets/animals/cats/cat4.bmp', './datasets/animals/cats/cat2.jpg', './datasets/animals/cats/cat1.jpg', './datasets/animals/dogs/labels.csv', './datasets/animals/dogs/dog2.jpg', './datasets/animals/dogs/dog1.jpg', './datasets/animals/dogs/dog4.jpeg', './datasets/animals/dogs/dog3.jpg', './datasets/animals/dogs/dog5.png', './datasets/animals/pandas/panda6.png', './datasets/animals/pandas/panda2.jpg', './datasets/animals/pandas/panda4.bmp', './datasets/animals/pandas/panda3.csv', './datasets/animals/pandas/panda1.jpg', './datasets/animals/pandas/panda5.txt']
```
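Building each path with a hard-coded `'/'` mixes separators on Windows, where `os.walk` joins sub-folder names onto the top folder with `'\\'`. That matters later, when the class label is read by splitting on `os.path.sep`. The sketch below uses `ntpath` (the Windows implementation of `os.path`, importable on any OS) to simulate this; the `pathName` value assumes a walk started from `'./datasets/animals'` on Windows.

```python
import ntpath  # Windows flavour of os.path; importable on every platform

# What os.walk('./datasets/animals') would report for the cats folder on Windows.
pathName = './datasets/animals\\cats'
fileName = 'cat1.jpg'

mixed = pathName + '/' + fileName          # './datasets/animals\\cats/cat1.jpg'
joined = ntpath.join(pathName, fileName)   # './datasets/animals\\cats\\cat1.jpg'

print(mixed.split(ntpath.sep)[-2])   # ./datasets/animals  (wrong label)
print(joined.split(ntpath.sep)[-2])  # cats
```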


<h3>Keep only those paths whose file names end with .jpg, .jpeg, .png or .bmp</h3>

```python
imagePaths = []
validExtensions = ['jpg', 'jpeg', 'png', 'bmp']
for pathName, folderNames, fileNames in os.walk('./datasets/animals'):
    for fileName in fileNames:
        # Compare the text after the last '.', lower-cased so that 'JPG' also matches
        if fileName.split(".")[-1].lower() in validExtensions: #https://pythonprogramminglanguage.com/split-string/
            imagePaths.append(os.path.join(pathName, fileName))

print(imagePaths)
```

```
['./datasets/animals/cats/cat3.jpg', './datasets/animals/cats/cat4.bmp', './datasets/animals/cats/cat2.jpg', './datasets/animals/cats/cat1.jpg', './datasets/animals/dogs/dog2.jpg', './datasets/animals/dogs/dog1.jpg', './datasets/animals/dogs/dog4.jpeg', './datasets/animals/dogs/dog3.jpg', './datasets/animals/dogs/dog5.png', './datasets/animals/pandas/panda6.png', './datasets/animals/pandas/panda2.jpg', './datasets/animals/pandas/panda4.bmp', './datasets/animals/pandas/panda1.jpg']
```
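An alternative to `split(".")` is `os.path.splitext`, which returns the final suffix with its leading dot (or `''` when there is none). A sketch that also lower-cases the suffix so upper-case extensions are accepted; the file names are hypothetical examples:

```python
import os

validExtensions = {'.jpg', '.jpeg', '.png', '.bmp'}  # note the leading dots

def is_image(fileName):
    # splitext splits off only the last suffix: 'a.tar.gz' -> ('a.tar', '.gz')
    return os.path.splitext(fileName)[1].lower() in validExtensions

names = ['cat1.jpg', 'CAT5.JPG', 'README.MD', 'archive.tar.gz', 'noextension']
print([n for n in names if is_image(n)])  # ['cat1.jpg', 'CAT5.JPG']
```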


<h3>Extract the class label, assuming that each path has the following format:</h3>
<h4>/path/to/dataset/{class}/{image}.jpg</h4>
<h3>Hint: look at the output of imagePaths above</h3>

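```python
labels = []
for imagePath in imagePaths:
    # os.path.sep is the path separator: '\' on Windows, '/' on Ubuntu and other POSIX systems.
    # The second-to-last component of the path is the folder name, i.e. the class label.
    label = imagePath.split(os.path.sep)[-2]
    if label not in labels:  # keep each label once, in order of first appearance
        labels.append(label)

print(labels)
```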

```
['cats', 'dogs', 'pandas']
```
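Splitting on `os.path.sep` assumes every separator in the path is that exact character. A sketch of an alternative that lets the path module locate the parent folder, `basename(dirname(path))`; `posixpath` and `ntpath` are passed in only to show both platforms on one machine (in the notebook itself `os.path` would be used directly). The helper name `class_label` is hypothetical.

```python
import ntpath     # Windows path rules
import posixpath  # Linux/macOS path rules

def class_label(imagePath, pathmod):
    # dirname drops the file name; basename then keeps the last folder, i.e. the class.
    return pathmod.basename(pathmod.dirname(imagePath))

print(class_label('./datasets/animals/cats/cat1.jpg', posixpath))    # cats
print(class_label('.\\datasets\\animals\\dogs\\dog1.jpg', ntpath))   # dogs
# ntpath accepts both '\\' and '/', so a mixed-separator path also works:
print(class_label('./datasets/animals\\pandas/panda1.jpg', ntpath))  # pandas
```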
