# Using the OS Module

<span>This notebook is a collection of little snippets of Python code, built around Python's OS module, that I have found useful for a variety of tasks. These tasks include removing files from directories, moving files from directories, parsing content from the notebook, appending content to files, changing file types, etc. Hopefully, you find some useful code below.</span>

### Import Preliminaries

In [33]:
# Import module
import os

### Get Current Directory

In [34]:
# Get the current directory
cwd = os.getcwd()
cwd

'/Users/kavi/Documents/DataScience/Pipelines'

### Find Path Function

Search the given directory and its subdirectories for the first instance of a specific file. Return the file path of this file.

In [35]:
# Define the find path function
def find_path(name, path):
    '''
    Search the given directory and its subdirectories for the first
    instance of a specific file. Return the file path of this file,
    or None if the file is not found.

    Parameters
    ----------
    name: name of the file (str)
    path: absolute path to the directory to search within (str)

    Example
    -------
    >>> find_path('10-15-17 Rescaling Features.ipynb',
    ...           '/Users/Kavi/Documents/DataScience')
    '''
    for root, dirs, files in os.walk(path):
        if name in files:
            return os.path.join(root, name)

In [36]:
# Run our find path function
find_path('10-15-17 Rescaling Features.ipynb',
          '/Users/Kavi/Documents/DataScience')

'/Users/Kavi/Documents/DataScience/Guides/10-15-17 Rescaling Features.ipynb'
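
One detail worth knowing is that `find_path` returns `None` when nothing matches, because the loop ends without reaching a `return`. The sketch below checks both outcomes against a throwaway directory. The folder and file names are made up for illustration, and the explicit `return None` is added only for readability.

```python
import os
import tempfile

def find_path(name, path):
    """Return the path of the first file called `name` under `path`, or None."""
    for root, dirs, files in os.walk(path):
        if name in files:
            return os.path.join(root, name)
    return None  # made explicit here; the original falls through to the same result

# Build a throwaway tree: <tmp>/Guides/notes.txt (hypothetical names)
with tempfile.TemporaryDirectory() as tmp:
    guides = os.path.join(tmp, 'Guides')
    os.makedirs(guides)
    with open(os.path.join(guides, 'notes.txt'), 'w') as f:
        f.write('hello')

    expected = os.path.join(guides, 'notes.txt')
    found = find_path('notes.txt', tmp)
    missing = find_path('does_not_exist.txt', tmp)

print(found == expected)  # True
print(missing)            # None
```

Checking the result with `if found is not None:` before using it avoids passing `None` to later file operations.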


### Breaking Down the Find Path Function

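The `os.walk(path)` function is pretty cool. Let's take a moment to break down what it returns. It is a generator that yields one `(root, dirs, files)` tuple for every directory in the tree: `root` is the path of the directory currently being visited, `dirs` is a list of the names of the subdirectories directly inside `root`, and `files` is a list of the names of the files directly inside `root`. The cell below stops after the first tuple, so it only shows the top-level directory.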

In [37]:
# Breaking Down this Function
path = '/Users/Kavi/Documents/DataScience'
name = 'README.md'
result = []

# Print the root, directories, and files in our path
for root, dirs, files in os.walk(path):
    print('\n\n'+'-'*15)
    print('Root:',root)
    print('\n\n'+'-'*15)
    print('Dirs:',dirs)
    print('\n\n'+'-'*15)
    print('Files:',files)
    print('\n\n'+'-'*15)
    print(os.walk(path))
    break



---------------
Root: /Users/Kavi/Documents/DataScience


---------------
Dirs: ['Predictive Analysis', 'Kaggle', 'FlashCards', 'Whitepapers', 'DateTime', 'Articles', 'Other', 'Competitions', 'Descriptive Analysis', 'Pipelines', 'Books', 'Stack Overflow', 'Visualizations', 'Notes', 'Guides', 'Economics', 'Tutorials', '.git', 'Brainstation', 'Community', 'Interviews', 'Techniques', 'SQL']


---------------
Files: ['.DS_Store', 'Portfolio.md', 'README.md', '.gitignore']


---------------
<generator object walk at 0x1128d0e08>
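
Because `os.walk` visits directories top-down by default, editing the `dirs` list in place controls which subdirectories it descends into. The following is a minimal sketch, assuming we want to skip a `.git` folder like the one listed above. The temporary layout is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one ordinary folder and one folder we want to skip
    os.makedirs(os.path.join(tmp, 'Guides'))
    os.makedirs(os.path.join(tmp, '.git', 'objects'))

    visited = []
    for root, dirs, files in os.walk(tmp):
        # Slice assignment edits the list os.walk is using, so '.git' is never entered
        dirs[:] = [d for d in dirs if d != '.git']
        visited.append(os.path.relpath(root, tmp))

print(sorted(visited))  # ['.', 'Guides']
```

Rebinding the name (`dirs = [...]`) would not work here, since `os.walk` only sees changes made to the original list object.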



### Find All Paths Function

Search the given directory and its subdirectories for every instance of a specific file. Return all of the file paths of this file as a list.

In [38]:
# Define the find all paths function
def find_all_paths_one_file(name, path):
    '''
    Search the given directory and its subdirectories for every
    instance of a specific file. Return all of the file paths of
    this file as a list.

    Parameters
    ----------
    name: name of the file (str)
    path: absolute path to the directory to search within (str)

    Example
    -------
    >>> find_all_paths_one_file('README.md','/Users/Kavi/Documents/DataScience')

    '''
    result = []
    for root, dirs, files in os.walk(path):
        if name in files:
            # os.path.join(root, name) is a string
            result.append(os.path.join(root, name))

    return result

In [39]:
# Run our "find all paths" function
find_all_paths_one_file('README.md','/Users/Kavi/Documents/DataScience')

['/Users/Kavi/Documents/DataScience/README.md',
 '/Users/Kavi/Documents/DataScience/Predictive Analysis/README.md',
 '/Users/Kavi/Documents/DataScience/Competitions/README.md',
 '/Users/Kavi/Documents/DataScience/Competitions/DonorsChoose Application/README.md',
 '/Users/Kavi/Documents/DataScience/Descriptive Analysis/README.md',
 '/Users/Kavi/Documents/DataScience/Pipelines/AWS Pipelines/README.md',
 '/Users/Kavi/Documents/DataScience/Notes/Readings/README.md',
 '/Users/Kavi/Documents/DataScience/Tutorials/README.md',
 '/Users/Kavi/Documents/DataScience/Tutorials/Tutorial - Luigi/data-engineering-101-master/README.md',
 '/Users/Kavi/Documents/DataScience/Tutorials/Tutorial - Luigi/data-engineering-101-master/topmodel/README.md',
 '/Users/Kavi/Documents/DataScience/Brainstation/Other/WebDev Class Slack/README.md',
 '/Users/Kavi/Documents/DataScience/Community/README.md',
 '/Users/Kavi/Documents/DataScience/Techniques/README.md',
 '/Users/Kavi/Documents/DataScience/SQL/README.md']


### Find All Paths for a List of Files
Search the given directory and its subdirectories for every instance of every file in a list. Return all of the file paths for every file as a list.

In [40]:
# Define the "find_all_paths_multi_files" function
def find_all_paths_multi_files(list_of_files, path):
    '''
    Search the given directory and its subdirectories for every
    instance of every file in a list. Return all of the file paths
    for every file as a single list.

    Parameters
    ----------
    list_of_files: names of the files we are searching for (list)
    path: absolute path to the directory to search within (str)

    Example
    -------
    >>> find_all_paths_multi_files(['README.md'],
    ...                            '/Users/Kavi/Documents/DataScience')

    '''
    result = []
    for name in list_of_files:
        for root, dirs, files in os.walk(path):
            if name in files:
                # os.path.join(root, name) is a string
                result.append(os.path.join(root, name))

    return result

In [41]:
# Generate a list of the files we are looking for
list_of_files = ['README.md',
                 '02-01-17 4 Time Saving Tricks in Pandas.ipynb']

# Run our "find_all_paths_multi_files" function
find_all_paths_multi_files(list_of_files,'/Users/Kavi/Documents/DataScience')

['/Users/Kavi/Documents/DataScience/README.md',
 '/Users/Kavi/Documents/DataScience/Predictive Analysis/README.md',
 '/Users/Kavi/Documents/DataScience/Competitions/README.md',
 '/Users/Kavi/Documents/DataScience/Competitions/DonorsChoose Application/README.md',
 '/Users/Kavi/Documents/DataScience/Descriptive Analysis/README.md',
 '/Users/Kavi/Documents/DataScience/Pipelines/AWS Pipelines/README.md',
 '/Users/Kavi/Documents/DataScience/Notes/Readings/README.md',
 '/Users/Kavi/Documents/DataScience/Tutorials/README.md',
 '/Users/Kavi/Documents/DataScience/Tutorials/Tutorial - Luigi/data-engineering-101-master/README.md',
 '/Users/Kavi/Documents/DataScience/Tutorials/Tutorial - Luigi/data-engineering-101-master/topmodel/README.md',
 '/Users/Kavi/Documents/DataScience/Brainstation/Other/WebDev Class Slack/README.md',
 '/Users/Kavi/Documents/DataScience/Community/README.md',
 '/Users/Kavi/Documents/DataScience/Techniques/README.md',
 '/Users/Kavi/Documents/DataScience/SQL/README.md',
 '/Us
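
The function above walks the whole directory tree once per file name. For long lists of names, one possible alternative is to walk the tree a single time and test each file against a set. The sketch below assumes that result order does not matter, since paths come back grouped by directory rather than by name. The function name `find_all_paths_single_pass` and the temporary files are hypothetical.

```python
import os
import tempfile

def find_all_paths_single_pass(list_of_files, path):
    """Walk `path` once and collect every file whose name is in `list_of_files`."""
    wanted = set(list_of_files)  # set membership checks are fast
    result = []
    for root, dirs, files in os.walk(path):
        for name in files:
            if name in wanted:
                result.append(os.path.join(root, name))
    return result

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout with two README.md files, one notebook, one unrelated file
    os.makedirs(os.path.join(tmp, 'sub'))
    for rel in ('README.md', os.path.join('sub', 'README.md'),
                os.path.join('sub', 'notes.ipynb'), 'other.txt'):
        with open(os.path.join(tmp, rel), 'w') as f:
            f.write('')

    matches = find_all_paths_single_pass(['README.md', 'notes.ipynb'], tmp)
    relative = sorted(os.path.relpath(p, tmp) for p in matches)

print(relative)
```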


### Importing Data From Text File

In [42]:
# Open a text file (which is a list of notebook names)
file = open('Data/Notebooks/notebooks.txt','r', encoding="utf-8")

# Read file
file.read()

'18-09-15 Model Stacking.ipynb\n18-09-10 Linear Regression.ipynb\n18-09-05 Dropping Features.ipynb\n18-09-03 Using TQDM.ipynb\n18-09-03 Isolation Forest Classifier.ipynb\n18-08-31 Confusion Matrices.ipynb\n18-08-02 Downsampling.ipynb\n18-08-02 Import Matlab Data.ipynb\n18-08-01 Plotting Residuals.ipynb\n18-08-01 Histograms.ipynb\n18-08-01 Dimensional Pivot Table.ipynb\n18-08-01 Heatmaps.ipynb\n18-08-01 Standardizations.ipynb\n18-08-01 Handling Missing Data.ipynb\n18-08-01 K-Nearest Neighbours Classifier.ipynb\n18-08-01 Random Forest Classifier.ipynb\n18-08-01 Decision Tree Classifier.ipynb\n18-08-01 Cross Validation and K-Folds.ipynb\n18-08-01 Train-Test Split.ipynb\n18-08-01 Recursive Feature Elimination.ipynb\n18-08-01 Random Grid Search.ipynb\n18-08-01 Full Grid Search.ipynb\n18-08-01 DBSCAN.ipynb\n18-08-01 Tools of a Data Scientist.ipynb\n18-07-31 Fizzbuzz.ipynb\n18-07-31 Regex in Python.ipynb\n18-07-31 Logistic Regression.ipynb\n18-07-29 Iris Analysis.ipynb\n18-07-29 Classificatio


### Import Data From Text File into a List

In [43]:
# Open a text file (which is a list of notebook names)
file = open('Data/notebooks.txt','r', encoding="utf-8")

# Read file lines instead
file.readlines()

['17-08-01 Cross Validation and K-Folds.ipynb\n',
 '18-07-03 Writing a File to AWS S3.ipynb\n',
 '18-07-03 Reading a File from AWS S3.ipynb\n',
 '18-07-03 Connecting to a Local Database.ipynb\n',
 '18-07-03 Connecting to a Local Database with SHH.ipynb\n',
 '17-11-07 Dimensional Pivot Table.ipynb\n',
 '18-06-06 Every Matplotlib Plot Linestyle.ipynb\n',
 '18-06-06 Every Matplotlib Plot Marker.ipynb\n',
 '17-08-05 Histograms.ipynb\n',
 '15-02-02 Barplots.ipynb\n',
 '17-10-16 Wordclouds in Python.ipynb\n',
 '17-12-04 Heatmaps.ipynb\n',
 '18-07-07 Styling DataFrames.ipynb\n',
 '18-07-07 Resampling Datetime.ipynb\n',
 '18-07-24 Converting Notebook to Slides.ipynb\n',
 '18-07-24 Label Encoding.ipynb\n',
 '17-10-15 Plotting Residuals.ipynb\n',
 '17-10-15 Correlation Matricesipynb\n',
 '18-07-04 Select Dtypes.ipynb\n',
 '18-03-28 Creating Dummy Variables.ipynb\n',
 '17-10-09 Standardizations.ipynb\n',
 '17-08-01 Random Grid Search.ipynb\n',
 '17-08-01 Full Grid Search.ipynb\n',
 '18-07-28 Note


### Second Method to Go From a Text File to a List

In [44]:
# Open a text file (which is a list of notebook names) # Second Method
with open('Data/notebooks.txt','r', encoding="utf-8") as f:
   
    # Use ".read().splitlines()" instead of  ".readlines()"
    mylist = f.read().splitlines()
    
# View list
mylist

['17-08-01 Cross Validation and K-Folds.ipynb',
 '18-07-03 Writing a File to AWS S3.ipynb',
 '18-07-03 Reading a File from AWS S3.ipynb',
 '18-07-03 Connecting to a Local Database.ipynb',
 '18-07-03 Connecting to a Local Database with SHH.ipynb',
 '17-11-07 Dimensional Pivot Table.ipynb',
 '18-06-06 Every Matplotlib Plot Linestyle.ipynb',
 '18-06-06 Every Matplotlib Plot Marker.ipynb',
 '17-08-05 Histograms.ipynb',
 '15-02-02 Barplots.ipynb',
 '17-10-16 Wordclouds in Python.ipynb',
 '17-12-04 Heatmaps.ipynb',
 '18-07-07 Styling DataFrames.ipynb',
 '18-07-07 Resampling Datetime.ipynb',
 '18-07-24 Converting Notebook to Slides.ipynb',
 '18-07-24 Label Encoding.ipynb',
 '17-10-15 Plotting Residuals.ipynb',
 '17-10-15 Correlation Matricesipynb',
 '18-07-04 Select Dtypes.ipynb',
 '18-03-28 Creating Dummy Variables.ipynb',
 '17-10-09 Standardizations.ipynb',
 '17-08-01 Random Grid Search.ipynb',
 '17-08-01 Full Grid Search.ipynb',
 '18-07-28 Notebook Snippets.ipynb',
 '18-07-29 Iris Analysis
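
The difference between the two methods is the trailing newline: `readlines()` keeps the `\n` on each entry, while `read().splitlines()` drops it. The following small sketch shows this on a throwaway file. The file name and its two entries are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'notebooks.txt')  # hypothetical file
    with open(sample, 'w', encoding='utf-8') as f:
        f.write('a.ipynb\nb.ipynb\n')

    # Method 1: readlines() keeps the newline characters
    with open(sample, 'r', encoding='utf-8') as f:
        with_newlines = f.readlines()

    # Method 2: read().splitlines() removes them
    with open(sample, 'r', encoding='utf-8') as f:
        without_newlines = f.read().splitlines()

    # A third option: iterate over the file and strip each line
    with open(sample, 'r', encoding='utf-8') as f:
        stripped = [line.rstrip('\n') for line in f]

print(with_newlines)     # ['a.ipynb\n', 'b.ipynb\n']
print(without_newlines)  # ['a.ipynb', 'b.ipynb']
```

The clean names from the second method can be passed directly to a function that compares against file names, whereas the entries from `readlines()` would need stripping first.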


### Copying Files to Another Directory

In [45]:
# Import more modules
from shutil import copyfile

# Generate a list of file paths with one file
file_paths = ['/Users/Kavi/Documents/DataScience/README.md']

# Copy every file in our list to the destination path
for filepath in file_paths:
    copyfile(filepath, '/Users/Kavi/Documents/DataScience/Pipelines/Data/copy_README.md')
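
With more than one path in the list, a fixed destination would be overwritten on every pass through the loop. One way around that, sketched below, is to build each destination from the source file's own name with `os.path.basename`. The `copy_` prefix follows the cell above, while the temporary folders and file names are assumptions for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical source and destination folders
    src_dir = os.path.join(tmp, 'src')
    dest_dir = os.path.join(tmp, 'dest')
    os.makedirs(src_dir)
    os.makedirs(dest_dir)

    # Create two small source files and remember their paths
    file_paths = []
    for name in ('README.md', 'Portfolio.md'):
        source = os.path.join(src_dir, name)
        with open(source, 'w') as f:
            f.write(name)
        file_paths.append(source)

    # Give every copy its own destination name so nothing is overwritten
    for filepath in file_paths:
        destination = os.path.join(dest_dir, 'copy_' + os.path.basename(filepath))
        shutil.copyfile(filepath, destination)

    copied = sorted(os.listdir(dest_dir))

print(copied)  # ['copy_Portfolio.md', 'copy_README.md']
```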


### Writing a File to a Directory

In [46]:
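# Open text file
with open('Data/sample_file.txt','w') as f2:
    
    # Write some text in the file
    f2.write('sample text')

# The "with" block closes the file automatically, so no f2.close() is needed

Note: Close your files when iterating over many of them so that their file handles are released and pending writes are flushed. A `with` block does this for you as soon as the block ends.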


### Appending Data to a File

Here we open an existing file, copy its contents into a new file, append some data to the copy, and save the result as a second file.

In [47]:
# Open an existing file as "f1"
with open('Data/sample_file.txt','r') as f1:
    # Create a new file as "f2"
    with open('Data/sample_file2.txt','w') as f2: 
        
        # Copy the text from the first file 
        f2.write(f1.read())
        
        # Write a new string into the file 
        f2.write("\n You are the second version of the same file")

# Both files are closed automatically when their "with" blocks end
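
If the goal is to add text to the same file rather than to a copy, opening it in append mode (`'a'`) is a common alternative. The following is a minimal sketch on a temporary file. The path and the appended text are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'sample_file.txt')  # hypothetical file

    # Create the file with some initial text
    with open(target, 'w') as f:
        f.write('sample text')

    # Mode 'a' keeps the existing content and writes at the end of the file
    with open(target, 'a') as f:
        f.write('\nan appended line')

    with open(target, 'r') as f:
        contents = f.read()

print(contents)
```

Mode `'w'` would have truncated the file first, which is why the cell above has to copy the old text across before adding the new line.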

<br><br>
### Deleting Data in a Directory

In [48]:
# Remove the text files from our data directory
os.remove("Data/sample_file2.txt")
os.remove("Data/sample_file.txt")
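
`os.remove` raises `FileNotFoundError` if the file is already gone, which happens when this cell is run twice. One way to make the cleanup repeatable is sketched below. The helper name `remove_if_exists` is hypothetical and not part of the `os` module.

```python
import os
import tempfile

def remove_if_exists(path):
    """Delete `path` and return True, or return False if it was not there."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'sample_file.txt')  # hypothetical file
    with open(target, 'w') as f:
        f.write('sample text')

    first = remove_if_exists(target)   # file exists, so it is deleted
    second = remove_if_exists(target)  # file is already gone

print(first, second)  # True False
```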

Author: Kavi Sekhon