# Using Files and Directories in Python

Python has powerful tools for working with your operating system's filesystem. These tools are useful when you need to do complex filesystem operations, but also when you simply want your code to be more robust when handling files.

You can find more detailed information in the Python documentation: https://docs.python.org/3/library/filesys.html
A nice article on the use of these tools is here: https://realpython.com/working-with-files-in-python/

Most of what you need is in the "os" module, which is a standard part of Python. If you are using Python version 3.4 or newer, there is also a module called "pathlib", which offers an object-oriented way to work with paths.
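
As a small taste of "pathlib", here is a minimal sketch that builds a path and takes it apart again. The path `Extra/example.ipynb` is a made-up name used only for illustration. It does not need to exist, because `PurePosixPath` only manipulates the text of the path and never touches the disk.

```python
from pathlib import PurePosixPath

# Hypothetical path, used only for illustration; nothing is read from disk.
p = PurePosixPath("Extra") / "example.ipynb"

print(p)          # Extra/example.ipynb
print(p.name)     # example.ipynb
print(p.suffix)   # .ipynb
print(p.stem)     # example
print(p.parent)   # Extra
```

In everyday code you would normally use `pathlib.Path` instead, which adds methods such as `exists()` and `iterdir()` that do look at the filesystem.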

In [63]:
import os

## Using some of the os features.

First, let us see which directory your Python script, or notebook in this case, is running in with `os.getcwd()`, and then list the files in one of its sub-directories with `os.listdir()`. If you are on Python 3.5 or later, you can also use `os.scandir()`, which works similarly but returns an iterator instead of a list.

In [64]:
print(os.getcwd())
# Without an argument, os.listdir() shows all files and directories in the current dir.
files = os.listdir("Extra") 
print("Using os.listdir()")
for f in files:
    print(f)

print("")
print("Using os.scandir()")
for f in os.scandir("Extra"):
    print(f.name, " - is a file " if f.is_file() else "- is a dir"  )
# Note that os.scandir() returns an iterator, and each item it yields is an os.DirEntry
# object that holds more information than just the file or directory name.

/Users/maurik/Library/CloudStorage/OneDrive-USNH/Phys601/Repo/Notebooks
Using os.listdir()
wave_packet_exercise.ipynb
Coins_puzzle.ipynb
Ellipse.ipynb

Using os.scandir()
wave_packet_exercise.ipynb  - is a file 
Coins_puzzle.ipynb  - is a file 
Ellipse.ipynb  - is a file 
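
To show what extra information the objects from `os.scandir()` carry, here is a minimal sketch that runs in a temporary directory, so it does not depend on the `Extra` directory. The names `notes.txt` and `subdir` are invented for this example. Using `os.scandir()` in a `with` statement, as done here, needs Python 3.6 or later.

```python
import os
import tempfile

info = {}
with tempfile.TemporaryDirectory() as tmp:
    # Create one small file and one sub-directory to look at (made-up names).
    with open(os.path.join(tmp, "notes.txt"), "w") as fh:
        fh.write("hello")
    os.mkdir(os.path.join(tmp, "subdir"))

    # The with-statement closes the scandir iterator when the block ends.
    with os.scandir(tmp) as entries:
        for entry in entries:
            # Each entry knows its name, its type, and its stat() information.
            info[entry.name] = (entry.is_dir(), entry.stat().st_size)

for name in sorted(info):
    is_dir, size = info[name]
    kind = "dir" if is_dir else "file"
    print(f"{name}: {kind}, {size} bytes")
```

The file should report 5 bytes, the length of the text written to it. The size reported for a directory depends on the operating system.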


## Extracting more information

The submodule `os.path` has a number of useful functions to extract more information from a path. Note that you have to pass a correct path to these functions: a relative path is interpreted relative to the current working directory.

Examples are `os.path.realpath()`, which returns the canonical absolute path of a file or directory with any symbolic links resolved, and `os.path.isfile()`, `os.path.isdir()` and `os.path.islink()`, which identify what type of entry it is.

Also useful are `os.path.join()`, which correctly combines directories and filenames using the separator of your operating system, `os.path.basename()`, which extracts the filename from a full directory + filename string, and `os.path.dirname()`, which gives the directory part.

In [65]:
full_dir_with_file = os.path.realpath(os.path.join("Extra", files[0]))
print(f"Full path: {full_dir_with_file}")
print(f"File name only: {os.path.basename(full_dir_with_file)}")
print(f"Directory: {os.path.dirname(full_dir_with_file)}")
print(f"Relative path: {os.path.relpath(full_dir_with_file)}")
print(f"Does this file exist? : {os.path.exists(full_dir_with_file)}")

Full path: /Users/maurik/Library/CloudStorage/OneDrive-USNH/Phys601/Repo/Notebooks/Extra/wave_packet_exercise.ipynb
File name only: wave_packet_exercise.ipynb
Directory: /Users/maurik/Library/CloudStorage/OneDrive-USNH/Phys601/Repo/Notebooks/Extra
Relative path: Extra/wave_packet_exercise.ipynb
Does this file exist? : True
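
The type checks `os.path.isfile()`, `os.path.isdir()` and `os.path.islink()` are not used in the cell above, so here is a minimal sketch of them. It works in a temporary directory, and the file names `data.txt` and `nope.txt` are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "data.txt")   # made-up file name
    with open(file_path, "w") as fh:
        fh.write("42\n")

    checks = {
        "file is a file": os.path.isfile(file_path),
        "file is a dir": os.path.isdir(file_path),
        "file is a link": os.path.islink(file_path),
        "tmp is a dir": os.path.isdir(tmp),
        # A path that does not exist gives False instead of raising an error.
        "missing file is a file": os.path.isfile(os.path.join(tmp, "nope.txt")),
    }

for label, result in checks.items():
    print(f"{label}: {result}")
```

Because these functions return `False` for a path that does not exist, they are a convenient way to guard code that would otherwise fail on a wrong filename.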


### Advanced use: os.walk()

Sometimes you want to search the file system for something. You could do this with `os.listdir()`, check which entries are directories, and recurse (i.e. call your function from within itself), but that is a lot of coding. Instead, you can use `os.walk()`, which returns a generator that does the recursion for you.

Each item yielded by `os.walk()` is a tuple with 3 entries. Entry 0 is the current directory, entry 1 is a list of the sub-directories in it, and entry 2 is a list of the files in it.

Let us create a bit of code that finds all the files called "README.md" in the current directory and all those below, and displays their full path.

In [66]:
for curdir, subdirs, files in os.walk("."):
    if "README.md" in files:
        full_path = os.path.realpath(os.path.join(curdir,"README.md"))
        print(f"Found: {full_path}")

Found: /Users/maurik/Library/CloudStorage/OneDrive-USNH/Phys601/Repo/Notebooks/README.md
Found: /Users/maurik/Library/CloudStorage/OneDrive-USNH/Phys601/Repo/Notebooks/Symbolic-computation-Python-master/README.md
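
To try `os.walk()` without depending on what is on your disk, the sketch below builds a small throw-away directory tree and walks it. The directory names `a` and `skip_me` are invented for the example. It also shows one optional trick: removing a name from the sub-directory list in place tells `os.walk()` not to descend into that directory.

```python
import os
import tempfile

found = []
with tempfile.TemporaryDirectory() as top:
    # Build: top/README.md, top/a/README.md and top/skip_me/README.md
    for sub in ("", "a", "skip_me"):
        os.makedirs(os.path.join(top, sub), exist_ok=True)
        with open(os.path.join(top, sub, "README.md"), "w") as fh:
            fh.write("# test\n")

    for curdir, subdirs, files in os.walk(top):
        # Editing subdirs in place prunes the walk (default top-down mode only).
        if "skip_me" in subdirs:
            subdirs.remove("skip_me")
        if "README.md" in files:
            full_path = os.path.join(curdir, "README.md")
            found.append(os.path.relpath(full_path, top))

print(sorted(found))
```

Only two of the three README.md files should be reported, because the walk never enters `skip_me`. Pruning like this can save a lot of time when a tree contains large directories you do not care about.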


More flexible filename matching can be accomplished with the `fnmatch` module. It has two useful functions: `fnmatch.fnmatch()`, which tests a single name against a pattern, and `fnmatch.filter()`, which returns the names from a list that match the pattern. Let's use those to list all the "*.md" files, instead of only the "README.md" files.

In [67]:
import fnmatch
for curdir, subdirs, files in os.walk("."):
    for f in files:
        if fnmatch.fnmatch(f, "*.md"):
            full_path = os.path.realpath(os.path.join(curdir,f))
            print(f"Found: {full_path}")

Found: /Users/maurik/Library/CloudStorage/OneDrive-USNH/Phys601/Repo/Notebooks/README.md
Found: /Users/maurik/Library/CloudStorage/OneDrive-USNH/Phys601/Repo/Notebooks/Symbolic-computation-Python-master/README.md


In [68]:
import fnmatch
for curdir, subdirs, files in os.walk("."):
    match_files = fnmatch.filter(files, "*.md")
    for f in match_files:
        full_path = os.path.realpath(os.path.join(curdir,f))
        print(f"Found: {full_path}")

Found: /Users/maurik/Library/CloudStorage/OneDrive-USNH/Phys601/Repo/Notebooks/README.md
Found: /Users/maurik/Library/CloudStorage/OneDrive-USNH/Phys601/Repo/Notebooks/Symbolic-computation-Python-master/README.md
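
The pattern above can be wrapped in a small reusable function. The sketch below is one possible way to do that; the function name `find_files` and the test files are made up for this example. For comparison it also runs the same search with `Path.rglob()` from "pathlib", which is assumed here to be an acceptable alternative for simple patterns like this one.

```python
import fnmatch
import os
import tempfile
from pathlib import Path


def find_files(top, pattern):
    """Return sorted paths, relative to top, of files whose name matches pattern."""
    matches = []
    for curdir, subdirs, files in os.walk(top):
        for name in fnmatch.filter(files, pattern):
            matches.append(os.path.relpath(os.path.join(curdir, name), top))
    return sorted(matches)


with tempfile.TemporaryDirectory() as top:
    # Small throw-away tree with made-up names.
    os.makedirs(os.path.join(top, "docs"))
    for rel in ("README.md", "script.py", os.path.join("docs", "notes.md")):
        Path(top, rel).write_text("test\n")

    with_walk = find_files(top, "*.md")
    with_rglob = sorted(str(p.relative_to(top)) for p in Path(top).rglob("*.md"))

print(with_walk)
print(with_walk == with_rglob)
```

Both searches should find the same two "*.md" files and skip `script.py`. The `os.walk()` version is a bit longer, but it gives you the chance to prune directories or apply several patterns in a single pass.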
