# Interaction with the file system

### The standard library

Python ships with a large collection of pre-installed modules, the standard library, which greatly extends what the language can do out of the box.

Visit the [Python Module of the Week](https://pymotw.com/3/) website to get a good overview. All these modules are directly shipped with Python, hence «batteries included».

### Importing modules

At the beginning of most Python files, you will see a list of `import` statements.

With the `import` statement we tell Python to look for a module and bind it to a name, which we can then use like any other variable.

Python looks for modules in the following **order**:

1. look in the directory of the running script (or the current working directory in an interactive session)
2. look in the paths specified by the `PYTHONPATH` environment variable, if this variable exists
3. look in the standard library path (`lib/python3.x/`)
4. look in the path where all external modules, including those from [pypi.org](https://pypi.org), are installed (usually in `lib/python3.x/site-packages`)
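The search order above can be inspected from within Python. The following is a minimal sketch that uses only the standard library; the helper name `locate_module` is made up for this illustration.

```python
import importlib.util
import sys


def locate_module(name):
    """Return where a top-level module would be loaded from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For built-in modules such as 'sys' the origin is the string 'built-in', not a file
    return spec.origin


# sys.path lists the directories in the order in which they are searched
for position, directory in enumerate(sys.path):
    print(position, repr(directory))

print(locate_module("json"))              # a file inside the standard library path
print(locate_module("no_such_module_x"))  # None
```
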

In [None]:
import os
print("the variable 'os' contains a module:", os)

In [None]:
import click  # a module installed from pypi.org
print("'click' can be found here: ", click)

There is a module `my_hello_world` in the subdirectory `my_modules`.
The following import raises a `ModuleNotFoundError`, because `my_modules` is not on the search path:

In [None]:
import my_hello_world

We can tell Python to look in the local folder by using the `from <folder> import <module>` syntax:

In [None]:
from my_modules import my_hello_world
my_hello_world.say_hello("World!")

`sys.path` tells us where Python is looking for modules:

In [None]:
import sys
sys.path

Because the interpreter is already running, setting the `PYTHONPATH` variable now would no longer have any effect, but we can change the content of `sys.path`:

In [None]:
# put "my_modules" in front of everything else
sys.path.insert(0, "my_modules")

Now we can import the module directly:

In [None]:
import my_hello_world
my_hello_world.say_hello("This", "is", "Python!")

Often, the module name is rather long to type, so we give it an **alias**:

In [None]:
import my_hello_world as mhw
mhw.say_hello("this", "works", "too!")

### The `sys` module

This module provides a lot of information about the Python interpreter itself.

In [None]:
sys.version

In [None]:
sys.version_info

A typical example of how to prevent a script from being executed with the **wrong Python interpreter**:

In [None]:
if sys.version_info < (3,6):
    sys.exit('Sorry, Python < 3.6 is not supported')
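The check above works because `sys.version_info` is a named tuple that compares element by element with plain tuples. A small sketch of why tuples, and not version strings, should be compared; the constant name `MINIMUM` is made up for this illustration.

```python
import sys

MINIMUM = (3, 6)  # same threshold as in the cell above

# version_info compares element by element with a plain tuple
is_supported = sys.version_info >= MINIMUM
print("major:", sys.version_info.major, "minor:", sys.version_info.minor)
print("supported:", is_supported)

# Compare tuples, not strings: as strings, "3.10" sorts before "3.6"
print("3.10" < "3.6")    # True, which is the wrong answer for versions
print((3, 10) < (3, 6))  # False
```
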

In [None]:
sys.executable

### A few Jupyter tricks

**Put a question mark `?` directly after any function, method or module name** and execute the cell to see its so-called _docstring_.

In [None]:
import os
os?

This is especially handy if you can't remember which parameters a function expects:

In [None]:
print?


**use Jupyter’s TAB completion to list all methods**

Enter the following cell, then press the TAB key after the dot: the available names appear in a vertical list.

In [None]:
os.path.
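The `?` syntax and TAB completion only work in Jupyter and IPython. In a plain Python script or interpreter, roughly the same information is available through `help()`, `dir()` and the `inspect` module. A minimal sketch:

```python
import inspect
import os

# help(os.path.join) would print the full documentation interactively;
# inspect.getdoc returns the same docstring as a string
first_line = inspect.getdoc(os.path.join).splitlines()[0]
print(first_line)

# the call signature, useful when you forgot the parameters
print(inspect.signature(os.path.join))

# roughly what TAB completion shows: the public names in os.path
public_names = [name for name in dir(os.path) if not name.startswith("_")]
print(public_names[:10])
```
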

### More, more modules! The Python Package Index PyPI

Python already comes with many pre-installed modules, but there is much more! The Python Package Index (https://pypi.org) hosts thousands of additional modules which cover a very wide range of everyday problems. Simply use the `pip` command line tool, which ships with Python, to install them.

In Jupyter, a line that starts with an exclamation mark `!` is passed to the system shell instead of being run as Python. **Example:**

In [None]:
!pip install pandas

The following does the same, but uses the `-m` option to run the `pip` module with a specific Python interpreter:

In [None]:
!python3 -m pip install pandas
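Note that `python3` on the shell's search path is not necessarily the interpreter that runs the notebook. One way to make sure `pip` belongs to the running interpreter is to build the command from `sys.executable`. The sketch below only asks `pip` for its version and installs nothing.

```python
import subprocess
import sys

# Build the command from sys.executable so that pip belongs to the running interpreter
command = [sys.executable, "-m", "pip", "--version"]

# check=False: do not raise if pip happens to be missing in this environment
result = subprocess.run(command, capture_output=True, text=True, check=False)
print(result.stdout or result.stderr)
```

Recent versions of Jupyter also offer the `%pip install <package>` magic, which is meant to address the same problem.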

**list installed modules**

Sometimes you need to know which packages you've installed so far, and which versions you used. If you distribute your script, you want to put these in a `requirements.txt` file.

In [None]:
!pip freeze > requirements.txt

Later, other people can install exactly the same modules in exactly the same versions, like this:

`pip install -r requirements.txt`
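The installed packages can also be listed from within Python. A minimal sketch using `importlib.metadata` from the standard library (Python 3.8 or newer is assumed); the output resembles that of `pip freeze`, but is not guaranteed to be identical.

```python
from importlib import metadata  # part of the standard library since Python 3.8

# Collect "name==version" lines, similar in spirit to the output of `pip freeze`
installed = sorted(
    f"{dist.metadata['Name']}=={dist.version}"
    for dist in metadata.distributions()
    if dist.metadata["Name"]  # skip broken installations without a name
)
print("\n".join(installed[:10]))
```
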

## Interaction with the file system: the `os` module

**current working directory**

In [None]:
os.getcwd()

**all files in a directory**

In [None]:
os.listdir('.')

**create, rename and delete a file**

In [None]:
!touch _testfile

In [None]:
os.path.exists('_testfile')

In [None]:
os.rename('_testfile', 'testfile')

In [None]:
os.path.exists('_testfile')

In [None]:
# the file was renamed above, so we have to remove it under its new name
os.remove('testfile')
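The `!touch` command needs a Unix shell. The same create, rename and delete sequence can be written in pure Python. This sketch works in a temporary directory, so it leaves nothing behind:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    old_name = os.path.join(workdir, "_testfile")
    new_name = os.path.join(workdir, "testfile")

    # opening a file in write mode creates it (and would truncate an existing one)
    with open(old_name, "w"):
        pass

    os.rename(old_name, new_name)
    exists_after_rename = (os.path.exists(old_name), os.path.exists(new_name))
    print(exists_after_rename)  # (False, True)

    os.remove(new_name)
    exists_after_remove = os.path.exists(new_name)
    print(exists_after_remove)  # False
```
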

### setting file access permissions: `chmod`

In [None]:
!touch _test_file_permissions

In [None]:
os.stat('_test_file_permissions')

In [None]:
os.stat('_test_file_permissions').st_mode

get the octal representation of the file permission

In [None]:
oct(os.stat('_test_file_permissions').st_mode)

shorten the octal representation

In [None]:
oct(os.stat('_test_file_permissions').st_mode & 0o777)

change file permissions

In [None]:
os.chmod('_test_file_permissions', 0o666)
oct(os.stat('_test_file_permissions').st_mode & 0o777)

In [None]:
os.remove('_test_file_permissions')
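Instead of octal literals, the permission bits can be spelled out with the constants of the `stat` module, which also converts a mode into the notation known from `ls -l`. A minimal sketch, assuming a POSIX system (on Windows, `os.chmod` only honours the read-only flag):

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "_test_file_permissions")
    with open(path, "w"):
        pass

    # read and write for owner, group and others: the same bits as 0o666
    rw_for_all = (stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP
                  | stat.S_IROTH | stat.S_IWOTH)
    os.chmod(path, rw_for_all)

    mode = os.stat(path).st_mode
    # like `mode & 0o777`, but S_IMODE also keeps the setuid, setgid and sticky bits
    permission_bits = stat.S_IMODE(mode)
    symbolic = stat.filemode(mode)  # the notation `ls -l` uses
    print(oct(permission_bits), symbolic)
```
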

### change file ownership: `chown`

In [None]:
!touch _test_file_ownership

In [None]:
os.stat('_test_file_ownership').st_uid

In [None]:
os.stat('_test_file_ownership').st_gid

In [None]:
os.getgroups()

In [None]:
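# 400 is only an example group ID: unless you run as root, use one of the IDs listed by os.getgroups()
os.chown('_test_file_ownership', os.getuid(), 400)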

In [None]:
os.stat('_test_file_ownership').st_gid

In [None]:
os.remove('_test_file_ownership')
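`st_uid` and `st_gid` are just numbers. On Unix systems, the `pwd` and `grp` modules translate them into user and group names. The helper `owner_names` below is made up for this illustration and falls back to the numeric ID when no name is registered, which can happen inside containers.

```python
import grp
import os
import pwd


def owner_names(path):
    """Return (user, group) names for path, falling back to the numeric IDs."""
    info = os.stat(path)
    try:
        user = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:  # uid without an entry in the user database
        user = str(info.st_uid)
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:  # gid without an entry in the group database
        group = str(info.st_gid)
    return user, group


print(owner_names("."))
```
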

### working with directories

In [None]:
os.mkdir('tmp')

In [None]:
os.makedirs('tmp2/some/more/dirs')

use `os.path.join` to join path components with the correct separator for the operating system:

In [None]:
long_path = os.path.join('tmp3/','even/more', 'dirs')
print(long_path)

In [None]:
os.makedirs(long_path)
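Running `os.makedirs` a second time on the same path raises a `FileExistsError`, which is easy to trip over when a notebook cell is executed twice. The `exist_ok=True` parameter avoids that. A short sketch in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    nested = os.path.join(workdir, "tmp3", "even", "more", "dirs")
    os.makedirs(nested)

    try:
        os.makedirs(nested)  # second call: the leaf directory already exists
    except FileExistsError:
        second_call_failed = True
    else:
        second_call_failed = False

    os.makedirs(nested, exist_ok=True)  # no error, nothing changes
    still_there = os.path.isdir(nested)

print(second_call_failed, still_there)  # True True
```
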

remove a single (empty) folder

In [None]:
os.rmdir('tmp')

**Remove empty nested folders**: `os.removedirs` removes the innermost folder and then every parent folder that has become empty:

In [None]:
os.removedirs(long_path)

But: does it?

In [None]:
!touch tmp2/this_file_will_survive

In [None]:
os.removedirs('tmp2/some/more/dirs')

Not quite. It removes the empty subfolders but **silently stops** at `tmp2`, because that folder still contains a file:

In [None]:
os.listdir('tmp2')

**Conclusion: the `os` module is not always the best solution, look for alternatives**

In our case, the `shutil` module, which is also part of the standard library, does it right:

In [None]:
import shutil
shutil.rmtree('tmp2', ignore_errors=True)
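The difference between the two functions can be reproduced in isolation. The sketch below rebuilds the situation in a temporary directory. Note that it calls `shutil.rmtree` without `ignore_errors=True`, so that real problems (for example missing permissions) would still raise an exception.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    top = os.path.join(workdir, "tmp2")
    leaf = os.path.join(top, "some", "more", "dirs")
    os.makedirs(leaf)
    with open(os.path.join(top, "this_file_will_survive"), "w"):
        pass

    os.removedirs(leaf)  # prunes dirs, more and some, then stops at tmp2
    after_removedirs = os.listdir(top)
    print(after_removedirs)  # ['this_file_will_survive']

    shutil.rmtree(top)  # deletes tmp2 together with its content
    after_rmtree = os.path.exists(top)
    print(after_rmtree)  # False
```
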

### recursively walk a tree

In [None]:
os.makedirs('walk/down/the/tree')

In [None]:
!touch walk/walk01
!touch walk/walk02
!touch walk/down/down01
!touch walk/down/down02
!touch walk/down/the/tree/tree01
!touch walk/down/the/tree/tree02

In [None]:
for dp, dn, filenames in os.walk('walk'):
    for filename in filenames:
        print(os.path.join(dp, filename))

This works, but it is a bit cumbersome, since we have to join the directory path `dp` with every file name again using `os.path.join`.
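One thing `os.walk` does offer is control over the traversal: each step yields the directory path, the list of subdirectory names and the list of file names, and removing an entry from the subdirectory list in place keeps `os.walk` from descending into it. A minimal sketch with a smaller tree in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    root = os.path.join(workdir, "walk")
    os.makedirs(os.path.join(root, "down", "the", "tree"))
    for relative in ("walk01", "down/down01", "down/the/tree/tree01"):
        with open(os.path.join(root, *relative.split("/")), "w"):
            pass

    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # removing an entry from dirnames in place stops os.walk from descending into it
        if "the" in dirnames:
            dirnames.remove("the")
        for filename in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, filename), root))

# normalise the separator so the output looks the same on every platform
found = sorted(path.replace(os.sep, "/") for path in found)
print(found)  # ['down/down01', 'walk01']
```
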

**Alternative: use the `pathlib` module**

The `pathlib` module, which we are going to discuss next, offers a nice alternative to `os.walk`. Its `rglob()` method takes a glob pattern as a filter (here `'*'`, which matches everything) and searches all subdirectories recursively:

In [None]:
import pathlib

for file in pathlib.Path("walk").rglob('*'):
    if file.is_file():
        print(file)

In [None]:
for file in pathlib.Path("walk").rglob('*01'):
    if file.is_file():
        print(file)
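For comparison, `glob()` uses the same pattern syntax but only looks at the direct children of a directory, while `rglob()` descends into all subdirectories. A short sketch with a reduced version of the tree, built in a temporary directory:

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    root = pathlib.Path(workdir) / "walk"
    (root / "down" / "the" / "tree").mkdir(parents=True)
    for relative in ("walk01", "walk02", "down/down01", "down/the/tree/tree01"):
        (root / relative).touch()

    # glob() only looks at the direct children of root
    direct = sorted(p.name for p in root.glob("*01"))
    # rglob() also descends into all subdirectories
    recursive = sorted(p.relative_to(root).as_posix() for p in root.rglob("*01"))

print(direct)     # ['walk01']
print(recursive)  # ['down/down01', 'down/the/tree/tree01', 'walk01']
```
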

In [None]:
shutil.rmtree("walk")