# Interacting with the Filesystem

In this notebook we will focus on how to navigate and interact with the filesystem using libraries such as `os` and `pathlib`.

## 00. Getting Set Up

In [10]:
from pathlib import Path
from typing import Any, Iterable, Iterator

---

## 01. Introduction

A filesystem is the structure an operating system uses to organize, store, and retrieve data on a storage device, typically as a hierarchy of directories and files.

In [11]:
# what is a path
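
A path is a textual description of where a file or directory lives in the filesystem. The sketch below uses two made-up locations (neither is read from disk) to show how the same idea is written on POSIX systems and on Windows.

```python
from pathlib import PurePosixPath, PureWindowsPath

# "Pure" paths only manipulate the text of a path; they never touch the disk.
# Both locations below are hypothetical and used for illustration only.
posix_path = PurePosixPath("/home/user/data/wave.csv")
windows_path = PureWindowsPath("C:/Users/user/data/wave.csv")

print(posix_path)    # /home/user/data/wave.csv
print(windows_path)  # C:\Users\user\data\wave.csv
```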

In [12]:
# relative path
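
A relative path has no root, so it is interpreted starting from the current working directory. A minimal sketch, assuming a hypothetical location `data/files/wave.csv` that does not need to exist:

```python
from pathlib import Path

# A relative path does not start at the filesystem root.
# "data/files/wave.csv" is a hypothetical location used for illustration.
relative = Path("data/files/wave.csv")

print(relative.is_absolute())  # False

# Joining it onto the working directory shows where it currently points.
print(Path.cwd() / relative)
```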

In [13]:
# absolute path
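
An absolute path starts at the filesystem root, so it points to the same location no matter where the code runs from. The sketch below turns a hypothetical directory name, `data`, into an absolute path with `.resolve()`; the directory does not need to exist.

```python
from pathlib import Path

# `.resolve()` anchors a path at the filesystem root.
# "data" is a hypothetical directory name used for illustration.
absolute = Path("data").resolve()

print(absolute)
print(absolute.is_absolute())
```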

In [14]:
# directory
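
A directory is a container that holds files and other directories. The following sketch builds one inside a temporary location, so nothing in the project is modified; the names `files`, `wave.csv`, and `nested` are hypothetical.

```python
import tempfile
from pathlib import Path

# Work inside a throwaway directory that is deleted when the block ends.
with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp) / "files"  # hypothetical directory name
    directory.mkdir()

    # Put one file and one sub-directory inside it.
    (directory / "wave.csv").touch()
    (directory / "nested").mkdir()

    is_directory = directory.is_dir()
    contents = sorted(item.name for item in directory.iterdir())

print(is_directory)
print(contents)
```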

In [15]:
# what is a working directory
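
The working directory is the location that relative paths are resolved against. A minimal sketch, assuming it is acceptable to change the working directory briefly and then restore it; `wave.csv` is a hypothetical file name that is never created.

```python
import os
import tempfile
from pathlib import Path

original = Path.cwd()  # the working directory when this cell starts

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)  # move the working directory to a throwaway location
    try:
        moved = Path.cwd()
        # The same relative path now resolves to a location inside `tmp`.
        resolved = Path("wave.csv").resolve()  # hypothetical file name
    finally:
        os.chdir(original)  # always restore the original working directory

print(original)
print(moved)
print(resolved)
```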

In [16]:
# file
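
A file is a named piece of data stored inside a directory. The sketch below writes and reads a small file in a temporary directory; the file name and its contents are made up for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    file = Path(tmp) / "wave.csv"  # hypothetical file name

    # Writing text creates the file; the contents here are made up.
    file.write_text("time,height\n0,1.5\n")

    is_file = file.is_file()
    text = file.read_text()
    size_in_bytes = file.stat().st_size

print(is_file)
print(text)
print(size_in_bytes)
```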

---

## 02. Pathlib

There are several Python libraries that simplify interacting with the filesystem. In this workshop we will use `pathlib`.

In [17]:
# we can use pathlib to more easily handle navigating through the filesystem
p = Path("..")

In [18]:
# we can use it to check if the path exists, is a file, or is a directory
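
The three checks can be compared side by side. This sketch uses a temporary directory rather than `p`, so the results do not depend on where the notebook runs; `wave.csv` and `missing.csv` are hypothetical names.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    existing_file = root / "wave.csv"  # hypothetical file, created below
    existing_file.touch()
    missing = root / "missing.csv"     # hypothetical file, never created

    # Record (exists, is_file, is_dir) for each kind of path.
    checks = {
        "directory": (root.exists(), root.is_file(), root.is_dir()),
        "file": (existing_file.exists(), existing_file.is_file(), existing_file.is_dir()),
        "missing": (missing.exists(), missing.is_file(), missing.is_dir()),
    }

for name, (exists, is_file, is_dir) in checks.items():
    print(f"{name}: exists={exists}, is_file={is_file}, is_dir={is_dir}")
```

Note that a path which does not exist is neither a file nor a directory, so all three checks return `False` for it.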

In [19]:
# we can split it up into different types of paths
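
A path can be broken into its components with a handful of attributes. A minimal sketch on a hypothetical POSIX-style path:

```python
from pathlib import PurePosixPath

# Hypothetical location; a pure path is never looked up on disk.
path = PurePosixPath("/project/data/files/wave.csv")

print(path.parts)   # every component, including the root
print(path.anchor)  # the root the path starts from
print(path.name)    # the final component
print(path.stem)    # the final component without its suffix
print(path.suffix)  # the file extension, including the dot
```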

In [20]:
# we can access parent directories
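
Parent directories are available through `.parent` (one level up) and `.parents` (every ancestor). A short sketch on a hypothetical path:

```python
from pathlib import PurePosixPath

# Hypothetical location used for illustration only.
path = PurePosixPath("/project/data/files/wave.csv")

print(path.parent)         # /project/data/files
print(path.parent.parent)  # /project/data
print(path.parents[2])     # /project

# `.parents` lists every ancestor, nearest first, ending at the root.
ancestors = [str(ancestor) for ancestor in path.parents]
print(ancestors)
```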

In [21]:
# we can construct child directories
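
Child paths are built with the `/` operator or with `.joinpath(...)`. The sketch below uses a hypothetical root, `/project`, and only manipulates text, so nothing is created on disk.

```python
from pathlib import PurePosixPath

root = PurePosixPath("/project")  # hypothetical project root

# The `/` operator appends one component at a time.
child = root / "data" / "files" / "wave.csv"

# `.joinpath` does the same with several components in one call.
same = root.joinpath("data", "files", "wave.csv")

print(child)
print(child == same)
```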

### Activity

Create a `pathlib.Path` that references the file `wave.csv` in `<root>/data/files/` and validate the file exists.

In [22]:
filepath = ...

---

## 03. Programmatically Trawling

Retrieving and processing files is a common part of data wrangling. Listing each file by hand quickly becomes burdensome, but we can trawl through the filesystem programmatically using `pathlib` and pattern matching.

Pattern matching is the process of finding strings that match a specified pattern. `pathlib` uses glob patterns, which are shell-style wildcards such as `*` and `**`. These are simpler than, and distinct from, full regular expressions, which are covered in the Python how-to below.

https://docs.python.org/3/howto/regex.html

In [23]:
# let's manually access all the files in `<root>/data/files`

In [24]:
# we can use glob pattern matching to access files automatically
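
The difference between listing files by hand and matching them with `.glob` can be sketched as follows. The directory layout and file names are hypothetical and are created in a temporary location rather than in the project's `data` directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    files_dir = Path(tmp) / "data" / "files"
    files_dir.mkdir(parents=True)
    for name in ("wave.csv", "tide.csv", "notes.txt"):  # hypothetical files
        (files_dir / name).touch()

    # Manual: every file has to be written out by hand.
    manual = [files_dir / "tide.csv", files_dir / "wave.csv"]

    # Automatic: the pattern "*.csv" matches every CSV file in the directory.
    matched = sorted(files_dir.glob("*.csv"))

    same_result = manual == matched
    names = [path.name for path in matched]

print(names)
print(same_result)
```

The results are sorted because `.glob` makes no promise about the order in which matches are returned.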

A `Generator` is an iterator that produces its items lazily: each time it is advanced, it runs until the next `yield`, hands back one item, and pauses.

In [32]:
# you can also create your own iterables that behave like generators
class CustomGenerator:
    def __init__(self, items: Iterable) -> None:
        self.items = items

    def __iter__(self) -> Iterator[Any]:
        # `yield` makes `__iter__` a generator function, so items are produced lazily
        for item in self.items:
            yield item
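
The lazy behaviour is easier to see with a plain generator function. This is a minimal sketch; `lazy_upper` is a hypothetical helper written for this example, not part of any library.

```python
from typing import Iterable, Iterator


def lazy_upper(items: Iterable[str]) -> Iterator[str]:
    """Yield each item in upper case, one at a time."""
    for item in items:
        print(f"producing {item!r}")  # shows when each item is actually produced
        yield item.upper()


generator = lazy_upper(["wave", "tide"])  # nothing is produced yet

first = next(generator)      # runs until the first `yield`
rest = list(generator)       # consumes the remaining items
exhausted = list(generator)  # a generator can only be consumed once

print(first, rest, exhausted)
```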

In [34]:
# `.glob` returns a lazy iterator (like a `Generator`) that we can iterate over
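
Because the result of `.glob` is lazy, it is not a list and can only be walked through once. A short sketch, using hypothetical notebook names in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("a.ipynb", "b.ipynb"):  # hypothetical notebook names
        (root / name).touch()

    matches = root.glob("*.ipynb")
    is_list = isinstance(matches, list)

    # The first pass walks through every match.
    first_pass = sorted(path.name for path in matches)

    # The second pass finds nothing, because the iterator is already used up.
    second_pass = list(matches)

print(is_list)
print(first_pass)
print(second_pass)
```

If the matches are needed more than once, one option is to store them with `list(...)` or `sorted(...)` straight away.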

In [35]:
# let's iterate over the files in notebooks

We can use different glob patterns to retrieve different items.

In [37]:
# get all items in a directory
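
The pattern `*` matches every item directly inside a directory, files and sub-directories alike. A minimal sketch with hypothetical names in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    (directory / "wave.csv").touch()   # hypothetical file
    (directory / "notes.txt").touch()  # hypothetical file
    (directory / "datasets").mkdir()   # hypothetical sub-directory

    # "*" matches any name, one level deep.
    everything = sorted(item.name for item in directory.glob("*"))

print(everything)
```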

In [38]:
# get all items in a directory with a part of a name
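
Placing `*` around a fragment matches any name that contains it, while a leading fragment followed by `*` matches names that start with it. The file names in this sketch are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    for name in ("wave.csv", "wave_2.csv", "tide.csv", "shortwave.txt"):
        (directory / name).touch()  # hypothetical files

    # "*wave*" matches any name containing "wave".
    containing = sorted(item.name for item in directory.glob("*wave*"))

    # "wave*" matches only names that start with "wave".
    starting = sorted(item.name for item in directory.glob("wave*"))

print(containing)
print(starting)
```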

In [39]:
# recursively access items in a directory
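
The `**` pattern matches the directory itself and every sub-directory beneath it, and `.rglob(pattern)` is a shorthand for prefixing the pattern with `**/`. A sketch on a hypothetical nested layout:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "raw" / "2023").mkdir(parents=True)  # hypothetical layout
    for relative in ("top.csv", "raw/feb.csv", "raw/2023/jan.csv"):
        (root / relative).touch()

    def relative_names(paths):
        """Return sorted paths relative to `root`, written with forward slashes."""
        return sorted(path.relative_to(root).as_posix() for path in paths)

    shallow = relative_names(root.glob("*.csv"))       # this directory only
    recursive = relative_names(root.glob("**/*.csv"))  # this directory and below
    shorthand = relative_names(root.rglob("*.csv"))    # same as "**/*.csv"

print(shallow)
print(recursive)
print(shorthand == recursive)
```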

In [None]:
# lots of different options

### Activity

Create a `pathlib.Path` for the `data` directory in this project and use `.glob(...)` to match all of the files in the `files` subdirectory.

Can you build on this and iterate over the 2 samples in the `datasets` subdirectory?