# Week 3 - September 6

# Scripts, modules, packages, libraries
No Jupyter notebook for this class. All demonstrations were done in Spyder (see video).

## Recap:

- A **script** is a `.py` file with a sequence of instructions that are executed each time the script is called.
- A **module** is a collection of related code (variables, functions, classes) saved in a `.py` file that can be imported and re-used.
- A **package** is a directory containing a collection of related modules. Each regular package (or subpackage) has an `__init__.py` file.
- A **library** is a collection of packages, but the term is often used interchangeably with package, or as an umbrella term for reusable code.

You can `import` from the Python Standard Library, from installed third-party libraries, or from your own user-defined modules/packages. You can also publish your packages to online repositories so others can use them.

***Remember:***
- Sets of instructions that are called several times should be written inside a function for better code reusability.
- Functions (or other bits of code) that are called from several scripts should be written inside a module, so that only the module is imported in the different scripts (do not copy-and-paste your functions in the different scripts!).
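
As a minimal sketch of the second point, the snippet below writes a tiny module to a temporary folder and then imports it the way a script would. The module name `geometry_demo` and the function `circle_area` are made up for illustration; in a real project the module file would simply sit next to your scripts.

```python
import sys
import tempfile
from pathlib import Path

# Source code of a hypothetical module, geometry_demo.py, with one reusable function.
MODULE_SOURCE = '''
import math

def circle_area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius ** 2
'''

# Write the module into a temporary folder so the sketch is self-contained,
# then make that folder importable.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "geometry_demo.py").write_text(MODULE_SOURCE)
sys.path.insert(0, str(module_dir))

# Any script can now import the function instead of copy-pasting it.
import geometry_demo

area = geometry_demo.circle_area(2.0)
print(f"{area:.2f}")  # 12.57
```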

# File I/O, Directory access, and RegEX

## Opening files

Modes:
- Read-only: `r` (default)
- Write-only: `w` (create a new file or overwrite existing file)
- Append to a file: `a`
- Read and write: `r+`
- Binary mode: `b` (combined with another mode, e.g. `rb` or `wb`)
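
The modes above can be tried out on a scratch file. This is only a sketch: the file name `notes.txt` is hypothetical and lives in a temporary folder so nothing real gets overwritten. The `with` statement used here closes each file automatically.

```python
import tempfile
from pathlib import Path

# Hypothetical scratch file in a temporary folder.
notes_path = Path(tempfile.mkdtemp()) / "notes.txt"

with open(notes_path, "w") as f:   # "w": create the file (or wipe an existing one)
    f.write("first line\n")

with open(notes_path, "a") as f:   # "a": keep the content and add to the end
    f.write("second line\n")

with open(notes_path, "r") as f:   # "r": read the content as text (str)
    content = f.read()

with open(notes_path, "rb") as f:  # "rb": read the raw bytes instead of text
    raw = f.read()

print(content.splitlines())  # ['first line', 'second line']
print(type(raw).__name__)    # bytes
```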

In [1]:
f = open("lorem_ipsum.txt", "r")

In [2]:
text = f.read()
f.close()

text

'Lorem ipsum dolor sit amet,\nconsectetur adipiscing elit,\nsed do eiusmod tempor incididunt ut labore et dolore magna aliqua.'

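***Important:*** Files must be closed once you are done with them, so that buffered writes are flushed to disk and the file handle is released.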

**Iterating over a file**

In [3]:
f = open("lorem_ipsum.txt")
for line in f:
    print(line)
f.close()

Lorem ipsum dolor sit amet,

consectetur adipiscing elit,

sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
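
The blank lines in this output appear because each `line` already ends with `\n` and `print` adds a second newline. One possible way to avoid them is to strip the trailing newline first. In this sketch `io.StringIO` stands in for the open file, so it runs without `lorem_ipsum.txt`.

```python
import io

# io.StringIO behaves like an open text file; it is a stand-in for lorem_ipsum.txt.
f = io.StringIO("Lorem ipsum dolor sit amet,\nconsectetur adipiscing elit,\n")

cleaned = []
for line in f:
    stripped = line.rstrip("\n")  # drop the newline that came from the file
    cleaned.append(stripped)
    print(stripped)               # print adds exactly one newline back
```

Calling `print(line, end="")` would have the same visible effect.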


**Preferable: using a context manager**

The file is automatically closed when the `with` block exits, even if an exception is raised inside it.

In [1]:
with open("lorem_ipsum.txt", "r") as f:
    for line in f:
        print(line)
        
f

Lorem ipsum dolor sit amet,

consectetur adipiscing elit,

sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.


<_io.TextIOWrapper name='lorem_ipsum.txt' mode='r' encoding='cp1252'>
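
The repr of `f` does not say whether the file is still open. The `closed` attribute of the file object does. A small sketch, using a hypothetical scratch file in a temporary folder:

```python
import tempfile
from pathlib import Path

# Hypothetical scratch file, created only for this check.
sample_path = Path(tempfile.mkdtemp()) / "sample.txt"
sample_path.write_text("Lorem ipsum\n")

with open(sample_path) as f:
    inside = f.closed          # False: the file is open inside the block
    first_line = f.readline()

outside = f.closed             # True: the context manager closed it on exit
print(inside, outside)         # False True
```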

To obtain each line in a list:

In [2]:
with open("lorem_ipsum.txt", "r") as f:
    lines = f.readlines()
lines

['Lorem ipsum dolor sit amet,\n',
 'consectetur adipiscing elit,\n',
 'sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.']

Without the newline (`\n`) character:

In [3]:
with open("lorem_ipsum.txt", "r") as f:
    lines = f.read().splitlines()
lines

['Lorem ipsum dolor sit amet,',
 'consectetur adipiscing elit,',
 'sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.']

Or, more explicitly, by splitting on the newline character:

In [5]:
with open("lorem_ipsum.txt", "r") as f:
    lines = f.read().split("\n")
lines

['Lorem ipsum dolor sit amet,',
 'consectetur adipiscing elit,',
 'sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.']
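
The two approaches give the same result here only because the file does not end with a newline. A short sketch of the difference, using a made-up string:

```python
# Made-up text that, unlike lorem_ipsum.txt, ends with a newline.
text = "line one\nline two\n"

with_splitlines = text.splitlines()
with_split = text.split("\n")

print(with_splitlines)  # ['line one', 'line two']
print(with_split)       # ['line one', 'line two', '']
```

`splitlines()` also recognises other line endings such as `\r\n`, so it is usually the safer choice for text of unknown origin.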

**Writing to file:**

In [6]:
with open("hello.txt", "w") as f:
    f.write("Hello world!")

### CSV files

**Reading CSV files**

In [7]:
import csv

teams = []

with open("teams.csv", "r", newline="") as csvfile:  # newline="" is recommended by the csv module
    reader = csv.DictReader(csvfile)
    for row in reader:
        teams.append({"Name": row["name"], "City": row["city"], "Stadium": row["stadium"]})

teams

[{'Name': 'Arsenal', 'City': 'London', 'Stadium': 'Emirates Stadium'},
 {'Name': 'Manchester City',
  'City': 'Manchester',
  'Stadium': 'City of Manchester Stadium'},
 {'Name': 'Tottenham Hotspur', 'City': 'London', 'Stadium': 'Hotspur Stadium'},
 {'Name': 'Brighton and Hove Albion',
  'City': 'Brighton',
  'Stadium': 'Falmer Stadium'},
 {'Name': 'Chelsea', 'City': 'London', 'Stadium': 'Stamford Bridge'}]

In [8]:
for team in sorted(teams, key=lambda team: team["Name"]):
    print(f"{team['Name']} play at the {team['Stadium']} in {team['City']}")

Arsenal play at the Emirates Stadium in London
Brighton and Hove Albion play at the Falmer Stadium in Brighton
Chelsea play at the Stamford Bridge in London
Manchester City play at the City of Manchester Stadium in Manchester
Tottenham Hotspur play at the Hotspur Stadium in London
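
`csv.DictReader` takes the dictionary keys from the first row of the file. The sketch below shows this with a small in-memory CSV (an `io.StringIO` stand-in, since `teams.csv` itself is not reproduced in these notes).

```python
import csv
import io

# In-memory stand-in for a CSV file; the column names mirror the ones used above.
csv_text = (
    "name,city,stadium\n"
    "Arsenal,London,Emirates Stadium\n"
    "Chelsea,London,Stamford Bridge\n"
)

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

print(reader.fieldnames)   # ['name', 'city', 'stadium']
print(rows[0]["stadium"])  # Emirates Stadium
print(len(rows))           # 2 (the header row is not counted as data)
```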


**Writing CSV files**

In [17]:
first_name = "Salil"
last_name = "Bavdekar"
course = "Python for Engineers"

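# newline="" prevents extra blank lines between rows on Windows
with open("instructors.csv", "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=["first name", "last name", "course"])
    writer.writeheader()  # write the column names as the first row
    writer.writerow({"first name": first_name, "last name": last_name, "course": course})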
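
To write several rows at once, `DictWriter` also offers `writerows()`. The sketch below writes a header plus two rows and reads them back; the rows and the file name `instructors_demo.csv` are placeholders invented for the example.

```python
import csv
import tempfile
from pathlib import Path

# Placeholder data, made up for this sketch.
fieldnames = ["first name", "last name", "course"]
rows = [
    {"first name": "Ada", "last name": "Lovelace", "course": "Programming 101"},
    {"first name": "Alan", "last name": "Turing", "course": "Computability"},
]

out_path = Path(tempfile.mkdtemp()) / "instructors_demo.csv"

with open(out_path, "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()    # first row: the column names
    writer.writerows(rows)  # one CSV row per dictionary

with open(out_path, newline="") as csvfile:
    read_back = [dict(row) for row in csv.DictReader(csvfile)]

print(read_back == rows)  # True
```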

## Directory access with `pathlib`

In [18]:
from pathlib import Path

Path.cwd()

WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6')

In [19]:
data_dir  = Path.cwd() / "data"
data_dir

WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data')

In [20]:
config_file = data_dir / "date.txt"
config_file

WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/date.txt')

In [21]:
with open(config_file) as f:
    print(f.read())

Date = 09/06/2022
Day = Tuesday


`Path` objects also have `read_text()` and `write_text()` methods, which open the file, read or write it, and close it in a single call:

In [22]:
config_file.read_text()

'Date = 09/06/2022\nDay = Tuesday'
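
A round trip with `write_text()` and `read_text()` might look like the sketch below. The file `day.txt` is hypothetical and is created in a temporary folder.

```python
import tempfile
from pathlib import Path

# Hypothetical file in a temporary folder.
day_file = Path(tempfile.mkdtemp()) / "day.txt"

# write_text creates (or overwrites) the file and returns the number of characters written.
n_chars = day_file.write_text("Day = Tuesday\n")
text = day_file.read_text()

print(n_chars)     # 14
print(repr(text))  # 'Day = Tuesday\n'
```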

**Various properties can be extracted from the `Path` object**

In [23]:
config_file.name

'date.txt'

In [24]:
config_file.stem

'date'

In [25]:
config_file.suffix

'.txt'

In [26]:
config_file.parent

WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data')

In [29]:
parents = config_file.parents
list(parents)

[WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida'),
 WindowsPath('C:/Users/salil.bavdekar'),
 WindowsPath('C:/Users'),
 WindowsPath('C:/')]

In [30]:
config_file.is_dir()

False

In [31]:
data_dir.is_dir()

True
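
These properties work on any path, whether or not it exists on disk. The sketch below uses a relative path chosen for illustration and also shows `parts`, `with_suffix()` and `with_name()`, which were not covered above but belong to the same family.

```python
from pathlib import Path

# A relative path used only for illustration; it does not need to exist.
p = Path("data") / "sub_1" / "f1_001.txt"

print(p.name)         # f1_001.txt
print(p.stem)         # f1_001
print(p.suffix)       # .txt
print(p.parent.name)  # sub_1
print(p.parts)        # ('data', 'sub_1', 'f1_001.txt')

# Derive new paths without touching the file system.
csv_version = p.with_suffix(".csv")
sibling = p.with_name("f1_002.txt")
print(csv_version.name)    # f1_001.csv
print(sibling.as_posix())  # data/sub_1/f1_002.txt
```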

**Path objects can be used to iterate over files**

In [32]:
list(data_dir.iterdir())

[WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/.ipynb_checkpoints'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/date.txt'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/extra'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/name.txt'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_2')]

Using a list comprehension, we can obtain a list of subdirectories:

In [33]:
[d for d in data_dir.iterdir() if d.is_dir()]

[WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/.ipynb_checkpoints'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/extra'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_2')]
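
The same idea works for files by swapping `is_dir()` for `is_file()`. In this sketch a small temporary folder with made-up contents stands in for `data_dir`. The results are sorted because `iterdir()` yields entries in arbitrary order.

```python
import tempfile
from pathlib import Path

# Temporary stand-in for data_dir, with made-up contents.
root = Path(tempfile.mkdtemp())
(root / "sub_1").mkdir()
(root / "date.txt").write_text("Day = Tuesday")
(root / "name.txt").write_text("placeholder")

files = sorted(p.name for p in root.iterdir() if p.is_file())
dirs = sorted(p.name for p in root.iterdir() if p.is_dir())

print(files)  # ['date.txt', 'name.txt']
print(dirs)   # ['sub_1']
```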

**Pattern matching with `glob`**

`glob` is a module in the Python standard library that finds all the pathnames matching a specified pattern according to the rules used by the Unix shell.

`Path` objects also have a `glob` method that does the same thing.

`**` matches this directory and all of its subdirectories, recursively

In [34]:
list(data_dir.glob("**"))

[WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/.ipynb_checkpoints'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/extra'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1/sub_sub_1'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1/sub_sub_2'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University o

`*` matches everything except the directory separator

In [35]:
list(data_dir.glob("*"))

[WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/.ipynb_checkpoints'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/date.txt'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/extra'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/name.txt'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_2')]

In [39]:
list(data_dir.glob("*.ipynb"))

[WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/Untitled.ipynb')]

In [40]:
data_subdirs = list(data_dir.glob("sub*"))
data_subdirs

[WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_2')]

In [41]:
list(data_subdirs[0].glob("f1_00*.txt"))

[WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1/f1_001.txt'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1/f1_0010.txt'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1/f1_0011.txt'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1/f1_0012.txt'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1/f1_0013.txt'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1/f1_0014.txt'),
 WindowsPath('C:/

`?` matches any single character 

In [42]:
list(data_subdirs[0].glob("f1_00?.txt"))

[WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1/f1_001.txt'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1/f1_002.txt'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1/f1_003.txt'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1/f1_004.txt'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1/f1_005.txt'),
 WindowsPath('C:/Users/salil.bavdekar/OneDrive - University of Florida/Courses/TA Courses/Python for Engineers/Class notebooks/W3 - Sep 6/data/sub_1/f1_006.txt'),
 WindowsPath('C:/Users
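
The wildcards can be compared side by side on a small directory tree. The sketch below builds one in a temporary folder; the file names imitate the ones above but are created only for this example. The `[12]` character set is one more shell-style pattern that `glob` understands.

```python
import tempfile
from pathlib import Path

# Build a small, made-up directory tree in a temporary folder.
root = Path(tempfile.mkdtemp())
sub_1 = root / "sub_1"
nested = sub_1 / "sub_sub_1"
nested.mkdir(parents=True)
for name in ["f1_001.txt", "f1_002.txt", "f1_0010.txt"]:
    (sub_1 / name).touch()
(nested / "deep.txt").touch()


def names(paths):
    """Return the sorted file names of an iterable of paths."""
    return sorted(p.name for p in paths)


star = names(sub_1.glob("f1_00*.txt"))        # * matches any run of characters
question = names(sub_1.glob("f1_00?.txt"))    # ? matches exactly one character
bracket = names(sub_1.glob("f1_00[12].txt"))  # [12] matches one character from the set
recursive = names(root.glob("**/*.txt"))      # ** also searches all subdirectories

print(star)       # ['f1_001.txt', 'f1_0010.txt', 'f1_002.txt']
print(question)   # ['f1_001.txt', 'f1_002.txt']
print(bracket)    # ['f1_001.txt', 'f1_002.txt']
print(recursive)  # ['deep.txt', 'f1_001.txt', 'f1_0010.txt', 'f1_002.txt']
```

`root.rglob("*.txt")` is shorthand for `root.glob("**/*.txt")`.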

**Examples using the `glob` module**

In [43]:
from glob import glob

glob(r"data\*")  # raw string, so the backslash is not read as an escape character

['data\\date.txt',
 'data\\extra',
 'data\\name.txt',
 'data\\sub_1',
 'data\\sub_2',
 'data\\Untitled.ipynb']

In [44]:
glob(r"data\sub_1\*")  # raw string, so the backslashes are not read as escape characters

['data\\sub_1\\f1_001.txt',
 'data\\sub_1\\f1_0010.txt',
 'data\\sub_1\\f1_0011.txt',
 'data\\sub_1\\f1_0012.txt',
 'data\\sub_1\\f1_0013.txt',
 'data\\sub_1\\f1_0014.txt',
 'data\\sub_1\\f1_002.txt',
 'data\\sub_1\\f1_003.txt',
 'data\\sub_1\\f1_004.txt',
 'data\\sub_1\\f1_005.txt',
 'data\\sub_1\\f1_006.txt',
 'data\\sub_1\\f1_007.txt',
 'data\\sub_1\\f1_008.txt',
 'data\\sub_1\\f1_009.txt',
 'data\\sub_1\\New Text Document.txt',
 'data\\sub_1\\Random.txt',
 'data\\sub_1\\sub_sub_1',
 'data\\sub_1\\sub_sub_2']
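
With the `glob` module, `**` only searches subdirectories when `recursive=True` is passed. The sketch below shows this on a made-up temporary tree, and builds the patterns with `os.path.join` so the same code runs on Windows, macOS and Linux.

```python
import os
import tempfile
from glob import glob

# Made-up directory tree in a temporary folder.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub_1", "sub_sub_1"))
relative_files = [
    "top.txt",
    os.path.join("sub_1", "a.txt"),
    os.path.join("sub_1", "sub_sub_1", "b.txt"),
]
for rel in relative_files:
    open(os.path.join(root, rel), "w").close()  # create an empty file

# Only the top level of root is searched.
flat = glob(os.path.join(root, "*.txt"))

# With recursive=True, ** matches zero or more nested directories.
deep = glob(os.path.join(root, "**", "*.txt"), recursive=True)

print(len(flat), len(deep))  # 1 3
```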

## Other directory operations

In [45]:
new_dir = data_dir / "temp"
new_dir.mkdir()

In [46]:
new_dir = data_dir / "temp" / "temp1" / "temp2" / "temp3"
new_dir.mkdir()

FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\Users\\salil.bavdekar\\OneDrive - University of Florida\\Courses\\TA Courses\\Python for Engineers\\Class notebooks\\W3 - Sep 6\\data\\temp\\temp1\\temp2\\temp3'

In [47]:
new_dir = data_dir / "temp" / "temp1" / "temp2" / "temp3"
new_dir.mkdir(parents=True)  # parents=True also creates the missing intermediate directories

In [48]:
new_dir = data_dir / "temp"
new_dir.mkdir()

FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\salil.bavdekar\\OneDrive - University of Florida\\Courses\\TA Courses\\Python for Engineers\\Class notebooks\\W3 - Sep 6\\data\\temp'

In [49]:
new_dir = data_dir / "temp"
new_dir.mkdir(exist_ok=True)  # exist_ok=True: no error if the directory already exists
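
Both arguments can be combined, which gives a call that is safe to repeat. A minimal sketch in a temporary folder (the directory names are placeholders):

```python
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())
target = base / "temp" / "temp1" / "temp2"

# Creates temp, temp1 and temp2 in one go.
target.mkdir(parents=True, exist_ok=True)

# Running the same line again does nothing and raises no error.
target.mkdir(parents=True, exist_ok=True)

print(target.is_dir())  # True
```

Note that `exist_ok=True` still raises `FileExistsError` if the path exists but is a file rather than a directory.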

**`shutil`**

In [52]:
import shutil

shutil.move(data_dir / "date.txt", data_dir / "temp")  # move date.txt into the temp directory
shutil.rmtree(data_dir / "extra")  # delete the extra directory and everything inside it
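
Because `rmtree` deletes permanently, it can help to rehearse such operations on a throwaway folder first. The sketch below does that in a temporary directory with made-up contents and also shows `shutil.copy`.

```python
import shutil
import tempfile
from pathlib import Path

# Throwaway folder with made-up contents.
root = Path(tempfile.mkdtemp())
(root / "extra").mkdir()
(root / "extra" / "old.txt").write_text("old")
(root / "temp").mkdir()
(root / "date.txt").write_text("Day = Tuesday")

# Copy a file, move the original into a directory, and delete a whole directory tree.
# Plain strings are passed to move() because they work on every Python 3 version.
shutil.copy(root / "date.txt", root / "date_backup.txt")
shutil.move(str(root / "date.txt"), str(root / "temp"))
shutil.rmtree(root / "extra")

remaining = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
print(remaining)  # ['date_backup.txt', 'temp', 'temp/date.txt']
```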

## Example

In [None]:
for idx, subdir in enumerate(data_subdirs, start=1):
    for i in range(1,15):
        with open(subdir / f"f{idx}_00{i}.txt", "w") as f:
            f.write(f"This is file number {i} in subdirectory {idx}")
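
The hard-coded `00` in the file name is why files 10 to 14 end up as `f1_0010.txt` and so on, and why they sort before `f1_002.txt` in the listings above. If a fixed width were preferred, one possible variation is to let the format specifier do the zero-padding. This is a sketch of that alternative, not what was used in class:

```python
indices = (1, 9, 10, 14)

# As in the example above: the prefix "00" is always added.
unpadded = [f"f1_00{i}.txt" for i in indices]

# Alternative: pad the number itself to three digits.
padded = [f"f1_{i:03d}.txt" for i in indices]

print(sorted(unpadded))  # ['f1_001.txt', 'f1_0010.txt', 'f1_0014.txt', 'f1_009.txt']
print(sorted(padded))    # ['f1_001.txt', 'f1_009.txt', 'f1_010.txt', 'f1_014.txt']
```

With fixed-width numbers, alphabetical order and numerical order agree.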