# Standard Library Modules

## Directory traversal

Write a function `count_user_dirs` that returns the number of user directories
(`/root` and `/home/*`) on a Linux system.

You can then call it before and after creating a new user to check that the
count changes.

In [None]:
# Your code here

### Solution

In [None]:
import pathlib


def count_user_dirs() -> int:
    home = pathlib.Path("/home")
    # Count the directories directly under /home, plus one for /root.
    return sum(1 for item in home.iterdir() if item.is_dir()) + 1


print(f"Found {count_user_dirs()} user dir(s)")
!adduser --disabled-password --gecos "" new_user
print(f"Found {count_user_dirs()} user dir(s)")
!userdel -r new_user
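
The solution above hard-codes `/home` and `/root`, so it can only be checked by creating real users. As a minimal sketch, a variant that takes the two locations as arguments can be tried on a throwaway directory tree. The function name `count_user_dirs_in` and its `home` and `root` parameters are hypothetical and not part of the exercise statement; this variant also counts `root` only if it actually exists.

```python
import pathlib
import tempfile


def count_user_dirs_in(home: pathlib.Path, root: pathlib.Path) -> int:
    """Count the directories directly under `home`, plus `root` if it exists.

    `home` and `root` are hypothetical parameters added for this sketch.
    """
    in_home = sum(1 for item in home.iterdir() if item.is_dir())
    return in_home + (1 if root.is_dir() else 0)


# Build a small fake layout instead of touching the real /home and /root.
with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    (base / "home" / "alice").mkdir(parents=True)
    (base / "home" / "bob").mkdir()
    (base / "home" / "notes.txt").write_text("not a user directory")
    (base / "root").mkdir()
    demo_count = count_user_dirs_in(base / "home", base / "root")

print(f"Found {demo_count} user dir(s)")
```

Calling `count_user_dirs_in(pathlib.Path("/home"), pathlib.Path("/root"))` should give the same result as `count_user_dirs()` on a system where `/root` exists.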

## File existence

Write a function `check_bashrc` that checks whether a `.bashrc` file exists
for a given user.

The function takes an optional `user` argument. If no user is given, it
should check the configuration for `root`, i.e. the file `/root/.bashrc`.
Otherwise it should check `/home/<user>/.bashrc`.

The function should return a boolean.

In [None]:
# Your code here

### Solution

In [None]:
import pathlib


def check_bashrc(user: str | None = None) -> bool:
    if user is None:
        return pathlib.Path("/root/.bashrc").exists()
    else:
        return pathlib.Path(f"/home/{user}/.bashrc").exists()


check_bashrc()
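
One detail worth knowing: `exists()` is also true when the path is a directory. If the check should only succeed for a regular file, `is_file()` is the stricter test. The sketch below compares the two on a temporary directory; the user names `alice`, `bob` and `carol` are made up for the illustration.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    home = pathlib.Path(tmp)

    as_file = home / "alice" / ".bashrc"      # a regular file
    as_file.parent.mkdir()
    as_file.write_text("# sample\n")

    as_dir = home / "bob" / ".bashrc"         # a directory with the same name
    as_dir.mkdir(parents=True)

    missing = home / "carol" / ".bashrc"      # nothing there at all

    # Each value is the pair (exists(), is_file()).
    results = {
        "file": (as_file.exists(), as_file.is_file()),
        "directory": (as_dir.exists(), as_dir.is_file()),
        "missing": (missing.exists(), missing.is_file()),
    }

for kind, (exists, is_file) in results.items():
    print(f"{kind}: exists={exists}, is_file={is_file}")
```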

## Recursive traversal of subdirectories

Count the number of files (excluding directories) contained in
the `/etc` directory and all of its subdirectories.

In [None]:
# Your code here

### Solution

In [None]:
import pathlib


etc_dir = pathlib.Path("/etc")

total = 0
for path in etc_dir.rglob("*"):
    if path.is_file():
        total += 1
total

In [None]:
import pathlib


sum(1 for f in pathlib.Path("/etc").rglob("*") if f.is_file())
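
As an alternative sketch, the same kind of count can be written with `os.walk`, which yields the file names of each directory it visits. The helper name `count_files` is hypothetical. The result can differ slightly from the `rglob` versions when symbolic links are present: `is_file()` follows links and skips broken ones, whereas `os.walk` lists a broken link among the file names.

```python
import os
import pathlib
import tempfile


def count_files(top: pathlib.Path) -> int:
    """Count the non-directory entries under `top`, recursively."""
    return sum(len(filenames) for _dirpath, _dirnames, filenames in os.walk(top))


# Demonstrate on a small temporary tree rather than on /etc.
with tempfile.TemporaryDirectory() as tmp:
    top = pathlib.Path(tmp)
    (top / "a" / "b").mkdir(parents=True)
    (top / "one.conf").write_text("1")
    (top / "a" / "two.conf").write_text("2")
    (top / "a" / "b" / "three.conf").write_text("3")

    walk_count = count_files(top)
    rglob_count = sum(1 for f in top.rglob("*") if f.is_file())

print(walk_count, rglob_count)
```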

## Permission management

Create a file `/root/.client-secret.txt` that contains a randomly generated
secret string.

Set the permissions of this file so that **only** the current user is allowed
to read it (no write or execute permissions for anyone).

You can then check its permissions and contents from the shell.

In [None]:
# Your code here

### Solution

In [None]:
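import os
import pathlib
import secrets
import string


# `secrets` draws from the operating system's cryptographically secure source;
# the `random` module is not suitable for generating secrets.
alphabet = string.ascii_letters + string.digits + string.punctuation
secret_value = "".join(secrets.choice(alphabet) for _ in range(20))

secret = pathlib.Path("/root/.client-secret.txt")
# If the file is created here, it starts out write-only for its owner, so the
# secret is never readable by anyone else, even briefly.
with open(secret, mode="w", opener=lambda p, f: os.open(p, f, 0o200)) as fh:
    fh.write(secret_value)
# Switch to read-only for the owner once the secret has been written.
secret.chmod(0o400)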

In [None]:
!ls -al /root
!cat /root/.client-secret.txt
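
The permissions can also be checked from Python. As a small sketch, `stat.S_IMODE` extracts the permission bits from `st_mode`, and `stat.filemode` renders them the way `ls -l` does. The example works on a temporary file named `demo-secret.txt` (a made-up name) rather than on the real secret.

```python
import pathlib
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    demo = pathlib.Path(tmp) / "demo-secret.txt"
    demo.write_text("not a real secret")
    demo.chmod(0o400)  # read-only for the owner, nothing for anyone else

    mode = demo.stat().st_mode
    octal_bits = oct(stat.S_IMODE(mode))  # permission bits only
    symbolic = stat.filemode(mode)        # same notation as `ls -l`

print(octal_bits, symbolic)
```

On a Linux system this should print `0o400 -r--------`.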

## File archiving

In this exercise, the goal is to create an archive containing all of the
configuration files (`*.conf`) found under `/etc`.

The archive must preserve the directory structure relative to `/etc`. For
example, if `/etc/apt/apt.conf` exists, it should appear in the archive at
`apt/apt.conf` (relative to the root of the archive).

Proceed in stages:

- List all `.conf` files under `/etc`, including its subdirectories.
- Create a temporary directory.
- Copy all `.conf` files (creating the directories that contain them) into
  this temporary directory, preserving relative paths.
- Create a compressed archive (for example, in `bztar` format) from the
  temporary directory.

In [None]:
# Your code here

### Solution

In [None]:
import pathlib
import shutil
import tempfile


etc = pathlib.Path("/etc")

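# Keep regular files only: a directory or a broken symlink named *.conf
# would make shutil.copy2 fail.
confs = (conf for conf in etc.rglob("*.conf") if conf.is_file())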
with tempfile.TemporaryDirectory() as tempdir:
    tempdir_path = pathlib.Path(tempdir)
    for conf in confs:
        temp_conf = tempdir_path / conf.relative_to(etc)
        temp_conf.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(conf, temp_conf)
    shutil.make_archive("confs", "bztar", tempdir)
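
To check that an archive really stores paths relative to its root, its members can be listed with the `tarfile` module. The sketch below builds a tiny archive from a temporary directory (the file names are sample data, not real configuration files) and prints the regular files it contains. Member names may carry a leading `./`, which `PurePosixPath` normalizes away.

```python
import pathlib
import shutil
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    work = pathlib.Path(tmp)

    # Sample tree standing in for the temporary copy of the .conf files.
    src = work / "src"
    (src / "apt").mkdir(parents=True)
    (src / "apt" / "apt.conf").write_text("# sample\n")
    (src / "host.conf").write_text("# sample\n")

    # The archive is written next to `src`, not inside it.
    archive = shutil.make_archive(str(work / "demo"), "bztar", src)

    with tarfile.open(archive) as tar:
        members = sorted(
            pathlib.PurePosixPath(member.name).as_posix()
            for member in tar.getmembers()
            if member.isfile()
        )

print(members)
```

The same listing can be run against `confs.tar.bz2` by passing that path to `tarfile.open`.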

## Efficient path retrieval

We will work on a corpus of Usenet newsgroup messages. Let's download it:

In [None]:
!git clone https://github.com/nzmonzmp/20Newsgroups.git
!tar xzf 20Newsgroups/20news-bydate.tar.gz
!rm -rf 20Newsgroups

- Create a `Path` object from the [`pathlib`](https://docs.python.org/3/library/pathlib.html)
  module that represents the `20news-bydate-test` directory.
- Use the [`glob`](https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob)
  method to retrieve all the files in the corpus. How many files are there?
- With the [`stat`](https://docs.python.org/3/library/pathlib.html#pathlib.Path.stat)
  method, compute the total size of all files and display it in megabytes (MB).
  You can compare your result with the output of the following command:

  ```bash
  du -sbh 20news-bydate-test/
  ```

In [None]:
!du -sbh 20news-bydate-test/

In [None]:
# Your code here

### Solution

In [None]:
import pathlib


data_path = pathlib.Path("20news-bydate-test")
files = list(data_path.glob("*/*"))
print(f"Number of files: {len(files)}")

total_size = 0
for file_path in files:
    total_size += file_path.stat().st_size

# Or, equivalently, with sum
total_size = sum(file_path.stat().st_size for file_path in files)
print(f"Total file size: {round(total_size / 1e6)} MB")

In [None]:
!du -h 20news-bydate-test/misc.forsale/76679
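
Without `-b`, `du` reports the disk space allocated to a file rather than its apparent size, which is why a small file usually shows up as a whole block. Both values are available from `stat()`: `st_size` is the apparent size in bytes, and on POSIX systems `st_blocks` counts allocated 512-byte units. Also note that `du -h` uses powers of 1024, whereas a megabyte (MB) is 10^6 bytes. A small sketch on a temporary file (the file name is made up):

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    sample = pathlib.Path(tmp) / "sample.txt"
    sample.write_text("x" * 100)

    st = sample.stat()
    apparent_size = st.st_size           # bytes of content
    allocated_size = st.st_blocks * 512  # st_blocks is in 512-byte units (POSIX)

# The same byte count expressed in decimal and binary units.
n_bytes = 5_000_000
size_mb = n_bytes / 1e6      # megabytes, powers of 1000
size_mib = n_bytes / 2**20   # mebibytes, powers of 1024 (what `du -h` uses)

print(f"apparent: {apparent_size} B, allocated: {allocated_size} B")
print(f"{n_bytes} B = {size_mb:.2f} MB = {size_mib:.2f} MiB")
```

The allocated size depends on the filesystem, so no particular value should be expected for it.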

## Working with file dates

Let's start by retrieving some data: log files from the `dpkg` package manager.

In [None]:
!git clone https://github.com/shuuchuu/tp-logs.git
!tar xf tp-logs/logs.tar.xz
!rm -rf tp-logs

- Use `ls` to quickly inspect the `logs` directory. You should see several
  log files, some compressed (`.gz`) and some plain text, with different
  modification times (as expected with log rotation).

- Create a new directory `dated-logs` where you will copy the content of
  `logs`, but with standardized file names. Each file in `dated-logs`
  must be named using the **modification date** of the corresponding file
  in `logs`, in the following format:

  ```text
  YYYY-mm-dd_HH-MM-SS_dpkg.log.gz
  ```

  In other words, for each file in `logs`:

  * Read the modification time using `.stat().st_mtime`.
  * Convert the timestamp into a timezone-aware `datetime` object.
  * Build the destination path in `dated-logs` using `strftime` to format
    the date (see the `strftime` format codes in the
    [`datetime`](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes)
    documentation).
  * If the original file is already compressed (`.gz`), simply copy it.
  * Otherwise, compress it as gzip before saving it to `dated-logs`.

After your script runs, all files in `logs` should be represented in
`dated-logs` with normalized, date-based names.

In [None]:
# Your code here

### Solution

In [None]:
!ls -al logs

In [None]:
import datetime
import gzip
import pathlib
import shutil


def compress(path: pathlib.Path, dest: pathlib.Path) -> None:
    with path.open(mode="rb") as fh_in:
        with gzip.open(dest, mode="wb") as fh_out:
            shutil.copyfileobj(fh_in, fh_out)


logs_dir = pathlib.Path("logs")

dated_logs_dir = pathlib.Path("dated-logs")
dated_logs_dir.mkdir(exist_ok=True)

for item in logs_dir.iterdir():
    mtime = item.stat().st_mtime
    date = datetime.datetime.fromtimestamp(mtime, datetime.timezone.utc)

    dest = dated_logs_dir / date.strftime("%Y-%m-%d_%H-%M-%S_dpkg.log.gz")

    if item.suffix == ".gz":
        shutil.copy2(item, dest)
    else:
        compress(item, dest)
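
To sanity-check the two building blocks in isolation, the sketch below formats a fixed timestamp (so the resulting name does not depend on any real file) and verifies that a gzip-compressed copy decompresses back to the original bytes. The constant `NAME_FORMAT` and the sample content are assumptions made for this illustration.

```python
import datetime
import gzip
import pathlib
import shutil
import tempfile

NAME_FORMAT = "%Y-%m-%d_%H-%M-%S_dpkg.log.gz"

# A fixed timestamp interpreted in UTC gives the same name on every machine.
stamp = datetime.datetime.fromtimestamp(1_700_000_000, datetime.timezone.utc)
name = stamp.strftime(NAME_FORMAT)

with tempfile.TemporaryDirectory() as tmp:
    plain = pathlib.Path(tmp) / "dpkg.log"
    plain.write_bytes(b"sample log line\n")

    # Compress the plain file under its date-based name.
    dest = pathlib.Path(tmp) / name
    with plain.open("rb") as fh_in, gzip.open(dest, "wb") as fh_out:
        shutil.copyfileobj(fh_in, fh_out)

    # Read it back through gzip to confirm that nothing was lost.
    with gzip.open(dest, "rb") as fh:
        round_trip = fh.read()

print(name)
print(round_trip)
```

Because the name has a one-second resolution, two files in `logs` with the same modification time would map to the same destination; the solution above does not handle that case.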