# 💾 Introduction

Welcome to the project week course: **Introduction to programming**! For four days, you will work, guided by an instructor, on automating the accounting department of an imaginary company. On the fifth and final day, you will work in pairs to prepare your own automation project as a 5-minute pitch.

⚠️⚠️⚠️ Beware! Make sure you work on a copy of this notebook, not on the GitHub version.

---

This workbook is called a **Jupyter Notebook**. We run it in [Google Colab](https://colab.research.google.com), a free, cloud-based notebook environment. Apart from Google Colab, you can run a notebook locally on your machine in JupyterLab or VS Code (with the Jupyter extension). There are also alternative cloud platforms, such as Kaggle Notebooks (formerly Kaggle Kernels).

Let's look at the basic notebook terminology:
- A notebook is a place to write programs, view their results, and write text.
- Each rectangle containing text or code in a notebook is called a _cell_.
- Text cells (like this one) are written in a simple format called [Markdown](https://colab.research.google.com/notebooks/markdown_guide.ipynb).
- Code cells contain code in the Python 3 language. Running a code cell will execute all of the code it contains.

Please note that although we use a notebook, almost all the code written here can also be run as a standard Python script file (notebook-only commands such as `%pip` are the exception). See more details [here](https://realpython.com/run-python-scripts/) (but do it later; it's not required for this lab).

To open a notebook, you can upload it or open it from GitHub or Google Drive. The most convenient way to save a notebook is Google Drive, so please use it.

---

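The memory state is preserved as you run different cells, i.e., all variables keep their assigned values, imported libraries stay imported, and so on. Cells share this state in the order you run them, not in the order they appear in the notebook. This interactive style is known as the REPL (read-eval-print loop) approach. You can re-run code cells as many times as you want. You can also restart the session, which clears the memory (all variables and imports).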

---

Let's now run some sample code in the cell below. Click the play button on the left side of the code cell (or use the shortcut `Shift + Enter`).

In [None]:
import platform
from datetime import date

print(f"Welcome to the Project Week {date.today().year}!")
print("🐍 We're using the Python version:", platform.python_version())

---

Key rules for the lab:
- All assignments in the lab are distributed in this notebook.
- Whenever you write code, you'll make mistakes. When you run a code cell with errors, Python will produce error messages to tell you what you did wrong. Errors are okay. Even experienced programmers make many errors. When you make an error, you have to find the source of the problem, fix it, and move on.
- There are special cells with tests that verify your code. All test functions start with the prefix `test_`. Please don't change the contents of the test cells or the functions and variables prepared in advance.
- If you're stuck on an assignment for a few minutes, try talking to a neighbour or an instructor.
- You can navigate the notebook faster using the table of contents (in the left panel).
- You can browse all files by clicking the Files button (in the left panel).

# 🚩 Let's get started!

Whether you are running the notebook for the first time or have just restarted it, run the code below to install and load the required libraries and prepare the necessary files. The initialisation code won't help you with the assignments, so keep it hidden: you may break the workshop notebook if you accidentally change something!

In [None]:
%pip install markdown weasyprint pdfplumber > /dev/null 2>&1

# @title

import os
import shutil
import sys
import markdown
import random
import logging
import pdfplumber
import csv

from io import StringIO
from pathlib import Path
from weasyprint import HTML
from datetime import datetime, timedelta

for name in [
  'weasyprint', 'fontTools', 'pyphen', 'PIL', 'urllib3',
  'cairocffi', 'tinycss2', 'cssselect2'
]:
  logger = logging.getLogger(name)
  logger.setLevel(logging.CRITICAL + 1)
  logger.propagate = False

class Capturing(list):
    def __enter__(self):
        self._stdout = sys.stdout
        sys.stdout = self._stringio = StringIO()
        return self
    def __exit__(self, *args):
        self.extend(self._stringio.getvalue().splitlines())
        del self._stringio
        sys.stdout = self._stdout

def run_test(func_to_test, *args, expected_result=None, expected_output=None, custom_assertion_fn=None, custom_assertion_err=""):
  test_func = globals().get(func_to_test)
  if not test_func:
    print(f"\n❌ Test '{func_to_test}' not found.")
    return
  print(f"🔍 Running Test: {func_to_test} with the input: {args}.")
  print("-" * 50)
  try:
    result = test_func(*args)
    if expected_result is not None:
      assert result == expected_result, f"Expected {expected_result}, but got {result}"
    if expected_output is not None:
      with Capturing() as output:
        test_func(*args)
      assert sorted(output) == sorted(expected_output), f"Expected output (in any order): {expected_output}, but got {output}"
    if custom_assertion_fn is not None:
      assert custom_assertion_fn(result), custom_assertion_err
    print("✅ Test passed!")
  except AssertionError as e:
    print("❌ Test failed with an assertion error.")
    print(f"⚠️ {e}.")
  except Exception as e:
    print("❌ Test failed with an unexpected error.")
    print(f"⚠️ {type(e).__name__}: {e}.")
  print("-" * 50)
  print("")

def test_multiply():
  run_test("multiply", 2, 3, expected_result=6)
  run_test("multiply", -4, 5, expected_result=-20)
  run_test("multiply", 0, 10, expected_result=0)

def test_full_name():
  run_test("full_name", "John", "Doe", expected_result="John Doe")
  run_test("full_name", "Ada", "Lovelace", expected_result="Ada Lovelace")
  run_test("full_name", "Alan", "Turing", expected_result="Alan Turing")

def test_square():
  run_test("square", 4, expected_result=16)
  run_test("square", -3, expected_result=9)
  run_test("square", 0, expected_result=0)

def test_count_even():
  run_test("count_even", [1, 2, 3, 4, 5, 6], expected_result=3)
  run_test("count_even", [11, 13, 15], expected_result=0)
  run_test("count_even", [], expected_result=0)

def test_first_last():
  run_test("first_last", [10, 20, 30, 40], expected_result=[10, 40])
  run_test("first_last", [7], expected_result=[7, 7])
  run_test("first_last", [], expected_result=[])

os.makedirs("./tests/list_files/", exist_ok=True)
os.makedirs("./tests/list_files/a", exist_ok=True)
Path('./tests/list_files/b').touch(exist_ok=True)
os.makedirs("./tests/list_files/c", exist_ok=True)
Path('./tests/list_files/d').touch(exist_ok=True)
Path('./tests/list_files/e').touch(exist_ok=True)

def test_list_files():
  run_test("list_files", "./tests/list_files/", expected_output=["b","d","e"])

os.makedirs("./tests/make_folder/", exist_ok=True)

def test_make_folder():
  if os.path.isdir("./tests/make_folder/a"):
    os.rmdir("./tests/make_folder/a")
  run_test("make_folder", "./tests/make_folder/a", custom_assertion_fn=lambda _: os.path.isdir("./tests/make_folder/a"), custom_assertion_err="Folder not created")

os.makedirs("./tests/move_file/", exist_ok=True)

def test_move_file():
  if os.path.exists("./tests/move_file/archive/notes.txt"):
    os.remove("./tests/move_file/archive/notes.txt")
  if os.path.isdir("./tests/move_file/archive/"):
    os.rmdir("./tests/move_file/archive/")
  Path('./tests/move_file/notes.txt').touch(exist_ok=True)
  run_test(
      "move_file",
      "./tests/move_file/notes.txt",
      "./tests/move_file/archive/",
      custom_assertion_fn=lambda _: os.path.isfile("./tests/move_file/archive/notes.txt"),
      custom_assertion_err="File not moved"
  )

def test_is_valid_email():
  run_test("is_valid_email", "user@example.com", expected_result=True)
  run_test("is_valid_email", "user.name@site.co.uk", expected_result=True)
  run_test("is_valid_email", "invalid-email@com", expected_result=False)

def test_extract_numbers():
  run_test("extract_numbers", "There are 3 apples and 24 oranges.", expected_result=['3', '24'])
  run_test("extract_numbers", "No numbers here!", expected_result=[])

def test_extract_date_parts():
  run_test("extract_date_parts", "2025-05-11", expected_result=(2025, 5, 11))

def test_normalize_spaces():
  run_test("normalize_spaces", "This   is   a  test.", expected_result="This is a test.")

def test_extract_phone():
  run_test("extract_phone", "Call me at 123-456-7890 or 987-654-3210.", expected_result=['123-456-7890', '987-654-3210'])

accountants = [
    "Martha",
    "Arlene",
    "Betty"
]

years = [
    "2024",
    "2025"
]

departments = [
    "Marketing",
    "Human Resources",
    "IT"
]

companies = {
    "TechVision Solutions Ltd.": {"currency": "PLN", "departments": ["Marketing", "IT"]},
    "GreenLeaf Enterprises": {"currency": "EUR", "departments": ["Human Resources"]},
    "Blue Ocean Industries": {"currency": "PLN", "departments": ["Marketing"]},
    "Stellar Dynamics Corp": {"currency": "EUR", "departments": ["IT"]},
    "Quantum Innovations Inc.": {"currency": "PLN", "departments": ["Human Resources", "Marketing"]},
    "Sunrise Global Trading": {"currency": "EUR", "departments": ["IT"]},
    "Mountain Peak Ventures": {"currency": "PLN", "departments": ["Marketing"]},
    "Digital Frontier Systems": {"currency": "EUR", "departments": ["IT", "Human Resources"]},
    "Silver Arrow Technologies": {"currency": "PLN", "departments": ["Marketing"]},
    "Golden Gate Manufacturing": {"currency": "EUR", "departments": ["Human Resources"]},
    "Crystal Clear Solutions": {"currency": "PLN", "departments": ["IT"]},
    "RedRock Analytics": {"currency": "EUR", "departments": ["Marketing", "IT"]},
    "Evergreen Logistics": {"currency": "PLN", "departments": ["Human Resources"]},
    "Phoenix Data Services": {"currency": "EUR", "departments": ["IT"]},
    "Coastal Wave Industries": {"currency": "PLN", "departments": ["Marketing"]}
}

services = {
    "Marketing": [
        "Digital Advertising Campaign",
        "Market Research Study",
        "Brand Identity Design",
        "Social Media Management",
        "Content Creation Services"
    ],
    "Human Resources": [
        "Employee Training Program",
        "Recruitment Services",
        "HR Software Subscription",
        "Team Building Event",
        "Workplace Safety Consultation"
    ],
    "IT": [
        "Software License Renewal",
        "Cloud Storage Subscription",
        "Network Security Audit",
        "Hardware Maintenance",
        "IT Support Services"
    ]
}

def create_workshop_folders():
  for accountant in accountants:
    os.makedirs(f"invoices/{accountant}", exist_ok=True)
  for year in years:
    os.makedirs(f"invoices/{year}", exist_ok=True)
    for department in departments:
      os.makedirs(f"invoices/{year}/{department}", exist_ok=True)
    os.makedirs(f"invoices/{year}/Manual Verification", exist_ok=True)
  print("[📁] Workshop folders have been created.")

def reset_workshop_folders():
    shutil.rmtree("invoices", ignore_errors=True)
    print("[💣] Workshop folders have been removed.")
    create_workshop_folders()

reset_workshop_folders()

random.seed(230347)

def generate_random_date(start_date=datetime(2024, 1, 1), end_date=datetime(2025, 12, 31)):
  time_between = end_date - start_date
  days_between = time_between.days
  random_days = random.randrange(days_between)
  return start_date + timedelta(days=random_days)

def format_date(date):
  formats = [
    "%d-%m-%Y",  # DD-MM-YYYY
    "%Y-%m-%d",  # YYYY-MM-DD
    "%d.%m.%Y"   # DD.MM.YYYY
  ]
  return date.strftime(random.choice(formats))

def generate_invoice_content(company, service, date, amount, currency):
  markdown_content = f"""
# INVOICE

**Date:** {date}
**Company:** {company}
**Service:** {service}
**Amount:** {amount:.2f} {currency}

Thank you for your business!
  """
  return markdown_content

def create_random_invoices():
  num_invoices = 100
  for _ in range(num_invoices):
    company = random.choice(list(companies.keys()))
    currency = companies[company]["currency"]
    department = random.choice(departments)
    service = random.choice(services[department])
    if random.random() < 0.07:
      chars = list(service)
      if len(service) >= 3:
        pos = random.randint(0, len(chars)-1)
        op = random.choice(['change', 'delete', 'insert'])
        if op == 'change':
            chars[pos] = random.choice('abcdefghijklmnopqrstuvwxyz')
        elif op == 'delete':
            chars.pop(pos)
        else:
            chars.insert(pos, random.choice('abcdefghijklmnopqrstuvwxyz'))
      service = ''.join(chars)
    amount = random.uniform(1000, 10000)
    date = generate_random_date()
    formatted_date = format_date(date)
    content = generate_invoice_content(company, service, formatted_date, amount, currency)
    html_content = markdown.markdown(content)
    num_accountants = random.choices([1, 2, 3], weights=[0.85, 0.10, 0.05])[0]
    selected_accountants = random.sample(accountants, num_accountants)
    for accountant in selected_accountants:
      filename = f"invoices/{accountant}/invoice_{company.replace(' ', '_')}_{formatted_date}.pdf"
      HTML(string=html_content).write_pdf(filename)
  print(f"[📄] Generated {num_invoices} random invoices (some may be shared between accountants).")

create_random_invoices()

random.seed(None)

'''
with pdfplumber.open("./invoices/Martha/invoice_Blue_Ocean_Industries_06.03.2025.pdf") as pdf:
  text = pdf.pages[0].extract_text()
  print(text)
'''

with open('./invoices/companies.csv', 'w', newline='') as csvfile:
  writer = csv.writer(csvfile)
  writer.writerow(['Company', 'Departments'])
  for company, info in companies.items():
    # A separate name, so the global `departments` list is not overwritten.
    company_departments = ';'.join(info['departments'])
    writer.writerow([company, company_departments])
print("[📄] The companies CSV has been generated.")

print("[✅] The workshop notebook is initialised successfully.")

# 📓 Day 1: Clean Up Messy Folders

Today's lab contains:
- [ ] Python's code building blocks: functions and lists;
- [ ] the `os` module;
- [ ] regular expressions;
- [ ] the main goal is cleaning up the messy _invoices_ folder.

## Python's code building blocks: **Functions**

- Functions are defined using the `def` keyword followed by the function name and parentheses `()` containing optional parameters.
- To call (execute) a function, write its name followed by parentheses containing the arguments. Example: `greet("Alice")`.
- Use the `return` keyword to return a value from a function.
- Functions can have default parameter values.
- Arguments can be passed by name (as _keyword arguments_) to improve readability. Example: `greet(who="Alice")`.
- Variables defined inside a function are local to that function by default.
- Functions can be assigned to variables, passed as arguments, and returned from other functions.

Example:
```python
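def greet(who="Bob"):  # The `who` parameter has a default value.
  return "Hello, " + who  # Returns a value.

message = greet(who="Alice")  # Call the function with the keyword argument `who`.
print(message)  # Hello, Alice
print(greet())  # Hello, Bob (the default value is used)
# print(who)  # NameError: `who` is local to greet() and is not visible here.
fn = greet  # A function can be assigned to a variable...
print(fn("Carol"))  # ...and called through it: Hello, Carol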
```
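The example above does not show the last bullet in action. Here is a minimal sketch of functions passed as arguments and returned from other functions; `apply_twice` and `make_multiplier` are hypothetical names made up for this illustration:

```python
def apply_twice(fn, value):
  """Call `fn` on `value`, then call `fn` again on the result."""
  return fn(fn(value))

def make_multiplier(factor):
  """Return a new function that multiplies its argument by `factor`."""
  def multiplier(x):
    return x * factor  # `factor` is remembered from the enclosing call
  return multiplier

double = make_multiplier(2)            # a function returned from another function
print(double(7))                       # 14
print(apply_twice(double, 5))          # 20, i.e. double(double(5))
print(apply_twice(str.upper, "abc"))   # ABC: any function can be passed in
```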

---

Let's do some warm-up exercises on defining functions.
1. Define a function `multiply(a, b)` that takes two numbers and returns their product.
2. Define a function `full_name(first, last)` that takes a first name and a last name and returns them as a full name string, separated by a space.
3. Define a function `square(n)` that takes a number and returns its square.

In [None]:
def multiply():
  # Write your code here.
  return None

def full_name():
  # Write your code here.
  return None

def square():
  # Write your code here.
  return None

In [None]:
test_multiply()
test_full_name()
test_square()

## Python's code building blocks: **Lists**

- Lists are ordered, mutable collections of items, defined using square brackets `[]`. Example: `fruits = ["apple", "banana", "cherry"]`.
- Access items using zero-based indexing. Example: `fruits[0]` returns "apple".
- Use slicing to get a sublist. Example: `fruits[1:3]` returns `["banana", "cherry"]`.
- Lists can be modified after creation. Example: `fruits[1] = "blueberry"`.
- Use `.append()` to add an item at the end. Example: `fruits.append("orange")`.
- Use `.insert(index, item)` to insert at a specific position. Example: `fruits.insert(1, "mango")`.
- `.remove(value)` removes the first occurrence of `value`.
- `.pop(index)` removes and returns the item at the given index (the last item if no index is given).
- Use `len()` to get the number of items. Example: `len(fruits)`.
- Use `for` loops to iterate over list elements.
- _List Comprehension_ is a concise way to create lists. Example: `[x**2 for x in range(5)]` results in `[0, 1, 4, 9, 16]`.

Example:
```python
fruits = ["1" + s for s in ["apple", "banana", "cherry"]]
fruits.append("kiwi")
fruits.pop(1)
for fruit in fruits:
  print(fruit) # Prints: 1apple, 1cherry, kiwi

# Alternative way of iterating a list
for i in range(len(fruits)):
  print(fruits[i]) # Prints: 1apple, 1cherry, kiwi
```

More methods can be found [here](https://www.w3schools.com/python/python_ref_list.asp). **Please take a look now!**  
More about list slicing can be found [here](https://www.geeksforgeeks.org/python-list-slicing/). Check it if you need it!
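A few more list operations that often come in handy, sketched on a hypothetical list of file names (`file_names` and `pdfs` are illustrative names only):

```python
file_names = ["inv_a.pdf", "inv_b.txt", "inv_c.pdf", "inv_d.pdf"]

print(file_names[-1])   # inv_d.pdf (negative indices count from the end)
print(file_names[:2])   # ['inv_a.pdf', 'inv_b.txt']
print(file_names[::2])  # ['inv_a.pdf', 'inv_c.pdf'] (every second item)

# A comprehension with a condition keeps only the matching items.
pdfs = [name for name in file_names if name.endswith(".pdf")]
print(pdfs)             # ['inv_a.pdf', 'inv_c.pdf', 'inv_d.pdf']
print(len(pdfs))        # 3

# enumerate() yields (index, item) pairs.
for index, name in enumerate(pdfs):
  print(index, name)    # 0 inv_a.pdf, then 1 inv_c.pdf, then 2 inv_d.pdf

# Careful: indexing an empty list, e.g. [][0] or [][-1], raises IndexError,
# so check len() first when a list might be empty.
```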

---

Let's do some exercises to practice lists in Python.
1. Write a function `count_even(numbers)` that takes a list of integers and returns the count of even numbers.
2. Write a function `first_last(lst)` that returns a new list containing only the first and last elements of the input list.

In [None]:
def count_even(numbers: list) -> int: # Type hints (`: list`, `-> int`) document the expected types and help tools detect errors.
  # Write your code here.
  return -1

def first_last(lst: list) -> list:
  # Write your code here.
  return []

In [None]:
test_count_even()
test_first_last()

## The **os** module

- The `os` module provides a way to interact with the operating system (e.g., file system operations, environment variables).
- Import it before use: `import os`.
- `os.listdir(path)` lists the names of files and directories in the given path (in arbitrary order).
- `os.path.exists(path)` checks if a path exists.
- `os.path.isfile(path)` checks if it's a file.
- `os.path.isdir(path)` checks if it's a directory.
- `os.remove(path)` deletes a file.
- `os.mkdir(path)` creates a single directory.
- `os.makedirs(path)` creates intermediate directories as needed; pass `exist_ok=True` to avoid an error if the directory already exists.
- `os.rename(src, dst)` moves or renames a file or directory. Example: `os.rename("old.txt", "folder/new.txt")`.
- `os.path.join(part1, part2)` joins paths in a platform-independent way.
- `os.path.basename(path)` returns the last component of a path, e.g. `os.path.basename("invoices/Martha/invoice.pdf")` returns `"invoice.pdf"`.

There are numerous functions in the `os` module. You can find an overview [here](https://www.w3schools.com/python/module_os.asp). Check it if you need it!  
More information about `os.path` can be found [here](https://docs.python.org/3/library/os.path.html).
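The functions above can be combined as in the sketch below. It works inside a temporary folder created with the standard `tempfile` module (an addition, not part of the lab), so it does not touch the workshop files; the folder and file names are hypothetical.

```python
import os
import tempfile

# Work in a throwaway temporary folder, so no real files are touched.
with tempfile.TemporaryDirectory() as root:
  inbox = os.path.join(root, "inbox")
  archive = os.path.join(root, "archive", "2025")
  os.mkdir(inbox)                      # one folder; its parent must already exist
  os.makedirs(archive, exist_ok=True)  # nested folders in one call

  src = os.path.join(inbox, "report.txt")
  open(src, "w").close()               # create an empty file

  listing = os.listdir(inbox)
  print(listing)                                    # ['report.txt']
  print(os.path.isfile(src), os.path.isdir(inbox))  # True True

  # Move the file to the archive, keeping its base name ("report.txt").
  dst = os.path.join(archive, os.path.basename(src))
  os.rename(src, dst)
  moved = (os.path.exists(src), os.path.exists(dst))
  print(moved)                                      # (False, True)
# Leaving the `with` block deletes the temporary folder and its contents.
```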

---

Let's do some exercises to practice selected methods of the `os` module.
1. Write a function `list_files(path)` that prints the names of all files (not directories) in a given folder.
2. Write a function `make_folder(folder_name)` that checks if a directory exists, and if not, creates it.
3. Write a function `move_file(src, dst_folder)` that moves a file to a specified folder. If the folder does not exist, create it.

In [None]:
import os

def list_files(path: str) -> None:
  # Write your code here.
  print("Remove me!")

def make_folder(folder_name: str) -> None:
  # Write your code here.
  pass # The `pass` is used here and later as a placeholder; remove it when you do an exercise.

def move_file(src: str, dst_folder: str) -> None:
  # Write your code here.
  pass

In [None]:
test_list_files()
test_make_folder()
test_move_file()

## Regular expressions

- Regular expressions (regex) are sequences of characters that define search patterns, commonly used for string matching, validation, and extraction.
- Plain characters match themselves. Example: `cat` matches `"cat"`.
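- Special characters (metacharacters) such as `. ^ $ * + ? { } [ ] \ | ( )` have specific meanings in patterns. To match one of them literally, escape it with a backslash, e.g. `\.` matches a dot.
- `[abc]` matches any one of `'a'`, `'b'`, or `'c'`.
- `[^abc]` matches any character except `'a'`, `'b'`, or `'c'`.
- Quantifiers:
  - `*` = 0 or more,
  - `+` = 1 or more,
  - `?` = 0 or 1,
  - `{n}` = exactly n times,
  - `{n,}` = n or more times,
  - `{n,m}` = between n and m times.
- `.` matches any character except newline.
- `\d` matches a digit, `\w` a "word" character (letter, digit, or underscore), and `\s` a whitespace character (space, tab, newline).
- `^` and `$` anchor a pattern to the start and the end of the string.
- `|` acts like a logical OR. Example: `cat|dog` matches `"cat"` or `"dog"`.
- Parentheses `()` create groups: they extract parts of a match and let a quantifier apply to a whole subpattern. Example: `(ab)+` matches one or more occurrences of `"ab"`.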

### Regular expressions in Python (aka the **re** module)

- Importing: `import re`.
- `re.search(pattern, string)` returns a match object for the first match anywhere in the string, or `None`. The similar function `re.match(pattern, string)` matches only at the beginning of the string, and `re.fullmatch(pattern, string)` succeeds only if the entire string matches the pattern.
- `re.findall(pattern, string)` returns a list of all non-overlapping matches (if the pattern contains groups, the captured groups are returned instead of the whole matches).
- `re.sub(pattern, replacement, string)` replaces all matches with the replacement.
- `re.split(pattern, string)` splits a string wherever the pattern matches.
- `match.group(n)` returns the text matched by group `n`; `match.group(0)` (or `match.group()`) is the whole match.
- `match.groups()` returns a tuple of all captured groups.
- Flags modify regex behaviour, e.g. `re.IGNORECASE` makes matching case-insensitive: `re.search("cat", "CAT", flags=re.IGNORECASE)`.

Example:
```python
import re

m = re.search(r"(\d+)-(\d+)", "Date: 2024-05")
print(m.group(0))  # "2024-05"
print(m.group(1))  # "2024"
print(m.group(2))  # "05"

m = re.search(r"(\w+)-(\w+)", "Item-A12")  # `\w+` (not `\d+`), since "A12" starts with a letter
print(m.groups())  # ("Item", "A12")
```

More information about regex in Python can be found [here](https://www.w3schools.com/python/python_regex.asp). Check it if you need it!
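A short sketch of `re.fullmatch()`, `re.findall()`, `re.split()`, `re.sub()` and a flag, applied to an invoice-like file name; the strings and variable names are illustrative only:

```python
import re

filename = "invoice_Evergreen_Logistics_21.01.2024.pdf"

# fullmatch(): the WHOLE string must fit the pattern.
is_invoice_pdf = re.fullmatch(r"invoice_.*\.pdf", filename) is not None
print(is_invoice_pdf)                                 # True
print(re.fullmatch(r"invoice_.*", "my_invoice.pdf"))  # None (does not start with "invoice_")

# findall(): every non-overlapping match, as a list of strings.
words = re.findall(r"[A-Z][a-z]+", filename)
print(words)    # ['Evergreen', 'Logistics']

# split(): cut the string wherever the pattern matches.
parts = re.split(r"[_.]", "Evergreen_Logistics.pdf")
print(parts)    # ['Evergreen', 'Logistics', 'pdf']

# sub(): replace every match; here each run of underscores becomes one space.
cleaned = re.sub(r"_+", " ", "Evergreen__Logistics_Ltd")
print(cleaned)  # Evergreen Logistics Ltd

# Flags: IGNORECASE lets the lowercase pattern match "INVOICE".
found = re.search(r"invoice", "INVOICE_2024.pdf", flags=re.IGNORECASE)
print(found.group())  # INVOICE
```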

---

Let's do some exercises to practice regular expressions and the `re` module.
1. Write a function `is_valid_email(email)` that returns `True` if the email is in a valid format like `name@domain.com`, otherwise `False`. Hint: Use `re.fullmatch()` with a basic email pattern.
2. Write a function `extract_numbers(text)` that returns a list of all numbers found in a string. Hint: Use `re.findall()` with `\d+`.
3. Write a function `extract_date_parts(date_string)` that uses groups to return a tuple `(year, month, day)` of integers from a string like `"2025-05-11"`, e.g. `(2025, 5, 11)`. A tuple in Python is an immutable, ordered collection of elements, defined using parentheses `()`.
4. Write a function `normalize_spaces(text)` that replaces all consecutive spaces or tabs with a single space.
5. Write a function `extract_phone(text)` that returns all phone numbers in the format `XXX-XXX-XXXX`.

In [None]:
import re

def is_valid_email(email: str) -> bool:
  # Write your code here.
  return False

def extract_numbers(text: str) -> list:
  # Write your code here.
  return []

def extract_date_parts(date_string: str) -> tuple:
  # Write your code here.
  return (0, 0, 0)

def normalize_spaces(text: str) -> str:
  # Write your code here.
  return ""

def extract_phone(text: str) -> list:
  # Write your code here.
  return []

In [None]:
test_is_valid_email()
test_extract_numbers()
test_extract_date_parts()
test_normalize_spaces()
test_extract_phone()

## Cleaning up the messy folders

A company wants to automate assigning invoices to a business unit, i.e. a unit within the company: `Marketing`, `Human Resources`, and `IT`. Invoices are received by the accounting department, which decides which unit will be charged based on the information inside the invoices.

Currently, three accountants handle the assignment: `Arlene`, `Betty` and `Martha`.

The project's ultimate goal is a computer program that can partly replace this manual process.

The current, messy folder structure:
- Each accountant has a folder with the invoices she works on.
- There are two years: 2024 and 2025.
- Beware of duplicates: the same invoice may appear in more than one accountant's folder.
- The name of a PDF file (an invoice) contains a date in one of three formats:
  - `DD-MM-YYYY`, e.g. `invoice_Blue_Ocean_Industries_15-01-2025.pdf`;
  - `YYYY-MM-DD`, e.g. `invoice_Coastal_Wave_Industries_2025-07-20.pdf`;
  - `DD.MM.YYYY`, e.g. `invoice_Evergreen_Logistics_21.01.2024.pdf`.
- **Goal:** Organise the folder structure. The invoices must be moved to folders with the corresponding year. Remove duplicates. Show the instructor how many invoices are placed in the target folders (i.e., 2024 and 2025), which completes Day 1.
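One building block of this task is reading the year from a file name in any of the three date formats. A minimal sketch, assuming the file names follow the examples above; `DATE_PATTERNS` and `year_from_filename` are hypothetical names, and the patterns use _named groups_ `(?P<name>...)`, which are read back with `match.group("name")`:

```python
import re

# Hypothetical patterns for the three date formats listed above.
DATE_PATTERNS = [
  r"(?P<day>\d{2})-(?P<month>\d{2})-(?P<year>\d{4})",    # DD-MM-YYYY
  r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",    # YYYY-MM-DD
  r"(?P<day>\d{2})\.(?P<month>\d{2})\.(?P<year>\d{4})",  # DD.MM.YYYY
]

def year_from_filename(filename: str):
  """Return the year (as a string) found in `filename`, or None."""
  for pattern in DATE_PATTERNS:
    match = re.search(pattern, filename)
    if match:
      return match.group("year")
  return None

for name in [
  "invoice_Blue_Ocean_Industries_15-01-2025.pdf",
  "invoice_Coastal_Wave_Industries_2025-07-20.pdf",
  "invoice_Evergreen_Logistics_21.01.2024.pdf",
]:
  print(name, "->", year_from_filename(name))  # 2025, 2025, 2024
```

The year comes back as a string, which is also how the year folders are named. Assuming the company part of a name contains no digits, a file name can match at most one of the three patterns, so the order of `DATE_PATTERNS` does not change the result.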

In [None]:
# Write your code here.

# 📓 Day 2: Unclutter Invoices

Today's lab contains:
- [ ] Python's code building blocks: dictionaries and string manipulation;
- [ ] loading data from CSV;
- [ ] the _pdfplumber_ library;
- [ ] the main goal is assigning invoices to the units' folders.

## Python's code building blocks: **Dictionaries**

- A dictionary is a mutable collection of key-value pairs. Since Python 3.7, dictionaries keep their items in insertion order.
- Dictionaries are defined using curly braces `{}`, with a colon between each key and its value and commas between the pairs.
- Use square brackets or the `.get()` method to access values. `d[key]` raises a `KeyError` if the key is missing, while `d.get(key)` returns `None` (or a default you pass, e.g. `d.get(key, 0)`).
- To add or update a value, assign it to a key.
- _Keys must be immutable_ (more precisely, hashable): strings, numbers, or tuples of immutable items can be keys, but lists and other dictionaries cannot.
- To check whether a key exists, use the `in` keyword.
- There are multiple dictionary methods, like `.keys()`, `.values()`, `.items()`, `.pop()`, `.update()`. Read more [here](https://www.w3schools.com/python/python_dictionaries_methods.asp).
- Dictionaries can contain other dictionaries.

Example:
```python
my_dict = {"name": "Alice", "age": 30}
my_dict["name"]     # "Alice"  
my_dict.get("age")  # 30
my_dict["city"] = "Paris"
del my_dict["age"]  # or use .pop("age")

for key, value in my_dict.items():
  print(key, value) # prints: name Alice, city Paris

"name" in my_dict  # True
```
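Nested dictionaries and counting with `.get()` are handy for this week's data. A small sketch on hypothetical data (`suppliers` and `unit_counts` are illustrative names, chosen so they don't clash with anything in the notebook):

```python
suppliers = {
  "Acme Ltd.": {"currency": "PLN", "units": ["IT"]},
  "Globex": {"currency": "EUR", "units": ["Marketing", "IT"]},
}

print(suppliers["Globex"]["currency"])  # EUR (nested access)
print(suppliers.get("Initech"))         # None (missing key, but no KeyError)
print(suppliers.get("Initech", {}).get("currency", "unknown"))  # unknown

# Count how many suppliers sell to each unit.
unit_counts = {}
for name, info in suppliers.items():
  for unit in info["units"]:
    unit_counts[unit] = unit_counts.get(unit, 0) + 1
print(unit_counts)  # {'IT': 2, 'Marketing': 1}
```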

---

Let's do some exercises to practice dictionaries.
1. Write a function `invert_dict(d)` that swaps keys and values in a dictionary.
Assume all values are unique and hashable.
2. Write a function `merge_dicts(d1, d2)` that returns a new dictionary combining both.
If a key exists in both, the value from `d2` should override.

In [None]:
def invert_dict(d: dict) -> dict:
  # Write your code here.
  return {}

def merge_dicts(d1: dict, d2: dict) -> dict:
  # Write your code here.
  return {}

## Get the invoices right

- Load the companies list from the CSV file `./invoices/companies.csv`. It has the columns `Company` and `Departments`; the second one lists the units the company sells services to, separated by semicolons (e.g. `Marketing;IT`).
- There are three units: `Marketing`, `Human Resources` and `IT`.
- If a company sells to a single unit only, move all its invoices to that unit's folder.
- A company name can be taken from the PDF file, either from the `Company:` field of the invoice text or from the file name (where spaces are replaced by underscores).
- If a company sells to more than one unit, look for a keyword in the invoice's `Service:` field.
- Keywords for units:
  - Marketing: `Digital Advertising Campaign`, `Market Research Study`, `Brand Identity Design`, `Social Media Management`, `Content Creation Services`;
  - Human Resources: `Employee Training Program`, `Recruitment Services`, `HR Software Subscription`, `Team Building Event`, `Workplace Safety Consultation`;
  - IT: `Software License Renewal`, `Cloud Storage Subscription`, `Network Security Audit`, `Hardware Maintenance`, `IT Support Services`.
- Assign each invoice to the corresponding unit folder based on the keyword found.
- **Goal:** Within each year folder (2024 and 2025), move every invoice to the folder of its identified unit. If you cannot determine a unit, move the invoice to the `Manual Verification` folder.
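A possible starting point for the steps above, sketched with the standard `csv` module: load the CSV into a dictionary and look up a unit by keyword. `load_company_units`, `UNIT_KEYWORDS`, `KEYWORD_TO_UNIT` and `unit_for_service` are hypothetical names, and the sketch assumes the CSV layout described above (columns `Company` and `Departments`, units joined with `;`).

```python
import csv

def load_company_units(path: str) -> dict:
  """Read the companies CSV into {company name: [unit, ...]}."""
  company_units = {}
  with open(path, newline="") as csvfile:
    for row in csv.DictReader(csvfile):  # the header row provides the keys
      company_units[row["Company"]] = row["Departments"].split(";")
  return company_units

# The keyword lists above, grouped by unit...
UNIT_KEYWORDS = {
  "Marketing": ["Digital Advertising Campaign", "Market Research Study",
                "Brand Identity Design", "Social Media Management",
                "Content Creation Services"],
  "Human Resources": ["Employee Training Program", "Recruitment Services",
                      "HR Software Subscription", "Team Building Event",
                      "Workplace Safety Consultation"],
  "IT": ["Software License Renewal", "Cloud Storage Subscription",
         "Network Security Audit", "Hardware Maintenance", "IT Support Services"],
}
# ...and inverted into {keyword: unit} for direct lookup.
KEYWORD_TO_UNIT = {
  keyword: unit for unit, keywords in UNIT_KEYWORDS.items() for keyword in keywords
}

def unit_for_service(service_text: str):
  """Return the unit of the first keyword contained in `service_text`, or None."""
  for keyword, unit in KEYWORD_TO_UNIT.items():
    if keyword in service_text:
      return unit
  return None
```

For example, `company_units = load_company_units("./invoices/companies.csv")`. To get the invoice text to search, the set-up cell contains a commented-out _pdfplumber_ snippet: open the file with `pdfplumber.open(path)` and call `extract_text()` on `pdf.pages[0]`.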

In [None]:
# Write your code here.

# 📓 Day 3: Improve Cleaning Algorithm

Today's lab contains:
- [ ] Python's code building blocks: HTTP requests, JSON parsing;
- [ ] NBP API;
- [ ] the Levenshtein distance;
- [ ] the main goal is improving the algorithm to meet new business logic.

## Improve the cleaning algorithm

- New rule: all invoices over 10000 PLN must be moved to the `Manual Verification` folder.
- Note that some invoices are in EUR, so we must convert the currency.
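- Use the NBP (National Bank of Poland) API to convert amounts to PLN, e.g. https://api.nbp.pl/api/exchangerates/rates/A/EUR/2025-01-02?format=json returns the EUR average rate for 2 January 2025. NBP publishes rates only on working days; for other dates the API responds with HTTP 404.
- Some service names in the invoices contain small spelling errors (typos).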
- New rule: use the Levenshtein distance to match keywords. The maximum allowed distance is 2.
- **Goal:** Improvement of the cleaning procedure to meet all new requirements.
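A sketch of the two new ingredients, using only the standard library (`urllib.request` and `json`; the `requests` package would work too). It assumes the NBP response has the shape `{"rates": [{"mid": ...}]}`, as for the example URL above, and computes the Levenshtein distance with the classic dynamic-programming table, kept one row at a time. All function names are hypothetical.

```python
import json
from urllib.request import urlopen

# URL layout taken from the example above; {code} and {day} are placeholders.
NBP_RATE_URL = "https://api.nbp.pl/api/exchangerates/rates/A/{code}/{day}?format=json"

def parse_mid_rate(payload) -> float:
  """Return the average ("mid") rate from an NBP table-A JSON response."""
  data = json.loads(payload)
  return data["rates"][0]["mid"]

def fetch_mid_rate(code: str, day: str) -> float:
  """Download the rate of `code` (e.g. "EUR") for `day` ("YYYY-MM-DD").
  Raises urllib.error.HTTPError (404) if NBP published no rate that day."""
  with urlopen(NBP_RATE_URL.format(code=code, day=day)) as response:
    return parse_mid_rate(response.read())

def levenshtein(a: str, b: str) -> int:
  """Minimum number of single-character insertions, deletions and
  substitutions needed to turn `a` into `b`."""
  previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
  for i, char_a in enumerate(a, start=1):
    current = [i]  # distance from a[:i] to ""
    for j, char_b in enumerate(b, start=1):
      cost = 0 if char_a == char_b else 1
      current.append(min(
        previous[j] + 1,         # delete char_a
        current[j - 1] + 1,      # insert char_b
        previous[j - 1] + cost,  # substitute (free if the characters match)
      ))
    previous = current
  return previous[-1]

def closest_keyword(text: str, keywords: list, max_distance: int = 2):
  """Return the keyword closest to `text` if within `max_distance`, else None."""
  if not keywords:
    return None
  best = min(keywords, key=lambda keyword: levenshtein(text, keyword))
  return best if levenshtein(text, best) <= max_distance else None

print(levenshtein("kitten", "sitting"))  # 3
print(closest_keyword("Hardware Maintenanse", ["Hardware Maintenance", "IT Support Services"]))  # Hardware Maintenance
```

An amount in EUR is then converted with `amount_eur * fetch_mid_rate("EUR", day)`. Compare only the text of the `Service:` field with the keywords: the distance to the whole invoice text would always be far above 2.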

In [None]:
# Write your code here.

# 📓 Day 4: Make A Report For The Boss

Today's lab contains:
- [ ] HTML code building blocks;
- [ ] the _weasyprint_ library;
- [ ] data processing;
- [ ] the main goal is generating a PDF report for each year.

## Generating reports

- For each unit:
  - show the total number of invoices (including those moved to the `Manual Verification` folder);
  - show the total sum of invoices in PLN;
- Include a table of the invoices that require manual verification, sorted by amount.
- A report should be created in HTML and then generated as a PDF.
- **Goal:** For each year, generate a report (PDF) summarising the invoices' cleaning.

Hint: You can extend the code from previous days.
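A minimal sketch of the report step, assuming you have already computed, per unit, the invoice count and the PLN total, plus a list of invoices for manual verification. The function names and input shapes are hypothetical, and sorting the manual-verification table from the largest amount down is an assumption (the task only says "by amount"). The PDF call mirrors the one in the set-up cell: `HTML(string=...).write_pdf(...)`.

```python
from html import escape

def build_report_html(year, unit_totals: dict, manual_rows: list) -> str:
  """unit_totals: {unit: (invoice_count, total_pln)};
  manual_rows: [(file_name, amount_pln), ...] needing manual verification."""
  unit_rows = "".join(
    f"<tr><td>{escape(unit)}</td><td>{count}</td><td>{total:.2f} PLN</td></tr>"
    for unit, (count, total) in unit_totals.items()
  )
  # Assumption: largest amounts first.
  manual_table_rows = "".join(
    f"<tr><td>{escape(name)}</td><td>{amount:.2f} PLN</td></tr>"
    for name, amount in sorted(manual_rows, key=lambda row: row[1], reverse=True)
  )
  return f"""<html><body>
<h1>Invoice report {year}</h1>
<h2>Units</h2>
<table>
<tr><th>Unit</th><th>Invoices</th><th>Total</th></tr>
{unit_rows}
</table>
<h2>Invoices for manual verification</h2>
<table>
<tr><th>File</th><th>Amount</th></tr>
{manual_table_rows}
</table>
</body></html>"""

def write_report_pdf(html_text: str, pdf_path: str) -> None:
  """Render the HTML to a PDF file with WeasyPrint."""
  from weasyprint import HTML  # imported here so building the HTML works without it
  HTML(string=html_text).write_pdf(pdf_path)
```

For example, `write_report_pdf(build_report_html(2024, totals_2024, manual_2024), "report_2024.pdf")`, where `totals_2024` and `manual_2024` hold your own results.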

In [None]:
# Write your code here.

# 📓 Day 5: Do The Final Project

In pairs, prepare a 5-minute project pitch 💬. Based on the process you have learnt this week, propose an automation of your own. It can be related to learning, a hobby, or anything that is important to you.
