**DISCLAIMER:**

This code was written only to make it easier for me to migrate the slides to a new system. It has not been extensively tested and is not guaranteed to work in all cases. Use at your own risk.

I'm keeping this notebook because I think it may be useful; if you disagree, feel free to delete it.

If you have questions, feel free to [reach out](mailto:tarkhanyan02@gmail.com).

This file includes:
1. A ChatGPT prompt that helps fix the hanging-words issue
2. A script to create a BibTeX file from `\citebutton` occurrences
3. A script to update legacy Beamer .tex files
4. A script to rename PDFs (internal, probably not needed later)
5. A script to put the old and new slides next to each other on the same page
6. A script to merge all slides in a folder into one file (while preserving the bookmarks)
7. A script to count the files in every sub-directory

Note that this notebook was originally located in the root of the repo. If you decide to use it, either move it back there or change the default paths.

# ChatGPT prompt that helps fix hanging words issue 

Objective: Improve text layout by preventing isolated words from appearing alone on a new line (e.g., due to line wrapping). To achieve this:

Try to shorten the sentence while preserving the original meaning.

Use common abbreviations or reformulate phrases if that helps reduce length.

If the sentence cannot be shortened meaningfully, suggest a slightly longer version so that more words move to the next line, minimizing the isolation issue.

<INSTRUCTION>

1. Always preserve the semantic meaning and tone.
2. If abbreviations are used, ensure they are commonly understood or intuitive in context.
3. Output both the shortened version (if successful) and, if not possible, a longer alternative that balances the layout better.
4. Always suggest at least 2 variants.
5. Don't explain yourself

# Auto-migrate whatever is possible
Before running the code, make sure the necessary libraries are installed. `pathlib` ships with Python 3, so it does not need to be installed; only `ipykernel` is required here:

```bash
pip install ipykernel
```

**Internal note**:
ChatGPT thread where the two functions below were created:
https://chatgpt.com/share/e/68752101-7ddc-8012-911b-5dfa825e7b79

## Create BibTeX
Everything that was cited with `\citebutton` should now be referenced with `\sourceref` or `\furtherreading`, both of which take their entries from the BibTeX file. The script below gathers all the references for a given chunk and saves them to a `references.bib` file.
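To make the key heuristic concrete, here is a self-contained sketch that restates `_CITE_RE`, `_guess_author` and `_make_key` from the cell below and applies them to two made-up `\citebutton` calls. `guess_key` is a hypothetical helper, not part of the script, and the labels and `example.org` URLs are invented for illustration.

```python
import re

# Same patterns as _CITE_RE, _YEAR_RE and _CAP_WORD_RE in the cell below
CITE_RE = re.compile(r"\\citebutton\{([^}]*)\}\{([^}]*)\}")
YEAR_RE = re.compile(r"(19|20)\d{2}")
CAP_WORD_RE = re.compile(r"[A-Z][A-Za-z]+")


def guess_key(label: str, url: str) -> str:
    """Hypothetical helper restating _guess_author + _make_key in one function."""
    # Year: first 19xx/20xx in the label, else in the URL
    year_match = YEAR_RE.search(label) or YEAR_RE.search(url)
    year = year_match.group(0) if year_match else ""
    # Author: last capitalized word before a year in the label ...
    author = ""
    for m in YEAR_RE.finditer(label):
        words = CAP_WORD_RE.findall(label[: m.start()])
        if words:
            author = words[-1]
            break
    # ... else the first capitalized word, else a short slug of the label
    if not author:
        first = CAP_WORD_RE.search(label)
        slug = re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_") or "ref"
        author = first.group(0) if first else slug[:8]
    return f"{author}_{year}" if year else author


text = (
    r"See \citebutton{Goodfellow et al. (2014)}{https://example.org/a}"
    r" and \citebutton{Molnar: Interpretable ML}{https://example.org/b}"
)
keys = [guess_key(m.group(1).strip(), m.group(2).strip()) for m in CITE_RE.finditer(text)]
print(keys)  # ['Goodfellow_2014', 'Molnar']
```

Since only the surname and year end up in the key, two different works by the same author in the same year get the same key, and the script keeps only the first one.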


In [1]:
#!/usr/bin/env python3
"""
– build *fresh* `references.bib` from custom
\citebutton occurrences in a folder’s .tex sources.

Key behaviour
-------------
* **Non‑recursive** scan of `.tex` files whose basename begins with an optional
    prefix.  The prefix is removed in the comment banner that precedes each
    group of references.
* Creates (or overwrites) `auto/references.bib` in the same folder.
* Deduplicates keys **within the current run** so the output never contains
    duplicates, even if different files cite the same link.
* Each BibTeX entry now contains the fields you requested:

        % \citebutton{<...original...>}
        @article{KEY,
                author = {<guessed author>},
                title  = {<first argument of \citebutton>},
                year   = {YYYY},
                url    = {<second argument>}
        }

    Author is heuristically extracted from the label text (the first capitalized
    token preceding a 4‑digit year, or the first capitalized word if no year is
    present).  You can refine `_guess_author()` if needed.
* The script logs **which file it is scanning** and lists all reference keys
    extracted from that file.  Logs go to both console and
    `auto/references_build.log`.
"""
from __future__ import annotations

import logging
import os
import pathlib
import re
from typing import List, Set

# --------------------------------------------------------------------------- #
# 1. Regex helpers                                                            #
# --------------------------------------------------------------------------- #
_CITE_RE = re.compile(r"\\citebutton\{([^}]*)\}\{([^}]*)\}")
_YEAR_RE = re.compile(r"(19|20)\d{2}")
_CAP_WORD_RE = re.compile(r"[A-Z][A-Za-z]+")

# --------------------------------------------------------------------------- #
# 2. Helper functions                                                         #
# --------------------------------------------------------------------------- #

def _slug(text: str) -> str:
    """Convert arbitrary text into a safe ASCII slug (for BibTeX key)."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_") or "ref"


def _guess_author(label: str) -> str:
    """Try to infer an author surname from the label text."""
    # First, look for the last capitalized word *before* a year.
    for m in _YEAR_RE.finditer(label):
        pre = label[: m.start()].strip()
        if pre:
            words = _CAP_WORD_RE.findall(pre)
            if words:
                return words[-1]
    # Fall back: first capitalized word anywhere.
    m = _CAP_WORD_RE.search(label)
    return m.group(0) if m else _slug(label)[:8]


def _make_key(author: str, year: str) -> str:
    """Return AUTHOR_YEAR if year present, else AUTHOR_XXXX."""
    if year:
        return f"{author}_{year}"
    return author

# --------------------------------------------------------------------------- #
# 3. Main runner                                                              #
# --------------------------------------------------------------------------- #

def create_bibtex(folder: str | os.PathLike[str], prefix: str = "slides", 
                  save_in_new_dir: bool = False) -> None:    
    """Extract citebutton references and write `references.bib`."""
    folder = pathlib.Path(folder).expanduser().resolve()
    if save_in_new_dir:
        auto_dir = folder / "auto"
    else:
        auto_dir = folder
    auto_dir.mkdir(exist_ok=True)

    bib_path = auto_dir / "references.bib"
    log_path = auto_dir / "references_build.log"

    logging.basicConfig(
        level=logging.INFO,
        format="%(levelname)s | %(message)s",
        handlers=[
            logging.StreamHandler(),
            # logging.FileHandler(log_path, encoding="utf-8", mode="w"),
        ],
    )
    logger = logging.getLogger("citebutton2bib")

    added_keys: Set[str] = set()
    output_lines: List[str] = []

    # sorted() makes the output order reproducible across runs and platforms
    for tex_path in sorted(folder.glob(f"{prefix}*.tex")):
        if tex_path.is_dir():
            continue
        logger.info("Scanning %s", tex_path.name)
        content = tex_path.read_text(encoding="utf-8")
        matches = list(_CITE_RE.finditer(content))
        if not matches:
            logger.info("No citebuttons found in %s", tex_path.name)
            continue

        # Strip prefix from filename for the comment banner
        display_name = (
            tex_path.name[len(prefix):]
            if prefix and tex_path.name.startswith(prefix)
            else tex_path.name
        )

        file_block: List[str] = [f"% ================= {display_name} ================="]
        keys_in_file: List[str] = []

        for m in matches:
            label, url = m.group(1).strip(), m.group(2).strip()
            year_match = _YEAR_RE.search(label) or _YEAR_RE.search(url)
            year_val = year_match.group(0) if year_match else ""
            author_val = _guess_author(label)
            key = _make_key(author_val, year_val)

            if key in added_keys:
                logger.info("SKIP duplicate key %s", key)
                continue

            original_cite = m.group(0)
            entry = [
                f"% {original_cite}",
                f"@article{{{key},",
                f"    author = {{{author_val}}},",
                f"    title  = {{{label}}},",
                f"    year   = {{{year_val}}},",
                f"    url    = {{{url}}}",
                "}",
                "",  # blank line between entries
            ]
            file_block.extend(entry)
            added_keys.add(key)
            keys_in_file.append(key)

        if keys_in_file:
            output_lines.extend(file_block)
            logger.info(
                "Added %d reference(s) from %s – keys: %s",
                len(keys_in_file),
                tex_path.name,
                ", ".join(keys_in_file),
            )

    if not output_lines:
        logger.info("No citebutton occurrences found in any file. Nothing to write.")
        return

    bib_path.write_text("\n".join(output_lines), encoding="utf-8")
    logger.info("Wrote %d unique reference(s) → %s", len(added_keys), bib_path.name)
    logger.info("Done.")


## Update .tex files
The script below automatically:

1. changes the `\documentclass` line to drop `aspectratio=169` (switching to the new aspect ratio) and reduce the font size from 11pt to 10pt
2. replaces the `lmu-lecture` package include with `\input`s of the preamble, `basic-math.tex` and `basic-ml.tex`
3. changes `\citebutton` to `\furtherreading`
4. converts the title block to the `\titlemeta` macro
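As an illustration of step 1, the sketch below applies the same `\documentclass` pattern and replacement line as `_docclass_re` / `_fix_docclass` to a legacy header. The legacy line is an assumption reconstructed from the regex, not copied from a real slide file. As in the script, the replacement is passed as a function, so that `re` does not try to parse `\d` in `\documentclass` as an escape sequence.

```python
import re

# Same pattern and replacement line as _docclass_re / _fix_docclass below
DOCCLASS_RE = re.compile(
    r"^(\\documentclass\[\s*)11pt,(compress,t,notes=noshow,\s*aspectratio=169,\s*xcolor=table\](\s*\{beamer\}))",
    re.M,
)
NEW_LINE = r"\documentclass[10pt,compress,t,notes=noshow, xcolor=table]{beamer}"

# Assumed legacy header (reconstructed from the regex, not from a real file)
legacy = (
    r"\documentclass[11pt,compress,t,notes=noshow, aspectratio=169, xcolor=table]{beamer}"
    "\n"
    r"\begin{document}"
)

# A function as replacement keeps re from interpreting backslashes in NEW_LINE
migrated, n = DOCCLASS_RE.subn(lambda m: NEW_LINE, legacy)
print(n)                         # 1
print(migrated.splitlines()[0])  # \documentclass[10pt,compress,t,notes=noshow, xcolor=table]{beamer}

# The pattern is strict: a space after "11pt," already prevents a match
strict = r"\documentclass[11pt, compress,t,notes=noshow, aspectratio=169, xcolor=table]{beamer}"
print(DOCCLASS_RE.search(strict))  # None
```

Because the pattern is this strict, a header with different spacing, a different option order or a missing `xcolor=table` is left unchanged, so it is worth checking the migrated headers by eye.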

Some notes:
- With `in_place=True`, moves the original files to a folder called `before_migration` and writes the migrated files **in place**; with `in_place=False`, writes `<name>_auto.tex` copies to an `auto/` subfolder instead.
- Can optionally keep the old code as comments in the new file. Known issue: if a `\citebutton` was already commented out, this still produces an uncommented `\furtherreading` line.
- You can specify a prefix for the files to be migrated; the default is `slides`.
- You can also specify the main title, since the first slide now has both a title and a subtitle. It is read from the global `TITLE_MAIN` set in the run cells.
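The known issue with already commented-out `\citebutton`s can be reproduced in a few lines. The sketch below mimics `_fix_citebutton(with_comments=True)` with a fixed key (the real script derives it via `_make_key`) and adds one possible, untested workaround, the hypothetical `replace_skipping_commented_lines`, which leaves lines that already start with `%` untouched. The label and URL are made up.

```python
import re

CITE_RE = re.compile(r"\\citebutton\{([^}]*)\}\{([^}]*)\}")  # same as _CITE_RE


def replace_with_comments(text: str, key: str) -> str:
    """Mimic _fix_citebutton(with_comments=True), using a fixed key for brevity."""
    return CITE_RE.sub(
        lambda m: f"% OLD\n%{m.group(0)}\n% NEW\n\\furtherreading{{{key}}}", text
    )


def replace_skipping_commented_lines(text: str, key: str) -> str:
    """Hypothetical workaround: leave lines that start with '%' untouched."""
    out = []
    for line in text.split("\n"):
        if line.lstrip().startswith("%"):
            out.append(line)  # already commented out: keep as-is
        else:
            out.append(replace_with_comments(line, key))
    return "\n".join(out)


src = r"% \citebutton{Dandl 2020}{https://example.org/paper}"
# The last output line, \furtherreading{Dandl_2020}, is no longer commented out
print(replace_with_comments(src, "Dandl_2020"))
# The workaround returns the commented line unchanged
print(replace_skipping_commented_lines(src, "Dandl_2020"))
```

This workaround only handles whole-line comments; a `%` later on the same line (e.g. `text % \citebutton{...}{...}`) is not detected.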

In [2]:
r"""
Update legacy Beamer .tex files.

For each top-level .tex file whose basename starts with an optional *prefix*,
this script performs the following migrations in order:

1. **\documentclass** – convert the 11 pt / aspectratio=169 variant to the new
   10 pt style (optionally keeping the old line as a comment).
2. **\usepackage → \input** – replace the legacy lmu-lecture package include.
3. **Title block → \titlemeta{…}** – collapse the multi-command header into the
   new macro.
4. **\citebutton{LABEL}{URL} → \furtherreading{KEY}** – replace every custom
   cite button with a reference to a BibTeX entry produced by `create_bibtex`
   in the previous cell.  The key is built with the same helpers
   (`_CITE_RE`, `_YEAR_RE`, `_guess_author`, `_make_key`), so that cell must be
   run first; this also keeps the two scripts in sync.

With `with_comments=True`, each transformation keeps the original code in a
"% OLD" block and precedes the replacement with "% NEW".  Files in which no
pattern matches are skipped with a log entry.

Logs go to the console; the file handler is currently commented out.
"""

from __future__ import annotations

import logging
import os
import pathlib
import re
from typing import Callable, List, Tuple



# --------------------------------------------------------------------------- #
# 2. Compile all replacement patterns – each returns (new_text, changed)      #
# --------------------------------------------------------------------------- #

# (a) \documentclass
_docclass_re = re.compile(
    r"^(\\documentclass\[\s*)11pt,(compress,t,notes=noshow,\s*aspectratio=169,\s*xcolor=table\](\s*\{beamer\}))",
    re.M,
)

def _fix_docclass(text: str, with_comments: bool = True) -> Tuple[str, bool]:
    def _sub(m: re.Match[str]) -> str:
        old_line = m.group(0)
        new_line = r"\documentclass[10pt,compress,t,notes=noshow, xcolor=table]{beamer}"
        if with_comments:
            return f"% OLD\n%{old_line}\n% NEW\n{new_line}"
        else:
            return new_line

    new, n = _docclass_re.subn(_sub, text)
    return new, n > 0


# (b) \usepackage → \input
_usepkg_re = re.compile(r"^(\\)usepackage\{(\.\./\.\./style/)lmu-lecture\}", re.M)
def _fix_usepackage(text: str, with_comments: bool = True) -> Tuple[str, bool]:
    def _sub(m: re.Match[str]) -> str:
        old_line = m.group(0)
        new_line = (
            rf"{m.group(1)}input{{{m.group(2)}preamble}}" "\n"
            r"\input{../../latex-math/basic-math}" "\n"
            r"\input{../../latex-math/basic-ml}"
        )

        if with_comments:
            return f"% OLD\n%{old_line}\n% NEW\n{new_line}"
        else:
            return new_line

    new, n = _usepkg_re.subn(_sub, text)
    return new, n > 0


# (c) title/goal/chapter/lecture block → \titlemeta{...}
_block_re = re.compile(
    r"""
    \\newcommand\{\\titlefigure\}\{(?P<path>[^}]*)\}\s*
    \\newcommand\{\\learninggoals\}\{(?P<goals>.*?)\}\s*
    \\lecturechapter\{(?P<title>[^}]*)\}\s*
    \\lecture\{(?P<subtitle>[^}]*)\}
    """,
    re.S | re.X,
)

# ToDo: make subtitle an argument, not the main title
def _fix_block(text: str, title: str, with_comments: bool = True) -> Tuple[str, bool]:
    def _sub(m: re.Match[str]) -> str:
        old = m.group(0)
        new_block = (
            f"\\titlemeta{{\n{title}\n}}{{\n{m.group('title')}\n}}{{\n"
            f"{m.group('path')}\n}}{{\n{m.group('goals')}\n}}\n"
        )
        if with_comments:
            commented_old = "% OLD\n%" + old.replace("\n", "\n%")
            return f"{commented_old}\n% NEW\n{new_block}"
        else:
            return new_block

    new, n = _block_re.subn(_sub, text)
    return new, n > 0


# (d) \citebutton → \furtherreading


def _fix_citebutton(text: str, with_comments: bool = True) -> Tuple[str, bool]:
    changed = False

    def _sub(m: re.Match[str]) -> str:
        nonlocal changed
        label, url = m.group(1).strip(), m.group(2).strip()
        year_match = _YEAR_RE.search(label) or _YEAR_RE.search(url)
        year_val = year_match.group(0) if year_match else ""
        author_val = _guess_author(label)
        key = _make_key(author_val, year_val)
        old_cmd = m.group(0)
        new_cmd = rf"\furtherreading{{{key}}}"
        changed = True
        
        if with_comments:
            return f"% OLD\n%{old_cmd}\n% NEW\n{new_cmd}"
        else:
            return new_cmd
        
    new_text = _CITE_RE.sub(_sub, text)
    return new_text, changed


_TRANSFORMS: List[Callable[..., Tuple[str, bool]]] = [
    _fix_docclass,
    _fix_usepackage,
    _fix_block,
    _fix_citebutton,
]

# --------------------------------------------------------------------------- #
# 3. Main runner                                                              #
# --------------------------------------------------------------------------- #

def run(folder: str | os.PathLike[str],
        prefix: str = "slides",
        with_comment: bool = True,
        in_place: bool = False,
        title_main: str | None = None) -> None:
    """
    Convert all matching .tex files in *folder*.
    If in_place is False, create `<folder>/auto` and write changed files there.
    If in_place is True, back up originals to `<folder>/before_migration` and overwrite originals.
    If title_main is None, the notebook-level TITLE_MAIN is used (falling back to "PLACEHOLDER").
    """
    if title_main is None:
        # Keeps the existing run cells working: they set TITLE_MAIN as a global.
        title_main = globals().get("TITLE_MAIN", "PLACEHOLDER")

    folder = pathlib.Path(folder).expanduser().resolve()
    if in_place:
        backup_dir = folder / "before_migration"
        backup_dir.mkdir(exist_ok=True)
        log_file = folder / "migration_inplace.log"
    else:
        auto_dir = folder / "auto"
        auto_dir.mkdir(exist_ok=True)
        log_file = auto_dir / "migration.log"

    logging.basicConfig(
        level=logging.INFO,
        format="%(levelname)s | %(message)s",
        handlers=[
            logging.StreamHandler(),
            # logging.FileHandler(log_file, encoding="utf-8", mode="w"),
        ],
    )
    logger = logging.getLogger("migrate_tex")

    logger.info("Starting migration in %s (prefix '%s')", folder, prefix)

    for tex_path in sorted(folder.glob(f"{prefix}*.tex")):
        if tex_path.is_dir():
            continue
        content = tex_path.read_text(encoding="utf-8")

        changed = False
        for fn in _TRANSFORMS:
            if fn is _fix_block:
                content, was_changed = fn(content, title=title_main, with_comments=with_comment)
            else:
                content, was_changed = fn(content, with_comments=with_comment)
            changed |= was_changed

        if not changed:
            logger.info("SKIP %s – no patterns found", tex_path.name)
            continue

        if in_place:
            backup_path = backup_dir / tex_path.name
            if backup_path.exists():
                # Never overwrite an existing backup; log and skip instead.
                logger.warning("SKIP %s – backup already exists in before_migration", tex_path.name)
                continue
            tex_path.replace(backup_path)  # move the original into the backup folder
            tex_path.write_text(content, encoding="utf-8")  # write migrated content in place
            logger.info("UPDATED %s (backup in before_migration)", tex_path.name)
        else:
            new_name = f"{tex_path.stem}_auto.tex"
            out_path = auto_dir / new_name
            out_path.write_text(content, encoding="utf-8")
            logger.info("WROTE %s", new_name)

    logger.info("Migration finished in %s.", folder)





## Run

In [7]:
import os
from pathlib import Path

all_chunks = [i for i in os.listdir("slides") if Path("slides", i).is_dir()]
print(f"{len(all_chunks)} chunks: {all_chunks}")

10 chunks: ['01_intro', '02_interpretable-models-1', '03_interpretable-models-2', '04_feature-effects', '05_functional-decompositions', '06_shapley', '08_regional-effects', '09_feature-importance-1', '11_local-explanations-lime', '12_local-explanations-counterfactual']


In [9]:
for chunk in all_chunks:
    print(f"Processing chunk: {chunk}")
    
    FOLDER = Path("slides") / chunk
    PREFIX = "TO"
    INCLUDE_COMMENTS = False

    TITLE_MAIN = "PLACEHOLDER"

    assert FOLDER.exists(), f"Folder {FOLDER} does not exist. Dirs: \n{list(FOLDER.parent.iterdir())}"

    if not INCLUDE_COMMENTS:
        print("Running without comments in the output .tex files.")
    
    run(FOLDER, PREFIX, with_comment=INCLUDE_COMMENTS, in_place=True)

INFO | Starting migration in C:\Users\hayk_\OneDrive\Desktop\lecture_service_attempt\slides\01_intro (prefix 'TO')
INFO | Migration finished. See C:\Users\hayk_\OneDrive\Desktop\lecture_service_attempt\slides\01_intro\migration_inplace.log for details.
INFO | Starting migration in C:\Users\hayk_\OneDrive\Desktop\lecture_service_attempt\slides\02_interpretable-models-1 (prefix 'TO')
INFO | Migration finished. See C:\Users\hayk_\OneDrive\Desktop\lecture_service_attempt\slides\02_interpretable-models-1\migration_inplace.log for details.
INFO | Starting migration in C:\Users\hayk_\OneDrive\Desktop\lecture_service_attempt\slides\03_interpretable-models-2 (prefix 'TO')
INFO | SKIP TO-DO-slides-im-RPF-talk-Coco.tex – no patterns found
INFO | Migration finished. See C:\Users\hayk_\OneDrive\Desktop\lecture_service_attempt\slides\03_interpretable-models-2\migration_inplace.log for details.
INFO | Starting migration in C:\Users\hayk_\OneDrive\Desktop\lecture_service_attempt\slides\04_feature-effe

Processing chunk: 01_intro
Running without comments in the output .tex files.
Processing chunk: 02_interpretable-models-1
Running without comments in the output .tex files.
Processing chunk: 03_interpretable-models-2
Running without comments in the output .tex files.
Processing chunk: 04_feature-effects
Running without comments in the output .tex files.
Processing chunk: 05_functional-decompositions
Running without comments in the output .tex files.
Processing chunk: 06_shapley
Running without comments in the output .tex files.
Processing chunk: 08_regional-effects
Running without comments in the output .tex files.
Processing chunk: 09_feature-importance-1
Running without comments in the output .tex files.
Processing chunk: 11_local-explanations-lime
Running without comments in the output .tex files.
Processing chunk: 12_local-explanations-counterfactual
Running without comments in the output .tex files.


In [10]:
FOLDER = Path("slides") / "11_local-explanations-lime" / "backup"
PREFIX = "slides"
INCLUDE_COMMENTS = False

TITLE_MAIN = "PLACEHOLDER"

create_bibtex(FOLDER, PREFIX)
run(FOLDER, PREFIX, with_comment=INCLUDE_COMMENTS, in_place=True)

INFO | Scanning slides-le-adversarial-counterfactuals.tex
INFO | Added 1 reference(s) from slides-le-adversarial-counterfactuals.tex – keys: Ballet_2019
INFO | Scanning slides-le-adversarial.tex
INFO | SKIP duplicate key Ballet_2019
INFO | SKIP duplicate key Goodfellow_2014
INFO | SKIP duplicate key Goodfellow_2015
INFO | Added 13 reference(s) from slides-le-adversarial.tex – keys: Poellabauer_2018, Eykholt_2018, Athalye_2018, MITCSAIL_2017, Szegedy_2013, Goodfellow_2014, Griffin_2016, Kim_2019, Ilyas_2019, Goodfellow_2015, Lyu_2015, Papernot_2016, Goodfellow_2017
INFO | Scanning slides-le-counterfactuals-methods.tex
INFO | SKIP duplicate key Dandl_2020
INFO | Added 2 reference(s) from slides-le-counterfactuals-methods.tex – keys: Wachter_2018, Dandl_2020
INFO | Scanning slides-le-counterfactuals.tex
INFO | SKIP duplicate key Dandl_2020
INFO | Added 3 reference(s) from slides-le-counterfactuals.tex – keys: Lewis_1973, Verma_2020, Karimi_2020
INFO | Scanning slides-le-intro.tex
INFO | A

# Rename pdfs (internal, perhaps won't be needed later)
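`Path.rename` is hard to undo, so a dry run that only computes the target names can be worth doing first. A minimal sketch, assuming the same sort key and naming scheme as the cells below; `plan_renames` and the example paths are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def plan_renames(pdfs: list[Path], folder_num: str, suffix: str = "before") -> list[tuple[Path, Path]]:
    """Hypothetical dry run: return (old, new) pairs without renaming anything."""
    ordered = sorted(pdfs, key=lambda p: p.stem.lower())  # same sort key as the cells below
    return [
        (p, p.with_name(f"{folder_num}_{i:02d}_{suffix}.pdf"))
        for i, p in enumerate(ordered, start=1)
    ]


# Made-up download names in a made-up folder
files = [
    Path("pdfs", "05_example", "lecture_service_attempt (2).pdf"),
    Path("pdfs", "05_example", "lecture_service_attempt (10).pdf"),
    Path("pdfs", "05_example", "lecture_service_attempt (1).pdf"),
]
for old, new in plan_renames(files, "05"):
    print(f"{old.name} -> {new.name}")
    if new != old and new.exists():
        print(f"   !! {new} already exists")
```

Note how `(10)` lands between `(1)` and `(2)`: the sort is lexicographic, and the printed plan makes this visible before anything is renamed.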

In [54]:
from pathlib import Path

folder_pdfs = Path("pdfs", "05_functional_decompositions")

assert folder_pdfs.exists(), f"Folder {folder_pdfs} does not exist. Folders:\n {list(Path('pdfs').glob('*'))}"

folder_num = folder_pdfs.parts[-1].split("_")[0]


# num files
pdfs = list(folder_pdfs.glob("*.pdf"))

num_files = len(pdfs)
print(f"Found {num_files} PDF files in {folder_pdfs}.")

# sort pdfs
pdfs.sort(key=lambda p: p.stem.lower())

print(pdfs)

Found 10 PDF files in pdfs\05_functional_decompositions.
[WindowsPath('pdfs/05_functional_decompositions/05_01_before.pdf'), WindowsPath('pdfs/05_functional_decompositions/05_02_before.pdf'), WindowsPath('pdfs/05_functional_decompositions/05_03_before.pdf'), WindowsPath('pdfs/05_functional_decompositions/05_04_before.pdf'), WindowsPath('pdfs/05_functional_decompositions/05_05_before.pdf'), WindowsPath('pdfs/05_functional_decompositions/lecture_service_attempt (1).pdf'), WindowsPath('pdfs/05_functional_decompositions/lecture_service_attempt (2).pdf'), WindowsPath('pdfs/05_functional_decompositions/lecture_service_attempt (3).pdf'), WindowsPath('pdfs/05_functional_decompositions/lecture_service_attempt (4).pdf'), WindowsPath('pdfs/05_functional_decompositions/lecture_service_attempt (5).pdf')]


In [55]:
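# Rename to "{folder_num}_{index}_before.pdf" in the sorted order above
for i, pdf in enumerate(pdfs, start=1):
    new_name = folder_pdfs / f"{folder_num}_{i:02d}_before.pdf"
    if new_name != pdf and new_name.exists():
        # Path.rename silently replaces an existing file on POSIX; refuse instead
        raise FileExistsError(f"{new_name} already exists; not overwriting it")
    pdf.rename(new_name)
    print(f"Renamed {pdf} to {new_name}")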


Renamed pdfs\05_functional_decompositions\05_01_before.pdf to pdfs\05_functional_decompositions\05_01_before.pdf
Renamed pdfs\05_functional_decompositions\05_02_before.pdf to pdfs\05_functional_decompositions\05_02_before.pdf
Renamed pdfs\05_functional_decompositions\05_03_before.pdf to pdfs\05_functional_decompositions\05_03_before.pdf
Renamed pdfs\05_functional_decompositions\05_04_before.pdf to pdfs\05_functional_decompositions\05_04_before.pdf
Renamed pdfs\05_functional_decompositions\05_05_before.pdf to pdfs\05_functional_decompositions\05_05_before.pdf
Renamed pdfs\05_functional_decompositions\lecture_service_attempt (1).pdf to pdfs\05_functional_decompositions\05_06_before.pdf
Renamed pdfs\05_functional_decompositions\lecture_service_attempt (2).pdf to pdfs\05_functional_decompositions\05_07_before.pdf
Renamed pdfs\05_functional_decompositions\lecture_service_attempt (3).pdf to pdfs\05_functional_decompositions\05_08_before.pdf
Renamed pdfs\05_functional_decompositions\lecture_s

In [40]:
files_not_before = [pdf for pdf in pdfs if not pdf.stem.endswith("_before")]

for i, pdf in enumerate(files_not_before, start=1):
    new_name = folder_pdfs / f"{folder_num}_{i:02d}_after.pdf"
    pdf.rename(new_name)
    print(f"Renamed {pdf} to {new_name}")

Renamed pdfs\12_local_explanations_counterfactual\lecture_service_attempt (2).pdf to pdfs\12_local_explanations_counterfactual\12_01_after.pdf
Renamed pdfs\12_local_explanations_counterfactual\lecture_service_attempt (3).pdf to pdfs\12_local_explanations_counterfactual\12_02_after.pdf
Renamed pdfs\12_local_explanations_counterfactual\lecture_service_attempt (4).pdf to pdfs\12_local_explanations_counterfactual\12_03_after.pdf
Renamed pdfs\12_local_explanations_counterfactual\lecture_service_attempt (5).pdf to pdfs\12_local_explanations_counterfactual\12_04_after.pdf
Renamed pdfs\12_local_explanations_counterfactual\lecture_service_attempt (6).pdf to pdfs\12_local_explanations_counterfactual\12_05_after.pdf
Renamed pdfs\12_local_explanations_counterfactual\lecture_service_attempt (7).pdf to pdfs\12_local_explanations_counterfactual\12_06_after.pdf


# Putting old/new PDFs on the same page
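The pairing rule used by `_pair_names` below can be checked on plain file stems. This sketch restates it with strings instead of `Path` objects (`pair_names` is a hypothetical stand-in) and shows that a file without a partner, like `05_02_before`, is dropped silently rather than reported.

```python
from __future__ import annotations


def pair_names(stems: list[str]) -> dict[str, dict[str, str]]:
    """Hypothetical string-only restatement of _pair_names."""
    pairs: dict[str, dict[str, str]] = {}
    for stem in stems:
        for tag in ("before", "after"):
            suffix = f"_{tag}"
            if stem.endswith(suffix):
                prefix = stem[: -len(suffix)]
                pairs.setdefault(prefix, {})[tag] = stem
    # Keep only prefixes that have both a "before" and an "after" file
    return {p: d for p, d in pairs.items() if {"before", "after"}.issubset(d)}


print(pair_names(["05_01_before", "05_01_after", "05_02_before", "notes"]))
# {'05_01': {'before': '05_01_before', 'after': '05_01_after'}}
```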

In [None]:
!uv pip install pypdf Pillow


Using Python 3.10.18 environment at: C:\Users\hayk_\.conda\envs\lectures
Resolved 3 packages in 511ms
Prepared 1 package in 317ms
Installed 1 package in 66ms
 + pypdf==5.8.0


In [56]:
import os
from pathlib import Path
from pypdf import PdfReader, PdfWriter, Transformation

def _pair_names(files: list[Path]):
    """Return {prefix: {"before": Path, "after": Path}} for complete pairs."""

    pairs: dict[str, dict[str, Path]] = {}
    for f in files:
        stem = f.stem  # filename without extension
        if stem.endswith("_before"):
            prefix = stem[:-7]
            pairs.setdefault(prefix, {})["before"] = f
        elif stem.endswith("_after"):
            prefix = stem[:-6]
            pairs.setdefault(prefix, {})["after"] = f

    # keep only complete pairs (have both keys)
    return {
        p: d for p, d in pairs.items() if {"before", "after"}.issubset(d.keys())
    }


def _merge_pair(before: Path, after: Path, output: Path):
    """Create <prefix>_side‑by‑side.pdf for one before/after pair."""

    rb, ra = PdfReader(before), PdfReader(after)
    writer = PdfWriter()

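    page_count = min(len(rb.pages), len(ra.pages))
    if len(rb.pages) != len(ra.pages):
        # Only the first page_count pages are paired; extra pages are dropped
        print(f"!  {before.name} has {len(rb.pages)} pages, {after.name} has {len(ra.pages)}")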

    for i in range(page_count):
        pb = rb.pages[i]
        pa = ra.pages[i]

        # dimensions
        w_b, h_b = pb.mediabox.width, pb.mediabox.height
        w_a, h_a = pa.mediabox.width, pa.mediabox.height
        new_w, new_h = w_b + w_a, max(h_b, h_a)

        # create blank page big enough for both
        new_page = writer.add_blank_page(width=new_w, height=new_h)

        # place BEFORE on the left (origin)
        new_page.merge_page(pb)

        # place AFTER translated to the right by w_b
        transform = Transformation().translate(tx=w_b, ty=0)
        new_page.merge_transformed_page(pa, transform)

    # write result
    with output.open("wb") as fh:
        writer.write(fh)
    print(f"✓  {output.name}")


def compare_dir(directory: str | os.PathLike):
    """Scan directory for before/after pairs and build comparison PDFs."""

    directory = Path(directory)
    if not directory.is_dir():
        raise NotADirectoryError(directory)

    pdfs = [p for p in directory.iterdir() if p.suffix.lower() == ".pdf"]
    pairs = _pair_names(pdfs)

    if not pairs:
        print("No before/after pairs found.")
        return

    for prefix, files in pairs.items():
        out_folder = directory / "side_by_side"
        out_folder.mkdir(exist_ok=True)
        out = out_folder / f"{prefix}_side-by-side.pdf"
        _merge_pair(files["before"], files["after"], out)

In [57]:
all_dirs = os.listdir("pdfs")

for chunk in all_dirs:
    chunk_path = Path("pdfs", chunk)
    compare_dir(chunk_path)

✓  01_01_intro_side-by-side.pdf
✓  01_02_goals_side-by-side.pdf
✓  01_03_dimensions_side-by-side.pdf
✓  01_04_correlation_side-by-side.pdf
✓  01_05_interactions_side-by-side.pdf
✓  02_01_motivation_side-by-side.pdf
✓  02_02_lm_simple_side-by-side.pdf
✓  02_03_lm_extensions_side-by-side.pdf
✓  02_04_glm_side-by-side.pdf
✓  02_05_rule_based_side-by-side.pdf
✓  03_01_gam_side-by-side.pdf
✓  03_02_ebm_side-by-side.pdf
✓  04_01_side-by-side.pdf
✓  04_02_side-by-side.pdf
✓  04_03_side-by-side.pdf
✓  04_04_side-by-side.pdf
✓  04_05_side-by-side.pdf
✓  04_06_side-by-side.pdf
✓  04_07_side-by-side.pdf
✓  04_08_side-by-side.pdf
✓  05_01_side-by-side.pdf
✓  05_02_side-by-side.pdf
✓  05_03_side-by-side.pdf
✓  05_04_side-by-side.pdf
✓  05_05_side-by-side.pdf
✓  06_01_side-by-side.pdf
✓  06_02_side-by-side.pdf
✓  06_03_side-by-side.pdf
✓  06_04_side-by-side.pdf
✓  06_05_side-by-side.pdf
✓  06_06_side-by-side.pdf
✓  08_01_side-by-side.pdf
✓  08_02_side-by-side.pdf
✓  09_01_side-by-side.pdf
✓  09_02_p

# Merge PDFs
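Assuming, as the docstring below says, that `outline_item` bookmarks the first page of each appended document, each bookmark's position in the merged file is simply the running total of the preceding page counts. A small sketch with made-up page counts (`bookmark_plan` is hypothetical and does not read any PDFs):

```python
from __future__ import annotations

from itertools import accumulate


def bookmark_plan(docs: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Map (title, page_count) to (title, 0-based first page in the merged PDF)."""
    counts = [n for _, n in docs]
    # Running total of the preceding page counts: 0, n1, n1 + n2, ...
    starts = accumulate(counts[:-1], initial=0)
    return [(title, start) for (title, _), start in zip(docs, starts)]


# Made-up page counts for three side-by-side files
print(bookmark_plan([("05_01_side-by-side", 12), ("05_02_side-by-side", 9), ("05_03_side-by-side", 15)]))
# [('05_01_side-by-side', 0), ('05_02_side-by-side', 12), ('05_03_side-by-side', 21)]
```

Because the PDFs are merged in `sorted()` order, the bookmarks appear in that same order.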

In [62]:
def merge_pdfs_with_bookmarks(
    in_dir: str | Path,
    out_file: str | Path = "merged.pdf",
    recursive: bool = False,
) -> None:
    """Merge every PDF in *in_dir* into *out_file* and add a bookmark
    at the start of each original document.

    Parameters
    ----------
    in_dir : str | Path
        Folder containing PDF files.
    out_file : str | Path, default "merged.pdf"
        Where to write the combined PDF.
    recursive : bool, default False
        If True, search sub‑folders too.

    Notes
    -----
    • Requires **pypdf ≥ 3.7.0** (for ``PdfWriter.append`` with ``outline_item``).
    • PDF files are processed in alphabetical order for repeatability.
    """

    in_dir = Path(in_dir)
    if not in_dir.is_dir():
        raise NotADirectoryError(in_dir)

    pattern = "**/*.pdf" if recursive else "*.pdf"
    pdf_paths = sorted(p for p in in_dir.glob(pattern) if p.is_file())

    if not pdf_paths:
        raise FileNotFoundError("No PDF files found in", in_dir)

    writer = PdfWriter()

    for pdf_path in pdf_paths:
        reader = PdfReader(pdf_path)
        # Insert pages and create a bookmark for the start page of this document
        writer.append(reader, outline_item=pdf_path.stem)

    # --- Write to disk ----------------------------------------------
    out_file = Path(out_file)
    with out_file.open("wb") as fh:
        writer.write(fh)  # write the merged document to the open binary file

    print(f"Merged {len(pdf_paths)} PDFs → {out_file.resolve()}")


In [63]:
all_dirs = os.listdir("pdfs")
print(all_dirs)
for chunk in all_dirs:
    print(chunk)
    chunk_path = Path("pdfs", chunk)
    merge_pdfs_with_bookmarks(
        chunk_path / "side_by_side", 
        Path("pdfs") / f"merged_{chunk}_side_by_side.pdf")
    

['01_intro', '02_interp_models_1', '03_interp_models_2', '04_feature_effects', '05_functional_decompositions', '06_shapley', '08_regional_effects', '09_feature_importance_1', '11_local_explanations_lime', '12_local_explanations_counterfactual']
01_intro
Merged 5 PDFs → C:\Users\hayk_\OneDrive\Desktop\lecture_service_attempt\pdfs\merged_01_intro_side_by_side.pdf
02_interp_models_1
Merged 5 PDFs → C:\Users\hayk_\OneDrive\Desktop\lecture_service_attempt\pdfs\merged_02_interp_models_1_side_by_side.pdf
03_interp_models_2
Merged 2 PDFs → C:\Users\hayk_\OneDrive\Desktop\lecture_service_attempt\pdfs\merged_03_interp_models_2_side_by_side.pdf
04_feature_effects
Merged 8 PDFs → C:\Users\hayk_\OneDrive\Desktop\lecture_service_attempt\pdfs\merged_04_feature_effects_side_by_side.pdf
05_functional_decompositions
Merged 5 PDFs → C:\Users\hayk_\OneDrive\Desktop\lecture_service_attempt\pdfs\merged_05_functional_decompositions_side_by_side.pdf
06_shapley
Merged 6 PDFs → C:\Users\hayk_\OneDrive\Desktop\l

In [64]:
merge_pdfs_with_bookmarks(
    Path("pdfs", "00_merged"),
    Path("pdfs") / "merged_all_side_by_side.pdf")

Merged 10 PDFs → C:\Users\hayk_\OneDrive\Desktop\lecture_service_attempt\pdfs\merged_all_side_by_side.pdf


# Count files

In [2]:
#!/usr/bin/env python3
"""
Count files in every sub‑directory under a given root and report the total.

Usage (from the command line):
    python count_files.py /path/to/root
"""

from pathlib import Path
import os
import sys
from pprint import pprint


def count_files_by_directory(root: str | os.PathLike = ".") -> tuple[dict[Path, int], int]:
    """
    Recursively count files inside every directory under *root*.

    Parameters
    ----------
    root : str | Path
        Directory to start walking from.

    Returns
    -------
    dir_counts : dict[Path, int]
        Mapping of absolute directory path → number of (non‑hidden) files it contains.
    total_files : int
        Sum of all counted files.
    """
    dir_counts: dict[Path, int] = {}
    total_files = 0
    root_path = Path(root).expanduser().resolve()

    for dirpath, _, filenames in os.walk(root_path):
        # Skip hidden files (starting with "."); remove this filter if you want to include them
        visible_files = [f for f in filenames if not f.startswith(".")]
        count = len(visible_files)

        dir_counts[Path(dirpath)] = count
        total_files += count

    return dir_counts, total_files


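if __name__ == "__main__":
    # Root from the command line (see Usage above). Inside Jupyter, sys.argv holds
    # kernel flags such as "-f", so fall back to the current directory there.
    root_arg = sys.argv[1] if len(sys.argv) > 1 and not sys.argv[1].startswith("-") else "."
    counts, total = count_files_by_directory(root_arg)

    pprint(counts)            # Nicely formats the per‑directory counts
    print(f"\nTotal files: {total}")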


{WindowsPath('//wsl.localhost/Ubuntu/root/lecture_service_attempt'): 5,
 WindowsPath('//wsl.localhost/Ubuntu/root/lecture_service_attempt/.git'): 7,
 WindowsPath('//wsl.localhost/Ubuntu/root/lecture_service_attempt/.git/branches'): 0,
 WindowsPath('//wsl.localhost/Ubuntu/root/lecture_service_attempt/.git/hooks'): 14,
 WindowsPath('//wsl.localhost/Ubuntu/root/lecture_service_attempt/.git/info'): 1,
 WindowsPath('//wsl.localhost/Ubuntu/root/lecture_service_attempt/.git/logs'): 1,
 WindowsPath('//wsl.localhost/Ubuntu/root/lecture_service_attempt/.git/logs/refs'): 0,
 WindowsPath('//wsl.localhost/Ubuntu/root/lecture_service_attempt/.git/logs/refs/heads'): 1,
 WindowsPath('//wsl.localhost/Ubuntu/root/lecture_service_attempt/.git/logs/refs/remotes'): 0,
 WindowsPath('//wsl.localhost/Ubuntu/root/lecture_service_attempt/.git/logs/refs/remotes/origin'): 1,
 WindowsPath('//wsl.localhost/Ubuntu/root/lecture_service_attempt/.git/objects'): 0,
 WindowsPath('//wsl.localhost/Ubuntu/root/lecture_servi

In [3]:
#!/usr/bin/env python3
"""
Utility helpers:

- count_files_by_directory(root, include=None, exclude=None)
      → (dict[pathlib.Path, int], int)

- zip_tree(root, zip_path, include=None, exclude=None, *, keep_structure=True)
      → pathlib.Path  # the ZIP file that was written
"""

from pathlib import Path
import os
import zipfile
from typing import Sequence


def _normalize_list(
    items: Sequence[str | os.PathLike] | None, root: Path
) -> set[Path] | None:
    """Return absolute Path objects or None if *items* is None/empty."""
    if not items:
        return None
    return {root.joinpath(Path(p).as_posix()).resolve() for p in items}


def _should_skip(path: Path, include: set[Path] | None, exclude: set[Path] | None) -> bool:
    """True if *path* (a directory) must be skipped according to include/exclude."""
    if include is not None:
        # Skip everything NOT under one of the include paths
        return not any(include_path in path.parents or include_path == path for include_path in include)
    if exclude is not None:
        # Skip anything that is (or is inside) an excluded dir
        return any(ex_path in path.parents or ex_path == path for ex_path in exclude)
    return False


def count_files_by_directory(
    root: str | os.PathLike = ".",
    *,
    include: Sequence[str | os.PathLike] | None = None,
    exclude: Sequence[str | os.PathLike] | None = None,
) -> tuple[dict[Path, int], int]:
    """
    Recursively count visible files in *root*, honoring optional include/exclude.

    include / exclude:
        Iterable of directory paths relative to *root*.
        • If *include* is given, ONLY those paths (and their sub‑trees) are considered.
        • If *include* is None but *exclude* is provided, those paths are skipped.
        • Leading dots (hidden dirs/files) are always ignored.
    """
    root = Path(root).expanduser().resolve()
    include_set = _normalize_list(include, root)
    exclude_set = _normalize_list(exclude, root)

    dir_counts: dict[Path, int] = {}
    total = 0

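    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories entirely: pruning dirnames in place stops os.walk
        # from descending into them (checking dir_path.name alone would still
        # visit nested folders such as .git/hooks)
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        dir_path = Path(dirpath)

        # Determine if this directory should be processed
        if _should_skip(dir_path, include_set, exclude_set):
            continue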

        visible_files = [f for f in filenames if not f.startswith(".")]
        count = len(visible_files)
        dir_counts[dir_path] = count
        total += count

    return dir_counts, total


def zip_tree(
    root: str | os.PathLike,
    zip_path: str | os.PathLike,
    include: Sequence[str | os.PathLike] | None = None,
    exclude: Sequence[str | os.PathLike] | None = None,
    *,
    keep_structure: bool = True,
) -> Path:
    """
    Compress *root* into *zip_path* applying the same include/exclude logic.

    Parameters
    ----------
    keep_structure : bool
        • True  → store each file with its path relative to *root* (default).  
        • False → store only basenames (may cause name clashes).

    Returns
    -------
    Path
        Path to the created ZIP archive.
    """
    root = Path(root).expanduser().resolve()
    zip_path = Path(zip_path).expanduser().resolve()
    include_set = _normalize_list(include, root)
    exclude_set = _normalize_list(exclude, root)

    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
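        for dirpath, dirnames, filenames in os.walk(root):
            # Prune hidden directories in place so os.walk never descends into them
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
            dir_path = Path(dirpath)

            if _should_skip(dir_path, include_set, exclude_set):
                continue

            for fname in filenames:
                if fname.startswith("."):
                    continue
                fpath = dir_path / fname
                if fpath == zip_path:
                    continue  # never add the archive to itself when it lives inside root
                arcname = fpath.relative_to(root) if keep_structure else fpath.name
                zf.write(fpath, arcname)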

    return zip_path


# -------- example usage --------
if __name__ == "__main__":
    project_root = Path(".")  # current directory

    # Only include specific sub‑folders (relative to root)
    include_dirs = ["src", "tests"]

    # counts, total = count_files_by_directory(project_root, include=include_dirs)
    # print("Per‑directory counts:")
    # for d, c in counts.items():
    #     print(f"{d}: {c}")
    # print(f"TOTAL: {total}")

    # Create an archive with only the listed top-level folders (for upload to Overleaf)
    archive = zip_tree(
        project_root,
        "to_overleaf.zip",
        include=["latex-math", "slides", "style"],
    )
    print(f"\nCreated ZIP archive at {archive}")



Created ZIP archive at \\wsl.localhost\Ubuntu\root\lecture_service_attempt\to_overleaf.zip
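The first count above still lists `.git/hooks` and other nested folders under `.git`: with a top-down `os.walk`, testing only the current directory's name skips `.git` itself but not its sub-folders. A minimal sketch of the usual fix, assuming a hypothetical helper `walk_visible`, is to prune `dirnames` in place so `os.walk` never descends into hidden trees:

```python
import os
import tempfile
from pathlib import Path


def walk_visible(root: str):
    """Yield (dirpath, visible_files), never descending into hidden directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Mutating dirnames in place (top-down walk) prunes those sub-trees
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        yield dirpath, [f for f in filenames if not f.startswith(".")]


with tempfile.TemporaryDirectory() as tmp:
    # Small throwaway tree: one hidden folder with a nested file, one visible folder
    Path(tmp, ".git", "hooks").mkdir(parents=True)
    Path(tmp, ".git", "hooks", "pre-commit.sample").touch()
    Path(tmp, "slides").mkdir()
    Path(tmp, "slides", "intro.tex").touch()
    seen = {Path(d).relative_to(tmp).as_posix(): files for d, files in walk_visible(tmp)}

print(seen)  # {'.': [], 'slides': ['intro.tex']}
```

Pruning also saves time on large hidden trees such as `.git/objects`, since they are never listed at all.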


In [4]:
len(os.listdir("style"))

25

In [5]:
count_files_by_directory("style")

({WindowsPath('//wsl.localhost/Ubuntu/root/lecture_service_attempt/style'): 23,
  WindowsPath('//wsl.localhost/Ubuntu/root/lecture_service_attempt/style/color'): 3,
  WindowsPath('//wsl.localhost/Ubuntu/root/lecture_service_attempt/style/logos'): 24},
 50)
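The two numbers differ because `os.listdir` also returns sub-folders (`color` and `logos`), while `count_files_by_directory` counts only files: 23 files plus 2 folders is consistent with the 25 entries, and 23 + 3 + 24 gives the total of 50. A small sketch on a throwaway folder shows the distinction (`count_entries` is a hypothetical helper):

```python
import os
import tempfile
from pathlib import Path


def count_entries(folder: str) -> tuple[int, int]:
    """Return (entries reported by os.listdir, regular files only) for *folder*."""
    entries = os.listdir(folder)
    files = [e for e in entries if Path(folder, e).is_file()]
    return len(entries), len(files)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.sty", "b.sty"):
        Path(tmp, name).touch()
    Path(tmp, "logos").mkdir()             # a sub-folder counts as an entry...
    Path(tmp, "logos", "logo.png").touch()  # ...but its contents are not listed
    listed, files = count_entries(tmp)

print(listed, files)  # 3 2
```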

# Hanging words
(I wanted to build an automatic detector of hanging words but abandoned the idea.)

In [7]:
!uv pip install pdfplumber

Using Python 3.10.18 environment at: C:\Users\hayk_\.conda\envs\lectures
Resolved 8 packages in 777ms
Downloading pdfminer-six (5.4MiB)
Downloading cryptography (3.2MiB)
Downloading pypdfium2 (2.8MiB)
 Downloading pypdfium2
 Downloading cryptography
 Downloading pdfminer-six
Prepared 6 packages in 1.04s
Installed 6 packages in 682ms
 + cffi==1.17.1
 + cryptography==45.0.5
 + pdfminer-six==20250506
 + pdfplumber==0.11.7
 + pycparser==2.22
 + pypdfium2==4.30.1


In [None]:
import pdfplumber
from pathlib import Path

pdf_path = Path("pdfs") / "02_interp_models_1" / "02_02_lm_simple_after.pdf"
assert pdf_path.exists(), f"PDF file not found: {pdf_path}"
results = []

with pdfplumber.open(pdf_path) as pdf:
    for page_num, page in enumerate(pdf.pages, start=1):
        # Extract text lines together with their bounding boxes
        # (extract_text_lines needs a recent pdfplumber; 0.11.7 is installed above)
        for line in page.extract_text_lines(x_tolerance=2, y_tolerance=2):
            # Very naïve header/footer filter: ignore lines in the top/bottom 5% of the page
            if not (page.height * 0.05 < line["top"] < page.height * 0.95):
                continue
            print(f"Page {page_num}: {line['text']}")
            words = line["text"].split()
            if len(words) == 1:
                # For a one-word line, the line's bbox is the word's bbox
                results.append({
                    "page": page_num,
                    "text": words[0],
                    "bbox": (line["x0"], line["top"], line["x1"], line["bottom"]),
                })

# Show summary
for r in results:
    print(f"Page {r['page']}: “{r['text']}” at {r['bbox']}")


Page 1: Interpretable Machine Learning
Page 1: Linear Regression Model
Page 1: 400
Page 1: 300
Page 1: 200
Page 1: 100
Page 1: 0
Page 1: −100
Page 1: 0.0 2.5 5.0 7.5 10.0
Page 1: x
Page 1: y
Page 1: Learning goals
Page 1: LM basics and assumptions
Page 1: Interpretation of main effects in LM
Page 1: What are significant features?
Page 1: “400” at (30.7588364, 149.53615468000024, 36.12052232, 152.75059468000023)
Page 1: “300” at (30.7588364, 161.34743588000023, 36.12052232, 164.56187588000023)
Page 1: “200” at (30.7588364, 173.15514548000021, 36.12052232, 176.36958548000024)
Page 1: “100” at (30.7588364, 184.96642668000024, 36.12052232, 188.18086668000024)
Page 1: “0” at (34.334008, 196.77413628000022, 36.12123664, 199.98857628000025)
Page 1: “−100” at (28.8837464, 208.58541748000025, 36.12266528, 211.79985748000024)
Page 1: “x” at (92.40108079999999, 216.27292932000023, 94.3654608, 220.20168932000024)
Page 1: “y” at (23.92529412, 177.93287480000023, 27.85405412, 179.89725480000024)
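The output shows why the idea was abandoned: the rule "a line with a single word" also flags the y-axis tick labels (`400`, `300`, ...) and the axis names `x` and `y` of the plot on the slide. One possible refinement, sketched here under stated assumptions, is to flag a one-word line only when it sits directly under a multi-word line with the same left edge. It assumes single-column text and that a wrapped line starts at the same `x0` as the line above (this may not hold for bulleted items). `find_hanging_words`, `x_tol` and `gap_factor` are hypothetical; the input dicts use the `text`/`x0`/`top`/`bottom` keys that `page.extract_text_lines()` returns.

```python
from typing import TypedDict


class Line(TypedDict):
    text: str
    x0: float
    top: float
    bottom: float


def find_hanging_words(lines: list[Line], x_tol: float = 3.0, gap_factor: float = 0.8) -> list[Line]:
    """Flag one-word lines that look like the wrapped tail of the line above."""
    hanging = []
    ordered = sorted(lines, key=lambda l: (l["top"], l["x0"]))  # reading order, top to bottom
    for prev, cur in zip(ordered, ordered[1:]):
        if len(cur["text"].split()) != 1:
            continue
        if len(prev["text"].split()) < 2:
            continue  # previous line is also short, e.g. a column of axis ticks
        same_indent = abs(cur["x0"] - prev["x0"]) <= x_tol
        height = prev["bottom"] - prev["top"]
        adjacent = 0 <= cur["top"] - prev["bottom"] <= gap_factor * height
        if same_indent and adjacent:
            hanging.append(cur)
    return hanging


# Synthetic example: a wrapped sentence, two axis ticks, a tick row and an axis label
sample: list[Line] = [
    {"text": "Interpretation of main effects in", "x0": 40.0, "top": 100.0, "bottom": 110.0},
    {"text": "LM", "x0": 40.0, "top": 112.0, "bottom": 122.0},
    {"text": "400", "x0": 30.0, "top": 150.0, "bottom": 153.0},
    {"text": "300", "x0": 30.0, "top": 161.0, "bottom": 164.0},
    {"text": "0.0 2.5 5.0", "x0": 35.0, "top": 170.0, "bottom": 173.0},
    {"text": "x", "x0": 92.0, "top": 175.0, "bottom": 179.0},
]
print([l["text"] for l in find_hanging_words(sample)])  # ['LM']
```

The tolerances would need tuning per slide template; they are guesses, not measured values.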


# Clean up the files created by the LaTeX project (VS Code)

In [1]:
import os
from pathlib import Path

os.chdir("..")


In [2]:


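# LaTeX build by-products to delete. Note that "pdf" also deletes the compiled slides, and
# "gz"/"xml" catch .synctex.gz/.run.xml but also any other .gz/.xml file in the folder.
extension_to_remove = ['aux', 'bcf', 'fdb_latexmk', 'fls', 'log', 'nav', 'out',
                       'pdf', 'run', 'snm', 'synctex', 'toc', "xml", "gz", "bbl"]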

all_chunks = [i for i in os.listdir("slides") if Path("slides", i).is_dir()]
print(f"{len(all_chunks)} chunks: {all_chunks}")


failed = []
for chunk in all_chunks:
    print(f"Processing chunk: {chunk}")
    # Remove auxiliary files
    folder = Path("slides") / chunk
    for ext in extension_to_remove:
        for file in folder.glob(f"*.{ext}"):
            try:
                file.unlink(missing_ok=True)
                print(f"Removed {file.name}")
            except Exception as e:
                failed.append(file.name)
                print(f"Error removing {file.name}: {e}")
                
assert not failed, f"Failed to remove some files: {failed}"

10 chunks: ['01_intro', '02_interpretable-models-1', '03_interpretable-models-2', '04_feature-effects', '05_functional-decompositions', '06_shapley', '08_regional-effects', '09_feature-importance-1', '11_local-explanations-lime', '12_local-explanations-counterfactual']
Processing chunk: 01_intro
Processing chunk: 02_interpretable-models-1
Processing chunk: 03_interpretable-models-2
Processing chunk: 04_feature-effects
Processing chunk: 05_functional-decompositions
Processing chunk: 06_shapley
Processing chunk: 08_regional-effects
Processing chunk: 09_feature-importance-1
Removed slides02-fi-pfi.bbl
Removed slides03-fi-cfi.bbl
Processing chunk: 11_local-explanations-lime
Processing chunk: 12_local-explanations-counterfactual
Removed slides07-le-counterfactuals-optim.bbl
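Matching on single extensions is coarse: `gz` and `xml` are only needed for the double extensions `.synctex.gz` and `.run.xml`, yet they remove every `.gz`/`.xml` file in the folder. A hedged alternative is to match full suffixes with `str.endswith` (which accepts a tuple) and to default to a dry run. `AUX_SUFFIXES`, `find_aux_files` and `clean_aux_files` are hypothetical names; the suffix tuple is an assumption modelled on the list above and deliberately leaves out `.pdf` so compiled slides survive (add it back if you want them deleted).

```python
from pathlib import Path

# Assumed full suffixes of LaTeX/latexmk by-products (no ".pdf", see above)
AUX_SUFFIXES = (".aux", ".bcf", ".fdb_latexmk", ".fls", ".log", ".nav", ".out",
                ".run.xml", ".snm", ".synctex", ".synctex.gz", ".toc", ".bbl")


def find_aux_files(folder: Path, suffixes: tuple[str, ...] = AUX_SUFFIXES) -> list[Path]:
    """Return files directly in *folder* whose name ends with one of *suffixes*."""
    return sorted(p for p in folder.iterdir() if p.is_file() and p.name.endswith(suffixes))


def clean_aux_files(folder: Path, dry_run: bool = True) -> list[Path]:
    """Delete (or, with dry_run=True, only report) the auxiliary files in *folder*."""
    targets = find_aux_files(folder)
    for p in targets:
        if dry_run:
            print(f"Would remove {p.name}")
        else:
            p.unlink(missing_ok=True)
    return targets
```

Usage: call `clean_aux_files(Path("slides") / chunk)` first to review the list, then rerun with `dry_run=False`.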
