# Instructions for Scripts in `helpers/scripts`

-----

Owner: Vadim Rudakov, lefthand67@gmail.com
Version: 0.2.1
Birth: 2025-12-20
Last Modified: 2026-01-05

-----

## Overview

This directory contains utility scripts that automate common tasks within the AI Engineering Handbook project. Each script is written in Python and can be executed from the GNU/Linux command line. The sections below cover the general steps for making a script executable and running it, followed by specific instructions for each script.

## How to use scripts on GNU/Linux

In [1]:
grep -i 'pretty' /etc/os-release

PRETTY_NAME="Fedora Linux 42 (KDE Plasma Desktop Edition)"


### Make a Script Executable

1. **Navigate to the Directory**: Open your terminal and navigate to the directory where the script is located.

```bash
cd path/to/your/project/helpers/scripts
```

2. **Make the Scripts Executable**: Run the following command to make all the scripts in this directory and its children executable.

```bash
find . -type f -name '*.py' -exec chmod 0755 {} +
```

This approach ensures that all Python scripts in the directory tree are made executable with the appropriate permissions (`0755`).
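
For readers who prefer to do the same thing from Python, the `find` command above can be approximated with the standard library. This is a minimal sketch, not part of the repository: the function name `make_scripts_executable` is hypothetical, and it assumes that symlinks should be left alone, as `find -type f` does.

```python
from pathlib import Path


def make_scripts_executable(root: Path, mode: int = 0o755) -> list[Path]:
    """Set `mode` on every regular *.py file under `root`; return the files changed."""
    changed = []
    for path in sorted(root.rglob("*.py")):
        # Mirror `find -type f`: regular files only, symlinks are skipped.
        if path.is_file() and not path.is_symlink():
            path.chmod(mode)
            changed.append(path)
    return changed
```

Calling `make_scripts_executable(Path("."))` from `helpers/scripts` would touch the same set of files as the shell one-liner.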

### Add Scripts to `PATH`

1. **Locate the Scripts Directory**: Obtain the absolute path using `pwd`.
1. **Edit Your Shell Configuration**: Open your `.bashrc` or `.bash_profile`.
1. **Add to PATH**: Add the following line:
    ```bash
    export PATH="${PATH}:/path/to/your/project/helpers/scripts"
    ```

1. **Apply the Changes**: Reload your shell configuration file to apply the changes.
    ```bash
    source ~/.bashrc
    ```

### Run a Script

Once the script is executable and its directory is on your `PATH`, you can run it from any directory, passing whatever arguments the script requires.

```bash
script_name.py [arguments]
```

Replace `script_name.py` with the actual name of your script and `[arguments]` with any required or optional arguments.

### Troubleshooting

- **Permission Denied**: If you encounter a "Permission denied" error when trying to run the script, ensure that you have made it executable using the `chmod 0755` command.

- **Syntax Errors**: If you encounter syntax errors, make sure that your Python environment is correctly set up and that the script is compatible with your version of Python.
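
When it is unclear whether a failure comes from `PATH` or from file permissions, a small check can tell the two apart. The sketch below is illustrative only: `locate_script` is a hypothetical helper, and it assumes a POSIX-style `PATH` in which empty entries can be ignored.

```python
import os
from typing import Optional


def locate_script(script_name: str, path_env: Optional[str] = None) -> str:
    """Report whether `script_name` is on PATH and whether it is executable."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    for directory in path_env.split(os.pathsep):
        if not directory:
            continue  # assumption: ignore empty PATH entries
        candidate = os.path.join(directory, script_name)
        if os.path.isfile(candidate):
            if os.access(candidate, os.X_OK):
                return f"ok: {candidate}"
            return f"not executable: {candidate} (try chmod 0755)"
    return f"not on PATH: {script_name}"
```

For example, `print(locate_script("check_broken_links.py"))` reports which of the two cases applies on the current machine.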

## 1. check_broken_links.py

This script performs fast, local-only validation of relative file links within a directory and its subdirectories. While optimized for Jupyter Notebooks (`.ipynb`), it can scan any Markdown-style links. 

This tool is designed to serve as a high-quality diagnostic step in CI/CD, providing clear, parsable feedback to automate documentation maintenance.

It adheres to the **Smallest Viable Architecture (SVA)** principle, using only the Python standard library.

:::{hint} **SVA = right tool for the job**
SVA isn’t about minimal *code* — it’s about **minimal *cognitive and operational overhead***.

- Our users already have Python.
- They can **edit the script directly** to tweak regex or logic.
- No build system, no dependencies, and no virtual environments are needed, because the script uses only the standard library.
:::

**Features:**

* **Local-Only Policy:** Excludes external `http/https` URLs to focus on local repository integrity.
* **Git Root Awareness:** Automatically detects the Git project root to resolve absolute paths (e.g., `/docs/image.png`).
* **Intelligent Skipping:** Ignores internal anchors and fragments that do not contain path separators.
* **Directory & File Exclusion:** Automatically skips common noise directories like `.venv` and `.ipynb_checkpoints`.
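
The features above can be illustrated with a short standard-library sketch. It is not the script's actual implementation. The regular expression and the names `find_git_root`, `is_local_target`, and `broken_links` are assumptions made for illustration, and link targets containing parentheses or spaces are not handled.

```python
import re
from pathlib import Path
from typing import Optional

# Markdown inline link: [text](target). Assumption: no spaces or ")" in targets.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_git_root(start: Path) -> Optional[Path]:
    """Walk upward from `start` until a directory containing `.git` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / ".git").exists():
            return candidate
    return None


def is_local_target(target: str) -> bool:
    """Skip external URLs and bare anchors such as `#usage`."""
    if target.startswith(("http://", "https://")):
        return False
    if target.startswith("#") and "/" not in target:
        return False
    return True


def broken_links(source: Path, text: str, root: Path) -> list[str]:
    """Return link targets in `text` that do not resolve to an existing path."""
    broken = []
    for target in LINK_RE.findall(text):
        if not is_local_target(target):
            continue
        path_part = target.split("#", 1)[0]  # drop any trailing fragment
        if path_part.startswith("/"):
            resolved = root / path_part.lstrip("/")  # absolute: relative to Git root
        else:
            resolved = source.parent / path_part  # relative: next to the source file
        if not resolved.exists():
            broken.append(target)
    return broken
```

Resolving absolute targets against the Git root rather than the filesystem root is what lets a link such as `/docs/image.png` work regardless of where the repository is cloned.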

### Usage

Synopsis:

```bash
check_broken_links.py [paths] [--pattern PATTERN] [options]
```

* **Arguments**:
    * `paths`: The directory or specific file to search. Default is `.`.
    * `--pattern`: The glob pattern to match files. **Default is `*.ipynb`**.
* **Options**:
    * `--exclude-dirs`: Directories to skip (default: `in_progress`, `pr`, `.venv`).
    * `--exclude-files`: Specific files to skip (default: `.aider.chat.history.ipynb`).
    * `--verbose`: Shows detailed logs of skipped URLs and valid links.

### Default Exclusions

You can update the default exclusions directly in the `LinkCheckerCLI` class within the script:

```python
class LinkCheckerCLI:
    DEFAULT_EXCLUDE_DIRS = ["in_progress", "pr", ".venv"]
    DEFAULT_EXCLUDE_FILES = [".aider.chat.history.ipynb"]
```
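
How the script applies these defaults during file discovery is not shown here. One plausible approach, sketched under that assumption, filters the results of a recursive glob. The function `collect_files` and its parameters are hypothetical; only the two default lists come from the class above.

```python
from pathlib import Path
from typing import Optional, Sequence

DEFAULT_EXCLUDE_DIRS = ["in_progress", "pr", ".venv"]
DEFAULT_EXCLUDE_FILES = [".aider.chat.history.ipynb"]


def collect_files(
    root: Path,
    pattern: str = "*.ipynb",
    exclude_dirs: Optional[Sequence[str]] = None,
    exclude_files: Optional[Sequence[str]] = None,
) -> list[Path]:
    """Return files under `root` matching `pattern`, minus excluded dirs and files."""
    root = Path(root)
    if root.is_file():
        return [root]  # a single file was given, so there is nothing to filter
    dirs = DEFAULT_EXCLUDE_DIRS if exclude_dirs is None else exclude_dirs
    files = DEFAULT_EXCLUDE_FILES if exclude_files is None else exclude_files
    selected = []
    for path in sorted(root.rglob(pattern)):
        parent_parts = path.relative_to(root).parts[:-1]
        if any(part in dirs for part in parent_parts):
            continue  # the file lives inside an excluded directory
        if path.name in files:
            continue
        selected.append(path)
    return selected
```

Matching on directory names at any depth means `sub/pr/notes.ipynb` is skipped as well as `pr/notes.ipynb`. Whether the real script behaves this way is an assumption.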

### Examples

1. Check all `*.ipynb` files in the current directory and subdirectories (the default pattern):

In [1]:
check_broken_links.py

Found 5 *.ipynb file(s) in .

✅ All links are valid!


2. Check all `*.md` files recursively from the current directory:

In [2]:
check_broken_links.py . --pattern "*.md"

Found 8 *.md file(s) in .

✅ All links are valid!


3. Override exclusions on the command line (if you have not changed the defaults in the script):

In [3]:
check_broken_links.py --exclude-dirs drafts temp --exclude-files ReadMe.ipynb

Found 1 file file(s) in .

✅ All links are valid!


4. Check the given file:

In [4]:
cd ../../
ls

0_intro            in_progress     RELEASE_NOTES.ipynb
1_execution        LICENSE         RELEASE_NOTES.md
2_model            LICENSE-CODE    research
3_infrastructure   LICENSE-DOCS    security
3_prompts          mlops           test_commit_prompt.json
4_orchestration    myst.yml        todo
5_context          pr              tools
aider.CONVENTIONS  pyproject.toml  uv.lock
CHANGELOG          README.ipynb
helpers            README.md


In [5]:
check_broken_links.py 0_intro/00_onboarding.ipynb

Found 1 file file(s) in 0_intro/00_onboarding.ipynb

✅ All links are valid!


5. Use verbose mode:

    ```bash
    check_broken_links.py --verbose
    ```

Broken links output looks like this:

In [6]:
check_broken_links.py

Found 40 *.ipynb file(s) in .

❌ 5 Broken links found:
BROKEN LINK: File 'tools/jupyter_and_markdown/semantic_notebook_versioning_ai_ready_jupyter_docs.ipynb' contains broken link: /helpers/scripts/environment_setup_scripts/
BROKEN LINK: File '4_orchestration/workflows/release_notes_generation/post-mortem_slm_non-determinism_in_commit_generation.ipynb' contains broken link: ../../../mlops/git_workflows/production_git_workflow_standards.md
BROKEN LINK: File '4_orchestration/workflows/release_notes_generation/post-mortem_slm_non-determinism_in_commit_generation.ipynb' contains broken link: ../../../mlops/git_workflows/commit_changelog_tooling.md
BROKEN LINK: File '4_orchestration/workflows/release_notes_generation/post-mortem_slm_non-determinism_in_commit_generation.ipynb' contains broken link: ../../../mlops/git_workflows/production_git_workflow_standards.md
BROKEN LINK: File '4_orchestration/workflows/release_notes_generation/slm_backed_release_documentation_pipeline_architecture.ipynb

: 1
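
Because every report line follows the same `BROKEN LINK: File '...' contains broken link: ...` shape, a CI step can turn the output into structured records. The parser below is a sketch based only on the sample output above. `BrokenLink` and `parse_report` are hypothetical names, and lines that do not match the pattern (including truncated ones) are ignored.

```python
import re
from typing import NamedTuple

REPORT_RE = re.compile(
    r"^BROKEN LINK: File '(?P<source>.+)' contains broken link: (?P<target>.+)$"
)


class BrokenLink(NamedTuple):
    source: str  # file that contains the link
    target: str  # link target that could not be resolved


def parse_report(output: str) -> list[BrokenLink]:
    """Extract (source, target) pairs from the checker's text output."""
    found = []
    for line in output.splitlines():
        match = REPORT_RE.match(line.strip())
        if match:
            found.append(BrokenLink(match["source"], match["target"]))
    return found
```

A CI job could, for instance, group the parsed records by `source` to open one maintenance ticket per notebook.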

## 2. format_string.py

This script formats a given input string by applying several transformations to make it URL-safe and filesystem-friendly. The transformations include converting to lowercase, replacing specific special characters, removing unwanted words, and truncating the string if necessary.

### Usage

Synopsis:

```bash
format_string.py 'Your Input String'
```

### Transformation Logic

1. **Convert to Lowercase**: All characters in the input string are converted to lowercase.
2. **Replace & with and**: The ampersand (`&`) is replaced with the word "and".
3. **Remove Unwanted Words and Symbols**: Unwanted words and special symbols (e.g., "the ", "(", ")", "# ", "#", "`", "~", "$", "%", "@") are removed from the string.
4. **Replace Special Symbols with Underscores**: Other special symbols (e.g., ".", ",", ";", ":", "!", "?", "-", "/", "\\", "|", "<", ">", "*") are replaced with underscores (`_`).
5. **Replace Spaces with Underscores**: All spaces in the string are replaced with underscores.
6. **Remove Multiple Underscores**: Any sequence of multiple underscores is reduced to a single underscore.
7. **Truncate Long Strings**: If the resulting string exceeds 50 characters, it is truncated to 50 characters.
8. **Remove Trailing Underscore**: If the final character of the string is an underscore, it is removed.
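
The transformation steps can be sketched in a few lines of Python. This is an approximation written from the description above, not the script's source. The two symbol lists contain only the examples named in this section, and spaces are replaced before repeated underscores are collapsed, since that order reproduces the sample outputs shown in the examples.

```python
import re

# Assumption: only the examples listed in this README; the real script may use more.
REMOVE = ["the ", "(", ")", "# ", "#", "`", "~", "$", "%", "@"]
TO_UNDERSCORE = [".", ",", ";", ":", "!", "?", "-", "/", "\\", "|", "<", ">", "*"]
MAX_LENGTH = 50


def format_string(text: str) -> str:
    """Return a lowercase, underscore-separated, length-limited version of `text`."""
    result = text.lower()
    result = result.replace("&", "and")
    for token in REMOVE:
        result = result.replace(token, "")
    for symbol in TO_UNDERSCORE:
        result = result.replace(symbol, "_")
    result = result.replace(" ", "_")
    result = re.sub(r"_+", "_", result)  # collapse runs of underscores
    result = result[:MAX_LENGTH]
    return result.rstrip("_")  # truncation can leave a trailing underscore


print(format_string("From Concepts to Code: Introduction to Data Science (2024)"))
# prints: from_concepts_to_code_introduction_to_data_science
```

Note that removing "the " as a plain substring would also alter words that merely end in "the" before a space. Whether the real script guards against that is not stated here.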

### Examples

In [1]:
format_string.py 'Agents4Science Conference Paper Digest: How Agents Are "Doing" Science Right Now'

agents4science_conference_paper_digest_how_agents


In [2]:
format_string.py '# Post-Mortem: Architectural Flaws in the `nbdiff`-Centric Jupyter Version Control Handbook'

post_mortem_architectural_flaws_in_nbdiff_centric


In [3]:
format_string.py 'Feedforward Neural Networks in Depth, Part 3 Cost Functions | I, Deep Learning.pdf'

feedforward_neural_networks_in_depth_part_3_cost_f


In [4]:
format_string.py 'From Concepts to Code: Introduction to Data Science (2024)'

from_concepts_to_code_introduction_to_data_science


In [5]:
format_string.py 'Quiz - Special Applications: Face Recognition & Neural Style Transfer'

quiz_special_applications_face_recognition_and_neu
