LCORE-581: refactored gendoc script #449

tisnik · 2025-08-26T09:03:03Z

Description

LCORE-581: refactored gendoc script

Type of change

Related Tickets & Documents

Related Issue #LCORE-581

Summary by CodeRabbit

Refactor
- Reorganized the documentation-generation script into modular, reusable components while preserving behavior.
- Improved reliability by safely handling directory changes and restoring the working directory after processing.
- Centralized orchestration to regenerate per-directory README files for the source folder and all nested subdirectories.
Documentation
- Added a new README in the source directory that lists each source file with a brief one-line description.
Chores
- Internal cleanup to streamline future maintenance and automation.

coderabbitai · 2025-08-26T09:03:11Z

Walkthrough

Refactors scripts/gen_doc.py by extracting per-directory README generation into generate_docfile(directory) and generate_documentation_on_path(path). main() now iterates over src/ and its subdirectories via Path("src").rglob("*"); generate_documentation_on_path(path) switches the CWD to each directory and restores it in a finally block. AST/docstring parsing and per-directory README behavior are preserved.

Changes

| Cohort / File(s) | Summary of Changes |
| --- | --- |
| Doc generation refactor `scripts/gen_doc.py` | Replaced inline directory traversal and README generation with modular functions: `generate_docfile(directory)`, `generate_documentation_on_path(path)`, and `main()`. `main()` enumerates `src/` and its subdirectories via `Path("src").rglob("*")`; the coordinator `generate_documentation_on_path(path)` changes CWD, calls the generator, and restores CWD. AST parsing and first-docstring extraction behavior is unchanged. |
| Generated README added `src/README.md` | New README listing `src` files and one-line descriptions extracted/generated for each module (e.g., `__init__.py`, `client.py`, `configuration.py`, `constants.py`, `lightspeed_stack.py`, `log.py`, `version.py`). |
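
For readers unfamiliar with the "first-docstring extraction" mentioned in the table, the following is a minimal sketch of that step in isolation. The helper name `first_docstring_line` is hypothetical and does not appear in the PR; only `ast.parse` and `ast.get_docstring` are taken from the script.

```python
import ast
from typing import Optional


def first_docstring_line(source: str) -> Optional[str]:
    """Return the first line of a module docstring, or None if there is none."""
    try:
        module = ast.parse(source)
    except SyntaxError:
        # Files that do not parse are listed without a description.
        return None
    doc = ast.get_docstring(module)
    return doc.splitlines()[0] if doc else None


# Example inputs: a module with a multi-line docstring and one without.
with_doc = '"""Log utilities.\n\nLonger explanation.\n"""\nimport logging\n'
without_doc = "x = 1\n"
print(first_docstring_line(with_doc))
print(first_docstring_line(without_doc))
```

Only the first line is kept, so a multi-paragraph module docstring still yields a one-line README entry.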

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant U as CLI/User
    participant M as main()
    participant FS as Path("src").rglob("*")
    participant G as generate_documentation_on_path(path)
    participant D as generate_docfile(directory)
    participant IO as Filesystem

    U->>M: invoke script
    M->>FS: enumerate src/ and subdirectories
    loop per directory
        M->>G: call generate_documentation_on_path(dir)
        rect rgba(220,235,255,0.4)
        note right of G: save current CWD
        G->>IO: chdir(dir)
        G->>D: call generate_docfile(dir)
        D->>IO: scan *.py and parse AST for first docstring line
        D->>IO: write README.md in dir
        G-->>IO: finally: restore previous CWD
        end
    end
    M-->>U: exit
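
The save/chdir/restore step highlighted in the diagram can be sketched as a small context manager. This is an illustration of the pattern only, under the assumption that wrapping it in a helper is acceptable; the name `working_directory` is hypothetical and the PR itself uses an inline try/finally.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch the process CWD to `path` and restore it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs even if the body raises, so the caller's CWD is not lost.
        os.chdir(previous)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        pass  # README generation for one directory would happen here
print(os.getcwd() == start)
```

On Python 3.11 and later, `contextlib.chdir` offers the same behavior from the standard library.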

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I hop through folders, tidy and neat,
I fetch first lines where docstrings meet.
I write a README, then hop back home—
No loose crumbs left where rabbits roam. 🐇📜

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

scripts/gen_doc.py (4)

11-34: Prefer zero side-effects: avoid chdir and operate on Path objects end-to-end.

Repeatedly changing CWD is fragile (harder to reuse from other scripts, risks surprising interactions, complicates error handling). Since you already pass the target path around, you can generate README.md under that directory without chdir. This also nicely decouples generate_docfile from global state.

Apply this diff to eliminate chdir and make both functions path-driven:

 def generate_docfile(directory):
-    """Generate README.md in the CWD."""
-    with open("README.md", "w", encoding="utf-8", newline="\n") as indexfile:
-        print(
-            f"# List of source files stored in `{directory}` directory",
-            file=indexfile,
-        )
-        print("", file=indexfile)
-        files = sorted(os.listdir())
-
-        for file in files:
-            if file.endswith(".py"):
-                print(f"## [{file}]({file})", file=indexfile)
-                with open(file, "r", encoding="utf-8") as fin:
-                    source = fin.read()
-                try:
-                    mod = ast.parse(source)
-                    doc = ast.get_docstring(mod)
-                except SyntaxError:
-                    doc = None
-                if doc:
-                    print(doc.splitlines()[0], file=indexfile)
-                print(file=indexfile)
+    """Generate README.md under `directory` (no global CWD changes)."""
+    # Local import to avoid touching top-level imports; also safe for reuse.
+    import tokenize
+    dir_path = Path(directory)
+    readme_path = dir_path / "README.md"
+    with open(readme_path, "w", encoding="utf-8", newline="\n") as indexfile:
+        print(
+            f"# List of source files stored in `{dir_path.as_posix()}` directory",
+            file=indexfile,
+        )
+        print("", file=indexfile)
+        for py_path in sorted(dir_path.glob("*.py")):
+            print(f"## [{py_path.name}]({py_path.name})", file=indexfile)
+            try:
+                with tokenize.open(py_path) as fin:
+                    source = fin.read()
+                mod = ast.parse(source)
+                doc = ast.get_docstring(mod)
+            except (SyntaxError, UnicodeDecodeError):
+                doc = None
+            if doc:
+                print(doc.splitlines()[0], file=indexfile)
+            print(file=indexfile)
 
 def generate_documentation_on_path(path):
-    """Generate documentation for all the sources found in path."""
-    directory = path
-    cwd = os.getcwd()
-    os.chdir(directory)
-    print(directory)
-
-    try:
-        generate_docfile(directory)
-    finally:
-        os.chdir(cwd)
+    """Generate documentation for all the sources found in `path`."""
+    print(path)
+    generate_docfile(path)

Also applies to: 36-47
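
One practical consequence of a path-driven version is that it can be exercised against a temporary directory without touching the caller's CWD. The sketch below restates the suggestion in condensed form under the hypothetical name `write_readme`; it reads files as UTF-8 rather than via `tokenize.open` to keep the example short.

```python
import ast
import os
import tempfile
from pathlib import Path


def write_readme(directory):
    """Condensed, hypothetical variant of the path-driven generate_docfile."""
    dir_path = Path(directory)
    lines = [f"# List of source files stored in `{dir_path.as_posix()}` directory", ""]
    for py_path in sorted(dir_path.glob("*.py")):
        lines.append(f"## [{py_path.name}]({py_path.name})")
        try:
            doc = ast.get_docstring(ast.parse(py_path.read_text(encoding="utf-8")))
        except (SyntaxError, UnicodeDecodeError):
            doc = None
        if doc:
            lines.append(doc.splitlines()[0])
        lines.append("")
    readme = dir_path / "README.md"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "log.py").write_text('"""Log utilities."""\n', encoding="utf-8")
    cwd_before = os.getcwd()
    generated = write_readme(tmp).read_text(encoding="utf-8")
    cwd_unchanged = os.getcwd() == cwd_before
print(generated)
print(cwd_unchanged)
```

Because no global state is modified, a unit test for this function needs no setup or teardown beyond the temporary directory.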

21-23: Minor: rename loop variable from file to filename for clarity.

Small readability tweak: `file` was a built-in name in Python 2, and `filename` states the intent more clearly.

Apply this tiny diff:

-        for file in files:
-            if file.endswith(".py"):
-                print(f"## [{file}]({file})", file=indexfile)
+        for filename in files:
+            if filename.endswith(".py"):
+                print(f"## [{filename}]({filename})", file=indexfile)

41-41: Nit: make progress output more informative (or toggle via a verbosity flag).

Plain print(directory) is terse. Consider a prefix or using logging for better UX.

Example minimal change:

-    print(directory)
+    print(f"[gendoc] Generating README.md in: {directory}")
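
If the logging route is preferred over a prefixed print, a minimal sketch could look like the following. The `-v/--verbose` flag, the `gendoc` logger name, and the helper names are assumptions made for illustration; the script in this PR takes no command-line arguments.

```python
import argparse
import logging

logger = logging.getLogger("gendoc")


def parse_args(argv=None):
    """Parse a hypothetical -v/--verbose flag."""
    parser = argparse.ArgumentParser(description="Regenerate per-directory README files.")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="report each directory as it is processed")
    return parser.parse_args(argv)


def configure_logging(verbose: bool) -> None:
    """Emit per-directory progress only when verbose output is requested."""
    logging.basicConfig(format="[%(name)s] %(message)s")
    logger.setLevel(logging.INFO if verbose else logging.WARNING)


def report_progress(directory) -> None:
    """Log the directory being processed at INFO level."""
    logger.info("Generating README.md in: %s", directory)


args = parse_args(["--verbose"])
configure_logging(args.verbose)
report_progress("src/")
```

With this arrangement, automated runs stay quiet by default and the per-directory output becomes opt-in.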

49-55: Make execution robust to current working directory by anchoring src relative to this script.

If someone runs the script from outside the repo root, Path("src") is resolved against the current working directory and will not point at the repository's src/ directory. Deriving src relative to this file avoids that.

Apply:

 def main():
-    """Entry point to this script, regenerates documentation in all directories."""
-    generate_documentation_on_path("src/")
-    for path in Path("src").rglob("*"):
+    """Entry point to this script, regenerates documentation in all directories."""
+    repo_root = Path(__file__).resolve().parents[1]  # repo root (parent of scripts/)
+    src_dir = repo_root / "src"
+    generate_documentation_on_path(src_dir)
+    for path in src_dir.rglob("*"):
         if path.is_dir():
             generate_documentation_on_path(path)
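
To make the `parents[1]` indexing in the diff above concrete, here is a small illustration using a pure path, so nothing needs to exist on disk. The checkout location `/work/repo` is a made-up example; the only assumption carried over from the PR is that the script lives at `scripts/gen_doc.py`.

```python
from pathlib import PurePosixPath

# Hypothetical absolute location of the script inside a checkout.
script = PurePosixPath("/work/repo/scripts/gen_doc.py")

scripts_dir = script.parents[0]  # directory containing the script
repo_root = script.parents[1]    # one level above scripts/
src_dir = repo_root / "src"

print(scripts_dir)
print(repo_root)
print(src_dir)
```

The index is tied to the script's depth in the repository, so it would need adjusting if gen_doc.py were ever moved into a deeper directory.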

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 30a432a and 247f1e2.

📒 Files selected for processing (1)

scripts/gen_doc.py (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build-pr
GitHub Check: e2e_tests

🔇 Additional comments (2)

scripts/gen_doc.py (2)
11-17: LGTM: Clear modularization and safe CWD restoration.

Nice refactor extracting per-directory logic into functions and guarding CWD changes with a finally block. This improves readability and testability of the script.

Also applies to: 36-47, 49-55

24-30: Manual verification required: confirm encoding compliance for Python sources

I ran the suggested script to detect any Python files declaring a non-UTF-8/UTF8 coding cookie, but it produced no output. Absence of matches may simply mean there are no such declarations, but it’s not a definitive guarantee.

Please manually verify that:

No Python files in the repository declare a different source encoding via a # coding: or # -*- coding: -*- header.

All files you intend to parse are indeed UTF-8 (the default in Python 3) or otherwise safe to read with encoding="utf-8".

If you do have any non-UTF-8 files, consider updating the snippet to use tokenize.open, which respects PEP 263:
-                with open(file, "r", encoding="utf-8") as fin:
-                    source = fin.read()
+                # Respect source coding cookie (PEP 263)
+                import tokenize
+                with tokenize.open(file) as fin:
+                    source = fin.read()
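
The difference between the two ways of opening a file can be demonstrated with a small sketch. The file name `legacy.py` and the helper `read_python_source` are hypothetical; the example writes a Latin-1 file that declares its encoding, then shows that a plain UTF-8 read fails while `tokenize.open` honors the declaration.

```python
import tempfile
import tokenize
from pathlib import Path


def read_python_source(path) -> str:
    """Read a Python file using the encoding from its coding cookie or BOM."""
    with tokenize.open(path) as fin:
        return fin.read()


with tempfile.TemporaryDirectory() as tmp:
    legacy = Path(tmp) / "legacy.py"
    # The byte 0xE9 is "é" in Latin-1 but is not valid UTF-8 in this position.
    legacy.write_bytes(b'# -*- coding: latin-1 -*-\n"""Caf\xe9 module."""\n')

    try:
        legacy.read_text(encoding="utf-8")
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False

    source = read_python_source(legacy)

print(utf8_ok)
print(source)
```

When no cookie or BOM is present, `tokenize.open` falls back to UTF-8, so the change is behavior-preserving for files that are already UTF-8.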

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (7)

src/README.md (7)
1-2: Add an auto-generated notice at the top.

Since this README is produced by scripts/gen_doc.py, add a header to prevent accidental manual edits and reduce churn in PRs.

Apply this diff:
+
+
 # List of source files stored in `src/` directory
4-4: Fix “Lightspeed-stack” hyphenation.

Use “Lightspeed stack” consistently.
-Main classes for the Lightspeed-stack.
+Main classes for the Lightspeed stack.
-Lightspeed stack.
+Lightspeed stack.
Also applies to: 16-16

7-7: Correct “LLama” capitalization and tighten phrasing.

“Llama” is the common styling; “retrieval” is awkward here.
-LLama stack client retrieval.
+Llama stack client.
19-19: Prefer “Logging utilities.”

Minor wording improvement.
-Log utilities.
+Logging utilities.
21-22: Clarify what reads the version.

“Project manager tools” is ambiguous. Suggest phrasing that matches typical build/packaging usage.
-Service version that is read by project manager tools.
+Service version string exposed for build and packaging tools.
1-24: Option: note generation timestamp and source path.

If the generator supports it, embedding the generation date and source path aids traceability in downstream docs.

Example to have gen_doc.py prepend:
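
A minimal sketch of such a header, assuming an HTML comment is acceptable at the top of the generated Markdown. The function name `build_header`, the comment wording, and the timestamp format are all illustrative assumptions, not part of the PR.

```python
from datetime import datetime, timezone
from pathlib import Path


def build_header(directory, now=None) -> str:
    """Return a hypothetical 'do not edit' banner for a generated README."""
    # Normalize to UTC so the trailing "Z" in the timestamp is accurate.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    source = Path(directory).as_posix()
    return (
        "<!-- Auto-generated by scripts/gen_doc.py; do not edit manually. -->\n"
        f"<!-- Source: {source} | Generated: {stamp} -->\n"
    )


print(build_header("src"))
```

One trade-off to weigh: a timestamp changes on every regeneration, so every run would produce a diff even when no module docstring changed. Omitting the timestamp keeps the output stable.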

24-24: Ensure a single trailing newline.

Minor formatting nit: keep exactly one newline at EOF to satisfy common linters.
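
One way the generator could guarantee this is to normalize the text just before writing it. The helper name below is hypothetical, and note that it also strips trailing spaces on the last line, which is usually what linters expect.

```python
def with_single_trailing_newline(text: str) -> str:
    """Drop trailing whitespace and blank lines, then end with one newline."""
    return text.rstrip() + "\n"


print(repr(with_single_trailing_newline("Service version.\n\n\n")))
print(repr(with_single_trailing_newline("Service version.")))
```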

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 247f1e2 and f85cdfd.

📒 Files selected for processing (2)

scripts/gen_doc.py (1 hunks)
src/README.md (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

scripts/gen_doc.py

🧰 Additional context used

🪛 LanguageTool

src/README.md

[grammar] ~3-~3: There might be a mistake here.
Context: ...red in src/ directory ## init.py Main classes for the Lightspeed-stack. ...

(QB_NEW_EN)

[grammar] ~6-~6: There might be a mistake here.
Context: ...for the Lightspeed-stack. ## client.py LLama stack client retrieval. ## [confi...

(QB_NEW_EN)

[grammar] ~9-~9: There might be a mistake here.
Context: ... client retrieval. ## configuration.py Configuration loader. ## [constants.py]...

(QB_NEW_EN)

[grammar] ~12-~12: There might be a mistake here.
Context: ... Configuration loader. ## constants.py Constants used in business logic. ## [l...

(QB_NEW_EN)

[grammar] ~15-~15: There might be a mistake here.
Context: ...business logic. ## lightspeed_stack.py Lightspeed stack. ## log.py L...

(QB_NEW_EN)

[grammar] ~18-~18: There might be a mistake here.
Context: ..._stack.py) Lightspeed stack. ## log.py Log utilities. ## [version.py](version....

(QB_NEW_EN)

[grammar] ~21-~21: There might be a mistake here.
Context: ...](log.py) Log utilities. ## version.py Service version that is read by project ...

(QB_NEW_EN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build-pr
GitHub Check: e2e_tests

🔇 Additional comments (1)

src/README.md (1)

3-23: All README links verified

Ran the provided script against src/README.md; all relative links resolve to existing files. No broken links detected—ready to merge.
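
The verification script itself is not reproduced in this thread. As a rough sketch of what such a check might do, the following collects relative link targets from a Markdown file and reports those that do not exist. The regular expression is a simplification (inline links only, no parentheses in targets) and the function name is hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Matches inline Markdown links such as [log.py](log.py).
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that start with a URL scheme, e.g. https: or mailto:.
SCHEME_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(readme_path):
    """Return relative link targets in a Markdown file that are missing on disk."""
    readme = Path(readme_path)
    text = readme.read_text(encoding="utf-8")
    missing = []
    for target in LINK_PATTERN.findall(text):
        if SCHEME_PATTERN.match(target) or target.startswith("#"):
            continue  # skip absolute URLs and in-page anchors
        file_part = target.split("#", 1)[0]
        if not (readme.parent / file_part).exists():
            missing.append(target)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "log.py").write_text("", encoding="utf-8")
    Path(tmp, "README.md").write_text(
        "## [log.py](log.py)\n## [gone.py](gone.py)\n", encoding="utf-8"
    )
    result = broken_relative_links(Path(tmp, "README.md"))
print(result)
```

An empty list corresponds to the "no broken links detected" outcome reported above.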

coderabbitai bot reviewed Aug 26, 2025

tisnik added 2 commits August 26, 2025 11:09

LCORE-581: refactored gendoc script

1de5d91

Generated doc for the main module

f85cdfd

tisnik force-pushed the lcore-581-refactored-gendoc-script branch from 247f1e2 to f85cdfd Compare August 26, 2025 09:10

coderabbitai bot reviewed Aug 26, 2025

tisnik merged commit 3e23fcb into lightspeed-core:main Aug 26, 2025
17 of 18 checks passed

coderabbitai bot mentioned this pull request Sep 4, 2025

LCORE-632: updated doc #499

Merged

15 tasks
