Skip to content

Conversation

@tisnik
Copy link
Contributor

@tisnik tisnik commented Aug 26, 2025

Description

LCORE-581: refactored gendoc script

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #LCORE-581

Summary by CodeRabbit

  • Refactor
    • Reorganized the documentation-generation script into modular, reusable components while preserving behavior.
    • Improved reliability by safely handling directory changes and restoring the working directory after processing.
    • Centralized orchestration to regenerate per-directory README files for the source folder and all nested subdirectories.
  • Documentation
    • Added a new README in the source directory that lists each source file with a brief one-line description.
  • Chores
    • Internal cleanup to streamline future maintenance and automation.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 26, 2025

Walkthrough

Refactors scripts/gen_doc.py by extracting per-directory README generation into generate_docfile(directory) and generate_documentation_on_path(path); main() now iterates src/ and its subdirectories via Path.rglob("*"), performs CWD swaps in a finally block, and preserves AST/docstring parsing and per-directory README behavior.

Changes

Cohort / File(s) Summary of Changes
Doc generation refactor
scripts/gen_doc.py
Replaced inline directory traversal and README generation with modular functions: generate_docfile(directory), generate_documentation_on_path(path), and main(). main() enumerates src/ and subdirectories via Path("src").rglob("*"), the coordinator changes CWD, calls the generator, and restores CWD. AST parsing and first-docstring extraction behavior unchanged.
Generated README added
src/README.md
New README listing src files and one-line descriptions extracted/generated for each module (e.g., __init__.py, client.py, configuration.py, constants.py, lightspeed_stack.py, log.py, version.py).

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant U as CLI/User
    participant M as main()
    participant FS as Path("src").rglob("*")
    participant G as generate_documentation_on_path(path)
    participant D as generate_docfile(directory)
    participant IO as Filesystem

    U->>M: invoke script
    M->>FS: enumerate src/ and subdirectories
    loop per directory
        M->>G: call generate_documentation_on_path(dir)
        rect rgba(220,235,255,0.4)
        note right of G: save current CWD
        G->>IO: chdir(dir)
        G->>D: call generate_docfile(dir)
        D->>IO: scan *.py and parse AST for first docstring line
        D->>IO: write README.md in dir
        G-->>IO: finally: restore previous CWD
        end
    end
    M-->>U: exit
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I hop through folders, tidy and neat,
I fetch first lines where docstrings meet.
I write a README, then hop back home—
No loose crumbs left where rabbits roam. 🐇📜

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
scripts/gen_doc.py (4)

11-34: Prefer zero side-effects: avoid chdir and operate on Path objects end-to-end.

Repeatedly changing CWD is fragile (harder to reuse from other scripts, risks surprising interactions, complicates error handling). Since you already pass the target path around, you can generate README.md under that directory without chdir. This also nicely decouples generate_docfile from global state.

Apply this diff to eliminate chdir and make both functions path-driven:

 def generate_docfile(directory):
-    """Generate README.md in the CWD."""
-    with open("README.md", "w", encoding="utf-8", newline="\n") as indexfile:
-        print(
-            f"# List of source files stored in `{directory}` directory",
-            file=indexfile,
-        )
-        print("", file=indexfile)
-        files = sorted(os.listdir())
-
-        for file in files:
-            if file.endswith(".py"):
-                print(f"## [{file}]({file})", file=indexfile)
-                with open(file, "r", encoding="utf-8") as fin:
-                    source = fin.read()
-                try:
-                    mod = ast.parse(source)
-                    doc = ast.get_docstring(mod)
-                except SyntaxError:
-                    doc = None
-                if doc:
-                    print(doc.splitlines()[0], file=indexfile)
-                print(file=indexfile)
+    """Generate README.md under `directory` (no global CWD changes).**
+    """
+    # Local import to avoid touching top-level imports; also safe for reuse.
+    import tokenize
+    dir_path = Path(directory)
+    readme_path = dir_path / "README.md"
+    with open(readme_path, "w", encoding="utf-8", newline="\n") as indexfile:
+        print(
+            f"# List of source files stored in `{dir_path.as_posix()}` directory",
+            file=indexfile,
+        )
+        print("", file=indexfile)
+        for py_path in sorted(dir_path.glob("*.py")):
+            print(f"## [{py_path.name}]({py_path.name})", file=indexfile)
+            try:
+                with tokenize.open(py_path) as fin:
+                    source = fin.read()
+                mod = ast.parse(source)
+                doc = ast.get_docstring(mod)
+            except (SyntaxError, UnicodeDecodeError):
+                doc = None
+            if doc:
+                print(doc.splitlines()[0], file=indexfile)
+            print(file=indexfile)
 
 def generate_documentation_on_path(path):
-    """Generate documentation for all the sources found in path."""
-    directory = path
-    cwd = os.getcwd()
-    os.chdir(directory)
-    print(directory)
-
-    try:
-        generate_docfile(directory)
-    finally:
-        os.chdir(cwd)
+    """Generate documentation for all the sources found in `path`."""
+    print(path)
+    generate_docfile(path)

Also applies to: 36-47


21-23: Minor: rename loop variable from file to filename for clarity.

Small readability tweak; avoids shadowing a historically common name and clarifies intent.

Apply this tiny diff:

-        for file in files:
-            if file.endswith(".py"):
-                print(f"## [{file}]({file})", file=indexfile)
+        for filename in files:
+            if filename.endswith(".py"):
+                print(f"## [{filename}]({filename})", file=indexfile)

41-41: Nit: make progress output more informative (or toggle via a verbosity flag).

Plain print(directory) is terse. Consider a prefix or using logging for better UX.

Example minimal change:

-    print(directory)
+    print(f"[gendoc] Generating README.md in: {directory}")

49-55: Make execution robust to current working directory by anchoring src relative to this script.

If someone runs the script from outside the repo root, Path("src") may not resolve. Deriving src relative to this file avoids that.

Apply:

 def main():
-    """Entry point to this script, regenerates documentation in all directories."""
-    generate_documentation_on_path("src/")
-    for path in Path("src").rglob("*"):
+    """Entry point to this script, regenerates documentation in all directories."""
+    repo_root = Path(__file__).resolve().parents[1]  # repo root (parent of scripts/)
+    src_dir = repo_root / "src"
+    generate_documentation_on_path(src_dir)
+    for path in src_dir.rglob("*"):
         if path.is_dir():
             generate_documentation_on_path(path)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 30a432a and 247f1e2.

📒 Files selected for processing (1)
  • scripts/gen_doc.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests
🔇 Additional comments (2)
scripts/gen_doc.py (2)

11-17: LGTM: Clear modularization and safe CWD restoration.

Nice refactor extracting per-directory logic into functions and guarding CWD changes with a finally block. This improves readability and testability of the script.

Also applies to: 36-47, 49-55


24-30: Manual verification required: confirm encoding compliance for Python sources

I ran the suggested script to detect any Python files declaring a non-UTF-8/UTF8 coding cookie, but it produced no output. Absence of matches may simply mean there are no such declarations, but it’s not a definitive guarantee.

Please manually verify that:

  • No Python files in the repository declare a different source encoding via a # coding: or # -*- coding: -*- header.
  • All files you intend to parse are indeed UTF-8 (the default in Python 3) or otherwise safe to read with encoding="utf-8".

If you do have any non-UTF-8 files, consider updating the snippet to use tokenize.open, which respects PEP 263:

-                with open(file, "r", encoding="utf-8") as fin:
-                    source = fin.read()
+                # Respect source coding cookie (PEP 263)
+                import tokenize
+                with tokenize.open(file) as fin:
+                    source = fin.read()

@tisnik tisnik force-pushed the lcore-581-refactored-gendoc-script branch from 247f1e2 to f85cdfd Compare August 26, 2025 09:10
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (7)
src/README.md (7)

1-2: Add an auto-generated notice at the top.

Since this README is produced by scripts/gen_doc.py, add a header to prevent accidental manual edits and reduce churn in PRs.

Apply this diff:

+<!-- This file is auto-generated by scripts/gen_doc.py. Do not edit manually. -->
+
 # List of source files stored in `src/` directory

4-4: Fix “Lightspeed-stack” hyphenation.

Use “Lightspeed stack” consistently.

-Main classes for the Lightspeed-stack.
+Main classes for the Lightspeed stack.
-Lightspeed stack.
+Lightspeed stack.

Also applies to: 16-16


7-7: Correct “LLama” capitalization and tighten phrasing.

“Llama” is the common styling; “retrieval” is awkward here.

-LLama stack client retrieval.
+Llama stack client.

19-19: Prefer “Logging utilities.”

Minor wording improvement.

-Log utilities.
+Logging utilities.

21-22: Clarify what reads the version.

“Project manager tools” is ambiguous. Suggest phrasing that matches typical build/packaging usage.

-Service version that is read by project manager tools.
+Service version string exposed for build and packaging tools.

1-24: Option: note generation timestamp and source path.

If the generator supports it, embedding the generation date and source path aids traceability in downstream docs.

Example to have gen_doc.py prepend:

<!-- Generated by scripts/gen_doc.py on 2025-08-26 from src/*.py -->

24-24: Ensure a single trailing newline.

Minor formatting nit: keep exactly one newline at EOF to satisfy common linters.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 247f1e2 and f85cdfd.

📒 Files selected for processing (2)
  • scripts/gen_doc.py (1 hunks)
  • src/README.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/gen_doc.py
🧰 Additional context used
🪛 LanguageTool
src/README.md

[grammar] ~3-~3: There might be a mistake here.
Context: ...red in src/ directory ## init.py Main classes for the Lightspeed-stack. ...

(QB_NEW_EN)


[grammar] ~6-~6: There might be a mistake here.
Context: ...for the Lightspeed-stack. ## client.py LLama stack client retrieval. ## [confi...

(QB_NEW_EN)


[grammar] ~9-~9: There might be a mistake here.
Context: ... client retrieval. ## configuration.py Configuration loader. ## [constants.py]...

(QB_NEW_EN)


[grammar] ~12-~12: There might be a mistake here.
Context: ... Configuration loader. ## constants.py Constants used in business logic. ## [l...

(QB_NEW_EN)


[grammar] ~15-~15: There might be a mistake here.
Context: ...business logic. ## lightspeed_stack.py Lightspeed stack. ## log.py L...

(QB_NEW_EN)


[grammar] ~18-~18: There might be a mistake here.
Context: ..._stack.py) Lightspeed stack. ## log.py Log utilities. ## [version.py](version....

(QB_NEW_EN)


[grammar] ~21-~21: There might be a mistake here.
Context: ...](log.py) Log utilities. ## version.py Service version that is read by project ...

(QB_NEW_EN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests
🔇 Additional comments (1)
src/README.md (1)

3-23: All README links verified

Ran the provided script against src/README.md; all relative links resolve to existing files. No broken links detected—ready to merge.

@tisnik tisnik merged commit 3e23fcb into lightspeed-core:main Aug 26, 2025
17 of 18 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Sep 4, 2025
15 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant