Skip to content

Conversation

@tisnik
Copy link
Contributor

@tisnik tisnik commented Sep 30, 2025

Description

LCORE-625: Check if transcript directory is writable

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #LCORE-625

Summary by CodeRabbit

  • New Features
    • Improved validation for configured storage paths: checks now ensure paths are directories and writable when required, and respect whether a path must already exist.
  • Bug Fixes
    • Better error handling and logging when saving transcripts to disk — IO errors are now reported and surfaced with path context.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 30, 2025

Warning

Rate limit exceeded

@tisnik has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 20 minutes and 14 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 93828af and 3388ab8.

📒 Files selected for processing (2)
  • tests/unit/models/config/test_user_data_collection.py (2 hunks)
  • tests/unit/utils/test_checks.py (2 hunks)

Walkthrough

Adds a new directory_check(...) in src/utils/checks.py, uses it from UserDataCollection.check_storage_location_is_set_when_needed in src/models/config.py to validate feedback/transcripts directories, and wraps the file write in store_transcript with an I/O try/except in src/utils/transcripts.py.

Changes

Cohort / File(s) Summary
New directory validation
src/utils/checks.py
Added directory_check(path: FilePath, must_exists: bool, must_be_writable: bool, desc: str) -> None which verifies existence (when must_exists=True), confirms the path is a directory, and optionally checks writability; raises InvalidConfigurationError with descriptive messages on failure.
Config validation uses checker
src/models/config.py
UserDataCollection.check_storage_location_is_set_when_needed now calls checks.directory_check for feedback_path and transcripts_path when those features are enabled; retains prior missing-path errors and adds directory/writability validation.
Transcript write error handling
src/utils/transcripts.py
store_transcript now wraps the file write in a try/except catching IOError/OSError, logs an error including the transcript path on failure, and re-raises the exception.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller as ConfigCheck
  participant Config as models.config:UserDataCollection
  participant Checks as utils.checks:directory_check
  participant FS as Filesystem/OS

  Config->>Checks: directory_check(path, must_exists=True, must_be_writable=..., desc)
  Checks->>FS: stat(path) / isdir(path)
  alt Path missing & must_exists == True
    Checks-->>Config: raise InvalidConfigurationError ("{desc} must be set")
  else Path exists
    alt Not a directory
      Checks-->>Config: raise InvalidConfigurationError ("{desc} is not a directory")
    else Is a directory
      alt must_be_writable == True
        Checks->>FS: access(path, WRITE)
        alt Not writable
          Checks-->>Config: raise InvalidConfigurationError ("{desc} is not writable")
        else Writable
          Checks-->>Config: return None
        end
      else
        Checks-->>Config: return None
      end
    end
  end
  Note over Config,Checks: Called for feedback and transcripts when enabled
Loading
sequenceDiagram
  autonumber
  actor Caller as StoreTranscriptCaller
  participant Transcripts as utils.transcripts:store_transcript
  participant FS as Filesystem/OS
  participant Logger as logs

  Caller->>Transcripts: store_transcript(path, payload)
  Transcripts->>FS: open(path, "w") / write(payload)
  alt Write succeeds
    Transcripts->>Logger: log success
    Transcripts-->>Caller: return
  else Write raises IOError/OSError
    Transcripts->>Logger: log error with path
    Transcripts-->>Caller: re-raise exception
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

I twitch my whiskers, check each gate,
A burrow's path I validate.
I poke for write, I sniff the floor,
I log the fall, I guard the door.
Configs neat — I hop once more. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly references the JIRA issue and accurately summarizes the primary functionality introduced—verifying that the transcript directory is writable—making it clear to reviewers what change to expect without extraneous detail.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🧪 Early access (Sonnet 4.5): enabled

We are currently testing the Sonnet 4.5 model, which is expected to improve code review quality. However, this model may lead to increased noise levels in the review comments. Please disable the early access features if the noise level causes any inconvenience.

Note:

  • Public repositories are always opted into early access features.
  • You can enable or disable early access features from the CodeRabbit UI or by updating the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/utils/checks.py (1)

60-60: Consider reordering parameters for consistency.

For consistency with similar validation functions, consider placing desc before should_be_writable, as descriptive parameters typically precede behavioral flags in the codebase.

Apply this diff if you choose to reorder:

-def directory_check(path: FilePath, should_be_writable: bool, desc: str) -> None:
+def directory_check(path: FilePath, desc: str, should_be_writable: bool = False) -> None:

Note: This would require updating all call sites, so only consider if you haven't integrated this function yet.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 761bb29 and 2fa47aa.

📒 Files selected for processing (1)
  • src/utils/checks.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/utils/checks.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/utils/checks.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests
🔇 Additional comments (1)
src/utils/checks.py (1)

77-82: LGTM - Implementation is correct.

The implementation correctly:

  • Validates directory existence using os.path.isdir()
  • Conditionally checks writability with os.access(path, os.W_OK)
  • Raises appropriate errors with clear messages
  • Follows the same pattern as file_check

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 30, 2025

Note

Unit test generation is an Early Access feature. Expect some limitations and changes as we gather feedback and continue to improve it.


Generating unit tests... This may take up to 20 minutes.

@tisnik tisnik force-pushed the lcore-625-check-if-transcript-directory-is-writable branch from 2fa47aa to 86115f9 Compare September 30, 2025 08:47
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2fa47aa and 86115f9.

📒 Files selected for processing (2)
  • src/models/config.py (1 hunks)
  • src/utils/checks.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/utils/checks.py
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/models/config.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/models/config.py
src/{models/config.py,configuration.py}

📄 CodeRabbit inference engine (CLAUDE.md)

src/{models/config.py,configuration.py}: All configuration uses Pydantic models extending ConfigurationBase
Configuration base models must set model_config with extra="forbid" to reject unknown fields

Files:

  • src/models/config.py
src/{models/**/*.py,configuration.py}

📄 CodeRabbit inference engine (CLAUDE.md)

src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)

Files:

  • src/models/config.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation

Files:

  • src/models/config.py
🧬 Code graph analysis (1)
src/models/config.py (1)
src/utils/checks.py (1)
  • directory_check (60-87)
🪛 GitHub Actions: Pyright
src/models/config.py

[error] 238-238: pyright: Argument of type "str" cannot be assigned to parameter "path" of type "FilePath" in function "directory_check". "str" is not assignable to "Path".


[error] 249-249: pyright: Argument of type "str" cannot be assigned to parameter "path" of type "FilePath" in function "directory_check". "str" is not assignable to "Path".

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests
🔇 Additional comments (2)
src/models/config.py (2)

237-242: Incorrect parameter order in directory_check call.

The desc parameter is passed in the wrong position. According to the function signature in src/utils/checks.py, desc is the fourth parameter, not the second.

Apply this diff to fix the parameter order:

             checks.directory_check(
                 self.feedback_storage,
-                desc="Check directory to store feedback",
                 must_exists=False,
                 must_be_writable=True,
+                desc="Check directory to store feedback",
             )

Likely an incorrect or invalid review comment.


248-253: Incorrect parameter order in directory_check call.

The desc parameter is passed in the wrong position. According to the function signature in src/utils/checks.py, desc is the fourth parameter, not the second.

Apply this diff to fix the parameter order:

             checks.directory_check(
                 self.transcripts_storage,
-                must_exists=False,
-                desc="Check directory to store transcripts",
+                must_exists=False,
                 must_be_writable=True,
+                desc="Check directory to store transcripts",
             )

Likely an incorrect or invalid review comment.

@lightspeed-core lightspeed-core deleted a comment from coderabbitai bot Sep 30, 2025
@tisnik tisnik force-pushed the lcore-625-check-if-transcript-directory-is-writable branch 2 times, most recently from 745e06d to e4194f1 Compare September 30, 2025 09:46
@tisnik tisnik force-pushed the lcore-625-check-if-transcript-directory-is-writable branch from e4194f1 to 93828af Compare September 30, 2025 09:48
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/utils/checks.py (1)

70-70: Fix grammatical error in docstring.

The parameter description has a grammatical error: "Should the directory exists?" should be "Should the directory exist?"

Apply this diff:

-        must_exists (bool): Should the directory exists?
+        must_exists (bool): Should the directory exist?
src/utils/transcripts.py (2)

83-88: LGTM! Error handling for file write is appropriate.

The try/except block correctly catches I/O errors, logs them with context, and re-raises to propagate the failure. This provides a runtime safety net complementing the upfront directory writability check.

Minor note: In Python 3, IOError is an alias for OSError, making the tuple redundant (though harmless).

Optionally simplify to:

-    except (IOError, OSError) as e:
+    except OSError as e:

60-60: Consider wrapping directory creation in error handling for clearer diagnostics.

The mkdir call could fail due to runtime issues (disk full, permission changes, path conflicts). While the try/except around the file write will catch these failures, wrapping mkdir separately would provide clearer error messages distinguishing directory creation failures from file write failures.

     transcripts_path = construct_transcripts_path(user_id, conversation_id)
-    transcripts_path.mkdir(parents=True, exist_ok=True)
+    try:
+        transcripts_path.mkdir(parents=True, exist_ok=True)
+    except OSError as e:
+        logger.error("Failed to create transcript directory %s: %s", transcripts_path, e)
+        raise
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 745e06d and 93828af.

📒 Files selected for processing (3)
  • src/models/config.py (1 hunks)
  • src/utils/checks.py (1 hunks)
  • src/utils/transcripts.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/models/config.py
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/utils/checks.py
  • src/utils/transcripts.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/utils/checks.py
  • src/utils/transcripts.py
🧬 Code graph analysis (1)
src/utils/transcripts.py (1)
src/models/config.py (1)
  • dump (565-568)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests
🔇 Additional comments (1)
src/utils/checks.py (1)

79-87: LGTM! Logic correctly handles conditional existence and writability checks.

The implementation correctly handles all valid parameter combinations:

  • Early return when path doesn't exist and must_exists=False is appropriate
  • Subsequent checks (directory type, writability) only run when the path exists
  • Error messages are clear and include the description context

@tisnik tisnik merged commit 80c7938 into lightspeed-core:main Sep 30, 2025
18 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant