@jan-janssen jan-janssen commented Jul 11, 2025

Summary by CodeRabbit

  • Refactor

    • Simplified cache directory handling by consolidating it into the resource dictionary for task execution and scheduling.
    • Removed the need to specify cache directory as a separate parameter in relevant functions and classes.
  • Tests

    • Updated tests to reflect the new approach to cache directory configuration.
    • Added an assertion to ensure that missing required arguments in executor creation raises an appropriate error.

@jan-janssen jan-janssen marked this pull request as draft July 11, 2025 09:03
coderabbitai bot commented Jul 11, 2025

Walkthrough

The changes refactor cache directory handling in the file-based task scheduler and executor. The explicit cache_directory parameter is removed from function signatures and constructors, consolidating cache directory configuration into the resource_dict per task. Associated tests are updated to reflect this new approach, ensuring consistent usage throughout the codebase.

Changes

File(s) | Change Summary
--- | ---
executorlib/task_scheduler/file/shared.py | Removed cache_directory parameter from execute_tasks_h5; now extracts cache directory from each task's resource_dict. Introduced per-task cache directory handling.
executorlib/task_scheduler/file/task_scheduler.py | Removed explicit cache_directory parameter from FileTaskScheduler and create_file_executor. Integrated cache directory into resource_dict. Adjusted function signatures and removed related imports.
tests/test_cache_fileexecutor_serial.py | Updated tests to pass cache_directory within resource_dict instead of as a separate argument. Added assertion for TypeError when create_file_executor is called without arguments.
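
The consolidation described in the walkthrough can be pictured as overlaying a per-task resource dictionary on scheduler-wide defaults. The following is only a minimal sketch of that idea: `resolve_task_resources` and the directory names are hypothetical and are not taken from executorlib.

```python
from typing import Optional


def resolve_task_resources(
    default_resources: dict, task_resources: Optional[dict] = None
) -> dict:
    """Overlay per-task settings on scheduler-wide defaults (hypothetical helper)."""
    merged = dict(default_resources)  # copy so the defaults are never mutated
    merged.update(task_resources or {})
    return merged


# Illustrative values only; "cache" and "run_a" are made-up directory names.
defaults = {"cores": 1, "cache_directory": "cache"}
print(resolve_task_resources(defaults))
print(resolve_task_resources(defaults, {"cache_directory": "run_a"}))
```

With this shape, a task that does not mention cache_directory inherits the default, while a task that does can pick its own location.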

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant FileTaskScheduler
    participant execute_tasks_h5

    Client->>FileTaskScheduler: Initialize with resource_dict (includes cache_directory)
    FileTaskScheduler->>execute_tasks_h5: Call with resource_dict (per-task cache_directory)
    execute_tasks_h5->>execute_tasks_h5: Extract cache_directory from each task's resource_dict
    execute_tasks_h5-->>FileTaskScheduler: Task execution results (per-task cache directory used)

Poem

In the warren of code where the cache once lay,
Now each little task finds its own way.
No more global burrows for files to reside,
Each resource dict keeps its cache inside.
With tests all refreshed and signatures neat,
This hop through the fields makes the scheduler fleet! 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa59964 and 1aff631.

📒 Files selected for processing (1)
  • executorlib/task_scheduler/file/task_scheduler.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • executorlib/task_scheduler/file/task_scheduler.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: notebooks
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
  • GitHub Check: pip_check
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
  • GitHub Check: unittest_mpich (macos-latest, 3.13)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.11)
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.13)
  • GitHub Check: minimal
  • GitHub Check: unittest_openmpi (ubuntu-latest, 3.12)
  • GitHub Check: unittest_old
  • GitHub Check: unittest_openmpi (macos-latest, 3.13)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.13)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.12)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.11)
  • GitHub Check: notebooks_integration
  • GitHub Check: unittest_win
  • GitHub Check: unittest_flux_openmpi
  • GitHub Check: unittest_flux_mpich
  • GitHub Check: mypy

@jan-janssen jan-janssen linked an issue Jul 11, 2025 that may be closed by this pull request

codecov bot commented Jul 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.89%. Comparing base (7e64c3b) to head (1aff631).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #707   +/-   ##
=======================================
  Coverage   96.89%   96.89%           
=======================================
  Files          29       29           
  Lines        1319     1320    +1     
=======================================
+ Hits         1278     1279    +1     
  Misses         41       41           

…ble_cache

# Conflicts:
#	executorlib/task_scheduler/file/task_scheduler.py
@jan-janssen jan-janssen changed the base branch from main to remove_makedirs July 11, 2025 10:23
@jan-janssen jan-janssen marked this pull request as ready for review July 11, 2025 10:24
Base automatically changed from remove_makedirs to main July 11, 2025 13:43

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e64c3b and aa59964.

📒 Files selected for processing (3)
  • executorlib/task_scheduler/file/shared.py (3 hunks)
  • executorlib/task_scheduler/file/task_scheduler.py (4 hunks)
  • tests/test_cache_fileexecutor_serial.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/test_cache_fileexecutor_serial.py (2)
executorlib/task_scheduler/file/task_scheduler.py (1)
  • create_file_executor (79-116)
tests/test_singlenodeexecutor_noblock.py (1)
  • resource_dict (11-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: unittest_mpich (ubuntu-latest, 3.12)
  • GitHub Check: unittest_old
  • GitHub Check: unittest_flux_mpich
  • GitHub Check: notebooks
  • GitHub Check: mypy
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
  • GitHub Check: minimal
  • GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
  • GitHub Check: notebooks_integration
🔇 Additional comments (12)
tests/test_cache_fileexecutor_serial.py (5)

61-62: LGTM! Good test coverage for the new required parameter.

The addition of this test case correctly verifies that create_file_executor() raises a TypeError when called without the now-required resource_dict parameter, which aligns with the API change.


64-66: LGTM! Correctly updated test calls to match the new API.

The addition of resource_dict={} to these test calls properly adapts them to the new function signature where resource_dict is now a required parameter.


112-112: LGTM! Correctly moved cache_directory into resource_dict.

The change from passing cache_directory as a top-level parameter to embedding it within resource_dict aligns with the API refactoring described in the PR objectives.


152-152: LGTM! Consistent with the API change.

Moving cache_directory from the top-level kwargs to inside the resource_dict is consistent with the refactoring pattern applied throughout the codebase.


192-192: LGTM! Consistent API usage.

The change maintains consistency with the new approach of embedding cache_directory within the resource_dict rather than passing it as a separate parameter.
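
The test changes noted above (a TypeError when the required argument is missing, and cache_directory passed inside resource_dict) could be written along the following lines. This is a sketch against a hypothetical stand-in, `create_file_executor_stub`, not the real create_file_executor, whose full signature is not shown in this conversation.

```python
import unittest


def create_file_executor_stub(resource_dict: dict, max_workers: int = 1) -> dict:
    """Hypothetical stand-in whose resource_dict parameter has no default."""
    return {"resource_dict": resource_dict, "max_workers": max_workers}


class TestRequiredResourceDict(unittest.TestCase):
    def test_missing_resource_dict_raises(self):
        # Calling without the required positional argument raises TypeError.
        with self.assertRaises(TypeError):
            create_file_executor_stub()

    def test_cache_directory_inside_resource_dict(self):
        # The cache directory travels inside resource_dict, not as its own kwarg.
        config = create_file_executor_stub(resource_dict={"cache_directory": "cache"})
        self.assertEqual(config["resource_dict"]["cache_directory"], "cache")
```

The TypeError here comes from Python's own argument checking, so no extra validation code is needed in the function for the first test to pass.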

executorlib/task_scheduler/file/shared.py (4)

82-82: LGTM! Proper initialization of per-task cache directory tracking.

The addition of cache_dir_dict is necessary to store cache directories per task, supporting the new flexible cache directory approach.


106-106: LGTM! Correct extraction of cache directory from task resources.

The change to extract cache_directory from task_resource_dict using pop() and convert it to an absolute path is appropriate. Using pop() ensures the cache directory is removed from the resource dict after extraction, preventing it from being passed to downstream functions that don't expect it.


150-150: LGTM! Proper storage of cache directory for task tracking.

Storing the cache directory in cache_dir_dict using the task key allows for per-task cache directory retrieval later in the execution flow.


155-158: LGTM! Consistent use of per-task cache directories.

The change to use cache_dir_dict[key] instead of a single global cache_directory variable properly completes the refactoring to support per-task cache directory configuration.
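
The four comments above describe one flow: pop cache_directory out of each task's resource dictionary, make it absolute, remember it per task key, and look it up later. A minimal sketch of that flow follows, assuming hypothetical helper names (`split_cache_directories`, `output_path`) and a made-up output file suffix; it is not the code in shared.py.

```python
import os


def split_cache_directories(tasks: dict) -> tuple:
    """Separate each task's cache directory from its remaining resources."""
    cache_dir_dict = {}
    remaining_resources = {}
    for key, resource_dict in tasks.items():
        task_resource_dict = dict(resource_dict)  # copy, keep the caller's dict intact
        # pop() removes the entry so downstream consumers never see it
        cache_directory = os.path.abspath(task_resource_dict.pop("cache_directory"))
        cache_dir_dict[key] = cache_directory
        remaining_resources[key] = task_resource_dict
    return cache_dir_dict, remaining_resources


def output_path(cache_dir_dict: dict, key: str) -> str:
    """Build a per-task output path; the '.out' suffix is an assumption."""
    return os.path.join(cache_dir_dict[key], key + ".out")


tasks = {
    "task_a": {"cores": 1, "cache_directory": "cache_a"},
    "task_b": {"cores": 2, "cache_directory": "cache_b"},
}
cache_dirs, resources = split_cache_directories(tasks)
print(output_path(cache_dirs, "task_a"))
print(resources["task_b"])
```

Keeping the directories in a dictionary keyed by task means two tasks in the same queue can write to different locations without any global state.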

executorlib/task_scheduler/file/task_scheduler.py (3)

53-53: LGTM! Good consolidation of cache directory configuration.

Setting the default cache directory within the resource_dict consolidates cache directory configuration into a single location, which aligns with the PR objectives for more flexible cache directory handling.


63-63: LGTM! Consistent with the API refactoring.

Using resource_dict instead of a separate cache_directory parameter in _process_kwargs is consistent with the broader refactoring to consolidate cache directory configuration.


103-104: LGTM! Maintains backward compatibility for cache_directory parameter.

The logic to add cache_directory to resource_dict when provided maintains backward compatibility with existing code that passes the cache_directory parameter, while integrating it into the new unified resource dictionary approach.
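
One way to picture the defaulting and backward-compatibility logic described in these comments is a small normalization step. The sketch below is an assumption-laden illustration: `normalize_resource_dict`, the default directory name, and the choice that an explicit cache_directory argument takes precedence over the dictionary entry are all hypothetical, not confirmed by the diff.

```python
from typing import Optional

DEFAULT_CACHE_DIRECTORY = "cache"  # hypothetical default, for illustration only


def normalize_resource_dict(
    resource_dict: dict, cache_directory: Optional[str] = None
) -> dict:
    """Fold a legacy cache_directory argument into resource_dict (hypothetical)."""
    normalized = dict(resource_dict)  # do not mutate the caller's dictionary
    if cache_directory is not None:
        # Assumed precedence: the explicit argument overrides the dict entry.
        normalized["cache_directory"] = cache_directory
    # Fill in the default only when nothing else supplied a value.
    normalized.setdefault("cache_directory", DEFAULT_CACHE_DIRECTORY)
    return normalized


print(normalize_resource_dict({}))
print(normalize_resource_dict({"cores": 2}, cache_directory="legacy_cache"))
```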



def create_file_executor(
    resource_dict: dict,

💡 Verification agent

🧩 Analysis chain

Verify the breaking change is intentional and documented.

The change from resource_dict: Optional[dict] = None to resource_dict: dict makes this parameter required, which is a breaking change. This should be intentional and properly documented.


🏁 Script executed:

#!/bin/bash
# Description: Check if this breaking change is mentioned in documentation or changelog
# Expected: Find references to the API change in documentation files

# Search for documentation about this change
fd -e md -e rst -e txt -e py --exec grep -l "resource_dict" {} \; | head -10
echo "---"
# Look for changelog or migration docs
fd -e md -e rst -e txt -e py -I --exec grep -l -i "breaking\|changelog\|migration" {} \; | head -5

Length of output: 608


Add documentation for the breaking change to resource_dict

I didn’t find any mentions of this API change in your documentation or changelog. Since making resource_dict required is a breaking change, please:

  • Add an entry under a “Breaking Changes” section in your CHANGELOG (e.g. docs/CHANGELOG.md), noting that
    create_file_executor in executorlib/task_scheduler/file/task_scheduler.py now requires resource_dict: dict.
  • Update any migration guide or README to reflect the new required parameter.
  • Consider adding a docstring note in executorlib/task_scheduler/file/task_scheduler.py (around line 80) calling out the parameter requirement.
🤖 Prompt for AI Agents
In executorlib/task_scheduler/file/task_scheduler.py around line 80, the
resource_dict parameter of create_file_executor is now required, which is a
breaking change. Add an entry under a "Breaking Changes" section in
docs/CHANGELOG.md describing this new requirement. Update any migration guides
or README files to mention the new required resource_dict parameter. Also, add a
note in the function's docstring near line 80 to document that
resource_dict is now a mandatory argument.

@liamhuber liamhuber left a comment

I left a doc-migration nit, but otherwise LGTM!

Co-authored-by: Liam Huber <liam.huber@gmail.com>
@jan-janssen jan-janssen merged commit 957170b into main Jul 11, 2025
30 checks passed
@jan-janssen jan-janssen deleted the more_flexible_cache branch July 11, 2025 14:26

Development

Successfully merging this pull request may close these issues.

[Feature] How to set cache_directory at submission time?

3 participants