Conversation


@tadasbar tadasbar commented Mar 13, 2025

This PR adds log retrieval for finished Google Batch jobs.

Log tailing is not available via the Cloud Logging API, so we can only retrieve logs all at once. A simple solution is to retrieve them only after the job has finished.

There was a discussion in #14 about how the API only allows retrieving a single line per request, at 60 requests per minute. However, I have discovered that we can retrieve up to 1,000 lines (stdout/stderr) per request, i.e. up to 60,000 lines per minute.

In this implementation, if a request fails, it is retried after a 60-second delay. If the retry also fails, the log file is left as-is and may remain incomplete.
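The retry behavior described above can be sketched as a small helper (a simplified illustration; `fetch_logs_with_single_retry` and its parameters are hypothetical names, not the plugin's actual API):

```python
import time


def fetch_logs_with_single_retry(fetch, delay=60, exc=Exception, sleep=time.sleep):
    """Call fetch(); on failure, wait `delay` seconds and try exactly once more.

    Mirrors the behavior described above: one retry after a fixed delay,
    and the log file is simply left incomplete if the retry also fails.
    """
    for attempt in range(2):
        try:
            return fetch()
        except exc:
            if attempt == 0:
                sleep(delay)  # fixed back-off before the single retry
    return None  # both attempts failed; caller leaves the log file as-is
```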

This implementation, while not perfect, improves on the current situation, where only an empty .snakemake/googlebatch_logs/ directory is created. Downloading these logs is highly beneficial, as browsing them in the Google Cloud Console's logging utility is painful.

Summary by CodeRabbit

  • New Features

    • Automated log collection for completed jobs is now available. This update ensures that logs from both successful and failed jobs are reliably captured with built-in handling of temporary service limitations to improve overall stability.
  • Bug Fixes

    • Improved error handling for log retrieval to manage service request limitations effectively.
  • Tests

    • Enhanced mock functionality in tests to accommodate new attributes and methods, improving test coverage and reliability.


coderabbitai bot commented Mar 13, 2025

📝 Walkthrough

Walkthrough

The pull request introduces a new method in the GoogleBatchExecutor class to download and locally save logs from the Google Cloud Logging API. The method, invoked from check_active_jobs upon job completion (both FAILED and SUCCEEDED states), includes error handling for ResourceExhausted exceptions by logging warnings and implementing a retry mechanism with a sleep interval. Import statements have been updated accordingly, and a minor comment capitalization fix was applied. Additionally, test mocks have been updated to reflect changes in job attributes and logger behavior.

Changes

File Changes Summary
snakemake_executor_plugin_googlebatch/.../executor.py - Added save_finished_job_logs in GoogleBatchExecutor to download & save logs, including error handling for ResourceExhausted exceptions.
- Updated imports to include ResourceExhausted and logging.
- Corrected capitalization in the cancel_jobs method comment.
tests/tests_mocked_api.py - Modified the create_job mock from BatchServiceClient to include an additional uid attribute in the returned Job object.
- Added a new patch for google.cloud.logging.Client that mocks the logger method to return an object with a list_entries lambda function.
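The mocked logger described above can be sketched like this (a hedged illustration using unittest.mock stand-ins rather than the real google.cloud.logging client; the filter string and uid value follow the changes described in this PR):

```python
from unittest.mock import MagicMock

# Stand-in for the patched google.cloud.logging.Client: its logger()
# method returns an object whose list_entries yields no entries, so the
# executor writes an empty (but present) log file in tests.
mock_log_client = MagicMock()
mock_logger_obj = MagicMock()
mock_logger_obj.list_entries = lambda **kwargs: []
mock_log_client.logger.return_value = mock_logger_obj

# Code under test would call it roughly like this:
logger = mock_log_client.logger("batch_task_logs")
entries = list(logger.list_entries(filter_="labels.job_uid=bar", page_size=1000))
```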

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65d5b26 and bc49e38.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_googlebatch/executor.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • snakemake_executor_plugin_googlebatch/executor.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: testing-mocked-api
🔇 Additional comments (4)
snakemake_executor_plugin_googlebatch/executor.py (4)

2-2: Added necessary imports for log retrieval functionality.

The new imports support the log retrieval functionality:

  • time for implementing the retry delay
  • ResourceExhausted for handling rate limiting exceptions
  • logging for accessing the Google Cloud Logging API

Also applies to: 15-16


512-514: Well-placed log retrieval call.

The log retrieval function is appropriately called when jobs reach terminal states (FAILED or SUCCEEDED), which is the correct placement for this functionality since logs are only available after job completion.


580-580: Fixed comment capitalization.

The comment has been corrected to maintain consistent capitalization style.


526-577:

❓ Verification inconclusive

Implementation looks good with robust error handling.

The implementation properly handles log retrieval with appropriate retry logic for rate limiting (ResourceExhausted) and general exceptions. The default parameters align with the PR objectives (retrieving up to 1,000 lines per request with a 60-second retry delay).


🏁 Script executed:

#!/bin/bash
# Verify that google-cloud-logging dependency is managed in the project

echo "Checking for google-cloud-logging in dependency files..."
for file in requirements.txt setup.py pyproject.toml; do
  if [ -f "$file" ]; then
    echo "=== $file ==="
    rg "google-cloud-logging" "$file" || echo "Not found in $file"
  else
    echo "$file not found"
  fi
done

echo -e "\nChecking for imports of google.cloud.logging in the codebase..."
rg "from google.cloud import .*logging" || rg "from google.cloud.logging import"

Length of output: 878


Action: Verify that the dependency management for Google Cloud Logging is handled appropriately, while the log retrieval implementation itself is robust.

  • The log-saving function in snakemake_executor_plugin_googlebatch/executor.py (lines 526–577) effectively handles log retrieval with proper retry logic for API rate limiting via the ResourceExhausted exception and general error handling.
  • The use of default parameters (sleeps=60 and page_size=1000) aligns well with the PR objectives.
  • The verification script confirms that the code imports google.cloud.logging (as seen in both executor.py and example/show-logs.py). However, standard dependency files (requirements.txt, setup.py, pyproject.toml) were not found. Please double-check that your dependency management strategy (perhaps using an alternative method) correctly provisions the google-cloud-logging package.

Overall, the functional aspects of the implementation look solid. Please verify the dependency management approach to ensure consistency across environments.


@tadasbar tadasbar self-assigned this Mar 13, 2025
@tadasbar tadasbar marked this pull request as ready for review March 13, 2025 18:09

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
.github/workflows/ci_mocked_api.yaml (2)

86-90: Consider adding a non-interactive installation flag.

When installing Docker Compose on line 89, it's often safer for CI to add the -y flag to avoid prompts:

-          sudo apt-get install docker-compose
+          sudo apt-get install -y docker-compose

98-103: Consider verifying the downloaded MinIO Client binary.

For increased security and reliability, you could verify the binary's checksum before making it executable. Example:

 curl -O https://dl.min.io/client/mc/release/linux-amd64/mc
+echo "<INSERT_EXPECTED_SHA256>  mc" > mc.sha256
+sha256sum -c mc.sha256
 chmod +x mc
 sudo mv mc /usr/local/bin/
tests/tests_mocked_api.py (1)

25-31: Consider adding a test case for log entries.

Currently, the logger mock returns an empty list. You may want to add a scenario where list_entries returns a sample log entry to validate that the save_finished_job_logs method writes logs correctly to the file.
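One way such a test could look (a sketch; `write_logs` is a hypothetical stand-in for the file-writing portion of `save_finished_job_logs`, not the plugin's actual code):

```python
import os
import tempfile
from types import SimpleNamespace
from unittest.mock import MagicMock


def write_logs(logger, logfname, filter_query, page_size=1000):
    # Mirrors the write loop shown in this PR: one payload per line.
    with open(logfname, "w", encoding="utf-8") as logfile:
        for log_entry in logger.list_entries(filter_=filter_query, page_size=page_size):
            logfile.write(str(log_entry.payload) + "\n")


# Mocked logger returning one sample entry instead of an empty list.
mock_logger = MagicMock()
mock_logger.list_entries = lambda **kwargs: [SimpleNamespace(payload="task finished OK")]

with tempfile.TemporaryDirectory() as tmpdir:
    logpath = os.path.join(tmpdir, "job.log")
    write_logs(mock_logger, logpath, "labels.job_uid=bar")
    with open(logpath, encoding="utf-8") as f:
        content = f.read()
```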

snakemake_executor_plugin_googlebatch/executor.py (1)

519-564: Excellent error-handling strategy with retry.

The implementation properly retries after hitting the rate limit (ResourceExhausted). Consider two minor details:

  1. Ensure log_entry.payload is always a string. If structured logging is used later, you may need a format conversion.
  2. Fix the minor spelling typo in line 551 (“retreive” → “retrieve”).

Example fix:

-            self.logger.warning(f"Trying to retreive logs for job {job_uid} once more.")
+            self.logger.warning(f"Trying to retrieve logs for job {job_uid} once more.")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e75c43 and dcab2c6.

⛔ Files ignored due to path filters (1)
  • pyproject.toml is excluded by !pyproject.toml
📒 Files selected for processing (3)
  • .github/workflows/ci_mocked_api.yaml (1 hunks)
  • snakemake_executor_plugin_googlebatch/executor.py (4 hunks)
  • tests/tests_mocked_api.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • tests/tests_mocked_api.py
  • snakemake_executor_plugin_googlebatch/executor.py
🔇 Additional comments (5)
.github/workflows/ci_mocked_api.yaml (2)

91-97: LGTM for MinIO setup.

Setting up MinIO with an action is straightforward and well-contained here. No issues spotted.


104-106: LGTM for MinIO client configuration.

The alias setup is concise and clear. No concerns here.

tests/tests_mocked_api.py (1)

14-14: Good addition of the uid field in the mock.

Adding uid="bar" keeps the mock consistent with the new usage in the executor code that expects a unique job identifier.

snakemake_executor_plugin_googlebatch/executor.py (2)

2-16: New imports are appropriate.

The added imports (time, ResourceExhausted, logging) are necessary for the log retrieval implementation.


505-507: Log retrieval logic on job completion looks good.

Triggering save_finished_job_logs once the job is either "FAILED" or "SUCCEEDED" is a clear and logical approach to finalize logs.

@tadasbar

@CodeRabbit do the review again


coderabbitai bot commented Mar 13, 2025

@tadasbar I'll review all the changes in your pull request again.

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
.github/workflows/ci_mocked_api.yaml (3)

86-92: Consider storing MinIO credentials in GitHub secrets.
Storing credentials directly in the workflow file can be a security risk if this repository is public or if the credentials are reused elsewhere.

Here's an example of how you might reference secret values:

 - name: Setup minio
   uses: comfuture/minio-action@v1
   with:
-    access_key: minio
-    secret_key: minio123
     port: 9000
+    access_key: ${{ secrets.MINIO_ACCESS_KEY }}
+    secret_key: ${{ secrets.MINIO_SECRET_KEY }}

93-97: Verify download integrity.
When installing the MinIO client from a remote URL, consider verifying the authenticity of the binary via checksums to prevent supply chain attacks.

   - name: Install MinIO Client CLI
     run: |
       curl -O https://dl.min.io/client/mc/release/linux-amd64/mc
       chmod +x mc
       sudo mv mc /usr/local/bin/
+      # Example checksum verification step:
+      # echo "<expected-checksum>  mc" | sha256sum --check

99-102: Use GitHub secrets to configure MinIO client.
Similar to the setup step, consider avoiding plaintext credentials in the workflow.

 - name: Configure MinIO client
   run: |
     mc alias set minio http://localhost:9000 minio minio123
+    # If secrets are stored:
+    # mc alias set minio http://localhost:9000 ${{ secrets.MINIO_ACCESS_KEY }} ${{ secrets.MINIO_SECRET_KEY }}
snakemake_executor_plugin_googlebatch/executor.py (1)

519-564: Consider using str() for log payloads and employing backoff strategies.
log_entry.payload may not always be a pure string and can cause unexpected exceptions if it’s not cast. Additionally, you might consider exponential backoff instead of sleeping a fixed 60 seconds.

-                    logfile.write(log_entry.payload + "\n")
+                    logfile.write(str(log_entry.payload) + "\n")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e75c43 and ede0946.

⛔ Files ignored due to path filters (1)
  • pyproject.toml is excluded by !pyproject.toml
📒 Files selected for processing (3)
  • .github/workflows/ci_mocked_api.yaml (1 hunks)
  • snakemake_executor_plugin_googlebatch/executor.py (4 hunks)
  • tests/tests_mocked_api.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • snakemake_executor_plugin_googlebatch/executor.py
  • tests/tests_mocked_api.py
🔇 Additional comments (4)
snakemake_executor_plugin_googlebatch/executor.py (2)

505-507: Log retrieval logic is appropriate.
Invoking save_finished_job_logs upon job completion (either SUCCEEDED or FAILED) is a straightforward way to retrieve logs after the job ends.


567-567: Comment correction is fine.
No issues; this clarifies when and why jobs are canceled.

tests/tests_mocked_api.py (2)

14-14: Mock return value updated with UID.
This aligns well with the new code that relies on the job UID for log retrieval.


25-31: Mocking the logger for Google Cloud Logging.
This mock structure appropriately tests the save_finished_job_logs flow without requiring real API calls.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
snakemake_executor_plugin_googlebatch/executor.py (1)

519-564: Implementation for downloading and saving job logs.

The log retrieval functionality is well-implemented with appropriate retry logic for handling API rate limits. There are a few suggestions for improvement:

  1. There's an inconsistency in how the log payload is written to file: str(log_entry.payload) in the first try block vs log_entry.payload directly in the retry block.
  2. Consider using append mode ('a') for the second attempt to avoid losing logs that were successfully written before hitting rate limits.
  3. Add additional exception handling for potential IO errors when writing to files.

Here's a suggested improvement to address these concerns:

    def save_finished_job_logs(
        self,
        job_info: SubmittedJobInfo,
        sleeps=60,
        page_size=1000,
    ):
        """
        Download logs using Google Cloud Logging API and save
        them locally. Since tail logging does not work, this function
        is run only at the end of the job.
        """
        job_uid = job_info.aux["batch_job"].uid
        filter_query = f"labels.job_uid={job_uid}"
        logfname = job_info.aux["logfile"]

        log_client = logging.Client(project=self.executor_settings.project)
        logger = log_client.logger("batch_task_logs")

        try:
            with open(logfname, "w", encoding="utf-8") as logfile:
                for log_entry in logger.list_entries(
                    filter_=filter_query,
                    page_size=page_size,
                ):
                    logfile.write(str(log_entry.payload) + "\n")
        except ResourceExhausted:
            self.logger.warning(
                "Too many requests to Google Logging API.\n"
                + f"Skipping logs for job {job_uid} and sleeping for {sleeps}s."
            )
            time.sleep(sleeps)

            self.logger.warning(f"Trying to retrieve logs for job {job_uid} once more.")
            try:
-                with open(logfname, "w", encoding="utf-8") as logfile:
+                with open(logfname, "a", encoding="utf-8") as logfile:
                    for log_entry in logger.list_entries(
                        filter_=filter_query,
                        page_size=page_size,
                    ):
-                        logfile.write(log_entry.payload + "\n")
+                        logfile.write(str(log_entry.payload) + "\n")
            except ResourceExhausted:
                self.logger.warning(
                    "Retry to retrieve logs failed, "
                    + f"the log file {logfname} might be incomplete."
                )
+            except IOError as e:
+                self.logger.warning(
+                    f"Failed to write to log file {logfname}: {str(e)}"
+                )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ede0946 and 511d6d1.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_googlebatch/executor.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • snakemake_executor_plugin_googlebatch/executor.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: testing-mocked-api
🔇 Additional comments (5)
snakemake_executor_plugin_googlebatch/executor.py (5)

2-2: Import added for new log retrieval functionality.

The time module import is necessary for implementing the retry mechanism with time.sleep() in the new save_finished_job_logs method.


505-507: LGTM: Logs are now saved when jobs complete.

This is a good enhancement that calls the new log retrieval method when jobs finish (either successfully or with failures), ensuring logs are available for inspection.


567-567: Improved comment capitalization.

Minor improvement to the comment for consistency.


15-16: Ensure "google-cloud-logging" dependency is properly managed.

The code now imports and uses the Google Cloud Logging client, but it's important to ensure that the "google-cloud-logging" package is properly declared in your dependencies.

#!/bin/bash
echo "Searching for 'google-cloud-logging' in dependency files..."
for file in requirements.txt setup.py pyproject.toml; do
  if [ -f "$file" ]; then
    echo "=== $file ==="
    grep "google-cloud-logging" "$file" || echo "No reference in $file."
  else
    echo "$file not found."
  fi
done

Also applies to: 519-564


15-16:

❓ Verification inconclusive

New imports support log retrieval functionality.

The addition of ResourceExhausted exception and the logging client are appropriate for implementing the new log retrieval functionality.


🏁 Script executed:

#!/bin/bash
# Verify that google-cloud-logging is properly installed/imported in the project
grep -r "google-cloud-logging" --include="*.py" --include="*.toml" --include="requirements.txt" --include="setup.py" .

Length of output: 114


I'm not seeing evidence of the dependency when searching for the string "google-cloud-logging", likely because the package is imported with its module name ("from google.cloud import logging") rather than its PyPI package name. To improve our verification, please run the following script, which searches for the actual import statement and checks for the dependency in common dependency files:

Once you verify that the expected import and dependency declarations are in place, we can confidently conclude that the new log retrieval functionality is properly supported by these changes.


🏁 Script executed:

#!/bin/bash
echo "Verifying 'from google.cloud import logging' in the repository:"
grep -R "from google.cloud import logging" .

echo ""
echo "Checking for dependency 'google-cloud-logging' in requirements.txt (if present):"
if [ -f requirements.txt ]; then
    grep -R "google-cloud-logging" requirements.txt || echo "Not found in requirements.txt"
else
    echo "requirements.txt not found"
fi

echo ""
echo "Checking for dependency 'google-cloud-logging' in setup.py (if present):"
if [ -f setup.py ]; then
    grep -R "google-cloud-logging" setup.py || echo "Not found in setup.py"
else
    echo "setup.py not found"
fi

Length of output: 833


Log Retrieval Import & Dependency Confirmation

  • The new import statements in snakemake_executor_plugin_googlebatch/executor.py—specifically importing ResourceExhausted and using from google.cloud import logging—adequately support the new log retrieval functionality.
  • Our verification confirms that the module import is present in the code. However, no explicit reference to the google-cloud-logging package was found in requirements.txt or setup.py. If dependency management is handled elsewhere (for instance via a pyproject.toml, Pipfile, or an environment where this package is pre-installed), no further action is necessary. Otherwise, please ensure that the google-cloud-logging package is properly declared to avoid runtime issues.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
snakemake_executor_plugin_googlebatch/executor.py (1)

519-564: Implement proper error handling, but consider potential improvements

The implementation does a good job handling the ResourceExhausted exception and attempting to retry after a delay. However, there are a few possible improvements to consider:

  1. The retry logic contains duplicated code for log retrieval
  2. There's no handling for other potential exceptions (e.g., permission issues, network errors)
  3. Consider implementing exponential backoff instead of a fixed delay

Here's a potential implementation with reduced duplication and more comprehensive error handling:

def save_finished_job_logs(
    self,
    job_info: SubmittedJobInfo,
    sleeps=60,
    page_size=1000,
):
    """
    Download logs using Google Cloud Logging API and save
    them locally. Since tail logging does not work, this function
    is run only at the end of the job.
    """
    job_uid = job_info.aux["batch_job"].uid
    filter_query = f"labels.job_uid={job_uid}"
    logfname = job_info.aux["logfile"]

    log_client = logging.Client(project=self.executor_settings.project)
    logger = log_client.logger("batch_task_logs")

-    try:
-        with open(logfname, "w", encoding="utf-8") as logfile:
-            for log_entry in logger.list_entries(
-                filter_=filter_query,
-                page_size=page_size,
-            ):
-                logfile.write(str(log_entry.payload) + "\n")
-    except ResourceExhausted:
-        self.logger.warning(
-            "Too many requests to Google Logging API.\n"
-            + f"Skipping logs for job {job_uid} and sleeping for {sleeps}s."
-        )
-        time.sleep(sleeps)
-
-        self.logger.warning(f"Trying to retrieve logs for job {job_uid} once more.")
-        try:
-            with open(logfname, "w", encoding="utf-8") as logfile:
-                for log_entry in logger.list_entries(
-                    filter_=filter_query,
-                    page_size=page_size,
-                ):
-                    logfile.write(str(log_entry.payload) + "\n")
-        except ResourceExhausted:
-            self.logger.warning(
-                "Retry to retrieve logs failed, "
-                + f"the log file {logfname} might be incomplete."
-            )
+    max_retries = 1
+    retry_count = 0
+    
+    while retry_count <= max_retries:
+        try:
+            with open(logfname, "w" if retry_count == 0 else "a", encoding="utf-8") as logfile:
+                if retry_count > 0:
+                    logfile.write("\n--- Log retrieval resumed after retry ---\n\n")
+                
+                for log_entry in logger.list_entries(
+                    filter_=filter_query,
+                    page_size=page_size,
+                ):
+                    logfile.write(str(log_entry.payload) + "\n")
+                
+                # If we get here, retrieval was successful
+                break
+                
+        except ResourceExhausted:
+            retry_count += 1
+            if retry_count <= max_retries:
+                self.logger.warning(
+                    "Too many requests to Google Logging API.\n"
+                    + f"Skipping logs for job {job_uid} and sleeping for {sleeps}s before retry {retry_count}/{max_retries}."
+                )
+                time.sleep(sleeps)
+            else:
+                self.logger.warning(
+                    "Maximum retries reached, "
+                    + f"the log file {logfname} might be incomplete."
+                )
+        except Exception as e:
+            self.logger.warning(
+                f"Failed to retrieve logs for job {job_uid}: {str(e)}. "
+                + f"The log file {logfname} might be incomplete."
+            )
+            break
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 511d6d1 and 45c2d27.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_googlebatch/executor.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • snakemake_executor_plugin_googlebatch/executor.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: testing-mocked-api
🔇 Additional comments (3)
snakemake_executor_plugin_googlebatch/executor.py (3)

15-16: Ensure "google-cloud-logging" is included in project dependencies

You've added imports for Google Cloud Logging functionality, but the project may be missing the explicit dependency for "google-cloud-logging" in its dependency files (requirements.txt, setup.py, or pyproject.toml).


505-507: LGTM: Good addition of log saving for completed jobs

The implementation correctly calls save_finished_job_logs when a job reaches terminal states (FAILED or SUCCEEDED), which aligns with the PR objective to retrieve logs after job completion.


567-567: Small comment capitalization correction

You've fixed the capitalization in the method documentation comment to maintain consistency.

@tadasbar tadasbar moved this from In progress to In review in Snakemake Hackathon March 2025 Mar 14, 2025
@johanneskoester johanneskoester requested a review from jakevc March 14, 2025 21:43

@jakevc jakevc left a comment


Could be cleaner to implement a retry function for the `with open(...)` operation so you don't have to repeat that logic twice
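jakevc's suggestion could be realized with a small wrapper so the open-and-write logic is defined only once (a sketch under the assumption that a single retry after a fixed delay is still desired; names here are illustrative):

```python
import functools
import time


def retry_once(exc, delay=60, sleep=time.sleep):
    """Run the decorated function; on `exc`, sleep and try exactly once more."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except exc:
                sleep(delay)  # back off before the single retry
                return fn(*args, **kwargs)
        return wrapper
    return decorator
```

In the executor, the `with open(...)` block would then live in one function decorated with `retry_once(ResourceExhausted)`, removing the duplicated second attempt.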


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
snakemake_executor_plugin_googlebatch/executor.py (1)

526-571: Good implementation of log retrieval with proper error handling

The new save_finished_job_logs method effectively handles the retrieval of logs from the Google Cloud Logging API, with appropriate error handling for rate limiting and retries.

Consider enhancing the implementation with:

  1. Adding more comprehensive error handling for other potential API exceptions
  2. Returning a boolean indicating success or failure for logging/reporting purposes
  3. Adding debug logging for successful retrieval with the count of log entries
def save_finished_job_logs(
    self,
    job_info: SubmittedJobInfo,
    sleeps=60,
    page_size=1000,
):
    """
    Download logs using Google Cloud Logging API and save
    them locally. Since tail logging does not work, this function
    is run only at the end of the job.
    """
    job_uid = job_info.aux["batch_job"].uid
    filter_query = f"labels.job_uid={job_uid}"
    logfname = job_info.aux["logfile"]

    log_client = logging.Client(project=self.executor_settings.project)
    logger = log_client.logger("batch_task_logs")

-    def attempt_log_save(fname, query, page_size):
+    def attempt_log_save(fname, query, page_size):
+        count = 0
        with open(fname, "w", encoding="utf-8") as logfile:
            for log_entry in logger.list_entries(
                filter_=query,
                page_size=page_size,
            ):
                logfile.write(str(log_entry.payload) + "\n")
+                count += 1
+        return count

    self.logger.info(f"Saving logs for Batch job {job_uid} to {logfname}.")

+    success = False
    try:
-        attempt_log_save(logfname, filter_query, page_size)
+        count = attempt_log_save(logfname, filter_query, page_size)
+        self.logger.debug(f"Successfully saved {count} log entries to {logfname}")
+        success = True
    except ResourceExhausted:
        self.logger.warning(
            "Too many requests to Google Logging API.\n"
            + f"Skipping logs for job {job_uid} and sleeping for {sleeps}s."
        )
        time.sleep(sleeps)

        self.logger.warning(f"Trying to retrieve logs for job {job_uid} once more.")
        try:
-            attempt_log_save(logfname, filter_query, page_size)
+            count = attempt_log_save(logfname, filter_query, page_size)
+            self.logger.debug(f"Successfully saved {count} log entries to {logfname} on retry")
+            success = True
        except ResourceExhausted:
            self.logger.warning(
                "Retry to retrieve logs failed, "
                + f"the log file {logfname} might be incomplete."
            )
+    except Exception as e:
+        self.logger.error(f"Failed to retrieve logs for job {job_uid}: {str(e)}")
+
+    return success
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1996b10 and 1fabd6d.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_googlebatch/executor.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)

  • snakemake_executor_plugin_googlebatch/executor.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: testing-mocked-api
🔇 Additional comments (3)
snakemake_executor_plugin_googlebatch/executor.py (3)

15-16: Ensure google-cloud-logging dependency is properly managed

The change adds a new import for the Google Cloud Logging client, which requires the google-cloud-logging package. A previous review already flagged this, but it's worth re-emphasizing.

#!/bin/bash
# Checking if google-cloud-logging is declared in dependency files
for file in requirements.txt setup.py pyproject.toml; do
  if [ -f "$file" ]; then
    echo "=== Checking $file ==="
    grep -q "google-cloud-logging" "$file" && echo "✅ Found google-cloud-logging in $file" || echo "❌ Not found in $file"
  fi
done

512-514: LGTM - Appropriate placement for log retrieval

The code correctly calls the log retrieval function when a job reaches a terminal state (FAILED or SUCCEEDED), ensuring logs are captured immediately after job completion.


574-574: LGTM - Comment capitalization fix

Good catch correcting the capitalization in the comment for better consistency.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
snakemake_executor_plugin_googlebatch/executor.py (1)

526-577: Well-implemented log retrieval with proper error handling

The implementation of save_finished_job_logs is well-structured with:

  • Clear parameter definitions including defaults for pagination and retry timing
  • Appropriate error handling for ResourceExhausted exceptions with a retry mechanism
  • Comprehensive logging to inform users of the process and any issues

The retry mechanism with a 60-second sleep interval addresses the API limitation of 60 requests per minute mentioned in the PR description.

Consider implementing a more robust retry mechanism with exponential backoff instead of a fixed sleep time. This would help in cases where the API might be under sustained heavy load:

-    def save_finished_job_logs(
-        self,
-        job_info: SubmittedJobInfo,
-        sleeps=60,
-        page_size=1000,
-    ):
+    def save_finished_job_logs(
+        self,
+        job_info: SubmittedJobInfo,
+        initial_sleep=60,
+        max_retries=2,
+        page_size=1000,
+    ):

And then in the exception handling:

-        try:
-            attempt_log_save(logfname, filter_query, page_size)
-        except ResourceExhausted:
-            self.logger.warning(
-                "Too many requests to Google Logging API.\n"
-                + f"Skipping logs for job {job_uid} and sleeping for {sleeps}s."
-            )
-            time.sleep(sleeps)
-
-            self.logger.warning(
-                f"Trying to retrieve logs for Batch job {job_uid} once more."
-            )
-            try:
-                attempt_log_save(logfname, filter_query, page_size)
-            except ResourceExhausted:
-                self.logger.warning(
-                    "Retry to retrieve logs failed, "
-                    + f"the log file {logfname} might be incomplete."
-                )
+        retries = 0
+        current_sleep = initial_sleep
+        
+        while True:
+            try:
+                attempt_log_save(logfname, filter_query, page_size)
+                break
+            except ResourceExhausted:
+                retries += 1
+                if retries >= max_retries:
+                    self.logger.warning(
+                        "Max retries reached. "
+                        + f"The log file {logfname} might be incomplete."
+                    )
+                    break
+                
+                self.logger.warning(
+                    "Too many requests to Google Logging API.\n"
+                    + f"Retry {retries}/{max_retries}: Sleeping for {current_sleep}s."
+                )
+                time.sleep(current_sleep)
+                current_sleep *= 2  # Exponential backoff
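The backoff loop suggested above can be exercised in isolation. The sketch below is illustrative only (the function and parameter names mirror the suggestion but are hypothetical, and a plain exception stands in for `ResourceExhausted`); the `sleep` parameter is injected so the delay schedule can be observed without actually waiting:

```python
import time


def save_with_backoff(attempt_save, initial_sleep=60, max_retries=2,
                      retriable=(Exception,), sleep=time.sleep):
    """Run attempt_save(); on a retriable error, sleep and retry with
    exponentially growing delays. Returns True on success, False once
    max_retries attempts have failed."""
    current_sleep = initial_sleep
    retries = 0
    while True:
        try:
            attempt_save()
            return True
        except retriable:
            retries += 1
            if retries >= max_retries:
                return False
            sleep(current_sleep)
            current_sleep *= 2  # double the delay before the next attempt
```

Returning a boolean (rather than swallowing the failure silently) also lets the caller log that the file may be incomplete, matching the earlier review suggestion.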
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1fabd6d and 7e643d6.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_googlebatch/executor.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)

  • snakemake_executor_plugin_googlebatch/executor.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: testing-mocked-api
🔇 Additional comments (4)
snakemake_executor_plugin_googlebatch/executor.py (4)

2-2: Import added for new functionality

The time module is correctly imported to support the sleep functionality in the new log retrieval method.


512-514: Good placement for log retrieval after job completion

The placement of the save_finished_job_logs call is correct - it's triggered for both successful and failed jobs, which ensures logs are always retrieved once a job is complete.


580-580: Fixed comment capitalization

This is a minor fix to capitalize the first letter of the method comment.


15-16:

❓ Verification inconclusive

API exception handling imports look good

The imports for ResourceExhausted and logging from Google Cloud libraries are appropriate for the new log retrieval functionality.


🏁 Script executed:

#!/bin/bash
# Verify google-cloud-logging dependency is properly managed
rg "google-cloud-logging" -g "requirements.txt" -g "setup.py" -g "pyproject.toml"

Length of output: 224


Final Verification of Google Cloud Logging Dependency Import

  • The import statements at snakemake_executor_plugin_googlebatch/executor.py (lines 15–16) correctly bring in DeadlineExceeded, ResourceExhausted, and the logging module from Google Cloud, aligning with the needs of the log retrieval functionality.
  • An initial search for the string "google-cloud-logging" in standard dependency files (i.e. requirements.txt, setup.py, or pyproject.toml) did not yield any output. This may be due to filtering by the search command or might indicate that the dependency isn’t explicitly listed.
  • Action Required: Please manually verify that the dependency for google-cloud-logging is properly declared or managed within your project’s dependency files to ensure that the runtime environment includes the correct package.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
snakemake_executor_plugin_googlebatch/executor.py (2)

544-551: Consider adding docstring to the inner function.

The inner helper function attempt_log_save would benefit from a brief docstring describing its purpose and parameters.

Add a docstring to improve code readability:

        def attempt_log_save(fname, logger, query, page_size):
+            """
+            Helper function to retrieve logs and write them to a file.
+            
+            Args:
+                fname: Path to the log file
+                logger: Google Cloud logging logger
+                query: Filter query for logs
+                page_size: Number of log entries to retrieve per request
+            """
            with open(fname, "w", encoding="utf-8") as logfile:
                for log_entry in logger.list_entries(
                    filter_=query,
                    page_size=page_size,
                ):
                    logfile.write(str(log_entry.payload) + "\n")

545-551: Consider adding directory existence check before writing to the log file.

The code should ensure that the directory for the log file exists before attempting to write to it.

Add a directory existence check:

        def attempt_log_save(fname, logger, query, page_size):
+            # Ensure the directory exists
+            os.makedirs(os.path.dirname(fname), exist_ok=True)
            with open(fname, "w", encoding="utf-8") as logfile:
                for log_entry in logger.list_entries(
                    filter_=query,
                    page_size=page_size,
                ):
                    logfile.write(str(log_entry.payload) + "\n")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e643d6 and 65d5b26.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_googlebatch/executor.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)

  • snakemake_executor_plugin_googlebatch/executor.py
🪛 Ruff (0.8.2)
snakemake_executor_plugin_googlebatch/executor.py

542-542: Local variable logger is assigned to but never used

Remove assignment to unused variable logger

(F841)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: testing-mocked-api
🔇 Additional comments (2)
snakemake_executor_plugin_googlebatch/executor.py (2)

512-514: The log retrieval implementation looks good.

The code appropriately triggers log retrieval after a job has completed (either failed or succeeded), which aligns with the PR's objective of retrieving logs after job finalization.


15-16: Ensure "google-cloud-logging" is added as a dependency.

A previous review comment indicated that "google-cloud-logging" might not be explicitly listed in the project dependencies. This could cause runtime issues when importing the module.

Please ensure that "google-cloud-logging" is added to your dependency management file to avoid potential runtime issues.

@tadasbar
Collaborator Author

tadasbar commented Apr 5, 2025

Thanks for the suggestion @jakevc.

@tadasbar tadasbar requested a review from jakevc April 5, 2025 17:34
@tadasbar tadasbar merged commit c273f29 into main Apr 8, 2025
5 checks passed
@tadasbar tadasbar deleted the fix-implement-post-job-logging branch April 8, 2025 08:49
@github-project-automation github-project-automation bot moved this from In review to Done in Snakemake Hackathon March 2025 Apr 8, 2025