-
Notifications
You must be signed in to change notification settings - Fork 24
fix: SP-2195 timeout error during dependency scan #108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: SP-2195 timeout error during dependency scan #108
Conversation
Caution Review failedThe pull request is closed. WalkthroughThe pull request updates the project documentation and refactors the dependency scanning process. The changelog now includes a new version entry (1.20.5) with details on a timeout issue fix. In the source code, the dependency scanning method now leverages concurrent processing using a ThreadPoolExecutor, replacing sequential file processing. An inner function is introduced to handle individual file requests with improved error handling and response aggregation. No changes were made to the declarations of exported or public entities. Changes
Sequence Diagram(s)sequenceDiagram
participant C as Client
participant S as get_dependencies_json()
participant T as ThreadPoolExecutor
participant P as process_file
C->>S: Call get_dependencies_json(request)
S->>T: Submit process_file for each file concurrently
T->>P: Execute process_file(file)
P-->>T: Return individual file response (or error)
T->>S: Aggregate all responses
S-->>C: Return final aggregated response with status and file list
Possibly related PRs
Suggested reviewers
Poem
Tip ⚡🧪 Multi-step agentic review comment chat (experimental)
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
3dd5268
to
342ac5d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/scanoss/scanossgrpc.py (1)
68-68
: Make concurrency configurable
Consider allowingMAX_CONCURRENT_REQUESTS
to be set by an environment variable or passed in as a parameter.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
CHANGELOG.md
(2 hunks)Dockerfile
(1 hunks)src/scanoss/__init__.py
(1 hunks)src/scanoss/scanossgrpc.py
(11 hunks)
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md
[duplication] ~13-~13: Possible typo: you repeated a word.
Context: ...hanges... ## [1.20.5] - 2025-03-14 ### Fixed - Fixed timeout issue with dependency scan ## ...
(ENGLISH_WORD_REPEAT_RULE)
🔇 Additional comments (14)
src/scanoss/__init__.py (1)
25-25
: Correct version bump
The updated version to 1.20.5 properly reflects the changes noted in the changelog.CHANGELOG.md (2)
12-19
: Changelog updates
Adding entries for versions 1.20.4 and 1.20.5 is consistent with the newly introduced features and fixes.🧰 Tools
🪛 LanguageTool
[duplication] ~13-~13: Possible typo: you repeated a word.
Context: ...hanges... ## [1.20.5] - 2025-03-14 ### Fixed - Fixed timeout issue with dependency scan ## ...(ENGLISH_WORD_REPEAT_RULE)
482-484
: Linked version references
These references to compare past releases (1.20.3, 1.20.4, and 1.20.5) align correctly with the version timeline.Dockerfile (1)
1-1
: Confirm environment compatibility
Switching frompython:3.10-slim-buster
topython:3.10-slim
is valid. Please verify that the new base image satisfies all required system dependencies for your build.Would you like me to generate a script to check for needed packages?
src/scanoss/scanossgrpc.py (10)
25-30
: Concurrency import
Usingconcurrent.futures
is appropriate for handling parallel tasks.
232-241
: Comprehensive input validation
Exiting early whendependencies
orfiles_json
is missing will prevent unnecessary calls and errors downstream.
242-260
: Robust concurrency approach
Encapsulating file processing in a dedicated function improves readability and error handling within the thread pool.
262-270
: Thread pool usage
Using aThreadPoolExecutor
for parallel requests can significantly reduce total processing time for network-bound operations.
272-280
: Status merging logic
Overwriting the top-level status on any non-SUCCESS response is correct for a strict success/failure approach. However, confirm if partial success should be indicated differently in future enhancements.Please confirm that it is intentional to mark the overall result as non-SUCCESS upon the first encountered error or warning.
439-455
: Clearer status code definitions
Defining constants and mapping them to more descriptive messages clarifies the status logic.
466-469
: Proxy configuration logs
Providing debug info during proxy setup is beneficial for troubleshooting.
488-489
: Consistent error messages
The error string here aligns with the rest of the code’s approach to indicating missing data.
500-502
: Detailed exception logging
Displaying the exception class and message is valuable for diagnosing issues during gRPC calls.
509-509
: No functional change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/scanoss/scanossgrpc.py (1)
68-68
: Consider making the concurrency limit configurable.The MAX_CONCURRENT_REQUESTS constant is a good practice, but consider making it configurable through initialization parameters or environment variables, similar to how other limits are handled in the codebase.
-MAX_CONCURRENT_REQUESTS = 5 +MAX_CONCURRENT_REQUESTS = int(os.environ.get('SCANOSS_MAX_CONCURRENT_REQUESTS', '5'))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
CHANGELOG.md
(2 hunks)src/scanoss/__init__.py
(1 hunks)src/scanoss/scanossgrpc.py
(11 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/scanoss/init.py
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md
[duplication] ~13-~13: Possible typo: you repeated a word.
Context: ...hanges... ## [1.20.6] - 2025-03-14 ### Fixed - Fixed timeout issue with dependency scan ## ...
(ENGLISH_WORD_REPEAT_RULE)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
🔇 Additional comments (11)
CHANGELOG.md (2)
12-14
: LGTM: Clear entry for the timeout fix.The changelog entry properly documents the issue fix that is being addressed in this PR, following the established format of the project.
🧰 Tools
🪛 LanguageTool
[duplication] ~13-~13: Possible typo: you repeated a word.
Context: ...hanges... ## [1.20.6] - 2025-03-14 ### Fixed - Fixed timeout issue with dependency scan ## ...(ENGLISH_WORD_REPEAT_RULE)
490-491
: LGTM: Proper version link added.The link for comparing v1.20.5 to v1.20.6 follows the consistent pattern used throughout the changelog.
src/scanoss/scanossgrpc.py (9)
25-26
: LGTM: Necessary imports for concurrent processing.The addition of concurrent.futures and json imports supports the refactoring of the dependency scanning process to address the timeout issue.
233-240
: LGTM: Improved error handling and validation.The error messages are clear and provide better checking of the input structure.
242-269
: Well-structured concurrent processing implementation.The implementation of the process_file function and ThreadPoolExecutor is well done:
- Each file gets a unique request ID for traceability
- Proper error handling is in place
- The responses are correctly collected for further processing
This addresses the timeout issue by enabling parallel processing of dependency files.
270-280
: LGTM: Effective response merging logic.The implementation for merging responses is robust:
- Uses a constant for status code comparison
- Creates a properly structured merged response
- Correctly propagates error status from individual responses
- Preserves the file entries from all successful responses
This ensures a consistent response structure while handling concurrent processing results.
439-441
: LGTM: Improved readability with constants.Using constants for status codes improves code readability and maintainability.
449-454
: LGTM: Streamlined status code handling.The conditional handling of status codes is now more readable and maintainable with simplified messages and clear condition checks.
466-466
: LGTM: Simplified debug messages.The debug messages are now more concise and clearer.
Also applies to: 469-469
499-501
: LGTM: Consistent error handling pattern.The error handling follows the same pattern as used throughout the class, maintaining consistency in the codebase.
242-280
:❓ Verification inconclusive
Verify this approach fixes the timeout issue.
The implementation appears sound, but it's worth confirming that the concurrency level (MAX_CONCURRENT_REQUESTS = 5) is sufficient to address the timeout issue in production environments, especially with large dependency sets.
🏁 Script executed:
#!/bin/bash # Check for any existing tests that validate the concurrent dependency processing rg -A 3 -B 3 "test.*dependencies.*concurrent|test.*concurrent.*dependencies" --type pyLength of output: 88
Concurrency & Timeout Verification Required
The updated implementation using a ThreadPoolExecutor with MAX_CONCURRENT_REQUESTS = 5 appears sound for processing dependency requests concurrently. However, our search for tests specifically validating this concurrency-based timeout fix returned no results. Please manually verify in a staging or production-like environment that the increased concurrency actually resolves the timeout issues when handling large dependency sets. In addition, consider adding dedicated tests to cover these scenarios to ensure that this fix remains robust in future changes.
""" | ||
|
||
def __init__( | ||
def __init__( # noqa: PLR0913 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this?
resp_dict = MessageToDict(resp, preserving_proto_field_name=True) # Convert gRPC response to a dict | ||
return resp_dict | ||
return None | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove white spaces
Summary by CodeRabbit