This commit refactors the oasis.py script into multiple modules for better code organization, maintainability, and readability. The core functionality remains the same, but the code is now structured into separate modules for tools, Ollama management, embedding handling, analysis, and report generation. This modularization enhances code reusability and makes it easier to maintain and extend the project.
Reviewer's Guide by Sourcery
This pull request refactors the codebase to improve organization and maintainability by modularizing the code into separate files and classes. It enhances vulnerability detection with detailed descriptions and examples, improves logging with a custom EmojiFormatter, enhances report generation with HTML templates and CSS styling, adds comprehensive test files, improves cache management, adds new command-line arguments, and updates documentation.
Sequence diagram for generating embeddings
sequenceDiagram
participant User
participant EmbeddingManager
participant OllamaClient
participant FileSystem
User->>EmbeddingManager: process_input_files(input_path)
activate EmbeddingManager
EmbeddingManager->>FileSystem: parse_input(input_path)
activate FileSystem
FileSystem-->>EmbeddingManager: List[Path]
deactivate FileSystem
loop for each file in List[Path]
EmbeddingManager->>EmbeddingManager: is_valid_file(file_path)
alt is valid file
EmbeddingManager->>EmbeddingManager: index_code_files(files)
activate EmbeddingManager
EmbeddingManager->>OllamaClient: embeddings(model, prompt)
activate OllamaClient
OllamaClient-->>EmbeddingManager: embedding
deactivate OllamaClient
EmbeddingManager->>EmbeddingManager: chunk_content(content, chunk_size)
EmbeddingManager->>FileSystem: save_cache()
activate FileSystem
FileSystem-->>EmbeddingManager: None
deactivate FileSystem
deactivate EmbeddingManager
else is not valid file
EmbeddingManager->>Logger: Skip file
end
end
EmbeddingManager-->>User: None
deactivate EmbeddingManager
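The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names mirror the diagram, while `SUPPORTED_EXTENSIONS`, the JSON cache layout, and the injected `embed` callable (standing in for the `embeddings(model, prompt)` call on the Ollama client) are assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Hypothetical set of extensions treated as source code.
SUPPORTED_EXTENSIONS = {".py", ".php", ".java", ".cs", ".sh"}


def is_valid_file(path: Path) -> bool:
    """Accept only regular files with a supported extension."""
    return path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS


def chunk_content(content: str, chunk_size: int) -> List[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    return [content[i:i + chunk_size] for i in range(0, len(content), chunk_size)]


def index_code_files(
    files: Iterable[Path],
    embed: Callable[[str], List[float]],
    cache_path: Path,
    chunk_size: int = 2048,
) -> Dict[str, List[List[float]]]:
    """Embed every valid file chunk by chunk and persist the result as JSON."""
    cache: Dict[str, List[List[float]]] = {}
    for path in files:
        if not is_valid_file(path):
            continue  # corresponds to the "Skip file" branch in the diagram
        content = path.read_text(encoding="utf-8", errors="ignore")
        cache[str(path)] = [embed(chunk) for chunk in chunk_content(content, chunk_size)]
    # Assumed cache format: one JSON file mapping file path to chunk vectors.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(cache), encoding="utf-8")
    return cache
```

Passing the embedding function as a parameter keeps the sketch runnable without a live Ollama server; a real caller would wrap the client call in that callable.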
This commit introduces several enhancements to the reporting and logging functionalities of OASIS, along with improvements to code structure and error handling. Key changes include:
- Enhanced Reporting: Reports now use Jinja2 templates for improved styling and formatting. A new CSS file manages the report's visual style. The executive summary report generation is refactored for better readability and maintainability. Heading levels in detailed analysis sections are normalized for consistency. Error messages during analysis are now included in the reports.
- Improved Logging: Logging has been enhanced to include more informative error messages, especially regarding Ollama connection issues. A new option allows for saving error logs to a file in silent mode.
- Code Restructuring: The oasis.py and embedding.py files have been refactored to improve code organization and modularity. Functions are reorganized and new functions are introduced to handle specific tasks, such as model selection, input processing, and directory setup. The analyze.py file is also refactored to improve code clarity and chunk handling.
- Dependency Updates: Added jinja2 as a dependency.
- Cache Management: Cache handling is improved with better validation and normalization of cache entries. Cache filtering by file extension is also added.
- Model Selection: Model selection is improved with better handling of unavailable models and clearer error messages. The model display now includes more detailed information, such as parent model and context length. Progress bars are used during model information retrieval.
- Other Improvements: Several minor improvements and bug fixes are included, such as handling missing analysis keys, ensuring parent directories exist before file creation, and improving code comments.
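To make the cache management item more concrete, here is a hedged sketch of validating and normalizing cache entries and then filtering them by file extension. The entry layout (a dict with an `embedding` key, plus a bare-vector legacy form) and both function names are assumptions for illustration; the commit does not describe the actual format.

```python
from pathlib import Path
from typing import Any, Dict, Iterable


def normalize_cache(raw: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Keep only entries that carry a non-empty numeric embedding list."""
    clean: Dict[str, Dict[str, Any]] = {}
    for path, entry in raw.items():
        # Hypothetical legacy layout: a bare vector instead of a dict.
        if isinstance(entry, list):
            entry = {"embedding": entry}
        if not isinstance(entry, dict):
            continue  # drop malformed entries
        embedding = entry.get("embedding")
        if isinstance(embedding, list) and embedding and all(
            isinstance(x, (int, float)) for x in embedding
        ):
            clean[path] = {**entry, "embedding": [float(x) for x in embedding]}
    return clean


def filter_by_extension(
    cache: Dict[str, Dict[str, Any]], extensions: Iterable[str]
) -> Dict[str, Dict[str, Any]]:
    """Return the entries whose path ends with one of the given extensions."""
    # Accept "py", ".py", or ".PY" and compare case-insensitively.
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    return {p: e for p, e in cache.items() if Path(p).suffix.lower() in wanted}
```

Dropping invalid entries rather than raising lets a partially corrupted cache still be reused; the skipped files would simply be embedded again.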
This change enhances the model selection process, allowing users to select models by index or name. It also improves error handling and logging throughout the application, providing more informative messages and debug information. Additionally, the model display during selection has been improved, and minor updates were made to the embedding and reporting processes.
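Selection by index or name could look something like the sketch below. The function name, the 1-based indexing, and the model names in the usage example are assumptions, not taken from the OASIS code.

```python
from typing import List, Sequence


def select_models(choices: Sequence[str], available: Sequence[str]) -> List[str]:
    """Resolve user choices given as 1-based indexes or model names."""
    selected: List[str] = []
    for choice in choices:
        token = choice.strip()
        if token.isdecimal():
            index = int(token)
            if not 1 <= index <= len(available):
                raise ValueError(
                    f"Model index {index} is out of range 1..{len(available)}"
                )
            name = available[index - 1]
        elif token in available:
            name = token
        else:
            raise ValueError(f"Model '{token}' is not available")
        if name not in selected:  # ignore duplicates, keep first-seen order
            selected.append(name)
    return selected
```

For example, with `available = ["llama3:8b", "mistral:7b"]`, the input `["2", "llama3:8b"]` resolves to `["mistral:7b", "llama3:8b"]`, and an unknown name raises a `ValueError` with a readable message.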
This commit introduces configuration options for the Ollama endpoint, excluded and default models, and maximum chunk size. The vulnerability analysis is improved with a more comprehensive prompt and a refined vulnerability mapping. Additionally, model-specific emojis and logging keywords are now managed through the configuration file.
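As an illustration of configuration-driven logging keywords, the sketch below shows one way a formatter such as the EmojiFormatter mentioned in the reviewer's guide might consult a mapping from the configuration. The `KEYWORD_EMOJIS` mapping and the matching rule are guesses for the sake of the example; the project's real formatter and config keys may differ.

```python
import logging

# Hypothetical configuration values; the real config.py may differ.
KEYWORD_EMOJIS = {
    "error": "❌",
    "warning": "⚠️",
    "cache": "💾",
    "model": "🤖",
}


class EmojiFormatter(logging.Formatter):
    """Prefix each message with the emoji of the first matching keyword."""

    def format(self, record: logging.LogRecord) -> str:
        message = super().format(record)
        lowered = message.lower()
        # Dict order decides priority when several keywords match.
        for keyword, emoji in KEYWORD_EMOJIS.items():
            if keyword in lowered:
                return f"{emoji} {message}"
        return message
```

The formatter is attached like any other, for instance `handler.setFormatter(EmojiFormatter())` on a `logging.StreamHandler`. Keeping the mapping in the configuration means new keywords need no code change.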
This commit introduces several common web vulnerabilities to the test files, including server-side request forgery, XML external entity injection, path traversal, insecure direct object references, authentication issues, and cross-site request forgery. These vulnerabilities are added to Java, C#, Python, shell script, and PHP files for demonstration and testing purposes.
This change introduces a constant EMBEDDING_THRESHOLDS in config.py to store the list of thresholds used for vulnerability analysis. The analyze.py script is updated to use this constant instead of hardcoded threshold values. Additionally, the maximum chunk size for embedding text is reduced. Finally, the vulnerability analysis now filters results below the specified threshold before performing detailed analysis.
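A minimal sketch of the threshold handling described above. `EMBEDDING_THRESHOLDS` is the constant named in the commit, but the values shown here and the two helper functions are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Assumed values for illustration; the actual list lives in config.py.
EMBEDDING_THRESHOLDS = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

Score = Tuple[str, float]  # (file path, similarity score)


def filter_by_threshold(scores: List[Score], threshold: float) -> List[Score]:
    """Keep results scoring at or above threshold, highest first."""
    kept = [(path, score) for path, score in scores if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def count_per_threshold(scores: List[Score]) -> Dict[float, int]:
    """Count how many results reach each configured threshold."""
    return {
        t: sum(1 for _, score in scores if score >= t)
        for t in EMBEDDING_THRESHOLDS
    }
```

Filtering before the detailed analysis means the more expensive per-file step only runs on files whose similarity score already suggests a possible match.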
This commit refactors the OASIS codebase into a proper Python package. This improves code organization, maintainability, and allows for easier distribution and installation. The main functionality remains unchanged.
This commit refactors the Ollama interaction into a dedicated manager class, enhances model selection and availability checks, and restructures the analysis and reporting process for better clarity and efficiency. These changes improve the overall user experience and make the codebase more maintainable. It also introduces analysis by function (beta).
Improved the extraction and display of model parameter information. The update provides more accurate parameter counts and adds emojis to indicate model size categories (small/fast and large).
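One way to derive a parameter count and a size indicator is sketched below. It parses a size token such as `8b` or `33m` out of a model name; the regular expression, the category cut-offs (10 and 30 billion), and the emojis are all assumptions, and the commit may instead read this information from model metadata.

```python
import re
from typing import Optional

# Matches a number followed by "b" (billions) or "m" (millions), e.g. "8b".
_PARAM_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*([bm])\b", re.IGNORECASE)


def parameter_count_billions(text: str) -> Optional[float]:
    """Return the parameter count in billions, or None if no size token is found."""
    match = _PARAM_PATTERN.search(text)
    if match is None:
        return None
    value = float(match.group(1))
    return value / 1000 if match.group(2).lower() == "m" else value


def size_emoji(billions: Optional[float]) -> str:
    """Map a parameter count to a size indicator (hypothetical cut-offs)."""
    if billions is None:
        return "❓"
    if billions <= 10:
        return "⚡"  # small/fast
    if billions >= 30:
        return "🐘"  # large
    return ""  # mid-sized models get no indicator
```

Name-based parsing is only a heuristic: a name without a size token yields `None`, which the sketch reports as unknown rather than guessing.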
This change improves the clarity and usability of the security analysis reports by adding explanatory sections and refining formatting. The reports now include guidance on interpreting similarity scores, risk levels, and recommended actions. Additionally, model display in the Ollama manager is improved, and model size indicators are added.
This change refactors the SecurityAnalyzer class to improve code readability, maintainability, and logging. It introduces helper functions to encapsulate specific tasks like getting vulnerability embeddings, processing files, and processing functions within files. The logic for handling different embedding structures is also consolidated. Additionally, the error logging during vulnerability searches is made more informative. No changes in functionality are intended. Some minor changes were made to other files to improve logging and argument handling. A duplicated CSRF vulnerability was removed from vulnerable.php and an os import was removed from vulnerable.py as it was unused.
This pull request enhances the embedding management and analysis process, improves function extraction, and adds detailed vulnerability information to reports. Key changes include:
- Streamlined argument handling with default values.
- Improved function extraction using regex and LLM with fallback mechanisms.
- Enhanced vulnerability analysis with detailed prompts and vulnerability-specific information.
- Added support for vulnerability embeddings with rich prompts.
- Improved reporting with detailed vulnerability information and enhanced formatting.
- Updated cache file naming and structure.
- Added new vulnerability types and patterns.
- Improved logging and error handling.
- Updated documentation and help texts.
- Code cleanup and refactoring for better readability and maintainability.
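The function extraction item describes a regex pass combined with an LLM and fallbacks. A hedged sketch of that layering is shown below for Python sources only. The order of the steps, the `llm_extract` callable standing in for a model call, and the `<module>` placeholder for whole-file analysis are assumptions, not the project's code.

```python
import re
from typing import Callable, List, Optional

# Matches "def name(" and "async def name(" at any indentation level.
_PY_DEF = re.compile(
    r"^[ \t]*(?:async[ \t]+)?def[ \t]+([A-Za-z_]\w*)[ \t]*\(", re.MULTILINE
)


def extract_function_names(
    source: str,
    llm_extract: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Try the regex first, then the optional LLM callable, then the whole file."""
    names = _PY_DEF.findall(source)
    if names:
        return names
    if llm_extract is not None:
        try:
            names = llm_extract(source)
        except Exception:
            names = []  # a failing model call must not abort the analysis
        if names:
            return names
    # Last resort: treat the entire file as a single unit.
    return ["<module>"]
```

Trying the cheap regex first avoids a model round trip for files where definitions are easy to spot, and the final fallback guarantees that every file yields at least one unit to analyze.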
@sourcery-ai review
Hey @psyray - I've reviewed your changes and found some issues that need to be addressed.
Blocking issues:
- Hardcoded database password found. (link)
- Hardcoded credentials found. (link)
- Hardcoded credentials found. (link)
Overall Comments:
- Consider adding a command-line option to specify the output directory.
Here's what I looked at during the review
- 🟡 General issues: 11 issues found
- 🔴 Security: 3 blocking issues
- 🟡 Testing: 1 issue found
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
This change enhances logging by using logger.exception() for errors, providing more detailed tracebacks. It also refactors the embedding generation process for better clarity and efficiency, including handling for chunked embeddings and improved embedding dimension consistency checks. Additionally, the change includes more robust error handling during file processing and cache management.
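The combination of `logger.exception()` and a dimension consistency check for chunked embeddings might look like the sketch below. The function name, the logger name, and the choice to return `None` on failure are assumptions; `embed` is a stand-in for the real embedding call.

```python
import logging
from typing import Callable, List, Optional, Sequence

logger = logging.getLogger("oasis.sketch")  # hypothetical logger name


def embed_chunks(
    chunks: Sequence[str],
    embed: Callable[[str], List[float]],
) -> Optional[List[List[float]]]:
    """Embed each chunk, rejecting the file if vector sizes disagree."""
    vectors: List[List[float]] = []
    expected_dim: Optional[int] = None
    for index, chunk in enumerate(chunks):
        try:
            vector = embed(chunk)
        except Exception:
            # logger.exception logs at ERROR level and appends the traceback.
            logger.exception("Embedding failed for chunk %d", index)
            return None
        if expected_dim is None:
            expected_dim = len(vector)
        elif len(vector) != expected_dim:
            logger.error(
                "Chunk %d has dimension %d, expected %d",
                index, len(vector), expected_dim,
            )
            return None
        vectors.append(vector)
    return vectors
```

`logger.exception()` is meant to be called from inside an `except` block, which is why the embedding call is wrapped individually rather than around the whole loop.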
This update significantly improves report generation and introduces a new audit feature. Reports now support multiple models, have a cleaner structure, and include an executive summary. Additionally, the audit feature allows for comprehensive vulnerability analysis and reporting. Several dependencies have been updated, and Python 3.9+ is now required. Minor bug fixes and improvements are also included.
This update significantly expands OASIS's capabilities by:
Summary by Sourcery
Refactor the codebase to improve modularity and maintainability. This includes moving the main application logic into a separate package, improving the structure of the project, and updating the entry point.
Chores: