
refactor: modularize code for improved organization and maintainability #4

Merged
psyray merged 17 commits into master from refactor
Mar 17, 2025

Conversation

@psyray
Owner

@psyray psyray commented Mar 14, 2025

This update significantly expands OASIS's capabilities by:

  • adding support for new vulnerability types,
  • improving report generation and the overall user experience,
  • adding more detailed vulnerability analysis,
  • improving logging and the handling of large files,
  • refactoring the codebase for better organization and maintainability,
  • bumping the minimum Python version to 3.9.

Summary by Sourcery

Refactor the codebase to improve modularity and maintainability. This includes moving the main application logic into a separate package, improving the structure of the project, and updating the entry point.

Chores:

  • Move the main application logic into a separate package.
  • Improve the structure of the project.
  • Update the entry point to import from the new package.

This commit refactors the oasis.py script into multiple modules for better code organization, maintainability, and readability. The core functionality remains the same, but the code is now structured into separate modules for tools, Ollama management, embedding handling, analysis, and report generation. This modularization enhances code reusability and makes it easier to maintain and extend the project.
@sourcery-ai
Contributor

sourcery-ai Bot commented Mar 14, 2025

Reviewer's Guide by Sourcery

This pull request refactors the codebase to improve organization and maintainability by modularizing the code into separate files and classes. It enhances vulnerability detection with detailed descriptions and examples, improves logging with a custom EmojiFormatter, enhances report generation with HTML templates and CSS styling, adds comprehensive test files, improves cache management, adds new command-line arguments, and updates documentation.

Sequence diagram for generating embeddings

sequenceDiagram
    participant User
    participant EmbeddingManager
    participant OllamaClient
    participant FileSystem

    User->>EmbeddingManager: process_input_files(input_path)
    activate EmbeddingManager
    EmbeddingManager->>FileSystem: parse_input(input_path)
    activate FileSystem
    FileSystem-->>EmbeddingManager: List[Path]
    deactivate FileSystem
    loop for each file in List[Path]
        EmbeddingManager->>EmbeddingManager: is_valid_file(file_path)
        alt is valid file
            EmbeddingManager->>EmbeddingManager: index_code_files(files)
            activate EmbeddingManager
            EmbeddingManager->>OllamaClient: embeddings(model, prompt)
            activate OllamaClient
            OllamaClient-->>EmbeddingManager: embedding
            deactivate OllamaClient
            EmbeddingManager->>EmbeddingManager: chunk_content(content, chunk_size)
            EmbeddingManager->>FileSystem: save_cache()
            activate FileSystem
            FileSystem-->>EmbeddingManager: None
            deactivate FileSystem
            deactivate EmbeddingManager
        else is not valid file
            EmbeddingManager->>Logger: Skip file
        end
    end
    EmbeddingManager-->>User: None
    deactivate EmbeddingManager
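
The chunk_content(content, chunk_size) step in the diagram can be illustrated with a minimal sketch. It assumes fixed-size, character-based chunks with no overlap; the actual implementation in oasis/embedding.py is not shown in this conversation and may split differently.

```python
from typing import List


def chunk_content(content: str, chunk_size: int) -> List[str]:
    """Split text into consecutive pieces of at most chunk_size characters.

    Assumption: plain character slicing without overlap.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [content[i:i + chunk_size] for i in range(0, len(content), chunk_size)]


sample = "def f():\n    return 1\n"
chunks = chunk_content(sample, 8)
```

Each chunk would then be sent to the embedding model separately, which keeps every prompt under a fixed size regardless of how large the source file is.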

File-Level Changes

Change | Details | Files
Modularized the codebase by separating functionalities into distinct modules for better organization and maintainability.
  • Created separate modules for embedding management, security analysis, reporting, and Ollama client management.
  • Moved functions related to file processing, logging, and utility operations into a dedicated 'tools' module.
  • Defined configuration constants in a 'config' module for better maintainability.
  • Created a 'templates' directory with a Jinja2 template for report generation.
oasis.py
oasis/embedding.py
oasis/analyze.py
oasis/report.py
oasis/config.py
oasis/ollama_manager.py
oasis/tools.py
oasis/templates/report_template.html
oasis/templates/report_styles.css
oasis/__init__.py
Enhanced vulnerability detection by incorporating detailed vulnerability descriptions and examples.
  • Added detailed descriptions, common patterns, security impact, and mitigation strategies for each vulnerability type in the 'config' module.
  • Modified the analysis prompts to include vulnerability details for more accurate detection.
  • Updated the report generation to include vulnerability descriptions and mitigation strategies.
oasis/config.py
oasis/analyze.py
oasis/report.py
Improved logging with a custom EmojiFormatter for better readability and context-aware icons.
  • Implemented a custom EmojiFormatter class to add contextual emojis to log messages.
  • Added keyword lists for different types of log messages to select appropriate emojis.
  • Configured the logging system to use the EmojiFormatter for console output.
oasis/tools.py
Enhanced report generation with HTML templates and CSS styling for improved readability and visual appeal.
  • Created an HTML template using Jinja2 for report generation.
  • Added CSS styling to the template for better formatting and visual appeal.
  • Modified the report generation process to use the HTML template for generating reports.
oasis/report.py
oasis/templates/report_template.html
oasis/templates/report_styles.css
Added comprehensive test files with examples of various vulnerabilities to improve testing coverage.
  • Added new test files for Java, C#, and Shell scripts with examples of common vulnerabilities.
  • Updated the existing Python test file with additional vulnerability examples.
test_files/Vulnerable.java
test_files/Vulnerable.cs
test_files/vulnerable.sh
test_files/vulnerable.py
test_files/vulnerable.php
Improved cache management by storing embeddings in a dedicated .oasis_cache/ directory.
  • Modified the cache file path to store embeddings in a dedicated .oasis_cache/ directory.
  • Ensured that the cache directory is created if it does not exist.
  • Updated the cache loading and saving logic to use the new cache file path.
oasis/embedding.py
Added new command-line arguments for better control over the scanning process.
  • Added a new command-line argument for specifying the analysis type (file or function).
  • Added a new command-line argument for specifying the output format (pdf, html, markdown).
  • Added a new command-line argument for specifying the chunk size for embedding text.
oasis/oasis.py
Updated documentation to include new features, command line arguments, and usage instructions.
  • Updated the README.md file to include new features, command line arguments, and usage instructions.
  • Added a CONTRIBUTING.md file to guide contributors on how to contribute to the project.
README.md
CONTRIBUTING.md
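
The three new command-line arguments listed above (analysis type, output format, chunk size) could be declared roughly as follows. This is a sketch only: the flag names and default values are hypothetical, since the guide names the options but not their spelling.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the three options described in the guide.

    Flag names and defaults are illustrative assumptions, not taken from oasis.py.
    """
    parser = argparse.ArgumentParser(prog="oasis")
    parser.add_argument("--analyze-type", choices=["file", "function"], default="file",
                        help="analyze whole files or individual functions")
    parser.add_argument("--output-format", choices=["pdf", "html", "markdown"], default="html",
                        help="format of the generated report")
    parser.add_argument("--chunk-size", type=int, default=2048,
                        help="maximum size of each text chunk sent for embedding")
    return parser


args = build_parser().parse_args(["--analyze-type", "function", "--chunk-size", "512"])
```

Using choices lets argparse reject unsupported values before any scanning starts.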


@psyray psyray self-assigned this Mar 14, 2025
@psyray psyray added the enhancement New feature or request label Mar 14, 2025
sourcery-ai[bot]

This comment was marked as outdated.

psyray added 14 commits March 14, 2025 19:53
This commit introduces several enhancements to the reporting and logging functionalities of OASIS, along with improvements to code structure and error handling. Key changes include:

- Enhanced Reporting: Reports now use Jinja2 templates for improved styling and formatting. A new CSS file manages the report's visual style. The executive summary report generation is refactored for better readability and maintainability. Heading levels in detailed analysis sections are normalized for consistency. Error messages during analysis are now included in the reports.
- Improved Logging: Logging has been enhanced to include more informative error messages, especially regarding Ollama connection issues. A new option allows for saving error logs to a file in silent mode.
- Code Restructuring: The oasis.py and embedding.py files have been refactored to improve code organization and modularity. Functions are reorganized and new functions are introduced to handle specific tasks, such as model selection, input processing, and directory setup. The analyze.py file is also refactored to improve code clarity and chunk handling.
- Dependency Updates: Added jinja2 as a dependency.
- Cache Management: Cache handling is improved with better validation and normalization of cache entries. Cache filtering by file extension is also added.
- Model Selection: Model selection is improved with better handling of unavailable models and clearer error messages. The model display now includes more detailed information, such as parent model and context length. Progress bars are used during model information retrieval.
- Other Improvements: Several minor improvements and bug fixes are included, such as handling missing analysis keys, ensuring parent directories exist before file creation, and improving code comments.
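
Two of the items above, creating parent directories before writing a file and filtering the cache by file extension, can be sketched as below. The JSON format, the file name embeddings.json, and all function names are assumptions made for illustration; only the .oasis_cache/ directory name comes from the reviewer's guide.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def save_cache(cache_path: Path, cache: Dict[str, List[float]]) -> None:
    """Write the cache as JSON, creating missing parent directories first."""
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(cache), encoding="utf-8")


def load_cache(cache_path: Path) -> Dict[str, List[float]]:
    """Return the cached mapping, or an empty dict if the file is missing or invalid."""
    if not cache_path.exists():
        return {}
    try:
        data = json.loads(cache_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return {}
    return data if isinstance(data, dict) else {}


def filter_by_extension(cache: Dict[str, List[float]],
                        extensions: Iterable[str]) -> Dict[str, List[float]]:
    """Keep only entries whose file path ends in one of the given extensions."""
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    return {path: value for path, value in cache.items()
            if Path(path).suffix.lower().lstrip(".") in wanted}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".oasis_cache" / "embeddings.json"
    save_cache(path, {"app.py": [0.1, 0.2], "index.php": [0.3]})
    restored = load_cache(path)
python_only = filter_by_extension(restored, ["py"])
```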
This change enhances the model selection process, allowing users to select models by index or name. It also improves error handling and logging throughout the application, providing more informative messages and debug information. Additionally, the model display during selection has been improved, and minor updates were made to the embedding and reporting processes.
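
Selecting a model by index or by name, as described above, might look like this minimal sketch. The function name, the 1-based indexing, and the example model names are assumptions; the actual prompt handling in the project is not shown here.

```python
from typing import List


def resolve_model(choice: str, available: List[str]) -> str:
    """Resolve user input to a model name.

    Assumption: numeric input is a 1-based index into the displayed list;
    anything else must match an available model name exactly.
    """
    choice = choice.strip()
    if choice.isdigit():
        index = int(choice)
        if 1 <= index <= len(available):
            return available[index - 1]
        raise ValueError(f"index {index} is out of range (1-{len(available)})")
    if choice in available:
        return choice
    raise ValueError(f"model '{choice}' is not available")


models = ["llama3", "mistral"]
by_index = resolve_model("2", models)
by_name = resolve_model("llama3", models)
```

Raising a ValueError with a specific message gives the caller something concrete to log when a requested model is unavailable.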
This commit introduces configuration options for the Ollama endpoint, excluded and default models, and maximum chunk size.
The vulnerability analysis is improved with a more comprehensive prompt and a refined vulnerability mapping.
Additionally, model-specific emojis and logging keywords are now managed through the configuration file.
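
A keyword-driven emoji formatter of the kind described here can be sketched with the standard logging module. The keyword table below is hypothetical; in the PR the real keyword lists live in the configuration file and are not reproduced in this conversation.

```python
import logging

# Hypothetical keyword-to-emoji table, checked in insertion order.
KEYWORD_EMOJIS = {
    "error": "❌",
    "cache": "💾",
    "model": "🤖",
}
DEFAULT_EMOJI = "📝"


class EmojiFormatter(logging.Formatter):
    """Prefix each formatted log message with an emoji chosen from its content."""

    def format(self, record: logging.LogRecord) -> str:
        message = super().format(record)
        lowered = message.lower()
        for keyword, emoji in KEYWORD_EMOJIS.items():
            if keyword in lowered:
                return f"{emoji} {message}"
        return f"{DEFAULT_EMOJI} {message}"


def format_message(level: int, text: str) -> str:
    """Format a single message without configuring any handlers (demo helper)."""
    record = logging.LogRecord("oasis", level, "demo.py", 0, text, None, None)
    return EmojiFormatter("%(message)s").format(record)
```

In an application the formatter would typically be attached to a console handler with handler.setFormatter(EmojiFormatter("%(message)s")).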
This commit introduces several common web vulnerabilities to the test files, including server-side request forgery, XML external entity injection, path traversal, insecure direct object references, authentication issues, and cross-site request forgery. These vulnerabilities are added to Java, C#, Python, shell script, and PHP files for demonstration and testing purposes.
This change introduces a constant EMBEDDING_THRESHOLDS in config.py to store the list of thresholds used for vulnerability analysis. The analyze.py script is updated to use this constant instead of hardcoded threshold values. Additionally, the maximum chunk size for embedding text is reduced. Finally, the vulnerability analysis now filters results below the specified threshold before performing detailed analysis.
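
The threshold constant and the pre-filtering step can be illustrated as follows. The threshold values and the (path, score) result shape are assumptions for the sketch; only the constant name EMBEDDING_THRESHOLDS comes from the commit message.

```python
from typing import Dict, List, Tuple

# Assumed values; the real list is defined in config.py.
EMBEDDING_THRESHOLDS = [0.3, 0.5, 0.7]

Result = Tuple[str, float]  # (file path, similarity score)


def filter_by_threshold(results: List[Result], threshold: float) -> List[Result]:
    """Drop results whose similarity score is below the threshold."""
    return [(path, score) for path, score in results if score >= threshold]


def count_per_threshold(results: List[Result]) -> Dict[float, int]:
    """Count how many results survive each configured threshold."""
    return {t: len(filter_by_threshold(results, t)) for t in EMBEDDING_THRESHOLDS}


results = [("a.py", 0.82), ("b.py", 0.45), ("c.py", 0.10)]
kept = filter_by_threshold(results, 0.5)
counts = count_per_threshold(results)
```

Filtering before the detailed analysis means the more expensive model calls run only on files that already look relevant.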
This commit refactors the OASIS codebase into a proper Python package. This improves code organization, maintainability, and allows for easier distribution and installation. The main functionality remains unchanged.
This commit refactors the Ollama interaction into a dedicated manager class, enhances model selection and availability checks, and restructures the analysis and reporting process for better clarity and efficiency. These changes improve the overall user experience and make the codebase more maintainable.
Introduces analysis by function (beta).
Improved the extraction and display of model parameter information. The update provides more accurate parameter counts and adds emojis to indicate model size categories (small/fast and large).
This change improves the clarity and usability of the security analysis reports by adding explanatory sections and refining formatting. The reports now include guidance on interpreting similarity scores, risk levels, and recommended actions. Additionally, model display in the Ollama manager is improved, and model size indicators are added.
This change refactors the SecurityAnalyzer class to improve code readability, maintainability, and logging. It introduces helper functions to encapsulate specific tasks like getting vulnerability embeddings, processing files, and processing functions within files. The logic for handling different embedding structures is also consolidated. Additionally, the error logging during vulnerability searches is made more informative. No changes in functionality are intended. Some minor changes were made to other files to improve logging and argument handling. A duplicated CSRF vulnerability was removed from vulnerable.php and an os import was removed from vulnerable.py as it was unused.
This pull request enhances the embedding management and analysis process, improves function extraction, and adds detailed vulnerability information to reports. Key changes include:

- Streamlined argument handling with default values.
- Improved function extraction using regex and LLM with fallback mechanisms.
- Enhanced vulnerability analysis with detailed prompts and vulnerability-specific information.
- Added support for vulnerability embeddings with rich prompts.
- Improved reporting with detailed vulnerability information and enhanced formatting.
- Updated cache file naming and structure.
- Added new vulnerability types and patterns.
- Improved logging and error handling.
- Updated documentation and help texts.
- Code cleanup and refactoring for better readability and maintainability.
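
The regex-first function extraction with a fallback, mentioned in the list above, could be sketched like this. The pattern only handles Python def statements, and the fallback callable is a stand-in for the LLM-based extraction; both are illustrative assumptions rather than the project's actual code.

```python
import re
from typing import Callable, List

# Matches "def name(" and "async def name(" at the start of a line, allowing indentation.
FUNCTION_PATTERN = re.compile(
    r"^[ \t]*(?:async[ \t]+)?def[ \t]+([A-Za-z_]\w*)[ \t]*\(",
    re.MULTILINE,
)


def extract_function_names(source: str) -> List[str]:
    """Return the names of Python functions found by the regex."""
    return FUNCTION_PATTERN.findall(source)


def extract_with_fallback(source: str, fallback: Callable[[str], List[str]]) -> List[str]:
    """Try the regex first and call the fallback only if it finds nothing."""
    names = extract_function_names(source)
    return names if names else fallback(source)


python_source = "def a():\n    pass\n\nasync def b(x):\n    pass\n"
found = extract_function_names(python_source)
```

Trying the cheap regex first keeps the slower fallback for sources the pattern cannot parse.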
@psyray
Owner Author

psyray commented Mar 17, 2025

@sourcery-ai review

Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey @psyray - I've reviewed your changes and found some issues that need to be addressed.

Blocking issues:

  • Hardcoded database password found. (link)
  • Hardcoded credentials found. (link)
  • Hardcoded credentials found. (link)

Overall Comments:

  • Consider adding a command-line option to specify the output directory.
Here's what I looked at during the review
  • 🟡 General issues: 11 issues found
  • 🔴 Security: 3 blocking issues
  • 🟡 Testing: 1 issue found
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good


Comment thread oasis/embedding.py
Comment thread oasis/embedding.py
Comment thread oasis/embedding.py Outdated
Comment thread oasis/embedding.py
Comment thread oasis/oasis.py
Comment thread test_files/Vulnerable.cs
Comment thread test_files/vulnerable.sh
Comment thread oasis/embedding.py Outdated
Comment thread test_files/vulnerable.py
Comment thread test_files/vulnerable.py
psyray added 2 commits March 17, 2025 04:10
This change enhances logging by using logger.exception() for errors, providing more detailed tracebacks. It also refactors the embedding generation process for better clarity and efficiency, including handling for chunked embeddings and improved embedding dimension consistency checks. Additionally, the change includes more robust error handling during file processing and cache management.
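
The combination of logger.exception() and an embedding dimension consistency check can be sketched as below. The function names and the list-of-lists embedding shape are assumptions; logger.exception() itself is the standard logging call that records the message at ERROR level together with the current traceback.

```python
import logging
from typing import List, Optional

logger = logging.getLogger("oasis.sketch")  # hypothetical logger name


def check_dimensions(embeddings: List[List[float]]) -> int:
    """Return the shared vector length, or raise if the vectors disagree."""
    if not embeddings:
        raise ValueError("no embeddings to check")
    expected = len(embeddings[0])
    for position, vector in enumerate(embeddings):
        if len(vector) != expected:
            raise ValueError(
                f"embedding {position} has {len(vector)} dimensions, expected {expected}"
            )
    return expected


def safe_dimension(embeddings: List[List[float]]) -> Optional[int]:
    """Log the full traceback on failure instead of propagating the error."""
    try:
        return check_dimensions(embeddings)
    except ValueError:
        # logger.exception must be called from an except block to capture the traceback.
        logger.exception("Embedding dimension check failed")
        return None
```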
This update significantly improves report generation and introduces a new audit feature. Reports now support multiple models, have a cleaner structure, and include an executive summary. Additionally, the audit feature allows for comprehensive vulnerability analysis and reporting. Several dependencies have been updated, and Python 3.9+ is now required. Minor bug fixes and improvements are also included.
@psyray psyray merged commit 8e11d64 into master Mar 17, 2025
@psyray psyray deleted the refactor branch March 17, 2025 20:10
