Feature/64 create framework for benchmarking by jathavaan · Pull Request #66 · kartAI/doppa-data

jathavaan · 2026-02-18T10:16:11Z

This pull request introduces a CPU and RAM monitoring system for the main pipeline execution, refactors DataFrame-to-bytes conversion to use Parquet format, and updates dependency injection and configuration to support these features. The monitoring system logs resource usage during pipeline runs and uploads the results to blob storage for benchmarking and analysis.

Resource monitoring and logging:

Added a monitor_cpu_and_ram decorator in src/application/common/monitor.py that samples process CPU and RAM usage during pipeline execution, logs the data to Parquet and CSV, and uploads results to blob storage in the new benchmarks container. The decorator is applied to the main function in main.py, generating a unique run_id for each run.
Updated configuration to specify a MONITOR_LOG_DIRECTORY for storing local logs.
Modified dependency injection setup to wire the new monitor module.
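
The sampling idea behind such a decorator can be sketched as follows. This is a minimal illustration, not the code from src/application/common/monitor.py: the PR adds psutil as a dependency, while this sketch swaps in the standard library (time.process_time for CPU time, tracemalloc for Python-level memory) so that it runs without extra packages. The interval parameter and the samples attribute are hypothetical, and the Parquet/CSV logging and blob upload steps are left out.

```python
import functools
import threading
import time
import tracemalloc
import uuid
from typing import Any, Callable


def monitor_cpu_and_ram(run_id: str, interval: float = 0.05) -> Callable:
    """Sample CPU time and traced memory while the wrapped function runs."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            samples = []
            stop = threading.Event()

            def take_sample() -> None:
                # Stand-ins for psutil: process CPU seconds and the memory
                # currently allocated by Python objects.
                current, _peak = tracemalloc.get_traced_memory()
                samples.append({
                    "run_id": run_id,
                    "timestamp": time.time(),
                    "cpu_seconds": time.process_time(),
                    "ram_bytes": current,
                })

            def sample_loop() -> None:
                # Event.wait returns False on timeout and True once stop is set.
                while not stop.wait(interval):
                    take_sample()

            started_here = not tracemalloc.is_tracing()
            if started_here:
                tracemalloc.start()
            take_sample()  # guarantees a sample even for very short runs
            thread = threading.Thread(target=sample_loop, daemon=True)
            thread.start()
            try:
                return func(*args, **kwargs)
            finally:
                stop.set()
                thread.join()
                take_sample()
                if started_here:
                    tracemalloc.stop()
                # Hypothetical: keep samples in memory instead of writing
                # them to Parquet/CSV and uploading to blob storage.
                wrapper.samples = samples

        return wrapper

    return decorator


@monitor_cpu_and_ram(run_id=str(uuid.uuid4()))
def main() -> int:
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    main()
    print(f"collected {len(main.samples)} samples")
```

Sampling in a background thread keeps the measurement loop independent of the pipeline code, and every row carries the run_id so that repeated runs can be told apart later.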

DataFrame serialization improvements:

Changed the interface and implementation for converting DataFrames to bytes: replaced convert_df_to_bytes with convert_df_to_parquet_bytes, standardizing on Parquet format for serialization.
Updated release creation logic to use the new Parquet-based conversion method when uploading release metadata.

Minor cleanup:

Removed unused imports in release_pipeline.py.

Copilot

Pull request overview

This pull request introduces a CPU and RAM monitoring system for benchmarking the main pipeline execution. The monitoring decorator samples resource usage during runs and logs the data both locally and to Azure blob storage. Additionally, it refactors DataFrame-to-bytes conversion to consistently use Parquet format across the codebase.

Changes:

Added a monitoring decorator (monitor_cpu_and_ram) that samples CPU and RAM usage during pipeline execution, with results saved to local files and uploaded to blob storage
Refactored DataFrame serialization by replacing convert_df_to_bytes with convert_df_to_parquet_bytes for consistent Parquet-based serialization
Updated configuration and dependency injection to support the new monitoring module

Reviewed changes

Copilot reviewed 10 out of 13 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
src/application/common/monitor.py	New monitoring module with CPU/RAM sampling decorator and helper functions for benchmarking
src/application/common/__init__.py	Exports the new monitor_cpu_and_ram decorator
main.py	Applies the monitoring decorator to the main function with a unique run ID
src/application/contracts/bytes_service_interface.py	Renames method from convert_df_to_bytes to convert_df_to_parquet_bytes
src/infra/infrastructure/services/bytes_service.py	Implements convert_df_to_parquet_bytes using BytesIO and Parquet serialization
src/infra/infrastructure/services/release_service.py	Updates to use the new convert_df_to_parquet_bytes method
src/presentation/configuration/app_config.py	Wires the monitor module for dependency injection
src/presentation/entrypoints/release_pipeline.py	Removes unused Dict import
src/config.py	Adds MONITOR_LOG_DIRECTORY configuration
src/domain/enums/storage_container.py	Adds BENCHMARKS container enum value
requirements.txt	Adds development dependencies (objprint, pandas-stubs, types-pytz, viztracer) and removes platform markers from pywin32/pywinpty
.gitignore	Excludes generated monitoring logs (CSV and Parquet files)
monitor_logs/.gitkeep	Placeholder for monitoring logs directory

jathavaan added 6 commits February 3, 2026 13:54

#64 Added psutil as dependency

1c05396

#64 Created directory for logs

95b6306

#64 Added cpu and ram monitoring

e213975

#64 Added monitor logs container

daf68f7

#64 Created method to convert dataframe to parquet bytes

da36ad9

#64 Added repeating runs and save logs to blob storage

0ec336b

jathavaan self-assigned this Feb 18, 2026

Copilot AI review requested due to automatic review settings February 18, 2026 10:16

jathavaan linked an issue Feb 18, 2026 that may be closed by this pull request

Create framework for benchmarking #64

Closed

Copilot started reviewing on behalf of jathavaan February 18, 2026 10:16

Copilot AI reviewed Feb 18, 2026

jathavaan added 6 commits February 18, 2026 11:31

#64 Removed redundant data writes

8640d08

#64 Removed redundant pass statement

c297604

#64 Fixed incorrect type hints

1630adc

#64 Updated return type of _initialize_cpu_metrics

5124313

#64 Removed unused method

aeffba9

#64 Changed to correct import type for Callable

fd83aeb

jathavaan merged commit 2503db2 into main Feb 18, 2026

jathavaan deleted the feature/64-create-framework-for-benchmarking branch February 18, 2026 10:45

jathavaan merged 12 commits into main from feature/64-create-framework-for-benchmarking
