feat(infra): add rabbitmq monitoring with prometheus and grafana dashboard #3508
…board

- enable rabbitmq_prometheus plugin on RabbitmqCluster
- add ServiceMonitor for direct prometheus scraping (15s interval)
- provision RabbitMQ Overview dashboard (grafana.com #10991) on stage and production
- remove redundant prometheus/rabbitmq receiver from otel-collector

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
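The first two changes in the commit message above could look roughly like the following manifests. This is a minimal sketch, not the PR's actual diff: the resource name `rabbitmq`, the `app.kubernetes.io/name` selector value, and the Service port name `prometheus` are assumptions based on RabbitMQ Cluster Operator defaults, and the `release: prometheus` label is assumed to be what the Prometheus instance selects ServiceMonitors on.

```yaml
# Sketch only: names and label values are assumptions, not taken from the PR diff.
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq                # hypothetical cluster name
spec:
  rabbitmq:
    additionalPlugins:
      - rabbitmq_prometheus     # serves metrics on port 15692
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rabbitmq                # hypothetical
  labels:
    release: prometheus         # assumed to match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: rabbitmq   # assumed label on the cluster's Service
  endpoints:
    - port: prometheus          # assumed name of the Service port mapped to 15692
      interval: 15s             # scrape interval stated in the commit message
```

With a ServiceMonitor in place, Prometheus scrapes the RabbitMQ pods directly, which is why a separate scrape receiver in the OTel collector becomes redundant.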
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This PR improves RabbitMQ monitoring by replacing the indirect scraping path through the OTel Collector with direct scraping via a Prometheus ServiceMonitor. This simplifies the monitoring architecture and integrates the RabbitMQ Overview dashboard into Grafana so that RabbitMQ's state can be visualized more effectively.
Code Review
This pull request integrates RabbitMQ monitoring into the Kubernetes infrastructure. Key changes include enabling the rabbitmq_prometheus plugin, configuring a ServiceMonitor for Prometheus to directly scrape RabbitMQ metrics, and adding a new Grafana dashboard for RabbitMQ overview. The previous RabbitMQ scraping configuration via the OTel collector has been removed. The review comments point out a potential issue with the Grafana dashboard's datasource configuration in both production and stage environments, suggesting that the current datasource: Prometheus might cause a UID mismatch and prevent the dashboard from correctly finding its data source. An explicit mapping of the datasource input variable is recommended to resolve this.
The gnetId 10991 dashboard uses 'datasource' as its input variable name. Using a plain string could cause UID mismatch with the provisioned datasource. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
### Description

Adds a k6 load test for the grading request pipeline (`API → RabbitMQ → Iris`).

- **150 VUs**: 140 normal + 10 villain (malicious, resource-consuming code)
- **Per-language payloads**: C, Cpp, Java, Python3, PyPy3, with five normal and five villain payloads
- **Dual result output**: Prometheus remote write (Grafana integration) + local JSON (for reports)
- **Environment-variable driven**: `BASE_URL`, `USERNAME`, `PASSWORD`, `PROBLEM_ID`, supplied at run time
- **Error logs**: the response body is truncated to 500 characters to prevent excessive logging

```
k6 run \
  -e BASE_URL=https://codedang.com/api \
  -e USERNAME=... -e PASSWORD=... -e PROBLEM_ID=... \
  --out experimental-prometheus-rw \
  --out "json=results/$(date +%Y%m%d-%H%M%S).json" \
  -e K6_PROMETHEUS_RW_SERVER_URL=https://prometheus.codedang.com/api/v1/write \
  submission-load.js
```

### Stage environment verification results

The script was verified on the stage environment with reduced VUs (normal 2 + villain 1, 40 seconds).

```
✓ login succeeded
✓ submission accepted
✓ http_req_failed: 0.00% (0 out of 43)
✓ submission_errors: 0
✓ p(95) = 79ms < 5000ms threshold
submission_duration: avg=36.68ms med=33.89ms p(95)=60.42ms
iterations: 42 (1 login + 42 submissions)
```

Metrics during the test were also checked visually with the RabbitMQ Overview dashboard (Grafana #10991) added in #3508:

<img width="1893" height="788" alt="image" src="https://github.com/user-attachments/assets/bd10d564-e907-4f9b-8181-b76636daa292" />

- Incoming messages/s held at ~0.87 throughout the test, a normal inflow
- Ready messages 0, Unacknowledged 0: messages were consumed normally with no queue backlog
- Publishers 8, Consumers 8, Queues 4: normal topology

### Additional context

- Related: #3508 (adds the RabbitMQ monitoring dashboard)
- k6 metrics (`k6_*`) and RabbitMQ/Iris metrics can be compared on the same time axis in Grafana
- Results can be filtered by the `user_type` (`normal`/`villain`) and `language` tags
- Script syntax was validated with `k6 inspect`

---

### Before submitting the PR, please make sure you do the following

- [x] Read the [Contributing Guidelines](https://github.com/skkuding/next/blob/main/CONTRIBUTING.md)
- [x] Read the [Contributing Guidelines](https://github.com/skkuding/next/blob/main/CONTRIBUTING.md#pr-and-branch) and follow the [Commit Convention](https://github.com/skkuding/next/blob/main/CONTRIBUTING.md#commit-convention)
- [x] Provide a description in this PR that addresses **what** the PR is solving, or reference the issue that it solves (e.g. `fixes #123`).
- [ ] Ideally, include relevant tests that fail without this PR but pass with it.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RabbitMQ Grafana dashboard: both the stage and production overlays were updated as follows:
Description
Adds Prometheus monitoring and a Grafana dashboard for RabbitMQ to both stage and production.

- Explicitly enable the `rabbitmq_prometheus` plugin (`:15692` endpoint)
- … `release: prometheus` label)
- Remove the `prometheus/rabbitmq` receiver; it is replaced by the ServiceMonitor

Data flow (after):
Additional context
- Confirmed that the `rabbitmq` ServiceMonitor is displayed

Before submitting the PR, please make sure you do the following

fixes #123).

🤖 Generated with Claude Code