Add Prometheus alerting functionality #1424

Merged Nov 16, 2023
105 commits
466da76
Code adaptation k8s: service discovery and registration adaptation, c…
Sep 23, 2023
7d2f4b6
Initial submission of the help charts script for openim API
Sep 24, 2023
8099514
change the help charts script
Sep 24, 2023
1955fa6
change the help charts script
Sep 25, 2023
36da94a
change helm chart codes
xuexihuang Sep 27, 2023
226d715
change dockerfiles script
Sep 28, 2023
0789da0
change chart script:add configmap mounts
Sep 28, 2023
a4268d0
change chart script:change repository
Sep 28, 2023
58d83ba
change chart script:msggateway add one service
Sep 28, 2023
8cfcc12
change config.yaml
Sep 28, 2023
45aaa4f
roll back some config values
Sep 28, 2023
4a91b90
change chart script:change Ingress rule with a rewrite annotation
Oct 1, 2023
ea8a749
add mysql charts scrible
xuexihuang Oct 2, 2023
2c0f06e
change chart script:add mysql.config.yaml
Oct 2, 2023
777eab6
add nfs provisioner charts
xuexihuang Oct 2, 2023
48e891f
change chart script:add nfs.config.yaml
Oct 2, 2023
f1cc601
add ingress-nginx charts
xuexihuang Oct 2, 2023
6051cb6
change chart script:add ingress-nginx.config.yaml
Oct 2, 2023
9861c0a
add redis &mongodb charts
xuexihuang Oct 3, 2023
ed746e5
add kafka&minio charts
xuexihuang Oct 3, 2023
85b26e7
change chart script:change redis.values.yaml
Oct 3, 2023
e1966c9
change chart script:add redis.config.yaml
Oct 3, 2023
7476421
change chart script:change redis.config.yaml
Oct 3, 2023
c5a0366
change chart script:change mongodb.value.yaml
Oct 3, 2023
0639ef0
change chart script:change mongodb.value.yaml
Oct 3, 2023
2564856
change chart script:add mongodb.config.yaml
Oct 3, 2023
3a285aa
change chart script:change minio.values.yaml
Oct 3, 2023
43a35de
change chart script:add minio.config.yaml
Oct 3, 2023
160fd2b
change chart script:change kafka.values.yaml
Oct 3, 2023
4b7fdf2
change chart script:add kafka.config.yaml
Oct 3, 2023
0762704
change chart script:change services.config.yaml
Oct 4, 2023
f245afb
bug fix:Delete websocket's Port restrictions
Oct 5, 2023
e84530b
bug fix:change port value
Oct 5, 2023
60763a8
change chart script:Submit a stable version script
Oct 5, 2023
84032a5
fix bug:Implement option interface
Oct 8, 2023
61298ef
fix bug:change K8sDR.Register
Oct 9, 2023
b9f40f9
change config.yaml
Oct 9, 2023
4fe0bc3
change chats script:minio service add ingress
Oct 9, 2023
a986346
change chats script:minio service add ingress
Oct 9, 2023
c08f297
change chats script:kafka.replicaCount=3& change minio.api ingress
Oct 11, 2023
832e498
delete change chats script
Oct 16, 2023
eafb088
Merge remote-tracking branch 'upstream/main' into main
Oct 16, 2023
4c19a80
change config.yaml
Oct 16, 2023
ef2b7db
change openim.yaml
Oct 16, 2023
da3f4a7
Merge remote-tracking branch 'upstream/main' into main
Oct 29, 2023
a922523
merge go.sum
Oct 29, 2023
477e1f2
Add monitoring function and struct for Prometheus on gin and GRPC
Oct 29, 2023
4001fb1
Add GRPC and gin server monitoring logic
Oct 30, 2023
e6adaa7
Add GRPC and gin server monitoring logic2
Oct 30, 2023
8e50fe8
Add GRPC and gin server monitoring logic3
Oct 30, 2023
a4a195a
Add GRPC and gin server monitoring logic4
Oct 30, 2023
ea8f720
Add GRPC and gin server monitoring logic5
Oct 31, 2023
68e31bf
Add GRPC and gin server monitoring logic6
Oct 31, 2023
76f1f70
Add GRPC and gin server monitoring logic7
Oct 31, 2023
85bd8b1
delete:old monitoring code
Oct 31, 2023
0625095
Merge remote-tracking branch 'upstream/main' into main
Oct 31, 2023
a11e346
add for test
Nov 1, 2023
4bf5b00
fix bug:change packname
Nov 1, 2023
d4524a6
fix bug:delete getPromPort funciton
Nov 2, 2023
1c7b2d4
fix bug:delete getPromPort funciton
Nov 2, 2023
d4db91e
fix bug:change logs
Nov 2, 2023
6a83403
fix bug:change registerName logic in GetGrpcCusMetrics function
Nov 2, 2023
60c0923
add getPrometheus url api
Nov 3, 2023
dfd36a6
Merge remote-tracking branch 'upstream/main' into main
Nov 3, 2023
419c97c
fix:config path logic
Nov 4, 2023
c6fabc3
fix:prometheus enable function
Nov 6, 2023
0309f62
Merge remote-tracking branch 'upstream/main' into main
Nov 6, 2023
e1bdbc4
fix:prometheus enable function
Nov 6, 2023
ccb85ee
fix:transfer Multi process monitoring logic
Nov 6, 2023
0139030
del:del not using manifest
Nov 6, 2023
2df4bb8
fix:openim-msgtransfer.sh
Nov 6, 2023
182f14d
fix:openim-msgtransfer.sh
Nov 6, 2023
a544351
Merge branch 'main' into main
cubxxw Nov 7, 2023
c7b6b0f
merge upstream/main to main
Nov 11, 2023
8ccb984
cicd: robot automated Change
xuexihuang Nov 11, 2023
7f59b4d
delete not using files
Nov 11, 2023
9ee525a
Merge remote-tracking branch 'origin/main' into main
Nov 11, 2023
b62684c
add prometheus docker-compose for monitor
Nov 11, 2023
01ca57d
fix prometheus.yaml
Nov 11, 2023
c684fb2
fix environment.sh
Nov 11, 2023
2dd05a8
fix init-config.sh
Nov 13, 2023
36c55df
fix init-config.sh
Nov 13, 2023
4588302
fix env_template.yaml
Nov 13, 2023
c55e37f
fix docker-compose.yml
Nov 13, 2023
805c5f3
fix docker-compose.yml
Nov 13, 2023
ac87168
add openim_admin_front service
Nov 13, 2023
2d70645
change openim-admin-front
Nov 13, 2023
fae3c33
del not using files
Nov 13, 2023
c0c2146
add node-exporter-dashaboard.yaml
Nov 13, 2023
7994ba1
Merge branch 'main' into main
cubxxw Nov 14, 2023
0ff1c30
cicd: robot automated Change
cubxxw Nov 14, 2023
6b9e988
Merge remote-tracking branch 'upstream/main' into main
Nov 15, 2023
6f7a64c
cicd: robot automated Change
xuexihuang Nov 15, 2023
6f38ba4
feature: add alertmanager function
Nov 15, 2023
d99454d
Merge remote-tracking branch 'origin/main' into main
Nov 15, 2023
4fea9e9
feature: add alertmanager function
Nov 15, 2023
4cad51b
feature: add alertmanager function
Nov 15, 2023
dbd03a6
feature: add alertmanager function
Nov 15, 2023
84618b4
feature: add alertmanager function
Nov 15, 2023
03f0fdc
del:delete not using files
Nov 15, 2023
1628b04
del:delete not using files
Nov 15, 2023
c1232c4
change:change to personal email info
Nov 15, 2023
51704e0
feat: deployment and design of management backend and monitoring
cubxxw Nov 16, 2023
badc92f
feat: deployment and design of management backend and monitoring
cubxxw Nov 16, 2023
0a0897f
feat: deployment and design of management backend and monitoring
cubxxw Nov 16, 2023
13 changes: 12 additions & 1 deletion .github/workflows/e2e-test.yml
@@ -97,4 +97,15 @@ jobs:

       - name: Exec OpenIM System uninstall
         run: |
-          sudo ./scripts/install/install.sh -u
+          sudo ./scripts/install/install.sh -u
+
+      - name: gobenchdata publish
+        uses: bobheadxi/gobenchdata@v1
+        with:
+          PRUNE_COUNT: 30
+          GO_TEST_FLAGS: -cpu 1,2
+          PUBLISH: true
+          PUBLISH_BRANCH: gh-pages
+        env:
+          GITHUB_TOKEN: ${{ secrets.BOT_GITHUB_TOKEN }}
+        continue-on-error: true
2 changes: 1 addition & 1 deletion .github/workflows/link-pr.yml
@@ -41,7 +41,7 @@ jobs:
           # ./*.md all markdown files in the root directory
           args: --verbose -E -i --no-progress --exclude-path './CHANGELOG' './**/*.md'
         env:
-          GITHUB_TOKEN: ${{secrets.GH_PAT}}
+          GITHUB_TOKEN: ${{secrets.BOT_GITHUB_TOKEN}}

       - name: Create Issue From File
         if: env.lychee_exit_code != 0
2 changes: 1 addition & 1 deletion README.md
@@ -231,7 +231,7 @@ Before you start, please make sure your changes are in demand. The best for that
 - [OpenIM Makefile Utilities](https://github.com/openimsdk/open-im-server/tree/main/docs/contrib/util-makefile.md)
 - [OpenIM Script Utilities](https://github.com/openimsdk/open-im-server/tree/main/docs/contrib/util-scripts.md)
 - [OpenIM Versioning](https://github.com/openimsdk/open-im-server/tree/main/docs/contrib/version.md)
-
+- [Manage backend and monitor deployment](https://github.com/openimsdk/open-im-server/tree/main/docs/contrib/prometheus-grafana.md)

 ## :busts_in_silhouette: Community

32 changes: 32 additions & 0 deletions config/alertmanager.yml
@@ -0,0 +1,32 @@
+###################### AlertManager Configuration ######################
+# AlertManager configuration using environment variables
+#
+# Resolve timeout
+# SMTP configuration for sending alerts
+# Templates for email notifications
+# Routing configurations for alerts
+# Receiver configurations
+global:
+  resolve_timeout: 5m
+  smtp_from: alert@openim.io
+  smtp_smarthost: smtp.163.com:465
+  smtp_auth_username: alert@openim.io
+  smtp_auth_password: YOURAUTHPASSWORD
+  smtp_require_tls: false
+  smtp_hello: xxx monitoring alert
+
+templates:
+  - /etc/alertmanager/email.tmpl
+
+route:
+  group_wait: 5s
+  group_interval: 5s
+  repeat_interval: 5m
+  receiver: email
+receivers:
+  - name: email
+    email_configs:
+      - to: 'alert@example.com'
+        html: '{{ template "email.to.html" . }}'
+        headers: { Subject: "[OPENIM-SERVER]Alarm" }
+        send_resolved: true
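Review note: the `route` block above determines how often the `email` receiver is notified. The following is a minimal Python sketch of that timing for a single alert that stays firing and never changes, assuming a simplified model in which a new group is first flushed after `group_wait`, is then re-checked every `group_interval`, and is re-sent only once `repeat_interval` has elapsed since the last notification. The function name and the 900-second horizon are illustrative, and real Alertmanager behaviour (deduplication, clustering, resolved notifications) is more involved.

```python
def notification_times(group_wait, group_interval, repeat_interval, horizon):
    """Seconds (since the alert started firing) at which a notification goes out.

    Simplified model for ONE unchanged, continuously firing alert.
    """
    times = []
    last_sent = None
    t = group_wait  # a new group is flushed for the first time after group_wait
    while t <= horizon:
        if last_sent is None or t - last_sent >= repeat_interval:
            times.append(t)
            last_sent = t
        t += group_interval  # later flushes happen every group_interval
    return times


# Values from the route block: 5s, 5s, 5m (300s); 900s horizon is arbitrary.
print(notification_times(5, 5, 300, 900))  # [5, 305, 605]
```

With these settings an unresolved `InstanceDown` alert would produce roughly one email every five minutes, which may be worth revisiting for production use.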
16 changes: 16 additions & 0 deletions config/email.tmpl
@@ -0,0 +1,16 @@
+{{ define "email.to.html" }}
+{{ range .Alerts }}
+<!-- Begin of OpenIM Alert -->
+<div style="border:1px solid #ccc; padding:10px; margin-bottom:10px;">
+    <h3>OpenIM Alert</h3>
+    <p><strong>Alert Program:</strong> Prometheus Alert</p>
+    <p><strong>Severity Level:</strong> {{ .Labels.severity }}</p>
+    <p><strong>Alert Type:</strong> {{ .Labels.alertname }}</p>
+    <p><strong>Affected Host:</strong> {{ .Labels.instance }}</p>
+    <p><strong>Affected Service:</strong> {{ .Labels.job }}</p>
+    <p><strong>Alert Subject:</strong> {{ .Annotations.summary }}</p>
+    <p><strong>Trigger Time:</strong> {{ .StartsAt.Format "2006-01-02 15:04:05" }}</p>
+</div>
+<!-- End of OpenIM Alert -->
+{{ end }}
+{{ end }}
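Review note: the string `"2006-01-02 15:04:05"` in the template is Go's reference-time layout, not a literal date. It corresponds to the `strftime` pattern `%Y-%m-%d %H:%M:%S`. The Python sketch below extracts the same fields the template prints from one alert; the sample alert, its label values, and the helper name are hypothetical, and the dictionary shape is only an approximation of what Alertmanager passes to templates.

```python
from datetime import datetime

# strftime equivalent of the Go layout "2006-01-02 15:04:05"
GO_LAYOUT_EQUIVALENT = "%Y-%m-%d %H:%M:%S"


def summarize_alert(alert):
    """Collect the fields that email.tmpl renders for a single alert."""
    starts_at = datetime.fromisoformat(alert["startsAt"])
    return {
        "Severity Level": alert["labels"]["severity"],
        "Alert Type": alert["labels"]["alertname"],
        "Affected Host": alert["labels"]["instance"],
        "Affected Service": alert["labels"]["job"],
        "Alert Subject": alert["annotations"]["summary"],
        "Trigger Time": starts_at.strftime(GO_LAYOUT_EQUIVALENT),
    }


# Hypothetical alert, for illustration only.
sample = {
    "labels": {
        "severity": "critical",
        "alertname": "InstanceDown",
        "instance": "host-a:9100",
        "job": "example-job",
    },
    "annotations": {"summary": "Instance host-a:9100 down"},
    "startsAt": "2023-11-15T08:30:00+00:00",
}

for field, value in summarize_alert(sample).items():
    print(f"{field}: {value}")
```

The timestamp is printed in whatever zone the value carries, so recipients in other time zones may need that stated in the email.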
11 changes: 11 additions & 0 deletions config/instance-down-rules.yml
@@ -0,0 +1,11 @@
+groups:
+  - name: instance_down
+    rules:
+      - alert: InstanceDown
+        expr: up == 0
+        for: 1m
+        labels:
+          severity: critical
+        annotations:
+          summary: "Instance {{ $labels.instance }} down"
+          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
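Review note: the `for: 1m` clause means the alert does not fire on the first failed scrape. It stays pending until `up == 0` has held for a full minute. Below is a minimal Python sketch of that state machine for one target, assuming a 15-second evaluation interval (the actual `evaluation_interval` is not visible in this diff) and one `up` sample per evaluation. The function name is illustrative.

```python
def alert_states(up_samples, eval_interval, for_duration):
    """Return the alert state after each rule evaluation.

    up_samples: value of the `up` metric at each evaluation (1 = reachable).
    """
    states = []
    active_since = None
    for i, up in enumerate(up_samples):
        now = i * eval_interval
        if up != 0:
            # Condition no longer holds: the pending/firing timer resets.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_for = now - active_since
        states.append("firing" if held_for >= for_duration else "pending")
    return states


# Assumed 15s evaluation interval; for: 1m from the rule above.
print(alert_states([1, 0, 0, 0, 0, 0, 1], eval_interval=15, for_duration=60))
# ['inactive', 'pending', 'pending', 'pending', 'pending', 'firing', 'inactive']
```

A target that recovers within the minute never reaches `firing`, so short restarts do not generate email.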
8 changes: 4 additions & 4 deletions config/prometheus.yml
@@ -6,13 +6,13 @@ global:

 # Alertmanager configuration
 alerting:
-  #alertmanagers:
-  #  - static_configs:
-  #    - targets: ['172.29.166.17:9093'] # alertmanager address
+  alertmanagers:
+    - static_configs:
+        - targets: ['172.28.0.1:19093']

 # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
 rule_files:
-  # - "node_down.yml"
+  - "instance-down-rules.yml"
   # - "first_rules.yml"
   # - "second_rules.yml"

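Review note: one way to check the Alertmanager target configured above without waiting for a real outage is to post a synthetic alert to Alertmanager's v2 HTTP API (`POST /api/v2/alerts`). The Python sketch below builds such a payload and sends it only when `post_alert` is called explicitly. The helper names, the instance and job values, and the five-minute expiry are assumptions for illustration; the base URL in the comment reuses the target from this diff.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(instance, job, now=None, ttl_minutes=5):
    """Build a one-element alert list shaped like the InstanceDown rule output."""
    now = now or datetime.now(timezone.utc)
    return [{
        "labels": {
            "alertname": "InstanceDown",
            "severity": "critical",
            "instance": instance,
            "job": job,
        },
        "annotations": {"summary": f"Instance {instance} down"},
        "startsAt": now.isoformat(),
        # After endsAt passes, Alertmanager treats the alert as resolved.
        "endsAt": (now + timedelta(minutes=ttl_minutes)).isoformat(),
    }]


def post_alert(base_url, alerts):
    """Send alerts to Alertmanager, e.g. base_url='http://172.28.0.1:19093'."""
    request = urllib.request.Request(
        f"{base_url}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Hypothetical instance and job names; nothing is sent here.
print(json.dumps(build_test_alert("host-a:9100", "example-job"), indent=2))
```

If the SMTP settings are valid, calling `post_alert` should result in one email rendered through `email.tmpl`.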
32 changes: 32 additions & 0 deletions deployments/templates/alertmanager.yml
@@ -0,0 +1,32 @@
+###################### AlertManager Configuration ######################
+# AlertManager configuration using environment variables
+#
+# Resolve timeout
+# SMTP configuration for sending alerts
+# Templates for email notifications
+# Routing configurations for alerts
+# Receiver configurations
+global:
+  resolve_timeout: ${ALERTMANAGER_RESOLVE_TIMEOUT}
+  smtp_from: ${ALERTMANAGER_SMTP_FROM}
+  smtp_smarthost: ${ALERTMANAGER_SMTP_SMARTHOST}
+  smtp_auth_username: ${ALERTMANAGER_SMTP_AUTH_USERNAME}
+  smtp_auth_password: ${ALERTMANAGER_SMTP_AUTH_PASSWORD}
+  smtp_require_tls: ${ALERTMANAGER_SMTP_REQUIRE_TLS}
+  smtp_hello: ${ALERTMANAGER_SMTP_HELLO}
+
+templates:
+  - /etc/alertmanager/email.tmpl
+
+route:
+  group_wait: 5s
+  group_interval: 5s
+  repeat_interval: 5m
+  receiver: email
+receivers:
+  - name: email
+    email_configs:
+      - to: ${ALERTMANAGER_EMAIL_TO}
+        html: '{{ template "email.to.html" . }}'
+        headers: { Subject: "[OPENIM-SERVER]Alarm" }
+        send_resolved: true
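Review note: this template is presumably rendered into `config/alertmanager.yml` by substituting the `${ALERTMANAGER_*}` variables. The project's own scripts are not shown in this diff, so the Python sketch below is only an illustration of the idea using the standard library's `string.Template`, which understands the same `${NAME}` syntax. The shortened template text and the value mapping are assumptions, with values taken from the generated config in this PR.

```python
from string import Template

# Shortened excerpt of the template above, for illustration.
TEMPLATE_TEXT = """\
global:
  resolve_timeout: ${ALERTMANAGER_RESOLVE_TIMEOUT}
  smtp_from: ${ALERTMANAGER_SMTP_FROM}
receivers:
  - name: email
    email_configs:
      - to: ${ALERTMANAGER_EMAIL_TO}
        html: '{{ template "email.to.html" . }}'
"""

# Example values; in a real run these would come from the environment.
values = {
    "ALERTMANAGER_RESOLVE_TIMEOUT": "5m",
    "ALERTMANAGER_SMTP_FROM": "alert@openim.io",
    "ALERTMANAGER_EMAIL_TO": "alert@example.com",
}


def render(template_text, mapping):
    """Replace every ${NAME}; raises KeyError if a variable is missing."""
    return Template(template_text).substitute(mapping)


rendered = render(TEMPLATE_TEXT, values)
print(rendered)
```

Using `substitute` rather than `safe_substitute` makes a missing variable fail loudly instead of leaving a literal `${...}` in the rendered file. The Go-template expression `{{ template "email.to.html" . }}` contains no `$`, so it passes through unchanged.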
13 changes: 10 additions & 3 deletions deployments/templates/env_template.yaml
@@ -94,7 +94,7 @@ OPENIM_CHAT_NETWORK_ADDRESS=${OPENIM_CHAT_NETWORK_ADDRESS}
 # Address or hostname for the Prometheus network.
 # Default: PROMETHEUS_NETWORK_ADDRESS=172.28.0.11
 PROMETHEUS_NETWORK_ADDRESS=${PROMETHEUS_NETWORK_ADDRESS}
-
+
 # Address or hostname for the Grafana network.
 # Default: GRAFANA_NETWORK_ADDRESS=172.28.0.12
 GRAFANA_NETWORK_ADDRESS=${GRAFANA_NETWORK_ADDRESS}
@@ -106,7 +106,10 @@ NODE_EXPORTER_NETWORK_ADDRESS=${NODE_EXPORTER_NETWORK_ADDRESS}
 # Address or hostname for the OpenIM admin network.
 # Default: OPENIM_ADMIN_NETWORK_ADDRESS=172.28.0.14
 OPENIM_ADMIN_FRONT_NETWORK_ADDRESS=${OPENIM_ADMIN_FRONT_NETWORK_ADDRESS}
-
+
+# Address or hostname for the alertmanager network.
+# Default: ALERT_MANAGER_NETWORK_ADDRESS=172.28.0.15
+ALERT_MANAGER_NETWORK_ADDRESS=${ALERT_MANAGER_NETWORK_ADDRESS}
 # ===============================================
 # = Component Extension Configuration =
 # ===============================================
@@ -305,4 +308,8 @@ GRAFANA_PORT=${GRAFANA_PORT}

 # Port for the admin front.
 # Default: OPENIM_ADMIN_FRONT_PORT=11002
-OPENIM_ADMIN_FRONT_PORT=${OPENIM_ADMIN_FRONT_PORT}
+OPENIM_ADMIN_FRONT_PORT=${OPENIM_ADMIN_FRONT_PORT}
+
+# Port for the alertmanager.
+# Default: ALERT_MANAGER_PORT=19093
+ALERT_MANAGER_PORT=${ALERT_MANAGER_PORT}
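Review note: each service in this stack gets a fixed IPv4 address, so a new entry such as `ALERT_MANAGER_NETWORK_ADDRESS` must not collide with an existing one and must sit inside the Docker network's subnet. The Python sketch below checks both conditions for the defaults visible in this file. The subnet `172.28.0.0/16` is an assumption (it is not shown in this diff), and the helper name is illustrative.

```python
import ipaddress

# Assumption: the compose network uses this subnet; it is not shown in the diff.
SUBNET = ipaddress.ip_network("172.28.0.0/16")

# Defaults taken from the comments in env_template.yaml.
ADDRESSES = {
    "PROMETHEUS_NETWORK_ADDRESS": "172.28.0.11",
    "GRAFANA_NETWORK_ADDRESS": "172.28.0.12",
    "OPENIM_ADMIN_FRONT_NETWORK_ADDRESS": "172.28.0.14",
    "ALERT_MANAGER_NETWORK_ADDRESS": "172.28.0.15",
}


def check_addresses(addresses, subnet):
    """Return a list of human-readable problems (empty if none)."""
    problems = []
    seen = {}
    for name, raw in addresses.items():
        ip = ipaddress.ip_address(raw)
        if ip not in subnet:
            problems.append(f"{name}={raw} is outside {subnet}")
        if ip in seen:
            problems.append(f"{name} and {seen[ip]} both use {raw}")
        seen.setdefault(ip, name)
    return problems


print(check_addresses(ADDRESSES, SUBNET))  # []
```

A check like this could run in CI whenever a new `*_NETWORK_ADDRESS` variable is added.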
8 changes: 4 additions & 4 deletions deployments/templates/prometheus.yml
@@ -6,13 +6,13 @@ global:

 # Alertmanager configuration
 alerting:
-  #alertmanagers:
-  #  - static_configs:
-  #    - targets: ['172.29.166.17:9093'] # alertmanager address
+  alertmanagers:
+    - static_configs:
+        - targets: ['${ALERT_MANAGER_ADDRESS}:${ALERT_MANAGER_PORT}']

 # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
 rule_files:
-  # - "node_down.yml"
+  - "instance-down-rules.yml"
   # - "first_rules.yml"
   # - "second_rules.yml"

15 changes: 15 additions & 0 deletions docker-compose.yml
@@ -162,12 +162,27 @@ services:
     restart: always
     volumes:
       - ./config/prometheus.yml:/etc/prometheus/prometheus.yml
+      - ./config/instance-down-rules.yml:/etc/prometheus/instance-down-rules.yml
     ports:
       - "${PROMETHEUS_PORT}:9090"
     networks:
       server:
         ipv4_address: ${PROMETHEUS_NETWORK_ADDRESS}

+  alertmanager:
+    image: prom/alertmanager
+    container_name: alertmanager
+    hostname: alertmanager
+    restart: always
+    volumes:
+      - ./config/alertmanager.yml:/etc/alertmanager/alertmanager.yml
+      - ./config/email.tmpl:/etc/alertmanager/email.tmpl
+    ports:
+      - "${ALERT_MANAGER_PORT}:9093"
+    networks:
+      server:
+        ipv4_address: ${ALERT_MANAGER_NETWORK_ADDRESS}
+
   grafana:
     image: grafana/grafana
     container_name: grafana