From 29496970cea0a4241b11806f9152350ea4ad0b6a Mon Sep 17 00:00:00 2001
From: Reinaldy Rafli
Date: Sat, 21 Dec 2024 07:43:18 +0700
Subject: [PATCH 1/4] docs(self-hosted): initial attempt for cleaning up
 troubleshooting section

---
 develop-docs/self-hosted/troubleshooting.mdx  | 252 ------------------
 .../self-hosted/troubleshooting/docker.mdx    |  83 ++++++
 .../self-hosted/troubleshooting/index.mdx     |  15 ++
 .../self-hosted/troubleshooting/kafka.mdx     |  94 +++++++
 .../self-hosted/troubleshooting/postgres.mdx  |  32 +++
 .../self-hosted/troubleshooting/redis.mdx     |   7 +
 .../self-hosted/troubleshooting/sentry.mdx    |  53 ++++
 7 files changed, 284 insertions(+), 252 deletions(-)
 delete mode 100644 develop-docs/self-hosted/troubleshooting.mdx
 create mode 100644 develop-docs/self-hosted/troubleshooting/docker.mdx
 create mode 100644 develop-docs/self-hosted/troubleshooting/index.mdx
 create mode 100644 develop-docs/self-hosted/troubleshooting/kafka.mdx
 create mode 100644 develop-docs/self-hosted/troubleshooting/postgres.mdx
 create mode 100644 develop-docs/self-hosted/troubleshooting/redis.mdx
 create mode 100644 develop-docs/self-hosted/troubleshooting/sentry.mdx

diff --git a/develop-docs/self-hosted/troubleshooting.mdx b/develop-docs/self-hosted/troubleshooting.mdx
deleted file mode 100644
index a383577c961d9..0000000000000
--- a/develop-docs/self-hosted/troubleshooting.mdx
+++ /dev/null
@@ -1,252 +0,0 @@
---
title: Self-Hosted Troubleshooting
sidebar_title: Troubleshooting
sidebar_order: 9000
---

Please keep in mind that the [self-hosted repository](https://github.com/getsentry/self-hosted) is geared towards low traffic loads (less than ~1 million submitted Sentry events per month). Folks needing larger setups or having event spikes can expand from here based on their specific needs and environments. If this is not your cup of tea, you are always welcome to [try out hosted Sentry](https://sentry.io/signup/).

## General

You can see the logs of each service by running `docker compose logs <service_name>`. You can use the `-f` flag to "follow" the logs as they come in, and use the `-t` flag for timestamps. If you don't pass any service names, you will get the logs for all running services. See the [reference for the logs command](https://docs.docker.com/compose/reference/logs/) for more info.

## Sentry

### CSRF verification failed

Since version 24.1.0, Sentry has migrated to Django 4, which enforces stricter CSRF protection. By default, the trusted CSRF origins are set to your `system.url-prefix`, but in cases where your Sentry deployment can be accessed from multiple domains, you will need to configure `CSRF_TRUSTED_ORIGINS` in your `sentry.conf.py` file.

```python
# Assuming your Sentry instance can be accessed from sentry.example.com, 10.100.10.10, and 127.0.0.1.
CSRF_TRUSTED_ORIGINS = ["https://sentry.example.com", "http://10.100.10.10", "http://127.0.0.1:9000"]
```

See [Django's documentation on CSRF](https://docs.djangoproject.com/en/4.2/ref/settings/#std-setting-CSRF_TRUSTED_ORIGINS) for further detail.

### `sentry-data` volume not being cleaned up

You may see the `sentry-data` volume taking up too much disk space. You can clean it up manually (or put a cleanup cronjob in place).
Find the Docker mountpoint for the volume by executing:
```bash
docker volume inspect sentry-data

# Or if you prefer to do it directly (assuming you have `jq` on your system):
docker volume inspect sentry-data | jq -r .[0].Mountpoint
```

Then run the following command to remove files older than 30 days from the volume (change the `30` to however many days of data you want to keep):
```bash
# `/var/lib/docker/volumes/sentry-data/_data` refers to the mountpoint of the volume
# from the output of the previous command. Change it if it's different.
find /var/lib/docker/volumes/sentry-data/_data -type f -mtime +30 -delete
```

## Kafka

One of the most likely things to cause issues is Kafka. The most commonly reported error is:

```log
Exception: KafkaError{code=OFFSET_OUT_OF_RANGE,val=1,str="Broker: Offset out of range"}
```

This happens when Kafka and the consumers get out of sync. Possible reasons are:

1. Running out of disk space or memory
2. Having a sustained event spike that causes very long processing times, causing Kafka to drop messages as they go past the retention time
3. Date/time out of sync issues due to a restart or suspend/resume cycle

### Recovery

Note: These solutions may result in data loss when resetting the offsets of the Snuba consumers.

#### Proper solution

The _proper_ solution is as follows ([reported](https://github.com/getsentry/self-hosted/issues/478#issuecomment-666254392) by [@rmisyurev](https://github.com/rmisyurev)):

1. Retrieve the consumer group list:
   ```shell
   docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --list
   ```
2. Get group info:
   ```shell
   docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers --describe
   ```
3. Preview what will happen to the offsets with a dry run (optional):
   ```shell
   docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers --topic events --reset-offsets --to-latest --dry-run
   ```
4. Set the offsets to latest and execute:
   ```shell
   docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers --topic events --reset-offsets --to-latest --execute
   ```

<Alert>
You can replace `snuba-consumers` with other consumer groups or `events` with other topics when needed.
</Alert>

#### Another option

This option is as follows ([reported](https://github.com/getsentry/self-hosted/issues/1690) by [@gabn88](https://github.com/gabn88)):

1. Set the offsets to latest and execute:
   ```shell
   docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --all-groups --all-topics --reset-offsets --to-latest --execute
   ```

Unlike the proper solution, this involves resetting the offsets of all consumer groups and all topics.

#### Nuclear option

The _nuclear option_ is removing all Kafka-related volumes and recreating them, which _will_ cause data loss. Any data that was pending there will be gone upon deleting these volumes.

1. Stop the instance:
   ```shell
   docker compose down --volumes
   ```
2. Remove the Kafka & Zookeeper related volumes:
   ```shell
   docker volume rm sentry-kafka
   docker volume rm sentry-zookeeper
   ```
3. Run the install script again:
   ```shell
   ./install.sh
   ```
4. Start the instance:
   ```shell
   docker compose up -d
   ```

### Reducing disk usage

If you want to reduce the disk space used by Kafka, you'll need to carefully calculate how much data you are ingesting, how much data loss you can tolerate, and then follow the recommendations on [this awesome StackOverflow post](https://stackoverflow.com/a/52970982/90297) or [this post on our community forum](https://forum.sentry.io/t/sentry-disk-cleanup-kafka/11337/2?u=byk).

## Redis

Redis is used both as a transactional data store and as Celery's work queue in the self-hosted setup. For this reason, it may get overwhelmed during event spikes. We have made some significant improvements in this regard starting from version `20.10.1`. If you are still having issues, you may look into scaling out Redis itself or switching to a different Celery broker, such as [RabbitMQ](https://www.rabbitmq.com/).

## Workers

If you are seeing an error such as

```
Background workers haven’t checked in recently. It seems that you have a backlog of 200 tasks. Either your workers aren’t running or you need more capacity.
```

you may benefit from using additional, dedicated workers. This is achieved by creating new `worker` services in `docker-compose.override.yml` and tying them to specific queues using the `-Q queue_name` argument. An example would be:

```yaml
worker1:
  << : *sentry_defaults
  command: run worker -Q events.process_event
```

For a more complete example, see [a sample solution on our community forum](https://forum.sentry.io/t/how-to-clear-backlog-and-monitor-it/10715/14?u=byk).

## Postgres

Postgres is used as the primary datastore, as well as for the [nodestore](https://develop.sentry.dev/services/nodestore/), which is used to store key/value data. The `nodestore_node` table can grow rapidly, especially when heavily utilising the Performance Monitoring feature, as trace data is stored in this table.

The `nodestore_node` table is cleaned up as part of the `cleanup` task; however, Postgres may not get a chance to vacuum the table (especially under heavy load), so even though the rows may be deleted, they still take up space on disk.

You can use `pg_repack`, which repacks a table live by creating a new table and copying the data across before dropping the old one. You'll want to run this after the cleanup script, and note that since it creates a copy of the table, **disk usage will spike before going back down**.

An example script:

```shell
# Only keep the last 7 days of nodestore data. We heavily use performance monitoring.
docker compose run --rm -T web cleanup --days 7 -m nodestore -l debug
# This ensures pg_repack exists before running, as the container gets recreated on upgrades
docker compose run --rm -T postgres bash -c "apt update && apt install -y --no-install-recommends postgresql-14-repack && su postgres -c 'pg_repack -E info -t nodestore_node'"
```

If the script above still does not clear up enough disk space, you can try the following in Postgres. This will lead to data loss in event details. _SENTRY_RETENTION_DAYS_ will need to be filled in manually (it is a number of days).
```sql
DELETE FROM public.nodestore_node WHERE "timestamp" < NOW() - INTERVAL '[SENTRY_RETENTION_DAYS] days';
VACUUM FULL public.nodestore_node;
```

VACUUM FULL actively compacts tables by writing a complete new version of the table file with no dead space. This minimizes the size of the table, but can take a long time.
It also requires extra disk space for the new copy of the table, until the operation completes.

## Docker Containers' Healthcheck

There may be circumstances in which you want to increase or decrease the healthcheck interval, timeout, or number of retries. This can be achieved by editing the `HEALTHCHECK_INTERVAL`, `HEALTHCHECK_TIMEOUT`, and `HEALTHCHECK_RETRIES` values in `.env`.

Occasionally, you might see an error like this:
```
container for service "${servicename}" is unhealthy
```

This can usually be resolved by running `docker compose down` and `docker compose up -d` or rerunning the install script.

## Docker Network Conflicting IP Address

Self-hosted Sentry uses Docker's bridge networking, which uses a specific private IP range. By default, Docker uses the `172.17.0.0/16` range (`172.17.0.0`-`172.17.255.255`). This may conflict with your private network. You can change Docker's default IP range by configuring the `/etc/docker/daemon.json` file. If the file does not exist, you can create it yourself.

Assuming your safe IP ranges are `10.147.0.0/16` and `10.146.0.0/16`, your configuration would be:

```json
{
  "default-address-pools": [
    {
      "base": "10.147.0.0/16",
      "size": 24
    },
    {
      "base": "10.146.0.0/16",
      "size": 24
    }
  ]
}
```

To apply the new Docker daemon configuration, restart your Docker service with `systemctl restart docker`.

Make sure you are using [valid private IP ranges](https://en.wikipedia.org/wiki/Reserved_IP_addresses), that is, within these ranges:
- `10.0.0.0/8` (address range of `10.0.0.0`–`10.255.255.255`)
- `100.64.0.0/10` (address range of `100.64.0.0`–`100.127.255.255`)
- `172.16.0.0/12` (address range of `172.16.0.0`–`172.31.255.255`)
- `192.0.0.0/24` (address range of `192.0.0.0`–`192.0.0.255`)
- `192.168.0.0/16` (address range of `192.168.0.0`–`192.168.255.255`)
- `198.18.0.0/15` (address range of `198.18.0.0`–`198.19.255.255`)

For further reading, see Matthew Stratiotto's article, [The definitive guide to docker's default-address-pools option](https://straz.to/2021-09-08-docker-address-pools/).

## Docker Logs Disk Usage

If you suspect that persisted Docker container logs are consuming a lot of your disk space, you can limit how much log data Docker keeps by configuring the `/etc/docker/daemon.json` file. If the file does not exist, you can create it yourself.

```json
{
  "log-driver": "local",
  "log-opts": {"max-size": "10m", "max-file": "3"}
}
```

To apply the new Docker daemon configuration, restart your Docker service with `systemctl restart docker`.

If you want to delete the existing Docker logs immediately, you can execute this as the `root` user (this applies to the default `json-file` log driver):

```shell
truncate -s 0 /var/lib/docker/containers/**/*-json.log
```

## Docker Image and Builder Cleanup

Executing `./install.sh` builds a new Sentry Docker image; running it often might cause Docker to consume a lot of disk space. You can safely prune old or unneeded Docker containers, images, and build cache to reclaim disk space. Executing these commands will not affect currently running containers and volumes.

```shell
docker container prune
docker builder prune
docker image prune --all
# WARNING: Executing "volume prune" might delete `sentry-vroom` volume as it's not an external volume.
docker volume prune
docker network prune
```

Append the `-f` flag to skip the confirmation prompt.
## Other

If you are still stuck, you can always visit our [GitHub issues](https://github.com/getsentry/self-hosted/issues) to search for existing issues or create a new issue and ask for help. You can also visit the [Sentry Community Discord server](https://discord.gg/sentry) in the #self-hosted channel. Please keep in mind that we expect the community to help itself, but Sentry employees also try to monitor and answer questions when they have time.
diff --git a/develop-docs/self-hosted/troubleshooting/docker.mdx b/develop-docs/self-hosted/troubleshooting/docker.mdx
new file mode 100644
index 0000000000000..0e144123af771
--- /dev/null
+++ b/develop-docs/self-hosted/troubleshooting/docker.mdx
@@ -0,0 +1,83 @@
---
title: Troubleshooting Docker
sidebar_title: Docker
sidebar_order: 3
---

## Container Healthcheck

There may be circumstances in which you want to increase or decrease the healthcheck interval, timeout, or number of retries. This can be achieved by editing the `HEALTHCHECK_INTERVAL`, `HEALTHCHECK_TIMEOUT`, and `HEALTHCHECK_RETRIES` values in `.env`.

Occasionally, you might see an error like this:
```
container for service "${servicename}" is unhealthy
```

This can usually be resolved by running `docker compose down` and `docker compose up -d` or rerunning the install script.

## Docker Network Conflicting IP Address

Self-hosted Sentry uses Docker's bridge networking, which uses a specific private IP range. By default, Docker uses the `172.17.0.0/16` range (`172.17.0.0`-`172.17.255.255`). This may conflict with your private network. You can change Docker's default IP range by configuring the `/etc/docker/daemon.json` file. If the file does not exist, you can create it yourself.

Assuming your safe IP ranges are `10.147.0.0/16` and `10.146.0.0/16`, your configuration would be:

```json
{
  "default-address-pools": [
    {
      "base": "10.147.0.0/16",
      "size": 24
    },
    {
      "base": "10.146.0.0/16",
      "size": 24
    }
  ]
}
```

To apply the new Docker daemon configuration, restart your Docker service with `systemctl restart docker`.

Make sure you are using [valid private IP ranges](https://en.wikipedia.org/wiki/Reserved_IP_addresses), that is, within these ranges:
- `10.0.0.0/8` (address range of `10.0.0.0`–`10.255.255.255`)
- `100.64.0.0/10` (address range of `100.64.0.0`–`100.127.255.255`)
- `172.16.0.0/12` (address range of `172.16.0.0`–`172.31.255.255`)
- `192.0.0.0/24` (address range of `192.0.0.0`–`192.0.0.255`)
- `192.168.0.0/16` (address range of `192.168.0.0`–`192.168.255.255`)
- `198.18.0.0/15` (address range of `198.18.0.0`–`198.19.255.255`)

For further reading, see Matthew Stratiotto's article, [The definitive guide to docker's default-address-pools option](https://straz.to/2021-09-08-docker-address-pools/).

## Logs Disk Usage

If you suspect that persisted Docker container logs are consuming a lot of your disk space, you can limit how much log data Docker keeps by configuring the `/etc/docker/daemon.json` file. If the file does not exist, you can create it yourself.

```json
{
  "log-driver": "local",
  "log-opts": {"max-size": "10m", "max-file": "3"}
}
```

To apply the new Docker daemon configuration, restart your Docker service with `systemctl restart docker`.
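Note that daemon-level logging options only apply to containers created after the configuration change, so you may need to recreate the containers (for example, with `docker compose up -d --force-recreate`) for the new limits to take effect. A quick sketch for verifying what is active — the container name below is only an example and may differ on your host:

```shell
# Default logging driver for newly created containers
docker info --format '{{.LoggingDriver}}'

# Logging driver and options actually used by one existing container
docker inspect --format '{{.HostConfig.LogConfig}}' sentry-self-hosted-web-1
```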
If you want to delete the existing Docker logs immediately, you can execute this as the `root` user (this applies to the default `json-file` log driver):

```shell
truncate -s 0 /var/lib/docker/containers/**/*-json.log
```

## Image and Builder Cleanup

Executing `./install.sh` builds a new Sentry Docker image; running it often might cause Docker to consume a lot of disk space. You can safely prune old or unneeded Docker containers, images, and build cache to reclaim disk space. Executing these commands will not affect currently running containers and volumes.

```shell
docker container prune
docker builder prune
docker image prune --all
# WARNING: Executing "volume prune" might delete `sentry-vroom` volume as it's not an external volume.
docker volume prune
docker network prune
```

Append the `-f` flag to skip the confirmation prompt.
diff --git a/develop-docs/self-hosted/troubleshooting/index.mdx b/develop-docs/self-hosted/troubleshooting/index.mdx
new file mode 100644
index 0000000000000..270a182782343
--- /dev/null
+++ b/develop-docs/self-hosted/troubleshooting/index.mdx
@@ -0,0 +1,15 @@
---
title: Self-Hosted Troubleshooting
sidebar_title: Troubleshooting
sidebar_order: 9000
---

Please keep in mind that the [self-hosted repository](https://github.com/getsentry/self-hosted) is geared towards low traffic loads (less than ~1 million submitted Sentry events per month). Folks needing larger setups or having event spikes can expand from here based on their specific needs and environments. If this is not your cup of tea, you are always welcome to [try out hosted Sentry](https://sentry.io/signup/).

## General

You can see the logs of each service by running `docker compose logs <service_name>`. You can use the `-f` flag to "follow" the logs as they come in, and use the `-t` flag for timestamps. If you don't pass any service names, you will get the logs for all running services. See the [reference for the logs command](https://docs.docker.com/compose/reference/logs/) for more info.

## Other

If you are still stuck, you can always visit our [GitHub issues](https://github.com/getsentry/self-hosted/issues) to search for existing issues or create a new issue and ask for help. You can also visit the [Sentry Community Discord server](https://discord.gg/sentry) in the #self-hosted channel. Please keep in mind that we expect the community to help itself, but Sentry employees also try to monitor and answer questions when they have time.
diff --git a/develop-docs/self-hosted/troubleshooting/kafka.mdx b/develop-docs/self-hosted/troubleshooting/kafka.mdx
new file mode 100644
index 0000000000000..5ed42a0204430
--- /dev/null
+++ b/develop-docs/self-hosted/troubleshooting/kafka.mdx
@@ -0,0 +1,94 @@
---
title: Troubleshooting Kafka
sidebar_title: Kafka
sidebar_order: 2
---

## Offset Out Of Range Error

The most commonly reported error is:

```log
Exception: KafkaError{code=OFFSET_OUT_OF_RANGE,val=1,str="Broker: Offset out of range"}
```

This happens when Kafka and the consumers get out of sync. Possible reasons are:

1. Running out of disk space or memory
2. Having a sustained event spike that causes very long processing times, causing Kafka to drop messages as they go past the retention time
3. Date/time out of sync issues due to a restart or suspend/resume cycle

### Recovery

Note: These solutions may result in data loss when resetting the offsets of the Snuba consumers.

#### Proper solution

The _proper_ solution is as follows ([reported](https://github.com/getsentry/self-hosted/issues/478#issuecomment-666254392) by [@rmisyurev](https://github.com/rmisyurev)):

1. Retrieve the consumer group list:
   ```shell
   docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --list
   ```
2. Get group info:
   ```shell
   docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers --describe
   ```
3. Preview what will happen to the offsets with a dry run (optional):
   ```shell
   docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers --topic events --reset-offsets --to-latest --dry-run
   ```
4. Set the offsets to latest and execute:
   ```shell
   docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers --topic events --reset-offsets --to-latest --execute
   ```

<Alert>
You can replace `snuba-consumers` with other consumer groups or `events` with other topics when needed.
</Alert>

#### Another option

This option is as follows ([reported](https://github.com/getsentry/self-hosted/issues/1690) by [@gabn88](https://github.com/gabn88)):

1. Set the offsets to latest and execute:
   ```shell
   docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --all-groups --all-topics --reset-offsets --to-latest --execute
   ```

Unlike the proper solution, this involves resetting the offsets of all consumer groups and all topics.

#### Nuclear option

The _nuclear option_ is removing all Kafka-related volumes and recreating them, which _will_ cause data loss. Any data that was pending there will be gone upon deleting these volumes.

1. Stop the instance:
   ```shell
   docker compose down --volumes
   ```
2. Remove the Kafka & Zookeeper related volumes:
   ```shell
   docker volume rm sentry-kafka
   docker volume rm sentry-zookeeper
   ```
3. Run the install script again:
   ```shell
   ./install.sh
   ```
4. Start the instance:
   ```shell
   docker compose up -d
   ```

## Reducing disk usage

If you want to reduce the disk space used by Kafka, you'll need to carefully calculate how much data you are ingesting, how much data loss you can tolerate, and then follow the recommendations on [this awesome StackOverflow post](https://stackoverflow.com/a/52970982/90297) or [this post on our community forum](https://forum.sentry.io/t/sentry-disk-cleanup-kafka/11337/2?u=byk).

You could, however, add these to the Kafka container's environment variables:
```yaml
services:
  kafka:
    # ...
    environment:
      KAFKA_LOG_RETENTION_HOURS: 24
      KAFKA_LOG_CLEANER_ENABLE: true
      KAFKA_LOG_CLEANUP_POLICY: delete
```
diff --git a/develop-docs/self-hosted/troubleshooting/postgres.mdx b/develop-docs/self-hosted/troubleshooting/postgres.mdx
new file mode 100644
index 0000000000000..4a83249b3c27a
--- /dev/null
+++ b/develop-docs/self-hosted/troubleshooting/postgres.mdx
@@ -0,0 +1,32 @@
---
title: Troubleshooting Postgres
sidebar_title: Postgres
sidebar_order: 4
---

## Nodestore table growing rapidly

[Nodestore](https://develop.sentry.dev/backend/application-domains/nodestore/) is used to store key/value data. The `nodestore_node` table can grow rapidly, especially when heavily utilising the Performance Monitoring feature, as trace data is stored in this table.

The `nodestore_node` table is cleaned up as part of the `cleanup` task; however, Postgres may not get a chance to vacuum the table (especially under heavy load), so even though the rows may be deleted, they still take up space on disk.
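Before cleaning anything up, it can help to confirm how much space the table is actually occupying. A minimal sketch, assuming the default `postgres` user and database of the self-hosted setup (adjust the names to your deployment):

```shell
# pg_total_relation_size includes indexes and TOAST data, so it reflects real on-disk usage
docker compose exec postgres psql -U postgres -c \
  "SELECT pg_size_pretty(pg_total_relation_size('nodestore_node'));"
```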
You can use `pg_repack`, which repacks a table live by creating a new table and copying the data across before dropping the old one. You'll want to run this after the cleanup script, and note that since it creates a copy of the table, **disk usage will spike before going back down**.

An example script:

```shell
# Only keep the last 7 days of nodestore data. We heavily use performance monitoring.
docker compose run --rm -T web cleanup --days 7 -m nodestore -l debug
# This ensures pg_repack exists before running, as the container gets recreated on upgrades
docker compose run --rm -T postgres bash -c "apt update && apt install -y --no-install-recommends postgresql-14-repack && su postgres -c 'pg_repack -E info -t nodestore_node'"
```

If the script above still does not clear up enough disk space, you can try the following in Postgres. This will lead to data loss in event details. _SENTRY_RETENTION_DAYS_ will need to be filled in manually (it is a number of days).
```sql
DELETE FROM public.nodestore_node WHERE "timestamp" < NOW() - INTERVAL '[SENTRY_RETENTION_DAYS] days';
VACUUM FULL public.nodestore_node;
```

VACUUM FULL actively compacts tables by writing a complete new version of the table file with no dead space. This minimizes the size of the table, but can take a long time. It also requires extra disk space for the new copy of the table, until the operation completes.

For higher event throughput, it is recommended to use a community implementation of Nodestore that is backed by blob storage such as S3 or similar.
diff --git a/develop-docs/self-hosted/troubleshooting/redis.mdx b/develop-docs/self-hosted/troubleshooting/redis.mdx
new file mode 100644
index 0000000000000..1c228a39d797d
--- /dev/null
+++ b/develop-docs/self-hosted/troubleshooting/redis.mdx
@@ -0,0 +1,7 @@
---
title: Troubleshooting Redis
sidebar_title: Redis
sidebar_order: 5
---

Redis is used both as a transactional data store and as Celery's work queue in the self-hosted setup. For this reason, it may get overwhelmed during event spikes. We have made some significant improvements in this regard starting from version `20.10.1`. If you are still having issues, you may look into scaling out Redis itself or switching to a different Celery broker, such as [RabbitMQ](https://www.rabbitmq.com/).
diff --git a/develop-docs/self-hosted/troubleshooting/sentry.mdx b/develop-docs/self-hosted/troubleshooting/sentry.mdx
new file mode 100644
index 0000000000000..fc1298243b761
--- /dev/null
+++ b/develop-docs/self-hosted/troubleshooting/sentry.mdx
@@ -0,0 +1,53 @@
---
title: Troubleshooting Sentry
sidebar_title: Sentry
sidebar_order: 1
---

## CSRF verification failed

Since version 24.1.0, Sentry has migrated to Django 4, which enforces stricter CSRF protection. By default, the trusted CSRF origins are set to your `system.url-prefix`, but in cases where your Sentry deployment can be accessed from multiple domains, you will need to configure `CSRF_TRUSTED_ORIGINS` in your `sentry.conf.py` file.

```python
# Assuming your Sentry instance can be accessed from sentry.example.com, 10.100.10.10, and 127.0.0.1.
CSRF_TRUSTED_ORIGINS = ["https://sentry.example.com", "http://10.100.10.10", "http://127.0.0.1:9000"]
```

See [Django's documentation on CSRF](https://docs.djangoproject.com/en/4.2/ref/settings/#std-setting-CSRF_TRUSTED_ORIGINS) for further detail.

## `sentry-data` volume not being cleaned up

You may see the `sentry-data` volume taking up too much disk space. You can clean it up manually (or put a cleanup cronjob in place).
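Before deleting anything, you can check how much space the volume is actually using. A quick sketch using standard Docker commands (the exact output format may vary between Docker versions):

```shell
# Lists every volume along with its size; filter for the Sentry data volume
docker system df -v | grep sentry-data
```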
Find the Docker mountpoint for the volume by executing:
```bash
docker volume inspect sentry-data

# Or if you prefer to do it directly (assuming you have `jq` on your system):
docker volume inspect sentry-data | jq -r .[0].Mountpoint
```

Then run the following command to remove files older than 30 days from the volume (change the `30` to however many days of data you want to keep):
```bash
# `/var/lib/docker/volumes/sentry-data/_data` refers to the mountpoint of the volume
# from the output of the previous command. Change it if it's different.
find /var/lib/docker/volumes/sentry-data/_data -type f -mtime +30 -delete
```

## Workers

If you are seeing an error such as

```
Background workers haven’t checked in recently. It seems that you have a backlog of 200 tasks. Either your workers aren’t running or you need more capacity.
```

you may benefit from using additional, dedicated workers. This is achieved by creating new `worker` services in `docker-compose.override.yml` and tying them to specific queues using the `-Q queue_name` argument. An example would be:

```yaml
worker1:
  << : *sentry_defaults
  command: run worker -Q events.process_event
```

For a more complete example, see [a sample solution on our community forum](https://forum.sentry.io/t/how-to-clear-backlog-and-monitor-it/10715/14?u=byk).

From 68a9fd5674236d06d55ebf35b33ecd1f4f943087 Mon Sep 17 00:00:00 2001
From: Reinaldy Rafli
Date: Fri, 27 Dec 2024 17:35:30 +0700
Subject: [PATCH 2/4] docs(self-hosted): better title wording for
 troubleshooting

---
 develop-docs/self-hosted/troubleshooting/kafka.mdx    | 2 +-
 develop-docs/self-hosted/troubleshooting/postgres.mdx | 2 +-
 develop-docs/self-hosted/troubleshooting/sentry.mdx   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/develop-docs/self-hosted/troubleshooting/kafka.mdx b/develop-docs/self-hosted/troubleshooting/kafka.mdx
index 5ed42a0204430..17c8868f4983d 100644
--- a/develop-docs/self-hosted/troubleshooting/kafka.mdx
+++ b/develop-docs/self-hosted/troubleshooting/kafka.mdx
@@ -82,7 +82,7 @@ The _nuclear option_ is removing all Kafka-related volumes and recreating them, w
 
 If you want to reduce the disk space used by Kafka, you'll need to carefully calculate how much data you are ingesting, how much data loss you can tolerate, and then follow the recommendations on [this awesome StackOverflow post](https://stackoverflow.com/a/52970982/90297) or [this post on our community forum](https://forum.sentry.io/t/sentry-disk-cleanup-kafka/11337/2?u=byk).
 
-You could, however, add these to the Kafka container's environment variables:
+You could, however, add these to the Kafka container's environment variables (by [@csvan](https://github.com/getsentry/self-hosted/issues/3389#issuecomment-2453567691)):
 ```yaml
 services:
   kafka:
diff --git a/develop-docs/self-hosted/troubleshooting/postgres.mdx b/develop-docs/self-hosted/troubleshooting/postgres.mdx
index 4a83249b3c27a..69e9a2a60c92d 100644
--- a/develop-docs/self-hosted/troubleshooting/postgres.mdx
+++ b/develop-docs/self-hosted/troubleshooting/postgres.mdx
@@ -27,6 +27,6 @@ DELETE FROM public.nodestore_node WHERE "timestamp" < NOW() - INTERVAL '[SENTRY_
 VACUUM FULL public.nodestore_node;
 ```
 
-VACUUM FULL actively compacts tables by writing a complete new version of the table file with no dead space. This minimizes the size of the table, but can take a long time. It also requires extra disk space for the new copy of the table, until the operation completes.
+`VACUUM FULL` actively compacts tables by writing a complete new version of the table file with no dead space. This minimizes the size of the table, but can take a long time. It also requires extra disk space for the new copy of the table, until the operation completes.
 
 For higher event throughput, it is recommended to use a community implementation of Nodestore that is backed by blob storage such as S3 or similar.
diff --git a/develop-docs/self-hosted/troubleshooting/sentry.mdx b/develop-docs/self-hosted/troubleshooting/sentry.mdx
index fc1298243b761..4591d56cad46c 100644
--- a/develop-docs/self-hosted/troubleshooting/sentry.mdx
+++ b/develop-docs/self-hosted/troubleshooting/sentry.mdx
@@ -34,7 +34,7 @@ Then run the following command to remove files older than 30 days from the volume
 find /var/lib/docker/volumes/sentry-data/_data -type f -mtime +30 -delete
 ```
 
-## Workers
+## Background workers haven't checked in recently
 
 If you are seeing an error such as

From bdb58e789dd34ac866f429c383fdf25b046369dd Mon Sep 17 00:00:00 2001
From: Reinaldy Rafli
Date: Thu, 2 Jan 2025 12:27:57 +0700
Subject: [PATCH 3/4] docs(self-hosted): component specific pagegrid

---
 develop-docs/self-hosted/troubleshooting/index.mdx | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/develop-docs/self-hosted/troubleshooting/index.mdx b/develop-docs/self-hosted/troubleshooting/index.mdx
index 270a182782343..7f54104bb9ce6 100644
--- a/develop-docs/self-hosted/troubleshooting/index.mdx
+++ b/develop-docs/self-hosted/troubleshooting/index.mdx
@@ -10,6 +10,12 @@ Please keep in mind that the [self-hosted repository](https://github.com/getsent
 
 You can see the logs of each service by running `docker compose logs <service_name>`. You can use the `-f` flag to "follow" the logs as they come in, and use the `-t` flag for timestamps. If you don't pass any service names, you will get the logs for all running services. See the [reference for the logs command](https://docs.docker.com/compose/reference/logs/) for more info.
 
+## Component Specific
+
+Visit the following pages for troubleshooting guides for each component:
+
+<PageGrid />
+
 ## Other

From 0770d7e68338a7ae1df31095a8042098dd887b19 Mon Sep 17 00:00:00 2001
From: Reinaldy Rafli
Date: Wed, 8 Jan 2025 20:11:06 +0700
Subject: [PATCH 4/4] Update docker.mdx

Co-authored-by: Burak Yigit Kaya
---
 develop-docs/self-hosted/troubleshooting/docker.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/develop-docs/self-hosted/troubleshooting/docker.mdx b/develop-docs/self-hosted/troubleshooting/docker.mdx
index 0e144123af771..a8719b078078d 100644
--- a/develop-docs/self-hosted/troubleshooting/docker.mdx
+++ b/develop-docs/self-hosted/troubleshooting/docker.mdx
@@ -13,7 +13,7 @@ container for service "${servicename}" is unhealthy
 ```
 
-This can usually be resolved by running `docker compose down` and `docker compose up -d` or rerunning the install script.
+This can usually be resolved by running `docker compose down` and `docker compose up --wait` or rerunning the install script.
 
 ## Docker Network Conflicting IP Address