
Transactions, replays, attachments and profiles are not visible in the stats section #55786

Open
kadet1090 opened this issue Sep 5, 2023 · 13 comments


kadet1090 commented Sep 5, 2023

Self-Hosted Version

23.7.2

CPU Architecture

x86_64

Docker Version

Not applicable (k8s cluster)

Docker Compose Version

Not applicable (k8s cluster)

Steps to Reproduce

Yes, I know I have an unsupported setup. I'm not asking for my particular case to be solved, but rather for pointers on where to begin troubleshooting, which should be useful to anyone with a similar problem.

I do not know how to reproduce this; it is probably a very specific issue, so I'm looking for guidance on where to look. I'd gladly attach any logs needed, I just need to know which container I should be looking at :)

Expected Result

Stats are counted correctly for all event types.

Actual Result

Everything in my setup works fine except for the stats: only errors are collected and counted correctly. Other metrics, such as transactions and profiles, are not counted and always display as 0. Transactions are visible in "Performance" and can be seen in dashboards and "Discover"; the stats page is the only place that says I have no transactions.

As far as I remember, it stopped working after I updated to 23.7.0; before that it was fine.

So the question is: how are stats counted, so that I can pinpoint the issue? I'll describe the solution here for anyone who runs into this later.

[screenshots of the stats page omitted]

Event ID

No response


zetaab commented Sep 6, 2023

We are currently facing the same issue. Anyway, the organization stats API endpoint is https://github.com/getsentry/sentry/blob/master/src/sentry/api/endpoints/organization_stats_v2.py#L134 and it queries "outcomes".

Also, https://github.com/getsentry/snuba/blob/master/docs/source/architecture/overview.rst#sessions-and-outcomes says:

while Outcomes mainly provide data to the Sentry stats page.
Both pipelines have their own Kafka topic, Kafka consumer and they write on their own table in Clickhouse.

Going into ClickHouse and checking the stats:

select * from outcomes_hourly_dist order by timestamp desc limit 1000;
select * from outcomes_raw_dist order by timestamp desc limit 1000;

The results clearly show that stats are not flowing in for some reason.
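To get a quicker signal than eyeballing raw rows, the same check can be done as an aggregate. This is only a sketch: it assumes the default table names from above and that `clickhouse-client` can reach the ClickHouse service (the host name `clickhouse` is an example from this thread's logs):

```shell
# Sum outcomes per category/outcome for the last 24h; a healthy install
# should show non-zero totals for more categories than just errors
# (e.g. category 2 = transactions, per relay's DataCategory enum).
clickhouse-client --host clickhouse --query "
  SELECT category, outcome, sum(quantity) AS total
  FROM default.outcomes_hourly_dist
  WHERE timestamp > now() - INTERVAL 1 DAY
  GROUP BY category, outcome
  ORDER BY category, outcome"
```

Categories that never appear in this output never reached ClickHouse, which points at the write path rather than the stats page.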


zetaab commented Sep 6, 2023

Checking the outcomes topic in Kafka. First, start a container that has the Kafka tools:

kubectl run kafka --image=quay.io/strimzi/kafka:0.34.0-kafka-3.4.0 -i --tty --rm -- sh

Then get the max offset (and partitions) for the topic:
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list <broker>:9092 --topic outcomes

In my case the result is:

sh-4.4$ bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list <broker>:9092 --topic outcomes
outcomes:0:7409576

so the max offset is 7409576

If I check the content using bin/kafka-console-consumer.sh --bootstrap-server <broker>:9092 --topic outcomes --offset 7409576 --partition 0, I can see that newer content is arriving.
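An alternative to comparing offsets by hand is to ask Kafka for the consumer group lag directly. A sketch, assuming Snuba's default consumer group name `snuba-consumers` (verify the actual name with `--list` first):

```shell
# Show all consumer groups known to the broker.
bin/kafka-consumer-groups.sh --bootstrap-server <broker>:9092 --list

# Describe the group; the LAG column for the "outcomes" topic is the
# number of messages the consumer is behind the log end offset.
bin/kafka-consumer-groups.sh --bootstrap-server <broker>:9092 \
  --describe --group snuba-consumers
```

A large and growing LAG on `outcomes` confirms the consumer is stuck rather than the producers being silent.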

So what is wrong here? Let's check the logs from snuba-outcomes-consumer:

% kubectl logs sentry-snuba-outcomes-consumer-85d596c668-r4875
2023-09-06 05:11:10,714 Initializing Snuba...
2023-09-06 05:11:26,316 Snuba initialization took 15.598081367090344s
2023-09-06 05:11:26,329 Consumer Starting
2023-09-06 05:11:26,330 librdkafka log level: 6
2023-09-06 05:11:27,261 New partitions assigned: {Partition(topic=Topic(name='outcomes'), index=0): 7236764}
2023-09-06 05:31:53,612 Partitions to revoke: [Partition(topic=Topic(name='outcomes'), index=0)]
2023-09-06 05:31:53,613 Closing <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fc1f54ce2b0>...
2023-09-06 05:31:53,615 Waiting for <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fc1f54ce2b0> to exit...
2023-09-06 05:31:58,616 Timed out with 2 futures in queue
2023-09-06 05:31:58,617 <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fc1f54ce2b0> exited successfully, releasing assignment.
2023-09-06 05:31:58,618 Partition revocation complete.
2023-09-06 05:31:58,625 New partitions assigned: {Partition(topic=Topic(name='outcomes'), index=0): 7247390}
2023-09-06 05:32:14,649 Partitions to revoke: [Partition(topic=Topic(name='outcomes'), index=0)]
2023-09-06 05:32:14,650 Closing <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fc1fba0b310>...
2023-09-06 05:32:14,650 Waiting for <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fc1fba0b310> to exit...
2023-09-06 05:32:19,710 Timed out with 2 futures in queue
2023-09-06 05:32:19,718 <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fc1fba0b310> exited successfully, releasing assignment.
2023-09-06 05:32:19,719 Partition revocation complete.
2023-09-06 05:32:19,727 New partitions assigned: {Partition(topic=Topic(name='outcomes'), index=0): 7247519}

As we can clearly see, snuba-outcomes-consumer is at offset 7247519, which is about 160,000 messages behind the current max offset. So for some reason snuba-outcomes-consumer is stuck.

When starting the outcomes-consumer component with --log-level DEBUG, I can see the following:

2023-09-06 05:49:39,042 Caught MessageRejected() while submitting BrokerValue(_BrokerValue__payload=KafkaPayload(key=None, value=b'{"timestamp":"2023-09-05T15:32:49.000000Z","org_id":5,"project_id":89,"key_id":93,"outcome":0,"reason":null,"event_id":"9fa05181447e4b088a29abc2bd045e8c","category":9,"quantity":1}', headers=[]), partition=Partition(topic=Topic(name='outcomes'), index=0), offset=7256468, timestamp=datetime.datetime(2023, 9, 5, 18, 53, 45, 417000)), pausing consumer...

So for some reason the data is being rejected (Kafka disk usage is currently at 88%, which might be one reason).
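To check whether the broker really is under disk pressure, one can look at the usage of the Kafka data volume. The pod name and mount path below are examples and depend on the chart in use:

```shell
# Disk usage of the Kafka log directory inside the broker pod.
kubectl exec sentry-kafka-0 -- df -h /var/lib/kafka
```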


zetaab commented Sep 6, 2023

Now we have more disk in Kafka. It looks like the outcomes consumer has started working through the backlog, and ClickHouse is receiving new rows.

@kadet1090 (Author)

I've checked these services in my setup and I don't see any lag; the offsets also seem fine, but there are still no stats. So our problems are probably different in nature. Thanks for the starting point, I'll look into it, and if I find anything useful I'll add another comment.

@hubertdeng123 (Member)

I'm not too familiar with outcomes myself, but I can redirect this to the particular team that may have more information.


getsantry bot commented Sep 6, 2023

Assigning to @getsentry/support for routing ⏲️

hubertdeng123 transferred this issue from getsentry/self-hosted Sep 6, 2023

getsantry bot commented Sep 6, 2023

Routing to @getsentry/product-owners-stats for triage ⏲️

@Dhrumil-Sentry

@hubertdeng123 This is something likely to do with how self-hosted sentry is deployed - we can add it to the backlog for an investigation later


hubertdeng123 commented Sep 7, 2023

Thanks @Dhrumil-Sentry! I agree; they're using an unsupported setup, so something is likely going wrong with how it is deployed. They'd like to know more of the technical details around what may be going wrong. It looks like the stats page pulls directly from Snuba, so if there is no information there, that's the main issue.

@hubertdeng123 (Member)

Just to confirm, @zetaab: are you using self-hosted with Docker Compose? And is Stats working fine for you now?


zetaab commented Sep 7, 2023

We are using Kubernetes: https://github.com/sentry-kubernetes/charts/tree/develop/sentry

And stats are working.


kadet1090 commented Sep 7, 2023

My setup is based on the official docker-compose.yaml, which I simply "translated" into k8s manifests (I also looked at the chart linked above, but it had more infrastructure than my org requires). The things that actually differ are storage and networking, but neither is likely the cause, since everything else works. What is very strange to me is that in my setup Errors are counted correctly, and Replays now also seem to work.

As for the logs:

  • kafka: https://bin.kadet.net/KrpR
  • snuba-outcomes-consumer (the same lines repeated over and over; no visible errors or warnings)
     2023-09-07 17:20:26,227 http://clickhouse:8123 "POST /?load_balancing=in_order&insert_distributed_sync=1&query=INSERT+INTO+default.outcomes_raw_dist++FORMAT+JSONEachRow HTTP/1.1" 200 None
     2023-09-07 17:20:26,227 Received response for <HTTPWriteBatch: [] rows (522 bytes)>.
     2023-09-07 17:20:26,228 Waited 0.1904 seconds for 3 rows to be written to <HTTPBatchWriter: default.outcomes_raw_dist on clickhouse:8123>.
     2023-09-07 17:20:27,157 Waited 0.0029 seconds for offsets to be committed to <arroyo.backends.kafka.consumer.KafkaConsumer object at 0x7f5d7cf9aee0>.
     2023-09-07 17:20:28,159 Finished sending data from <HTTPWriteBatch: [] rows (348 bytes)>.
     2023-09-07 17:20:28,399 http://clickhouse:8123 "POST /?load_balancing=in_order&insert_distributed_sync=1&query=INSERT+INTO+default.outcomes_raw_dist++FORMAT+JSONEachRow HTTP/1.1" 200 None
     2023-09-07 17:20:28,400 Received response for <HTTPWriteBatch: [] rows (348 bytes)>.
     2023-09-07 17:20:28,400 Waited 0.2426 seconds for 2 rows to be written to <HTTPBatchWriter: default.outcomes_raw_dist on clickhouse:8123>.
     2023-09-07 17:20:29,160 Waited 0.0019 seconds for offsets to be committed to <arroyo.backends.kafka.consumer.KafkaConsumer object at 0x7f5d7cf9aee0>.
    
  • snuba migrations list https://bin.kadet.net/R6zq

@kadet1090
Copy link
Author

Okay, to iterate further: I do have data in the outcomes_raw_dist table. I looked up the category values in https://github.com/getsentry/relay/blob/509fcdddf93132afd1250b83e375dceee038e4db/relay-base-schema/src/data_category.rs#L15

sentry-clickhouse :) SELECT * FROM default.outcomes_raw_dist WHERE category = 2 ORDER BY timestamp desc LIMIT 10;

SELECT *
FROM default.outcomes_raw_dist
WHERE category = 2
ORDER BY timestamp DESC
LIMIT 10

Query id: 5c4bd723-95eb-4ada-820b-a9aefd286619

┌─org_id─┬─project_id─┬─key_id─┬───────────timestamp─┬─outcome─┬─reason──────┬─event_id─┬─quantity─┬─category─┬─size─┐
│      1 │          1 │      1 │ 2023-09-07 17:33:00 │       5 │ sample_rate │ ᴺᵁᴸᴸ     │        5 │        2 │    0 │
└────────┴────────────┴────────┴─────────────────────┴─────────┴─────────────┴──────────┴──────────┴──────────┴──────┘
┌─org_id─┬─project_id─┬─key_id─┬───────────timestamp─┬─outcome─┬─reason──────┬─event_id─┬─quantity─┬─category─┬─size─┐
│      1 │          4 │      4 │ 2023-09-07 10:59:00 │       5 │ sample_rate │ ᴺᵁᴸᴸ     │        1 │        2 │    0 │
│      1 │          1 │      1 │ 2023-09-07 10:56:00 │       5 │ sample_rate │ ᴺᵁᴸᴸ     │        3 │        2 │    0 │
│      1 │          4 │      4 │ 2023-09-07 10:54:00 │       5 │ sample_rate │ ᴺᵁᴸᴸ     │        4 │        2 │    0 │
│      1 │          4 │      4 │ 2023-09-07 10:53:00 │       5 │ sample_rate │ ᴺᵁᴸᴸ     │        1 │        2 │    0 │
│      1 │          4 │      4 │ 2023-09-07 10:52:00 │       5 │ sample_rate │ ᴺᵁᴸᴸ     │        2 │        2 │    0 │
│      1 │          4 │      4 │ 2023-09-07 10:51:00 │       5 │ sample_rate │ ᴺᵁᴸᴸ     │        1 │        2 │    0 │
│      1 │          1 │      1 │ 2023-09-07 05:22:00 │       5 │ sample_rate │ ᴺᵁᴸᴸ     │        3 │        2 │    0 │
│      1 │          1 │      1 │ 2023-09-06 07:48:00 │       5 │ sample_rate │ ᴺᵁᴸᴸ     │        2 │        2 │    0 │
│      1 │          1 │      1 │ 2023-09-06 07:46:00 │       5 │ sample_rate │ ᴺᵁᴸᴸ     │        2 │        2 │    0 │
└────────┴────────────┴────────┴─────────────────────┴─────────┴─────────────┴──────────┴──────────┴──────────┴──────┘

10 rows in set. Elapsed: 0.028 sec. Processed 39.25 thousand rows, 2.24 MB (1.42 million rows/s., 81.06 MB/s.)
sentry-clickhouse :) SELECT * FROM default.outcomes_raw_dist WHERE category = 9 ORDER BY timestamp desc LIMIT 10;

SELECT *
FROM default.outcomes_raw_dist
WHERE category = 9
ORDER BY timestamp DESC
LIMIT 10

Query id: 6c6c145c-6c66-4b26-8e87-426a4c4b934b

┌─org_id─┬─project_id─┬─key_id─┬───────────timestamp─┬─outcome─┬─reason─┬─event_id─────────────────────────────┬─quantity─┬─category─┬─size─┐
│      1 │          4 │      4 │ 2023-09-07 17:40:29 │       0 │ ᴺᵁᴸᴸ   │ 562a3984-8051-443f-ad1e-a3080e21a824 │        1 │        9 │    0 │
└────────┴────────────┴────────┴─────────────────────┴─────────┴────────┴──────────────────────────────────────┴──────────┴──────────┴──────┘
┌─org_id─┬─project_id─┬─key_id─┬───────────timestamp─┬─outcome─┬─reason─┬─event_id─────────────────────────────┬─quantity─┬─category─┬─size─┐
│      1 │          2 │      2 │ 2023-09-07 17:40:26 │       0 │ ᴺᵁᴸᴸ   │ 33e8d3ac-15ea-4d4a-a772-cf558d39078b │        1 │        9 │    0 │
└────────┴────────────┴────────┴─────────────────────┴─────────┴────────┴──────────────────────────────────────┴──────────┴──────────┴──────┘
┌─org_id─┬─project_id─┬─key_id─┬───────────timestamp─┬─outcome─┬─reason─┬─event_id─────────────────────────────┬─quantity─┬─category─┬─size─┐
│      1 │          4 │      4 │ 2023-09-07 17:19:09 │       0 │ ᴺᵁᴸᴸ   │ f8f3404a-7c98-482b-bfcb-a4a312477f73 │        1 │        9 │    0 │
│      1 │          4 │      4 │ 2023-09-07 17:16:15 │       0 │ ᴺᵁᴸᴸ   │ be9a5f06-094e-4bab-86cf-5574eca398dc │        1 │        9 │    0 │
│      1 │          2 │      2 │ 2023-09-07 17:16:13 │       0 │ ᴺᵁᴸᴸ   │ e455fbf5-335c-487d-83a8-ebf83581d2d5 │        1 │        9 │    0 │
│      1 │          4 │      4 │ 2023-09-07 17:09:16 │       0 │ ᴺᵁᴸᴸ   │ 997e0c2d-da72-4d29-916a-e6685f8b7d9a │        1 │        9 │    0 │
│      1 │          4 │      4 │ 2023-09-07 17:02:16 │       0 │ ᴺᵁᴸᴸ   │ d4527801-ad16-45dc-9b3e-c7dbcf197c83 │        1 │        9 │    0 │
│      1 │          4 │      4 │ 2023-09-07 16:54:01 │       0 │ ᴺᵁᴸᴸ   │ a633a3ac-10f6-4954-abb8-56e27cf25aa8 │        1 │        9 │    0 │
│      1 │          4 │      4 │ 2023-09-07 16:51:05 │       0 │ ᴺᵁᴸᴸ   │ 886afd06-767d-4e29-acdc-9772b9e092b2 │        1 │        9 │    0 │
│      1 │          2 │      2 │ 2023-09-07 16:47:47 │       0 │ ᴺᵁᴸᴸ   │ 70b28e06-0904-4547-99c9-93fc44b7c9d2 │        1 │        9 │    0 │
└────────┴────────────┴────────┴─────────────────────┴─────────┴────────┴──────────────────────────────────────┴──────────┴──────────┴──────┘

Data is also available in _hourly variant:

sentry-clickhouse :) SELECT * FROM default.outcomes_hourly_dist WHERE category = 9 ORDER BY timestamp desc LIMIT 20;

SELECT *
FROM default.outcomes_hourly_dist
WHERE category = 9
ORDER BY timestamp DESC
LIMIT 20

Query id: 020e4859-8a9e-4fcb-959d-512f0d2ea43a

┌─org_id─┬─project_id─┬─key_id─┬───────────timestamp─┬─outcome─┬─reason─┬─times_seen─┬─quantity─┬─category─┬─bytes_received─┐
│      1 │          2 │      2 │ 2023-09-07 17:00:00 │       0 │ none   │          2 │        2 │        9 │              0 │
│      1 │          4 │      4 │ 2023-09-07 17:00:00 │       0 │ none   │          7 │        7 │        9 │              0 │
│      1 │          2 │      2 │ 2023-09-07 16:00:00 │       0 │ none   │          8 │        8 │        9 │              0 │
│      1 │          4 │      4 │ 2023-09-07 16:00:00 │       0 │ none   │          8 │        8 │        9 │              0 │
│      1 │          2 │      2 │ 2023-09-07 15:00:00 │       0 │ none   │          6 │        6 │        9 │              0 │
│      1 │          4 │      4 │ 2023-09-07 15:00:00 │       0 │ none   │          3 │        3 │        9 │              0 │
│      1 │          2 │      2 │ 2023-09-07 14:00:00 │       0 │ none   │         12 │       12 │        9 │              0 │
│      1 │          4 │      4 │ 2023-09-07 14:00:00 │       0 │ none   │          7 │        7 │        9 │              0 │
│      1 │          4 │      4 │ 2023-09-07 13:00:00 │       0 │ none   │         11 │       11 │        9 │              0 │
│      1 │          2 │      2 │ 2023-09-07 13:00:00 │       0 │ none   │         20 │       20 │        9 │              0 │
│      1 │         11 │     11 │ 2023-09-07 12:00:00 │       0 │ none   │          1 │        1 │        9 │              0 │
│      1 │          2 │      2 │ 2023-09-07 12:00:00 │       0 │ none   │         30 │       30 │        9 │              0 │
│      1 │          4 │      4 │ 2023-09-07 12:00:00 │       0 │ none   │          9 │        9 │        9 │              0 │
│      1 │          2 │      2 │ 2023-09-07 11:00:00 │       0 │ none   │         17 │       17 │        9 │              0 │
│      1 │          4 │      4 │ 2023-09-07 11:00:00 │       0 │ none   │         10 │       10 │        9 │              0 │
│      1 │          2 │      2 │ 2023-09-07 10:00:00 │       0 │ none   │         46 │       46 │        9 │              0 │
│      1 │          4 │      4 │ 2023-09-07 10:00:00 │       0 │ none   │         20 │       20 │        9 │              0 │
│      1 │          2 │      2 │ 2023-09-07 09:00:00 │       0 │ none   │         27 │       27 │        9 │              0 │
│      1 │          4 │      4 │ 2023-09-07 09:00:00 │       0 │ none   │         16 │       16 │        9 │              0 │
│      1 │         11 │     11 │ 2023-09-07 08:00:00 │       0 │ none   │          1 │        1 │        9 │              0 │
└────────┴────────────┴────────┴─────────────────────┴─────────┴────────┴────────────┴──────────┴──────────┴────────────────┘

20 rows in set. Elapsed: 0.022 sec. Processed 2.31 thousand rows, 127.62 KB (107.03 thousand rows/s., 5.92 MB/s.)

So... I'm puzzled. The problem must be on the reading side, but where could it be?
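One way to narrow down the reading side would be to hit the same API the stats page uses and compare the response against the ClickHouse rows above. A sketch against the organization_stats_v2 endpoint linked earlier in the thread; the host, org slug, and token are placeholders, and the query parameters (`field`, `groupBy`, `statsPeriod`) are the ones documented for that endpoint in the Sentry API docs:

```shell
# Ask the org stats endpoint for quantities grouped by category; if ClickHouse
# has the rows but this returns zeros, the problem is in the query path.
curl -s -H "Authorization: Bearer <auth-token>" \
  "https://<sentry-host>/api/0/organizations/<org-slug>/stats_v2/?field=sum(quantity)&groupBy=category&statsPeriod=24h"
```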
