Add spans data schema #153
Conversation
versions in use: The following repositories use one of the schemas you are editing. It is recommended to roll out schema changes in small PRs: if the versions in use lag behind the latest, it is probably best to update those services before rolling out your change.

latest version: 0.1.12

benign changes: schemas/transactions.v1.schema.json

✅ This PR should be safe to roll out to consumers first. Make sure to bump ... then in the other repos:

Take a look at the README for how to release a new version of sentry-kafka-schemas.
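The "consumers first" rollout order above can be sketched as a version guard in consumer code. This is a hypothetical illustration, not part of sentry-kafka-schemas: `MIN_SCHEMA_VERSION` and `should_validate` are invented names, and the only source fact used is the latest version `0.1.12`.

```python
# Hypothetical sketch: only enable strict validation once the deployed
# schema package is at least the version that ships the new schema.
def parse_version(v: str) -> tuple:
    """Turn '0.1.12' into (0, 1, 12) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

MIN_SCHEMA_VERSION = "0.1.12"  # assumed: first version containing the change

def should_validate(installed_version: str) -> bool:
    """True if this consumer's schema package is new enough to validate."""
    return parse_version(installed_version) >= parse_version(MIN_SCHEMA_VERSION)

print(should_validate("0.1.12"))  # True
print(should_validate("0.1.11"))  # False
```

With a guard like this, producers can start emitting the new shape only after every consumer group reports a version that passes the check.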
Is there a reason why all the spans/starfish experiments have to stay on the transactions topic, instead of creating a separate topic for the new pipeline? With all the schema changes going on, I think it's much safer to keep it separate. These additional validations are not required on the transactions consumer.
(branch force-pushed from cc3906e to 9a986be)
@lynnagara eventually we will have spans on a new topic, and ultimately spans will replace transactions. But right now we decided to leave it there as we iterate. My understanding is starfish wants to test things and move fast. The issue is that we are stuffing a lot of things in the spans data bag and don't have any checking on it. I could create a separate schema for
We have had a similar problem already. I think long-term we might want to have specialized schemas per consumer group.
The schemas are not changing. We are just parsing more/less data out of the same payloads. What is more concerning to me is that we are trying to make sense of this data while there is absolutely zero validation for spans data in Relay. The next buggy SDK could take down a spans consumer that tries to assume anything about the payloads.
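The failure mode described above (a buggy SDK crashing a consumer that assumes a payload shape) is usually mitigated with defensive parsing. A minimal sketch, assuming illustrative field names (`span_id`, `duration_ms` are not taken from the actual payload):

```python
import json

def is_valid_span(payload) -> bool:
    """Minimal structural check: a malformed message should be dropped,
    not crash the consumer."""
    if not isinstance(payload, dict):
        return False
    if not isinstance(payload.get("span_id"), str):
        return False
    if not isinstance(payload.get("duration_ms"), (int, float)):
        return False
    return True

def process_message(raw: bytes):
    """Parse and validate one Kafka message; return None (drop) on any
    malformed input instead of raising."""
    try:
        payload = json.loads(raw)
    except (ValueError, UnicodeDecodeError):
        return None
    return payload if is_valid_span(payload) else None

print(process_message(b'{"span_id": "abc", "duration_ms": 12.5}'))
print(process_message(b'not json'))  # None
```

Dropped messages would normally also be counted in a metric so a sudden spike from a broken SDK is visible.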
My concern was coming from a place mostly unrelated to this schema change. I'm just not sure it's a good idea to build and iterate fast on things inside the existing pipeline just because it's "faster". We are adding risk into our existing transactions pipeline. Though if we definitely want a separate topic and schema eventually anyway, should we just go ahead and create it now?
Yes, multiple schemas per topic would solve this. There's not really a risk of one consumer group impacting another if they do not share a schema.
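The "multiple schemas per topic" idea could look roughly like this: each consumer group picks its own validator, so a schema change for one group cannot break another. The registry dict, group names, and field checks here are all hypothetical:

```python
# Hypothetical sketch: one topic, a different validator per consumer group.
SCHEMA_BY_GROUP = {
    "transactions-consumer": lambda msg: "event_id" in msg,
    "spans-consumer": lambda msg: "span_id" in msg,
}

def validate_for_group(group: str, msg: dict) -> bool:
    """Validate a message only against the schema its consumer group owns."""
    validator = SCHEMA_BY_GROUP.get(group)
    if validator is None:
        return True  # unknown group: no schema enforced in this sketch
    return validator(msg)

print(validate_for_group("spans-consumer", {"span_id": "abc"}))  # True
```

In a real registry the lambdas would be full JSON Schema documents, but the dispatch-by-group shape is the point.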
@untitaker In this case, I don't think they should even be sharing a topic at all though. My understanding is that's a shortcut we are taking right now. |
Currently it's only sentry-kafka-schemas that makes this dangerous. I don't think there's any other reason to split out topics right now.
It isn't sentry-kafka-schemas that makes it dangerous though, as we are currently not enforcing the schema. I'm actually more worried about the changes we are doing in consumers and producers, and generally about changing the shape of messages. Anyway, most of these concerns are unrelated to this change. This particular change LGTM so will approve.
This reverts commit d288a50. This was added because spans were part of transactions in the initial implementation. It is a separate topic now. This schema seems to be causing issues in prod since not all transaction events actually conform to it.
Add spans data schema. This can help us validate messages in the spans data bag.
Also:
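To illustrate what validating the spans data bag might look like, here is a hedged sketch. The keys shown inside `data` (e.g. `db.system`) are illustrative, not taken from the actual schema; the sketch only assumes, per the discussion above, that `data` is a loosely typed bag that needs checking:

```python
def validate_span_data(span: dict) -> list:
    """Collect human-readable errors for the span 'data' bag instead of
    raising, so one bad field does not abort processing."""
    errors = []
    data = span.get("data")
    if data is None:
        return errors  # the data bag is treated as optional in this sketch
    if not isinstance(data, dict):
        return ["data: expected an object"]
    for key, value in data.items():
        if not isinstance(key, str):
            errors.append(f"data key {key!r}: expected a string key")
        if not isinstance(value, (str, int, float, bool, type(None))):
            errors.append(f"data[{key!r}]: expected a scalar value")
    return errors

print(validate_span_data({"data": {"db.system": "postgresql"}}))  # []
print(validate_span_data({"data": {"nested": {"a": 1}}}))  # one error
```

The real PR adds a JSON Schema for this instead of hand-written checks, but the intent is the same: reject surprising shapes in the data bag before consumers rely on them.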