Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add spans data schema #153

Merged
merged 4 commits into from
Jun 23, 2023
Merged

Add spans data schema #153

merged 4 commits into from
Jun 23, 2023

Conversation

dbanda
Copy link
Contributor

@dbanda dbanda commented Jun 22, 2023

Add spans data schema. This can help us validate messages in the spans data bag.

Also:

  • Fixed typo in the readme
  • Add title to the top level transaction message

@dbanda dbanda requested a review from a team as a code owner June 22, 2023 22:30
@github-actions
Copy link

github-actions bot commented Jun 22, 2023

versions in use:

The following repositories use one of the schemas you are editing. It is recommended to roll out schema changes in small PRs, meaning that if those used versions lag behind the latest, it is probably best to update those services before rolling out your change.

  • getsentry/sentry: pip:sentry-kafka-schemas==0.1.12
  • getsentry/snuba: pip:sentry-kafka-schemas==0.1.12

latest version: 0.1.12

benign changes

schemas/transactions.v1.schema.json

  • {"path": ".2.data.spans.?.data", "change": {"TypeAdd": {"added": "string"}}}
    
  • {"path": ".2.data.spans.?.data", "change": {"TypeAdd": {"added": "number"}}}
    
  • {"path": ".2.data.spans.?.data", "change": {"TypeAdd": {"added": "integer"}}}
    
  • {"path": ".2.data.spans.?.data", "change": {"TypeAdd": {"added": "array"}}}
    
  • {"path": ".2.data.spans.?.data", "change": {"TypeAdd": {"added": "boolean"}}}
    

✅ This PR should be safe to roll out to consumers first. Make sure to bump
the library in the following repos first:

getsentry/snuba

...then in the other repos:

getsentry/sentry

Take a look at the README for how to release a new version of sentry-kafka-schemas.

@lynnagara
Copy link
Member

lynnagara commented Jun 22, 2023

Is there a reason why all the spans/starfish experiments has to be on the transactions topic still, and we are not creating a separate one for the new pipeline? With all the schema changes that are going on, I think it's much safer to keep it separate.

These additional validations are not required on the transactions consumer.

@dbanda
Copy link
Contributor Author

dbanda commented Jun 22, 2023

Is there a reason why all the spans/starfish experiments has to be on the transactions topic still

@lynnagara eventually we will have spans on a new topic and ultimately spans will replace transactions. But right now, we decided to leave it there as we iterate. My understanding is starfish wants to test things and move fast. The issue is that we are stuffing a lot of things in the spans data bag and don't have any checking on it. I could create a separate schema for SpansData, but that would have to mean adding some customized code within the spans consumer to check the data field. Either way if fine with me, but I would like to have a schema defined somewhere so that we have a unified source of truth and catch typing bugs.

@untitaker
Copy link
Member

We have had a similar problem already. I think long-term we might want to have specialized schemas per consumer group.

With all the schema changes that are going on, I think it's much safer to keep it separate.

The schemas are not changing. We are just parsing more/less data out of the same payloads.

What is more concerning to me is that we are trying to make sense of this data while there is absolutely zero validation for spans data in Relay. The next buggy SDK could take down a spans consumer that tries to assume anything about the payloads.

@lynnagara
Copy link
Member

lynnagara commented Jun 22, 2023

My concern was coming from a place mostly unrelated to this schema change. I'm just not sure it's a good idea to build and iterate fast on things inside the existing pipeline just because it's "faster". We are adding risk into our existing transactions pipeline. Though if we definitely want a separate topic and schema eventually anyway, should we just go ahead and create it now?

@untitaker
Copy link
Member

We are adding risk into our existing transactions pipeline

yes, multiple schemas per topic would solve this. there's not really a risk of one consumer group impacting another if they do not share a schema

@lynnagara
Copy link
Member

@untitaker In this case, I don't think they should even be sharing a topic at all though. My understanding is that's a shortcut we are taking right now.

@untitaker
Copy link
Member

Currently it's only sentry-kafka-schemas that makes this dangerous. I don't think there's any other reason to split out topics right now

@lynnagara
Copy link
Member

lynnagara commented Jun 22, 2023

It isn't sentry-kafka-schemas that makes it dangerous though, as we are currently not enforcing the schema. I'm actually more worried about the changes we are doing in consumers and producers and generally changing the shape of messages.

Anyway, most of these concerns are unrelated with this change. This particular change LGTM so will approve.

@dbanda dbanda merged commit d288a50 into main Jun 23, 2023
15 checks passed
@dbanda dbanda deleted the dbanda/spans-schema branch June 23, 2023 18:05
lynnagara added a commit that referenced this pull request Sep 18, 2023
This reverts commit d288a50.

This was added because spans was part of transactions in the initial
implementation. It is a separate topic now.

This schema seems to be causing issues in prod since not all transactions events
actually conform to it.
lynnagara added a commit that referenced this pull request Sep 26, 2023
This reverts commit d288a50.

This was added because spans was part of transactions in the initial
implementation. It is a separate topic now.

This schema seems to be causing issues in prod since not all transactions events
actually conform to it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants