Add spans data schema #153
Conversation
versions in use: The following repositories use one of the schemas you are editing. It is recommended to roll out schema changes in small PRs: if the versions in use lag behind the latest, it is probably best to update those services before rolling out your change.

latest version: 0.1.12

benign changes: schemas/transactions.v1.schema.json

✅ This PR should be safe to roll out to consumers first. Make sure to bump ... then in the other repos:

Take a look at the README for how to release a new version of sentry-kafka-schemas.
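The "consumers first" rollout order above can be sketched as a version guard in consumer code. This is a hypothetical illustration, not part of sentry-kafka-schemas: `MIN_SCHEMA_VERSION` and `should_validate` are invented names, and the only source fact used is the latest version `0.1.12`.

```python
# Hypothetical sketch: only enable strict validation once the deployed
# schema package is at least the version that ships the new schema.
def parse_version(v: str) -> tuple:
    """Turn '0.1.12' into (0, 1, 12) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

MIN_SCHEMA_VERSION = "0.1.12"  # assumed: first version containing the change

def should_validate(installed_version: str) -> bool:
    """True if this consumer's schema package is new enough to validate."""
    return parse_version(installed_version) >= parse_version(MIN_SCHEMA_VERSION)

print(should_validate("0.1.12"))  # True
print(should_validate("0.1.11"))  # False
```

With a guard like this, producers can start emitting the new shape only after every consumer group reports a version that passes the check.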
Is there a reason why all the spans/starfish experiments have to stay on the transactions topic, instead of creating a separate topic for the new pipeline? With all the schema changes going on, I think it's much safer to keep it separate. These additional validations are not required on the transactions consumer.
(branch force-pushed from cc3906e to 9a986be)
@lynnagara eventually we will have spans on a new topic, and ultimately spans will replace transactions. But right now we decided to leave it there as we iterate. My understanding is starfish wants to test things and move fast. The issue is that we are stuffing a lot of things in the spans data bag and don't have any checking on it. I could create a separate schema for
We have had a similar problem already. I think long-term we might want to have specialized schemas per consumer group.
The schemas are not changing. We are just parsing more/less data out of the same payloads. What is more concerning to me is that we are trying to make sense of this data while there is absolutely zero validation for spans data in Relay. The next buggy SDK could take down a spans consumer that tries to assume anything about the payloads.
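The failure mode described above (a buggy SDK crashing a consumer that assumes a payload shape) is usually mitigated with defensive parsing. A minimal sketch, assuming illustrative field names (`span_id`, `duration_ms` are not taken from the actual payload):

```python
import json

def is_valid_span(payload) -> bool:
    """Minimal structural check: a malformed message should be dropped,
    not crash the consumer."""
    if not isinstance(payload, dict):
        return False
    if not isinstance(payload.get("span_id"), str):
        return False
    if not isinstance(payload.get("duration_ms"), (int, float)):
        return False
    return True

def process_message(raw: bytes):
    """Parse and validate one Kafka message; return None (drop) on any
    malformed input instead of raising."""
    try:
        payload = json.loads(raw)
    except (ValueError, UnicodeDecodeError):
        return None
    return payload if is_valid_span(payload) else None

print(process_message(b'{"span_id": "abc", "duration_ms": 12.5}'))
print(process_message(b'not json'))  # None
```

Dropped messages would normally also be counted in a metric so a sudden spike from a broken SDK is visible.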
My concern was coming from a place mostly unrelated to this schema change. I'm just not sure it's a good idea to build and iterate fast on things inside the existing pipeline just because it's "faster". We are adding risk into our existing transactions pipeline. Though if we definitely want a separate topic and schema eventually anyway, should we just go ahead and create it now?
Yes, multiple schemas per topic would solve this. There's not really a risk of one consumer group impacting another if they do not share a schema.
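The "multiple schemas per topic" idea could look roughly like this: each consumer group picks its own validator, so a schema change for one group cannot break another. The registry dict, group names, and field checks here are all hypothetical:

```python
# Hypothetical sketch: one topic, a different validator per consumer group.
SCHEMA_BY_GROUP = {
    "transactions-consumer": lambda msg: "event_id" in msg,
    "spans-consumer": lambda msg: "span_id" in msg,
}

def validate_for_group(group: str, msg: dict) -> bool:
    """Validate a message only against the schema its consumer group owns."""
    validator = SCHEMA_BY_GROUP.get(group)
    if validator is None:
        return True  # unknown group: no schema enforced in this sketch
    return validator(msg)

print(validate_for_group("spans-consumer", {"span_id": "abc"}))  # True
```

In a real registry the lambdas would be full JSON Schema documents, but the dispatch-by-group shape is the point.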
@untitaker In this case, I don't think they should even be sharing a topic at all though. My understanding is that's a shortcut we are taking right now. |
Currently it's only sentry-kafka-schemas that makes this dangerous. I don't think there's any other reason to split out topics right now.
It isn't sentry-kafka-schemas that makes it dangerous though, as we are currently not enforcing the schema. I'm actually more worried about the changes we are doing in consumers and producers, and generally about changing the shape of messages. Anyway, most of these concerns are unrelated to this change. This particular change LGTM so will approve.
This reverts commit d288a50. This was added because spans were part of transactions in the initial implementation. It is a separate topic now. This schema seems to be causing issues in prod since not all transaction events actually conform to it.
Add spans data schema. This can help us validate messages in the spans data bag.
Also:
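To illustrate what validating the spans data bag might look like, here is a hedged sketch. The keys shown inside `data` (e.g. `db.system`) are illustrative, not taken from the actual schema; the sketch only assumes, per the discussion above, that `data` is a loosely typed bag that needs checking:

```python
def validate_span_data(span: dict) -> list:
    """Collect human-readable errors for the span 'data' bag instead of
    raising, so one bad field does not abort processing."""
    errors = []
    data = span.get("data")
    if data is None:
        return errors  # the data bag is treated as optional in this sketch
    if not isinstance(data, dict):
        return ["data: expected an object"]
    for key, value in data.items():
        if not isinstance(key, str):
            errors.append(f"data key {key!r}: expected a string key")
        if not isinstance(value, (str, int, float, bool, type(None))):
            errors.append(f"data[{key!r}]: expected a scalar value")
    return errors

print(validate_span_data({"data": {"db.system": "postgresql"}}))  # []
print(validate_span_data({"data": {"nested": {"a": 1}}}))  # one error
```

The real PR adds a JSON Schema for this instead of hand-written checks, but the intent is the same: reject surprising shapes in the data bag before consumers rely on them.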