Conversation

@ayirr7 (Contributor) commented May 16, 2025

As I see it, there are 3 options for the API design regarding the message serializer step:

  1. Don't have a separate Serializer() step at all; have it embedded in the sink. The sink determines from the topic what format to serialize the message into (assuming that the sink is a stream sink).

  2. Support a separate Serializer(), but figure out the schema type from the topic name in the streaming pipeline as the user defined it. The user would not need to tell us explicitly whether their data needs to be serialized into JSON or Protobuf, but we would have to figure out how to implement this.

  3. Support a separate Serializer() and have the user explicitly tell us the message schema type.

This PR goes with Option 3 right now because:

  • The pattern is consistent with how we do Parsing: it's a separate step, and the user has the flexibility to plug it in anywhere in the pipeline.
  • Passing in the schema type allows for some flexibility; this schema type does not then have to be the same as the output topic's/sink's expected schema.
  • It's the easiest iteration on the current state of the API (see the sketch below this list).
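
For illustration, here is a rough sketch of what the explicit-schema Serializer amounts to; the names below are placeholders for this description, not the PR's actual API surface:

# Illustrative sketch only: a standalone step that is handed the schema type explicitly (Option 3).
import json
from enum import Enum
from typing import Any


class MessageSchema(Enum):
    JSON = "json"
    PROTOBUF = "protobuf"


class Serializer:
    """Pipeline step that serializes payloads into an explicitly chosen format."""

    def __init__(self, schema_type: MessageSchema) -> None:
        # The user states the schema type up front; it does not have to match
        # whatever the output topic/sink expects.
        self.schema_type = schema_type

    def __call__(self, payload: Any) -> bytes:
        if self.schema_type is MessageSchema.JSON:
            return json.dumps(payload).encode("utf-8")
        if self.schema_type is MessageSchema.PROTOBUF:
            # A generated protobuf message would be serialized with
            # SerializeToString(); elided to keep the sketch dependency-free.
            raise NotImplementedError("protobuf serialization elided")
        raise ValueError(f"Unknown message schema type {self.schema_type}")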

TODO:

  • Add a couple more test cases with slightly more complicated pipelines (e.g. a Map based on the contents of the Protobuf message structure)

@ayirr7 changed the title from "(not ready for review) Protobuf types" to "feat: Add support for Protobuf type in API" on May 19, 2025
@fpacifici (Collaborator) left a comment

I like the design. As long as we have an explicit Parser step, it makes sense to have a Serializer. We can revisit whether to expose those steps later.

Please consider not depending on sentry-protos. It seems you only need it for the examples; you should be able to create some example protobuf messages without it.

types-jsonschema,
"sentry-kafka-schemas>=1.2.0",
"types-protobuf",
"sentry-protos>=0.1.74"
Collaborator:

This is FSL; we cannot import it in an Apache 2.0 library.

Member:

Do I understand it right that this is only used for tests? In that case, why not write your own protobuf schema?

@ayirr7 (Contributor, Author) commented May 20, 2025

I can do that. I just relied on the existing ones for ease of use and didn't know about the license issue.
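
For example, one sentry-protos-free option for the tests is to lean on a well-known type that ships with the protobuf runtime, or to check in a tiny hand-written .proto. A minimal sketch, assuming the protobuf runtime is already installed:

# Test fixture sketch that avoids sentry-protos entirely.
from google.protobuf.struct_pb2 import Struct

msg = Struct()
msg.update({"event_id": "abc123", "count": 3})

wire_bytes = msg.SerializeToString()  # what a serializer step would emit

roundtrip = Struct()
roundtrip.ParseFromString(wire_bytes)
assert roundtrip["event_id"] == "abc123"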

elif schema_type is MessageSchema.JSON:
    return json.dumps(payload).encode("utf-8")
else:
    raise Exception(f"Unknown codec / message schema type {schema_type}")
Collaborator:

I think you can catch this with the type checker using assert_never.
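
Roughly, and assuming MessageSchema only has JSON and PROTOBUF members (assert_never lives in typing on Python 3.11+ and in typing_extensions otherwise):

# Sketch of the assert_never suggestion; the enum mirrors the snippet above.
import json
from enum import Enum

from typing_extensions import assert_never  # typing.assert_never on 3.11+


class MessageSchema(Enum):
    JSON = "json"
    PROTOBUF = "protobuf"


def serialize(payload: dict, schema_type: MessageSchema) -> bytes:
    if schema_type is MessageSchema.PROTOBUF:
        raise NotImplementedError("elided for brevity")
    elif schema_type is MessageSchema.JSON:
        return json.dumps(payload).encode("utf-8")
    else:
        # mypy only narrows schema_type to Never here when every member is
        # handled above, so a newly added member becomes a type error instead
        # of a runtime exception.
        assert_never(schema_type)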
