Support Avro Message Parser #41

Closed · ghost opened this issue Nov 5, 2014 · 4 comments

Comments

ghost commented Nov 5, 2014

Secor only supports Thrift and JSON messages out of the box right now. Secor should provide a configurable Avro message parser that uses Avro's GenericRecord and a configurable timestamp field.
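
A rough sketch of what the core of such a parser could look like, using only Avro's public API. The class name, constructor, and `extractTimestampMillis` method below are hypothetical illustrations, not Secor's actual parser interface:

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

// Deserializes a raw Avro binary payload with GenericRecord and pulls
// a timestamp out of a field whose name comes from configuration.
public class AvroTimestampExtractor {
    private final GenericDatumReader<GenericRecord> reader;
    private final String timestampField; // hypothetical config value, e.g. "timestamp"

    public AvroTimestampExtractor(Schema schema, String timestampField) {
        this.reader = new GenericDatumReader<GenericRecord>(schema);
        this.timestampField = timestampField;
    }

    public long extractTimestampMillis(byte[] payload) throws IOException {
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
        GenericRecord record = reader.read(null, decoder);
        Object value = record.get(timestampField);
        if (value == null) {
            throw new IllegalStateException("missing timestamp field: " + timestampField);
        }
        return ((Number) value).longValue(); // assumes an epoch-millis long field
    }
}
```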

@lefthandmagic (Contributor)

I guess you can write a custom parser and override the configs to use it. If it's reusable, maybe also submit a PR so it can be included in the default parsers Secor ships with.

ghost commented Nov 17, 2014

I've looked into building a custom parser using Avro's GenericRecord. We need a schema repository that associates Kafka topics with Avro schemas so that a single record parser can deserialize the records. Camus uses the same concept, and Avro has a very active ticket for implementing the Schema Repo: https://issues.apache.org/jira/browse/AVRO-1124

To get something up and running now, I think the best approach would be to introduce an interface for the Schema Repo and have an implementation set up by local configuration. I'll work on a PR for this.
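
A minimal sketch of what that interface might look like (the name `SchemaRepository` is hypothetical):

```java
import org.apache.avro.Schema;

// Hypothetical interface: resolves the Avro writer schema for a Kafka topic.
public interface SchemaRepository {
    Schema getSchema(String topic);
}
```

A configuration-backed implementation could then sit behind this interface until something like AVRO-1124 provides a real remote repository.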

@silasdavis

I am looking at exactly this problem and have run Camus. I would like to flush to S3 on a size-based upload policy and also partition based on properties of the (Avro) message, which it looks like Secor allows and Camus does not.

In our case we have a static in-memory repository of Avro schemas indexed by an 8-bit schema fingerprint. We use this fingerprint as the first 8 bits of the Kafka message and can use it to look up the schema to decode the rest of the message. Since schema creation and migration are linked to our source-control process, this works for us. Clearly a dynamically updatable schema repository is necessary if you want to aggregate arbitrary Avro messages without redeploying code.
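
For illustration, a minimal sketch of that lookup. The class and method names are invented, and the 8-bit prefix is taken literally as a single lead byte (a 64-bit fingerprint would read 8 bytes instead):

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

// Looks up the writer schema from a fingerprint carried in the first
// byte of the Kafka message, then decodes the remainder as Avro binary.
public class FingerprintDecoder {
    private final Map<Byte, Schema> schemasByFingerprint = new HashMap<Byte, Schema>();

    public void register(byte fingerprint, Schema schema) {
        schemasByFingerprint.put(fingerprint, schema);
    }

    public GenericRecord decode(byte[] message) throws IOException {
        byte fingerprint = message[0]; // the 8-bit prefix described above
        Schema schema = schemasByFingerprint.get(fingerprint);
        if (schema == null) {
            throw new IllegalStateException("unknown schema fingerprint: " + fingerprint);
        }
        GenericDatumReader<GenericRecord> reader =
                new GenericDatumReader<GenericRecord>(schema);
        return reader.read(null, DecoderFactory.get()
                .binaryDecoder(message, 1, message.length - 1, null));
    }
}
```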

@upio I am about to look into implementing an Avro message parser for Secor. If you get in touch soon, we may be able to avoid duplicating effort.

ghost commented Nov 19, 2014

@silasdavis It sounds like you're further along with this than I am. I haven't used Camus, only looked at how they use the schema repositories. So far all I've done is hook up a basic "repository" that uses a hard-coded configuration to associate a Kafka topic with a schema. Basically an interface and a HashMap ;) I'd be interested in hearing what you've learned from using Camus.
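
For what it's worth, that "interface and a HashMap" approach might look roughly like this, implementing the hypothetical `SchemaRepository` interface sketched above:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.avro.Schema;

// The "interface and a HashMap" repository: topics mapped to schemas
// parsed from hard-coded configuration at startup.
public class HardCodedSchemaRepository implements SchemaRepository {
    private final Map<String, Schema> schemasByTopic = new HashMap<String, Schema>();

    // topicToSchemaJson would come from local configuration, e.g. a properties file.
    public HardCodedSchemaRepository(Map<String, String> topicToSchemaJson) {
        for (Map.Entry<String, String> entry : topicToSchemaJson.entrySet()) {
            schemasByTopic.put(entry.getKey(),
                    new Schema.Parser().parse(entry.getValue()));
        }
    }

    @Override
    public Schema getSchema(String topic) {
        return schemasByTopic.get(topic);
    }
}
```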
