Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data Pipeline Format #232

Closed
crberube opened this issue Mar 7, 2019 · 6 comments · Fixed by #233
Closed

Data Pipeline Format #232

crberube opened this issue Mar 7, 2019 · 6 comments · Fixed by #233

Comments

@crberube
Copy link
Contributor

crberube commented Mar 7, 2019

Right now there are a couple of quirks with the way that result data in formatted through the data pipeline. The biggest issue for us right now is the use of MarshalIdent on the JSON. I see that it was used in the past for some sort of Kafka Connector, but the result leaves us with JSON where each element begins on a new line with a prefix. This breaks a lot of data processing applications that expect JSON elements to be newline delimited.

A second quirk I noticed is that the JSON message frame saves the payload as a string of escaped JSON. I suspect this has something to do with the encryption option for Kafka? However, when things are not encrypted, it would be much cleaner if the all of the JSON was marshaled at once such that the payload field was simply another level of the overall JSON object as opposed to a string containing the JSON data.

I realize that these changes would amount to breaking changes in the data generated by the data pipeline, but would it be reasonable to add some sort of configuration option(s) to allow for these changes? This is something that I could address ASAP if we can figure out a way to cleanly handle this.

@zhouzhuojie
Copy link
Collaborator

zhouzhuojie commented Mar 7, 2019

I think it's safe to change from MarshalIndent to Marshal, no blockers.

For JSON message frame, we can add a field, for example, payload_raw_json with type json.RawMessage, and then env configuration to pick which field to use, what do you think?

@zhouzhuojie
Copy link
Collaborator

Related to #203

@crberube
Copy link
Contributor Author

crberube commented Mar 8, 2019

Sounds like a good solution, thanks! I will get a PR put together.

@zhouzhuojie
Copy link
Collaborator

I’m also working on a refactoring PR including this issue now, if you have time, I can use your review.

@zhouzhuojie zhouzhuojie mentioned this issue Mar 8, 2019
8 tasks
@zhouzhuojie
Copy link
Collaborator

@crberube #233

@crberube
Copy link
Contributor Author

crberube commented Mar 8, 2019

Awesome, I will take a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants