Add support for NDJSON file format #40

Closed
gabriel-vasile opened this issue Aug 30, 2019 · 5 comments · Fixed by #41
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@gabriel-vasile
Owner

  1. Specify the MIME type and extension to add support for
    MIME type: application/x-ndjson
    extension: .ndjson
  2. Share an example file
    {"some":"thing"}
    {"foo":17,"bar":false,"quux":true}
    {"may":{"include":"nested","objects":["and","arrays"]}}
  3. Optionally, add a reference to the specification of the file format.
    https://github.com/ndjson/ndjson-spec
@gabriel-vasile gabriel-vasile added the enhancement New feature or request label Aug 30, 2019
@gabriel-vasile gabriel-vasile self-assigned this Aug 30, 2019
@gabriel-vasile gabriel-vasile added good first issue Good for newcomers help wanted Extra attention is needed labels Aug 30, 2019
@gabriel-vasile gabriel-vasile removed their assignment Aug 30, 2019
@divyanshgaba

divyanshgaba commented Sep 1, 2019

Hi!
I would like to help with this one.
This will be my very first contribution to an open-source project, so I am hoping you can guide me if needed.
After going through the ndjson-spec, I suggest the following,

  1. Split the input bytes by \r\n, then split each resulting piece by \n.
  2. A split is valid if it does not contain \r and either is valid JSON or is empty.
  3. If any split is invalid, return false. Otherwise, return true.
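
For illustration, the three steps above could be sketched in Go roughly as follows. This is a minimal sketch, not the code from the pull request: the function name `looksLikeNDJSON` is hypothetical, and using `json.Valid` from the standard library as the validity check is an assumption.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// looksLikeNDJSON is a hypothetical helper illustrating the proposed steps;
// it is not part of the library's API.
func looksLikeNDJSON(in []byte) bool {
	// Step 1: split by \r\n, then split each piece by \n.
	for _, chunk := range bytes.Split(in, []byte("\r\n")) {
		for _, line := range bytes.Split(chunk, []byte("\n")) {
			// Step 2: an empty split is accepted.
			if len(line) == 0 {
				continue
			}
			// Step 2: a stray \r is rejected explicitly, because JSON treats
			// \r as whitespace and json.Valid alone would accept it.
			if bytes.IndexByte(line, '\r') >= 0 || !json.Valid(line) {
				// Step 3: any invalid split fails the whole input.
				return false
			}
		}
	}
	return true
}
```

Note that under these rules an input consisting only of empty lines is also accepted, and any single valid JSON value on a line counts, not only objects.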

@gabriel-vasile
Owner Author

Hi @divyanshgaba !
The algorithm seems good with one addition:

When detecting a file, the library limits itself to reading only the first 2048 bytes from the input. Because of this, it might happen that the last split is missing some data to be valid JSON. In this case we should return true.
This is similar to JSON detection: the input is scanned, and if there is no error, or the error occurs at the 2048th byte, we treat it as a JSON file that may have been cut off at 2048 bytes. If the error occurs before reaching the ReadLimit, we return false.

@divyanshgaba

Understood! Should I start working on this?

@gabriel-vasile
Owner Author

Yeah, sure!

@divyanshgaba

I have raised pull request #41. Kindly review.
