Support for transforming JSON #16

Open
rufuspollock opened this Issue Jul 7, 2013 · 17 comments

rufuspollock commented Jul 7, 2013

Support JSON as an input format. We'd anticipate this being a structure like:

[
  { record },
  { record }
]

trickvi commented Jul 8, 2013

Just to be certain: I hope you mean we'd anticipate a top-level object:

{
  [
    ...
  ]
}

That's the more secure way to serve JSON if things get eval()'d.


rufuspollock commented Jul 8, 2013

Why would we be eval'ing the JSON? We'd be parsing it, right ...


trickvi commented Jul 11, 2013

Ah... sorry, I thought this would be the output format as well, and we don't know whether users will eval or not. My mistake.


rufuspollock commented Jul 15, 2013

No problem ;-) and great that you read this :-O


rufuspollock commented Nov 17, 2013

@maxogden what's your best suggestion for a good node library for parsing streaming JSON? I'd assumed it would be @dominictarr's https://github.com/dominictarr/JSONStream

I also guess it would be nice to support line-separated JSON too :-)


maxogden commented Nov 17, 2013

JSONStream is good but not perfect (it has memory leaks, IIRC). More recently there's a library called oboe.js that looks good, but I haven't tried it yet.

For line-separated JSON (aka ldjson/ndjson) I have a module called ldjson-stream.



dominictarr commented Nov 18, 2013

JSONStream can handle line-separated JSON too. Line-separated JSON is much simpler and faster than implementing a custom streaming parser in JavaScript, because you can lean on both the optimized JSON.parse and RegExp implementations.


konklone commented Feb 25, 2014

Relevant to this thread, I took on making an input pipe for JSON at konklone.io/json/ last week, and used it in a workshop for Open Data Day DC.

It's a static site: I adapted csvkit's recursive JSON flattening algorithm into in-browser JS, then fed the results into jquery-csv (which handles only flattened JSON), and it works. The main issue is that there is no good universal algorithm for determining the "row", but it uses a few heuristics that get it right most of the time, and I'll add a way to specify it for advanced users later.

I have some feature ideas lined up, input welcome. But also, it'd be cool if the core of what I did could lower the barrier for datapipes to handle JSON.


dominictarr commented Feb 26, 2014

I did this the other day, and I just made the flattener join the paths with "."

{foo: {bar: 1,  baz: 2}}

becomes

foo.bar, foo.baz
1, 2
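A minimal sketch of that kind of path-joining flatten (my illustration, not dominictarr's actual code):

```javascript
// Recursively flatten a nested object, joining key paths with "."
// (a sketch of the approach described above, not any published module).
function flatten(obj, prefix) {
  const flat = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? prefix + '.' + key : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      Object.assign(flat, flatten(value, path)); // recurse into nested objects
    } else {
      flat[path] = value; // leaves (and arrays) stay as-is
    }
  }
  return flat;
}

// flatten({ foo: { bar: 1, baz: 2 } }) yields { 'foo.bar': 1, 'foo.baz': 2 }
```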

konklone commented Feb 26, 2014

That's great for some things, but not others. If your object actually does have "records" inside it that should become rows, then they need to be treated a bit differently. For example, something like:

{
  "results": [
    {
      "foo": "bar",
      "bar": true,
      "okay": "never"
    },
    {
      "foo": "oof",
      "bar": false,
      "okay": "sometimes"
    }
  ]
}

should have the array at results iterated over, and each object in that array recursively flattened.

Right now it tries to auto-detect that case, falling back to just flattening the entire thing as one record. But advanced users may need to override it.
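One way to sketch such an auto-detection heuristic (my guess at its shape, not konklone's actual code): use a top-level array directly; otherwise take the first top-level value that is a non-empty array of objects; otherwise fall back to treating the whole document as a single record.

```javascript
// Heuristic "row" detection, as described above (illustrative only):
// prefer a top-level array, then the first object-valued array among
// the top-level values, then the whole document as one record.
function findRecords(data) {
  if (Array.isArray(data)) return data;
  for (const value of Object.values(data)) {
    const isRecordArray =
      Array.isArray(value) &&
      value.length > 0 &&
      value.every((v) => v !== null && typeof v === 'object');
    if (isRecordArray) return value;
  }
  return [data]; // fallback: one record
}
```

An override for advanced users would then just be a way to name the path to the records array explicitly instead of calling this heuristic.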


rufuspollock commented Feb 27, 2014

@konklone this is cool - would you like to see this get into datapipes? (very happy either way :-) ...)


konklone commented Feb 28, 2014

What do you think the path is to doing that? I guess the first step is to remove the dependency on jquery-csv (and thus jQuery), clean up the code a bit, and put it in a module, so you just have a general interface to JSON flattening and CSV exporting.


rufuspollock commented Mar 2, 2014

@konklone I think if you could turn the core thing into a JSON module that would be perfect.

I note that you could leave out CSV parsing / serializing if you wish - obviously datapipes already has that via standard node-csv or @maxogden's binary-csv. So it would be the streaming JSON flattening that would be super-useful ...

Note also we have growing developer documentation here: https://github.com/okfn/datapipes/blob/master/doc/dev.md


konklone commented Mar 2, 2014

Is there a way to make a streaming flattener without making a new streaming parser? Are there hooks during the parse for that sort of thing? Right now the whole thing gets loaded into memory.


rufuspollock commented Mar 3, 2014

@konklone I imagine you could use JSONStream https://www.npmjs.org/package/JSONStream


stuchalk commented Oct 16, 2014

I vote for JSON!


dominictarr commented Oct 16, 2014

@maxogden the memory leaks are fixed now, btw!
