jsontsv might replace json2csv #26

Closed
danchoi opened this issue Dec 3, 2014 · 7 comments

Comments

@danchoi

danchoi commented Dec 3, 2014

I just released jsontsv yesterday. It is similar to json2csv, which you mention in your original blog post "Seven command line tools for data science." But I believe jsontsv is strictly speaking more powerful and not strictly speaking easier to use. Feedback is appreciated if you have time, but in any case, thanks for writing about this whole topic.

https://github.com/danchoi/jsontsv

@jeroenjanssens
Owner

Congrats on releasing jsontsv! I have tried it out and it works really
well. I didn't notice any differences between jsontsv and json2csv
(except that the latter outputs CSV by default). Would you care to
explain why you think (or give examples that demonstrate why) jsontsv
is more powerful and easier to use than json2csv? Thanks.


@danchoi
Author

danchoi commented Dec 4, 2014

Thank you @jeroenjanssens

@danchoi
Author

danchoi commented Dec 4, 2014

1st. json2csv accepts only compact JSON input; jsontsv can accept pretty-formatted JSON input:

input1:

{"title":"Terminator 2: Judgment Day","year":1991,"stars":[{"name":"Arnold Schwarzenegger"},{"name":"Linda Hamilton"}],"ratings":{"imdb":8.5}}
{"title":"Interstellar","year":2014,"stars":[{"name":"Matthew McConaughey"},{"name":"Anne Hathaway"}],"ratings":{"imdb":8.9}}

input2:

{
  "title": "Terminator 2: Judgment Day",
  "year": 1991,
  "stars": [
    {
      "name": "Arnold Schwarzenegger"
    },
    {
      "name": "Linda Hamilton"
    }
  ],
  "ratings": {
    "imdb": 8.5
  }
}
{
  "title": "Interstellar",
  "year": 2014,
  "stars": [
    {
      "name": "Matthew McConaughey"
    },
    {
      "name": "Anne Hathaway"
    }
  ],
  "ratings": {
    "imdb": 8.9
  }
}

json2csv -k title,year,ratings.imdb < input1   # works
json2csv -k title,year,ratings.imdb < input2   # FAILS

jsontsv 'title year ratings.imdb' < input1  # works
jsontsv 'title year ratings.imdb' < input2  # works
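The difference in point 1 can be illustrated with a small Python sketch. This is not how either tool is implemented (their internals are not described in this thread); it only contrasts a line-oriented reader, which needs one complete JSON value per line, with a streaming reader that decodes values wherever they end. The function names `iter_json_lines` and `iter_json_objects` are made up for this example, and the records are a trimmed version of the movie data above.

```python
import json

def iter_json_lines(text):
    """Line-oriented reader: every non-empty line must be a complete JSON value."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)

def iter_json_objects(text):
    """Streaming reader: decode concatenated JSON values regardless of line breaks."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

records = [
    {"title": "Terminator 2: Judgment Day", "year": 1991},
    {"title": "Interstellar", "year": 2014},
]
compact = "\n".join(json.dumps(r) for r in records)           # one object per line
pretty = "\n".join(json.dumps(r, indent=2) for r in records)  # objects span lines

print(list(iter_json_objects(compact)) == records)  # streaming reader handles compact input
print(list(iter_json_objects(pretty)) == records)   # and pretty-formatted input
try:
    list(iter_json_lines(pretty))
except json.JSONDecodeError as err:
    # The first line is just "{", which is not a complete JSON value.
    print("line-oriented reader failed:", err.msg)
```

A line-oriented reader is simpler and works for compact input, but it breaks as soon as a single object spans several lines.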

2nd. jsontsv can reach into objects in an array. json2csv cannot.

json2csv -k stars.name < input1   
""
""

jsontsv 'stars.name' < input1
Arnold Schwarzenegger,Linda Hamilton
Matthew McConaughey,Anne Hathaway

jsontsv 'stars[0].name' < input1  
Arnold Schwarzenegger
Matthew McConaughey
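To make the behaviour in point 2 concrete, here is a rough Python sketch of a path lookup that maps a key over every element of an array and also accepts an explicit index. It is only an approximation written for this thread, not jsontsv's actual code; `parse_path`, `walk`, and `render` are hypothetical helpers, and joining array results with a comma simply mirrors the output shown above.

```python
import re

# A path token is either an index like [0] or a key like stars.
TOKEN = re.compile(r"\[(\d+)\]|([^.\[\]]+)")

def parse_path(path):
    """Turn 'stars[0].name' into ['stars', 0, 'name']."""
    return [int(index) if index else key for index, key in TOKEN.findall(path)]

def walk(value, tokens):
    """Follow tokens into value; a key applied to a list is mapped over its items."""
    if not tokens:
        return value
    head, rest = tokens[0], tokens[1:]
    if isinstance(head, int):
        if isinstance(value, list) and head < len(value):
            return walk(value[head], rest)
        return None
    if isinstance(value, list):
        return [walk(item, tokens) for item in value]
    if isinstance(value, dict):
        return walk(value.get(head), rest)
    return None

def render(value):
    """Format a result as one field; lists are joined with commas."""
    if isinstance(value, list):
        return ",".join(render(item) for item in value)
    return "" if value is None else str(value)

movie = {
    "title": "Interstellar",
    "year": 2014,
    "stars": [{"name": "Matthew McConaughey"}, {"name": "Anne Hathaway"}],
    "ratings": {"imdb": 8.9},
}

print(render(walk(movie, parse_path("stars.name"))))     # Matthew McConaughey,Anne Hathaway
print(render(walk(movie, parse_path("stars[0].name"))))  # Matthew McConaughey
print(render(walk(movie, parse_path("ratings.imdb"))))   # 8.9
```

A lookup that treats every path segment as a plain object key finds nothing once it reaches the `stars` array, which is consistent with the empty strings json2csv prints above.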

@danchoi
Author

danchoi commented Dec 4, 2014

3rd. jsontsv can also insert multicharacter delimiters (specified in an option). json2csv can only insert single-character delimiters.
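As an illustration of why a single-character limit can exist at all, the Python sketch below compares a CSV writer, which only accepts a one-character delimiter, with a plain string join, which accepts any separator. This says nothing about how json2csv or jsontsv are implemented, and the ` | ` separator is just an example value, not an option documented in this thread.

```python
import csv
import io

fields = ["Interstellar", "2014", "8.9"]

# Python's csv module rejects delimiters longer than one character.
try:
    csv.writer(io.StringIO(), delimiter=" | ")
    csv_accepts_multichar = True
except TypeError:
    csv_accepts_multichar = False

# A plain join has no such restriction.
joined = " | ".join(fields)

print(csv_accepts_multichar)  # False
print(joined)                 # Interstellar | 2014 | 8.9
```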

@danchoi
Author

danchoi commented Dec 4, 2014

4th. json2csv changes the precision of floats. jsontsv maintains the original precision:

$ json2csv -k title,ratings.imdb < input1
Terminator 2: Judgment Day,8.500000
Interstellar,8.900000 

$ jsontsv 'title ratings.imdb'  < input1
Terminator 2: Judgment Day      8.5
Interstellar    8.9
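The six decimal places in the json2csv output look like what a fixed `%f`-style format produces, although that is an assumption based only on the output shown here. The Python sketch below reproduces both styles and also shows one way to keep the number exactly as written in the input, by parsing it as a `Decimal` instead of a binary float.

```python
import json
from decimal import Decimal

record = '{"title": "Interstellar", "ratings": {"imdb": 8.9}}'

# Parsed as a binary float, then formatted two different ways.
as_float = json.loads(record)["ratings"]["imdb"]
fixed = "%f" % as_float    # fixed six decimals: '8.900000'
shortest = repr(as_float)  # shortest round-trip form: '8.9'

# Parsed as Decimal, the digits from the input text are preserved as written.
as_decimal = json.loads(record, parse_float=Decimal)["ratings"]["imdb"]
literal = str(as_decimal)  # '8.9'

print(fixed, shortest, literal)
```

The `Decimal` route matters when the input contains something like `8.50`, where a float round trip would drop the trailing zero.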

@danchoi
Author

danchoi commented Dec 4, 2014

5th. jsontsv seems (on one test) to have a 33% performance advantage, measured as wall-clock time saved.

I processed a stream of 363051 JSON objects to run this test, using the two tools to extract the same 3 fields from each object. I ran the test on a 1.4GHz Intel i5 MacBook Air on battery power. Results:

jsontsv
real    0m24.081s
user    0m21.402s
sys     0m2.513s

json2csv
real    0m36.141s
user    0m34.282s
sys     0m2.610s
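For reference, the 33% figure follows from the `real` times above. The short Python calculation below uses only the numbers reported in this comment; the variable names are arbitrary.

```python
objects = 363051
jsontsv_real = 24.081   # seconds, from the timing above
json2csv_real = 36.141  # seconds, from the timing above

# Fraction of wall-clock time saved relative to json2csv.
time_saved = (json2csv_real - jsontsv_real) / json2csv_real
# How many times faster jsontsv finished.
speedup = json2csv_real / jsontsv_real

print(f"time saved: {time_saved:.1%}")  # time saved: 33.4%
print(f"speedup: {speedup:.2f}x")       # speedup: 1.50x
print(f"jsontsv:  {objects / jsontsv_real:.0f} objects/s")
print(f"json2csv: {objects / json2csv_real:.0f} objects/s")
```

This is a single run on one machine, so the numbers are indicative rather than a rigorous benchmark.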

@jeroenjanssens
Owner

You provide really good arguments. Thanks for that. I hope you understand that I cannot simply replace json2csv with jsontsv in the book. However, I will definitely keep jsontsv in mind when I make a big update to the book or even write a second edition. Thanks again!
