performance bottlenecks #222

Closed
travisghansen opened this issue Oct 18, 2019 · 6 comments

@travisghansen

Description

I'd like to use node-jq in a project that requires relatively high performance. I ran some tests with it and the jsonpath package to compare simple selects (I'm aware the tools have vastly different capabilities).

jq was orders of magnitude slower, and I'm not sure whether that comes from the exec/shell overhead or whether something else is at play. Any tips or help with maximizing performance would be great.

Thanks!

Test Source

const jp = require("jsonpath");
const jq = require("node-jq");


async function jsonpath_query(query, data) {
  return jp.query(data, query);
}

async function jq_query(query, data) {
  const options = {
    input: "json",
    output: "json"
  };

  const values = await jq.run(query, data, options);
  return values;
}

let data = {
  "kind": "youtube#searchListResponse",
  "etag": "m2yskBQFythfE4irbTIeOgYYfBU/PaiEDiVxOyCWelLPuuwa9LKz3Gk",
  "nextPageToken": "CAUQAA",
  "regionCode": "KE",
  "pageInfo": {
    "totalResults": 4249,
    "resultsPerPage": 5
  },
  "items": [
    {
      "kind": "youtube#searchResult",
      "etag": "\"m2yskBQFythfE4irbTIeOgYYfBU/QpOIr3QKlV5EUlzfFcVvDiJT0hw\"",
      "id": {
        "kind": "youtube#channel",
        "channelId": "UCJowOS1R0FnhipXVqEnYU1A"
      }
    },
    {
      "kind": "youtube#searchResult",
      "etag": "\"m2yskBQFythfE4irbTIeOgYYfBU/AWutzVOt_5p1iLVifyBdfoSTf9E\"",
      "id": {
        "kind": "youtube#video",
        "videoId": "Eqa2nAAhHN0"
      }
    },
    {
      "kind": "youtube#searchResult",
      "etag": "\"m2yskBQFythfE4irbTIeOgYYfBU/2dIR9BTfr7QphpBuY3hPU-h5u-4\"",
      "id": {
        "kind": "youtube#video",
        "videoId": "IirngItQuVs"
      }
    }
  ]
};

console.log("hello world");

let iterations = 500;
(async () => {
  let i = 0;

  while (i < iterations) {
    //let foo = await jq_query(".items", data);
    let foo = await jsonpath_query("$.items[*]", data);
    //console.log(foo);
    i++;
  }
})().catch(e => {
  // Deal with the fact the chain failed
  console.log(e);
});

Results

# 500k (first result is jsonpath)
# I'm not sure how long it would have taken, I eventually killed the jq version after 33 minutes
external-auth-server next ✏️ 3❔ 1 time node dev/scratch.js 
hello world

real	0m6.259s
user	0m7.060s
sys	0m0.186s
external-auth-server next ✏️ 3❔ 1 time node dev/scratch.js 
hello world
^C
real	33m42.429s
user	31m2.376s
sys	2m53.778s


# 500 (first result is jsonpath)
external-auth-server next ✏️ 3❔ 1 time node dev/scratch.js 
hello world

real	0m0.120s
user	0m0.134s
sys	0m0.017s

external-auth-server next ✏️ 3❔ 1 time node dev/scratch.js 
hello world

real	0m13.801s
user	0m13.209s
sys	0m0.846s
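
As a rough reading of the timings above, the wall-clock figures can be turned into a per-call cost. This is only a sketch: it assumes the "real" values are representative and that cost scales linearly with the iteration count, and the variable names are illustrative. On those assumptions jq comes out at roughly 27.6 ms per call against about 0.0125 ms for jsonpath, and the 500k jq run would have needed about 3.8 hours.

```javascript
// Figures copied from the `time` output above (wall-clock "real" values).
const runs = {
  jq500: { seconds: 13.801, iterations: 500 },
  jsonpath500k: { seconds: 6.259, iterations: 500000 },
};

// Average wall-clock cost of one query, in milliseconds.
function msPerCall({ seconds, iterations }) {
  return (seconds * 1000) / iterations;
}

const jqMs = msPerCall(runs.jq500);
// The 500k run is used for jsonpath so that Node startup time is amortized.
const jsonpathMs = msPerCall(runs.jsonpath500k);

// Linear extrapolation (an assumption): per-call cost stays constant at 500k calls.
const projectedJqHours = (jqMs * 500000) / 1000 / 3600;

console.log(`jq:       ${jqMs.toFixed(3)} ms per call`);
console.log(`jsonpath: ${jsonpathMs.toFixed(4)} ms per call`);
console.log(`slowdown: about ${Math.round(jqMs / jsonpathMs)}x`);
console.log(`projected jq time for 500k calls: ${projectedJqHours.toFixed(1)} hours`);
```

The 500-iteration jsonpath run is left out on purpose, since at 0.120 s total it is likely dominated by Node startup rather than by the queries themselves.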
@davesnx
Member

davesnx commented Oct 18, 2019

Hey @travisghansen

I would love to get a deeper understanding of the benchmark you ran. As I understand it, the data sample always has the same schema and you run jq (node-jq) vs jsonpath 500 times. I would love to see jq run directly, without the Node.js wrapper, to find out whether the node-jq wrapper is at fault so we can investigate further. If it isn't, there's not much we can do other than see where the underlying implementations differ. If it is our fault, though, I would love to improve that.

I would also love to try different schemas and bigger JSON documents, and to think of other kinds of tests, to get a reasonable comparison.

If we create a solid perf benchmark, I'm happy to debug the Node.js process to understand it better. Could you open a PR with that benchmark so we can move on from there?

Thanks for opening the issue.

@travisghansen
Author

Hey, I'll try to knock something out in bash real quick and see how that performs. I picked the schema above simply because for my use-case I'll be dealing with relatively small datasets but lots of iterations over and over (I'm handling auth scenarios: https://github.com/travisghansen/external-auth-server).

I'm also digging in over here: robertaboukhalil/jqkungfu#3 to see if fundamentally the same syscalls are going on etc.

Thanks for the help!

@travisghansen
Author

Yeah, 500 iterations in bash is essentially the same as invoked from node through your lib. Probably not much we can do :(

time ./jq-iteration.sh 

real	0m13.708s
user	0m13.393s
sys	0m0.433s
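
Since bash and node-jq land on nearly the same total, the cost appears to be per invocation rather than in the wrapper. The following is a minimal sketch of how a fixed per-process cost compares with an in-process lookup. It is an illustration only: it assumes the current Node binary (`process.execPath`) as a stand-in for the jq executable, so the absolute numbers will differ from jq's, and `meanMs` is a hypothetical helper.

```javascript
const { spawnSync } = require("child_process");

// Time `fn` over `iterations` calls and return the mean duration in milliseconds.
function meanMs(fn, iterations) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedNs = process.hrtime.bigint() - start;
  return Number(elapsedNs) / 1e6 / iterations;
}

const data = { items: [{ id: 1 }, { id: 2 }] };

// In-process selection: a plain property access, no process boundary.
const inProcess = () => data.items;

// Out-of-process: start a child that does no work and wait for it to exit.
// Assumption: the Node binary stands in for jq; only the startup cost matters here.
const outOfProcess = () => {
  const result = spawnSync(process.execPath, ["-e", "0"]);
  if (result.error) throw result.error;
};

const inProcessMs = meanMs(inProcess, 100000);
const spawnMs = meanMs(outOfProcess, 10);

console.log(`in-process: ${inProcessMs.toFixed(6)} ms per call`);
console.log(`spawn:      ${spawnMs.toFixed(2)} ms per call`);
```

Whatever the exact figures on a given machine, the child does no work at all, so the gap is pure startup and teardown cost, paid once per call.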

@travisghansen
Author

#!/bin/bash

json='{
  "kind": "youtube#searchListResponse",
  "etag": "m2yskBQFythfE4irbTIeOgYYfBU/PaiEDiVxOyCWelLPuuwa9LKz3Gk",
  "nextPageToken": "CAUQAA",
  "regionCode": "KE",
  "pageInfo": {
    "totalResults": 4249,
    "resultsPerPage": 5
  },
  "items": [
    {
      "kind": "youtube#searchResult",
      "etag": "\"m2yskBQFythfE4irbTIeOgYYfBU/QpOIr3QKlV5EUlzfFcVvDiJT0hw\"",
      "id": {
        "kind": "youtube#channel",
        "channelId": "UCJowOS1R0FnhipXVqEnYU1A"
      }
    },
    {
      "kind": "youtube#searchResult",
      "etag": "\"m2yskBQFythfE4irbTIeOgYYfBU/AWutzVOt_5p1iLVifyBdfoSTf9E\"",
      "id": {
        "kind": "youtube#video",
        "videoId": "Eqa2nAAhHN0"
      }
    },
    {
      "kind": "youtube#searchResult",
      "etag": "\"m2yskBQFythfE4irbTIeOgYYfBU/2dIR9BTfr7QphpBuY3hPU-h5u-4\"",
      "id": {
        "kind": "youtube#video",
        "videoId": "IirngItQuVs"
      }
    }
  ]
}';

echo "$json"

iterations=500
i=0

# Run jq once per iteration, discarding the output so only execution time is measured.
while [ "$i" -lt "$iterations" ]
do
  echo "$json" | jq .items > /dev/null
  i=$((i+1))
done

@davesnx
Member

davesnx commented Oct 18, 2019

Mmm, sad. Not much we can do. For your use case, I recommend avoiding piping operations through the shell if you want high performance.

After reading your project, I don't really see where you need jq. I would suggest trying native JS or lodash, since I have always heard that lodash.map is faster than native: https://jsperf.com/native-map-vs-lodash-map

Hope it helps, and feel free to open another issue for anything else that comes up!
Thanks!
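
On the point about avoiding one shell invocation per operation: where several documents are available at once, one possible mitigation is to send them through a single jq run. This is a sketch under assumptions, not something proposed in the thread. `batchQuery` is a hypothetical helper, `run` is any function with the same shape as the `jq.run(query, data, options)` call in the test source, and `output: "json"` is assumed to yield a parsed value.

```javascript
// Run `filter` against every document in `docs` with a single invocation.
// `run` has the shape run(filter, json, options) and returns a Promise.
async function batchQuery(run, filter, docs) {
  // Collect each document's outputs in [...] so results stay aligned with `docs`,
  // even when the filter emits zero or several values per document.
  const batched = `map([${filter}])`;
  return run(batched, docs, { input: "json", output: "json" });
}

// Usage with node-jq (not executed here; needs the package and the jq binary):
//   const jq = require("node-jq");
//   const perDoc = await batchQuery((f, d, o) => jq.run(f, d, o), ".items", docs);
//   // perDoc[i] holds the array of outputs for docs[i]
```

This pays the process startup cost once per batch instead of once per document. It does not help when queries arrive one at a time, as in a per-request auth check.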

@davesnx davesnx closed this as completed Oct 18, 2019
@travisghansen
Author

@davesnx I intend to use jq for handling assertions and generally mangling data (it already does, in fact; examples here: https://github.com/travisghansen/external-auth-server/blob/master/ASSERTIONS.md). While it's fine for code I write to do this natively, the individual configurations managed by the user are simply structured JSON that defines the various assertions that should occur at run time. I don't want to make configuration an exercise in knowing JavaScript, but I would like a powerful query/selector language to handle crazy scenarios.

Thanks!
