
How to use fast's jsonparser #14

Closed
gizmomogwai opened this issue Dec 30, 2017 · 7 comments
Comments

@gizmomogwai

Hi Marco, I really like the speed that comes with your pull based approach.
I have a simple program that I would like to implement, but I am struggling to apply the pull-based approach to the problem:
I want to analyse NIST's CVE data (https://nvd.nist.gov/vuln/data-feeds), e.g. by searching through a data file and printing out the whole JSON entry that matches an ID.
The data looks like this:

"CVE_Items" : [ {
  "cve" : {
    "data_type" : "CVE",
    "data_format" : "MITRE",
    "data_version" : "4.0",
    "CVE_data_meta" : {
      "ID" : "CVE-1999-0001",
      "ASSIGNER" : "cve@mitre.org"
    },
    "affects" : {
      ...
    },
    "problemtype" : {
      ...
    },
    "references" : {
      ...
    },
    "description" : {
      ...
    }
  },
  "configurations" : {
    ...
  },
  "impact" : {
    ...
  },
  "publishedDate" : "1999-12-30T05:00Z",
  "lastModifiedDate" : "2010-12-16T05:00Z"
}, {
  "cve" : {
   ...

With your nice library I can easily write something like this:

foreach (cveFile; cves) {
    foreach (item; cveFile.CVE_Items) {
        cveFile.cve.CVE_data_meta.keySwitch!("ID")({
            auto id = cveFile.read!string;
            if (id in toFind) writeln(id);
        });
    }
}

But instead of just outputting the ID, I would like to dump everything that belongs to the object containing the matching ID.

What's the best way to do this?

@mleise
Collaborator

mleise commented Dec 31, 2017

This could be solved if I had implemented some sort of parser "snapshotting". But right now there is no way to get back to the start of the CVE item, once you reach the "ID". I see the use case and it makes sense to implement something like saveSnapshot() and loadSnapshot() as an extension in the future. For now you'll have to digest the entire JSON and lose the benefit of paying for what you use.

@mleise
Collaborator

mleise commented Dec 31, 2017

All that really needs to be saved and restored is m_text and m_nesting from here: https://github.com/mleise/fast/blob/master/source/fast/json.d#L200

@mleise mleise closed this as completed in 093c2eb Dec 31, 2017
@mleise
Collaborator

mleise commented Dec 31, 2017

Example usage:

import fast.json;
import std.stdio;

struct CVE {
	string  data_type;
	string  data_format;
	string  data_version;
	CVEMeta CVE_data_meta;
}

struct CVEMeta {
	string ID;
	string ASSIGNER;
}

void main() {
	bool[string] shoppingList = ["CVE-2017-0006":true, "CVE-2017-9999":true];
	with (parseJSONFile("nvdcve-1.0-2017.json")) {
		foreach (n; CVE_Items) {
			with (cve) {
				const backup = state;
				const id = CVE_data_meta.ID.borrowString();
				if (id in shoppingList) {
					state = backup;
					writeln(json.read!CVE());
				}
			}
		}
	}
}

Runs at ~1100 MiB/s for me when compiled with LDC2 (4th-gen i5 @ 2.3 GHz, DDR3).

@gizmomogwai
Author

Wow ... thanks a lot ... I will give this a try. At the moment I am still struggling to get uncompressed data into D (I do not even reach Java speed right now for gzipped data).
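For the decompression step, one Phobos-only option is std.zlib, which understands the gzip header format. This is a minimal, hedged sketch of the round trip; in practice you would feed file chunks to UnCompress instead of an in-memory buffer, and the file name and sample string here are made up for illustration. Whether this beats a directly linked system zlib is a separate question.

```d
import std.stdio;
import std.zlib : Compress, UnCompress, HeaderFormat;

void main() {
    // Round-trip demo: gzip-compress a small JSON snippet, then
    // stream it back through UnCompress, the class you would feed
    // file chunks to when reading e.g. nvdcve-1.0-2017.json.gz.
    auto data = cast(const(ubyte)[]) `{"CVE_data_meta":{"ID":"CVE-1999-0001"}}`;

    auto comp = new Compress(HeaderFormat.gzip);
    auto gz = comp.compress(data) ~ comp.flush();

    auto uc = new UnCompress(HeaderFormat.gzip);
    auto plain = uc.uncompress(gz) ~ uc.flush();

    // Prints the original JSON snippet.
    writeln(cast(const(char)[]) plain);
}
```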

@gizmomogwai
Author

Two more questions :)

  • How do you measure your throughput?
  • Is there a way to get a whole JSON subtree parsed? I have seen the thing with associative arrays from string to a JSON type like string, int, or float, but this does not work for deeper trees, right?

@mleise
Collaborator

mleise commented Dec 31, 2017

So you need fast.gzip as well? ;-) (Maybe the system zlib is faster when linked into your D program than Phobos.)

For the throughput I downloaded the 2017 CVE JSON (74 MiB unzipped) and used the simple time command on the program I posted above. That showed about 67 ms in the best case.

You can parse JSON sub-trees of any depth (as in the example program above) if you know the structure, i.e. nested structs and arrays work. If you don't know the structure, you need to manually iterate over the elements and store them in "Variants".
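The quoted figures are consistent with the earlier ~1100 MiB/s claim: 74 MiB in roughly 67 ms. A quick check of the arithmetic in D:

```d
import std.stdio;

void main() {
    // Numbers from the measurement above: 74 MiB parsed in ~67 ms.
    enum double fileMiB = 74.0;
    enum double seconds = 0.067;
    // ~1104 MiB/s, matching the "~1100 MiB/s" figure.
    writefln("%.0f MiB/s", fileMiB / seconds);
}
```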

@gizmomogwai
Copy link
Author

Thanks again. I will look into the gzip thing and also look how to work with the subtrees! Happy new year!
