Search file for ID and return full value of matching objects #9

Closed
jenseo opened this issue Mar 2, 2018 · 4 comments

jenseo commented Mar 2, 2018

Hi there and first of all, thank you for this amazing parser. A true life saver.

I'm currently trying to fully understand how it works, but have run into a problem that I can't really figure out how to solve.

I have a large JSON file that looks like this (an excerpt):

[
  {
    "id": 2584,
    "name": "John",
    "parentCategory": 2570,
    "url": "john",
    "dateUpd": "2016-06-23 14:27:32",
    "dateAdd": "2016-05-13 11:33:35",
    "urlImages": [
      "http://imageurl.com/2584_header.jpg",
      "http://imageurl.com/2584_menu.jpg",
      "http://imageurl.com/2584_mini.jpg"
    ],
    "isoCode": "sv"
  },
  {
    "id": 2429,
    "name": "Carol",
    "parentCategory": 2570,
    "url": "carol",
    "dateUpd": "2016-06-23 14:33:36",
    "dateAdd": "2016-05-13 10:11:30",
    "urlImages": [
      "http://imageurl.com/2429_header.jpg",
      "http://imageurl.com/2429_menu.jpg",
      "http://imageurl.com/2429_mini.jpg"
    ],
    "isoCode": "sv"
  },
  {
    "id": 2568,
    "name": "Andy",
    "parentCategory": 2552,
    "url": "andy",
    "dateUpd": "2016-06-23 13:55:13",
    "dateAdd": "2016-05-13 11:29:32",
    "urlImages": [
      "http://imageurl.com/2568_header.jpg",
      "http://imageurl.com/2568_menu.jpg",
      "http://imageurl.com/2568_mini.jpg"
    ],
    "isoCode": "sv"
  }
]

What I'm trying to do is search through this file for all instances where "parentCategory" equals 2570 and then print/echo the whole object that each match belongs to.

So far, this is what I've got:

$reader = new JsonReader();
$reader->json($json);

while($reader->read("parentCategory")) {
    $parentID = $reader->value();
    if ($parentID == 2570) {
      echo $reader->value()."\n";
    }
}
$reader->close();

This prints the parentCategory ID, but what I need is to be able to use the parentCategory name and value to identify the whole object it belongs to and in the end return the following:

[
  {
    "id": 2584,
    "name": "John",
    "parentCategory": 2570,
    "url": "john",
    "dateUpd": "2016-06-23 14:27:32",
    "dateAdd": "2016-05-13 11:33:35",
    "urlImages": [
      "http://imageurl.com/2584_header.jpg",
      "http://imageurl.com/2584_menu.jpg",
      "http://imageurl.com/2584_mini.jpg"
    ],
    "isoCode": "sv"
  },
  {
    "id": 2429,
    "name": "Carol",
    "parentCategory": 2570,
    "url": "carol",
    "dateUpd": "2016-06-23 14:33:36",
    "dateAdd": "2016-05-13 10:11:30",
    "urlImages": [
      "http://imageurl.com/2429_header.jpg",
      "http://imageurl.com/2429_menu.jpg",
      "http://imageurl.com/2429_mini.jpg"
    ],
    "isoCode": "sv"
  }
]

Is this achievable with your parser?

Thank you so much for any help you can give me!

@pcrov pcrov added the support label Mar 2, 2018
Owner

pcrov commented Mar 2, 2018

JsonReader works in a forward-only manner, so if you might need prior data you'll need to hang onto it until that determination can be made.

The easiest way to do this would be to step into the array, grab each object in full, check the parentCategory and ignore any that don't match. E.g.:

$reader->read(); //Outer array
$reader->read(); //First object
$depth = $reader->depth();

do {
    $object = $reader->value();
    if ($object["parentCategory"] === "2570") {
        var_dump($object);
    }
} while ($reader->next() && $reader->depth() === $depth);

Note that because numbers get returned as strings (this will likely become optional in a future version) and you didn't get the opportunity to inspect their type, you'll lose their type information this way.

If you need to retain that and you know ahead of time which values should be numbers, it's easy to fix them up:

$reader->read(); //Outer array
$reader->read(); //First object
$depth = $reader->depth();

do {
    $object = $reader->value();
    if ($object["parentCategory"] === "2570") {
        $object["id"] = +$object["id"];
        $object["parentCategory"] = +$object["parentCategory"];
        var_dump($object);
    }
} while ($reader->next() && $reader->depth() === $depth);

(The unary + will cast to int or float as appropriate automagically.)
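If the same fix-up is needed in several places, it could be wrapped in a small helper. The following is only a sketch: `restoreNumbers()` is a hypothetical function, not part of JsonReader, and it assumes you pass in the list of keys you expect to be numeric. It applies the unary `+` only to values that `is_numeric()` accepts, so missing keys and non-numeric strings are left untouched.

```php
<?php
declare(strict_types=1);

/**
 * Cast the listed keys back to int/float where the value is a numeric string.
 * restoreNumbers() is a hypothetical helper, not part of JsonReader.
 */
function restoreNumbers(array $object, array $numericKeys): array
{
    foreach ($numericKeys as $key) {
        // Skip missing keys and anything that is not a numeric string,
        // so the unary + is only ever applied to safe input.
        if (isset($object[$key]) && is_string($object[$key]) && is_numeric($object[$key])) {
            $object[$key] = +$object[$key]; // int or float, as appropriate
        }
    }
    return $object;
}

// Sample object shaped like the ones in this thread, with numbers as strings.
$object = ["id" => "2584", "name" => "John", "parentCategory" => "2570"];
$fixed = restoreNumbers($object, ["id", "parentCategory", "missing"]);
var_dump($fixed);
```

Keys that are absent (such as `"missing"` here) are ignored rather than created.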

Let me know if this doesn't work out for whatever reason. There is always another way to do things, it just might be a bit more cumbersome.

Author

jenseo commented Mar 2, 2018

Wow, that's exactly what I was looking for! And yes, I will probably know beforehand which values will be numbers, so I should be able to fix things up :)

Thank you so much for your help and for your great work, such a versatile tool!

@pcrov pcrov closed this as completed Mar 2, 2018
Author

jenseo commented Mar 7, 2018

Hi again @pcrov,
I wanted to follow up on this and ask you about the following:

I'm importing a rather large JSON file and I'm trying to stream it from the API server using fopen. I'm having a bit of a problem making it efficient, though: the parsing seems to take a really long time.

Right now my code looks like this:

$fp = fopen($filename, 'rw'); // create file

$reader = new JsonReader();
$reader->stream($fp);
$reader->read(); //Outer array
$reader->read(); //First object
$depth = $reader->depth(); //Check depth
$object_array = array(); //Set up empty array
do {
    $object = $reader->value(); //Store object before check
    if ($object["category"] === $category_id) { //Do the check
        $object_array[] = $object; //Store object in array
    }
    unset($object); // free memory?
} while ($reader->next() && $reader->depth() === $depth);
$json_object = json_encode($object_array, JSON_PRETTY_PRINT); //Convert array to nice Json
echo $json_object; //Output Json
$reader->close();
fclose($fp);
unlink($filename); // delete file

As you can see, I've added the line:

unset($object);

in an attempt to free memory, but I'm not sure if it has any effect. Does this look like a good solution to you?

Thanks!

// Jens.

Owner

pcrov commented Mar 7, 2018

If you haven't already done so, upgrade to the latest release, 0.7.0, as it's significantly faster than prior versions. There are still more speed improvements in the works, but nothing quite like the jump 0.7.0 made.

Make sure xdebug isn't loaded at all. Even when not enabled, the extension has a massive performance impact.
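A quick way to check this from the same PHP binary that runs the import might look like the sketch below. It relies only on PHP's built-in `extension_loaded()` and `php_ini_loaded_file()`; the messages are illustrative.

```php
<?php
// extension_loaded() reports whether the extension is loaded at all,
// regardless of whether any of its features are switched on.
$xdebugLoaded = extension_loaded('xdebug');

if ($xdebugLoaded) {
    // php_ini_loaded_file() returns the path of the loaded php.ini, or false.
    $ini = php_ini_loaded_file() ?: "(no php.ini loaded)";
    echo "xdebug is loaded; check {$ini} and any additional ini files.\n";
} else {
    echo "xdebug is not loaded.\n";
}
```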

Parsing a stream from a remote API directly, while supported, won't be as quick as parsing a local file, though from the code you've posted it looks like you're dealing with a local file already.

I wouldn't expect unset to do much useful there as $object is being immediately overwritten on the next iteration anyway. Besides, worrying about memory consumption when your problem is speed only makes sense if you're hitting swap (or garbage collection issues, but that shouldn't be a problem here), and you're going to be bound on the memory front by the growing $object_array.
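If the growing `$object_array` does become the limit, one option is to write each match out as soon as it is found instead of collecting everything first. The sketch below is one way that could look, not something from JsonReader itself: `writeMatchesAsJson()` is a hypothetical helper, and the sample array stands in for the objects the reader would produce (in real use, a generator wrapping the do/while loop above could yield `$reader->value()`). The filter value is a string because numbers come back from the reader as strings.

```php
<?php
declare(strict_types=1);

/**
 * Echo matching objects as a JSON array, one element at a time, so that no
 * result array has to be built up in memory. Returns the number of matches.
 * writeMatchesAsJson() is a hypothetical helper, not part of JsonReader.
 */
function writeMatchesAsJson(iterable $objects, string $key, string $wanted): int
{
    $count = 0;
    echo "[";
    foreach ($objects as $object) {
        // Ignore objects without the key or with a non-scalar value there.
        if (!isset($object[$key]) || !is_scalar($object[$key])
            || (string)$object[$key] !== $wanted) {
            continue;
        }
        echo $count === 0 ? "\n" : ",\n"; // separator before each element
        echo json_encode($object, JSON_UNESCAPED_SLASHES);
        $count++;
    }
    echo $count === 0 ? "]\n" : "\n]\n";
    return $count;
}

// Stand-in data; in real use these objects would come from the reader.
$sample = [
    ["id" => "2584", "parentCategory" => "2570"],
    ["id" => "2568", "parentCategory" => "2552"],
    ["id" => "2429", "parentCategory" => "2570"],
];
$matched = writeMatchesAsJson($sample, "parentCategory", "2570");
```

The trade-off is that the output is no longer pretty-printed as a whole, and nothing can be done with the full result set after the loop.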

At the end of the day, parsing massive files in PHP can only be so fast, and the low memory consumption you get from a streaming parser will always come at the expense of speed. It's the kind of thing best suited to running in a background task and checking the result later.
