Recursive iteration #36

Open
wants to merge 16 commits into
base: master
from
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -8,7 +8,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
<br>

## master
Nothing yet
### Added
- Recursive iteration via the `recursive` option.

<br>

60 changes: 39 additions & 21 deletions README.md
@@ -18,6 +18,7 @@ for PHP >=7.0. See [TL;DR](#tl-dr). No dependencies in production except optiona
+ [Parsing nested values in arrays](#parsing-nested-values)
+ [Parsing a single scalar value](#getting-scalar-values)
+ [Parsing multiple subtrees](#parsing-multiple-subtrees)
+ [Recursive iteration](#recursive)
+ [What is JSON Pointer anyway?](#json-pointer)
* [Options](#options)
* [Parsing streaming responses from a JSON API](#parsing-json-stream-api-responses)
@@ -318,6 +319,38 @@ foreach ($fruits as $key => $value) {
}
```

<a name="recursive"></a>
### Recursive iteration (BETA)
Recursive iteration is enabled by setting the `recursive` option to `true`.
Every JSON array or object that JSON Machine encounters will then be yielded as an instance of `NestedIterator`.
No JSON array or object will be materialized and kept in memory.
Only scalar values are materialized as PHP values.
Let's see an example with many, many users who each have many, many friends:

```php
<?php

use JsonMachine\Items;

$users = Items::fromFile('users.json', ['recursive' => true]);
foreach ($users as $user) { // $user instanceof Traversable, not an array/object
    foreach ($user as $userField => $userValue) {
        if ($userField === 'friends') {
            foreach ($userValue as $friend) { // $userValue instanceof Traversable, not an array/object
                foreach ($friend as $friendField => $friendValue) { // $friend instanceof Traversable, not an array/object
                    // do whatever you want here
                }
            }
        }
    }
}
```

> If you break out of the iteration of such a lazy deeper level (e.g. you skip some `"friends"` via `break`)
> and advance to the next value (e.g. the next `user`), you will not be able to iterate the skipped level later.
> JSON Machine must iterate it in the background to be able to read the next value.
> Any later attempt to iterate it will result in a closed generator exception.
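
The reason behind this limitation can be illustrated without JSON Machine. The following is only a minimal sketch with plain PHP generators reading from one shared, forward-only token stream. `readUsers()`, `readFriends()` and the `user:`/`friend:` token format are hypothetical stand-ins invented for this example and are not JSON Machine's parser or API.

```php
<?php

declare(strict_types=1);

/** Hypothetical: yields friend names from the shared, forward-only stream. */
function readFriends(Iterator $tokens): Generator
{
    while ($tokens->valid() && strpos($tokens->current(), 'friend:') === 0) {
        $friend = substr($tokens->current(), strlen('friend:'));
        $tokens->next();
        yield $friend;
    }
}

/** Hypothetical: yields user name => lazy friends generator. */
function readUsers(Iterator $tokens): Generator
{
    while ($tokens->valid()) {
        $name = substr($tokens->current(), strlen('user:'));
        $tokens->next();
        $friends = readFriends($tokens);
        yield $name => $friends;
        // The consumer may have skipped some friends. They must be consumed
        // here, otherwise the stream is not positioned at the next user.
        while ($friends->valid()) {
            $friends->next();
        }
    }
}

$tokens = new ArrayIterator([
    'user:alice', 'friend:bob', 'friend:carol',
    'user:dave', 'friend:erin',
]);

$skipped = null;
$seen = [];
foreach (readUsers($tokens) as $name => $friends) {
    if ($name === 'alice') {
        $skipped = $friends; // kept for later instead of being iterated now
        continue;
    }
    foreach ($friends as $friend) {
        $seen[] = "$name:$friend";
    }
}

$error = null;
try {
    // alice's friends were already consumed in the background
    foreach ($skipped as $friend) {
        $seen[] = "alice:$friend";
    }
} catch (Exception $e) {
    $error = $e->getMessage();
}
```

In this sketch only `dave`'s friends end up in `$seen`, and the late attempt to iterate `alice`'s friends fails because that generator has already run to completion. If a nested level is needed later, it has to be consumed (or copied into an array) before advancing to the next value.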

<a name="json-pointer"></a>
### What is JSON Pointer anyway?
It's a way of addressing one item in JSON document. See the [JSON Pointer RFC 6901](https://tools.ietf.org/html/rfc6901).
@@ -345,6 +378,7 @@ Some examples:
Options may change how a JSON is parsed. Array of options is the second parameter of all `Items::from*` functions.
Available options are:
- `pointer` - A JSON Pointer string that tells which part of the document you want to iterate.
- `recursive` - Bool. Any JSON array/object the parser hits will not be decoded but served lazily as a `Traversable`. Default `false`.
- `decoder` - An instance of `ItemDecoder` interface.
- `debug` - `true` or `false` to enable or disable the debug mode. When the debug mode is enabled, data such as line,
column and position in the document are available during parsing or in exceptions. Keeping debug disabled adds slight
@@ -516,30 +550,14 @@ but you forgot to specify a JSON Pointer. See [Parsing a subtree](#parsing-a-sub
### "That didn't help"
The other reason may be, that one of the items you iterate is itself so huge it cannot be decoded at once.
For example, you iterate over users and one of them has thousands of "friend" objects in it.
Use `PassThruDecoder` which does not decode an item, get the json string of the user
and parse it iteratively yourself using `Items::fromString()`.

```php
<?php

use JsonMachine\Items;
use JsonMachine\JsonDecoder\PassThruDecoder;

$users = Items::fromFile('users.json', ['decoder' => new PassThruDecoder]);
foreach ($users as $user) {
foreach (Items::fromString($user, ['pointer' => "/friends"]) as $friend) {
// process friends one by one
}
}
```
The most efficient solution is to set the `recursive` option to `true`.
See [Recursive iteration](#recursive).

<a name="step3"></a>
### "I am still out of luck"
It probably means that the JSON string `$user` itself or one of the friends are too big and do not fit in memory.
However, you can try this approach recursively. Parse `"/friends"` with `PassThruDecoder` getting one `$friend`
json string at a time and then parse that using `Items::fromString()`... If even that does not help,
there's probably no solution yet via JSON Machine. A feature is planned which will enable you to iterate
any structure fully recursively and strings will be served as streams.
It probably means that a single JSON string is itself too big to fit in memory,
for example a very large base64-encoded file.
In that case you will probably still be out of luck until JSON Machine supports yielding scalar values as PHP streams.

<a name="installation"></a>
## Installation
3 changes: 2 additions & 1 deletion src/Items.php
@@ -63,7 +63,8 @@ public function __construct($bytesIterator, array $options = [])
$this->chunks
),
$this->jsonPointer,
$this->jsonDecoder ?: new ExtJsonDecoder()
$this->jsonDecoder ?: new ExtJsonDecoder(),
$options['recursive']
);
}

6 changes: 6 additions & 0 deletions src/ItemsOptions.php
@@ -69,12 +69,18 @@ private function opt_debug(bool $debug)
        return $debug;
    }

    private function opt_recursive(bool $recursive)
    {
        return $recursive;
    }

    public static function defaultOptions(): array
    {
        return [
            'pointer' => '',
            'decoder' => new ExtJsonDecoder(),
            'debug' => false,
            'recursive' => false,
        ];
    }
}
25 changes: 25 additions & 0 deletions src/JsonDecoder/StringOnlyDecoder.php
@@ -0,0 +1,25 @@
<?php

declare(strict_types=1);

namespace JsonMachine\JsonDecoder;

class StringOnlyDecoder implements ItemDecoder
{
    /** @var ItemDecoder */
    private $innerDecoder;

    public function __construct(ItemDecoder $innerDecoder)
    {
        $this->innerDecoder = $innerDecoder;
    }

    public function decode($jsonValue)
    {
        if (is_string($jsonValue)) {
            return $this->innerDecoder->decode($jsonValue);
        }

        return new ValidResult($jsonValue);
    }
}
94 changes: 94 additions & 0 deletions src/NestedIterator.php
@@ -0,0 +1,94 @@
<?php

declare(strict_types=1);

namespace JsonMachine;

use Iterator;
use JsonMachine\Exception\JsonMachineException;

class NestedIterator implements \RecursiveIterator
{
    /** @var Iterator */
    private $iterator;

    public function __construct(Iterator $iterator)
    {
        $this->iterator = $iterator;
    }

    #[\ReturnTypeWillChange]
    public function current()
    {
        return $this->iterator->current();
    }

    #[\ReturnTypeWillChange]
    public function next()
    {
        $this->iterator->next();
    }

    #[\ReturnTypeWillChange]
    public function key()
    {
        return $this->iterator->key();
    }

    #[\ReturnTypeWillChange]
    public function valid()
    {
        return $this->iterator->valid();
    }

    #[\ReturnTypeWillChange]
    public function rewind()
    {
        $this->iterator->rewind();
    }

    #[\ReturnTypeWillChange]
    public function hasChildren()
    {
        return $this->iterator->current() instanceof Iterator;
    }

    #[\ReturnTypeWillChange]
    public function getChildren()
    {
        return $this->hasChildren() ? new self($this->current()) : null;
    }

    public function advanceToKey($key)
    {
        $iterator = $this->iterator;

        while ($key !== $iterator->key() && $iterator->valid()) {
            $iterator->next();
        }

        if ($key !== $iterator->key()) {
            throw new JsonMachineException("Key '$key' was not found.");
        }

        return $iterator->current();
    }

    public function toArray(): array
    {
        return self::toArrayRecursive($this);
    }

    private static function toArrayRecursive(\Traversable $traversable): array
    {
        $array = [];
        foreach ($traversable as $key => $value) {
            if ($value instanceof \Traversable) {
                $value = self::toArrayRecursive($value);
            }
            $array[$key] = $value;
        }

        return $array;
    }
}
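
One consequence of `advanceToKey()` worth spelling out: it only ever moves the wrapped iterator forward, so a key that has already been passed cannot be found again. The sketch below tries to show this in isolation. It is a hypothetical standalone function that follows the same forward-only logic (with `valid()` checked first), and it assumes a plain generator in place of the parser-backed iterator and `RuntimeException` in place of `JsonMachineException`. The `userFields()` generator and its data are made up for the example.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical standalone version of the forward-only key lookup.
 * Not the NestedIterator method itself.
 */
function advanceToKey(Iterator $iterator, $key)
{
    // Move forward until the key matches or the iterator is exhausted.
    while ($iterator->valid() && $iterator->key() !== $key) {
        $iterator->next();
    }

    if (!$iterator->valid()) {
        throw new RuntimeException("Key '$key' was not found.");
    }

    return $iterator->current();
}

/** Made-up stand-in for the fields of one lazily parsed user object. */
function userFields(): Generator
{
    yield 'id' => 1;
    yield 'name' => 'alice';
    yield 'email' => 'alice@example.com';
}

$user = userFields();

// Skips 'id' and stops at 'name'.
$name = advanceToKey($user, 'name');

$missing = null;
try {
    // 'id' lies behind the current position, so the search runs to the end.
    advanceToKey($user, 'id');
} catch (RuntimeException $e) {
    $missing = $e->getMessage();
}
```

With this behaviour, keys should be requested in the order they appear in the document, and a failed lookup leaves the iterator exhausted.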