Skip to content

Feature | Better Reader#4

Merged
Sammyjo20 merged 9 commits intomainfrom
feature/better-reader
Nov 1, 2023
Merged

Feature | Better Reader#4
Sammyjo20 merged 9 commits intomainfrom
feature/better-reader

Conversation

@Sammyjo20
Copy link
Copy Markdown
Member

@Sammyjo20 Sammyjo20 commented Nov 1, 2023

  • Rewrote the reader to now use veewee/xml "Matchers" and is now much more memory efficient
  • Introduced $asGenerator to methods
  • Switched xpath methods to use veewee/xml wrappers for better error handling

@Sammyjo20 Sammyjo20 force-pushed the feature/better-reader branch from 122aef0 to f247a7f Compare November 1, 2023 22:16
@Sammyjo20 Sammyjo20 merged commit ae08615 into main Nov 1, 2023
@Sammyjo20 Sammyjo20 deleted the feature/better-reader branch November 1, 2023 22:18
Copy link
Copy Markdown
Collaborator

@veewee veewee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some remark here and there. It's a nice improvement and I bet it's much faster than before :)

> Reading the whole XML document may consume a large amount of memory and stil requires diving into the array to get the values you need. You may find one of the methods below to search for a specific element even more useful.
If you are reading a large XML file, you should use the `asGenerator` argument. This will return a generator which can be iterated over only keeping one element in memory at a time.
```php
$elements = $reader->elements(asGenerator: true);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a more intuitive way here could be : $reader->generate()->elements()?
Generate could also be lazy(), memorySafe(), ... according on what aspect of the generator you want to focus on?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this idea - I have implemented it in #6


return $asGenerator === true ? $results() : iterator_to_array($results());
} catch (Throwable $throwable) {
$this->__destruct();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think calling destruct on exception is a good approach here : it will kill the temporary file, making it impossible to recover from the exception on readers based on a stream.

* @throws \Saloon\XmlWrangler\Exceptions\XmlReaderException|\Throwable
*/
public function element(string $name, array $withAttributes = [], bool $nullable = false): Element|array|null
public function element(string $name, array $withAttributes = [], bool $nullable = false, bool $asGenerator = false): Element|Generator|array|null
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list of optional arguments and return types is getting quite long.
It might seem very developer friendly to make the function do whatever you want, but it comes with a lot of complexity. Maybe there should be 2 or more reader implementations instead?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, there are far too many methods on these methods - so I have cleaned this up in #6

return $element[array_key_first($element)];
}, $results);
// We won't continue if the last search term is numeric because
// we will apply this logic later in our code.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a very good idea. Pasting this from DM:
I think the problem here is semantics of the path : You actually want to pick food.2.name as in: Pick food-tag at position 2 and then select the name tag.
Not name.2 (cause there simply is no name-tag at position 2)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in DM's I would like to keep this functionality as it is expected "dot notation" behaviour being able to find a specific element in an array.

}

// Return the results
$results = $this->reader->provide(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some nice matcher composing going on in this method. nice!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

return count($results) === 1 ? $results[0] : $results;
} catch (Exception $exception) {
} catch (Throwable $throwable) {
$this->__destruct();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling destruct in here is not a good idea - same as above.


$results = iterator_to_array($results());

if (empty($results) && $nullable === true) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is quite confusing.
Given you have a dynamic XML, it might change return types based on the amount of content in there.

For example:

If you match for a "static" items.item:
But you have a dynamic XML input:

  • When there are no items, you get []unless it's nullable, then you get null
  • When there is one item, you get the direct content of the parsed item tag
  • When there are two items, you get an array of [item1, item2]

I don't think this is very developer friendly, cause you make it very hard to understand what is going on type-wise. How would the developer have to deal with this situation?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - I have decided that all of these will now return a Query instance which then has methods with expected return types - this should make it a lot easier

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants