
Ensure backpressure is maintained in streams #71

Open · rubensworks opened this issue Nov 12, 2020 · 3 comments
Labels: enhancement (New feature or request)
rubensworks (Owner) commented Nov 12, 2020

There may be some places where we are not adhering to Node's backpressure conventions with regards to streams.

Concretely, we seem to be creating new streams (such as Readable, PassThrough and Transform), and push-ing into them (via a 'data' handler on another stream). Node handles backpressuring via the return value of push, which we are ignoring in this manner.

A better solution would be to simply pipe instead of calling push on each data element.

Related to rubensworks/rdf-parse.js@269c757

Related to #66

@rubensworks rubensworks added the enhancement New feature or request label Nov 12, 2020
Tpt (Contributor) commented Aug 16, 2022

I just had a quick look at it. I see two places where there might be significant problems with backpressure:

  1. JsonLdParser.import does not care about backpressure at all. A quick way to add backpressure there would be to use the pipe method on the input stream when available.
  2. If no context is provided by the caller and the parser is not in streaming mode, the JSON-LD parser buffers all data and only starts parsing once the JSON has been fully read. It then pushes all data at once when the last piece of JSON is read. I am not sure how bad ignoring backpressure is there, given that the complete JSON is already buffered in memory.

rubensworks (Owner, Author) commented:

> If no context is provided by the caller and the parser is not in streaming mode, the JSON-LD parser buffers all data and only starts parsing once the JSON has been fully read. It then pushes all data at once when the last piece of JSON is read. I am not sure how bad ignoring backpressure is there, given that the complete JSON is already buffered in memory.

Indeed, unless the streaming profile is enabled, the full JSON-LD document is stored in memory before processing can start.

This is because without the streaming profile (which mandates that certain JSON keys come in specific orders), the following situation could occur:

{
  ... very long JSON-LD document
  "@context": "..."
}

Since the context in the example above only appears at the end, it could change the meaning of all preceding entries; that is why buffering must take place.
With the streaming profile, on the other hand, the @context MUST occur first (or after @type).
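For contrast, a hypothetical document in streaming order looks like this (the context URL and entries are illustrative only); the context is available before any entries that depend on it, so parsing can emit quads as the JSON is read:

```json
{
  "@context": "http://schema.org/",
  "@type": "Person",
  "name": "Jane Doe"
}
```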

Just pinging @wouterbeek and @LaurensRietveld on this as well, as this will have an impact on the bounty's resolution. I.e., if the streaming profile is not explicitly enabled, full buffering will take place, and memory issues can still arise.

rubensworks pushed a commit that referenced this issue Aug 17, 2022
LaurensRietveld commented:
Hi @rubensworks and @Tpt, having to explicitly enable the streaming profile in order to stream through the JSON-LD properly (i.e., without full buffering) is fine from our end.
