Use streams more efficiently #802

stephenplusplus · 2015-08-15T23:58:41Z

Our support for readable streams in our library is now almost everywhere possible, which is awesome. But, the way we use them is not in the most efficient way.

How we do it, using bucket.getFiles for example:

Make API request to server to get all of the files from a bucket
Get a large JSON response with all of the files' metadata
Split the array into transformed "File" objects
Push all of the objects onto a readable stream that the user is reading from
If there's a "nextPageToken", repeat step 1.

How we could do it:

Make streaming API request to server to get all of the files from a bucket
Parse the JSON response as it comes in in chunks
Transform the returned files into "File" objects

How we do it currently (simplified):

bucket.getFiles = function() {
  var stream = through.obj();

  request.get('https://.../bucket/files', function(err, resp) {
    // A potentially huge JSON response exists in memory now
    resp.items.forEach(function(fileMetadata) {
      var file = new File(fileMetadata.name);
      file.metadata = fileMetadata;
      stream.push(file);
    });

    stream.push(null);
  });

  return stream;
};

How it would look (simplified):

bucket.getFiles = function() {
  return request.get('https://...bucket/files')
    .pipe(JSONStream.parse('items.*')) // link to JSONStream below
    .pipe(through.obj(toFileObject));

  function toFileObject(obj, enc, next) {
    var file = new File(obj.name);
    file.metadata = obj;
    this.push(file);
    next();
  }
};

With the second example, the object is only in memory until it is flushed to the next handler in the user's own pipeline. At that point, the user is free to let it buffer up until they're all in or write them to a destination in chunks, not letting any extra memory pile up.

* JSONStream: https://github.com/dominictarr/JSONStream

stephenplusplus · 2015-08-16T02:15:13Z

Here's a test implementation: https://gist.github.com/stephenplusplus/b67bcaf570f33181d297

stephenplusplus · 2015-08-21T00:44:01Z

Moved to a repo for further testing: https://github.com/stephenplusplus/gcloud-streams-test

stephenplusplus · 2015-08-31T13:08:53Z

I think after all of the profiling over in the gcloud-streams-test repo, it's safe to conclude conforming to a more stream-y approach would be nice, but it comes at quite a cost; both hackiness and memory usage. The suggested approach from this issue underperforms our approach in all aspects. The repo will stay open for discussion (see stephenplusplus/gcloud-streams-test#1), but for us right now, I think we should stick with what we're doing.

jgeewax · 2015-08-31T15:32:46Z

I'm confused how a "give me everything, then I'll stream it to you" would be more memory efficient than "when I get a page of items, I'll stream that, and then continue fetching only if you ask me for more"...

I'd expect the "full buffer" implementation to consume more memory, and more CPU intensive. Can you explain why it's the opposite?

stephenplusplus · 2015-08-31T15:59:17Z

I didn't explain the pagination part that we use with our current solution. I'll update the first post.

We get all of the results back that the API gives us, drain it, then if there's another page, we repeat. I haven't seen an API response hike up into the multi-MB payload size, but if it did, it would be quickly drained. Compare that to the all-stream solution, where we spend multi-MB in memory just to create multiple Stream objects. With our current solution, we only need to create one.

Related: https://twitter.com/stephenplusplus/status/634791341172637697

stephenplusplus · 2015-08-31T16:12:00Z

"when I get a page of items, I'll stream that, and then continue fetching only if you ask me for more"...

Sorry, just caught that. They both would behave the same here. If a user manually ends the stream (this.end()), the stream closes, and no more API requests are made.

The solution we're using now might not be right for every case, but I think given our smaller-sized API responses, it's more practical. It would be great to get more eyes on this question though (stephenplusplus/gcloud-streams-test#1) -- I think everyone is just busy, equally stumped, or indifferent :)

* fix(deps): update dependency protobufjs to v7 * fix(deps): do not depend on protobufjs Co-authored-by: Alexander Fenster <github@fenster.name>

🤖 I have created a release *beep* *boop* --- ## [7.0.1](googleapis/nodejs-translate@v7.0.0...v7.0.1) (2022-08-10) ### Bug Fixes * **deps:** do not depend on protobufjs ([#802](googleapis/nodejs-translate#802)) ([e8f13e4](googleapis/nodejs-translate@e8f13e4)) * **deps:** update dependency @google-cloud/automl to v3 ([#796](googleapis/nodejs-translate#796)) ([440b1cb](googleapis/nodejs-translate@440b1cb)) * **deps:** update dependency @google-cloud/text-to-speech to v4 ([#797](googleapis/nodejs-translate#797)) ([a210e6e](googleapis/nodejs-translate@a210e6e)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

[![WhiteSource Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com) This PR contains the following updates: | Package | Change | Age | Adoption | Passing | Confidence | |---|---|---|---|---|---| | [ts-loader](https://togithub.com/TypeStrong/ts-loader) | [`^8.0.0` -> `^9.0.0`](https://renovatebot.com/diffs/npm/ts-loader/8.1.0/9.0.0) | [![age](https://badges.renovateapi.com/packages/npm/ts-loader/9.0.0/age-slim)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://badges.renovateapi.com/packages/npm/ts-loader/9.0.0/adoption-slim)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://badges.renovateapi.com/packages/npm/ts-loader/9.0.0/compatibility-slim/8.1.0)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://badges.renovateapi.com/packages/npm/ts-loader/9.0.0/confidence-slim/8.1.0)](https://docs.renovatebot.com/merge-confidence/) | --- ### Release Notes <details> <summary>TypeStrong/ts-loader</summary> ### [`v9.0.0`](https://togithub.com/TypeStrong/ts-loader/blob/master/CHANGELOG.md#v900) [Compare Source](https://togithub.com/TypeStrong/ts-loader/compare/v8.1.0...v9.0.0) Breaking changes: - minimum webpack version: 5 - minimum node version: 12 Changes: - [webpack 5 migration](https://togithub.com/TypeStrong/ts-loader/pull/1251) - thanks [@johnnyreilly](https://togithub.com/johnnyreilly), [@jonwallsten](https://togithub.com/jonwallsten), [@sokra](https://togithub.com/sokra), [@appzuka](https://togithub.com/appzuka), [@alexander-akait](https://togithub.com/alexander-akait) </details> --- ### Configuration :date: **Schedule**: "after 9am and before 3pm" (UTC). :vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied. :recycle: **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox. :no_bell: **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] If you want to rebase/retry this PR, check this box. --- This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#github/googleapis/nodejs-dialogflow).

This PR was generated using Autosynth. 🌈 Synth log will be available here: https://source.cloud.google.com/results/invocations/cc99acfa-05b8-434b-9500-2f6faf2eaa02/targets - [ ] To automatically regenerate this PR, check this box. Source-Link: googleapis/synthtool@799d8e6

🤖 I have created a release \*beep\* \*boop\* --- ## [4.9.0](https://www.github.com/googleapis/nodejs-speech/compare/v4.8.1...v4.9.0) (2021-09-20) ### Features * add output_config request field ([#801](https://www.github.com/googleapis/nodejs-speech/issues/801)) ([b4f8cbf](https://www.github.com/googleapis/nodejs-speech/commit/b4f8cbff71e3cbdce4d7fb7cbe67f7065f4c6a2b)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

stephenplusplus added the core label Aug 15, 2015

stephenplusplus closed this as completed Aug 31, 2015

JustinBeckwith assigned stephenplusplus Feb 1, 2021

sofisl mentioned this issue Nov 10, 2022

migrate code from googleapis/nodejs-translate #3520

Merged

4 tasks

sofisl pushed a commit that referenced this issue Nov 10, 2022

fix(deps): do not depend on protobufjs (#802)

b13ea40

* fix(deps): update dependency protobufjs to v7 * fix(deps): do not depend on protobufjs Co-authored-by: Alexander Fenster <github@fenster.name>

release-please bot mentioned this issue Nov 10, 2022

chore: release main #3529

Merged

sofisl mentioned this issue Nov 11, 2022

migrate code from googleapis/nodejs-dialogflow #3546

Merged

4 tasks

sofisl mentioned this issue Jan 17, 2023

migrate code from googleapis/nodejs-vision #3860

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use streams more efficiently #802

Use streams more efficiently #802

stephenplusplus commented Aug 15, 2015

stephenplusplus commented Aug 16, 2015

stephenplusplus commented Aug 21, 2015

stephenplusplus commented Aug 31, 2015

jgeewax commented Aug 31, 2015

stephenplusplus commented Aug 31, 2015

stephenplusplus commented Aug 31, 2015

Use streams more efficiently #802

Use streams more efficiently #802

Comments

stephenplusplus commented Aug 15, 2015

stephenplusplus commented Aug 16, 2015

stephenplusplus commented Aug 21, 2015

stephenplusplus commented Aug 31, 2015

jgeewax commented Aug 31, 2015

stephenplusplus commented Aug 31, 2015

stephenplusplus commented Aug 31, 2015