Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use streams more efficiently #802

Closed
stephenplusplus opened this issue Aug 15, 2015 · 6 comments
Closed

Use streams more efficiently #802

stephenplusplus opened this issue Aug 15, 2015 · 6 comments
Assignees
Labels

Comments

@stephenplusplus
Copy link
Contributor

Our support for readable streams in our library is now almost everywhere possible, which is awesome. But, the way we use them is not in the most efficient way.

How we do it, using bucket.getFiles for example:

  1. Make API request to server to get all of the files from a bucket
  2. Get a large JSON response with all of the files' metadata
  3. Split the array into transformed "File" objects
  4. Push all of the objects onto a readable stream that the user is reading from
  5. If there's a "nextPageToken", repeat step 1.

How we could do it:

  1. Make streaming API request to server to get all of the files from a bucket
  2. Parse the JSON response as it comes in in chunks
  3. Transform the returned files into "File" objects

How we do it currently (simplified):

bucket.getFiles = function() {
  var stream = through.obj();

  request.get('https://.../bucket/files', function(err, resp) {
    // A potentially huge JSON response exists in memory now
    resp.items.forEach(function(fileMetadata) {
      var file = new File(fileMetadata.name);
      file.metadata = fileMetadata;
      stream.push(file);
    });

    stream.push(null);
  });

  return stream;
};

How it would look (simplified):

bucket.getFiles = function() {
  return request.get('https://...bucket/files')
    .pipe(JSONStream.parse('items.*')) // link to JSONStream below
    .pipe(through.obj(toFileObject));

  function toFileObject(obj, enc, next) {
    var file = new File(obj.name);
    file.metadata = obj;
    this.push(file);
    next();
  }
};

With the second example, the object is only in memory until it is flushed to the next handler in the user's own pipeline. At that point, the user is free to let it buffer up until they're all in or write them to a destination in chunks, not letting any extra memory pile up.

* JSONStream: https://github.com/dominictarr/JSONStream

@stephenplusplus
Copy link
Contributor Author

Here's a test implementation: https://gist.github.com/stephenplusplus/b67bcaf570f33181d297

@stephenplusplus
Copy link
Contributor Author

Moved to a repo for further testing: https://github.com/stephenplusplus/gcloud-streams-test

@stephenplusplus
Copy link
Contributor Author

I think after all of the profiling over in the gcloud-streams-test repo, it's safe to conclude conforming to a more stream-y approach would be nice, but it comes at quite a cost; both hackiness and memory usage. The suggested approach from this issue underperforms our approach in all aspects. The repo will stay open for discussion (see stephenplusplus/gcloud-streams-test#1), but for us right now, I think we should stick with what we're doing.

@jgeewax
Copy link
Contributor

jgeewax commented Aug 31, 2015

I'm confused how a "give me everything, then I'll stream it to you" would be more memory efficient than "when I get a page of items, I'll stream that, and then continue fetching only if you ask me for more"...

I'd expect the "full buffer" implementation to consume more memory, and more CPU intensive. Can you explain why it's the opposite?

@stephenplusplus
Copy link
Contributor Author

I didn't explain the pagination part that we use with our current solution. I'll update the first post.

We get all of the results back that the API gives us, drain it, then if there's another page, we repeat. I haven't seen an API response hike up into the multi-MB payload size, but if it did, it would be quickly drained. Compare that to the all-stream solution, where we spend multi-MB in memory just to create multiple Stream objects. With our current solution, we only need to create one.

Related: https://twitter.com/stephenplusplus/status/634791341172637697

@stephenplusplus
Copy link
Contributor Author

"when I get a page of items, I'll stream that, and then continue fetching only if you ask me for more"...

Sorry, just caught that. They both would behave the same here. If a user manually ends the stream (this.end()), the stream closes, and no more API requests are made.

The solution we're using now might not be right for every case, but I think given our smaller-sized API responses, it's more practical. It would be great to get more eyes on this question though (stephenplusplus/gcloud-streams-test#1) -- I think everyone is just busy, equally stumped, or indifferent :)

sofisl pushed a commit that referenced this issue Nov 10, 2022
* fix(deps): update dependency protobufjs to v7

* fix(deps): do not depend on protobufjs

Co-authored-by: Alexander Fenster <github@fenster.name>
sofisl pushed a commit that referenced this issue Nov 10, 2022
🤖 I have created a release *beep* *boop*
---


## [7.0.1](googleapis/nodejs-translate@v7.0.0...v7.0.1) (2022-08-10)


### Bug Fixes

* **deps:** do not depend on protobufjs ([#802](googleapis/nodejs-translate#802)) ([e8f13e4](googleapis/nodejs-translate@e8f13e4))
* **deps:** update dependency @google-cloud/automl to v3 ([#796](googleapis/nodejs-translate#796)) ([440b1cb](googleapis/nodejs-translate@440b1cb))
* **deps:** update dependency @google-cloud/text-to-speech to v4 ([#797](googleapis/nodejs-translate#797)) ([a210e6e](googleapis/nodejs-translate@a210e6e))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
sofisl pushed a commit that referenced this issue Nov 11, 2022
[![WhiteSource Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [ts-loader](https://togithub.com/TypeStrong/ts-loader) | [`^8.0.0` -> `^9.0.0`](https://renovatebot.com/diffs/npm/ts-loader/8.1.0/9.0.0) | [![age](https://badges.renovateapi.com/packages/npm/ts-loader/9.0.0/age-slim)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://badges.renovateapi.com/packages/npm/ts-loader/9.0.0/adoption-slim)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://badges.renovateapi.com/packages/npm/ts-loader/9.0.0/compatibility-slim/8.1.0)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://badges.renovateapi.com/packages/npm/ts-loader/9.0.0/confidence-slim/8.1.0)](https://docs.renovatebot.com/merge-confidence/) |

---

### Release Notes

<details>
<summary>TypeStrong/ts-loader</summary>

### [`v9.0.0`](https://togithub.com/TypeStrong/ts-loader/blob/master/CHANGELOG.md#v900)

[Compare Source](https://togithub.com/TypeStrong/ts-loader/compare/v8.1.0...v9.0.0)

Breaking changes:

-   minimum webpack version: 5
-   minimum node version: 12

Changes:

-   [webpack 5 migration](https://togithub.com/TypeStrong/ts-loader/pull/1251) - thanks [@&#8203;johnnyreilly](https://togithub.com/johnnyreilly), [@&#8203;jonwallsten](https://togithub.com/jonwallsten), [@&#8203;sokra](https://togithub.com/sokra), [@&#8203;appzuka](https://togithub.com/appzuka), [@&#8203;alexander-akait](https://togithub.com/alexander-akait)

</details>

---

### Configuration

:date: **Schedule**: "after 9am and before 3pm" (UTC).

:vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

:recycle: **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

:no_bell: **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box.

---

This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#github/googleapis/nodejs-dialogflow).
sofisl pushed a commit that referenced this issue Jan 17, 2023
This PR was generated using Autosynth. 🌈

Synth log will be available here:
https://source.cloud.google.com/results/invocations/cc99acfa-05b8-434b-9500-2f6faf2eaa02/targets

- [ ] To automatically regenerate this PR, check this box.

Source-Link: googleapis/synthtool@799d8e6
sofisl pushed a commit that referenced this issue Jan 24, 2023
sofisl pushed a commit that referenced this issue Jan 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants