Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add batch support in createReadStream #261

Merged
merged 10 commits into from
May 19, 2020

Conversation

tinchoz49
Copy link
Contributor

@tinchoz49 tinchoz49 commented May 13, 2020

It adds support for batch reading in createReadStream and improve the performance.

It's an initial idea and feedback is welcome.

Usage

feed.createReadStream({ batch: 100 }) // read batch of 100 messages

Benchmark

> node read-64kb-blocks-linear.js
685658891 bytes/s
10448 blocks/s

> node read-64kb-blocks-linear-batch.js // batch 100 
1080223162 bytes/s
16384 blocks/s

Todo

  • Add test (probably I will use the same tests that we use to test createReadStream)

@tinchoz49 tinchoz49 marked this pull request as draft May 14, 2020 02:29
@tinchoz49
Copy link
Contributor Author

tinchoz49 commented May 14, 2020

My concern about this is related with the download progress. It's not the same to wait for a single message to be download than 100 messages.

Maybe I could check if I have the range of messages downloaded for the batch I choose that option and if is not I use get.

@tinchoz49 tinchoz49 marked this pull request as ready for review May 14, 2020 18:43
@tinchoz49 tinchoz49 force-pushed the tinchoz49/batch-createreadstream branch from 47376fa to df7723b Compare May 15, 2020 14:02
@tinchoz49
Copy link
Contributor Author

My concern about this is related with the download progress. It's not the same to wait for a single message to be download than 100 messages.
Maybe I could check if I have the range of messages downloaded for the batch I choose that option and if is not I use get.

Now I check for the range downloaded and if is not downloaded it use get

@mafintosh
Copy link
Contributor

@tinchoz49 perhaps we could do a more opportunistic thing where we just check how many messages we have on disk compared to where we are and let's say we have the next 100 messages already downloaded we use the batch api to get them out

CHANGELOG.md Outdated
@@ -9,3 +9,4 @@ See [UPGRADE.md](UPGRADE.md) for notes on breaking changes for downstream develo
- Ease of use update to signatures, https://github.com/mafintosh/hypercore/issues/260
- Updates [noise-protocol](https://github.com/emilbayes/noise-protocol) to latest, which uses chacha instead of xchacha and moves avoid from the sodium kx api for better compatability with the rest of the Noise ecosystem.
- Updates [sodium-native](https://github.com/sodium-friends/sodium-native) from 2 to 3 across the board. 3 uses n-api, so no more builds needed when Node is released.
- Update the discovery key to hash in the protocol version for better migration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is from the first commit which i force pushed out. can you remove that commit as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

droped it

index.js Outdated
@@ -1346,6 +1376,16 @@ Feed.prototype.createReadStream = function (opts) {
read(size, cb)
})
}

function setStart (value) {
var _start = start
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_start -> prevStart

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

r.pipe(w).on('finish', function () {
collect(feed2.createReadStream({ batch }), function (err, data) {
t.error(err, 'no error')
t.same(data, [bufferFrom('hello'), bufferFrom('world')])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR this made me realise we need this #265

@mafintosh
Copy link
Contributor

Looks GREAT @tinchoz49 !! Seems to be much faster as well.
I've added a few nits if you wanna take a look.

@tinchoz49 tinchoz49 force-pushed the tinchoz49/batch-createreadstream branch from afaaaea to 934a7e4 Compare May 18, 2020 19:18
@tinchoz49 tinchoz49 requested a review from mafintosh May 18, 2020 19:22
@mafintosh
Copy link
Contributor

@tinchoz49 Thanks, I'll land and release this tmw :)

If you wanna post your perf stats here we can include it in the changelog

@tinchoz49
Copy link
Contributor Author

Awesome! thanks for taking the time to review this.

Read 100000 Messages

  • createReadStream without batch: 867.438ms
  • createReadStream with batch = 100: 113.650ms

@tinchoz49
Copy link
Contributor Author

I forgot to add the batch option to the readme

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants