
createReadStream() #35

Closed
wants to merge 17 commits into base: master

3 participants
@e-e-e
Contributor

e-e-e commented Dec 31, 2017

Hey @mafintosh @noffle,

I started playing around with building a readable stream interface for hyperdb. I started this because createDiffStream is scaling badly, and I wanted to see if I could get a super lean stream working.

This is really rough, and not at all ready for merging (still using arrow functions and riddled with console logs), but I figured I should open a PR to get your input. I have been using @noffle's architecture description as a basis, and reading your code, but I might still be doing something totally stupid.

Any thoughts would be appreciated. It's quick and respects back pressure, so hopefully it will scale well.

One bug that I can't seem to figure out so far is illustrated in the first test - although there are multiple 'bar/cat' keys, it's only returning the latest, while for all other keys (for example foo/) the duplicates are returned. UPDATE: I changed the implementation of the read stream so that it only returns the latest entries.

Contributor

e-e-e commented Dec 31, 2017

also - happy new year 🍾
hope you're taking a break and relaxing.


Contributor

e-e-e commented Jan 4, 2018

@mafintosh @noffle I think I have got this to a point where it is working well. Would you mind having a look to see if there is anything obviously dumb, or if I have tackled this from completely the wrong angle?

There may be edge cases around conflicting nodes which I have not thought through.

I also have to write some nicer tests - the ones there at the moment are just what I was using while testing things for myself.

Let me know your thoughts.

@joehand referenced this pull request Jan 4, 2018

Closed

Two way replication #11

@e-e-e e-e-e changed the title from createReadStream() [WIP] to createReadStream() Jan 7, 2018


Contributor

e-e-e commented Jan 7, 2018

@noffle and @mafintosh I think this is ready now - if you want to take a look. I just experimented with this implementation of createReadStream on an instance of hyper-graph-db with 200k+ entries, and got search times down from 30-40s to 1-3s. It's really quick.


Collaborator

noffle commented Jan 7, 2018

Awesome work @e-e-e! I've been following your progress somewhat, though I'm not sure when I'll have time to go over it in detail.


Contributor

e-e-e commented Jan 8, 2018

I noticed that the node queue sort was a performance bottleneck, so I implemented a conditional check so that it only sorts when the next node is less than the last node in the queue. Strangely, it now looks like the queue does not actually need to be sorted at all - the next algorithm seemingly always returns nodes in order. I am skeptical that this is always the case, so I have left the sort as part of the stream - but perhaps you will have a better idea, @mafintosh, and maybe we don't need it at all.
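The conditional-sort idea can be sketched roughly like this (names are illustrative, not the PR's actual code): only re-sort the queue when a newly pushed node is out of order relative to the current tail, so an already-ordered source never pays for sorting.

```javascript
// Sketch of the conditional-sort optimization. `cmp` is the node
// comparator; the queue is assumed sorted before each push.
function pushSorted (queue, node, cmp) {
  const last = queue[queue.length - 1]
  queue.push(node)
  // Only sort if the new node breaks the order at the tail. If the
  // source happens to always yield nodes in order, this never fires
  // and every push is O(1).
  if (last !== undefined && cmp(node, last) < 0) {
    queue.sort(cmp)
  }
  return queue
}

// Usage with a numeric comparator standing in for node comparison:
const cmp = (a, b) => a - b
const queue = []
;[3, 5, 4, 7].forEach(n => pushSorted(queue, n, cmp))
// queue is now sorted; only the push of 4 triggered a sort
```

Keeping the guarded sort in place is a reasonable hedge: it costs nothing while the source stays ordered, but still protects correctness if an out-of-order node ever shows up.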


Owner

mafintosh commented Mar 12, 2018

We merged next :)

@mafintosh mafintosh closed this Mar 12, 2018
