Can you tee multiple times #134

Open
Kreijstal opened this issue Feb 18, 2022 · 4 comments

Comments

@Kreijstal

Can you tee multiple times without fear of a memory leak, or is this simply the wrong way of doing things?

@make-github-pseudonymous-again
Member

make-github-pseudonymous-again commented Feb 18, 2022

You cannot do tee(x) twice if x is an iterator. Or do you mean tee(tee(x, 2)[0], 2)?

If the input iterable is an iterator (or generator), then the caller must discard it (read: not touch it) after calling tee. So in a sense tee is destructive. But you can make as many copies as you want from one source using the second parameter: tee(iterator, 5) for five copies, for instance. For simplicity, you can see tee as swallowing the source and outputting two or more new sources. After calling tee, look only at the output sources; don't look at the input source ever again.

Other than that, tee is certainly composable. tee returns an Iterable of Iterables each of which can be fed to tee again if you want to. Nothing strange will happen if you do that. There is no limitation. Or rather, memory is the limit!
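To make the "swallow the source, output new sources" picture concrete, here is a minimal sketch of a tee-like helper. The name `teeSketch` and the helpers `naturals` and `take` are made up for this illustration. The per-copy queues are a deliberately naive choice and are not claimed to match how the library's tee works internally. Only the call shape, an iterable plus a number of copies, is taken from the discussion above.

```javascript
// Hypothetical sketch of a tee-like helper; NOT the library's actual implementation.
function teeSketch(iterable, n = 2) {
  // From here on, the source must only be advanced through this iterator.
  const iterator = iterable[Symbol.iterator]();
  // One FIFO queue of pending values per copy (naive: n references per value).
  const queues = Array.from({ length: n }, () => []);
  let done = false;

  function* makeCopy(queue) {
    while (true) {
      if (queue.length === 0) {
        if (done) return;
        const result = iterator.next();
        if (result.done) {
          done = true;
          return;
        }
        // Hand the freshly read value to every copy, including this one.
        for (const q of queues) q.push(result.value);
      }
      yield queue.shift();
    }
  }

  return queues.map((queue) => makeCopy(queue));
}

// Illustration helper: an infinite source.
function* naturals() {
  let i = 0;
  while (true) yield i++;
}

// Illustration helper: read at most k values without closing the iterator.
function take(iterator, k) {
  const out = [];
  for (let i = 0; i < k; i++) {
    const { value, done } = iterator.next();
    if (done) break;
    out.push(value);
  }
  return out;
}

// Several copies from one source via the second parameter.
const [a, b, c] = teeSketch(naturals(), 3);
const firstA = take(a, 3);
const firstB = take(b, 2);
const firstC = take(c, 4);

// Composition: an output of one call can be fed to another call.
const [left, right] = teeSketch(naturals(), 2);
const [left1, left2] = teeSketch(left, 2); // `left` must not be touched after this
const fromLeft1 = take(left1, 2);
const fromLeft2 = take(left2, 3);
const fromRight = take(right, 3);
```

Each copy sees the full sequence from the start, regardless of how far the other copies have advanced. Reading from `left` directly after the second call would silently steal values from `left1` and `left2`, which is why the input has to be discarded.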

Regarding memory leaks: the current implementation keeps a copy of each yielded element for as long as needed, but no longer. "As long as needed" here means that if you consume all copies in "parallel", then only constant additional space is used for storage. On the other hand, if you keep an unconsumed copy somewhere, then a single copy of each element will be kept in memory (linear additional space) until that last unconsumed copy gets garbage collected.
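The difference between the two cases can be observed with an instrumented sketch. Everything below is an assumption made for illustration: `teeWithStats`, its shared buffer, and the `buffered()` counter are hypothetical and do not describe the library's internals. In particular, this sketch only frees values once every copy has read them. It does not detect a copy being garbage collected, which the real implementation is said to handle.

```javascript
// Hypothetical instrumented tee; NOT the library's actual implementation.
function teeWithStats(iterable, n = 2) {
  const iterator = iterable[Symbol.iterator]();
  const buffer = []; // shared: values not yet read by every copy
  let base = 0; // absolute index of buffer[0] in the source sequence
  const positions = new Array(n).fill(0); // absolute read position per copy
  let done = false;

  // Drop every value that all copies have already read.
  function trim() {
    const slowest = Math.min(...positions);
    buffer.splice(0, slowest - base);
    base = slowest;
  }

  function* makeCopy(id) {
    while (true) {
      if (positions[id] === base + buffer.length) {
        // This copy is at the front: pull a new value from the source.
        if (done) return;
        const result = iterator.next();
        if (result.done) {
          done = true;
          return;
        }
        buffer.push(result.value);
      }
      const value = buffer[positions[id] - base];
      positions[id] += 1;
      trim();
      yield value;
    }
  }

  const copies = positions.map((_, id) => makeCopy(id));
  return { copies, buffered: () => buffer.length };
}

// Illustration helper: an infinite source.
function* naturals() {
  let i = 0;
  while (true) yield i++;
}

// Case 1: both copies advance in lockstep, so the buffer stays tiny.
const lockstep = teeWithStats(naturals(), 2);
let peakLockstep = 0;
for (let i = 0; i < 1000; i++) {
  lockstep.copies[0].next();
  peakLockstep = Math.max(peakLockstep, lockstep.buffered());
  lockstep.copies[1].next();
}

// Case 2: one copy is never consumed, so every value read must be retained.
const lagging = teeWithStats(naturals(), 2);
for (let i = 0; i < 1000; i++) lagging.copies[0].next();
const heldWhileLagging = lagging.buffered();
```

In the lockstep case the buffer never holds more than one value, while in the lagging case it holds all 1000 values read so far. With a source that never ends, the second case grows without bound.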

Does that answer your question?

@Kreijstal
Author

Yeah, that's what I was afraid of. I want to tee an iterator multiple times, and it is not guaranteed to end. I suppose it is easier to simply copy the information into an array as soon as it's read and operate from there, instead of merely teeing multiple times. Thanks.

@make-github-pseudonymous-again
Member

Yep. That's a solution! Keeping a copy of the data read so far is definitely the only solution if you do not know in advance how many times you will have to tee. You can do it with an ad hoc implementation. Alternatively, you can achieve the same result by always calling tee with one more copy than you actually need, and forwarding this copy untouched for the entire duration of your process. For instance:

let source = ...; // any iterable; can be infinite
let copy1, copy2, copy3, copy4, copy5; // declare as many as you like
[source, copy1, copy2] = tee(source, 3); // source now refers to a fresh, untouched copy
// ...
[source, copy3, copy4, copy5] = tee(source, 4); // rebind source again
// ... etc.

Note the important bit where source gets a reference to a new iterator each time you call tee. Under the hood, only one copy of each element read so far will be kept in memory.
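For comparison, the ad hoc alternative mentioned above (keeping every value read so far in an array) could look roughly like the following sketch. The class name `ReplayBuffer`, its `reader` method, and the helpers `naturals` and `take` are all hypothetical names chosen for this illustration. The trade-off is that nothing is ever released: memory grows linearly with the number of values read, in exchange for being able to create a new reader at any time.

```javascript
// Hypothetical ad hoc alternative to repeated tee calls.
class ReplayBuffer {
  constructor(iterable) {
    this.iterator = iterable[Symbol.iterator]();
    this.seen = []; // every value read from the source so far
    this.done = false;
  }

  // Create an independent reader starting at index `start` of the sequence.
  *reader(start = 0) {
    for (let i = start; ; i++) {
      // Pull from the source only when this reader is ahead of the array.
      while (i >= this.seen.length) {
        if (this.done) return;
        const result = this.iterator.next();
        if (result.done) {
          this.done = true;
          return;
        }
        this.seen.push(result.value);
      }
      yield this.seen[i];
    }
  }
}

// Illustration helper: an infinite source.
function* naturals() {
  let i = 0;
  while (true) yield i++;
}

// Illustration helper: read at most k values without closing the iterator.
function take(iterator, k) {
  const out = [];
  for (let i = 0; i < k; i++) {
    const { value, done } = iterator.next();
    if (done) break;
    out.push(value);
  }
  return out;
}

const replay = new ReplayBuffer(naturals());
const early = replay.reader();
const earlyValues = take(early, 3);

// A reader created later still starts from the beginning.
const late = replay.reader();
const lateValues = take(late, 5);

// Only one copy of each value is stored, however many readers exist.
const storedCount = replay.seen.length;
```

Unlike the tee-based pattern, readers here never need to be declared in advance, and a reader can also start from an arbitrary offset by passing `start`.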

@make-github-pseudonymous-again
Member

PS: Actually, in the example given above, if you consume any of copy3, copy4, or copy5 faster than copy1 and copy2, you will start storing read elements twice. Moreover, you'll incur some additional overhead each time you call tee. I have an idea on how to make these problems disappear. I will see if it can be implemented easily.
