Iterator and Table interface #6

Closed
Moelf opened this issue Jul 1, 2020 · 1 comment
Labels
enhancement New feature or request io Input/Output related

@Moelf
Member

Moelf commented Jul 1, 2020

An iterator interface would allow a smaller memory footprint by avoiding materializing the entire array in memory. It would also let us feed the values into something like OnlineStats.
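To illustrate the memory argument, here is a minimal sketch of a statistic computed in a single pass over a lazy iterator. It is a hand-rolled running mean, not the OnlineStats API, and `lazy_values` is a hypothetical stand-in for a branch iterator:

```julia
# Hypothetical stand-in for a branch iterator: values are produced lazily,
# one at a time, so no array of length n is ever allocated.
lazy_values(n) = (sqrt(i) for i in 1:n)

# Running mean with an incremental update. Only two numbers are kept as
# state, regardless of how many values the iterator yields.
# Returns 0.0 for an empty iterator.
function running_mean(itr)
    n = 0
    mean = 0.0
    for x in itr
        n += 1
        mean += (x - mean) / n
    end
    return mean
end

running_mean(lazy_values(1_000_000))
```

The same loop works for any object that implements `Base.iterate`, which is what a branch iterator would provide.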

I have been trying to implement one for a few days now, but I find fBasketSeek very annoying to work with and would like some suggestions. I'm not sure what the canonical way to do this would be, but here is my attempt:

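struct BRANCH_ITR
    io
    seeks
    thetype
    count::Int
end

function Base.iterate(S::BRANCH_ITR, idx=1)
    # Check the bound before indexing into S.seeks
    if idx > S.count
        return nothing
    end
    basket_seek = S.seeks[idx]
    if basket_seek == 0
        # Yield `nothing` for this entry and move on to the next basket
        return (nothing, idx+1)
    else
        # NOTE: basketkey is not defined anywhere in this snippet
        s = datastream(S.io, basketkey)
        # idx is not advanced here, and the position inside `s` is lost on return
        return (readtype(s, S.thetype), idx)
    end
end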

This doesn't work because readtype advances the cursor, and we lose track of it by the next iteration. Where would you suggest keeping track of:

  • basket cursor
  • cursor within the basket?
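One possible answer, sketched under assumptions: Julia's iteration protocol lets the state be any value, so both cursors can travel in the state tuple instead of living in the struct or the io. In the sketch below `BasketIterator` is hypothetical, and each basket is represented by an already decoded vector; in UnROOT the state would presumably carry a stream position instead.

```julia
# Hypothetical iterator over a list of baskets. Each basket is modelled as a
# plain vector of decoded values; an empty vector plays the role of a basket
# that has nothing to read.
struct BasketIterator{T}
    baskets::Vector{Vector{T}}
end

# The total number of elements is not known without visiting every basket.
Base.IteratorSize(::Type{<:BasketIterator}) = Base.SizeUnknown()
Base.eltype(::Type{BasketIterator{T}}) where {T} = T

# The state is (basket index, position within that basket).
function Base.iterate(it::BasketIterator, state=(1, 1))
    basket_idx, pos = state
    while basket_idx <= length(it.baskets)
        basket = it.baskets[basket_idx]
        if pos <= length(basket)
            # Stay in the same basket and advance the inner cursor.
            return (basket[pos], (basket_idx, pos + 1))
        end
        # Basket exhausted (or empty): move to the next one and reset the inner cursor.
        basket_idx += 1
        pos = 1
    end
    return nothing
end

collect(BasketIterator([[1, 2], Int[], [3]]))
```

Because the state is returned to the caller and passed back in on the next call, nothing is lost between iterations and the iterator itself stays immutable.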

See also https://quinnj.home.blog/2020/11/13/partition-all-the-datas/. We could leverage this, since ROOT provides a natural partitioning of the data too.

@tamasgal
Member

Sorry @Moelf I was very busy with work stuff and put UnROOT on hold (a bit selfish, but currently UnROOT works for my own analysis, so I put it on low prio in favour of finalising my PhD... 🙈 )

Anyways, the cursor interface needs to be fixed. I was playing with two different ways to implement it and the newer one is thread-safe and I think it will also easily solve the issue with your iterator interface.

So in fact, readtype() should use a copy of the Cursor instead of the actual io. What do you think?
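A rough sketch of that idea, assuming a cursor is just an offset into a shared byte buffer. `Cursor` and `read_at` are hypothetical names for illustration, not UnROOT's actual types or functions:

```julia
# Hypothetical cursor: an offset into a shared byte buffer. The struct is
# immutable, so a reader cannot advance somebody else's cursor.
struct Cursor
    pos::Int               # 1-based offset of the next unread byte
    data::Vector{UInt8}    # shared buffer, never mutated by readers
end

# Read one big-endian integer of type T at the cursor position.
# Returns the value together with a NEW cursor pointing past it;
# the cursor passed in is left unchanged.
function read_at(c::Cursor, ::Type{T}) where {T<:Union{Int32,UInt32,Int64,UInt64}}
    n = sizeof(T)
    bytes = c.data[c.pos:c.pos+n-1]
    value = ntoh(reinterpret(T, bytes)[1])
    return value, Cursor(c.pos + n, c.data)
end

buf = UInt8[0, 0, 0, 1, 0, 0, 0, 2]   # two big-endian Int32 values: 1 and 2
start = Cursor(1, buf)
first_value, after_first = read_at(start, Int32)
second_value, after_second = read_at(after_first, Int32)
```

Since every read returns its own cursor, an iterator can keep the returned cursor in its state, and several tasks can read from the same buffer without sharing a mutable position.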

I'd also like to add caching at some point.

@tamasgal tamasgal added enhancement New feature or request io Input/Output related labels Jul 27, 2020
@tamasgal tamasgal added this to the Version 1.0 milestone Jul 27, 2020
@Moelf Moelf changed the title Iterator interface Iterator and Table interface Nov 15, 2020
@Moelf Moelf closed this as completed in 0338002 Jul 8, 2021
2 participants