Block management #135
Comments
In any block management algorithm, the final write of the file could come at a time when some blocks in the file are unused. The ROOT format allows the locations of these blocks to be saved to the file so that a subsequent process can pick up on them and use them.
So the |
I think this is described in e.g. Knuth's TAOCP Vol.1 2.5 "Dynamic Storage Allocation" |
I don't know, but I think dynamic storage allocation usually refers to allocation of RAM memory (like what |
Mkay, I didn't learn "Dynamic Storage Allocation" by heart and only skimmed through it, but what I've seen about managing free blocks using linked lists, to me looked very similar to https://root.cern.ch/doc/master/classTFree.html and https://root.cern.ch/doc/master/classTFree.html#aee8b83cc7f0d5729c6ca8e2e06c32cef |
You're right: There's a lot of freedom in how this algorithm is written. It doesn't have to act like ROOT, but it does have to save data with the same meaning or a compatible meaning for the next invocation. (When I was scanning through the list of open issues, I skipped by this one because it's a harder problem that requires understanding of more interrelated things. The TEfficiency one is somewhat more isolated, and would therefore make an easier stepping stone. But of course, it's partially up to you—partially because Reik is currently the assignee; he may have plans for this one.) |
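To make the "save data with a compatible meaning" point concrete, here is a minimal sketch of writing a list of free segments to bytes and reading it back. The byte layout (a big-endian count followed by pairs of 64-bit offsets) and the names `pack_free_segments` and `unpack_free_segments` are assumptions made for illustration; this is not ROOT's actual on-disk free-segment record, which a real implementation would have to match.

```python
import struct


def pack_free_segments(segments):
    """Serialize [(first, last), ...] inclusive byte ranges.

    Hypothetical layout: uint32 count, then (uint64 first, uint64 last) pairs,
    all big-endian. NOT the ROOT file format.
    """
    out = [struct.pack(">I", len(segments))]
    for first, last in segments:
        if first > last:
            raise ValueError(f"invalid segment: {first} > {last}")
        out.append(struct.pack(">QQ", first, last))
    return b"".join(out)


def unpack_free_segments(data):
    """Inverse of pack_free_segments: returns a list of (first, last) tuples."""
    (count,) = struct.unpack_from(">I", data, 0)
    return [struct.unpack_from(">QQ", data, 4 + 16 * i) for i in range(count)]


segments = [(100, 199), (4096, 8191)]
blob = pack_free_segments(segments)
restored = unpack_free_segments(blob)
```

Whatever allocation strategy one process uses, the next process only needs `restored` to describe the same unused ranges.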
I was going to come to this after most of the other issues assigned to me are closed so feel free to tackle this issue if you want to :) |
Actually implementing it will require a thorough understanding of the existing codebase, which would be hard to just jump into. This is something that would be easier to sketch out as an independent script—prototyping would be much faster than if it had to be embedded in the real code. I had to do exactly this to solve the However, the danger of this is that @anerokhi will solve the wrong problem, because there would be no constraint against that. For the |
Well, I was not going to solve this issue, at least not in the next 6 months. |
:) "Danger" that you would spend time on something that wouldn't be useful. |
Currently, reusing a key name or deleting a key reuses the space used by the key but not the value. The behavior is formally correct, but it is inefficient: assigning new histograms to the same name will make the output file length grow indefinitely. To handle this properly, we need to view the linear sequence of bytes in a file as reusable blocks, rather than a stream.
There are many algorithms that do this. ROOT must have one, for instance, and this is essentially what `malloc` does with RAM memory. This is an open-ended project involving some thinking about design. These systems are prone to fragmentation; perhaps introducing a (not too small, not too large) granular unit of allocation helps. Perhaps that unit can be 4K to match most filesystems, which read in 4K blocks.

Designs should probably be tested outside of the uproot codebase and only integrated when a good solution is found. Maybe there's a library that does it, though I doubt that. (I'd even have a hard time looking for a search term: by what name do computer scientists refer to this problem?) Once you have a good solution, it should be integrated into `TFile._expandfile` so that both new values and new keys can take advantage of it.
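As a starting point for prototyping outside the codebase, the idea of treating the file as reusable blocks with a granular allocation unit could be sketched as below. This is a first-fit allocator with coalescing of adjacent free ranges, one classic strategy among many and not necessarily what ROOT does. The class name `BlockAllocator`, its methods, and the 4096-byte default are all hypothetical choices for illustration; nothing here touches a real file.

```python
import bisect


class BlockAllocator:
    """Tracks free byte ranges in a file (illustrative sketch only)."""

    def __init__(self, file_end=0, granularity=4096):
        self.granularity = granularity
        self.file_end = file_end  # first offset past the end of the file
        self.free = []            # sorted list of half-open (start, stop) ranges

    def _round_up(self, num_bytes):
        g = self.granularity
        return -(-num_bytes // g) * g  # ceiling to a multiple of g

    def allocate(self, num_bytes):
        """Return (start, size); reuse the first free range that fits."""
        size = self._round_up(num_bytes)
        for i, (start, stop) in enumerate(self.free):
            if stop - start >= size:
                if stop - start == size:
                    del self.free[i]
                else:
                    self.free[i] = (start + size, stop)
                return start, size
        # Nothing fits: grow the file.
        start = self.file_end
        self.file_end += size
        return start, size

    def release(self, start, size):
        """Mark a range as free, merging with adjacent free ranges."""
        stop = start + size
        i = bisect.bisect_left(self.free, (start, stop))
        if i > 0 and self.free[i - 1][1] == start:
            i -= 1
            start = self.free[i][0]
            del self.free[i]
        if i < len(self.free) and self.free[i][0] == stop:
            stop = self.free[i][1]
            del self.free[i]
        self.free.insert(i, (start, stop))


alloc = BlockAllocator()
h1 = alloc.allocate(1000)   # rounded up to one 4096-byte unit
h2 = alloc.allocate(5000)   # rounded up to two units
h3 = alloc.allocate(100)
alloc.release(*h1)
h4 = alloc.allocate(2000)   # reuses the space released by h1
alloc.release(*h4)
alloc.release(*h2)          # merges with the neighboring free range
```

The sketch never shrinks the file when the last block is released, and rounding every request up to the granular unit trades wasted space inside blocks for less fragmentation between them; both are design choices a real prototype would need to evaluate.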