When should I use memmap2? #8
Sorry, but I can't help you here. I'm using memmap in a very simple manner and don't really care about edge cases. And there is nothing special about this implementation. So I don't see a point in improving the docs.
@RazrFalcon I am not asking about edge cases, but about the main use case. Although there is nothing special about the implementation, it might still be new to those who are encountering memory mapping for the first time (e.g. me). Rather than just asking my own question and moving on, I tried to be general so that it might help others too. Perhaps this is a bad habit learned from Stack Overflow, where generality is mandated. Anyway, I totally understand that you don't have the time.
It's not about time. I honestly have no idea. I just forked memmap because the original was abandoned. I'm not a memmap expert. And there are a lot of blogs on the Internet that teach about it.
I am hoping the discussion we have here could make it into the project's README eventually, so I'll try to keep it general rather than specific to my use case.
The problem: I keep coming back to considering `memmap2` for my use case, but remain unsure.
The current situation is as follows:
Problem 1: There is a "primary process" which generates information that we want to keep, but keeping all of it in RAM is not feasible.
Question 1: Does Problem 1 have the right shape for `memmap2` to be considered? If not, what is the right shape of problem for which `memmap` should be considered? After all, here's a non-`memmap` solution:
Solution 1 (file):
- keep a buffer of the generated data, and
- when the buffer is full, flush the data via `mpsc` channels to a writer thread. The writer thread holds a handle to an open file and writes the received data to the hard disk using `<some binary format>` + `serde`, while the primary process keeps generating data.

On the other hand, here is a `memmap2` solution:
Solution 2 (`memmap`):
Question 2: Is the following true? "The benefit of Solution 2 (`memmap`) over Solution 1 (file) is that we do not have to deal with the overhead of inter-thread communication. Put differently, the primary process does not have to wait for a buffer flush + send to complete before continuing to generate data."
Question 3: Does Solution 2 make sense if the hard disk's write speed is slower than the rate at which the primary process generates data?
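For comparison with Questions 2 and 3, here is a minimal sketch of Solution 1 using only the Rust standard library. It assumes raw little-endian `u64` records as a stand-in for `<some binary format>` + `serde`; the names `spawn_writer` and `produce` and the `max_in_flight` bound are hypothetical. It uses a bounded `std::sync::mpsc::sync_channel` so that the slow-disk case from Question 3 is explicit: once `max_in_flight` batches are queued, `send` blocks the primary process until the writer catches up. With an unbounded `channel` the queue would instead keep growing in RAM, which is what Problem 1 rules out.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

const RECORD_SIZE: usize = 8; // assumption: one little-endian u64 per record

/// Spawns a writer thread that appends every received batch to the file at `path`.
/// At most `max_in_flight` batches can wait in the channel; beyond that,
/// `send` blocks the producer (backpressure) instead of buffering more in RAM.
fn spawn_writer(
    path: &Path,
    max_in_flight: usize,
) -> io::Result<(SyncSender<Vec<u8>>, JoinHandle<io::Result<u64>>)> {
    let file = File::create(path)?;
    let (tx, rx) = sync_channel::<Vec<u8>>(max_in_flight);
    let handle = thread::spawn(move || -> io::Result<u64> {
        let mut out = BufWriter::new(file);
        let mut written = 0u64;
        // The loop ends once every sender has been dropped.
        for batch in rx {
            out.write_all(&batch)?;
            written += batch.len() as u64;
        }
        out.flush()?;
        Ok(written)
    });
    Ok((tx, handle))
}

/// Hypothetical primary process: generates `n` records, buffers
/// `batch_records` of them at a time and hands each full buffer to the writer.
fn produce(tx: SyncSender<Vec<u8>>, n: u64, batch_records: usize) {
    let batch_bytes = batch_records * RECORD_SIZE;
    let mut buf = Vec::with_capacity(batch_bytes);
    for i in 0..n {
        buf.extend_from_slice(&i.to_le_bytes());
        if buf.len() >= batch_bytes {
            // Moves the buffer to the writer; blocks only if the channel is full.
            let full = std::mem::replace(&mut buf, Vec::with_capacity(batch_bytes));
            tx.send(full).expect("writer thread stopped");
        }
    }
    if !buf.is_empty() {
        tx.send(buf).expect("writer thread stopped");
    }
    // `tx` is dropped here, which ends the writer's loop.
}
```

Typical use: `let (tx, writer) = spawn_writer(path, 4)?;`, then `produce(tx, n, 64)`, then `writer.join()` to get the number of bytes written. Sending a `Vec<u8>` over the channel moves only its pointer, length and capacity, not the buffered bytes, so the channel itself is usually cheap compared with the disk write.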
One could also imagine the following solution:
Solution 3 (`memmap`, parallel):
- keep a buffer of the generated data,
- have a main, memory-mapped structure which will hold all of the generated data, and
- let a writer thread own this main memory-mapped structure; the primary process sends it data over `mpsc` channels, and the writer thread then "appends" that data to the memory-mapped structure it holds.

Question 4: Is the following statement true? "An advantage of Solution 3 is that, if the hard disk writes more slowly than the primary process generates data, it essentially covers up this issue and replaces that cost with the cost of waiting for a buffer flush + send to complete."
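One detail worth keeping in mind for Solution 3: a memory map covers a fixed byte range, so "appending" to it means copying into that range at a write offset, and once the range is full the file has to be enlarged and mapped again. The sketch below shows only that bookkeeping. `AppendRegion` is a hypothetical helper, and a plain `&mut [u8]` stands in for the mapped memory (memmap2's `MmapMut` dereferences to `[u8]`, so the same logic would apply to it); creating the file, sizing it and remapping are left out.

```rust
/// Hypothetical append-only writer over a fixed-size byte region.
/// `region` stands in for the `&mut [u8]` of a writable memory map.
struct AppendRegion<'a> {
    region: &'a mut [u8],
    cursor: usize, // offset of the next free byte
}

impl<'a> AppendRegion<'a> {
    fn new(region: &'a mut [u8]) -> Self {
        AppendRegion { region, cursor: 0 }
    }

    /// Copies `bytes` in at the cursor and returns the offset they were written at.
    /// If they do not fit, returns `Err(required_len)`: the region cannot grow in
    /// place, so the caller would need to enlarge the file and map it again.
    fn append(&mut self, bytes: &[u8]) -> Result<usize, usize> {
        let end = self.cursor + bytes.len();
        if end > self.region.len() {
            return Err(end);
        }
        self.region[self.cursor..end].copy_from_slice(bytes);
        let offset = self.cursor;
        self.cursor = end;
        Ok(offset)
    }

    /// The bytes appended so far.
    fn written(&self) -> &[u8] {
        &self.region[..self.cursor]
    }
}
```

Writes into a real mapping are not guaranteed to be on disk until the OS writes the dirty pages back or the map is flushed (memmap2's `MmapMut::flush`), so a disk that is slower than the producer still shows up somewhere, which is relevant to Question 4.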
Question 5: Is the following statement true? "The main benefit of memory mapping is to avoid the cost of `<binary format>` encoding/decoding."
(Thank you for your time.)
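On Question 5, it may help to separate two things. A read-only memory map hands you the file's bytes as a `&[u8]`, but those bytes are still in whatever format they were written in, so some decoding step remains unless the format is designed for in-place access. With a fixed-width layout, record `i` sits at a computable offset, so it can be read from the slice by indexing without decoding the records before it. The sketch below assumes a hypothetical 16-byte little-endian `(u64, f64)` record; a `Vec<u8>` stands in for the mapped bytes, and `encode`/`record_at` are illustrative names, not part of `memmap2`.

```rust
/// Assumed fixed-width layout: id (u64, LE) followed by value (f64, LE).
const RECORD_SIZE: usize = 16;

/// Encodes records back to back in the fixed-width layout.
fn encode(records: &[(u64, f64)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(records.len() * RECORD_SIZE);
    for &(id, value) in records {
        out.extend_from_slice(&id.to_le_bytes());
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}

/// Reads record `i` straight out of `bytes` (e.g. the slice a memory map
/// dereferences to). Only the 16 bytes of that record are touched.
fn record_at(bytes: &[u8], i: usize) -> Option<(u64, f64)> {
    let start = i.checked_mul(RECORD_SIZE)?;
    let end = start.checked_add(RECORD_SIZE)?;
    let rec = bytes.get(start..end)?;
    let mut id = [0u8; 8];
    let mut value = [0u8; 8];
    id.copy_from_slice(&rec[..8]);
    value.copy_from_slice(&rec[8..]);
    Some((u64::from_le_bytes(id), f64::from_le_bytes(value)))
}
```

With a real mapping, only the pages that the requested records live on (plus whatever the OS reads ahead) need to be loaded from disk.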