When should I use memmap2? #8

Closed
bzm3r opened this issue Feb 7, 2021 · 3 comments

@bzm3r

bzm3r commented Feb 7, 2021

I am hoping the discussion we have here could eventually make it into the project's README, so I'll try to keep it general rather than specific to my use case.

The problem: I keep coming back to memmap2 as an option for my use case, but I remain unsure whether it fits.

The current situation is as follows:

Problem 1: There is a "primary process" which generates information that we want to keep, but keeping all of it in RAM is not feasible.

Question 1: does Problem 1 have the right shape for memmap2 to be considered? If not, what is the right shape of problem for which memmap should be considered? After all, here's a non-memmap solution:

Solution 1 (file):

  • keep a buffer of the generated data

  • and when the buffer is full, flush the data via mpsc channels to a writer process. The writer process has a handle to an open file, into which it writes received data to the hard disk using <some binary format> + serde, while the primary process keeps generating data.
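For concreteness, Solution 1 could be sketched roughly as below. This is only a minimal sketch under assumptions: the "writer process" is a thread in the same program (which is what mpsc channels imply), raw little-endian `u64` values stand in for `<some binary format>` + serde, and the name `generate_to_file`, the batch size, and the channel capacity are all made up for illustration.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;
use std::sync::mpsc;
use std::thread;

/// Generates `total` values, collects them into buffers of `batch` items, and
/// hands each full buffer to a writer thread. Returns the bytes written.
fn generate_to_file(path: &Path, total: u64, batch: usize) -> io::Result<u64> {
    // Bounded channel (capacity 4 is an arbitrary choice): if the disk falls
    // behind, `send` blocks the producer instead of queueing without limit.
    let (tx, rx) = mpsc::sync_channel::<Vec<u64>>(4);

    let file = File::create(path)?;
    let writer = thread::spawn(move || -> io::Result<u64> {
        let mut out = BufWriter::new(file);
        let mut bytes = 0u64;
        // The loop ends once every sender has been dropped.
        for chunk in rx {
            for value in chunk {
                // Placeholder encoding: 8 raw little-endian bytes per value.
                out.write_all(&value.to_le_bytes())?;
                bytes += 8;
            }
        }
        out.flush()?;
        Ok(bytes)
    });

    let mut buf = Vec::with_capacity(batch);
    for i in 0..total {
        buf.push(i * i); // placeholder for the real generated data
        if buf.len() == batch {
            // Swap in a fresh buffer and send the full one to the writer.
            let full = std::mem::replace(&mut buf, Vec::with_capacity(batch));
            if tx.send(full).is_err() {
                break; // the writer has exited, most likely on an I/O error
            }
        }
    }
    if !buf.is_empty() {
        let _ = tx.send(buf); // send the final, partially filled buffer
    }
    drop(tx); // close the channel so the writer loop can finish

    writer.join().expect("writer thread panicked")
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("solution1_demo.bin");
    let bytes = generate_to_file(&path, 10_000, 1024)?;
    println!("wrote {} bytes to {}", bytes, path.display());
    Ok(())
}
```

With a bounded channel the primary side only stalls when the writer is several buffers behind, which is the waiting cost the questions below refer to.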

On the other hand, here is a memmap2 solution:

Solution 2 (memmap):

  • keep generated information in a memory mapped structure.

Question 2: Is the following true? "The benefit of Solution 2 (memmap) over Solution 1 (file) is that we do not have to deal with the overhead of inter-thread communication. Put differently, the primary process does not have to wait for a buffer flush + send to complete before continuing to generate data."

Question 3: Does Solution 2 make sense if you have a hard disk with slower write speed than the rate at which the primary process generates data?

One could also imagine the following solution:

Solution 3 (memmap, parallel):

  • keep a buffer of the generated data

  • and have a main, memory mapped structure which will hold all the generated data

  • this main memory mapped structure is kept by a writer process, which is sent information by the primary process using mpsc channels, which it then "appends" to the data in the memory mapped structure it is holding

Question 4: Is the following statement true? "An advantage of Solution 3 is that if we have a hard disk with slower write speed than the rate at which the primary process is generating data, then Solution 3 essentially covers up this issue and replaces the cost instead with that of waiting for a buffer flush + send to complete."

Question 5: Is the following statement true? "The main benefit of memory mapping is to avoid the cost of <binary format> encoding/decoding."

(Thank you for your time.)

@RazrFalcon
Owner

Sorry, but I can't help you here. I'm using memmap in a very simple manner and don't really care about edge cases. And there is nothing special about this implementation. So I don't see a point in improving the docs.

@bzm3r
Author

bzm3r commented Feb 7, 2021

@RazrFalcon I am not asking about edge cases, but the main use case. Although there is nothing special about the implementation, it might still be new to those who are encountering memory mapping ideas for the first time (e.g. me). Rather than just asking my own question and moving away, I tried to be general so that it might help others too. Perhaps this is a bad habit learned from StackOverflow, where generality is mandated.

Anyway, I totally understand that you don't have the time.

@RazrFalcon
Owner

It's not about time. I honestly have no idea. I just forked memmap because the original was abandoned. I'm not a memmap expert. And there are a lot of blogs on the Internet that teach about it.
