High memory consumption while recovering damaged data file (in scenario in-file storage) #1255
Comments
Please let me know if I should provide any other information.
Do you have the store file that we could use to reproduce? It is possible that a corruption causes the server to think that a payload (or similar) is of a size bigger than it actually is, but this is just speculation at this point. I am not clear on this: if you delete the file, which then resolves the issue, how can the file "continue to exist"?
Yes, sure. Could you please tell me how I can send you these files?
I mean the situation where someone kills the server during file repair and then starts it again: in that case the damaged file exists, but remains unrepaired.
If you have the corruption, you could send the whole datastore directory to ivan@nats.io.
@Jerito-kun As suspected, the high memory usage is due to a corrupted record that indicates that the message payload is ~1.6GB, which then causes a memory allocation of that size. Note that the buffer is then released, so the actual memory in use is not that much, and once garbage collection kicks in, the memory should be reclaimed (I have verified this).

Given that, I am not sure what the best course of action is here. I could make reading a record fail if it means that the server would have to create a buffer of a certain size, but which size? Hard-coded? A new option? Or leave it as-is, again knowing that the memory was allocated but is not currently in use.
I think a new option would be best.
In case of memory corruption, it is possible that the record size is way greater than it should be, which would cause the server to create a buffer of the wrong size in the attempt to read the record. This new option will limit how big the buffer needed to read the record from disk can be. Resolves #1255 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
@Jerito-kun I added that in PRs #1259 and #1260. This will be part of the next release.
Reverted addition of record_size_limit

But still address the memory usage caused by a corrupted data message on recovery. By using the expected record size from the index file when checking that the last message matches the index information, we would find out that the index's stored message record size does not match the record size in the ".dat" file, and would not allocate the memory to read the rest of the message.

The record_size_limit that was added to solve that issue would have likely caused a lot of issues if mis-used.

Resolves #1255 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
High memory consumption while recovering damaged data file (in scenario in-file storage)
When (for an unknown reason) one of the data files is damaged (for example, a bad EOF mark), the server makes a note in the log file:
{"level":"info","time":"2022-06-23T13:14:37+03:00","message":"Server is ready"}
{"level":"info","time":"2022-06-23T13:14:38+03:00","message":"STREAM: Recovering the state..."}
{"level":"error","time":"2022-06-23T13:14:43+03:00","message":"STREAM: Verification of last message for file "C:\\ProgramData\\Some_folder\\NSS\\data\\Some_file.Log\\msgs.1.dat" failed: unable to read last record: unexpected EOF"}
{"level":"error","time":"2022-06-23T13:14:43+03:00","message":"STREAM: Error with index file "C:\\ProgramData\\Some_folder\\NSS\\data\\Some_file.Log\\msgs.1.idx": Verification of last message for file "C:\\ProgramData\\Some_folder\\NSS\\data\\Some_file.Log\\msgs.1.dat" failed: unable to read last record: unexpected EOF. Truncating and recovering from data file"}
Memory usage exceeds 1.6 GB, instead of ~30 MB in normal operating mode.
The problem goes away after deleting the damaged file or restarting the server, but in the latter case the damaged file continues to exist, and other software that uses this file to operate cannot start and work.
This situation occurs on both Windows and Linux hosts, including software version 0.24.6.