R crashes while reading an fst file #271

Open
sanjmeh opened this issue Nov 7, 2022 · 15 comments

@sanjmeh

sanjmeh commented Nov 7, 2022

A simple fst read can send R crashing down if the file is corrupted!

How could a data file be so bad that it sends R crashing? Perhaps the fst read function has some aggressive memory management that interferes with the OS.

To replicate, just execute a simple

fst(filename)

And you will get:

<fst file>
323140 rows, 4 columns (1204011660.fst)

And then a series of error messages, followed by R crashing.

[2706278:2706278:20221107,172349.750548:ERROR process_memory_range.cc:86] read out of range
[2706278:2706278:20221107,172349.750641:ERROR elf_image_reader.cc:558] missing nul-terminator
[2706278:2706278:20221107,172349.750779:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.754375:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.754446:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.754496:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.754544:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.754599:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.754816:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755118:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755175:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755228:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755292:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755729:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755814:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755867:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755921:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755983:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756097:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756154:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756204:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756255:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756320:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756367:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756419:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756469:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756521:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756573:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756619:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756669:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756716:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756769:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756819:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756873:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756923:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756976:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757028:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757079:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757193:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757244:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757325:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757375:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757425:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757472:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757521:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757578:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757630:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757683:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757733:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757785:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757837:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757893:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758081:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758130:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758180:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758228:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758311:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758359:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758401:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758456:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758506:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758557:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758610:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758658:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758721:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758765:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758840:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758917:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758996:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.759050:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.759100:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.759149:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.759200:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.759249:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.760165:ERROR file_io_posix.cc:140] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq: No such file or directory (2)
[2706278:2706278:20221107,172349.760187:ERROR file_io_posix.cc:140] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: No such file or directory (2)
[2706278:2706280:20221107,172349.766128:ERROR directory_reader_posix.cc:42] opendir: No such file or directory (2)

I have uploaded the offending file here.
https://drive.google.com/file/d/1hYJLAcqct_5JxTNNXN1c-qKH9bWFhgmO/view?usp=sharing

@eddelbuettel

eddelbuettel commented Nov 15, 2022

Can you please try to turn this into a self-contained reproducible example, with a script that creates a file which subsequently crashes R?

No sane person will read a random binary file off the internet.

@sanjmeh
Author

sanjmeh commented Nov 16, 2022

@eddelbuettel : thanks for looking at this.
Generating the corrupted file with a script looks very difficult, because currently thousands of fst files are created or overwritten by a crontab scheduler that runs every minute (IoT data keeps coming in from thousands of vehicles, and we store their tracking and fuel-level data in fst files). A corruption happens in roughly 1 in 1,000 writes, and we do not (yet) know how it happens; it appears to be a random event. I suspected that fst's multi-core read/write was creating impossible memory allocations, but that was just a hunch. I really do not know how to recreate the corruption with a script.

To prevent multi-core use I have also added the following two lines, as recommended in one of the GitHub issue threads.

fst::threads_fst(nr_of_threads = 1)
fst::threads_fst(reset_after_fork = FALSE)

But I still regularly get the corruptions and the resultant crashes.

@MarcusKlik
Collaborator

Hi @sanjmeh, thanks for reporting. And I will definitely adhere to @eddelbuettel's warning to not try to load your binary file :-)

In the fst format, all meta-data is hashed. So if this data becomes corrupted for some reason, it's extremely unlikely that the file will read without throwing a (friendly) error. Obviously, a maleficent agent could alter the metadata and the stored hashes to overcome this problem and mess up a file read.

The metadata determines how much memory is allocated for storing the result table. However, the actual column data is decompressed from data blocks in the file using zstd or lz4 decompression. In rare cases, malformed data blocks can cause a crash in those libraries during this decompression phase.

To remedy, we could use safe versions of the lz4 and zstd decompression functions, but these will destroy the performance.

Alternatively, fst could provide an option to hash the data blocks as well (something like write_fst(x, path, hash_data = TRUE)). For these hashed files, reads could be done using read_fst(path, check_hashes = TRUE), for example.

This will have a smaller impact on performance and could be used for files read from internet or other suspicious sources (and would need to be done only once after downloading).
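Until something like that exists in fst, a similar check can be approximated outside the package. The following is only a minimal sketch under stated assumptions: `write_checked()` and `read_checked()` are hypothetical helper names, not fst functions, and they hash the whole file with `tools::md5sum()` into a sidecar `.md5` file rather than hashing individual data blocks as proposed above.

```r
# Hypothetical helpers, NOT part of the fst API.
# They keep an MD5 of the whole file in a sidecar "<path>.md5"
# and refuse to read when the file no longer matches it.

write_checked <- function(x, path, writer = fst::write_fst) {
  tmp <- paste0(path, ".tmp")
  writer(x, tmp)                          # write under a temporary name first
  digest <- unname(tools::md5sum(tmp))    # hash the finished file
  # Assumption: renaming replaces an existing file on this platform.
  if (!file.rename(tmp, path)) {
    stop("could not move ", tmp, " to ", path)
  }
  writeLines(digest, paste0(path, ".md5"))
  invisible(path)
}

read_checked <- function(path, reader = fst::read_fst, ...) {
  sidecar <- paste0(path, ".md5")
  if (!file.exists(path) || !file.exists(sidecar)) {
    stop("missing file or checksum for ", path)
  }
  expected <- readLines(sidecar, n = 1L, warn = FALSE)
  actual <- unname(tools::md5sum(path))
  if (!identical(actual, expected)) {
    stop("checksum mismatch, refusing to read ", path)
  }
  reader(path, ...)                       # only reached for an unmodified file
}
```

Hashing the whole file means every checked read has to scan the complete file once, so this trades away part of the speed of partial reads. It also only detects changes made after the checksum was written; it cannot detect a file that was already written incorrectly.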

@MarcusKlik MarcusKlik self-assigned this Nov 16, 2022
@MarcusKlik MarcusKlik added the bug label Nov 16, 2022
@MarcusKlik MarcusKlik added this to the Candidate milestone Nov 16, 2022
@sanjmeh
Author

sanjmeh commented Nov 16, 2022

Thank you @MarcusKlik, and welcome back to your own repository. That was indeed a long break, and I was afraid you might not be back soon.
Now on your suggested path:

Alternatively, fst could provide an option to hash the datablock as well (something like write_fst(x, path, hash_data = TRUE)). For these hashed files, reads could be done using read_fst(path, check_hashes = TRUE), for example.

I do not see the hash_data argument in write.fst().
I presume you are proposing this functionality (the feature to hash data) and that it does not exist in the current version.

Meanwhile I will test the first alternative:

To remedy, we could use safe versions of the lz4 and zstd decompression functions, but these will destroy the performance.

If you could please specify how to try the safe options, it would be helpful, as I have not been able to locate the arguments so far.

By the way can I request you to have a look at the fst file I attached and not treat it as any random binary file from the internet. I am here to claim that it is originating from my system, and not from the internet :-)

@eddelbuettel

@sanjmeh As another open-source volunteer I am a little surprised by your tone. We give you our labor for free.

@sanjmeh
Author

sanjmeh commented Nov 16, 2022

Oh my! My intention is not at all to offend you guys. You are doing a fantastic job in the open source community of R, so I would never want to turn you away. I hope I am making the fst package more popular by asking to make it more robust. Let me know what was hurtful. Thanks.

@MarcusKlik
Collaborator

Yes, unfortunately time is a scarce resource that can only be spent once (except for @eddelbuettel, my theory is that Dirk is somehow able to clone himself into identical copies that can do work in parallel, proof pending...) :-)

About your file @sanjmeh, I will scan the metadata from a container and take a look where things go wrong.

@sanjmeh
Author

sanjmeh commented Dec 27, 2022

Hi Marcus, any progress on the bug?

@jfdesomzee

Hello,

I'm suffering from this bug too. I never had an issue before; it appeared when multiple machines started writing files to the shared drive.
Is there a way to test the file before trying to load it? Whenever I read a corrupted file, R crashes. If I could get an error instead, my problems would be solved.
fst rocks and I want to keep using it. Please help, and thank you for the good job.
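One way to turn the crash into an ordinary failure is to attempt the read in a throwaway R process first, so that only the child process dies. This is a rough sketch, not an fst feature: `can_read_fst()` is a hypothetical helper name, and it assumes `Rscript` from the running R installation can be launched and that fst is installed for that child process.

```r
# Hypothetical helper, NOT part of the fst API.
# Tries to read the file in a separate R process and reports whether
# that process exited cleanly. A crash, an R error, or a timeout in the
# child all give FALSE; the calling session keeps running either way.
can_read_fst <- function(path, timeout = 60) {
  rscript <- file.path(R.home("bin"), "Rscript")
  # deparse() turns the path into a quoted, escaped R string literal
  expr <- sprintf("invisible(fst::read_fst(%s))", deparse(path))
  status <- suppressWarnings(
    system2(rscript, c("-e", shQuote(expr)),
            stdout = FALSE, stderr = FALSE, timeout = timeout)
  )
  identical(as.integer(status), 0L)
}

# Usage: only load the file in the main session if the probe succeeded.
# if (can_read_fst("some_file.fst")) x <- fst::read_fst("some_file.fst")
```

The price is that every checked file is read twice and a new R process is started each time, so this fits best as a one-off check on files that are suspected to be damaged.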

@AntonWijbenga

I have previously encountered the error as well, and today again. I suspect the .fst file becomes corrupt during a 'forced' system reboot on a Windows machine (which is a secondary, on-premises solution; primary/production runs in the cloud on Ubuntu).

I can read the metadata of the .fst file fine, but reading the whole file causes R to crash. It would be great if this somehow just resulted in an error instead of crashing R. I'm happy to provide the .fst file if needed for testing.

Otherwise the fst package is great, and so far I haven't encountered a better alternative (except maybe parquet, because of its cross-language (e.g. Python) support).

@jfdesomzee

I switched from fst to qs. About the same performance, a bit faster. The only drawback is that you need to read the whole file; you cannot query rows or columns. But you can store any R object, including its attributes.

@sanjmeh
Author

sanjmeh commented Feb 9, 2024

I switched from fst to qs. About the same performance, a bit faster. The only drawback is that you need to read the whole file; you cannot query rows or columns. But you can store any R object, including its attributes.

And what is its advantage over RDS files?

@eddelbuettel

@sanjmeh Start here: https://github.com/traversc/qs

qs and fst are both very good and improve on rds files, which are themselves good and portable across R installations.

@AntonWijbenga

AntonWijbenga commented Feb 12, 2024

Thank you for the tip. However, the ability to read only certain rows or columns is one of the main reasons I use the fst package.

I have matrices with measurements for each minute for a certain number of sensors. As a result I have matrices that are 1,440 (number of minutes in a day) x 18,000 or 80,000 (depending on the sensor type). Using these daily matrices and their pivoted clones, I can very quickly read just one minute of a specific day (the date is the filename, the minute the n-th column) or read the 24-hour series of a sensor (again the date is the filename, and the column name is the ID of the sensor).

Reading such a column (or a set of them) only takes a few milliseconds. Reading an entire year of data for a couple of sensors (using their IDs) is done in a couple of seconds. It is very quick to create certain aggregates (over time) that way.

The same is true for reading several minutes of data for all sensors. For example, you can very quickly calculate a typical (average) value for a Tuesday 11:00 based on a set of previous Tuesdays (also 11:00).

The entire dataset is historically available from 2018 and is still updated every minute. It is about 500GB (compressed) and stored on SSD based storage (FSx for Lustre at AWS). Results are presented through a dashboard.

For these purposes it is simply way too slow to read the whole matrix every time. With the solution above, I can read in the 'sensor' dimension and 'time' dimension very quickly no matter if it is about recent or older data (no caching needed). I have also tested databases, but they are either too slow or too costly.
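The kind of partial read described above can be illustrated with the `columns`, `from` and `to` arguments of `read_fst()`. This is only a toy sketch: the file name, the five `sensor_*` column names and the random values are made up for illustration, and the layout (one row per minute, one column per sensor) is an assumption about one of the two pivoted forms.

```r
library(fst)

# Toy daily file: 1,440 rows (minutes) x 5 hypothetical sensors
path <- file.path(tempdir(), "2024-02-12.fst")
day <- as.data.frame(matrix(runif(1440 * 5), nrow = 1440))
names(day) <- paste0("sensor_", 1:5)
write_fst(day, path)

# 24-hour series of a single sensor: only that column is read
one_sensor <- read_fst(path, columns = "sensor_3")

# All sensors for one minute: row 661 is 11:00 when row 1 is 00:00
one_minute <- read_fst(path, from = 661, to = 661)

dim(one_sensor)
dim(one_minute)
```

In the pivoted clone (one column per minute) the single-minute read would become a column read as well, which matches the "minute as the n-th column" access described above.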

@sanjmeh
Author

sanjmeh commented Feb 12, 2024

I have matrices with measurements for each minute for a certain number of sensors. As a result I have matrices that are 1,440 (number of minutes in a day) x 18,000 or 80,000 (depending on the sensor type). Using these daily matrices and their pivoted clones, I can very quickly read just one minute of a specific day (the date is the filename, the minute the n-th column) or read the 24-hour series of a sensor (again the date is the filename, and the column name is the ID of the sensor).

I have exactly the same application, and we also started with the fst package for exactly this reason. But I have now moved to MariaDB because of this occasional corruption of fst files. We use RDS for data up to 100 MB and move the data to an RDBMS with the timestamp as primary index, so we can quickly query a specific time range.
