Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timecaf disk format uses time_t (Y2038 issue for 64-bit time_t transition on 32-bit archs) #292

Open
Julien-Elie opened this issue Dec 22, 2023 · 1 comment
Labels
bug Something isn't working C: storage Related to storage methods P: medium Medium priority

Comments

@Julien-Elie
Copy link
Contributor

The CAF file header contains a time_t field (LastCleaned).
It should be switched to uint64_t so as to be independent from the size of time_t which may vary on 32-bit architectures (i386) where time_t could be 4 or 8 bytes long depending on compilation options or the kernel used. As a 64-bit time_t transition is likely to occur on 32-bit architectures so as to avoid the Y2038 issue, INN should be resilient to that change.

A tool converting the CAF files appropriately should be provided and run by innupgrade if it detects a CAF file header in an old format.

At the same time, the size_t and off_t variables in the CAF file header could also be changed to uint64_t (see #82).

As space is allotted for a maximum of 262144 (2^18) articles in current CAF files, people complain with that limit from time to time (when they inject at high speed articles in their news spool, more than 262144 articles can arrive in a time frame of 4 minutes, which leads to storage failures).
While we're at changing the CAF file header, we could increase the allotted space to handle for instance 4,194,304 (2^22) articles. Handling 64-bit time_t would then be an opportunity to improve timecaf 🙂

Note that articles are stored in files named <patharticles>/timecaf-nn/bb/aacc.CF where the arrival time is read as 0xaabbccdd only on 4 bytes (using htonl on a time_t value).
The current code seems to work fine until 2106 (and should be tested to make sure - by adding a test in the test suite), which gives enough time to eventually change the naming convention. Storage tokens should be changed at the same time. (A similar change should also be carried for the timehash storage method).
Let's focus for the time being on the CAF header.

@Julien-Elie Julien-Elie added bug Something isn't working C: storage Related to storage methods P: medium Medium priority labels Dec 22, 2023
@Julien-Elie
Copy link
Contributor Author

Speaking about the article number limit, an evolution of the current timecaf method could be the addition of storage.conf options to parameterize it. For instance with maxart and maxtime options to specify the number of articles per CAF file and the number of seconds before creating a new CAF file (which could for instance be a multiple of 256 seconds so as to keep the current file naming scheme). If maxtime is set to 0, a new file is created when maxart articles have been received.

The CAF file could also be dynamically extended if maxtime is strictly positive, so as to keep receiving articles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working C: storage Related to storage methods P: medium Medium priority
Development

No branches or pull requests

1 participant