Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic on attempt to serialize a pre-epoch timestamp received from an ext3 filesystem #464

Open
ssokolow opened this Issue Jul 17, 2018 · 7 comments

Comments

3 participants
@ssokolow
Copy link

ssokolow commented Jul 17, 2018

While working on a tool to index files for one of my projects, I wound up getting a panic during the call to serde_json:

thread 'main' panicked at 'SystemTime must be later than UNIX_EPOCH: SystemTimeError(Duration { secs: 1, nanos: 0 })', libcore/result.rs:945:5

I tracked it down to a specific file in my home media server. Apparently, the combination of Linux and the ext3 filesystem is perfectly capable of representing at least one date before the UNIX epoch.

  • ls -lh: Dec 31 1969
  • Python's os.stat: st_mtime=-1
  • Rust's std::fs::metadata: modified: Ok(SystemTime { tv_sec: -1, tv_nsec: 0 })

...and the irony is, SystemTime is the one piece of the Rust-provided metadata that I felt was already in a format suitable for a generic index file to be shared between programs written in different languages.

Well... that and it was a hassle to track down because Rust itself didn't complain and the panic message during serialization wouldn't tell me which file of the hundreds of thousands was causing it to die. For lack of a purpose-built tool, I had to manually bisect it until I narrowed it down.

@ssokolow

This comment has been minimized.

Copy link
Author

ssokolow commented Jul 17, 2018

...and I found another one:

ssokolow@monolith rdbackup-excludes-indexer [master] % target/nightly/i686-unknown-linux-musl/release/rdbackup_exclude_indexer /srv/DVD-bound/ 2>&1 | grep 361289696
/srv/DVD-bound/Games/Ripped_CDs/Floppies Backed up using ddrescue/Microsoft® Mouse Setup v9.01 (potential data loss)/MOUSE901/SETUP.INI: Some(SystemTime { tv_sec: -361289696, tv_nsec: 0 })
thread 'main' panicked at 'SystemTime must be later than UNIX_EPOCH: SystemTimeError(Duration { secs: 361289696, nanos: 0 })', libcore/result.rs:945:5

(Yes, my folder structure for stuff bound for archival DVDs is an accreted mess that needs to be refactored.)

UPDATE: It appears that this example only fails because of integer overflow on i686-unknown-linux-musl. On x86_64-unknown-linux-gnu, it works perfectly well and ls -lh reports the mtime as Aug 26 2094.

@lambda

This comment has been minimized.

Copy link

lambda commented Jul 18, 2018

Minimal repro:

use std::time::Duration;

use serde_json;

fn main () {
    let t = std::time::UNIX_EPOCH - Duration::new(1, 0);
    println!("{}", serde_json::to_string(&t).unwrap());
}

The relevant portion of the backtrace:

thread 'main' panicked at 'SystemTime must be later than UNIX_EPOCH: SystemTimeError(1s)', libcore/result.rs:945:5
...
  10: serde::ser::impls::<impl serde::ser::Serialize for std::time::SystemTime>::serialize
             at /Users/lambda/.cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.70/src/ser/impls.rs:562
  11: serde_json::ser::to_writer
             at /Users/lambda/.cargo/registry/src/github.com-1ecc6299db9ec823/serde_json-1.0.24/src/ser.rs:1991
  12: serde_json::ser::to_vec
             at /Users/lambda/.cargo/registry/src/github.com-1ecc6299db9ec823/serde_json-1.0.24/src/ser.rs:2025
  13: serde_json::ser::to_string
             at /Users/lambda/.cargo/registry/src/github.com-1ecc6299db9ec823/serde_json-1.0.24/src/ser.rs:2056
  14: rust2018::main
             at src/main.rs:9

The code in question:

#[cfg(feature = "std")]
impl Serialize for SystemTime {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        use super::SerializeStruct;
        let duration_since_epoch = self
            .duration_since(UNIX_EPOCH)
            .expect("SystemTime must be later than UNIX_EPOCH");
        let mut state = try!(serializer.serialize_struct("SystemTime", 2));
        try!(state.serialize_field("secs_since_epoch", &duration_since_epoch.as_secs()));
        try!(state.serialize_field("nanos_since_epoch", &duration_since_epoch.subsec_nanos()));
        state.end()
    }
}

The issue is, there's no official way to get the underlying value out of SystemTime; you can only use duration_since(UNIX_EPOCH), but a duration stores a u64 number of seconds, it can't represent a negative value.

This should probably be filed as an issue against the standard library; there needs to be some way to extract the underlying value of SystemTime, even if it's platform-dependent, as otherwise you can't losslessly serialize and deserialize them.

@ssokolow

This comment has been minimized.

Copy link
Author

ssokolow commented Jul 19, 2018

It should still be possible to work around it. I'll try to verify it later today, but, as I remember, the let duration_since_epoch = self.duration_since(UNIX_EPOCH) does provide you with the information necessary to serialize it reliably.

Specifically, the Err variant of the Result has a .duration() function which will tell you how far before the epoch your SystemTime was.

That's how an mtime of -1 gets exposed in the panic message as SystemTimeError(Duration { secs: 1, nanos: 0 }).

Serializing secs_since_epoch as an i128 with the Err case being 0 - duration_since_epoch.as_secs() should do the trick.

(In other words, the output is semantically equivalent to a { sign: bool, magnitude: Duration } pair.)

That said, definitely a footgun in the standard library to be remedied.

@lambda

This comment has been minimized.

Copy link

lambda commented Jul 19, 2018

Yep, I figured that out after writing the previous comment, and started working on a patch to serde to fix it.

My main concern is that i128 was only fairly recently stabilized, so would require bumping the minimum required version of the compiler. Also, even if you made the implementation conditional on i128 support, it would be a breaking change for Serializer and Deserializer implementations that don't yet have i128 support.

So I think you'd have to do it without i128 for secs_since_epoch, which makes it more complicated. On Unix like platforms and Redox, it's at most an i64 number of seconds offset from the epoch. On Windows, it's a u64 number of hundred nanoseconds since January 1, 1601, which has a strictly lower range than an i64number of seconds, so that should be OK. But on WASM, they just use a Duration to represent the time since the epoch, so you actually could have the full u64 range that you need to be able to represent.

I think that you'll just have to use an i64 here, and consider the loss of one bit of range on the top end to be preferable to not being able to represent negative values, which are likely to be more common.

@ssokolow

This comment has been minimized.

Copy link
Author

ssokolow commented Jul 19, 2018

Hmm. Good point.

I have no problem with being clamped to a little over 292 billion years from the epoch for positive range, but I really would like to avoid this being another non-obvious location where someone who wants high reliability has to guard against failure.

Would it be a breaking change to switch to i128 down the road, once it's better supported?

@lambda

This comment has been minimized.

Copy link

lambda commented Jul 19, 2018

Would it be a breaking change to switch to i128 down the road, once it's better supported?

I think it will always be a breaking change, as some formats may never support i128.

I also think that it might be a breaking change to change the type at all, since the Deserialize impl parses into a u64, so if you changed it to serialize an i64, then programs using the older Deserialize impl wouldn't be able to parse it.

On the other hand, it might be argued that you're replacing a very likely panic on serialization with a less likely properly returned error on deserialization only in the case of two different programs using different versions of Serde. I don't know if Serde has a stability policy for the types used for serializing std types that would cover this.

Of course, one alternative is that you could just use a newtype wrapper and implement Serialize and Deserialize yourself.

@ssokolow

This comment has been minimized.

Copy link
Author

ssokolow commented Jul 20, 2018

My main concern here is getting rid of the footgun if at all possible. I really don't want to have to maintain a special "Never allow these types to creep into structs I'm deriving Serialize/Deserialize on, because the compiler certainly won't warn you" audit list.

Of course, one alternative is that you could just use a newtype wrapper and implement Serialize and Deserialize yourself.

I've already spent more time trying to get comfortable with the Serde docs for customizing deserialization of complex data than on the rest of the project put together, so I think I'll just go for pragmatism over perfection, spend 5 minutes to manually convert SystemTime instances into a custom "signed Duration" struct , and derive Serialize on that instead.

Then I can move on to hacking up a quick implementation of my "Use \u0000 as the prefix for hexadecimal byte escapes" solution for getting serde_json to accept the full range of paths actually present on my filesystem and finally have an initial dogfoodable version, days after I expected it.

@dtolnay dtolnay added the bug label Nov 9, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.