Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to decode my known-good test file #2

Closed
ssokolow opened this issue Aug 11, 2022 · 7 comments
Closed

Failed to decode my known-good test file #2

ssokolow opened this issue Aug 11, 2022 · 7 comments

Comments

@ssokolow
Copy link

ssokolow commented Aug 11, 2022

This minimal test program...

use std::path::PathBuf;

use binhex4::decode::hexbin;
use gumdrop::Options;

#[derive(Options)]
struct Opts {
    #[options(free, help = "BinHex files to attempt to decode")]
    input: Vec<PathBuf>,

    #[options(help = "print help message")]
    help: bool,
}

fn main() {
    let opts = Opts::parse_args_default_or_exit();

    for path in opts.input {
        let bytes = match std::fs::read(&path) {
            Ok(x) => x,
            Err(e) => {
                println!("Failed to read {}: {}", path.display(), e);
                continue;
            }
        };
        hexbin(&bytes, true).unwrap();
    }
}

...reports BadFormat...

ssokolow@monolith binhex_test [master] % cargo run testfile.txt.good.hqx
    Finished dev [optimized] target(s) in 0.02s
     Running `target/debug/binhex_test testfile.txt.good.hqx`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: BadFormat', src/main.rs:26:30
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

...when fed this minimal "known good case" test file that Python's standard library binhex module has no problem with and uudeview produces expected output from:

(This file must be converted; you knew that already.)

:$(4PFh4QD@aP,R4iG!"849K868&$33#3"3`!!!!!RTK8CA0dD@jR)$%b-`S6a3!
!:

(Yeah, you don't need to binhex text files, but I built the test file from testfile.txt because it makes it easier to quickly check for problems using less, and because text files have no internal checksums that a tool might leverage to cover up its lack of support for checking BinHex4's own CRC.)

@ssokolow
Copy link
Author

ssokolow commented Aug 11, 2022

Hmm.

  • It works with something straight out of Python's binhex.binhex("Cargo.toml", "Cargo.toml.hqx")
  • It's not the UNIX vs. Mac line endings (My test file has UNIX \n line endings and binhex.binhex produces output with Mac \r line endings.)
  • It's not the CRC unless the bug is that changing hexbin(&bytes, true).unwrap(); to hexbin(&bytes, false).unwrap(); has no effect.

I find it quite amusing that the very first test file I fed through it appears to be an edge case that I can't figure out how I created in the first place.

Maybe I'll find time to set up gdbgui some time in the next couple of days and step through it to see what's going on... or maybe I'll just set up the differential fuzzing against the Python implementation that I was planning to do anyway (using PyO3 to get the Python and Rust in the same process, if you want to leapfrog me on that) and see if that removes the need for gdbgui to figure out where they diverge.

@nramos0
Copy link
Owner

nramos0 commented Aug 11, 2022

Thanks for bringing up this issue! I think I know the issue here, binhex4 doesn't like anything that doesn't contain a string starting with "(This file must be converted with BinHex". The general parsing format is

*
BINHEX_PROMPT_PREFIX
*
:DATA:
*

where BINHEX_PROMPT_PREFIX is "(This file must be converted with BinHex" and DATA is any number of lines of binhex data. I interpreted the spec you posted to mean that this string prefix is a hard requirement, but if it is true that like in this case many files don't comply with that, I could remove this requirement when parsing.

@nramos0
Copy link
Owner

nramos0 commented Aug 11, 2022

I can verify that your file correctly decodes to

Testing 123

when I change it to

(This file must be converted with BinHex 4.0)

:$(4PFh4QD@aP,R4iG!"849K868&$33#3"3`!!!!!RTK8CA0dD@jR)$%b-`S6a3!
!:

@ssokolow
Copy link
Author

ssokolow commented Aug 11, 2022

They don't appear to be common, but they exist (a quick search through the .hqx files I happen to have sitting on my hard drive revealed three out of 836 lacking the substring BinHex) and, given the context in which such things were used, you generally want to be lenient in your parsing.

BinHex was basically a Macintosh counterpart to things like uuencode and, depending on the nature of your early Usenet client, it'd be easy for someone to assume that message isn't significant to the parsing. You might literally be copy-pasting the non-human-readable portion of the BinHex file into a message after you wrote something like "Here's the .hqx".

(It's reminiscent of how early file transfers over serial lines and modems worked, with the receiver running something like XModem and the sender then running it, and, as far as the link was concerned, the human on one end just started typing really fast, ending with the program on the receiving end exiting on its own. Very ad-hoc, lenient, and lacking formal framing.)

EDIT: Watch Cathode Ray Dude's AT&T's '60s Modem That Won't Die if you want an interesting exploration of how ad-hoc it began.

@nramos0
Copy link
Owner

nramos0 commented Aug 11, 2022

I'd be interested in seeing the result of the fuzzing! If you have a python fuzzing implementation setup, I could also look at integrating it with this package / if you wanted to do that that would be appreciated. As for the parsing, thanks for the context, I will remove the requirement and just read it like *:data:*

@ssokolow
Copy link
Author

Barring unanticipated show-stoppers, it'll be a Rust fuzzing system which binds to libpython via PyO3 to embed a Python runtime, since I prefer the experience of tools like cargo-fuzz to what's available on the Python side, but sure.

I may get to it today or it may not. I'm trying to get my sleep cycle in order and it'll depend on how much I get to before I feel sleepy.

@nramos0
Copy link
Owner

nramos0 commented Aug 11, 2022

Sounds good, no rush. I'll close this issue for now since I've just fixed this issue and updated it to v0.1.2. let me know if there are any other issues you run into with the library

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants