Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of try_into removes any compile-time benefit over runtime validation libs #3

Closed
jmetz opened this issue Feb 12, 2024 · 13 comments
Closed

Comments

@jmetz
Copy link

jmetz commented Feb 12, 2024

Eg:

    let bad_rdf = Rdf {
        format_version: Version {
            major: 1,
            minor: 2,
            patch: 3,
        },
        description: "".try_into().unwrap(),
        name: "".try_into().unwrap(),

        attachments: None,
...

As in your test, modified with bad string values... But this passes the compiler fine as it's the unwrap that will fail at runtime.

Eg a test program using this gives:

warning: `bioimg_spec` (lib) generated 2 warnings (run `cargo fix --lib -p bioimg_spec` to apply 1 suggestion)
   Compiling third_party v0.1.0 (/home/jeremy/documents/kth/bioimg_rs/third_party)
    Finished dev [unoptimized + debuginfo] target(s) in 1.54s
     Running `target/debug/third_party`
I'm a Rust coder hard coding an RDF...
thread 'main' panicked at third_party/src/main.rs:13:36:
called `Result::unwrap()` on an `Err` value: BadLength { value: "", allowed: 1..=1024 }
@jmetz
Copy link
Author

jmetz commented Feb 12, 2024

so this is no different to eg using the validator crate and calling validate()

@jmetz
Copy link
Author

jmetz commented Feb 12, 2024

PS Here's the full small test:

use bioimg_spec::rdf::{Rdf, Version, author::Author, badge::Badge, cite_entry::CiteEntry, SpdxLicense}; 
use url::Url;

fn main() {
    println!("I'm a Rust coder hard coding an RDF...");

    let bad_rdf = Rdf {
        format_version: Version {
            major: 1,
            minor: 2,
            patch: 3,
        },
        description: "".try_into().unwrap(),
        name: "".try_into().unwrap(),

        attachments: None,
        authors: Some(vec![Author {
            name: "John Doe".try_into().unwrap(),
            affiliation: "Some University".try_into().unwrap(),
            email: "john.doe@some_university.com".try_into().unwrap(),
            github_user: "john_doe".try_into().unwrap(),
            orcid: "0000-0002-8205-121X".to_owned().try_into().unwrap(), //FIXME
        }]),
        badges: Some(vec![Badge {
            label: "x".try_into().unwrap(),
            icon: Url::parse("http://some.icon/bla").unwrap().into(),
            url: Url::parse("http://some.url/to/icon").unwrap().into(),
        }]),
        cite: Some(vec![CiteEntry {
            text: "Plz cite eme".try_into().unwrap(),
            doi: "blabla".try_into().unwrap(),
            url: Url::parse("https://blas/bla").unwrap(),
        }]),
        covers: None,
        documentation: Some(Url::parse("http://example.com/docs").unwrap().into()),
        download_url: Some(Url::parse("http://blas.blus/blis").unwrap().into()),
        git_repo: Some(Url::parse("https://github.com/blas/blus").unwrap()),
        icon: Some("x".try_into().unwrap()),
        id: Some("some_id_goes_here".try_into().unwrap()),
        license: Some(SpdxLicense::Adobe_Utopia),
        links: Some(vec![]),
        maintainers: Some(vec![]),
        rdf_source: None,
        source: None,
        tags: None,
        version: Some(Version {
            major: 4,
            minor: 5,
            patch: 6,
        }),
    };

    println!("{:?}", bad_rdf);
    
}

@Tomaz-Vieira
Copy link
Collaborator

It's not try_into that "removes any compile-time benefit". It's unwrap(), and using it outside of tests is usually a programming error.

Using .unwrap() just tells the compiler that you know better, and that if you don't, you accept that your program will crash. Either way, the program will not continue execution past that point if that value is not Ok. The program will continue if you fail to call .validate(), though.

It's fine to use .unwrap() in the tests, because if a test crashes then that's a failing test, that must be fixed.
It's not fine to use .unwrap() in production code, because it's dodging the responsibility of handling a possible Err.

@jmetz
Copy link
Author

jmetz commented Feb 12, 2024

Hm no it is the try_into I think - because that can resolve to a result or an err - both are technically fine as far as the compiler is concerned, to allow for error propagation etc.

As far as I can see, the only way around this is if I could do something like:

     ...
     description: BoundedString::<1, 1023> (...) 
     ...

...but that defeats the object I think!

@jmetz
Copy link
Author

jmetz commented Feb 12, 2024

so at the end of the day, there's no benefit compared with validator et al - or?

@jmetz
Copy link
Author

jmetz commented Feb 12, 2024

If you can write code with your current lib that causes a compiler error because of entering eg a 0-length description please share - I probably missed something then.

@Tomaz-Vieira
Copy link
Collaborator

Hm no it is the try_into I think - because that can resolve to a result or an err - both are technically fine as far as the compiler is concerned

They are not both technically fine when assigning to Rdf.description; That field wants a BoundedString, not a Result, and the compiler will make sure that that is the case:

  • .try_into() produces a Result<BoundedString, BoundedStringError>.
  • The Rdf struct expects a BoundedString, not a Result<BoundedString, boundedStringError>. You can't make such an assignment without getting a compilation error.
  • I'm using .unwrap() in the test code to convert from a Result<BoundedString, BoundedStringError> to a plain BoundedString or crash otherwise.

if you can write code with your current lib that causes a compiler error

it's causing a runtime error, not a compiler error. And it is doing so because of the unwrap, just like the error message says:

"Result::unwrap()" on an "Err" value: BadLength { value: "", allowed: 1..=1024 }

@jmetz
Copy link
Author

jmetz commented Feb 12, 2024

I'm not saying that you can just remove unwrap - as I'm sure we both know the options are unwrap expect or ? if the outer scope also returns a result...

But the point is still valid that this is all runtime, unless I misunderstood - please let me know if that's not the case - and show me an example.

Or we can keep quibbling about exactly what's going on with the above... 🫣

@Tomaz-Vieira
Copy link
Collaborator

Tomaz-Vieira commented Feb 12, 2024

We might be talking past each other on the meaning of "runtime" 😛

Sure, all checks are done in runtime; There is no way they could be done in compile-time, since the input is not available during compile-time.

What I mean is that if we go the .validate() route, it is possible that we, say, create a User struct that is not valid then forget to call validate on it, and then the program continues executing, assuming that it has a valid User when it doesn't. The compiler can't protect us there, because we've hidden the meaning of "valid" from it, (because you need a User struct to begin with, in order to be able to call .validate() on it.

On the other hand, if we make it so that validation returns a Result<User, SomeError>, then we can make sure that, wherever a User is present in code, that that struct is valid; there is no way to forget calling .validate(), and even forcing things with .unwrap() will cause the program to crash instead of continuing to run with a bad value. The invariant of the User struct is always maintained.

In this second approach, the only way you could possibly get a hold of a User struct is through successful validation.

Here's an example using the .validate style, and the problems that might arise:

pub struct User{
  #[validate(minimum = 0)]
  #[validate(maximum = 120)] // users can be at most 120 years old
  age: u8,
}

fn save_user_to_disk(user: User){
  // is `user` valid? Should I validate again? is the caller of this function expecting me to validate it?
}

fn i_do_not_validate(){
  save_user_to_disk(User{age: 200})
}

fn i_validate(){
  let user = User{age: 200});
  if user.validate(){
    ...
  }
  save_user_to_disk(User{age: 200})
}

And a TryFrom example:

struct RawUser{
  age: u8,
}

mod user{
  struct User{
    age: u8
  }

  impl TryFrom<RawUser> for User{
    type Error = String
    fn try_from(raw: RawUser) -> Self{
      if raw.age > 120{
        return Err("Too old".into())
      }
      Ok(Self{age: raw.age})
    }
  }
}



fn save_user_to_disk(user: User){
  // now here I'm absolutely sure that User is valid;
  // there is no way the program could have executed up to this point with an invalid User
}

fn i_do_not_validate(){
   //this is now a compiler error. 
  // now we can only get a user via try_from
  save_user_to_disk(User{age: 200})
}

fn use_a_user() -> Result<...>{
  let user = User::try_from(RawUser{age: 200})?; //must handle/propagate error here
  save_user_to_disk(user) //this line can only be reached if user is valid
}

@jmetz
Copy link
Author

jmetz commented Feb 12, 2024

We could get the same effect with a new method for User that calls validate and returns Ok(User) or the validation error then?

But yes, I can see that we can't just instantiate User using normal struct creation syntax then 🤷

@jmetz
Copy link
Author

jmetz commented Feb 12, 2024

... Also in your TryFrom example - couldn't we still use validate with RawUser and just call validate in the TryFrom impl?

That way we still use all the stuff from the validation crate, while ensuring that the User is validated.

I think this or the new approach above both provide good solutions to this anyway.

@Tomaz-Vieira
Copy link
Collaborator

We could get the same effect with a new method for User that calls validate and returns Ok(User) or the validation error then?

Yes =) That's what I was talking about here

Whether User is instantiated inside TryFrom or in new doesn't matter too much; what matters is that it is always instantiated in a valid state.

That way we still use all the stuff from the validation crate, while ensuring that the User is validated.

True, but the problem runs deeper, and recursively so. Going back to out User example, here is what it could look like:

struct User{
  email: String,
}

And say that we do the new or TryInto strategy that we talked about, so that we instantiate a User with a valid email, that is checked via #[validate = email]. Now the problem is that the type of email is still a String, meaning that any method of User that wants to modify email must first remember to validate that string and then set self.email to that validated string. So in the end, we should probably have an Email struct, with its corresponding TryFrom<String>, so that we always know for sure that an Email instance is valid, and the compiler can keep track of that for us. And this goes all the way down for every field. In fact, User.age should probably be an Age instance, instead of a simple u8, so that it too can hold on to the invariant of being less than 120.

So the thing I'm getting at here is that if validation is also what instantiates, then the compiler can keep track of validation. If validation only checks a raw value, then the compiler can't help us; we have to track the validity of those fields ourselves. And it seems to me that these validation crates do the latter rather than the former.

And there is also the problem of cross-field validation, which would probably still need to happen inside the User factory methods rather than via the validation attributes, and I don't think spreading the validation logic around the code like that would make it very readable.

@jmetz
Copy link
Author

jmetz commented Feb 13, 2024

👍 In that case let's close / mark as wontfix for now, and we can always reference this or reopen if it gets reconsidered,

@jmetz jmetz closed this as not planned Won't fix, can't repro, duplicate, stale Feb 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants