Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored argument parsing #170

Merged
merged 4 commits into from
Aug 16, 2017
Merged

Refactored argument parsing #170

merged 4 commits into from
Aug 16, 2017

Conversation

glandium
Copy link
Collaborator

@glandium glandium commented Aug 9, 2017

See the descriptive commit messages. I verified that this fixes the issues with openvr on OSX with the new SDK (with @luser's patch from bug 1324892).

Unfortunately, this doesn't fix the large number of warnings, and it looks like the only solution for those is to build from the actual source instead of from the preprocessed output :(

This actually makes the refactor largely unnecessary, because fixing that will make distinguishing between preprocessor and other args much less useful. I do think it makes things clearer, though, so I didn't scrape it all.

@glandium
Copy link
Collaborator Author

glandium commented Aug 9, 2017

A try build with the patch from bug 1324892 and using this version of sccache:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=99994c55212238ed5a49753a35c465ce7ba31a16

@luser
Copy link
Contributor

luser commented Aug 10, 2017

Unfortunately, this doesn't fix the large number of warnings, and it looks like the only solution for those is to build from the actual source instead of from the preprocessed output :(

That's a bummer. :-(

}
}

pub fn to_str(&self) -> Option<&'static str> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably more conventional to name this as_str, since it's not allocating anything.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, as_str doesn't usually return an Option (never in libstd, at least)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, as a data point OsStr::to_str doesn't allocate on unix (and returns an Option<&str>)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then it seems fine as it is.

Copy link
Contributor

@luser luser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's a lot of new code! Overall this looks great, even considering the complexity of the new code. (It would be nice if we didn't have to have a binary search implementation here, but I understand why you do given the constraints.)

I bet some of this could be done in a nicer way with a procedural macro, that might be interesting to experiment with in advance of them stabilizing.

The only thing I'm really hung up on here is your use statements for enum variant names. That's not something I've seen used in all the Rust code I've read, and I think the namespace pollution isn't worth it. If you want it because it makes writing the argument arrays easier then maybe a macro would smooth that over?

Other than that just some minor things that could be tweaked.

/// Transforms a parsed argument into an iterator.
impl IntoIterator for Argument {
type Item = OsString;
type IntoIter = std::vec::IntoIter<OsString>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a little unfortunate to allocate a Vec every time into_iter gets invoked for any Argument. Unfortunately without having generators in Rust it gets a bit verbose to write this in a simpler way. This was the best I could come up with:
https://play.rust-lang.org/?gist=148372bc1c25ad8174cb8cf74009f8ab&version=stable


/// How a value is passed to an argument with a value.
#[derive(PartialEq, Clone, Debug)]
pub enum ArgDisposition<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be generic over T? Do you use it with anything other than Delimiter?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes normalize() non awkward to call.

Concatenated(T),
}

pub use self::ArgDisposition::*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty unconventional Rust style. Is this just to shorten typing for the code in the compiler implementations?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it? That's technically how you get Ok(), Err(), Some() and None for Result and Option...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find a strong guideline about it anywhere, but clippy has a lint for glob importing enum variants.

#[derive(PartialEq, Clone, Debug)]
pub enum ArgumentValue {
StringVal(OsString),
PathVal(PathBuf),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably leave off the Val suffix on these names, since you would generally write them as ArgumentValue::StringVal anyway.

pub fn unwrap_path(self) -> PathBuf {
match self {
PathVal(p) => p,
StringVal(_) => panic!("Can't unwrap_path an ArgumentValue::StringVal"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to just do PathBuf::from(s) here? AFAIK that doesn't allocate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that you only get a Path if the option was meant to be one.

type Info = T;

fn search(&self, key: &str) -> Option<&Self::Info> {
match (self.0.search(key), self.1.search(key)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it matters in practice, but this will wind up searching both arrays every time, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and there's no way around that.

// Refuse to cache arguments such as "-include@foo" because they're a
// mess. See https://github.com/mozilla/sccache/issues/150#issuecomment-318586953
if let Argument::WithValue(_, ref v, CanBeSeparated(_)) = item.arg {
if OsString::from(v.clone()).to_string_lossy().starts_with("@") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use OsStrEx::starts_with here as well.

return CompilerArguments::CannotCache("@");
}
}
match item.data {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how this code worked out!

None => {
match item.arg {
Argument::Raw(ref val) => {
if input_arg.is_some() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this regress your change to make linking invocations produce NotCompilation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh, yes (why didn't this comment appear on the main PR page?)

for item in ArgsIter::new(it, &ARGS[..]) {
match item.data {
Some(TooHard) => {
return CompilerArguments::CannotCache(item.arg.to_str().expect(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could make TooHard contain a &'static str and just return that in the CannotCache, but it's not particularly pressing.

@@ -77,6 +77,19 @@ impl ParsedArguments {
}
}

pub fn normalized_extension(file: &Path) -> Option<&'static str> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had actually mentioned this on a different PR a while ago, but it might be nicer to have an enum of C dialects to use here.

@@ -276,6 +291,7 @@ pub fn hash_key(compiler_digest: &str,
let mut m = Digest::new();
m.update(compiler_digest.as_bytes());
m.update(CACHE_VERSION);
m.update(extension.as_bytes());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you should rev CACHE_VERSION here, since you're changing the hash inputs.

// Set extension to one matching the language set with -x
let lang = item.arg.get_value().map(OsString::from);
let lang = lang.as_ref().map(|a| a.to_string_lossy());
extension = match lang.as_ref().map(|a| a.as_ref()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this wants an ArgumentValue::as_str or something like that? I guess you'd have to return a Cow<str> because of the to_string_lossy in there.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That still requires 3 lines thanks to the borrow checker.

@@ -291,7 +299,7 @@ where

CompilerArguments::Ok(ParsedArguments {
input: input.into(),
extension: extension,
extension: extension.expect("extension should always be set"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can drop this if you rewrite the bit above to be like:

let extension = match extension {
  None => return CompilerArguments::CannotCache("unknown source extension"),
  Some(e) => e,
};

I wrote a macro to do this for required options in the Rust compiler implementation.

@luser
Copy link
Contributor

luser commented Aug 10, 2017

Oh heh, I forgot that the "review changes" button applies to the entire PR. Oh well. The second patch looks fine modulo those few little comments, but I'd really like to see the change to not use the enum variant names in the first patch.

Each compiler essentially had its own, rather minimalist, argument handling.
It didn't allow to fully handle the vast breadth of arguments that
compilers can take, and wouldn't allow to find out which arguments are
related to preprocessing, and which aren't. Furthermore, gcc-type
arguments are very lax, and allow things like "-includefile" and
"-include file" to mean the same thing, making things even harder.
And then there's rust, that allows "--emit=dep-info" and "--emit dep-info"
to mean the same thing.

We introduce a new generic API that can handle all sorts of arguments,
based on flat descriptions of the arguments that can be handled, and
proceed with defining more of the arguments that the various compilers
support.

While the argument list for gcc/clang is rather exhaustive, it is still
based on a combination of what ccache handles, and what sccache used to
handle, and might still miss some of the arguments that can take values.

The argument list for msvc is not exhaustive and might need some
improvement. For instance, I haven't actually checked whether some of
the arguments allow non-concatenated forms.
First and foremost, to generically convey the type of compilation, we
normalize the file extension, and will use that information to feed
the right -x argument to the preprocessor and compiler.

When the compiler command sccache is invoked with already contains a -x
argument, we set the normalized file extension accordingly, such that
it matches what the caller wants, rather than what the source file
extension says.

On the preprocessor end, we then always pass one of `-x c`, `-x c++`,
`-x objective-c` or `-x objective-c++` according to the normalized file
extension.

On the compiler end, we always pass one of `-x cpp-output`, `-x
c++-cpp-output`, `-x objective-c-cpp-output` or `-x
objective-c++-cpp-output` accordingly.

Note that we used to pass `-x objc-cpp-output` to gcc, but that's a
deprecated form that it now warns about. The new form has been available
since at least gcc 4.3, so is fine to use.

And because a same source compiled as C or C++ will yield different
object code, include the normalized extension when computing the hashed
value.

Fixes mozilla#163.
Some(Language) => {
let lang = item.arg.get_value().map(OsString::from);
let lang = lang.as_ref().map(|a| a.to_string_lossy());
language = match lang.as_ref().map(|a| a.as_ref()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a heck of a pile of as_refs.

Language::C => "c",
Language::Cxx => "c++",
Language::ObjectiveC => "objective-c",
Language::ObjectiveCxx => "objective-c++",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of a bummer that there are so many ways to spell these that you can't just have a common method on Language.

@luser luser merged commit ee10eae into mozilla:master Aug 16, 2017
@luser
Copy link
Contributor

luser commented Aug 16, 2017

Thanks for the nice refactor!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants