Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary parser change #60

Closed
wants to merge 1 commit into from
Closed

Conversation

Skepfyr
Copy link
Contributor

@Skepfyr Skepfyr commented Jul 26, 2019

I was attempting to see if I could do anything about the remaining unsafe from #53 and got slightly carried away... I thought I may as well put this up to see if you thought it would be worthwhile.
I've flipped the parser inside out so that instead of being callback-based it is an Iterator over the instructions.
It's a really large change to the API and I don't think the benefits are huge, however, it does unify a few things and is a slightly more rusty interface (IMO). It would take a bit more work to get up and running but probably only a day or two.

@Skepfyr Skepfyr changed the title Data representation parser change Binary parser change Jul 26, 2019
@Skepfyr
Copy link
Contributor Author

Skepfyr commented Aug 9, 2019

Hey @antiagainst, I would love to hear your opinion on this, if for no other reason than that it doesn't get hopelessly out of date.

Copy link
Collaborator

@antiagainst antiagainst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for dropping the ball! I like the changes to remove unsafe (num.rs) but for the decoder and parser interface, I think we need more discussion. What's the motivation for those changes?

Would you mind to separate the changes for the unsafe removal out to get it landed first?

if bytes.len() % 4 == 0 {
words.push(0)
}
let chunks = s.as_bytes().chunks_exact(4);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@@ -24,8 +24,8 @@ use std::{error, fmt};
/// Decoder Error.
#[derive(Debug, PartialEq)]
pub enum Error {
StreamExpected(usize),
LimitReached(usize),
StreamExpected,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable to me.

pub struct Parser<'c, 'd> {
decoder: decoder::Decoder<'d>,
consumer: &'c mut dyn Consumer,
pub struct Instructions<T>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not so sure about this. This is changing the whole parser interface but I'm not seeing how this is to be used? Another nit comment is that Parser is a more well-known term in compilers. Instruction is already heavily overloaded. Let's not make it worse.


let b = f32_to_bytes(-12.34);
let mut d = Decoder::new(&b);
assert_eq!(Ok(-12.34), d.float32());
}

#[test]
fn test_decode_float32() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So decoder is removed but we still have tests here?

}
Ok(words)
}
pub struct Decoder {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sort of like the existing decoder and parser structure because it allows other kinds of parsers. Whether there are such a need I cannot see it right now. But given it's already implemented, I think we can just keep it for now unless it becomes problematic for some feature we want to use. WDYT?

@Skepfyr
Copy link
Contributor Author

Skepfyr commented Aug 15, 2019

Sorry, I've realised I should have explained this better. This changes the parser from being callback-based to being an Iterator of instructions, the parse_bytes and parse_words functions are the best examples of this change. This gives more control to the consumer because now it's pull-based, with the parser essentially acting as a complicated map function over the input. This is possible as instead of a generic parser infrastructure, it relies on the layout of SPIR-V modules.

This change was brought about by me trying to get rid of the unsafe in parse_words, although it's proably possible to do that without this change (but probably does require a reasonably large change). This just struck me as a slightly more idiomatic way of doing things, and as a bonus deleted some code and removed some types.

This is a very draft-y PR most of the small changes can be ignored, I was only putting it up here for the large scale structure to be considered, and I think most of the unsafe removal came in #53.

I agree this is a large change without large benefits, however seen as so much is changing right now, I feel like now is probably the best time to implement anything like this.

@antiagainst
Copy link
Collaborator

I see your motivation better now. Thanks for the detailed information! But I feel I'm still not very convinced over this. What additional benefits does this bring except from being more Rust idiomatic? I think that's a second thing compared to designing good layering and API. (Unless the existing impl is totally off w.r.t. Rust paradigm; but that means it probably won't be able to be expressed in Rust ;-P) IMO the change proposed here blurs the layering and API boundary. I think a good API should make problematic usages harder. The previous decoder and parser is very clear regarding that. If you are using decoder, it means no understanding of the grammar, so only low-level decoding functionality and you are on your own for making sure everything works. If you are using parser, then you are only exposed to either full instruction parsing or error; no way to get a single operand of an instruction. Now with the new Instructions, we can do both: decoding an instruction, and then decoding a single id, and then trying to decode another instruction, which will fail. This makes me feel uneasy.

I guess I'm not opposed changing to make it more Rust idiomatic; I'm just worried about the public interface. As long as the API is reasonable, we can change the impl whatever way we want. WDYT? (But I may misunderstand your proposal; feel free to correct me!)

@Skepfyr
Copy link
Contributor Author

Skepfyr commented Aug 16, 2019

I think the public API in this is more structured than it used to be (although I might not have removed some pubs yet). The only public way to parse something is to call parse_bytes or parse_words, these both return something that looks like Result<(ModuleHeader, Instructions<_>)>. The ModuleHeader is exactly what would have been received by the consume_header method but now the types prove that you only get one per file and you get it first; Instructions should only have one public method next (from Iterator) which returns the next fully parsed instruction (or an error).

@antiagainst
Copy link
Collaborator

I see. So actually you'd like to make the decoder internal to the parser (which now becomes Instructions) entirely and only expose APIs for working with DR instructions. Isn't that still having the layering problem? I like the decoder and parser separation because they are modelling different concepts at different levels. So IMO it's cleaner to use two classes. It's also easier to test and has the additional benefits that others have access to the decoder API. Putting all of them in one class is creating a class that operates at multiple levels and serves multiple purposes... I'm still not quite seeing the real benefits here?

Given this is quite outdated right now (sorry I wasn't able to reply promptly!), would you mind to pocket this for now and let's focus on having SR fleshed out first? I don't want the world to be entirely broken when we cut the next release and folks suddenly find the codebase is entirely different. :)

@Skepfyr
Copy link
Contributor Author

Skepfyr commented Aug 26, 2019

Yes, to me the decoder and parser are essentially the same, if you look at the current code the design of each are heavily intertwined. Although I do like elegant abstractions, I can't really see one here or why it would be useful, I would like to hear of any alternative use case for the decoder without the parser, I can't see one existing because in order to decode what comes next in the stream you need to have parsed what you already have. There is no reason it is harder to test because a unit testing module can access all the methods, they still exist but are private.

The benefit I see of doing this is to clean up the users API, it turns from a few functions, a trait implementaton, and a whole load of methods that the user doesn't really need to touch to two functions (parse_bytes and parse_words).

I'm actually happy to just leave this here, it turns out I was wrong about it getting outdated as all of the sr work doesn't touch any of this code. I'm not sure about whether it is more sensible to wait for the next release, the whole thing is essentially changing anyway it might be sensible to just get it over and done with in one go, rather than making large changes to APIs over multiple revisions.

@Skepfyr
Copy link
Contributor Author

Skepfyr commented Aug 28, 2019

I attempted to implement some more of this just to see where it would take me. I'm now no longer convinced that this change specifically is worth it.
However, I did work out what is bothering me about the current API, we have 4 separate parsers exposed that the user has to feed the data through one-by-one (although there are some functions that do multiple steps of this), Decoder, Parser, dr, and sr. While I can see the benefits of this, it increases the complexity of the library and increases the amount of allocation. I might open a separate issue on this at some point but I agree that the change in this PR is not a good idea right now and I'm happy to leave this alone while we flesh out sr.

@Skepfyr Skepfyr closed this Aug 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants