Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ParseCallbacks handler for generated TokenStreams #1638

Closed
wants to merge 3 commits into from

Conversation

jandob
Copy link

@jandob jandob commented Oct 5, 2019

This small feature enables advances users to modify the generated code before it gets converted to a string representation.

Copy link
Contributor

@emilio emilio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this means that every time we update proc_macro2 we need to do a breaking bump which is not great...

And more generally that all the code generated by bindgen is part of its public API, which I don't think it's great either.

Could you elaborate on the use cases for this?

@jandob
Copy link
Author

jandob commented Oct 5, 2019

My use case is to further process the output.
I am generating mocks for c functions, and instead of the generated:

extern "C" {
    pub fn foo();
}

i need:

pub extern "C" fn foo() {
    // generated implementation
}

Currently i send the bindgen output to syn::parse_file, process the TokenStream and run it through rustfmt. That works, but feels awkward because bindgen already has the TokenStream representation and did rustfmt before.

Essentially, i think the TokenStream is a better representation for what bindgen is actually producing: a syntax tree.
Converting the tree to source code is the job of another crate: rustfmt.

Regarding the dependency on proc_macro2:
A breaking change in proc_macro2 would require a breaking bump. But is this so bad? The bindgen output could also change because of the proc_macro2 update.

@emilio
Copy link
Contributor

emilio commented Oct 6, 2019

A breaking change in proc_macro2 would require a breaking bump. But is this so bad? The bindgen output could also change because of the proc_macro2 update.

It is not bad as in "common" or such, but it's very easy to forget about... At the very least we need to add a comment in Cargo.toml so people don't forget.

Right now bindgen necessarily needs to be updated in order to update proc_macro2 and related crates of course, but if bindgen updates correctly there's no need to change bindgen's output, you just change how you use proc_macro2...

A bit more concerning, I think, is that any change to bindgen's output becomes breaking. Though I guess that is kind of the case already, to some extent...

Anyhow I see the use case, though I'm not quite convinced that exposing the whole syntax tree bindgen generates and proc_macro2's token tree is the right level of abstraction... At the very least, can we just expose one token tree? Right after we call the function, we just call quote! to join them together. There's no good reason to provide a vector, as far as I'm concerned, right?

In any case, this is just a bit more convenient way to do this than passing a String as a writer to generate() and modify that after the fact, right? I'm not sure if the use case is common enough to make it easier than that, but maybe I'm missing something...

@jandob
Copy link
Author

jandob commented Oct 6, 2019

Ok i changed the PR a bit, now the callback gets invoked in the write method (after TokenStream::to_string() and before rustfmt).

This is how i use it:

extern crate bindgen;
extern crate syn;
extern crate quote;
use bindgen;
use syn;
use syn::visit_mut::VisitMut;
use quote::ToTokens;

#[derive(Debug)]
struct MyCallbacks;
impl bindgen::callbacks::ParseCallbacks for MyCallbacks {
    fn bindings(&self, bindings: &str) -> Option<String> {
        let mut file = syn::parse_file(bindings)
            .expect("Unable to parse bindgen output");
        let mut converter = Converter::new();
        converter.visit_file_mut(&mut file);
        Some(file.into_token_stream().to_string())
    }
}

@emilio
Copy link
Contributor

emilio commented Oct 6, 2019

My point is, why is this different to passing a string to generate()?

@bors-servo
Copy link

☔ The latest upstream changes (presumably 49af9b7) made this pull request unmergeable. Please resolve the merge conflicts.

@pvdrz
Copy link
Contributor

pvdrz commented Sep 22, 2022

@jandob @emilio I was doing some triaging and wanted to drop my 2 cents about the topic:

bindgen uses syn internally nowadays and it would be possible to expose the module with the code or provide a visitor-like API. That being said, I agree with Emilio about breaking changes, this would mean that we couldn't bump any version for syn, quote or proc-macro2 without causing a breaking change.

An alternative would be exposing proc_macro::TokenStream instead which at least has the same release cycle as the compiler and the standard library, it would be less prone to breaking things and more understandable if some utility is not available in previous rust versions.

However, I don't think exposing the code as a token stream instead of a regular string would be a great improvement as syn still has to parse the tokens and check if they form a valid AST. So, by exposing the token tree we are just saving the tokenization cost which shouldn't be that large anyway.

@pvdrz pvdrz deleted the branch rust-lang:master November 2, 2023 17:50
@pvdrz pvdrz closed this Nov 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants