Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove repeated imports from generated code #31711

Merged
merged 16 commits into from Mar 21, 2024
Merged

Conversation

eerii
Copy link
Contributor

@eerii eerii commented Mar 16, 2024

Right most generated code files (created when running codegen/CodegenRust.py) have all the imports repeated twice. This happens because both CGDescriptor and CgBindingRoot call generate_imports, which has a big list of use statements that get pasted twice per file (or even more in some cases).

CGBindingRoot handles the imports at the top of the file, while CGDescriptor handles the ones used inside of the pub mod NAME_Binding module that each file generates.

This pr removes generate_imports in favour of two separate calls to CGImports from each of the functions. All of the imports were tested to check if they were still necessary (some of them were unused). CGBindingRoot only retains the ones explicitly used outside of the generated modules. CGDescriptor now inherits those imports (by using use super::*) so further repetition is avoided.

Additionally, any imports that were only used in a few of the generated files were removed from the import list in favour of explicit naming (for example, resolve_global -> utils::resolve_global). This was done because it doesn't make much sense to import something in 400+ generated files when only one or two use it. The criteria followed for this was that if less than 5 files use the import, it was be removed. Removed from this pr.

All of these changes amount to fewer repetition, saving about ~150.000 lines of generated code, and hopefully making the import list more maintainable in the future. It will also help when tackling all the clippy issues in script.

The changes were tested locally and with the default linux build github action, but it will probably be a good idea to do a full build test for all platforms just in case any code hidden behind platform flags slipped by.

Side note: there was an unused import for is_platform_object_dynamic that when removed revealed that this function was not used in the code. I added an allow(dead_code), but it may be necessary to see why it wasn't being used.


  • ./mach build -d does not report any errors
  • ./mach test-tidy does not report any errors
  • These changes do not require tests because they relate to code generation, not a web feature

@sagudev
Copy link
Member

sagudev commented Mar 16, 2024

Wow, that's greatly written PR's summary!

+1 for avoiding repetition of imports

Additionally, any imports that were only used in a few of the generated files were removed from the import list in favour of explicit naming

I've been thinking about this before, but I never considered to use partial path as they seem somehow weird to me, especially with JS_* functions (I'm mostly worried about visual clutter, as codegen is already complicated), but that's just my opinion. What do others think? @jdm?

@eerii
Copy link
Contributor Author

eerii commented Mar 16, 2024

@sagudev I also thought about it as I was making the changes. I tried to only do it for imports that were only referenced once in the code, so there wouldn't be long paths everywhere. But I wouldn't mind to add them back and have them explicitly imported, as they weren't too many and there's tradeoffs for both options. I'd love to hear what everyone else thinks!

@jdm
Copy link
Member

jdm commented Mar 17, 2024

I'm ok with the the module paths. My only concern is how to decide in the future whether to add an import or use a named path, both as a code author and reviewer.

@eerii
Copy link
Contributor Author

eerii commented Mar 17, 2024

That is a great point. Having looked some more at the code, there are already named paths used in some places, and they are a bit inconsistent (some start with crate::dom, others just with dom, ...). What seems to be more global is that they are only written once in CodegenRust.py. That would be an easy criteria, but it may not be the most appropriate since they could appear in many generated files. Another option could be adding an automatic check to see how many times a named path appear in generated files and suggest adding it as an import if it exceeds a certain threshold.

@sagudev
Copy link
Member

sagudev commented Mar 17, 2024

they are a bit inconsistent (some start with crate::dom, others just with dom, ...).

I think this is because rust usually suggest crate::, while if you manually write you just write it directly. I also observed similar thing with Rust-Analyzer imports, that usually use super::. It's so confusing and annoying, because use crate:: and use super:: are not merged even if they refer to same path.

Another option could be adding an automatic check to see how many times a named path appear in generated files and suggest adding it as an import if it exceeds a certain threshold.

Unfortunately there is no such thing in rustfmt and I am not really fan of adding more tidy checks.

@eerii
Copy link
Contributor Author

eerii commented Mar 17, 2024

I think this is because rust usually suggest crate::, while if you manually write you just write it directly. I also observed similar thing with Rust-Analyzer imports, that usually use super::. It's so confusing and annoying, because use crate:: and use super:: are not merged even if they refer to same path.

Hm that's annoying, specially that they are not merged automatically.

Unfortunately there is no such thing in rustfmt and I am not really fan of adding more tidy checks.

That's fair. I wrote this quick and dirty script that at least could be run manually to check if there are too many copies of a named path:

import os
import re
import sys


def main():
    path = sys.argv[1]
    if not os.path.exists(path):
        print("No such file")
        sys.exit(1)
    check_imports(path)


def check_imports(path):
    # Dictionary to save named paths and their frequency
    named_paths = {}

    # Regular expression to match named paths
    re_paths = r"([a-zA-Z0-9_]+)((::([a-zA-Z0-9_]+))+)"
    re_paths_c = re.compile(re_paths)
    re_imports = r"use\s+(([a-zA-Z0-9_]+)::)*([a-zA-Z0-9_]+)(\s+as\s+([a-zA-Z0-9_]+))?"
    re_imports_c = re.compile(re_imports)

    # Iterate all generated files
    for filename in os.listdir(path):
        file = os.path.join(path, filename)
        if not os.path.isfile(file):
            continue

        # List to save imported files so that we can exclude them from the named paths
        imported = []

        # Read every file checking for named paths
        with open(file, "r") as f:
            for line in f:
                # Trim first spaces
                line = line.lstrip()
                # If it starts with `use`, it is an import, keep going
                if line.startswith("use"):
                    name = re.search(re_imports_c, line)
                    if name:
                        name = name[5] if name[5] else name[3]
                        if name not in imported:
                            imported.append(name)
                    continue
                # Check for patters like name::name(::name...)
                res = re.findall(re_paths_c, line)
                if res:
                    for p in res:
                        # If the base name is an import, ignore
                        if p[0] in imported:
                            continue
                        p = p[0] + p[1]
                        if p in named_paths:
                            named_paths[p] += 1
                        else:
                            named_paths[p] = 1

    named_paths = sorted(named_paths.items(), key=lambda x: x[1], reverse=True)
    named_paths = [i for i in named_paths if i[1] > 10]
    for p in named_paths:
        print(f"{p[0]}: {p[1]}")


if __name__ == "__main__":
    main()
js::jsapi::JSClassOps: 780
crate::dom::bindings::codegen::Bindings::EventHandlerBinding::EventHandlerNonNull: 685
crate::dom::bindings::codegen::InheritTypes::TopTypeId: 424
js::jsapi::JSClass: 390
crate::dom::bindings::reflector::Reflector: 160
crate::dom::bindings::codegen::InheritTypes::EventTargetTypeId::Node: 81
crate::dom::bindings::codegen::InheritTypes::NodeTypeId::Element: 71
xpc::WrapperFactory::IsXrayWrapper: 68
crate::dom::bindings::codegen::InheritTypes::ElementTypeId::HTMLElement: 68
std::sync::atomic::Ordering::Acquire: 34...

There were already some named paths in the generated code, and some of them are repeated everywhere, so once we decide on where to draw the line that makes named paths acceptable it's probably a good idea to go back and change those too to be consistent.

@eerii
Copy link
Contributor Author

eerii commented Mar 20, 2024

Following the discussion in Zulip I removed the controversial named path changes so this pr only relates to repeated imports.

The last commit, labeled experiment, is a test of @mrobinson's suggestion to have imports in a separated module and export them so that generated files can use them without having to repeat them in every one. There are concerns that this may cause slower compilation, but with the CI test we may be able to see if it differs significantly from other runs. This change further reduces the generated code by about ~100.000 more lines.

Copy link
Member

@mrobinson mrobinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, while there may be some trends of lower builds speeds, I think any speedup is lost in the noise on the CI. Likely this will need to be tested locally in a more systematic way. I'm pretty confident that this isn't making things significantly worse though -- and it reduces the amount of code generated greatly.

@mrobinson
Copy link
Member

Unfortunately, while there may be some trends of lower builds speeds, I think any speedup is lost in the noise on the CI. Likely this will need to be tested locally in a more systematic way. I'm pretty confident that this isn't making things significantly worse though -- and it reduces the amount of code generated greatly.

Just to clarify that this is just my impression after reading ~10 build timings files from the CI jobs.

@mrobinson mrobinson added this pull request to the merge queue Mar 21, 2024
@eerii
Copy link
Contributor Author

eerii commented Mar 21, 2024

I would have liked to do a local build speed test but unfortunately my laptop is old and unreliable, so it may not have been the best for consistent results. Either way, I also looked at the CI timings and it doesn't appear to make things much worse, and as you say it saves many lines of generated code. Thanks for reviewing!

Merged via the queue into servo:main with commit 8c7e9a1 Mar 21, 2024
8 of 9 checks passed
@eerii eerii deleted the clean_codegen branch March 21, 2024 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants