Remove repeated imports from generated code #31711

eerii · 2024-03-16T18:35:15Z

Right most generated code files (created when running codegen/CodegenRust.py) have all the imports repeated twice. This happens because both CGDescriptor and CgBindingRoot call generate_imports, which has a big list of use statements that get pasted twice per file (or even more in some cases).

CGBindingRoot handles the imports at the top of the file, while CGDescriptor handles the ones used inside of the pub mod NAME_Binding module that each file generates.

This pr removes generate_imports in favour of two separate calls to CGImports from each of the functions. All of the imports were tested to check if they were still necessary (some of them were unused). CGBindingRoot only retains the ones explicitly used outside of the generated modules. CGDescriptor now inherits those imports (by using use super::*) so further repetition is avoided.

Additionally, any imports that were only used in a few of the generated files were removed from the import list in favour of explicit naming (for example, resolve_global -> utils::resolve_global). This was done because it doesn't make much sense to import something in 400+ generated files when only one or two use it. The criteria followed for this was that if less than 5 files use the import, it was be removed. Removed from this pr.

All of these changes amount to fewer repetition, saving about ~150.000 lines of generated code, and hopefully making the import list more maintainable in the future. It will also help when tackling all the clippy issues in script.

The changes were tested locally and with the default linux build github action, but it will probably be a good idea to do a full build test for all platforms just in case any code hidden behind platform flags slipped by.

Side note: there was an unused import for is_platform_object_dynamic that when removed revealed that this function was not used in the code. I added an allow(dead_code), but it may be necessary to see why it wasn't being used.

./mach build -d does not report any errors
./mach test-tidy does not report any errors
These changes do not require tests because they relate to code generation, not a web feature

sagudev · 2024-03-16T21:17:43Z

Wow, that's greatly written PR's summary!

+1 for avoiding repetition of imports

Additionally, any imports that were only used in a few of the generated files were removed from the import list in favour of explicit naming

I've been thinking about this before, but I never considered to use partial path as they seem somehow weird to me, especially with JS_* functions (I'm mostly worried about visual clutter, as codegen is already complicated), but that's just my opinion. What do others think? @jdm?

eerii · 2024-03-16T22:36:39Z

@sagudev I also thought about it as I was making the changes. I tried to only do it for imports that were only referenced once in the code, so there wouldn't be long paths everywhere. But I wouldn't mind to add them back and have them explicitly imported, as they weren't too many and there's tradeoffs for both options. I'd love to hear what everyone else thinks!

jdm · 2024-03-17T07:46:12Z

I'm ok with the the module paths. My only concern is how to decide in the future whether to add an import or use a named path, both as a code author and reviewer.

eerii · 2024-03-17T11:29:36Z

That is a great point. Having looked some more at the code, there are already named paths used in some places, and they are a bit inconsistent (some start with crate::dom, others just with dom, ...). What seems to be more global is that they are only written once in CodegenRust.py. That would be an easy criteria, but it may not be the most appropriate since they could appear in many generated files. Another option could be adding an automatic check to see how many times a named path appear in generated files and suggest adding it as an import if it exceeds a certain threshold.

sagudev · 2024-03-17T11:49:07Z

they are a bit inconsistent (some start with crate::dom, others just with dom, ...).

I think this is because rust usually suggest crate::, while if you manually write you just write it directly. I also observed similar thing with Rust-Analyzer imports, that usually use super::. It's so confusing and annoying, because use crate:: and use super:: are not merged even if they refer to same path.

Another option could be adding an automatic check to see how many times a named path appear in generated files and suggest adding it as an import if it exceeds a certain threshold.

Unfortunately there is no such thing in rustfmt and I am not really fan of adding more tidy checks.

eerii · 2024-03-17T12:29:51Z

I think this is because rust usually suggest crate::, while if you manually write you just write it directly. I also observed similar thing with Rust-Analyzer imports, that usually use super::. It's so confusing and annoying, because use crate:: and use super:: are not merged even if they refer to same path.

Hm that's annoying, specially that they are not merged automatically.

Unfortunately there is no such thing in rustfmt and I am not really fan of adding more tidy checks.

That's fair. I wrote this quick and dirty script that at least could be run manually to check if there are too many copies of a named path:

import os
import re
import sys


def main():
    path = sys.argv[1]
    if not os.path.exists(path):
        print("No such file")
        sys.exit(1)
    check_imports(path)


def check_imports(path):
    # Dictionary to save named paths and their frequency
    named_paths = {}

    # Regular expression to match named paths
    re_paths = r"([a-zA-Z0-9_]+)((::([a-zA-Z0-9_]+))+)"
    re_paths_c = re.compile(re_paths)
    re_imports = r"use\s+(([a-zA-Z0-9_]+)::)*([a-zA-Z0-9_]+)(\s+as\s+([a-zA-Z0-9_]+))?"
    re_imports_c = re.compile(re_imports)

    # Iterate all generated files
    for filename in os.listdir(path):
        file = os.path.join(path, filename)
        if not os.path.isfile(file):
            continue

        # List to save imported files so that we can exclude them from the named paths
        imported = []

        # Read every file checking for named paths
        with open(file, "r") as f:
            for line in f:
                # Trim first spaces
                line = line.lstrip()
                # If it starts with `use`, it is an import, keep going
                if line.startswith("use"):
                    name = re.search(re_imports_c, line)
                    if name:
                        name = name[5] if name[5] else name[3]
                        if name not in imported:
                            imported.append(name)
                    continue
                # Check for patters like name::name(::name...)
                res = re.findall(re_paths_c, line)
                if res:
                    for p in res:
                        # If the base name is an import, ignore
                        if p[0] in imported:
                            continue
                        p = p[0] + p[1]
                        if p in named_paths:
                            named_paths[p] += 1
                        else:
                            named_paths[p] = 1

    named_paths = sorted(named_paths.items(), key=lambda x: x[1], reverse=True)
    named_paths = [i for i in named_paths if i[1] > 10]
    for p in named_paths:
        print(f"{p[0]}: {p[1]}")


if __name__ == "__main__":
    main()

js::jsapi::JSClassOps: 780
crate::dom::bindings::codegen::Bindings::EventHandlerBinding::EventHandlerNonNull: 685
crate::dom::bindings::codegen::InheritTypes::TopTypeId: 424
js::jsapi::JSClass: 390
crate::dom::bindings::reflector::Reflector: 160
crate::dom::bindings::codegen::InheritTypes::EventTargetTypeId::Node: 81
crate::dom::bindings::codegen::InheritTypes::NodeTypeId::Element: 71
xpc::WrapperFactory::IsXrayWrapper: 68
crate::dom::bindings::codegen::InheritTypes::ElementTypeId::HTMLElement: 68
std::sync::atomic::Ordering::Acquire: 34...

There were already some named paths in the generated code, and some of them are repeated everywhere, so once we decide on where to draw the line that makes named paths acceptable it's probably a good idea to go back and change those too to be consistent.

eerii · 2024-03-20T21:33:16Z

Following the discussion in Zulip I removed the controversial named path changes so this pr only relates to repeated imports.

The last commit, labeled experiment, is a test of @mrobinson's suggestion to have imports in a separated module and export them so that generated files can use them without having to repeat them in every one. There are concerns that this may cause slower compilation, but with the CI test we may be able to see if it differs significantly from other runs. This change further reduces the generated code by about ~100.000 more lines.

mrobinson

Unfortunately, while there may be some trends of lower builds speeds, I think any speedup is lost in the noise on the CI. Likely this will need to be tested locally in a more systematic way. I'm pretty confident that this isn't making things significantly worse though -- and it reduces the amount of code generated greatly.

mrobinson · 2024-03-21T13:38:08Z

Unfortunately, while there may be some trends of lower builds speeds, I think any speedup is lost in the noise on the CI. Likely this will need to be tested locally in a more systematic way. I'm pretty confident that this isn't making things significantly worse though -- and it reduces the amount of code generated greatly.

Just to clarify that this is just my impression after reading ~10 build timings files from the CI jobs.

eerii · 2024-03-21T15:07:02Z

I would have liked to do a local build speed test but unfortunately my laptop is old and unreliable, so it may not have been the best for consistent results. Either way, I also looked at the CI timings and it doesn't appear to make things much worse, and as you say it saves many lines of generated code. Thanks for reviewing!

eerii added 8 commits March 15, 2024 12:49

feat: try to deduplicate imports in codegen

f00360c

feat: another attempt

f37cfb1

feat: start testing imports

a53b8a2

feat: clean all global imports

9a6a24d

feat: remove shared imports from CGDescriptor

1e97c87

feat: remove redundant imports from CGDescriptor

7105745

fix: formatting

e32d205

Merge branch 'servo:main' into clean_codegen

d305028

eerii added 5 commits March 20, 2024 18:28

Merge branch 'main' into clean_codegen

d4054d9

fix: remove libc (base level import)

31a7e16

feat: roll back named path changes

7231d61

feat: last changes and tidy

ea38c2a

experiment: move imports into a separate file

fc5327c

eerii force-pushed the clean_codegen branch from 8fe61d8 to fc5327c Compare March 20, 2024 21:29

fix: extra parenthesis

77683e3

eerii force-pushed the clean_codegen branch from f58d4fd to 76c25a3 Compare March 21, 2024 11:39

fix: remove repeated allow statement

dcfc92b

eerii force-pushed the clean_codegen branch from 76c25a3 to dcfc92b Compare March 21, 2024 11:39

Merge branch 'main' into clean_codegen

6b5c241

mrobinson approved these changes Mar 21, 2024

View reviewed changes

mrobinson added this pull request to the merge queue Mar 21, 2024

Merged via the queue into servo:main with commit 8c7e9a1 Mar 21, 2024
8 of 9 checks passed

eerii deleted the clean_codegen branch March 21, 2024 17:00

sagudev mentioned this pull request Mar 31, 2024

Extract generated class constructor hook into handwritten functions #31571

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove repeated imports from generated code #31711

Remove repeated imports from generated code #31711

eerii commented Mar 16, 2024 •

edited

sagudev commented Mar 16, 2024

eerii commented Mar 16, 2024

jdm commented Mar 17, 2024

eerii commented Mar 17, 2024

sagudev commented Mar 17, 2024

eerii commented Mar 17, 2024

eerii commented Mar 20, 2024 •

edited

mrobinson left a comment

mrobinson commented Mar 21, 2024

eerii commented Mar 21, 2024

Remove repeated imports from generated code #31711

Remove repeated imports from generated code #31711

Conversation

eerii commented Mar 16, 2024 • edited

sagudev commented Mar 16, 2024

eerii commented Mar 16, 2024

jdm commented Mar 17, 2024

eerii commented Mar 17, 2024

sagudev commented Mar 17, 2024

eerii commented Mar 17, 2024

eerii commented Mar 20, 2024 • edited

mrobinson left a comment

Choose a reason for hiding this comment

mrobinson commented Mar 21, 2024

eerii commented Mar 21, 2024

eerii commented Mar 16, 2024 •

edited

eerii commented Mar 20, 2024 •

edited