Remove repeated imports from generated code #31711
Wow, that's a really well-written PR summary! +1 for avoiding repetition of imports.
I've been thinking about this before, but I never considered using partial paths, as they seem somewhat odd to me, especially with JS_* functions (I'm mostly worried about visual clutter, as codegen is already complicated), but that's just my opinion. What do others think? @jdm?
@sagudev I also thought about it as I was making the changes. I tried to only do it for imports that were referenced just once in the code, so there wouldn't be long paths everywhere. But I wouldn't mind adding them back and having them explicitly imported, as there weren't too many and there are tradeoffs for both options. I'd love to hear what everyone else thinks!
I'm ok with the module paths. My only concern is how to decide in the future whether to add an import or use a named path, both as a code author and as a reviewer.
That is a great point. Having looked some more at the code, there are already named paths used in some places, and they are a bit inconsistent (some start with
I think this is because Rust usually suggests
Unfortunately there is no such thing in rustfmt and I am not really a fan of adding more tidy checks.
Hm, that's annoying, especially since they are not merged automatically.
That's fair. I wrote this quick-and-dirty script that could at least be run manually to check whether there are too many copies of a named path:

import os
import re
import sys


def main():
    path = sys.argv[1]
    if not os.path.exists(path):
        print("No such file")
        sys.exit(1)
    check_imports(path)


def check_imports(path):
    # Dictionary to save named paths and their frequency
    named_paths = {}
    # Regular expression to match named paths
    re_paths = r"([a-zA-Z0-9_]+)((::([a-zA-Z0-9_]+))+)"
    re_paths_c = re.compile(re_paths)
    # Regular expression to match `use a::b::c` and `use a::b::c as d`
    re_imports = r"use\s+(([a-zA-Z0-9_]+)::)*([a-zA-Z0-9_]+)(\s+as\s+([a-zA-Z0-9_]+))?"
    re_imports_c = re.compile(re_imports)
    # Iterate all generated files
    for filename in os.listdir(path):
        file = os.path.join(path, filename)
        if not os.path.isfile(file):
            continue
        # List to save imported names so that we can exclude them from the named paths
        imported = []
        # Read every file checking for named paths
        with open(file, "r") as f:
            for line in f:
                # Trim leading whitespace
                line = line.lstrip()
                # If it starts with `use `, it is an import: record it and keep going
                if line.startswith("use "):
                    name = re_imports_c.search(line)
                    if name:
                        # Prefer the alias (`as name`) over the last path segment
                        name = name[5] if name[5] else name[3]
                        if name not in imported:
                            imported.append(name)
                    continue
                # Check for patterns like name::name(::name...)
                res = re_paths_c.findall(line)
                if res:
                    for p in res:
                        # If the base name is an import, ignore
                        if p[0] in imported:
                            continue
                        p = p[0] + p[1]
                        if p in named_paths:
                            named_paths[p] += 1
                        else:
                            named_paths[p] = 1
    # Report only the paths that appear more than 10 times, most frequent first
    named_paths = sorted(named_paths.items(), key=lambda x: x[1], reverse=True)
    named_paths = [i for i in named_paths if i[1] > 10]
    for p in named_paths:
        print(f"{p[0]}: {p[1]}")


if __name__ == "__main__":
    main()

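To make it easier to see what the two regular expressions in the script pick up, here is a small standalone sketch that applies the same patterns to single lines. The helper names (`imported_name`, `named_paths`) and the sample Rust lines are made up for illustration and are not part of the script or of the generated code.

```python
import re

# The same two patterns used by the script above.
RE_PATH = re.compile(r"([a-zA-Z0-9_]+)((::([a-zA-Z0-9_]+))+)")
RE_IMPORT = re.compile(
    r"use\s+(([a-zA-Z0-9_]+)::)*([a-zA-Z0-9_]+)(\s+as\s+([a-zA-Z0-9_]+))?"
)


def imported_name(line):
    """Return the name a `use` line brings into scope (the alias, if any)."""
    m = RE_IMPORT.search(line)
    if not m:
        return None
    # Group 5 is the `as` alias, group 3 is the last path segment.
    return m[5] if m[5] else m[3]


def named_paths(line):
    """Return every `a::b(::c...)` path found in one line of code."""
    # findall returns one tuple per match: (base, rest, last "::seg", last seg).
    return [groups[0] + groups[1] for groups in RE_PATH.findall(line)]


# Hypothetical sample lines, only meant to exercise the patterns.
print(imported_name("use some_crate::module::Thing;"))        # Thing
print(imported_name("use some_crate::module::Thing as T;"))   # T
print(named_paths("let g = utils::resolve_global(cx);"))      # ['utils::resolve_global']
```

One thing this makes visible: the import pattern only understands plain `use a::b::c` and `use a::b::c as d` lines, so grouped imports with braces are not resolved to their individual names.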
There were already some named paths in the generated code, and some of them are repeated everywhere, so once we decide where to draw the line that makes named paths acceptable, it's probably a good idea to go back and change those too, for consistency.
Following the discussion in Zulip, I removed the controversial named path changes, so this PR only relates to repeated imports. The last commit, labeled
Unfortunately, while there may be some trend toward shorter build times, I think any speedup is lost in the noise on the CI. This will likely need to be tested locally in a more systematic way. I'm pretty confident that this isn't making things significantly worse, though, and it greatly reduces the amount of generated code.
Just to clarify, this is only my impression after reading ~10 build timings files from the CI jobs.
I would have liked to do a local build speed test, but unfortunately my laptop is old and unreliable, so it may not have been the best for consistent results. Either way, I also looked at the CI timings and it doesn't appear to make things much worse, and as you say, it saves many lines of generated code. Thanks for reviewing!
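For anyone who wants to try the more systematic local comparison mentioned above, a minimal sketch could look like the following. It is not something that was run for this PR: the helper names are made up, `NOOP_CMD` is a harmless placeholder, and it assumes you pass in a build command that really rebuilds from the same starting state each time (for example by cleaning between runs), otherwise the timings are not comparable.

```python
import statistics
import subprocess
import sys
import time

# Harmless placeholder command for trying the helper; swap in a real build.
NOOP_CMD = [sys.executable, "-c", "pass"]


def time_command(cmd, runs=5):
    """Run `cmd` `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken build is not timed.
        subprocess.run(
            cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (median, sample standard deviation) for a list of durations."""
    spread = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return statistics.median(durations), spread
```

Running `summarize(time_command(cmd))` once on the base branch and once on this branch gives two medians to compare, and the standard deviation gives a rough idea of whether the difference is larger than the noise.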
Right now, most generated code files (created when running codegen/CodegenRust.py) have all the imports repeated twice. This happens because both CGDescriptor and CGBindingRoot call generate_imports, which has a big list of use statements that get pasted twice per file (or even more in some cases). CGBindingRoot handles the imports at the top of the file, while CGDescriptor handles the ones used inside of the pub mod NAME_Binding module that each file generates.
This PR removes generate_imports in favour of two separate calls to CGImports, one from each of them. All of the imports were tested to check whether they were still necessary (some of them were unused). CGBindingRoot only retains the ones explicitly used outside of the generated modules. CGDescriptor now inherits those imports (by using use super::*), so further repetition is avoided.
Additionally, any imports that were only used in a few of the generated files were removed from the import list in favour of explicit naming (for example, resolve_global -> utils::resolve_global). This was done because it doesn't make much sense to import something in 400+ generated files when only one or two use it. The criterion followed was that if fewer than 5 files use the import, it was removed. (Removed from this PR.)
All of these changes amount to less repetition, saving about 150,000 lines of generated code, and hopefully making the import list more maintainable in the future. It will also help when tackling all the clippy issues in script.
The changes were tested locally and with the default Linux build GitHub Action, but it will probably be a good idea to do a full build test for all platforms, just in case any code hidden behind platform flags slipped by.
Side note: there was an unused import for is_platform_object_dynamic that, when removed, revealed that this function was not used in the code. I added an allow(dead_code), but it may be necessary to see why it wasn't being used.
./mach build -d does not report any errors
./mach test-tidy does not report any errors
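The shape of the change described above can be sketched in a few lines of Python. This is a simplified illustration, not the actual code in codegen/CodegenRust.py: the function `generate_binding_file`, its parameters, and the sample import path are all hypothetical, and the real CGImports, CGBindingRoot and CGDescriptor classes do considerably more. The only point is that the import list is emitted once at the top of the file, and the generated module picks it up through `use super::*` instead of receiving a second copy.

```python
def generate_binding_file(name, root_imports, module_body):
    """Return Rust source text with one import list and a `NAME_Binding` module.

    Hypothetical helper: `root_imports` is a list of paths to `use` at the top
    of the file, and `module_body` is a list of already generated Rust lines.
    """
    # Imports are written exactly once, at the top of the file.
    lines = [f"use {path};" for path in root_imports]
    lines.append("")
    lines.append(f"pub mod {name}_Binding {{")
    # The module inherits the imports above instead of repeating them.
    lines.append("    use super::*;")
    lines.extend(f"    {line}" for line in module_body)
    lines.append("}")
    return "\n".join(lines) + "\n"


# Made-up inputs, only to show the layout of the output.
print(generate_binding_file("Example", ["some_crate::Thing"], ["pub fn f() {}"]))
```

With the old approach, the equivalent of the `root_imports` list would have been written both at the top of the file and again inside the module, which is where the duplicated lines came from.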