Skip to content

Conversation

@seddonym
Copy link
Collaborator

@seddonym seddonym commented Jun 9, 2025

Moves the ImportScanner class to Rust.

In order to do this we also need to define Rust-based swappable fake/real file systems to be passed in, so we can still unit test via Python. To reduce the amount of work involved, I've defined a narrower interface for the filesystem that only implements what is needed by ImportScanner. We can broaden it later when we come to do the same with caching / module finding.

I had trouble getting the ImportScanner to pickle/unpickle correctly, which is needed to do multiprocessing, so I've removed multiprocessing. This slows things down considerably (on a large graph, locally, building the graph goes from 5s -> 15s). So really we need to keep going and move the multiprocessing to threads before this is releasable.

There's also distinctly smelly Rust code that loads the Module and DirectImport dataclasses, rather than defining them in Rust - all in the interests of trying to limit how many changes I needed to make to keep the tests passing.

Parallelism next steps

The ImportScanner class currently requires the GIL, which means we still have a bit of work to do before we can move to multithreading. I think the best thing to do is abandon the ports-and-adapters approach for ImportScanner (we only have one of them anyway) and instead make a function that we can unit test from Python, that does them in bulk along the lines of #222. We should keep the unit tests of import scanner but just adapt them to call the bulk function instead - which at first could be in Python, but then we could push that down to Rust. That would allow us to turn the ImportScanner into a pure Rust class that doesn't need the GIL, and do any mapping to Python classes / exceptions in a wrapper function in Python.

Still to do

  • Address regression by reintroducing parallelism, either by making ImportScanner pickleable (temporarily) or by doing the work described above.

@codspeed-hq
Copy link

codspeed-hq bot commented Jun 9, 2025

CodSpeed Instrumentation Performance Report

Merging #221 will degrade performances by 73.02%

Comparing rust-filesystem (0cfe293) with master (1d56065)

Summary

❌ 6 regressions
✅ 16 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
test_deep_layers_large_graph_kept 16.3 ms 19.2 ms -15.17%
test_deep_layers_large_graph_violated 10.4 ms 11.6 ms -10.62%
test_no_chain 1.1 ms 1.2 ms -11.22%
test_no_chains 1.1 ms 1.2 ms -11.2%
test_build_django_from_cache_a_few_misses[350] 298.1 ms 634.2 ms -52.99%
test_build_django_uncached 145.1 ms 537.8 ms -73.02%

@seddonym seddonym force-pushed the rust-filesystem branch 4 times, most recently from b420e06 to cf7672f Compare June 11, 2025 15:31
@seddonym seddonym changed the title Rust filesystem Rust Import Scanner Jun 11, 2025
@seddonym
Copy link
Collaborator Author

It occurred to me we could probably keep multiprocessing for the time being if, rather than passing the file system as an argument to the jobs, we pulled it from the settings object within each job. Might be worth a try so we can merge this.

@seddonym
Copy link
Collaborator Author

Replacing in favour of #229, which doesn't turn off the multiprocessing.

@seddonym seddonym closed this Jun 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants