Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Architecture review #42

Open
steveklabnik opened this issue Jul 31, 2023 · 11 comments
Open

Architecture review #42

steveklabnik opened this issue Jul 31, 2023 · 11 comments
Assignees

Comments

@steveklabnik
Copy link
Collaborator

in #33, @anonrig mentioned not being happy with how the architecture is evolving here.

I'm gonna try and take a look and see what I think, but if you have anything to elaborate on what's bugging you, that's helpful.

@steveklabnik steveklabnik self-assigned this Jul 31, 2023
@anonrig
Copy link
Member

anonrig commented Jul 31, 2023

Some key points:

  1. Thread-safety in registry is bad. elsa::FrozenMap does not provide thread-safety. Especially when used with async move call blocks. And it results in cache misses.
  2. Usage of join_all compared to tokio::spawn in an iter().for_each()
  3. Errors in join_all are swallowed and never propagated to the parent context.
  4. The usage of async recursion is really unperformant and causes stack overflow (ref: Stackoverflow error when installing #38)
  5. Tarball extraction is a async process where the distinction of CPU intensive tasks (such as decompression, checksum verification) are on the same thread.
  6. Distinction between cli and package_manager. I tried to distinguish the clap requirements and package_manager commands, but passing data between them, such as InstallArgs makes unnecessary data copies without sharing the type. When in a architecture where you share the types, it causes cyclic dependency error, and forces to create a new crate for them. (Chicken in the egg problem)
  7. Benchmarking is a huge pain.
  8. The installation and adding a new dependency should be revisited to produce better thread-safety.

@Boshen
Copy link

Boshen commented Jul 31, 2023

  1. For cache misses, I want to learn the proper approach as well. Do we put the data structure in a single thread and communicate it with message passing, or do we use some mutexes and condvars as describe in this blog post https://medium.com/@polyglot_factotum/rust-concurrency-patterns-condvars-and-locks-e278f18db74f?
  1. Tarball extraction is a async process where the distinction of CPU intensive tasks (such as decompression, checksum verification) are on the same thread.

Try https://docs.rs/tokio/latest/tokio/task/fn.spawn_blocking.html?

  1. Benchmarking is a huge pain.

Setup a mock server? Abstract away your IO and use a memory filesystem?


If I were to architect this, I may end up with a message passing system because I learned a bit of [akka[(https://akka.io/) a decade ago 😂

@OlshaMB
Copy link

OlshaMB commented Jul 31, 2023

I would to this by saying that currently function that crates/package_manager/src/add.rs and crates/package_manager/src/install.rs needs rework:

  1. add function duplicates install_package code.
  2. TarballManager has too much responsibility, I think it should only manage tarballs not download dependencies and symlink them, it should only download tarballs and unpack them, maybe also not save them as files and pipe the data back
  3. Make checks for is package early and use rust's hashmap for that, no fs lookup(currently it's only made when downloading the tarball)
    All of this would make my life easier and make me able to implement normal install logging and a progress bar

@OlshaMB
Copy link

OlshaMB commented Jul 31, 2023

Also some of the things I would also recommend researching:

  • Early build dependency tree, that you can use then to optimize further.(I am not an expert in that stuff)
  • Use http2 for downloading
  • Parallelize downloading

@OlshaMB
Copy link

OlshaMB commented Jul 31, 2023

I am happy to help out with project

@OlshaMB
Copy link

OlshaMB commented Jul 31, 2023

I am interested which options are you considering using?
My not expert ideas:

  • download queue?
  • tokio spawn?

@metcoder95
Copy link

I'm being quite slow with my PR, but definitely will look to help more in some ends

@OlshaMB
Copy link

OlshaMB commented Aug 1, 2023

I tried migrating to futures stream, but got stuck with multithereaded datasharing
my branch

@steveklabnik
Copy link
Collaborator Author

Distinction between cli and package_manager. I tried to distinguish the clap requirements and package_manager commands, but passing data between them, such as InstallArgs makes unnecessary data copies without sharing the type.

Yes, so the cli package to me feels really thin, and not useful.

When in a architecture where you share the types, it causes cyclic dependency error, and forces to create a new crate for them. (Chicken in the egg problem)

do you happen to have a branch lying around, or remember which packages caused this conflict?

@steveklabnik
Copy link
Collaborator Author

Just saving for myself, to reproduce the cache misses:

diff --git a/crates/registry/src/lib.rs b/crates/registry/src/lib.rs
index a955f0a..59a1331 100644
--- a/crates/registry/src/lib.rs
+++ b/crates/registry/src/lib.rs
@@ -63,8 +63,10 @@ impl RegistryManager {

     pub async fn get_package(&self, name: &str) -> Result<&Package, RegistryError> {
         if let Some(package) = &self.cache.get(name) {
+            println!("cache hit: {}", package.name);
             return Ok(package);
         }
+        println!("cache miss: {name}");

         let package: Package = self
             .client

an example is the ajv package showing cache miss more than one time.

@anonrig
Copy link
Member

anonrig commented Aug 3, 2023

Yes, so the cli package to me feels really thin, and not useful.

@steveklabnik This PR removes package_manager crate and moves it to cli: #51

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants