Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: create SparseRepoData from memory #619

Closed
aochagavia opened this issue Apr 23, 2024 · 2 comments · Fixed by #624
Closed

Feature request: create SparseRepoData from memory #619

aochagavia opened this issue Apr 23, 2024 · 2 comments · Fixed by #624

Comments

@aochagavia
Copy link
Contributor

Motivation

I have an AWS Lambda function that solves conda environments using rattler. It fetches cached repodata.json files from S3 and loads them into the solver. Using SparseRepoData gives us excellent performance, except for the fact that we need to extract the repodata to disk first and memmap it back to memory (resulting in ~1 second of overhead). For simple scenarios, this accounts for half of the time spent solving! Therefore I'm looking into a way of avoiding the disk + memmap roundtrip.

Below is a screenshot from our traces, showing the overhead in red when solving an environment for the single matchspec python:

image

Possible API design

We need a constructor that takes the repodata file's bytes directly instead of a Path. In the sketch below I'm using &[u8] instead of Vec<u8>, because that makes it unnecessary to use self-referential borrows (I think).

impl SparseRepoData {
    pub fn from_bytes(
        channel: Channel,
        subdir: impl Into<String>,
        bytes: &[u8],
        patch_function: Option<fn(&mut PackageRecord)>,
    ) -> Result<Self, io::Error> {
        todo!()
    }
}

Open questions

  1. Do you think this addition to rattler is worth it? Or would you rather have me implement it in a separate codebase?
  2. I don't see how my in-memory approach would be compatible with SparseRepoData. From a cursory investigation it looks like I'd need to introduce a separate type (maybe InMemorySparseRepoData), but I think I could reuse LazyRepoData.
@aochagavia
Copy link
Contributor Author

@baszalmstra do you think this would be useful as described here? Or would you suggest a different approach? I'd love to contribute this upstream if you think it's worth it.

@baszalmstra
Copy link
Collaborator

Yeah! That makes a lot of sense to me! A PR is very welcome!

baszalmstra added a commit that referenced this issue Apr 29, 2024
Fixes #619

---------

Co-authored-by: Bas Zalmstra <zalmstra.bas@gmail.com>
baszalmstra added a commit to baszalmstra/rattler that referenced this issue Apr 29, 2024
Fixes conda#619

---------

Co-authored-by: Bas Zalmstra <zalmstra.bas@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants