Skip to content

refactor(mercury): avoid path existence checks with once cell#741

Merged
genedna merged 4 commits into
gitmono-dev:mainfrom
el-ev:once
Dec 13, 2024
Merged

refactor(mercury): avoid path existence checks with once cell#741
genedna merged 4 commits into
gitmono-dev:mainfrom
el-ev:once

Conversation

@el-ev

@el-ev el-ev commented Dec 12, 2024

Copy link
Copy Markdown
Contributor

Avoids expensive file (stat) operations, no atomicity issues this time.

Time spent on mercury::internal::pack::cache::Caches::generate_temp_path reproducibly reduced from ~2.7% to ~1.4% for the test internal::pack::decode::tests::test_decode_large_file_async after applying this patch.

Before After
image image

@vercel

vercel Bot commented Dec 12, 2024

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
mega ✅ Ready (Inspect) Visit Preview 💬 Add feedback Dec 13, 2024 0:17am

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 1 changed files in this pull request and generated no suggestions.

Comments skipped due to low confidence (1)

mercury/src/internal/pack/cache.rs:86

  • Using unwrap() within call_once could cause a panic if directory creation fails. Consider handling the error more gracefully.
self.path_prefixes[hash.as_ref()[0] as usize].call_once(|| { fs::create_dir(&path).unwrap(); });

@MrBeanCpp

Copy link
Copy Markdown
Contributor

It looks good, reducing file system access and improving performance.

While it does add a bit of complexity, I was wondering if it might be a good idea to pre-create these 256 folders during the initialization phase to achieve the best possible performance?

@el-ev

el-ev commented Dec 13, 2024

Copy link
Copy Markdown
Contributor Author

It looks good, reducing file system access and improving performance.

While it does add a bit of complexity, I was wondering if it might be a good idea to pre-create these 256 folders during the initialization phase to achieve the best possible performance?

Thank you for the suggestion. However, I believe lazy initialization with Once in generally preferable.
For packs that are relatively small(quite common), not all the 256 folders are used, so pre-creating so many folders is neither effective nor elegant. For instance, the one in test_pack_decode_with_ref_delta contains only 69 hash prefixes.
Another thing I found is that the prefix folder is created when the cache object is inserted, but it should be created when it is ejected from lru cache, also not optimal.

@MrBeanCpp

Copy link
Copy Markdown
Contributor

For packs that are relatively small(quite common), not all the 256 folders are used, so pre-creating so many folders is neither effective nor elegant.

Well, you’re right, not all 256 folders will be used, but the overhead of creating them during initialization is insignificant compared to the overall decode time.

Plus, this approach removes any runtime checks, which significantly boosts performance and avoids potential race conditions.

Also, I think there’s no real difference in elegance between initializing 256 folders and 256 Once instances. Both will be cleaned up after decoding is complete.

Another thing I found is that the prefix folder is created when the cache object is inserted, but it should be created when it is ejected from lru cache, also not optimal.

For this, I think it could be a good idea.

@genedna genedna enabled auto-merge December 13, 2024 13:20
@genedna genedna added this pull request to the merge queue Dec 13, 2024
Merged via the queue into gitmono-dev:main with commit 0c23c94 Dec 13, 2024
@el-ev

el-ev commented Dec 13, 2024

Copy link
Copy Markdown
Contributor Author

Also, I think there’s no real difference in elegance between initializing 256 folders and 256 Once instances. Both will be cleaned up after decoding is complete.

If you handle the complexity in the code, it's nothing more than 1 MiB of heap space and some atomic operations. But if you just create those folders, you are leaving some random stuff to the user. In my opinion, that's where the elegance lies in. I'm not very fond of this idea, where you pull a 10k-sized repository, and then your "git" creates over 200 empty folders. Plus, in this case the influence to the performance is indeed significant.
image

Another thing I found is that the prefix folder is created when the cache object is inserted, but it should be created when it is ejected from lru cache, also not optimal.

For this, I think it could be a good idea.

This change would rely on LazyLock, yet another kind of Once though. Specifically, store a LazyLock in each ArcWrapper, which is accessed during drop. I could work on this if you like this approach.

@el-ev

el-ev commented Dec 13, 2024

Copy link
Copy Markdown
Contributor Author

uh, this pr is merged much earlier than I have expected... I would like to continue working on the idea mentioned above if appropriate.

@Hou-Xiaoxuan

Hou-Xiaoxuan commented Dec 13, 2024

Copy link
Copy Markdown
Contributor

This change would rely on LazyLock, yet another kind of Once though. Specifically, store a LazyLock in each ArcWrapper, which is accessed during drop. I could work on this if you like this approach.

Delaying the creation of folders until the drop stage sounds like a good idea, as not all objects need to be saved to a file when dropped—especially for smaller packs. However, modifying the code to add an additional member to each ArcWrapper seems a bit excessive and complex.

@MrBeanCpp

Copy link
Copy Markdown
Contributor

But if you just create those folders, you are leaving some random stuff to the user.

The default behavior is to delete the cached folders after decoding is complete, meaning these 256 folders only exist for an average of a few dozen seconds, or even less, which won't cause any inconvenience to the user.

Plus, in this case the influence to the performance is indeed significant.

Creating 256 empty folders should be nothing for modern operating systems, while the benefits it brings are reduced code complexity and fewer runtime checks.

This change would rely on LazyLock, yet another kind of Once though. Specifically, store a LazyLock in each ArcWrapper, which is accessed during drop. I could work on this if you like this approach.

I agree with Hou's perspective. While creating folders at the time of drop may sound elegant, the additional complexity it introduces would reduce code maintainability and increase the likelihood of bugs.

In my opinion, less, simpler, and more maintainable code is also an important criterion for elegance.

@el-ev el-ev deleted the once branch December 14, 2024 01:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants