Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: ShallowCopy for Option<T> (and more) #48

Merged
merged 4 commits into from
May 5, 2020
Merged

WIP: ShallowCopy for Option<T> (and more) #48

merged 4 commits into from
May 5, 2020

Conversation

code-elf
Copy link
Contributor

@code-elf code-elf commented Mar 25, 2020

Hi there!

As mentioned in #47, I'm using evmap in a (soon-to-be) production project, and ended up making some changes to make evmap easier for my personal use - and I figured some of them might be interesting to you as well. I am merely presenting this to you as a list of things to pick and choose from - tell me which ones you'd be interested in merging, and I'll remove the rest from this PR. Also if there are any code style/documentation/naming/other changes you'd like me to make (including splitting things up into multiple commits/PRs), just let me know!

  • Added an evmap-derive proc-macro crate for ShallowCopy
  • Removed the requirement for Send + Sync on Predicate for reject - it's not actually sent across threads, so there's no need for it from what I could see
  • Added an IntoIterator implementation for ReadGuard (allows doing map.get(&key).iter().flatten())
  • Made read() return an Option instead, which allows many of the MapReadRef methods to return non-wrapped values (the way I understood it it can't be modified or destroyed while the lock is held anyway)
  • Added single functions to the read modules - allows for simple and efficient reading of single values
  • Added contains_value convenience function to the read modules
  • Added ShallowCopy implementation for Option
  • Added Default derive for ShallowCopy
  • Added defaults for MapReadRef type parameters M and S to make it easier in use

This change is Reviewable

@jonhoo
Copy link
Owner

jonhoo commented Mar 25, 2020

Oooh, exciting! I'm a bit strapped for time, but here are some quick thoughts to get the conversation going:


Added an evmap-derive proc-macro crate for ShallowCopy

How often does this come up? My expectation was that implementations of ShallowCopy should be very rare, and that users should be extremely careful with them. I'm hesitant to hide it from the user unless this is a very common occurrence.


Removed the requirement for Send + Sync on Predicate for reject - it's not actually sent across threads, so there's no need for it from what I could see.

Removing Sync is probably fine, but removing Send means that WriteHandle is no longer Send, which would be sad. (WriteHandle is already not Sync since it contains a ReadHandle)


Added an IntoIterator implementation for ReadGuard (allows doing map.get(&key).iter().flatten())

Seems like a great idea!


Made read() return an Option instead, which allows many of the MapReadRef methods to return non-wrapped values (the way I understood it it can't be modified or destroyed while the lock is held anyway)

This seems reasonable to me. It's a breaking change, but a good one. The docs for read() should be updated to reflect what None means though.


Added single functions to the read modules - allows for simple and efficient reading of single values

I like this idea. What do you think of the name get_first? Also, docs still refer to multiple values, but I'm guessing you've just left those for later.


Added contains_value convenience function to the read modules

Makes sense to me!


Added ShallowCopy implementation for Option

Absolutely.


Added Default derive for ShallowCopy

Do you mean CopyValue? Seems fine to me.


Added defaults for MapReadRef type parameters M and S to make it easier in use

Yeah, that makes sense.


I'll also leave some notes about specifics that you didn't list above! As for commits, having distinct changes in their own commit is nice, but not necessary. It mostly makes pinpointing where bugs are introduced years from now easier :)

src/read/read_ref.rs Outdated Show resolved Hide resolved
src/read/read_ref.rs Outdated Show resolved Hide resolved
src/values.rs Outdated Show resolved Hide resolved
@code-elf
Copy link
Contributor Author

code-elf commented Mar 25, 2020

I'll fix the other things you brought up in a moment here, but quick responses to your comments:

How often does this come up? My expectation was that implementations of ShallowCopy should be very rare, and that users should be extremely careful with them. I'm hesitant to hide it from the user unless this is a very common occurrence.

It's extremely useful when you have structs as values that don't exclusively contain Copy values (String being the biggest offender). If you can see a better way of doing that, please let me know, but that's basically what I was using it for.

Do you mean CopyValue? Seems fine to me.

Yeah, that's what I meant.

@code-elf
Copy link
Contributor Author

I've added the requested changes in an extra commit for quick review - once you're happy with all of it, I'll rebase it into something a little more structured.

Added clarification to the derive macro point: I'm using multiple structs of primitives and strings, so they can't be Copy and I'd therefore have to manually write ShallowCopy impls for each of them to use them as values in evmap.

src/values.rs Outdated Show resolved Hide resolved
evmap-derive/src/lib.rs Outdated Show resolved Hide resolved
evmap-derive/src/lib.rs Outdated Show resolved Hide resolved
@jonhoo
Copy link
Owner

jonhoo commented Mar 25, 2020

If you can see a better way of doing that, please let me know, but that's basically what I was using it for.

So, in my use-cases, I "solve" this by allocating the offending value type behind a Box or Arc. But your approach is certainly less wasteful. I worry about people using this without realizing quite what they are opting into though. ShallowCopy is deeply unsafe, and users should really know what's going on to use it. Then again, maybe a derive is a step in the right direction there, because there isn't a way for them to really mess it up (same argument as #[pin_project] in a way).

I'm inclined to include evmap-derive as well. We'll want to make sure it's well document exactly what it does though, and probably also include some examples + tests.

@code-elf
Copy link
Contributor Author

maybe a derive is a step in the right direction there, because there isn't a way for them to really mess it up

That was my thought process as well - it only works on structs/enums if all of their fields already implement ShallowCopy - you can't really apply it to types that don't technically already support it.

Speaking of which, I wonder whether there's a way around having to unwrap and then re-wrap with ManuallyDrop in the macro - any ideas? Would it be possible for the ShallowCopy impls to not have to return ManuallyDrop directly (as it was before 9.0.0), but to create this wrapper in evmap's code instead?

@jonhoo
Copy link
Owner

jonhoo commented Mar 25, 2020

Since ManuallyDrop is #[repr(transparent)] I think technically you can just cast without having to do the whole unwrap-rewrap dance. I want to keep the signature the way it is to make the API as misuse-resistant as possible, even within evmap. If it actually returned Self, then you can bet at some point, someone (quite possibly me) will forget that it needs to be wrapped and carry it along and accidentally drop it.

I think I agree with you re evmap-derive. Let's keep it.

@code-elf
Copy link
Contributor Author

I made a few more changes - let me know whether the name get_one works for you or whether we should come up with something better, still (I figured if "first" could be confusing in the doc comment, it could be doubly so in the name).

One more thing: Do you think CopyValue could/should be working on Clone types rather than only Copy ones? I don't need this personally (due to the derive macro), but I also can't really see any reason why it shouldn't.

evmap-derive/src/lib.rs Outdated Show resolved Hide resolved
@jonhoo
Copy link
Owner

jonhoo commented Mar 26, 2020

So, my thinking for get_first was that it gets the "first item it sees", no matter which one it is. The implication being that it is efficient even when there are potentially many items. I'm okay with get_one too if you prefer that. Alternatively, get_any might work, although maybe any is confusing in this context?

And no, sadly, CopyValue has to be only explicitly for Copy values. If it could take Clone values, you would end up with multiple clones of the value, both of which need to be freed. ShallowCopy is specifically used in a setting where only one of the copies is freed, and so allowing T: Clone would lead to a continuous memory leak. This is the kind of stuff that makes me really paranoid about anything that makes ShallowCopy "more accessible" 😅

@code-elf
Copy link
Contributor Author

I added some tests and fixed the existing ones - let me know if there's anything else!

@code-elf
Copy link
Contributor Author

I added one more convenience function to cover a pattern I found myself using very commonly, get_owned. It returns a cloned value if it exists, and the default value otherwise, and it saves you from writing .cloned().unwrap_or_default() all the time when doing upserts.

Again, I'm not sure how specific that is to my use case, so I'll leave it up to you whether it's something you'd be interested in merging in.

evmap-derive/Cargo.toml Outdated Show resolved Hide resolved
tests/lib.rs Outdated Show resolved Hide resolved
tests/lib.rs Outdated Show resolved Hide resolved
@jonhoo
Copy link
Owner

jonhoo commented Mar 26, 2020

I think get_owned is probably overly specialized, and can be left out :)

@code-elf
Copy link
Contributor Author

I think that should cover all of your feedback so far!

@jonhoo
Copy link
Owner

jonhoo commented Mar 26, 2020

Yup, I think we're very close now! The negative tests are, I think, the only thing really missing. Plus metadata and docs for the evmap-derive sub-crate. Metadata I can take care of after merging though. We should probably also update the docs for evmap::ShallowCopy to mention evmap_derive.

As an aside, what do you think of naming the derive crate shallowcopy_derive?

@code-elf
Copy link
Contributor Author

What kind of negative tests do you mean? Isn't that what's in failing now?

In terms of docs, are you just looking for a README.md, or something more in-depth? I'm not really sure what I'd document since it's "just a derive macro"; it doesn't do anything unusual and offers no configuration - but anything specific you'd like, I can add.

I personally think evmap-derive (dash or underscore is up to you; they seem to be about equally common) is a more useful name since it's entirely tied to evmap (unless you'd like to extract ShallowCopy into its own crate as well, but I think that would be overkill), and even though there's only one for now, it's ultimately a crate for "evmap's derive macros".

src/read/read_ref.rs Outdated Show resolved Hide resolved
@jonhoo
Copy link
Owner

jonhoo commented Mar 27, 2020

@MayaWolf How about adding a remove_all, which returns a usize of how many values where removed? I think that would let you continue to use HashBag while also supporting your use-case? For iteration it's trickier since you'd have to also eliminate duplicates from the SmallVec to support that...

@code-elf
Copy link
Contributor Author

Hmm...if we could make BAG_THRESHOLD configurable, add a remove_all, and indirect access to HashBag's set_iter, that could work just as well!

I say make it configurable since I don't think it would make sense to use smallvec for anything above size 1 in that use case, since otherwise they'd just need to be temporarily hashed in the iterator anyway.

@jonhoo
Copy link
Owner

jonhoo commented Mar 27, 2020

Hmm, making BAG_THRESHOLD should be doable I think. There'd be a question around how exactly we do that, but a simple method on WriteHandle might be sufficient. Even changing it dynamically should be possible, with a note that existing values won't immediately change to match the new limit.

Forwarding access to set_iter is more complicated, since it only makes sense (I think) if BAG_THRESHOLD == 1. One way to go about it would be to add another type parameter to all the evmap types that is either Bag or Set (both are unit types, Bag is the default). set_bag_threshold would then only be available on WriteHandle<..., Bag>, and the constructor that produces WriteHandle<..., Set> always sets the threshold to 1. Then you'd have two ReadHandle::iter impls, one for ReadHandle<..., Set> and one for ReadHandle<..., Bag>. The former uses set_iter (and asserts all the counts are 1), the latter just calls iter as usual. You may also need to specialize insert for Set so that it eliminates duplicates, which may be a bit of a pain.

Not sure all of ^ would be worth it. Another option is to make the variants of enum ValuesInner public (essentially by removing the Values wrapper type), so that users can directly operate on &SmallVec and &HashBag on their own for reads? You'd still need to add remove_all, but at least it would take care of the iterator issue.

@code-elf
Copy link
Contributor Author

I'm curious what you think about this one! I went for the second option you proposed, exposing the inner value storage directly. It allows the user to specify which inner value storage they would like to use - with the built-in choices currently being HashBag, HashSet and Option.

The changeset for this is pretty large, but that's mostly because I had to change generic parameters in a lot of places.

I also removed some of the stuff I added in previous commits, as it wasn't really applicable or necessary anymore. In the case of Option, the value is flattened before being exposed to the user, so single values can be worked with conveniently.

I tested this using both cargo test and a production application, and it seems to be working well!

Let me know what you think about this and, if you like it, what you'd like me to add!

Oh, one more thing: The implementation of Values for HashBag currently doesn't pass the hasher along, as Debug is only implemented for the version with RandomState - that's probably something that would need to be corrected in that crate.

@jonhoo
Copy link
Owner

jonhoo commented Mar 30, 2020

Ah, hmm, no, that wasn't quite what I had in mind with my second suggestion. It was much simpler: just replace Values with ValuesInner, so that when the user gets a Values they are able to directly operate on the contained SmallVec or HashBag. No generics should be needed for that.

I think the proposed change overcomplicates the interface beyond what I'd like. It also makes some of the operations pretty weird in terms of semantics. push if Values is an Option is now really a replace for example. It would also mean that we lose the ability to dynamically switch between a SmallVec and a HashBag, unless we introduce yet another impl Values type that does things dynamically.

I think, for what you need, these are the modifications to make:

  • Add remove_all
  • Replace Values with ValuesInner (basically, remove the wrapper)
  • Make BAGIFY_THRESHOLD configurable.

Your code should then have all it needs to work only with values with distinct sets (I think).

@code-elf
Copy link
Contributor Author

code-elf commented Apr 2, 2020

My thoughts were that replace could be removed as a separate function, since to my knowledge it's only really useful for single-value maps - which would be more directly covered by the Option impl. Generally, single-value maps wouldn't need any special handling in the API anymore (so I actually think it would ultimately simplify it).

I did omit the impl for dynamic smallvec+hashbag since I wasn't sure how necessary it would be (the user could just choose an optimal value-bag type themselves) but it could be added very easily to use as a reasonable default.

Overall, I do think this has potential for making things easier and more versatile to work with, but ultimately it's of course entirely up to you; if you'd prefer me to revert the last part I will.

@jonhoo
Copy link
Owner

jonhoo commented Apr 2, 2020

I think we maybe talked past each other — my worry with your proposed approach is that the semantics of each Operation variant now depends on what V: Values you have. Push does not mean the same thing if V = Option<T> and V = SmallVec<T>. And, sadly, we cannot make the variants of Operation depend on the V (unless some kind of associated type trickery...). I suppose maybe with this approach we could have different methods on impl WriteHandle<..., Option<T>> and impl WriteHandle<...., SmallVec<T>> though, which might help?

I think what I'd prefer is to land this PR without the last commit, and then we open a new PR where we see if we can find a way to make that more sweeping change in isolation :)

@code-elf code-elf marked this pull request as ready for review April 2, 2020 15:27
@code-elf
Copy link
Contributor Author

code-elf commented Apr 2, 2020

Makes a lot of sense, yeah! This one should be ready for merge now; I'll open another PR once it is!

Copy link
Owner

@jonhoo jonhoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this now!

Only thing missing is to have CI also run for evmap-derive. Once crate-ci/azure-pipelines#80 lands, it should just be a matter of copying https://github.com/jonhoo/rust-evmap/blob/6238a4a6219b3546ceef5fc55996db954571321c/azure-pipelines.yml#L2-L4 and adding the dir parameter!

@jonhoo
Copy link
Owner

jonhoo commented Apr 3, 2020

Also, quick not on commit messages: it's usually best to keep the first line brief (50 chars), so that it does not get cut off in various UIs (like GitHub's). Then an empty line, and then a more detailed explanation of the change wrapped at 72 chars. There are a lot of resources on line on exact how/why (like this one), but at least keeping the first line short is nice for GitHub specifically!

@code-elf
Copy link
Contributor Author

code-elf commented May 4, 2020

Hi! Sorry this took me so long to get back to, I haven't really been able to work on Rust stuff at all for a little while.

I've been trying a few ways of getting CI to work; unfortunately I've never worked with Azure pipelines before so I'm not super familiar with how the setup works there. When I try to simply add

 - template: default.yml@templates
   parameters:
     minrust: 1.40.0
     dir: evmap-derive

below the existing lines, it complains about duplicated job names, and with the way I've set it up now, it complains about the format of default.yml - I'm really not familiar enough with Azure to fix this, and unfortunately don't have the time right now to do in-depth research about it...do you know what the quick fix would be here?

@jonhoo
Copy link
Owner

jonhoo commented May 4, 2020

No worries at all — life takes precedence :)

Ah, yes, that's a little awkward. I think I'd just bypass the template and add:

 - job: derive
   displayName: "Test evmap-derive"
   pool:
     vmImage: ubuntu-latest
   steps:
     - template: install-rust.yml@templates
     - script: cargo test
       displayName: cargo test

Probably just above the benchmark job here:
https://github.com/jonhoo/rust-evmap/blob/6e935f2f23a2a13077778e37dc71a7b0d30a7210/azure-pipelines.yml#L6

@codecov
Copy link

codecov bot commented May 4, 2020

Codecov Report

Merging #48 into master will decrease coverage by 2.01%.
The diff coverage is 52.63%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #48      +/-   ##
==========================================
- Coverage   82.70%   80.69%   -2.02%     
==========================================
  Files          11       11              
  Lines         977      984       +7     
==========================================
- Hits          808      794      -14     
- Misses        169      190      +21     
Impacted Files Coverage Δ
src/lib.rs 54.05% <ø> (ø)
src/read/guard.rs 82.60% <0.00%> (-7.87%) ⬇️
src/shallow_copy.rs 52.63% <0.00%> (-6.20%) ⬇️
src/values.rs 81.25% <0.00%> (-3.01%) ⬇️
src/write.rs 76.57% <ø> (ø)
src/read/read_ref.rs 62.96% <40.00%> (-27.04%) ⬇️
src/read/mod.rs 71.08% <42.85%> (-2.60%) ⬇️
tests/lib.rs 97.96% <100.00%> (-0.04%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6e935f2...2b9c59f. Read the comment docs.

src/shallow_copy.rs Outdated Show resolved Hide resolved
@jonhoo
Copy link
Owner

jonhoo commented May 4, 2020

This is looking great — thanks for tidying up the commits, it made reviewing much easier. I left two relatively minor comments :)

@jonhoo
Copy link
Owner

jonhoo commented May 4, 2020

It looks like the busybusybusy_fast test is now racy, and occasionally fails:

thread 'busybusybusy_fast' panicked at 'called `Result::unwrap()` on an `Err` value: Any', tests/lib.rs:284:9

The same appears to be the case for busybusybusy_heap (line 328) and busybusybusy_slow (also line 284).

I wasn't planning on raising this, but since some last changes may have to be made, I think the docs in evmap-derive also could use text wrapping. And while we're on the topic, I noticed you sometimes you single line breaks in the docs; these (I believe) do not end up making it into the final documentation. You have to use a double-line break to make a new paragraph. See for example the text you recently wrapped in shallow_copy. If you don't want a paragraph break, it's usually nicer to just not have the single line break there at all.

src/read/mod.rs Outdated Show resolved Hide resolved
@code-elf
Copy link
Contributor Author

code-elf commented May 4, 2020

Let me know whether you're happy with this!

@jonhoo jonhoo merged commit 9cdb4fd into jonhoo:master May 5, 2020
@jonhoo
Copy link
Owner

jonhoo commented May 5, 2020

This is great, thank you for sticking with it! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants