Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Registry.to_list/1 #8963

Merged
merged 11 commits into from Apr 22, 2019

Conversation

@joladev
Copy link
Contributor

commented Apr 18, 2019

I've implemented a bare minimum of the feature discussed in the mailing list. I didn't focus too much on the tests or documentation since it's still being discussed.

As for a match version, I'm unsure of how to handle the match specs. The format of the data that you'd want to match on is, in the table, {key, {pid, value}} which we might not want to leak to the outside. In the case of count_match it's simpler, it only allows matching on the value and the provided match spec is wrapped to match the table.

Because of the uncertainty around the shape of the match specs I skipped remapping the data from {key, {pid, value}}, but a "flat" tuple like {key, pid, value} is probably preferable.

I'd like to try out a stream version, since it's similarly simple but avoids the list concatenations in the case of partitions. Maybe even implementing the to_list version with a stream under the hood would be preferable?

@josevalim

This comment has been minimized.

Copy link
Member

commented Apr 18, 2019

This looks great so far!

For the match spec, look at the match/4 function. The idea is that users of the registry can only match on the value and not on the key nor pid. So to_list should be very similar to match, what changes is the return type. :)

Regarding the stream, it is completely up to you. It is probably straight-forward to implement it as: Stream.flat_map(1..n_partitions, fn -> safe_to_list(partition) end) or something similar. But I wouldn't implement to_list on top of stream because then you pay the Stream.flat_map penalty when you know you want a list at the end. Maybe it is a matter of using Stream.flat_map in one and Enum.flat_map in the other?

@michalmuskala

This comment has been minimized.

Copy link
Member

commented Apr 18, 2019

For me matching on the key is actually one of the more interesting parts of that proposal - for example registering processes under unique names {:group, id} and matching to retrieve all {:group, _} processes. Right now this use case involves usually maintaining two registries - one for unique registrations and one for the group, which is kind of wasteful.

One issue I see with the stream, is that it might obtain views of different partitions at very different time-points, so it might be problematic because of that. The eager implementation also has differences, but they would be relatively small. Maybe a better solution would be to stream using:

Stream.concat(Enum.map(1..n_partitions, &safe_to_list(&1)))
@josevalim

This comment has been minimized.

Copy link
Member

commented Apr 18, 2019

Excellent point @michalmuskala. Although if we are doing that, I am not quite sure how to do it without exposing the whole registry implementation. Maybe we can force the match_spec to be a three element tuple and we will move the pieces internally accordingly?

@michalmuskala

This comment has been minimized.

Copy link
Member

commented Apr 18, 2019

One possible API could be documenting providing match specs for {key, pid, value} tuples and rewrite the spec before passing on to :ets.select/2:

spec = 
  for {{key, pid, value}, guards, result} <- spec, 
    do: {{key, {pid, value}}, guards, result}
@josevalim

This comment has been minimized.

Copy link
Member

commented Apr 18, 2019

@joladev

This comment has been minimized.

Copy link
Contributor Author

commented Apr 19, 2019

Added to_list_match/2 and to_list_match/3. I realized that to_list could be implemented trivially using to_list_match so I did that. It means avoiding a lot of code duplication, but I'm not sure if :ets.select is significantly slower in this case? In that case, I'll separate the implementations again.

I tried to figure out how to remap {key, {pid, value}} to {key, pid, value} by using the last tuple item of the spec, but it's not immediately obvious to me how to do that without "naming" the parts in the match pattern. eg something like [{:"$1", {:"$2", :"$3"}}, [], [{:"$1", :"$2", :"$3"}]] works, but we want to support match patterns that don't name those parts.

I've got a question on the safe_ versions of the ETS functions that many, but not all, functions in this module have. What problem is it intended to solve? If I did one for :ets.select that returns empty list on error it would hide any malformed match patterns or guards to the end user, making it harder to debug (not that the ETS error message is very helpful...). keys/2 does do this though:

    keys =
      try do
        spec = [{{pid, :"$1", :"$2"}, [], [{{:"$1", :"$2"}}]}]
        :ets.select(pid_ets, spec)
      catch
        :error, :badarg -> []
      end

so I guess to_list and to_list_match should be consistent and do the same?

@josevalim

This comment has been minimized.

Copy link
Member

commented Apr 19, 2019

I'm not sure if :ets.select is significantly slower in this case?

Me neither, we would have to benchmark. :)

I tried to figure out how to remap {key, {pid, value}} to {key, pid, value} by using the last tuple item of the spec, but it's not immediately obvious to me how to do that without "naming" the parts in the match pattern

This indeed a problem. Maybe we should go with @michalmuskala's approach and have those functions accept full matchspecs instead of parts of it.

I've got a question on the safe_ versions of the ETS functions that many, but not all, functions in this module have.

The safe is only really necessary for safe_lookup_second, as that operation can fail if the key does not exist. safe_size doesn't need to be wrapper in safe afaik nor would the match spec related ones.

@joladev

This comment has been minimized.

Copy link
Contributor Author

commented Apr 19, 2019

Me neither, we would have to benchmark. :)

Benchmark at: https://gist.github.com/joladev/34f85961849f78e5de39ad615d93cad4. As suspected it's not very fast for huge registries, but it's honestly not that slow really (<450ms for 1M items). I can push the repo with the benchmark if you guys want to play around with it. I was also surprised to see that using :ets.select to implement to_list was actually faster than :ets.tab2list.

This indeed a problem. Maybe we should go with @michalmuskala's approach and have those functions accept full matchspecs instead of parts of it.

I'm not super strongly against it but it's a shame to break with the convention of the module. Also, figuring out match specs is not easy, so the more we expose the harder it is to use. The benchmark shows letting ETS decide the output format is a slight performance improvement (but honestly not that much). I'm still leaning towards Enum.map(&to_entry/1). It's better UX and consistent with the module.

@michalmuskala

This comment has been minimized.

Copy link
Member

commented Apr 19, 2019

Allowing user to pass in the match spec result might be beneficial if the user wants to further process the data - it can save on one map operation inside the to_list function and then one more when the user starts processing it - so it might be significantly more performant. One common use case I can think of is when you actually want to only retrieve the pids - this can save up on a lot of copying from the ets.

Additionally match specs allow passing in multiple terms in the list - similar to a function having multiple heads. The currently implemented API removes this possibility - this is also a deficiency of all the other functions in the module.

@josevalim

This comment has been minimized.

Copy link
Member

commented Apr 20, 2019

@michalmuskala I agree with those points but at the same time we can't deny that the matchspecs API is hard to use. On the other hand, we can generally assume the other higher level APIs in this module will suffice for the majority of use cases, which means it is OK to have a low-level function.

With that said, my proposal is to go with what you originally proposed: let's receive all matchspecs and let's make sure the match part is three element tuple. We just need to discuss the names (once again, sorry) because the _match prefix is no longer equivalent to other functions. Two options are:

  • call it select, mimicking ets, and we document which matchspec to use if you want to simply get all entries
  • call it to_list and to_list_matchspec

If the lower level API is not enough, we can add more higher level APIs later. My other thought is to wait on the stream API for a bit, since using :ets.foldl may actually be the best way to stream data. So let's focus on this first problem later and we can talk about the next one later if necessary.

Thoughts?

@michalmuskala

This comment has been minimized.

Copy link
Member

commented Apr 20, 2019

Yeah, I don't expect those APIs to be used frequently, so I'd say it's not that bad that they are somewhat more complex to use - but I'm usually writing ETS match specs by hand, so I might be biased in that area 😅. I like the name select.

@joladev

This comment has been minimized.

Copy link
Contributor Author

commented Apr 20, 2019

I like select too. I'll update the PR to expose only that function.

@josevalim just to make sure I understood you correctly, we're skipping any stream implementations for now?

@josevalim

This comment has been minimized.

Copy link
Member

commented Apr 20, 2019

@joladev

This comment has been minimized.

Copy link
Contributor Author

commented Apr 20, 2019

Replaced to_list and to_list_match with select, updated tests. I'm holding off on improving the documentation until we're convinced this is the final approach.

The only downside I see now is that a spec like [{:_, [], [:"$_"]}] dumps the table in its original shape, ie {key, {pid, value}}. But I guess we can live with that.

@doc since: "1.x.x"
@spec select(registry, spec) :: [term]
def select(registry, spec) do
spec = to_internal_spec(spec)

This comment has been minimized.

Copy link
@josevalim

josevalim Apr 20, 2019

Member

Since this can be a list of specs, we need to do something like:

for {match, guards, return} <- specs do
  case match  do
    {_, _, _} -> match
    _ -> raise "..."
  end
end

We should probably validate that return does not contain "$_" too, for the reasons you mentioned.

This comment has been minimized.

Copy link
@michalmuskala

michalmuskala Apr 20, 2019

Member

I'm afraid this validacion can become very complicated because of the match spec format itself and then there are guards as well. I don’t think it will be practical to do this.

This comment has been minimized.

Copy link
@josevalim

josevalim Apr 20, 2019

Member

You mean the validation as a whole or the validation of the return part?

This comment has been minimized.

Copy link
@michalmuskala

michalmuskala Apr 20, 2019

Member

Of the return part - in theory it can contain arbitrarily nested lists, tuples, maps, call various ets-safe functions and probably a lot more. I don't believe we'd be able to get the validation right without some major effort.

@josevalim

This comment has been minimized.

Copy link
Member

commented Apr 20, 2019

@evadne

This comment has been minimized.

Copy link

commented Apr 20, 2019

FWIW, $_ is already flagged in match/4 and count_match/4 documentation.

Avoid usage of special match variables :"$_" and :"$$", because it might not work as expected.

@joladev

This comment has been minimized.

Copy link
Contributor Author

commented Apr 21, 2019

Now thinking about documentation, some of the descriptions of match patterns is duplicated between match and count_match. Do we want to add that to select as well, or is it enough to refer to eg match for more information? Maybe we should add a link to the erlang match specification documentation http://erlang.org/doc/apps/erts/match_spec.html

for part <- spec do
case part do
{{key, pid, value}, guards, select} -> {{key, {pid, value}}, guards, select}
part -> part

This comment has been minimized.

Copy link
@josevalim

josevalim Apr 21, 2019

Member

We should raise on this branch. :)

This comment has been minimized.

Copy link
@joladev

joladev Apr 21, 2019

Author Contributor

Hmm, originally I wanted to support :_ which :ets.select does but considering we're recommending users not to use :"$_" in the body part it doesn't necessarily make sense to allow :_ in the match part. I can switch it to raise an argumenterror. We could have a branch with a guard to check that the first part of the tuple is an atom and raise in all other cases, but again, it might not make sense to allow :_

@josevalim

This comment has been minimized.

Copy link
Member

commented Apr 21, 2019

Do we want to add that to select as well, or is it enough to refer to eg match for more information?

I would duplicate the docs for two reasons:

  1. matchspecs are already complex so if we don't hvae to send the user aaround, it will be better

  2. the matchspecs for select are in a different format than the other ones, so we need to explain the whole list and the tuple properly :)

Thanks for all the work so far! ❤️

@joladev

This comment has been minimized.

Copy link
Contributor Author

commented Apr 21, 2019

I'm enjoying it, thank you for all the guidance!

I've added guards for the arguments of select in the style of the other functions in this module. Added ArgumentError on match specs that don't have the shape we want.

Made a first attempt on the documentation, reusing some text from match with some modifications, and added a paragraph on the body part. It's a challenge to describe how match specs work without getting too verbose or getting stuck on details. The Erlang match specification page is long for a reason!

josevalim added some commits Apr 22, 2019

@ericmj

ericmj approved these changes Apr 22, 2019

@ericmj ericmj merged commit e261b42 into elixir-lang:master Apr 22, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@ericmj

This comment has been minimized.

Copy link
Member

commented Apr 22, 2019

Thank you! 💜

@joladev

This comment has been minimized.

Copy link
Contributor Author

commented Apr 22, 2019

Thank you all! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
5 participants
You can’t perform that action at this time.