Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Map types representation #13512

Merged
merged 5 commits into from Apr 29, 2024
Merged

Map types representation #13512

merged 5 commits into from Apr 29, 2024

Conversation

gldubc
Copy link
Member

@gldubc gldubc commented Apr 24, 2024

Introducing a representation for map types, closed or open, with required or optional atom keys.

The representation is by disjunctive normal forms, that is, lists of pairs {positive_map, negative_maps}. A map type is the union of each element of the list, and each element is the difference of the positive_map with the negative_maps.

For instance, %{a: integer()} and not %{..., b: atom()} is stored as the pair with a positive {:closed, %{a: integer()}} and a negative {:open, %{b: atom()}}.

Set-theoretic operations constitute in distributing or, and, and not along those lists.

To handle optional fields, bit 10 of the bitmap is used to represent the not_set() special type. A key may be absent if and only if its value type contains not_set().

TODO, after that

  • add domain types (%{atom() => if_set(integer())})
  • if needed, stronger normalization for pretty-printing

# "%{..., :a => not_set()}"

assert map(a: integer(), b: atom()) |> to_quoted_string() ==
"%{:a => integer(), :b => atom()}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is flaky due to non-deterministic key ordering, I think we could sort them in to_quoted?

Comment on lines +318 to +319
# dynamic() and %{..., :a => integer(), b: not_set()}
t = intersection(dynamic(), map([a: integer(), c: not_set()], :open))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo?

Suggested change
# dynamic() and %{..., :a => integer(), b: not_set()}
t = intersection(dynamic(), map([a: integer(), c: not_set()], :open))
# dynamic() and %{..., :a => integer(), c: not_set()}
t = intersection(dynamic(), map([a: integer(), c: not_set()], :open))

@@ -45,7 +55,10 @@ defmodule Module.Types.Descr do
def integer(), do: %{bitmap: @bit_integer}
def float(), do: %{bitmap: @bit_float}
def fun(), do: %{bitmap: @bit_fun}
def map(), do: %{bitmap: @bit_map}
def map(pairs, open_or_closed), do: %{map: map_new(open_or_closed, pairs)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the test cases, I wonder if it wouldn't be simpler to have an open_map and a closed_map constructor?

optional_a_integer_closed = map([a: if_set(integer())], :closed)
assert equal?(intersection(map(a: integer()), optional_a_integer_closed), map(a: integer()))

assert empty?(intersection(map(a: integer()), map(a: atom())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been playing with intersections a bit, I think these tests are interesting too:

map([a: integer()], :open) |> intersection(map([b: integer()], :open))
# "%{..., :a => integer(), :b => integer()}"
map([a: integer()], :open) |> intersection(map([b: integer()], :closed))
# "none()"

Comment on lines +632 to +633
# Union is list concatenation
defp map_union(dnf1, dnf2), do: dnf1 ++ dnf2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is something which can be done to prevent these lists from ballooning when calling with similar types (e.g. union([a], [b]) where one is the subtype of the other could just be a or b).

t = map([a: integer()], :open)
union(t, t) |> union(t) |> to_quoted_string()
"%{..., :a => integer()} or %{..., :a => integer()} or %{..., :a => integer()}"

Having long lists will make all other operations expensive and might also become an issue when generating error messages.
That being said, I understand that the structure for maps is quite complex so might be difficult to have a nice normalization step.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry still thinking aloud, not sure if the suggestion would work.

Or perhaps with a slightly different representation, we could have all common keys factored out.

e.g.

union(
  map([a: integer(), b: integer()], :open),
  map([a: float(), c: float()], :open)
)

instead of the current

[
  {:open, %{a: %{bitmap: 4}, b: %{bitmap: 4}}, []},
  {:open, %{c: %{bitmap: 8}, a: %{bitmap: 8}}, []}
]

would be

{
  %{a: %{bitmap: 12},  # new common map - union of possible values for `:a`
  [{:open, %{b: %{bitmap: 4}}, []},  {:open, %{c: %{bitmap: 8}}, []}]  # existing list
}

The invariant being that no key can be present in all maps of the list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sabiwara I think the biggest question is how common those are. We could try to optimize for those cases but, if they are rare, then we are just adding complexity. :(

The other point is that, what happens in this case:

union(
  map([a: integer(), b: integer()], :open),
  map([a: float(), c: float()], :open),
  map([b: float(), c: float()], :open),
)

Because there is nothing shared between all three, you'd need a tree structure of the shape you provide, which adds a lot of complexity.

That said, I think we should probably apply unions when all of the keys match, but that's how far I would push it for now (either on the union itself OR during pretty printing).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there is nothing shared between all three, you'd need a tree structure of the shape you provide, which adds a lot of complexity.

Yes, in this kind of case it should be separate lists. But I'd expect many real world case to be lists of maps with mostly the same keys (e.g. if you have list of ecto structs with each key being sometimes an int/string, sometimes nil).

Comment on lines +718 to +724
try do
# keys that are present in the negative map, but not in the positive one
for {neg_key, neg_type} <- neg_fields, not is_map_key(fields, neg_key) do
cond do
# key is required, and the positive map is closed: empty intersection
tag == :closed and not is_optional?(neg_type) ->
throw(:no_intersection)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we are "looping" just to find if a condition is satisfied, we should consider Enum.any?/2 rather than a for with throw which is usually discouraged.
Enum.any will also bail at the first truthy return so should be exactly what you want.

Another benefit is that otherwise your second list will build a list which is wasteful since you don't care about the list.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will send a follow-up PR for these!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a bug in this cond. The tag can be closed and the type is optional.

Comment on lines +829 to +830
try do
for {neg_key, neg_type} when not is_map_key(fields, neg_key) <- neg_fields do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about Enum.any

@josevalim
Copy link
Member

I will merge this so we can all concurrently work on improvements and optimizations.

@josevalim josevalim merged commit 5911a98 into elixir-lang:main Apr 29, 2024
8 of 9 checks passed
@josevalim josevalim deleted the map-no-bdd branch April 29, 2024 11:37
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants