Map types representation #13512
# "%{..., :a => not_set()}"

assert map(a: integer(), b: atom()) |> to_quoted_string() ==
         "%{:a => integer(), :b => atom()}"
This test is flaky due to non-deterministic key ordering. I think we could sort the keys in to_quoted?
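A minimal sketch of the sorting idea, assuming the fields are an atom-keyed map and that each value type is already rendered as a string. `SortedFieldsSketch` and `fields_to_string/1` are hypothetical names for illustration, not the PR's `to_quoted` code:

```elixir
defmodule SortedFieldsSketch do
  # Hypothetical helper: renders atom-keyed fields in sorted key order so
  # the printed form does not depend on map iteration order.
  def fields_to_string(fields) when is_map(fields) do
    fields
    |> Enum.sort_by(fn {key, _type} -> key end)
    |> Enum.map_join(", ", fn {key, type} -> "#{inspect(key)} => #{type}" end)
  end
end
```

Sorting on the key alone keeps the output stable whatever order the map yields its pairs in.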
# dynamic() and %{..., :a => integer(), b: not_set()}
t = intersection(dynamic(), map([a: integer(), c: not_set()], :open))
typo?
Suggested change:
- # dynamic() and %{..., :a => integer(), b: not_set()}
- t = intersection(dynamic(), map([a: integer(), c: not_set()], :open))
+ # dynamic() and %{..., :a => integer(), c: not_set()}
+ t = intersection(dynamic(), map([a: integer(), c: not_set()], :open))
@@ -45,7 +55,10 @@ defmodule Module.Types.Descr do
  def integer(), do: %{bitmap: @bit_integer}
  def float(), do: %{bitmap: @bit_float}
  def fun(), do: %{bitmap: @bit_fun}
  def map(), do: %{bitmap: @bit_map}
  def map(pairs, open_or_closed), do: %{map: map_new(open_or_closed, pairs)}
Looking at the test cases, I wonder if it wouldn't be simpler to have an `open_map` and a `closed_map` constructor?
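For illustration, the two constructors could be thin wrappers around the tagged one. This is only a sketch: `MapConstructorsSketch` is a hypothetical module, and its `map/2` is a stand-in that builds the `{tag, fields, negs}` list shape quoted elsewhere in this thread rather than calling the PR's `map_new/2`:

```elixir
defmodule MapConstructorsSketch do
  # Stand-in for the PR's map/2 (assumption: one positive map, no negations).
  def map(pairs, open_or_closed) when open_or_closed in [:open, :closed] do
    %{map: [{open_or_closed, Map.new(pairs), []}]}
  end

  # The suggested constructors: the tag is fixed by the function name.
  def open_map(pairs \\ []), do: map(pairs, :open)
  def closed_map(pairs \\ []), do: map(pairs, :closed)
end
```

With these, a test could read `open_map(a: integer())` instead of `map([a: integer()], :open)`.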
optional_a_integer_closed = map([a: if_set(integer())], :closed)
assert equal?(intersection(map(a: integer()), optional_a_integer_closed), map(a: integer()))

assert empty?(intersection(map(a: integer()), map(a: atom())))
I've been playing with intersections a bit, and I think these tests are interesting too:
map([a: integer()], :open) |> intersection(map([b: integer()], :open))
# "%{..., :a => integer(), :b => integer()}"
map([a: integer()], :open) |> intersection(map([b: integer()], :closed))
# "none()"
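To make the two results above easier to follow, here is a toy model of intersecting two map literals. It is a sketch under loud assumptions, not the PR's algorithm: value types are plain `MapSet`s over a tiny made-up universe, `:not_set` marks a key that may be absent, and `MapIntersectionSketch` is a hypothetical module.

```elixir
defmodule MapIntersectionSketch do
  # Toy universe of value types (assumption); :not_set means "key may be absent".
  @universe MapSet.new([:integer, :float, :atom, :not_set])

  # A key that is not listed can hold anything (or be absent) in an open
  # map, and can only be absent in a closed map.
  defp default(:open), do: @universe
  defp default(:closed), do: MapSet.new([:not_set])

  # Intersects two {tag, fields} literals key by key; returns :none as soon
  # as some key ends up with an empty value type.
  def intersection({tag1, fields1}, {tag2, fields2}) do
    keys = Enum.uniq(Map.keys(fields1) ++ Map.keys(fields2))

    fields =
      Map.new(keys, fn key ->
        type1 = Map.get(fields1, key, default(tag1))
        type2 = Map.get(fields2, key, default(tag2))
        {key, MapSet.intersection(type1, type2)}
      end)

    if Enum.any?(fields, fn {_key, type} -> MapSet.size(type) == 0 end) do
      :none
    else
      tag = if tag1 == :open and tag2 == :open, do: :open, else: :closed
      {tag, fields}
    end
  end
end
```

In this model, open with open keeps both keys, while open with closed is empty because the closed map cannot contain the required `:a` key.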
# Union is list concatenation
defp map_union(dnf1, dnf2), do: dnf1 ++ dnf2
I wonder if there is something which can be done to prevent these lists from ballooning when calling with similar types (e.g. `union([a], [b])` where one is the subtype of the other could just be `a` or `b`).
t = map([a: integer()], :open)
union(t, t) |> union(t) |> to_quoted_string()
"%{..., :a => integer()} or %{..., :a => integer()} or %{..., :a => integer()}"
Having long lists will make all other operations expensive and might also become an issue when generating error messages.
That being said, I understand that the structure for maps is quite complex, so it might be difficult to have a nice normalization step.
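The cheapest normalization that comes to mind only handles the exact-duplicate case from the `union(t, t)` example. A sketch, assuming the list entries are plain `{tag, fields, negs}` tuples that compare structurally (`MapUnionSketch` is a hypothetical module, and this does nothing for the subtype case):

```elixir
defmodule MapUnionSketch do
  # Union is still list concatenation, but structurally identical
  # disjuncts are kept only once.
  def map_union(dnf1, dnf2), do: Enum.uniq(dnf1 ++ dnf2)
end
```

`Enum.uniq/1` keeps the first occurrence of each entry, so the order of the remaining disjuncts is unchanged.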
Sorry still thinking aloud, not sure if the suggestion would work.
Or perhaps with a slightly different representation, we could have all common keys factored out.
e.g.
union(
map([a: integer(), b: integer()], :open),
map([a: float(), c: float()], :open)
)
instead of the current
[
{:open, %{a: %{bitmap: 4}, b: %{bitmap: 4}}, []},
{:open, %{c: %{bitmap: 8}, a: %{bitmap: 8}}, []}
]
would be
{
%{a: %{bitmap: 12}}, # new common map - union of possible values for `:a`
[{:open, %{b: %{bitmap: 4}}, []}, {:open, %{c: %{bitmap: 8}}, []}] # existing list
}
The invariant being that no key can be present in all maps of the list.
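A rough sketch of how that factoring step could look, assuming a non-empty list of `{tag, fields, negs}` entries whose values are bitmap-only types as in the example. `CommonKeysSketch.factor/1` is a hypothetical name, and the sketch only mirrors the proposed shape; it does not check that the factored form denotes the same type:

```elixir
defmodule CommonKeysSketch do
  import Bitwise

  # Splits a non-empty DNF into {common, rest}: `common` holds every key
  # present in all entries, with the union (bitwise or) of its value
  # bitmaps; `rest` is the original list with those keys dropped.
  def factor(dnf) do
    common_keys =
      dnf
      |> Enum.map(fn {_tag, fields, _negs} -> MapSet.new(Map.keys(fields)) end)
      |> Enum.reduce(&MapSet.intersection/2)
      |> MapSet.to_list()

    common =
      Map.new(common_keys, fn key ->
        bitmap =
          Enum.reduce(dnf, 0, fn {_tag, fields, _negs}, acc ->
            acc ||| fields[key].bitmap
          end)

        {key, %{bitmap: bitmap}}
      end)

    rest =
      Enum.map(dnf, fn {tag, fields, negs} ->
        {tag, Map.drop(fields, common_keys), negs}
      end)

    {common, rest}
  end
end
```

Applied to the two-map example above, this yields the common `:a` entry with bitmap 12 and leaves `:b` and `:c` in their own entries.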
@sabiwara I think the biggest question is how common those are. We could try to optimize for those cases but, if they are rare, then we are just adding complexity. :(
The other point is what happens in this case:
union(
map([a: integer(), b: integer()], :open),
map([a: float(), c: float()], :open),
map([b: float(), c: float()], :open),
)
Because there is nothing shared between all three, you'd need a tree structure of the shape you provide, which adds a lot of complexity.
That said, I think we should probably apply unions when all of the keys match, but that's as far as I would push it for now (either on the union itself OR during pretty printing).
> Because there is nothing shared between all three, you'd need a tree structure of the shape you provide, which adds a lot of complexity.
Yes, in this kind of case it should be separate lists. But I'd expect many real-world cases to be lists of maps with mostly the same keys (e.g. if you have a list of Ecto structs with each key being sometimes an int/string, sometimes `nil`).
try do
  # keys that are present in the negative map, but not in the positive one
  for {neg_key, neg_type} <- neg_fields, not is_map_key(fields, neg_key) do
    cond do
      # key is required, and the positive map is closed: empty intersection
      tag == :closed and not is_optional?(neg_type) ->
        throw(:no_intersection)
I think if we are "looping" just to find if a condition is satisfied, we should consider `Enum.any?/2` rather than a `for` with `throw`, which is usually discouraged.
`Enum.any?/2` will also bail at the first truthy return, so it should be exactly what you want.
Another benefit is that otherwise the second comprehension will build a list, which is wasteful since you don't care about the list.
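Roughly what the `Enum.any?/2` version of the one branch shown in the diff could look like. Assumptions are marked in the comments: `NoIntersectionSketch`, `missing_required_key?/3` and `optional?/1` are hypothetical names, and the `not_set()` flag is taken to be `1 <<< 10`, which is only one reading of "bit 10":

```elixir
defmodule NoIntersectionSketch do
  import Bitwise

  # Assumption: not_set() is the flag 1 <<< 10 in the bitmap.
  @bit_not_set 1 <<< 10

  # Hypothetical stand-in for the PR's is_optional?/1.
  def optional?(%{bitmap: bitmap}), do: (bitmap &&& @bit_not_set) != 0

  # True when the positive map is closed and lacks a key that the negative
  # map requires. Enum.any?/2 stops at the first such key and builds no list.
  def missing_required_key?(tag, fields, neg_fields) do
    tag == :closed and
      Enum.any?(neg_fields, fn {neg_key, neg_type} ->
        not is_map_key(fields, neg_key) and not optional?(neg_type)
      end)
  end
end
```

The caller can then branch on the boolean instead of catching `:no_intersection`.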
Will send a follow-up PR for these!
I think there is a bug in this cond. The tag can be closed and the type is optional.
try do
  for {neg_key, neg_type} when not is_map_key(fields, neg_key) <- neg_fields do
Same comment about `Enum.any?/2`.
I will merge this so we can all concurrently work on improvements and optimizations.
💚 💙 💜 💛 ❤️
Introducing a representation for map types, closed or open, with required or optional atom keys.
The representation is by disjunctive normal forms, that is, lists of pairs `{positive_map, negative_maps}`. A map type is the union of each element of the list, and each element is the difference of the `positive_map` with the `negative_maps`.
For instance, `%{a: integer()} and not %{..., b: atom()}` is stored as the pair with a positive `{:closed, %{a: integer()}}` and a negative `{:open, %{b: atom()}}`.
Set-theoretic operations consist in distributing `or`, `and`, `and not` along those lists.
To handle optional fields, bit 10 of the bitmap is used to represent the `not_set()` special type. A key may be absent if and only if its value type contains `not_set()`.
TODO, after that `%{atom() => if_set(integer())}`)
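As a rough illustration of the disjunctive normal form described above, the sketch below stores each disjunct as `{tag, fields, negs}` and shows how `or` and `and not` distribute over the lists. The assumptions are loud: `MapDnfSketch` is a hypothetical module, value types are left as opaque terms, and `difference/2` only handles a right-hand side made of a single map literal with no negations.

```elixir
defmodule MapDnfSketch do
  # A map type is a list of {tag, fields, negs}: the positive map is
  # {tag, fields}, and negs is a list of negative {tag, fields} maps.
  def new(tag, pairs) when tag in [:open, :closed] do
    [{tag, Map.new(pairs), []}]
  end

  # `or`: the union of two DNFs is the concatenation of their disjuncts.
  def union(dnf1, dnf2), do: dnf1 ++ dnf2

  # `and not` with a single map literal: since (A or B) and not C equals
  # (A and not C) or (B and not C), the literal is added to the negative
  # maps of every disjunct.
  def difference(dnf, [{neg_tag, neg_fields, []}]) do
    for {tag, fields, negs} <- dnf do
      {tag, fields, [{neg_tag, neg_fields} | negs]}
    end
  end
end
```

With this shape, `%{a: integer()} and not %{..., b: atom()}` becomes a single disjunct whose positive map is closed with key `:a` and whose only negative map is open with key `:b`.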