Map types representation #13512

gldubc · 2024-04-24T11:40:45Z

Introducing a representation for map types, closed or open, with required or optional atom keys.

The representation uses disjunctive normal forms, that is, lists of pairs {positive_map, negative_maps}. A map type is the union of the elements of the list, and each element is the difference between its positive_map and its negative_maps.

For instance, %{a: integer()} and not %{..., b: atom()} is stored as the pair with a positive {:closed, %{a: integer()}} and a negative {:open, %{b: atom()}}.
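To make the pair layout concrete, here is a minimal sketch of how membership in such a list could be decided. The module name DnfSketch, its function names, and the use of bare atoms such as :integer for field types are assumptions for illustration only; the PR stores full descr values, and this sketch treats every listed field as required.

```elixir
defmodule DnfSketch do
  # Hypothetical model: a DNF is a list of {positive, negatives} pairs and
  # each literal is {:open | :closed, fields}. A "value" is described by a
  # map from key to the type tag of what it holds, e.g. %{a: :integer}.

  # %{a: integer()} and not %{..., b: atom()}
  def example do
    [{{:closed, %{a: :integer}}, [{:open, %{b: :atom}}]}]
  end

  # A value belongs to a literal when every listed field is present with the
  # right tag and, for closed literals, no other key is present.
  def in_literal?(value, {tag, fields}) do
    fields_ok = Enum.all?(fields, fn {key, type} -> Map.get(value, key) == type end)
    extras_ok = tag == :open or Enum.all?(Map.keys(value), &is_map_key(fields, &1))
    fields_ok and extras_ok
  end

  # Union over the list; each element is its positive minus all its negatives.
  def member?(value, dnf) do
    Enum.any?(dnf, fn {positive, negatives} ->
      in_literal?(value, positive) and
        not Enum.any?(negatives, &in_literal?(value, &1))
    end)
  end
end
```

With an open positive literal, the negative part is what excludes values that also carry b: atom().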

Set-theoretic operations consist in distributing or, and, and not over those lists.
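A rough sketch of that distribution on a generic DNF, not the PR's code. It assumes each element is {positives, negatives} with both sides kept as lists of opaque literals (the PR instead merges positives into a single map); DnfOps and holds?/2 are hypothetical names used only to check the result.

```elixir
defmodule DnfOps do
  # or: the union of two DNFs is list concatenation
  def union(dnf1, dnf2), do: dnf1 ++ dnf2

  # and: distribute over both lists, pairing every element of one DNF with
  # every element of the other
  def intersection(dnf1, dnf2) do
    for {pos1, neg1} <- dnf1, {pos2, neg2} <- dnf2 do
      {pos1 ++ pos2, neg1 ++ neg2}
    end
  end

  # not: by De Morgan, the negation of one element is the union of its
  # flipped literals; the negation of the whole list intersects those unions
  def negation(dnf) do
    Enum.reduce(dnf, [{[], []}], fn {pos, neg}, acc ->
      flipped =
        Enum.map(pos, fn lit -> {[], [lit]} end) ++
          Enum.map(neg, fn lit -> {[lit], []} end)

      intersection(acc, flipped)
    end)
  end

  # Evaluates a DNF given the list of literals that hold (for checking only)
  def holds?(dnf, true_literals) do
    Enum.any?(dnf, fn {pos, neg} ->
      Enum.all?(pos, &(&1 in true_literals)) and
        not Enum.any?(neg, &(&1 in true_literals))
    end)
  end
end
```

Note that intersection and negation multiply list lengths, which is why the size of these lists matters for every later operation.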

To handle optional fields, bit 10 of the bitmap is used to represent the not_set() special type. A key may be absent if and only if its value type contains not_set().
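A small sketch of the optional-field encoding, restricted to bitmap-only types. The module name OptionalSketch and optional?/1 are hypothetical, and reading "bit 10" as the value 1 <<< 10 is an assumption; the integer bit matches the %{bitmap: 4} value that appears later in this thread.

```elixir
defmodule OptionalSketch do
  import Bitwise

  @bit_integer 1 <<< 2
  # Assumption: "bit 10" is taken to mean the mask 1 <<< 10
  @bit_not_set 1 <<< 10

  def integer, do: %{bitmap: @bit_integer}
  def not_set, do: %{bitmap: @bit_not_set}

  # An optional field type is the field type with the not_set() bit added
  def if_set(%{bitmap: bits}), do: %{bitmap: bits ||| @bit_not_set}

  # A key may be absent exactly when its type contains not_set()
  def optional?(%{bitmap: bits}), do: (bits &&& @bit_not_set) != 0
end
```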

TODO, after that

add domain types (%{atom() => if_set(integer())})
if needed, stronger normalization for pretty-printing

lib/elixir/lib/module/types/descr.ex

sabiwara · 2024-04-28T07:58:50Z

lib/elixir/test/elixir/module/types/descr_test.exs

+      #          "%{..., :a => not_set()}"
+
+      assert map(a: integer(), b: atom()) |> to_quoted_string() ==
+               "%{:a => integer(), :b => atom()}"


This test is flaky due to non-deterministic key ordering. I think we could sort the keys in to_quoted?
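For illustration, sorting the fields before rendering could look like the sketch here. SortedFields and fields_to_string/1 are hypothetical names, and field types are plain strings rather than descr values; the actual to_quoted code is not shown in this thread.

```elixir
defmodule SortedFields do
  # Renders map fields in key order so the output does not depend on the
  # iteration order of the underlying map.
  def fields_to_string(fields) when is_map(fields) do
    fields
    |> Enum.sort_by(fn {key, _type} -> key end)
    |> Enum.map_join(", ", fn {key, type} -> "#{inspect(key)} => #{type}" end)
  end
end
```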

sabiwara · 2024-04-28T08:01:03Z

lib/elixir/test/elixir/module/types/descr_test.exs

+      # dynamic() and %{..., :a => integer(), b: not_set()}
+      t = intersection(dynamic(), map([a: integer(), c: not_set()], :open))


typo?

Suggested change

-      # dynamic() and %{..., :a => integer(), b: not_set()}
+      # dynamic() and %{..., :a => integer(), c: not_set()}
       t = intersection(dynamic(), map([a: integer(), c: not_set()], :open))

sabiwara · 2024-04-28T08:06:48Z

lib/elixir/lib/module/types/descr.ex

@@ -45,7 +55,10 @@ defmodule Module.Types.Descr do
  def integer(), do: %{bitmap: @bit_integer}
  def float(), do: %{bitmap: @bit_float}
  def fun(), do: %{bitmap: @bit_fun}
-  def map(), do: %{bitmap: @bit_map}
+  def map(pairs, open_or_closed), do: %{map: map_new(open_or_closed, pairs)}


Looking at the test cases, I wonder if it wouldn't be simpler to have an open_map and a closed_map constructor?
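Something along these lines, as a sketch only. The map/2 here is a stand-in that returns a bare {tag, fields} tuple instead of calling the PR's map_new/2, and open_map/closed_map are the hypothetical constructor names from the comment above.

```elixir
defmodule MapConstructors do
  # Stand-in for the PR's map/2; the real one wraps map_new/2 in a descr
  def map(pairs, tag) when tag in [:open, :closed], do: {tag, Map.new(pairs)}

  # Thin wrappers so call sites do not need to pass the tag
  def open_map(pairs \\ []), do: map(pairs, :open)
  def closed_map(pairs \\ []), do: map(pairs, :closed)
end
```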

sabiwara · 2024-04-29T07:11:42Z

lib/elixir/test/elixir/module/types/descr_test.exs

+      optional_a_integer_closed = map([a: if_set(integer())], :closed)
+      assert equal?(intersection(map(a: integer()), optional_a_integer_closed), map(a: integer()))
+
+      assert empty?(intersection(map(a: integer()), map(a: atom())))


I've been playing with intersections a bit, and I think these tests are interesting too:

map([a: integer()], :open) |> intersection(map([b: integer()], :open)) # "%{..., :a => integer(), :b => integer()}"

map([a: integer()], :open) |> intersection(map([b: integer()], :closed)) # "none()"
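One way to see why these two results differ, as a simplified model rather than the PR's implementation: treat each field type as a set of tags, let a key missing from an open map default to "anything, including not_set", and let a key missing from a closed map default to not_set only. LiteralIntersection, the tag universe, and the :empty return value are all assumptions made for this sketch.

```elixir
defmodule LiteralIntersection do
  # Field types are MapSets of tags; :not_set marks "key may be absent"
  def intersect({tag1, fields1}, {tag2, fields2}) do
    keys = Enum.uniq(Map.keys(fields1) ++ Map.keys(fields2))

    fields =
      Map.new(keys, fn key ->
        type1 = fetch(fields1, tag1, key)
        type2 = fetch(fields2, tag2, key)
        {key, MapSet.intersection(type1, type2)}
      end)

    # A key that can be neither present nor absent makes the whole map empty
    if Enum.any?(fields, fn {_key, type} -> MapSet.size(type) == 0 end) do
      :empty
    else
      tag = if tag1 == :open and tag2 == :open, do: :open, else: :closed
      {tag, fields}
    end
  end

  defp fetch(fields, tag, key) do
    case fields do
      %{^key => type} -> type
      _ when tag == :open -> universe()
      _ -> MapSet.new([:not_set])
    end
  end

  # Hypothetical, deliberately tiny universe of tags
  defp universe, do: MapSet.new([:integer, :float, :atom, :not_set])
end
```

In the open/closed case, :a is required by the left map but must be absent in the right one, so its type becomes empty and the result is none().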

sabiwara · 2024-04-29T07:19:52Z

lib/elixir/lib/module/types/descr.ex

+  # Union is list concatenation
+  defp map_union(dnf1, dnf2), do: dnf1 ++ dnf2


I wonder if there is something which can be done to prevent these lists from ballooning when calling with similar types (e.g. union([a], [b]) where one is the subtype of the other could just be a or b).

t = map([a: integer()], :open)
union(t, t) |> union(t) |> to_quoted_string()
# "%{..., :a => integer()} or %{..., :a => integer()} or %{..., :a => integer()}"

Having long lists will make all other operations expensive and might also become an issue when generating error messages.
That being said, I understand that the structure for maps is quite complex so might be difficult to have a nice normalization step.
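The cheapest mitigation I can think of, as a sketch and not a real normalization: drop structurally identical elements when concatenating. This assumes elements are plain terms (the triples shown further down in this thread) and it only catches exact duplicates, not one element being a subtype of another. UnionDedup is a hypothetical module name.

```elixir
defmodule UnionDedup do
  # Union is still list concatenation, minus exact structural duplicates
  def map_union(dnf1, dnf2), do: Enum.uniq(dnf1 ++ dnf2)
end
```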

Sorry still thinking aloud, not sure if the suggestion would work.

Or perhaps with a slightly different representation, we could have all common keys factored out.

e.g.

union(
  map([a: integer(), b: integer()], :open),
  map([a: float(), c: float()], :open)
)

instead of the current

[
  {:open, %{a: %{bitmap: 4}, b: %{bitmap: 4}}, []},
  {:open, %{c: %{bitmap: 8}, a: %{bitmap: 8}}, []}
]

would be

{
  # new common map - union of possible values for `:a`
  %{a: %{bitmap: 12}},
  # existing list
  [{:open, %{b: %{bitmap: 4}}, []}, {:open, %{c: %{bitmap: 8}}, []}]
}

The invariant being that no key can be present in all maps of the list.

@sabiwara I think the biggest question is how common those are. We could try to optimize for those cases but, if they are rare, then we are just adding complexity. :(

The other point is that, what happens in this case:

union(
  map([a: integer(), b: integer()], :open),
  map([a: float(), c: float()], :open),
  map([b: float(), c: float()], :open)
)

Because there is nothing shared between all three, you'd need a tree structure of the shape you provide, which adds a lot of complexity.

That said, I think we should probably apply unions when all of the keys match, but that's how far I would push it for now (either on the union itself OR during pretty printing).

> Because there is nothing shared between all three, you'd need a tree structure of the shape you provide, which adds a lot of complexity.

Yes, in this kind of case it should be separate lists. But I'd expect many real-world cases to be lists of maps with mostly the same keys (e.g. if you have a list of Ecto structs with each key being sometimes an int/string, sometimes nil).

sabiwara · 2024-04-29T07:26:52Z

lib/elixir/lib/module/types/descr.ex

+    try do
+      # keys that are present in the negative map, but not in the positive one
+      for {neg_key, neg_type} <- neg_fields, not is_map_key(fields, neg_key) do
+        cond do
+          # key is required, and the positive map is closed: empty intersection
+          tag == :closed and not is_optional?(neg_type) ->
+            throw(:no_intersection)


I think if we are "looping" just to find out whether a condition is satisfied, we should consider Enum.any?/2 rather than a for with throw, which is usually discouraged.
Enum.any?/2 also bails out at the first truthy return, so it should be exactly what you want.

Another benefit is that the for otherwise builds a list, which is wasteful since you don't care about the result.
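Roughly what I have in mind, covering only the cond branch shown in the diff above. This is a sketch under simplifying assumptions: field types are lists of tags with :not_set marking an optional field, and AnySketch, no_intersection?/3 and optional?/1 are hypothetical names, not the PR's functions.

```elixir
defmodule AnySketch do
  # True when the positive map is closed and the negative map requires a key
  # the positive map does not have. Enum.any?/2 stops at the first match and
  # builds no intermediate list, so no throw/catch is needed.
  def no_intersection?(tag, fields, neg_fields) do
    tag == :closed and
      Enum.any?(neg_fields, fn {neg_key, neg_type} ->
        not is_map_key(fields, neg_key) and not optional?(neg_type)
      end)
  end

  # Simplified stand-in for the PR's optionality check
  defp optional?(type), do: :not_set in type
end
```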

Will send a follow-up PR for these!

I think there is a bug in this cond. The tag can be closed while the type is optional.

sabiwara · 2024-04-29T07:29:03Z

lib/elixir/lib/module/types/descr.ex

+    try do
+      for {neg_key, neg_type} when not is_map_key(fields, neg_key) <- neg_fields do


Same comment about Enum.any

josevalim · 2024-04-29T11:33:17Z

I will merge this so we can all concurrently work on improvements and optimizations.

josevalim · 2024-04-29T11:37:44Z

💚 💙 💜 💛 ❤️

Static map types without bdds

ecbbd20

lukaszsamson reviewed Apr 24, 2024

Simplify split on key

50bc68a

josevalim reviewed Apr 24, 2024

josevalim reviewed Apr 24, 2024

josevalim reviewed Apr 24, 2024

josevalim reviewed Apr 24, 2024

josevalim reviewed Apr 24, 2024

gldubc added 3 commits April 25, 2024 14:26

Rewrite map intersection, remove is_opt

b2fc865

Use triples in DNFs

fadee01

Add coverage examples + doc

04e4fab

sabiwara reviewed Apr 28, 2024

sabiwara reviewed Apr 29, 2024

josevalim merged commit 5911a98 into elixir-lang:main Apr 29, 2024
8 of 9 checks passed

josevalim deleted the map-no-bdd branch April 29, 2024 11:37

hhjmmmmmmmm approved these changes May 3, 2024

sabiwara mentioned this pull request May 14, 2024

Refactor boolean logic to use Enum.all?/2 instead of throwing #13560

Merged
