Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

New dict module #250

Merged
merged 30 commits into from

3 participants

@alco
Owner

Here's what's cooking for the new dict protocol.

If you try to run tests, you'll see how they fail because of issue #247.

I'm also missing the ability to write docstrings for functions in a protocol declaration.


Next, I'm going to add Orddict implementation and support for Enum.

@alco alco commented on the diff
src/elixir_compiler.erl
@@ -312,6 +312,7 @@ core_list() ->
[
"lib/uri/parser.ex",
"lib/elixir/formatter.ex",
+ "lib/dict/gen_dict.ex",
@alco Owner
alco added a note

Without this line, the compiler tries to compile dict.ex first and gets screwed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@alco
Owner

Added Orddict. This is not a final version, obviously we have duplication is some places. Here are a few highlights:

  • all extend functions are exactly the same in Dict's and Orddict's implementations
  • all new functions, except for new/0, are also exactly the same because they are implemented in terms of extend
  • the tests for both modules differ only in the way dicts are constructed.

I'm not sure, how to solve the issue with extend duplication. Should I implement it for Any? But then I'll have to extract these functions into a separate protocol. I think the problem here is that Dict is implemented in terms of a record, and Orddict is based on a list. What if I implemented extend for Record? Would it then work for Dict.Record automatically?

As for the tests, I should probably define a macro that will accept functions for dict construction and produce a new module with tests. Any other suggestions?

lib/dict/dict.ex
((104 lines not shown))
+ """
+ def new(pairs) when is_list(pairs) do
+ GenDict.extend new(), pairs
+ end
+
+ @doc """
+ Creates a new Dict from a list of elements with the
+ help of the transformation function.
+
+ ## Examples
+
+ Dict.new ["a", "b"], fn(x) -> {x, x} end
+ #=> ["a": "a", "b": "b"]
+ """
+ def new(list, transform) when is_list(list) and is_function(transform) do
+ GenDict.extend new(), list, transform
@josevalim Owner

If you can, please refer to the protocol implementation directly, for example: GenDict.Dict.extend. This allows us to skip the protocol lookup and will be faster. :)

@alco Owner
alco added a note

This is one issue I was asking about. What I'd like to do is implement extend once, although I don't know exactly how. You may say that extend is one level of abstraction higher than other functions in GenDict protocol, because it is implemented in terms of one of those function (GenDict.put) and thus it can have one single implementation for all dict types.

@josevalim Owner

Yes, we should fix this when we also allow you to add doc strings to protocol functions.

@josevalim Owner

In any case, GenDict.extend was only an example, GenDict.put could also be written as GenDict.Dict.Record.put.

@alco Owner
alco added a note

What am I missing? https://gist.github.com/2380007

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@josevalim
Owner

1) I don't like the name GenDict much. In Erlang, Gen* are usually related to OTP behaviors and for now I would like to keep them as such. I would prefer if we named this module simply Dict (less to type) and named the other modules based on the implementation. For example, the current dict in Erlang could be named HashedDict, we would also have OrdDict and so on.

2) I see you want a way to document protocol functions. Could you please open up a new issue so we could discuss it there? :)

lib/dict/dict.ex
((71 lines not shown))
+
+ def update(dict, key, fun) do
+ dict.update_d(:dict.update key, fun, &1)
+ end
+
+ def update(dict, key, initial, fun) do
+ dict.update_d(:dict.update key, fun, initial, &1)
+ end
+end
+
+defmodule Dict do
+ @doc """
+ Creates a new empty Dict.
+ """
+ def new do
+ Dict.Record.new [d: :dict.new]
@josevalim Owner

One option is, instead of using the record, we could simply change the first element of the tuple returned by :dict.new to be Dict (or Dict.Record). This way we avoid creating a new unnecessary tuple (although creating a new tuple may be even faster than changing an existing one). Another option is to simply set a global reference, like this:

refer Erlang.dict, as: HashedDict

This way we wouldn't need to do any conversion. :)

@alco Owner
alco added a note

I imagine this would work thanks to the fact that Erlang's dict is implemented as a tagged tuple meaning that it looks like a record to Elixir? If we can rely on this, then I'll get rid of the redundant Dict.Record.

But for Orddict the implementation should still be defined as for: List, right?

@josevalim Owner

Exactly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@alco
Owner

1) I don't like the name GenDict much. In Erlang, Gen* are usually related to OTP behaviors and for now I would like to keep them as such. I would prefer if we named this module simply Dict (less to type) and named the other modules based on the implementation. For example, the current dict in Erlang could be named HashedDict, we would also have OrdDict and so on.

Erlang's documentation on dict does not explicitly say that it uses a hash, though I don't like GenDict either :) We could name the protocol Map or PDict or IDict. What do you think about the latter two, i.e. indicating the fact that X is a protocol by prepending it with P (for "protocol") or I (for "interface")?

@alco
Owner

I agree that we can dump extend, it's not essential.

On a side node, does Elixir support inheritance? If you look at how new functions are implemented, you'll notice that they are all implemented in terms of new/0. The logical thing to do is extract these definitions so that they don't repeat in different implementation. It is not possible to do so with protocols, because these new functions lack the dict argument to dispatch on. This is an obvious use-case for inheritance.

@josevalim
Owner

@alco the @overridable mechanism gives you something that feels like inheritance. Elixir takes the same approach as other functional languages which end up favoring delegation instead of inheritance. So I would suggest something along the same lines.

@alco
Owner

Sure, I can implement these function once in some other module, but I would still need to write delegation boilerplate once for each dict type. Inheritance allows me to write the boilerplate once ever.

defmodule Dict.Common do
  def new(dict_type, pairs) do
    # some logic that involves calling dict_type.new()
  end
end

defmodule Dict do
  def new do
    :dict.new
  end

  def new(pairs) do
    Dict.Common.new __MODULE__, pairs
  end
end

defmodule Orddict do
  def new do
    []
  end

  # This is _redundant_
  def new(pairs) do
    Dict.Common.new __MODULE__, pairs
  end
end
@josevalim
Owner
@alco
Owner

I think I've figured out how to do it with __using__. Sorry for my moaning.

I don't know why the tests that check for exception fail. It seems like the correct exception is triggered but it's not caught by assert_raises.

1) test_new_two_lists (DictTest)
  ** (ArgumentError) Both arguments must have equal size
  stacktrace:
    /Users/alco/Documents/git/elixir/test/elixir/dict_test.exs:133: DictTest.test_new_two_lists/0
    /Users/alco/Documents/git/elixir/lib/ex_unit/runner.ex:78: ExUnit.Runner.run_test/3
    /Users/alco/Documents/git/elixir/lib/enum.ex:631: Enum.do_each/3
    /Users/alco/Documents/git/elixir/lib/enum.ex:153: Enum.each/3
    /Users/alco/Documents/git/elixir/lib/enum.ex:148: Enum.each/2
    /Users/alco/Documents/git/elixir/lib/ex_unit/runner.ex:69: ExUnit.Runner.run_tests/2

2) test_new_two_lists (OrddictTest)
  ** (ArgumentError) Both arguments must have equal size
  stacktrace:
    /Users/alco/Documents/git/elixir/test/elixir/dict_test.exs:142: OrddictTest.test_new_two_lists/0
    /Users/alco/Documents/git/elixir/lib/ex_unit/runner.ex:78: ExUnit.Runner.run_test/3
    /Users/alco/Documents/git/elixir/lib/enum.ex:631: Enum.do_each/3
    /Users/alco/Documents/git/elixir/lib/enum.ex:153: Enum.each/3
    /Users/alco/Documents/git/elixir/lib/enum.ex:148: Enum.each/2
    /Users/alco/Documents/git/elixir/lib/ex_unit/runner.ex:69: ExUnit.Runner.run_tests/2
@josevalim
Owner

You need to pass a function to assert_raises:

assert_raises ArgumentError, fn ->
  1 + "foo"
end

:)

@alco
Owner

Added support for dicts to Enum. For those functions that make sense only for ordered collections, like split, take, drop, an error should be raised. I'm thinking to do this by using another protocol. For instance,

defprotocol Collection, [is_ordered(col)], only: [List, Record]

# default implementation
defimpl Collection, for: Record
  def is_ordered(_) do
    raise RuntimeError, message: "This collection is not ordered"
  end
end

@josevalim, what do you think?

@josevalim
Owner

I think we could use the same protocol. In this sense, we could add an ordered_iterator() function and mark it as optional, so not all protocols need to implement it. Currently we can't mark a protocol function as optional, so I will add it to the list of features we should add to protocols in short term.

@alco
Owner

Good idea. We don't need an optional function in this case. Optional functions only make sense in those cases when we determine at runtime whether they should be called.

I think it's mostly finished. One thing that still bothers me is this --> https://gist.github.com/2380007. I don't know why this doesn't work, so I have written implementations for Tuple (example) temporarily so that tests do pass.

Also Keyword is now obsolete. The literal syntax [a: 1, b: 2] produces an orddict with atoms for keys, but now we can also create a orddict with arbitrary keys in multiple ways:

Orddict.new {key, value}
Orddict.new [{key1, value}, {key2, value}]
Orddict.new [a, b, c], fn(x) -> {x, x} end
Orddict.new [key1, key2, key3], [value, value, value]
@josevalim
Owner

Good idea. We don't need an optional function in this case. Optional functions only make sense in those cases when we determine at runtime whether they should be called.

OK.

I think it's mostly finished. One thing that still bothers me is this --> https://gist.github.com/2380007. I don't know why this doesn't work, so I have written implementations for Tuple (example) temporarily so that tests do pass.

I understand why they don't pass, I will fix them later.

Also Keyword is now obsolete. The literal syntax [a: 1, b: 2] produces an orddict with atoms for keys, but now we can also create a orddict with arbitrary keys in multiple ways:

How? The problem in treating orddicts as lists is ambiguity in the access protocol. How was this problem solved?

@alco
Owner

The problem in treating orddicts as lists is ambiguity in the access protocol. How was this problem solved?

You have solved it :)

@josevalim
Owner

Ah, this is temporary. For instance, in the future we can have tuple access to slice a list or other features and we will have to keep them two different things anyway.

@alco
Owner

@josevalim I'm still against doing index-based access and slicing operations on lists with shortcut syntax. It all boils down to Erlang not having random-access lists -- we won't be able to provide an efficient implementation for list goodies such as slicing.

To me, being able to get a dict element with dict[{1, 3}] is much more important than slicing a list in linear time using list[{1, 3}]. But since Dict does not interfere with List, this is only an issue with Orddict and so it's up to you to decide its fate. Notice however that I have enabled the arbitrary term access for List, so I'll have to revert it.

@josevalim
Owner

But if we can have both, why don't we? All we need to do is to wrap the Orddict in a tuple. Also, the access protocol is just an example, we cannot ensure that List and Orddict will behave the same for all protocols, so this problem my reappear in another form later on.

@alco
Owner

Ok, I'll wrap Orddict in a tuple.

@alco
Owner

Update.

I'd like to replace these two implementations with one

defimpl Enum.Iterator, for: Record

However, when I do this, I get

** (UndefinedFunctionError) undefined function: Enum.Iterator.Any.iterator/1
@josevalim
Owner
@alco
Owner

OK.

What do you think Enum.join for dicts should look like? I can't think of it being used for data types other than lists, so it should be List.join in my opinion.

Python, for instance, joins the keys. But in Elixir we can leave it up to the user

d = Orddict.new [a: 1, b: 2, c: 3]

Enum.join Dict.keys(d), ","
#=> "a,b,c"

Enum.join Dict.values(d), ","
#=> "1,2,3"
@josevalim
Owner

Well, it doesn't work for dicts but it may work for arrays, sets and other data types. I think we should leave it as is (i.e. it will join the tuples for now) and we can come up with a better solution later.

@alco
Owner

(i.e. it will join the tuples for now)

It won't. to_binary is not implemented for tuples.

@josevalim
Owner

Hrm, right. In a way, it makes sense since we can't join any list, we need to ensure its elements are joinable. I guess we could make it an optional part of the protocol later.

@alco
Owner

In a way, it makes sense since we can't join any list, we need to ensure its elements are joinable.

We already ensure this using to_binary since it belongs to a protocol, namely Binary.Chars. And the doc string is pretty explicit about it:

 All items in the collection must be convertible to binary, otherwise an error is raised.

Give a moment to fix the test for Enum.join.

@alco
Owner

Is there anything else I should fix for these modules?

@josevalim
Owner
@travisbot

This pull request passes (merged d1e8aba3 into 2851da4).

@alco
Owner

Thanks, travis! I've rebased on master and fixed the compiler warnings in dict_test.exs

@travisbot

This pull request passes (merged 7daf5807 into e612fb7).

@travisbot

This pull request passes (merged e219104c into e612fb7).

lib/enum.ex
((8 lines not shown))
- defp iterate([h|t]) do
- { h, t }
+ def to_list(list), do: list
+
+ def iterate({ [h|t], ctor }) do
+ { h, {t, ctor} }
+ end
+
+ def iterate({ [], ctor }) do
@josevalim Owner

Couldn't we make this simpler? I don't like much that we need to pass the constructor around. Even more, I dislike we need to pass the constructor around for every iteration, when we probably don't need them.

I believe a better way to solve this problem is to have a way to retrieve the protocol and its collections. So we could do something like:

Enum.Iterator.__apply__(:iterator, [list])

That would return both the result of the invocation and the implementation module it was invoked on, allowing us to access the constructor straight from the source if we need to.

Wdyt?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
lib/enum.ex
((52 lines not shown))
+ import Enum.Iterator.List, only: [iterate: 1]
+
+ def iterator(dict) do
+ { iterate(&1), iterate({ to_list(dict), fn(pairs, do: extend(Dict.empty(dict), pairs)) }) }
+ end
+
+ def to_list(dict), do: Dict.to_list(dict)
+
+ defp extend(dict, pairs) do
+ List.foldl pairs, dict, fn(pair, dict) ->
+ Dict.put dict, pair
+ end
+ end
+end
+
+defimpl Enum.OrdIterator, for: HashDict.Record do
@josevalim Owner

Can we remove this protocol and instead simply have a Enum.Iterator altogether?

@alco Owner
alco added a note

There is a bug in this code, I should delete the defimpl Enum.OrdIterator, for: HashDict.Record.

The idea is as follows. When the user passes a dict to one of the functions that require an ordered iterator, they'll get the error ** (UndefinedFunctionError) undefined function: Enum.OrdIterator.Any.ordered_iterator/1. If it makes sense for their application, they could define an implementation of Enum.OrdIterator for the dict. And besides, it is only used in a few functions, so it should not be obligatory for users to implement ordered_iterator for their types if they don't want to. This is why I keep this function in a protocol of its own.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@josevalim
Owner

I think we are close to get this merged. I have added some concerns regarding the constructor in Enum and right now I will start working on the new protocols, so we can get this all tidy up and merged.

@josevalim
Owner

Ok, I have pushed a function that returns you the implementation for a given protocol and argument:

Enum.Iterator.__impl_for__!([1,2,3])
#=> Enum.Iterator.List

This may help you do a couple things so you don't need the constructor around. Another option is to simply pass the original data structure around.

It is important to notice that the Enum module has a bunch of functions that receives an iterator and a pointer as arguments and I would like to say that you are free to change/remove these functions in any way. If not in this pull request, I am going to mark them as private next because I think it is to soon to expose such functionality and given that we are increasing our dependency on the protocol, exposing such functions becomes harder and harder.

@josevalim
Owner

Ok, the new protocol style is in. Could you please rebase this pull request? I should merge it today or tomorrow. :)

@alco
Owner

Ok, I'll review the code when I have time. Should be ready by tomorrow.

@alco
Owner

I don't see how __impl_for__ solves the issue with the need to pass the constructor around. Erlang's dict module does not offer any traversal mechanism. I can either get all keys or convert it to a list.

So by the time a do_... function in Enum is invoked, there is nothing left from the original dict except for the constructor function. I could replace the constructor with the original dict, but it would be the same -- we still need to operate on the list of {key, value} tuples and merge them into an empty dict at the end.

alco added some commits
@alco alco New GenDict protocol 6a35725
@alco alco New Dict module wrapping Erlang.dict f84dd2e
@alco alco Add tests for GenDict.merge 8cf4997
@alco alco Implement GenDict.extend and GenDict.update e7d0344
@alco alco Implement Dict.from_dict 136a2f5
@alco alco Use capitalized name for Dict instances 3bb3e51
@alco alco Implement Orddict 9be17e3
@alco alco Remove duplication and cleanup the code 8b9413b
@alco alco Add a small optimization in `put` invocations ca34986
@alco alco Clean up the code a bit more 71d0c78
@alco alco Add tests for size, update and empty 4cce433
@alco alco Additional tests for Dict.new and Orddict.new b8545c5
@alco alco Fix assert_raises usage 7b78e19
@alco alco Implement arbitrary term access for List
This also makes the Keyword module obsolete since we can now use
Orddict.
53da7c6
@alco alco Document the usage of tricky ref d783b65
@alco alco Implement PDict.to_list
This is going to be useful in Enum
ebbb1a3
@alco alco Generalize the Enum module to support dicts d79ba92
@alco alco Add `ordered_iterator` function
Some of the Enum's functions assume an ordering is defined for the collection.
Since ordering is not defined for all types, using `ordered_iterator`
allows us to raise a runtime error for those types.
2c4f17f
@alco alco Review the docs for Enum functions bf28eb0
@alco alco Rename Dict -> HashDict and encapsulate it in a record 5e92c8e
@alco alco Fix issues with Enum protocol for dicts ad8113a
@alco alco Revert "Implement arbitrary term access for List"
This reverts commit d859057ae93558bdd4a31e31c53bc359c5d41a44.

Discussion:
elixir-lang#250 (comment)
5b2ae4a
@alco alco Keep Orddict.values in order 51d1f12
@alco alco Test that Enum.join does not work for dicts 25f5fa6
@alco alco Fix compiler warnings with default argument a90a654
@alco alco Use the new defprotocol for Dict ff4c159
@alco alco assert_raises -> assert_raise 327c0a4
@alco alco Reimplement the Enum iterator 707d2c8
@alco alco Comment out one assert in enum_test 82b81c6
@alco alco Remove the paragraph about custom iterator e261aab
@travisbot

This pull request fails (merged e261aab into d997b24).

@alco
Owner

I've redone the iterator implementation.

What confuses me is the behavior of assert_raise. When I try to use one of the functions that work with an ordered iterator on a hash dict, the following exception is raised:

** (UndefinedFunctionError) undefined function: Enum.OrdIterator.Any.ordered_iterator/1

However, all of the assert_raise calls in enum_test.exs expect ArgumentError and yet they all catch the UndefinedFunctionError. I've commented out one assert statement, so you can run the tests and see it for yourself.

@alco
Owner

Oh, Travis has already tested this for you :)

@josevalim
Owner

@alco seems to be a bug in rescue, working on it

@alco alco referenced this pull request
Merged

A newly updated dict module #267

@josevalim josevalim merged commit e261aab into elixir-lang:master
@alco alco deleted the alco:new-dict-module branch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Apr 26, 2012
  1. @alco

    New GenDict protocol

    alco authored
  2. @alco
  3. @alco

    Add tests for GenDict.merge

    alco authored
  4. @alco
  5. @alco

    Implement Dict.from_dict

    alco authored
  6. @alco
  7. @alco

    Implement Orddict

    alco authored
  8. @alco
  9. @alco
  10. @alco

    Clean up the code a bit more

    alco authored
  11. @alco
  12. @alco
  13. @alco

    Fix assert_raises usage

    alco authored
  14. @alco

    Implement arbitrary term access for List

    alco authored
    This also makes the Keyword module obsolete since we can now use
    Orddict.
  15. @alco

    Document the usage of tricky ref

    alco authored
  16. @alco

    Implement PDict.to_list

    alco authored
    This is going to be useful in Enum
  17. @alco
  18. @alco

    Add `ordered_iterator` function

    alco authored
    Some of the Enum's functions assume an ordering is defined for the collection.
    Since ordering is not defined for all types, using `ordered_iterator`
    allows us to raise a runtime error for those types.
  19. @alco
  20. @alco
  21. @alco
  22. @alco

    Revert "Implement arbitrary term access for List"

    alco authored
    This reverts commit d859057ae93558bdd4a31e31c53bc359c5d41a44.
    
    Discussion:
    elixir-lang#250 (comment)
  23. @alco

    Keep Orddict.values in order

    alco authored
  24. @alco
  25. @alco
  26. @alco

    Use the new defprotocol for Dict

    alco authored
  27. @alco

    assert_raises -> assert_raise

    alco authored
  28. @alco

    Reimplement the Enum iterator

    alco authored
  29. @alco
  30. @alco
Something went wrong with that request. Please try again.