Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of memory vars in map/set literals are subject to Clojure semantics (order not preserved except for small values) #97

Closed
yuhan0 opened this issue Jan 3, 2020 · 5 comments

Comments

@yuhan0
Copy link
Contributor

yuhan0 commented Jan 3, 2020

The interaction between Meander's "left-to-right, top-to-bottom" evaluation strategy and the unordered nature of Clojure's map and set literals could cause the following type of edge case:

(m/rewrite [1 2 3 4]
  [!n ...]
  {1 !n
   2 !n
   3 !n
   4 !n})
;; => {1 1, 2 2, 3 3, 4 4}

This appears to work fine until more than 8 elements are introduced, causing Clojure to switch from an ordered ArrayMap to unordered HashMap implementation.

(m/rewrite [1 2 3 4 5 6 7 8 9]
  [!n ...]
  {1 !n
   2 !n
   3 !n
   4 !n
   5 !n
   6 !n
   7 !n
   8 !n
   9 !n})
;; => {7 2, 1 6, 4 3, 6 9, 3 5, 2 1, 9 7, 5 8, 8 4}

@noprompt and @jimmyhmiller suggested using & syntax on RHS to guarantee evaluation order:

(m/rewrite [1 2 3 4 5 6 7 8 9]
  [!n ...]
  {& [[1 !n]
      [2 !n]
      [3 !n]
      [4 !n]
      [5 !n]
      [6 !n]
      [7 !n]
      [8 !n]
      [9 !n]]})

And similarly m/and on the LHS:
{1 !n ,,, 9!n} -> (m/and {1 !n} ,,, {9 !n})

I suggest introducing a check during compilation that would detect if the same memory-var is being referenced across more than one branch of a map or set literal, and then flag it as a warning/error due to the undefined semantics. (hopefully this does not prove too restrictive to cases where one does not care about order)

@noprompt
Copy link
Owner

noprompt commented Jan 3, 2020

I suggest introducing a check during compilation that would detect if the same memory-var is being referenced across more than one branch of a map or set literal, and then flag it as a warning/error due to the undefined semantics.

I disagree then semantics are undefined in such cases, however, I would agree they are nondeterministic and hear your concern. It is well known that map literals are read by the reader as either PeristentArrayMaps or PersistentHashMaps and, with respect to the latter, Clojure makes no promise about key ordering. Due to this property (which sets also possess), Meander does not make any promises about how a map pattern is matched either; it simply follows the order provided by PersistentHashMap.

I agree with your concern that a warning/error might "prove too restrictive to cases where one does not care about order" and further add that I think it would be difficult to determine the set of patterns involving memory variables and PersistentHashMaps that should produce a warning. It could take a while to figure that out and I think "it has memory variables" is too sensitive a trigger.

Going back to ordering problem, the common prescription for constructing a large, insertion ordered map is to use the array-map function e.g.

(apply array-map '(k l m n o p q r s t a b c d e f g h i j ))
;; => {k l, m n, o p, q r, s t, a b, c d, e f, g h, i j}
(class *1)
;; => clojure.lang.PersistentArrayMap

Perhaps we could do something similar and thus could compromise in the following ways.

  1. Introduce the meander.epsilon/array-map pattern constructor which creates a map pattern using clojure.core/array-map function.
  2. Ensure that map patterns backed by PersistentArrayMaps are matched in insertion order.
  3. Recommend in the documentation that either & or m/array-map be used when dealing with large map patterns and concern for ordering.

@noprompt noprompt changed the title Multiple uses of memory vars in map/set literals causes undefined behavior Multiple uses of memory vars in map/set literals causes nondeterministic behavior Jan 3, 2020
@yuhan0
Copy link
Contributor Author

yuhan0 commented Jan 3, 2020

Oh yes, "nondeterministic" is a much better way of putting what I had in mind, thanks for clarifying that :)

I think "it has memory variables" is too sensitive a trigger.

That's definitely true, my first thought of a heuristic was something like the following

(when (map? pattern)
  (let [mvar-sets
        (for [[k v] pattern
              :when (not= '& k)]
          (set (concat (extract-mvars k) (extract-mvars v))))]
    (assert (pairwise-disjoint? mvar-sets))))

The other options also sound good, I admit it's not much of an issue to someone familiar with Clojure's reader, and I only came across it while thinking about the implications of memory var semantics. So maybe introducing a new operator just to support this edge-case isn't necessary?

@timothypratley timothypratley changed the title Multiple uses of memory vars in map/set literals causes nondeterministic behavior Use of memory vars in map/set literals are subject to Clojure semantics (order not preserved except for small values) Jan 4, 2020
@timothypratley
Copy link
Collaborator

Given nothing can be done about the Clojure reader, disallowing memory variables in maps seems reasonable to me.

It seems to me that whenever you encounter an "non-entity map" in meander you are almost always better off treating it as a sequence of key-value pairs. This is achieved with m/seqable on the matching pattern and various tricks on the construction side. We would benefit from a symmetric way to work with key-value pairs directly.

Match only maps to m/kvs, and construct m/kvs to maps when substituting. Is this possible? Then I can write things like:

(m/rewrite
  {1 2 3 4}
  (m/kvs ([!k !v] ...))
  (m/kvs ([!v !k] ...)))

Alternatively, is it possible to make the {& ([k v])} syntax symmetrical?

(m/rewrite {1 2 3 4}
           {& ([!k !v] ...)}
           {& ([!v !k] ...)})

This has the advantage of visually queuing me in that its a map, but right now it doesn't work (not sure why... should it work??)

@noprompt
Copy link
Owner

noprompt commented Jan 4, 2020

disallowing memory variables in maps seems reasonable to me.

There are plenty of cases where memory variables in map patterns are specifically useful, convenient, and deterministic. Eliminating them because they're unpredictable in certain situations is not a solution.

Alternatively, is it possible to make the {& ([k v])} syntax symmetrical?

I believe you're asking if we could make match support that syntax and I think, yes, we could.

Another way to get the symmetry you crave with the map-of syntax is by implementing the substitution portion as follows.

(defsyntax map-of [k-pattern v-pattern]
  (cond
    (match-syntax? &env)
    `(with [%map# {~k-pattern ~v-pattern & (or '{} %map#)}]
       %map#)

    (subst-syntax? &env)
    `(with [%map# {~k-pattern ~v-pattern & %map#}]
       %map#)

    :else
    &form))

Using your example

(m/rewrite {:foo "bar", :baz "quux"}
  (m/map-of !k !v)
  (m/map-of !v !k))
;; =>
{"quux" :baz, "bar" :foo}

One thing to note about the map-of implementation is the with for match will ensure that search finds all the possible answers. A map pattern such as

{& [[!k !v] ...]

will not. I think this is also fine, however, I do want to make sure that detail is flagged.

@timothypratley
Copy link
Collaborator

Based on conversations in other topics I just want to regroup on where your thinking is at on this specific subject as there were several proposals which I think at this point can be thinned out.

  1. No, match cannot ever look like {& ([!k !v] ...)} for reasons defined in Does & imply seq? #101

1.a. Match does however already support {& {?x ?y}} and (m/and {} (m/seqable [!k !v] ...)) and {& (m/seqable [!k !v] ...)} which seem to cover all use cases imaginable so this is fine.

  1. You proposed using array-map workarounds; I don't see how these help in any way given the "correct" workaround is to match with existing constructs (1.a) and to substitute with {& [...]} -- can this idea be ruled out to avoid introducing duplicative options, or does it allow something I am missing?

  2. Usage that falls outside of these approaches:
    It seems rare enough to just wait and see? If people really run into this they can reopen this issue. Also, there is an immediate way to avoid it, so not a huge inconvenience even if people do run into it.

@noprompt noprompt closed this as completed Jul 1, 2021
Repository owner locked and limited conversation to collaborators Jul 1, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Projects
None yet
Development

No branches or pull requests

3 participants