Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seqs & iterators #22

Closed
lilactown opened this issue Aug 15, 2022 · 13 comments
Closed

Seqs & iterators #22

lilactown opened this issue Aug 15, 2022 · 13 comments

Comments

@lilactown
Copy link
Collaborator

Background

The Clojure "seq" is a key concept in its language design. It provides a common abstraction for working with all kinds of collections where you want to treat them as an ordered sequence of elements.

JS has a similar abstraction called Iterators. Like a collection in Clojure can be "seqable," a collection in JS can be "iterable." Like seqs, iterators can lazily generate each element or be a view on top of an eager collection. The Iterator & Iterable protocols in JS underpin fundamental operations like for...of, similar to how Clojure seqs underpin map, filter et al. There also exist ways to create custom Iterators in JS using generators, just like in Clojure you can construct lazy sequences via lazy-seq.

Currently, iterators do not have any operations on them except the commonly use for ... of comprehension. There is currently a stage 2 proposal for adding more helpers like .map, .filter, etc: https://github.com/tc39/proposal-iterator-helpers

The major difference between the two abstractions is that iterators are mutable. Consuming an element in the iteration by calling the .next() method mutates the iterator object in place. This makes passing a reference of an iterator to a function a somewhat dangerous thing, as you can't be sure whether or not it consumes it.

Proposal

I would propose two things. I am still thinking this through, so feedback welcome:

  1. core seq ops like map, filter, etc. should accept anything that is Iterable and return a concrete array type
  2. transducers can work with raw iterators, avoiding the overhead of converting to an array each stage

Rationale

Seqs work well in Clojure because collections are immutable, so an immutable view on top of them does not need to worry about the underlying collection changing. In JS we do not have this guarantee, and an immutable view on top of a mutable collection can lead to surprising behavior. Clava also has a general philosophy of "use the platform" rather than bringing over Clojure abstractions. All of this is why I think basing our sequential operators on Iterators makes more sense than porting seqs.

In the above section, I proposed that we take anything that is Iterable but always return an array in map, filter, etc. The alternative would be to return an Iterator directly, similar to the TC39 proposal I linked in the background. However, I think this would lead to a lot of confusion when writing application code. Take the following example:

(defn my-todo-app
  (let [[todos set-todos] (React/useState [{:name "Task 1" :completed false} {:name "Task 2" :completed false}])
        todos-with-id (map-indexed #(assoc %2 :id %1) todos)
        todo-count (count todos-with-id)]
      [:div
       (for [todo todos-with-id]
         [:div {:key (:id todo)} (:name todo)])]))

If map-indexed returns an Iterator, this code is broken: the (count todos-with-id) will consume the iterator, which will lead to nothing being in the Iterator when React tries to render the divs. Developers will have to be careful in their application code to only use the result of a map, map-indexed or filter once. I think that this is an unreasonable thing to force developers to keep in mind, which is why I think that we should do these operations eagerly and return an array.

For transducers, however, we can own the Iterator as long as it's in the transducer pipeline. So code like:

(into [] (comp (map inc) (filter even?)) (range 10))

(range 10) could return an Iterable (e.g. an array). The transducer pipeline can pass an iterator to each stage, avoiding the cost of creating an array each time, before finally adding it all to the array passed to into.

Gotta stop here. Will add more thoughts later. Questions and comments welcome!!

@lilactown
Copy link
Collaborator Author

I'm not sure yet how this relates to things like first, rest and cons. first could take an iterable, create an iterator and return the first, throwing away the iterator. rest could copy map and convert the iterable to an array before returning. 🤔

@lilactown
Copy link
Collaborator Author

It looks like destructuring uses the iterable protocol, so we can easily do

export function first(coll) {
  // destructuring uses iterable protocol
  let [first] = coll;
  return first;
}

export function rest(coll) {
  let [_, ...rest] = coll;
  return rest;
}

and get the behavior we want, fast.

This was referenced Aug 15, 2022
@lilactown
Copy link
Collaborator Author

A downside I've found in implementing & using this in tests is that objects (as created via {:a 1}) does not implement the Iterable protocol. So you cannot pass an object created via {:a 1} as a collection to map, reduce, etc.

You can get access to an iterator via functions like Object.keys, Object.values and Object.entries.

A thought I have: provide a helper function iterable which coerces the value passed to it as something iterable, very similar to seq. For arrays, sets, maps and anything else that already implements the Iterable method, it would pass through unchanged. If the value is an object, it will return Object.entries(o). This can be used internally in reduce, map, etc. to allow them to be used on objects without having to extend the iteratable protocol to them.

Thoughts?

@borkdude
Copy link
Member

An internal function is always ok. Not sure if we should expose this. Perhaps we could expose it as seq, which would return nil if the iterator was empty, like in Clojure...?

@borkdude
Copy link
Member

There's also a -iterator function in CLJS (which is a protocol method really) whose name we could re-use...

@lilactown
Copy link
Collaborator Author

lilactown commented Aug 16, 2022

Writing down some notes for posterity:

  • Iterator: an object with the structure {next, done, value}. next() when called progresses the iterator, value is a property that contains the current value, and done is a property that signals the iterator is exhausted.
  • Iterable: an object that has the Symbol.iterator method on it, which can be called to create a new Iterator object

The thing that makes this tricky is that the for...of comprehension and JS destructuring expect an Iterable, not an Iterator object. So working with a raw Iterator object is also not a complete replacement of ISeq.

Perhaps seq would make sense if it took an Iterable, checked to see if it's empty, and returned nil if so. It could also support nil punning. I'm starting to like that idea @borkdude.

@lilactown
Copy link
Collaborator Author

Hmmm but null is not iterable, so passing nil or an empty collection to any seq functions will throw in that case.

@borkdude
Copy link
Member

Unless those explicitly support nil?

@lilactown lilactown mentioned this issue Aug 17, 2022
Merged
2 tasks
@lilactown
Copy link
Collaborator Author

lilactown commented Aug 17, 2022

@borkdude In the PR I just opened, I am trying the following:

  • seq is the Clojure-like: takes a collection and returns an Iterable of that collection, or nil if it's empty
  • iterable takes a collection and returns an Iterable of that collection, even if it's empty
  • seqable? can be used to check if you can call either one

first, rest, map, reduce et al. use iterable to avoid having to do explicit nil checks, which they would need to do if they called seq.

@borkdude
Copy link
Member

@lilactown So far this makes a ton of sense to me :)

Perhaps add a bit to the README about differences to ClojureScript, basically what you said in your last comment

@borkdude
Copy link
Member

Alright, I think we could close this now, probably, since we pretty much locked down the seq + iterator behavior, unless you have something to add @lilactown ?

@lilactown
Copy link
Collaborator Author

I think the only thing left is to add the rest of the sequence functions, which I can create more issues for

@brandonstubbs
Copy link
Contributor

brandonstubbs commented Aug 18, 2022

@lilactown I am catching up on all of this, just leaving this here for thought Reading this and the transducers issue it's exactly your thoughts, let me know if you want any help with anything

function _first([x]) {
  return x;
}

function _toArray(iterable) {
  const array = [];
  for (const x of iterable) {
    array.push(x);
  }
  return array;
}

function* _map(iterable, f) {
  for (const x of iterable) {
    yield f(x); 
  }
}

function* _filter(iterable, pred) {
  for (const x of iterable) {
    if (pred(x)) {
      yield x;
    }
  }
}

function map(f, coll) {
  return _toArray(_map(coll, f));
}

function filter(pred, coll) {
  return _toArray(_filter(coll, pred))
}

console.log(map((v) => v + 1,new Array(1, 2)))
console.log(filter((v) => v === 2, new Array(1, 2)))

function transformer(iterable) {
  return {
    first: () => _first(iterable),
    map: f => transformer(_map(iterable, f)),
    filter: pred => transformer(_filter(iterable, pred)),
    toArray: () => _toArray(iterable),
    [Symbol.iterator]: () => iterable[Symbol.iterator](),
  }
}

console.log(
  transformer(new Array(1, 2))
    .map(v => v + 1)
    .filter(v => v == 2)
    .first()
);

console.log(
  transformer(new Array(1, 2))
    .map(v => v + 1)
    .filter(v => v == 2)
    .toArray()
);

borkdude added a commit that referenced this issue Dec 24, 2022
borkdude added a commit that referenced this issue Dec 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants