Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Object but for sequences #147

Closed
SergioBenitez opened this issue Nov 15, 2022 · 6 comments
Closed

Object but for sequences #147

SergioBenitez opened this issue Nov 15, 2022 · 6 comments

Comments

@SergioBenitez
Copy link
Contributor

SergioBenitez commented Nov 15, 2022

Using Object cut down the running time of my application by 25% by allowing me to remove clones in favor of using proxy objects. From profiling, I can see that a similar possibility for optimization exists for sequences.

At present, if I have an existing sequence of non-minijinja-values, I have no recourse but to clone the entire sequence, converting each value into a native minijinja::Value in the process:

array.iter()
    .map(value_to_minijinja_value)
    .collect()

This is expensive, especially when the sequence is large. If instead there was a trait like Object for sequences, I could wrap the array in a proxy object like I do maps and save the cost:

Value::from_sequence_object(ProxySequence(array))

One option is to introduce another trait and ValueRepr kind, say Sequence, which looks a bit like Object:

pub trait Sequence: Display + Debug + Any + Sync + Send {
    fn get(&self, index: usize) -> Option<Value> { ... }

    /// This is a range so that slices are cheaply representable.
    fn range(&self) -> Range<usize> { ... }
}

Alternatively, Object can be generalized to be useful for sequence-like objects as well. One approach for this option is to add the same get() and range() methods above to Object. Another is to make the existing get_attr() and attributes() methods more versatile, allowing the former to take a usize and the latter to emit a range().

@mitsuhiko
Copy link
Owner

My preference would be for Object to gain the necessary APIs to allow it to represent a sequence. Some potential options:

pub trait Object: fmt::Display + fmt::Debug + Any + Sync + Send {
    // looks up an attribute by name
    fn get_attr(&self, name: &str) -> Option<Value>;
    // looks up an item by index
    fn get_by_index(&self, idx: usize) -> Option<Value>;
    // iterates over the attributes of the object
    fn attributes(&self) -> Box<dyn Iterator<Item = &str> + '_>;
    // returns the number of items in the object.  Must be implemented
    // for sequences, is entirely optional for mappings
    fn len(&self) -> Option<usize>;
    // what type of object is this?  Valid return values are `Seq` and `Map`
    fn kind(&self) -> ValueKind;

    fn call_method(&self, state: &State, name: &str, args: &[Value]) -> Result<Value, Error>;
    fn call(&self, state: &State, args: &[Value]) -> Result<Value, Error>;
}

I'm not sure if range() is necessary. If you want to enable cheap slicing that could be an operation internal to the object unless you want things like [:5] to have first class cheap slicing support.

@SergioBenitez
Copy link
Contributor Author

My preference would be for Object to gain the necessary APIs to allow it to represent a sequence.

I think this is feasible, but I would advocate against an API that dynamically asserts that an Object is a Seq xor a Map simply because a value might wish to be both. Alas, I suppose this is truly a question of semantics. Is expr.a the same as expr[a]? If so, then only difference between expr.foo and expr.0 is that foo is a string and 0 is an integer. Could an API like the following work?

enum Key<'a> {
    Attr(&'a str),
    Index(usize),
}

pub trait Object: fmt::Display + fmt::Debug + Any + Sync + Send {
    // looks up an attribute by name or index
    fn get(&self, key: Key<'_>) -> Option<Value>;

    // iterates over the attributes of the object
    fn keys(&self) -> Box<dyn Iterator<Item = Key<'_>> + '_>;

    fn call_method(&self, state: &State, name: &str, args: &[Value]) -> Result<Value, Error>;

    fn call(&self, state: &State, args: &[Value]) -> Result<Value, Error>;
}

If a value should be a Sequence xor Object, then I would advocate for two traits as that seems much simpler and easier to implement correctly. What's more, I've been dreaming of a way to get the actual value as an enum (like Value::kind() but with the variant containing the actual value), and the separation (if it's truly xor) would make this nicer as well.

@mitsuhiko
Copy link
Owner

Alas, I suppose this is truly a question of semantics. Is expr.a the same as expr[a]?

So this is an interesting question. In MiniJinja they are equivalent today because we so far did not have objects that want to provide both. However in Jinja2 that was not possible because Python objects have attributes and keys. As such in Jinja2 foo.a and foo['a'] are slightly different. The former tries attributes first, keys after, the second operates in reverse order. Filters are provided to explicitly disambiugate.

Another difference in MiniJinja to Jinja2 is that I decided that foo.x() is directly a call rather than a lookup to x which returns a method which is callable. This also means that foo.x() does not expect x to exist as attribute. I believe this is in general a better approach.

I believe i like the MiniJinja semantics more and there is not much reason to emulate Jinja2 here.

On the topic of keys: In the value type Key already exists but it has some pretty gnarly enum variants i do not want to expose. Having a second key is an option but also feels a bit weird.

Generally speaking I think I prefer an object to be one trait that represents the entire object model similar to how Python objects work. Mostly because it's (as ugly as this can get) relatively easy to reason about. So what we know today is that both sequences and maps are things worth representing. We probably don't want to permit an object to impersonate a string and integer. In fact I don't see much value at the moment for an Object to represent anything but this. For the unlikely future case that something wants to proxy an entire value it's probably better to special case this and to let it return another value that it "becomes".

However there is already a hole in the value type today where we just lie. The engine needs a BoxedFunction to represent globals (like range() or debug()). These are today implemented as Object which is really quite wrong because functions really do not behave like objects. As such I do see some value in adding Function to ValueKind and have kind() exposed to objects so that they can at least cover that case better.

Another benefit of having just one trait is that it lets an object to fully proxy something else at runtime too. This would be quite beneficial for some lazy loading scenarios.

@mitsuhiko
Copy link
Owner

mitsuhiko commented Nov 15, 2022

I'm starting to wonder if it wouldn't be saner to really expose the Key struct in some form. The object trait implementation then could look like this:

pub trait Object: fmt::Display + fmt::Debug + Any + Sync + Send {
    fn kind(&self) -> ValueKind {
        ValueKind::Map
    }

    fn get(&self, key: &Key) -> Option<Value> {
        None
    }

    fn call_method(&self, state: &State, name: &str, args: &[Value]) -> Result<Value, Error> {
        let _state = state;
        let _args = args;
        Err(Error::new(
            ErrorKind::InvalidOperation,
            format!("object has no method named {}", name),
        ))
    }

    fn call(&self, state: &State, args: &[Value]) -> Result<Value, Error> {
        let _state = state;
        let _args = args;
        Err(Error::new(
            ErrorKind::InvalidOperation,
            "tried to call non callable object",
        ))
    }

    fn keys(&self) -> Box<dyn Iterator<Item = Key> + '_> {
        Box::new(None.into_iter())
    }

    fn len(&self) -> usize {
        self.keys().count()
    }

    fn is_empty(&self) -> bool {
        self.len() == 0
    }
}

impl Object for MySeq {
    fn kind(&self) -> ValueKind {
        ValueKind::Seq
    }

    fn get(&self, key: &Key) -> Option<Value> {
        if let Key::U64(idx) = key {
            self.0.get(*idx as usize)
        } else {
            None
        }
    }

    fn len(&self) -> usize {
        self.0.len()
    }
}

The iteration behavior in the engine would be to iterate from 0..obj.len() for ValueKind::Seq and otherwise it would iterate over obj.keys(). It does however make general object implementations much more complex now because keys can then be of different types and Key today at least holds both Key::String and Key::Str.

Loop controller for instance then looks like this:

impl Object for Loop {
    fn keys(&self) -> Box<dyn Iterator<Item = Key> + '_> {
        Box::new(
            [
                Key::Str("index0"),
                Key::Str("index"),
                Key::Str("length"),
                Key::Str("revindex"),
                Key::Str("revindex0"),
                Key::Str("first"),
                Key::Str("last"),
                Key::Str("depth"),
                Key::Str("depth0"),
            ]
            .into_iter(),
        )
    }

    fn get(&self, key: &Key) -> Option<Value> {
        let idx = self.idx.load(Ordering::Relaxed) as u64;
        let len = self.len as u64;
        match name.as_str()? {
            "index0" => Some(Value::from(idx)),
            "index" => Some(Value::from(idx + 1)),
            "length" => Some(Value::from(len)),
            "revindex" => Some(Value::from(len.saturating_sub(idx))),
            "revindex0" => Some(Value::from(len.saturating_sub(idx).saturating_sub(1))),
            "first" => Some(Value::from(idx == 0)),
            "last" => Some(Value::from(len == 0 || idx == len - 1)),
            "depth" => Some(Value::from(self.depth + 1)),
            "depth0" => Some(Value::from(self.depth)),
            _ => None,
        }
    }
}

@SergioBenitez
Copy link
Contributor Author

What I'm observing is that adding a fn kind(&self) -> ValueKind method to Object adds the complexity and fallibility of tracking the kind of object to the consumer without providing any benefit. It seems that your intention is for Object to be either object-like or sequence-like, but not both, so why leave the decision dynamic when a static one (via two traits) exists?

If there's some internal benefit to having only a single trait (though my instinct is that two traits would also simplify things internally), you could still unify both internally either via an enum with two variants (effectively your proposal) or via a third trait.

Alternatively, if an Object can be both sequence-like and object-like, then removing the kind() method entirely would suffice.

@mitsuhiko
Copy link
Owner

mitsuhiko commented Nov 16, 2022

What I'm observing is that adding a fn kind(&self) -> ValueKind method to Object adds the complexity and fallibility of tracking the kind of object to the consumer without providing any benefit.

Yes and no. Generally the benefit this actually has is that it would permit lazy loading of either structs/maps or sequences. You could place an object in the scope and only the first time it's interacted with would it turn into the target value (for as long that value can be represented by Object). The other issue is that if those are different traits, then the engine also needs a downcast_sequence_ref for sequences now.

However because maps and sequences do not behave the same, there are various different places where the differences matter.

It seems that your intention is for Object to be either object-like or sequence-like, but not both, so why leave the decision dynamic when a static one (via two traits) exists?

Object was created to represent dynamic types in the engine. The original intention before it was exposed to the user was to represent the following types:

  • loop: this is a struct and a fake function (not callable if reassigned today, but that's an unintended limitation)
  • functions: if you assign a global function into the template, it's represented by an Object
  • any macro: macros are functions but they also have some struct like attributes

Object in a way is an unfortunate name because it implies that it represents a struct like object, but that's just because this was the interface that was needed to represent these types. Recently the object interface has made it look more like a map in the engine (with the recent changes). That was originally not really the purpose of it, and now all the sudden it more resembles a map or struct.

The original intention of the Object interface was to by like a PyObject from Python and the behavior it gets are based on what it implements.

Alternatively, if an Object can be both sequence-like and object-like, then removing the kind() method entirely would suffice.

That's only possible if the iteration behavior of the object becomes part of the object's interface. An object that behaves like a map iterates over the keys, an object that behaves like a sequence iterates over the values as an example. Likewise serializing the thing to JSON also produces obviously different representations of the final structure.

I have implemented different forms of this as an experiment and I'm not particularly happy with any of these but I'm exploring around. One thing I have been playing with that ensures that you cannot accidentally be two things at once is this:

enum ObjectBehavior<'a> {
    Seq(&'a dyn Seq),
    Struct(&'a dyn Struct),
}

trait Object {
    fn behavior(&self) -> ObjectBehavior<'_>;
    fn call_method(&self, state: &State, name: &str, args: &[Value]) -> Result<Value, Error> { ... }
    fn call(&self, state: &State, args: &[Value]) -> Result<Value, Error> { ... }
}

trait Seq {
    fn get(&self, idx: usize) -> Option<Value>;
    fn len(&self) -> usize;
}

trait Struct {
    fn get(&self, key: &str) -> Option<Value>;
    fn keys(&self) -> Box<dyn Iterator<Item = &str> + '_>;
}

It requires implementing two traits but beyond that it's quite nice:

struct MySeq;

impl Object for MySeq {
    fn behavior(&self) -> ObjectBehavior<'_> {
        ObjectBehavior::Seq(self)
    }
}

impl Seq for MySeq {
    fn get(&self, idx: usize) -> Option<Value> {
        if idx < self.len() {
            Some(Value::from(idx * idx))
        } else {
            None
        }
    }

    fn len(&self) -> usize {
        10
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants