Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of the lifetime parameters on Triple, Graph and Dataset #24

Closed
pchampin opened this issue Dec 10, 2019 · 9 comments
Closed

Get rid of the lifetime parameters on Triple, Graph and Dataset #24

pchampin opened this issue Dec 10, 2019 · 9 comments
Assignees

Comments

@pchampin
Copy link
Owner

They are causing much complexity, especially the requirement for using higher-ranked type bound in some situations.

It might be better to give up some genericity in order to make the API easier to use.

@pchampin
Copy link
Owner Author

Removing lifetimes on Tripe and Quad is easy. I have a local branch that I'll push soon, where this is done.

For Graph and Dataset, this is more tricky. The problem comes from the fact that Graph::Triple, which is the type returned by the iterating methods (triples, triples_with_s, etc.) sometimes needs a lifetime, especially when one wants the iterator to return references to internal triples owned by the graph. This is something that I would like to keep.

As mentioned elsewhere, the cleanest solution would be generic associated types (GAT), but those are not expected to be stabilized before long...

I have experimented with another approach, which tries to emulate somehow GAT, by using pointers under the hood, hence eschewing lifetime parameters, and wrapping them in a safe abstraction. I tried my best to make this approach easy enough for implementors of the Graph trait (although the documentations is still rough at the edges).

https://gist.github.com/pchampin/7fbf3262ab22b6a0f6a6eae8a6aa938c

@Tpt @MattesWhite anyonelse... your feedback would be much appreciated

@MattesWhite
Copy link
Contributor

Your solution seems suitable.

  • One minor disadvantage I see is the cost of extra wrapping of newly created triple but maybe the compiler is smart enough to optimize that away.
  • What definitley must be document is that this is just a temporary solution until GATs arrive.
  • A small suggestion I have to make: When implementors see that they have to return GuradedTriples they will probably have first a look at the docs of the type. As a consequence, I would move the constructors to the type itself.
  • Your implementation exposes the raw pointers to the user/implementor. Here is a version with a complete safe API and without UnsafeTriple.

@Tpt
Copy link
Contributor

Tpt commented Dec 19, 2019

I like @MattesWhite solution. A maybe dumb comment: are the NotNull pointers actually required in your solution? I believe you could do it with just references (but maybe I'm wrong).

@MattesWhite
Copy link
Contributor

Well, this is not a dumb question 😁 . Indeed, they are not needed. I updated to a version with zero unsafe code.

BTW with the removal of everything unsafe the name GuardedTriple is not suitable anymore. Maybe something like WrappedTriple or GraphTriple would be better.

@Tpt
Copy link
Contributor

Tpt commented Dec 19, 2019

Maybe something like WrappedTriple or GraphTriple would be better.

This type looks a lot like Cow from std. I am also not sure that the variant BorrowedTriple is useful: it should be easy and cheap to convert it to BorrowedTerms. If we also make the subject, predicate and object explicit, we would have a type Triple that would be an owned only triple, and TripleCow that is Cow for triples (and maybe TripleRef for the cases where just references are enough)? This way we get something that woulbe be very similar to String str and Cow<str> or PathBuf, Path and Cow<Path>.

@pchampin
Copy link
Owner Author

Your solution seems suitable.

Cool :)

  • One minor disadvantage I see is the cost of extra wrapping of newly created triple but maybe the compiler is smart enough to optimize that away.

My solution has zero overhead in memory (PhantomData is 0-sized) and in runtime (provided the appropriate #[inline] annotations). Yours (with an enum) does induce an overhead. More below.

  • What definitley must be document is that this is just a temporary solution until GATs arrive.

Absolutely. I wouldn't hold my breath, though... GATs have been mentioned for a while now.

  • A small suggestion I have to make: When implementors see that they have to return GuradedTriples they will probably have first a look at the docs of the type. As a consequence, I would move the constructors to the type itself.

Very good point. The current approach (with method) evolved organically from previous attempts, but the constructor approach is much cleaner.

  • Your implementation exposes the raw pointers to the user/implementor. Here is a version with a complete safe API and without UnsafeTriple.

Actually, the raw pointers are hidden inside the GuardedTriple; neither implementors nor users will actually handle the pointers directly. Only a very daring implementor may have to write unsafe sections if they are not satisfied with the 3 provided options, and decide to add their own implementation of UnsafeTriple. Which the documentation will clearly discourage.

Now, regarding your variant: by using enums, you are introducing memory overhead (the enum takes as much space as the biggest variant), and runtime overhead (each call to triple methods will go through the match statement). My approach relies on monomorphisation.

If we were to accept a small overhead, there would be no need for this complex approach. We could simply go for dynamic dispatching with trait objects:

trait Graph {
    // note that the associated type Triple has vanished

    fn triple<'s>(&'s self) -> Box<dyn Iterator<Item=Box<dyn Triple + 's>> + 's>
    //                                               ^^^^^^^^^^^^^^^^^^^^
    //                                               this is the big change
}

I have considered this option. But, as I expressed before, I find it sad to renounce zero-cost abstract so close to the goal. And the proposed approach will allow for a smooth transition when GAT finally land (basically replace the associated type Triple = *const T with Triple<'t> = &'t T). So I think it is worth the (temporary and relatively light) hassle.

(sorry for the long answer)

@MattesWhite
Copy link
Contributor

 neither implementors nor users will actually handle the pointers directly.

Of course but the types-signatures include raw pointers. Maybe it is already enough to hide them behind a typedef, e.g.

type BorrowedTriple<T> = *const T;

If I think further this might be a nice expressive way. Change the associated type of Graph to something linke type IteratorTriple: TripleOrigin; with:

trait TripleOrigin: Triple {}

type CreatedTriple<T> = T;
type StoredTriple<T> = *const T;
type StoredTerms<TD> = [*const Term<TD>; 3];

// respective impls of TripleOrigin for the types

Don't know if this realy works because I'm on mobile but I wanted to wright this thought down.

@MattesWhite
Copy link
Contributor

Now I had some time to test out my proposal. Here are the results:

https://gist.github.com/MattesWhite/f5daf2d38d36c4e325e00cbb41ec59e8#file-sophia_experiment_zero-rs

The most notable difference is that I use new-types instead of typedefs to declare the TripleOrigins. As they induce no memory-overhead (jsut like GuradedTriple) they are zero-cost abstractions as-well. In addition, they provide an expressive way for implementors to tell them what to do.

@pchampin
Copy link
Owner Author

That's roughly what I'm planning to do. I'm not decided on the terminology, though. I want to make easy to understand by implementors of the Graph/Dataset traits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants