[discussion] Help make sophia a common RDF API for Rust #23

Open
pchampin opened this issue Dec 10, 2019 · 15 comments

@pchampin
Owner

pchampin commented Dec 10, 2019

The design of Sophia emphasizes genericity. The goal was, from the start, to allow multiple implementations of the provided traits to coexist, not even necessarily inside the sophia crate itself.

The goal of this issue is to foster discussion on what is required to achieve this. Are there any design choices in Sophia's traits or underlying types which you find too opinionated or constraining? Are they too complex to be widely adopted?

@pchampin pchampin mentioned this issue Dec 10, 2019
3 tasks
@Tpt
Contributor

Tpt commented Dec 10, 2019

Are there any design choices in Sophia's traits or underlying types which you find too opinionated or constraining?

I would actually disagree with the goal of this question. I find a lot of Sophia's traits not opinionated enough: they have a lot of generic arguments, which makes them very cumbersome to use. IMHO, end users of RDF libraries should not have to write any for<'a> or similar syntax, as the Sophia documentation suggests. Performance is good, but making the library more complex to use in order to gain 1 extra percent is probably not worth it.

I do not argue that we should end up with something like the small model library of Oxigraph, which is too opinionated and requires too many memory copies. What we need is a middle ground that is easy to use and to integrate.

We should maybe have a look at successful libraries like http or url that are used in a lot of very different places while keeping what I think is a fairly simple API.

@Tpt
Contributor

Tpt commented Dec 10, 2019

Another possibly interesting example is cssparser, which is used by Servo, some SVG libraries and probably soon Gnome Shell.

@pchampin
Owner Author

Granted, some trade-offs are required between genericity and usability. I added a question in the initial description of the issue to acknowledge that.

Re. the for<'a> trick with the Graph trait, it sucks, I totally agree with that. In my view, this was a temporary workaround until generic associated types landed. But admittedly, this might take some time, and we might be better off without this extra layer of genericity. So... #24
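To make the trade-off concrete, here is a minimal sketch contrasting the lifetime-on-the-trait workaround with generic associated types (stable since Rust 1.65). The traits `LtGraph` and `GatGraph` and the type `VecGraph` are hypothetical names for illustration only; they are not Sophia's actual API.

```rust
/// Workaround without GATs: the trait itself carries the borrow lifetime.
pub trait LtGraph<'a> {
    type Triples: Iterator<Item = [&'a str; 3]>;
    fn triples(&'a self) -> Self::Triples;
}

/// With generic associated types, the lifetime moves onto the associated type.
pub trait GatGraph {
    type Triples<'a>: Iterator<Item = [&'a str; 3]>
    where
        Self: 'a;
    fn triples(&self) -> Self::Triples<'_>;
}

/// Toy graph used to implement both traits (illustrative only).
pub struct VecGraph(pub Vec<[String; 3]>);

impl<'a> LtGraph<'a> for VecGraph {
    type Triples = Box<dyn Iterator<Item = [&'a str; 3]> + 'a>;
    fn triples(&'a self) -> Self::Triples {
        Box::new(self.0.iter().map(|[s, p, o]| [s.as_str(), p.as_str(), o.as_str()]))
    }
}

impl GatGraph for VecGraph {
    type Triples<'a> = Box<dyn Iterator<Item = [&'a str; 3]> + 'a> where Self: 'a;
    fn triples(&self) -> Self::Triples<'_> {
        Box::new(self.0.iter().map(|[s, p, o]| [s.as_str(), p.as_str(), o.as_str()]))
    }
}

/// Users of the workaround must write a higher-ranked trait bound.
pub fn count_lt<G>(g: &G) -> usize
where
    G: for<'a> LtGraph<'a>,
{
    g.triples().count()
}

/// Users of the GAT version only need a plain bound.
pub fn count_gat<G: GatGraph>(g: &G) -> usize {
    g.triples().count()
}
```

Both functions do the same work; the difference is only in what the caller has to write in the `where` clause.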

@MattesWhite
Contributor

Separation of sophia

In order to provide a clear API, I agree with @Tpt (in this comment).
We need to separate sophia into sub-crates, like other Rust projects, e.g. actix, tokio or crossbeam.
My suggestion:

  • sophia: Re-exports the sub-crates (maybe feature-flagged)
  • sophia-core: Provides the most basic types like Term and base traits like TermData and Triple.
    In addition, most basic traits are provided for extensions like Graph and Parser.
  • other crates depending on sophia-core
    • sophia-sparql: Parser for SPARQL and engine that operates on sophia-core::Graph
    • sophia-nt
    • sophia-xml
    • sophia-rio
    • sophia-hdt
    • etc...

@Tpt
Contributor

Tpt commented Dec 11, 2019

+1 for the separation of Sophia.

But before going in that direction, we probably need to make an ecosystem choice. Do we want big integrated libraries, like Jena and RDF4J in Java, which do most things but are hardly interoperable with each other? Or do we want API crates providing common interfaces, with an ecosystem of libraries implementing them, like what now exists in JavaScript with the work of the RDF/JS community group?

I would be more inclined to prefer the second option, which would see different implementations targeting different use cases and able to work with each other. We could have an integrated toolkit (sophia), a quad store targeting performance (Oxigraph), a parser suite (Rio), a JSON-LD implementation... all able to be used with each other.

If we go in the "Sophia is the foundation" direction, I am a bit concerned about the choice of the CECILL-C license. The Rust compiler currently links crates statically, which makes the CECILL-C license viral, if I understand it correctly. That might prevent reuse by for-profit organizations, significantly reducing the reach of the library.

@pchampin
Owner Author

I also agree that

  • the sophia crate should be split in smaller crates (I was planning to do it eventually, anyway),
  • we should aim at small independent but interoperable components, rather than big monolithic systems.

Again, I think that the core of Sophia has the potential to play the role of a unified interface. Separating it from the start from the rest of the implementation would send a clearer message to the community, so that should probably go up my list of priorities. I'll do that before the next release.

@pchampin
Owner Author

Regarding the licensing issue, thanks @Tpt for spotting that. What I'll probably do is change at least the license of sophia-core to a more permissive one, so that closed source implementations of the traits remain possible. For the other crates, I'll stick to CECILL-C by default, possibly opening some of them later.

pchampin added a commit that referenced this issue Jul 16, 2020
This contributes to issues #23 and #26.
@pchampin pchampin changed the title Help make sophia a common RDF API for Rust [discussion] Help make sophia a common RDF API for Rust Dec 14, 2023
@pchampin
Owner Author

@damooo @labra @KonradHoeffner @MarcAntoine-Arnaud @timothee-haudebourg @Tpt @vemonet @yamdan

Resurrecting this thread, with a different spin. I'm pinging the people whom I suspect would be interested, but feel free to ping others as you see fit.

I still believe that a common crate (or set of crates) for RDF development in Rust would be beneficial. I appreciate that Sophia, as a personal project, is not the best place to do that.

I propose we create a W3C Community Group for that purpose, and that we work collectively on a new repo hosted on https://github.com/w3c-cg/ . I suggest we could start with a common crate for dealing with IRIs (to avoid the duplication of sophia_iri, oxiri, iref, iris).

Then we could define a crate with a bunch of common traits for RDF (term, triple, graph, quad, dataset...), similar to sophia_api. Those traits could then be implemented by types from oxrdf and rdf-types to improve interoperability.

Of course, my personal opinion is that sophia_iri and sophia_api are almost-perfect candidates for these crates 😉. But I'm happy to go for something different that gathers more consensus. In the end, the community wins.

WDYT? If you think this is a good idea, please react with 👍 on this comment.

@Tpt
Contributor

Tpt commented Jun 27, 2024

@pchampin This is a great idea! Thank you! However, I fear this is going to be much harder than the RDF/JS APIs. JavaScript makes abstract interfaces much easier, thanks to duck typing and the absence of low-level considerations (no String vs str vs Arc<str> vs Cow<'a, str> vs smallstr vs ...). I am afraid that we will need to pick a point somewhere between an easy-to-use API and a fully featured API covering as many use cases as possible. To take a caricatural example, there is already this tension between oxrdf, which is more in the "easy to use" realm, and sophia_api, which is more in the fully featured one. Starting small with something like IRIs is a great idea!

@KonradHoeffner

KonradHoeffner commented Jun 28, 2024

I have exactly the same opinion as @Tpt :-) but want to elaborate that further, because this may be trivial for you two but maybe not for other potential readers in this issue, as this only occurred to me when working on the HDT Rust library where you cannot return references to resources because they are only available in compressed form. I once talked about this in the Linked Data party of Triply, so I hope it is OK here to share some of the notes. Their response was that this wasn't an issue for them because they usually operate on very large graphs but comparatively small query results, so a minor overhead was not a problem for them (and I think they use C++ where you don't have some of the issues). If you think this is too much noise in this thread I can also move the post somewhere else, and it's been a while since I worked with it the last time so please correct me if I get something wrong.

Main Challenge: How to return triples?

Which collection for triples?

  • an RDF graph is a set of triples
  • a triple pattern matches an RDF subgraph
  • return a Set?
  • we may only care about a part of the results
  • all processing needs to happen at the start
  • memory needs to be reserved all at once
  • → I think Iterator is a pretty clear choice
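As a rough illustration of the Set-versus-Iterator point, the sketch below matches a predicate against an in-memory slice. The `Triple` alias and both function names are assumptions made for this example and do not come from any existing crate.

```rust
use std::collections::HashSet;

/// Illustrative triple representation (subject, predicate, object).
pub type Triple = (String, String, String);

/// Eager: every match is computed, cloned and stored before the caller
/// sees the first result.
pub fn matches_as_set(triples: &[Triple], p: &str) -> HashSet<Triple> {
    triples.iter().filter(|t| t.1 == p).cloned().collect()
}

/// Lazy: matches are produced on demand, so a caller who only wants the
/// first few results never pays for the rest.
pub fn matches_as_iter<'a>(
    triples: &'a [Triple],
    p: &'a str,
) -> impl Iterator<Item = &'a Triple> + 'a {
    triples.iter().filter(move |t| t.1 == p)
}
```

With the iterator version, `matches_as_iter(&data, "p").next()` stops scanning at the first hit, while the set version always walks and copies everything.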

How to represent a single triple?

  • In HDT: Triple section contains triples of integer IDs
  • Translate IDs to strings using dictionary
  • Triple = (String, String, String)?

String in Java

  • immutable
  • non-primitive, so actually a reference
  • duplicates (e.g. SP? pattern) don’t waste space and performance
  • garbage collected when not referenced anymore

String in Rust

  • pointer to the heap
    • may grow, capacity field uses one extra usize (alternative: Box)
    • ownership of the contents
    • → waste space on duplication
<http://looongsubject.com> <http://ex.com/l> "label".
<http://looongsubject.com> <http://ex.com/c> "comment".
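The cost described above can be checked with a small sketch. The helper names are made up for this example, and the three-word size of `String` is what current Rust implementations produce rather than a documented layout guarantee.

```rust
use std::mem::size_of;

/// Size of `String` in machine words (pointer, capacity, length).
pub fn string_words() -> usize {
    size_of::<String>() / size_of::<usize>()
}

/// Size of `Box<str>` in machine words (pointer, length; no capacity).
pub fn boxed_str_words() -> usize {
    size_of::<Box<str>>() / size_of::<usize>()
}

/// Returns true if cloning a `String` built from `text` allocates a
/// second heap buffer, i.e. the clone does not share the original bytes.
pub fn clone_duplicates(text: &str) -> bool {
    let original = String::from(text);
    let copy = original.clone();
    original.as_ptr() != copy.as_ptr()
}
```

For the two triples above, returning (String, String, String) therefore stores the long subject IRI twice on the heap.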

Pointer to String?

  • Rust has raw pointers like *const String, but they...
  • are not guaranteed to point to valid memory
  • are not automatically cleaned up
  • don’t move ownership
  • dereferencing requires unsafe Rust

Reference to String

let string = String::from("Hello, world!");
// Convert the String to a string slice
let s: &str = string.as_str();
// Compile error!
// Cannot return a reference to a value created in a function.
return s;
  • Cannot return reference within HDT object because resources only exist in compressed form.
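The following sketch shows why a compressed store ends up returning owned strings. It uses a toy front-coded dictionary purely as a stand-in for compression; `FrontCodedDict` and its methods are hypothetical and do not reflect the HDT Rust library's API.

```rust
/// Toy front-coded dictionary: each entry stores how many leading bytes it
/// shares with the previous entry, plus the remaining suffix.
pub struct FrontCodedDict {
    entries: Vec<(usize, String)>,
}

impl FrontCodedDict {
    /// Builds the dictionary from terms (sorted input compresses best).
    pub fn new(terms: &[&str]) -> Self {
        let mut entries = Vec::with_capacity(terms.len());
        let mut prev = "";
        for &term in terms {
            // Byte length of the common prefix, always on a char boundary.
            let shared = prev
                .char_indices()
                .zip(term.chars())
                .take_while(|((_, a), b)| a == b)
                .map(|((i, a), _)| i + a.len_utf8())
                .last()
                .unwrap_or(0);
            entries.push((shared, term[shared..].to_string()));
            prev = term;
        }
        Self { entries }
    }

    /// The full term is not stored anywhere in `self`, so it has to be
    /// rebuilt and handed out as an owned `String`; there is nothing a
    /// `&str` could point to.
    pub fn decode(&self, id: usize) -> Option<String> {
        let mut term = String::new();
        for (shared, suffix) in self.entries.get(..=id)? {
            term.truncate(*shared);
            term.push_str(suffix);
        }
        Some(term)
    }
}
```

Here `decode` has to allocate on every call, which is the overhead the rest of this comment tries to reduce.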

Maybe Owned String

  • https://crates.io/crates/mownstr
  • String that may be owned or borrowed
  • used by HDT Rust in the past
  • SP? pattern query: return reference to subject and predicate parameters, create new string for each object
  • does not work e.g. for ??? pattern
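A sketch of the maybe-owned idea for an SP? query follows. It uses the standard library's `Cow<str>` as a stand-in for the same borrowed-or-owned concept; it is not mownstr's API, and `CowTriple` and `triples_with_sp` are names assumed for this example.

```rust
use std::borrow::Cow;

/// Triple whose terms are each either borrowed or owned.
pub type CowTriple<'a> = [Cow<'a, str>; 3];

/// SP? pattern: subject and predicate are borrowed from the caller's
/// arguments, only each freshly decoded object is an owned allocation.
pub fn triples_with_sp<'a>(
    s: &'a str,
    p: &'a str,
    decoded_objects: Vec<String>,
) -> impl Iterator<Item = CowTriple<'a>> + 'a {
    decoded_objects
        .into_iter()
        .map(move |o| [Cow::Borrowed(s), Cow::Borrowed(p), Cow::Owned(o)])
}
```

For a ??? query there are no caller-provided strings to borrow from, so every term would be `Cow::Owned` and the benefit disappears.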

Arc

  • atomic (thread safe) reference counting smart pointer
  • saves space on ??? pattern
  • synchronization overhead → not optimal for patterns like SP?
  • best compromise?
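As a sketch of the reference-counted option, the function below builds triples that share one allocation for a repeated subject. `ArcTriple` and `expand` are illustrative names, not taken from an existing crate.

```rust
use std::sync::Arc;

/// Triple whose terms are shared, reference-counted string slices.
pub type ArcTriple = [Arc<str>; 3];

/// Builds one triple per (predicate, object) pair. The subject is
/// allocated once; each triple only bumps its reference count.
pub fn expand(subject: &str, po: &[(&str, &str)]) -> Vec<ArcTriple> {
    let s: Arc<str> = Arc::from(subject);
    po.iter()
        .map(|&(p, o)| [Arc::clone(&s), Arc::from(p), Arc::from(o)])
        .collect()
}
```

Each `Arc::clone` is an atomic increment rather than a string copy, which is cheap for long IRIs but is still the synchronization overhead mentioned above.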

One pattern query method vs many?

pub fn all_triples ...
pub fn triples_with_s ...
pub fn triples_with_p ...
pub fn triples_with_o ...
pub fn triples_with_sp( ...
pub fn triples_with_so( ...
pub fn triples_with_po( ...
pub fn triples_with_spo( ...
  • best performance
    − different return types
    − code duplication
    − complicated for library users
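For comparison, the single-method alternative could look like the sketch below, where `None` stands for an unbound position. The function `triples_matching` and the `Triple` alias are assumptions for this example; a real store would dispatch to an index instead of scanning.

```rust
/// Illustrative triple representation (subject, predicate, object).
pub type Triple = (String, String, String);

/// One entry point for all eight patterns: `None` means "?" (unbound).
pub fn triples_matching<'a>(
    triples: &'a [Triple],
    s: Option<&'a str>,
    p: Option<&'a str>,
    o: Option<&'a str>,
) -> impl Iterator<Item = &'a Triple> + 'a {
    triples.iter().filter(move |t| {
        s.map_or(true, |s| t.0 == s)
            && p.map_or(true, |p| t.1 == p)
            && o.map_or(true, |o| t.2 == o)
    })
}
```

This gives one return type and no duplicated signatures, at the price of checking the pattern at run time instead of selecting a specialized method at compile time.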

Conclusion and personal pain points

  • There may not be one optimal return type and query API, it may depend on the use case.
  • However it may be worth it to sacrifice optimal performance for all use cases to get a nice interface for most use cases, because we need developers to actually adopt it.
  • Sophia uses traits so one can choose the pointer type, but I found this a massive obstacle while learning Rust. This is especially true when my application supports different types of graphs: it does not know at compile time which one the user wants to load, but I cannot make it a trait object, so I have to define enums and it all gets quite cumbersome and duplicated. So I would really appreciate a Graph trait that is "object safe". See "the trait sophia::graph::Graph cannot be made into an object", #122 (comment).
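A minimal sketch of what an object-safe graph trait could look like is given below. `DynGraph`, `InMemory`, `Empty` and `load` are hypothetical names for illustration; this is not a proposal for Sophia's actual trait.

```rust
/// Illustrative owned triple.
pub type Triple = [String; 3];

/// Object-safe by construction: no generic methods and no associated
/// types, only a boxed iterator as the return type.
pub trait DynGraph {
    fn triples<'a>(&'a self) -> Box<dyn Iterator<Item = Triple> + 'a>;
}

pub struct InMemory(pub Vec<Triple>);
pub struct Empty;

impl DynGraph for InMemory {
    fn triples<'a>(&'a self) -> Box<dyn Iterator<Item = Triple> + 'a> {
        Box::new(self.0.iter().cloned())
    }
}

impl DynGraph for Empty {
    fn triples<'a>(&'a self) -> Box<dyn Iterator<Item = Triple> + 'a> {
        Box::new(std::iter::empty())
    }
}

/// The concrete graph type is chosen at run time, without an enum
/// wrapping every supported implementation.
pub fn load(kind: &str) -> Box<dyn DynGraph> {
    match kind {
        "memory" => Box::new(InMemory(vec![["s".into(), "p".into(), "o".into()]])),
        _ => Box::new(Empty),
    }
}
```

The trade-off is a heap allocation and dynamic dispatch per call, plus owned triples in this sketch, in exchange for being able to store any implementation behind `Box<dyn DynGraph>`.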

@MarcAntoine-Arnaud
Contributor

I would raise the question of defining some traits around the major semantic types, and then providing a default implementation for the general use case.
A specific use case could then provide its own implementation, for performance reasons.

Would it be relevant to also open a messaging/working-group channel (Slack, Discord, ...), which could be called "Semantic in Rust"?

@pchampin
Owner Author

Thanks all for your support :)

The proposed CG is listed here: https://www.w3.org/community/groups/proposed/?search=rust&groups=Expand+all+groups

It needs 5 expressions of support to be officially created, and we will then have a mailing list and a GitHub repo. @MarcAntoine-Arnaud we can discuss then if we want another channel.

@Tpt @KonradHoeffner Yes, the main challenge will be to find consensus on the genericity/usability trade-off, and that's why we should start small. And also, documenting the crates to ease their adoption will be key, IMO.

@pchampin
Owner Author

The Community Group is now officially created: https://www.w3.org/groups/cg/r2c2/ 🎉

@damooo
Contributor

damooo commented Jul 4, 2024

Great @pchampin . I was inactive for months, hence late.

This was really a tedious problem. A common lingua franca felt really essential during my work.

And solving the IRI zoo will be a good start. Instead of yet another custom implementation, can we use the excellent iri_string crate? It offers standard-compliant types for RIs, along with their builders, validators, normalization, reference resolvers, templates, etc., as per the IETF standards. It also offers unsized URI reference types. It worked great in Manas for many operations on IRIs.
The code quality is very good.

@pchampin
Owner Author

pchampin commented Jul 4, 2024

And solving the IRI zoo will be a good start. Instead of yet another custom implementation, can we use the excellent iri_string crate?

I suggest that we have this discussion on the dedicated repo of the CG, once we create it :)
