SUGGESTION: Module for RDF2VEC. #1529

Closed
bobvanluijt opened this issue Apr 8, 2021 · 10 comments
Labels
autoclosed (Closed by the bot. We still want this, but it didn't quite make the latest prioritization round), discussion

Comments

@bobvanluijt
Member

It might be interesting to see if there is a Weaviate module opportunity for RDF2VEC. Especially because the Weaviate schema is RDF-inspired.

Somewhat related: #1522

@bobvanluijt
Member Author

CC: @HeikoPaulheim

@etiennedi
Member

etiennedi commented Apr 8, 2021

The major requirement for a vectorizer module is that it can independently assign a vector to each object (or node). For example, with the transformers module, if I have two sentences, "My name is John" and "Jane likes ice cream", the vectorizer sees each object in isolation and can assign a vector to each.

If this is something that works with rdf2vec, then it's a great fit for a module. (I don't know, as I haven't looked into rdf2vec thoroughly yet.) If, on the other hand, the entire "complete" graph is the input for rdf2vec and the output is all vectors at once, then it wouldn't be compatible with the "vectorize [individually] at import time" philosophy of Weaviate modules.
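To make the distinction concrete, here is a minimal sketch of the two shapes a vectorizer can take. The function names and the toy features (character counts, node degrees) are hypothetical and have nothing to do with real transformer or rdf2vec embeddings; the only point is which inputs each output depends on.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Triple = Tuple[str, str, str]


def vectorize_object(text: str) -> Vector:
    """Per-object style: the result depends only on this one object."""
    # Toy features (hypothetical): character count and word count.
    return [float(len(text)), float(len(text.split()))]


def vectorize_graph(triples: List[Triple]) -> Dict[str, Vector]:
    """Whole-graph style: every node's vector depends on the full edge list."""
    vectors: Dict[str, Vector] = {}
    for subject, _predicate, obj in triples:
        vectors.setdefault(subject, [0.0, 0.0])[0] += 1.0  # out-degree
        vectors.setdefault(obj, [0.0, 0.0])[1] += 1.0      # in-degree
    return vectors


# Per-object: each sentence can be vectorized at import time, in isolation.
john = vectorize_object("My name is John")
jane = vectorize_object("Jane likes ice cream")

# Whole-graph: adding one edge changes the vector of an already existing node.
graph = [("a", "knows", "b")]
before = vectorize_graph(graph)["b"]
after = vectorize_graph(graph + [("c", "knows", "b")])["b"]
print(john, jane, before, after)
```

In the second style, importing a new object can invalidate vectors that were already assigned, which is why it does not map onto a "vectorize at import time" hook.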

Having said that, it would still work well with Weaviate natively, just not through a module. In this case, the process would be:

  1. Take the entire graph
  2. Vectorize it outside of Weaviate, so that each node gets a vector
  3. Import each node with the vector positions into Weaviate without a vectorizer module

In this case a good place to make the process easier for the user might be the python client?
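A rough sketch of step 3, assuming the vectors were already computed outside Weaviate. It only builds the JSON payloads and sends nothing. The field names (`class`, `vectorizer`, `properties`, `id`, `vector`) follow my understanding of Weaviate's REST schema and objects payloads and should be checked against the current docs; the `RdfNode` class, the `iri` property, and the example vectors are made up for illustration.

```python
import json
import uuid
from typing import Dict, List


def class_definition(class_name: str) -> dict:
    """Schema entry for a class whose vectors are supplied by the user."""
    return {
        "class": class_name,
        "vectorizer": "none",  # assumption: disables module vectorization
        "properties": [{"name": "iri", "dataType": ["text"]}],
    }


def node_to_object(class_name: str, iri: str, vector: List[float]) -> dict:
    """One graph node as an object payload carrying its own vector."""
    return {
        "class": class_name,
        # Deterministic id derived from the IRI, so re-imports hit the same object.
        "id": str(uuid.uuid5(uuid.NAMESPACE_URL, iri)),
        "properties": {"iri": iri},
        "vector": vector,
    }


# Hypothetical output of an external graph-embedding run: one vector per node.
node_vectors: Dict[str, List[float]] = {
    "http://example.org/Alice": [0.12, -0.40, 0.33],
    "http://example.org/Bob": [0.10, -0.35, 0.29],
}

objects = [node_to_object("RdfNode", iri, vec) for iri, vec in node_vectors.items()]
print(json.dumps(class_definition("RdfNode")))
print(json.dumps(objects[0], indent=2))
```

A client-side helper could wrap exactly this loop and hand the payloads to whatever batch import call the client offers.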

@bobvanluijt
Member Author

Thanks for the writeup @etiennedi

If this is something that works with rdf2vec then it's a great fit for a module.

This would be the case, wouldn't it, @HeikoPaulheim?

@HeikoPaulheim

It's more of the second case. RDF2vec takes an entire RDF graph and outputs vectors for each node.
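For readers unfamiliar with why the whole graph is needed: RDF2vec extracts walks over the graph and then trains a word2vec-style model on those walks as if they were sentences. The sketch below covers only the walk extraction, with plain random walks; the function names, parameters, and example triples are illustrative assumptions, and the embedding training step is left out.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def build_adjacency(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each subject to its outgoing (predicate, object) pairs."""
    adjacency: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return adjacency


def random_walks(triples: List[Triple], walks_per_node: int, depth: int,
                 seed: int = 0) -> List[List[str]]:
    """Return walks of the form [node, predicate, node, predicate, node, ...]."""
    rng = random.Random(seed)
    adjacency = build_adjacency(triples)
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    walks = []
    for start in nodes:
        for _ in range(walks_per_node):
            walk, current = [start], start
            for _ in range(depth):
                outgoing = adjacency.get(current)
                if not outgoing:  # dead end: node has no outgoing edges
                    break
                predicate, current = rng.choice(outgoing)
                walk.extend([predicate, current])
            walks.append(walk)
    return walks


triples = [
    ("Alice", "knows", "Bob"),
    ("Bob", "worksAt", "Acme"),
    ("Alice", "livesIn", "Berlin"),
]
walks = random_walks(triples, walks_per_node=2, depth=2)
for walk in walks:
    print(" -> ".join(walk))
```

Because every walk can pass through any node, a node's vector is only defined relative to the full set of walks, which matches the "second case" above.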

@bobvanluijt
Member Author

It's more of the second case. RDF2vec takes an entire RDF graph and outputs vectors for each node.

Interesting, in principle this should fit but maybe not as a module.

I'll come back to this, thanks Heiko!

@ali1k

ali1k commented May 5, 2021

I also have a question related to RDF2vec: in the second case you described, if vectorization is done outside Weaviate for the RDF data and we then use a Weaviate vectorizer for some other type of data, would the two still be compatible? Or does it mean that RDF vectors cannot be used in conjunction with internal Weaviate vectors?

@HeikoPaulheim

HeikoPaulheim commented May 5, 2021 via email

@bobvanluijt
Member Author

Well, in principle it should be possible. You could have Weaviate classes that represent the data with the RDF vectors, and Weaviate classes that use, for example, a text vectorizer.

It's a bit abstract but a test setup should be doable.

@etiennedi
Member

etiennedi commented May 5, 2021

To avoid confusion, we need to distinguish two separate cases here:

Case 1: Single Weaviate - linked classes

This is the approach that @bobvanluijt describes. You'd have multiple classes in Weaviate. Each class can have its own vector space, and the spaces need not be compatible. Essentially you'd import your dataset twice, once with text vectors and once with RDF vectors. You can then make graph relations between the two classes. Note: these relations are not the regular edges that you already have in your dataset; they are additional cross-class edges which link the node in space 1 to the equivalent node in space 2. So whenever you do a vector search in one space, you can follow the link to the other vector space and perform a second search around the linked object. We also use this approach when linking images to text when we are using separate vectorizers for each.
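A toy, in-memory sketch of that flow, not using Weaviate itself: two dictionaries stand in for the two classes, `links` stands in for the cross-class references, and all names and vectors are invented. The two spaces deliberately have different dimensions to show that they never need to be compared directly.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest(space: Dict[str, Vector], query: Vector, k: int) -> List[str]:
    """Names of the k most similar vectors within one space."""
    ranked = sorted(space, key=lambda name: cosine(space[name], query), reverse=True)
    return ranked[:k]


# Space 1: text vectors (2 dimensions). Space 2: RDF vectors (3 dimensions).
text_space = {"t1": [1.0, 0.0], "t2": [0.0, 1.0]}
rdf_space = {"r1": [0.9, 0.1, 0.0], "r2": [0.0, 0.2, 0.9], "r3": [0.8, 0.2, 0.1]}

# Cross-class edges: the same entity in both spaces.
links = {"t1": "r1", "t2": "r2"}


def cross_space_search(text_query: Vector, k: int) -> List[str]:
    hit = nearest(text_space, text_query, 1)[0]    # search in space 1
    linked = links[hit]                            # follow the cross-class edge
    return nearest(rdf_space, rdf_space[linked], k)  # search around it in space 2


print(cross_space_search([0.9, 0.1], k=2))
```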

Case 2: Single Weaviate - single Vector Space

This is the approach that @HeikoPaulheim describes. You have two separate vector spaces, then you train a mapping function to combine them into a single vector space (so far, all of this has happened outside of Weaviate). Once you have a single model, you can import it as a whole into Weaviate.
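One possible form of such a mapping function, sketched under strong assumptions: the map is linear, it is fitted on a few "anchor" entities that have a vector in both spaces, and the toy data is constructed so that an exact linear relation exists. None of this is confirmed as the method meant above; real spaces would likely need more anchors, regularization, or a non-linear model.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def fit_linear_map(pairs: List[Tuple[Vector, Vector]], dim_in: int, dim_out: int,
                   lr: float = 0.1, epochs: int = 500) -> Matrix:
    """Fit W so that W @ x is close to y for each anchor pair (x, y)."""
    W = [[0.0] * dim_in for _ in range(dim_out)]
    for _ in range(epochs):
        for x, y in pairs:
            error = [p - t for p, t in zip(mat_vec(W, x), y)]
            # Gradient step on 0.5 * ||W x - y||^2 for this single pair.
            for i in range(dim_out):
                for j in range(dim_in):
                    W[i][j] -= lr * error[i] * x[j]
    return W


# Hypothetical anchors: (RDF vector, text vector) of the same entity.
# The toy text vectors are the RDF vectors rotated by 90 degrees.
anchors = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.0, 1.0], [-1.0, 0.0]),
    ([1.0, 1.0], [-1.0, 1.0]),
]
W = fit_linear_map(anchors, dim_in=2, dim_out=2)

# An entity that only has an RDF vector can now be placed in the text space,
# so both kinds of vectors can be imported into one class.
mapped = mat_vec(W, [2.0, 1.0])
print([round(v, 3) for v in mapped])
```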

@stale

stale bot commented Jul 4, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the autoclosed label Jul 4, 2021
@stale stale bot closed this as completed Jul 11, 2021

4 participants