Add TermsCardinalitySets as an index #8

pietercolpaert · 2024-04-04T08:26:10Z

This pull request introduces a new index called TermsCardinalitySets, which speeds up distinct look-ups of subjects, predicates, objects and/or graphs.

The index keeps track of cardinalities: for each term, how many times it has been used in a given position. While this is currently not exploited, it could also be useful for speeding up certain counts.
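As a rough illustration of the idea (not the code in this PR), a per-position cardinality map could look like the sketch below. The class name `TermCardinalityIndex`, the `SimpleQuad` type with plain string terms, and every method name other than `getSubjects` are assumptions made for the example.

```typescript
// Hypothetical sketch: terms are plain strings here, not RDF/JS terms.
type Position = 'subject' | 'predicate' | 'object' | 'graph';
type SimpleQuad = Record<Position, string>;

const POSITIONS: Position[] = ['subject', 'predicate', 'object', 'graph'];

class TermCardinalityIndex {
  // One map per quad position: term -> number of quads using it there.
  private readonly counts: Record<Position, Map<string, number>> = {
    subject: new Map<string, number>(),
    predicate: new Map<string, number>(),
    object: new Map<string, number>(),
    graph: new Map<string, number>(),
  };

  // Call once per quad that is actually added to the store.
  public add(quad: SimpleQuad): void {
    for (const pos of POSITIONS) {
      const map = this.counts[pos];
      map.set(quad[pos], (map.get(quad[pos]) ?? 0) + 1);
    }
  }

  // Call once per quad that is actually removed from the store.
  public remove(quad: SimpleQuad): void {
    for (const pos of POSITIONS) {
      const map = this.counts[pos];
      const current = map.get(quad[pos]);
      if (current === undefined) {
        continue;
      }
      if (current <= 1) {
        // Last use in this position: the term is no longer distinct here.
        map.delete(quad[pos]);
      } else {
        map.set(quad[pos], current - 1);
      }
    }
  }

  // Distinct terms in a position are just the map keys; no scan over quads.
  public getDistinct(pos: Position): string[] {
    return Array.from(this.counts[pos].keys());
  }

  public getSubjects(): string[] {
    return this.getDistinct('subject');
  }

  // How many quads use `term` in position `pos`.
  public getCardinality(pos: Position, term: string): number {
    return this.counts[pos].get(term) ?? 0;
  }
}
```

Keeping counts rather than plain sets is what allows a term to be dropped from the distinct list once the last quad using it in that position is removed.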

Full test coverage should be reached, but I lack the TypeScript coverage experience to understand how the last line can be covered as well. Can you help with that?

…redicates and getObjects - need to fix coverage still

rubensworks · 2024-04-10T11:59:17Z

Thanks for the PR @pietercolpaert!

I had a quick look at the proposed API, but it's quite a bit different from what I had in mind, which might be caused by our different requirements.

While we already had quad indexes, this PR adds unary indexes. But there is also a need for binary and ternary indexes.
As such, I think the most sustainable way forward to implement this is to extend or abstract the IRdfStoreIndex interface, given their common logic.

But the current implementation with TermsCardinalitySets obviously works, and solves your use case. So one option might be to use your fork of this library until I've managed to look into implementing this index abstraction.

pietercolpaert · 2024-04-11T07:02:14Z

Thanks! That works for me. Can we however agree already on the method names then, such as getSubjects() in order to get a distinct list of subjects?

rubensworks · 2024-04-11T07:19:11Z

> Can we however agree already on the method names then, such as getSubjects() in order to get a distinct list of subjects?

I'm not too sure to be honest. If we follow this approach for binary and ternary access as well, then we will need a huge number of methods for handling all possible combinations of access. So we may need something more abstract. But I don't have a clean way in mind atm. Possibly something using QuadTermName as params.

pietercolpaert · 2024-04-11T08:16:45Z

How about getDistinct(term1: QuadTermName, term2?: QuadTermName, term3?: QuadTermName, term4?: QuadTermName)

rubensworks · 2024-04-11T10:06:21Z

> How about getDistinct(term1: QuadTermName, term2?: QuadTermName, term3?: QuadTermName, term4?: QuadTermName)

I don't think that will be sufficient to capture queries such as S? or ?P on an SP index.
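One way to read this objection: on a binary subject-predicate index, a distinct look-up may need one position bound to a concrete value while the other stays free, and a list of position names alone has nowhere to carry that value. A minimal sketch under that reading, with a hypothetical `SpIndex` class and plain string terms (neither is part of this library):

```typescript
// Hypothetical nested SP index: subject -> set of predicates used with it.
class SpIndex {
  private readonly bySubject = new Map<string, Set<string>>();

  public add(subject: string, predicate: string): void {
    let predicates = this.bySubject.get(subject);
    if (!predicates) {
      predicates = new Set<string>();
      this.bySubject.set(subject, predicates);
    }
    predicates.add(predicate);
  }

  // Subject bound, predicate free: distinct predicates of one subject.
  public predicatesOf(subject: string): string[] {
    const predicates = this.bySubject.get(subject);
    return predicates ? Array.from(predicates) : [];
  }

  // Predicate bound, subject free: distinct subjects using one predicate.
  // With this subject-first nesting, every subject entry has to be visited.
  public subjectsWith(predicate: string): string[] {
    const result: string[] = [];
    this.bySubject.forEach((predicates, subject) => {
      if (predicates.has(predicate)) {
        result.push(subject);
      }
    });
    return result;
  }
}
```

Both look-ups take a concrete term as input, which is exactly what a signature made only of `QuadTermName` arguments cannot express.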

pietercolpaert · 2024-04-11T10:16:57Z

Right... How about:

getDistinct(terms: QuadTermName[], quad?: QuadPattern)

That way you can then bind the variables to the distincts you want?

i.e.

getDistinct(['subject'], [null, df.namedNode('rdf:type'), df.namedNode('foaf:Person')])

Would translate into:

SELECT DISTINCT ?s WHERE {
   ?s a foaf:Person .
}
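To make the proposed semantics concrete, here is a naive sketch that answers such a call by scanning an array of quads. It is an illustration only: terms are plain strings instead of RDF/JS terms, the quads are passed in as the first argument rather than living in a store, and the `PlainQuad` and `PlainPattern` types and the local `QuadTermName` definition are assumptions for the example. A real index would avoid the full scan.

```typescript
// Local stand-in for the QuadTermName type mentioned in the discussion.
type QuadTermName = 'subject' | 'predicate' | 'object' | 'graph';
// Hypothetical quad with plain string terms.
type PlainQuad = Record<QuadTermName, string>;
// Positional pattern [subject, predicate, object, graph];
// null or a missing entry acts as a wildcard.
type PlainPattern = (string | null | undefined)[];

const ORDER: QuadTermName[] = ['subject', 'predicate', 'object', 'graph'];

function getDistinct(
  quads: PlainQuad[],
  terms: QuadTermName[],
  pattern: PlainPattern = [],
): string[][] {
  const seen = new Set<string>();
  const results: string[][] = [];
  for (const quad of quads) {
    // Keep only quads that agree with every bound position of the pattern.
    const matches = ORDER.every(
      (name, i) => pattern[i] == null || pattern[i] === quad[name],
    );
    if (!matches) {
      continue;
    }
    // Project the quad onto the requested positions and deduplicate.
    const projected = terms.map(name => quad[name]);
    const key = JSON.stringify(projected);
    if (!seen.has(key)) {
      seen.add(key);
      results.push(projected);
    }
  }
  return results;
}
```

In this sketch, `getDistinct(quads, ['subject'], [null, 'rdf:type', 'foaf:Person'])` plays the role of the SPARQL query above, and passing two names such as `['subject', 'predicate']` returns distinct pairs.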

rubensworks · 2024-04-11T10:24:15Z

Perhaps, something like that could work.
But I'm not 100% certain yet; I feel like additional requirements may pop up once I start designing the architecture for handling these different indexes in Comunica 😅

First design of TermsCardinalitySets for getGraphs, getSubjects, getP…

b7f8ef9

…redicates and getObjects - need to fix coverage still

pietercolpaert marked this pull request as draft April 4, 2024 08:27

pietercolpaert mentioned this pull request Apr 4, 2024

Fix #27: performance increase for many graphs TREEcg/extract-cbd-shape#28

Open

Fixing map and coverage

56dcfb2

pietercolpaert marked this pull request as ready for review April 4, 2024 18:46

Fix linting

d818638

pietercolpaert changed the title ~~First design of TermsCardinalitySets~~ Add TermsCardinalitySets as an index Apr 5, 2024

pietercolpaert closed this Apr 12, 2024

pietercolpaert mentioned this pull request May 3, 2024

Performance increase idea: graphs to ignore to graphs to look into list TREEcg/extract-cbd-shape#27

Open
