Problems generating or using empty HDT files #31

Closed
osma opened this issue Nov 28, 2016 · 6 comments

osma commented Nov 28, 2016

I noticed that the hdt-java tools cannot handle empty HDT files, i.e. files with zero triples.

Trying to generate an HDT file from an empty N-Triples file fails:

$ touch empty.nt # create an empty N-Triples file
$ rdf2hdt.sh empty.nt empty.hdt
Converting empty.nt to empty.hdt as null
Exception in thread "main" java.lang.IllegalArgumentException: Adjacency list bitmap and array should have the same size
	at org.rdfhdt.hdt.compact.bitmap.AdjacencyList.<init>(AdjacencyList.java:50)
	at org.rdfhdt.hdt.triples.impl.BitmapTriples.load(BitmapTriples.java:207)
	at org.rdfhdt.hdt.triples.impl.BitmapTriples.load(BitmapTriples.java:224)
	at org.rdfhdt.hdt.hdt.impl.HDTImpl.loadFromModifiableHDT(HDTImpl.java:377)
	at org.rdfhdt.hdt.hdt.HDTManagerImpl.doGenerateHDT(HDTManagerImpl.java:107)
	at org.rdfhdt.hdt.hdt.HDTManager.generateHDT(HDTManager.java:129)
	at org.rdfhdt.hdt.tools.RDF2HDT.execute(RDF2HDT.java:106)
	at org.rdfhdt.hdt.tools.RDF2HDT.main(RDF2HDT.java:167)

Another way to trigger the same exception is to generate the zero-triple HDT file with hdt-cpp (which works) and then attempt to query it with hdtsparql.sh:

$ touch empty.nt # create an empty N-Triples file
$ rdf2hdt empty.nt empty.hdt # make a HDT file out of it using rdf2hdt from the hdt-cpp suite
$ hdtsparql.sh empty.hdt "select * {?s ?p ?o}"
Exception in thread "main" java.lang.IllegalArgumentException: Adjacency list bitmap and array should have the same size
	at org.rdfhdt.hdt.compact.bitmap.AdjacencyList.<init>(AdjacencyList.java:50)
	at org.rdfhdt.hdt.triples.impl.BitmapTriples.mapFromFile(BitmapTriples.java:372)
	at org.rdfhdt.hdt.hdt.impl.HDTImpl.mapFromHDT(HDTImpl.java:260)
	at org.rdfhdt.hdt.hdt.HDTManagerImpl.doMapIndexedHDT(HDTManagerImpl.java:62)
	at org.rdfhdt.hdt.hdt.HDTManager.mapIndexedHDT(HDTManager.java:93)
	at org.rdfhdt.hdtjena.cmd.HDTSparql.main(HDTSparql.java:38)

While one can argue about the usefulness of empty (i.e. zero-triple) HDT files, I don't think this special case should trigger an exception. I noticed this while writing unit tests for my application; the tests exercise some special situations, and one of them happens to generate an empty NT file, which is then converted to HDT and queried using hdtsparql.sh.

mielvds added the enhancement and refactoring labels Apr 16, 2021
mielvds added this to the 2.1.3 milestone Apr 16, 2021

D063520 commented Feb 16, 2022

This seems to work in the current version:
[Screenshot 2022-02-16 at 19 19 46]
@mielvds, can you check on your side?


D063520 commented Feb 16, 2022

In fact, within the Java version everything works, but it is not compatible with the C++ version. Empty file created with the C++ version:
[Screenshot 2022-02-16 at 19 26 26]
Empty file created with the Java version:
[Screenshot 2022-02-16 at 19 27 59]


D063520 commented Feb 16, 2022

So, for the direction

  1. creating an empty HDT file with the C++ version
  2. opening the HDT file with the Java version

my guess is that the C++ version is wrong. Here, seqY has length zero but bitmapY has length 1. I'm 90% sure this is encoded wrongly in the HDT file.

Why, then, does it open in C++ and not in Java? The C++ version does not make the same check as the Java version:

https://github.com/rdfhdt/hdt-cpp/blob/332a9cc2d5273e76b9daad366f7d2f80adb6b3fc/libhdt/src/sequence/AdjacencyList.cpp#L38

if (array.getNumberOfElements() != bitmap.getNumBits()) {

That is why we can search over the file in C++ but not in Java.

Moreover, if we remove the check in Java, we can open the file, and it contains no triples.
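To make the mismatch concrete, here is a minimal, self-contained sketch of the size check. It is not the hdt-java code: `AdjacencyListSketch`, its plain-array fields, and `strictAccepts` are hypothetical stand-ins, and only the exception message and the reported sizes (seqY of length zero, bitmapY of length 1) come from this thread. The bit value in the bitmap is arbitrary.

```java
/**
 * Hypothetical, simplified stand-in for an adjacency list built from a
 * sequence and a bitmap. Only the sizes matter for this illustration.
 */
class AdjacencyListSketch {
    final long[] array;     // stand-in for the ID sequence (e.g. seqY)
    final boolean[] bitmap; // stand-in for the bitmap (e.g. bitmapY)

    /** Strict constructor: rejects a sequence and bitmap of different sizes. */
    AdjacencyListSketch(long[] array, boolean[] bitmap) {
        if (array.length != bitmap.length) {
            throw new IllegalArgumentException(
                "Adjacency list bitmap and array should have the same size");
        }
        this.array = array;
        this.bitmap = bitmap;
    }

    /** Returns true if the strict constructor accepts the given pair. */
    public static boolean strictAccepts(long[] array, boolean[] bitmap) {
        try {
            new AdjacencyListSketch(array, bitmap);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Sizes reported above for the C++-generated empty file:
        // empty sequence, one-bit bitmap (the bit value is an assumption).
        long[] seqY = {};
        boolean[] bitmapY = {false};
        System.out.println(strictAccepts(seqY, bitmapY)); // prints "false"

        // A consistent empty pair (both of length zero) passes the check.
        System.out.println(strictAccepts(new long[0], new boolean[0])); // prints "true"
    }
}
```

Skipping the comparison would let the mismatched pair load, which matches the observation above. Whether that is safe depends on how the rest of the code uses the bitmap, which this sketch does not model.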

@mielvds: should I open an issue for the C++ version?


D063520 commented Feb 16, 2022

For the direction:

  1. creating an empty HDT file with the Java version
  2. opening the HDT file with the C++ version

I think it is best to debug this in the C++ version.
@mielvds, I would therefore move this to the C++ version as well.


mielvds commented Feb 16, 2022

Nice findings, @D063520! Yes, let's bounce this over to the C++ version and close this issue.

@mielvds mielvds closed this as completed Feb 16, 2022

D063520 commented Feb 16, 2022

Ok, I'm moving the first issue ....
