Add a version number to the .hdt.index files #7

Closed
RubenVerborgh opened this issue Nov 21, 2014 · 22 comments

@RubenVerborgh
Member

Different .hdt.index files for the same .hdt file are incompatible with each other; this causes problems if different applications read them. Maybe they should receive some kind of version number.

But this then creates the problem: how to find the index?
Perhaps, instead of supplying the .hdt file name, we can provide the index file name.

@RubenVerborgh
Member Author

See also LinkedDataFragments/HDT-Node#1.

@artob
Contributor

artob commented Nov 13, 2015

Just encountered "ERROR: Trying to read a LOGArray but data is not LogArray" myself when trying to use tools/hdtSearch on some indexed HDT files prepared by a third party. It would indeed be good to figure out some strategy here, e.g., a common header with a format version number and feature flags.

@mielvds
Member

mielvds commented Dec 3, 2015

My proposal for a versioning strategy:

Given that the version number is x.y.z:
- A change in x is a breaking change to the HDT file that can be generated or read, which also implies a breaking change to the generated index file.
- A change in y is a breaking change to the generated index file, while HDT files remain compatible.
- All other changes increment z.

Original release: 1.0.0
This release: 1.1.1
Next release: 1.1.2

Index files can, for instance, be named <filename>.v11.hdt.index
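
A rough sketch of how this naming scheme and the compatibility rule could be computed, assuming the x.y.z rules above. The `Version` struct and the functions `indexFileName` and `indexCompatible` are hypothetical names introduced only for this example.

```cpp
#include <string>

// x.y.z as in the proposal: x = HDT format, y = index format, z = everything else.
struct Version {
    int x;
    int y;
    int z;
};

// Builds "<filename>.v<x><y>.hdt.index" from the path of the .hdt file.
// If the path does not end in ".hdt", the whole path is used as <filename>.
std::string indexFileName(const std::string& hdtPath, const Version& v) {
    const std::string ext = ".hdt";
    std::string base = hdtPath;
    if (base.size() >= ext.size() &&
        base.compare(base.size() - ext.size(), ext.size(), ext) == 0) {
        base.erase(base.size() - ext.size());
    }
    return base + ".v" + std::to_string(v.x) + std::to_string(v.y) + ".hdt.index";
}

// An index written by `writer` is readable by `reader` only if both
// the HDT format (x) and the index format (y) match; z is ignored.
bool indexCompatible(const Version& writer, const Version& reader) {
    return writer.x == reader.x && writer.y == reader.y;
}
```

Note that concatenating x and y without a separator becomes ambiguous once either number has two digits (1.11 and 11.1 would both give v111), so a delimiter between them may be safer.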

Any thoughts?

@RubenVerborgh
Member Author

Makes sense to me (I'd just put the v11 either after the .hdt or .index to avoid any confusion).

@artob
Contributor

artob commented Jun 12, 2016

Has anyone perchance made progress on this front in recent months?

@mielvds self-assigned this Jun 27, 2016

@mielvds
Member

mielvds commented Jun 28, 2016

Would it be acceptable to put the version number in the makefile?

@mielvds
Member

mielvds commented Jun 28, 2016

Nah, I guess that won't work because of other build strategies

@RubenVerborgh
Member Author

Simply put it in an include file?

@mielvds
Member

mielvds commented Jun 29, 2016

Sounds reasonable. Probably this will have to happen in the Java version as well for compatibility?

@mielvds
Member

mielvds commented Jun 29, 2016

@mielvds
Member

mielvds commented Jul 7, 2016

@bendiken fixed in #36, please review

@mielvds
Member

mielvds commented Dec 4, 2016

Fixed with merge of #36

@mielvds closed this as completed Dec 4, 2016

@RubenVerborgh
Member Author

Excellent. Shall we publish and tag a v1.2.0 soon then?

@mielvds
Member

mielvds commented Dec 4, 2016

Depends. Did the index change in a breaking way? Otherwise it's 1.1.2 :)

We should include the versioning strategy in the readme

@RubenVerborgh
Member Author

RubenVerborgh commented Dec 4, 2016

Not sure I agree:

  1. I would want to follow the SemVer convention of minor version = "new backwards-compatible features"
  2. HDTVersion.hpp has places for HDT_VERSION, INDEX_VERSION, RELEASE_VERSION, but nothing implies that this is tied to the version number of the software itself. In my opinion, they should be separate: the software can go through minor and major releases, without changing compatibility with a certain major HDT format.
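
To illustrate the separation argued for in point 2, here is a small sketch in which format compatibility and the software release are tracked independently. It is not the contents of HDTVersion.hpp: the structs `FormatVersion` and `SoftwareVersion`, the functions `indexSuffix`, `toString`, and `bumpMinor`, and the suffix format are all assumptions made for the example.

```cpp
#include <string>

// File-format compatibility: changes only when files on disk become incompatible.
struct FormatVersion {
    int hdt;    // compatibility of the .hdt file
    int index;  // compatibility of the index file
};

// Software release, following SemVer, tracked on its own.
struct SoftwareVersion {
    int majorNum;
    int minorNum;
    int patchNum;
};

// Suffix for the index file; depends on the format version only.
std::string indexSuffix(const FormatVersion& f) {
    return ".index.v" + std::to_string(f.hdt) + "-" + std::to_string(f.index);
}

std::string toString(const SoftwareVersion& s) {
    return std::to_string(s.majorNum) + "." + std::to_string(s.minorNum) + "." +
           std::to_string(s.patchNum);
}

// SemVer minor release: new backwards-compatible features, patch resets to 0.
SoftwareVersion bumpMinor(SoftwareVersion s) {
    ++s.minorNum;
    s.patchNum = 0;
    return s;
}
```

In this arrangement, a minor release such as 1.1.1 to 1.2.0 leaves `indexSuffix` unchanged, so existing index files keep being found and read.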

@wouterbeek
Contributor

wouterbeek commented Dec 4, 2016

Sorry, I might be going off in a very different direction from what you've been discussing here...

I would very much prefer the index and HDT file to be one and the same. It is technically trivial to do so.

IIUC, the only reason why the index and the HDT are not one and the same file is that you can leave out the index when using HDT as a transmission format. However, the index is not that big, so the size benefit is not that large. If we use HDT as a storage format, then the size difference does not matter at all (because disk is so cheap these days).

Having 1 file with no versioning/synchronization overhead between HDT and index would significantly simplify handling HDTs.

@mielvds
Member

mielvds commented Dec 4, 2016

@RubenVerborgh keeping them separate just seems confusing to me. But if this is common practice, by all means.

@wouterbeek from my experience, indexes can be quite large. But I have to admit, it would simplify things.

@wouterbeek
Contributor

@mielvds The index files are between 10% and 40% of the size of the HDT file. There may be exceptions, but this is the ballpark figure IINM.

Having a versioning system is of course better than the current situation where we have to do the bookkeeping ourselves.

@RubenVerborgh
Member Author

RubenVerborgh commented Dec 4, 2016

@mielvds What I suggested is SemVer, which is becoming more and more common. But I don't mind too much in this case; I'd just want a new release sometime soon.

@wouterbeek Not sure I follow the argument of an index to be the same everywhere. We recently had a commit in which the index was improved for certain lookups. And as you know, the index file is not information by itself, as it can be computed in its entirety from the HDT file. In almost all cases, it will be faster to generate it than to download it.
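
Because the index can always be recomputed, a reader that does not find an index for its own version can simply regenerate one. A minimal sketch of that fallback, where `ensureIndex`, `IndexSource`, and the `generate` callback are hypothetical names and the actual index generation is left to the caller:

```cpp
#include <fstream>
#include <functional>
#include <string>

enum class IndexSource { Loaded, Regenerated };

// Uses the index at indexPath if it can be opened; otherwise asks the
// caller-supplied generator to rebuild it from the HDT file.
// indexPath is assumed to already encode the expected index version.
IndexSource ensureIndex(const std::string& indexPath,
                        const std::function<void(const std::string&)>& generate) {
    std::ifstream existing(indexPath, std::ios::binary);
    if (existing.good()) {
        return IndexSource::Loaded;
    }
    generate(indexPath);
    return IndexSource::Regenerated;
}
```

With versioned file names, an index from an incompatible version simply does not match the expected path, so it is ignored rather than misread.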

@RubenVerborgh
Member Author

@mielvds Perhaps wait with releasing a new version until I have resolved this. I suspect something went wrong recently with this codebase.

@mielvds
Member

mielvds commented Dec 5, 2016

@RubenVerborgh sounds reasonable, but we'll need to add that distinction to the code then.

@wouterbeek I couldn't remember the argument against it, and @RubenVerborgh found it. Some indexes are optional, and we don't know which ones will be added in the future.

@RubenVerborgh
Member Author

@mielvds The blocking issue for a new version is #43.

4 participants