New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace libcds with sdsl-lite #19

Open
kjetilk opened this Issue Dec 1, 2015 · 21 comments

Comments

Projects
None yet
10 participants
@kjetilk

kjetilk commented Dec 1, 2015

Hi all!

When I first reviewed hdt-cpp last winter, hoping to find a way to get it packaged for Debian, I noticed the libcds dependency that caused me some grief when compiling (but nevermind that), but that I think would also cause problems for packaging. I made a thread in the forums, but I suppose opening an issue is better for the dev's workflow. :-) I hope my view here isn't outdated. :-)

It seems like libcds is a fork of somebody else's code, and the link in there is dead. It also seems the upstream author has started a new version but seemingly not finished it. The author has made a post about the motivation for it.

I think that to get this code into a distro, and I think that is very important for widespread adoption, it would be needed to not bundle libcds, but keep it separate, so that it can be packaged independently by distros. Also, if any patches are needed, they should be sent there for incorporation. And since the upstream has abandoned version 1 of their library, HDT-CPP should probably move to use libcds2.

Cheers,

Kjetil

@RubenVerborgh

This comment has been minimized.

Member

RubenVerborgh commented Dec 1, 2015

Hi @kjetilk,

I'd definitely be in favor of migrating to libcds2; however, @joachimvh and @laurensdv have looked at it briefly in the past, and it seemed to entail rather major changes in the code. Guys, any more specific comments on that?

Thanks,

Ruben

@laurensdv

This comment has been minimized.

Contributor

laurensdv commented Dec 2, 2015

Hi yes, I remember we discussed it briefly there.

Switching this library is likely to have an impact on the entire hdt-cpp library.
I am not sure if there are any plans by the RDF-HDT team to still do so.

Since then nothing has changed as far as I know.

@joachimvh

This comment has been minimized.

Contributor

joachimvh commented Dec 2, 2015

From quickly looking the the second version of the library the interfaces seem te same (except they now capitalize all function names for some reason) so using the new version might be viable. Assuming the interfaces still do the same of course and the entire library gets implemented.

@kjetilk

This comment has been minimized.

kjetilk commented Dec 2, 2015

@laurensdv Are you sure you're not confusing the Compact Data Structure library, with the Concurrent Data Structure Library? The former is the one used in HDT-CPP, the latter is something else entirely, but they both call themselves libcds. :-)

@laurensdv

This comment has been minimized.

Contributor

laurensdv commented Dec 2, 2015

Yes, in the beginning I did mix them up (check this confusion and later fix here: https://github.com/laurensdv/hdt-cpp/tree/libcds2), but nonetheless:
https://github.com/rdfhdt/hdt-cpp/tree/master/libcds-v1.0.12
should be exactly the same as
https://github.com/fclaude/libcds
of which indeed
https://github.com/fclaude/libcds2
is the newer version
switching these libraries gave strange results during compilation and as the original developers announced a new version (back then) I stopped this track.

:)

and to make the confusion complete, there is also libCSD were there is a new version under development:
https://github.com/migumar2/libCSD
which was supposed to be integrated with hdt-cpp, and should have an important impact on the library as well.
Which was then unfortunately misspelled in the HDT-cpp source code as DCS:
https://github.com/laurensdv/hdt-cpp/tree/master/hdt-lib/src/libdcs (the source files inside are indeed CSD -> which is I think the StringDictionary in the new version)

@kjetilk

This comment has been minimized.

kjetilk commented Dec 2, 2015

Uh-oh, OK! This is fairly messy. ;-) So, every reason to clear it up, then, and to get into into a distro, it would be needed to do it one way or the other.

I did a diff with the upstream libcsd and the libcsd in this repo, and the diff, filtered for files that only in one or the other, is still quite substantial, see libcdsupstream.patch.txt.

I don't know what these differences are, they could be upstream changes. That's another option, I guess, bring it into sync with upstream libcds 1, so that it could be packaged separately. That's the bottom line, really, it must be possible to package different libraries from different authors separately.

@laurensdv

This comment has been minimized.

Contributor

laurensdv commented Dec 2, 2015

Indeed,
but note that the libCSD which is also a library that should be separated also has a reference to libcds (version 1?) embedded with the source code rather than as a separated package.

https://github.com/migumar2/libCSD/tree/master/libcds

I did not find a repo that has the same cources as the embedded libCSD's

so most likely https://github.com/laurensdv/hdt-cpp/tree/master/hdt-lib/src/libdcs (libCSD) also depends on libcds, but as this dependency is embedded in the source code (or via the makefile) it probably wasn't necessary to include it within the subfolder of this library.

@MarioAriasGa

This comment has been minimized.

Member

MarioAriasGa commented Dec 2, 2015

Hi guys,

First thanks for your contributions :-)

I think that the libcds and libcds2 are similar in essence but I'm not sure whether the libcds2 implements everything we need, in particular the FM-Index that is used from the linkeddatafragments to do text queries.

I'll ping Miguel Angel to see if he knows which one is better to clean.

Mario.

@laurensdv

This comment has been minimized.

Contributor

laurensdv commented Dec 2, 2015

Hi Mario,

I think the FM-index implemented by the libCSD library (not libcds):

new: https://github.com/migumar2/libCSD/blob/master/StringDictionaryFMINDEX.h
current: https://github.com/rdfhdt/hdt-cpp/blob/master/hdt-lib/src/libdcs/CSD_FMIndex.h

and obviously both of these depend on libcds 1
new: https://github.com/migumar2/libCSD/tree/master/libcds
current: https://github.com/laurensdv/hdt-cpp/tree/master/hdt-lib/src/libdcs (libCSD) is in the same source folder as hdt-lib so probably linked against the same libcds (this one: https://github.com/rdfhdt/hdt-cpp/tree/master/libcds-v1.0.12)

which have in essence not much changed:
https://github.com/migumar2/libCSD/blob/master/FMIndex/SSA.h
vs
https://github.com/rdfhdt/hdt-cpp/blob/master/hdt-lib/src/libdcs/fmindex/SSA.h

both seem to be using following parts of the libcds (1) library:

#include <SequenceBuilder.h>
#include <Sequence.h>
#include <BitSequenceBuilder.h>
#include <BitSequence.h>

#include <Mapper.h>

in the new libcds 2 these are unchanged except for the capitalization:
https://github.com/fclaude/libcds2/tree/master/include/libcds2/immutable

but comparing e.g.
https://github.com/fclaude/libcds/blob/master/include/BitSequence.h
with
https://github.com/fclaude/libcds2/blob/master/include/libcds2/immutable/bitsequence.h

you can see that the entire underlying data structure and interfacing has changed
for example change from size_t to cds_word and many other things.

so if libcds 1 and 2 are similar in essence the FMIndex files in both the new and current version should be convertable to the new version no?

@kjetilk

This comment has been minimized.

kjetilk commented Dec 2, 2015

I'll just say thanks for putting this on the radar! When it gets to these details, I don't have anything more useful to contribute, so I'll leave it to you to see if you can do something sensible about it as time allows. :-)

@kjetilk

This comment has been minimized.

kjetilk commented Dec 2, 2015

Since Debian is my main motivation for pestering you about this, I'll just note that the next relevant deadline is the Debian Import Freeze of Ubuntu on Feb 18th. If you'd like to have HDT in the next Ubuntu release, it has to go into Debian Sid ten days before that. To do that, it must be submitted by an established Debian developer into the NEW queue for approval. My own packages have gone through the NEW queue of Debian in a couple of days, but for it to be stuck there for a couple of months are known to happen too. Still, it seems there is a good chance of getting it into Ubuntu if it is in shape by early January. I have someone I can ask about packaging it.

@kjetilk

This comment has been minimized.

kjetilk commented Nov 28, 2016

Just to leave another heads-up, Debian has a release coming up with a freeze for new packages just in the beginning of the new year.

Would have been great for the future if the issues that are preventing packaging could be ironed out. Perhaps a label for that could be created?

@RubenVerborgh

This comment has been minimized.

Member

RubenVerborgh commented Nov 28, 2016

Good suggestion; created the debian-package label for this purpose.

Perhaps we should make it a milestone as well.

@mielvds mielvds added this to the Late 2017 milestone Jan 12, 2017

@kjetilk

This comment has been minimized.

kjetilk commented Oct 8, 2018

Seems like both the libcds libraries are unmaintained now... :-( That seems like a problem for the long-term maintainability of the system. Are there any maintained libraries that could be used on their place?

@akuckartz

This comment has been minimized.

akuckartz commented Oct 9, 2018

@fclaude Do you have any suggestions regarding libcds / libcds2 ?

@akuckartz

This comment has been minimized.

akuckartz commented Oct 9, 2018

Maybe https://github.com/simongog/sdsl-lite/ is an alternative?

@fclaude

This comment has been minimized.

fclaude commented Oct 10, 2018

@kjetilk

This comment has been minimized.

kjetilk commented Oct 10, 2018

Thanks a lot for that update, @fclaude ! I see sdsl-lite is already in Debian, I think this could be a very good thing for HDT. Perhaps change the title of this to "Migrate to sdsl-lite"?

@kjetilk

This comment has been minimized.

kjetilk commented Oct 10, 2018

BTW, on the topic of Debian, they seem to have a freeze in February, so it needs to be in well before that.

@jonassmedegaard

This comment has been minimized.

jonassmedegaard commented Nov 30, 2018

Hi, I am Debian developer and work directly with @kjetilk on getting semantic web related projects into Debian.

In case there are any questions specifically about integration with Debian (and, by extension, with Ubuntu) I am now following this discussion and happy to help where I can here.

@kjetilk kjetilk changed the title from Clean up libcds dependency to Replace libcds with sdsl-lite Nov 30, 2018

@kjetilk

This comment has been minimized.

kjetilk commented Nov 30, 2018

I changed the title to reflect what seems to be the actual step forward. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment