Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace libcds with sdsl-lite #19

Open
kjetilk opened this issue Dec 1, 2015 · 23 comments
Open

Replace libcds with sdsl-lite #19

kjetilk opened this issue Dec 1, 2015 · 23 comments

Comments

@kjetilk
Copy link

kjetilk commented Dec 1, 2015

Hi all!

When I first reviewed hdt-cpp last winter, hoping to find a way to get it packaged for Debian, I noticed the libcds dependency that caused me some grief when compiling (but nevermind that), but that I think would also cause problems for packaging. I made a thread in the forums, but I suppose opening an issue is better for the dev's workflow. :-) I hope my view here isn't outdated. :-)

It seems like libcds is a fork of somebody else's code, and the link in there is dead. It also seems the upstream author has started a new version but seemingly not finished it. The author has made a post about the motivation for it.

I think that to get this code into a distro, and I think that is very important for widespread adoption, it would be needed to not bundle libcds, but keep it separate, so that it can be packaged independently by distros. Also, if any patches are needed, they should be sent there for incorporation. And since the upstream has abandoned version 1 of their library, HDT-CPP should probably move to use libcds2.

Cheers,

Kjetil

@RubenVerborgh
Copy link
Member

Hi @kjetilk,

I'd definitely be in favor of migrating to libcds2; however, @joachimvh and @laurensdv have looked at it briefly in the past, and it seemed to entail rather major changes in the code. Guys, any more specific comments on that?

Thanks,

Ruben

@laurensdv
Copy link
Contributor

Hi yes, I remember we discussed it briefly there.

Switching this library is likely to have an impact on the entire hdt-cpp library.
I am not sure if there are any plans by the RDF-HDT team to still do so.

Since then nothing has changed as far as I know.

@joachimvh
Copy link
Contributor

From quickly looking the the second version of the library the interfaces seem te same (except they now capitalize all function names for some reason) so using the new version might be viable. Assuming the interfaces still do the same of course and the entire library gets implemented.

@kjetilk
Copy link
Author

kjetilk commented Dec 2, 2015

@laurensdv Are you sure you're not confusing the Compact Data Structure library, with the Concurrent Data Structure Library? The former is the one used in HDT-CPP, the latter is something else entirely, but they both call themselves libcds. :-)

@laurensdv
Copy link
Contributor

Yes, in the beginning I did mix them up (check this confusion and later fix here: https://github.com/laurensdv/hdt-cpp/tree/libcds2), but nonetheless:
https://github.com/rdfhdt/hdt-cpp/tree/master/libcds-v1.0.12
should be exactly the same as
https://github.com/fclaude/libcds
of which indeed
https://github.com/fclaude/libcds2
is the newer version
switching these libraries gave strange results during compilation and as the original developers announced a new version (back then) I stopped this track.

:)

and to make the confusion complete, there is also libCSD were there is a new version under development:
https://github.com/migumar2/libCSD
which was supposed to be integrated with hdt-cpp, and should have an important impact on the library as well.
Which was then unfortunately misspelled in the HDT-cpp source code as DCS:
https://github.com/laurensdv/hdt-cpp/tree/master/hdt-lib/src/libdcs (the source files inside are indeed CSD -> which is I think the StringDictionary in the new version)

@kjetilk
Copy link
Author

kjetilk commented Dec 2, 2015

Uh-oh, OK! This is fairly messy. ;-) So, every reason to clear it up, then, and to get into into a distro, it would be needed to do it one way or the other.

I did a diff with the upstream libcsd and the libcsd in this repo, and the diff, filtered for files that only in one or the other, is still quite substantial, see libcdsupstream.patch.txt.

I don't know what these differences are, they could be upstream changes. That's another option, I guess, bring it into sync with upstream libcds 1, so that it could be packaged separately. That's the bottom line, really, it must be possible to package different libraries from different authors separately.

@laurensdv
Copy link
Contributor

Indeed,
but note that the libCSD which is also a library that should be separated also has a reference to libcds (version 1?) embedded with the source code rather than as a separated package.

https://github.com/migumar2/libCSD/tree/master/libcds

I did not find a repo that has the same cources as the embedded libCSD's

so most likely https://github.com/laurensdv/hdt-cpp/tree/master/hdt-lib/src/libdcs (libCSD) also depends on libcds, but as this dependency is embedded in the source code (or via the makefile) it probably wasn't necessary to include it within the subfolder of this library.

@MarioAriasGa
Copy link
Member

Hi guys,

First thanks for your contributions :-)

I think that the libcds and libcds2 are similar in essence but I'm not sure whether the libcds2 implements everything we need, in particular the FM-Index that is used from the linkeddatafragments to do text queries.

I'll ping Miguel Angel to see if he knows which one is better to clean.

Mario.

@laurensdv
Copy link
Contributor

Hi Mario,

I think the FM-index implemented by the libCSD library (not libcds):

new: https://github.com/migumar2/libCSD/blob/master/StringDictionaryFMINDEX.h
current: https://github.com/rdfhdt/hdt-cpp/blob/master/hdt-lib/src/libdcs/CSD_FMIndex.h

and obviously both of these depend on libcds 1
new: https://github.com/migumar2/libCSD/tree/master/libcds
current: https://github.com/laurensdv/hdt-cpp/tree/master/hdt-lib/src/libdcs (libCSD) is in the same source folder as hdt-lib so probably linked against the same libcds (this one: https://github.com/rdfhdt/hdt-cpp/tree/master/libcds-v1.0.12)

which have in essence not much changed:
https://github.com/migumar2/libCSD/blob/master/FMIndex/SSA.h
vs
https://github.com/rdfhdt/hdt-cpp/blob/master/hdt-lib/src/libdcs/fmindex/SSA.h

both seem to be using following parts of the libcds (1) library:

#include <SequenceBuilder.h>
#include <Sequence.h>
#include <BitSequenceBuilder.h>
#include <BitSequence.h>

#include <Mapper.h>

in the new libcds 2 these are unchanged except for the capitalization:
https://github.com/fclaude/libcds2/tree/master/include/libcds2/immutable

but comparing e.g.
https://github.com/fclaude/libcds/blob/master/include/BitSequence.h
with
https://github.com/fclaude/libcds2/blob/master/include/libcds2/immutable/bitsequence.h

you can see that the entire underlying data structure and interfacing has changed
for example change from size_t to cds_word and many other things.

so if libcds 1 and 2 are similar in essence the FMIndex files in both the new and current version should be convertable to the new version no?

@kjetilk
Copy link
Author

kjetilk commented Dec 2, 2015

I'll just say thanks for putting this on the radar! When it gets to these details, I don't have anything more useful to contribute, so I'll leave it to you to see if you can do something sensible about it as time allows. :-)

@kjetilk
Copy link
Author

kjetilk commented Dec 2, 2015

Since Debian is my main motivation for pestering you about this, I'll just note that the next relevant deadline is the Debian Import Freeze of Ubuntu on Feb 18th. If you'd like to have HDT in the next Ubuntu release, it has to go into Debian Sid ten days before that. To do that, it must be submitted by an established Debian developer into the NEW queue for approval. My own packages have gone through the NEW queue of Debian in a couple of days, but for it to be stuck there for a couple of months are known to happen too. Still, it seems there is a good chance of getting it into Ubuntu if it is in shape by early January. I have someone I can ask about packaging it.

@kjetilk
Copy link
Author

kjetilk commented Nov 28, 2016

Just to leave another heads-up, Debian has a release coming up with a freeze for new packages just in the beginning of the new year.

Would have been great for the future if the issues that are preventing packaging could be ironed out. Perhaps a label for that could be created?

@RubenVerborgh
Copy link
Member

Good suggestion; created the debian-package label for this purpose.

Perhaps we should make it a milestone as well.

@mielvds mielvds added this to the Late 2017 milestone Jan 12, 2017
@kjetilk
Copy link
Author

kjetilk commented Oct 8, 2018

Seems like both the libcds libraries are unmaintained now... :-( That seems like a problem for the long-term maintainability of the system. Are there any maintained libraries that could be used on their place?

@akuckartz
Copy link

@fclaude Do you have any suggestions regarding libcds / libcds2 ?

@akuckartz
Copy link

Maybe https://github.com/simongog/sdsl-lite/ is an alternative?

@fclaude
Copy link

fclaude commented Oct 10, 2018 via email

@kjetilk
Copy link
Author

kjetilk commented Oct 10, 2018

Thanks a lot for that update, @fclaude ! I see sdsl-lite is already in Debian, I think this could be a very good thing for HDT. Perhaps change the title of this to "Migrate to sdsl-lite"?

@kjetilk
Copy link
Author

kjetilk commented Oct 10, 2018

BTW, on the topic of Debian, they seem to have a freeze in February, so it needs to be in well before that.

@jonassmedegaard
Copy link

Hi, I am Debian developer and work directly with @kjetilk on getting semantic web related projects into Debian.

In case there are any questions specifically about integration with Debian (and, by extension, with Ubuntu) I am now following this discussion and happy to help where I can here.

@kjetilk kjetilk changed the title Clean up libcds dependency Replace libcds with sdsl-lite Nov 30, 2018
@kjetilk
Copy link
Author

kjetilk commented Nov 30, 2018

I changed the title to reflect what seems to be the actual step forward. :-)

@donpellegrino
Copy link
Contributor

Attempting to replace libcds with sdsl-lite, I see that one area that needs work is the compressed string dictionaries implementation. Focusing on "libhdt/src/libdcs/fmindex/SSA.h" I see the set of headers mentioned in #19 (comment). However it is not clear to me how everything might map to new elements of the SDSL Lite library. I could use some advise on how libcds compares with SDSL Lite.

Headers to change (assuming I get the class changes right):

  • <SequenceBuilder.h> -> <sdsl/vectors.hpp>
  • <Sequence.h> -> <sdsl/vectors.hpp>
  • <BitSequenceBuilder.h> -> <sdsl/vectors.hpp>
  • <BitSequence.h> -> <sdsl/vectors.hpp>
  • <Mapper.h> -> <sdsl/vectors.hpp>

The typedef uchar can just be replaced with explicit code:

  • uchar -> unsigned char

Class changes:

  • SequenceBuilder -> int_vector<?>
  • BitSequenceBuilder -> bit_vector
  • Sequence -> int_vector<?>
  • BitSequence -> bit_vector

It seems that if the class changes can be figured out, then I can focus on the methods and aligning the code to use methods of the new classes.

Is it reasonable to expect that these libcds classes could be swapped out one-for-one with SDSL Lite classes, or am I missing a more fundamental difference in how libcds and SDSL Lite approach sequences?

donpellegrino pushed a commit to donpellegrino/hdt-cpp that referenced this issue Apr 4, 2022
@donpellegrino
Copy link
Contributor

One thing to note here is that switching to SDSL Lite seems to require changing the license for HDT-CPP. Currently, HDT-CPP is using the LGPL license per https://github.com/rdfhdt/hdt-cpp#license. SDSL Lite uses GPLv3 per https://github.com/simongog/sdsl-lite#licensing. Per https://www.gnu.org/licenses/gpl-faq.en.html#IfLibraryIsGPL linking HDT-CPP to SDSL Lite would require that HDT-CPP be licensed under the more restrictive GPL instead of LGPL.

libcds is LGPL per https://github.com/fclaude/libcds/blob/master/COPYING

donpellegrino pushed a commit to donpellegrino/hdt-cpp that referenced this issue Apr 5, 2022
… to sdsl-lite. This commit does not link correctly and is not expected to run correctly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests