Wrapping C/C++ libraries #5

Open
jeroen opened this issue Apr 19, 2017 · 13 comments

Comments

@jeroen
Member

jeroen commented Apr 19, 2017

If people know of any useful C/C++ libs that would be nice to wrap into an R package, I am happy to assist with that!

@dselivanov

dselivanov commented Apr 19, 2017

@jeroen I have several in my list.

  1. Compact Language Detector 2. It has zero dependencies, so it should not be too hard to wrap.
  2. POS tagger. I believe the "lookahead" algorithm looks promising and is easily extendable to many languages. I'm aware of two repos: cltk/lapos and brunexgeek/nlp-tools.
  3. bigartm: a flexible non-Bayesian framework for topic modeling that generalizes LDA and PLSA.

@kbenoit
Collaborator

kbenoit commented Apr 19, 2017

Here's a parser and tagger written in C++ that could be wrapped in an R package:
http://www.cs.cmu.edu/~ark/TurboParser/

@benmarwick

I'd be keen to see Dynamic Topic Models (https://github.com/blei-lab/dtm) available in R. It's a major library by David Blei for analysing how topics change over time, an extension of LDA.

@lmullen
Member

lmullen commented Apr 19, 2017

👍 to @benmarwick's suggestion of Dynamic Topic Models.

@dselivanov

Added bigartm, a non-Bayesian framework for topic modeling. Online, parallel, asynchronous, very flexible. Actively developed.

@jeroen
Member Author

jeroen commented Jun 2, 2017

For those still following this thread: I have wrapped up Compact Language Detector 2 into an R package. Give it a go and let me know if it works: https://github.com/ropensci/cld2#readme

@dselivanov

Thanks @jeroen, will do.

@kbenoit
Collaborator

kbenoit commented Jun 3, 2017

Awesome! I'm running some tests now.

@jeroen
Member Author

jeroen commented Jun 4, 2017

OK, cld2 is on CRAN now; I will do a v1.1 next week. Let's see what else we've got here :)

@jeroen
Member Author

jeroen commented Jun 6, 2017

I had a look at dtm, but unfortunately the code is too broken to wrap in R. It produces all kinds of compiler warnings and doesn't build on Windows at all. It also no longer seems to be actively maintained.

@jeroen
Member Author

jeroen commented Jun 9, 2017

The cld3 package is now on CRAN as well. It would be fun to see someone who is into text analysis compare cld2 and cld3 on real data.

@kbenoit
Collaborator

kbenoit commented Jun 9, 2017

How about unRTF (https://www.gnu.org/software/unrtf/), as in quanteda/readtext#90?

@jeroen
Member Author

jeroen commented Jun 9, 2017

OK, here is a wrapper for unrtf: https://github.com/ropensci/unrtf


5 participants