Add property to provide custom resources directory #9

merged 1 commit into ixa-ehu:master on May 8, 2017



commented May 6, 2017

Currently, the only way to provide linguistic resources (e.g. non-breaker files) to the tokenizer is to package the files inside the tokenizer jar.

I currently work on the Fortis project [1], a social data analysis platform for the United Nations. For this project, we'd like to be able to update the linguistic resources without redeploying a new jar file, so we need a way to provide them to the tokenizer without changing the jar. This pull request implements such a mechanism by adding a new property called resourcesDirectory. If this property is set to an existing directory, the tokenizer will try to load the linguistic resources from that directory instead of from the jar file.
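To illustrate the lookup order described above, here is a minimal sketch of how such a loader could behave. It is not the code from this pull request: only the property name resourcesDirectory comes from the description, while the class `ResourceLocator`, the method `open`, and the per-file fallback to the classpath are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of the tokenizer: resolves a linguistic
// resource from an external directory first, then from the jar.
final class ResourceLocator {

    private ResourceLocator() {
    }

    static InputStream open(Properties props, String name) throws IOException {
        // "resourcesDirectory" is the property named in the PR description.
        String dir = props.getProperty("resourcesDirectory");
        if (dir != null) {
            Path root = Paths.get(dir);
            // Only use the directory if it actually exists.
            if (Files.isDirectory(root)) {
                Path candidate = root.resolve(name);
                if (Files.isRegularFile(candidate)) {
                    return Files.newInputStream(candidate);
                }
            }
        }
        // Assumed fallback: the copy packaged on the classpath (inside the jar).
        InputStream bundled =
            ResourceLocator.class.getClassLoader().getResourceAsStream(name);
        if (bundled == null) {
            throw new IOException("Resource not found: " + name);
        }
        return bundled;
    }
}
```

With a loader shaped like this, swapping in an updated non-breaker file only requires replacing the file in the configured directory; the jar itself stays untouched.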

Another benefit of this change is that it makes it easier for users to comply with the license of the linguistic resources, the LGPL-LR [2]. Because the LGPL-LR resources no longer need to be bundled with the tokenizer code, the application counts as a "work that uses the Linguistic Resource" and as such falls outside the scope of the license.


@ragerri merged commit d494985 into ixa-ehu:master on May 8, 2017



commented May 8, 2017

Thanks for your contribution!
