This repository has been archived by the owner on Sep 3, 2018. It is now read-only.

Make alternate to writing of file to run #16

Closed
ajdocherty opened this issue May 12, 2017 · 6 comments

Comments

@ajdocherty

Writing a new file to run this plugin causes the system to break in a multithreaded production environment. Furthermore, the way the plugin resources are loaded means that there always has to be a physical file to point to, which does not work when the resources are bundled inside JARs.

Please add an alternative run strategy that loads from resources and does not write this unnecessary file.

@johann-petrak
Owner

Could you please be more specific about what exactly breaks when and what your suggestion would be to change or add functionality?

@ajdocherty
Author

We cannot point the plugin to bundled resources because it expects a physical file, probably because of the file the plugin creates in order to run (the binary lookup file). So for now we have taken the plugin out of our pipeline: building the project with Maven and then running the service seems to be problematic, given that the plugin creates a file on the file system. An option to load the gazetteer normally, without this dependency on a physical file, would be great.

@johann-petrak
Owner

OK, so you mean some mechanism where the gazetteer list can be loaded from a JAR or some other URL?
The gazetteer builds a highly optimized trie data structure from the original lists, which takes a while. For this reason, the data structure is written to a .gazbin file as a cache. Without this cache, the optimization and compilation into the trie would have to be redone every time the lists are loaded.
Could you give an example for how you would imagine specifying the gazetteer lists in a way that would be compatible with your deployment requirements?
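The cache trade-off described above (build the trie once, reuse a cache file on later loads) can be sketched in Java. This is a minimal, hypothetical illustration, not the plugin's actual trie or its .gazbin format: `CachedTrie`, `build` and `loadOrBuild` are invented names, the trie is a plain (unoptimized) character trie, and standard Java serialization stands in for whatever format the plugin really uses. It also shows one way to keep working when the cache cannot be written, which is the situation reported in this issue.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT the plugin's trie or its .gazbin format: a plain
// character trie that is built from list entries and optionally cached.
final class CachedTrie implements Serializable {
    private static final long serialVersionUID = 1L;

    private static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a list entry ends at this node
    }

    private final Node root = new Node();

    private void add(String entry) {
        Node node = root;
        for (char c : entry.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.terminal = true;
    }

    // Exact-match lookup of a complete entry.
    boolean contains(String phrase) {
        Node node = root;
        for (char c : phrase.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false;
            }
        }
        return node.terminal;
    }

    // The expensive step that a cache file is meant to avoid repeating.
    static CachedTrie build(Collection<String> entries) {
        CachedTrie trie = new CachedTrie();
        for (String entry : entries) {
            trie.add(entry);
        }
        return trie;
    }

    // Reuses a cache file if one can be read; otherwise builds the trie and
    // tries to write the cache. A null cacheFile, or any failure to write
    // (e.g. a read-only deployment), leaves a purely in-memory trie.
    // Java deserialization is only safe for cache files this code wrote itself.
    static CachedTrie loadOrBuild(Collection<String> entries, Path cacheFile) {
        if (cacheFile != null && Files.isReadable(cacheFile)) {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(cacheFile))) {
                return (CachedTrie) in.readObject();
            } catch (IOException | ClassNotFoundException | ClassCastException e) {
                // Corrupt or incompatible cache: fall through and rebuild.
            }
        }
        CachedTrie trie = build(entries);
        if (cacheFile != null) {
            Path tmp = null;
            try {
                // Write to a temporary file first, then rename it into place, so
                // concurrent readers never see a half-written cache.
                tmp = Files.createTempFile(cacheFile.toAbsolutePath().getParent(), "trie", ".tmp");
                try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
                    out.writeObject(trie);
                }
                Files.move(tmp, cacheFile, StandardCopyOption.ATOMIC_MOVE);
            } catch (IOException e) {
                if (tmp != null) {
                    try { Files.deleteIfExists(tmp); } catch (IOException ignored) { }
                }
            }
        }
        return trie;
    }
}
```

Writing to a temporary file and renaming it is one common way to avoid other threads or processes reading a partially written cache. Once `loadOrBuild` returns, the trie is only read; publish it safely (for example through a `final` field) before sharing it across threads.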

@ajdocherty
Author

The default gazetteers suffice, so if the optimized trie cannot be built without this cache, then please provide an option for a normal, unoptimized, GATE-style lookup in which resources can be loaded from the JAR.

By the way, see https://github.com/npgall/concurrent-trees for an implementation of efficient, thread-safe in-memory tries, in case you want to explore an alternative to writing a cache file.
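A minimal sketch of the requested "unoptimized lookup with resources loaded from the JAR", in Java since GATE is a Java framework. `ClasspathGazetteer`, `fromClasspath` and `fromStream` are hypothetical names, not GATE or plugin API. The sketch assumes a plain-text list with one entry per line (no GATE feature syntax) and only does exact-match lookups of complete phrases, unlike a real gazetteer that annotates matches in running text. It relies on `Class.getResourceAsStream`, which reads a resource from inside a JAR without needing a file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class, not part of GATE or the gazetteer plugin. Assumes a
// plain-text list with one entry per line.
final class ClasspathGazetteer {
    // Final field holding an unmodifiable set: after construction, instances
    // can be shared between threads for read-only lookups.
    private final Set<String> entries;

    private ClasspathGazetteer(Set<String> entries) {
        this.entries = Collections.unmodifiableSet(entries);
    }

    // Loads a list bundled in the application JAR (or anywhere on the
    // classpath); nothing is read from or written to the file system.
    static ClasspathGazetteer fromClasspath(String resourcePath) {
        InputStream in = ClasspathGazetteer.class.getResourceAsStream(resourcePath);
        if (in == null) {
            throw new IllegalArgumentException("Resource not found: " + resourcePath);
        }
        return fromStream(in);
    }

    // Reads one entry per line, trimming whitespace and skipping blank lines.
    static ClasspathGazetteer fromStream(InputStream in) {
        Set<String> loaded = new HashSet<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String entry = line.trim();
                if (!entry.isEmpty()) {
                    loaded.add(entry);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ClasspathGazetteer(loaded);
    }

    boolean contains(String phrase) {
        return entries.contains(phrase);
    }

    int size() {
        return entries.size();
    }
}
```

Usage would look like `ClasspathGazetteer.fromClasspath("/gazetteer/cities.lst")`, where the resource path is a made-up example. A hash set trades the memory savings of a compiled trie for simplicity: nothing is cached, so the lists are re-read on every load.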

@johann-petrak
Owner

The optimized trie created by the gazetteer PR (processing resource) is already thread-safe.
If you do not need the optimization (which increases both memory efficiency for huge gazetteer lists and lookup speed), then maybe the default gazetteer included in the standard GATE distribution is a better option?
There will be some kind of support for loading resources from a JAR eventually, but this will have to wait until after the first pre-release of the next GATE version.

For this issue, I need a concrete description of what you propose should be changed or implemented, so that developers can decide whether to implement and accept it.

@ajdocherty
Author

No worries, we'll just use the default gazetteer. Thanks!
