Example code for Spacy Entity Linking? #4511
Comments
@davidbernat I think these two pointers should help you out: |
Thanks. I'd read those previously. There is no pre-trained model for download? This surprises me. |
Link #2 talks about downloading the Wikidata KB. Are you talking about something else? |
Item 2 discusses downloading the spaCy Wikipedia knowledge base and using the spaCy training paradigm to train a model. It does not describe downloading a pre-trained model, which surprises me. Surely a trained model already exists somewhere, no? Such a pre-trained model would also be useful as a starting point when creating other knowledge bases. |
@AmoghM Can you clarify how step 2 is run? I find the instructions confusing. Do train instances and dev instances need to be set? It says it's set to 90/10 by default in the file wikidata_train_entity_linker.py's annotations but looking at the code and the command line output, the script seems to bork out if they're not set. I'm running |
@davidbernat on the original question: the NEL functionality is still in a sort of beta phase, as we're still working on refining the data & models. All the APIs etc. have been implemented though, and can be used for early experimentation. This does indeed mean training your own model on Wikipedia/Wikidata dumps that you have to download, as detailed in the readme file linked to by @AmoghM. @petulla: I'm not entirely sure what you mean by "the script seems to bork out", but the script should indeed work if |
@svlandeg Thanks Sofie! And great presentations by you on the feature floating around the web!
So that I could know that I am training to approximately state-of-the-art (or state-of-your-art). The same information for a typical pre-training warm start for a new model would be superbly helpful.
I am really looking to include this feature in a release I am making this weekend.
Would love to discuss NLP via email. If you're open to it. Thanks! |
Happy to hear you've enjoyed the presentations! It's also good to see the community interest in the EL work. Unfortunately it's taking us a little longer than expected to get a proper model out, as we've run into several issues with the data, also want to add coreference resolution, etc. This is the reason why we haven't officially released this yet... but any feedback on the current implementation is of course very welcome!
Not one I'm satisfied with, no, hence the work-in-progress ;-) |
Oh pish posh! :-) What do they say? Show your drafts early? ;-) OK. What co-ref are you using? Hugging Face's impressed me with its demo accuracy. If you change your mind re: models, please email me. Will be appreciated. |
Yes - we're actually working together with Hugging Face for keeping And I do show my drafts early - it's all on the current |
I'm confused. Are you saying there is a trained model on the master branch? |
Hi, I'm trying to train a model using the scripts in https://github.com/explosion/spaCy/tree/master/bin/wiki_entity_linking. |
Hi, great work on making EL more accessible! I'm also trying to train a model for the Finnish language and I'm confused about the "model" parameter that wikidata_pretrain_kb.py requires. What is this "model"? Thanks in advance! |
I'm really excited to see this work happening! Maybe I'm being too eager to use this WIP, but I've tried twice to run wikidata_train_entity_linker.py and each time, after 36 hours or so, I get a memory error:
[1]+ python ./bin/wiki_entity_linking/wikidata_train_entity_linker.py ~/projects/nel/out/ &
$ python ./bin/wiki_entity_linking/wikidata_train_entity_linker.py ~/projects/nel/out/
2019-11-05 18:28:42,575 - INFO - __main__ - Creating Entity Linker with Wikipedia and WikiData
2019-11-05 18:28:42,575 - INFO - __main__ - STEP 1a: Loading model from /Users/cwulfman/projects/nel/out/nlp_kb
2019-11-05 18:28:59,520 - INFO - __main__ - STEP 1b: Loading KB from /Users/cwulfman/projects/nel/out/kb
2019-11-05 18:29:09,389 - INFO - __main__ - STEP 2: Reading training dataset from /Users/cwulfman/projects/nel/out/gold_entities.jsonl
2019-11-05 18:29:09,389 - INFO - bin.wiki_entity_linking.wikipedia_processor - Reading train data with limit None
[edited]
1122567it [17:20:10, 66.05it/s]
/Users/cwulfman/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/resource_tracker.py:203: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Is there anything I can do to fix this, or should I just be patient and wait for @svlandeg to release a model she's happy with? 😉 |
My training script keeps failing as well (with 256GB RAM). It loads |
Try limiting the number of training examples, as in |
That seems to be working – it feels weird to have memory problems, even after having upgraded to 312GB RAM. I guess I'll have to find the optimal balance of RAM and training data size. |
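For anyone hitting the same memory wall: the idea of capping the training data can be sketched independently of the spaCy scripts. The snippet below is a minimal illustration, not the code in wikidata_train_entity_linker.py. It assumes a JSONL file with one record per line (like the gold_entities.jsonl mentioned in the log above), and the names read_limited, split_train_dev, and train_percent are hypothetical helpers made up for this example. It reads at most a fixed number of records lazily, then applies a 90/10 train/dev split like the default mentioned earlier in the thread.

```python
import io
import itertools
import json


def read_limited(lines, limit=None):
    """Parse at most `limit` JSONL records; limit=None reads everything.

    Hypothetical helper for illustration, not part of spaCy.
    islice stops pulling lines once the cap is reached, so the rest of
    the file is never loaded into memory.
    """
    capped = itertools.islice(lines, limit)
    return [json.loads(line) for line in capped if line.strip()]


def split_train_dev(records, train_percent=90):
    """Split records into train/dev parts using integer arithmetic."""
    cut = len(records) * train_percent // 100
    return records[:cut], records[cut:]


# Toy stand-in for a large JSONL file (assumption: one JSON object per line).
fake_file = io.StringIO(
    "".join(json.dumps({"id": i}) + "\n" for i in range(1000))
)

records = read_limited(fake_file, limit=100)
train, dev = split_train_dev(records)
print(len(train), len(dev))
```

With a real file you would pass the open file handle instead of fake_file. Since memory use grows with the number of records kept, the cap is the knob to tune against the RAM you have.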
Even if the official entity linking model is not perfect yet, could you please release a version for demo purposes? |
@StudyExchange if you need an entity linking model and are OK with something slightly dated and pre-baked, the dbpedia spotlight API might suffice. https://www.dbpedia-spotlight.org/demo/ |
Thank you! DBpedia is OK for getting a basic feel for entity linking. |
Here, model can be any spaCy language model, for example 'en_core_web_lg'. |
@alepiscopo, @cwulfman, @petulla, @kevingeng et al. (also in response to Issue #4544): PR #4811 addresses the memory requirements for training the Entity Linking pipe. There is also now a more informative progress bar that lets you estimate how long one epoch will take. If there are any more issues after merging or trying this PR, please feel free to open a new issue. [Edit 6 April 2020]: this was the original command I ran:
|
@svlandeg Thanks so much for this amazing work! I am in the process of testing the Entity Linking and creating a custom model. Is there an actual example of how to run it? Would you guys mind providing an example of how to run the |
I'm not sure what you mean by an example on how to run the script - you just need to fill in the appropriate parameters and it'll run for you. See also the readme file at https://github.com/explosion/spaCy/tree/master/bin/wiki_entity_linking. |
For a quick test (only using a small part of the data to see whether the script runs), use |
@svlandeg Thanks for your quick response. I am referring to the somewhat unclear part of the documentation in the README.
Thanks for your quick test sample parameters :) |
|
Fixed it, other environments issue. FYI for all the people interested I am running with: |
Apologies for what is likely a simple failure to find the right documentation. I understand spaCy recently added Entity Linking. How do I enable this in the default pipeline? What model do I need to install to get this capability? Sample code would be very helpful.
Throws error:
Model for component 'entity_linker' not initialized. Did you forget to load a model, or forget to call begin_training()?
Where do I get the model? Thanks!
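As background to the error above: an entity linker cannot work as a bare pipeline component, because it needs a knowledge base of candidate entities plus a trained model to choose among them. The sketch below is a toy illustration of the knowledge-base half only and is NOT spaCy's API. The dictionary layout, the link function, the entity IDs, and the prior probabilities are all hypothetical values invented for this example. It maps a mention to candidate entities with prior probabilities and picks the most likely one, ignoring sentence context (which is the part a trained model would add).

```python
from typing import Dict, List, Optional, Tuple

# Toy knowledge base: alias -> list of (entity_id, prior_probability).
# All IDs and probabilities are made up for illustration.
ToyKB = Dict[str, List[Tuple[str, float]]]

toy_kb: ToyKB = {
    "Adams": [("Q_HYPOTHETICAL_1", 0.7), ("Q_HYPOTHETICAL_2", 0.3)],
    "Paris": [("Q_HYPOTHETICAL_3", 0.9), ("Q_HYPOTHETICAL_4", 0.1)],
}


def link(mention: str, kb: ToyKB) -> Optional[str]:
    """Return the candidate with the highest prior, or None if unknown.

    A real entity linker would also score each candidate against the
    surrounding text; this sketch uses the prior alone.
    """
    candidates = kb.get(mention, [])
    if not candidates:
        return None
    best_id, _best_prior = max(candidates, key=lambda pair: pair[1])
    return best_id


print(link("Adams", toy_kb))
print(link("Atlantis", toy_kb))
```

With an empty knowledge base every lookup returns None, which is roughly the situation the error message describes: the component exists, but nothing has been loaded or trained for it to use.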