How to extract phrases from Wikipedia? #16
Comments
Hi @Albert-Ma, if you are looking to get phrase representations from the documents, please refer here. The code that extracts phrases is https://github.com/princeton-nlp/DensePhrases/blob/main/generate_phrase_vecs.py and also see
which is used in generate_phrase_vecs.py.
Hi @jhyuklee,
Phrase retrieval is trained with QA datasets which contain phrase-level answer annotations. So we don't need to explicitly extract phrases before training. After training,
Here, metadata means the phrase-vector-related outputs for each document (phrase start/end vectors, start2end mapper, etc.).
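To make the idea of per-document metadata more concrete, here is a minimal sketch in plain Python. It is not the DensePhrases implementation: the `DocMetadata` container, the `toy_vec` stand-in encoder, the `max_len` limit, and the reading of "start2end mapper" as "for each start position, the end positions it may pair with" are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocMetadata:
    """Hypothetical per-document container; field names are illustrative."""
    tokens: List[str]
    start_vecs: List[List[float]]  # one start vector per token position
    end_vecs: List[List[float]]    # one end vector per token position
    start2end: List[List[int]]     # for each start index, allowed end indices


def toy_vec(token: str, salt: int, dim: int = 4) -> List[float]:
    """Deterministic stand-in for an encoder output (NOT a real model)."""
    base = sum(map(ord, token))
    return [((base + salt * (i + 1)) % 7) / 7.0 for i in range(dim)]


def build_metadata(tokens: List[str], max_len: int = 3) -> DocMetadata:
    """Build toy start/end vectors and a start-to-end index mapping."""
    start_vecs = [toy_vec(t, salt=1) for t in tokens]
    end_vecs = [toy_vec(t, salt=2) for t in tokens]
    # Assumption: a phrase starting at s may end at s .. s + max_len - 1.
    start2end = [
        list(range(s, min(s + max_len, len(tokens))))
        for s in range(len(tokens))
    ]
    return DocMetadata(tokens, start_vecs, end_vecs, start2end)


def phrase_vector(meta: DocMetadata, start: int, end: int) -> List[float]:
    """Represent a phrase by concatenating its start and end vectors."""
    if end not in meta.start2end[start]:
        raise ValueError(f"span ({start}, {end}) is not an allowed phrase")
    return meta.start_vecs[start] + meta.end_vecs[end]


def phrase_text(meta: DocMetadata, start: int, end: int) -> str:
    """Recover the surface text of a phrase from its token span."""
    return " ".join(meta.tokens[start:end + 1])


meta = build_metadata("Princeton is in New Jersey".split())
vec = phrase_vector(meta, 3, 4)
text = phrase_text(meta, 3, 4)
```

The point of the sketch is that no phrase list is stored explicitly: any span allowed by the mapping can be turned into a vector on demand from the per-token start and end vectors.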
Got it, thanks
You can also check this issue! #17
Hi!
First of all, thanks a lot for this solid project!
I just want to figure out how to extract phrases from Wikipedia. Which script is the right one?
I am a little confused when I see so many scripts in the preprocess folder.