How to Use
Marcel Heinz edited this page Jul 30, 2018 · 5 revisions
- Required technology:
  - `pip install sparqlwrapper`
  - `pip install nltk`
  - `pip install requests`
  - `pip install pandas` (only needed if you want to run the plot/table scripts)
  - Download Stanford CoreNLP: https://stanfordnlp.github.io/CoreNLP/index.html#download
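After installing the packages above, a quick way to confirm they are importable is sketched below. It only uses the standard library; the module names are the import names of the listed pip packages (note that `sparqlwrapper` is imported as `SPARQLWrapper`). This check is an illustration and is not part of the repository.

```python
import importlib.util

# Import names of the pip packages listed above.
REQUIRED = ["SPARQLWrapper", "nltk", "requests"]
OPTIONAL = ["pandas"]  # only needed for the plot/table scripts


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing required packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
    missing_optional = missing_modules(OPTIONAL)
    if missing_optional:
        print("Optional packages not installed:", ", ".join(missing_optional))
```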
- How to reproduce results:
  - Mine DBpedia Live by running `src/mine/miner.py`.
    - This produces the file `data/langdict.json`, where all articles are listed with depth information. The entries will be annotated by the indicators/checks.
    - This produces the file `data/catdict.json`, where all categories are listed and annotated.
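The "depth information" recorded by the miner can be pictured as a breadth-first walk down the category tree from the root categories, noting the depth at which each category is first reached. The sketch below illustrates that idea only; it is not the code in `miner.py`. It assumes that subcategories are linked via `skos:broader` (as in DBpedia's category hierarchy), and the names `subcategory_query`, `sparql_subcategories` and `category_depths` are hypothetical. The SPARQL endpoint is left as a parameter.

```python
from collections import deque

CATEGORY_PREFIX = "http://dbpedia.org/resource/Category:"


def subcategory_query(category):
    """SPARQL query for the direct subcategories of a category (assumes skos:broader links)."""
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?sub WHERE { ?sub skos:broader "
        f"<{CATEGORY_PREFIX}{category}> . }}"
    )


def sparql_subcategories(endpoint):
    """Return a lookup function that runs subcategory_query against a SPARQL endpoint."""
    # Imported lazily so the rest of the sketch works without the package installed.
    from SPARQLWrapper import SPARQLWrapper, JSON

    def get_subcategories(category):
        sparql = SPARQLWrapper(endpoint)
        sparql.setQuery(subcategory_query(category))
        sparql.setReturnFormat(JSON)
        bindings = sparql.query().convert()["results"]["bindings"]
        uris = [row["sub"]["value"] for row in bindings]
        return [uri[len(CATEGORY_PREFIX):] for uri in uris if uri.startswith(CATEGORY_PREFIX)]

    return get_subcategories


def category_depths(roots, get_subcategories, max_depth):
    """Breadth-first traversal recording the depth at which each category is first reached."""
    depths = {root: 0 for root in roots}
    queue = deque(depths)
    while queue:
        category = queue.popleft()
        if depths[category] >= max_depth:
            continue  # do not expand categories at the depth limit
        for sub in get_subcategories(category):
            if sub not in depths:  # keeps the shallowest depth and avoids cycles
                depths[sub] = depths[category] + 1
                queue.append(sub)
    return depths
```

Keeping the lookup as a pluggable function makes the traversal testable offline with a plain dictionary, e.g. `category_depths(["Programming_languages"], lambda c: graph.get(c, []), max_depth=2)`.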
  - Start the Stanford CoreNLP server.
    - Open a terminal in the folder where you deployed Stanford CoreNLP.
    - Start the server using:
      `java -cp "*" -Xmx4g edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 150000 -quiet`
    - Make sure that your computer's network settings allow connections to the local server you just started. In particular, proxy settings in your environment variables (e.g. `http_proxy`) seem to cause trouble.
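To check that the server is reachable, you can post some text to it directly. The sketch below uses only the standard library (the project's own scripts may talk to the server differently). It passes the pipeline configuration in the server's `properties` URL parameter, and it uses an empty `ProxyHandler` so that proxy environment variables such as `http_proxy` are ignored for the local connection. The port matches `-port 9000` above, and the 150-second timeout mirrors `-timeout 150000` (milliseconds). The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

CORENLP_URL = "http://localhost:9000"  # matches -port 9000 in the command above


def corenlp_request_url(annotators="tokenize,ssplit,pos", base_url=CORENLP_URL):
    """Build the request URL; the server reads its settings from a JSON 'properties' parameter."""
    properties = json.dumps({"annotators": annotators, "outputFormat": "json"})
    return base_url + "/?" + urllib.parse.urlencode({"properties": properties})


def annotate(text, annotators="tokenize,ssplit,pos", base_url=CORENLP_URL, timeout=150):
    """POST raw text to a running CoreNLP server and return the parsed JSON response."""
    # An empty ProxyHandler disables proxies from the environment (http_proxy etc.).
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(
        corenlp_request_url(annotators, base_url),
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    with opener.open(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def pos_tags(annotation):
    """Return (word, POS tag) pairs for every token of every sentence."""
    return [
        (token["word"], token["pos"])
        for sentence in annotation["sentences"]
        for token in sentence["tokens"]
    ]
```

With the server running, `pos_tags(annotate("Python is a programming language."))` should return the tagged tokens. If it raises a connection error instead, check the proxy and firewall settings mentioned above.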
  - Start the indication pipeline.
    - Run `src/check/pipeline.py`.
    - It is safe to run the following indicators in `src/check` separately for the software languages domain:
      - `infobox_dbpedia_existence.py` (searches for infobox template references maintained in DBpedia)
      - `url_pattern.py` (searches for keywords in the URL)
      - `lists_of.py` (searches for links to extracted article names in a list of lists located at `data/Language_Lists.txt`)
      - `summary_keywords.py` (searches for keywords in the summary)
      - `hypernym_nlp_firstsentence.py` (searches for a part-of-speech pattern indicating that one of the keywords is a hypernym)
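The keyword-based indicators above can be illustrated with a few simplified checks. This is a sketch of the general idea, not the logic in `url_pattern.py`, `summary_keywords.py` or `hypernym_nlp_firstsentence.py`. The keyword list is hypothetical (the real keywords are configured in `src/data/__init__.py`), and the hypernym check is a deliberately naive "X is a ... <noun>" pattern over (word, POS tag) pairs such as those returned by the CoreNLP server.

```python
import re
from urllib.parse import unquote

# Hypothetical keyword list for illustration only.
KEYWORDS = {"language", "dsl", "format", "notation"}
BE_FORMS = {"is", "was", "are", "were"}
MODIFIER_TAGS = {"DT", "JJ", "JJR", "JJS", "RB", "CD", "VBG", "HYPH"}


def _words(text):
    """Lower-case alphabetic words of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def _matches(words, keywords):
    """True if a keyword occurs as a whole word, optionally with a plural 's'."""
    return any(k in words or k + "s" in words for k in keywords)


def url_indicator(url, keywords=KEYWORDS):
    """Check the last URL segment, e.g. '.../Python_(programming_language)'."""
    title = unquote(url.rstrip("/").rsplit("/", 1)[-1])
    return _matches(_words(title), keywords)


def summary_indicator(summary, keywords=KEYWORDS):
    """Check whether the summary text mentions a keyword."""
    return _matches(_words(summary), keywords)


def hypernym_indicator(tagged, keywords=KEYWORDS):
    """Naive check: after a form of 'be', is the head of the next noun phrase a keyword?"""
    for i, (word, tag) in enumerate(tagged):
        if word.lower() in BE_FORMS and tag.startswith("VB"):
            break
    else:
        return False  # no copula found
    head = None
    for word, tag in tagged[i + 1:]:
        if tag.startswith("NN"):
            head = word.lower()  # the last noun of the first noun run is taken as the head
        elif head is not None or tag not in MODIFIER_TAGS:
            break
    if head is None:
        return False
    return head in keywords or head.rstrip("s") in keywords
```

Matching whole words rather than substrings avoids false positives such as "format" inside "Information_theory".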
    - It is also safe to run each of the following additional indicators separately:
      - `hypernym_dbpedia.py` (searches for hypernyms annotated only in the non-Live DBpedia)
      - `hypernym_wordnet.py` (tries to match article names in WordNet)
      - `infobox_position.py` (searches for infobox template references in Wikipedia; the revision numbers from the DBpedia entries are reused here, and the Wikipedia API is used to access the revisions)
      - `semantic_distance.py` (computes a metric based on the annotated categories)
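The approach described for `infobox_position.py` (reuse a DBpedia revision number, fetch that revision through the Wikipedia API, look for infobox templates) can be sketched as below. This is an illustration under assumptions, not the project's implementation: it assumes English Wikipedia and the MediaWiki Action API with `formatversion=2`, where the wikitext sits under `query.pages[0].revisions[0].slots.main.content`. The regular expression only catches template names starting with "Infobox", and all function names are hypothetical.

```python
import re
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia

INFOBOX_RE = re.compile(r"\{\{\s*(Infobox[^|}\n]*)", re.IGNORECASE)


def revision_content_url(revision_id):
    """MediaWiki API query for the wikitext of one revision (id reused from DBpedia)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "revids": revision_id,
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "formatversion": 2,
    }
    return API_URL + "?" + urlencode(params)


def wikitext_from_response(data):
    """Extract the wikitext from a formatversion=2 API response."""
    return data["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]


def infobox_templates(wikitext):
    """Return (template name, character offset) for each infobox template reference."""
    return [(m.group(1).strip(), m.start()) for m in INFOBOX_RE.finditer(wikitext)]
```

The character offset gives one simple notion of where in the article the infobox appears; the measure actually used by the indicator may differ.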
  - Customize the configuration if necessary. Otherwise, you can keep the configuration for the software languages domain.
    - Set the root categories (`CATS`) in `src/data/__init__.py`.
    - Set the domain keywords in `src/data/__init__.py`.
    - Note that `lists_of.py` is a domain-specific indicator. For a new target domain, you would have to retrieve a new list of lists.
    - `src/check/seed.py` annotates the extracted articles with whether they were matched in the seed (see `data/seed_annotated.json`).
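The seed annotation step amounts to a membership test of each extracted article against the seed. A minimal sketch follows; it assumes the extracted articles are a dictionary mapping article names to annotation dictionaries, and the key `in_seed` and the title normalization (underscores as spaces, case-insensitive) are hypothetical simplifications rather than what `seed.py` does.

```python
def normalize(name):
    """Treat underscores as spaces and compare case-insensitively (a simplification)."""
    return name.replace("_", " ").strip().lower()


def annotate_with_seed(articles, seed_names):
    """Return a copy of the articles, each marked with whether it appears in the seed.

    articles:   dict mapping article name -> annotation dict (hypothetical layout)
    seed_names: iterable of article names contained in the seed
    """
    seed = {normalize(name) for name in seed_names}
    return {
        name: {**info, "in_seed": normalize(name) in seed}
        for name, info in articles.items()
    }
```

Building new dictionaries instead of updating the existing ones leaves the input annotations untouched, so the same extraction can be compared against several seeds.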
  - The wiki provides a more general overview of the evaluation results, along with further statistics.