otobrglez/politiki-ner

Politiki NER

Named-entity recognition project for Slovenian political data.

Installation & development

# Python 2.7.6
mkvirtualenv --no-site-packages politiki
workon politiki
pip install --upgrade -r requirements.txt

Libraries and tools used

Preparing and scraping data

Scrape each portal manually, or run the './bin/small_crawl.sh' script:

scrapy crawl delo -o data/urls/delo.csv -t csv -O --nolog

Combine URL lists into one huge list.

cat data/urls/*.csv | cut -d ',' -f1 | grep -v -e "url" | sort -u > data/lists/big.txt
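
The combining step can also be sketched in Python, which makes the de-duplication and header handling explicit. This is only an illustrative sketch, not part of the repository: the function name `combine_url_lists` is hypothetical, it assumes each CSV stores the URL in its first column under a header cell named `url`, and it is written for Python 3 rather than the project's Python 2.7.6 environment.

```python
import csv
import glob
import os


def combine_url_lists(csv_glob, out_path):
    """Merge the first column of every matching CSV into one de-duplicated list.

    Hypothetical helper: assumes the URL is the first column and that the
    header cell of that column is exactly "url".
    """
    seen = set()
    urls = []
    for path in sorted(glob.glob(csv_glob)):
        with open(path, newline="") as handle:
            for row in csv.reader(handle):
                if not row:
                    continue
                url = row[0].strip()
                # Skip empty cells and the assumed "url" header cell.
                if not url or url == "url":
                    continue
                # Keep only the first occurrence of each URL.
                if url not in seen:
                    seen.add(url)
                    urls.append(url)

    # Create the output directory if it does not exist yet.
    os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
    with open(out_path, "w") as out:
        for url in urls:
            out.write(url + "\n")
    return urls
```

Calling `combine_url_lists("data/urls/*.csv", "data/lists/big.txt")` would write one URL per line, in first-seen order. Unlike the `grep` filter in the shell pipeline, it drops only cells that are exactly `url`, so URLs that merely contain that substring are kept.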

Use aria2 to download everything for offline processing:

aria2c --conf-path aria_config -i data/lists/big.txt

Author and credit
