Politiki NER

Named-entity recognition project for Slovenian political data.

Installation & development

# Python 2.7.6
mkvirtualenv --no-site-packages politiki
workon politiki
pip install --upgrade -r requirements.txt

Libraries and tools used

Preparing and scraping data

Scrape each portal manually, as in the example below, or run the './bin/small_crawl.sh' script.

scrapy crawl delo -o data/urls/delo.csv -t csv -O --nolog

Combine the URL lists into one large list.

cat data/urls/*.csv | cut -d ',' -f1 | grep -v -e "url" | uniq -u > data/lists/big.txt
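
The same combining step can be sketched in Python. This is a minimal illustration, not part of the project: it assumes Python 3 (the project itself targets Python 2.7.6), assumes each CSV keeps the URL in its first column under a header cell named "url", and uses a hypothetical helper name, combine_url_lists. Unlike 'uniq -u', which only compares adjacent lines and drops a repeated line entirely, this sketch keeps the first occurrence of every URL.

```python
import csv
from pathlib import Path


def combine_url_lists(csv_dir, out_path, header="url"):
    """Merge the first column of every CSV in csv_dir into one de-duplicated list.

    Hypothetical helper: the directory layout and the "url" header cell are
    assumptions taken from the shell pipeline above.
    """
    seen = set()
    urls = []
    for csv_file in sorted(Path(csv_dir).glob("*.csv")):
        with csv_file.open(newline="", encoding="utf-8") as handle:
            for row in csv.reader(handle):
                if not row:
                    continue  # skip blank lines
                url = row[0].strip()
                if not url or url == header:
                    continue  # skip empty cells and the header row
                if url not in seen:
                    seen.add(url)  # keep only the first occurrence
                    urls.append(url)
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return urls
```

Calling combine_url_lists("data/urls", "data/lists/big.txt") would write one URL per line in first-seen order, so the resulting file can be passed to the download step unchanged.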

Use aria2 to download everything for offline processing.

aria2c --conf-path aria_config -i data/lists/big.txt

Author and credit
