Mechanical News is a Python web server application that crawls and saves news articles for research purposes.
- Automatically download and extract information from news articles
- Save structured information into a searchable database
- Get structured news data through an API client library
- Extend with your own news sources and collect custom information
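The extract-and-save features listed above can be illustrated with a minimal sketch. It is not the project's actual implementation: Mechanical News relies on Scrapy, whereas this stand-in uses only the Python standard library (`html.parser` and `sqlite3`). The names `ArticleParser`, `extract_article`, `save_article`, `search_articles`, the `articles` table layout, and the sample URL are all hypothetical, chosen for illustration only.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample page, standing in for a downloaded news article.
SAMPLE_HTML = (
    "<html><head><title>Example headline</title></head>"
    "<body><p>First paragraph.</p><p>Second one.</p></body></html>"
)


class ArticleParser(HTMLParser):
    """Collect the <title> text and the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current = None  # "title" or "p" while inside that element

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._current = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "p":
            self.paragraphs[-1] += data


def extract_article(url, html):
    """Turn raw HTML into a structured record (assumed fields: url, title, body)."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    body = "\n".join(p.strip() for p in parser.paragraphs)
    return {"url": url, "title": parser.title.strip(), "body": body}


def save_article(conn, article):
    """Store the record in a hypothetical 'articles' table, keyed by URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(url TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO articles VALUES (:url, :title, :body)", article
    )
    conn.commit()


def search_articles(conn, term):
    """Return (url, title) pairs whose title or body contains the term."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT url, title FROM articles WHERE title LIKE ? OR body LIKE ?",
        (pattern, pattern),
    )
    return rows.fetchall()
```

To try it, open an in-memory database with `sqlite3.connect(":memory:")`, pass the result of `extract_article("https://news.example.com/1", SAMPLE_HTML)` to `save_article`, and query it with `search_articles(conn, "paragraph")`. A real deployment would likely need site-specific extraction rules and a full-text index rather than `LIKE`.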
Researchers can access Mechanical News through a client library in their preferred programming language, which shortens the path from data collection to data analysis and machine learning.
An R client library for the Mechanical News API is being developed. An equivalent for Python is planned as well.
This project is under development at the Department of Journalism, Media and Communication (JMG), University of Gothenburg.
- Python 3+
- Windows, Linux or macOS
Mechanical News relies on Scrapy for web scraping.
pip install git+https://github.com/peterdalle/mechanicalnews
Note: This is a development version. The install may fail.
See the documentation wiki.
- Application design
- Build a scraper with a headless browser that accesses the HTML DOM
- Build one scraper per news site
- Build scheduler and queue system
- Add user handling with API keys
- Add automatic error handling, with e-mail alerts
- Run unit tests
- Build R client library
- Validate method against existing data sources
- Build Python client library
- Build web client
- 2019-02-13 Programming started
- 2019-02-08 Design implementation
- 2018-10-22 Design idea started