MC Backup RSS Fetcher

The core Media Cloud server wasn't performing well, so we made this quick-and-dirty backup project. It starts with a pre-filled list of the RSS feeds MC usually scrapes each day (~130k). Throughout the day it tries to fetch those feeds, and every night it generates a synthetic RSS feed containing all the URLs it found.

Files are available afterwards at http://my.server/rss/mc-YYYY-MM-dd.rss.gz.
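For example, a consumer could download and decompress one day's output roughly as follows (a minimal sketch: the hostname is a placeholder from the URL pattern above, and the exact feed contents aren't documented here):

```python
# Minimal sketch of consuming one day's generated file.
# "my.server" is a placeholder hostname; the date is just an example.
import gzip
import xml.etree.ElementTree as ET

import requests

url = "http://my.server/rss/mc-2024-01-15.rss.gz"
resp = requests.get(url, timeout=30)
resp.raise_for_status()

xml_text = gzip.decompress(resp.content).decode("utf-8")
root = ET.fromstring(xml_text)

# Print the URL of each item in the synthetic feed.
for item in root.iter("item"):
    link = item.findtext("link")
    if link:
        print(link)
```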

See documentation in doc/ for more details.

Install for Development

For development using dokku, see doc/deployment.md

For development directly on your local machine:

  1. Install postgresql & redis
  2. Create a virtual environment: python -m venv venv
  3. Activate the venv: source venv/bin/activate
  4. Install prerequisite packages: pip install -r requirements.txt
  5. Create a postgres user: sudo -u postgres createuser -s MYUSERNAME
  6. Create a database called "rss-fetcher" in Postgres: createdb rss-fetcher
  7. Run alembic upgrade head to initialize the database.
  8. cp .env.template .env (little or no editing should be needed)
  • mypy.sh will install mypy (plus the necessary type stubs and autopep8) and run type checking.
  • autopep.sh will normalize code formatting.

BOTH should be run before merging to main (or submitting a pull request).

All config parameters should be fetched via fetcher/config.py and added to .env.template.
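As an illustration of that pattern (not the actual fetcher/config.py API, which you should consult directly), a config value typically flows from .env into the process environment and is read through a single accessor; the parameter name below is hypothetical:

```python
# Illustrative pattern only -- see fetcher/config.py for the real accessors.
import os

from dotenv import load_dotenv  # python-dotenv; reads values from .env

load_dotenv()  # loads .env from the current directory into os.environ


def rss_fetch_workers(default: int = 4) -> int:
    # RSS_FETCH_WORKERS is a hypothetical parameter name used for illustration.
    return int(os.environ.get("RSS_FETCH_WORKERS", default))
```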

Running

Various scripts run each separate component:

  • python -m scripts.import_feeds my-feeds.csv: Use this to import from a CSV dump of feeds (a one-time operation)
  • run-fetch-rss-feeds.sh: Start fetcher (leader and worker processes)
  • run-server.sh: Run API server
  • run-gen-daily-story-rss.sh: Generate the daily files of URLs found on each day (run nightly)
  • python -m scripts.db_archive: archive and trim fetch_events and stories tables (run nightly)

Development Docs

Deployment

See doc/deployment.md and dokku-scripts/README.md for procedures and scripts.
