
GovTrack website frontend

This repo contains the source code of the front-end for www.GovTrack.us. The data-gathering scripts live elsewhere (see the congress project mentioned below).

Local Development

Development using Vagrant

GovTrack.us is based on Python 3 and Django 2.1 and runs on Ubuntu 18.04 or OS X. To simplify local development, we have a Vagrantfile in this directory. You can get started quickly by installing Vagrant and running:

# Get this repo (you must clone with `--recursive`)
git clone --recursive https://github.com/govtrack/govtrack.us-web.git

# Change to this repo's directory.
cd govtrack.us-web

# Start Vagrant.
vagrant up

# Create your initial user.
vagrant ssh -- -t ./manage.py createsuperuser

# Start debug server.
vagrant ssh -- -t ./manage.py runserver 0.0.0.0:8000

# Visit the website in your browser at http://localhost:8000!

# Stop the virtual machine when you are done.
vagrant suspend

# Destroy the virtual machine when you are no longer working on GovTrack (or when you want your disk space back).
vagrant destroy

Even though the site runs in the virtual machine, it uses the source files on your host computer, so you can edit the files from this repository in your favorite text editor as usual and the virtual machine will see your changes. When you edit .py files, runserver automatically restarts to reload the code. The site's database and search indexes are also stored on the host machine, so they are preserved even when you destroy your Vagrant box.

See the Configuration section further down.

Development without Vagrant

To set up GovTrack development without a virtual machine, get the source code in this repository (clone with --recursive, as mentioned above) and then follow the steps in our Vagrantfile, running the same commands yourself on the command line.

At the end:

# Create your initial user.
./manage.py createsuperuser

# Start the debug server.
./manage.py runserver

Getting test data

The Vagrantfile automatically loads current legislator information from the live site. The site draws on about a dozen different data sources.

Bills & votes

To get bill and vote data, you'll need to run the "congress" project scrapers.

If you used Vagrant, use vagrant ssh to go into the virtual machine first. Either way, perform these steps in this project's main directory:

sudo apt install python-dev libxml2-dev libxslt1-dev libz-dev python-pip # see congress project deps
git clone https://github.com/unitedstates/congress congress-project
cd congress-project/
pip2 install -r requirements.txt 
python2 run govinfo --bulkdata=BILLSTATUS --congress=116
python2 run govinfo --collections=BILLS --extract=pdf --years=2020 
python2 run bills --log=debug --govtrack
python2 run votes --log=debug --govtrack
cd ..
mkdir -p local
cho "CONGRESS_DATA_PATH=congress-project/data" >> local/settings.env
mkdir -p data/historical-committee-membership
echo "<stub/>" > data/historical-committee-membership/116.xml
./parse.py bill
./parse.py vote

Configuration

Some features of the site require additional configuration. To set configuration variables, create a file named local/settings.env and set any of the following optional variables (defaults are shown where applicable):

# Database server.
# See https://github.com/kennethreitz/dj-database-url
DATABASE_URL=sqlite:///local/database.sqlite...

# Memcached server.
# See https://github.com/ghickman/django-cache-url#supported-caches
CACHE_URL=locmem://opendataiscool

# Search server.
# See https://github.com/simpleenergy/dj-haystack-url#url-schema
#
# For local development you may want to use the (default) Xapian search engine, e.g.:
# xapian:/home/username/govtrack.us-web/xapian_index_person
# You'll need to `apt-get install python-xapian` and `pip install xapian-haystack`
# or see https://github.com/notanumber/xapian-haystack.
#
# For a production deployment you may want to use Solr instead, e.g.:
# solr:http://localhost:8983/solr/person
#
# You can also specify 'simple' to have a dummy search backend that
# does not actually index or search anything.
HAYSTACK_PERSON_CONNECTION=xapian:local/xapian_index_person
HAYSTACK_BILL_CONNECTION=xapian:local/xapian_index_bill

# Django uses a secret key to provide cryptographic signing. It should be random
# and kept secure. You can generate a key with `./manage.py generate_secret_key`
SECRET_KEY=(randomly generated on each run if not specified)

See settings.env.template for details, especially for values used in production.
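
For example, a minimal local/settings.env for local development could be created like this. This is only a sketch: the values written are just the defaults shown above, and while the generate_secret_key command is the one mentioned above, its exact output format may vary.

mkdir -p local
cat > local/settings.env <<'EOF'
DATABASE_URL=sqlite:///local/database.sqlite
HAYSTACK_PERSON_CONNECTION=xapian:local/xapian_index_person
HAYSTACK_BILL_CONNECTION=xapian:local/xapian_index_bill
EOF
# Generate a key and add it to local/settings.env as SECRET_KEY=<the generated key>:
./manage.py generate_secret_key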

Additionally, some data files are stored in separate repositories; they must be obtained and their paths configured in settings.env (a short sketch follows this list):

  • congress project bill status data (etc.)
  • congress-legislators data
  • legislator photos (static/legislator-photos is symlinked to ../data/legislators-photos/photos, so this must go in data for now)
  • GovTrack's scorecards, misconduct, and name pronunciation repositories
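
As a rough sketch, fetching one of these repositories and preparing the photos directory could look like the following. The exact settings.env variable names for these paths are not shown here; check settings.env.template for them.

# Clone the congress-legislators data repository into data/.
mkdir -p data
git clone https://github.com/unitedstates/congress-legislators data/congress-legislators

# Legislator photos must live under data/ because static/legislator-photos is a
# symbolic link to ../data/legislator-photos/photos.
mkdir -p data/legislator-photos/photos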

Credits

Emoji icons by http://emojione.com/developers/.

Production Deployment Notes

Additional package installation notes are in the Vagrantfile.

You'll need a data directory that contains the following (a sketch of creating this layout follows the list):

  • analysis (the output of our data analyses)
  • congress (a symbolic link to the congress project's data directory, holding bill and legislator data, some of which can't be reproduced because the source data is gone; also set CONGRESS_DATA_PATH=data/congress in local/settings.env)
  • congress-bill-text-legacy (a final copy of HTML bill text scraped from the old THOMAS.gov, for bills before XML bill text started)
  • historical-committee-membership (past committee membership, snapshots of earlier data)
  • legislator-photos (manually collected photos of legislators; there's a symbolic link from static/legislator-photos to legislator-photos/photos)
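
A minimal sketch of creating that layout, where /path/to/congress-project is a placeholder for wherever the congress project checkout actually lives:

mkdir -p data/analysis data/historical-committee-membership
mkdir -p data/legislator-photos/photos   # static/legislator-photos symlinks into here
# data/congress is a symbolic link to the congress project's data directory.
ln -s /path/to/congress-project/data data/congress
echo "CONGRESS_DATA_PATH=data/congress" >> local/settings.env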

You'll also need several other data repositories (see the list in the Configuration section above). They can go in the data directory if you don't expose the whole directory over HTTP, but they can also be placed anywhere, because their paths are set in settings.

At this point you should be able to run ./manage.py runserver and test that the site works.

Running conf/uwsgi_start test 1 should start the uWSGI application daemon.

Install nginx, supervisord (which keeps the uWSGI process running), and certbot and set up their configuration files:

apt install nginx supervisor certbot python3-certbot-nginx
rm /etc/nginx/sites-enabled/default
ln -s /home/govtrack/web/conf/nginx.conf /etc/nginx/sites-enabled/www.govtrack.us.conf
ln -s /home/govtrack/web/conf/supervisor.conf /etc/supervisor/conf.d/govtrack.conf
# install a TLS certificate at /etc/ssl/local/ssl_certificate.{key,crt} (e.g. https://gist.github.com/JoshData/49eff618f84ce4890697d65bcb740137)
mkdir /var/cache/nginx/www.govtrack.us
service nginx restart
service supervisor restart
certbot # and follow prompts, but without the HTTP redirect because we already have it

To scrape and load new data, you'll need the congress project, etc. (a sketch follows this list):

  • Clone the congress project repo anywhere and set that directory as CONGRESS_PROJECT_PATH in GovTrack's local/settings.env.
  • Follow its installation steps to create a Python 2 virtualenv for it in its .env directory.
  • Symlink the data/congress data directory as the data directory inside the congress project directory.
  • Clone the congress-legislators project as a subdirectory and follow its installation steps to create a separate Python 3 virtualenv for its scripts in its scripts/.env directory.
  • Try launching the scrapers from the GovTrack directory: ./run_scrapers.py people, ./run_scrapers.py committees, etc.
  • Copy over our local/skoposlabs_s3cmd.ini file.
  • Enable the crontab.
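
A minimal sketch of a few of these steps, where /path/to/congress-project is again a placeholder:

# Clone the congress project anywhere and point GovTrack at it.
git clone https://github.com/unitedstates/congress /path/to/congress-project
echo "CONGRESS_PROJECT_PATH=/path/to/congress-project" >> local/settings.env

# Then try the scrapers from the GovTrack directory:
./run_scrapers.py people
./run_scrapers.py committees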

The crontab sends the outputs of the commands to Josh, so the server needs a sendmail-like command. The easiest to set up is msmtp, like so:

apt install msmtp-mta
cat > /etc/msmtprc <<EOF;
account default
auth on
tls on
tls_trust_file /etc/ssl/certs/ca-certificates.crt
host *******
port 587
from #######@govtrack.us
user *******
password *******
EOF