A Bayesian statistical model for aggregating and visualizing UK opinion polling data. This project scrapes polling data from Wikipedia, fits a Bayesian B-spline regression model to estimate smooth trends for each political party, and presents the results in an interactive web dashboard.
Rather than simply averaging polls or using traditional smoothing methods (like LOESS), this project uses a Bayesian approach to estimate polling trends while accounting for uncertainty. Each party's support is modeled independently using B-spline basis functions, with posterior distributions estimated via Markov Chain Monte Carlo (MCMC) sampling.
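As a rough sketch of what this means in symbols, the model for a single party can be written as below. The notation is an assumption made here for illustration and is not taken from main.py: $y_{p,i}$ is party $p$'s share in poll $i$, $t_i$ is that poll's date, $B_k$ are the B-spline basis functions, $\beta_{p,k}$ are the spline coefficients, and $\sigma_p$ is the party-specific standard deviation. The Normal likelihood follows the model specification in this README; the priors on $\beta_{p,k}$ and $\sigma_p$ are not stated here, so they are left out.

```latex
\begin{aligned}
\mu_p(t) &= \sum_{k=1}^{K} \beta_{p,k}\, B_k(t), \\
y_{p,i} \mid \beta_p, \sigma_p &\sim \mathcal{N}\!\left(\mu_p(t_i),\ \sigma_p^{2}\right).
\end{aligned}
```

Because there is no pooling, each party $p$ has its own $\beta_p$ and $\sigma_p$, and nothing in the formulation ties the fitted trends $\mu_p(t)$ together across parties.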
- Model type: Bayesian B-spline regression with no pooling across parties
- Spline knots: Approximately one knot per month (adaptive to data)
- Likelihood: Normal distribution with party-specific variance
- Inference: MCMC sampling using PyMC with the NUTS sampler (via nutpie)
- Proper uncertainty quantification: Unlike simple averages, the model provides credible intervals that reflect both sampling uncertainty and model uncertainty
- Smooth trends: B-splines provide flexible, smooth curves that adapt to the data
- Separate party modeling: Each party gets its own independent trend
- All polls are treated as equally reliable (no weighting for pollster house effects or quality)
- Parties are modeled independently without accounting for correlations
- Vote shares may not sum to 100% across parties
- This is a descriptive model, not a predictive/forecasting model
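To make the "one knot per month" specification concrete, here is a minimal pure-Python sketch of how a B-spline basis could be evaluated with the Cox-de Boor recursion. The function names (`monthly_knots`, `bspline_basis`, `spline_value`), the cubic degree, the 30-day spacing, and the clamped knot vector are all assumptions chosen for illustration. main.py may well build its basis differently, for example with a library routine.

```python
def monthly_knots(t_min, t_max, days_per_knot=30.0, degree=3):
    """Clamped knot vector with roughly one knot span per month (hypothetical helper)."""
    n_spans = max(1, round((t_max - t_min) / days_per_knot))
    step = (t_max - t_min) / n_spans
    inner = [t_min + i * step for i in range(n_spans + 1)]
    inner[-1] = t_max  # guard against floating-point drift at the right edge
    # Repeat the boundary knots so the spline is defined on the whole range.
    return [t_min] * degree + inner + [t_max] * degree


def bspline_basis(x, knots, degree=3):
    """Evaluate every B-spline basis function at x via the Cox-de Boor recursion."""
    n_spans = len(knots) - 1
    # Degree 0: indicator of the half-open knot span containing x.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(n_spans)]
    if x == knots[-1]:
        # The right endpoint belongs to the last non-empty span.
        for i in range(n_spans - 1, -1, -1):
            if knots[i] < knots[i + 1]:
                basis[i] = 1.0
                break
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        raised = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = (knots[i + d + 1] - x) / right_den * basis[i + 1] if right_den > 0 else 0.0
            raised.append(left + right)
        basis = raised
    return basis


def spline_value(x, knots, coefs, degree=3):
    """Trend value at x: the coefficient-weighted sum of the basis functions."""
    return sum(c * b for c, b in zip(coefs, bspline_basis(x, knots, degree)))


# Days since the first poll (illustrative values, not real data).
knots = monthly_knots(0.0, 365.0)
design_matrix = [bspline_basis(day, knots) for day in (0.0, 100.0, 365.0)]
# Each row sums to 1 (partition of unity), so equal coefficients give a flat trend.
```

In a Bayesian fit, the coefficients passed to `spline_value` would be the sampled parameters, so each posterior draw yields one candidate trend curve and the spread of those curves gives the credible interval.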
polls/
├── extract_polls.py # Web scraper for Wikipedia polling data
├── main.py # Bayesian model fitting and result generation
├── index.html # Interactive visualization dashboard
├── data/
│ ├── raw/ # Raw scraped HTML
│ └── processed/ # Processed parquet/JSON files
├── pyproject.toml # Project dependencies
└── README.md # This file
This project uses Python 3.12+ and uv for dependency management.
# Clone the repository
git clone <repository-url>
cd polls
# Install dependencies with uv
uv sync
python extract_polls.py
This scrapes the latest polling data from Wikipedia and saves it as data/processed/uk_polling_data.parquet.
python main.py
This:
- Loads the polling data
- Fits the Bayesian B-spline model
- Exports results to data/processed/polling_results.json and data/processed/polling_data.json
Open index.html in a web browser. The dashboard reads the JSON files and renders an interactive visualization.
All polling data is sourced from: https://en.wikipedia.org/wiki/Opinion_polling_for_the_next_United_Kingdom_general_election
Wikipedia aggregates polls from various organizations including YouGov, Ipsos MORI, Savanta, Redfield & Wilton, and others.
Potential enhancements for future iterations:
- Hierarchical modeling to account for pollster house effects
- Multinomial regression to ensure vote shares sum to 100%
- Predictive modeling with proper forecasting methodology
- Poll quality weighting based on sample size, methodology, and pollster track record
- Temporal correlation modeling between parties
- Automated updates with scheduled scraping and model refitting
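As an illustration of the sum-to-100% idea in the list above, one common approach is to model an unconstrained latent value per party and pass those values through a softmax, which always yields shares that sum to 1. The sketch below is not part of the current model. The function name `softmax_shares`, the party labels, and the latent values are all hypothetical.

```python
import math


def softmax_shares(latent):
    """Map unconstrained per-party values to vote shares that sum to 1."""
    peak = max(latent.values())
    # Subtract the maximum before exponentiating for numerical stability.
    exps = {party: math.exp(value - peak) for party, value in latent.items()}
    total = sum(exps.values())
    return {party: e / total for party, e in exps.items()}


# Hypothetical latent trend values for three placeholder parties at one date.
shares = softmax_shares({"Party A": 0.4, "Party B": 0.1, "Party C": -1.2})
```

In a multinomial setup, each latent value would be its own spline over time, and the softmax would be applied at every date so the estimated shares stay coherent across parties.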
This project is for educational and research purposes. Polling data is sourced from Wikipedia and belongs to the respective polling organizations.
This is a statistical model for analyzing polling trends and should not be interpreted as a prediction or forecast of election results. Polls can be volatile and may not accurately reflect final election outcomes.