jrf-insight

JRF North England Insight Finder

Project hub page on Open Innovations website

StatXplore

To run the pipelines you need to register for an account and get an Open Data API key. Add the key to a .env file in the top-level directory as STATXPLORE_API_KEY=<your_key>, and make sure .env is listed in .gitignore so the key is never committed. Without the key, you will not have permission to pull data from Stat-Xplore.
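As a rough illustration of how a script might pick up the key, here is a minimal sketch that reads STATXPLORE_API_KEY from the environment or from .env using only the standard library. The function names load_env and get_statxplore_key are hypothetical and are not taken from the pipelines in this repo, which may load the key differently.

```python
import os
from pathlib import Path


def load_env(path=".env"):
    """Parse KEY=value lines from a .env file into a dict."""
    values = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_statxplore_key(path=".env"):
    """Return the key from the environment or the .env file, or raise if missing."""
    key = os.environ.get("STATXPLORE_API_KEY") or load_env(path).get("STATXPLORE_API_KEY")
    if not key:
        raise RuntimeError("STATXPLORE_API_KEY is not set; add it to .env")
    return key
```

Failing early with a clear error is usually easier to debug than a permission error returned by the API later in a pipeline run.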

Pipenv

If you are on Windows, install Windows Subsystem for Linux (WSL), then install pip and pipenv inside the Linux environment. Use bash to enter the Linux environment. Install pipenv with pip install pipenv. To activate the virtual environment, use pipenv shell.
To install Python libraries, use pipenv install <library-name>. Dependencies are declared in Pipfile, and their exact versions are pinned in Pipfile.lock.

Pipelines / DVC

Our pipelines are managed using dvc. You can re-run all the pipelines using dvc repro -R pipelines, or an individual pipeline using dvc repro pipelines/<name>/dvc.yaml.
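If you want to trigger the pipelines from a Python script rather than the shell, the two commands above can be wrapped as in the sketch below. The helper names dvc_repro_args and run_pipeline are hypothetical and do not exist in this repo; the sketch assumes dvc is on your PATH and that you run it from the top-level directory.

```python
import subprocess


def dvc_repro_args(name=None):
    """Build the argv for re-running every pipeline, or just one named pipeline."""
    if name is None:
        # -R searches the pipelines directory recursively for stages to reproduce.
        return ["dvc", "repro", "-R", "pipelines"]
    return ["dvc", "repro", f"pipelines/{name}/dvc.yaml"]


def run_pipeline(name=None):
    """Run dvc and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(dvc_repro_args(name), check=True)
```

For example, run_pipeline("statxplore") would run only the statxplore pipeline, and run_pipeline() would re-run everything.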

statxplore is the pipeline for getting data from the DWP's Stat-Xplore database. It has three stages. probe.py gets the lookups needed for the API calls: the measures, dimensions and database names of every database in Stat-Xplore. These are located in data/lookups. describe.py gets metadata about every database. extract gets the actual data from Stat-Xplore using JSON API requests, which are kept in statxplore/json/data.
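To give a feel for what a JSON API request in the extract stage might look like, here is a hedged sketch that builds, but does not send, a table query. The base URL, the /table endpoint, the APIKey header and the body layout are assumptions based on the public Stat-Xplore Open Data API and should be checked against the DWP documentation. build_table_request is a hypothetical helper, not the code used in this repo.

```python
import json
import urllib.request

# Assumed base URL of the Stat-Xplore Open Data API; verify before use.
BASE_URL = "https://stat-xplore.dwp.gov.uk/webapi/rest/v1"


def build_table_request(api_key, database, measures, dimensions):
    """Build (but do not send) a POST request for the assumed /table endpoint."""
    body = {
        "database": database,
        "measures": measures,
        # Assumption: each dimension is wrapped in its own list in the query body.
        "dimensions": [[d] for d in dimensions],
    }
    return urllib.request.Request(
        f"{BASE_URL}/table",
        data=json.dumps(body).encode("utf-8"),
        headers={"APIKey": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

The database, measure and dimension identifiers would come from the lookups that probe.py saves in data/lookups. Passing the returned object to urllib.request.urlopen would send the query.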

place generates all the place .geojson files and the data that is associated with each place page on the site.

fingertips will be used to scrape data from Public Health England's database.

metadata is the pipeline to create the metadata section of the site.

Data

Raw datasets are stored in data-raw/. These are files that come straight from their source and are entirely unprocessed. They are then transformed using files in the R directory and saved in data/.

Transformed data is stored in data/. Data in this folder is prepared using pipelines/place/transform.ipynb and duckdb to drive the visualisations on the site. In general, transform.ipynb produces most-recent data that goes in tables or dashboard blocks. duckdb is used to select and pivot data that powers time-series or multi-series visualisations (e.g. line charts and bar charts).
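The difference between the two output shapes can be illustrated with a small sketch. Plain Python stands in here for both transform.ipynb and duckdb, and the column names place, date and value are hypothetical rather than taken from the repo's data.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per place and date.
rows = [
    {"place": "A", "date": "2023-01", "value": 10},
    {"place": "A", "date": "2023-02", "value": 12},
    {"place": "B", "date": "2023-01", "value": 7},
    {"place": "B", "date": "2023-02", "value": 9},
]


def most_recent(rows):
    """Keep only the latest row for each place (ISO-style dates sort as text)."""
    latest = {}
    for row in rows:
        current = latest.get(row["place"])
        if current is None or row["date"] > current["date"]:
            latest[row["place"]] = row
    return latest


def pivot(rows):
    """Turn long rows into one wide row per date, with a column per place."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row["date"]][row["place"]] = row["value"]
    return [{"date": date, **wide[date]} for date in sorted(wide)]
```

most_recent gives the kind of single-value-per-place data suited to tables and dashboard blocks, while pivot gives one row per date with a series per place, which is the shape line and bar charts typically need.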

All ONS and Government data is openly available and has been accessed and used in accordance with the Open Government Licence.

resources

Contains meeting notes, useful links and ideas.

src

Contains the site build.

playground

Experiments, including initial attempts at modelling data.

