Artificial Data Plug and Play

Get up and running with experimenting on artificial NHS data!

This material is maintained by the NHS England Data Science team.

See our other work here: NHS England Analytical Services.

To contact us, raise an issue on GitHub or get in touch via email and we will respond promptly.

What is artificial data?

Artificial data sets provide users with large volumes of data that share some of the characteristics of real data while protecting patient confidentiality. They are designed to model the structure of real data but are completely artificial – they do not contain any actual patient records. We are piloting this new service with a limited number of artificial data sets.

You can find out more about the pilot on the NHS website.

What is this repo for?

This repo contains some example code for getting started with using artificial data with minimal setup.

It was created using the rap-package-template, which provides a neat way to create new repositories for Reproducible Analytical Pipelines.

What does the repo contain?

The repo contains the following files and directories:

|- sql                  # Code for interacting with SQL
|- src                  # Source code for data ingestion, cleaning, processing, etc
|- templates            # Templates for Excel reporting
|- tests                # Test modules
|- pyproject.toml       # Configuration
|- plug_and_play.ipynb  # Plug and play notebook
|- requirements.txt     # Python dependencies to be installed via pip
|- ...                  # Additional repo files (e.g. .gitignore)
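
After cloning, one way to confirm that the layout above is present is a short check like the sketch below. The helper name `missing_entries` and the `EXPECTED` list are hypothetical and written for illustration only; they are not part of the repo.

```python
from pathlib import Path

# Entries copied from the tree above. This list and the helper below are
# illustrative assumptions, not code shipped with the repo.
EXPECTED = [
    "sql",
    "src",
    "templates",
    "tests",
    "pyproject.toml",
    "plug_and_play.ipynb",
    "requirements.txt",
]


def missing_entries(repo_root=".", expected=EXPECTED):
    """Return the expected files or directories not found under repo_root."""
    root = Path(repo_root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing from this directory:", ", ".join(missing))
    else:
        print("All expected files and directories are present.")
```

Run it from the root of the cloned repo; an empty result means every listed entry was found.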

Note: because this repo was created from the rap-package-template, a number of files and folders persist from that template. These have been left in the repo so that you can fork it and adapt it to your own needs!

For the plug and play tutorial, the main file you'll be interacting with is plug_and_play.ipynb. See below for instructions on how to get set up to run the tutorial.

How do I get started?

If you are setting up the tutorial in an environment which is provisioned out of the box (such as Google Colab or GitHub Codespaces), see Quick start. More detailed instructions can be found in Full setup.

Quick start

The easiest way to run the tutorial is in an environment which is provisioned out of the box. Clicking one of the buttons below will open the repo in the respective environment with all the dependencies set up, so you can just get coding!

Full setup

Prerequisites:

A bash terminal (although similar instructions will work in PowerShell)
Python >= 3.10
An IDE or text editor (such as VS Code or PyCharm)
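
To confirm the Python prerequisite before going further, a check along these lines may help. It is a minimal sketch: the helper name `meets_minimum` is hypothetical and not part of the repo, and the minimum version is taken from the list above.

```python
import sys

# Minimum version taken from the prerequisites list above.
MINIMUM = (3, 10)


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least the minimum."""
    if version is None:
        # Default to the interpreter running this script.
        version = sys.version_info[:2]
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old for this tutorial"
    print(f"Python {current}: {status}")
```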

Open a terminal and execute the following:

Navigate to a directory you want to create the tutorial repo in (using cd DESTINATION_DIRECTORY)
Clone the repo using git clone https://github.com/NHSDigital/artificial-data-plug-and-play.git
Open the repo in the terminal using cd artificial-data-plug-and-play and create a virtual environment via python -m venv .venv (a virtual environment is optional, but recommended)
Activate the environment and install the requirements using . .venv/bin/activate && pip install -r requirements.txt
(Optional) Install Jupyter via pip install jupyter. This will allow you to use Jupyter notebooks through the classic web interface.
Open the tutorial
- Using Jupyter, if you installed it with the command above: jupyter notebook plug_and_play.ipynb
- Alternatively, you can open the notebook in your IDE of choice (for example, VS Code)

You should now be ready to run the plug and play!
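
If you chose to use a virtual environment, the sketch below is one way to check that the interpreter you are running (for example, the notebook kernel) is actually inside it. The helper name `in_virtual_environment` is hypothetical and not part of the repo; the check relies on the standard library behaviour that `sys.prefix` and `sys.base_prefix` differ inside an environment created with `venv`.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Running inside a virtual environment: {sys.prefix}")
    else:
        print("Not running inside a virtual environment.")
```

A result of "not inside a virtual environment" is not an error, since the environment is optional, but it does mean the requirements were installed elsewhere or still need installing for this interpreter.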
