---
title: "Catch the quality line to Quality St."
author: "joe leach<br>data architect<br>enterprise architecture team"
format: revealjs
bibliography: library.bib
link-citations: yes
logo: logo_th_gov.png
transition: slide
background-transition: fade
title-slide-attributes:
    data-background-color: "#47423f"
    data-background-image: "jupiter-over-whitechapel-30-sep-2023.jpg"
    data-background-size: cover
    data-background-opacity: "0.9"
header-includes: |
      <link rel="stylesheet" media="screen" href="font/style.css" type="text/css"/>
mainfont: AtkinsonHyperlegible
monofont: SpaceMono
---

## {background-image="joe.jpg" background-opacity=0.4}

![<span style="background-color:white">Quality street</span> [see @ukgovdqf]](action-plan.png) 


# 🔎 Identify critical data {background-image="graph.png" background-color="#333"}

* Find data driving **operational success** 
* Find data that is vital for **decision-making**
* Find data where there is a **high impact of low quality**

...sometimes it is all three at once - like in the waste emergency!


## 🔎 Find data driving operational success - waste emergency example {background-image="graph-waste.png" background-color="#333"}

![see @oflog](attachment:image.png)


## 🔎 Find data driving operational success - waste emergency example {background-image="graph-waste.png" background-color="#333"}

Successful operations have clarity on provision:

* 🧑🏽 **person** (a user of a service)
* 📍 **location** (a place where services happen)
* 🏛️ **thing** (assets involved in service delivery)

...having a clear link to the original source, in the form of **master** or **reference** data, helps services to plan

## 🔎 Find data driving operational success and 🏆 win 🎉 awards 🏆  {background-image="geoplace-award.jpg" background-color=purple}

<style>
.lewisham {
  background-color:#00b7eb; color:#fee05c; opacity:.9
}
</style>
<span class="lewisham">"The Winner of the 2023 Data Linking Award was Lewisham Council for integrating its Waste and Recycling Service to the Local Land and Property Gazetteer &hellip; the council has ensured that households eligible for food and garden waste collection can access the appropriate services, but do so without incurring additional costs." **Shout to William for pulling this off with postgis and qgis** 🦄 </span> 

## 🔎 Find data that is vital for decision-making {background-image="graph.png" background-color=black}

The strategic plan records 52 key performance indicators, and key political decisions are made based on this data.

![](attachment:image.png)

## 🔎 Find data where there is a high impact of low quality {background-image="graph.png" background-color=black}

* enforcement? 
* planning?

# 📃 Identify your data quality rules

1. Completeness
2. Uniqueness
3. Consistency
4. Timeliness
5. Validity
6. (In)accuracy

[see @damadimensions]


## ✅ Completeness {background-image="flats-bw.jpg" background-color="black"}

<style>
.completeness li {
  background-color:white; 
  color:black;
  opacity:.9;
}
</style>
<div class="completeness">

* A school collects forms from parents on emergency contact telephone numbers. 
* There are 300 students, but 294 responses are collected and recorded. 
* [completeness]{.smallcaps} = 294/300 x 100 = 98%
</div>
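
A minimal Python sketch of the calculation above. The function name `completeness` is illustrative and not part of the framework; the figures are those from the school example.

```python
def completeness(recorded: int, expected: int) -> float:
    """Percentage of expected values that were actually recorded."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return recorded / expected * 100


# 294 responses recorded for 300 students, as in the example.
school_completeness = completeness(294, 300)
print(f"{school_completeness:.0f}%")
```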


## ✅ Uniqueness {background-image="dreams.jpg" background-color="black"}
<style>
.uniqueness blockquote {
  background-color:white; 
  color:black;
  opacity:.8;
}
</style>
<div class="uniqueness">

> A school has 120 current students and 380 former students (i.e. 500 in total).

> The student database shows 501 different student records.

> This includes **Bob Tables** and **Bobby Tables** as separate records, despite only one student at the school named **Bob Tables**.

> This shows that the data set has a uniqueness across all records of 500/501 x 100 = 99.8%
</div>
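
A rough Python sketch of the same idea, using only the standard library. The helper names and the 0.85 similarity threshold are assumptions chosen for illustration; a real matching exercise would need tuning and human review of the flagged pairs.

```python
from difflib import SequenceMatcher


def uniqueness(real_entities: int, records: int) -> float:
    """Percentage of records that map to a distinct real-world entity."""
    return real_entities / records * 100


def likely_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag two names as candidate duplicates (threshold is an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


# 500 real students, 501 records, as in the example.
score = uniqueness(500, 501)
print(f"{score:.1f}%")
print(likely_duplicate("Bob Tables", "Bobby Tables"))
```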

## {background-image="bobby-tables.jpg" background-size=contain visibility=hidden}

## ✅ Consistency {background-image="vans.jpg" background-color="black"}
<style>
.consistency blockquote {
  background-color:white; 
  color:black;
  opacity:.9;
}
</style>
<div class="consistency">

> In a school, a student’s date of birth has the same value and format in the school register as that stored within the student database.

> This is an example where reference data may be used, if it exists for a person.
</div>
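
One way to sketch this check in Python is to compare value and format separately. The two date strings below are hypothetical; they represent the same birthday held in different formats, so the value agrees but the format does not.

```python
from datetime import datetime

# Hypothetical values for one student in two systems.
register_dob = "14/03/2015"   # school register, day/month/year
database_dob = "2015-03-14"   # student database, ISO 8601

register_value = datetime.strptime(register_dob, "%d/%m/%Y").date()
database_value = datetime.strptime(database_dob, "%Y-%m-%d").date()

same_value = register_value == database_value
same_format = register_dob == database_dob  # identical text implies identical format

print(same_value, same_format)
```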

## ✅ Timeliness {background-image="air-ambulance.jpg" background-color="black"}

<style>
.timeliness blockquote {
  background-color:white; 
  color:black;
  opacity:.9;
}
</style>

<div class="timeliness">

> A school has a service level agreement that a change to an emergency contact will occur within 2 days.

> A parent gives an updated emergency contact number on 1 June.

> It is entered into the student database on 4 June.

> It has taken 3 days to update the system which breaches the agreed data quality rule.

</div>
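
The rule above can be sketched in a few lines of Python. The year 2024 is an assumption, since the example gives only the day and month.

```python
from datetime import date

SLA_DAYS = 2  # maximum delay agreed in the service level agreement

# The year is assumed for illustration; the example does not state one.
reported = date(2024, 6, 1)
recorded = date(2024, 6, 4)

delay_days = (recorded - reported).days
sla_breached = delay_days > SLA_DAYS

print(delay_days, sla_breached)
```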

## ✅ Validity {background-image="wellies.jpg" background-color="black"}

<style>
.validity blockquote {
  background-color:white; 
  color:black;
  opacity:.9;
}
</style>

<div class="validity">

> Primary and Junior School applications capture the age of a child. This age is entered into the database and the age checked to ensure it is between 4 and 11. Any values outside of this range are rejected as invalid.

</div>
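
A small Python sketch of this range check. It assumes the bounds 4 and 11 are inclusive, which the example does not state explicitly, and the submitted ages are made up.

```python
MIN_AGE, MAX_AGE = 4, 11  # range from the example, assumed inclusive


def is_valid_age(age: int) -> bool:
    """Return True when the age falls inside the accepted range."""
    return MIN_AGE <= age <= MAX_AGE


submitted_ages = [3, 4, 7, 11, 12]  # hypothetical application data
rejected = [age for age in submitted_ages if not is_valid_age(age)]

print(rejected)
```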

## ✅ (In)accuracy {background-image="canary-wharf-night.jpg" background-color=black}

<style>
.accuracy blockquote {
  background-color:white; 
  color:black;
  opacity:.9;
}
</style>

<div class="accuracy">

> A school receives applications for its annual September intake and requires students to be aged 5 before 31 August of the intake year.

> Someone completes the Date of Birth (D.O.B) on the application in the US date format. The student is accepted in error as the date of birth given is 09/08/YYYY rather than 08/09/YYYY.

> Inaccuracy is important too, namely when storing anonymised or de-identified (but still linkable) data.
</div>
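
The date mix-up can be illustrated in Python. The years are hypothetical, because the example leaves them as YYYY. Both readings are valid dates, so a validity check alone would not catch the error.

```python
from datetime import date, datetime

entered = "09/08/2019"  # hypothetical year; the example leaves it as YYYY

# The school reads the field as day/month/year.
read_as_uk = datetime.strptime(entered, "%d/%m/%Y").date()
# The applicant meant month/day/year.
intended = datetime.strptime(entered, "%m/%d/%Y").date()

# Hypothetical 2024 intake: the fifth birthday must fall before 31 August 2024,
# so the child must have been born before 31 August 2019.
birth_cutoff = date(2019, 8, 31)

accepted = read_as_uk < birth_cutoff
should_be_accepted = intended < birth_cutoff

print(read_as_uk, intended, accepted, should_be_accepted)
```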

## 🤹🏽 User needs and trade-offs 

The six rules may require juggling at times:

> In 2018 the Office for National Statistics (ONS) introduced a new model for publishing Gross Domestic Product (GDP). This enabled monthly estimates of GDP to be published. However, there was a **trade-off between timeliness and accuracy** of the data.


# 🕵🏽 Assess initial KPIs {background-image="brick-lane-bins.jpg" background-color=black}

* percentages: measuring the whole data set, or a part of it - percentages can indicate the scale of a problem
* count: typically counts are used to measure incorrect data
* true or false: things that will compromise the data set if they are wrong
* ratio: the ratio of errors or problems to data without errors or problems
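
As a sketch, all four kinds of measure can be derived from the same set of check results. The data and the dictionary keys below are hypothetical.

```python
# Hypothetical check results: True means the record passed its quality rule.
checks = [True, True, False, True, True, True, False, True]

passed = sum(checks)
failed = len(checks) - passed

kpis = {
    "percent_passing": passed / len(checks) * 100,  # percentage
    "error_count": failed,                          # count
    "has_no_errors": failed == 0,                   # true or false
    "error_ratio": failed / passed,                 # errors to clean records (assumes passed > 0)
}

for name, value in kpis.items():
    print(name, value)
```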

# ✍🏽 Document your findings 🕵🏽

* understand previous data quality problems
    * know where improvements may need to be made in the future
        * get information about where data quality may limit the use of the data

# ⚖️  Identify and prioritise potential improvements

UPRN, always UPRN! This is sometimes achieved with systems integration (e.g. postcode to address completion), and sometimes with data matching and linking (Extract Transform Load pipelines).

# 📈 Define goals for data quality improvement

e.g. creation of systems and/or data integrations that implement data standards (yep UPRN&hellip; again)

# 🌱 Root cause analysis

[see @ukgovdqf]

1. Log data quality problems
    2. Understand the data journey
        3. Estimate the cost of fixing and not fixing
            4. Fix as close to source as possible
                5. Is it correct for its original purpose?
                    6. Continue to monitor your data  

## actions {visibility="hidden"}  

* ensuring that you have an organisation-wide data management strategy, and that teams understand and implement the principles and policies within it
* introducing data quality checks for data entry, such as data validation
* improving data storage and data architecture
* improving training and guidance for those involved in data entry
* introducing automation, such as validation on data entry, automated quality checks or using specialist coding tools rather than spreadsheets
* addressing team culture and behaviours, such as creating a clear culture of accountability for data
* correcting low quality data directly (but this can be risky and cause more problems if done incorrectly, so prioritise other fixes before attempting this)
* accepting the risk and revisiting the issue in the future, though this action should only be taken after weighing up the trade-offs between the cost of correcting the root cause and the value of high quality data

# 📰 Report on your data quality

Convert the quality measurements into Key Performance Indicators (KPIs) that can be reliably monitored (hence the use of numeric metrics in the initial investigation).

# ♻️ <span style="background-color:white; opacity:0.8">Repeat measurements of data quality over time</span> {background-image="battersea-power-stn-control-room.jpg"}

<span style="background-color:white; opacity:0.8">Automate the monitoring of the KPIs derived in the initial assessment: make some dashboards</span>

# Once you've got to Quality St. there are more lines to ride {background-image="data-map.png" background-size=contain background-opacity=0.3}

* <span style="background-color:green; color:white">ethics</span> [Five safes of data @fiveSafes]
* <span style="background-color:blue; color:white;">skills</span> [@ddatcapabilityframeworkDigitalData]
* <span style="background-color:magenta; color:white">access</span> [F.A.I.R data principles @fairPrinciples]


# {background-image="data-map.png" background-size=contain}


# 📚 Bibliography {background-image="jupiter-over-whitechapel-30-sep-2023.jpg" background-opacity=0.5 background-color=black}

::: {#refs}
:::