
2.2 Anonymising Data

javieraatenas-pixel edited this page Jun 18, 2026 · 2 revisions

Note: This content is derived from the UOS Open Science OER Toolkit & Open Research Assessment (CoARA-funded), Section 5.2: Anonymising Data, UK GDPR Compliance, and Open Science (ISO 27559, ISO/IEC 27001, ISO/IEC 27701, FAIR, CoARA).

Source: https://github.com/javieraatenas-pixel/testOSC/wiki/5.2-Anonymising-Data,-UK-GDPR-Compliance,-and-Open-Science-(ISO-27559,-27001,-27701---FAIR---CoARA)

Introduction

Researchers today are expected to balance three overlapping responsibilities:

  1. Legal compliance (UK GDPR / Data Protection Act 2018)
  2. Information security and privacy governance (ISO standards)
  3. Open Science expectations (FAIR principles and CoARA)

This is especially challenging when working with sensitive or potentially identifiable data, where openness must be balanced against privacy risks.

This guide translates these frameworks into a practical, research-ready approach for Early Career Researchers (ECRs) and established researchers alike.

Key frameworks in context

UK GDPR (legal framework)

UK GDPR sets out how personal data must be processed, including the principle that processing is lawful, fair, and transparent (Article 5(1)(a)).

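  • Personal data is any information relating to an identified or identifiable living individual, whether identified directly (e.g. by name) or indirectly (e.g. by combining other details).
  • Pseudonymised data is still personal data, because individuals can be re-identified using additional information such as a separately held key.
  • Truly anonymised data falls outside the scope of UK GDPR.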

ISO/IEC 27559 – anonymisation and re-identification risk

ISO/IEC 27559 provides a risk-based framework for de-identification (anonymisation), focusing on whether individuals can still be identified in practice.

🔗 Official standard: https://www.iso.org/obp/ui/en/#iso:std:iso-iec:27559:ed-1:v1:en

It focuses on:

  • singling out risk (can one person be isolated?)
  • linkage risk (can records be matched to the same person in other datasets?)
  • inference risk (can sensitive facts be deduced?)

ISO/IEC 27001 – information security management

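ISO/IEC 27001 specifies requirements for an information security management system (ISMS). Applied to research, it helps ensure that data is securely stored, accessed, and managed.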

🔗 Official standard: https://www.iso.org/isoiec-27001-information-security.html

It covers:

  • access control
  • encryption
  • secure storage
  • audit logging
  • incident response

ISO/IEC 27701 – privacy information management

ISO/IEC 27701 extends ISO/IEC 27001 into privacy governance by specifying a privacy information management system (PIMS).

🔗 Official standard: https://www.iso.org/standard/71670.html

It defines:

  • data controller and processor responsibilities
  • data subject rights handling
  • privacy lifecycle management

Core principle: anonymisation is risk-based

Across UK GDPR and ISO 27559:

Anonymisation is not a binary state, but a context-dependent reduction of re-identification risk.

Risk depends on:

  • available external datasets
  • data uniqueness
  • technological capability (e.g. AI-assisted record linkage)
  • time (risk evolves)

Data classification (UK GDPR foundation)

Types of data

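| Type | Does UK GDPR apply? | Description |
|---|---|---|
| Personal data | Yes | Relates to an identified or identifiable individual |
| Pseudonymised data | Yes | Identifiers replaced with codes; re-identifiable using separately held information |
| Anonymised data | No | Individuals not identifiable by any means reasonably likely to be used |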

Important: Pseudonymisation ≠ anonymisation
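To see why pseudonymisation is not anonymisation, consider a minimal Python sketch that assumes keyed hashing (HMAC-SHA256) as the pseudonymisation method. The identifier format and key handling are hypothetical. Because the same identifier always maps to the same code, whoever holds the key can recompute codes for known people and re-link their records.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex chars."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


# The secret key must be held separately from the research data
# (e.g. by the data controller) and never released alongside it.
key = secrets.token_bytes(32)

p1 = pseudonymise("participant-0042", key)  # hypothetical identifier format
p2 = pseudonymise("participant-0042", key)
p3 = pseudonymise("participant-0043", key)

# Same identifier + same key -> same pseudonym, so anyone holding the key can
# recompute pseudonyms for known identifiers and re-link the records.
# That is why pseudonymised data remains personal data under UK GDPR.
print(p1 == p2)  # True
print(p1 == p3)  # False (a collision is astronomically unlikely)
```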

ISO 27559 anonymisation workflow (practical steps)

Step 1: Define research context

  • Who will access the data?
  • What is the purpose of use?
  • Could re-identification be attempted?
  • What external datasets exist?

Step 2: Identify sensitive variables

Classify data into:

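  • Direct identifiers (names, ID numbers, email addresses)
  • Indirect identifiers (postcode, age, job title)
  • Quasi-identifiers (combinations of indirect identifiers that together can single someone out, e.g. postcode + age + job title)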
  • Sensitive attributes (health, income, behaviour)
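The classification can be recorded in code so that later checks use the same variable lists. This is a minimal sketch: the column names and `Role` labels are hypothetical, and it treats the indirect identifiers taken together as the quasi-identifier set.

```python
from enum import Enum


class Role(Enum):
    DIRECT = "direct identifier"
    INDIRECT = "indirect identifier"
    SENSITIVE = "sensitive attribute"
    OTHER = "other"


# Hypothetical column names for an interview study; replace with your own.
VARIABLE_ROLES = {
    "name": Role.DIRECT,
    "email": Role.DIRECT,
    "postcode": Role.INDIRECT,
    "age": Role.INDIRECT,
    "job_title": Role.INDIRECT,
    "health_condition": Role.SENSITIVE,
    "interview_minutes": Role.OTHER,
}


def columns_with_role(roles: dict, role: Role) -> list:
    """Return the column names assigned to a given role, in declaration order."""
    return [col for col, r in roles.items() if r is role]


DIRECT_IDENTIFIERS = columns_with_role(VARIABLE_ROLES, Role.DIRECT)
# The indirect identifiers, checked together, form the quasi-identifier set
# used when assessing singling out and linkage.
QUASI_IDENTIFIERS = columns_with_role(VARIABLE_ROLES, Role.INDIRECT)
SENSITIVE_ATTRIBUTES = columns_with_role(VARIABLE_ROLES, Role.SENSITIVE)


def drop_direct_identifiers(record: dict) -> dict:
    """Remove direct identifiers; this alone does NOT make a record anonymous."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


row = {"name": "Participant A", "email": "a@example.org", "postcode": "S10 2TN",
       "age": 34, "job_title": "Nurse", "health_condition": "asthma",
       "interview_minutes": 45}
print(drop_direct_identifiers(row))
```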

Step 3: Assess re-identification risk

ISO 27559 defines three key risks:

Singling out

Can an individual's record be isolated from all others in the dataset?

Linkability

Can records about the same individual be linked, within this dataset or across datasets?

Inference

Can sensitive information be deduced?
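This page does not prescribe a metric for these risks. As one common option, the sketch below assumes k-anonymity (the size of the smallest group of records sharing the same quasi-identifier values) as a proxy for singling out, and distinct l-diversity (the number of different sensitive values within each group) as a proxy for inference. Function names and toy data are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records: list, quasi_identifiers: list) -> dict:
    """Group records that share the same combination of quasi-identifier values."""
    classes = defaultdict(list)
    for rec in records:
        classes[tuple(rec[q] for q in quasi_identifiers)].append(rec)
    return classes


def k_anonymity(records: list, quasi_identifiers: list) -> int:
    """Smallest group size; k = 1 means at least one record can be singled out."""
    return min(len(g) for g in equivalence_classes(records, quasi_identifiers).values())


def singled_out(records: list, quasi_identifiers: list) -> list:
    """Records whose quasi-identifier combination is unique in the dataset."""
    return [g[0] for g in equivalence_classes(records, quasi_identifiers).values()
            if len(g) == 1]


def l_diversity(records: list, quasi_identifiers: list, sensitive: str) -> int:
    """Fewest distinct sensitive values in any group; l = 1 means that for some
    group, knowing someone is in it reveals their sensitive value (inference)."""
    return min(len({r[sensitive] for r in g})
               for g in equivalence_classes(records, quasi_identifiers).values())


# Hypothetical toy data, already generalised
records = [
    {"age_band": "30-39", "district": "S10", "condition": "asthma"},
    {"age_band": "30-39", "district": "S10", "condition": "diabetes"},
    {"age_band": "40-49", "district": "S11", "condition": "asthma"},
    {"age_band": "40-49", "district": "S11", "condition": "asthma"},
    {"age_band": "50-59", "district": "S12", "condition": "migraine"},
]
qi = ["age_band", "district"]
print(k_anonymity(records, qi))               # 1
print(singled_out(records, qi))               # the single 50-59 / S12 record
print(l_diversity(records, qi, "condition"))  # 1
```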

Step 4: Apply anonymisation techniques

Common methods include:

  • Generalisation (e.g. age → age band)
  • Suppression (removing high-risk fields)
  • Aggregation (group-level data)
  • Perturbation (adding noise)
  • Masking (partial redaction)
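Each technique can be illustrated with a small Python function. This is a sketch under stated assumptions: the field names and record are hypothetical, postcodes are generalised to the UK outward code (the part before the space), and perturbation uses simple uniform noise rather than a formally private mechanism.

```python
import random


def generalise_age(age: int, width: int = 10) -> str:
    """Generalisation: exact age -> band, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalise_postcode(postcode: str) -> str:
    """Generalisation: full UK postcode -> outward code, e.g. 'S10 2TN' -> 'S10'."""
    return postcode.split()[0]


def suppress(record: dict, fields: set) -> dict:
    """Suppression: drop high-risk fields entirely."""
    return {k: v for k, v in record.items() if k not in fields}


def mask(value: str, visible: int = 2) -> str:
    """Masking: keep the first `visible` characters and redact the rest."""
    return value[:visible] + "*" * max(len(value) - visible, 0)


def perturb(value: float, scale: float, rng: random.Random) -> float:
    """Perturbation: add zero-mean uniform noise drawn from [-scale, +scale]."""
    return value + rng.uniform(-scale, scale)


def aggregate_mean(records: list, group_key: str, value_key: str) -> dict:
    """Aggregation: release one mean per group instead of individual rows."""
    sums, counts = {}, {}
    for r in records:
        g = r[group_key]
        sums[g] = sums.get(g, 0) + r[value_key]
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}


rng = random.Random(42)  # fixed seed only so the example is reproducible
raw = {"age": 34, "postcode": "S10 2TN", "phone": "07700900123", "income": 31000.0}

released = suppress(raw, {"phone"})
released["age"] = generalise_age(raw["age"])
released["postcode"] = generalise_postcode(raw["postcode"])
released["income"] = perturb(raw["income"], 500.0, rng)
print(released)
print(mask(raw["phone"]))  # 07*********
```

Uniform noise is used here only for readability; methods such as differential privacy instead calibrate Laplace or Gaussian noise to how much a single person can change the output.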

Step 5: Validate anonymisation

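  • attempt re-identification internally (e.g. a "motivated intruder" test)
  • test how many records are unique on their quasi-identifiers
  • evaluate linkage risk against known external datasets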

If risk remains high → iterate again
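The validate-and-iterate loop between Steps 4 and 5 can be sketched as follows. It assumes k-anonymity as the validation test and age banding as the only technique being tuned; the band widths, target `k`, and data are hypothetical.

```python
from collections import Counter


def age_band(age: int, width: int) -> str:
    """Generalise an exact age into a band of the given width."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def min_class_size(rows: list, quasi_identifiers: list) -> int:
    """k for a candidate release: size of the smallest quasi-identifier group."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(counts.values())


def anonymise_until_k(raw_rows: list, k_target: int, widths=(5, 10, 20)):
    """Apply progressively coarser age bands and re-validate after each one.
    Returns (release, width) for the first width that reaches k_target,
    or (None, None) if none does, signalling that another technique
    (suppression, aggregation) or a more restrictive sharing level is needed."""
    for width in widths:
        release = [{"age_band": age_band(r["age"], width), "district": r["district"]}
                   for r in raw_rows]
        if min_class_size(release, ["age_band", "district"]) >= k_target:
            return release, width
    return None, None


# Hypothetical toy data
raw_rows = [{"age": a, "district": "S10"} for a in (31, 34, 36, 38)]
release, width = anonymise_until_k(raw_rows, k_target=3)
print(width)  # 10: 5-year bands leave groups of 2; 10-year bands give one group of 4
```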

Step 6: Define acceptable residual risk

Risk does NOT need to be zero.

Instead:

  • justify why remaining risk is acceptable
  • align with research purpose and ethics requirements
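One way to make the residual-risk decision explicit is to compute simple risk estimates from group sizes and compare them with a threshold the project has justified. The sketch assumes a prosecutor-style model (the attacker knows their target is in the dataset); the threshold and data are hypothetical placeholders, not recommended values.

```python
from collections import Counter


def reidentification_risk(rows: list, quasi_identifiers: list) -> tuple:
    """Prosecutor-style estimates from group sizes.
    max_risk: worst case, 1 / size of the smallest group.
    avg_risk: mean of 1 / group size over all records,
              which equals number of groups / number of records."""
    sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    max_risk = 1 / min(sizes.values())
    avg_risk = len(sizes) / len(rows)
    return max_risk, avg_risk


ACCEPTABLE_MAX_RISK = 0.2  # hypothetical; justify your own value for your context

# Hypothetical release: one group of 5 records and one group of 3
rows = ([{"age_band": "30-39", "district": "S10"} for _ in range(5)]
        + [{"age_band": "40-49", "district": "S11"} for _ in range(3)])
max_risk, avg_risk = reidentification_risk(rows, ["age_band", "district"])
decision = "acceptable" if max_risk <= ACCEPTABLE_MAX_RISK else "iterate or restrict access"
print(round(max_risk, 3), avg_risk, decision)  # 0.333 0.25 iterate or restrict access
```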

Step 7: Document decisions

Record:

  • methods used
  • risk assessments
  • transformations applied
  • justification for anonymisation status
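Keeping this documentation in a structured, machine-readable form makes it easier to review and to revisit when risk is reassessed. A minimal sketch, assuming a JSON record; the class, field names, and example values are hypothetical and not taken from any standard.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class AnonymisationRecord:
    """Hypothetical documentation record; field names are illustrative only."""
    dataset: str
    methods: list
    risk_assessment: dict
    transformations: dict
    justification: str
    assessed_on: str = field(default_factory=lambda: date.today().isoformat())


# All values below are hypothetical examples.
record = AnonymisationRecord(
    dataset="interview_study_v2",
    methods=["suppression", "generalisation"],
    risk_assessment={"quasi_identifiers": ["age_band", "district"],
                     "min_class_size": 5, "max_risk": 0.2,
                     "external_datasets_checked": ["published district-level statistics"]},
    transformations={"name": "removed", "age": "10-year bands",
                     "postcode": "outward code only"},
    justification="Residual max risk of 0.2 accepted for controlled-access release only.",
)

# Store the JSON alongside the dataset's documentation so reviewers can see
# exactly what was done, why, and when it was last assessed.
print(json.dumps(asdict(record), indent=2))
```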

Step 8: Monitor over time

Risk can change due to:

  • new datasets
  • AI advances
  • data breaches elsewhere
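When a new external dataset appears, the linkage check can be re-run against it. The sketch below assumes exact matching on shared quasi-identifiers and flags value combinations that are unique in both the released data and the external data; the data and function name are hypothetical. Real attacks may use fuzzy or probabilistic matching, so this gives only a lower bound on linkage risk.

```python
from collections import Counter


def unique_links(released: list, external: list, shared_keys: list) -> list:
    """Return key combinations that are unique in BOTH datasets.
    Each one lets an attacker link a released record to exactly one
    external (possibly named) record by exact matching."""
    def key(r):
        return tuple(r[k] for k in shared_keys)

    rel_counts = Counter(key(r) for r in released)
    ext_counts = Counter(key(e) for e in external)
    # Counter returns 0 for combinations absent from the external data.
    return [k for k, n in rel_counts.items() if n == 1 and ext_counts[k] == 1]


# Hypothetical released data and a newly published external list
released = [
    {"age_band": "30-39", "district": "S10", "condition": "asthma"},
    {"age_band": "30-39", "district": "S10", "condition": "diabetes"},
    {"age_band": "50-59", "district": "S12", "condition": "migraine"},
]
external = [
    {"name": "Person X", "age_band": "50-59", "district": "S12"},
    {"name": "Person Y", "age_band": "30-39", "district": "S10"},
]
print(unique_links(released, external, ["age_band", "district"]))  # [('50-59', 'S12')]
```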
flowchart TD

    A[Define Research Context] --> B[Identify Sensitive Variables]
    B --> C[Assess Re-identification Risk]
    C --> D[Apply Anonymisation Techniques]
    D --> E[Validate Anonymisation]
    E --> F[Define Acceptable Residual Risk]
    F --> G[Document Decisions]
    G --> H[Monitor Over Time]

%% EXPANDED NODES

    A --> A1[Purpose, access, external data, re-identification risk]
    B --> B1[Direct, indirect, quasi-identifiers, sensitive attributes]
    C --> C1[Singling out, linkability, inference]
    D --> D1[Generalisation, suppression, aggregation, perturbation, masking]
    E --> E1[Test re-identification, uniqueness, linkage risk]
    G --> G1[Methods, risks, transformations, justification]
    H --> H1[New data, AI advances, external breaches]

%% ITERATION LOOP
    E -->|If risk remains high| D

%% STYLING (PASTEL + BLACK FONT)

    style A fill:#FFE4E1,color:#000
    style B fill:#E6F2FF,color:#000
    style C fill:#E6FFE6,color:#000
    style D fill:#F3E6FF,color:#000
    style E fill:#E0F7FA,color:#000
    style F fill:#FDEDEC,color:#000
    style G fill:#EBF5FB,color:#000
    style H fill:#E8F8F5,color:#000

    style A1 fill:#FFF5F5,color:#000
    style B1 fill:#F5FAFF,color:#000
    style C1 fill:#F5FFF7,color:#000
    style D1 fill:#F9F5FF,color:#000
    style E1 fill:#F3FDFF,color:#000
    style G1 fill:#F4F8FB,color:#000
    style H1 fill:#F4FBF8,color:#000

ISO 27001: research data security

Key requirements:

  • role-based access control
  • encryption (at rest and in transit)
  • audit logs
  • secure infrastructure
  • incident response plans
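Access control and audit logging are the requirements most easily illustrated in code. The sketch below is a minimal, hypothetical role-based check in Python that denies by default and logs every decision; the roles and actions are assumptions, and it is not an ISO/IEC 27001 implementation. Encryption, secure infrastructure, and incident response depend on institutional systems and are not shown.

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO, format="%(message)s")
audit_log = logging.getLogger("research_data.audit")

# Hypothetical roles and permissions for one research project.
ROLE_PERMISSIONS = {
    "principal_investigator": {"read_identifiable", "read_anonymised", "export"},
    "research_assistant": {"read_anonymised"},
    "external_collaborator": {"read_anonymised"},
}


def check_access(user: str, role: str, action: str) -> bool:
    """Allow an action only if the user's role grants it (unknown roles get
    nothing), and write every decision, allowed or denied, to the audit log."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit_log.info(
        "%s user=%s role=%s action=%s allowed=%s",
        datetime.now(timezone.utc).isoformat(), user, role, action, allowed,
    )
    return allowed


check_access("ra01", "research_assistant", "read_anonymised")    # True
check_access("ra01", "research_assistant", "read_identifiable")  # False
```

In practice, audit logs should also be protected from tampering and retained according to institutional policy.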

ISO 27701: privacy governance

Key requirements:

  • define data controller vs processor
  • manage data subject rights
  • document full data lifecycle
  • ensure accountability structures

Data sharing levels

| Level | Description |
|---|---|
| Open | Fully anonymised data |
| Controlled access | Sensitive but shareable under conditions |
| Restricted | Identifiable or high-risk data |
| Closed | Cannot be shared ethically or legally |
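Choosing a level can be written down as an explicit rule. The function below is a hypothetical sketch: its inputs, thresholds, and order of checks are assumptions that a project would need to justify (see Step 6), not rules taken from UK GDPR or the ISO standards.

```python
from enum import Enum


class SharingLevel(Enum):
    OPEN = "Open"
    CONTROLLED = "Controlled access"
    RESTRICTED = "Restricted"
    CLOSED = "Closed"


def choose_sharing_level(
    legal_or_ethical_bar: bool,
    identifiable: bool,
    max_risk: float,
    open_threshold: float,
    controlled_threshold: float,
) -> SharingLevel:
    """Hypothetical rule: check hard barriers first, then identifiability,
    then compare residual re-identification risk with project-set thresholds."""
    if legal_or_ethical_bar:
        return SharingLevel.CLOSED
    if identifiable:
        return SharingLevel.RESTRICTED
    if max_risk <= open_threshold:
        return SharingLevel.OPEN
    if max_risk <= controlled_threshold:
        return SharingLevel.CONTROLLED
    return SharingLevel.RESTRICTED


# Placeholder thresholds for illustration only
print(choose_sharing_level(False, False, 0.05, 0.1, 0.34))  # SharingLevel.OPEN
print(choose_sharing_level(False, False, 0.2, 0.1, 0.34))   # SharingLevel.CONTROLLED
```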

Integrated compliance model

A complete research compliance approach combines:

UK GDPR

  • lawful basis
  • data minimisation
  • participant rights

ISO 27559

  • re-identification risk assessment
  • anonymisation justification

ISO 27001

  • secure systems and access control

ISO 27701

  • privacy governance and accountability
flowchart TD

    A[Integrated Compliance Model]

    A --> B[UK GDPR]
    A --> C[ISO 27559]
    A --> D[ISO 27001]
    A --> E[ISO 27701]

%% GDPR
    B --> B1[Lawful Basis]
    B --> B2[Data Minimisation]
    B --> B3[Participant Rights]

%% ISO 27559
    C --> C1[Re-identification Risk Assessment]
    C --> C2[Anonymisation Justification]

%% ISO 27001
    D --> D1[Secure Systems]
    D --> D2[Access Control]

%% ISO 27701
    E --> E1[Privacy Governance]
    E --> E2[Accountability]

%% STYLING (pastel + black font)

    style A fill:#FFE4E1,color:#000

    style B fill:#E6F2FF,color:#000
    style C fill:#E6FFE6,color:#000
    style D fill:#F3E6FF,color:#000
    style E fill:#E0F7FA,color:#000

    style B1 fill:#F5FAFF,color:#000
    style B2 fill:#F5FAFF,color:#000
    style B3 fill:#F5FAFF,color:#000

    style C1 fill:#F5FFF7,color:#000
    style C2 fill:#F5FFF7,color:#000

    style D1 fill:#F9F5FF,color:#000
    style D2 fill:#F9F5FF,color:#000

    style E1 fill:#F3FDFF,color:#000
    style E2 fill:#F3FDFF,color:#000
