2.2 Anonymising Data
Note: This content is derived from the UOS Open Science OER Toolkit & Open Research Assessment (CoARA-funded), Section 5.2: Anonymising Data, UK GDPR Compliance, and Open Science (ISO 27559, ISO/IEC 27001, ISO/IEC 27701, FAIR, CoARA).
Researchers today are expected to balance three overlapping responsibilities:
- Legal compliance (UK GDPR / Data Protection Act 2018)
- Information security and privacy governance (ISO standards)
- Open Science expectations (FAIR principles and CoARA)
This is especially challenging when working with sensitive or potentially identifiable data, where openness must be balanced against privacy risks.
This guide translates these frameworks into a practical, research-ready approach for Early Career Researchers (ECRs) and established researchers alike.
UK GDPR defines how personal data must be processed lawfully, fairly, and transparently.
- Personal data includes any information that can directly or indirectly identify a person.
- Even pseudonymised data is still considered personal data.
- Truly anonymised data falls outside GDPR scope.
ISO 27559 provides a risk-based approach to anonymisation, focusing on whether individuals can still be identified in practice.
🔗 Official standard: https://www.iso.org/obp/ui/en/#iso:std:iso-iec:27559:ed-1:v1:en
It focuses on:
- singling out risk (can one person be isolated?)
- linkage risk (can datasets be combined?)
- inference risk (can sensitive facts be deduced?)
ISO 27001 specifies requirements for an information security management system, so that research data is securely stored, accessed, and managed.
🔗 Official standard: https://www.iso.org/isoiec-27001-information-security.html
It covers:
- access control
- encryption
- secure storage
- audit logging
- incident response
ISO 27701 extends ISO 27001 into privacy governance.
🔗 Official standard: https://www.iso.org/standard/71670.html
It defines:
- data controller and processor responsibilities
- data subject rights handling
- privacy lifecycle management
Across UK GDPR and ISO 27559:
Anonymisation is not a binary state, but a context-dependent reduction of re-identification risk.
Risk depends on:
- available external datasets
- data uniqueness
- technological capability (e.g. AI linkage)
- time (risk evolves)
| Type | GDPR applies? | Description |
|---|---|---|
| Personal data | Yes | Identifiable individuals |
| Pseudonymised data | Yes | Reversible coding applied |
| Anonymised data | No | No reasonable identification risk |
Important: Pseudonymisation ≠ anonymisation
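To illustrate why pseudonymised data remains personal data, here is a minimal sketch of keyed pseudonymisation using Python's standard `hmac` module. The key, the identifiers, and the `pseudonymise` helper are hypothetical and chosen only for illustration; this is one possible coding scheme, not a prescribed method.

```python
import hashlib
import hmac

# Hypothetical secret held by the data controller (never stored with the data).
SECRET_KEY = b"example-key-held-by-the-data-controller"


def pseudonymise(participant_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed pseudonym for a participant identifier."""
    digest = hmac.new(key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


participants = ["alice@example.org", "bob@example.org"]
pseudonyms = {pid: pseudonymise(pid) for pid in participants}

# Anyone who holds the key and a list of candidate identifiers can rebuild
# the mapping, so the coding is reversible in practice.
lookup = {pseudonymise(pid): pid for pid in participants}
reidentified = lookup[pseudonyms["alice@example.org"]]
```

Because the mapping can be rebuilt by whoever holds the key, the coded dataset stays within GDPR scope even though no names appear in it.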
- Who will access the data?
- What is the purpose of use?
- Could re-identification be attempted?
- What external datasets exist?
Classify data into:
- Direct identifiers (names, IDs)
- Indirect identifiers (postcode, age, job)
- Quasi-identifiers (combinations of variables)
- Sensitive attributes (health, income, behaviour)
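The classification above could be recorded in a simple lookup so that every column in a dataset is accounted for. The sketch below assumes hypothetical column names and a hypothetical `unclassified` bucket for columns that have not yet been reviewed.

```python
from collections import defaultdict

# Hypothetical column names mapped to the categories listed above.
VARIABLE_CLASSES = {
    "name": "direct",
    "student_id": "direct",
    "postcode": "indirect",
    "age": "indirect",
    "job_title": "indirect",
    "diagnosis": "sensitive",
    "income": "sensitive",
}

# Combinations that may identify someone even when each field alone does not.
QUASI_IDENTIFIER_SETS = [("postcode", "age", "job_title")]


def classify_columns(columns):
    """Group column names by category; unknown columns are flagged for review."""
    grouped = defaultdict(list)
    for column in columns:
        grouped[VARIABLE_CLASSES.get(column, "unclassified")].append(column)
    return dict(grouped)


def quasi_identifier_sets_present(columns):
    """Return the quasi-identifier combinations fully present in the dataset."""
    present = set(columns)
    return [combo for combo in QUASI_IDENTIFIER_SETS if present.issuperset(combo)]


columns = ["name", "postcode", "age", "job_title", "diagnosis", "survey_score"]
grouped = classify_columns(columns)
combos = quasi_identifier_sets_present(columns)
```

Flagging unknown columns rather than silently ignoring them is a deliberate choice here: an unreviewed field is treated as a potential risk until someone classifies it.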
ISO 27559 defines three key risks:
- Singling out: can a record identify a single individual?
- Linkability: can datasets be matched together?
- Inference: can sensitive information be deduced?
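As a rough illustration of singling-out risk, the sketch below counts how many records share each combination of quasi-identifiers. The field names and records are invented, and the smallest group size is used as a simple k-anonymity-style indicator, which is one common proxy rather than a measure prescribed by the source.

```python
from collections import Counter

# Hypothetical quasi-identifier fields and toy records.
QUASI_IDENTIFIERS = ("age_band", "area", "job")

records = [
    {"age_band": "30-39", "area": "AB1", "job": "nurse", "condition": "asthma"},
    {"age_band": "30-39", "area": "AB1", "job": "nurse", "condition": "diabetes"},
    {"age_band": "40-49", "area": "AB2", "job": "teacher", "condition": "asthma"},
    {"age_band": "50-59", "area": "AB3", "job": "pilot", "condition": "none"},
]


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def singled_out(rows, keys=QUASI_IDENTIFIERS):
    """Return rows whose quasi-identifier combination is unique in the dataset."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


sizes = group_sizes(records)
k = min(sizes.values())          # smallest group size in the dataset
unique_rows = singled_out(records)
```

In this toy dataset the teacher and the pilot are each alone in their group, so both could be isolated by anyone who knows those three attributes.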
Common methods include:
- Generalisation (e.g. age → age band)
- Suppression (removing high-risk fields)
- Aggregation (group-level data)
- Perturbation (adding noise)
- Masking (partial redaction)
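The methods above can be sketched as small functions. Everything here is illustrative: the record, the field names, the 10-year band width, and the noise scale are assumptions, and real parameter choices would depend on the risk assessment.

```python
import random
from collections import Counter


def generalise_age(age: int, width: int = 10) -> str:
    """Generalisation: replace an exact age with a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def suppress(record: dict, fields: set) -> dict:
    """Suppression: drop high-risk fields entirely."""
    return {k: v for k, v in record.items() if k not in fields}


def mask_postcode(postcode: str) -> str:
    """Masking: keep the part before the space and redact the rest."""
    return postcode.split(" ")[0] + " ***"


def perturb(value: float, scale: float, rng: random.Random) -> float:
    """Perturbation: add zero-mean Gaussian noise to a numeric value."""
    return value + rng.gauss(0, scale)


def aggregate_counts(records: list, field: str) -> dict:
    """Aggregation: publish group-level counts instead of individual rows."""
    return dict(Counter(r[field] for r in records))


# Hypothetical record used only for illustration.
record = {"name": "Jane Doe", "age": 34, "postcode": "AB1 2CD", "income": 41000.0}
rng = random.Random(0)  # seeded so the example is repeatable

anonymised = suppress(record, {"name"})
anonymised["age"] = generalise_age(anonymised["age"])          # "30-39"
anonymised["postcode"] = mask_postcode(anonymised["postcode"])  # "AB1 ***"
anonymised["income"] = perturb(anonymised["income"], 1000.0, rng)

counts = aggregate_counts([anonymised], "age")
```

Applying these transformations does not by itself make a dataset anonymous; the result still needs to be validated against the re-identification risks.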
- attempt internal re-identification
- test dataset uniqueness
- evaluate linkage risk
If risk remains high, apply further anonymisation techniques and validate again.
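One way to attempt internal re-identification is to link the release candidate against a dataset an attacker might plausibly hold. The sketch below assumes invented records, a hypothetical external register, and an arbitrary 20% threshold; an acceptable rate is a project-specific judgement, not a fixed rule.

```python
# Hypothetical threshold: share of released rows that may match uniquely.
MAX_LINKAGE_RATE = 0.2


def link(released, external, keys):
    """Return (released_row, external_row) pairs where a released row
    matches exactly one external row on the given keys."""
    index = {}
    for row in external:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    matches = []
    for row in released:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:
            matches.append((row, candidates[0]))
    return matches


def linkage_rate(released, external, keys):
    """Fraction of released rows that can be uniquely linked."""
    if not released:
        return 0.0
    return len(link(released, external, keys)) / len(released)


released = [
    {"age_band": "30-39", "area": "AB1", "condition": "asthma"},
    {"age_band": "30-39", "area": "AB1", "condition": "diabetes"},
    {"age_band": "60-69", "area": "AB2", "condition": "asthma"},
]
external = [
    {"name": "P. Example", "age_band": "60-69", "area": "AB2"},
    {"name": "Q. Example", "age_band": "30-39", "area": "AB1"},
    {"name": "R. Example", "age_band": "30-39", "area": "AB1"},
]

rate = linkage_rate(released, external, ("age_band", "area"))
needs_another_pass = rate > MAX_LINKAGE_RATE
```

Here one of the three released rows links to a single named person, so the dataset would go back for further generalisation or suppression before release.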
Risk does NOT need to be zero.
Instead:
- justify why remaining risk is acceptable
- align with research purpose and ethics requirements
Record:
- methods used
- risk assessments
- transformations applied
- justification for anonymisation status
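The items to record could be captured in a small structured log that travels with the dataset. The sketch below is one possible layout; the class name, field names, and example values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AnonymisationRecord:
    """Hypothetical structure for documenting anonymisation decisions."""
    dataset: str
    assessed_on: str            # ISO 8601 date of the assessment
    methods: list               # techniques used
    transformations: list       # what was changed, field by field
    risk_assessment: dict       # findings for each risk type
    justification: str          # why the remaining risk is acceptable


record = AnonymisationRecord(
    dataset="interview-survey-v2",
    assessed_on="2025-01-15",
    methods=["generalisation", "suppression"],
    transformations=["age -> 10-year band", "name removed"],
    risk_assessment={"singling_out": "low", "linkage": "low", "inference": "medium"},
    justification="Remaining inference risk accepted under controlled access.",
)

# Serialise to JSON so the record can be stored alongside the dataset.
serialised = json.dumps(asdict(record), indent=2)
```

Keeping the record machine-readable makes it easier to revisit the same fields when the assessment is repeated later.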
Risk can change due to:
- new datasets
- AI advances
- data breaches elsewhere
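Because risk changes over time, a project may want a simple rule for when to repeat the assessment. The sketch below assumes a hypothetical annual review interval and hypothetical event labels corresponding to the triggers listed above.

```python
from datetime import date, timedelta

# Hypothetical policy values; a real project would set its own.
REVIEW_INTERVAL = timedelta(days=365)
TRIGGER_EVENTS = {"new_external_dataset", "ai_linkage_advance", "external_breach"}


def review_due(last_assessed: date, today: date, events=()) -> bool:
    """Return True if the interval has elapsed or a trigger event occurred."""
    triggered = TRIGGER_EVENTS.intersection(events)
    overdue = (today - last_assessed) >= REVIEW_INTERVAL
    return bool(triggered) or overdue


routine = review_due(date(2025, 1, 15), date(2025, 6, 1))
after_breach = review_due(date(2025, 1, 15), date(2025, 6, 1), ["external_breach"])
overdue = review_due(date(2024, 1, 15), date(2025, 6, 1))
```
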
flowchart TD
A[Define Research Context] --> B[Identify Sensitive Variables]
B --> C[Assess Re-identification Risk]
C --> D[Apply Anonymisation Techniques]
D --> E[Validate Anonymisation]
E --> F[Define Acceptable Residual Risk]
F --> G[Document Decisions]
G --> H[Monitor Over Time]
%% EXPANDED NODES
A --> A1[Purpose, access, external data, re-identification risk]
B --> B1[Direct, indirect, quasi-identifiers, sensitive attributes]
C --> C1[Singling out, linkability, inference]
D --> D1[Generalisation, suppression, aggregation, perturbation, masking]
E --> E1[Test re-identification, uniqueness, linkage risk]
G --> G1[Methods, risks, transformations, justification]
H --> H1[New data, AI advances, external breaches]
%% ITERATION LOOP
E -->|If risk remains high| D
%% STYLING (PASTEL + BLACK FONT)
style A fill:#FFE4E1,color:#000
style B fill:#E6F2FF,color:#000
style C fill:#E6FFE6,color:#000
style D fill:#F3E6FF,color:#000
style E fill:#E0F7FA,color:#000
style F fill:#FDEDEC,color:#000
style G fill:#EBF5FB,color:#000
style H fill:#E8F8F5,color:#000
style A1 fill:#FFF5F5,color:#000
style B1 fill:#F5FAFF,color:#000
style C1 fill:#F5FFF7,color:#000
style D1 fill:#F9F5FF,color:#000
style E1 fill:#F3FDFF,color:#000
style G1 fill:#F4F8FB,color:#000
style H1 fill:#F4FBF8,color:#000
Key requirements (ISO 27001):
- role-based access control
- encryption (at rest and in transit)
- audit logs
- secure infrastructure
- incident response plans
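Role-based access control and audit logging can be illustrated together. The sketch below is a toy in-memory model with hypothetical roles, permissions, and user names; a production system would rely on institutional identity management and tamper-resistant log storage.

```python
from datetime import datetime, timezone

# Hypothetical roles and the actions each one may perform.
ROLE_PERMISSIONS = {
    "principal_investigator": {"read", "write", "export"},
    "research_assistant": {"read"},
}

audit_log = []  # in-memory stand-in for a persistent audit log


def request_access(user: str, role: str, action: str, dataset: str) -> bool:
    """Check the role's permissions and record every attempt, allowed or not."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "dataset": dataset,
        "allowed": allowed,
    })
    return allowed


can_read = request_access("ra01", "research_assistant", "read", "survey-v2")
can_export = request_access("ra01", "research_assistant", "export", "survey-v2")
```

Logging denied attempts as well as successful ones is intentional, since refused requests are often the more useful signal during incident response.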
Key requirements (ISO 27701):
- define data controller vs processor
- manage data subject rights
- document full data lifecycle
- ensure accountability structures
| Access level | Description |
|---|---|
| Open | Fully anonymised data |
| Controlled access | Sensitive but shareable under conditions |
| Restricted | Identifiable or high-risk data |
| Closed | Cannot be shared ethically or legally |
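The table can be read as a decision rule. The sketch below encodes one possible ordering of the checks; the function name, its parameters, and the order in which conditions are tested are assumptions for illustration, and real decisions would also involve ethics review.

```python
from enum import Enum


class AccessLevel(Enum):
    OPEN = "Open"
    CONTROLLED = "Controlled access"
    RESTRICTED = "Restricted"
    CLOSED = "Closed"


def choose_access_level(shareable: bool, identifiable_or_high_risk: bool,
                        sensitive: bool) -> AccessLevel:
    """Map dataset properties to an access level, most restrictive check first."""
    if not shareable:
        return AccessLevel.CLOSED        # cannot be shared ethically or legally
    if identifiable_or_high_risk:
        return AccessLevel.RESTRICTED    # identifiable or high-risk data
    if sensitive:
        return AccessLevel.CONTROLLED    # shareable under conditions
    return AccessLevel.OPEN              # fully anonymised data


level = choose_access_level(shareable=True, identifiable_or_high_risk=False,
                            sensitive=True)
```
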
A complete research compliance approach combines:
- lawful basis
- data minimisation
- participant rights
- re-identification risk assessment
- anonymisation justification
- secure systems and access control
- privacy governance and accountability
flowchart TD
A[Integrated Compliance Model]
A --> B[UK GDPR]
A --> C[ISO 27559]
A --> D[ISO 27001]
A --> E[ISO 27701]
%% GDPR
B --> B1[Lawful Basis]
B --> B2[Data Minimisation]
B --> B3[Participant Rights]
%% ISO 27559
C --> C1[Re-identification Risk Assessment]
C --> C2[Anonymisation Justification]
%% ISO 27001
D --> D1[Secure Systems]
D --> D2[Access Control]
%% ISO 27701
E --> E1[Privacy Governance]
E --> E2[Accountability]
%% STYLING (pastel + black font)
style A fill:#FFE4E1,color:#000
style B fill:#E6F2FF,color:#000
style C fill:#E6FFE6,color:#000
style D fill:#F3E6FF,color:#000
style E fill:#E0F7FA,color:#000
style B1 fill:#F5FAFF,color:#000
style B2 fill:#F5FAFF,color:#000
style B3 fill:#F5FAFF,color:#000
style C1 fill:#F5FFF7,color:#000
style C2 fill:#F5FFF7,color:#000
style D1 fill:#F9F5FF,color:#000
style D2 fill:#F9F5FF,color:#000
style E1 fill:#F3FDFF,color:#000
style E2 fill:#F3FDFF,color:#000