2.1 Principles of data ethics
Note: Content in this section is adapted from:
Atenas, J., Havemann, L., & Timmermann, C. (2023). Reframing data ethics in research methods education: A pathway to critical data literacy. International Journal of Educational Technology in Higher Education, 20(1), 11. https://doi.org/10.1186/s41239-023-00380-y
Atenas, J. (2021). The datafied present and future. In Understanding data: Praxis and politics. HDI - Data, Praxis and Politics. https://doi.org/10.5281/zenodo.4698609
There is no universal philosophical agreement on a fixed set of principles that should govern data ethics. Priorities shift depending on technological developments, regulatory frameworks, and the accountability of organisations that collect, analyse, and exploit data. Nonetheless, across interdisciplinary literature, a number of recurring themes and ethical commitments emerge. These shared elements form the basis of a pragmatic consensus that can guide educators, researchers, and practitioners in protecting individuals and communities when working with data.
```mermaid
flowchart TD
    A[Ethical Data Principles]
    A --> B[Respect Autonomy]
    A --> C[Privacy]
    A --> D[Sovereignty]

    %% RISK & PROTECTION
    B --> E[Do No Harm]
    C --> E
    D --> E

    %% FAIRNESS & JUSTICE
    E --> F[Fairness]
    F --> G[Equality]
    F --> H[Reduce Bias]

    %% STYLING (pastel + black font)
    style A fill:#FFE4E1,color:#000
    style B fill:#E6F2FF,color:#000
    style C fill:#E6F2FF,color:#000
    style D fill:#E6F2FF,color:#000
    style E fill:#E6FFE6,color:#000
    style F fill:#F3E6FF,color:#000
    style G fill:#F3E6FF,color:#000
    style H fill:#F3E6FF,color:#000
```
Ethical principles offer both clarity and constraint. On one hand, they make it easier to identify when standard procedures have not been followed, enabling critique, accountability, and the demand for reparations. On the other hand, principles must remain concise and actionable; an overly complex framework risks becoming impractical. In situations where principles alone are insufficient—or where strict adherence leads to ethically problematic outcomes—interpretation guided by ethical intent becomes necessary. This underscores the importance of ethical education, equipping individuals with the capacity to reason critically and act responsibly beyond procedural compliance.
A further challenge lies in the power asymmetries embedded in data systems. Dominant institutions often shape what counts as “ethical,” raising concerns about whose values are prioritised and whose voices are marginalised. As data infrastructures increasingly operate as “smart platforms” that enable sorting, classification, and personalisation, they reinforce social, economic, and political power structures. In this context, both the presence and absence of data become indicators of power. Therefore, educational approaches must enable learners to critically interrogate data practices, recognise inequities, and challenge unjust systems through critical data literacy and interdisciplinary inquiry.
To address these complexities, research-informed pedagogies emphasise case-based learning, collaboration, and co-creation, encouraging learners to engage with real-world scenarios. Such approaches foreground three interrelated concerns: social privilege in data practices, users’ capabilities to engage critically with data, and the norms embedded in data systems. Building on this foundation, our framework translates ethical considerations into teachable principles that can be operationalised through educational activities and research-based learning.
Finally, we recognise that conducting research with data about people is a privilege rather than a right. Ethical data practices must align with established research ethics principles—respect for persons, beneficence, and justice—ensuring that risks and benefits are distributed fairly and that vulnerable groups are protected. The framework presented below is designed to support educators and learners in embedding these commitments into practice, fostering responsible, equitable, and socially just engagement with data.
| Concept | Definition | In literature |
|---|---|---|
| Fairness | This principle asks that like cases be treated alike and recognises that special arrangements may be required to avoid undeserved disadvantage. Researchers must assess whether those involved or affected are exposed to harm, risk, unjust treatment, or derogatory profiling. | De Creme & Van Dijk (2003); Jo & Gebru (2020); Stoyanovich, Howe & Jagadish (2018); Hoffmann et al. (2018); Richterich (2018); Ienca et al. (2018); Hand (2018); Bertino et al. (2019); Jobin et al. (2019); Johnson (2014) |
| Equality | Rules should apply to all unless there is a publicly acceptable reason for exemption. This refers to the legal concept that all individuals should be treated equally regardless of personal characteristics. | Tusinski Berg (2018); Bogroff & Guegan (2019); Bezuidenhout et al. (2020); Kazim & Koshiyama (2019); Puaschunder (2019); Corple & Linabary (2020); Johnson (2014) |
| Do No Harm | Also referred to as non-maleficence. It involves preventing direct or indirect harm, including risks arising from data combination and re-identification. | Vinck et al. (2019); Raymond (2017); Kitto & Knight (2019); Loukides et al. (2018); Berman & Albright (2017); Taylor et al. (2016) |
| Respect Autonomy | Refers to enabling individuals to make informed decisions about how their personal data is used. | Al-Nuaimi (2020); Buckingham & Crick (2016); Powell (2018); Wheeler (2018); Sloane (2019); Kumar et al. (2020); Véliz (2019) |
| Sovereignty | Data subjects should be able to decide what data to share, when, and with whom. Refusing to share data should not restrict access to participation in society. | Kukutai & Taylor (2016); Walter & Suina (2019); Kukutai et al. (2020); Snipp (2016); Lovett et al. (2019); Ai-min & Jia (2015); Hummel et al. (2018) |
| Reduce Bias | Epistemic structures can produce unfair advantages or disadvantages. This principle calls for actively avoiding prejudiced representations and biased decision-making. | Tam & Kim (2018); Richterich (2018); Henderson (2019); Herschel & Miori (2017); Ienca et al. (2018); Mittelstadt et al. (2016); Buenadicha et al. (2019) |
| Privacy | Recognises that certain information should remain outside the public domain. Individuals have a right to withhold personal data unless there is a justified reason for disclosure. | Richards & King (2014); Yao-Huai (2005); Pollach (2005); Schwartz (2011); Herschel & Miori (2017); Zimmer (2010); Lundberg et al. (2019); Stahl & Wright (2018); Véliz (2020) |
Adapted from Atenas, Timmermann, and Havemann (2020).
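The re-identification risk named under Do No Harm can be made concrete with a small check. The following is a minimal sketch in Python, assuming a hypothetical toy dataset and a hypothetical helper `smallest_group_size`. It counts how many records share each combination of quasi-identifiers, which is the idea behind k-anonymity. It is an illustration of how combining fields raises risk, not a complete privacy assessment.

```python
from collections import Counter

# Hypothetical toy records: no names, but several quasi-identifiers.
records = [
    {"postcode": "N1", "birth_year": 1980, "gender": "F"},
    {"postcode": "N1", "birth_year": 1980, "gender": "F"},
    {"postcode": "N1", "birth_year": 1975, "gender": "M"},
    {"postcode": "E2", "birth_year": 1980, "gender": "F"},
    {"postcode": "E2", "birth_year": 1980, "gender": "F"},
    {"postcode": "E2", "birth_year": 1990, "gender": "M"},
]


def smallest_group_size(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    values for the given quasi-identifiers (the 'k' in k-anonymity)."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


# With one field, every record hides among several others.
k_postcode = smallest_group_size(records, ["postcode"])

# Combining fields shrinks the groups; a group of size 1 is a unique,
# potentially re-identifiable person.
k_combined = smallest_group_size(records, ["postcode", "birth_year", "gender"])

print(k_postcode, k_combined)  # prints: 3 1
```

In this toy data no single field singles anyone out, yet the combination of three fields leaves two people unique. This is why harm can arise from combining datasets that each look harmless on their own.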
To navigate the turbulent waters of data and algorithms, research activities must foster reflection on how data are constructed and operationalised across societies. They should also provide opportunities to learn from analysing data and from discussing the implications of data projects from a range of sources and perspectives. The aim is to understand how people are portrayed in data, the historical impact of bias in data, and how prejudices and cultural misconceptions affect people's lives.
```mermaid
flowchart TD
    %% Data cycle
    A["Human Activity<br/>Personal, Social, Professional"] --> B[Data Collection]
    B --> C[Processing & Analysis]
    C --> D[AI & Predictive Systems]
    D --> E[Inferences & Decisions]

    %% Feedback loop
    E --> F[Behavioural Influence]
    F --> G[New Data Generated]
    G --> B

    %% Uses of data
    D --> H1["Predictive Analytics<br/>Consumption, Health, Insurance"]
    D --> H2["Governance & Policy<br/>Detectors & Effectors"]
    D --> H3["Socioeconomic Prediction<br/>Education, Insurance, Policing"]

    %% Ethical risks layer
    E --> I[Ethical Risks & Concerns]
    I --> I1[Bias & Cultural Misrepresentation]
    I --> I2[Opacity & Lack of Accountability]
    I --> I3[Discrimination & Inequality]
    I --> I4[Surveillance & Behaviour Control]
    I --> I5[Political Manipulation]
    I --> I6[Privacy Violations]

    %% Inequality dimensions
    I3 --> J1[Racism]
    I3 --> J2[Gender Inequality]
    I3 --> J3[Socioeconomic Disadvantage]
    I3 --> J4[Intersectional Impacts]

    %% Structural context
    I --> K[Power Asymmetries]
    K --> K1[Control of Data Infrastructures]
    K --> K2[Limited Agency of Individuals]

    %% Digital ecosystems
    C --> L["Digital Ecosystems<br/>Socio-technical Systems"]
    L --> K

    %% Teaching & literacy response
    M[Academic Practice] --> N[Critical Data & Ethics Literacies]
    N --> O1[Understand Data Construction]
    N --> O2[Analyse Bias & Inequality]
    N --> O3[Interpret Social Impacts]
    N --> O4[Engage with AI & Algorithms]

    %% Governance & ethics frameworks
    N --> P[Ethical Frameworks & Principles]
    P --> P1[Fairness & Justice]
    P --> P2[Transparency & Accountability]
    P --> P3[Privacy & Protection]
    P --> P4[Human Rights Alignment]

    %% Transformative outcomes
    N --> Q[Informed Participation]
    Q --> R[Policy Engagement & Governance]
    Q --> S[Social Justice & Inclusion]

    %% Styling (pastel + black text)
    classDef core fill:#e6f7ff,stroke:#444,color:#000;
    classDef use fill:#fff5e6,stroke:#444,color:#000;
    classDef risk fill:#ffe6e6,stroke:#444,color:#000;
    classDef inequality fill:#f3e6ff,stroke:#444,color:#000;
    classDef teaching fill:#e6ffe6,stroke:#444,color:#000;
    class A,B,C,D,E,F,G core;
    class H1,H2,H3 use;
    class I,I1,I2,I3,I4,I5,I6 risk;
    class J1,J2,J3,J4,K,K1,K2,L inequality;
    class M,N,O1,O2,O3,O4,P,P1,P2,P3,P4,Q,R,S teaching;
```
Some current uses of data that require careful ethical consideration include:
- The role technologies play in collecting data from personal, professional and social activities. Data collection permeates the use of any platform or device, including phones and credit cards, with the intention of predicting almost every behaviour. This is called predictive analytics: identifying the likelihood of future outcomes based on historical data, such as what you will shop for or watch next on a streaming platform, but also how likely you are to survive a heart attack when you apply for life insurance cover.
- The adaptive nature of data, which allows it to play a key role in politics. Hood and Margetts (2007) argue that governments operate through two sets of agents: detectors and effectors. Detectors gather information (data) from individuals and society, and effectors seek to influence them. For example, data was used during the COVID-19 pandemic to develop public policy, and data is used to forecast the economy. Data is also used to influence voters by targeting socioeconomic groups in political campaigns.
- The interwovenness of data infrastructures that facilitate attempts to predict socioeconomic behaviours. Socioeconomic data, including race, gender and neighbourhood, are collected to predict, for example, how likely certain students are to fail or succeed depending on their socioeconomic background, or how much you have to pay for car insurance depending on where you live. Worse, such data are used in police work, where predictions determine who is profiled, foreseen as a criminal and most likely to be arrested.

Finally, it is also useful to discuss how the lack of regulatory and ethical frameworks to prevent misuses of data affects us every day, for example through discrimination against women in data-driven job recruitment, overtly racist uses or misuses of data, and clear gender inequality in access to health. It is important that data-led research activities are designed to address inequalities, to improve quality of life, to explore issues that may be harming a community, and to improve data governance. It is key that people acquire the skills to participate in developing policy frameworks that go beyond data protection and provide a fair, harmless, unbiased and equal data landscape, regulating what the public and private sectors can do with data.
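The recruitment example above can be examined with a simple comparison of selection rates across groups. The sketch below is a minimal illustration in Python, assuming hypothetical screening outcomes and a hypothetical helper `selection_rates`. It shows one basic way to surface a disparity. It does not explain why the disparity exists, and it is not a substitute for a full fairness audit.

```python
# Hypothetical outcomes of an automated CV screening step:
# each entry is (group, was_shortlisted).
outcomes = [
    ("women", True), ("women", False), ("women", False), ("women", False),
    ("men", True), ("men", True), ("men", False), ("men", False),
]


def selection_rates(rows):
    """Return the share of shortlisted candidates for each group."""
    totals, selected = {}, {}
    for group, shortlisted in rows:
        totals[group] = totals.get(group, 0) + 1
        selected[group] = selected.get(group, 0) + int(shortlisted)
    return {group: selected[group] / totals[group] for group in totals}


rates = selection_rates(outcomes)

# Ratio of the lowest to the highest selection rate: 1.0 means parity,
# values well below 1.0 signal that one group is selected far less often.
ratio = min(rates.values()) / max(rates.values())

print(rates)  # prints: {'women': 0.25, 'men': 0.5}
print(ratio)  # prints: 0.5
```

A low ratio is a prompt for investigation rather than a verdict. Interpreting it requires the contextual and ethical reasoning that this section describes.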
The field is rapidly evolving, shaped by technological developments, political and industry scandals, and growing collaboration between academia and activism. This has led to diverse ethical approaches and increased attention to emerging risks and injustices.
- Critical data and ethics literacies: There is a broad consensus that learners need critical data and ethics literacies to understand and engage with data-driven phenomena such as artificial intelligence, algorithmic decision-making, digital poverty, surveillance capitalism, and platform governance. These literacies enable individuals to assess and respond to the societal implications of data practices. (Al-Nuaimi, 2020; Buckingham & Crick, 2016; Kumar et al., 2020; Powell, 2018; Sloane, 2019; Wheeler, 2018)
- Socioeconomic discrimination: Algorithmic systems can reproduce or intensify inequality by disproportionately impacting low-income individuals and communities, often described as the automation of poverty or inequality. These systems influence access to welfare, housing, and other resources, reinforcing systemic disadvantage. (Bhaumik et al., 2006; Davies, 2020; Eubanks, 2018; Kleinberg et al., 2018; Sandvig et al., 2014; Atenas & Havemann, 2019; Goldkind et al., 2021; Lo Piano, 2020; UNICEF, 2019, 2020)
- Racism: Algorithmic opacity (‘black box’ systems) can enable discriminatory outcomes, including bias in lending, migration processes, and predictive policing. These systems can perpetuate racial profiling and disproportionately harm marginalised groups. (Alaieri & Vellino, 2016; Bartlett et al., 2019; Brantingham, 2017; Chander, 2017; Hepworth & Church, 2018; Khalifa et al., 2014; Kuzey et al., 2019; Roth, 2010; UNESCO, 2019)
- Sex, gender and sexuality: Data-driven systems often disadvantage women and gender and sexual minorities across domains such as healthcare and employment. There is also evidence of bias within research systems themselves, including the marginalisation of studies addressing inequity. (Asplund et al., 2020; Beaman et al., 2009; Cirillo et al., 2020; Kleinberg et al., 2018; Lambrecht & Tucker, 2019; Ruberg & Ruelos, 2020; Zou & Schiebinger, 2018; Cislak et al., 2018; Orgeira-Crespo et al., 2021)
- Surveillance: The expansion of surveillance capitalism and state monitoring reflects the extensive collection and commodification of personal data. Individuals are increasingly tracked across digital and physical environments for behavioural analysis and control. (Zuboff, 2015; Andrejevic & Selwyn, 2020; Azoulay, 2019; Feldstein, 2019; Introna & Wood, 2004; Newlands, 2021)
- Political manipulation: Data and AI are used to influence political behaviour through targeted messaging and algorithmic amplification, contributing to polarisation, misinformation, and threats to democratic processes. (Badawy et al., 2019; Bolsover & Howard, 2019; Crain & Nadler, 2019; Hood & Margetts, 2007; Véliz, 2020; Woolley & Howard, 2016)
- Privacy: Privacy remains a central concern in the digital age, particularly due to the ease with which data can be accessed and shared. There is increasing emphasis on minimising data collection, establishing contextual integrity, and enhancing user control over personal information. (Rabotnikof, 2005; Gstrein & Beaulieu, 2022; Véliz, 2020; Nissenbaum, 2004; Zimmer, 2018; McDonald & Forte, 2020)
- Data intersectionalities: Intersectionality highlights how multiple dimensions of identity (e.g. race, gender, class) interact within data systems, often compounding disadvantage. Data-driven predictions can reinforce structural inequalities, particularly when systems are poorly understood or unregulated. (Crenshaw, 1989; D’Ignazio & Klein, 2020; McDonald & Pan, 2020)
- Digital ecosystems: Data practices must be analysed within broader socio-technical systems, recognising how relationships between actors, technologies, and contexts shape the use and impact of data. This perspective reveals how certain practices become dominant. (Stahl, 2021)
- Levelling the field: Significant asymmetries exist between those who control data infrastructures and those whose data are exploited. Ethical data practices must therefore align with human rights and data protection principles to avoid reinforcing harm. (Belbis & Fumega, 2019; Zwitter, 2014; Azoulay, 2019; Bogroff & Guegan, 2019; Kleinberg et al., 2018; Lo Piano, 2020; Sandvig et al., 2014; Zuboff, 2015)
- Developing guiding principles: Ethical frameworks should inform research and teaching practices, encouraging fairness, reduced bias, and awareness of data protection. Emphasising empathy, social justice, and social good is essential in fostering responsible engagement with data. (Chang & Gray, 2013; Eisen & Parker, 2004; Stockley & Balkwill, 2013; Strohmetz & Skleder, 1992)
- Recognising the diversity of values: Data ethics must account for multiple, sometimes conflicting values such as privacy, autonomy, non-discrimination, and safety. Ethical frameworks must balance these values within the context of a pluralistic, datafied society. (Friedman et al., 2008)
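The point about data intersectionalities in the list above, that dimensions of identity can compound disadvantage, can be illustrated numerically. The following is a minimal sketch in Python using hypothetical counts and a hypothetical helper `rates_by`. The group labels "A" and "B" are placeholders for any second dimension of identity. It compares outcome rates along one dimension at a time with rates for the combined groups.

```python
from collections import defaultdict

# Hypothetical counts: (group, gender) -> (favourable outcomes, total cases)
counts = {
    ("A", "men"): (4, 5),
    ("A", "women"): (3, 5),
    ("B", "men"): (3, 5),
    ("B", "women"): (1, 5),
}


def rates_by(key_fn):
    """Aggregate favourable-outcome rates after mapping each
    (group, gender) pair to a coarser or identical key."""
    favourable, total = defaultdict(int), defaultdict(int)
    for pair, (fav, n) in counts.items():
        favourable[key_fn(pair)] += fav
        total[key_fn(pair)] += n
    return {key: favourable[key] / total[key] for key in total}


by_group = rates_by(lambda pair: pair[0])   # one dimension at a time
by_gender = rates_by(lambda pair: pair[1])
by_both = rates_by(lambda pair: pair)       # intersectional view

print(by_group)                  # prints: {'A': 0.7, 'B': 0.4}
print(by_gender)                 # prints: {'men': 0.7, 'women': 0.4}
print(by_both[("B", "women")])   # prints: 0.2
```

In this toy data each single dimension shows a rate of 0.4 for the disadvantaged group, while the intersection of both shows 0.2. An analysis that looks at only one dimension at a time would understate the disadvantage experienced by people at the intersection.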