# Decolonise the Data: Rethinking Cultural Metadata With AI

## Project Overview

This notebook proposes an interactive application that uses a large language model (LLM) to **identify and reframe colonial or exclusionary language in cultural heritage metadata.** It is aimed at museum professionals, researchers, and students working towards more inclusive, ethical digital collections. The application analyses uploaded metadata and suggests alternative phrasings, contextual notes, and reflective questions to help institutions **decolonise** their catalogues.

Using an LLM trained on diverse, inclusive language, this tool will allow users to upload or input object descriptions, archival metadata, or catalogue entries to receive:

- Critical analysis of problematic language or omissions
- Suggestions for more inclusive or ethically sensitive alternatives
- Educational context on why the original use of the language may be harmful

---

## What is a Large Language Model (LLM)?

A **Large Language Model** is a type of AI trained on text data to generate and understand human language. These models can:

- Generate or summarise text
- Translate between languages
- Engage in dialogue
- Critically analyse language patterns

This application will leverage an LLM's **language understanding** and **text generation** capabilities to critically examine cultural records.

--- 

## Target Audience

- Cultural heritage professionals (museum staff, archivists, curators) 
- Students and researchers in digital humanities and history
- Educators building inclusive curricula
- Archivists and communities advocating for decolonisation

---

## Interactive Design

### Actors
- **User (Human):** Curator, student, researcher, or educator. Uploads or inputs text metadata or descriptions.
- **LLM (AI):** Analyses the text using prompt-engineered critique tools and suggests inclusive language.
- **Optional integration:** Archival or annotation platforms.

### Step by Step Workflow

1. **Input**
   - User uploads a metadata entry (as a text file or pasted text).
   - Example:
     `"Primitive tribal mask, Africa, acquired by British expedition in 1892"`

2. **Analysis**
   - Highlights:
     - Problematic language: "primitive", "tribal"
     - Lack of provenance detail
     - Colonial framing ("acquired by British expedition")
    
3. **AI Suggestions**
   - Rewriting:
     `"Ceremonial mask from the Baule people (Côte d'Ivoire), acquired during colonial occupation. Further provenance under investigation."`

   - Additional prompts:
     - "Have you consulted source community records?"
     - "Consider replacing 'tribal' with the community's name."

4. **User Reflection Prompts**
   - "What perspectives are absent?"
   - "What assumptions are being made about cultural value or ownership?"
   - "Could this object's display reinforce harmful cultural narratives?"

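The workflow above can be sketched in a few lines of Python. This is a minimal illustration, not the application's actual implementation: the `FLAGGED_TERMS` lexicon, the `flag_terms` and `build_prompt` helpers, and the prompt wording (including the instruction not to invent provenance) are all assumptions made for this example. A simple lexicon check pre-flags known terms, and the result is folded into a prompt that could then be sent to whichever LLM the project adopts. No model is called here.

```python
import re

# Hypothetical starter lexicon (term -> why it may be problematic).
# A real deployment would need a vetted vocabulary, ideally built with source communities.
FLAGGED_TERMS = {
    "primitive": "Implies a hierarchy of cultures.",
    "tribal": "Generic label that erases the community's own name.",
}


def flag_terms(entry: str, lexicon: dict[str, str] = FLAGGED_TERMS) -> list[tuple[str, str]]:
    """Return (term, reason) pairs for lexicon terms found in the entry, in lexicon order."""
    hits = []
    for term, reason in lexicon.items():
        # Whole-word, case-insensitive match, so "tribal" does not match "tribalism".
        if re.search(rf"\b{re.escape(term)}\b", entry, flags=re.IGNORECASE):
            hits.append((term, reason))
    return hits


def build_prompt(entry: str) -> str:
    """Assemble the text that would be sent to an LLM for the analysis and suggestion steps."""
    flagged = ", ".join(term for term, _ in flag_terms(entry)) or "none pre-flagged"
    return (
        "You are assisting a museum cataloguer.\n"
        f"Metadata entry: {entry}\n"
        f"Terms already flagged by a simple lexicon check: {flagged}\n"
        "1. Identify problematic language, omissions, and colonial framing.\n"
        "2. Suggest a rewrite. Do not invent provenance; mark unknown details as unknown.\n"
        "3. Offer reflective questions for the cataloguer.\n"
    )


entry = "Primitive tribal mask, Africa, acquired by British expedition in 1892"
print(flag_terms(entry))
print(build_prompt(entry))
```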
---

## Proposed Value

| Category       | How this Application Helps                                                   |
|----------------|------------------------------------------------------------------------------|
| **Learning**   | Teaches users to critically engage with metadata and historical narratives.  |
| **Engagement** | Encourages dialogue between users and communities, AI and metadata.          |
| **Inclusion**  | Brings marginalised voices into view by exposing bias and omissions.         |

---

## Example Use

**Original Metadata**
> "Tribal figurine, South Pacific. Acquired in 1907."

**AI Enhanced Version**
> "Wooden carving representing an ancestor figure, created by the Kanak people of New Caledonia. Acquired by French colonial forces in 1907. Cultural significance and acquisition history under further review."

**Suggested Notes:**
- Replace "tribal" with named community if known.
- Indicate provenance status.
- Consider restitution implications.

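One way to hold an example like this in code is a small record that keeps the original wording next to the suggestion, so the earlier text is never silently overwritten. The sketch below is illustrative only: the `MetadataReview` class, its field names, and the `approved_by_human` flag are assumptions for this example rather than part of the proposal. The flag defaults to `False` so that an AI suggestion stays a draft until a person signs it off.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataReview:
    """One reviewed catalogue entry; the original text is kept alongside the suggestion."""

    original: str
    suggested: str
    notes: list[str] = field(default_factory=list)
    approved_by_human: bool = False  # suggestions remain drafts until someone signs off

    def to_markdown(self) -> str:
        """Render the review in the same layout as the example above."""
        status = "approved" if self.approved_by_human else "draft, awaiting human review"
        lines = [
            "**Original Metadata**",
            f"> {self.original}",
            "",
            f"**AI Enhanced Version** ({status})",
            f"> {self.suggested}",
            "",
            "**Suggested Notes:**",
        ]
        lines += [f"- {note}" for note in self.notes]
        return "\n".join(lines)


review = MetadataReview(
    original="Tribal figurine, South Pacific. Acquired in 1907.",
    suggested=(
        "Wooden carving representing an ancestor figure, created by the Kanak people "
        "of New Caledonia. Acquired by French colonial forces in 1907. Cultural "
        "significance and acquisition history under further review."
    ),
    notes=[
        'Replace "tribal" with named community if known.',
        "Indicate provenance status.",
        "Consider restitution implications.",
    ],
)
print(review.to_markdown())
```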
---

## Critical Reflection

### Potential Benefits
- Encourages **decolonial practices** in institutions
- Helps users identify **language bias** and **historical omissions**
- Scales **inclusive cataloguing** practices using AI assistance 

### Potential Risks
- LLMs may reflect **biases in training data**
- Replacing human expertise or **community consultation** with automation is problematic
- Risk of **institutional "box ticking"** (token use) rather than genuine change

---

## Future Possibilities

- **Community Mode:** Involve source communities in AI fine-tuning or feedback.
- **Annotation Layer:** Integrate with platforms like Recogito for collaborative critique.
- **Gamified Training:** Educational interface for students to "spot the bias."

--- 

## Inspiration and References 

- [National Library of Scotland: Decolonising the Data Foundry](https://data.nls.uk/projects/decolonising-the-data-foundry/)
- Bender, E. et al. (2021). *On the Dangers of Stochastic Parrots.*
- [Whose Heritage? Campaign](https://www.whoseheritage.org)
- Brown, C. (2023). *The Legal and Ethical Challenges of AI Art.*

---

## Conclusion

This LLM application empowers users to reflect critically on the colonial legacies embedded in cultural metadata. While it is not a substitute for human expertise or community consultation, it offers a starting point for inclusive practice, education, and awareness within digital heritage spaces.

**Decolonising the data is not a destination, but a dialogue - and LLMs can help facilitate it.**