# Tidy Tuesday: Selected British Literary Prizes (1990-2022)
*October 28, 2025*

**About the data**
>"This dataset contains primary categories of information on individual authors comprising gender, sexuality, UK residency, ethnicity, geography and details of educational background, including institutions where the authors acquired their degrees and their fields of study. Along with other similar projects, we aim to provide information to assess the cultural, social and political factors determining literary prestige. Our goal is to contribute to greater transparency in discussions around diversity and equity in literary prize cultures."

The dataset covers authors who have won, or been shortlisted for, British literary prizes, and records their demographic and educational backgrounds.

**The Goals**
* In which genres are women, Black, Asian and ethnically diverse writers most likely to be shortlisted and/or awarded?
* Have prizes improved their record on gender and/or ethnic representation in shortlists and awardees?
* Is there a connection between specific educational credentials and/or educational institutions and writers’ chances of being shortlisted or winning?

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
prizes = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-10-28/prizes.csv')
prizes

Unnamed: 0,prize_id,prize_alias,prize_name,prize_institution,prize_year,prize_genre,person_id,person_role,last_name,first_name,...,uk_residence,ethnicity_macro,ethnicity,highest_degree,degree_institution,degree_field_category,degree_field,viaf,book_id,book_title
0,8,Booker Prize,Booker Prize,Booker Foundation,1991,fiction,294,shortlisted,Amis,Martin,...,True,White British,English,Bachelors,University of Oxford,Language and Literature,English Literature,36913662,5,Time's Arrow
1,1,James Tait Black Prize for Fiction,James Tait Black Prize for Fiction,The University of Edinburgh,1991,fiction,33,winner,Boyd,William,...,True,White British,British,unknown,University of Oxford,Language and Literature,English Literature,111500719,36,Brazzaville Beach
2,3,Costa First Novel Award,Whitbread First Novel,Whitbread,1991,fiction,167,winner,Burn,Gordon,...,True,White British,English,none,none,none,none,51988764,42,Alma Cogan
3,8,Booker Prize,Booker Prize,Booker Foundation,1991,fiction,286,shortlisted,Doyle,Roddy,...,False,Irish,Irish,Bachelors,University College Dublin,Multiple,English and Geography,17301306,77,The Van
4,4,Costa Novel Award,Whitbread Novel,Whitbread,1991,fiction,168,winner,Gardam,Jane,...,True,White British,English,Bachelors,University of London,Language and Literature,English Literature,70213168,114,The Queen of the Tambourine
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
947,10,Women's Prize for Fiction,Women's Prize for Fiction,Women's Prize Trust,2022,fiction,192,shortlisted,Shafak,Elif,...,True,Non-UK White,Turkish British,Doctorate,Middle East Technical University,Politics and Economics,Political Science,64320935,367,The Island of Missing Trees
948,11,Gold Dagger,Gold Dagger,The Crime Writers' Association,2022,crime,410,shortlisted,Shaw,William,...,True,White British,English,unknown,unknown,unknown,unknown,,521,The Trawlerman
949,10,Women's Prize for Fiction,Women's Prize for Fiction,Women's Prize Trust,2022,fiction,182,shortlisted,Shipstead,Maggie,...,False,Non-UK White,White American,Masters,University of Iowa,Writing,Creative Writing,231972795,296,Great Circle
950,8,Booker Prize,Booker Prize,Booker Foundation,2022,fiction,176,shortlisted,Strout,Elizabeth,...,False,Non-UK White,White American,Masters,Royal College of Art,unknown,unknown,66631918,316,Oh William!


## Part 1: Python
Analysis in Python

Let's go through the columns to see what is going on.

**Prizes**

In [3]:
# Prizes being represented
print(len(prizes.prize_alias.value_counts()), 'awards:')
print(prizes.prize_alias.value_counts())

15 awards:
prize_alias
Booker Prize                               191
Women's Prize for Fiction                  162
Baillie Gifford Prize for Non-Fiction      141
Gold Dagger                                107
Ted Hughes Award for New Work in Poetry     60
James Tait Black Prize for Fiction          33
BSFA Award for Best Novel                   33
Costa Biography Award                       32
Costa First Novel Award                     31
Costa Novel Award                           31
Costa Children's Book Award                 31
Costa Poetry Award                          31
Costa Book of the Year                      31
TS Eliot Prize                              30
James Tait Black Prize for Drama             8
Name: count, dtype: int64


In [4]:
# Institutions being represented
print(len(prizes.prize_institution.value_counts()), 'institutions:')
print(prizes.prize_institution.value_counts())

10 institutions:
prize_institution
Booker Foundation                               191
Women's Prize Trust                             162
Samuel Johnson Prize for Non-Fiction Limited    141
The Crime Writers' Association                  107
Costa                                            97
Whitbread                                        90
Poetry Society                                   60
The University of Edinburgh                      41
British Science Fiction Association              33
TS Eliot Foundation                              30
Name: count, dtype: int64


**Genres**

In [16]:
# Genres being represented
print(len(prizes.prize_genre.value_counts()), 'genres:')
print(prizes.prize_genre.value_counts())

9 genres:
prize_genre
fiction               448
non-fiction           141
poetry                121
crime                 107
sff                    33
biography              32
children's             31
no/any/multi genre     31
drama                   8
Name: count, dtype: int64


*sff stands for "science fiction and fantasy"*

**Name and Ethnicity**\
Let's combine the first and last names for the authors to create a new column `full_name` for readability.

In [5]:
prizes['full_name'] = prizes['first_name'] + ' ' + prizes['last_name']
prizes.head(3)

Unnamed: 0,prize_id,prize_alias,prize_name,prize_institution,prize_year,prize_genre,person_id,person_role,last_name,first_name,...,ethnicity_macro,ethnicity,highest_degree,degree_institution,degree_field_category,degree_field,viaf,book_id,book_title,full_name
0,8,Booker Prize,Booker Prize,Booker Foundation,1991,fiction,294,shortlisted,Amis,Martin,...,White British,English,Bachelors,University of Oxford,Language and Literature,English Literature,36913662,5,Time's Arrow,Martin Amis
1,1,James Tait Black Prize for Fiction,James Tait Black Prize for Fiction,The University of Edinburgh,1991,fiction,33,winner,Boyd,William,...,White British,British,unknown,University of Oxford,Language and Literature,English Literature,111500719,36,Brazzaville Beach,William Boyd
2,3,Costa First Novel Award,Whitbread First Novel,Whitbread,1991,fiction,167,winner,Burn,Gordon,...,White British,English,none,none,none,none,51988764,42,Alma Cogan,Gordon Burn
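One caveat with building `full_name` via `+`: if either name part is missing, the result is missing too. Whether `prizes` has any missing names is not shown here, so the sketch below uses a toy frame with invented rows to show the behaviour and one possible alternative, `Series.str.cat` with `na_rep`.

```python
import pandas as pd

# Toy rows, invented for illustration; not taken from `prizes`.
toy = pd.DataFrame({
    "first_name": ["Ali", None, "Zadie"],
    "last_name": ["Smith", "Anonymous", None],
})

# Plain concatenation: any missing part makes the whole name missing.
plain = toy["first_name"] + " " + toy["last_name"]

# str.cat with na_rep substitutes an empty string for missing parts;
# str.strip then removes the leftover leading or trailing space.
safe = toy["first_name"].str.cat(toy["last_name"], sep=" ", na_rep="").str.strip()

print(plain.isna().sum(), "missing names with '+'")
print(safe.tolist())
```

If `prizes.full_name.isna().sum()` comes back as zero, the simpler `+` version is perfectly adequate.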


In [6]:
# Any authors appear more than once on the list?
prizes.full_name.value_counts().head(10)

full_name
Ali Smith          9
Hilary Mantel      8
Sebastian Barry    7
Zadie Smith        7
Margaret Atwood    7
Sarah Waters       6
Kate Atkinson      6
Mick Herron        5
Ted Hughes         5
Seamus Heaney      5
Name: count, dtype: int64

*Ali Smith* leads with 9 appearances.\
What about the authors who have the most wins? The most shortlists?

In [42]:
winners_only = prizes[prizes['person_role'] == 'winner'] # Rows for award winners only
winners_only.full_name.value_counts().head(10)

full_name
Seamus Heaney         5
Ted Hughes            5
Sebastian Barry       5
Christopher Priest    4
Kate Atkinson         4
Don Paterson          4
Hilary Mantel         4
Alice Oswald          3
Ian McDonald          3
Adrian Tchaikovsky    3
Name: count, dtype: int64

In [43]:
shortlisted_only = prizes[prizes['person_role'] == 'shortlisted'] # Rows for shortlisted authors only
shortlisted_only.full_name.value_counts().head(10)

full_name
Sarah Waters       6
Ali Smith          6
Margaret Atwood    5
Hilary Mantel      4
Zadie Smith        4
Anne Tyler         4
Abir Mukherjee     4
Mick Herron        4
Colm Tóibín        3
Carol Shields      3
Name: count, dtype: int64

*Ali Smith* ties for first with 6 shortlists, which means only 3 of her 9 appearances are wins. Interesting.
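The win and shortlist tallies can also be read side by side from a single cross-tabulation instead of two filtered frames. A minimal sketch with `pd.crosstab`, run on a toy frame with invented rows; with the real data, `toy` would be replaced by `prizes`.

```python
import pandas as pd

# Toy stand-in for `prizes`; names and roles are invented for illustration.
toy = pd.DataFrame({
    "full_name": ["A. Writer", "A. Writer", "A. Writer", "B. Poet", "B. Poet"],
    "person_role": ["winner", "shortlisted", "shortlisted", "winner", "winner"],
})

# One row per author, one column per role.
role_counts = pd.crosstab(toy["full_name"], toy["person_role"])

# Add the overall number of appearances and sort by it.
role_counts["total"] = role_counts.sum(axis=1)
role_counts = role_counts.sort_values("total", ascending=False)

print(role_counts)
```

Counting on `person_id` instead of `full_name` would be a safer key if two different authors ever shared a name.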

In [7]:
print(prizes.ethnicity_macro.value_counts())
print()
print(prizes.ethnicity.value_counts())

ethnicity_macro
White British         499
Non-UK White          199
Asian                  67
Irish                  53
Jewish                 43
Black British          29
African                23
Caribbean              20
Non-White American     19
Name: count, dtype: int64

ethnicity
English                            217
British                            161
White American                     108
Scottish                            66
Irish                               41
                                  ... 
Irish British                        1
Dutch Jewish American                1
Irish British Australian             1
British Irish Australian             1
Chippewa Ojibwe German American      1
Name: count, Length: 149, dtype: int64


**Education**

In [9]:
print(prizes.highest_degree.value_counts())
print()
print(prizes.degree_institution.value_counts())
print()
print(prizes.degree_field_category.value_counts())
print()
print(prizes.degree_field.value_counts())

highest_degree
Bachelors                   296
unknown                     227
Masters                     208
Doctorate                   157
none                         51
Postgraduate                  6
Juris Doctor                  4
Certificate of Education      1
MD                            1
Diploma                       1
Name: count, dtype: int64

degree_institution
unknown                                  181
University of Oxford                     116
University of Cambridge                   73
none                                      51
University of London                      29
                                        ... 
Southampton University                     1
University of California, Los Angeles      1
Istanbul University                        1
Leeds College of Art and Design            1
Loyola University New Orleans              1
Name: count, Length: 187, dtype: int64

degree_field_category
Language and Literature         272
unknown                   

The first thing I notice is that there are a lot of unknown values in these columns. For now, I will continue with the analysis and attempt to answer the three questions above.
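To put a number on the unknowns, the share of `'unknown'` entries per education column can be computed in one line. A minimal sketch on a toy frame with invented rows; the four column names match the ones printed above, and `'none'` is kept separate from `'unknown'`, as it is in the data.

```python
import pandas as pd

# Toy stand-in for the education columns of `prizes`; values are invented.
toy = pd.DataFrame({
    "highest_degree": ["Bachelors", "unknown", "Masters", "unknown"],
    "degree_institution": ["University of Oxford", "unknown", "none", "University of London"],
    "degree_field_category": ["Language and Literature", "unknown", "none", "unknown"],
    "degree_field": ["English Literature", "unknown", "none", "History"],
})

education_cols = ["highest_degree", "degree_institution",
                  "degree_field_category", "degree_field"]

# eq('unknown') gives a boolean frame; the column mean is the share of unknowns.
unknown_share = toy[education_cols].eq("unknown").mean().sort_values(ascending=False)

print(unknown_share)
```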

From the available data, most authors with a recorded degree hold at least a bachelor's degree, and the most common fields of study relate to literature and writing.

### Questions
**1) In which genres are women, Black, Asian and ethnically diverse writers most likely to be shortlisted and/or awarded?**

I don't see a gender variable in the data, so that part of the analysis is not possible as things stand. A workaround is to look up every author online and infer their gender from the pronouns used. I will do so at the end if time allows.
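The table previews above elide the columns between `first_name` and `uk_residence` (shown as `...`), and the dataset description quoted at the top mentions gender, so it may be worth printing the full column list before ruling a variable out. A minimal sketch on a toy frame; the column names used here, including `gender`, are placeholders and not a claim about what `prizes` actually contains, and `columns_matching` is a hypothetical helper.

```python
import pandas as pd

def columns_matching(df, keyword):
    """Return the column names that contain `keyword`, ignoring case."""
    return [c for c in df.columns if keyword.lower() in str(c).lower()]

# Toy frame with placeholder columns, for illustration only.
toy = pd.DataFrame(columns=["first_name", "last_name", "gender", "uk_residence"])

# The full list is never truncated, unlike the DataFrame preview.
print(toy.columns.tolist())

# Look for a specific variable by name.
print(columns_matching(toy, "gender"))
```

On the real data, `prizes.columns.tolist()` would show every column, and `pd.set_option('display.max_columns', None)` would stop the preview from hiding the middle ones.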

For now, I will focus on the ethnicity part of the question, starting with Black authors.

In [17]:
prizes.ethnicity_macro.value_counts()

ethnicity_macro
White British         499
Non-UK White          199
Asian                  67
Irish                  53
Jewish                 43
Black British          29
African                23
Caribbean              20
Non-White American     19
Name: count, dtype: int64

Above are the possible values in the `ethnicity_macro` column. For the "Black" category, I will start from the labels "Black British", "African", "Caribbean", and "Non-White American".

In [55]:
# Macro labels used for the "Black" category, concatenated in this order
black_labels = ['Black British', 'African', 'Caribbean', 'Non-White American']
black = pd.concat([prizes[prizes['ethnicity_macro'] == label] for label in black_labels])
black.head(3)

Unnamed: 0,prize_id,prize_alias,prize_name,prize_institution,prize_year,prize_genre,person_id,person_role,last_name,first_name,...,ethnicity_macro,ethnicity,highest_degree,degree_institution,degree_field_category,degree_field,viaf,book_id,book_title,full_name
107,3,Costa First Novel Award,Whitbread First Novel Award,Whitbread,1997,fiction,144,winner,Melville,Pauline,...,Black British,British Guyanese,unknown,unknown,unknown,unknown,39508096,216,The Ventriloquist's Tale,Pauline Melville
136,10,Women's Prize for Fiction,Women's Prize for Fiction,Women's Prize Trust,1998,fiction,144,shortlisted,Melville,Pauline,...,Black British,British Guyanese,unknown,unknown,unknown,unknown,39508096,216,The Ventriloquist's Tale,Pauline Melville
195,3,Costa First Novel Award,Whitbread First Novel Award,Whitbread,2000,fiction,22,winner,Smith,Zadie,...,Black British,Jamaican British,unknown,unknown,unknown,unknown,14951645,306,White Teeth,Zadie Smith


Now I check the individual ethnicities to remove any non-Black authors still on the list, for example an Asian American author who came in via "Non-White American" or a North African author who came in via "African".

In [47]:
black.ethnicity.value_counts()

ethnicity
Jamaican British                   15
Black American                     14
British Trinidadian                 5
Nigerian America                    4
British Nigerian                    4
Ghanaian Canadian                   3
Nigerian                            3
Zimbabwean                          3
Jamaican-Chinese British            2
Trinidadian                         2
Jamaican                            2
Kittitian British                   2
British Libyan                      2
British Guyanese                    2
Black British Jamaican              2
Nigerian British                    2
Scottish Sierra Leonan              2
Tanzanian British                   1
Ghanaian American                   1
Ethiopian American                  1
Dominican American                  1
Mexican American                    1
Kenyan British                      1
Nigerian Scottish                   1
Mixed Caribbean                     1
Jamaican Caymanian                  1
Ja

In [None]:
nonblack = ['British Libyan', 'Mexican American', 'Egyptian British', 'South African', 'Chippewa Ojibwe German American']
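
An exclusion list like `nonblack` could then be applied with a negated `isin` filter, after which the remaining rows can be tallied by genre. The follow-up step is not shown in this section, so this is only a sketch of one possible approach, run on a toy frame with invented rows and a shortened list.

```python
import pandas as pd

# Toy stand-in for the `black` frame; rows are invented for illustration.
toy = pd.DataFrame({
    "ethnicity": ["Jamaican British", "British Libyan", "Black American",
                  "Mexican American", "Nigerian"],
    "prize_genre": ["fiction", "fiction", "poetry", "crime", "fiction"],
})

# Shortened exclusion list, for the toy example only.
nonblack = ["British Libyan", "Mexican American"]

# Keep only rows whose fine-grained ethnicity is NOT on the exclusion list.
black_only = toy[~toy["ethnicity"].isin(nonblack)]

# Count the remaining appearances per genre.
genre_counts = black_only["prize_genre"].value_counts()

print(genre_counts)
```

Passing `normalize=True` to `value_counts` would give shares instead of raw counts, which are easier to compare against the overall genre mix.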