# GAP Coreference Dataset

GAP is a gender-balanced dataset containing 8,908 coreference-labeled pairs of (ambiguous pronoun, antecedent name), sampled from Wikipedia and released by Google AI Language for the evaluation of coreference resolution in practical applications.

GitHub: https://github.com/google-research-datasets/gap-coreference

Kaggle Competition: https://www.kaggle.com/c/gap-coreference

## Dataset Description

The GAP dataset release comprises three .tsv files, each with eleven columns.

Each row holds one example with two candidate pairs, so a 4,000-pair file has 2,000 rows. The files are:
 * **test** 4,000 pairs, to be used for official evaluation
 * **development** 4,000 pairs, may be used for model development
 * **validation** 908 pairs, may be used for parameter tuning
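
As a minimal sketch, the three splits could be loaded into one dictionary as shown here. Only `gap-development.tsv` appears in this notebook; the other two file names are assumed to follow the same pattern and should be checked against your download. `SPLIT_FILES` and `load_gap_splits` are hypothetical names chosen for illustration.

```python
from pathlib import Path

import pandas as pd

# Assumed file names; only gap-development.tsv is confirmed by this notebook.
SPLIT_FILES = {
    "development": "gap-development.tsv",
    "validation": "gap-validation.tsv",
    "test": "gap-test.tsv",
}


def load_gap_splits(data_dir="."):
    """Read every GAP split found in data_dir into a DataFrame keyed by split name."""
    splits = {}
    for name, filename in SPLIT_FILES.items():
        path = Path(data_dir) / filename
        if path.exists():  # skip splits that have not been downloaded
            splits[name] = pd.read_csv(path, sep="\t", encoding="utf-8")
    return splits
```

Calling `load_gap_splits('.')` returns only the splits present in the directory, so the helper also works when just the development file has been downloaded.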

The columns contain:

Column | Header         | Description
:-----:|----------------|--------------------------------------------
1      | ID             | Unique identifier for an example (two pairs)
2      | Text           | Text containing the ambiguous pronoun and two candidate names. About a paragraph in length
3      | Pronoun        | The pronoun, text
4      | Pronoun-offset | Character offset of Pronoun in Column 2 (Text)
5      | A              | The first name, text
6      | A-offset       | Character offset of A in Column 2 (Text)
7      | A-coref        | Whether A corefers with the pronoun, TRUE or FALSE
8      | B              | The second name, text
9      | B-offset       | Character offset of B in Column 2 (Text)
10     | B-coref        | Whether B corefers with the pronoun, TRUE or FALSE
11     | URL            | The URL of the source Wikipedia page
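
The offset columns appear to be 0-based character positions into Text, so slicing Text at an offset should reproduce the mention. The sketch below checks this for the first development example (development-1), using the values from that row. The helper name `mention_matches` is hypothetical, and the passage is truncated after the pronoun to keep the example short.

```python
def mention_matches(text, mention, offset):
    """Return True if `mention` appears in `text` starting at character `offset`."""
    return text[offset:offset + len(mention)] == mention


# Opening of the Text field of development-1, truncated here for brevity.
text = (
    "Zoe Telford -- played the police officer girlfriend of Simon, Maggie. "
    "Dumped by Simon in the final episode of series 1, after he slept with "
    "Jenny, and is not seen again. Phoebe Thomas played Cheryl Cassidy, "
    "Pauline's friend and also a year 11 pupil in Simon's class. "
    "Dumped her boyfriend"
)

# (mention, offset) values copied from the development-1 row.
mentions = {
    "Pronoun": ("her", 274),
    "A": ("Cheryl Cassidy", 191),
    "B": ("Pauline", 207),
}

# Check each mention against the slice of text at its offset.
checks = {name: mention_matches(text, m, off) for name, (m, off) in mentions.items()}
print(checks)
```

Running the same check over every row is a cheap way to catch offsets that have shifted after whitespace normalization or re-encoding of the text.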

In [28]:
import pandas as pd

# Load the development split (tab-separated)
gap_dev = pd.read_csv('./gap-development.tsv', sep='\t', encoding='utf-8')

# Number of rows; each row is one example (two candidate pairs)
print("Size of the data:", len(gap_dev))

Size of the data: 2000


In [29]:
# Preview the first ten rows
gap_dev.head(10)

,ID,Text,Pronoun,Pronoun-offset,A,A-offset,A-coref,B,B-offset,B-coref,URL
0,development-1,Zoe Telford -- played the police officer girlf...,her,274,Cheryl Cassidy,191,True,Pauline,207,False,http://en.wikipedia.org/wiki/List_of_Teachers_...
1,development-2,"He grew up in Evanston, Illinois the second ol...",His,284,MacKenzie,228,True,Bernard Leach,251,False,http://en.wikipedia.org/wiki/Warren_MacKenzie
2,development-3,"He had been reelected to Congress, but resigne...",his,265,Angeloz,173,False,De la Sota,246,True,http://en.wikipedia.org/wiki/Jos%C3%A9_Manuel_...
3,development-4,The current members of Crime have also perform...,his,321,Hell,174,False,Henry Rosenthal,336,True,http://en.wikipedia.org/wiki/Crime_(band)
4,development-5,Her Santa Fe Opera debut in 2005 was as Nuria ...,She,437,Kitty Oppenheimer,219,False,Rivera,294,True,http://en.wikipedia.org/wiki/Jessica_Rivera
5,development-6,Sandra Collins is an American DJ. She got her ...,She,411,Collins,236,True,DJ,347,False,http://en.wikipedia.org/wiki/Sandra_Collins
6,development-7,Reb Chaim Yaakov's wife is the sister of Rabbi...,his,273,Reb Asher,152,False,Akiva Eiger,253,False,http://en.wikipedia.org/wiki/Asher_Arieli
7,development-8,Slant Magazine's Sal Cinquemani viewed the alb...,his,337,Greg Kot,173,False,Robert Christgau,377,True,http://en.wikipedia.org/wiki/The_Truth_About_L...
8,development-9,Her father was an Englishman ``of rank and cul...,her,246,Mary Paine,255,False,Kelsey,267,True,http://en.wikipedia.org/wiki/Mary_S._Peake
9,development-10,Shaftesbury's UK partners in the production of...,she,329,Christina Jennings,196,True,Kirstine Stewart,226,False,http://en.wikipedia.org/wiki/Murdoch_Mysteries
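
Row 6 of the preview (development-7) has both A-coref and B-coref set to False, so each example effectively carries one of three outcomes. The following sketch collapses the two columns into a single label and counts the outcomes for the ten previewed rows. The function name `gold_label` and the label strings are illustrative choices, and the code assumes the pronoun never corefers with both names at once.

```python
from collections import Counter


def gold_label(a_coref, b_coref):
    """Collapse the two boolean columns into one label: "A", "B", or "NEITHER"."""
    if a_coref and b_coref:
        # Assumption: the pronoun refers to at most one of the two names.
        raise ValueError("both A-coref and B-coref are True")
    if a_coref:
        return "A"
    if b_coref:
        return "B"
    return "NEITHER"


# (A-coref, B-coref) for the ten rows shown in the preview above.
preview_flags = [
    (True, False), (True, False), (False, True), (False, True), (False, True),
    (True, False), (False, False), (False, True), (False, True), (True, False),
]

label_counts = Counter(gold_label(a, b) for a, b in preview_flags)
print(label_counts)
```

On the full DataFrame, `gap_dev.apply(lambda row: gold_label(row['A-coref'], row['B-coref']), axis=1).value_counts()` should give the same kind of breakdown, provided pandas parsed the two columns as booleans.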


In [30]:
# Grab just the Text column and print the first passage
text = gap_dev['Text']
print(text[0])

Zoe Telford -- played the police officer girlfriend of Simon, Maggie. Dumped by Simon in the final episode of series 1, after he slept with Jenny, and is not seen again. Phoebe Thomas played Cheryl Cassidy, Pauline's friend and also a year 11 pupil in Simon's class. Dumped her boyfriend following Simon's advice after he wouldn't have sex with her but later realised this was due to him catching crabs off her friend Pauline.
