## Inspect clean HUMAN data

In [2]:
import pathlib
import pandas as pd 

In [3]:
path = pathlib.Path.cwd()
data_root = path.parents[1] / "datasets" / "human_datasets"

In [4]:
from inspect_data import load_clean_data, load_raw_data, print_n_rows, print_rows

### DailyDialog

In [5]:
daily_dialog = load_clean_data(data_root, "dailydialog")
raw_daily_dialog = load_raw_data(data_root, "dailydialog")

In [6]:
# Note: the "annotations" column stores the number of turns in each conversation,
# and that count matches the turns shown in "source".
print_n_rows(daily_dialog, 4, "source")
# print_n_rows(daily_dialog, 4, "annotations")

[INFO]: Printing 'source' column
_____
A: okay. this trail looks the best. it's a little steep. but i'm sure it will be alright. 
B: well. you're the tour guide, i'll follow you. 
A: ... what a stink. this place stinks like rotten eggs. 
B: that's sulphur you can smell. the whole of taiwan is a volcanic region. that's why taiwan has so many hot springs. it's volcanic activity.the sulfur smoke that you can smell is coming from those fumaroles over there.
_____
A: today is your birthday. first of all, happy birthday to you!
_____
A: in my wedding ceremony, where do my parents sit in the church? 
B: the bride's parents ' seating arrangement is on the left side of the aisle and the groom's parents is on the right side. 
A: do friends of the bride always sit on one side of the church and friends of the groom on the other?
_____
A: hey, what's the matter? 
B: it doesn't matter. i just feel a little dizzy. 
A: are you sure that this has nothing to do with sleep?


In [7]:
print_n_rows(daily_dialog, 4, "human_completions")

[INFO]: Printing 'human_completions' column
_____
A: suddenly the "great outdoors" isn't so appealing. let's hike a little faster ...
_____
B: thank you for coming. we will have dinner outside. eat up!
_____
B: they usually do.
_____
B: i don't know.


In [8]:
n_row = 50
print(daily_dialog["source"][n_row])
print("\n")
print(daily_dialog["human_completions"][n_row])
print(daily_dialog["annotations"][n_row])

A: i have a terrible toothache. 
B: which tooth is it? 
A: (pointing) this one here. 
B: ah, yes. there's big cavity. 
A: can you fill it? 
B: i'm afraid not. the tooth is too far gone. it'll have to be taken out. 
A: then i might as well have it out now.


B: you'd better wait. the gums are swollen. take the medicine i prescribe and come back in three days.
{'n-turns': 7, 'source-emo': [5, 0, 0, 0, 0, 0, 0], 'comp-emo': 0, 'source-act': [1, 2, 1, 1, 2, 3, 3], 'comp-act': 3}
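
The note that the turn count in `annotations` matches the conversation can be checked programmatically. The following is a minimal sketch, assuming each cleaned turn starts on its own line with an `A: ` or `B: ` prefix (as in the output above) and that `annotations` is a dict with an `n-turns` key. `count_turns` and `turns_match` are hypothetical helpers, not part of `inspect_data`.

```python
import re


def count_turns(source: str) -> int:
    """Count speaker-labelled turns ("A: ..." / "B: ...") in a cleaned dialogue."""
    # Assumption: every turn begins a new line with "A: " or "B: ".
    return len(re.findall(r"^[AB]: ", source, flags=re.MULTILINE))


def turns_match(source: str, annotations: dict) -> bool:
    """Return True when the annotated turn count equals the counted turns."""
    return count_turns(source) == annotations["n-turns"]


# The toothache example printed above (row 50), reproduced as a literal.
example_source = (
    "A: i have a terrible toothache. \n"
    "B: which tooth is it? \n"
    "A: (pointing) this one here. \n"
    "B: ah, yes. there's big cavity. \n"
    "A: can you fill it? \n"
    "B: i'm afraid not. the tooth is too far gone. it'll have to be taken out. \n"
    "A: then i might as well have it out now."
)
example_annotations = {"n-turns": 7}

print(count_turns(example_source))                       # 7
print(turns_match(example_source, example_annotations))  # True
```

If `daily_dialog` is a pandas DataFrame whose `annotations` cells are dicts (an assumption about what `load_clean_data` returns), the same check could be applied row by row with `daily_dialog.apply(lambda row: turns_match(row["source"], row["annotations"]), axis=1).all()`.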


#### Compare with RAW

In [9]:
example = 2
print("EXAMPLE \n ____RAW____")
print(f"SOURCE: \n{raw_daily_dialog['source'][example]}")
print("\n")
print(f"COMPLETION: \n{raw_daily_dialog['human_completions'][example]}")
print("\n")
print("____CLEAN____")
print(f"SOURCE: {daily_dialog['source'][example]}")
print("\n")
print(f"COMPLETION: \n{daily_dialog['human_completions'][example]}")

EXAMPLE 
 ____RAW____
SOURCE: 
in my wedding ceremony, where do my parents sit in the church? [EOT] the bride's parents ' seating arrangement is on the left side of the aisle and the groom's parents is on the right side. [EOT] do friends of the bride always sit on one side of the church and friends of the groom on the other?


COMPLETION: 
they usually do.


____CLEAN____
SOURCE: A: in my wedding ceremony, where do my parents sit in the church? 
B: the bride's parents ' seating arrangement is on the left side of the aisle and the groom's parents is on the right side. 
A: do friends of the bride always sit on one side of the church and friends of the groom on the other?


COMPLETION: 
B: they usually do.
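
The RAW and CLEAN outputs above differ in two visible ways: the `[EOT]` separators become alternating `A:`/`B:` speaker labels, and the completion is labelled with whichever speaker comes next. A minimal sketch of that transformation follows. It is an illustration inferred from the printed output, not the actual code behind `load_clean_data`. `label_turns` and `label_completion` are hypothetical names.

```python
def label_turns(raw_source: str, sep: str = "[EOT]") -> str:
    """Split a raw dialogue on the turn separator and prefix alternating speakers."""
    turns = [turn.strip() for turn in raw_source.split(sep)]
    # Even-indexed turns go to speaker A, odd-indexed turns to speaker B.
    return "\n".join(f"{'AB'[i % 2]}: {turn}" for i, turn in enumerate(turns))


def label_completion(raw_source: str, completion: str, sep: str = "[EOT]") -> str:
    """Prefix the completion with the speaker who follows the last source turn."""
    n_turns = len(raw_source.split(sep))
    return f"{'AB'[n_turns % 2]}: {completion.strip()}"


raw = "where do my parents sit? [EOT] on the left side. [EOT] do friends sit apart?"
print(label_turns(raw))
print(label_completion(raw, "they usually do."))  # B: they usually do.
```

Unlike the cleaned data shown above, this sketch strips the trailing space at the end of each turn.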


### Stories

In [10]:
stories = load_clean_data(data_root, "stories")
raw_stories = load_raw_data(data_root, "stories")

In [11]:
print_n_rows(stories, 10, "source")

[INFO]: Printing 'source' column
_____
your free trial of life has ended
_____
- you either die a hero. . .''
_____
just like old times, huh?''
_____
it all happened so suddenly''
_____
the city at night in the winter.
_____
write a story about your username
_____
when the hero becomes the villain
_____
hit me in the face with emotion.
_____
writing workshop # 12: happiness
_____
humans do not inhabit the earth.


#### Compare with RAW

In [12]:
example = 0
print("EXAMPLE \n ____RAW____")
print(f"SOURCE: \n{raw_stories['source'][example]}")
print("\n")
print(f"COMPLETION: \n{raw_stories['human_completions'][example]}")
print("\n")
print("____CLEAN____")
print(f"SOURCE: {stories['source'][example]}")
print("\n")
print(f"COMPLETION: \n{stories['human_completions'][example]}")

EXAMPLE 
 ____RAW____
SOURCE: 
[ WP ] Your free trial of life has ended



COMPLETION: 
`` Overpopulation '' <newline> <newline> The man in the blue suit clicked a small remote in his hand and the screen filled with images of starving children , their eyes like rough marbles pushed not quite far enough into their heads , fantastic , horizon-filling cities that smoked and glowed like a dying fire , and , of course , the typical image of the Tokyo metro with passengers being crammed through the car doors by impassive uniformed men . <newline> <newline> `` It 's reality . A horrible one . The time has come to rectify it , and the hour for half measures is past . We 're in a position to possibly , '' The man in the blue suit paused , dipping his chin in a show of humility , `` save humanity from itself . '' <newline> <newline> The boardroom was silent . Smoke from various cigarettes and cigars drifted over the massive tabletop . Rain pattered on the window panes . <newline> <newline> `` Me
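
Comparing the raw prompt (`[ WP ] Your free trial of life has ended`) with the cleaned `source` column printed earlier (`your free trial of life has ended`) suggests that the leading tag is dropped and the text is lowercased. A minimal sketch of that step, assuming the tag is always a single bracketed token at the start of the prompt, is shown below. `clean_prompt` is a hypothetical helper, and the handling of `<newline>` tokens in completions is deliberately left out because the cleaned completion is not shown here.

```python
import re


def clean_prompt(raw_prompt: str) -> str:
    """Drop a leading bracketed tag such as "[ WP ]" and lowercase the prompt."""
    # Assumption: the tag is one word inside square brackets at the very start.
    without_tag = re.sub(r"^\s*\[\s*\w+\s*\]\s*", "", raw_prompt)
    return without_tag.strip().lower()


print(clean_prompt("[ WP ] Your free trial of life has ended"))
# your free trial of life has ended
```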

### DailyMail CNN

In [13]:
dailymail_cnn = load_clean_data(data_root, "dailymail_cnn")
raw_dailymail = load_raw_data(data_root, "dailymail_cnn")

#### Compare with RAW (weird and normal examples)

In [14]:
weird = 20
print("WEIRD EXAMPLE \n ____RAW____")
print(raw_dailymail["source"][weird]) 
print(raw_dailymail["human_completions"][weird]) 

print("____CLEAN____")
print(dailymail_cnn["source"][weird])
print("\n")
print(dailymail_cnn["human_completions"][weird])

WEIRD EXAMPLE 
 ____RAW____
By . Hugo Gye . PUBLISHED: . 10:18 EST, 9 April 2013 . | . UPDATED: . 12:48 EST, 9 April 2013 . Harry Potter star Daniel Radcliffe wept today as beloved actor Richard Griffiths was laid to rest at a church where Shakespeare was buried. The Hollywood actor was one of 300 mourners paying their respects to the 65-year-old, best known for his roles in Withnail and I, Pie in the Sky and The History Boys. Mr Radcliffe led tributes to his friend during the funeral service at Holy Trinity church in Stratford-upon-Avon, saying he made any room 'twice as funny'. Laid to rest: Pallbearers carrying the coffin of actor Richard Griffiths at his funeral in Stratford-upon-Avon . Mourners: The actor's Harry Potter co-star Daniel Radcliffe was part of the crowd paying their respects . Friends: Downton Abbey creator Lord Fellowes, left, and actor Nigel Havers, right, attended the service today . Beloved: The 65-year-old actor is known for his roles in Withnail and I, Harry Pot

In [21]:
normal = 40
print("NORMAL EXAMPLE \n ____RAW____")
print(raw_dailymail["source"][normal])
print("\n")
print(raw_dailymail["human_completions"][normal])

print("____CLEAN____")
print(dailymail_cnn["source"][normal])
print("\n")
print(dailymail_cnn["human_completions"][normal])

NORMAL EXAMPLE 
 ____RAW____
At a time when England manager Roy Hodgson badly needs a speedy start to this Euro 2016 campaign, his captain tried to lift the mood by summoning the spirit of a flying Dutchman. Wayne Rooney was discussing the destructive qualities of Raheem Sterling when he likened him to Marc Overmars, and told how he had taken aside England's teen star to study clips of the former Arsenal and Holland striker. 'Before the World Cup I showed him videos of Marc Overmars because he reminded me of him,' said Rooney. 'He has the potential to be as good as Overmars. Bright future: Raheem Sterling was England's best player during Wednesday's 1-0 win against Norway . Similar: Wayne Rooney believes Sterling (left) shares some of the same qualities as Marc Overmars (right) Captain material: Rooney (left) has admitted showing Sterling videos of Overmars before the World Cup . Danger man: Overmars scared opposition defenders with his pace and trickery in attacking areas . 63.6% - En

### MRPC

In [16]:
mrpc = load_clean_data(data_root, "mrpc")
raw_mrpc = load_raw_data(data_root, "mrpc")

In [17]:
example = 40
print("EXAMPLE \n ____RAW____")
print(f"SOURCE: {raw_mrpc['source'][example]}")
print("\n")
print(f"COMPLETION: {raw_mrpc['human_completions'][example]}")

print("____CLEAN____")
print(f"SOURCE: {mrpc['source'][example]}")
print("\n")
print(f"COMPLETION: {mrpc['human_completions'][example]}")

EXAMPLE 
 ____RAW____
SOURCE: The University of Michigan released a new undergraduate admission process Thursday, dropping a point system the U.S. Supreme Court found unconstitutional in June.


COMPLETION: [['The University of Michigan released today a new admissions policy after the U.S. Supreme Court struck down in June the way it previously admitted undergraduates.']]
____CLEAN____
SOURCE: the university of michigan released a new undergraduate admission process thursday, dropping a point system the u.s. supreme court found unconstitutional in june.


COMPLETION: the university of michigan released today a new admissions policy after the u.s. supreme court struck down in june the way it previously admitted undergraduates.
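
In the output above, the raw completion is a nested list (`[['...']]`), while the cleaned completion is a plain lowercase string. A minimal sketch of that unwrapping follows, assuming each nesting level holds the paraphrase as its first element. `flatten_completion` is a hypothetical helper, not the function used by `load_clean_data`.

```python
def flatten_completion(raw_completion) -> str:
    """Unwrap a nested single-item list and lowercase the resulting string."""
    item = raw_completion
    # Descend through list levels, always taking the first element.
    while isinstance(item, list):
        if not item:
            raise ValueError("empty completion list")
        item = item[0]
    return item.lower()


raw_completion = [[
    "The University of Michigan released today a new admissions policy after "
    "the U.S. Supreme Court struck down in June the way it previously admitted "
    "undergraduates."
]]
print(flatten_completion(raw_completion))
```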
