# Sexual Content in Laureates of 2020 Wattys Award

## Getting the data

I used my own [fork](https://github.com/AiKuroyake/Wattpad2Epub) of the [Wattpad2Epub](https://github.com/GatoLoko/Wattpad2Epub) script. Wattpad2Epub lets you download Wattpad stories and save them in EPUB format. I made some minor changes to the code, which is why I used my own version.

After downloading 30 stories from [The Wattys 2020 Award Winners](https://www.wattpad.com/list/996419659-the-2020-wattys-award-winners), I needed to convert the stories from EPUB to plain text so that I could process them with Python. I used [Calibre](https://calibre-ebook.com/) for this task.

## Cleaning the data

Every book starts with an introduction containing a synopsis, a chapter list, and information about the book. I wanted the data to be as clean as possible, so I decided to delete these. Unfortunately, the introduction is slightly different in every book, so I had to go through them manually.

I also wanted to remove the authors' notes from the books. The problem was the same as with the introductions: I had to delete the notes manually.

Almost every story contains images. The following line appears in place of each image: _Oops! This image does not follow our content guidelines. To continue publishing, please remove it or upload a different image._ I removed it using the `replace()` method when reading the file.

## Preparing data

I want to work with a dictionary where the keys are the titles of the stories and the values are the stories themselves.

I need to split each story into paragraphs and convert each paragraph into an nlp object. The problem is that converting paragraphs into nlp objects takes a lot of time, and it is inconvenient to wait half an hour every time I restart the Jupyter Notebook to get the data.

That's why I use the `save_docs()` function to store the converted nlp objects for all books and the `load_docs()` function to load them quickly and easily.
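The save-once, load-afterwards idea can be sketched on its own, independent of spaCy. The helper below is a hypothetical illustration, not part of the notebook: `load_or_build`, `pickle_save`, and `pickle_load` are names made up for this sketch, and the pickle-based helpers only stand in for `save_docs()` and `load_docs()` so that the example runs without any model.

```python
import pickle
from pathlib import Path


def load_or_build(cache_path, build, save, load):
    """Return cached data if cache_path exists; otherwise build, save, and return it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # The slow step has already been done once: just read the result back
        return load(cache_path)
    data = build()          # the expensive step, e.g. converting paragraphs
    save(data, cache_path)  # store the result so the next run can skip it
    return data


# Stand-in helpers (assumption): in the notebook these roles would be played
# by save_docs() and load_docs(); pickle is used here only to keep the sketch
# runnable without spaCy.
def pickle_save(data, path):
    with open(path, 'wb') as fp:
        pickle.dump(data, fp)


def pickle_load(path):
    with open(path, 'rb') as fp:
        return pickle.load(fp)
```

With the notebook's own functions, the `load` argument would need to pass `nlp` along, for example `lambda path: load_docs(path, nlp)`, because `load_docs()` takes two arguments.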

In [76]:
import spacy
from spacy.tokens import DocBin
nlp = spacy.load('en_core_web_sm')

In [31]:
def save_docs(docs, filename):
    '''Save data to a file'''
    # TODO modify the list of attributes as you need
    # the list of available ones is in the table here:
    # https://spacy.io/api/matcher
    doc_bin = DocBin(attrs=["SENT_START","LEMMA",  "ENT_TYPE", "POS", "DEP"])
    for doc in docs:
        doc_bin.add(doc)
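    # 'wb' overwrites an existing file; with 'ab+', re-running the cell would
    # append a second serialized DocBin to the file, which load_docs() reads
    # back as a single DocBin
    with open(filename, 'wb') as fp:
        fp.write(doc_bin.to_bytes())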

In [9]:
def load_docs(filename, nlp):
    '''Load data from a file'''
    doc_bin = None
    with open(filename, 'rb') as fp:
        doc_bin = DocBin().from_bytes(fp.read())
    return list(doc_bin.get_docs(nlp.vocab))

In [10]:
def read_books(story):
    '''Reads a text divided into paragraphs.'''
    pars = []
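    # the stories contain non-ASCII characters, so the encoding is set explicitly
    # (assumes the text files were exported as UTF-8)
    with open(story, encoding='utf-8') as s:
        text = s.read()
        text = text.replace('Oops! This image does not follow our content guidelines. To continue publishing, please remove it or upload a different image.', '')
        # one paragraph per non-empty line
        pars = [p for p in text.split('\n') if p.strip()]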
    return pars

In [77]:
def convert_to_nlp(text):
    '''Converts paragraphs into nlp objects'''
    docs = [nlp(par) for par in text]
    return docs

I hardcoded the stories' names into a variable. It would be more convenient to iterate over the whole data folder and pick one book at a time, but I don't know how to do this in Python.
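One possible way to iterate over a folder is sketched below using the standard `pathlib` module. This is only an illustration, not the approach used in this notebook: `find_stories` and `spacy_name` are hypothetical helper names, and the sketch assumes that every story is a `.txt` file placed directly inside the folder.

```python
from pathlib import Path


def find_stories(folder, suffix='.txt'):
    """Return the files in `folder` that end with `suffix`, sorted by name."""
    # is_file() skips sub-folders; the suffix check skips unrelated files
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix == suffix)


def spacy_name(story_path):
    """Map a path such as 'data/some-title.txt' to 'some-title.spacy'."""
    # .stem is the file name without its folder and without its extension
    return Path(story_path).stem + '.spacy'
```

Calling `find_stories('data')` would return `Path` objects rather than strings. `open()` accepts them directly, and `spacy_name()` gives the same result as the slice `story[5:-4]` followed by `.spacy`, without depending on the length of the folder name.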

In [54]:
stories = ['data/a-bear-in-sheeps-clothing.txt', 
           'data/a-fantasy-real.txt', 
           'data/a-is-for-arson.txt', 
           'data/a-timely-knight.txt', 
           'data/a-twist-of-marvel-infinity-war.txt', 
           'data/bending-the-rules.txt',
           'data/breaking-darkness.txt', 
           'data/charlie-and-dia.txt', 
           'data/comfort-the-wolves.txt',
           'data/dark-side-of-the-morning.txt', 
           'data/for-june.txt', 
           'data/how-to-be-the-best-third-wheel.txt',
           'data/how-to-lose-weight.txt', 
           'data/human-code.txt', 
           'data/inspector-rames.txt',
           'data/jackson-humes-is-not-a-superhero.txt',
           'data/oliver-ausman-lives-again.txt',
           'data/parasomnia.txt',
           'data/running-from-the-past.txt', 
           'data/t-m-i.txt', 
           'data/the-devils-match.txt', 
           'data/the-mosaic-in-her-eyes.txt', 
           'data/the-night-the-vampires-came.txt', 
           'data/the-omen-girl.txt',
           'data/the-painted-altair.txt',
           'data/the-psychopath-next-door.txt',
           'data/valeria-torres.txt',
           'data/we-the-young.txt',
           'data/winners-dont-have-bad-days.txt', 
           'data/zombie-soap.txt']

Split each story into paragraphs. Convert each paragraph into an nlp object. Save the nlp object of each story in a separate file. 

In [55]:
for story in stories:
    print('Reading {}...'.format(story))
    text = read_books(story)
    print('Converting {}...'.format(story))
    nlp_text = convert_to_nlp(text)
    print('Saving {} to {}.spacy...'.format(story, story[5:-4]))
    save_docs(nlp_text, '{}.spacy'.format(story[5:-4]))
    print()

Reading data/a-bear-in-sheeps-clothing.txt...
Converting data/a-bear-in-sheeps-clothing.txt...
Saving data/a-bear-in-sheeps-clothing.txt to a-bear-in-sheeps-clothing.spacy...

Reading data/a-fantasy-real.txt...
Converting data/a-fantasy-real.txt...
Saving data/a-fantasy-real.txt to a-fantasy-real.spacy...

Reading data/a-is-for-arson.txt...
Converting data/a-is-for-arson.txt...
Saving data/a-is-for-arson.txt to a-is-for-arson.spacy...

Reading data/a-timely-knight.txt...
Converting data/a-timely-knight.txt...
Saving data/a-timely-knight.txt to a-timely-knight.spacy...

Reading data/a-twist-of-marvel-infinity-war.txt...
Converting data/a-twist-of-marvel-infinity-war.txt...
Saving data/a-twist-of-marvel-infinity-war.txt to a-twist-of-marvel-infinity-war.spacy...

Reading data/bending-the-rules.txt...
Converting data/bending-the-rules.txt...
Saving data/bending-the-rules.txt to bending-the-rules.spacy...

Reading data/breaking-darkness.txt...
Converting data/breaking-darkness.txt...
Savin

In the `data_spacy` folder, I have collected all the stories, split into paragraphs and converted to nlp objects.

I hardcoded these names, too, because iterating over the folder gave me errors.

In [63]:
nlp_stories = ['data_spacy/a-bear-in-sheeps-clothing.spacy', 
               'data_spacy/a-fantasy-real.spacy', 
               'data_spacy/a-is-for-arson.spacy', 
               'data_spacy/a-timely-knight.spacy', 
               'data_spacy/a-twist-of-marvel-infinity-war.spacy', 
               'data_spacy/bending-the-rules.spacy',
               'data_spacy/breaking-darkness.spacy', 
               'data_spacy/charlie-and-dia.spacy', 
               'data_spacy/comfort-the-wolves.spacy',
               'data_spacy/dark-side-of-the-morning.spacy', 
               'data_spacy/for-june.spacy', 
               'data_spacy/how-to-be-the-best-third-wheel.spacy',
               'data_spacy/how-to-lose-weight.spacy', 
               'data_spacy/human-code.spacy', 
               'data_spacy/inspector-rames.spacy',
               'data_spacy/jackson-humes-is-not-a-superhero.spacy',
               'data_spacy/oliver-ausman-lives-again.spacy',
               'data_spacy/parasomnia.spacy',
               'data_spacy/running-from-the-past.spacy', 
               'data_spacy/t-m-i.spacy', 
               'data_spacy/the-devils-match.spacy', 
               'data_spacy/the-mosaic-in-her-eyes.spacy', 
               'data_spacy/the-night-the-vampires-came.spacy', 
               'data_spacy/the-omen-girl.spacy',
               'data_spacy/the-painted-altair.spacy',
               'data_spacy/the-psychopath-next-door.spacy',
               'data_spacy/valeria-torres.spacy',
               'data_spacy/we-the-young.spacy',
               'data_spacy/winners-dont-have-bad-days.spacy', 
               'data_spacy/zombie-soap.spacy']

I iterate over `nlp_stories` to create a dictionary consisting of a title (key) and the story itself (value). I use the `load_docs()` function to quickly access the processed data. 

In [73]:
docs = {}
for story in nlp_stories:
    doc = load_docs(story, nlp)
    title = str(doc[0][2:])
    docs[title] = doc[1:]

## Sexual Scenes in the Stories

Out of 30 stories, 23 contain the word 'sex':

In [86]:
def containing_sex(docs):
    stories = []
    for key, val in docs.items():
        for par in val:
            for tok in par:
                if tok.lemma_ == 'sex':
                    if key not in stories:
                        stories.append(key)
    return stories

cs = containing_sex(docs)
print(len(cs))

23
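As a possible refinement of the yes/no check above, the sketch below counts how often the lemma occurs in each story instead of only recording whether it occurs. It is a hypothetical addition: `count_lemma`, `fake_par`, and the toy data are made up for illustration, and the code assumes only that each token has a `lemma_` attribute, as spaCy tokens do.

```python
from collections import Counter
from types import SimpleNamespace


def count_lemma(docs, lemma='sex'):
    """Count how many tokens in each story have the given lemma."""
    counts = Counter()
    for title, paragraphs in docs.items():
        counts[title] = sum(1 for par in paragraphs
                            for tok in par
                            if tok.lemma_ == lemma)
    return counts


# Toy stand-in data (assumption): each "paragraph" is a list of objects that
# have a .lemma_ attribute, so the sketch runs without a spaCy model.
def fake_par(*lemmas):
    return [SimpleNamespace(lemma_=lemma) for lemma in lemmas]


toy_docs = {
    'Story A': [fake_par('the', 'sex', 'be'), fake_par('sex')],
    'Story B': [fake_par('no', 'match')],
}
toy_counts = count_lemma(toy_docs)
```

Because the result is a `Counter`, `toy_counts.most_common()` lists the stories from the most to the fewest occurrences. Passing the notebook's `docs` dictionary instead of `toy_docs` should work the same way, since the function only reads `lemma_` from each token.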


The following stories contain at least one occurrence of the word 'sex'. After that cell, I also print the stories without 'sex'.

In [100]:
[s for s in cs]

["A Bear in Sheep's Clothing | Book #1",
 'A Fantasy Real',
 'A Timely Knight',
 'Bending the Rules',
 'Breaking Darkness',
 'Comfort the Wolves',
 'Dark Side of the Morning',
 'for June',
 'How To Be The Best Third Wheel ✔',
 'How To Lose Weight And Survive The Apocalypse',
 'Inspector Rames',
 'Oliver Ausman Lives Again',
 'Parasomnia',
 'RUNNING FROM THE PAST',
 'T.M.I.',
 "The Devil's Match",
 'The Mosaic in Her Eyes',
 'The Night the Vampires Came',
 'The Painted Altar',
 'The Psychopath Next Door',
 'Valeria Torres and the Midas Vault',
 'We The Young',
 'Zombie Soap']

In [99]:
[key for key in docs.keys() if key not in cs]

['A is For Arson: A Langley & Porter Mystery',
 'A Twist Of Marvel Infinity War',
 'Charlie and Dia',
 'Human Code',
 'Jackson Humes is Not a Superhero',
 'THE OMEN GIRL',
 "Winners Don't Have Bad Days"]

However, we can't assume that a story contains sexual scenes just because the word 'sex' occurs in it. And stories in which 'sex' does not occur can still contain sexual scenes.

In [101]:
for key, val in docs.items():
    for par in val:
        for tok in par:
            if tok.lemma_ == 'sex':
                print(key, tok, tok.sent)

A Bear in Sheep's Clothing | Book #1 sex — Léon — — Léon — "Is it that strange I want to have sex to Gotye's State of the Art ?"
A Bear in Sheep's Clothing | Book #1 sex Inside her room, the smell of sex and fennel scented candles burned on her warm skin.
A Bear in Sheep's Clothing | Book #1 sex "It's sex, Rob.
A Bear in Sheep's Clothing | Book #1 sex He wants to have sex with me," Léon said.
A Bear in Sheep's Clothing | Book #1 sex sex "Oh."
A Bear in Sheep's Clothing | Book #1 sex "It'll be like old times, minus the sex, uh?
A Fantasy Real sex "How I make my living isn't all about sex, Livy.
A Fantasy Real sex If those needs include sex, then that's an added bonus I'm able to provide."
A Fantasy Real sex My needs most definitely included sex, and lots of it.
A Fantasy Real sex Ever since then, I had essentially forgone all need for sex in my desire to rise to the top of the meteorology world.
A Fantasy Real sex When I woke up this morning and planned my day, nowhere on my schedule di

A Fantasy Real sex Sure, but one would wonder, despite the great sex and all, after getting to know the real Olivia Morton, why he didn't walk away and not give my crazy ass a wide berth long before he followed me all the way home to Iowa.
A Fantasy Real sex The scintillating tang of sweat and sex filled my nostrils.
A Fantasy Real sex "I've never had unprotected sex before," he said.
A Fantasy Real sex He cringed at the mere thought of me having sex with Dean. "
A Fantasy Real sex Should we have had unprotected sex?"
A Fantasy Real sex She also had unprotected sex the night before.
A Fantasy Real sex "Should we have had unprotected sex?"
A Fantasy Real sex So, I told Rosalyn everything, minus the sex parts.
A Fantasy Real sex I didn't know where or how we would start, but I knew it wouldn't involve one quick night of amazing sex and Cal telling me to goodbye or have a nice life at one in the morning.
A Fantasy Real sex I made it up to you in other ways as well, but I'm not as good at 

Inspector Rames sex please "You mean Riannon came with no strings attached," Alex said, "and a sex drive and stamina your sick wife didn't have."
Inspector Rames sex There definitely hadn't been any illicit sex with Mrs Temple.
Inspector Rames sex He smelled of sex.
Inspector Rames sex It was part of a six-bedroom same-sex flat and locked by a fingerprint system.
Inspector Rames sex I wondered if it was possible for our score to be lowered because we'd had sex before work.
Inspector Rames sex "There's no need to flatter yourself -- quite frankly, we're uninterested in your phone sex.
Inspector Rames sex We'd been rushed into this -- forced into this -- and now Alex was probably thinking that while the sex was good, he hadn't even known me for a year.
Oliver Ausman Lives Again sex I blame Mom and the fact that she insists on making me go to an all-boys school so that the only contact I have with the opposite sex is with the stuck up daughters of her friends.
Parasomnia sex All this, all

The Painted Altar sex Upon discovering the baby's sex, she'd told Mrs Harvey that the name was Mary, like the Holy Mother.
The Psychopath Next Door sex Just because I think she's going to tie me to the bed and kill me in my sleep, " he said pointedly, "doesn't mean I'm obsessed with her or want to have sex with her or anything even remotely like that."
Valeria Torres and the Midas Vault sex She had spent much of her life being afraid of the opposite sex; this was the first time the roles were reversed.
We The Young sex That guy was made of the same things as lightning, cotton candy and sex.
We The Young Sex Sex?
Zombie Soap sex It wasn't the argumentative shrieking she'd had to drown out with her TV more than once or the disgusting, loud make-up sex that almost always followed.
Zombie Soap sex This man was sex on a stick in a police uniform.
Zombie Soap sex I'm not about to let the zombie apocalypse impede me having sex."
Zombie Soap sex This was not the time to be thinking about sex, 