Write a program that splits a document into sentences. The input to your program should be
a file containing text. The output should be a new file with each sentence from the first file on
a separate line.
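
A minimal sketch of the file handling described above, assuming UTF-8 text files. The names `split_sentences` and `split_file` are hypothetical, and the splitter is only a naive placeholder that breaks after every '.', '!' or '?'; any better splitting rule could be dropped in without changing the file handling.

```python
from pathlib import Path


def split_sentences(text):
    """Naive placeholder splitter: break after every '.', '!' or '?'."""
    sentences = []
    start = 0
    for index, character in enumerate(text):
        if character in ".!?":
            sentences.append(text[start:index + 1].strip())
            start = index + 1
    # Keep any trailing text that has no terminator.
    remainder = text[start:].strip()
    if remainder:
        sentences.append(remainder)
    return sentences


def split_file(input_path, output_path):
    """Read input_path and write one sentence per line to output_path."""
    text = Path(input_path).read_text(encoding="utf-8")
    sentences = split_sentences(text)
    Path(output_path).write_text("\n".join(sentences) + "\n", encoding="utf-8")
    return sentences
```

Calling `split_file("input.txt", "output.txt")` (both file names are placeholders) would write the sentences of the first file to the second, one per line.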

In [42]:
# Assume document is the example document

document = """With all the fawning end-of-the-year kudos currently circulating, it’s easy to forget that a sizable number of actual bad movies came out in 2012. Well, consider this a refresher! From failed blockbuster tentpoles (”Battleship”) to would-be hilarious comedies (“The Watch”) to lame scare-challenged horror flicks (“The Apparition”) to...uh, well, pretty much anything involving Mr. Tyler Perry, there’s no doubt that the last 366 days have come with a heaping helping of truly heinous cinematic stinkers. So what better time for an accounting of the year’s most outrageous big-screen abominations than on the eve of the coming apocalypse?"""

There are three common ways to end a sentence:
* Ending with a full stop (e.g. 'I went to work.')
* Ending with a question mark (e.g. 'Did you go to work?')
* Ending with an exclamation mark (e.g. 'I finally finished my work!')


Assuming that a full stop, a question mark or an exclamation mark <u>always</u> marks the end of a sentence produces many incorrect splits.

There are some cases in which the full stop does not mark the end of a sentence:
* When a full stop is used after an abbreviation (e.g. 'I think <u>Dr.</u> Tom is over there.')
* When a full stop is used more than once within an abbreviation (e.g. the abbreviation 'e.g.' itself)
* When a full stop is used as a decimal point in a number (e.g. 'There's only 1.2GB storage left')
* When full stops are used as ellipsis points (e.g. 'I'm not sure about that... maybe?')
* When dates are formatted with full stops (e.g. 'My appointment is on 14.12.2021.')
* When referring to file names (e.g. 'Details can be found in README.txt.')
* When referring to URLs (e.g. 'Find out more at www.website.com.')
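
The cases above can be turned into a rough check for a single full stop. This is only a sketch under stated assumptions: `is_sentence_boundary` and the `ABBREVIATIONS` set are hypothetical names, the abbreviation list is deliberately tiny, and the rules are heuristics that will misjudge some inputs (for example, a sentence that really does end with an abbreviation).

```python
# Hypothetical, deliberately small set of abbreviations (without the final stop).
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "e.g", "i.e"}


def is_sentence_boundary(text, index):
    """Guess whether the full stop at text[index] ends a sentence."""
    previous_char = text[index - 1] if index > 0 else ""
    next_char = text[index + 1] if index + 1 < len(text) else ""

    # Ellipsis points: the stop touches another full stop.
    if previous_char == "." or next_char == ".":
        return False
    # Decimals, dates, file names, URLs and the inner stop of 'e.g.':
    # the stop is followed directly by a letter or digit.
    if next_char.isalnum():
        return False
    # Abbreviations such as 'Dr.': look at the word just before the stop.
    words = text[:index].split()
    preceding_word = words[-1].lstrip("('\"") if words else ""
    if preceding_word.lower() in ABBREVIATIONS:
        return False
    return True
```

For 'I think Dr. Tom is over there.', the check rejects the stop after 'Dr' and accepts the final one.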

Splitting is further complicated by cases such as:
* Sentences that end with a combination of punctuation marks (e.g. 'Oh my God!?')
* Quotations that span multiple statements (e.g. 'I said, "She's too young. She can't get married".')
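
One possible way to handle these two cases is sketched below. It assumes that a run of terminators counts as a single boundary and that nothing inside double quotation marks should be split. The name `split_outside_quotes` is hypothetical, and the quote tracking is a heuristic: it ignores single quotes (which double as apostrophes) and will merge sentences if a quotation is never closed.

```python
TERMINATORS = ".!?"


def split_outside_quotes(text):
    """Split on runs of terminators, but never inside double quotes."""
    sentences = []
    start = 0
    index = 0
    in_quote = False
    while index < len(text):
        character = text[index]
        # Track straight and curly double quotes.
        if character == '"':
            in_quote = not in_quote
        elif character == "“":
            in_quote = True
        elif character == "”":
            in_quote = False

        if character in TERMINATORS and not in_quote:
            end = index + 1
            # Consume the rest of the run, e.g. the '?' in '!?'.
            while end < len(text) and text[end] in TERMINATORS:
                end += 1
            sentences.append(text[start:end].strip())
            start = end
            index = end
        else:
            index += 1

    # Keep any trailing text that has no terminator.
    remainder = text[start:].strip()
    if remainder:
        sentences.append(remainder)
    return sentences
```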

## Generic Sentence Boundary Detection (Splitting on punctuation)

In [45]:
# Collect each sentence in a list; placer marks where the next sentence starts
sentences = []
placer = 0

# Visit every index, including the last character, so the final sentence is kept
for character in range(len(document)):
    if document[character] in ['!', '.', '?']:
        sentences.append(document[placer:character + 1])
        placer = character + 1

for sentence in sentences:
    print(sentence)

With all the fawning end-of-the-year kudos currently circulating, it’s easy to forget that a sizable number of actual bad movies came out in 2012.
 Well, consider this a refresher!
 From failed blockbuster tentpoles (”Battleship”) to would-be hilarious comedies (“The Watch”) to lame scare-challenged horror flicks (“The Apparition”) to.
.
.
uh, well, pretty much anything involving Mr.
 Tyler Perry, there’s no doubt that the last 366 days have come with a heaping helping of truly heinous cinematic stinkers.
 So what better time for an accounting of the year’s most outrageous big-screen abominations than on the eve of the coming apocalypse?


In [17]:
# Helpful link: https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-15-S2-S4

## Sentence Boundary Detection - Round 1 (Removing ellipsis points)

In [None]:
# Treat a punctuation mark as a sentence boundary only if there is no full stop directly before or after it (i.e. it is not part of an ellipsis)

In [64]:
sentences = []
placer = 0

# Visit every index, including the last character, so the final sentence is kept
for character in range(len(document)):
    if document[character] in ['!', '.', '?']:
        if character + 1 >= len(document):
            # Last character of the document: close the final sentence
            sentences.append(document[placer:])
        elif document[character - 1] == '.' or document[character + 1] == '.':
            # Part of an ellipsis: not a sentence boundary
            continue
        else:
            sentences.append(document[placer:character + 1])
            placer = character + 1

In [65]:
for sentence in sentences:
    print(sentence)

print(sentences)

With all the fawning end-of-the-year kudos currently circulating, it’s easy to forget that a sizable number of actual bad movies came out in 2012.
 Well, consider this a refresher!
 From failed blockbuster tentpoles (”Battleship”) to would-be hilarious comedies (“The Watch”) to lame scare-challenged horror flicks (“The Apparition”) to...uh, well, pretty much anything involving Mr.
 Tyler Perry, there’s no doubt that the last 366 days have come with a heaping helping of truly heinous cinematic stinkers.
 So what better time for an accounting of the year’s most outrageous big-screen abominations than on the eve of the coming apocalypse?
['With all the fawning end-of-the-year kudos currently circulating, it’s easy to forget that a sizable number of actual bad movies came out in 2012.', ' Well, consider this a refresher!', ' From failed blockbuster tentpoles (”Battleship”) to would-be hilarious comedies (“The Watch”) to lame scare-challenged horror flicks (“The Apparition”) to...uh, well, pretty much anything involving Mr.', ' Tyler Perry, there’s no doubt that the last 366 days have come with a heaping helping of truly h

## Sentence Boundary Detection - Round 2 (Removing mid-abbreviation full stops)

In [66]:
sentences = []
placer = 0

# Visit every index, including the last character, so the final sentence is kept
for character in range(len(document)):
    if document[character] in ['!', '.', '?']:
        if character + 1 >= len(document):
            # Last character of the document: close the final sentence
            sentences.append(document[placer:])
        elif document[character + 1].isalpha():
            # Followed directly by a letter (as inside 'e.g.'): not a sentence boundary
            continue
        else:
            sentences.append(document[placer:character + 1])
            placer = character + 1

In [67]:
for sentence in sentences:
    print(sentence)

print(sentences)

With all the fawning end-of-the-year kudos currently circulating, it’s easy to forget that a sizable number of actual bad movies came out in 2012.
 Well, consider this a refresher!
 From failed blockbuster tentpoles (”Battleship”) to would-be hilarious comedies (“The Watch”) to lame scare-challenged horror flicks (“The Apparition”) to.
.
.uh, well, pretty much anything involving Mr.
 Tyler Perry, there’s no doubt that the last 366 days have come with a heaping helping of truly heinous cinematic stinkers.
 So what better time for an accounting of the year’s most outrageous big-screen abominations than on the eve of the coming apocalypse?
['With all the fawning end-of-the-year kudos currently circulating, it’s easy to forget that a sizable number of actual bad movies came out in 2012.', ' Well, consider this a refresher!', ' From failed blockbuster tentpoles (”Battleship”) to would-be hilarious comedies (“The Watch”) to lame scare-challenged horror flicks (“The Apparition”) to.', '.', '.uh, well, pretty much anything involving Mr.', ' Tyler Perry, there’s no doubt that the last 366 days have come with a heaping helping 