# Regular Expressions with Functions

## Instructions

* Create a function that counts how many questions are asked in a text file by finding the lines that end with a question mark.
  * Print the number of questions asked in the Alice in Wonderland text file.
  * Print the number of questions asked in the Sherlock Holmes text file.
* Create a function that builds a DataFrame of every word six or more characters long that directly follows a character's name.
* Use the function to retrieve the following:
  * Print the word counts for the character 'Alice'.
  * Print the word counts for the character 'Hatter'.
  * Print the word counts for the character 'Holmes'.
* **Note:** each function should work with either text file.
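
Both tasks come down to one regular expression each. As a minimal sketch, here they are using only Python's standard `re` module on a made-up string, not the notebook's files or pandas. `re.MULTILINE` makes `$` match at the end of every line, and a capture group pulls out the word that follows a name:

```python
import re
from collections import Counter

# Made-up text for illustration only
text = (
    "Who are you?\n"
    "Alice thought it over.\n"
    "What is that?\n"
    "Then Alice replied, and Alice sighed.\n"
)

# Task 1: a "?" immediately before a line break or the end of the text
questions = re.findall(r"\?$", text, flags=re.MULTILINE)
print(len(questions))  # 2

# Task 2: the name, optional whitespace, then a captured word of 6+ word characters
words = re.findall(r"Alice\s*(\w{6,})", text)
print(Counter(words))
```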

```python
import os
import re

import pandas as pd

# Show up to 100 characters per cell when displaying DataFrames
pd.options.display.max_colwidth = 100
```

```python
# Set variables to text files
alice_text = os.path.join("Resources", "alice.txt")
sherlock_text = os.path.join("Resources", "sherlock.txt")
```

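```python
# Create a function to count how many questions are asked in a text,
# i.e. how many lines end with a question mark
def how_many_questions(text_file):
    # Read the file into a one-column DataFrame, one row per line
    # (read_csv does not accept "\n" as a separator in current pandas)
    with open(text_file, encoding="utf-8") as f:
        text_df = pd.DataFrame({"Line Text": f.read().splitlines()})

    # RegEx pattern: a literal "?" anchored to the end of the line
    form = r"\?$"

    # Count the rows whose text matches the pattern
    return int(text_df["Line Text"].str.contains(form).sum())
```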
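
The `$` anchor is strict: only a `?` that is the very last character of the line counts. A small sketch on made-up lines shows what `\?$` misses, namely trailing spaces and a question that closes with a quotation mark. The looser pattern `\?\W*$` (a `?` followed only by non-word characters) is a hypothetical alternative offered for comparison, not the exercise's rule, and using it could change the totals reported for the two books.

```python
import pandas as pd

# Made-up sample lines for illustration only
lines = pd.Series([
    "Who are you?",
    "'What is the use of a book?'",
    "Is it really?   ",
    "I hardly know.",
])

# Strict rule used in how_many_questions: "?" must be the last character
strict = lines.str.contains(r"\?$")

# Looser, hypothetical rule: "?" followed only by non-word characters
# (closing quotes, spaces) up to the end of the line
loose = lines.str.contains(r"\?\W*$")

print(int(strict.sum()))  # 1
print(int(loose.sum()))   # 3
```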

```python
# Print questions asked in the Alice text
how_many_questions(alice_text)
```

```
12
```

```python
# Print questions asked in the Holmes text
how_many_questions(sherlock_text)
```

```
18
```

```python
# Create a function that builds a DataFrame of every word of 6+ characters
# that directly follows a character's name, then counts those words
def character_length(text_file, character):
    # Read the file into a one-column DataFrame, one row per line
    with open(text_file, encoding="utf-8") as f:
        file_df = pd.DataFrame({"Text": f.read().splitlines()})

    # RegEx pattern, built from two capture groups -> ()():
    #   group 0 -> the character's name (re.escape guards against special characters)
    #   group 1 -> optional whitespace (\s*) followed by 6 or more word
    #              characters (\w{6,}); the whitespace stays in the captured text
    pattern = "(" + re.escape(character) + r")(\s*\w{6,})"

    # extractall returns one row per match, with one column per capture group
    results_df = file_df["Text"].str.extractall(pattern)

    # Return how often each following word occurs (column 1 = group 1)
    return results_df[1].value_counts()
```
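
To see what `extractall` hands back, here is a sketch on a few made-up lines (the sample strings are illustrative, not taken from either book). It uses the same pattern as `character_length`, shows that group 1 keeps the leading whitespace captured by `\s*`, and, as one possible way to return an actual DataFrame, turns the counts into two columns. The `.str.strip()` step and the `word`/`count` column names are assumptions, not part of the original function.

```python
import re

import pandas as pd

# Made-up sample lines for illustration only
lines = pd.Series([
    "'Not yet,' Alice replied rather shortly.",
    "Alice thought she might as well wait.",
    "Alice looked at the jury-box.",
    "and Alice replied very politely.",
])

# Same shape of pattern as character_length: name in group 0, word in group 1
pattern = "(" + re.escape("Alice") + r")(\s*\w{6,})"

# One row per match; the index gains an extra level named "match"
matches = lines.str.extractall(pattern)

# Group 1 starts with the whitespace matched by \s*, e.g. " replied"
print(repr(matches.loc[(0, 0), 1]))

# Strip that whitespace before counting (an assumption, not in the original)
counts = matches[1].str.strip().value_counts()

# One way to return a DataFrame rather than a Series (column names are hypothetical)
counts_df = counts.rename_axis("word").reset_index(name="count")
print(counts_df)
```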

```python
# Print the word counts for the character Alice
alice_character = "Alice"
character_length(alice_text, alice_character)
```

```
 thought         11
 looked           8
 replied          8
 ventured         4
 hastily          4
 indignantly      3
 waited           3
 called           2
 whispered        2
 guessed          2
 cautiously       2
 noticed          2
 remarked         2
 opened           1
 herself          1
 angrily          1
 timidly          1
 turned           1
 sighed           1
 desperately      1
 considered       1
 panted           1
 glanced          1
 sharply          1
 watched          1
 appeared         1
 started          1
 thoughtfully     1
 rather           1
 severely         1
 recognised       1
 caught           1
 folded           1
 gently           1
 crouched         1
 quietly          1
 doubtfully       1
 joined           1
 remained         1
 dodged           1
 loudly           1
 laughed          1
Name: 1, dtype: int64
```

```python
# Print the word counts for the character Hatter
hatter_character = 'Hatter'
character_length(alice_text, hatter_character)
```

```
 hurriedly    1
 looked       1
 dropped      1
 grumbled     1
 continued    1
 replied      1
 instead      1
 opened       1
 trembled     1
Name: 1, dtype: int64
```

```python
# Print the word counts for the character Holmes
holmes_character = 'Holmes'
character_length(sherlock_text, holmes_character)
```

 quietly         4
 answered        3
 blandly         3
 turned          3
 sprang          3
 pushed          3
 laughed         3
 returned        3
 leaned          2
 chuckled        2
 cheerily        2
 gravely         2
 suavely         2
 pulled          2
 clapped         2
 walked          2
 refused         1
 calmly          1
 stopped         1
 gently          1
 sternly         1
 lately          1
 interposed      1
 twisted         1
 without         1
 stepped         1
 desired         1
 welcomed        1
 changed         1
 looked          1
 struck          1
 caught          1
 thrust          1
 coldly          1
 rushed          1
 grinned         1
 before          1
 thoughtfully    1
 continued       1
 laying          1
 sweetly         1
 seemed          1
 unlocked        1
 remarked        1
 impatient       1
 nodded          1
 standing        1
 glanced         1
 closed          1
 carelessly      1
 suddenly        1
 scribbled       1
 staggered  