**This code was written by: Sujan Dutta**

Student at Kalyani Govt. Engineering College

Chinsurah, West Bengal, India

**aka: "Normalized Nerd"**
http://bit.ly/normalizedNERD

## Importing tools

In [1]:
import numpy as np
import pandas as pd
import os
import re
import string
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import random

## Reading every Sherlock Holmes adventure!

In [2]:
story_path = "/kaggle/input/sherlock-holmes-stories/sherlock/sherlock/"

def read_all_stories(story_path):
    txt = []
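    for root, _, files in os.walk(story_path):
        for file in files:
            # join with the directory being walked so files in subfolders resolve too
            with open(os.path.join(root, file)) as f:
                for line in f:
                    line = line.strip()
                    # stop reading this file at the '----------' separator line
                    if line == '----------':
                        break
                    if line != '':
                        txt.append(line)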
    return txt
        
stories = read_all_stories(story_path)
print("number of lines = ", len(stories))

number of lines =  215021


## Cleaning the text

In [3]:
def clean_txt(txt):
    cleaned_txt = []
    for line in txt:
        line = line.lower()
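        # note: inside the character class, `=-\\` is a range (from '=' to '\'),
        # so a literal '-' is NOT stripped here; tokens that still contain a
        # hyphen are dropped by the isalpha() filter below
        line = re.sub(r"[,.\"\'!@#$%^&*(){}?/;`~:<>+=-\\]", "", line)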
        tokens = word_tokenize(line)
        words = [word for word in tokens if word.isalpha()]
        cleaned_txt+=words
    return cleaned_txt

cleaned_stories = clean_txt(stories)
print("number of words = ", len(cleaned_stories))

number of words =  2332247
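
A quick standalone check of the punctuation pattern used in `clean_txt` may help here. Inside a character class, `=-\\` is parsed as a range from `=` to `\`, so the hyphen itself is left in place (and uppercase letters would be removed, had the line not been lowercased first). In the sketch below, `ORIGINAL` copies the pattern from the cell above, while `ESCAPED` is a hypothetical variant, not part of the notebook, that escapes the hyphen. Switching to it would change the word count reported above.

```python
import re

# Pattern copied from clean_txt: "=-\\" acts as a range from '=' to '\'.
ORIGINAL = r"[,.\"\'!@#$%^&*(){}?/;`~:<>+=-\\]"
# Hypothetical variant (an assumption, not the author's code): the hyphen is escaped.
ESCAPED = r"[,.\"\'!@#$%^&*(){}?/;`~:<>+=\-\\]"

sample = "well-known, isn't it?"

original_out = re.sub(ORIGINAL, "", sample)  # hyphen survives
escaped_out = re.sub(ESCAPED, "", sample)    # hyphen is removed

print(original_out)
print(escaped_out)
```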


## Creating the Markov Model

In [4]:
def make_markov_model(cleaned_stories, n_gram=2):
    markov_model = {}
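    # each step reads 2*n_gram consecutive words (current state + next state);
    # this bound keeps the last index in range for any n_gram
    for i in range(len(cleaned_stories) - 2*n_gram + 1):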
        curr_state, next_state = "", ""
        for j in range(n_gram):
            curr_state += cleaned_stories[i+j] + " "
            next_state += cleaned_stories[i+j+n_gram] + " "
        curr_state = curr_state[:-1]
        next_state = next_state[:-1]
        if curr_state not in markov_model:
            markov_model[curr_state] = {}
            markov_model[curr_state][next_state] = 1
        else:
            if next_state in markov_model[curr_state]:
                markov_model[curr_state][next_state] += 1
            else:
                markov_model[curr_state][next_state] = 1
    
    # calculating transition probabilities
    for curr_state, transition in markov_model.items():
        total = sum(transition.values())
        for state, count in transition.items():
            markov_model[curr_state][state] = count/total
        
    return markov_model
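
To see what the model looks like on a small input, here is a minimal sketch on an eight-word toy corpus. `toy_markov_model` is a hypothetical, compact re-implementation written only for illustration (it uses `collections.Counter` instead of nested `if`/`else`), and the toy sentence is made up. Its loop bound is `len(words) - 2*n_gram + 1`, which for `n_gram=2` gives the same range as the function above.

```python
from collections import Counter, defaultdict

def toy_markov_model(words, n_gram=2):
    """Hypothetical compact version of make_markov_model, for illustration only."""
    counts = defaultdict(Counter)
    # Each step reads 2*n_gram words: the current state and the state right after it.
    for i in range(len(words) - 2 * n_gram + 1):
        curr_state = " ".join(words[i:i + n_gram])
        next_state = " ".join(words[i + n_gram:i + 2 * n_gram])
        counts[curr_state][next_state] += 1
    # Normalize the counts of each state into transition probabilities.
    model = {}
    for curr_state, transitions in counts.items():
        total = sum(transitions.values())
        model[curr_state] = {state: c / total for state, c in transitions.items()}
    return model

words = "the game is afoot the game is up".split()
toy_model = toy_markov_model(words)
print(toy_model["the game"])
```

In this toy corpus "the game" is followed once by "is afoot" and once by "is up", so each transition gets probability 0.5.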

In [5]:
markov_model = make_markov_model(cleaned_stories)

In [6]:
print("number of states = ", len(markov_model.keys()))

number of states =  208715


In [7]:
print("All possible transitions from 'the game' state: \n")
print(markov_model['the game'])

All possible transitions from 'the game' state: 

{'is hardly': 0.02702702702702703, 'worth it': 0.02702702702702703, 'you are': 0.02702702702702703, 'i am': 0.02702702702702703, 'is up': 0.06306306306306306, 'now count': 0.02702702702702703, 'your letter': 0.02702702702702703, 'for all': 0.06306306306306306, 'is afoot': 0.036036036036036036, 'my own': 0.02702702702702703, 'at any': 0.02702702702702703, 'mr holmes': 0.02702702702702703, 'ay whats': 0.02702702702702703, 'my friend': 0.02702702702702703, 'fairly by': 0.02702702702702703, 'is not': 0.02702702702702703, 'was not': 0.02702702702702703, 'is and': 0.036036036036036036, 'was whist': 0.036036036036036036, 'for the': 0.036036036036036036, 'was in': 0.02702702702702703, 'may wander': 0.02702702702702703, 'now a': 0.02702702702702703, 'was up': 0.09009009009009009, 'would have': 0.036036036036036036, 'in their': 0.036036036036036036, 'in that': 0.036036036036036036, 'the lack': 0.036036036036036036, 'was afoot': 0.0360360360360360
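
The raw dictionary is hard to scan, so sorting the transitions by probability can help. The sketch below uses a hypothetical helper, `top_transitions`, and a made-up `demo_model` so that it runs on its own; with the real model one would pass `markov_model` instead.

```python
def top_transitions(model, state, k=3):
    """Hypothetical helper: the k most probable next states for `state`."""
    ranked = sorted(model[state].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]

# Toy stand-in for markov_model; these probabilities are made up for illustration.
demo_model = {"the game": {"is afoot": 0.25, "was up": 0.5, "is up": 0.25}}

top = top_transitions(demo_model, "the game", k=2)
print(top)
```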

## Generating Sherlock Holmes stories!

In [8]:
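def generate_story(markov_model, limit=100, start='my god'):
    # limit counts transitions, not words; each transition appends one n-gram state
    n = 0
    curr_state = start
    next_state = None
    story = ""
    story+=curr_state+" "
    while n<limit:
        # stop early if this state has no recorded successor (e.g. an unseen start phrase)
        if curr_state not in markov_model:
            break
        next_state = random.choices(list(markov_model[curr_state].keys()),
                                    list(markov_model[curr_state].values()))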
        
        curr_state = next_state[0]
        story+=curr_state+" "
        n+=1
    return story
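
The sampling step relies on `random.choices`, which draws according to the given weights and always returns a list. That is why `generate_story` takes element `[0]`. The sketch below is standalone and uses a made-up two-entry transition table. It shows that seeding makes a draw repeatable and that, over many draws, the observed frequencies approach the weights.

```python
import random

# Toy transition table (made-up probabilities, not taken from the trained model).
transitions = {"is afoot": 0.2, "is up": 0.8}
states = list(transitions.keys())
weights = list(transitions.values())

random.seed(42)  # seeding makes the draw repeatable
first = random.choices(states, weights)

random.seed(42)
second = random.choices(states, weights)

print(first, second)  # two one-element lists holding the same state

# Over many draws, the share of "is up" should be close to its weight of 0.8.
draws = random.choices(states, weights, k=10_000)
share_up = draws.count("is up") / len(draws)
print(round(share_up, 2))
```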

In [9]:
for i in range(20):
    print(str(i)+". ", generate_story(markov_model, start="dear holmes", limit=8))

0.  dear holmes you are not going away in a place for the reception of so distinguished a career 
1.  dear holmes oh yes well we are quite a match and a wealth of deep black hair her 
2.  dear holmes i thought he was not in the least disreputable transaction which touched at pondicherry lodge the 
3.  dear holmes am i right certainly there is your hat and when she raised one hand in his 
4.  dear holmes if i were to be found in the revolver two have been reading a volume but 
5.  dear holmes my previous letters and supposing that the name of steve wilson and let me know if 
6.  dear holmes if i could give us no information which could help it well it is not raining 
7.  dear holmes i exclaimed and this is important for you steve if i get no rest for me 
8.  dear holmes my previous letters and supposing that i was astounded to see miss stapleton sitting upon a 
9.  dear holmes i ejaculated commonplace said holmes though the matter it may be has only just laid there 
10.  dear holmes i e

In [10]:
for i in range(20):
    print(str(i)+". ", generate_story(markov_model, start="my dear", limit=8))

0.  my dear young lady who is scratching at my front door and the crisp smoothness of a and 
1.  my dear watson he is a little off the beaten track isnt it said sir henry had become 
2.  my dear fellow no one can tone down to it and knocked again without my leave with as 
3.  my dear watson to understand that this register and diary may implicate some of the fine tan or 
4.  my dear holmes what can it be let us see if we have given you you have paid 
5.  my dear sir i have been unable to stand erect when my narrative begins i had nursed and 
6.  my dear doctor said holmes cheerily i am overjoyed to see a sister of his should ever have 
7.  my dear watson should have told me as mr ferguson now sit here and there to blair island 
8.  my dear fellow he will guard the patient and i much fear my dear holmes i cant tell 
9.  my dear watson said he the agent of von bork like some fearful dream i have lost my 
10.  my dear watson it is a most mysterious business as a rule quite distinctive i h

In [11]:
for i in range(20):
    print(str(i)+". ", generate_story(markov_model, start="i would", limit=8))

0.  i would risk a little sporting flutter that you dont know what to make a jury even if 
1.  i would rather die under my command and i was too terrified to raise no hand and speak 
2.  i would have it is too low and the wardrobe that is two days later this same performance 
3.  i would only ask you not see him but would he come with you said the inspector here 
4.  i would part with it money would be impossible to effect a rescue the alarm however was given 
5.  i would sooner have a savage blow from a blunt instrument which had crushed in part of his 
6.  i would be after her now bless the girl what are you not conducting the case before you 
7.  i would ask you to humor me a kind o thick and broad hasp wrought in the image 
8.  i would see you really have not time to put up the last point in this way asked 
9.  i would rather swing a score of them went off to his statement and that she had recovered 
10.  i would always be the master of the sparest and his habits were simple to the

In [12]:
print(generate_story(markov_model, start="the case", limit=100))

the case was concerned with an explosive energy which told me that i was at such a sight as that the cipher macdonald sat with his hands in a paroxysm of energy and he stretched his legs as he raced past and her refusal to take her there are few in that chair i can see terror in the coal district in the bare assembly room the men he is armed with a pair precisely but this is certainly a most extraordinary fashion a letter arrived on march his death in the same way i thought of my presence then with a head that was even greater than the obtrusive emotion of the clergyman he sat for some little time no said i only meant sir that sir charles was the elder man first the younger brother and i said i could rely upon my assistance in the enclosed official report it was quite against my wishes twice my boy jack baby dolores and mrs mason to bring news of the outside briarbrae just after sunset well i think we will walk down together to our left or to a small blue book which ascends to such rar