
In [55]:
!pip install nltk



In [56]:
!pip install newspaper3k



In [0]:
from newspaper import Article
import random
import string
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import nltk
import numpy as np
import warnings


In [0]:
#Ignore any warning messages
warnings.filterwarnings('ignore')

In [59]:
nltk.download('punkt', quiet=True)
nltk.download('wordnet', quiet=True)

True

In [60]:
#Download and parse the article
article = Article('https://www.mayoclinic.org/diseases-conditions/chronic-kidney-disease/symptoms-causes/syc-20354521')
article.download()
article.parse()
article.nlp()  #Computes keywords and a summary; only article.text is used below
corpus = article.text

#Print the corpus/text
print(corpus)

Overview

Chronic kidney disease, also called chronic kidney failure, describes the gradual loss of kidney function. Your kidneys filter wastes and excess fluids from your blood, which are then excreted in your urine. When chronic kidney disease reaches an advanced stage, dangerous levels of fluid, electrolytes and wastes can build up in your body.

In the early stages of chronic kidney disease, you may have few signs or symptoms. Chronic kidney disease may not become apparent until your kidney function is significantly impaired.

Treatment for chronic kidney disease focuses on slowing the progression of the kidney damage, usually by controlling the underlying cause. Chronic kidney disease can progress to end-stage kidney failure, which is fatal without artificial filtering (dialysis) or a kidney transplant.

Chronic kidney disease care at Mayo Clinic

How kidneys work

Symptoms

Signs and symptoms of chronic kidney disease develop over time if kidney damage progresses slowly. Signs an

## Create tokens of sentences (bag of sentences)

In [61]:
#Tokenization
text = corpus
sentence_tokens = nltk.sent_tokenize(text) # Tokenize sentences (not words). Convert the text into a list of sentences

#Print the list of sentences
print(sentence_tokens)

['Overview\n\nChronic kidney disease, also called chronic kidney failure, describes the gradual loss of kidney function.', 'Your kidneys filter wastes and excess fluids from your blood, which are then excreted in your urine.', 'When chronic kidney disease reaches an advanced stage, dangerous levels of fluid, electrolytes and wastes can build up in your body.', 'In the early stages of chronic kidney disease, you may have few signs or symptoms.', 'Chronic kidney disease may not become apparent until your kidney function is significantly impaired.', 'Treatment for chronic kidney disease focuses on slowing the progression of the kidney damage, usually by controlling the underlying cause.', 'Chronic kidney disease can progress to end-stage kidney failure, which is fatal without artificial filtering (dialysis) or a kidney transplant.', 'Chronic kidney disease care at Mayo Clinic\n\nHow kidneys work\n\nSymptoms\n\nSigns and symptoms of chronic kidney disease develop over time if kidney damage
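The first token above fuses the heading "Overview" with the first sentence, because the sentence tokenizer does not treat blank lines as sentence boundaries. One possible workaround, sketched below with the standard library only, is to split the text on blank lines and drop heading-like paragraphs before sentence tokenization. The helper names and the heading heuristic (a single line with no closing punctuation) are assumptions for illustration, not part of the notebook.

```python
import re

def split_paragraphs(text):
    """Split text on blank lines and strip surrounding whitespace."""
    parts = re.split(r"\n\s*\n", text)
    return [p.strip() for p in parts if p.strip()]

def looks_like_heading(paragraph):
    """Heuristic (an assumption): a heading is one line with no closing punctuation."""
    return "\n" not in paragraph and not paragraph.endswith((".", "!", "?", ":"))

def body_paragraphs(text):
    """Keep only the paragraphs that do not look like headings."""
    return [p for p in split_paragraphs(text) if not looks_like_heading(p)]

sample = (
    "Overview\n\n"
    "Chronic kidney disease describes the gradual loss of kidney function. "
    "Your kidneys filter wastes.\n\n"
    "Symptoms\n\n"
    "Signs develop over time."
)

for paragraph in body_paragraphs(sample):
    print(paragraph)
```

Each remaining paragraph could then be passed to `nltk.sent_tokenize` separately, so that headings never end up inside a candidate answer.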

In [62]:
#Create a translation table: a dictionary that maps the code point of every
#punctuation character to None, so that str.translate() deletes punctuation.
#The ord() function returns the integer Unicode code point of a character.

remove_punct_dict = dict(  ( ord(punct),None) for punct in string.punctuation)

#Print the punctuations
print(string.punctuation)

#Print the dictionary
print(remove_punct_dict)

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
{33: None, 34: None, 35: None, 36: None, 37: None, 38: None, 39: None, 40: None, 41: None, 42: None, 43: None, 44: None, 45: None, 46: None, 47: None, 58: None, 59: None, 60: None, 61: None, 62: None, 63: None, 64: None, 91: None, 92: None, 93: None, 94: None, 95: None, 96: None, 123: None, 124: None, 125: None, 126: None}
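The same table can also be built with `str.maketrans`, whose third argument lists the characters to delete. The sketch below shows this equivalent form and one side effect worth knowing: deleting hyphens merges hyphenated words. The names `punct_table` and `strip_punctuation` are illustrative only.

```python
import string

# The third argument of str.maketrans lists characters to delete.
punct_table = str.maketrans("", "", string.punctuation)

def strip_punctuation(text):
    """Delete every ASCII punctuation character from text."""
    return text.translate(punct_table)

# The hyphen is deleted too, so "End-stage" becomes "Endstage".
print(strip_punctuation("End-stage kidney failure (dialysis)?"))
```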


## Create a function for tokenizing individual words, to be used when vectorizing sentences

In [63]:
#Create a function to return a list of lower case words after removing punctuation.
#Note: despite its name, LemNormalize only tokenizes; it does not lemmatize.
def LemNormalize(text):
  # translate() deletes every character that the translation table maps to None
  # (here, all punctuation).
  return nltk.word_tokenize(text.lower().translate(remove_punct_dict))

#Print the tokenized text
print(LemNormalize(text))

['overview', 'chronic', 'kidney', 'disease', 'also', 'called', 'chronic', 'kidney', 'failure', 'describes', 'the', 'gradual', 'loss', 'of', 'kidney', 'function', 'your', 'kidneys', 'filter', 'wastes', 'and', 'excess', 'fluids', 'from', 'your', 'blood', 'which', 'are', 'then', 'excreted', 'in', 'your', 'urine', 'when', 'chronic', 'kidney', 'disease', 'reaches', 'an', 'advanced', 'stage', 'dangerous', 'levels', 'of', 'fluid', 'electrolytes', 'and', 'wastes', 'can', 'build', 'up', 'in', 'your', 'body', 'in', 'the', 'early', 'stages', 'of', 'chronic', 'kidney', 'disease', 'you', 'may', 'have', 'few', 'signs', 'or', 'symptoms', 'chronic', 'kidney', 'disease', 'may', 'not', 'become', 'apparent', 'until', 'your', 'kidney', 'function', 'is', 'significantly', 'impaired', 'treatment', 'for', 'chronic', 'kidney', 'disease', 'focuses', 'on', 'slowing', 'the', 'progression', 'of', 'the', 'kidney', 'damage', 'usually', 'by', 'controlling', 'the', 'underlying', 'cause', 'chronic', 'kidney', 'disease'

## Keyword matching for some static cases like greetings and goodbyes

In [0]:
#Keyword Matching

#Greeting Inputs
GREETING_INPUTS = ["hi", "hello", "hola", "greetings", "wassup", "hey"]

#Greeting responses back to the user
GREETING_RESPONSES=["howdy", "hi", "hey", "what's good", "hello", "hey there"]

#Function to return a random greeting response to a user's greeting
def greeting(sentence):
  #If any word of the user's input is a greeting, return a randomly chosen greeting response
  #(the function implicitly returns None otherwise)
  for word in sentence.split():
    if word.lower() in GREETING_INPUTS:
      return random.choice(GREETING_RESPONSES)
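Because the function above compares whole whitespace-separated words, an input such as "Hello!" does not match "hello". A minimal variant that strips punctuation from each word first might look like the following; `greeting_reply` is a hypothetical name, and the two lists are copied from the cell above (the inputs as a set for faster lookup).

```python
import random
import string

GREETING_INPUTS = {"hi", "hello", "hola", "greetings", "wassup", "hey"}
GREETING_RESPONSES = ["howdy", "hi", "hey", "what's good", "hello", "hey there"]

def greeting_reply(sentence):
    """Return a random greeting if any word is a known greeting, else None."""
    for word in sentence.lower().split():
        # Strip leading/trailing punctuation so "Hello!" and "hey," still match.
        if word.strip(string.punctuation) in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
    return None

print(greeting_reply("Hello!"))        # one of GREETING_RESPONSES
print(greeting_reply("What is CKD?"))  # None
```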

In [65]:
# Take the user's query and append it to the bag of sentences

#The user's query
user_response = 'What is chronic kidney disease'

user_response = user_response.lower()

#Print the user's query
print(user_response)

#Set the chatbot response to an empty string
robo_response = ''

#Append the user's query to the sentence list (it is removed again once the response is built)
sentence_tokens.append(user_response)

#Print the sentence list after appending the user's query
print(sentence_tokens)


what is chronic kidney disease
['Overview\n\nChronic kidney disease, also called chronic kidney failure, describes the gradual loss of kidney function.', 'Your kidneys filter wastes and excess fluids from your blood, which are then excreted in your urine.', 'When chronic kidney disease reaches an advanced stage, dangerous levels of fluid, electrolytes and wastes can build up in your body.', 'In the early stages of chronic kidney disease, you may have few signs or symptoms.', 'Chronic kidney disease may not become apparent until your kidney function is significantly impaired.', 'Treatment for chronic kidney disease focuses on slowing the progression of the kidney damage, usually by controlling the underlying cause.', 'Chronic kidney disease can progress to end-stage kidney failure, which is fatal without artificial filtering (dialysis) or a kidney transplant.', 'Chronic kidney disease care at Mayo Clinic\n\nHow kidneys work\n\nSymptoms\n\nSigns and symptoms of chronic kidney disease dev

## Create a TF-IDF matrix from the bag of sentences, with normalized weights

In [66]:
#Create a TfidfVectorizer Object
TfidfVec = TfidfVectorizer(tokenizer = LemNormalize, stop_words='english')

#Convert the text to a matrix of TF-IDF features
tfidf = TfidfVec.fit_transform(sentence_tokens)

###Print the TFIDF features
print(tfidf)

  (0, 90)	0.2164280749596735
  (0, 125)	0.3107364655225348
  (0, 94)	0.3558667125853599
  (0, 53)	0.3558667125853599
  (0, 77)	0.3107364655225348
  (0, 28)	0.3107364655225348
  (0, 60)	0.1315981644259779
  (0, 117)	0.39479449327793364
  (0, 40)	0.3322398458510612
  (0, 153)	0.3558667125853599
  (1, 220)	0.31359416197260925
  (1, 75)	0.4003993499796609
  (1, 21)	0.26281638212043423
  (1, 87)	0.4003993499796609
  (1, 74)	0.4003993499796609
  (1, 229)	0.34962157012748585
  (1, 84)	0.4003993499796609
  (1, 118)	0.26281638212043423
  (2, 22)	0.28491956414971
  (2, 26)	0.3263002572765562
  (2, 68)	0.3263002572765562
  (2, 86)	0.25555949513226306
  (2, 121)	0.28491956414971
  (2, 51)	0.3263002572765562
  (2, 197)	0.3263002572765562
  :	:
  (15, 184)	0.16527716427117542
  (15, 115)	0.09213255328993324
  (15, 164)	0.07527448599501078
  (15, 97)	0.08263858213558771
  (15, 127)	0.09213255328993324
  (15, 95)	0.16527716427117542
  (15, 205)	0.09213255328993324
  (15, 52)	0.18426510657986647
  (15,
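Each printed entry has the form `(sentence index, vocabulary index)  weight`. To make the weights less opaque, here is a pure-Python sketch of the computation, assuming the `TfidfVectorizer` defaults (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each row). It uses plain whitespace tokenization and no stop-word removal, which is a simplification of the notebook's setup, and `tfidf_rows` is a hypothetical name.

```python
import math
from collections import Counter

def tfidf_rows(docs):
    """Return one {term: weight} dict per document, each L2-normalized."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: c * idf[t] for t, c in counts.items()}
        # Guard against empty documents, whose norm would be zero.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({t: w / norm for t, w in weights.items()})
    return rows

docs = ["kidney disease symptoms", "kidney function", "kidney disease treatment"]
for row in tfidf_rows(docs):
    print({t: round(w, 3) for t, w in row.items()})
```

In this toy corpus "kidney" appears in every document, so it receives the lowest weight in each row, while a term unique to one document (such as "symptoms") receives the highest.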

In [67]:
#Get the measure of similarity (similarity scores)
vals = cosine_similarity(tfidf[-1], tfidf)

#Print the similarity scores
print(vals)

[[0.49892678 0.         0.22873724 0.38282205 0.43356564 0.29959479
  0.37653755 0.4320342  0.11311574 0.         0.16503485 0.15926757
  0.         0.41485439 0.36071879 0.12349944 1.        ]]
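In the scores above, the final 1.0 is the query compared with itself, and each 0 marks a sentence that shares no remaining term with the query. The sketch below computes cosine similarity by hand for sparse vectors stored as dictionaries; the function name and the example weights are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        # A vector with no terms is treated as similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)

query = {"chronic": 0.6, "kidney": 0.8}

print(cosine(query, query))             # self-match, approximately 1
print(cosine(query, {"urine": 1.0}))    # no shared terms, so 0.0
print(cosine(query, {"kidney": 1.0}))   # one shared term, approximately 0.8
```

When every row is already L2-normalized, as with the TF-IDF matrix here, both norms are 1 and the cosine reduces to the dot product.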


In [68]:

# Get the index of the sentence most similar to the user's query.
# argsort() sorts in ascending order; the last entry is the appended query itself (similarity 1),
# so the best match from the corpus is the second-to-last entry.
idx = vals.argsort()[0][-2]
print(idx)

  

0
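Taking the second-to-last entry of `argsort()` relies on the query's match with itself being the unique maximum. If a corpus sentence is identical to the query, the two tie at 1.0 and the second-to-last index may be the query's own. A more explicit alternative, sketched here on a plain list with a hypothetical helper `best_match`, drops the self-match before searching.

```python
def best_match(scores):
    """Return (index, score) of the best corpus sentence.

    `scores` holds the query's similarity to each corpus sentence, followed by
    the query's similarity to itself as the final element.
    """
    corpus_scores = scores[:-1]  # drop the self-match
    if not corpus_scores:
        return None, 0.0
    idx = max(range(len(corpus_scores)), key=corpus_scores.__getitem__)
    return idx, corpus_scores[idx]

scores = [0.49892678, 0.0, 0.22873724, 0.38282205, 1.0]
print(best_match(scores))  # (0, 0.49892678)
```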


In [69]:
#Reduce the dimensionality of vals
flat = vals.flatten()
print(flat)

[0.49892678 0.         0.22873724 0.38282205 0.43356564 0.29959479
 0.37653755 0.4320342  0.11311574 0.         0.16503485 0.15926757
 0.         0.41485439 0.36071879 0.12349944 1.        ]


In [70]:
#sort the list in ascending order
flat.sort()
print(flat)

[0.         0.         0.         0.11311574 0.12349944 0.15926757
 0.16503485 0.22873724 0.29959479 0.36071879 0.37653755 0.38282205
 0.41485439 0.4320342  0.43356564 0.49892678 1.        ]


In [71]:
#Get the most similar score to the users response
score = flat[-2]

#Print the similarity score
print(score)

0.4989267784088469


In [73]:
#If 'score' is 0, no sentence shares a term with the user's query.
#Assign (rather than append to) robo_response so that re-running this cell does not repeat the answer.
if score == 0:
  robo_response = "I apologize, I don't understand."
else:
  robo_response = sentence_tokens[idx]
  
#Print the chat bot response
print(robo_response)
  
#Remove the users response from the sentence tokens list
sentence_tokens.remove(user_response)
#print(sentence_tokens)

Overview

Chronic kidney disease, also called chronic kidney failure, describes the gradual loss of kidney function.


## Wrapped into a function

In [0]:
#Generate the response
def response(user_response):
  #Make the query lower case
  user_response = user_response.lower()

  #Append the user's query to the sentence list so that it is vectorized with the corpus
  sentence_tokens.append(user_response)

  #Create a TfidfVectorizer object
  TfidfVec = TfidfVectorizer(tokenizer = LemNormalize, stop_words='english')

  #Convert the text to a matrix of TF-IDF features
  tfidf = TfidfVec.fit_transform(sentence_tokens)

  #Get the similarity of the query (the last row) to every sentence
  vals = cosine_similarity(tfidf[-1], tfidf)

  #Get the index of the sentence most similar to the user's query
  #(the last entry after sorting is the query itself, so take the second-to-last)
  idx = vals.argsort()[0][-2]

  #Flatten vals to one dimension and sort it in ascending order
  flat = vals.flatten()
  flat.sort()

  #Get the highest similarity score other than the query's match with itself
  score = flat[-2]

  #If 'score' is 0, there is no sentence similar to the user's query
  if score == 0:
    robo_response = "I apologize, I don't understand."
  else:
    robo_response = sentence_tokens[idx]

  #Remove the user's query from the sentence list
  sentence_tokens.remove(user_response)

  return robo_response
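The function above re-fits the vectorizer on the whole corpus for every query and temporarily mutates the sentence list. A possible alternative, sketched here, fits the vectorizer once and only transforms each query. Note the assumptions: `SentenceIndex` is a hypothetical class, it uses the vectorizer's default tokenizer rather than `LemNormalize`, and because the query no longer contributes to the IDF statistics, scores may differ slightly from the notebook's.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class SentenceIndex:
    """Fit TF-IDF once on the corpus, then answer queries by nearest sentence."""

    def __init__(self, sentences):
        self.sentences = list(sentences)
        self.vectorizer = TfidfVectorizer(stop_words="english")
        self.matrix = self.vectorizer.fit_transform(self.sentences)

    def answer(self, query, fallback="I apologize, I don't understand."):
        # transform() reuses the fitted vocabulary; unknown words are ignored.
        query_vec = self.vectorizer.transform([query.lower()])
        scores = cosine_similarity(query_vec, self.matrix).ravel()
        best = int(np.argmax(scores))
        return self.sentences[best] if scores[best] > 0 else fallback

index = SentenceIndex([
    "Your kidneys filter wastes and excess fluids from your blood.",
    "Treatment focuses on slowing the progression of the kidney damage.",
])
print(index.answer("What is the treatment?"))
```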

In [15]:
flag = True
print("DOCBot: I am Doctor Bot or DOCBot for short. I will answer your queries about Chronic Kidney Disease. If you want to exit, type Bye!")
while flag:
  user_response = input().lower()
  if user_response == 'bye':
    flag = False
    print("DOCBot: Chat with you later !")
  elif user_response in ('thanks', 'thank you'):
    flag = False
    print("DOCBot: You are welcome !")
  else:
    #Call greeting() only once: it picks a random reply, so a second call could return a different one
    greeting_reply = greeting(user_response)
    if greeting_reply is not None:
      print("DOCBot: " + greeting_reply)
    else:
      print("DOCBot: " + response(user_response))

DOCBot: I am Doctor Bot or DOCBot for short. I will answer your queries about Chronic Kidney Disease. If you want to exit, type Bye!
hello
DOCBot: hello
Bye
DOCBot: Chat with you later !
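A loop built around `input()` is awkward to test. One option is to move the per-turn decision into a function that returns the reply together with a flag saying whether to continue. The sketch below does this with injected callables; `handle_turn`, `stub_greet`, and `stub_answer` are hypothetical names, and the stubs merely stand in for the notebook's `greeting()` and `response()`.

```python
def handle_turn(user_input, greet, answer):
    """Return (reply, keep_going) for one line of user input.

    `greet` and `answer` are injected callables; `greet` must return None
    when the input contains no greeting.
    """
    text = user_input.strip().lower()
    if text == "bye":
        return "Chat with you later !", False
    if text in ("thanks", "thank you"):
        return "You are welcome !", False
    greeting_reply = greet(text)
    if greeting_reply is not None:
        return greeting_reply, True
    return answer(text), True

# Stub callables, for illustration only.
def stub_greet(text):
    return "hello" if "hello" in text.split() else None

def stub_answer(text):
    return "(answer for: " + text + ")"

for line in ["hello", "What is chronic kidney disease", "Bye"]:
    reply, keep_going = handle_turn(line, stub_greet, stub_answer)
    print("DOCBot: " + reply)
    if not keep_going:
        break
```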
