# ChatBot - Foul Word Detection

## Abstract

Chatbots, also known as conversational interfaces, offer a new way for people to interact with computer systems. Traditionally, getting a question answered by a software program meant using a search engine or filling out a form. A chatbot lets a user simply ask a question the same way they would ask a human. The best-known chatbots today are voice assistants such as Alexa and Siri, but chatbots are also being adopted rapidly on text-based chat platforms.

The technology at the core of the rise of the chatbot is natural language processing (“NLP”). Recent advances in machine learning have greatly improved the accuracy and effectiveness of NLP, making chatbots a viable option for many organizations. This improvement is fueling a great deal of additional research, which should lead to continued improvement in the effectiveness of chatbots in the years to come.

We first evaluate different chatbots and then create our own. The evaluation is done on a hit-or-miss basis. The goal is to integrate a chatbot with any website. For now the chatbot is domain specific, but it can be extended to scale across other platforms. We will use Rasa NLU, with intent classification and entity extraction, to understand questions correctly and to handle foul language.

Foul words are detected with a profanity filter, which is implemented below.

## What is Rasa NLU?

Rasa NLU is an open source NLP library for intent classification and entity extraction. You can think of it as a set of high-level APIs for building your own language parser using existing NLP and ML libraries.
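To make "intent classification" and "entity extraction" concrete, here is a toy sketch in plain Python. It is not Rasa NLU's API or algorithm: Rasa learns these mappings from training data, whereas the keyword sets, the `number` entity, and the function names below are all hypothetical and hand-written for illustration.

```python
import re

# Hand-written keyword sets (an assumption for this sketch; Rasa NLU learns from examples).
INTENT_KEYWORDS = {
    "greet": {"hello", "hi", "hey"},
    "goodbye": {"bye", "goodbye"},
    "mood_unhappy": {"sad", "unhappy", "bad"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify_intent(text):
    """Return the intent whose keyword set overlaps most with the text, or None."""
    tokens = set(tokenize(text))
    best_intent, best_overlap = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        overlap = len(tokens & keywords)
        if overlap > best_overlap:
            best_intent, best_overlap = intent, overlap
    return best_intent


def extract_entities(text):
    """Extract a hypothetical 'number' entity: every run of digits in the text."""
    return [{"entity": "number", "value": m} for m in re.findall(r"\d+", text)]


def parse(text):
    """Combine both steps into one result dictionary."""
    return {
        "text": text,
        "intent": classify_intent(text),
        "entities": extract_entities(text),
    }


print(parse("hey there"))
print(parse("I feel sad after 3 days"))
```

The point is only the shape of the result: a message goes in, and an intent label plus a list of entities comes out.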

## Why Rasa NLU?

1. We don’t have to hand over all our training data to Google, Microsoft, Amazon, or Facebook.
2. Machine learning is not one-size-fits-all. We can tweak and customize models for our training data.
3. Rasa NLU runs wherever we want, so we don’t have to make an extra network request for every message that comes in.



## Installation files

!pip install rasa_core profanity-check;
import logging, io, json, warnings
logging.basicConfig(level="INFO")
warnings.filterwarnings('ignore')

Now we will create the stories for the chatbot. These are domain specific.
A story starts with ## followed by a name. Lines that start with * are messages sent by the user. You don't write the actual message, but rather the intent (and the entities) that represents what the user means. If you don't know about intents and entities, don't worry! We will talk about them more later. Lines that start with - are actions taken by the bot. In this case all of our actions are just messages sent back to the user, like utter_greet, but in general an action can do anything, including calling an API and interacting with the outside world.
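The format described above is regular enough to read mechanically. The following is a minimal sketch of such a reader in plain Python. It is not how Rasa Core loads stories, and the function name `parse_stories` is hypothetical. It only mirrors the three rules: a name after ##, intents after *, actions after -.

```python
def parse_stories(markdown_text):
    """Parse story markdown into {story_name: [(kind, name), ...]}."""
    stories = {}
    current = None
    for raw_line in markdown_text.splitlines():
        # Drop trailing <!-- ... --> comments and surrounding whitespace.
        line = raw_line.split("<!--")[0].strip()
        if line.startswith("## "):
            # A new story begins; everything after "## " is its name.
            current = line[3:].strip()
            stories[current] = []
        elif line.startswith("* ") and current is not None:
            # User turn, written as an intent name.
            stories[current].append(("intent", line[2:].strip()))
        elif line.startswith("- ") and current is not None:
            # Bot turn, written as an action name.
            stories[current].append(("action", line[2:].strip()))
    return stories


example = """
## sad path 2
* greet
  - utter_greet             <!-- action the bot should execute -->
* mood_deny
  - utter_goodbye
"""

for name, steps in parse_stories(example).items():
    print(name, steps)
```

Each story becomes an ordered list of alternating user intents and bot actions, which is the sequence the dialogue model is trained on.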

## Stories are for a dialogue flow

In [1]:
stories_md = """
## happy path               <!-- name of the story - just for debugging -->
* greet              
  - utter_greet
* mood_great               <!-- user utterance, in format intent[entities] -->
  - utter_happy
* mood_affirm
  - utter_happy
* mood_affirm
  - utter_goodbye
  
  
## sad path 1               <!-- this is already the start of the next story -->
* greet
  - utter_greet             <!-- action the bot should execute -->
* mood_unhappy
  - utter_cheer_up
  - utter_did_that_help
* mood_affirm
  - utter_happy
  
  

## sad path 2
* greet
  - utter_greet
* mood_unhappy
  - utter_cheer_up
  - utter_did_that_help
* mood_deny
  - utter_goodbye
  
## strange user
* mood_affirm
  - utter_happy
* mood_affirm
  - utter_unclear

## say goodbye
* goodbye
  - utter_goodbye
  - utter_goodday

## no foul
* foul
  - utter_foul

"""

%store stories_md > stories.md

Writing 'stories_md' (str) to file 'stories.md'.


Now we will create the domain. It should list all of the intents and actions that show up in the stories. This is also the place to write templates, which contain the messages the bot can send back.

## We define the intents in this file for intent classification

In [None]:
domain_yml = """
intents:
  - greet
  - goodbye
  - mood_affirm
  - mood_deny
  - mood_great
  - mood_unhappy
  - foul

actions:
- utter_greet
- utter_cheer_up
- utter_did_that_help
- utter_happy
- utter_goodbye
- utter_unclear
- utter_goodday
- utter_foul

templates:
  utter_greet:
  - text: "Hey! How are you?"

  utter_cheer_up:
  - text: "Here is something to cheer you up:"
    image: "https://i.imgur.com/nGF1K8f.jpg"

  utter_did_that_help:
  - text: "Did that help you?"

  utter_unclear:
  - text: "I am not sure what you are aiming for."
  
  utter_happy:
  - text: "Great carry on!"

  utter_goodbye:
  - text: "Bye"
  
  utter_goodday:
  - text: "See you soon"
  
  utter_foul:
  - text: "Please do not use such language"
  
"""

%store domain_yml > domain.yml

Writing 'domain_yml' (str) to file 'domain.yml'.
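Because every intent and action used in a story has to appear in the domain, a quick consistency check can catch omissions before training. The sketch below is an assumption-laden helper, not part of Rasa: the function names are hypothetical, and the domain is passed in as plain Python lists rather than parsed from domain.yml.

```python
import re


def names_used_in_stories(stories_text):
    """Collect intents (lines starting with *) and actions (lines starting with -)."""
    intents, actions = set(), set()
    for raw_line in stories_text.splitlines():
        line = raw_line.split("<!--")[0].strip()
        if line.startswith("* "):
            # Keep only the intent name; drop any entity annotation that follows it.
            intents.add(re.split(r"[\[{]", line[2:].strip())[0])
        elif line.startswith("- "):
            actions.add(line[2:].strip())
    return intents, actions


def missing_from_domain(stories_text, domain_intents, domain_actions):
    """Return (missing_intents, missing_actions), each as a sorted list."""
    used_intents, used_actions = names_used_in_stories(stories_text)
    return (
        sorted(used_intents - set(domain_intents)),
        sorted(used_actions - set(domain_actions)),
    )


sample_stories = """
## say goodbye
* goodbye
  - utter_goodbye
  - utter_goodday
"""

# utter_goodday is deliberately left out of the action list here.
print(missing_from_domain(sample_stories, ["greet", "goodbye"], ["utter_goodbye"]))
# prints ([], ['utter_goodday'])
```

Two empty lists mean that the stories and the domain agree.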


In [None]:
config_spacy = """
{
  "pipeline":"spacy_sklearn",
  "path":"./models/nlu",
  "data":"./data/data.json"
}

"""

Now we will use Keras, a high-level neural network library written in Python that is simple and intuitive to use. We will also use the Agent class, which provides a convenient interface to the most important Rasa Core functionality: training, handling messages, loading a dialogue model, getting the next action, and handling a channel.

In [None]:
from __future__ import absolute_import
from __future__ import division
from __future__ import unicode_literals
from rasa_core.policies.keras_policy import KerasPolicy
from rasa_core.policies.memoization import MemoizationPolicy
from rasa_core.agent import Agent
from rasa_core.featurizers import (MaxHistoryTrackerFeaturizer, BinarySingleStateFeaturizer)

featurizer = MaxHistoryTrackerFeaturizer(BinarySingleStateFeaturizer(), max_history=5)

agent = Agent('domain.yml', policies=[MemoizationPolicy(max_history=5),KerasPolicy(featurizer)])
                        
agent.train(
        'stories.md',
        validation_split=0.0,
        #max_history=3,
        epochs=100
);

agent.persist('models/dialogue');



Using TensorFlow backend.
Processed Story Blocks: 100%|█████████████████████████████████████████████| 6/6 [00:00<00:00, 288.32it/s, # trackers=1]
Processed Story Blocks: 100%|█████████████████████████████████████████████| 6/6 [00:00<00:00, 229.55it/s, # trackers=6]
Processed Story Blocks: 100%|████████████████████████████████████████████| 6/6 [00:00<00:00, 187.19it/s, # trackers=12]
Processed Story Blocks: 100%|████████████████████████████████████████████| 6/6 [00:00<00:00, 140.73it/s, # trackers=13]
Processed trackers: 100%|█████████████████████████████████████████████| 176/176 [00:08<00:00, 19.55it/s, # actions=183]
Processed actions: 183it [00:00, 293.95it/s, # examples=183]
Processed trackers: 100%|█████████████████████████████████████████████| 176/176 [00:07<00:00, 23.57it/s, # actions=183]


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
masking_1 (Masking)          (None, 5, 17)             0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 32)                6400      
_________________________________________________________________
dense_1 (Dense)              (None, 10)                330       
_________________________________________________________________
activation_1 (Activation)    (None, 10)                0         
Total params: 6,730
Trainable params: 6,730
Non-trainable params: 0
_________________________________________________________________
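The parameter counts in the summary above can be reproduced by hand, which is a useful sanity check on the model shape. The sketch below uses the standard formulas for an LSTM layer and a dense layer. The helper names are hypothetical, and the dimensions are read off the output shapes in the summary (the window length of 5 is consistent with max_history=5 in the training cell).

```python
def lstm_params(input_dim, units):
    """An LSTM has 4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """A dense layer has one weight per input-output pair plus one bias per output."""
    return input_dim * units + units


feature_dim = 17   # last dimension of the Masking output shape (None, 5, 17)
lstm_units = 32    # LSTM output shape (None, 32)
dense_units = 10   # Dense output shape (None, 10)

lstm = lstm_params(feature_dim, lstm_units)
dense = dense_params(lstm_units, dense_units)
print(lstm, dense, lstm + dense)  # prints 6400 330 6730
```

The Masking and Activation layers have no weights, so the two numbers add up to the reported total of 6,730.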
Epoch 1/100
...
Epoch 100/100


We have just trained the dialogue model, that is, the conversational flow. The bot will therefore only understand structured input and no natural language yet.

Try it out by typing "/" followed by one of the intents from the domain, e.g.:

/greet

/mood_affirm

/mood_deny
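Since the bot only accepts structured input at this stage, it can help to validate a message before sending it. The following sketch is an illustration only and not Rasa's own parser. The function name `as_structured_intent` is hypothetical, and the intent list is copied from the domain defined earlier.

```python
# Intents copied from the domain file above.
DOMAIN_INTENTS = [
    "greet", "goodbye", "mood_affirm", "mood_deny",
    "mood_great", "mood_unhappy", "foul",
]


def as_structured_intent(message, known_intents=DOMAIN_INTENTS):
    """Return the intent name if message is '/<intent>' for a known intent, else None."""
    if not message.startswith("/"):
        return None
    name = message[1:].strip()
    return name if name in known_intents else None


for text in ["/greet", "/mood_deny", "hello", "/unknown"]:
    print(repr(text), "->", as_structured_intent(text))
```

A free-text message like "hello" returns None here, which matches the limitation noted above: without an NLU model, only the "/" form is understood.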

## Type here and the chatbot detects foul words

In [None]:
from profanity_check import predict_prob

FOUL_THRESHOLD = 0.2  # probability above which a message is treated as foul

print("Your bot is ready to talk! Type your messages here or send 'stop'")
while True:
  a = input("Analyse here \n")
  if a == 'stop':
    break
  # predict_prob returns an array with one probability per input string
  result = predict_prob([a])[0]

  if result > FOUL_THRESHOLD:
    # route foul messages to the foul intent so the bot answers with utter_foul
    a = "/foul"
  else:
    a = input("Give an entity:\n")
    if a == 'stop':
      break
  responses = agent.handle_message(a)
  for response in responses:
    print(response)



Your bot is ready to talk! Type your messages here or send 'stop'
Analyse here 
fuck you
{'recipient_id': 'default', 'text': 'Please do not use such language'}
Analyse here 
hello
Give an entity:
/greet
{'recipient_id': 'default', 'text': 'Hey! How are you?'}
Analyse here 
idiot
{'recipient_id': 'default', 'text': 'Great carry on!'}
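The threshold decision in the loop above can be pulled out into a small function so that it can be tested without the chatbot or the profanity model. This is a sketch under stated assumptions: `route_message` and `toy_score` are hypothetical names, and the blocklist scorer is only a stand-in for a real profanity probability.

```python
def route_message(text, score_fn, threshold=0.2):
    """Return '/foul' if score_fn rates the text above the threshold, else the text unchanged."""
    return "/foul" if score_fn(text) > threshold else text


# Stand-in scorer for demonstration: a hypothetical word list, not a real profanity model.
BLOCKLIST = {"darn", "heck"}


def toy_score(text):
    """Return 1.0 if any blocklisted word appears in the text, else 0.0."""
    words = text.lower().split()
    return 1.0 if any(word in BLOCKLIST for word in words) else 0.0


for text in ["oh heck", "/greet"]:
    print(repr(text), "->", route_message(text, toy_score))
```

In the notebook, the scorer could be swapped for `lambda t: predict_prob([t])[0]` from profanity_check, keeping the same threshold of 0.2.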
