# Trying to chat with Terence McKenna

In [1]:
import nltk
import chatterbot
import numpy as np
import pandas as pd
import random
import string
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import re
import spacy
import warnings
warnings.filterwarnings("ignore")

To do something new, I decided to use a seminar [talk](https://www.asktmk.com/talks/Shamanology) by Terence McKenna, posted online, as the source. I copy-pasted the ~22,000-word talk into a text document and troubleshot in this notebook until I figured out how to convert it to list form for cleaning.

# Loading in the document

In [2]:
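# load the raw transcript; the string 'delimiter' is assumed not to occur
# in the text, so every line of the file ends up in a single column
shamanology = pd.read_csv('shamanology.txt', sep='delimiter', header=None, engine='python')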

In [3]:
shamanology

Unnamed: 0,0
0,"My name is Terence McKenna and I'm, uh, a phil..."
1,"If any of you have read ""The Invisible Landsca..."
2,And [cough] while a number of plant species ar...
3,"For many people who seem interested in curing,..."
4,The system that I refer to is the endemic use ...
...,...
295,That's right we need to talk about it. And it ...
296,I want to come back to something you were sayi...
297,"Well, I think all of these things like the eco..."
298,Sounds like you've got a [pretty big job again]


In [78]:
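# write all rows back out as one continuous block of text
# (no separator is written, so rows run together, e.g. "you.If")
with open('new_sham.txt', 'w') as myfile:
    for i in shamanology[0]:
        myfile.write(i)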

In [4]:
new_sham = pd.read_csv('new_sham.txt', sep='delimiter', header=None)

In [5]:
new_sham

Unnamed: 0,0
0,"My name is Terence McKenna and I'm, uh, a phil..."


In [6]:
shaman_df = new_sham.values.tolist()[0][0]

In [7]:
shaman_df

'My name is Terence McKenna and I\'m, uh, a philosophical gadfly and shamanologist, writer and lecturer [laughter]. Uhm, Louis assured me that you were so familiar with my work that probably we could handle this meeting as a dialogue after a short introduction to some of the things that I\'m interested in. So we\'ll attempt that. I\'ll talk for a few minutes and then we\'ll see if we can\'t have that conversation about the aspects of these things that interest you.If any of you have read "The Invisible Landscape", which I am the co-author of with my brother, you know that it ranges over fairly hardcore chemistry and neurophysiology, through the phenomenology of shamanism, and on into a fairly extensive discussion of principles of ordering in the I Ching. But what I seem to, uh, find myself publicly lecturing about is the relationship of, uh, hallucinogens, especially plant hallucinogens to shamanic healing in the context where use of hallucinogens is associated with shamanism. If you l
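The round trip above (read the file as a one-column table, write it back out, read it again) ends with one long string. A more direct route, sketched here under the assumption that the transcript is plain UTF-8 text with one paragraph per line, is to read the lines and join them with a space so that sentences at paragraph boundaries do not run together. The function name `load_transcript` is hypothetical, not part of the notebook.

```python
from pathlib import Path


def load_transcript(path):
    """Read a plain-text transcript and return it as one string.

    Assumes UTF-8 text with one paragraph per line (an assumption about
    the file, not something the notebook verifies).
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # drop empty lines and join with a space so paragraph boundaries
    # keep a separator ("...interest you. If any of you...")
    return " ".join(line.strip() for line in lines if line.strip())
```

Calling `load_transcript('shamanology.txt')` would then stand in for the three loading cells, provided the file matches the assumption above.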

Things our corpus contains that we must remove:

- Chapter headers
- Audience reactions in brackets
- Double dashes

These removals come on top of the usual text cleaning.
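The removals listed above can be tried on a made-up sample before touching the full transcript. The sample string and the header format `Part One` are assumptions for illustration; the real headers may look different. The last two lines show why the character class in a header pattern matters.

```python
import re

# made-up sample; the real transcript's headers may differ
sample = "Part One Well -- it seems [laughter] that plants -- I mean [cough] fungi matter."

no_brackets = re.sub(r"\[(.*?)\]", " ", sample)   # audience reactions
no_dashes = re.sub(r"--", " ", no_brackets)       # double dashes
no_header = re.sub(r"Part \w+", "", no_dashes)    # one-word header label
cleaned = " ".join(no_header.split())             # collapse whitespace

print(cleaned)

# \D+ matches any run of non-digits, spaces and letters included,
# so on digit-free text it removes everything after "Part "
greedy = re.sub(r"Part \D+", "", sample)
print(repr(greedy))
```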

# Cleaning

In [8]:
# utility function for standard text cleaning
def text_cleaner(text):
    # visual inspection identifies a form of punctuation spaCy does not
    # recognize: the double dash '--'. We remove it along with tildes,
    # asterisks, backslashes, and bracketed audience reactions.
    text = re.sub(r"~(.*?)~", " ", text)     # text wrapped in tildes
    text = re.sub(r'~\*', ' ', text)
    text = re.sub(r'\*', ' ', text)
    text = re.sub(r'--', ' ', text)
    text = re.sub(r'\\', ' ', text)
    text = re.sub(r"\[(.*?)\]", " ", text)   # [laughter], [cough], ...
    text = re.sub(r'}', '', text)
    text = ' '.join(text.split())            # collapse whitespace
    return text

In [9]:
shaman_df = text_cleaner(shaman_df)

In [10]:
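# remove chapter headers; \w+ (assuming one-word labels such as "Part One")
# is used because \D+ would delete everything up to the next digit
shaman_df = re.sub(r'Part \w+', '', shaman_df)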

In [11]:
shaman_df

'My name is Terence McKenna and I\'m, uh, a philosophical gadfly and shamanologist, writer and lecturer . Uhm, Louis assured me that you were so familiar with my work that probably we could handle this meeting as a dialogue after a short introduction to some of the things that I\'m interested in. So we\'ll attempt that. I\'ll talk for a few minutes and then we\'ll see if we can\'t have that conversation about the aspects of these things that interest you.If any of you have read "The Invisible Landscape", which I am the co-author of with my brother, you know that it ranges over fairly hardcore chemistry and neurophysiology, through the phenomenology of shamanism, and on into a fairly extensive discussion of principles of ordering in the I Ching. But what I seem to, uh, find myself publicly lecturing about is the relationship of, uh, hallucinogens, especially plant hallucinogens to shamanic healing in the context where use of hallucinogens is associated with shamanism. If you look at the

In [15]:
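# parse the cleaned transcript. This can take a bit.

# spaCy 3 removed the 'en' shortcut; load the model by its full name
# (assuming the small English model is the one installed)
nlp = spacy.load('en_core_web_sm')

shaman_doc = nlp(shaman_df)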

In [16]:
# group into sentences,
# keeping only those longer than one character
shaman_sents = [sent.text for sent in shaman_doc.sents if len(sent.text) > 1]
shaman_sents

['My name is Terence McKenna',
 "and I'm, uh, a philosophical gadfly and shamanologist, writer and lecturer .",
 "Uhm, Louis assured me that you were so familiar with my work that probably we could handle this meeting as a dialogue after a short introduction to some of the things that I'm interested in.",
 "So we'll attempt that.",
 "I'll talk for a few minutes and then we'll see if we can't have that conversation about the aspects of these things that interest you.",
 'If any of you have read "The Invisible Landscape", which I am the co-author of with my brother, you know that it ranges over fairly hardcore chemistry and neurophysiology, through the phenomenology of shamanism, and on into a fairly extensive discussion of principles of ordering in the I Ching.',
 'But what I seem to, uh, find myself publicly lecturing about is the relationship of, uh, hallucinogens, especially plant hallucinogens to shamanic healing in the context where use of hallucinogens is associated with shamanism

# Simple Chatbot

In [17]:
GREETING_INPUTS = ["hello", "hi", "greetings", "what's up","hey"]
GREETING_RESPONSES = ["hello", "hi", "hey", "hi there"]
def greeting(sentence):
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
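
Because the input is split on whitespace, a multi-word entry such as "what's up" can never equal a single token, and punctuation attached to a word ("hello!") also blocks a match. A possible variant, sketched with the hypothetical names `GREETING_PHRASES`, `GREETING_REPLIES`, and `find_greeting`, normalizes the input and checks for whole phrases instead.

```python
import random
import re

# hypothetical names; the lists mirror the ones defined above
GREETING_PHRASES = ["hello", "hi", "greetings", "what's up", "hey"]
GREETING_REPLIES = ["hello", "hi", "hey", "hi there"]


def find_greeting(sentence):
    """Return a random greeting if the sentence contains one, else None."""
    # lowercase and keep only letters, apostrophes and spaces
    normalized = re.sub(r"[^a-z' ]", " ", sentence.lower())
    padded = " " + " ".join(normalized.split()) + " "
    for phrase in GREETING_PHRASES:
        # padding with spaces matches whole words only ("hi" but not "this")
        if " " + phrase + " " in padded:
            return random.choice(GREETING_REPLIES)
    return None
```

With this variant, "What's up Terence?" would be answered with a greeting rather than being passed on to the corpus lookup.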

In [20]:
def response(user_input):
    fallback = "I'm sorry! I don't know how to respond :("

    # we parse the user's input using spaCy and split it into sentences
    input_doc = nlp(user_input)
    input_sents = [sent.text for sent in input_doc.sents]
    if not input_sents:
        return fallback

    # vectorize the corpus plus the user's sentences using tf-idf;
    # building a new list leaves shaman_sents itself untouched
    TfidfVec = TfidfVectorizer(max_df=0.5, min_df=1, use_idf=True, norm='l2', smooth_idf=True, lowercase=False)
    tfidf = TfidfVec.fit_transform(shaman_sents + input_sents)

    # cosine similarity between the user's last sentence
    # and every sentence of the corpus
    n_corpus = len(shaman_sents)
    similarities = cosine_similarity(tfidf[-1], tfidf[:n_corpus])
    # index of the most similar sentence
    idx = np.argmax(similarities)

    # a best score of 0 means no shared vocabulary; testing the score
    # rather than idx keeps the first corpus sentence (index 0) usable
    if similarities[0, idx] > 0:
        return shaman_sents[idx]
    return fallback
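
To make the retrieval step concrete, here is a small pure-Python sketch of the same idea: weight each term by tf-idf, L2-normalize the vectors, and pick the corpus sentence with the highest cosine similarity to the query. It is a simplified stand-in, not scikit-learn's implementation: the tokenizer is a plain regex, there is no `max_df` filtering, and the toy corpus is made up. The idf used is the smoothed form ln((1 + n) / (1 + df)) + 1.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # simplified tokenizer: lowercase runs of letters and apostrophes
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(sentences):
    """Return one L2-normalized {term: weight} dict per sentence."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    # document frequency: in how many sentences each term appears
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        vec = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def best_match(query, corpus):
    """Return (sentence, score) for the corpus sentence closest to the query."""
    vectors = tfidf_vectors(corpus + [query])
    query_vec, corpus_vecs = vectors[-1], vectors[:-1]
    # vectors are unit length, so the dot product is the cosine similarity
    scores = [sum(w * vec.get(t, 0.0) for t, w in query_vec.items()) for vec in corpus_vecs]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return corpus[idx], scores[idx]


# made-up toy corpus, not taken from the transcript
toy_corpus = [
    "The shaman heals with plants.",
    "Time is a variety of dreams.",
    "Democracy is a strange notion.",
]
sentence, score = best_match("tell me about time", toy_corpus)
print(sentence, round(score, 3))
```

A score of zero means the query shares no term with any corpus sentence, which is the case the fallback reply is meant for.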

In [21]:
print("Terence: Good evening, fellow traveler. If you want to exit, type bye please.")

while True:

    user_input = input("User: ")
    user_input = user_input.lower()

    if user_input != 'bye':
        if user_input == 'thanks' or user_input == 'thank you':
            # print before leaving the loop, otherwise the line is never reached
            print("Terence: You're welcome.")
            break
        else:
            # call greeting() once so the check and the reply agree
            reply = greeting(user_input)
            if reply is not None:
                print("Terence: " + reply)
            else:
                print("Terence: ", end="")
                print(response(user_input))
    else:
        print("Terence: Good-bye! It was a great chat.")
        break

Terence: Good evening, fellow traveler. If you want to exit, type bye please.
User: Hello
Terence: hey
User: what's up Terence?
Terence: The shift of modality from down to up, and up to down.
User: clever answer
Terence: It seems the answer is "No".
User: if that is the answer, then what is the question?
Terence: It seems the answer is "No".
User: anything else?
Terence: else's.
User: elaborate please
Terence: I was definitely the largest person in any of these sessions, and, uh, the same amount is doled out to each person, and not in a context where you can say "I'd like to take more please" .
User: take more what?
Terence: What happened was I started to dream more and more....
User: Unfortunately you're not making sense yet. I'll have to end the conversation.
Terence: I'll talk for a few minutes and then we'll see if we can't have that conversation about the aspects of these things that interest you.
User: Maybe later
Terence: Ten minutes later it's gone .
User: bye
Terence: Good-bye

Not the best results, but already interesting.

# Chatterbot Chatbot

In [12]:
# import libraries

from chatterbot import ChatBot

In [13]:
from chatterbot.trainers import ListTrainer, ChatterBotCorpusTrainer

In [14]:
from chatterbot.conversation import Statement

In [18]:
# create a chatbot
chatbot = ChatBot('Terence')
# this is to remove the accumulated knowledge base
chatbot.storage.drop()

# create a new trainer for the chatbot
trainer = ListTrainer(chatbot)

# train the chatbot on the sentences from the McKenna talk
trainer.train(shaman_sents)

List Trainer: [####################] 100%
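
As far as I understand ChatterBot's `ListTrainer`, it treats the list as one conversation: each item is stored as a response to the item before it. Trained on consecutive sentences of a monologue, the bot therefore tends to answer a matched sentence with whatever McKenna said next. The sketch below only illustrates that pairing with plain Python; it does not call ChatterBot, and `conversation_pairs` is a hypothetical helper.

```python
def conversation_pairs(sentences):
    """Pair each sentence with the one that follows it."""
    return list(zip(sentences, sentences[1:]))


# a few sentences in the style of the transcript
talk = [
    "My name is Terence McKenna",
    "So we'll attempt that.",
    "I'll talk for a few minutes.",
]
for statement, reply in conversation_pairs(talk):
    print(f"{statement!r} -> {reply!r}")
```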


In [19]:
print("Terence: I will try to respond to you reasonably. If you want to exit, type bye please.")

# below is the chatting
while True:

    user_input = input("User: ")
    user_input = user_input.lower()

    if user_input != 'bye':
        if user_input == 'thanks' or user_input == 'thank you':
            # print before leaving the loop, otherwise the line is never reached
            print("Terence: You're welcome.")
            break
        else:
            # call greeting() once so the check and the reply agree
            reply = greeting(user_input)
            if reply is not None:
                print("Terence: " + reply)
            else:
                print("Terence: ", end="")
                print(chatbot.get_response(user_input))
    else:
        print("Terence: Bye! It was a great chat.")
        break

Terence: I will try to respond to you reasonably. If you want to exit, type bye please.
User: hello
Terence: hello
User: How are you?
Terence: There are several ways a person can take that notion, really, in several different directions .
User: what notion?
Terence: And the way democracy is.
User: tell me about democracy
Terence: I mean, if I say that I'm falling in love, and you once fell in love, that doesn't mean that the way your love affair ended is how mine will end.
User: can you elaborate?
Terence: How free are they, if at the end of the year, we look at the wreck and say "Yes it certainly is true.
User: are you free?
Terence: How free are they, if at the end of the year, we look at the wreck and say "Yes it certainly is true.
User: are you a shaman?
Terence: How free are they, if at the end of the year, we look at the wreck and say "Yes it certainly is true.
User: why repeat yourself?
Terence: What I suggested was that they were actually varieties of time.
User: time?
Terence: Bu

Perhaps better, though we still get repeated phrases and incoherence. Results would likely improve if the source document were an actual dialogue rather than a seminar with open questions, and if the corpus were larger.