# 20 Questions

*Ryan Becwar and Matthew Dragan*

## Introduction
An intelligent agent should be able to make decisions given some information.  In addition, such an agent should be able to learn about a new subject given some information on that subject.  In this project we try to produce a simplified version of such an agent by creating a program that learns about new items by asking questions about them.

The agent will play a version of the 20 questions game, in which a computer tries to guess what the player is thinking of based on the answers to simple 'yes' or 'no' questions.  In this version the goal objects are fruits and vegetables, and the questions are specialized for telling fruits and vegetables apart.
Additionally, we will try to produce a version of the game that has the following characteristics:
  * The game will determine which question to ask next based on which question most evenly divides the remaining possibilities.  This means that if the game has narrowed the possibilities down to 10 fruits and vegetables then the game will try to pick the next question that has 5 'yes' answers and 5 'no' answers.  
  * If the player plays a fruit or vegetable that the game has not heard of before, the game will add it to the database and use the player's answers as the 'truth' about it.  For example, if the player is playing 'raspberry' and the game has never heard of 'raspberry', then 'raspberry' will be added to the database and the answers that the player gave will be treated as the 'truth' about raspberries.  In that case we may ask the player to answer all of the questions about the new fruit or vegetable so that a complete set of answers is added to the database.  
  * If there are two fruits/vegetables that are hard to tell apart based on the questions (radishes and beets, for example), the game will pick the one that has been played most frequently (that is, the one that is most likely based on previous plays of the game).  
  * The game will be able to get to the correct answer even if the player answers a question wrong.  
  * The game will be able to change an answer about a fruit or vegetable if it believes that the current answer is wrong.  For example, say the game was taught that a tomato is a vegetable.  If the user then plays 'tomato' several times and answers 'yes' to whether a tomato is a fruit, the game will eventually learn that a tomato is more likely a fruit than a vegetable.    
  
If implemented correctly, and with enough practice, the game should be able to frequently guess the correct fruit or vegetable based on the user's input.  
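
The even-split rule from the first bullet can be sketched on its own, outside the class. This is a minimal illustration, not the project's implementation: the question list and answer table are copied from the `me.answers` output shown later in this notebook, while the function name `most_even_question` and its `skip` parameter (which avoids repeating a question) are assumptions added for the example.

```python
# Answer table copied from the notebook's `me.answers` output: 1 = yes, -1 = no.
QUESTIONS = [
    'is it a fruit?',
    'Is it commonly found on a sandwich?',
    'Does it grow on a tree?',
    'Does it grow underground?',
    'Is it commonly found in pie?',
    'Is it mostly green?',
    'Does it grow on a bush?',
    'Is it usually round?',
    'Does it have seeds on the outside?',
]
ANSWERS = {
    'apple':      [1, -1, 1, -1, 1, -1, -1, 1, -1],
    'broccoli':   [-1, -1, -1, -1, -1, 1, 1, -1, -1],
    'carrot':     [-1, -1, -1, 1, -1, -1, -1, -1, -1],
    'lettuce':    [-1, 1, -1, -1, -1, 1, -1, 1, -1],
    'onion':      [-1, 1, -1, 1, -1, -1, -1, 1, -1],
    'orange':     [1, -1, 1, -1, -1, -1, -1, 1, -1],
    'peach':      [1, -1, 1, -1, 1, -1, -1, 1, -1],
    'potato':     [-1, -1, -1, 1, -1, -1, -1, 1, -1],
    'raspberry':  [1, -1, -1, -1, 1, -1, 1, -1, -1],
    'strawberry': [1, -1, -1, -1, 1, -1, 1, -1, 1],
}


def most_even_question(answers, remaining, skip=()):
    """Return the index of the question whose yes/no split over `remaining`
    is closest to even. `skip` (hypothetical) lists indices already asked."""
    best_index, best_gap = None, None
    for j in range(len(QUESTIONS)):
        if j in skip:
            continue
        yes = sum(1 for food in remaining if answers[food][j] == 1)
        no = sum(1 for food in remaining if answers[food][j] == -1)
        gap = abs(yes - no)
        # Strict '<' keeps the earliest question when several are tied.
        if best_gap is None or gap < best_gap:
            best_index, best_gap = j, gap
    return best_index


# With all ten foods, 'is it a fruit?' splits them 5 / 5.
first = most_even_question(ANSWERS, list(ANSWERS))
print(QUESTIONS[first])

# After a 'no' to the fruit question, only the vegetables remain.
vegetables = [food for food in ANSWERS if ANSWERS[food][first] == -1]
second = most_even_question(ANSWERS, vegetables, skip={first})
print(QUESTIONS[second])
```

Ties are broken by question order here, which matches what `np.argmin` does when several questions share the smallest gap.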

## Methods
To create the game we implemented a `twentyQ` class in Python that provides all of the functions required to run the game.  

In [97]:
import numpy as np
import pandas as pd
import csv
import copy as cp

In [9]:
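%%writefile twentyQ.py

# twentyQ.py is written out as a standalone module, so it needs its own imports
import copy as cp
import csv

import numpy as np
import pandas as pd

class twentyQ(object):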
    def __init__(self):
        self.questions = []
        self.answers = {}
        self.likelihood = {}
        self.weightVals = {}
        self.questionsUsed = []
        self.remainingFood = []
        
        data, weights = self.readData()
        
        self.processData(data, weights)
        
    def readData(self):
        # get known data from csv
        data = pd.read_csv('tempData.csv')

        # get weights from csv
        weights = pd.read_csv('tempWeights.csv')

        # extract questions
        questions = list(data.dtypes.index)
        self.questions = questions[1:]

        #extract data
        data = data.values

        # extract weights
        weights = weights.values
        
        return data, weights
    
    def processData(self, data, weights):
        for i in data:
            self.answers[i[0]] = i[1:]
            self.likelihood[i[0]] = 0
            self.remainingFood.append(i[0])
    
        for i in weights:
            self.weightVals[i[0]] = i[1:]
            
    def getFirstQuestion(self):
        countYes = 0
        countNo = 0
        nextQ = []
        for j in range(0,len(self.questions)):
            for i in self.answers:
                if self.answers[i][j] == 1:
                    countYes = countYes + 1
                elif self.answers[i][j] == -1:
                    countNo = countNo + 1
            nextQ.append(abs(countYes - countNo))
            countYes = countNo = 0
        self.questionsUsed.append(self.questions[np.argmin(nextQ)])
        return self.questions[np.argmin(nextQ)]
            
    def getNextQuestion(self, currentQ, currentA):
        couldBe = []
        countYes = 0
        countNo = 0
        nextQ = []
        for i in self.answers:
            # compare by value; 'is' tests object identity and is unreliable for numbers
            if self.answers[i][self.questions.index(currentQ)] == currentA:
                couldBe.append(i)
        self.remainingFood = list(set(self.remainingFood) & set(couldBe))
        for j in range(0,len(self.questions)):
            for i in self.remainingFood:
                if self.answers[i][j] == 1:
                    countYes = countYes + 1
                elif self.answers[i][j] == -1:
                    countNo = countNo + 1
            nextQ.append(abs(countYes - countNo))
            countYes = countNo = 0
        self.questionsUsed.append(self.questions[np.argmin(nextQ)])
        return self.questions[np.argmin(nextQ)]
    
    def convertAnswer(self, currentA):
        # compare strings by value; 'is' only works by accident of interning
        if currentA == 'yes':
            return 1
        elif currentA == 'no':
            return -1
        else:
            return 0
        
    def updateLikelihood(self, currentQ, currentA):
        for i in self.answers:
            # compare by value; 'is' tests object identity and is unreliable for numbers
            if self.answers[i][self.questions.index(currentQ)] == currentA:
                self.likelihood[i] = self.likelihood[i] + self.weightVals[i][self.questions.index(currentQ)] *1
            else:
                self.likelihood[i] = self.likelihood[i] + self.weightVals[i][self.questions.index(currentQ)] *-1
                
    def updateWeights(self, answer, correct):
        if correct:
            for i in self.questionsUsed:
                self.weightVals[answer][self.questions.index(i)] = self.weightVals[answer][self.questions.index(i)] + (1-self.weightVals[answer][self.questions.index(i)])/2 
                print(self.weightVals[answer][self.questions.index(i)])
        else:
            for i in self.questionsUsed:
                self.weightVals[answer][self.questions.index(i)] = self.weightVals[answer][self.questions.index(i)] - (1-self.weightVals[answer][self.questions.index(i)])/2
                if self.weightVals[answer][self.questions.index(i)] < .0625:
                    self.weightVals[answer][self.questions.index(i)] = .5
                    self.answers[answer][self.questions.index(i)] = -self.answers[answer][self.questions.index(i)]
    
    def writeToCSV(self):
        Qs = cp.deepcopy(self.questions)
        Qs.insert(0, ' ')
        
        #copy data to csv
        # newline='' is what the csv module expects; it avoids blank rows on Windows
        with open('tempData.csv', 'w', newline='') as myfile:
            myFields = Qs
            writer = csv.DictWriter(myfile, fieldnames=myFields)    
            writer.writeheader()
            for i in self.answers:
                newlist = [i]
                for j in self.answers[i]:
                    newlist.append(j)
                writer.writerow({Qs[k]:newlist[k] for k in range(len(Qs))})
                    
        # copy weights to csv
        # newline='' is what the csv module expects; it avoids blank rows on Windows
        with open('tempWeights.csv', 'w', newline='') as myfile:
            myFields = Qs
            writer = csv.DictWriter(myfile, fieldnames=myFields)    
            writer.writeheader()
            for i in self.weightVals:
                newlist = [i]
                for j in self.weightVals[i]:
                    newlist.append(j)
                writer.writerow({Qs[k]:newlist[k] for k in range(len(Qs))})

Writing twentyQ.py
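
The arithmetic inside `updateWeights` is easier to follow when pulled out into pure functions. The sketch below mirrors the formulas in the class as written; the function names `reinforce` and `penalize` and the constant name `FLIP_THRESHOLD` are assumptions introduced only for this illustration.

```python
FLIP_THRESHOLD = 0.0625  # same cutoff that updateWeights compares against


def reinforce(weight):
    """Correct guess: move the weight halfway toward 1."""
    return weight + (1 - weight) / 2


def penalize(weight, answer):
    """Wrong guess: lower the weight by half its distance from 1.
    If it drops below the threshold, reset it to 0.5 and flip the stored answer."""
    weight = weight - (1 - weight) / 2
    if weight < FLIP_THRESHOLD:
        return 0.5, -answer
    return weight, answer


# A weight of 0.5 that keeps being confirmed approaches 1.
w = 0.5
for _ in range(3):
    w = reinforce(w)
    print(w)

# A stored 'no' (-1) that keeps being contradicted is eventually flipped to 'yes' (1).
w, a = 0.5, -1
for _ in range(2):
    w, a = penalize(w, a)
    print(w, a)
```

Under these formulas a weight of 0.5 falls to 0.25 after one wrong game and would fall to -0.125 after a second, which is below the threshold, so the stored answer flips on the second contradiction. This is the mechanism behind the tomato example in the introduction.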


In [14]:
me = twentyQ()
me.writeToCSV()
me.answers

{'apple': array([1, -1, 1, -1, 1, -1, -1, 1, -1], dtype=object),
 'broccoli': array([-1, -1, -1, -1, -1, 1, 1, -1, -1], dtype=object),
 'carrot': array([-1, -1, -1, 1, -1, -1, -1, -1, -1], dtype=object),
 'lettuce': array([-1, 1, -1, -1, -1, 1, -1, 1, -1], dtype=object),
 'onion': array([-1, 1, -1, 1, -1, -1, -1, 1, -1], dtype=object),
 'orange': array([1, -1, 1, -1, -1, -1, -1, 1, -1], dtype=object),
 'peach': array([1, -1, 1, -1, 1, -1, -1, 1, -1], dtype=object),
 'potato': array([-1, -1, -1, 1, -1, -1, -1, 1, -1], dtype=object),
 'raspberry': array([1, -1, -1, -1, 1, -1, 1, -1, -1], dtype=object),
 'strawberry': array([1, -1, -1, -1, 1, -1, 1, -1, 1], dtype=object)}

In [15]:
print(me.getFirstQuestion())
print(me.remainingFood)
me.updateLikelihood(me.questionsUsed[0], -1)
print(me.getNextQuestion(me.questionsUsed[0], -1))
print(me.remainingFood)
me.updateLikelihood(me.questionsUsed[1],-1)
print(me.getNextQuestion(me.questionsUsed[1], -1))
print(me.remainingFood)
me.updateLikelihood(me.questionsUsed[2], 1)
print(me.getNextQuestion(me.questionsUsed[2], 1))
print(me.remainingFood)
me.updateWeights('raspberry', False)
me.likelihood


is it a fruit?
['apple', 'orange', 'peach', 'strawberry', 'raspberry', 'carrot', 'potato', 'lettuce', 'onion', 'broccoli']
Is it commonly found on a sandwich?
['potato', 'carrot', 'onion', 'broccoli', 'lettuce']
Does it grow underground?
['carrot', 'potato', 'broccoli']
Is it usually round?
['carrot', 'potato']


{'apple': -0.5,
 'broccoli': 0.5,
 'carrot': 1.5,
 'lettuce': -0.5,
 'onion': 0.5,
 'orange': -0.5,
 'peach': -0.5,
 'potato': 1.5,
 'raspberry': -0.5,
 'strawberry': -0.5}
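
In the output above, 'carrot' and 'potato' are tied at 1.5, which is the situation the introduction proposes to resolve by play frequency. The class shown here does not track play counts, so the following is only a sketch of how such a tie-break might look: the likelihood values are copied from the output above, while `play_counts` and `best_guess` are hypothetical names with made-up counts.

```python
# Likelihoods copied from the notebook output above.
likelihood = {
    'apple': -0.5, 'broccoli': 0.5, 'carrot': 1.5, 'lettuce': -0.5,
    'onion': 0.5, 'orange': -0.5, 'peach': -0.5, 'potato': 1.5,
    'raspberry': -0.5, 'strawberry': -0.5,
}

# Hypothetical record of how often each food was the answer in earlier games.
play_counts = {'potato': 3, 'carrot': 1}


def best_guess(likelihood, play_counts):
    """Pick the food with the highest likelihood; break ties by play count."""
    return max(likelihood, key=lambda food: (likelihood[food], play_counts.get(food, 0)))


print(best_guess(likelihood, play_counts))
```

Sorting on the pair (likelihood, play count) means the play count only matters when the likelihoods are exactly equal; a food with a higher likelihood always wins regardless of how often it has been played.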

In [89]:
print({me.questions[i]:me.questions[i] for i in range(len(me.questions))})
me.questions

{'is it a fruit?': 'is it a fruit?', 'Is it commonly found on a sandwich?': 'Is it commonly found on a sandwich?', 'Does it grow on a tree?': 'Does it grow on a tree?', 'Does it grow underground?': 'Does it grow underground?', 'Is it commonly found in pie?': 'Is it commonly found in pie?', 'Is it mostly green?': 'Is it mostly green?', 'Does it grow on a bush?': 'Does it grow on a bush?', 'Is it usually round?': 'Is it usually round?', 'Does it have seeds on the outside?': 'Does it have seeds on the outside?'}


['is it a fruit?',
 'Is it commonly found on a sandwich?',
 'Does it grow on a tree?',
 'Does it grow underground?',
 'Is it commonly found in pie?',
 'Is it mostly green?',
 'Does it grow on a bush?',
 'Is it usually round?',
 'Does it have seeds on the outside?']

In [91]:

# Prefix the header with a blank column for the food name so every row lines up
# with its questions (without it, the name lands under the first question and
# the last answer is dropped).
myFields = [' '] + me.questions
with open('dummy.csv', 'w', newline='') as myfile:
    writer = csv.DictWriter(myfile, fieldnames=myFields)
    writer.writeheader()
    for i in me.answers:
        newlist = [i] + list(me.answers[i])
        writer.writerow(dict(zip(myFields, newlist)))