##### Reinforcement Learning and Decision Making: Homework #5

# Bar Brawl

## Description

You are the proprietor of an establishment that sells beverages of an unspecified, but delicious, nature. The establishment is frequented by a set $P$ of patrons.  One of the patrons is the instigator and another is the peacemaker.

On any given evening, a subset $S \subseteq P$ is present at the establishment. If the instigator is in $S$ but the peacemaker is not in $S$, then a fight will break out. If the instigator is not in $S$ or if the peacemaker is in $S$, then no fight will occur.
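The fight rule above can be written as a small predicate. This is a minimal sketch; the function name `fight_breaks_out` and the choice of patron indices in the example are illustrative assumptions, not part of the assignment template.

```python
from typing import Sequence


def fight_breaks_out(present: Sequence[int], instigator: int, peacemaker: int) -> bool:
    """Return True if the instigator is present and the peacemaker is absent.

    `present[i]` is 1 when patron i is at the establishment, 0 otherwise.
    """
    return bool(present[instigator]) and not bool(present[peacemaker])


# Illustrative assumption: patron 0 is the instigator, patron 2 the peacemaker.
print(fight_breaks_out([1, 0, 0], instigator=0, peacemaker=2))  # True
print(fight_breaks_out([1, 0, 1], instigator=0, peacemaker=2))  # False
print(fight_breaks_out([0, 1, 0], instigator=0, peacemaker=2))  # False
```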

Your goal is to learn to predict whether a fight will break out among the subset of
patrons present on a given evening, without initially knowing the identity of
the instigator or the peacemaker.

## Procedure

Develop a KWIK learner for this problem (see Li, Littman, and Walsh 2008).  Your learner will be presented with $S$, the patrons at the establishment, and the outcome (fight or no fight) of that evening.
Your learner will attempt to predict whether a fight will break out, or indicate that
it doesn't know, and should be capable of learning from the true outcome for the evening.
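One way to build such a learner is to keep every (instigator, peacemaker) pair that is still consistent with the outcomes seen so far, answer only when all remaining pairs agree, and prune the set whenever a true outcome is revealed. The sketch below assumes that approach; the class name `VersionSpaceLearner` and its `predict`/`learn` methods are hypothetical and are not the interface required by the autograder.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple

Hypothesis = Tuple[int, int]  # (instigator index, peacemaker index)


class VersionSpaceLearner:
    """Hypothetical learner that tracks all hypotheses still consistent with the data."""

    def __init__(self, num_patrons: int) -> None:
        # Every ordered pair of distinct patrons is a candidate at the start.
        self.hypotheses: List[Hypothesis] = list(permutations(range(num_patrons), 2))

    @staticmethod
    def _vote(h: Hypothesis, present: Sequence[int]) -> int:
        instigator, peacemaker = h
        return int(present[instigator] == 1 and present[peacemaker] == 0)

    def predict(self, present: Sequence[int]) -> Optional[int]:
        """Return 1 or 0 if all hypotheses agree, or None for "I don't know"."""
        votes = {self._vote(h, present) for h in self.hypotheses}
        if len(votes) == 1:
            return votes.pop()
        return None

    def learn(self, present: Sequence[int], fight: int) -> None:
        """Keep only the hypotheses whose vote matches the true outcome."""
        self.hypotheses = [h for h in self.hypotheses if self._vote(h, present) == fight]


learner = VersionSpaceLearner(num_patrons=2)
print(learner.predict([1, 1]))  # 0: no hypothesis predicts a fight when everyone is present
print(learner.predict([1, 0]))  # None: the two hypotheses disagree
learner.learn([1, 0], fight=1)
print(learner.hypotheses)       # [(0, 1)]
print(learner.predict([1, 0]))  # 1
```

Each "I don't know" answer is followed by an update that removes at least one hypothesis, because the disagreement guarantees that some hypothesis voted against the true outcome.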

For each problem, the following input will be given:

-   `at_establishment`: a Boolean two-dimensional array whose rows
    represent distinct evenings and whose columns represent distinct patrons.
    Each entry specifies if a particular patron is present
    at the establishment on a particular evening:
    a $1$ means present and a $0$ means absent.
    
-   `fight_occurred`: a Boolean vector whose entries are the 
    outcomes (a $1$ means ``FIGHT`` and a $0$ means ``NO FIGHT``) for that
    particular evening.


Specifically:

-   For each episode (evening), you should present your learner with the next row of `at_establishment` and the corresponding entry of `fight_occurred`.
-   If your learner returns a $1$ (for ``FIGHT``) or a $0$ (for ``NO FIGHT``),
    you may continue on to the next episode.
-   If your learner returns a $2$ (for ``I DON’T KNOW``), then you should
    present the pair (row of `at_establishment`, entry of `fight_occurred`) to your learner
    to learn from.

You will return a string of integers corresponding to the returned values of each episode.
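The episode loop described above can be sketched as follows. The `predict`/`learn` interface and the `MemorizingLearner` class are assumptions made for illustration: the stand-in learner only knows outcomes for attendance rows it has already been taught, so it would likely say "I don't know" far more often than a real submission should.

```python
from typing import Dict, Optional, Sequence, Tuple


class MemorizingLearner:
    """Stand-in learner (illustrative only): remembers outcomes of rows it was taught."""

    def __init__(self) -> None:
        self.seen: Dict[Tuple[int, ...], int] = {}

    def predict(self, present: Sequence[int]) -> Optional[int]:
        # None plays the role of "I DON'T KNOW".
        return self.seen.get(tuple(present))

    def learn(self, present: Sequence[int], fight: int) -> None:
        self.seen[tuple(present)] = int(fight)


def run_episodes(learner, at_establishment, fight_occurred) -> str:
    """Run one episode per evening and return one character per episode."""
    answers = []
    for present, fight in zip(at_establishment, fight_occurred):
        prediction = learner.predict(present)
        if prediction is None:
            answers.append("2")            # I DON'T KNOW
            learner.learn(present, fight)  # reveal the true outcome
        else:
            answers.append(str(prediction))
    return "".join(answers)


print(run_episodes(MemorizingLearner(), [[1, 1], [1, 0], [1, 1], [1, 0]], [0, 1, 0, 1]))  # 2201
```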

The test case will be considered successful if no wrong answers are returned
**and** the number of "I DON'T KNOW"s does not exceed the maximum 
allowed by the autograder.
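The success criterion can be checked locally with a small helper. This is a sketch: `check_run` and its `max_dont_know` parameter are hypothetical, and the autograder's actual limit is not stated here. For reference, a learner that eliminates at least one of the $n(n-1)$ ordered (instigator, peacemaker) pairs on every "I DON'T KNOW" needs at most $n(n-1) - 1$ of them for $n$ patrons.

```python
from typing import Sequence


def check_run(answers: str, fight_occurred: Sequence[int], max_dont_know: int) -> bool:
    """Return True if no answer is wrong and the "2" count stays within the budget."""
    if len(answers) != len(fight_occurred):
        return False
    dont_know = 0
    for answer, fight in zip(answers, fight_occurred):
        if answer == "2":
            dont_know += 1
        elif int(answer) != int(fight):
            return False  # a wrong prediction fails the run outright
    return dont_know <= max_dont_know


truth = [0, 1, 0, 0, 0, 1, 0]
# With n = 2 patrons, the assumed budget is n * (n - 1) - 1 = 1.
print(check_run("0200010", truth, max_dont_know=1))  # True
print(check_run("0200000", truth, max_dont_know=1))  # False: episode 5 is predicted wrongly
print(check_run("0200010", truth, max_dont_know=0))  # False: one "2" exceeds the budget
```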

## Resources

The concepts explored in this homework are covered by:

-   'Knows what it knows: A framework for self-aware learning', Li, Littman, Walsh 2008

## Submission

-   The due date is indicated on the Canvas page for this assignment.
    Make sure you have your timezone in Canvas set to ensure the
    deadline is accurate.

-   Submit your finished notebook on Gradescope. Your grade is based on
    a set of hidden test cases. You will have unlimited submissions.
    By default, the last score is kept.  You can also set a particular
    submission as active in the submission history, in which case that
    submission will determine your grade.

-   Use the template below to implement your code. We have also provided
    some test cases for you. If your code passes the given test cases,
    it will run (though possibly not pass all the tests) on Gradescope.

-   Gradescope is using Python 3.6.x. For permitted libraries, please see
    the requirements.txt file. You can also use any core library
    (i.e., anything in the Python standard library).
    No other library can be used.  Also, make sure the name of your
    notebook matches the name of the provided notebook.  Gradescope times
    out after 10 minutes.

In [7]:
import numpy as np
from itertools import permutations

class Agent(object):
    def __init__(self):
        pass

    def solve(self, at_establishment, fight_occurred):
        """
        at_establishment: 2D list of 0s and 1s; each row is one evening, and a 1
            means the corresponding patron is present
        fight_occurred: list of 0s and 1s; a 1 means a fight occurred that evening

        Returns a string with one character per episode:
            "1" (FIGHT), "0" (NO FIGHT), or "2" (I DON'T KNOW).
        """
        pred_string_results = ""
        num_patrons = len(at_establishment[0])

        # initial hypothesis space: every ordered (instigator, peacemaker) pair
        H = list(permutations(range(num_patrons), 2))

        # iterate over the episodes
        for episode in range(len(at_establishment)):
            print("\repisode:", episode, end=" ")
            H, pred = self.pred_or_learn(at_establishment[episode], num_patrons, H, fight_occurred[episode])

            if pred == 1:
                pred_string_results += "1"  # FIGHT
            elif pred == -1:
                pred_string_results += "2"  # I DON'T KNOW
            else:
                pred_string_results += "0"  # NO FIGHT
        return pred_string_results


    def pred_or_learn(self, patrons, num_patrons, H, fight):
        """
        Predict the outcome for one episode, or learn from the true label.

        Returns (H, pred), where pred is 1 (FIGHT), 0 (NO FIGHT), or
        -1 (I DON'T KNOW); in the -1 case H has been pruned using `fight`.
        """

        # if all patrons are present, the peacemaker is present, so no fight occurs
        if sum(patrons) == num_patrons:
            return (H, 0)

        # collect the vote of each hypothesis h: 1 = fight, 0 = no fight
        votes = []

        # iterate over each h in the hypothesis space H
        for h in H:
            # h names a candidate instigator and peacemaker; check whether each is present
            instigator_ind, peacemaker_ind = h[0], h[1]
            instigator_presence = (patrons[instigator_ind] == 1)
            peacemaker_presence = (patrons[peacemaker_ind] == 1)

            # a fight occurs only if the instigator is present and the peacemaker is absent
            if (instigator_presence and not peacemaker_presence):
                votes.append(1)
            else:
                votes.append(0)

        # if all remaining hypotheses vote unanimously (fight or no fight),
        # the learner knows the answer
        if (sum(votes) == len(votes) or sum(votes) == 0):
            return (H, votes[0])

        # otherwise the hypotheses disagree: use the true label to remove every
        # hypothesis that voted wrongly, and report "I don't know"
        else:
            remove_indices = [i for i, x in enumerate(votes) if x == (1-fight)]
            for ind in sorted(remove_indices, reverse=True):
                del H[ind]
            return (H, -1)

In [8]:

## DO NOT MODIFY THIS CODE.  This code will ensure that your submission is correct
## and will work properly with the autograder

import unittest


class TestBarBrawl(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.agent = Agent()

    def test_case_1(self):
        np.testing.assert_equal(
            self.agent.solve(
                [[1,1], [1,0], [0,1], [1,1], [0,0], [1,0], [1,1]],
                [0, 1, 0, 0, 0, 1, 0]
            ),
            '0200010'
        )

    def test_case_2(self):
        np.testing.assert_equal(
             self.agent.solve(
                [[1,0,0,0,],[0,1,0,0],[0,1,1,1],[0,1,1,1],[0,1,1,1],[0,0,0,1],[1,1,1,1],[1,1,1,1],[1,1,1,1],[1,1,1,1],[0,1,1,1],[1,1,1,1],[0,1,1,1],[0,1,1,1],[1,1,1,1],[0,1,1,1],[0,1,1,1],[1,1,1,1],[0,1,1,1],[0,1,1,1],[0,0,0,1],[0,0,0,1],[1,1,1,1],[0,1,1,1],[0,1,1,1],[0,0,1,1],[0,1,1,1],[0,0,0,1],[0,0,1,1],[0,1,1,1],[0,0,1,1],[1,1,1,1]],
                [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0]
            ),
            '22200200000000000000000002001010'
        )

    def test_case_3(self):
        np.testing.assert_equal(
             self.agent.solve(
                [[1,0,1],[1,0,1],[1,1,1],[1,1,1],[1,1,1],[0,1,1],[0,0,1],[0,1,1],[1,1,1],[1,1,1],[1,1,1],[1,1,1],[0,1,1],[1,0,1],[1,1,1],[1,0,1],[1,0,1],[1,1,1]],
                [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
            ),
            '200002000000000000'
        )
        
unittest.main(argv=[''], verbosity=2, exit=False)

test_case_1 (__main__.TestBarBrawl.test_case_1) ... FAIL
test_case_2 (__main__.TestBarBrawl.test_case_2) ... FAIL
test_case_3 (__main__.TestBarBrawl.test_case_3) ... FAIL

FAIL: test_case_1 (__main__.TestBarBrawl.test_case_1)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\mccar\AppData\Local\Temp\ipykernel_17404\1546236693.py", line 13, in test_case_1
    np.testing.assert_equal(
  File "C:\Users\mccar\AppData\Local\Programs\Python\Python311\Lib\site-packages\numpy\testing\_private\utils.py", line 381, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
 ACTUAL: ['0', '2', '0', '0', '0', '1', '0']
 DESIRED: '0200010'

FAIL: test_case_2 (__main__.TestBarBrawl.test_case_2)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\mccar\AppData\Local\Temp\ipykernel_17404\1546236693.py", line 22, in test_case_2
    np.t

episode: 17 

<unittest.main.TestProgram at 0x20238f14dd0>