**Event Extraction with Reinforcement Learning**

- This part of the event extraction process uses reinforcement learning (RL).
- The process comprises a main-task and a subsidiary-task.


*Main-Task: Trigger Identification*
- The main-task is also referred to as trigger identification. In this task, the Agent sequentially scans the tokens in the input sentence.
- At each time step, an event type is assigned to the current token according to a stochastic RL policy.
- If the current word is recognized as an event trigger, a subsidiary-task will be launched.
- If the current word has an event type of 'None', the agent will skip the argument detection sub-task.
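
The scan described above can be sketched in a few lines. This is a minimal illustration only: the label set `EVENT_TYPES`, the helper names, and the per-token scores are hypothetical stand-ins, and the scores would normally come from the policy network rather than being written by hand.

```python
import math
import random

# Hypothetical label set for illustration; "None" marks a non-trigger token.
EVENT_TYPES = ["None", "Gene_expression", "Binding", "Regulation"]

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_event_type(scores, rng):
    """Sample one event type from the policy instead of taking the argmax."""
    probs = softmax(scores)
    return rng.choices(EVENT_TYPES, weights=probs, k=1)[0]

def scan_sentence(tokens, token_scores, rng):
    """Scan tokens left to right and collect those that launch argument detection."""
    launched = []
    for token, scores in zip(tokens, token_scores):
        event_type = sample_event_type(scores, rng)
        if event_type == "None":
            continue  # skip the argument detection sub-task
        launched.append((token, event_type))
    return launched

rng = random.Random(0)
tokens = ["IL-2", "expression", "increased"]
# Assumed scores, strongly peaked so that sampling is effectively fixed.
token_scores = [[50, 0, 0, 0], [0, 50, 0, 0], [0, 0, 0, 50]]
print(scan_sentence(tokens, token_scores, rng))
```

Sampling rather than always picking the highest-scoring type is what makes the policy stochastic, which lets the agent explore alternative event types during training.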

*Subsidiary-Task: Argument Detection*
- When a token is marked as an event trigger, the subsidiary-task is launched.
- This task detects the arguments of the event that corresponds to the event trigger.
- The performance of this subtask is used to compute the reward for the action taken in the main-task.
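
One way to turn the performance of argument detection into a reward is a token-level F1 score between the predicted and gold argument labels. The sketch below assumes that choice; the text above does not specify the reward formula, and the function name `argument_reward` and the `"O"` outside label are illustrative.

```python
def argument_reward(predicted, gold, outside="O"):
    """Token-level F1 over argument labels (an assumed reward definition)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    # A true positive is a token whose predicted argument label matches the gold one.
    true_pos = sum(1 for p, g in zip(predicted, gold) if p == g and g != outside)
    pred_count = sum(1 for p in predicted if p != outside)
    gold_count = sum(1 for g in gold if g != outside)
    if pred_count == 0 or gold_count == 0:
        # No arguments on both sides is a perfect match; otherwise nothing was found.
        return 1.0 if pred_count == gold_count else 0.0
    precision = true_pos / pred_count
    recall = true_pos / gold_count
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = ["O", "Theme", "O", "Cause"]
predicted = ["O", "Theme", "Cause", "O"]
print(argument_reward(predicted, gold))
```

A reward of this shape is high only when the chosen event type leads to correct arguments, so it ties the main-task action to the outcome of the subsidiary-task.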

**Extraction Process Steps**

The entire process is depicted in the following diagram and explained in steps: 

> ![BMEE](BMEE.png "Biomedical Event Extraction")
> * The current token in the sentence is identified as an event trigger.
> * The vector of the trigger word and the embedded environment information are concatenated and fed to the Agent.
> * The Agent takes an action, representing the predicted event type for the word, according to a stochastic policy.
> * For every word in the input sentence, the word embedding is then concatenated with the action vector to create a new representation.
> * This new representation of the input sentence passes through a BiLSTM-CRF module, which detects the arguments given the identified event trigger.
> * The predicted result of argument detection, Y', is then compared with the ground truth Y to compute a Reward.
> * Additionally, Y' is transformed into a vector L by a BiLSTM layer, and L is concatenated to the environment information. This helps with identifying subsequent triggers.
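
The concatenation step for the argument detector can be sketched with NumPy. The dimensions (300 for word embeddings, 100 for the action vector) mirror the constants used later in this notebook; representing the action as a one-hot vector and the helper names `one_hot` and `build_argument_input` are assumptions made for illustration.

```python
import numpy as np

def one_hot(index, size):
    """Encode the chosen action as a one-hot vector (an assumed representation)."""
    vector = np.zeros(size)
    vector[index] = 1.0
    return vector

def build_argument_input(word_embeddings, action_vector):
    """Append the same action vector to every word embedding in the sentence."""
    word_embeddings = np.asarray(word_embeddings, dtype=float)
    action_vector = np.asarray(action_vector, dtype=float)
    # Repeat the action vector once per token: shape (num_tokens, action_dim).
    tiled = np.tile(action_vector, (word_embeddings.shape[0], 1))
    # Result has shape (num_tokens, embedding_dim + action_dim).
    return np.concatenate([word_embeddings, tiled], axis=1)

sentence = np.zeros((4, 300))   # placeholder embeddings for a 4-token sentence
action = one_hot(2, 100)        # predicted event type with index 2
representation = build_argument_input(sentence, action)
print(representation.shape)
```

Each row of the result is one token's input to the BiLSTM-CRF module, so the detector sees the predicted event type at every position.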

In [None]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

In [None]:
# constants
stateDim = 300
actionDim = 100
argDetectDim = 100
outputDim = actionDim 

In [None]:
class MLP:
    def __init__(self, stateDim, actionDim, argDetectDim, outputDim):
        self.inputDim = stateDim + actionDim + argDetectDim
        self.outputDim = outputDim
        self.model = self.buildModel()
        
    def buildModel(self):
        model = Sequential()
        model.add(Dense(128, input_dim=self.inputDim, activation='relu'))
        model.add(Dense(64, activation='relu'))
        model.add(Dense(self.outputDim, activation='softmax'))
        
        model.compile(optimizer='adam', loss='categorical_crossentropy')
        return model
    
    def train(self, X, y, epochs=10, batchSize=32):
        self.model.fit(X, y, epochs=epochs, batch_size=batchSize)
    
    def predict(self, state):
        return self.model.predict(state.reshape(1, -1))

In [None]:
class State:
    def __init__(self, mlpModel, stateDim, actionDim, argDetectDim):
        self.mlpModel = mlpModel
        
        self.stateDim = stateDim
        self.actionDim = actionDim
        self.argDetectDim = argDetectDim
        
        # initial values for state sub vectors
        self.resetState()
    
    def updateState(self, tokenEmbedding, actionTypeVector, argumentVector):
        self.currentTokenEmbedding = tokenEmbedding
        self.lastActionTypeVector = actionTypeVector
        self.lastArgDetectVector = argumentVector
        
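        # Concatenated length: 2 * stateDim + actionDim + argDetectDim.
        # The MLP must accept this input size and return a stateDim-sized vector.
        inputToMLP = np.concatenate([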
            self.currentTokenEmbedding,
            self.lastActionTypeVector,
            self.lastArgDetectVector,
            self.previousStateVector
        ])
        
        self.previousStateVector = self.mlpModel.predict(inputToMLP.reshape(1, -1)).flatten()
    
    def resetState(self):
        self.currentTokenEmbedding = np.zeros(self.stateDim)
        self.lastActionTypeVector = np.zeros(self.actionDim)
        self.lastArgDetectVector = np.zeros(self.argDetectDim)
        self.previousStateVector = np.zeros(self.stateDim)
    
    def getStateVector(self):
        return self.previousStateVector        

In [None]:
class Environment:
    def __init__(self, text, kbEmbedder, mlpModel, stateDim, actionDim, argDetectDim):
        self.text = text
        self.kbEmbedder = kbEmbedder
        self.tokens = self.tokenizeText(text)
        self.currentIndex = 0
        self.state = State(mlpModel, stateDim, actionDim, argDetectDim)
        
        self.groundTruth = self.getGroundTruth()
        self.totalReward = 0
        
    def tokenizeText(self, text):
        return text.split() # default delimiter - space
    
    def reset(self):
        # reset environment for a new episode
        self.currentIndex = 0
        self.state.resetState()
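        self.totalReward = 0
        # return the initial state so callers can start an episode from it
        return self.state.getStateVector()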
    
    def step(self, trigger):
        pass
    
    def getWordRepresentation(self, token):
        pass
    
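    def calculateReward(self):
        pass
    
    def getGroundTruth(self):
        # placeholder: called in __init__ to load the gold annotations for self.text
        pass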

In [None]:
class TriggerIdentificationTask:
    def __init__(self, model):
        self.model = model
    
    def identifyTrigger(self, state):
        pass

In [None]:
class ArgumentDetectionTask:
    def __init__(self, biLSTMCRFModel):
        self.biLSTMCRFModel = biLSTMCRFModel
        
    def detectArguments(self, trigger, state):
        pass

In [None]:
class Agent:
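    def __init__(self, actionSpace, stateSize, triggerModel=None, biLSTMCRFModel=None):
        self.actionSpace = actionSpace
        self.stateSize = stateSize
        # both task classes require a model argument
        self.mainTask = TriggerIdentificationTask(triggerModel)
        self.subsidiaryTask = ArgumentDetectionTask(biLSTMCRFModel)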
    
    def chooseAction(self, state):
        pass
    
    def updatePolicy(self, reward, state, action):
        pass
    
    def performMainTask(self, state):
        pass
    
    def performSubsidiaryTask(self, trigger, state):
        pass

In [None]:
class ReinforcementLearning:
    def __init__(self, agent, environment):
        self.agent = agent
        self.environment = environment
    
    def runEpisode(self):
        stateVector = self.environment.reset()
        done = False
        episodeReward = 0
        
        while not done:
            action = self.agent.chooseAction(stateVector)
            nextStateVector, reward, done = self.environment.step(action)
            # updatePolicy expects (reward, state, action)
            self.agent.updatePolicy(reward, stateVector, action)
            episodeReward += reward
            stateVector = nextStateVector
        return episodeReward
    
    def train(self, numEpisodes):
        totalReward = 0
        for episode in range(numEpisodes):
            episodeReward = self.runEpisode()
            totalReward += episodeReward
            print(f"Episode {episode+1}/{numEpisodes}, Reward: {episodeReward}")
        print(f"Total reward after {numEpisodes} episodes: {totalReward}")

    
    def evaluate(self):
        # optional implementation
        pass