# Problem Formulation

Instead of predicting whether one event occurs before another (binary classification), the goal is to predict the entire sequence of events in the correct chronological order. This shifts the focus from pairwise comparisons to sequence prediction.

# Inputs and Outputs

Inputs: A sequence of events, represented by their textual descriptions. Each event within a context (e.g., a sentence or paragraph) is tokenized and represented as a sequence of embeddings.

Outputs: A sequence representing the chronological order of these events. For example, given three events, the model might output a permutation of 0-based indices such as [2, 0, 1], meaning the event at index 2 occurs first, followed by the events at index 0 and index 1.
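To make the output format concrete, here is a minimal sketch of how a predicted permutation could be applied to a list of events. The helper name `apply_order` and the sample events are hypothetical, and the sketch assumes 0-based indices where `order[k]` is the index of the event that occurs k-th in time.

```python
def apply_order(events, order):
    """Rearrange events so that order[k] is the index of the k-th event in time."""
    # Reject anything that is not a permutation of 0..len(events)-1.
    if sorted(order) != list(range(len(events))):
        raise ValueError("order must be a permutation of 0..len(events)-1")
    return [events[i] for i in order]


# Hypothetical events, listed in the order they appear in the text.
events = ["boarded the train", "arrived", "left home"]
predicted_order = [2, 0, 1]

chronological = apply_order(events, predicted_order)
print(chronological)  # ['left home', 'boarded the train', 'arrived']
```

Validating the permutation up front is a design choice: a model that emits indices freely can repeat or skip an event, and it is easier to catch that here than downstream.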

# The Architecture

1. Input Representation

   The input to the model is a sequence of events, where each event is represented as a tokenized sequence of words.

   - Event Sequence: Each event is turned into a sequence of tokens (e.g., words or subwords).
   - Contextual Information: The context of each event (e.g., the surrounding sentence) is also tokenized and can be included to provide additional information.

2. Embedding Layer

   The tokenized sequences are passed through an Embedding Layer. This layer converts each token (word) into a dense vector representation, capturing semantic meanings and relationships between words. The result is a sequence of embeddings representing the events and their contexts.

3. Sequence Processing (LSTM/GRU)

   The embedded event sequences are fed into LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Unit) layers. These layers process the sequence of embeddings, capturing temporal dependencies and the order in which events occur. Since LSTMs can remember long-term dependencies, they are well-suited for understanding the sequence and temporal relationships between events.

4. Attention Mechanism (Optional)

   An Attention Mechanism can be applied to focus on the most important parts of the input sequence when making predictions. Attention helps the model weigh the significance of each event in the sequence, making it easier to determine their correct order.

5. Output Layer: Sequence Prediction

   After processing the sequence with LSTM/GRU layers, the model generates an output that predicts the order of events. The output layer typically consists of dense layers with a softmax activation function, which outputs a probability distribution over the possible event orders.

6. Decoding the Output

   - Training: During training, the model is trained to minimize the difference between its predicted order and the correct order (ground truth). The correct order comes from the temporal relations (T-LINKs) in the dataset.
   - Inference: During inference, the model outputs a sequence of indices that represent the predicted chronological order of the events.

7. Example
Consider a sequence of three events:

Input Sequence: ["Event A happened", "Event B happened", "Event C happened"]

Tokenized Input: [[1, 34, 56], [2, 34, 57], [3, 34, 58]] (where numbers are token indices)

Model Prediction: The model processes this sequence and outputs [1, 0, 2] (0-based indices into the input sequence), meaning the correct temporal order is Event B, Event A, Event C.
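The output step described above (a softmax over the possible event orders, then picking the most likely one) can be sketched in plain Python. This is only an illustration under stated assumptions: the logits are made-up numbers standing in for a trained model's output, there is one logit per candidate permutation, and indices are 0-based.

```python
import math
from itertools import permutations


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


events = ["Event A happened", "Event B happened", "Event C happened"]

# Every possible ordering of three events: 3! = 6 candidates.
candidate_orders = list(permutations(range(len(events))))

# Hypothetical logits, one per candidate order (not real model output).
logits = [0.1, 0.3, 2.0, 0.2, 0.0, -0.5]
probs = softmax(logits)

# Pick the candidate with the highest probability.
best = max(range(len(probs)), key=probs.__getitem__)
predicted_order = list(candidate_orders[best])
ordered_events = [events[i] for i in predicted_order]

print(predicted_order)  # [1, 0, 2]
print(ordered_events)   # Event B, then Event A, then Event C
```

Note that the number of candidate orders grows factorially with the number of events, so a single softmax over all permutations is only practical for short sequences.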


# 1. Extraction
What's going on here?

1. Parsing TimeML Files

2. Extracting Events and Context Sentences

The events are stored in a list of dictionaries, where each dictionary contains:

EVENT ID: The unique identifier for the event.

EVENT Text: The text content of the event.

Context Sentence: The sentence in which the event appears.

3. Extracting Temporal Links (T-LINKs)

The T-LINKs are stored in a list of dictionaries, where each dictionary contains:

Event ID 1: The first event in the temporal relationship.

Event ID 2: The second event in the temporal relationship.

Relation: The type of temporal relation between the two events.


4. Combining DataFrames
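The T-LINK dictionaries described above can, in principle, be turned into a chronological order by treating BEFORE/AFTER relations as edges of a graph and sorting it topologically. The sketch below is one possible approach, not the notebook's method: the function name `order_from_tlinks` and the sample links are hypothetical, only BEFORE and AFTER are interpreted, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def order_from_tlinks(event_ids, tlinks):
    """Derive one chronological order from BEFORE/AFTER T-LINKs."""
    # Map each event to the set of events that must come before it.
    predecessors = {eid: set() for eid in event_ids}
    for link in tlinks:
        first, second = link["Event ID 1"], link["Event ID 2"]
        if first not in predecessors or second not in predecessors:
            continue  # skip links that mention unknown events
        if link["Relation"] == "BEFORE":
            predecessors[second].add(first)   # first happens before second
        elif link["Relation"] == "AFTER":
            predecessors[first].add(second)   # second happens before first
        # Other relation types (INCLUDES, SIMULTANEOUS, ...) add no edge here.
    # static_order() raises graphlib.CycleError if the links contradict each other.
    return list(TopologicalSorter(predecessors).static_order())


# Hypothetical T-LINKs in the same dictionary layout as the extraction step.
sample_tlinks = [
    {"Event ID 1": "ei2", "Event ID 2": "ei1", "Relation": "BEFORE"},
    {"Event ID 1": "ei3", "Event ID 2": "ei1", "Relation": "AFTER"},
]
chronological_ids = order_from_tlinks(["ei1", "ei2", "ei3"], sample_tlinks)
print(chronological_ids)  # ['ei2', 'ei1', 'ei3']
```

When the links do not fully determine the order, a topological sort returns just one of the valid orders, so the result should be read as a consistent ordering rather than the only one.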




# Parsing and Combining Datasets

Function Definition: A function (parse_tml_with_context) is created to parse TimeML files and extract:

Events: Including their IDs, text, and the context sentence they belong to.

TIMEX3: Temporal expressions along with their IDs.

T-LINKs: Relationships between events.

Dataset Loading: The function is called for two datasets, TimeBank.tml and TimeEval3.tml, producing three DataFrames for each dataset (events, TIMEX3, T-LINKs).

Combining DataFrames: The resulting DataFrames from both datasets are concatenated to form combined DataFrames for events, TIMEX3, and T-LINKs.

Output: The first few rows of each combined DataFrame are printed for inspection.
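One detail worth illustrating: in TimeML, EVENT tags carry an `eid` (such as `e1`), while T-LINKs refer to event instance IDs (such as `ei216`), and MAKEINSTANCE elements connect the two. The sketch below shows how the two ID schemes might be reconciled and how the full text of a TEXT element can be recovered. The sample document is invented for illustration, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up TimeML fragment (not taken from TimeBank).
SAMPLE_TML = """<TimeML>
<TEXT>A woman was <EVENT eid="e1" class="OCCURRENCE">killed</EVENT> after shots were <EVENT eid="e2" class="OCCURRENCE">fired</EVENT>.</TEXT>
<MAKEINSTANCE eventID="e1" eiid="ei1" tense="PAST" aspect="NONE" polarity="POS" pos="VERB"/>
<MAKEINSTANCE eventID="e2" eiid="ei2" tense="PAST" aspect="NONE" polarity="POS" pos="VERB"/>
<TLINK lid="l1" relType="AFTER" eventInstanceID="ei1" relatedToEventInstance="ei2"/>
</TimeML>"""

root = ET.fromstring(SAMPLE_TML)
text_node = root.find("TEXT")

# .text stops at the first child tag; itertext() walks the whole element.
leading_fragment = text_node.text
full_text = "".join(text_node.itertext())

# Map event instance IDs (used by TLINK) back to EVENT IDs.
eiid_to_eid = {m.attrib["eiid"]: m.attrib["eventID"] for m in root.iter("MAKEINSTANCE")}

tlinks = []
for tlink in root.iter("TLINK"):
    source = tlink.attrib.get("eventInstanceID")
    target = tlink.attrib.get("relatedToEventInstance")
    if source and target:  # keep event-to-event links only
        tlinks.append({
            "Event ID 1": eiid_to_eid.get(source, source),
            "Event ID 2": eiid_to_eid.get(target, target),
            "Relation": tlink.attrib.get("relType"),
        })

print(repr(leading_fragment))  # 'A woman was '
print(full_text)               # A woman was killed after shots were fired.
print(tlinks)
```

With the IDs translated this way, a T-LINK table can be joined against an events table on `EVENT ID` directly.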

# Preparing Input Data

The data prepared for the model consists of:


Padded Sequences of Event Texts:

This will be the primary input for the LSTM model.
You will tokenize the text of events and pad the sequences to ensure uniform input length.

Encoded Labels from T-LINKs:

These will serve as the target output for the model.
You will encode the relationships specified by the T-LINKs into numerical labels.

TIMEX3 Information (Optional):

If you choose to include TIMEX3 entities, this could provide additional temporal context.
You can extract TIMEX3 texts and either use them as additional features or include them in the context of the event texts.
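The preparation steps above (tokenizing, padding, and encoding relations) can be illustrated without Keras. The following is a simplified pure-Python sketch, not the Keras `Tokenizer` itself: the helper names and sample data are hypothetical, and word indices are assigned in first-seen order, whereas Keras orders them by frequency.

```python
def build_word_index(texts):
    """Assign each distinct word an integer, reserving 0 for padding."""
    index = {}
    for text in texts:
        for word in text.lower().split():
            index.setdefault(word, len(index) + 1)
    return index


def texts_to_padded(texts, word_index):
    """Convert texts to integer sequences and right-pad them with zeros."""
    # Words missing from the index are dropped.
    sequences = [[word_index[w] for w in t.lower().split() if w in word_index] for t in texts]
    max_len = max((len(s) for s in sequences), default=0)
    return [s + [0] * (max_len - len(s)) for s in sequences]


# Hypothetical event texts and the relation attached to each one.
event_texts = ["went to the store", "bought milk", "returned home"]
relations = ["BEFORE", "AFTER", "BEFORE"]

word_index = build_word_index(event_texts)
padded = texts_to_padded(event_texts, word_index)

# Encode relation names as integer class labels, in first-seen order.
relation_mapping = {rel: idx for idx, rel in enumerate(dict.fromkeys(relations))}
labels = [relation_mapping[r] for r in relations]

print(padded)   # [[1, 2, 3, 4], [5, 6, 0, 0], [7, 8, 0, 0]]
print(labels)   # [0, 1, 0]

# Text made only of unseen words becomes an empty sequence.
print(texts_to_padded(["Thursday evening"], word_index))  # [[]]
```

The last line matters for the optional TIMEX3 input: if the vocabulary is fitted on event texts only, temporal expressions made of unseen words contribute nothing unless the vocabulary is also fitted on them.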

In [14]:
import pandas as pd
import xml.etree.ElementTree as ET
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.regularizers import l2

# Function to parse a TML file (TimeML format) and extract events, T-LINKs, and TIMEX3
def parse_tml_with_context(file_path):
    tree = ET.parse(file_path)
    root = tree.getroot()

    events = []
    tlinks = []
    timex3s = []

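    # Extract events and TIMEX3
    for s in root.iter('TEXT'):
        # Note: s.text is only the text before the first child tag of <TEXT>,
        # so every event in a file shares this leading fragment as its context.
        sentence_text = s.text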
        for event in s.iter('EVENT'):
            event_id = event.attrib['eid']
            event_text = event.text
            events.append({'EVENT ID': event_id, 'EVENT Text': event_text, 'Context Sentence': sentence_text})

        for timex in s.iter('TIMEX3'):
            timex_id = timex.attrib['tid']
            timex_text = timex.text
            timex3s.append({'TIMEX3 ID': timex_id, 'TIMEX3 Text': timex_text})

    # Extract T-LINKs
    for tlink in root.iter('TLINK'):
        event_id_1 = tlink.attrib.get('eventInstanceID')
        event_id_2 = tlink.attrib.get('relatedToEventInstance')
        relation = tlink.attrib.get('relType')

        if event_id_1 and event_id_2:
            tlinks.append({'Event ID 1': event_id_1, 'Event ID 2': event_id_2, 'Relation': relation})

    events_df = pd.DataFrame(events)
    timex3_df = pd.DataFrame(timex3s)
    tlinks_df = pd.DataFrame(tlinks)

    return events_df, timex3_df, tlinks_df

# Load the datasets
timebank_events_df, timebank_timex3_df, timebank_tlinks_df = parse_tml_with_context('TimeBank.tml')
timeeval3_events_df, timeeval3_timex3_df, timeeval3_tlinks_df = parse_tml_with_context('TimeEval3.tml')

# Combine the datasets
combined_events_df = pd.concat([timebank_events_df, timeeval3_events_df], ignore_index=True)
combined_timex3_df = pd.concat([timebank_timex3_df, timeeval3_timex3_df], ignore_index=True)
combined_tlinks_df = pd.concat([timebank_tlinks_df, timeeval3_tlinks_df], ignore_index=True)

# Prepare input data for events
tokenizer = Tokenizer()
tokenizer.fit_on_texts(combined_events_df['EVENT Text'].tolist())  # Fit on event texts
event_sequences = tokenizer.texts_to_sequences(combined_events_df['EVENT Text'].tolist())  # Convert texts to sequences

# Prepare input data for TIMEX3 (if present)
if not combined_timex3_df.empty:
    timex3_sequences = tokenizer.texts_to_sequences(combined_timex3_df['TIMEX3 Text'].tolist())
else:
    timex3_sequences = []

# Pad event sequences
padded_event_sequences = pad_sequences(event_sequences, padding='post', dtype='int32')  # Pad event sequences

# Pad TIMEX3 sequences if they exist
if timex3_sequences:
    padded_timex3_sequences = pad_sequences(timex3_sequences, padding='post', dtype='int32')  # Pad TIMEX3 sequences
else:
    padded_timex3_sequences = np.empty((padded_event_sequences.shape[0], 0), dtype='int32')  # Create empty array

# Combine event and TIMEX3 sequences only if TIMEX3 sequences exist
if padded_timex3_sequences.size > 0:
    combined_input_sequences = np.concatenate((padded_event_sequences, padded_timex3_sequences), axis=1)
else:
    combined_input_sequences = padded_event_sequences  # Use only event sequences

# Display the DataFrames
print("Events DataFrame:")
print(combined_events_df.head())  # Display first few rows of events

print("\nTIMEX3 DataFrame:")
print(combined_timex3_df.head())  # Display first few rows of TIMEX3

print("\nT-LINKs DataFrame:")
print(combined_tlinks_df.head())  # Display first few rows of T-LINKs

# Create encoded labels
relation_mapping = {relation: idx for idx, relation in enumerate(combined_tlinks_df['Relation'].unique())}
encoded_labels = []
for _, row in combined_events_df.iterrows():
    # Find the first T-LINK that mentions this event on either side.
    # Note: this compares EVENT IDs (e.g. 'e1') with T-LINK instance IDs (e.g. 'ei216');
    # where the two ID schemes differ, nothing matches and the event gets the "no relation" class.
    relation = combined_tlinks_df[
        (combined_tlinks_df['Event ID 1'] == row['EVENT ID']) |
        (combined_tlinks_df['Event ID 2'] == row['EVENT ID'])
    ]['Relation']

    if not relation.empty:
        encoded_labels.append(relation_mapping[relation.values[0]])
    else:
        # Assign the extra class index reserved for "no relation"
        encoded_labels.append(len(relation_mapping))

# Convert to a numpy array
encoded_labels = np.array(encoded_labels)

# Split data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(combined_input_sequences, encoded_labels, test_size=0.2, random_state=42)

print("Training Set Shape:", X_train.shape, y_train.shape)
print("Validation Set Shape:", X_val.shape, y_val.shape)

# Define model parameters
vocab_size = len(tokenizer.word_index) + 1  # Size of the vocabulary
embedding_dim = 128  # Dimension of the embedding layer
max_length = X_train.shape[1]  # Maximum length of the input sequences
num_classes = len(relation_mapping) + 1  # Number of classes including "no relation"

# Build the LSTM model with Bidirectional LSTM for better performance
model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=max_length))
model.add(Bidirectional(LSTM(64, return_sequences=True, kernel_regularizer=l2(0.001))))
model.add(Dropout(0.5))
model.add(Bidirectional(LSTM(64, kernel_regularizer=l2(0.001))))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax', kernel_regularizer=l2(0.001)))  # +1 for "no relation" class

# Compile the model
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Early stopping to prevent overfitting
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val), callbacks=[early_stopping])

# Display the model summary
model.summary()


Events DataFrame:
  EVENT ID EVENT Text                        Context Sentence
0       e1   watching  \nNEW YORK _ A Brooklyn woman who was 
1       e2     killed  \nNEW YORK _ A Brooklyn woman who was 
2       e4    emptied  \nNEW YORK _ A Brooklyn woman who was 
3       e6       said  \nNEW YORK _ A Brooklyn woman who was 
4       e7   appeared  \nNEW YORK _ A Brooklyn woman who was 

TIMEX3 DataFrame:
  TIMEX3 ID       TIMEX3 Text
0       t44  Thursday evening
1       t46  around 7:15 p.m.
2       t47   a few years ago
3        t1           Tuesday
4        t2      three months

T-LINKs DataFrame:
  Event ID 1 Event ID 2      Relation
0      ei216      ei215      INCLUDES
1      ei239      ei238        BEFORE
2      ei210      ei211      INCLUDES
3      ei211      ei212  SIMULTANEOUS
4      ei214      ei215         AFTER
Training Set Shape: (44, 1) (44,)
Validation Set Shape: (12, 1) (12,)
Epoch 1/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 10s 991ms/step - accuracy: 0.2642 - loss: 2.6376 - val_accuracy: 1.0000 - val_loss: 2.6000
Epoch 2/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step - accuracy: 1.0000 - loss: 2.5957 - val_accuracy: 1.0000 - val_loss: 2.5590
Epoch 3/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 1.0000 - loss: 2.5546 - val_accuracy: 1.0000 - val_loss: 2.5187
Epoch 4/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 1.0000 - loss: 2.5140 - val_accuracy: 1.0000 - val_loss: 2.4791
Epoch 5/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step - accuracy: 1.0000 - loss: 2.4745 - val_accuracy: 1.0000 - val_loss: 2.4399
Epoch 6/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 1.0000 - loss: 2.4354 - val_accuracy: 1.0000 - val_loss: 2.4009
Epoch 7/10
2/2 ━━━━━━━━━━━━━━━━━━━━

# Handling Multiple Events

("No relation" means the predicted class has no entry in `relation_mapping`; the model now outputs relation labels rather than order indices.)


In [15]:
# Example sentences with corresponding events
example_sentences = [
    "John went to the store.",
    "He bought milk.",
    "Later, he returned home.",
    "The dog barked.",
    "He fed the dog."
]

# Corresponding events (for demonstration purposes)
example_events = [
    "went to the store",
    "bought milk",
    "returned home",
    "barked",
    "fed the dog"
]

# Tokenize and pad the example events
example_event_sequences = tokenizer.texts_to_sequences(example_events)
padded_example_event_sequences = pad_sequences(example_event_sequences, padding='post', maxlen=X_train.shape[1])

# Make predictions using the trained model
predictions = model.predict(padded_example_event_sequences)

# Decode predictions to find the corresponding relations
predicted_relations = np.argmax(predictions, axis=1)

# Create a mapping from index to relation
index_to_relation = {idx: relation for relation, idx in relation_mapping.items()}

# Display the predictions for each event
for event, relation_index in zip(example_events, predicted_relations):
    predicted_relation = index_to_relation.get(relation_index, "No relation")
    print(f"Event: '{event}' - Predicted Relation: {predicted_relation}")


1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 564ms/step
Event: 'went to the store' - Predicted Relation: No relation
Event: 'bought milk' - Predicted Relation: No relation
Event: 'returned home' - Predicted Relation: No relation
Event: 'barked' - Predicted Relation: No relation
Event: 'fed the dog' - Predicted Relation: No relation


# Compound Sentences

In [16]:
# Example sentence with four events
example_sentence_four_events = "Alice woke up, made breakfast, attended a meeting, and then went for a run."

# Corresponding events
example_events_four = [
    "woke up",
    "made breakfast",
    "attended a meeting",
    "went for a run"
]

# Tokenize and pad the example events
example_event_sequences_four = tokenizer.texts_to_sequences(example_events_four)
padded_example_event_sequences_four = pad_sequences(example_event_sequences_four, padding='post', maxlen=X_train.shape[1])

# Make predictions using the trained model
predictions_four = model.predict(padded_example_event_sequences_four)

# Decode predictions to find the corresponding relations
predicted_relations_four = np.argmax(predictions_four, axis=1)

# Create a mapping from index to relation
index_to_relation = {idx: relation for relation, idx in relation_mapping.items()}

# Display the predictions for each event
for event, relation_index in zip(example_events_four, predicted_relations_four):
    predicted_relation = index_to_relation.get(relation_index, "No relation")
    print(f"Event: '{event}' - Predicted Relation: {predicted_relation}")


1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 620ms/step
Event: 'woke up' - Predicted Relation: No relation
Event: 'made breakfast' - Predicted Relation: No relation
Event: 'attended a meeting' - Predicted Relation: No relation
Event: 'went for a run' - Predicted Relation: No relation
