# Problem Formulation

Instead of predicting whether one event occurs before another (binary classification), the goal is to predict the entire sequence of events in the correct chronological order. This shifts the focus from pairwise comparisons to sequence prediction.

# Inputs and Outputs

Inputs: A sequence of events, represented by their textual descriptions. Each event within a context (e.g., a sentence or paragraph) is tokenized and represented as a sequence of embeddings.

Outputs: A sequence representing the chronological order of these events. For example, given three events, the model might output the permutation of zero-based indices [2, 0, 1], meaning the event at index 2 occurred first, followed by the events at indices 0 and 1.
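
To make the output format concrete, here is a minimal sketch that applies a predicted permutation to a list of events. The helper name `apply_order` and the event strings are illustrative assumptions, and the indices are read as zero-based positions in the input list.

```python
def apply_order(events, order):
    """Rearrange events so that position k holds the k-th earliest event."""
    # Reject anything that is not a permutation of 0..len(events)-1.
    if sorted(order) != list(range(len(events))):
        raise ValueError("order must be a permutation of the input indices")
    return [events[i] for i in order]


events = ["Event A happened", "Event B happened", "Event C happened"]
predicted_order = [2, 0, 1]  # hypothetical model output
ordered = apply_order(events, predicted_order)
print(ordered)
```

The validity check matters in practice because a model that predicts each position independently can repeat an index.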

# The Architecture

1. Input Representation
The input to the model is a sequence of events, where each event is represented as a tokenized sequence of words.

Event Sequence: Each event is turned into a sequence of tokens (e.g., words or subwords).
Contextual Information: The context of each event (e.g., the surrounding sentence) is also tokenized and can be included to provide additional information.

2. Embedding Layer
The tokenized sequences are passed through an Embedding Layer. This layer converts each token (word) into a dense vector representation, capturing semantic meanings and relationships between words.
The result is a sequence of embeddings representing the events and their contexts.
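
As a rough illustration of what the embedding step does, the sketch below treats the layer as a lookup table from token index to vector. The table here is filled with random numbers (a trained layer would learn these values), and the names `build_embedding_table` and `embed`, the vocabulary size, and the dimension are assumptions chosen for the example.

```python
import random


def build_embedding_table(vocab_size, dim, seed=0):
    """Create one small random vector per token index (stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]


def embed(token_ids, table):
    """Replace each token index with its row in the table."""
    return [table[t] for t in token_ids]


table = build_embedding_table(vocab_size=60, dim=4)
event_tokens = [1, 34, 56]  # token indices for one event
vectors = embed(event_tokens, table)
print(len(vectors), len(vectors[0]))
```

Each event of three tokens becomes three vectors of length 4, which is the shape the recurrent layers consume.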

3. Sequence Processing (LSTM/GRU)
The embedded event sequences are fed into LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Unit) layers.
These layers process the sequence of embeddings, capturing temporal dependencies and the order in which events occur.
Since LSTMs can remember long-term dependencies, they are well-suited for understanding the sequence and temporal relationships between events.

4. Attention Mechanism (Optional)
An Attention Mechanism can be applied to focus on the most important parts of the input sequence when making predictions.
Attention helps the model weigh the significance of each event in the sequence, making it easier to determine their correct order.
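
One common form of attention is dot-product attention, sketched below in plain Python. This is an illustrative assumption about how the optional step could work, not the mechanism used in the code later in this notebook; the vectors and the names `softmax` and `attend` are made up for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, states):
    """Score each state against the query, then return weights and the weighted sum."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[d] for w, state in zip(weights, states)) for d in range(dim)]
    return weights, context


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one vector per event
query = [1.0, 0.0]
weights, context = attend(query, states)
print(weights)
```

States that align with the query receive larger weights, so they contribute more to the context vector.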

5. Output Layer: Sequence Prediction
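After processing the sequence with LSTM/GRU layers, the model generates an output that predicts the order of events.
The output layer typically consists of dense layers with a softmax activation function, which outputs a probability distribution over the possible event orders. The implementation later in this notebook uses a simpler variant: its softmax ranges over the T-LINK relation classes plus one "no relation" class.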
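
A softmax over whole orderings can be sketched as follows: enumerate every permutation of the events, assign each a score, and normalize. The scores below are invented for illustration, and `order_distribution` is a hypothetical helper, not part of any library.

```python
import itertools
import math


def order_distribution(num_events, scores):
    """Map each permutation of the event indices to a softmax probability."""
    orders = list(itertools.permutations(range(num_events)))
    if len(scores) != len(orders):
        raise ValueError("need one score per permutation")
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {order: e / total for order, e in zip(orders, exps)}


scores = [0.1, 0.2, 2.0, 0.3, 0.0, 0.1]  # made-up scores, one per permutation
dist = order_distribution(3, scores)
best_order = max(dist, key=dist.get)
print(best_order)
print(math.factorial(10))  # number of output classes needed for 10 events
```

The number of classes grows as n!, so this formulation is only practical for a handful of events.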

6. Decoding the Output
Training: During training, the model is trained to minimize the difference between its predicted order and the correct order (ground truth). The correct order comes from the temporal relations (T-LINKs) in the dataset.
Inference: During inference, the model outputs a sequence of indices that represent the predicted chronological order of the events.
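
One way to turn pairwise T-LINKs into a ground-truth order is a topological sort, sketched below with the standard-library `graphlib` module (Python 3.9+). This is an assumption about how the target sequence could be derived. The sketch handles only `BEFORE` and `AFTER` relations, and the event IDs and the helper name `order_from_tlinks` are illustrative.

```python
from graphlib import TopologicalSorter


def order_from_tlinks(event_ids, tlinks):
    """Derive one chronological order from (first, relation, second) triples."""
    sorter = TopologicalSorter({eid: set() for eid in event_ids})
    for first, relation, second in tlinks:
        if relation == "BEFORE":
            sorter.add(second, first)  # first must come before second
        elif relation == "AFTER":
            sorter.add(first, second)  # second must come before first
        # Other relation types are ignored in this sketch.
    return list(sorter.static_order())  # raises CycleError on contradictory links


tlinks = [("e2", "BEFORE", "e1"), ("e3", "AFTER", "e1")]
order = order_from_tlinks(["e1", "e2", "e3"], tlinks)
print(order)
```

When the links leave some pairs unordered, `static_order` returns just one of the valid orders, so the resulting target is not unique.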

7. Example
Consider a sequence of three events:

Input Sequence: ["Event A happened", "Event B happened", "Event C happened"]

Tokenized Input: [[1, 34, 56], [2, 34, 57], [3, 34, 58]] (where numbers are token indices)

Model Prediction: The model processes this sequence and outputs [1, 0, 2] (zero-based indices, as in the earlier example), meaning the correct temporal order is Event B, Event A, Event C.


# 1. Extraction
What's going on here?

1. Parsing TimeML Files

2. Extracting Events and Context Sentences

The events are stored in a list of dictionaries, where each dictionary contains:

EVENT ID: The unique identifier for the event.

EVENT Text: The text content of the event.

Context Sentence: The sentence in which the event appears.

3. Extracting Temporal Links (T-LINKs):

The T-LINKs are stored in a list of dictionaries, where each dictionary contains:

Event ID 1: The first event in the temporal relationship.

Event ID 2: The second event in the temporal relationship.

Relation: The type of temporal relation between the two events.


4. Combining DataFrames
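
The extraction steps above can be sketched on a tiny inline document. The XML below is a made-up sample, not taken from either dataset. In TimeML, T-LINKs typically refer to event instance IDs (`eiid`, declared by `MAKEINSTANCE`) and not to `eid` directly, so this sketch maps instance IDs back to event IDs before storing the links. The mapping step is an assumption about what a given corpus needs.

```python
import xml.etree.ElementTree as ET

# Hypothetical TimeML fragment for illustration only.
SAMPLE_TML = """<TimeML>
<TEXT>The company <EVENT eid="e1" class="OCCURRENCE">announced</EVENT> the deal after it <EVENT eid="e2" class="OCCURRENCE">closed</EVENT>.</TEXT>
<MAKEINSTANCE eiid="ei1" eventID="e1" />
<MAKEINSTANCE eiid="ei2" eventID="e2" />
<TLINK lid="l1" eventInstanceID="ei1" relatedToEventInstance="ei2" relType="AFTER" />
</TimeML>"""

root = ET.fromstring(SAMPLE_TML)

# Full text of the <TEXT> element, including text nested inside child tags.
context = "".join(root.find("TEXT").itertext())

# Event ID -> event text.
events = {e.attrib["eid"]: e.text for e in root.iter("EVENT")}

# Event instance ID -> event ID, taken from MAKEINSTANCE.
eiid_to_eid = {m.attrib["eiid"]: m.attrib["eventID"] for m in root.iter("MAKEINSTANCE")}

tlinks = []
for t in root.iter("TLINK"):
    first = t.attrib.get("eventInstanceID")
    second = t.attrib.get("relatedToEventInstance")
    if first and second:  # keep event-to-event links only
        tlinks.append({
            "Event ID 1": eiid_to_eid.get(first, first),
            "Event ID 2": eiid_to_eid.get(second, second),
            "Relation": t.attrib.get("relType"),
        })

print(context)
print(tlinks)
```

With the IDs mapped this way, the `Event ID 1` and `Event ID 2` fields can be joined against the `EVENT ID` column of the events table.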




In [1]:
import pandas as pd
import xml.etree.ElementTree as ET
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Function to parse a TML file (TimeML format) and extract events, T-LINKs, and TIMEX3
def parse_tml_with_context(file_path):
    tree = ET.parse(file_path)
    root = tree.getroot()

    events = []
    tlinks = []
    timex3s = []

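    # Extract events and TIMEX3
    for s in root.iter('TEXT'):
        # Join all nested text; s.text alone stops at the first child tag.
        # Note: this is the full <TEXT> content, not a single sentence.
        sentence_text = ''.join(s.itertext())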
        for event in s.iter('EVENT'):
            event_id = event.attrib['eid']
            event_text = event.text
            events.append({'EVENT ID': event_id, 'EVENT Text': event_text, 'Context Sentence': sentence_text})

        for timex in s.iter('TIMEX3'):
            timex_id = timex.attrib['tid']
            timex_text = timex.text
            timex3s.append({'TIMEX3 ID': timex_id, 'TIMEX3 Text': timex_text})

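    # Extract T-LINKs (event-to-event links only)
    # Caveat: in TimeML these attributes typically hold event instance IDs (eiid,
    # declared by MAKEINSTANCE), which are not the same as the eid values stored above.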
    for tlink in root.iter('TLINK'):
        event_id_1 = tlink.attrib.get('eventInstanceID')
        event_id_2 = tlink.attrib.get('relatedToEventInstance')
        relation = tlink.attrib.get('relType')

        if event_id_1 and event_id_2:
            tlinks.append({'Event ID 1': event_id_1, 'Event ID 2': event_id_2, 'Relation': relation})

    events_df = pd.DataFrame(events)
    timex3_df = pd.DataFrame(timex3s)
    tlinks_df = pd.DataFrame(tlinks)

    return events_df, timex3_df, tlinks_df


# Parsing and Combining Datasets

Function Definition: A function (parse_tml_with_context) is created to parse TimeML files and extract:

Events: Including their IDs, text, and the context sentence they belong to.

TIMEX3: Temporal expressions along with their IDs.

T-LINKs: Relationships between events.

Dataset Loading: The function is called for two datasets, TimeBank.tml and TimeEval3.tml, producing three DataFrames for each dataset (events, TIMEX3, T-LINKs).

Combining DataFrames: The resulting DataFrames from both datasets are concatenated to form combined DataFrames for events, TIMEX3, and T-LINKs.

Inspection: The cell below does not print anything; the first few rows of each combined DataFrame can be checked with .head().

In [2]:
# Load the datasets
timebank_events_df, timebank_timex3_df, timebank_tlinks_df = parse_tml_with_context('TimeBank.tml')
timeeval3_events_df, timeeval3_timex3_df, timeeval3_tlinks_df = parse_tml_with_context('TimeEval3.tml')

# Combine the datasets
combined_events_df = pd.concat([timebank_events_df, timeeval3_events_df], ignore_index=True)
combined_timex3_df = pd.concat([timebank_timex3_df, timeeval3_timex3_df], ignore_index=True)
combined_tlinks_df = pd.concat([timebank_tlinks_df, timeeval3_tlinks_df], ignore_index=True)


# Preparing Input Data



Padded Sequences of Event Texts:

This will be the primary input for the LSTM model.
You will tokenize the text of events and pad the sequences to ensure uniform input length.

Encoded Labels from T-LINKs:

These will serve as the target output for the model.
You will encode the relationships specified by the T-LINKs into numerical labels.

TIMEX3 Information (Optional):

If you choose to include TIMEX3 entities, this could provide additional temporal context.
You can extract TIMEX3 texts and either use them as additional features or include them in the context of the event texts.
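
The tokenize-and-pad step can be sketched without any library. This is a simplified stand-in, not a reimplementation of the Keras `Tokenizer` or `pad_sequences`; the helper names and sample texts are assumptions. It reserves index 0 for padding and drops words that were not seen when the vocabulary was built.

```python
def fit_vocab(texts):
    """Assign each distinct word an index starting at 1 (0 is reserved for padding)."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def to_padded(texts, vocab, maxlen=None):
    """Convert texts to index sequences and pad them on the right with zeros."""
    seqs = [[vocab[w] for w in t.lower().split() if w in vocab] for t in texts]
    if maxlen is None:
        maxlen = max(len(s) for s in seqs)
    return [s[:maxlen] + [0] * (maxlen - len(s)) for s in seqs]


vocab = fit_vocab(["announced", "closed", "took off"])
padded = to_padded(["took off", "closed", "last week"], vocab)
print(padded)
```

The last row comes out as all zeros because neither word was in the vocabulary. The same thing happens when temporal expressions are converted with a vocabulary built only from event texts.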

In [4]:
import numpy as np
import pandas as pd
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Prepare input data for events
tokenizer = Tokenizer()
tokenizer.fit_on_texts(combined_events_df['EVENT Text'].tolist())  # Fit on event texts
event_sequences = tokenizer.texts_to_sequences(combined_events_df['EVENT Text'].tolist())  # Convert texts to sequences

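# Prepare input data for TIMEX3 (if present)
# Note: the tokenizer was fit on event texts only, so TIMEX3 words outside that
# vocabulary are silently dropped (no oov_token is set)
timex3_sequences = tokenizer.texts_to_sequences(combined_timex3_df['TIMEX3 Text'].tolist())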

# Pad both event and TIMEX3 sequences
max_length_events = max(len(s) for s in event_sequences)  # Max length for events
max_length_timex3 = max(len(s) for s in timex3_sequences)  # Max length for TIMEX3

# Pad sequences
padded_event_sequences = pad_sequences(event_sequences, maxlen=max_length_events, padding='post')  # Pad event sequences
padded_timex3_sequences = pad_sequences(timex3_sequences, maxlen=max_length_timex3, padding='post')  # Pad TIMEX3 sequences

# Ensure both sequences have the same number of rows
num_samples = max(padded_event_sequences.shape[0], padded_timex3_sequences.shape[0])

# Adjust padding for events if needed
if padded_event_sequences.shape[0] < num_samples:
    extra_event_rows = np.zeros((num_samples - padded_event_sequences.shape[0], max_length_events))
    padded_event_sequences = np.vstack([padded_event_sequences, extra_event_rows])

# Adjust padding for TIMEX3 if needed
if padded_timex3_sequences.shape[0] < num_samples:
    extra_timex3_rows = np.zeros((num_samples - padded_timex3_sequences.shape[0], max_length_timex3))
    padded_timex3_sequences = np.vstack([padded_timex3_sequences, extra_timex3_rows])

# Combine event and TIMEX3 sequences column-wise
# (rows are paired by position only, not by any link between an event and a TIMEX3)
combined_input_sequences = np.concatenate((padded_event_sequences, padded_timex3_sequences), axis=1)

print("Shape of combined_input_sequences:", combined_input_sequences.shape)


Shape of combined_input_sequences: (56, 1)


In [5]:
from sklearn.preprocessing import LabelEncoder

# Create encoded labels with a valid class for no relation
relation_mapping = {relation: idx for idx, relation in enumerate(combined_tlinks_df['Relation'].unique())}
encoded_labels = []
for index, row in combined_events_df.iterrows():
    # Find the corresponding relation for the event
    relation = combined_tlinks_df[
        (combined_tlinks_df['Event ID 1'] == row['EVENT ID']) |
        (combined_tlinks_df['Event ID 2'] == row['EVENT ID'])
    ]['Relation']

    if not relation.empty:
        encoded_labels.append(relation_mapping[relation.values[0]])
    else:
        # Assign a valid class index for "no relation"
        encoded_labels.append(len(relation_mapping))  # Adjust if needed

# Convert to a numpy array
encoded_labels = np.array(encoded_labels)
print("Encoded Labels Shape:", encoded_labels.shape)
print("Encoded Labels Unique Values:", np.unique(encoded_labels))  # Check the unique values


Encoded Labels Shape: (56,)
Encoded Labels Unique Values: [6]


In [6]:
# Split data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(combined_input_sequences, encoded_labels, test_size=0.2, random_state=42)

print("Training Set Shape:", X_train.shape, y_train.shape)
print("Validation Set Shape:", X_val.shape, y_val.shape)


Training Set Shape: (44, 1) (44,)
Validation Set Shape: (12, 1) (12,)


In [16]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.regularizers import l2

# Define model parameters
vocab_size = len(tokenizer.word_index) + 1  # Size of the vocabulary
embedding_dim = 128  # Dimension of the embedding layer
max_length = X_train.shape[1]  # Maximum length of the input sequences
num_classes = len(relation_mapping) + 1  # Number of classes including "no relation"

# Build the LSTM model
model = Sequential()

# Embedding layer (input_length is omitted: recent Keras versions deprecate it and infer the length from the data)
model.add(Embedding(input_dim=vocab_size, output_dim=embedding_dim))

# LSTM layer with L2 regularization
model.add(LSTM(64, return_sequences=True, kernel_regularizer=l2(0.001)))
model.add(Dropout(0.5))  # Dropout for regularization

# LSTM layer with L2 regularization
model.add(LSTM(64, kernel_regularizer=l2(0.001)))
model.add(Dropout(0.5))

# Dense output layer with L2 regularization
model.add(Dense(num_classes, activation='softmax', kernel_regularizer=l2(0.001)))  # +1 for "no relation" class

# Compile the model
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Early stopping to prevent overfitting
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val), callbacks=[early_stopping])




Epoch 1/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 6s 510ms/step - accuracy: 0.3722 - loss: 2.2279 - val_accuracy: 1.0000 - val_loss: 2.2099
Epoch 2/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 35ms/step - accuracy: 1.0000 - loss: 2.2077 - val_accuracy: 1.0000 - val_loss: 2.1894
Epoch 3/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 37ms/step - accuracy: 1.0000 - loss: 2.1867 - val_accuracy: 1.0000 - val_loss: 2.1693
Epoch 4/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 37ms/step - accuracy: 1.0000 - loss: 2.1667 - val_accuracy: 1.0000 - val_loss: 2.1494
Epoch 5/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 1.0000 - loss: 2.1469 - val_accuracy: 1.0000 - val_loss: 2.1296
Epoch 6/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 36ms/step - accuracy: 1.0000 - loss: 2.1261 - val_accuracy: 1.0000 - val_loss: 2.1099
Epoch 7/10
2/2 ━━━━━━━━━━━━━━━━━━━━

In [17]:
# Evaluate the model
loss, accuracy = model.evaluate(X_val, y_val)
print(f'Validation Loss: {loss}')
print(f'Validation Accuracy: {accuracy}')


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 25ms/step - accuracy: 1.0000 - loss: 2.0297
Validation Loss: 2.0296690464019775
Validation Accuracy: 1.0

Note: every encoded label in this run is class 6 ("no relation"), so a validation accuracy of 1.0 only shows that the model learned to predict that single class.


In [26]:
# Example input sentences
example_events = [
    "John started his new job 1 month ago.",
    "Mary graduated from university last week."
]

# Prepare input data for events
event_sequences = tokenizer.texts_to_sequences(example_events)
padded_event_sequences = pad_sequences(event_sequences, maxlen=max_length, padding='post')

# Assuming we have TIMEX3 texts (adjust these as needed)
example_timex3 = [
    "1 month ago",
    "last week"
]

# Prepare input data for TIMEX3
timex3_sequences = tokenizer.texts_to_sequences(example_timex3)
padded_timex3_sequences = pad_sequences(timex3_sequences, maxlen=max_length, padding='post')

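# Combine event and TIMEX3 sequences
# (note: the result is 2 * max_length columns wide, whereas the training inputs were max_length wide)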
combined_input_sequences = np.concatenate((padded_event_sequences, padded_timex3_sequences), axis=1)


In [27]:
# Make predictions
predictions = model.predict(combined_input_sequences)

# Rank the inputs by predicted class index:
# argmax picks each input's most likely relation class, argsort then orders the inputs by that class
predicted_indices = np.argsort(np.argmax(predictions, axis=1))


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step


In [28]:
print("Predicted Event Indices (Temporal Order):", predicted_indices)


Predicted Event Indices (Temporal Order): [0 1]


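Index 0 corresponds to the first event: "John started his new job 1 month ago."
Index 1 corresponds to the second event: "Mary graduated from university last week."
Interpretation:
The printed order [0 1] places "John started his new job 1 month ago" before "Mary graduated from university last week," which agrees with the temporal expressions: "1 month ago" is earlier than "last week."
This agreement should be read with caution. np.argsort(np.argmax(predictions, axis=1)) sorts the inputs by their predicted relation class, not by a learned time order. When both inputs receive the same class, as is expected here because the training labels contain only class 6, the inputs typically come back in their original order regardless of content.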