Add compute remaining time + regression task #1

federicotorrielli · 2024-06-14T15:17:04Z

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

### REMAINING TIME CALCULATION ###
df_log['TIMESTAMP'] = pd.to_datetime(df_log['TIMESTAMP'])
df_log = df_log.sort_values(by=['CaseID', 'TIMESTAMP'])
df_log['remaining_time'] = df_log.groupby('CaseID')['TIMESTAMP'].transform(lambda x: x.max() - x)

### CONVERT REMAINING TIME TO SECONDS ###
df_log['remaining_time_seconds'] = df_log['remaining_time'].dt.total_seconds()

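### DEFINE FEATURES AND TARGET ###
activity_columns = [col for col in df_log.columns if col.startswith('ACTIVITY_')]
# NOTE: remaining_time_seconds is derived from each case's last timestamp,
# so it is only known once the case has finished.
X = df_log[activity_columns + ['remaining_time_seconds']]
y = df_log['OUTCOME']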

### SPLIT DATA ###
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### TRAIN LOGISTIC REGRESSION MODEL ###
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

### PREDICT AND EVALUATE ###
y_pred = model.predict(X_test)
report = classification_report(y_test, y_pred, output_dict=True)
report_df = pd.DataFrame(report).transpose()

### DISPLAY CLASSIFICATION REPORT ###
print(report_df)

I'm not even sure this is 100% correct, so give it a try :)
