**Importing the libraries needed for ransomware prediction**



In [None]:
import pandas as pd
import glob
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report

Load the datasets and concatenate them into a single DataFrame.
Assign the column names as the header.

In [None]:
files = glob.glob("*.csv")
column_names = ["UNIX time in sec", "UNIX time in ns", "LBA", "size", "Entropy #1", "Entropy #2"]
# names= assigns the column names; the files are read as header-less CSVs.
# Reading every file first and concatenating once avoids re-copying df on each iteration.
df = pd.concat((pd.read_csv(fl, names=column_names) for fl in files), ignore_index=True)

Add an extra column `is_ransomware`, which is used as the target. A threshold of 0.2 decides whether a row is labelled as ransomware or not: the code below flags a row when the sum of the two entropies exceeds 0.2, which is equivalent to their mean exceeding 0.1.

In [None]:
df["is_ransomware"] = 0
df.loc[(df["Entropy #1"] + df["Entropy #2"] > 0.2), "is_ransomware"]=1
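
The labelling rule can be checked on a few rows. The sketch below uses made-up entropy values (the `toy` frame and the `by_*` names are illustrative, not part of the dataset) to show that comparing the sum to 0.2 is the same as comparing the mean to 0.1, and that comparing the mean to 0.2 would label fewer rows as ransomware.

```python
import pandas as pd

# Toy rows with hypothetical entropy values (not taken from the dataset).
toy = pd.DataFrame({
    "Entropy #1": [0.05, 0.15, 0.02, 0.30],
    "Entropy #2": [0.05, 0.10, 0.03, 0.20],
})

THRESHOLD = 0.2  # same value as in the labelling cell

# Rule as written in the notebook: sum of the two entropies vs. the threshold.
by_sum = (toy["Entropy #1"] + toy["Entropy #2"] > THRESHOLD).astype(int)

# Equivalent formulation: mean of the two entropies vs. half the threshold.
row_mean = toy[["Entropy #1", "Entropy #2"]].mean(axis=1)
by_half_mean = (row_mean > THRESHOLD / 2).astype(int)

# Stricter alternative: mean of the two entropies vs. the full threshold.
by_mean = (row_mean > THRESHOLD).astype(int)

print(by_sum.tolist(), by_half_mean.tolist(), by_mean.tolist())
```

On these rows the first two rules agree, while the stricter one flags only the last row.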

Choosing the target and the features

In [None]:
y = df.is_ransomware
features = ["UNIX time in sec", "UNIX time in ns", "LBA", "size", "Entropy #1", "Entropy #2"]
X = df[features]

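**Splitting the dataset**
The dataset is divided into three parts. The intended split is 70% for training, 15% for validation and 15% for testing. As written, however, the two calls below hold out 30% for testing and then split the remaining 70% in half, which gives 35% for training, 35% for validation and 30% for testing.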

In [None]:
X_train_temp, X_test, y_train_temp, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train_temp, y_train_temp, test_size=0.5, random_state=42)
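
A 70/15/15 split can be obtained by keeping 70% for training in the first call and halving the held-out 30% in the second. This is a minimal sketch on placeholder data; the `_demo` arrays and the shortened variable names are illustrative and are not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 1000 rows, 6 features (stand-ins for X and y).
X_demo = np.arange(6000).reshape(1000, 6)
y_demo = np.arange(1000) % 2

# Step 1: keep 70% for training and set 30% aside.
X_tr, X_hold, y_tr, y_hold = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Step 2: split the held-out 30% in half: 15% validation, 15% test.
X_va, X_te, y_va, y_te = train_test_split(
    X_hold, y_hold, test_size=0.5, random_state=42
)

print(len(X_tr), len(X_va), len(X_te))
```

Applying this to the real data would change the size of every subset, so the metrics printed in this notebook would no longer correspond to it.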

Creating the model with **DecisionTreeRegressor**
**Training** step

In [None]:
rsm_model = DecisionTreeRegressor()
rsm_model.fit(X_train, y_train)
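
Since the target is binary, a classifier is the more usual choice. The classification metrics used later in the notebook only accept predictions that are exactly 0 or 1, which a regressor does not guarantee. The sketch below shows the same fit with scikit-learn's `DecisionTreeClassifier` on synthetic stand-in data; the `_demo` arrays and the `clf` name are assumptions for illustration, and the results reported in this notebook were produced with the regressor, not with this variant.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (hypothetical): two features in [0, 0.2),
# labelled 1 when their sum exceeds 0.2, mirroring the labelling rule.
rng = np.random.default_rng(42)
X_demo = rng.random((500, 2)) * 0.2
y_demo = (X_demo.sum(axis=1) > 0.2).astype(int)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_demo, y_demo)

# predict() returns class labels (0 or 1), not real-valued estimates.
pred = clf.predict(X_demo)
print(np.unique(pred))
```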

**Validation** step

In [None]:
y_val_pred = rsm_model.predict(X_val)

Computing the validation metrics

In [None]:
val_accuracy = accuracy_score(y_val, y_val_pred)
val_precision = precision_score(y_val, y_val_pred)
val_recall = recall_score(y_val, y_val_pred)
val_f1 = f1_score(y_val, y_val_pred)

print("Validation Accuracy:", val_accuracy)
print("Validation Precision:", val_precision)
print("Validation Recall:", val_recall)
print("Validation F1-score:", val_f1)

Validation Accuracy: 0.9998399343730929
Validation Precision: 0.9998511643010811
Validation Recall: 0.9999323419168888
Validation F1-score: 0.9998917514613552


**Test** step and performance metrics

In [None]:
y_test_pred = rsm_model.predict(X_test)

test_accuracy = accuracy_score(y_test, y_test_pred)
test_precision = precision_score(y_test, y_test_pred)
test_recall = recall_score(y_test, y_test_pred)
test_f1 = f1_score(y_test, y_test_pred)

conf_matrix = confusion_matrix(y_test, y_test_pred)
classification_rep = classification_report(y_test, y_test_pred)

# Display test set metrics
print("\nTest Accuracy:", test_accuracy)
print("Test Precision:", test_precision)
print("Test Recall:", test_recall)
print("Test F1-score:", test_f1)

print("\nConfusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", classification_rep)


Test Accuracy: 0.9998949567571984
Test Precision: 0.9998901909109448
Test Recall: 0.999968623513539
Test F1-score: 0.9999294056742151

Confusion Matrix:
 [[21930     7]
 [    2 63740]]

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00     21937
           1       1.00      1.00      1.00     63742

    accuracy                           1.00     85679
   macro avg       1.00      1.00      1.00     85679
weighted avg       1.00      1.00      1.00     85679
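
The test metrics can be re-derived by hand from the confusion matrix. The sketch below copies the four counts from the printed matrix, assuming scikit-learn's layout (rows are true classes, columns are predicted classes, class 1 is the positive class), and applies the standard definitions.

```python
# Counts copied from the confusion matrix printed above
# (rows: true class, columns: predicted class).
tn, fp = 21930, 7
fn, tp = 2, 63740

total = tn + fp + fn + tp

accuracy = (tp + tn) / total          # share of correctly labelled rows
precision = tp / (tp + fp)            # predicted positives that are correct
recall = tp / (tp + fn)               # actual positives that were found
f1 = 2 * precision * recall / (precision + recall)

print(f"total={total}")
print(f"accuracy={accuracy:.6f} precision={precision:.6f}")
print(f"recall={recall:.6f} f1={f1:.6f}")
```

The four values agree with the test metrics printed above, and the total matches the support of 85679 rows in the classification report.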

