# Brain Tumor Detection with Random Forest and Custom Preprocessing
This notebook demonstrates brain tumor detection on MRI images using the Random Forest algorithm and the custom BinaryBrainTumorDataset dataset.

## 1. Introduction and Goals
The goal is to build a model that classifies MRI images as tumor or non-tumor using the Random Forest algorithm. The dataset is `BinaryBrainTumorDataset`, which has two classes: "yes" (tumor) and "no" (no tumor).

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
import cv2
import imutils
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_curve, auc
import seaborn as sns

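## 2. Loading and Splitting the Data
Images are loaded from the class subdirectories of the Training folder and split into training and validation sets. `TEST_DIR` points at the separate Testing folder, which the cells in this notebook do not load.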

In [None]:
IMG_SIZE = (64, 64)  # Small images keep the Random Forest feature vector manageable
DATASET_DIR = "../data/BinaryBrainTumorDataset/Training"
TEST_DIR = "../data/BinaryBrainTumorDataset/Testing"

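def load_images_from_folder(folder, label, img_size):
    """Load every readable image in `folder` as RGB, resized to `img_size`, with a fixed label."""
    images = []
    labels = []
    # Sort the listing so the load order does not depend on the filesystem
    for filename in sorted(os.listdir(folder)):
        img_path = os.path.join(folder, filename)
        img = cv2.imread(img_path)
        # cv2.imread returns None for files it cannot decode; skip those
        if img is not None:
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            img = cv2.resize(img, img_size)
            images.append(img)
            labels.append(label)
    return images, labels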

X = []
y = []
for class_name in ["yes", "no"]:
    imgs, labels = load_images_from_folder(os.path.join(DATASET_DIR, class_name), 1 if class_name == "yes" else 0, IMG_SIZE)
    X.extend(imgs)
    y.extend(labels)

X = np.array(X)
y = np.array(y)

# Split into train/val
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)
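
The split above produces training and validation sets only. If a third, held-out test set were wanted from the same pool, one common pattern is to call `train_test_split` twice. The following is a minimal sketch on synthetic arrays; the names `X_demo`, `y_demo`, and the 70/15/15 ratio are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the image array: 100 "images" of 64x64x3 with a 60/40 class balance.
rng = np.random.default_rng(42)
X_demo = rng.integers(0, 256, size=(100, 64, 64, 3), dtype=np.uint8)
y_demo = np.array([1] * 60 + [0] * 40)

# Step 1: hold out 30% of the data, stratified by class.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)
# Step 2: halve the hold-out into validation and test, again stratified.
X_va, X_te, y_va, y_te = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42
)

print(X_tr.shape[0], X_va.shape[0], X_te.shape[0])
```

Stratifying both calls keeps the class ratio roughly equal across all three subsets.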

## 3. Data Visualization
A few sample images from each class.

In [None]:
plt.figure(figsize=(10, 4))
for i in range(5):
    plt.subplot(2, 5, i+1)
    plt.imshow(X_train[y_train == 0][i])
    plt.title("No Tumor")
    plt.axis("off")
    plt.subplot(2, 5, i+6)
    plt.imshow(X_train[y_train == 1][i])
    plt.title("Tumor")
    plt.axis("off")
plt.tight_layout()
plt.show()

## 4. Preprocessing: Automatic Brain Cropping
The largest external contour of the thresholded image is used to crop the brain region out of each image.

In [None]:
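def crop_brain(img):
    """Crop an RGB image to the extent of its largest bright contour, then resize to IMG_SIZE."""
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    # Binarize, then erode and dilate to remove small specks of noise
    thresh = cv2.threshold(gray, 45, 255, cv2.THRESH_BINARY)[1]
    thresh = cv2.erode(thresh, None, iterations=2)
    thresh = cv2.dilate(thresh, None, iterations=2)
    cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)
    if len(cnts) == 0:
        return img
    c = max(cnts, key=cv2.contourArea)
    # Extreme points (leftmost, rightmost, topmost, bottommost) of the largest contour
    extLeft = tuple(c[c[:, :, 0].argmin()][0])
    extRight = tuple(c[c[:, :, 0].argmax()][0])
    extTop = tuple(c[c[:, :, 1].argmin()][0])
    extBot = tuple(c[c[:, :, 1].argmax()][0])
    cropped = img[extTop[1]:extBot[1], extLeft[0]:extRight[0]]
    # A degenerate contour (a single point or line) gives an empty crop, which cv2.resize rejects
    if cropped.size == 0:
        return img
    cropped = cv2.resize(cropped, IMG_SIZE)
    return cropped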

X_train_crop = np.array([crop_brain(img) for img in X_train])
X_val_crop = np.array([crop_brain(img) for img in X_val])
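
To make the cropping idea easier to inspect in isolation, here is a simplified NumPy-only sketch. It is not the contour-based method used above: it takes the bounding box of all pixels above the threshold instead of the largest connected contour, and it skips the blur and erode/dilate steps. The function name `bounding_box_crop` and the synthetic image are hypothetical.

```python
import numpy as np

def bounding_box_crop(gray, threshold=45):
    """Crop a 2-D grayscale array to the tightest box containing pixels above `threshold`."""
    mask = gray > threshold
    if not mask.any():
        return gray  # nothing above the threshold: return the input unchanged
    rows = np.where(mask.any(axis=1))[0]  # row indices that contain a bright pixel
    cols = np.where(mask.any(axis=0))[0]  # column indices that contain a bright pixel
    return gray[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# Synthetic 64x64 image: black background with one bright rectangle.
demo = np.zeros((64, 64), dtype=np.uint8)
demo[10:50, 20:40] = 200

cropped_demo = bounding_box_crop(demo)
print(cropped_demo.shape)  # (40, 20)
```

Unlike this sketch, the contour approach ignores bright regions that are disconnected from the largest one, which matters when a scan contains text overlays or other bright artifacts.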

## 5. Preparing the Data for Random Forest
Each image is flattened into a one-dimensional feature vector.

In [None]:
X_train_flat = X_train_crop.reshape(X_train_crop.shape[0], -1)
X_val_flat = X_val_crop.reshape(X_val_crop.shape[0], -1)
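
scikit-learn estimators expect a 2-D array of shape `(n_samples, n_features)`, so each 64x64 RGB image becomes a row of 64 * 64 * 3 = 12288 values. A small sketch of what the reshape does, using a hypothetical all-zero batch named `batch`:

```python
import numpy as np

# Hypothetical batch of 7 RGB images, 64x64 each.
batch = np.zeros((7, 64, 64, 3), dtype=np.uint8)
batch[0, 0, 0] = [10, 20, 30]  # set the RGB values of the top-left pixel of image 0

# Keep the sample axis, collapse height, width, and channel into one feature axis.
flat = batch.reshape(batch.shape[0], -1)

print(flat.shape)   # (7, 12288)
print(flat[0, :3])  # [10 20 30]
```

With NumPy's default row-major order, the channel values of one pixel sit next to each other in the flattened row, followed by the next pixel along the same image row.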

## 6. Model: Random Forest
Training the Random Forest classifier.

In [None]:
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train_flat, y_train)
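
Because every feature here is a pixel value, a fitted forest's `feature_importances_` can be reshaped back into an image to see which pixels the trees rely on. The sketch below illustrates the idea on toy data, not on the MRI images: the 8x8 grid, the names `X_toy`, `y_toy`, `toy_rf`, and the rule that the label depends on a single pixel are all assumptions made for the demonstration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
side = 8  # toy single-channel "images" of 8x8 pixels

# 200 random images; the label depends only on the pixel at row 3, column 4.
X_toy = rng.random((200, side * side))
y_toy = (X_toy[:, 3 * side + 4] > 0.5).astype(int)

toy_rf = RandomForestClassifier(n_estimators=50, random_state=0)
toy_rf.fit(X_toy, y_toy)

# Map the per-feature importances back onto the pixel grid.
importance_map = toy_rf.feature_importances_.reshape(side, side)
top_pixel = np.unravel_index(importance_map.argmax(), importance_map.shape)
print(top_pixel)
```

For the flattened RGB images in this notebook, the analogous reshape would be to `(64, 64, 3)`, optionally summed over the channel axis before plotting.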

## 7. Model Evaluation
Accuracy, the ROC curve, and the confusion matrix on the validation set.

In [None]:
val_preds = rf.predict(X_val_flat)
val_probs = rf.predict_proba(X_val_flat)[:, 1]
accuracy = accuracy_score(y_val, val_preds)
cm = confusion_matrix(y_val, val_preds)
fpr, tpr, _ = roc_curve(y_val, val_probs)
roc_auc = auc(fpr, tpr)

fig, axes = plt.subplots(1, 3, figsize=(18, 5))
# Accuracy
axes[0].bar(["Accuracy"], [accuracy])
axes[0].set_ylim(0, 1)
axes[0].set_title("Accuracy")
# ROC Curve
axes[1].plot(fpr, tpr, label=f"AUC={roc_auc:.2f}")
axes[1].plot([0, 1], [0, 1], "k--")
axes[1].set_title("ROC Curve")
axes[1].legend()
# Confusion Matrix
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=["No Tumor", "Tumor"], yticklabels=["No Tumor", "Tumor"], ax=axes[2])
axes[2].set_xlabel("Predicted")
axes[2].set_ylabel("Actual")
axes[2].set_title("Confusion Matrix")
plt.tight_layout()
plt.show()

print(f"Validation Accuracy: {accuracy:.4f}")
print(classification_report(y_val, val_preds, target_names=["No Tumor", "Tumor"]))

# Summary of weighted-average metrics (accuracy is already printed above)
report_dict = classification_report(y_val, val_preds, target_names=["no", "yes"], output_dict=True)
print(f"Precision (weighted avg): {report_dict['weighted avg']['precision']:.4f}")
print(f"F1 score (weighted avg): {report_dict['weighted avg']['f1-score']:.4f}")
print(f"AUC-ROC: {roc_auc:.4f}")
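
The metrics above all derive from the four cells of the confusion matrix. As a worked example, the sketch below computes them by hand for the positive ("Tumor") class. The counts in `cm_demo` are made up for illustration and are not results from this notebook.

```python
import numpy as np

# Hypothetical confusion matrix in scikit-learn layout: rows are actual, columns are predicted.
#                    pred 0  pred 1
cm_demo = np.array([[40,     10],   # actual 0 (No Tumor)
                    [5,      45]])  # actual 1 (Tumor)

tn, fp, fn, tp = cm_demo.ravel()

accuracy_demo = (tp + tn) / cm_demo.sum()  # share of all predictions that are correct
precision_demo = tp / (tp + fp)            # share of predicted tumors that are real
recall_demo = tp / (tp + fn)               # share of real tumors that are detected
f1_demo = 2 * precision_demo * recall_demo / (precision_demo + recall_demo)

print(f"accuracy={accuracy_demo:.4f} precision={precision_demo:.4f} "
      f"recall={recall_demo:.4f} f1={f1_demo:.4f}")
```

Note that the `weighted avg` entries of `classification_report` average the per-class values weighted by class support, so they generally differ from these positive-class figures.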

## 8. Conclusion
A Random Forest model with basic preprocessing gives solid results on the brain tumor detection task, as measured on the validation split in Section 7. Further improvements could include more advanced feature extraction and hyperparameter optimization.
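
As one possible starting point for the hyperparameter optimization mentioned above, the sketch below runs a small grid search with scikit-learn's `GridSearchCV`. It uses synthetic data from `make_classification` so that it runs on its own; the grid values, the 3-fold setting, and the `roc_auc` scoring choice are assumptions, not tuned recommendations for the MRI data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data standing in for the flattened image features.
X_syn, y_syn = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=42
)

# Hypothetical search grid: every combination is evaluated with 3-fold cross-validation.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="roc_auc",
)
search.fit(X_syn, y_syn)

print(search.best_params_)
print(f"Best cross-validated AUC: {search.best_score_:.3f}")
```

In the notebook itself, `X_train_flat` and `y_train` would take the place of the synthetic arrays, with the validation split kept aside for the final comparison.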