# Lab 2 — Basic Anomaly Detection for Cybersecurity Logs

**Student:** Nadav Shapira

MITRE ATT&CK Technique: **T1078 – Valid Accounts**

## 1. Dataset Preparation
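Synthetic login data in which roughly 3% of records are anomalous. Each record has `user`, `country`, `hour`, and `login_duration` features plus a binary `attack` label.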
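
The cell that builds `df` is not shown in this notebook. The sketch below is one way such a frame could be generated, not the generator actually used here. The function name `make_login_data`, the user and country pools, and every distribution parameter are assumptions chosen to match the description above (business-hours logins for normal traffic, long night-time sessions for the ~3% anomalies).

```python
import numpy as np
import pandas as pd


def make_login_data(n_rows=5000, anomaly_rate=0.03, seed=42):
    """Build a toy login table. All pools and distributions are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n_anom = int(round(n_rows * anomaly_rate))
    n_norm = n_rows - n_anom

    users = [f"user_{i}" for i in range(20)]   # hypothetical user pool
    countries = ["IL", "US", "DE", "FR"]       # hypothetical country pool

    normal = pd.DataFrame({
        "user": rng.choice(users, n_norm),
        "country": rng.choice(countries, n_norm),
        "hour": rng.integers(8, 18, n_norm),   # 08:00 to 17:59
        "login_duration": np.clip(rng.normal(30, 8, n_norm), 1, None),
        "attack": 0,
    })
    anomalous = pd.DataFrame({
        "user": rng.choice(users, n_anom),
        "country": rng.choice(countries, n_anom),
        "hour": rng.choice([22, 23, 0, 1, 2, 3, 4], n_anom),  # night hours
        "login_duration": np.clip(rng.normal(180, 30, n_anom), 1, None),
        "attack": 1,
    })

    # Concatenate and shuffle so anomalies are not grouped at the end
    df = pd.concat([normal, anomalous], ignore_index=True)
    return df.sample(frac=1, random_state=seed).reset_index(drop=True)


df = make_login_data()
```

Fixing `seed` keeps the table reproducible across runs, which makes the metrics in later cells comparable.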

In [None]:
df.head()

### Dataset Statistics

In [None]:
print(df.shape)
print(df['attack'].value_counts())

## 2. Exploratory Data Analysis

In [None]:
import matplotlib.pyplot as plt
plt.hist(df['login_duration'], bins=50); plt.show()

In [None]:
df['hour'].value_counts().sort_index().plot(kind='bar'); plt.show()

Normal logins cluster during business hours, while the anomalous logins are long sessions at night.
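
The observation above can also be checked numerically rather than read off the plots. A minimal sketch, assuming the column names used in this notebook (`hour`, `login_duration`, `attack`): the helper `summarize_by_label` and the choice of which hours count as "night" are hypothetical, and the `toy` frame is hand-made only to show the shape of the output.

```python
import pandas as pd


def summarize_by_label(df, night_hours=(22, 23, 0, 1, 2, 3, 4, 5)):
    """Per-label row count, mean login duration, and share of night-time logins."""
    is_night = df["hour"].isin(list(night_hours))
    return (
        df.assign(is_night=is_night)
          .groupby("attack")
          .agg(
              n=("hour", "size"),
              mean_duration=("login_duration", "mean"),
              night_share=("is_night", "mean"),
          )
    )


# Tiny hand-made frame, not the lab dataset
toy = pd.DataFrame({
    "hour": [9, 10, 14, 2, 23],
    "login_duration": [20.0, 30.0, 25.0, 200.0, 180.0],
    "attack": [0, 0, 0, 1, 1],
})
summary = summarize_by_label(toy)
print(summary)
```

Calling `summarize_by_label(df)` on the lab data would give one row per label, so the gap in duration and night share is visible as numbers.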

## 3. Isolation Forest

In [None]:

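from sklearn.compose import ColumnTransformer
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Features only; the 'attack' label is held out for evaluation
X = df.drop(columns=['attack'])
y = df['attack']

preprocess = ColumnTransformer([
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['user', 'country']),
    ('num', StandardScaler(), ['hour', 'login_duration'])
])

pipe = Pipeline([
    ('prep', preprocess),
    # contamination matches the ~3% anomaly rate of the synthetic data
    ('iforest', IsolationForest(contamination=0.03, random_state=42))
])

pipe.fit(X)
df['anomaly'] = (pipe.predict(X) == -1).astype(int)  # 1 = flagged as anomalous
df['score'] = pipe.decision_function(X)  # lower score = more anomalous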


In [None]:
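from sklearn.metrics import precision_score, recall_score
print('Detected:', df['anomaly'].sum())
print('Precision:', precision_score(y, df['anomaly']))
print('Recall:', recall_score(y, df['anomaly']))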
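
Because the synthetic data carries ground-truth labels, the detector can be judged on more than a single number. The sketch below uses small hand-written arrays (`y_true`, `y_pred`, `scores` are made-up values, not results from this lab) to show a confusion matrix and a threshold-free ranking metric. The scores are negated because Isolation Forest scores are lower for more anomalous points, while `average_precision_score` expects higher values for the positive class.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Made-up labels, predictions, and decision_function-style scores
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 0])
scores = np.array([0.20, 0.15, 0.18, 0.10, 0.12, -0.01, -0.08, 0.02])

# Rows are true classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

# Ranking quality independent of the contamination threshold
ap = average_precision_score(y_true, -scores)

print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("precision:", precision, "recall:", recall)
print("average precision:", ap)
```

On the lab data the same calls would take `y`, `df['anomaly']`, and `-df['score']`.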

In [None]:
plt.hist(df['score'], bins=50); plt.show()
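
To read the score histogram, it helps to know how scores map to flags. In scikit-learn's `IsolationForest`, `decision_function` is `score_samples` shifted by the fitted `offset_`, and `predict` returns -1 for points whose decision score is negative. With `contamination=0.03`, the offset is chosen so that roughly 3% of the training points fall below zero. A minimal sketch on toy 2-D data (`X_demo` and `forest` are illustrative names, not part of this lab):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Toy data: a dense blob plus a small far-away group (illustrative only)
X_demo = np.vstack([
    rng.normal(0, 1, size=(970, 2)),
    rng.normal(8, 1, size=(30, 2)),
])

forest = IsolationForest(contamination=0.03, random_state=42).fit(X_demo)
scores = forest.decision_function(X_demo)  # lower = more anomalous
labels = forest.predict(X_demo)            # -1 = anomaly, 1 = normal

# predict() amounts to thresholding decision_function() at zero
flagged_by_sign = scores < 0
n_flagged = int((labels == -1).sum())
print("flagged:", n_flagged, "of", len(labels))
print("sign rule matches predict:", np.array_equal(flagged_by_sign, labels == -1))
```

In the histogram above, the flagged logins are therefore the mass to the left of zero.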

## 4. PCA Visualization

In [None]:

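from sklearn.decomposition import PCA
X_t = preprocess.fit_transform(X)
# ColumnTransformer can return a dense array, so call toarray() only on sparse output
X_dense = X_t.toarray() if hasattr(X_t, 'toarray') else X_t
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_dense)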

plt.scatter(X_pca[:,0], X_pca[:,1], c=df['anomaly'], alpha=0.5)
plt.show()


Anomalies appear as sparse outliers separated from dense normal clusters.
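
A 2-D projection only shows part of the feature space, so it is worth checking how much variance the two components retain before drawing conclusions from the scatter plot. The sketch below uses a random stand-in matrix (`X_demo` is an assumption, not the lab's preprocessed data) to show the `explained_variance_ratio_` attribute of a fitted `PCA`.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for a preprocessed feature matrix: 6 columns, two of them correlated
X_demo = rng.normal(size=(200, 6))
X_demo[:, 1] = 3 * X_demo[:, 0] + 0.1 * X_demo[:, 1]

pca_demo = PCA(n_components=2).fit(X_demo)
ratios = pca_demo.explained_variance_ratio_

print("variance explained per component:", ratios.round(3))
print("total retained in the 2-D view:", ratios.sum().round(3))
```

For the lab data, the same attribute is available as `pca.explained_variance_ratio_` after the PCA cell has run. A low total would mean the separation seen in the plot reflects only a small part of the data's structure.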

## Conclusion
Isolation Forest, with `contamination` set to the known ~3% anomaly rate, flagged suspicious login behavior (long night-time sessions) consistent with credential misuse under T1078 – Valid Accounts.