<a href="https://colab.research.google.com/github/guilhermelaviola/CybersecurityProblemSolvingWithDataScience/blob/main/Class07.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Log Analysis**
Log analysis is a fundamental component of cybersecurity, providing visibility into system, network, and application activities to detect threats, investigate incidents, and ensure compliance. By leveraging data science, artificial intelligence, and machine learning, organizations can automate anomaly detection, correlate events, and perform real-time analysis on large volumes of log data. Proper log collection, normalization, enrichment, and secure storage are essential for effective analysis. When combined with best practices and tools such as SIEM platforms, advanced log analytics strengthens an organization’s ability to proactively detect, respond to, and mitigate evolving cyber threats.
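
As a minimal sketch of the normalization step mentioned above, the snippet below parses raw text lines into a structured DataFrame. The `<timestamp> <LEVEL> <message>` layout, the regular expression, and the names `raw_lines`, `LOG_PATTERN`, and `normalized` are assumptions made for illustration; real log sources will need their own patterns.

```python
import re
import pandas as pd

# Hypothetical raw log lines in an assumed "<timestamp> <LEVEL> <message>" layout.
raw_lines = [
    '2025-12-14 10:00:00 INFO User alice logged in successfully',
    '2025-12-14 10:01:00 WARNING Failed login attempt for user bob',
    'this line does not match the assumed layout',
]

# Named groups become the column names of the normalized table.
LOG_PATTERN = re.compile(
    r'^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) '
    r'(?P<level>[A-Z]+) '
    r'(?P<message>.*)$'
)

# Keep only the lines that match; unparseable lines are skipped here,
# although a real pipeline would likely store them for later review.
records = [m.groupdict() for line in raw_lines if (m := LOG_PATTERN.match(line))]

normalized = pd.DataFrame(records)
normalized['timestamp'] = pd.to_datetime(normalized['timestamp'])
print(normalized)
```

Once every source is parsed into the same columns, events from different systems can be sorted, filtered, and correlated with ordinary DataFrame operations.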

In [1]:
# Importing all the necessary libraries and resources:
import pandas as pd
from sklearn.ensemble import IsolationForest
import time

## **Log Anomaly Detection**
The example below demonstrates how machine learning can automatically identify unusual patterns in log data (such as an abnormally high number of failed login attempts), helping security teams quickly spot potential security incidents.

In [2]:
# Example log data (in this case, the number of failed logins per user per hour)
data = {
    'user': ['alice', 'bob', 'charlie', 'alice', 'bob', 'charlie'],
    'failed_logins': [1, 0, 2, 15, 1, 0]
}

df = pd.DataFrame(data)

# Training an anomaly detection model:
model = IsolationForest(contamination=0.2, random_state=42)
df['anomaly'] = model.fit_predict(df[['failed_logins']])

# Marking anomalies:
df['anomaly'] = df['anomaly'].map({1: 'Normal', -1: 'Anomalous'})

print(df)

      user  failed_logins    anomaly
0    alice              1     Normal
1      bob              0     Normal
2  charlie              2     Normal
3    alice             15  Anomalous
4      bob              1     Normal
5  charlie              0     Normal
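
The Normal/Anomalous label hides how strongly each row stands out. As a rough sketch, the fitted model's `score_samples` method can be used to rank rows instead, with lower scores indicating more anomalous points. The names `scores_df`, `score_model`, and `ranked` are introduced here for illustration only, and the data is the same toy sample used above.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Same illustrative data as the cell above.
scores_df = pd.DataFrame({
    'user': ['alice', 'bob', 'charlie', 'alice', 'bob', 'charlie'],
    'failed_logins': [1, 0, 2, 15, 1, 0]
})

score_model = IsolationForest(contamination=0.2, random_state=42)
score_model.fit(scores_df[['failed_logins']])

# score_samples returns one score per row; lower values mean more anomalous.
scores_df['anomaly_score'] = score_model.score_samples(scores_df[['failed_logins']])

# Sort so the most anomalous rows come first.
ranked = scores_df.sort_values('anomaly_score')
print(ranked)
```

Ranking by score lets an analyst review the top few rows without committing to a fixed `contamination` threshold in advance.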


## **Real-Time Log Monitoring for Suspicious Keywords**
The example below simulates real-time monitoring of a log stream by processing log entries one at a time. If a message contains a suspicious keyword such as “failed login” or “unauthorized access,” the line is flagged immediately. This demonstrates a simple form of real-time log analytics, which can be extended with more advanced AI/ML models for automated threat detection.

In [5]:
# Simulated log data as a DataFrame:
logs = pd.DataFrame({
    'timestamp': [
        '2025-12-14 10:00:00',
        '2025-12-14 10:01:00',
        '2025-12-14 10:02:00',
        '2025-12-14 10:03:00'
    ],
    'message': [
        'User alice logged in successfully',
        'Failed login attempt for user bob',
        'Unauthorized access detected in system',
        'User charlie logged out'
    ]
})

# Suspicious keywords:
suspicious_keywords = ['error', 'failed login', 'unauthorized access']

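# Simulating real-time log monitoring:
for index, row in logs.iterrows():
    message = row['message']
    # Flag the message once if it contains any suspicious keyword (case-insensitive):
    if any(keyword.lower() in message.lower() for keyword in suspicious_keywords):
        # Double quotes on the outside keep this f-string valid on Python versions before 3.12.
        print(f"[ALERT] Suspicious activity detected at {row['timestamp']}: {message}")
    time.sleep(1)  # Simulate delay as if logs are coming in real-time.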

[ALERT] Suspicious activity detected at 2025-12-14 10:01:00: Failed login attempt for user bob
[ALERT] Suspicious activity detected at 2025-12-14 10:02:00: Unauthorized access detected in system
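
When the logs are already collected in a DataFrame rather than arriving one line at a time, the same keyword check can be written without an explicit loop. The sketch below is one possible batch variant using pandas `str.contains`; the names `batch_logs`, `pattern`, and `alerts` are illustrative and not part of the original example.

```python
import re
import pandas as pd

# Same simulated log entries as in the loop above.
batch_logs = pd.DataFrame({
    'timestamp': [
        '2025-12-14 10:00:00',
        '2025-12-14 10:01:00',
        '2025-12-14 10:02:00',
        '2025-12-14 10:03:00'
    ],
    'message': [
        'User alice logged in successfully',
        'Failed login attempt for user bob',
        'Unauthorized access detected in system',
        'User charlie logged out'
    ]
})

suspicious_keywords = ['error', 'failed login', 'unauthorized access']

# Build one alternation pattern; re.escape keeps keywords literal.
pattern = '|'.join(re.escape(keyword) for keyword in suspicious_keywords)

# case=False makes the match case-insensitive, like the .lower() comparison above.
mask = batch_logs['message'].str.contains(pattern, case=False, regex=True)
alerts = batch_logs[mask]
print(alerts)
```

This form suits periodic batch scans; the row-by-row loop remains the closer model of a live stream.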
