## Synthetic Login Log Example (Normal Behavior + Anomalies)

In this example we generate a small synthetic dataset that mimics a simplified login log.
The goal is to create a dataset that is easy to understand, fast to process, and suitable for demonstrating anomaly detection methods such as Isolation Forest.

### Normal login events

Normal events follow realistic patterns:

- Logins happen during regular working hours (09:00–19:00).
- Users belong to a small set (user_1 … user_25).
- Cities are typical locations inside one country (Tel Aviv, Haifa, Jerusalem, Raanana, Holon).
- Devices are common and benign (windows, android, ios).
- IP addresses come from the private 192.168.0.0/16 range.

All logins are successful (success = 1).

These represent “normal” behavior that the model should treat as non-anomalous.
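
The rules above translate directly into a small predicate that can sanity-check generated rows before they reach a model. This is a minimal sketch: `matches_normal_profile` and the constant names are hypothetical, and it assumes each record is a dict with the keys used by the generator code in this notebook (`timestamp`, `city`, `device`, `ip`, `success`).

```python
from datetime import datetime
import ipaddress

# Hypothetical constants mirroring the normal-behavior rules listed above
NORMAL_CITIES = {"Tel Aviv", "Haifa", "Jerusalem", "Raanana", "Holon"}
NORMAL_DEVICES = {"windows", "android", "ios"}
PRIVATE_NET = ipaddress.ip_network("192.168.0.0/16")

def matches_normal_profile(record):
    """Return True if a login record satisfies every rule of the normal profile."""
    ts = record["timestamp"]                   # works for datetime and pd.Timestamp
    minutes = ts.hour * 60 + ts.minute
    in_hours = 9 * 60 <= minutes <= 19 * 60    # 09:00–19:00, inclusive
    return (
        in_hours
        and record["city"] in NORMAL_CITIES
        and record["device"] in NORMAL_DEVICES
        and ipaddress.ip_address(record["ip"]) in PRIVATE_NET
        and record["success"] == 1
    )

row = {"timestamp": datetime(2025, 1, 1, 13, 26), "city": "Tel Aviv",
       "device": "windows", "ip": "192.168.58.98", "success": 1}
print(matches_normal_profile(row))  # True
```

Applied to the final DataFrame, something like `df.apply(matches_normal_profile, axis=1)` should flag every normal row and none of the anomalies.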

### Anomalous login events

We also generate a small set of intentionally unusual login events:

- Logins occur during the night (00:00–04:00), always exactly on the hour.
- Cities come from unexpected or distant regions.
- Devices have abnormal values (unknown_os, legacy_nt, rooted_android, jailbroken_ios).
- IP addresses are public instead of private.
- Login attempts may be successful or unsuccessful.

These synthetic outliers mimic suspicious activity such as logins from unusual locations, compromised devices, or unexpected hours.
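
Strictly speaking, four uniformly random octets do not always form a public address: some draws land in private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), loopback, or other reserved blocks. Python's standard `ipaddress` module can distinguish these cases. The sketch below assumes that "public" means `is_global` in that module's sense; `ip_scope` is a hypothetical helper.

```python
import ipaddress
import random

def ip_scope(addr: str) -> str:
    """Label an IPv4 address as 'private', 'public' (globally routable), or 'other'."""
    ip = ipaddress.IPv4Address(addr)
    if ip.is_private:
        return "private"
    if ip.is_global:
        return "public"
    return "other"  # e.g. the 100.64.0.0/10 shared address space

print(ip_scope("192.168.58.98"))  # private
print(ip_scope("200.54.149.86"))  # public

# How often do four uniformly random octets miss public space?
rng = random.Random(0)
draws = [".".join(str(rng.getrandbits(8)) for _ in range(4)) for _ in range(10_000)]
non_public = sum(ip_scope(a) != "public" for a in draws)
print(f"{non_public / len(draws):.1%} of random draws are not public")
```

The fraction is small but non-zero, which is why a generator that only joins random octets can occasionally emit a private address for an "anomalous" login.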

```python
!pip install faker
```

```text
Collecting faker
  Downloading faker-38.2.0-py3-none-any.whl.metadata (16 kB)
Downloading faker-38.2.0-py3-none-any.whl (2.0 MB)
Installing collected packages: faker
Successfully installed faker-38.2.0
```


```python
from faker import Faker
import ipaddress
import numpy as np
import pandas as pd
import random
from datetime import timedelta

SEED = 42
Faker.seed(SEED)   # seeds the shared generator behind fake.random
random.seed(SEED)  # seeds Python's global generator (random.choice, random.randint)
fake = Faker()

# Simplified fast IP generation using Faker's random module
def fast_public_ip():
    # Build the address from four random octets instead of Faker's heavier IPv4 helpers.
    # Uniform octets can fall into private or reserved space (e.g. 10.x.x.x, 127.x.x.x),
    # so redraw until the address is globally routable.
    r = fake.random.getrandbits
    while True:
        addr = f"{r(8)}.{r(8)}.{r(8)}.{r(8)}"
        if ipaddress.IPv4Address(addr).is_global:
            return addr

def fast_private_ip():
    # Simulate the 192.168.x.x private network without Faker's range checks
    r = fake.random.getrandbits
    return f"192.168.{r(8)}.{r(8)}"


def generate_normal_logins(n=500):
    data = []
    users = [f"user_{i}" for i in range(1, 26)]
    devices = ["windows", "android", "ios"]
    cities = ["Tel Aviv", "Haifa", "Jerusalem", "Raanana", "Holon"]

    base_time = pd.Timestamp("2025-01-01 09:00")

    for _ in range(n):
        user = random.choice(users)

        # Normal working hours: 09:00–19:00 (0 to 600 minutes after base_time, inclusive)
        timestamp = base_time + timedelta(
            minutes=random.randint(0, 10 * 60)
        )

        device = random.choice(devices)
        city = random.choice(cities)
        ip = fast_private_ip()   # lightweight stand-in for fake.ipv4_private()

        data.append({
            "timestamp": timestamp,
            "user": user,
            "city": city,
            "device": device,
            "ip": ip,
            "success": 1
        })

    return data


def generate_anomalies(n=20):
    data = []
    users = [f"user_{i}" for i in range(1, 26)]
    weird_devices = ["unknown_os", "legacy_nt", "rooted_android", "jailbroken_ios"]
    weird_cities = ["Moscow", "Shanghai", "Tehran", "Lagos", "Caracas"]

    for _ in range(n):
        user = random.choice(users)

        # Night-time anomalies: 00:00–04:00, always exactly on the hour
        timestamp = pd.Timestamp("2025-01-01") + timedelta(
            hours=random.choice([0, 1, 2, 3, 4])
        )

        city = random.choice(weird_cities)
        device = random.choice(weird_devices)
        ip = fast_public_ip()    # lightweight stand-in for fake.ipv4_public()

        data.append({
            "timestamp": timestamp,
            "user": user,
            "city": city,
            "device": device,
            "ip": ip,
            "success": random.choice([0, 1])  # anomalous attempts may fail or succeed
        })

    return data


# Final dataset: 600 normal rows + 30 anomalies, shuffled reproducibly
normal = generate_normal_logins(600)
anomalies = generate_anomalies(30)

df = pd.DataFrame(normal + anomalies)
df = df.sample(frac=1, random_state=SEED).reset_index(drop=True)
```

```python
df.head(20)
```

|   | timestamp | user | city | device | ip | success |
|---|---|---|---|---|---|---|
| 0 | 2025-01-01 18:35:00 | user_8 | Jerusalem | ios | 192.168.135.210 | 1 |
| 1 | 2025-01-01 13:26:00 | user_6 | Tel Aviv | windows | 192.168.58.98 | 1 |
| 2 | 2025-01-01 09:28:00 | user_9 | Tel Aviv | android | 192.168.139.77 | 1 |
| 3 | 2025-01-01 17:40:00 | user_19 | Haifa | ios | 192.168.23.59 | 1 |
| 4 | 2025-01-01 01:00:00 | user_3 | Caracas | unknown_os | 200.54.149.86 | 0 |
| 5 | 2025-01-01 13:25:00 | user_15 | Haifa | ios | 192.168.195.253 | 1 |
| 6 | 2025-01-01 10:36:00 | user_19 | Holon | windows | 192.168.99.116 | 1 |
| 7 | 2025-01-01 10:12:00 | user_10 | Holon | android | 192.168.139.230 | 1 |
| 8 | 2025-01-01 17:34:00 | user_9 | Raanana | ios | 192.168.124.101 | 1 |
| 9 | 2025-01-01 18:55:00 | user_1 | Raanana | android | 192.168.138.85 | 1 |
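
Isolation Forest operates on numeric features, so the categorical columns need encoding before training. A minimal sketch, assuming a hand-picked feature set (fractional hour of day, a private-IP flag, the success flag, and one-hot city and device columns); `encode_login` is a hypothetical helper, not part of any library. The share of anomalies in this dataset, 30 / (600 + 30) ≈ 0.048, is a natural starting value for a detector's contamination setting, such as the `contamination` parameter of scikit-learn's `IsolationForest`.

```python
from datetime import datetime
import ipaddress

# Hypothetical vocabularies: every city and device the generators can emit
CITIES = ["Tel Aviv", "Haifa", "Jerusalem", "Raanana", "Holon",
          "Moscow", "Shanghai", "Tehran", "Lagos", "Caracas"]
DEVICES = ["windows", "android", "ios",
           "unknown_os", "legacy_nt", "rooted_android", "jailbroken_ios"]

def encode_login(record):
    """Turn one login record into a flat list of floats (hypothetical feature set)."""
    ts = record["timestamp"]
    hour = ts.hour + ts.minute / 60            # fractional hour of day
    is_private = float(ipaddress.ip_address(record["ip"]).is_private)
    city_onehot = [float(record["city"] == c) for c in CITIES]
    device_onehot = [float(record["device"] == d) for d in DEVICES]
    return [hour, is_private, float(record["success"])] + city_onehot + device_onehot

row = {"timestamp": datetime(2025, 1, 1, 1, 0), "user": "user_3", "city": "Caracas",
       "device": "unknown_os", "ip": "200.54.149.86", "success": 0}
features = encode_login(row)
print(len(features))   # 20
print(features[:3])    # [1.0, 0.0, 0.0]

contamination = 30 / (600 + 30)
print(round(contamination, 3))  # 0.048
```

The hour is encoded linearly here, so 23:00 and 00:00 end up far apart; a sine/cosine encoding of the hour is a common alternative when wrap-around at midnight matters.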
