# Example 11: Cybersecurity - Intrusion Detection (Spark)

This notebook demonstrates real-time detection of network attacks (e.g., DDoS) using **Spark Structured Streaming**.

**Scenario**: Monitor network traffic and flag IPs with an abnormally high request rate so they can be blocked.
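
The detection rule can be sketched in plain Python before involving Spark: count requests per source IP in fixed-size windows and keep the counts that exceed a threshold. This is only an illustrative sketch. The function name `flag_noisy_ips` and the sample IPs are hypothetical, the defaults (5-second windows, threshold of 50) mirror the values used in section 4, and timestamps are assumed to be naive ISO-8601 strings like the ones `datetime.now().isoformat()` produces.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def flag_noisy_ips(events, window_seconds=5, threshold=50):
    """Return {(window_start, source_ip): count} for every count above threshold."""
    counts = Counter()
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        # Floor the timestamp to the start of its fixed-size (tumbling) window.
        elapsed = int((ts - EPOCH).total_seconds())
        window_start = EPOCH + timedelta(seconds=elapsed - elapsed % window_seconds)
        counts[(window_start, event["source_ip"])] += 1
    return {key: n for key, n in counts.items() if n > threshold}

# Hypothetical sample: one noisy IP and one quiet IP inside the same window.
events = [{"source_ip": "10.0.0.1", "timestamp": "2024-01-01T00:00:01"}] * 60
events += [{"source_ip": "192.168.1.5", "timestamp": "2024-01-01T00:00:02"}] * 3
print(flag_noisy_ips(events))
```

Unlike the streaming job, this version sees all events at once, so it has no notion of late data or watermarks.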

## 1. Setup

In [None]:
# Install Java
!apt-get install openjdk-8-jdk-headless -qq > /dev/null

# Download and unpack Spark
!wget https://archive.apache.org/dist/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz && tar xf spark-3.5.0-bin-hadoop3.tgz

# Download and unpack Kafka
!wget https://archive.apache.org/dist/kafka/3.6.1/kafka_2.13-3.6.1.tgz && tar xf kafka_2.13-3.6.1.tgz

# Install Python packages
!pip install -q findspark pyspark kafka-python

In [None]:
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.5.0-bin-hadoop3"
import findspark
findspark.init()

## 2. Start Kafka

In [None]:
%%bash
cd kafka_2.13-3.6.1
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
sleep 5
bin/kafka-server-start.sh -daemon config/server.properties
sleep 5
bin/kafka-topics.sh --create --topic network-logs --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1
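
The fixed `sleep 5` calls above assume the services come up within five seconds. As a hedged alternative, the sketch below polls a TCP port until it accepts connections. The helper name `wait_for_port` and its parameters are hypothetical, and an open port only shows that something is listening, not that the broker has finished initializing.

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, calling `wait_for_port("localhost", 9092)` before starting the simulator would wait up to 30 seconds for the Kafka broker port configured above.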

## 3. Network Traffic Simulator

In [None]:
import json
import time
import random
from kafka import KafkaProducer
import threading
from datetime import datetime

def generate_network_traffic():
    producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                             value_serializer=lambda x: json.dumps(x).encode('utf-8'))
    ips = [f'192.168.1.{i}' for i in range(1, 101)]
    attacker_ip = '10.0.0.666'  # Attacker IP (666 is not a valid IPv4 octet; the value only serves as a label)
    
    try:
        for _ in range(500):
            # Normal traffic
            data = {'source_ip': random.choice(ips), 'destination_port': 80, 'timestamp': datetime.now().isoformat()}
            producer.send('network-logs', value=data)
            
            # Attack traffic (flood)
            if random.random() < 0.3:
                for _ in range(10):
                    data = {'source_ip': attacker_ip, 'destination_port': 80, 'timestamp': datetime.now().isoformat()}
                    producer.send('network-logs', value=data)
                    
            time.sleep(0.01)
    finally:
        producer.close()

thread = threading.Thread(target=generate_network_traffic)
thread.start()
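
To inspect the payloads and expected volumes without a running broker, the same generation logic can be sketched as a pure function. The names `sample_events` and `make_event`, the parameter names, and the `seed` argument are hypothetical; the probabilities and burst size are the ones the simulator uses. On average each iteration emits 0.3 × 10 = 3 attacker messages, so 500 iterations yield roughly 1,500 attacker messages against about 5 per normal IP.

```python
import random
from datetime import datetime

NORMAL_IPS = [f"192.168.1.{i}" for i in range(1, 101)]
ATTACKER_IP = "10.0.0.666"  # same label the simulator uses

def make_event(source_ip):
    """Build one log record with the fields the simulator sends."""
    return {
        "source_ip": source_ip,
        "destination_port": 80,
        "timestamp": datetime.now().isoformat(),
    }

def sample_events(iterations=500, attack_probability=0.3, burst_size=10, seed=None):
    """Return the list of records the simulator would send, without using Kafka."""
    rng = random.Random(seed)
    events = []
    for _ in range(iterations):
        # One normal request per iteration.
        events.append(make_event(rng.choice(NORMAL_IPS)))
        # Occasionally, a burst of requests from the attacker.
        if rng.random() < attack_probability:
            events.extend(make_event(ATTACKER_IP) for _ in range(burst_size))
    return events

events = sample_events(seed=42)
attacker_total = sum(e["source_ip"] == ATTACKER_IP for e in events)
print(len(events), attacker_total)
```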

## 4. Intrusion Detection (DoS)

In [None]:
%%writefile kafka_consumer.py
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, window, when
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType

spark = SparkSession.builder.appName("IntrusionDetection").getOrCreate()

# Schema of the JSON payload produced by the simulator
schema = StructType([
    StructField("source_ip", StringType()),
    StructField("destination_port", IntegerType()),  # the simulator sends an integer (80)
    StructField("timestamp", TimestampType())
])

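# Read the topic from the beginning: the simulator may finish before this job starts,
# and a streaming query otherwise defaults to startingOffsets="latest"
df = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "network-logs") \
    .option("startingOffsets", "earliest") \
    .load()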

logs = df.select(from_json(col("value").cast("string"), schema).alias("data")).select("data.*")

# Count requests per source IP in 5-second windows
traffic_stats = logs \
    .withWatermark("timestamp", "5 seconds") \
    .groupBy(
        window(col("timestamp"), "5 seconds"),
        col("source_ip")
    ) \
    .count()

# More than 50 requests from one IP within a 5-second window is flagged as a possible DoS
alerts = traffic_stats.filter(col("count") > 50).withColumn("alert", when(col("count") > 50, "POSSIBLE DDOS"))

query = alerts.writeStream \
    .outputMode("update") \
    .format("console") \
    .option("truncate", "false") \
    .start()


query.awaitTermination()

In [None]:
# The Kafka connector version should match the installed Spark version (3.5.0)
!spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 kafka_consumer.py