# Example 19: Insurance - Claims Risk Analysis (Spark)

This notebook demonstrates real-time analysis of incoming insurance claims to flag elevated fraud risk using **Spark Structured Streaming**, with Kafka as the message source.

**Scenario**: An insurer receives claim notifications. If the claim amount exceeds 50k and the customer has been a policyholder for less than 1 year, the claim is flagged for audit.
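
As a quick sanity check, the scenario's rule can be sketched in plain Python before it is expressed in Spark. The names `classify_claim`, `HIGH_VALUE_THRESHOLD`, and `MIN_TENURE_YEARS` are illustrative assumptions and do not appear elsewhere in this notebook; the comparisons are strict, matching the `>` and `<` in the rule.

```python
# Hypothetical helper names, used only to illustrate the rule.
HIGH_VALUE_THRESHOLD = 50_000  # claim amount above which a claim counts as high value
MIN_TENURE_YEARS = 1           # customers below this tenure count as new


def classify_claim(claim_amount, policy_holder_years):
    """Return "HIGH" for a large claim from a customer with under a year of history."""
    if claim_amount > HIGH_VALUE_THRESHOLD and policy_holder_years < MIN_TENURE_YEARS:
        return "HIGH"
    return "LOW"


print(classify_claim(75_000, 0))  # HIGH: large claim, new customer
print(classify_claim(75_000, 3))  # LOW: large claim, established customer
print(classify_claim(50_000, 0))  # LOW: exactly 50k does not exceed the threshold
```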

## 1. Setup

In [None]:
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q  https://dlcdn.apache.org/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz
!tar xf spark-3.5.0-bin-hadoop3.tgz
!wget -q https://downloads.apache.org/kafka/3.6.1/kafka_2.13-3.6.1.tgz
!tar xf kafka_2.13-3.6.1.tgz
!pip install -q findspark pyspark kafka-python

In [None]:
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.5.0-bin-hadoop3"
import findspark
findspark.init()

## 2. Start Kafka

In [None]:
%%bash
cd kafka_2.13-3.6.1
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
sleep 5
bin/kafka-server-start.sh -daemon config/server.properties
sleep 5
bin/kafka-topics.sh --create --topic insurance-claims --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1
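
The fixed `sleep 5` calls above assume ZooKeeper and the broker come up within five seconds. A minimal sketch of a more explicit check, assuming the broker listens on `localhost:9092` as configured above, is to poll the TCP port from Python. The helper name `wait_for_port` is hypothetical, and an open port only shows that something is accepting connections, not that the broker has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Calling `wait_for_port("localhost", 9092)` in a cell before starting the simulator would return `False` if the broker never came up, which is easier to diagnose than a producer connection error later.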

## 3. Claims Simulator

In [None]:
import json
import time
import random
from kafka import KafkaProducer
import threading

def generate_claims():
    producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                             value_serializer=lambda x: json.dumps(x).encode('utf-8'))
    
    try:
        for _ in range(100):
            data = {
                'claim_id': random.randint(10000, 99999),
                'policy_holder_years': random.randint(0, 10),
                'claim_amount': random.randint(1000, 100000),
                'timestamp': time.time()
            }
            producer.send('insurance-claims', value=data)
            time.sleep(0.1)
    finally:
        producer.close()

thread = threading.Thread(target=generate_claims)
thread.start()
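
To see what a message looks like on the wire without a running broker, the record construction and serializer from the cell above can be exercised on their own. This is a sketch: `make_claim` and `serialize` are hypothetical names that mirror the dictionary and the `value_serializer` lambda used by the producer, and the seed is only there to make runs repeatable.

```python
import json
import random
import time


def make_claim(rng):
    """Build one claim record with the same fields and ranges as the simulator."""
    return {
        'claim_id': rng.randint(10000, 99999),
        'policy_holder_years': rng.randint(0, 10),
        'claim_amount': rng.randint(1000, 100000),
        'timestamp': time.time(),
    }


def serialize(claim):
    """Same encoding as the producer's value_serializer: JSON text as UTF-8 bytes."""
    return json.dumps(claim).encode('utf-8')


rng = random.Random(42)  # seeded generator (an assumption; the simulator is unseeded)
payload = serialize(make_claim(rng))
decoded = json.loads(payload.decode('utf-8'))
print(sorted(decoded))  # ['claim_amount', 'claim_id', 'policy_holder_years', 'timestamp']
```

With these ranges, `policy_holder_years` is 0 in 1 of 11 cases and `claim_amount` exceeds 50,000 in roughly half of them, so about 4 to 5 of the 100 simulated claims should be expected to match the high-risk rule.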

## 4. Risk Analysis

In [None]:
%%writefile kafka_consumer.py
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, when
from pyspark.sql.types import StructType, StructField, IntegerType, DoubleType

spark = SparkSession.builder.appName("InsuranceRisk").getOrCreate()

schema = StructType([
    StructField("claim_id", IntegerType()),
    StructField("policy_holder_years", IntegerType()),
    StructField("claim_amount", IntegerType()),
    # Epoch seconds from time.time(); a 32-bit FloatType would lose precision here
    StructField("timestamp", DoubleType())
])

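# The simulator may finish before this job starts, so read the topic from the
# beginning instead of the default "latest" offsets.
df = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "insurance-claims") \
    .option("startingOffsets", "earliest") \
    .load()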

claims = df.select(from_json(col("value").cast("string"), schema).alias("data")).select("data.*")

# Rule: if amount > 50k AND new customer (< 1 year) -> HIGH RISK
risk_assessment = claims.withColumn("risk_level", 
    when((col("claim_amount") > 50000) & (col("policy_holder_years") < 1), "HIGH")
    .otherwise("LOW")
)

high_risk_claims = risk_assessment.filter("risk_level = 'HIGH'")

query = high_risk_claims.writeStream \
    .outputMode("append") \
    .format("console") \
    .start()


query.awaitTermination()

In [None]:
!spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 kafka_consumer.py