# Example 1: Aviation - Real-Time Flight Telemetry

This notebook demonstrates a distributed processing pipeline for aviation telemetry data using **Apache Spark Structured Streaming** and **Kafka**.

**Scenario**: Monitor aircraft altitude, speed, and engine temperature in real time to detect anomalies.

## 1. Environment Setup
Install Java, Spark, Kafka, and the Python libraries.

In [None]:
# Install Java
!apt-get install openjdk-8-jdk-headless -qq > /dev/null

# Download and unpack Spark
!wget https://archive.apache.org/dist/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz && tar xf spark-3.5.0-bin-hadoop3.tgz

# Download and unpack Kafka
!wget https://archive.apache.org/dist/kafka/3.6.1/kafka_2.13-3.6.1.tgz && tar xf kafka_2.13-3.6.1.tgz

# Install Python packages
!pip install -q findspark pyspark kafka-python

In [None]:
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.5.0-bin-hadoop3"

import findspark
findspark.init()
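
Before starting Spark, it can help to confirm that the two paths configured above actually exist, since a wrong `JAVA_HOME` or `SPARK_HOME` tends to surface later as an opaque launch error. The following is a minimal sketch; the helper name `missing_env_dirs` is hypothetical and is not part of findspark.

```python
import os

def missing_env_dirs(names, environ=None):
    """Return the variables in `names` that are unset or do not point to a directory."""
    environ = os.environ if environ is None else environ
    missing = []
    for name in names:
        path = environ.get(name)
        if not path or not os.path.isdir(path):
            missing.append(name)
    return missing

# Check the two variables set in the cell above.
problems = missing_env_dirs(["JAVA_HOME", "SPARK_HOME"])
if problems:
    print("Check these variables:", ", ".join(problems))
else:
    print("JAVA_HOME and SPARK_HOME both point to existing directories.")
```

The check only verifies that the directories exist, not that they contain a working Java or Spark installation.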

## 2. Start Kafka and Zookeeper
Both services run in the background as daemons.

In [None]:
%%bash
cd kafka_2.13-3.6.1
# Start Zookeeper
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
sleep 5
# Start Kafka
bin/kafka-server-start.sh -daemon config/server.properties
sleep 5
# Create the topic
bin/kafka-topics.sh --create --topic flight-telemetry --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1
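
The fixed `sleep 5` calls assume each service is up within five seconds, which may not hold on a slow machine. As a hedged alternative, the sketch below polls the broker port from Python until it accepts a TCP connection. The helper name `wait_for_port` and the timeout values are assumptions for illustration; an open port indicates that something is listening, not that the broker has finished initializing.

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError if nothing accepts the connection
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False

# 9092 is the broker port used throughout this notebook.
if wait_for_port("localhost", 9092, timeout=3.0):
    print("Kafka port 9092 is accepting connections.")
else:
    print("Kafka port 9092 is not reachable yet.")
```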

## 3. Data Simulator (Producer)
Generates simulated flight data and sends it to Kafka.

In [None]:
import time
import json
import random
from kafka import KafkaProducer
from datetime import datetime
import threading

def generate_flight_data():
    producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                             value_serializer=lambda x: json.dumps(x).encode('utf-8'))
    
    flights = ['AA-101', 'BA-202', 'DA-303']
    
    try:
        for _ in range(100):  # 100 rounds, one message per flight per round (300 total)
            for flight in flights:
                data = {
                    'timestamp': datetime.now().isoformat(),
                    'flight_id': flight,
                    'altitude': random.randint(30000, 40000),
                    'speed': random.randint(800, 950),
                    'engine_temp': random.randint(500, 900)
                }
                producer.send('flight-telemetry', value=data)
            time.sleep(1)
    except Exception as e:
        print(f"Producer error: {e}")
    finally:
        producer.close()

# Run the producer in a separate thread so it does not block the notebook
thread = threading.Thread(target=generate_flight_data)
thread.start()
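
To inspect what a single message looks like without a running broker, the record construction and serialization can be pulled out into small functions. This is a sketch only: `make_record` and `serialize` are hypothetical helper names, and the fixed seed is an assumption added so that repeated runs draw the same random values.

```python
import json
import random
from datetime import datetime

def make_record(flight_id, rng):
    """Build one telemetry record with the same fields and ranges as the producer."""
    return {
        "timestamp": datetime.now().isoformat(),
        "flight_id": flight_id,
        "altitude": rng.randint(30000, 40000),
        "speed": rng.randint(800, 950),
        "engine_temp": rng.randint(500, 900),
    }

def serialize(record):
    """Same encoding as the producer's value_serializer: JSON text as UTF-8 bytes."""
    return json.dumps(record).encode("utf-8")

rng = random.Random(42)  # fixed seed (assumption) so the random fields are repeatable
payload = serialize(make_record("AA-101", rng))
print(payload)

# Round-trip the bytes the way a consumer would.
decoded = json.loads(payload.decode("utf-8"))
```

Since `engine_temp` is drawn uniformly from 500 to 900 inclusive, 50 of the 401 possible values exceed 850, so roughly 12% of simulated readings should count as anomalies.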

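## 4. Processing with Spark Structured Streaming (Consumer)
Reads from Kafka, parses the JSON payload, and flags anomalies (e.g., engine temperature > 850). This version of the script filters individual readings; it does not compute moving averages.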

In [None]:
%%writefile kafka_consumer.py
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, window
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType

spark = SparkSession.builder \
    .appName("AviationTelemetry") \
    .config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0") \
    .getOrCreate()

spark.sparkContext.setLogLevel("WARN")

# Data schema
schema = StructType([
    StructField("timestamp", TimestampType()),
    StructField("flight_id", StringType()),
    StructField("altitude", IntegerType()),
    StructField("speed", IntegerType()),
    StructField("engine_temp", IntegerType())
])

# Read from Kafka. With startingOffsets=latest, only messages published
# after the query starts are processed.
df = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "flight-telemetry") \
    .option("startingOffsets", "latest") \
    .load()

# Parse the JSON payload
parsed_df = df.select(from_json(col("value").cast("string"), schema).alias("data")).select("data.*")

# Analysis: keep only high-temperature readings
high_temp_df = parsed_df.filter(col("engine_temp") > 850)

# Write to the console (for demonstration purposes)
query = high_temp_df.writeStream \
    .outputMode("append") \
    .format("console") \
    .start()


query.awaitTermination()
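
The script above evaluates each reading on its own. To illustrate what a per-flight windowed average would compute, here is a plain-Python sketch of a tumbling (non-overlapping) window over the same record fields. The function name `tumbling_avg_temp`, the 60-second window, and the sample readings are all assumptions for illustration. In Spark, the same idea would typically be expressed by grouping on the `window` function that the script already imports, together with `flight_id`.

```python
from collections import defaultdict
from datetime import datetime

def tumbling_avg_temp(records, window_seconds=60):
    """Average engine_temp per (window start, flight_id) over non-overlapping windows."""
    sums = defaultdict(lambda: [0, 0])  # key -> [running total, count]
    for rec in records:
        ts = datetime.fromisoformat(rec["timestamp"]).timestamp()
        # Align each reading to the start of its window (epoch seconds).
        window_start = int(ts // window_seconds) * window_seconds
        bucket = sums[(window_start, rec["flight_id"])]
        bucket[0] += rec["engine_temp"]
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Hypothetical readings: two fall in the same minute, one in the next.
sample = [
    {"timestamp": "2024-01-01T12:00:05", "flight_id": "AA-101", "engine_temp": 800},
    {"timestamp": "2024-01-01T12:00:45", "flight_id": "AA-101", "engine_temp": 900},
    {"timestamp": "2024-01-01T12:01:10", "flight_id": "AA-101", "engine_temp": 700},
]
averages = tumbling_avg_temp(sample)
for (window_start, flight_id), avg in sorted(averages.items()):
    print(datetime.fromtimestamp(window_start).isoformat(), flight_id, avg)
```

A windowed average smooths out single spikes, so it complements the per-reading threshold rather than replacing it.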

In [None]:
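# Stop Spark. The session created in kafka_consumer.py lives in the spark-submit
# process, so this only applies if a `spark` session exists in the notebook kernel.
if "spark" in globals():
    spark.stop()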

In [None]:
# Submit the consumer script; the Kafka connector version must match the installed Spark (3.5.0)
!spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 kafka_consumer.py