# Sistem Deteksi Bullying melalui Analisis Sentimen & CCTV 
**Kelompok:** 
1. Alysa Meliana (F1D02310035)
2. Azizah Indriani Putri (F1D02310041)
3. Baiq Mutia Dewi Edelweiss (F1D02310107)
4. Fairuza Luthfiana (F1D02310111)
5. Syazwani (F1D02310140)
# 
**Universitas Mataram - Teknik Informatika - 2025/2026**
 
## Pendahuluan
Sistem ini dirancang untuk mendeteksi potensi bullying di sekolah melalui analisis sentimen dari media sosial (Twitter) dan data anomaly dari CCTV. Sistem menggunakan NLP untuk klasifikasi sentimen dan machine learning untuk deteksi anomaly.
 

## Setup dan Instalasi

In [215]:
# Install library yang diperlukan
%pip install tweepy pandas numpy matplotlib seaborn scikit-learn nltk textblob wordcloud folium streamlit pymongo dnspython plotly --quiet
%pip install --upgrade nbformat

# %%
import tweepy
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import json
import random
from datetime import datetime, timedelta
import time
import warnings
warnings.filterwarnings('ignore')

# NLP Libraries
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from textblob import TextBlob
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

# Visualisasi & Dashboard
from wordcloud import WordCloud
import folium
from folium.plugins import HeatMap
import plotly.express as px
import plotly.graph_objects as go
import plotly.subplots as sp
from plotly.subplots import make_subplots

# MongoDB
from pymongo import MongoClient
from pymongo.server_api import ServerApi
import urllib.parse

# Streamlit untuk dashboard (akan dijalankan terpisah)
import streamlit as st

# Unduh resources NLTK
nltk.download('punkt', quiet=True)
nltk.download('stopwords', quiet=True)
nltk.download('wordnet', quiet=True)
nltk.download('omw-eng', quiet=True)
nltk.download('punkt_tab')  

# Set style untuk visualisasi
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


[nltk_data] Error loading omw-eng: Package 'omw-eng' not found in
[nltk_data]     index
[nltk_data] Downloading package punkt_tab to C:\Users\LENOVO
[nltk_data]     LOQ\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


## Konfigurasi Twitter API dan MongoDB

In [None]:
# Konfigurasi Twitter API (Ganti dengan kredensial Anda)
TWITTER_API_KEY = ""
TWITTER_API_SECRET = ""
TWITTER_BEARER_TOKEN = ""
TWITTER_ACCESS_TOKEN = ""
TWITTER_ACCESS_TOKEN_SECRET = ""

# Konfigurasi MongoDB
# Untuk MongoDB Atlas (cloud) atau local

# GANTI DENGAN CONNECTION STRING MONGODB ATLAS ANDA
MONGODB_USERNAME = ""
MONGODB_PASSWORD = ""
MONGODB_CLUSTER = ""

# Encode username dan password untuk URL
encoded_username = urllib.parse.quote_plus(MONGODB_USERNAME)
encoded_password = urllib.parse.quote_plus(MONGODB_PASSWORD)

# Connection string untuk MongoDB Atlas
MONGODB_ATLAS_URI = f"mongodb+srv://{encoded_username}:{encoded_password}@{MONGODB_CLUSTER}.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0"

# Tambahkan import ini di bagian atas:

# Nama database dan koleksi
DB_NAME = "bullying_detection"
COLLECTION_TWEETS = "tweets"
COLLECTION_CCTV = "cctv_logs"
COLLECTION_SCHOOLS = "schools"
COLLECTION_ALERTS = "alerts"

# ## Fungsi Koneksi MongoDB

In [217]:
def connect_mongodb(use_atlas=True):
    try:
        if use_atlas:
            print("üîÑ Mencoba koneksi ke MongoDB Atlas...")
            client = MongoClient(MONGODB_ATLAS_URI, server_api=ServerApi('1'))
        else:
            print("üîÑ Mencoba koneksi ke MongoDB lokal...")
            client = MongoClient(MONGODB_URI_LOCAL)
        
        client.admin.command('ping')
        print("‚úÖ Koneksi berhasil!")
        
        db = client[DB_NAME]
        
        # Buat koleksi jika belum ada
        collections_to_create = {
            COLLECTION_TWEETS: {"validator": {"$jsonSchema": {
                "bsonType": "object",
                "required": ["text", "created_at"],
                "properties": {
                    "text": {"bsonType": "string"},
                    "risk_level": {"enum": ["merah", "kuning", "hijau", "aman"]}
                }
            }}},
            COLLECTION_CCTV: {},
            COLLECTION_SCHOOLS: {},
            COLLECTION_ALERTS: {}
        }
        
        existing_collections = db.list_collection_names()
        
        for collection_name, options in collections_to_create.items():
            if collection_name not in existing_collections:
                if options:  # Jika ada validasi schema
                    db.create_collection(collection_name, **options)
                else:
                    db.create_collection(collection_name)
                print(f"üìÅ Koleksi '{collection_name}' dibuat")
            else:
                print(f"üìÅ Koleksi '{collection_name}' sudah ada")
        
        # Buat index untuk performa query
        print("üîç Membuat index untuk query yang cepat...")
        db[COLLECTION_TWEETS].create_index([("created_at", -1)])  # Index untuk sorting terbaru
        db[COLLECTION_TWEETS].create_index([("risk_level", 1), ("city", 1)])  # Index untuk filter
        db[COLLECTION_ALERTS].create_index([("created_at", -1), ("status", 1)])
        db[COLLECTION_CCTV].create_index([("timestamp", -1), ("is_anomaly", 1)])
        
        print("‚úÖ Semua index berhasil dibuat!")
        
        return client, db
        
    except Exception as e:
        print(f"‚ùå Error: {e}")
        return None, None

# ## Fungsi untuk Generate Data Dummy (Karena Twitter API Terbatas)

In [218]:
# Daftar kota di Indonesia
CITIES_INDONESIA = [
    "Jakarta", "Surabaya", "Bandung", "Medan", "Semarang", 
    "Makassar", "Palembang", "Depok", "Tangerang", "Bekasi",
    "Mataram", "Denpasar", "Yogyakarta", "Malang", "Surakarta"
]

# Tambahkan dictionary koordinat kota Indonesia
CITY_COORDINATES = {
    "Jakarta": {"lat": -6.2088, "lon": 106.8456},
    "Surabaya": {"lat": -7.2575, "lon": 112.7521},
    "Bandung": {"lat": -6.9175, "lon": 107.6191},
    "Medan": {"lat": 3.5952, "lon": 98.6722},
    "Semarang": {"lat": -6.9667, "lon": 110.4167},
    "Makassar": {"lat": -5.1477, "lon": 119.4327},
    "Palembang": {"lat": -2.9911, "lon": 104.7567},
    "Depok": {"lat": -6.4025, "lon": 106.7942},
    "Tangerang": {"lat": -6.1781, "lon": 106.6300},
    "Bekasi": {"lat": -6.2349, "lon": 106.9920},
    "Mataram": {"lat": -8.5833, "lon": 116.1167},
    "Denpasar": {"lat": -8.6500, "lon": 115.2167},
    "Yogyakarta": {"lat": -7.8014, "lon": 110.3644},
    "Malang": {"lat": -7.9833, "lon": 112.6333},
    "Surakarta": {"lat": -7.5667, "lon": 110.8167}
}

# Daftar sekolah contoh
SCHOOLS_INDONESIA = [
    "SMP Negeri 1", "SMP Negeri 2", "SMP Negeri 3", "SMP Negeri 4",
    "SMA Negeri 1", "SMA Negeri 2", "SMA Negeri 3", "SMA Negeri 4",
    "SMK Negeri 1", "SMK Negeri 2", "SMP Swasta ABC", "SMA Swasta XYZ"
]

# Template tweet untuk berbagai kategori
TWEET_TEMPLATES = {
    "korban_direct": [
        "aku ngerasa sedih banget di sekolah, selalu dijauhin teman-teman",
        "gak kuat lagi, setiap hari dihina di kelas",
        "mau nangis, di-bully terus di sekolah",
        "kenapa ya aku selalu jadi bahan olokan? capek mental",
        "gak mau sekolah lagi, trauma di-bully",
        "sendiri terus di sekolah, gak ada yang mau temanin",
        "dipukuli teman di kamar mandi sekolah, tapi takut melapor",
        "uang jajan selalu diambil paksa sama senior",
        "diancam kalau melapor ke guru, jadi diam aja",
        "capek, setiap hari nangis sepulang sekolah"
    ],
    "korban_indirect": [
        "kasihan lihat temenku selalu dijauhin dan dihina",
        "ada anak di sekolahku yang sering nangis di toilet sendirian",
        "liat temen dibully tapi takut ikut campur",
        "kenapa sih ada yang tega bully anak orang?",
        "di sekolahku ada geng yang suka nakut-nakutin adik kelas",
        "sedih liat anak SMP dibully sampai mogok sekolah",
        "harusnya sekolah jadi tempat aman, bukan tempat bully",
        "guru perlu lebih aware sama bullying di sekolah",
        "ada yang tau cara bantu korban bullying?",
        "anak tetangga gak mau sekolah karena dibully"
    ],
    "pelaku": [
        "wkwk si bodoh itu makin lama makin tolol aja",
        "goblok banget sih dia, gampang banget dibully",
        "asyik ngejek si cupu, reaksinya lucu banget",
        "hari ini ngerjain si culun lagi, wkwk",
        "gampang banget nakutin anak baru",
        "si lemah itu cemen banget, gampang nangis",
        "ngegangguin dia tuh seru, gak pernah melawan",
        "wajahnya aja udah minta dijahilin",
        "asyik ngerjain orang yang gak bisa melawan",
        "bully itu seru sih, apalagi kalo korban lemah"
    ],
    "netral": [
        "hari ini ada workshop anti bullying di sekolah",
        "konseling di sekolah membantu banget",
        "guru BK peduli sama masalah siswa",
        "program mentoring di sekolahku bagus",
        "belajar tentang cyberbullying hari ini",
        "sekolah perlu lebih banyak psikolog",
        "pentingnya empati di lingkungan sekolah",
        "kerja kelompok hari ini seru",
        "liburan sekolah menyenangkan",
        "olimpiade sains di sekolah"
    ],
    "positif": [
        "akhirnya ada teman yang mau dengerin ceritaku",
        "terima kasih buat guru yang bantu atasi bullying",
        "sekolahku mulai program anti bullying, semoga berhasil",
        "ada support group untuk korban bullying",
        "senang bisa bantu teman yang dibully",
        "kampanye anti bullying berhasil di sekolah",
        "guru BK sangat membantu masalahku",
        "sekolah yang peduli membuat perbedaan",
        "teman-teman mulai berubah jadi lebih baik",
        "merasa lebih aman di sekolah sekarang"
    ]
}

# Keywords untuk deteksi bullying
BULLYING_KEYWORDS = [
    'bully', 'dibully', 'korban', 'pelaku', 'dihina', 'diejek',
    'dipukul', 'diancam', 'diusir', 'dijauhin', 'disingkirkan',
    'mental', 'depresi', 'trauma', 'sedih', 'nangis', 'sendiri',
    'takut', 'cemas', 'stress', 'sekolah', 'kelas', 'teman',
    'guru', 'konseling', 'psikolog', 'bimbingan', 'lapor',
    'melapor', 'pengaduan', 'kekerasan', 'emosional', 'fisik'
]

def generate_dummy_tweets(num_tweets=1000):
    """Generate data dummy tweet dalam jumlah besar"""
    tweets = []
    
    # Distribusi kategori: 30% korban, 25% indirect, 20% netral, 15% positif, 10% pelaku
    categories = ["korban_direct"] * 300 + ["korban_indirect"] * 250 + ["netral"] * 200 + ["positif"] * 150 + ["pelaku"] * 100
    
    for i in range(num_tweets):
        # Pilih kategori secara random
        category = random.choice(categories)
        
        # Generate text dari template
        text = random.choice(TWEET_TEMPLATES[category])
        
        # Tambahkan hashtag/lokasi/mention secara random
        if random.random() > 0.7:
            text += f" #{random.choice(['sekolah', 'pendidikan', 'bullying', 'mentalhealth'])}"
        if random.random() > 0.8:
            text += f" @{random.choice(['kemdikbud', 'school', 'teacher'])}"
        
        # Generate metadata
        city = random.choice(CITIES_INDONESIA)
        school = f"{random.choice(SCHOOLS_INDONESIA)} {city}"
        
        # Generate timestamp (dalam 30 hari terakhir)
        days_ago = random.randint(0, 30)
        hours_ago = random.randint(0, 23)
        minutes_ago = random.randint(0, 59)
        created_at = datetime.now() - timedelta(days=days_ago, hours=hours_ago, minutes=minutes_ago)
        
        # Generate engagement metrics
        retweet_count = random.randint(0, 50)
        like_count = random.randint(0, 100)
        reply_count = random.randint(0, 20)
        
        # Generate author info
        author_id = f"user_{random.randint(10000, 99999)}"
        
        tweet_data = {
            "tweet_id": f"dummy_{i}_{int(time.time())}",
            "text": text,
            "category": category,
            "city": city,
            "school": school,
            "created_at": created_at,
            "retweet_count": retweet_count,
            "like_count": like_count,
            "reply_count": reply_count,
            "author_id": author_id,
            "is_dummy": True,
            "processed": False,
            "risk_level": "aman"  # Akan diisi setelah analisis
        }
        
        tweets.append(tweet_data)
    
    return tweets


# ## Fungsi untuk Generate Data CCTV Dummy

In [219]:
def generate_cctv_data(num_records=500):
    """Generate data dummy CCTV log"""
    cctv_logs = []
    
    # Jam sekolah normal: 07:00-15:00
    # Jam istirahat: 10:00-10:30 dan 12:00-13:00
    
    for i in range(num_records):
        # Pilih sekolah dan lokasi
        city = random.choice(CITIES_INDONESIA)
        school = f"{random.choice(SCHOOLS_INDONESIA)} {city}"
        locations = ["gerbang", "lorong", "kantin", "lapangan", "parkir", "toilet", "kelas"]
        location = random.choice(locations)
        
        # Generate timestamp (hari ini)
        today = datetime.now().date()
        
        # Tentukan apakah waktu normal atau anomali
        if random.random() < 0.3:  # 30% kemungkinan anomali
            # Waktu anomali: di luar jam sekolah atau di jam kelas tapi ramai
            if random.random() < 0.5:
                # Di luar jam sekolah (sore/malam)
                hour = random.choice([6, 16, 17, 18, 19, 20, 21])
            else:
                # Di jam kelas tapi ramai
                hour = random.choice([8, 9, 11, 14])
        else:
            # Waktu normal
            if random.random() < 0.5:
                # Jam istirahat
                hour = random.choice([10, 12])
            else:
                # Jam kelas normal
                hour = random.choice([7, 13, 15])
        
        minute = random.randint(0, 59)
        timestamp = datetime(today.year, today.month, today.day, hour, minute)
        
        # Generate metrics
        crowd_level = random.randint(1, 100)  # 1-100 orang
        noise_level = random.randint(30, 100)  # dB
        
        # Tentukan apakah anomali berdasarkan aturan
        is_anomaly = False
        warning_level = "hijau"
        
        # Aturan deteksi anomali:
        # 1. Keramaian tinggi di jam kelas (bukan istirahat)
        if hour not in [10, 12] and crowd_level > 30:
            is_anomaly = True
            warning_level = "kuning"
        
        # 2. Kebisingan tinggi di jam kelas
        if hour not in [10, 12] and noise_level > 70:
            is_anomaly = True
            warning_level = "merah" if noise_level > 85 else "kuning"
        
        # 3. Keramaian di luar jam sekolah
        if hour < 7 or hour > 15:
            if crowd_level > 10:
                is_anomaly = True
                warning_level = "merah" if crowd_level > 20 else "kuning"
        
        cctv_log = {
            "cctv_id": f"cctv_{random.randint(1, 50)}",
            "school": school,
            "city": city,
            "location": location,
            "timestamp": timestamp,
            "crowd_level": crowd_level,
            "noise_level": noise_level,
            "is_anomaly": is_anomaly,
            "warning_level": warning_level,
            "processed": False
        }
        
        cctv_logs.append(cctv_log)
    
    return cctv_logs


# ## Fungsi Preprocessing dan Analisis Sentimen NLP

In [220]:
# Inisialisasi stopwords Indonesia
stop_words_indonesia = set(stopwords.words('indonesian') if 'indonesian' in stopwords.fileids() else [])
# Tambahkan custom stopwords
custom_stopwords = ['yg', 'dg', 'rt', 'dgn', 'ny', 'd', 'klo', 
                   'kalo', 'amp', 'biar', 'bikin', 'bilang', 
                   'gak', 'ga', 'krn', 'nya', 'nih', 'sih',
                   'si', 'tau', 'tdk', 'tuh', 'utk', 'ya',
                   'jd', 'jgn', 'sdh', 'aja', 'n', 't',
                   'nyg', 'hehe', 'wkwk', 'lol', 'haha']

stop_words_indonesia.update(custom_stopwords)

In [221]:
def preprocess_text(text):
    """Preprocessing text untuk analisis NLP"""
    if not isinstance(text, str):
        return ""
    
    # Lowercase
    text = text.lower()
    
    # Remove URLs
    text = re.sub(r'http\S+|www\S+|https\S+', '', text, flags=re.MULTILINE)
    
    # Remove mentions and hashtags (tapi simpan teksnya)
    text = re.sub(r'@\w+', '', text)
    text = re.sub(r'#(\w+)', r'\1', text)
    
    # Remove punctuations and numbers
    text = re.sub(r'[^\w\s]', ' ', text)
    text = re.sub(r'\d+', '', text)
    
    # Tokenization
    tokens = word_tokenize(text)
    
    # Remove stopwords
    tokens = [word for word in tokens if word not in stop_words_indonesia]
    
    # Remove short words
    tokens = [word for word in tokens if len(word) > 2]
    
    return ' '.join(tokens)

In [222]:
def analyze_sentiment(text):
    """Analisis sentimen menggunakan TextBlob"""
    try:
        analysis = TextBlob(text)
        
        # Untuk Bahasa Indonesia, kita perlu custom rules
        # karena TextBlob lebih baik untuk English
        
        # Custom sentiment analysis untuk Bahasa Indonesia
        positive_words = ['senang', 'bahagia', 'baik', 'positif', 'membantu', 'terima kasih', 
                         'peduli', 'aman', 'nyaman', 'bangga', 'sukses', 'berhasil']
        negative_words = ['sedih', 'sakit', 'marah', 'benci', 'takuti', 'trauma', 'depresi',
                         'stress', 'cemas', 'takut', 'khawatir', 'kesal', 'jengkel']
        bullying_words = ['bully', 'dibully', 'dihina', 'diejek', 'dipukul', 'diancam',
                         'diusir', 'dijauhin', 'kekerasan', 'aniaya', 'zalim']
        
        text_lower = text.lower()
        
        # Hitung skor
        score = 0
        for word in positive_words:
            if word in text_lower:
                score += 1
        
        for word in negative_words:
            if word in text_lower:
                score -= 1
        
        for word in bullying_words:
            if word in text_lower:
                score -= 2
        
        # Tentukan sentiment
        if score > 1:
            sentiment = "positif"
        elif score < -1:
            sentiment = "negatif"
        else:
            sentiment = "netral"
        
        # Tentukan kategori bullying
        bullying_detected = any(word in text_lower for word in bullying_words)
        
        # Tentukan apakah korban atau pelaku
        is_victim = any(word in text_lower for word in ['aku', 'saya', 'diriku', 'korban'])
        is_perpetrator = any(word in text_lower for word in ['asyik', 'lucu', 'seru', 'wkwk', 'gampang'])
        
        category = "unknown"
        if bullying_detected:
            if is_victim:
                category = "korban"
            elif is_perpetrator:
                category = "pelaku"
            else:
                category = "saksi"
        
        return {
            "sentiment": sentiment,
            "bullying_detected": bullying_detected,
            "category": category,
            "score": score,
            "positive_words": sum(1 for w in positive_words if w in text_lower),
            "negative_words": sum(1 for w in negative_words if w in text_lower),
            "bullying_words": sum(1 for w in bullying_words if w in text_lower)
        }
    
    except Exception as e:
        print(f"Error in sentiment analysis: {e}")
        return {
            "sentiment": "netral",
            "bullying_detected": False,
            "category": "unknown",
            "score": 0,
            "positive_words": 0,
            "negative_words": 0,
            "bullying_words": 0
        }

In [223]:
# PERBAIKI FUNGSI calculate_risk_level
def calculate_risk_level(tweet_data, sentiment_result):
    """Hitung level risiko dengan distribusi yang BAGUS"""
    risk_score = 0
    
    # Faktor dari sentimen
    if sentiment_result["sentiment"] == "negatif":
        risk_score += 2  # Dikurangi dari 3
    elif sentiment_result["sentiment"] == "positif":
        risk_score -= 2  # Ditambah dari -1
    
    # Faktor dari kategori bullying
    if sentiment_result["bullying_detected"]:
        risk_score += 3  # Dikurangi dari 5
        
        if sentiment_result["category"] == "korban":
            risk_score += 2  # Dikurangi dari 3
        elif sentiment_result["category"] == "pelaku":
            risk_score += 1  # Dikurangi dari 2
    
    # Faktor dari kata-kata spesifik
    risk_score += sentiment_result["negative_words"] * 1  # Dikurangi dari 2
    risk_score += sentiment_result["bullying_words"] * 2  # Dikurangi dari 3
    
    # Faktor dari engagement
    engagement = tweet_data.get('retweet_count', 0) + tweet_data.get('like_count', 0)
    if engagement > 50:
        risk_score += 1  # Dikurangi dari 2
    elif engagement > 20:
        risk_score += 0.5  # Dikurangi dari 1
    
    # Tentukan level risiko dengan threshold yang LEBIH RENDAH
    if risk_score >= 6:  # Dikurangi dari 8
        return "merah", risk_score
    elif risk_score >= 4:  # Dikurangi dari 5
        return "kuning", risk_score
    elif risk_score >= 2:  # Dikurangi dari 3
        return "hijau", risk_score
    else:
        return "aman", risk_score

# ## Fungsi untuk Menyimpan dan Memproses Data

In [224]:
def save_to_mongodb(db, collection_name, data):
    """Menyimpan data ke MongoDB"""
    try:
        collection = db[collection_name]
        
        if isinstance(data, list):
            result = collection.insert_many(data)
            print(f"Disimpan {len(result.inserted_ids)} dokumen ke {collection_name}")
        else:
            result = collection.insert_one(data)
            print(f"Disimpan 1 dokumen ke {collection_name}")
        
        return result
    except Exception as e:
        print(f"Error saving to MongoDB: {e}")
        return None


In [225]:
def save_processed_tweets(db, processed_tweets):
    """Simpan tweet yang sudah diproses, update jika sudah ada"""
    collection = db[COLLECTION_TWEETS]
    
    for tweet in processed_tweets:
        # Update dokumen berdasarkan tweet_id, jika tidak ada buat baru
        collection.update_one(
            {"tweet_id": tweet["tweet_id"]},  # Cari berdasarkan tweet_id
            {"$set": tweet},  # Update semua field
            upsert=True  # Buat baru jika tidak ditemukan
        )
    
    print(f"‚úÖ {len(processed_tweets)} tweet berhasil diproses & disimpan")

In [226]:
def process_tweets(db, tweets):
    """Proses batch tweets dengan analisis NLP"""
    processed_tweets = []
    alerts = []
    
    for tweet in tweets:
        # Preprocessing text
        cleaned_text = preprocess_text(tweet['text'])
        
        # Analisis sentimen
        sentiment_result = analyze_sentiment(cleaned_text)
        
        # Hitung risk level
        risk_level, risk_score = calculate_risk_level(tweet, sentiment_result)
        
        # Update tweet data
        tweet['processed_text'] = cleaned_text
        tweet['sentiment'] = sentiment_result['sentiment']
        tweet['bullying_detected'] = sentiment_result['bullying_detected']
        tweet['category'] = sentiment_result['category']
        tweet['risk_level'] = risk_level
        tweet['risk_score'] = risk_score
        tweet['processed'] = True
        tweet['processed_at'] = datetime.now()
        
        processed_tweets.append(tweet)
        
        # Buat alert jika risk level tinggi
        if risk_level in ["merah", "kuning"]:
            alert = {
                "alert_id": f"alert_{tweet.get('tweet_id', 'unknown')}_{int(time.time())}",
                "tweet_id": tweet.get('tweet_id'),
                "school": tweet.get('school'),
                "city": tweet.get('city'),
                "risk_level": risk_level,
                "risk_score": risk_score,
                "text_snippet": tweet['text'][:100] + "..." if len(tweet['text']) > 100 else tweet['text'],
                "sentiment": sentiment_result['sentiment'],
                "category": sentiment_result['category'],
                "created_at": datetime.now(),
                "status": "new",
                "alert_type": "tweet_analysis"
            }
            alerts.append(alert)
    
    # Simpan ke MongoDB
    if processed_tweets:
        save_processed_tweets(db, processed_tweets)
    
    if alerts:
        save_to_mongodb(db, COLLECTION_ALERTS, alerts)
    
    return processed_tweets, alerts

# ## Main Pipeline - Generate dan Proses Data

In [227]:
def main_pipeline():
    """Main pipeline untuk generate dan proses data"""
    print("=" * 50)
    print("MEMULAI PIPELINE ANALISIS BULLYING")
    print("=" * 50)
    
    # 1. Koneksi MongoDB
    print("\n1. Menghubungkan ke MongoDB...")
    # Panggil fungsi baru
    client, db = connect_mongodb(use_atlas=True)

     # Jika Atlas gagal, coba lokal
    if db is None:
        print("\nüîÑ MongoDB Atlas gagal, mencoba MongoDB lokal...")
        client, db = connect_mongodb(use_atlas=False)
    
    # 2. Generate data dummy
    print("\n2. Generate data dummy...")
    print("   - Generating tweets...")
    dummy_tweets = generate_dummy_tweets(1500)  # 1500 tweets
    
    print("   - Generating CCTV logs...")
    cctv_logs = generate_cctv_data(300)  # 300 CCTV logs
    
    # 3. Simpan data mentah
    print("\n3. Menyimpan data mentah ke MongoDB...")
    save_to_mongodb(db, COLLECTION_TWEETS, dummy_tweets[:500])  # Simpan 500 dulu
    save_to_mongodb(db, COLLECTION_CCTV, cctv_logs)
    
    # 4. Proses tweets dengan NLP
    print("\n4. Memproses tweets dengan NLP...")
    processed_tweets, alerts = process_tweets(db, dummy_tweets[:500])
    
    # 5. Generate data sekolah
    print("\n5. Generate data sekolah...")
    schools_data = []
    for city in CITIES_INDONESIA[:5]:  # Ambil 5 kota pertama
        for i in range(1, 4):
            school = {
                "school_id": f"school_{city.lower()}_{i}",
                "name": f"SMP Negeri {i} {city}",
                "city": city,
                "type": "SMP",
                "total_students": random.randint(300, 800),
                "counselor_count": random.randint(1, 3),
                "cctv_count": random.randint(5, 15),
                "risk_level": random.choice(["hijau", "kuning", "merah"]),
                "last_incident": datetime.now() - timedelta(days=random.randint(0, 90))
            }
            schools_data.append(school)
    
    save_to_mongodb(db, COLLECTION_SCHOOLS, schools_data)
    
    print("\n" + "=" * 50)
    print("PIPELINE SELESAI!")
    print(f"   ‚Ä¢ {len(processed_tweets)} tweets diproses")
    print(f"   ‚Ä¢ {len(alerts)} alerts dibuat")
    print(f"   ‚Ä¢ {len(cctv_logs)} logs CCTV")
    print(f"   ‚Ä¢ {len(schools_data)} data sekolah")
    print("=" * 50)
    
    return db, processed_tweets, alerts, cctv_logs, schools_data

# ## Visualisasi dan Dashboard

In [228]:
def create_choropleth_heatmap(anomaly_df):
    """Buat choropleth heatmap peta Indonesia"""
    if anomaly_df.empty or 'city' not in anomaly_df.columns:
        print("‚ö†Ô∏è  Tidak ada data untuk heatmap")
        return None
    
    # Hitung anomali per kota
    city_anomalies = anomaly_df.groupby('city').size().reset_index(name='anomali_count')
    
    # Tambah koordinat
    city_anomalies['lat'] = city_anomalies['city'].apply(
        lambda x: CITY_COORDINATES.get(x, {}).get('lat', 0)
    )
    city_anomalies['lon'] = city_anomalies['city'].apply(
        lambda x: CITY_COORDINATES.get(x, {}).get('lon', 0)
    )
    
    # Filter kota yang punya koordinat
    city_anomalies = city_anomalies[
        (city_anomalies['lat'] != 0) & (city_anomalies['lon'] != 0)
    ]
    
    if city_anomalies.empty:
        print("‚ö†Ô∏è  Tidak ada kota dengan koordinat yang valid")
        return None
    
    # Buat choropleth map
    fig = px.scatter_geo(
        city_anomalies,
        lat='lat',
        lon='lon',
        size='anomali_count',
        color='anomali_count',
        hover_name='city',
        hover_data={'anomali_count': True, 'lat': False, 'lon': False},
        size_max=30,
        projection='natural earth',
        title='Heatmap Anomali CCTV di Indonesia',
        color_continuous_scale='RdYlGn_r',  # Red-Yellow-Green (reversed)
        scope='asia',
        center={'lat': -2.5, 'lon': 118},  # Pusat peta Indonesia
        fitbounds='locations'
    )
    
    # Update layout
    fig.update_geos(
        resolution=50,
        showcoastlines=True,
        coastlinecolor="Black",
        showland=True,
        landcolor="lightgray",
        showocean=True,
        oceancolor="lightblue",
        showcountries=True,
        countrycolor="black"
    )
    
    fig.update_layout(
        height=500,
        margin={"r":0,"t":30,"l":0,"b":0}
    )
    
    return fig

In [229]:
def create_visualizations(db):
    """Buat visualisasi dari data yang sudah diproses - UPDATE DENGAN DATA MONGODB"""
    print("\nMembuat visualisasi dari MongoDB...")
    
    # Ambil data dari MongoDB
    tweets_collection = db[COLLECTION_TWEETS]
    cctv_collection = db[COLLECTION_CCTV]
    alerts_collection = db[COLLECTION_ALERTS]
    schools_collection = db[COLLECTION_SCHOOLS]
    
    # Convert ke DataFrame untuk analisis - AMBIL SEMUA DATA
    tweets_df = pd.DataFrame(list(tweets_collection.find({"processed": True})))
    cctv_df = pd.DataFrame(list(cctv_collection.find()))
    alerts_df = pd.DataFrame(list(alerts_collection.find()))
    schools_df = pd.DataFrame(list(schools_collection.find()))
    
    print(f"üìä Data dari MongoDB:")
    print(f"   ‚Ä¢ Tweets diproses: {len(tweets_df)}")
    print(f"   ‚Ä¢ Log CCTV: {len(cctv_df)}")
    print(f"   ‚Ä¢ Alerts: {len(alerts_df)}")
    print(f"   ‚Ä¢ Sekolah: {len(schools_df)}")
    
    # 1. Distribusi Sentimen - SAMA DENGAN DASHBOARD
    print("1. Membuat visualisasi distribusi sentimen...")
    if not tweets_df.empty and 'sentiment' in tweets_df.columns:
        fig1 = px.pie(tweets_df, names='sentiment', title='Distribusi Sentimen Tweet',
                     color='sentiment', 
                     color_discrete_map={'positif': 'green', 'netral': 'blue', 'negatif': 'red'})
        fig1.show()
        
        # Tampilkan statistik
        print(f"   Sentimen dari MongoDB:")
        for sentiment, count in tweets_df['sentiment'].value_counts().items():
            print(f"   - {sentiment}: {count}")
    
    # 2. Distribusi Risk Level - SAMA DENGAN DASHBOARD
    print("2. Membuat visualisasi risk level...")
    if not tweets_df.empty and 'risk_level' in tweets_df.columns:
        risk_counts = tweets_df['risk_level'].value_counts()
        fig2 = px.bar(x=risk_counts.index, y=risk_counts.values, 
                     title='Distribusi Level Risiko',
                     labels={'x': 'Level Risiko', 'y': 'Jumlah Tweet'},
                     color=risk_counts.index,
                     color_discrete_map={'merah': 'red', 'kuning': 'yellow', 'hijau': 'green', 'aman': 'blue'})
        fig2.show()
        
        print(f"   Risk Level dari MongoDB:")
        for risk, count in risk_counts.items():
            print(f"   - {risk}: {count}")
    
    print("3. Membuat heatmap anomali CCTV...")
    if not cctv_df.empty:
        anomaly_df = cctv_df[cctv_df['is_anomaly'] == True]
        print(f"   Anomali CCTV dari MongoDB: {len(anomaly_df)} records")
        
        # Buat choropleth heatmap
        heatmap_fig = create_choropleth_heatmap(anomaly_df)
        
        if heatmap_fig:
            heatmap_fig.show()
        else:
            # Fallback ke bar chart
            print("‚ö†Ô∏è  Choropleth gagal, menggunakan bar chart...")
            if not anomaly_df.empty and 'city' in anomaly_df.columns:
                city_counts = anomaly_df['city'].value_counts().reset_index()
                city_counts.columns = ['city', 'anomali_count']
                
                fig_bar = px.bar(city_counts, 
                            x='city', y='anomali_count',
                            title='Jumlah Anomali CCTV per Kota',
                            color='anomali_count',
                            color_continuous_scale='Reds')
                fig_bar.show()
    
    # 4. Trend Alert per Hari - SAMA DENGAN DASHBOARD
    print("4. Membuat trend alert harian...")
    if not alerts_df.empty and 'created_at' in alerts_df.columns:
        alerts_df['date'] = pd.to_datetime(alerts_df['created_at']).dt.date
        daily_alerts = alerts_df.groupby('date').size().reset_index(name='alert_count')
        
        fig4 = px.line(daily_alerts, x='date', y='alert_count', 
                    title='Trend Alert Harian',
                    markers=True)
        fig4.show()
    
    # 5. Dashboard Interaktif dengan Plotly - DIPERBAIKI
    print("5. Membuat dashboard interaktif...")
    
    # Buat subplots dengan layout yang konsisten
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Distribusi Sentimen', 'Level Risiko per Kota', 
                       'Trend Alert 7 Hari Terakhir', 'Anomali CCTV per Lokasi'),
        specs=[[{'type': 'pie'}, {'type': 'bar'}],
               [{'type': 'scatter'}, {'type': 'bar'}]],
        vertical_spacing=0.1,
        horizontal_spacing=0.1
    )
    
    # Plot 1: Pie chart sentimen - AMBIL DARI MONGODB
    if not tweets_df.empty and 'sentiment' in tweets_df.columns:
        sentiment_counts = tweets_df['sentiment'].value_counts()
        fig.add_trace(
            go.Pie(labels=sentiment_counts.index, values=sentiment_counts.values,
                   name="Sentimen", 
                   marker_colors=['green', 'blue', 'red'],  # Warna konsisten
                   textinfo='percent+label'),
            row=1, col=1
        )
    
    # Plot 2: Risk level per kota - AMBIL DARI MONGODB
    if not tweets_df.empty and 'city' in tweets_df.columns and 'risk_level' in tweets_df.columns:
        # Ambil top 8 kota untuk visualisasi lebih jelas
        top_cities = tweets_df['city'].value_counts().head(8).index
        tweets_top = tweets_df[tweets_df['city'].isin(top_cities)]
        
        risk_by_city = tweets_top.groupby(['city', 'risk_level']).size().unstack(fill_value=0)
        
        colors = {'merah': 'red', 'kuning': 'yellow', 'hijau': 'green', 'aman': 'blue'}
        
        for risk_level in ['merah', 'kuning', 'hijau', 'aman']:
            if risk_level in risk_by_city.columns:
                fig.add_trace(
                    go.Bar(x=risk_by_city.index, y=risk_by_city[risk_level],
                           name=f'Risiko {risk_level}', 
                           marker_color=colors[risk_level],
                           text=risk_by_city[risk_level],
                           textposition='auto'),
                    row=1, col=2
                )
    
    # Plot 3: Trend 7 hari terakhir - AMBIL DARI MONGODB
    if not alerts_df.empty and 'created_at' in alerts_df.columns:
        alerts_df['created_at'] = pd.to_datetime(alerts_df['created_at'])
        last_7_days = datetime.now() - timedelta(days=7)
        recent_alerts = alerts_df[alerts_df['created_at'] >= last_7_days]
        
        if not recent_alerts.empty:
            recent_alerts['date'] = recent_alerts['created_at'].dt.date
            daily_recent = recent_alerts.groupby('date').size().reset_index(name='alert_count')
            
            fig.add_trace(
                go.Scatter(x=daily_recent['date'], y=daily_recent['alert_count'],
                          mode='lines+markers', name='Alert Harian',
                          line=dict(color='red', width=2),
                          marker=dict(size=8)),
                row=2, col=1
            )
    
    # Plot 4: CCTV anomalies by location - AMBIL DARI MONGODB
    if not cctv_df.empty and 'is_anomaly' in cctv_df.columns:
        anomaly_locations = cctv_df[cctv_df['is_anomaly'] == True]
        
        if not anomaly_locations.empty and 'location' in anomaly_locations.columns:
            location_counts = anomaly_locations['location'].value_counts()
            
            fig.add_trace(
                go.Bar(x=location_counts.index, y=location_counts.values,
                       name='Anomali per Lokasi', 
                       marker_color='orange',
                       text=location_counts.values,
                       textposition='auto'),
                row=2, col=2
            )
    
    # Update layout untuk konsisten dengan dashboard
    fig.update_layout(
        height=800, 
        width=1200,
        title_text="Dashboard Monitoring Bullying - Data dari MongoDB",
        showlegend=True,
        legend=dict(yanchor="top", y=0.99, xanchor="left", x=1.02),
        template='plotly_white'  # Template yang sama
    )
    
    # Update axes untuk konsisten
    fig.update_xaxes(title_text="Kota", row=1, col=2)
    fig.update_yaxes(title_text="Jumlah Tweet", row=1, col=2)
    
    fig.update_xaxes(title_text="Tanggal", row=2, col=1)
    fig.update_yaxes(title_text="Jumlah Alert", row=2, col=1)
    
    fig.update_xaxes(title_text="Lokasi", row=2, col=2)
    fig.update_yaxes(title_text="Jumlah Anomali", row=2, col=2)
    
    fig.show()
    
    print("\n‚úÖ Visualisasi selesai! Data diambil dari MongoDB.")
    
    return tweets_df, cctv_df, alerts_df, schools_df

# ## Fungsi untuk Dashboard Streamlit

In [None]:
def create_streamlit_dashboard():
    """Buat dashboard dengan interface yang mirip TAPI FIX error + PETA"""
    dashboard_code = '''# dashboard_final_with_map.py
# Sistem Deteksi Bullying - Dashboard Final dengan Peta
# Universitas Mataram - Teknik Informatika 2025/2026

import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime, timedelta
from pymongo import MongoClient
from pymongo.server_api import ServerApi
import time
import urllib.parse
import warnings
warnings.filterwarnings('ignore')
from plotly.subplots import make_subplots
import random

# ========== KONFIGURASI MONGODB ATLAS ==========
MONGODB_USERNAME = ""
MONGODB_PASSWORD = ""
MONGODB_CLUSTER = ""

# Encode username dan password
encoded_username = urllib.parse.quote_plus(MONGODB_USERNAME)
encoded_password = urllib.parse.quote_plus(MONGODB_PASSWORD)

# Connection string
MONGODB_ATLAS_URI = f"mongodb+srv://{encoded_username}:{encoded_password}@{MONGODB_CLUSTER}.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0"

DB_NAME = "bullying_detection"
COLLECTION_TWEETS = "tweets"
COLLECTION_CCTV = "cctv_logs"
COLLECTION_ALERTS = "alerts"
COLLECTION_SCHOOLS = "schools"

# Koordinat kota di Indonesia
CITY_COORDINATES = {
    "Jakarta": {"lat": -6.2088, "lon": 106.8456},
    "Surabaya": {"lat": -7.2575, "lon": 112.7521},
    "Bandung": {"lat": -6.9175, "lon": 107.6191},
    "Medan": {"lat": 3.5952, "lon": 98.6722},
    "Semarang": {"lat": -6.9667, "lon": 110.4167},
    "Makassar": {"lat": -5.1477, "lon": 119.4327},
    "Palembang": {"lat": -2.9911, "lon": 104.7567},
    "Depok": {"lat": -6.4025, "lon": 106.7942},
    "Tangerang": {"lat": -6.1781, "lon": 106.6300},
    "Bekasi": {"lat": -6.2349, "lon": 106.9920},
    "Mataram": {"lat": -8.5833, "lon": 116.1167},
    "Denpasar": {"lat": -8.6500, "lon": 115.2167},
    "Yogyakarta": {"lat": -7.8014, "lon": 110.3644},
    "Malang": {"lat": -7.9833, "lon": 112.6333},
    "Surakarta": {"lat": -7.5667, "lon": 110.8167}
}

# ========== SETUP PAGE ==========
st.set_page_config(
    page_title="üö® Sistem Deteksi Bullying - Dashboard Final",
    page_icon="üö®",
    layout="wide"
)

# ========== CSS CUSTOM ==========
st.markdown("""
<style>
    .main-header {
        font-size: 2.5rem;
        color: white;
        text-align: center;
        margin-bottom: 1rem;
        padding: 1rem;
        background: linear-gradient(90deg, #1E3A8A, #3B82F6);
        border-radius: 10px;
        box-shadow: 0 4px 6px rgba(0,0,0,0.1);
    }
    .metric-card {
        background-color: #f8f9fa;
        padding: 1rem;
        border-radius: 10px;
        border-left: 5px solid #3B82F6;
        margin-bottom: 1rem;
        transition: transform 0.3s;
    }
    .metric-card:hover {
        transform: translateY(-3px);
        box-shadow: 0 6px 12px rgba(0,0,0,0.1);
    }
    .sub-header {
        font-size: 1.5rem;
        color: #2D3748;
        margin-top: 1.5rem;
        margin-bottom: 1rem;
        padding-bottom: 0.5rem;
        border-bottom: 2px solid #4F46E5;
    }
    .stTabs [data-baseweb="tab-list"] {
        gap: 10px;
    }
    .stTabs [data-baseweb="tab"] {
        height: 50px;
        font-weight: 600;
        border-radius: 10px 10px 0 0;
    }
</style>
""", unsafe_allow_html=True)

# ========== FUNGSI KONEKSI MONGODB ==========
@st.cache_resource
def init_connection():
    """Connect ke MongoDB Atlas - FIX error"""
    try:
        client = MongoClient(MONGODB_ATLAS_URI, server_api=ServerApi('1'))
        db = client[DB_NAME]
        # Test koneksi
        client.admin.command('ping')
        return db
    except Exception as e:
        st.sidebar.error(f"‚ùå MongoDB Error: {str(e)[:100]}")
        return None

# ========== FUNGSI LOAD DATA ==========
@st.cache_data(ttl=30)
def load_mongodb_data():
    """Load data dari MongoDB - FIX error"""
    db = init_connection()
    
    # FIX: Pakai 'is None' bukan 'if not db'
    if db is None:
        st.warning("‚ö†Ô∏è Menggunakan data dummy karena tidak bisa konek ke MongoDB")
        return create_dummy_data(), pd.DataFrame(), pd.DataFrame(), pd.DataFrame()
    
    try:
        # AMBIL DATA seperti di notebook
        tweets = list(db[COLLECTION_TWEETS].find({"processed": True}).limit(2000))
        cctv = list(db[COLLECTION_CCTV].find().limit(1000))
        alerts = list(db[COLLECTION_ALERTS].find().limit(500))
        schools = list(db[COLLECTION_SCHOOLS].find().limit(100))
        
        return tweets, cctv, alerts, schools
        
    except Exception as e:
        st.error(f"Error loading data: {e}")
        return create_dummy_data(), pd.DataFrame(), pd.DataFrame(), pd.DataFrame()

def create_dummy_data():
    """Buat data dummy jika MongoDB error"""
    print("‚ö†Ô∏è Membuat data dummy...")
    
    cities = list(CITY_COORDINATES.keys())[:10]  # Ambil 10 kota pertama
    sentiments = ['positif', 'netral', 'negatif']
    risk_levels = ['merah', 'kuning', 'hijau', 'aman']
    
    data = []
    for i in range(1000):
        city = random.choice(cities)
        sentiment = random.choices(sentiments, weights=[0.2, 0.3, 0.5])[0]
        risk_level = random.choices(risk_levels, weights=[0.3, 0.4, 0.2, 0.1])[0]
        
        data.append({
            'tweet_id': f'dummy_{i}',
            'text': f'Sample tweet tentang bullying di sekolah {i}',
            'city': city,
            'sentiment': sentiment,
            'risk_level': risk_level,
            'risk_score': random.randint(1, 20),
            'bullying_detected': risk_level in ['merah', 'kuning'],
            'created_at': datetime.now() - timedelta(days=random.randint(0, 30)),
            'school': f'SMP Negeri {random.randint(1, 5)} {city}',
            'category': random.choice(['korban', 'pelaku', 'saksi', 'unknown']),
            'processed': True
        })
    
    return data

# ========== FUNGSI UNTUK PETA ==========
def create_indonesia_heatmap(tweets_df, cctv_df):
    """Buat heatmap peta Indonesia seperti di notebook"""
    if tweets_df.empty and cctv_df.empty:
        return None
    
    heat_data = []
    
    # 1. Hitung tweet risiko tinggi per kota
    if not tweets_df.empty and 'city' in tweets_df.columns and 'risk_level' in tweets_df.columns:
        high_risk_tweets = tweets_df[tweets_df['risk_level'].isin(['merah', 'kuning'])]
        if not high_risk_tweets.empty:
            tweet_counts = high_risk_tweets['city'].value_counts()
            for city, count in tweet_counts.items():
                if city in CITY_COORDINATES:
                    heat_data.append({
                        'city': city,
                        'lat': CITY_COORDINATES[city]['lat'],
                        'lon': CITY_COORDINATES[city]['lon'],
                        'count': count,
                        'type': 'tweet',
                        'label': f'Tweet: {count} risiko tinggi'
                    })
    
    # 2. Hitung anomali CCTV per kota
    if not cctv_df.empty and 'city' in cctv_df.columns and 'is_anomaly' in cctv_df.columns:
        anomaly_df = cctv_df[cctv_df['is_anomaly'] == True]
        if not anomaly_df.empty:
            cctv_counts = anomaly_df['city'].value_counts()
            for city, count in cctv_counts.items():
                if city in CITY_COORDINATES:
                    # Cek apakah kota sudah ada di data
                    found = False
                    for item in heat_data:
                        if item['city'] == city:
                            item['count'] += count * 2  # CCTV lebih berat
                            item['label'] = f"{item['label']} + CCTV: {count}"
                            found = True
                            break
                    if not found:
                        heat_data.append({
                            'city': city,
                            'lat': CITY_COORDINATES[city]['lat'],
                            'lon': CITY_COORDINATES[city]['lon'],
                            'count': count * 2,
                            'type': 'cctv',
                            'label': f'CCTV: {count} anomali'
                        })
    
    if not heat_data:
        return None
    
    heat_df = pd.DataFrame(heat_data)
    
    # Buat scatter map
    fig = px.scatter_geo(
        heat_df,
        lat='lat',
        lon='lon',
        size='count',
        color='count',
        hover_name='city',
        hover_data={'count': True, 'label': True, 'lat': False, 'lon': False},
        size_max=40,
        projection='natural earth',
        title='üó∫Ô∏è Heatmap Risiko Bullying & Anomali CCTV di Indonesia',
        color_continuous_scale='RdYlGn_r',
        color_continuous_midpoint=heat_df['count'].median(),
        scope='asia',
        center={'lat': -2.5, 'lon': 118},
        template='plotly_white'
    )
    
    # Update geos settings
    fig.update_geos(
        resolution=50,
        showcoastlines=True,
        coastlinecolor="Black",
        showland=True,
        landcolor="lightgray",
        showocean=True,
        oceancolor="lightblue",
        showcountries=True,
        countrycolor="black",
        showlakes=True,
        lakecolor="lightblue"
    )
    
    # Update layout
    fig.update_layout(
        height=550,
        margin={"r": 0, "t": 60, "l": 0, "b": 0},
        title_x=0.5,
        title_font_size=18,
        geo=dict(
            projection_scale=5,
            center=dict(lat=-2.5, lon=118)
        )
    )
    
    return fig

# ========== FUNGSI VISUALISASI ==========
def create_matching_sentiment_chart(tweets_df):
    """Buat chart sentimen SAMA dengan notebook"""
    if tweets_df.empty or 'sentiment' not in tweets_df.columns:
        return None
    
    sentiment_counts = tweets_df['sentiment'].value_counts()
    
    fig = px.pie(
        values=sentiment_counts.values,
        names=sentiment_counts.index,
        title='Distribusi Sentimen Tweet',
        color=sentiment_counts.index,
        color_discrete_map={'positif': 'green', 'netral': 'blue', 'negatif': 'red'},
        hole=0.3
    )
    
    fig.update_layout(
        title_x=0.5,
        height=400,
        showlegend=True
    )
    
    return fig

def create_matching_risk_chart(tweets_df):
    """Buat chart risk level SAMA dengan notebook"""
    if tweets_df.empty or 'risk_level' not in tweets_df.columns:
        return None
    
    risk_counts = tweets_df['risk_level'].value_counts()
    
    fig = px.bar(
        x=risk_counts.index,
        y=risk_counts.values,
        title='Distribusi Level Risiko',
        labels={'x': 'Level Risiko', 'y': 'Jumlah Tweet'},
        color=risk_counts.index,
        color_discrete_map={'merah': 'red', 'kuning': 'yellow', 'hijau': 'green', 'aman': 'blue'},
        text=risk_counts.values
    )
    
    fig.update_traces(texttemplate='%{text}', textposition='outside')
    fig.update_layout(
        title_x=0.5,
        height=400,
        showlegend=False
    )
    
    return fig

def create_matching_complete_dashboard(tweets_df, cctv_df, alerts_df):
    """Buat dashboard lengkap SAMA dengan notebook"""
    if tweets_df.empty:
        return None
    
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Distribusi Sentimen', 'Level Risiko per Kota',
                       'Trend Alert 7 Hari Terakhir', 'Anomali CCTV per Lokasi'),
        specs=[[{'type': 'pie'}, {'type': 'bar'}],
               [{'type': 'scatter'}, {'type': 'bar'}]],
        vertical_spacing=0.12,
        horizontal_spacing=0.1
    )
    
    # 1. Pie chart sentimen
    if 'sentiment' in tweets_df.columns:
        sentiment_counts = tweets_df['sentiment'].value_counts()
        fig.add_trace(
            go.Pie(labels=sentiment_counts.index, values=sentiment_counts.values,
                   name="Sentimen", marker_colors=['green', 'blue', 'red']),
            row=1, col=1
        )
    
    # 2. Risk level per kota
    if 'city' in tweets_df.columns and 'risk_level' in tweets_df.columns:
        # Ambil top 8 kota
        top_cities = tweets_df['city'].value_counts().head(8).index
        tweets_top = tweets_df[tweets_df['city'].isin(top_cities)]
        risk_by_city = tweets_top.groupby(['city', 'risk_level']).size().unstack(fill_value=0)
        
        colors = {'merah': 'red', 'kuning': 'yellow', 'hijau': 'green', 'aman': 'blue'}
        
        for risk_level in ['merah', 'kuning', 'hijau', 'aman']:
            if risk_level in risk_by_city.columns:
                fig.add_trace(
                    go.Bar(x=risk_by_city.index, y=risk_by_city[risk_level],
                           name=f'Risiko {risk_level}', marker_color=colors[risk_level]),
                    row=1, col=2
                )
    
    # 3. Trend 7 hari terakhir
    if not alerts_df.empty and 'created_at' in alerts_df.columns:
        alerts_df['created_at'] = pd.to_datetime(alerts_df['created_at'])
        last_7_days = datetime.now() - timedelta(days=7)
        recent_alerts = alerts_df[alerts_df['created_at'] >= last_7_days]
        
        if not recent_alerts.empty:
            recent_alerts['date'] = recent_alerts['created_at'].dt.date
            daily_recent = recent_alerts.groupby('date').size().reset_index(name='alert_count')
            
            fig.add_trace(
                go.Scatter(x=daily_recent['date'], y=daily_recent['alert_count'],
                          mode='lines+markers', name='Alert Harian',
                          line=dict(color='red', width=2)),
                row=2, col=1
            )
    
    # 4. CCTV anomalies by location
    if not cctv_df.empty and 'is_anomaly' in cctv_df.columns:
        anomaly_locations = cctv_df[cctv_df['is_anomaly'] == True]
        
        if not anomaly_locations.empty and 'location' in anomaly_locations.columns:
            location_counts = anomaly_locations['location'].value_counts()
            
            fig.add_trace(
                go.Bar(x=location_counts.index, y=location_counts.values,
                       name='Anomali per Lokasi', marker_color='orange'),
                row=2, col=2
            )
    
    fig.update_layout(
        height=800,
        title_text="Dashboard Monitoring Bullying - Konsisten dengan Notebook",
        showlegend=True,
        template='plotly_white'
    )
    
    fig.update_xaxes(title_text="Kota", row=1, col=2)
    fig.update_yaxes(title_text="Jumlah Tweet", row=1, col=2)
    
    fig.update_xaxes(title_text="Tanggal", row=2, col=1)
    fig.update_yaxes(title_text="Jumlah Alert", row=2, col=1)
    
    fig.update_xaxes(title_text="Lokasi", row=2, col=2)
    fig.update_yaxes(title_text="Jumlah Anomali", row=2, col=2)
    
    return fig

# ========== FUNGSI UTAMA DASHBOARD ==========
def main():
    # Header
    st.markdown('<h1 class="main-header">üö® Sistem Deteksi Bullying - Dashboard Final</h1>', unsafe_allow_html=True)
    st.markdown("**Dashboard dengan Peta Heatmap Indonesia**")
    
    # Load data
    tweets_data, cctv_data, alerts_data, schools_data = load_mongodb_data()
    
    # Convert to DataFrame
    tweets_df = pd.DataFrame(tweets_data) if tweets_data else pd.DataFrame()
    cctv_df = pd.DataFrame(cctv_data) if cctv_data else pd.DataFrame()
    alerts_df = pd.DataFrame(alerts_data) if alerts_data else pd.DataFrame()
    
    # ========== SIDEBAR ==========
    st.sidebar.title("‚öôÔ∏è Kontrol Dashboard")
    
    # Refresh button
    if st.sidebar.button("üîÑ Refresh Data", use_container_width=True):
        st.cache_data.clear()
        st.rerun()
    
    st.sidebar.markdown("---")
    st.sidebar.title("üìä Statistik Data")
    
    st.sidebar.write(f"**Total Tweet:** {len(tweets_df)}")
    
    if not tweets_df.empty:
        if 'sentiment' in tweets_df.columns:
            neg_count = len(tweets_df[tweets_df['sentiment'] == 'negatif'])
            st.sidebar.write(f"**Sentimen Negatif:** {neg_count}")
        
        if 'risk_level' in tweets_df.columns:
            high_risk = len(tweets_df[tweets_df['risk_level'].isin(['merah', 'kuning'])])
            st.sidebar.write(f"**High Risk:** {high_risk}")
    
    if not cctv_df.empty:
        anomalies = len(cctv_df[cctv_df['is_anomaly'] == True])
        st.sidebar.write(f"**Anomali CCTV:** {anomalies}")
    
    st.sidebar.markdown("---")
    st.sidebar.caption(f"üïí Terakhir update: {datetime.now().strftime('%H:%M:%S')}")
    
    # ========== METRICS ==========
    st.markdown('<div class="sub-header">üìä Metrics Real-time</div>', unsafe_allow_html=True)
    
    col1, col2, col3, col4 = st.columns(4)
    
    with col1:
        st.markdown('<div class="metric-card">', unsafe_allow_html=True)
        st.metric("üìù Total Tweet", len(tweets_df))
        st.markdown('</div>', unsafe_allow_html=True)
    
    with col2:
        st.markdown('<div class="metric-card">', unsafe_allow_html=True)
        if not tweets_df.empty and 'sentiment' in tweets_df.columns:
            neg_count = len(tweets_df[tweets_df['sentiment'] == 'negatif'])
            st.metric("üòî Sentimen Negatif", neg_count)
        else:
            st.metric("üòî Sentimen Negatif", 0)
        st.markdown('</div>', unsafe_allow_html=True)
    
    with col3:
        st.markdown('<div class="metric-card">', unsafe_allow_html=True)
        if not tweets_df.empty and 'risk_level' in tweets_df.columns:
            high_risk = len(tweets_df[tweets_df['risk_level'].isin(['merah', 'kuning'])])
            st.metric("üö® High Risk", high_risk)
        else:
            st.metric("üö® High Risk", 0)
        st.markdown('</div>', unsafe_allow_html=True)
    
    with col4:
        st.markdown('<div class="metric-card">', unsafe_allow_html=True)
        if not cctv_df.empty:
            anomalies = len(cctv_df[cctv_df['is_anomaly'] == True])
            st.metric("üìπ Anomali CCTV", anomalies)
        else:
            st.metric("üìπ Anomali CCTV", 0)
        st.markdown('</div>', unsafe_allow_html=True)
    
    # ========== TABS ==========
    tab1, tab2, tab3, tab4 = st.tabs([
        "üó∫Ô∏è Peta Heatmap", 
        "üìä Visualisasi", 
        "üìà Dashboard Lengkap",
        "üìã Data Detail"
    ])
    
    with tab1:
        st.markdown('<div class="sub-header">üó∫Ô∏è Peta Heatmap Indonesia</div>', unsafe_allow_html=True)
        
        # Buat peta heatmap
        heatmap_fig = create_indonesia_heatmap(tweets_df, cctv_df)
        
        if heatmap_fig:
            st.plotly_chart(heatmap_fig, use_container_width=True)
            
            # Stats di bawah peta
            col1, col2, col3 = st.columns(3)
            with col1:
                if not tweets_df.empty and 'risk_level' in tweets_df.columns:
                    red_tweets = len(tweets_df[tweets_df['risk_level'] == 'merah'])
                    st.metric("üî¥ Tweet Merah", red_tweets)
            
            with col2:
                if not tweets_df.empty and 'risk_level' in tweets_df.columns:
                    yellow_tweets = len(tweets_df[tweets_df['risk_level'] == 'kuning'])
                    st.metric("üü° Tweet Kuning", yellow_tweets)
            
            with col3:
                if not cctv_df.empty:
                    total_anomalies = len(cctv_df[cctv_df['is_anomaly'] == True])
                    st.metric("üìπ Total Anomali", total_anomalies)
        else:
            st.info("Data tidak cukup untuk membuat peta heatmap")
            
            # Fallback: bar chart per kota
            if not tweets_df.empty and 'city' in tweets_df.columns:
                st.subheader("Distribusi per Kota")
                city_counts = tweets_df['city'].value_counts().head(10).reset_index()
                city_counts.columns = ['city', 'count']
                
                fig_bar = px.bar(
                    city_counts,
                    x='city',
                    y='count',
                    title='Jumlah Tweet per Kota',
                    color='count',
                    color_continuous_scale='Reds'
                )
                st.plotly_chart(fig_bar, use_container_width=True)
    
    with tab2:
        st.markdown('<div class="sub-header">üìà Diagram Individual</div>', unsafe_allow_html=True)
        
        col1, col2 = st.columns(2)
        
        with col1:
            fig1 = create_matching_sentiment_chart(tweets_df)
            if fig1:
                st.plotly_chart(fig1, use_container_width=True)
                st.caption("**Distribusi Sentimen Tweet**")
            else:
                st.info("Data sentimen tidak tersedia")
        
        with col2:
            fig2 = create_matching_risk_chart(tweets_df)
            if fig2:
                st.plotly_chart(fig2, use_container_width=True)
                st.caption("**Distribusi Level Risiko**")
            else:
                st.info("Data risk level tidak tersedia")
        
        # Tambahan: Trend waktu
        if not tweets_df.empty and 'created_at' in tweets_df.columns:
            st.subheader("üìÖ Trend Harian")
            
            try:
                tweets_df['date'] = pd.to_datetime(tweets_df['created_at']).dt.date
                daily_counts = tweets_df.groupby('date').size().reset_index(name='count')
                
                fig_trend = px.line(
                    daily_counts,
                    x='date',
                    y='count',
                    title='Jumlah Tweet per Hari',
                    markers=True
                )
                st.plotly_chart(fig_trend, use_container_width=True)
            except:
                pass
    
    with tab3:
        st.markdown('<div class="sub-header">üìä Dashboard Lengkap (2x2 Subplots)</div>', unsafe_allow_html=True)
        
        fig3 = create_matching_complete_dashboard(tweets_df, cctv_df, alerts_df)
        if fig3:
            st.plotly_chart(fig3, use_container_width=True)
            st.caption("**Dashboard lengkap dengan 4 visualisasi**")
        else:
            st.info("Data tidak cukup untuk membuat dashboard lengkap")
            
            # Fallback: simple dashboard
            if not tweets_df.empty:
                col1, col2 = st.columns(2)
                with col1:
                    fig_fallback1 = create_matching_sentiment_chart(tweets_df)
                    if fig_fallback1:
                        st.plotly_chart(fig_fallback1, use_container_width=True)
                
                with col2:
                    fig_fallback2 = create_matching_risk_chart(tweets_df)
                    if fig_fallback2:
                        st.plotly_chart(fig_fallback2, use_container_width=True)
    
    with tab4:
        st.markdown('<div class="sub-header">üìã Data Detail dari MongoDB</div>', unsafe_allow_html=True)
        
        if not tweets_df.empty:
            # Tampilkan distribusi
            col1, col2, col3 = st.columns(3)
            
            with col1:
                st.write("**üìä Distribusi Sentimen:**")
                if 'sentiment' in tweets_df.columns:
                    for sent, count in tweets_df['sentiment'].value_counts().items():
                        st.write(f"- {sent}: {count}")
                else:
                    st.write("Tidak ada data")
            
            with col2:
                st.write("**‚ö†Ô∏è Distribusi Risk Level:**")
                if 'risk_level' in tweets_df.columns:
                    for risk, count in tweets_df['risk_level'].value_counts().items():
                        st.write(f"- {risk}: {count}")
                else:
                    st.write("Tidak ada data")
            
            with col3:
                st.write("**üìç Top 5 Kota:**")
                if 'city' in tweets_df.columns:
                    for city, count in tweets_df['city'].value_counts().head(5).items():
                        st.write(f"- {city}: {count}")
                else:
                    st.write("Tidak ada data")
            
            # Tampilkan sample data
            st.subheader("Sample Data Tweet (10 terbaru)")
            
            # Sort by date jika ada
            if 'created_at' in tweets_df.columns:
                tweets_df_sorted = tweets_df.sort_values('created_at', ascending=False)
            else:
                tweets_df_sorted = tweets_df
            
            # Pilih kolom untuk ditampilkan
            show_cols = ['city', 'sentiment', 'risk_level', 'risk_score', 'bullying_detected', 'created_at']
            available_cols = [col for col in show_cols if col in tweets_df_sorted.columns]
            
            if available_cols:
                sample_df = tweets_df_sorted[available_cols].head(10).copy()
                
                # Format tanggal
                if 'created_at' in sample_df.columns:
                    sample_df['created_at'] = pd.to_datetime(sample_df['created_at']).dt.strftime('%Y-%m-%d %H:%M')
                
                # Format risk score
                if 'risk_score' in sample_df.columns:
                    sample_df['risk_score'] = sample_df['risk_score'].apply(lambda x: f"{x}/20")
                
                st.dataframe(sample_df, use_container_width=True)
                
                # Download button
                csv = sample_df.to_csv(index=False)
                st.download_button(
                    label="üì• Download Sample Data (CSV)",
                    data=csv,
                    file_name=f"sample_data_{datetime.now().strftime('%Y%m%d')}.csv",
                    mime="text/csv"
                )
            else:
                st.info("Kolom data tidak tersedia")
        else:
            st.info("Tidak ada data tweet")
    
    # ========== FOOTER ==========
    st.markdown("---")
    st.markdown("**Sistem Deteksi Bullying** ‚Ä¢ Teknik Informatika UNRAM ‚Ä¢ ¬© 2025")
    st.caption(f"Dashboard terakhir di-load: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

if __name__ == "__main__":
    main()'''
    
    with open('dashboard_bullying.py', 'w', encoding='utf-8') as f:
        f.write(dashboard_code)
    
    print("‚úÖ Dashboard dengan PETA berhasil dibuat: dashboard_with_map.py")
    print("üéØ Jalankan dengan: streamlit run dashboard_with_map.py")
    print("")
    print("‚ú® **FITUR UTAMA:**")
    print("1. ‚úÖ **PETA HEATMAP INDONESIA** seperti di notebook")
    print("2. ‚úÖ Interface sama dengan yang Anda mau")
    print("3. ‚úÖ FIX error MongoDB")
    print("4. ‚úÖ Diagram MIRIP dengan notebook")
    print("5. ‚úÖ 4 tab: Peta, Visualisasi, Dashboard, Data")
    print("6. ‚úÖ Metrics cards dengan styling")
    print("7. ‚úÖ Sidebar dengan statistik")
    print("8. ‚úÖ Data dummy jika MongoDB error")
    
    return dashboard_code

# ## Fungsi Utama untuk Menjalankan Semua

In [231]:
def run_complete_system():
    """Jalankan seluruh sistem dari awal hingga akhir"""
    print("=" * 60)
    print("SISTEM DETEKSI BULLYING - WEB SCRAPING & NLP")
    print("=" * 60)
    
    # Jalankan pipeline utama
    result = main_pipeline()
    
    if result:
        db, processed_tweets, alerts, cctv_logs, schools_data = result
        
        # Buat visualisasi
        print("\n" + "=" * 60)
        print("MEMBUAT VISUALISASI DAN DASHBOARD")
        print("=" * 60)
        
        tweets_df, cctv_df, alerts_df, schools_df = create_visualizations(db)
        
        # Buat dashboard Streamlit
        print("\n" + "=" * 60)
        print("MEMBUAT DASHBOARD STREAMLIT")
        print("=" * 60)
        
        create_streamlit_dashboard()
        
        # Tampilkan ringkasan
        print("\n" + "=" * 60)
        print("RINGKASAN HASIL")
        print("=" * 60)
        
        if not tweets_df.empty:
            print(f"\nüìä ANALISIS TWEET:")
            print(f"   ‚Ä¢ Total tweet diproses: {len(tweets_df)}")
            print(f"   ‚Ä¢ Sentimen negatif: {len(tweets_df[tweets_df['sentiment'] == 'negatif'])}")
            print(f"   ‚Ä¢ Bullying terdeteksi: {len(tweets_df[tweets_df['bullying_detected'] == True])}")
            print(f"   ‚Ä¢ Risk Level Merah: {len(tweets_df[tweets_df['risk_level'] == 'merah'])}")
            print(f"   ‚Ä¢ Risk Level Kuning: {len(tweets_df[tweets_df['risk_level'] == 'kuning'])}")
        
        if not cctv_df.empty:
            print(f"\nüìπ DATA CCTV:")
            print(f"   ‚Ä¢ Total log CCTV: {len(cctv_df)}")
            print(f"   ‚Ä¢ Anomali terdeteksi: {len(cctv_df[cctv_df['is_anomaly'] == True])}")
            print(f"   ‚Ä¢ Warning Merah: {len(cctv_df[cctv_df['warning_level'] == 'merah'])}")
        
        if not alerts_df.empty:
            print(f"\nüö® ALERT SISTEM:")
            print(f"   ‚Ä¢ Total alert: {len(alerts_df)}")
            print(f"   ‚Ä¢ Alert aktif (status new): {len(alerts_df[alerts_df['status'] == 'new'])}")
        
        print("\n" + "=" * 60)
        print("NEXT STEPS:")
        print("1. Jalankan dashboard: streamlit run dashboard_bullying.py")
        print("2. Buka browser ke http://localhost:8501")
        print("3. Dashboard akan menampilkan data real-time dari MongoDB")
        print("=" * 60)
        
        return True
    
    return False

# %% [markdown]
# ## 12. Eksekusi Sistem

# %%
if __name__ == "__main__":
    # Jalankan sistem lengkap
    success = run_complete_system()
    
    if success:
        print("\n‚úÖ SISTEM BERHASIL DIEKSEKUSI!")
        print("\nüìÅ File yang dihasilkan:")
        print("   ‚Ä¢ Notebook ini (berisi semua kode)")
        print("   ‚Ä¢ dashboard_bullying.py (dashboard Streamlit)")
        print("   ‚Ä¢ Data tersimpan di MongoDB")
    else:
        print("\n‚ùå Terjadi kesalahan dalam eksekusi sistem")

SISTEM DETEKSI BULLYING - WEB SCRAPING & NLP
MEMULAI PIPELINE ANALISIS BULLYING

1. Menghubungkan ke MongoDB...
üîÑ Mencoba koneksi ke MongoDB Atlas...
‚úÖ Koneksi berhasil!
üìÅ Koleksi 'tweets' sudah ada
üìÅ Koleksi 'cctv_logs' sudah ada
üìÅ Koleksi 'schools' sudah ada
üìÅ Koleksi 'alerts' sudah ada
üîç Membuat index untuk query yang cepat...
‚úÖ Semua index berhasil dibuat!

2. Generate data dummy...
   - Generating tweets...
   - Generating CCTV logs...

3. Menyimpan data mentah ke MongoDB...
Disimpan 500 dokumen ke tweets
Disimpan 300 dokumen ke cctv_logs

4. Memproses tweets dengan NLP...
‚úÖ 500 tweet berhasil diproses & disimpan
Disimpan 274 dokumen ke alerts

5. Generate data sekolah...
Disimpan 15 dokumen ke schools

PIPELINE SELESAI!
   ‚Ä¢ 500 tweets diproses
   ‚Ä¢ 274 alerts dibuat
   ‚Ä¢ 300 logs CCTV
   ‚Ä¢ 15 data sekolah

MEMBUAT VISUALISASI DAN DASHBOARD

Membuat visualisasi dari MongoDB...
üìä Data dari MongoDB:
   ‚Ä¢ Tweets diproses: 6500
   ‚Ä¢ Log CCTV: 42

   Sentimen dari MongoDB:
   - netral: 3436
   - negatif: 3064
2. Membuat visualisasi risk level...


   Risk Level dari MongoDB:
   - merah: 3503
   - aman: 2949
   - kuning: 48
3. Membuat heatmap anomali CCTV...
   Anomali CCTV dari MongoDB: 2304 records


4. Membuat trend alert harian...


5. Membuat dashboard interaktif...



‚úÖ Visualisasi selesai! Data diambil dari MongoDB.

MEMBUAT DASHBOARD STREAMLIT
‚úÖ Dashboard dengan PETA berhasil dibuat: dashboard_with_map.py
üéØ Jalankan dengan: streamlit run dashboard_with_map.py

‚ú® **FITUR UTAMA:**
1. ‚úÖ **PETA HEATMAP INDONESIA** seperti di notebook
2. ‚úÖ Interface sama dengan yang Anda mau
3. ‚úÖ FIX error MongoDB
4. ‚úÖ Diagram MIRIP dengan notebook
5. ‚úÖ 4 tab: Peta, Visualisasi, Dashboard, Data
6. ‚úÖ Metrics cards dengan styling
7. ‚úÖ Sidebar dengan statistik
8. ‚úÖ Data dummy jika MongoDB error

RINGKASAN HASIL

üìä ANALISIS TWEET:
   ‚Ä¢ Total tweet diproses: 6500
   ‚Ä¢ Sentimen negatif: 3064
   ‚Ä¢ Bullying terdeteksi: 3551
   ‚Ä¢ Risk Level Merah: 3503
   ‚Ä¢ Risk Level Kuning: 48

üìπ DATA CCTV:
   ‚Ä¢ Total log CCTV: 4200
   ‚Ä¢ Anomali terdeteksi: 2304

üö® ALERT SISTEM:
   ‚Ä¢ Total alert: 3835
   ‚Ä¢ Alert aktif (status new): 3835

NEXT STEPS:
1. Jalankan dashboard: streamlit run dashboard_bullying.py
2. Buka browser ke http://localho