# Bullying Detection System Using Sentiment Analysis & CCTV
**Group members:**
1. Alysa Meliana (F1D02310035)
2. Azizah Indriani Putri (F1D02310041)
3. Baiq Mutia Dewi Edelweiss (F1D02310107)
4. Fairuza Luthfiana (F1D02310111)
5. Syazwani (F1D02310140)
**Universitas Mataram - Informatics Engineering - 2025/2026**
 
## Introduction
This system is designed to detect potential bullying in schools by analyzing sentiment on social media (Twitter) and anomaly data from CCTV. It uses NLP for sentiment classification and machine learning for anomaly detection.
 

## Setup and Installation

In [44]:
# Install the required libraries
%pip install tweepy pandas numpy matplotlib seaborn scikit-learn nltk textblob wordcloud folium streamlit pymongo dnspython plotly --quiet
%pip install --upgrade nbformat

# %%
import tweepy
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import json
import random
from datetime import datetime, timedelta
import time
import warnings
import os
warnings.filterwarnings('ignore')

# NLP Libraries
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from textblob import TextBlob
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

# Visualization & dashboard
from wordcloud import WordCloud
import folium
from folium.plugins import HeatMap
import plotly.express as px
import plotly.graph_objects as go
import plotly.subplots as sp
from plotly.subplots import make_subplots

# MongoDB
from pymongo import MongoClient
from pymongo.server_api import ServerApi
import urllib.parse

# Streamlit for the dashboard (run separately)
import streamlit as st

# Download NLTK resources
nltk.download('punkt', quiet=True)
nltk.download('stopwords', quiet=True)
nltk.download('wordnet', quiet=True)
nltk.download('omw-1.4', quiet=True)  # 'omw-eng' is not in the NLTK index
nltk.download('punkt_tab', quiet=True)

# Set the plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

Note: you may need to restart the kernel to use updated packages.




## Twitter API and MongoDB Configuration

In [None]:
# Twitter API configuration (replace with your own credentials)
TWITTER_API_KEY = ""
TWITTER_API_SECRET = ""
TWITTER_BEARER_TOKEN = ""
TWITTER_ACCESS_TOKEN = ""
TWITTER_ACCESS_TOKEN_SECRET = ""

# MongoDB configuration
# Works with MongoDB Atlas (cloud) or a local instance

# REPLACE WITH YOUR OWN MONGODB ATLAS CONNECTION DETAILS
MONGODB_USERNAME = "f1d02310107"
MONGODB_PASSWORD = ""
MONGODB_CLUSTER = ""

# URL-encode the username and password
encoded_username = urllib.parse.quote_plus(MONGODB_USERNAME)
encoded_password = urllib.parse.quote_plus(MONGODB_PASSWORD)

# Connection string for MongoDB Atlas
MONGODB_ATLAS_URI = f"mongodb+srv://{encoded_username}:{encoded_password}@{MONGODB_CLUSTER}.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0"

# Connection string for a local instance, used by connect_mongodb(use_atlas=False).
# Assumption: default host and port; adjust to your own setup.
MONGODB_URI_LOCAL = "mongodb://localhost:27017/"

# Database and collection names
DB_NAME = "bullying_detection"
COLLECTION_TWEETS = "tweets"
COLLECTION_CCTV = "cctv_logs"
COLLECTION_SCHOOLS = "schools"
COLLECTION_ALERTS = "alerts"
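
Hard-coded credentials are easy to leak when a notebook is shared. A minimal sketch of reading settings from environment variables and building the Atlas URI from them follows; the helper names `load_setting` and `build_atlas_uri` and the sample values are assumptions for illustration, not part of the original notebook.

```python
import os
import urllib.parse


def load_setting(name, default=""):
    """Return the environment variable `name`, or `default` when it is unset."""
    return os.environ.get(name, default)


def build_atlas_uri(username, password, cluster):
    """Build an Atlas SRV connection string with URL-encoded credentials."""
    user = urllib.parse.quote_plus(username)
    pwd = urllib.parse.quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{cluster}.mongodb.net/?retryWrites=true&w=majority"


# Hypothetical values, for illustration only
uri = build_atlas_uri("student", "p@ss word", "cluster0.example")
print(uri)
```

Special characters such as `@` or spaces in a password must be encoded, otherwise the URI is parsed incorrectly; `quote_plus` handles that.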

## MongoDB Connection Function

In [46]:
def connect_mongodb(use_atlas=True):
    """Connect to MongoDB (Atlas or local); return (client, db), or (None, None) on failure."""
    try:
        if use_atlas:
            print("🔄 Connecting to MongoDB Atlas...")
            client = MongoClient(MONGODB_ATLAS_URI, server_api=ServerApi('1'))
        else:
            print("🔄 Connecting to local MongoDB...")
            client = MongoClient(MONGODB_URI_LOCAL)

        client.admin.command('ping')
        print("✅ Connection successful!")

        db = client[DB_NAME]

        # Create the collections if they do not exist yet
        collections_to_create = {
            COLLECTION_TWEETS: {"validator": {"$jsonSchema": {
                "bsonType": "object",
                "required": ["text", "created_at"],
                "properties": {
                    "text": {"bsonType": "string"},
                    "risk_level": {"enum": ["merah", "kuning", "hijau", "aman"]}
                }
            }}},
            COLLECTION_CCTV: {},
            COLLECTION_SCHOOLS: {},
            COLLECTION_ALERTS: {}
        }
        
        existing_collections = db.list_collection_names()
        
        for collection_name, options in collections_to_create.items():
            if collection_name not in existing_collections:
                if options:  # A schema validator is defined
                    db.create_collection(collection_name, **options)
                else:
                    db.create_collection(collection_name)
                print(f"📁 Collection '{collection_name}' created")
            else:
                print(f"📁 Collection '{collection_name}' already exists")

        # Create indexes for query performance
        print("🔍 Creating indexes for fast queries...")
        db[COLLECTION_TWEETS].create_index([("created_at", -1)])  # Newest-first sorting
        db[COLLECTION_TWEETS].create_index([("risk_level", 1), ("city", 1)])  # Filtering
        db[COLLECTION_ALERTS].create_index([("created_at", -1), ("status", 1)])
        db[COLLECTION_CCTV].create_index([("timestamp", -1), ("is_anomaly", 1)])

        print("✅ All indexes created!")

        return client, db

    except Exception as e:
        print(f"❌ Error: {e}")
        return None, None
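
The `$jsonSchema` validator on the tweets collection is enforced by the server when a document is written. A rough client-side mirror of the same rules can catch bad documents before the round trip. This is only a sketch; `validate_tweet_doc` is a hypothetical helper, not something the notebook defines.

```python
ALLOWED_RISK_LEVELS = {"merah", "kuning", "hijau", "aman"}


def validate_tweet_doc(doc):
    """Return a list of problems; an empty list means the document looks valid."""
    problems = []
    # Same required fields as the collection validator
    for field in ("text", "created_at"):
        if field not in doc:
            problems.append(f"missing required field: {field}")
    if "text" in doc and not isinstance(doc["text"], str):
        problems.append("text must be a string")
    if "risk_level" in doc and doc["risk_level"] not in ALLOWED_RISK_LEVELS:
        problems.append(f"invalid risk_level: {doc['risk_level']}")
    return problems


print(validate_tweet_doc({"text": 5, "risk_level": "biru"}))
```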

## Function to Generate Dummy Data (Because Twitter API Access Is Limited)

In [47]:
# List of Indonesian cities
CITIES_INDONESIA = [
    "Jakarta", "Surabaya", "Bandung", "Medan", "Semarang", 
    "Makassar", "Palembang", "Depok", "Tangerang", "Bekasi",
    "Mataram", "Denpasar", "Yogyakarta", "Malang", "Surakarta"
]

# Coordinates of the Indonesian cities above
CITY_COORDINATES = {
    "Jakarta": {"lat": -6.2088, "lon": 106.8456},
    "Surabaya": {"lat": -7.2575, "lon": 112.7521},
    "Bandung": {"lat": -6.9175, "lon": 107.6191},
    "Medan": {"lat": 3.5952, "lon": 98.6722},
    "Semarang": {"lat": -6.9667, "lon": 110.4167},
    "Makassar": {"lat": -5.1477, "lon": 119.4327},
    "Palembang": {"lat": -2.9911, "lon": 104.7567},
    "Depok": {"lat": -6.4025, "lon": 106.7942},
    "Tangerang": {"lat": -6.1781, "lon": 106.6300},
    "Bekasi": {"lat": -6.2349, "lon": 106.9920},
    "Mataram": {"lat": -8.5833, "lon": 116.1167},
    "Denpasar": {"lat": -8.6500, "lon": 115.2167},
    "Yogyakarta": {"lat": -7.8014, "lon": 110.3644},
    "Malang": {"lat": -7.9833, "lon": 112.6333},
    "Surakarta": {"lat": -7.5667, "lon": 110.8167}
}

# Example school names
SCHOOLS_INDONESIA = [
    "SMP Negeri 1", "SMP Negeri 2", "SMP Negeri 3", "SMP Negeri 4",
    "SMA Negeri 1", "SMA Negeri 2", "SMA Negeri 3", "SMA Negeri 4",
    "SMK Negeri 1", "SMK Negeri 2", "SMP Swasta ABC", "SMA Swasta XYZ"
]

# Tweet templates per category (the texts stay in Indonesian, the language being analyzed)
TWEET_TEMPLATES = {
    "korban_direct": [
        "aku ngerasa sedih banget di sekolah, selalu dijauhin teman-teman",
        "gak kuat lagi, setiap hari dihina di kelas",
        "mau nangis, di-bully terus di sekolah",
        "kenapa ya aku selalu jadi bahan olokan? capek mental",
        "gak mau sekolah lagi, trauma di-bully",
        "dipukuli teman di kamar mandi sekolah hari ini",
        "uang jajan selalu diambil paksa sama senior, takut melapor",
        "diancam kalau melapor ke guru, jadi diam aja sedih",
        "hari ini diejek habis-habisan di depan kelas, malu banget",
        "rasanya mau hilang saja, gak tahan dibully terus"
    ],
    "pelaku": [
        "wkwk si bodoh itu makin lama makin tolol aja, asyik ngejek dia",
        "goblok banget sih dia, gampang banget dibully, seru dah",
        "asyik ngejek si cupu tadi, reaksinya lucu banget hahaha",
        "hari ini ngerjain si culun lagi, reaksinya selalu bikin ketawa",
        "gampang banget nakutin anak baru, dia langsung nangis wkwk",
        "ngegangguin dia tuh seru, gak pernah berani melawan",
        "wajahnya aja udah minta dijahilin, jadi ya kita jahilin",
        "bully itu seru sih, apalagi kalo korban lemah dan gak melawan",
        "tadi ngerjain anak kelas 7, langsung lari dia wkwk",
        "bikin malu dia di depan temen-temen, asyik banget dah"
    ],
    "korban_potensial": [
        "mulai ngerasa dijauhin teman-teman akhir-akhir ini",
        "gak tau kenapa teman sekelas mulai menghindariku",
        "sedih sih, kayaknya mulai gak diterima di kelompok",
        "apa aku salah ya? kok teman-teman mulai menjaga jarak",
        "mulai ngerasa sendiri di sekolah padahal dulu ramai",
        "kayaknya aku mulai diabaikan sama teman dekat",
        "rasanya ada yang beda, teman-teman mulai berubah",
        "mulai gak diundang ke acara teman-teman",
        "kayaknya ada yang gak beres dengan pertemananku",
        "mulai ngerasa kesepian di sekolah"
    ],
    "saksi": [
        "kasihan lihat temenku selalu dijauhin dan dihina",
        "ada anak di sekolahku yang sering nangis di toilet sendirian",
        "liat temen dibully tapi takut ikut campur, sedih banget",
        "kenapa sih ada yang tega bully anak orang? kasihan liatnya",
        "di sekolahku ada geng yang suka nakut-nakutin adik kelas",
        "sedih liat anak kelas 7 dibully sampai mogok sekolah",
        "harusnya sekolah jadi tempat aman, bukan tempat bully",
        "tadi liat anak dipukuli di parkiran, tapi takut melapor",
        "ada yang tau cara bantu korban bullying tanpa ikut dibully?",
        "liat teman baikku dijauhin teman lain, sedih tapi bingung"
    ],
    "support": [
        "akhirnya ada teman yang mau dengerin ceritaku, terima kasih",
        "terima kasih buat guru yang bantu atasi bullying di sekolah",
        "sekolahku mulai program anti bullying, semoga berhasil",
        "ada support group untuk korban bullying di sekolahku",
        "senang bisa bantu teman yang dibully, semoga dia kuat",
        "kampanye anti bullying berhasil di sekolah, semoga terus",
        "guru BK sangat membantu masalahku, terima kasih banyak",
        "sekolah yang peduli membuat perbedaan besar, bangga",
        "teman-teman mulai berubah jadi lebih baik dan peduli",
        "merasa lebih aman di sekolah sekarang, terima kasih semua"
    ],
    "report": [
        "hari ini melapor ke guru tentang bullying di kelas",
        "sudah laporkan ke pihak sekolah tentang kekerasan di kantin",
        "melaporkan teman yang dibully ke guru BK hari ini",
        "orang tua sudah melapor ke sekolah tentang bullying",
        "ada pengaduan resmi tentang bullying di sekolah kami",
        "sudah melapor ke polisi tentang bullying yang parah",
        "melaporkan ke dinas pendidikan tentang kasus di sekolah",
        "pengaduan resmi sudah dibuat untuk kasus bullying",
        "melapor ke konselor sekolah tentang teman yang dibully",
        "pengaduan tentang bullying sudah disampaikan ke kepala sekolah"
    ],
    "positif_umum": [
        "hari ini belajar kelompok seru banget dengan teman-teman",
        "olimpiade sains di sekolah menyenangkan dan menantang",
        "kegiatan ekstrakurikuler basket hari ini asyik banget",
        "presentasi di kelas berjalan lancar, senang sekali",
        "dapat nilai bagus ujian, semangat belajar terus",
        "acara perpisahan sekolah penuh kenangan indah",
        "study tour ke museum edukatif dan menyenangkan",
        "kerja bakti di sekolah membuat lingkungan lebih bersih",
        "upacara bendera hari ini khidmat dan tertib",
        "kegiatan pramuka mengajarkan banyak keterampilan baru"
    ]
}

# Distribution (percent):
# 25% korban_direct, 15% pelaku, 20% saksi, 10% support, 10% report, 10% positif_umum, 10% korban_potensial

# Keywords for bullying detection
BULLYING_KEYWORDS = [
    'bully', 'dibully', 'korban', 'pelaku', 'dihina', 'diejek',
    'dipukul', 'diancam', 'diusir', 'dijauhin', 'disingkirkan',
    'mental', 'depresi', 'trauma', 'sedih', 'nangis', 'sendiri',
    'takut', 'cemas', 'stress', 'sekolah', 'kelas', 'teman',
    'guru', 'konseling', 'psikolog', 'bimbingan', 'lapor',
    'melapor', 'pengaduan', 'kekerasan', 'emosional', 'fisik'
]

def generate_dummy_tweets(num_tweets=1000):
    """Generate dummy tweets following the category distribution above."""
    tweets = []

    # Target category distribution
    categories = []
    categories += ["korban_direct"] * int(num_tweets * 0.25)      # 25%
    categories += ["pelaku"] * int(num_tweets * 0.15)             # 15%
    categories += ["saksi"] * int(num_tweets * 0.20)              # 20%
    categories += ["support"] * int(num_tweets * 0.10)            # 10%
    categories += ["report"] * int(num_tweets * 0.10)             # 10%
    categories += ["positif_umum"] * int(num_tweets * 0.10)       # 10%
    categories += ["korban_potensial"] * int(num_tweets * 0.10)   # 10%

    # int() truncates, so the list can be shorter than num_tweets;
    # pad it with random categories so categories[i] below never fails
    while len(categories) < num_tweets:
        categories.append(random.choice(list(TWEET_TEMPLATES)))
    categories = categories[:num_tweets]

    for i in range(num_tweets):
        category = categories[i]
        text = random.choice(TWEET_TEMPLATES[category])

        # Add variation
        variations = [
            f" #{random.choice(['stopbullying', 'antibullying', 'mentalhealth', 'sekolahaman'])}",
            f" @{random.choice(['kemdikbud_ri', 'kemenpppa', 'school', 'teacher'])}",
            "",
            "",
            ""  # 60% of tweets get no hashtag or mention
        ]
        text += random.choice(variations)
        
        # Generate metadata
        city = random.choice(CITIES_INDONESIA)
        school = f"{random.choice(SCHOOLS_INDONESIA)} {city}"

        # Random timestamp within roughly the last 30 days
        days_ago = random.randint(0, 30)
        hours_ago = random.randint(0, 23)
        minutes_ago = random.randint(0, 59)
        created_at = datetime.now() - timedelta(days=days_ago, hours=hours_ago, minutes=minutes_ago)

        # Random engagement metrics
        retweet_count = random.randint(0, 100)
        like_count = random.randint(0, 200)
        reply_count = random.randint(0, 50)

        # Initial risk level (updated after processing)
        initial_risk = "aman"

        tweet_data = {
            "tweet_id": f"dummy_{i}_{int(time.time())}",
            "text": text,
            "original_category": category,  # Keep the template category
            "city": city,
            "school": school,
            "created_at": created_at,
            "retweet_count": retweet_count,
            "like_count": like_count,
            "reply_count": reply_count,
            "author_id": f"user_{random.randint(10000, 99999)}",
            "is_dummy": True,
            "processed": False,
            "risk_level": initial_risk
        }
        
        tweets.append(tweet_data)
    
    print(f"✅ Generated {len(tweets)} tweets with distribution:")
    for cat in set(categories):
        count = categories.count(cat)
        print(f"   • {cat}: {count} ({count/len(tweets)*100:.1f}%)")
    
    return tweets
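
An alternative way to obtain per-category counts that always sum exactly to `num_tweets` is the largest-remainder method. The sketch below is an assumption about how one might do it, not the notebook's method; `allocate_counts` is a hypothetical helper and the shares are the percentages from the distribution comment.

```python
def allocate_counts(total, shares):
    """Split `total` into integer counts proportional to `shares` (largest remainder)."""
    weight_sum = sum(shares.values())
    exact = {name: total * share / weight_sum for name, share in shares.items()}
    counts = {name: int(value) for name, value in exact.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining items to the categories with the largest fractional parts
    by_remainder = sorted(shares, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


shares = {"korban_direct": 25, "pelaku": 15, "saksi": 20, "support": 10,
          "report": 10, "positif_umum": 10, "korban_potensial": 10}
counts = allocate_counts(15, shares)
print(counts, sum(counts.values()))
```

The resulting counts can then be expanded into a category list and shuffled with `random.shuffle` so that categories are not grouped by position.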

## Function to Generate Dummy CCTV Data

In [48]:
def generate_cctv_data(num_records=500):
    """Generate dummy CCTV log records with rule-based anomaly flags."""
    cctv_logs = []
    
    locations_rules = {
        "gerbang": {"normal_crowd": 5, "anomaly_threshold": 20},
        "lorong": {"normal_crowd": 3, "anomaly_threshold": 15},
        "kantin": {"normal_crowd": 30, "anomaly_threshold": 50},
        "lapangan": {"normal_crowd": 10, "anomaly_threshold": 30},
        "parkir": {"normal_crowd": 5, "anomaly_threshold": 15},
        "toilet": {"normal_crowd": 2, "anomaly_threshold": 5},
        "kelas": {"normal_crowd": 25, "anomaly_threshold": 35}
    }
    
    for i in range(num_records):
        city = random.choice(CITIES_INDONESIA)
        school = f"{random.choice(SCHOOLS_INDONESIA)} {city}"
        location = random.choice(list(locations_rules.keys()))
        location_rule = locations_rules[location]
        
        # Generate a timestamp during school hours today
        today = datetime.now().date()
        hour = random.randint(6, 18)  # 06:00 to 18:59
        minute = random.randint(0, 59)
        timestamp = datetime(today.year, today.month, today.day, hour, minute)

        # Generate metrics based on the location
        base_crowd = location_rule["normal_crowd"]
        crowd_level = random.randint(max(1, base_crowd-5), base_crowd+10)

        # Noise level: higher in the canteen and on the field
        if location in ["kantin", "lapangan"]:
            noise_level = random.randint(50, 80)
        else:
            noise_level = random.randint(30, 60)

        # Decide whether this record is an anomaly
        is_anomaly = False
        warning_level = "hijau"

        # Rule 1: crowd exceeds the location threshold
        if crowd_level > location_rule["anomaly_threshold"]:
            is_anomaly = True
            warning_level = "merah" if crowd_level > location_rule["anomaly_threshold"] * 1.5 else "kuning"

        # Rule 2: high noise outside the canteen/field.
        # With the ranges above, noise there tops out at 60, so this rule
        # does not fire on the dummy data.
        if location not in ["kantin", "lapangan"] and noise_level > 65:
            is_anomaly = True
            # Never downgrade a "merah" already set by Rule 1
            if noise_level > 75 or warning_level == "merah":
                warning_level = "merah"
            else:
                warning_level = "kuning"

        # Rule 3: canteen/field unusually quiet during breaks (10:00-10:30, 12:00-13:00).
        # The generated crowd level in these locations is never below 5, so this
        # rule does not fire on the dummy data either.
        is_break_time = (hour == 10 and minute < 30) or (hour == 12)
        if is_break_time and crowd_level < 5 and location in ["kantin", "lapangan"]:
            is_anomaly = True  # Too quiet for a break
            if warning_level == "hijau":
                warning_level = "kuning"
        cctv_log = {
            "log_id": f"cctv_log_{i}_{int(time.time())}",
            "cctv_id": f"cctv_{random.randint(1, 50)}",
            "school": school,
            "city": city,
            "location": location,
            "timestamp": timestamp,
            "crowd_level": crowd_level,
            "noise_level": noise_level,
            "is_anomaly": is_anomaly,
            "warning_level": warning_level,
            "processed": False
        }
        
        cctv_logs.append(cctv_log)
    
    return cctv_logs
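
When several rules can set `warning_level`, one way to guarantee that a later rule never lowers it is an explicit severity ranking. The following is a minimal sketch under that assumption; `SEVERITY`, `escalate`, and `classify_crowd` are hypothetical names, and the thresholds follow Rule 1 above.

```python
SEVERITY = {"hijau": 0, "kuning": 1, "merah": 2}


def escalate(current, candidate):
    """Return whichever warning level is more severe."""
    return candidate if SEVERITY[candidate] > SEVERITY[current] else current


def classify_crowd(crowd_level, threshold):
    """Rule 1: 'kuning' above the threshold, 'merah' above 1.5x the threshold."""
    if crowd_level > threshold * 1.5:
        return "merah"
    if crowd_level > threshold:
        return "kuning"
    return "hijau"


level = classify_crowd(9, 5)        # 5 is the toilet threshold in locations_rules
level = escalate(level, "kuning")   # a weaker rule cannot lower the level
print(level)
```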

## Preprocessing and NLP Sentiment Analysis Functions

In [49]:
# Initialize Indonesian stopwords
stop_words_indonesia = set(stopwords.words('indonesian') if 'indonesian' in stopwords.fileids() else [])
# Add custom stopwords
custom_stopwords = ['yg', 'dg', 'rt', 'dgn', 'ny', 'd', 'klo', 
                   'kalo', 'amp', 'biar', 'bikin', 'bilang', 
                   'gak', 'ga', 'krn', 'nya', 'nih', 'sih',
                   'si', 'tau', 'tdk', 'tuh', 'utk', 'ya',
                   'jd', 'jgn', 'sdh', 'aja', 'n', 't',
                   'nyg', 'hehe', 'wkwk', 'lol', 'haha']

stop_words_indonesia.update(custom_stopwords)

In [50]:
def preprocess_text(text):
    """Clean and tokenize text for NLP analysis."""
    if not isinstance(text, str):
        return ""

    # Lowercase
    text = text.lower()

    # Remove URLs
    text = re.sub(r'http\S+|www\S+|https\S+', '', text, flags=re.MULTILINE)

    # Remove mentions; strip '#' from hashtags but keep the word
    text = re.sub(r'@\w+', '', text)
    text = re.sub(r'#(\w+)', r'\1', text)

    # Remove punctuation and numbers
    text = re.sub(r'[^\w\s]', ' ', text)
    text = re.sub(r'\d+', '', text)

    # Tokenization
    tokens = word_tokenize(text)

    # Remove stopwords
    tokens = [word for word in tokens if word not in stop_words_indonesia]

    # Remove short words (two characters or fewer)
    tokens = [word for word in tokens if len(word) > 2]
    
    return ' '.join(tokens)
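
To see what the regex steps do on their own, here is a small stand-alone sketch that applies the same substitutions without tokenization or stopword removal. `clean_tweet` is a hypothetical helper and the sample tweet is made up for illustration.

```python
import re


def clean_tweet(text):
    """Apply the regex cleaning steps only (no tokenizer, no stopword removal)."""
    text = text.lower()
    text = re.sub(r'http\S+|www\S+', '', text)    # URLs
    text = re.sub(r'@\w+', '', text)              # mentions
    text = re.sub(r'#(\w+)', r'\1', text)         # keep the hashtag word, drop '#'
    text = re.sub(r'[^\w\s]', ' ', text)          # punctuation
    text = re.sub(r'\d+', '', text)               # digits
    return ' '.join(text.split())                 # collapse whitespace


print(clean_tweet("Gak kuat lagi, dihina di kelas 7 #StopBullying @kemdikbud_ri https://t.co/x1"))
# prints: gak kuat lagi dihina di kelas stopbullying
```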

In [51]:
def analyze_sentiment(text):
    """Lexicon-based sentiment analysis with context adjustments (TextBlob is not used here)."""

    text_lower = text.lower()

    # ========== WEIGHTED WORD LEXICONS ==========
    # Weights: -5 (very negative) to +5 (very positive)

    # === POSITIVE WORDS (support/improvement) ===
    positive_words = {
        # Support/helping words: HIGH POSITIVE (+3 to +5)
        'support': 5, 'dukung': 5, 'bantu': 4, 'tolong': 4, 'peduli': 5,
        'membantu': 4, 'terima kasih': 5, 'bangga': 4, 'semangat': 3,
        'positif': 3, 'baik': 3, 'aman': 3, 'nyaman': 3, 'senang': 3,
        'bahagia': 4, 'gembira': 3, 'lega': 3, 'tenang': 2, 'damai': 3,
        'ceria': 2, 'optimis': 3, 'sukses': 3, 'berhasil': 3,
        
        # Anti-bullying initiatives: POSITIVE CONTEXT
        'anti bullying': 4, 'stop bullying': 4, 'kampanye': 2,
        'program': 1, 'konseling': 2, 'psikolog': 1, 'guru bk': 2,
        'workshop': 1, 'pelatihan': 1, 'edukasi': 2,
        
        # Recovery/support terms
        'pulih': 3, 'sembuh': 3, 'bangkit': 4, 'melawan': 2,
        'berani': 3, 'kuat': 3, 'percaya diri': 3,
        
        # Community/social support
        'komunitas': 2, 'kelompok': 1, 'teman': 2, 'sahabat': 3,
        'keluarga': 3, 'orang tua': 2, 'guru': 1, 'sekolah': 0,
    }
    
    # === NEGATIVE WORDS (victim/suffering) ===
    negative_words = {
        # Physical violence: VERY NEGATIVE (-4 to -5)
        'pukul': -5, 'dipukul': -5, 'pukuli': -5, 'pukulan': -4,
        'tendang': -4, 'ditendang': -4, 'tampar': -4, 'ditampar': -4,
        'cubit': -3, 'dicubit': -3, 'dorong': -3, 'didorong': -3,
        'serang': -4, 'diserang': -4, 'aniaya': -5, 'dianiaya': -5,
        
        # Emotional abuse: NEGATIVE (-3 to -4)
        'hina': -4, 'dihina': -5, 'hinaan': -4, 'ejek': -4, 'diejek': -5,
        'olok': -3, 'diolok': -4, 'cela': -3, 'dicela': -4,
        'rendahkan': -3, 'direndahkan': -4, 'hancurkan': -4, 'dihancurkan': -5,
        
        # Social exclusion: NEGATIVE (-3 to -4)
        'jauhi': -4, 'dijauhi': -5, 'kucil': -4, 'dikucilkan': -5,
        'asing': -3, 'terasing': -4, 'tolak': -3, 'ditolak': -4,
        'abaikan': -3, 'diabaikan': -4, 'singkir': -3, 'disingkirkan': -4,
        
        # Threats/intimidation: VERY NEGATIVE (-4 to -5)
        'ancam': -5, 'diancam': -5, 'ancaman': -4, 'intimidasi': -4,
        'teror': -4, 'terorisasi': -5, 'takut': -4, 'ketakutan': -4,
        'diteror': -5,
        
        # Theft/extortion: NEGATIVE (-4)
        'ambil': -4, 'diambil': -4, 'rampas': -5, 'dirampas': -5,
        'paksa': -4, 'dipaksa': -5, 'paksaan': -4, 'peras': -5, 'diperas': -4,
        
        # Emotional state: NEGATIVE (-2 to -3)
        'sedih': -3, 'kesedihan': -3, 'sakit hati': -4, 'terluka': -3,
        'kecewa': -3, 'kekecewaan': -3, 'marah': -3, 'kemarahan': -3,
        'benci': -4, 'kebencian': -4, 'jengkel': -2, 'kesal': -2,
        
        # Mental health issues: NEGATIVE (-3 to -5)
        'trauma': -5, 'traumatis': -5, 'depresi': -5, 'stress': -4,
        'cemas': -3, 'kecemasan': -3, 'panik': -3, 'kepanikan': -3,
        'putus asa': -5, 'tertekan': -4, 'gelisah': -3, 'bingung': -2,
        
        # School avoidance: NEGATIVE (-4)
        'bolos': -3, 'membolos': -3, 'takut sekolah': -4, 'enggan sekolah': -3,
        'mogok sekolah': -4, 'tidak mau sekolah': -4,
    }
    
    # === BULLYING WORDS (context-specific) ===
    # The word "bullying" itself is only mildly negative; context decides the rest
    bullying_words = {
        'bully': -2,  # Mildly negative on its own
        'dibully': -4,  # Negative: passive form, someone is the victim
        'bullying': -2,  # Mildly negative on its own
        'pembullyan': -2,
        'pelaku': -1,
        'korban': -1,
    }

    # === RECOVERY/POSITIVE WORDS in a bullying context ===
    # These words turn a bullying context POSITIVE
    positive_context_words = {
        'support group': 5, 'dukungan': 4, 'bantuan': 4, 'konseling': 3,
        'psikolog': 2, 'guru bk': 2, 'terapi': 3, 'rehabilitasi': 3,
        'pemulihan': 4, 'pencegahan': 3, 'penanganan': 3, 'solusi': 3,
        'perlindungan': 4, 'keamanan': 3, 'perhatian': 2, 'peduli': 4,
        'advokasi': 3, 'pendampingan': 3, 'mediasi': 2,
    }
    
    # ========== CONTEXT ANALYSIS ==========
    def check_phrase(text, phrase):
        """Return True if the phrase occurs in the text (substring match)."""
        return phrase in text

    def calculate_sentiment():
        """Compute the raw lexicon score and the list of matched words."""
        score = 0
        detected_words = []

        # 1. Check the positive context phrases first (highest priority)
        for phrase, weight in positive_context_words.items():
            if check_phrase(text_lower, phrase):
                score += weight
                detected_words.append((f"PHRASE:{phrase}", weight))

        # 2. Check individual words in every lexicon
        for word_dict, label in [
            (positive_words, "POS"),
            (negative_words, "NEG"),
            (bullying_words, "BLY"),
            (positive_context_words, "PCTX")
        ]:
            for word, weight in word_dict.items():
                if f" {word} " in f" {text_lower} " or text_lower.startswith(word) or text_lower.endswith(word):
                    # Skip words already counted as a phrase in step 1
                    if not any(f"PHRASE:{word}" in str(dw) for dw in detected_words):
                        score += weight
                        detected_words.append((f"{label}:{word}", weight))

        return score, detected_words
    
    # ========== DETERMINE THE MAIN CONTEXT ==========
    # Is the text about SUPPORT/RECOVERY or about SUFFERING?
    is_support_context = any(
        check_phrase(text_lower, phrase) 
        for phrase in ['support group', 'dukungan', 'bantuan', 'konseling', 'pemulihan']
    )

    is_victim_context = any(
        check_phrase(text_lower, phrase)
        for phrase in ['aku ', 'saya ', 'gw ', 'gue ', 'diriku ', 'dipukul', 'dihina', 'diancam']
    )

    # ========== COMPUTE THE SCORE ==========
    base_score, detected_words = calculate_sentiment()

    # ========== CONTEXT-BASED ADJUSTMENTS ==========
    final_score = base_score

    # ADJUSTMENT 1: support context + a bullying word → POSITIVE
    if is_support_context and any('bully' in word.lower() for word, _ in detected_words):
        final_score += 6  # Large positive boost
        print("   CONTEXT ADJUSTMENT: Support context + bullying = +6")

    # ADJUSTMENT 2: "diambil paksa" (taken by force) and similar → VERY NEGATIVE
    if any(phrase in text_lower for phrase in ['diambil paksa', 'dirampas', 'dipaksa']):
        final_score -= 8  # Large negative adjustment
        print("   CONTEXT ADJUSTMENT: 'diambil paksa' = -8")

    # ADJUSTMENT 3: direct victim (first person + negative words)
    if is_victim_context and base_score < 0:
        final_score -= 3  # More negative when the victim speaks directly
        print("   CONTEXT ADJUSTMENT: Victim direct speech = -3")

    # ========== DETERMINE THE FINAL SENTIMENT ==========
    # Clamp the score to [-10, 10]
    final_score = max(-10, min(10, final_score))
    
    if final_score >= 2:
        sentiment = "positif"
    elif final_score <= -2:
        sentiment = "negatif"
    else:
        sentiment = "netral"
    
    # ========== DETERMINE THE CATEGORY ==========
    # Count matched words by type
    pos_count = sum(1 for w, wt in detected_words if wt > 0)
    neg_count = sum(1 for w, wt in detected_words if wt < 0)
    bully_count = sum(1 for w, wt in detected_words if "BLY:" in str(w))

    tokens = text_lower.split()
    has_first_person = any(w in ['aku', 'saya', 'gw', 'gue'] for w in tokens)
    has_direct_abuse = any('dipukul' in w or 'dihina' in w or 'diancam' in w for w in tokens)

    category = "unknown"

    # Each condition is checked in full, so a partial match falls through
    # to the next category instead of stopping at "unknown"
    if has_first_person and has_direct_abuse:
        category = "korban_direct"      # Direct victim
    elif has_first_person and neg_count > 0:
        category = "korban_potensial"   # Potential victim
    elif is_support_context:
        category = "support"            # Support/help context
    elif any(w in ['asyik', 'lucu', 'seru', 'wkwk', 'haha'] for w in tokens) and any('bully' in w for w in tokens):
        category = "pelaku"             # Perpetrator
    elif any(w in ['liat', 'lihat', 'temen', 'teman', 'kasihan'] for w in tokens) and bully_count > 0:
        category = "saksi"              # Witness
    elif any(w in ['lapor', 'melapor', 'laporkan', 'pengaduan'] for w in tokens):
        category = "report"             # Report
    
    return {
        "sentiment": sentiment,
        "bullying_detected": bully_count > 0,
        "category": category,
        "score": final_score,
        "positive_words": pos_count,
        "negative_words": neg_count,
        "bullying_words": bully_count,
        "detected_words": detected_words[:8],
        "is_support_context": is_support_context,
        "is_victim_context": is_victim_context
    }
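
`analyze_sentiment` matches single words by padding with spaces and by `startswith`/`endswith`, which can also match a word that merely begins or ends the text as a prefix or suffix of a longer word. A swapped-in alternative, shown here only as a sketch, is regex word boundaries. `LEXICON`, `lexicon_score`, and `label` are hypothetical names; the weights are copied from the lexicons above.

```python
import re

# Tiny illustrative lexicon; weights copied from analyze_sentiment
LEXICON = {"terima kasih": 5, "aman": 3, "dihina": -5, "sedih": -3, "trauma": -5}


def lexicon_score(text, lexicon=LEXICON):
    """Sum the weights of lexicon entries that occur as whole words or phrases."""
    text = text.lower()
    score = 0
    matched = []
    for entry, weight in lexicon.items():
        # \b anchors keep 'aman' from matching inside 'keamanan'
        if re.search(r'\b' + re.escape(entry) + r'\b', text):
            score += weight
            matched.append(entry)
    return score, matched


def label(score):
    """Map a score to a label using the same +/-2 cut-offs as analyze_sentiment."""
    if score >= 2:
        return "positif"
    if score <= -2:
        return "negatif"
    return "netral"


score, matched = lexicon_score("Sedih, dihina terus di kelas")
print(score, matched, label(score))
```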

In [52]:
def calculate_risk_level(tweet_data, sentiment_result):
    """Compute the risk level and score from sentiment, category, and context."""

    risk_score = 0

    # ========== 1. BASE SCORE from sentiment ==========
    sentiment_base = {
        "positif": 0,    # Positive = low risk
        "netral": 3,     # Neutral = medium risk
        "negatif": 8     # Negative = high risk
    }
    risk_score += sentiment_base.get(sentiment_result["sentiment"], 3)

    # ========== 2. CATEGORY RISK ==========
    category_risk = {
        # HIGH RISK (10-15)
        "korban_direct": 12,      # Direct victim: EMERGENCY

        # MEDIUM-HIGH RISK (7-10)
        "pelaku": 9,              # Perpetrator: needs intervention
        "korban_potensial": 8,    # Potential victim: stay alert

        # MEDIUM RISK (4-7)
        "saksi": 6,               # Witness: needs attention

        # LOW RISK (1-4)
        "report": 4,              # Report: monitoring
        "unknown": 3,             # Unknown
        "support": 1,             # Support: very low risk

        # VERY LOW RISK (0-1)
        "positif_umum": 0,        # General positive: safe
    }
    risk_score += category_risk.get(sentiment_result["category"], 3)
    
    # ========== 3. SPECIAL CONTEXT ==========
    # CASE 1: support context → LOWER the risk even if "bullying" appears
    if sentiment_result.get("is_support_context", False):
        risk_score -= 6  # Large reduction
        print("   RISK ADJUSTMENT: Support context = -6")

    # CASE 2: "diambil paksa" (taken by force) → RAISE the risk significantly
    text_lower = tweet_data.get('text', '').lower()
    if any(phrase in text_lower for phrase in ['diambil paksa', 'dirampas', 'dipaksa']):
        risk_score += 8  # Large increase
        print("   RISK ADJUSTMENT: 'diambil paksa' = +8")

    # CASE 3: pocket money ("uang jajan") taken → HIGH RISK
    if 'uang jajan' in text_lower and any(w in text_lower for w in ['ambil', 'rampas', 'paksa']):
        risk_score += 7
        print("   RISK ADJUSTMENT: 'uang jajan diambil' = +7")
    
    # ========== 4. FAKTOR LAIN ==========
    # Engagement (viral = lebih berbahaya)
    engagement = (tweet_data.get('retweet_count', 0) * 2 + 
                  tweet_data.get('like_count', 0) + 
                  tweet_data.get('reply_count', 0))
    
    if engagement > 100:
        risk_score += 4
    elif engagement > 50:
        risk_score += 2
    elif engagement > 20:
        risk_score += 1
    
    # Recency (lebih baru = lebih urgent)
    created_at = tweet_data.get('created_at', datetime.now())
    hours_old = (datetime.now() - created_at).total_seconds() / 3600
    
    if hours_old < 6:      # Kurang dari 6 jam
        risk_score += 3
    elif hours_old < 24:   # Kurang dari 24 jam
        risk_score += 2
    elif hours_old < 72:   # Kurang dari 3 hari
        risk_score += 1
    
    # ========== 5. NORMALIZE & DETERMINE LEVEL ==========
    risk_score = max(0, min(20, risk_score))
    
    # Thresholds:
    if risk_score >= 15:    # 15-20: EMERGENCY
        level = "merah"
    elif risk_score >= 10:  # 10-14: ALERT
        level = "kuning"
    elif risk_score >= 5:   # 5-9: ATTENTION
        level = "hijau"
    else:                   # 0-4: SAFE
        level = "aman"
    
    return level, risk_score
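
The engagement, recency, and threshold steps of `calculate_risk_level` can be checked in isolation. The sketch below reimplements only those three steps as standalone helpers. The helper names (`engagement_bonus`, `recency_bonus`, `score_to_level`) are hypothetical and not part of the notebook, and the base score of 3 is simply the default that `category_risk.get(..., 3)` falls back to.

```python
def engagement_bonus(retweets: int, likes: int, replies: int) -> int:
    """Bonus from engagement; retweets count double, as in calculate_risk_level."""
    engagement = retweets * 2 + likes + replies
    if engagement > 100:
        return 4
    if engagement > 50:
        return 2
    if engagement > 20:
        return 1
    return 0


def recency_bonus(hours_old: float) -> int:
    """Bonus for recent tweets: under 6 h, under 24 h, under 72 h."""
    if hours_old < 6:
        return 3
    if hours_old < 24:
        return 2
    if hours_old < 72:
        return 1
    return 0


def score_to_level(risk_score: int) -> str:
    """Clamp the score to 0-20 and map it to the four level names."""
    risk_score = max(0, min(20, risk_score))
    if risk_score >= 15:
        return "merah"
    if risk_score >= 10:
        return "kuning"
    if risk_score >= 5:
        return "hijau"
    return "aman"


# Worked example (assumed inputs): default category risk 3,
# 40 retweets + 30 likes + 5 replies, posted 2 hours ago.
base = 3
total = base + engagement_bonus(40, 30, 5) + recency_bonus(2)
print(total, score_to_level(total))  # 10 kuning
```

With these inputs the engagement is 40 * 2 + 30 + 5 = 115, which adds 4, and the recency adds 3, so the total of 10 lands on the lower edge of "kuning".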

## Functions for Saving and Processing Data

In [53]:
def save_to_mongodb(db, collection_name, data):
    """Save data to MongoDB."""
    try:
        collection = db[collection_name]
        
        if isinstance(data, list):
            result = collection.insert_many(data)
            print(f"Saved {len(result.inserted_ids)} documents to {collection_name}")
        else:
            result = collection.insert_one(data)
            print(f"Saved 1 document to {collection_name}")
        
        return result
    except Exception as e:
        print(f"Error saving to MongoDB: {e}")
        return None


In [54]:
def save_processed_tweets(db, processed_tweets):
    """Save processed tweets, updating any that already exist."""
    collection = db[COLLECTION_TWEETS]
    
    for tweet in processed_tweets:
        # Update the document matching tweet_id; create it if it does not exist
        collection.update_one(
            {"tweet_id": tweet["tweet_id"]},  # Match on tweet_id
            {"$set": tweet},  # Update all fields
            upsert=True  # Insert if no match is found
        )
    
    print(f"✅ {len(processed_tweets)} tweets processed and saved")
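
Because `update_one(..., upsert=True)` matches on `tweet_id`, running the save step twice should update documents in place rather than duplicate them. The sketch below mimics that behaviour with a plain dictionary standing in for the collection, so it runs without a MongoDB server. `upsert_by_tweet_id` and the sample tweets are illustrative assumptions, not part of the notebook.

```python
def upsert_by_tweet_id(store: dict, tweets: list) -> int:
    """Merge tweets into store keyed by tweet_id; return how many were newly inserted."""
    inserted = 0
    for tweet in tweets:
        key = tweet["tweet_id"]
        if key not in store:
            store[key] = {}
            inserted += 1
        # Like {"$set": tweet}: overwrite the given fields, keep any others
        store[key].update(tweet)
    return inserted


store = {}
batch = [
    {"tweet_id": "t1", "text": "contoh", "risk_level": "aman"},
    {"tweet_id": "t2", "text": "contoh lain", "risk_level": "kuning"},
]
first = upsert_by_tweet_id(store, batch)

# Re-processing the same tweets updates them instead of adding duplicates
batch[0]["risk_level"] = "hijau"
second = upsert_by_tweet_id(store, batch)

print(first, second, len(store), store["t1"]["risk_level"])  # 2 0 2 hijau
```

For large batches, pymongo's `bulk_write` with a list of `UpdateOne(filter, update, upsert=True)` operations would likely cut the number of round trips, though that is a suggestion rather than what the notebook does.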

In [55]:
def process_tweets(db, tweets):
    """Process tweets: clean the text, analyze sentiment, score risk, and save."""
    processed_tweets = []
    alerts = []
    
    for tweet in tweets:
        print(f"\n📝 Processing: {tweet['text'][:50]}...")
        
        # Preprocessing
        cleaned_text = preprocess_text(tweet['text'])
        
        # Sentiment analysis V2 (FIXED)
        sentiment_result = analyze_sentiment(cleaned_text)
        
        print(f"   Sentiment: {sentiment_result['sentiment']}, Score: {sentiment_result['score']}")
        print(f"   Category: {sentiment_result['category']}")
        if sentiment_result['detected_words']:
            print(f"   Key words: {sentiment_result['detected_words'][:3]}")
        
        # Risk level V2 (FIXED)
        risk_level, risk_score = calculate_risk_level(tweet, sentiment_result)
        
        print(f"   Risk Level: {risk_level}, Score: {risk_score}/20")
        
        # Update tweet
        tweet.update({
            'processed_text': cleaned_text,
            'sentiment': sentiment_result['sentiment'],
            'bullying_detected': sentiment_result['bullying_detected'],
            'category': sentiment_result['category'],
            'risk_level': risk_level,
            'risk_score': risk_score,
            'processed': True,
            'processed_at': datetime.now(),
            'sentiment_score': sentiment_result['score'],
            'is_support_context': sentiment_result.get('is_support_context', False),
            'is_victim_context': sentiment_result.get('is_victim_context', False)
        })
        
        processed_tweets.append(tweet)
        
        # Create an alert for high-risk tweets
        if risk_level in ["merah", "kuning"]:
            alert = {
                "alert_id": f"alert_{tweet.get('tweet_id', 'unknown')}_{int(time.time())}",
                "tweet_id": tweet.get('tweet_id'),
                "school": tweet.get('school'),
                "city": tweet.get('city'),
                "risk_level": risk_level,
                "risk_score": risk_score,
                "text": tweet['text'][:200],
                "sentiment": sentiment_result['sentiment'],
                "category": sentiment_result['category'],
                "created_at": datetime.now(),
                "status": "new",
                "alert_type": "tweet_analysis",
                "priority": "high" if risk_level == "merah" else "medium"
            }
            alerts.append(alert)
            print(f"   🚨 ALERT CREATED: {risk_level}")
    
    # Save to MongoDB
    if processed_tweets:
        save_processed_tweets(db, processed_tweets)
        print(f"\n✅ Saved {len(processed_tweets)} processed tweets")
    
    if alerts:
        save_to_mongodb(db, COLLECTION_ALERTS, alerts)
        print(f"✅ Created {len(alerts)} alerts")
    
    return processed_tweets, alerts

## Main Pipeline - Generate and Process Data

In [56]:
# Display sample data
def display_sample_data(db):
    """Display sample tweets and CCTV logs in the notebook."""
    
    print("\n" + "="*60)
    print("📋 SAMPLE TWEET AND CCTV LOG DATA")
    print("="*60)
    
    # Fetch data from MongoDB
    tweets_collection = db[COLLECTION_TWEETS]
    cctv_collection = db[COLLECTION_CCTV]
    
    # Fetch the 10 most recent tweets
    latest_tweets = list(tweets_collection.find({"processed": True})
                        .sort("created_at", -1)
                        .limit(10))
    
    print("\n🔹 **10 LATEST TWEETS:**")
    print("-" * 80)
    for i, tweet in enumerate(latest_tweets, 1):
        print(f"{i}. [{tweet.get('created_at', 'N/A')}]")
        print(f"   City: {tweet.get('city', 'N/A')}")
        print(f"   School: {tweet.get('school', 'N/A')}")
        print(f"   Text: {tweet.get('text', 'N/A')[:100]}...")
        print(f"   Sentiment: {tweet.get('sentiment', 'N/A')}")
        print(f"   Risk Level: {tweet.get('risk_level', 'N/A')}")
        print(f"   Category: {tweet.get('category', 'N/A')}")
        print("-" * 80)
    
    # Fetch the 10 most recent CCTV logs
    latest_cctv = list(cctv_collection.find()
                      .sort("timestamp", -1)
                      .limit(10))
    
    print("\n🔹 **10 LATEST CCTV LOGS:**")
    print("-" * 80)
    for i, log in enumerate(latest_cctv, 1):
        print(f"{i}. [ID: {log.get('cctv_id', 'N/A')}]")
        print(f"   Time: {log.get('timestamp', 'N/A')}")
        print(f"   Location: {log.get('location', 'N/A')}")
        print(f"   School: {log.get('school', 'N/A')}")
        print(f"   Crowd: {log.get('crowd_level', 'N/A')} people")
        print(f"   Noise: {log.get('noise_level', 'N/A')} dB")
        print(f"   Anomaly: {'✅ YES' if log.get('is_anomaly', False) else '❌ NO'}")
        print(f"   Warning Level: {log.get('warning_level', 'N/A')}")
        print("-" * 80)
    
    return latest_tweets, latest_cctv

In [57]:
def main_pipeline():
    """Main pipeline for generating and processing data."""
    print("=" * 50)
    print("STARTING BULLYING ANALYSIS PIPELINE")
    print("=" * 50)
    
    # 1. Connect to MongoDB
    print("\n1. Connecting to MongoDB...")
    # Try Atlas first
    client, db = connect_mongodb(use_atlas=True)

    # If Atlas fails, try the local instance
    if db is None:
        print("\n🔄 MongoDB Atlas failed, trying local MongoDB...")
        client, db = connect_mongodb(use_atlas=False)

    # Stop early if neither connection works; every later step needs db
    if db is None:
        raise RuntimeError("Could not connect to MongoDB (Atlas or local)")
    
    # 2. Generate dummy data
    print("\n2. Generating dummy data...")
    print("   - Generating tweets...")
    dummy_tweets = generate_dummy_tweets(1500)  # 1500 tweets
    
    print("   - Generating CCTV logs...")
    cctv_logs = generate_cctv_data(300)  # 300 CCTV logs
    
    # 3. Save raw data
    print("\n3. Saving raw data to MongoDB...")
    save_to_mongodb(db, COLLECTION_TWEETS, dummy_tweets[:500])  # Save the first 500 for now
    save_to_mongodb(db, COLLECTION_CCTV, cctv_logs)
    
    # 4. Process tweets with NLP
    print("\n4. Processing tweets with NLP...")
    processed_tweets, alerts = process_tweets(db, dummy_tweets[:500])
    
    # 5. Generate school data
    print("\n5. Generating school data...")
    schools_data = []
    for city in CITIES_INDONESIA[:5]:  # Take the first 5 cities
        for i in range(1, 4):
            school = {
                "school_id": f"school_{city.lower()}_{i}",
                "name": f"SMP Negeri {i} {city}",
                "city": city,
                "type": "SMP",
                "total_students": random.randint(300, 800),
                "counselor_count": random.randint(1, 3),
                "cctv_count": random.randint(5, 15),
                "risk_level": random.choice(["hijau", "kuning", "merah"]),
                "last_incident": datetime.now() - timedelta(days=random.randint(0, 90))
            }
            schools_data.append(school)
    
    save_to_mongodb(db, COLLECTION_SCHOOLS, schools_data)
    latest_tweets, latest_cctv = display_sample_data(db)
    
    print("\n" + "=" * 50)
    print("PIPELINE COMPLETE!")
    print(f"   • {len(processed_tweets)} tweets processed")
    print(f"   • {len(alerts)} alerts created")
    print(f"   • {len(cctv_logs)} CCTV logs")
    print(f"   • {len(schools_data)} school records")
    print("=" * 50)
    
    return db, processed_tweets, alerts, cctv_logs, schools_data

## Visualization and Dashboard

In [58]:
def create_choropleth_heatmap(anomaly_df):
    """Build a bubble map of CCTV anomalies per Indonesian city (uses px.scatter_geo, not a true choropleth)."""
    if anomaly_df.empty or 'city' not in anomaly_df.columns:
        print("⚠️  No data available for the heatmap")
        return None
    
    # Count anomalies per city
    city_anomalies = anomaly_df.groupby('city').size().reset_index(name='anomali_count')
    
    # Add coordinates
    city_anomalies['lat'] = city_anomalies['city'].apply(
        lambda x: CITY_COORDINATES.get(x, {}).get('lat', 0)
    )
    city_anomalies['lon'] = city_anomalies['city'].apply(
        lambda x: CITY_COORDINATES.get(x, {}).get('lon', 0)
    )
    
    # Keep only cities that have coordinates
    city_anomalies = city_anomalies[
        (city_anomalies['lat'] != 0) & (city_anomalies['lon'] != 0)
    ]
    
    if city_anomalies.empty:
        print("⚠️  No cities with valid coordinates")
        return None
    
    # Build the map
    fig = px.scatter_geo(
        city_anomalies,
        lat='lat',
        lon='lon',
        size='anomali_count',
        color='anomali_count',
        hover_name='city',
        hover_data={'anomali_count': True, 'lat': False, 'lon': False},
        size_max=30,
        projection='natural earth',
        title='CCTV Anomaly Heatmap in Indonesia',
        color_continuous_scale='RdYlGn_r',  # Red-Yellow-Green (reversed)
        scope='asia',
        center={'lat': -2.5, 'lon': 118},  # Center of the Indonesia map
        fitbounds='locations'
    )
    
    # Update layout
    fig.update_geos(
        resolution=50,
        showcoastlines=True,
        coastlinecolor="Black",
        showland=True,
        landcolor="lightgray",
        showocean=True,
        oceancolor="lightblue",
        showcountries=True,
        countrycolor="black"
    )
    
    fig.update_layout(
        height=500,
        margin={"r":0,"t":30,"l":0,"b":0}
    )
    
    return fig
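
The aggregation inside `create_choropleth_heatmap` (count anomalies per city, attach coordinates, drop cities without them) can be sketched without pandas or Plotly. The function name `anomalies_per_city` and the sample records are made up for illustration. The two coordinates are copied from the `CITY_COORDINATES` table in the dashboard code. Unlike the notebook function, which receives an already filtered DataFrame, this sketch filters on `is_anomaly` itself.

```python
from collections import Counter

# Subset of the dashboard's CITY_COORDINATES table
CITY_COORDINATES = {
    "Jakarta": {"lat": -6.2088, "lon": 106.8456},
    "Mataram": {"lat": -8.5833, "lon": 116.1167},
}


def anomalies_per_city(records, coordinates):
    """Return one row per city with its anomaly count and coordinates."""
    counts = Counter(
        r["city"] for r in records if r.get("is_anomaly") and "city" in r
    )
    rows = []
    for city, n in counts.most_common():
        coord = coordinates.get(city)
        if coord is None:
            continue  # mirrors the filter that drops cities without coordinates
        rows.append(
            {"city": city, "anomali_count": n, "lat": coord["lat"], "lon": coord["lon"]}
        )
    return rows


# Hypothetical CCTV log records
logs = [
    {"city": "Jakarta", "is_anomaly": True},
    {"city": "Mataram", "is_anomaly": True},
    {"city": "Jakarta", "is_anomaly": True},
    {"city": "Atlantis", "is_anomaly": True},   # no coordinates, so dropped
    {"city": "Mataram", "is_anomaly": False},   # not an anomaly, so ignored
]
rows = anomalies_per_city(logs, CITY_COORDINATES)
for row in rows:
    print(row["city"], row["anomali_count"])
```

Using `None` for a missing city avoids the `(0, 0)` sentinel that the notebook function relies on, which would also discard a real location sitting exactly on the equator or the prime meridian.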

In [59]:
def create_visualizations(db):
    """Build visualizations from the processed data (fixed version)."""
    print("\n" + "="*60)
    print("📊 BUILDING VISUALIZATIONS WITH THE CORRECTED LOGIC")
    print("="*60)
    
    # Fetch data from MongoDB
    tweets_collection = db[COLLECTION_TWEETS]
    cctv_collection = db[COLLECTION_CCTV]
    alerts_collection = db[COLLECTION_ALERTS]
    schools_collection = db[COLLECTION_SCHOOLS]
    
    # Convert to DataFrames for analysis
    tweets_df = pd.DataFrame(list(tweets_collection.find({"processed": True})))
    cctv_df = pd.DataFrame(list(cctv_collection.find()))
    alerts_df = pd.DataFrame(list(alerts_collection.find()))
    schools_df = pd.DataFrame(list(schools_collection.find()))
    
    print("📊 Data from MongoDB:")
    print(f"   • Processed tweets: {len(tweets_df)}")
    print(f"   • CCTV logs: {len(cctv_df)}")
    print(f"   • Alerts: {len(alerts_df)}")
    print(f"   • Schools: {len(schools_df)}")
    
    # ========== 1. SENTIMENT DISTRIBUTION ==========
    print("\n1. 📈 Sentiment distribution...")
    if not tweets_df.empty and 'sentiment' in tweets_df.columns:
        # Check the actual distribution
        sentiment_counts = tweets_df['sentiment'].value_counts()
        
        print("   📊 Actual sentiment distribution:")
        total = len(tweets_df)
        for sentiment, count in sentiment_counts.items():
            percentage = (count / total) * 100
            print(f"   - {sentiment}: {count} ({percentage:.1f}%)")
        
        # Make sure every category is present (missing ones count as 0)
        # and order them: positif, netral, negatif
        all_sentiments = ['positif', 'netral', 'negatif']
        sentiment_counts = sentiment_counts.reindex(all_sentiments, fill_value=0)
        
        # Build the pie chart
        fig1 = px.pie(
            values=sentiment_counts.values,
            names=sentiment_counts.index,
            title='Tweet Sentiment Distribution',
            color=sentiment_counts.index,
            color_discrete_map={'positif': '#2E7D32', 'netral': '#1976D2', 'negatif': '#C62828'},
            hole=0.3
        )
        
        fig1.update_traces(
            textinfo='percent+label',
            textposition='inside',
            marker=dict(line=dict(color='white', width=2))
        )
        
        fig1.update_layout(
            title_x=0.5,
            height=500,
            showlegend=True,
            legend=dict(
                orientation="h",
                yanchor="bottom",
                y=-0.2,
                xanchor="center",
                x=0.5
            )
        )
        
        fig1.show()
    else:
        print("   ⚠️ Sentiment data not available")
    
    # ========== 2. RISK LEVEL DISTRIBUTION (CORRECT ORDER) ==========
    print("\n2. ⚠️ Risk level distribution...")
    if not tweets_df.empty and 'risk_level' in tweets_df.columns:
        # Correct order: merah (highest) -> kuning -> hijau -> aman (lowest)
        correct_order = ['merah', 'kuning', 'hijau', 'aman']
        
        # Count, fill missing levels with 0, and sort by correct_order
        risk_counts = tweets_df['risk_level'].value_counts()
        risk_counts = risk_counts.reindex(correct_order, fill_value=0)
        
        print("   📊 Risk level distribution (correct order):")
        total = len(tweets_df)
        for level, count in risk_counts.items():
            percentage = (count / total) * 100
            print(f"   - {level}: {count} ({percentage:.1f}%)")
        
        # Colors follow the risk order
        color_map = {
            'merah': '#D32F2F',      # Red
            'kuning': '#FBC02D',     # Yellow
            'hijau': '#388E3C',      # Green
            'aman': '#1976D2'        # Blue
        }
        
        fig2 = px.bar(
            x=risk_counts.index,
            y=risk_counts.values,
            title='Risk Level Distribution (Order: Merah → Kuning → Hijau → Aman)',
            labels={'x': 'Risk Level', 'y': 'Tweet Count'},
            color=risk_counts.index,
            color_discrete_map=color_map,
            text=risk_counts.values,
            category_orders={"x": correct_order}
        )
        
        fig2.update_traces(
            texttemplate='%{text}',
            textposition='outside',
            marker=dict(line=dict(color='white', width=1))
        )
        
        fig2.update_layout(
            title_x=0.5,
            height=500,
            showlegend=False,
            xaxis=dict(
                title='Risk Level (↓ decreasing risk)',
                tickmode='array',
                tickvals=correct_order,
                ticktext=['🔴 MERAH (High)', '🟡 KUNING (Medium)', '🟢 HIJAU (Low)', '🔵 AMAN (Normal)']
            ),
            yaxis_title='Tweet Count'
        )
        
        fig2.show()
    else:
        print("   ⚠️ Risk level data not available")
    
    # ========== 3. BULLYING CATEGORY DISTRIBUTION ==========
    print("\n3. 🏷️ Bullying category distribution...")
    if not tweets_df.empty and 'category' in tweets_df.columns:
        category_counts = tweets_df['category'].value_counts()
        
        print("   📊 Detected categories:")
        total = len(tweets_df)
        for category, count in category_counts.head(10).items():  # Show the top 10
            percentage = (count / total) * 100
            print(f"   - {category}: {count} ({percentage:.1f}%)")
        
        # Sort by count (descending)
        category_counts = category_counts.sort_values(ascending=False)
        
        # Color mapping for the categories
        category_colors = {
            'korban_direct': '#C62828',     # Dark Red
            'pelaku': '#FF9800',           # Orange
            'korban_potensial': '#FFB74D', # Light Orange
            'saksi': '#1976D2',            # Blue
            'support': '#388E3C',          # Green
            'report': '#7B1FA2',           # Purple
            'positif_umum': '#0097A7',     # Teal
            'unknown': '#757575'           # Grey
        }
        
        fig3 = px.bar(
            x=category_counts.index,
            y=category_counts.values,
            title='Bullying Category Distribution',
            labels={'x': 'Category', 'y': 'Tweet Count'},
            color=category_counts.index,
            color_discrete_map=category_colors,
            text=category_counts.values
        )
        
        fig3.update_traces(
            texttemplate='%{text}',
            textposition='outside',
            marker=dict(line=dict(color='white', width=1))
        )
        
        fig3.update_layout(
            title_x=0.5,
            height=600,
            showlegend=False,
            xaxis_tickangle=-45,
            yaxis_title='Tweet Count'
        )
        
        fig3.show()
    else:
        print("   ⚠️ Category data not available")
    
    # ========== 4. CCTV ANOMALY HEATMAP ==========
    print("\n4. 📍 CCTV anomaly heatmap...")
    if not cctv_df.empty and 'is_anomaly' in cctv_df.columns:
        anomaly_df = cctv_df[cctv_df['is_anomaly'] == True]
        print(f"   📊 CCTV anomalies detected: {len(anomaly_df)} records")
        
        if not anomaly_df.empty and 'city' in anomaly_df.columns:
            # Count anomalies per city
            city_counts = anomaly_df['city'].value_counts().reset_index()
            city_counts.columns = ['city', 'anomali_count']
            
            print("   📍 Distribution per city (top 5):")
            for idx, row in city_counts.head(5).iterrows():
                print(f"   - {row['city']}: {row['anomali_count']} anomalies")
            
            # Try the map first
            heatmap_fig = create_choropleth_heatmap(anomaly_df)
            
            if heatmap_fig:
                heatmap_fig.show()
            else:
                # Fall back to a bar chart
                fig_bar = px.bar(
                    city_counts.head(10),
                    x='city',
                    y='anomali_count',
                    title='CCTV Anomalies per City (Top 10)',
                    color='anomali_count',
                    color_continuous_scale='Reds',
                    text='anomali_count'
                )
                
                fig_bar.update_traces(
                    texttemplate='%{text}',
                    textposition='outside'
                )
                
                fig_bar.update_layout(
                    title_x=0.5,
                    height=500,
                    xaxis_tickangle=-45,
                    yaxis_title='Anomaly Count'
                )
                
                fig_bar.show()
        else:
            print("   ⚠️ City data not available for CCTV anomalies")
    else:
        print("   ⚠️ CCTV data not available")
    
    # ========== 5. ALERT TREND ==========
    print("\n5. 📅 Daily alert trend...")
    if not alerts_df.empty and 'created_at' in alerts_df.columns:
        alerts_df['date'] = pd.to_datetime(alerts_df['created_at']).dt.date
        daily_alerts = alerts_df.groupby('date').size().reset_index(name='alert_count')
        
        print(f"   📊 Total days with alerts: {len(daily_alerts)}")
        
        if len(daily_alerts) > 1:
            fig4 = px.line(
                daily_alerts,
                x='date',
                y='alert_count',
                title='Daily Alert Trend',
                markers=True,
                line_shape='spline'
            )
            
            # Add an area fill
            fig4.update_traces(
                fill='tozeroy',
                fillcolor='rgba(231, 76, 60, 0.2)',
                line=dict(color='#E74C3C', width=3)
            )
            
            fig4.update_layout(
                title_x=0.5,
                height=400,
                xaxis_title='Date',
                yaxis_title='Alert Count',
                hovermode='x unified'
            )
            
            fig4.show()
        else:
            print("   ⚠️ Not enough alert data for trend analysis")
    else:
        print("   ⚠️ Alert data not available")
    
    # ========== 6. FULL INTERACTIVE DASHBOARD ==========
    print("\n6. 🎨 Building the full interactive dashboard...")
    
    # Build 3x2 subplots for a more complete view
    fig = make_subplots(
        rows=3, cols=2,
        subplot_titles=(
            'Sentiment Distribution', 
            'Risk Level (Correct Order)',
            'Bullying Categories', 
            '7-Day Alert Trend',
            'CCTV Anomalies per Location',
            'Risk Score Distribution'
        ),
        specs=[
            [{'type': 'pie'}, {'type': 'bar'}],
            [{'type': 'bar'}, {'type': 'scatter'}],
            [{'type': 'bar'}, {'type': 'histogram'}]
        ],
        vertical_spacing=0.08,
        horizontal_spacing=0.1
    )
    
    # Plot 1: Sentiment pie chart (row 1, col 1)
    if not tweets_df.empty and 'sentiment' in tweets_df.columns:
        sentiment_counts = tweets_df['sentiment'].value_counts()
        
        # Make sure every category is present (missing ones count as 0)
        sentiment_counts = sentiment_counts.reindex(['positif', 'netral', 'negatif'], fill_value=0)
        
        fig.add_trace(
            go.Pie(
                labels=sentiment_counts.index,
                values=sentiment_counts.values,
                name="Sentiment",
                marker_colors=['#2E7D32', '#1976D2', '#C62828'],
                hole=0.4,
                textinfo='percent+label',
                showlegend=False
            ),
            row=1, col=1
        )
    
    # Plot 2: Risk level per city (row 1, col 2) - CORRECT ORDER
    if not tweets_df.empty and 'city' in tweets_df.columns and 'risk_level' in tweets_df.columns:
        # Take the top 6 cities
        top_cities = tweets_df['city'].value_counts().head(6).index
        tweets_top = tweets_df[tweets_df['city'].isin(top_cities)]
        
        # Group in the correct order
        correct_order = ['merah', 'kuning', 'hijau', 'aman']
        risk_by_city = tweets_top.groupby(['city', 'risk_level']).size().unstack(fill_value=0)
        
        # Make sure every column is present
        for level in correct_order:
            if level not in risk_by_city.columns:
                risk_by_city[level] = 0
        
        # Order the columns
        risk_by_city = risk_by_city[correct_order]
        
        # Colors
        colors = {'merah': '#D32F2F', 'kuning': '#FBC02D', 'hijau': '#388E3C', 'aman': '#1976D2'}
        
        for risk_level in correct_order:
            fig.add_trace(
                go.Bar(
                    x=risk_by_city.index,
                    y=risk_by_city[risk_level],
                    name=f'{risk_level.upper()}',
                    marker_color=colors[risk_level],
                    text=risk_by_city[risk_level],
                    textposition='auto',
                    showlegend=True
                ),
                row=1, col=2
            )
    
    # Plot 3: Bullying categories (row 2, col 1)
    if not tweets_df.empty and 'category' in tweets_df.columns:
        category_counts = tweets_df['category'].value_counts().head(8)
        
        fig.add_trace(
            go.Bar(
                x=category_counts.index,
                y=category_counts.values,
                name='Category',
                marker_color='#7B1FA2',
                text=category_counts.values,
                textposition='auto',
                showlegend=False
            ),
            row=2, col=1
        )
    
    # Plot 4: Trend over the last 7 days (row 2, col 2)
    if not alerts_df.empty and 'created_at' in alerts_df.columns:
        alerts_df['created_at'] = pd.to_datetime(alerts_df['created_at'])
        last_7_days = datetime.now() - timedelta(days=7)
        # .copy() avoids pandas' SettingWithCopyWarning when 'date' is added below
        recent_alerts = alerts_df[alerts_df['created_at'] >= last_7_days].copy()
        
        if not recent_alerts.empty:
            recent_alerts['date'] = recent_alerts['created_at'].dt.date
            daily_recent = recent_alerts.groupby('date').size().reset_index(name='alert_count')
            
            fig.add_trace(
                go.Scatter(
                    x=daily_recent['date'],
                    y=daily_recent['alert_count'],
                    mode='lines+markers',
                    name='Daily Alerts',
                    line=dict(color='#E74C3C', width=3),
                    marker=dict(size=8, color='#C62828'),
                    fill='tozeroy',
                    fillcolor='rgba(231, 76, 60, 0.2)',
                    showlegend=False
                ),
                row=2, col=2
            )
    
    # Plot 5: CCTV anomalies by location (row 3, col 1)
    if not cctv_df.empty and 'is_anomaly' in cctv_df.columns:
        anomaly_locations = cctv_df[cctv_df['is_anomaly'] == True]
        
        if not anomaly_locations.empty and 'location' in anomaly_locations.columns:
            location_counts = anomaly_locations['location'].value_counts()
            
            fig.add_trace(
                go.Bar(
                    x=location_counts.index,
                    y=location_counts.values,
                    name='CCTV Anomalies',
                    marker_color='#FF9800',
                    text=location_counts.values,
                    textposition='auto',
                    showlegend=False
                ),
                row=3, col=1
            )
    
    # Plot 6: Risk score distribution (row 3, col 2)
    if not tweets_df.empty and 'risk_score' in tweets_df.columns:
        fig.add_trace(
            go.Histogram(
                x=tweets_df['risk_score'],
                name='Risk Score',
                marker_color='#388E3C',
                nbinsx=20,
                showlegend=False
            ),
            row=3, col=2
        )
    
    # Update the dashboard layout
    fig.update_layout(
        height=1000,
        width=1400,
        title_text="🚨 BULLYING MONITORING DASHBOARD - LOGIC FIXED",
        title_font_size=20,
        title_x=0.5,
        showlegend=True,
        legend=dict(
            yanchor="top",
            y=0.99,
            xanchor="left",
            x=1.02,
            bgcolor='rgba(255, 255, 255, 0.9)',
            bordercolor='lightgray',
            borderwidth=1
        ),
        template='plotly_white',
        hovermode='closest'
    )
    
    # Update axes for all subplots
    fig.update_xaxes(title_text="City", row=1, col=2)
    fig.update_yaxes(title_text="Tweet Count", row=1, col=2)
    
    fig.update_xaxes(title_text="Category", row=2, col=1)
    fig.update_yaxes(title_text="Count", row=2, col=1)
    
    fig.update_xaxes(title_text="Date", row=2, col=2)
    fig.update_yaxes(title_text="Alert Count", row=2, col=2)
    
    fig.update_xaxes(title_text="CCTV Location", row=3, col=1)
    fig.update_yaxes(title_text="Anomaly Count", row=3, col=1)
    
    fig.update_xaxes(title_text="Risk Score", row=3, col=2)
    fig.update_yaxes(title_text="Frequency", row=3, col=2)
    
    fig.show()
    
    print("\n" + "="*60)
    print("✅ VISUALIZATION COMPLETE WITH THE CORRECTED LOGIC!")
    print("="*60)
    print("\n📋 SUMMARY OF FIXES APPLIED:")
    print("1. ✅ Sentiment: 3 categories (positif, netral, negatif)")
    print("2. ✅ Risk Level: correct order (merah→kuning→hijau→aman)")
    print("3. ✅ Categories: 8+ detailed categories for in-depth analysis")
    print("4. ✅ Colors: consistent with meaning (red=high, green=low)")
    print("5. ✅ Dashboard: more informative 3x2 layout")
    print("="*60)
    
    return tweets_df, cctv_df, alerts_df, schools_df
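
Several charts in `create_visualizations` rely on the same step: count labels, force a fixed order, and report 0 for labels that never occur. A dependency-free sketch of that step follows. `ordered_counts` is a hypothetical helper and the sample labels are invented. In pandas the equivalent is `value_counts().reindex(order, fill_value=0)`.

```python
from collections import Counter


def ordered_counts(labels, order):
    """Return (label, count, percent) tuples in the given order, with 0 for absent labels."""
    counts = Counter(labels)
    total = len(labels)
    result = []
    for label in order:
        n = counts.get(label, 0)
        pct = (n / total * 100) if total else 0.0  # guard against an empty input
        result.append((label, n, round(pct, 1)))
    return result


# Hypothetical risk levels for five tweets; "kuning" never occurs
risk_levels = ["aman", "hijau", "aman", "merah", "aman"]
summary = ordered_counts(risk_levels, ["merah", "kuning", "hijau", "aman"])
for label, n, pct in summary:
    print(f"{label}: {n} ({pct:.1f}%)")
```

The percentages are taken over all labels, matching the notebook's use of `len(tweets_df)` as the denominator, so they only sum to 100 when every label appears in `order`.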

## Streamlit Dashboard Functions

In [None]:
def create_streamlit_dashboard():
    """Build a dashboard with a similar interface, with the errors fixed and a map added."""
    dashboard_code = '''# dashboard_final_with_map.py
# Bullying Detection System - Final Dashboard with Map
# Universitas Mataram - Teknik Informatika 2025/2026

import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime, timedelta
from pymongo import MongoClient
from pymongo.server_api import ServerApi
import time
import urllib.parse
import warnings
warnings.filterwarnings('ignore')
from plotly.subplots import make_subplots
import random

# ========== MONGODB ATLAS CONFIGURATION ==========
MONGODB_USERNAME = "f1d02310107"
MONGODB_PASSWORD = "bigdata123"
MONGODB_CLUSTER = "cluster0.zzt6aot"

# URL-encode the username and password
encoded_username = urllib.parse.quote_plus(MONGODB_USERNAME)
encoded_password = urllib.parse.quote_plus(MONGODB_PASSWORD)

# Connection string
MONGODB_ATLAS_URI = f"mongodb+srv://{encoded_username}:{encoded_password}@{MONGODB_CLUSTER}.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0"

DB_NAME = "bullying_detection"
COLLECTION_TWEETS = "tweets"
COLLECTION_CCTV = "cctv_logs"
COLLECTION_ALERTS = "alerts"
COLLECTION_SCHOOLS = "schools"

# Coordinates of cities in Indonesia
CITY_COORDINATES = {
    "Jakarta": {"lat": -6.2088, "lon": 106.8456},
    "Surabaya": {"lat": -7.2575, "lon": 112.7521},
    "Bandung": {"lat": -6.9175, "lon": 107.6191},
    "Medan": {"lat": 3.5952, "lon": 98.6722},
    "Semarang": {"lat": -6.9667, "lon": 110.4167},
    "Makassar": {"lat": -5.1477, "lon": 119.4327},
    "Palembang": {"lat": -2.9911, "lon": 104.7567},
    "Depok": {"lat": -6.4025, "lon": 106.7942},
    "Tangerang": {"lat": -6.1781, "lon": 106.6300},
    "Bekasi": {"lat": -6.2349, "lon": 106.9920},
    "Mataram": {"lat": -8.5833, "lon": 116.1167},
    "Denpasar": {"lat": -8.6500, "lon": 115.2167},
    "Yogyakarta": {"lat": -7.8014, "lon": 110.3644},
    "Malang": {"lat": -7.9833, "lon": 112.6333},
    "Surakarta": {"lat": -7.5667, "lon": 110.8167}
}

# ========== SETUP PAGE ==========
st.set_page_config(
    page_title="Bullying Detection System - Dashboard",
    page_icon="🚨",
    layout="wide"
)

# ========== CSS CUSTOM ==========
st.markdown("""
<style>
    .main-header {
        font-size: 2.5rem;
        color: white;
        text-align: center;
        margin-bottom: 1rem;
        padding: 1rem;
        background: linear-gradient(90deg, #1E3A8A, #3B82F6);
        border-radius: 10px;
        box-shadow: 0 4px 6px rgba(0,0,0,0.1);
    }
    .metric-card {
        background-color: #f8f9fa;
        padding: 1rem;
        border-radius: 10px;
        border-left: 5px solid #3B82F6;
        margin-bottom: 1rem;
        transition: transform 0.3s;
    }
    .metric-card:hover {
        transform: translateY(-3px);
        box-shadow: 0 6px 12px rgba(0,0,0,0.1);
    }
    .sub-header {
        font-size: 1.5rem;
        color: #2D3748;
        margin-top: 1.5rem;
        margin-bottom: 1rem;
        padding-bottom: 0.5rem;
        border-bottom: 2px solid #4F46E5;
    }
    .stTabs [data-baseweb="tab-list"] {
        gap: 10px;
    }
    .stTabs [data-baseweb="tab"] {
        height: 50px;
        font-weight: 600;
        border-radius: 10px 10px 0 0;
    }
    .data-container {
        max-height: 600px;
        overflow-y: auto;
        border: 1px solid #e0e0e0;
        border-radius: 10px;
        padding: 15px;
        margin-top: 10px;
        background-color: #f9f9f9;
    }
</style>
""", unsafe_allow_html=True)

# ========== FUNGSI KONEKSI MONGODB ==========
@st.cache_resource
def init_connection():
    """Connect to MongoDB Atlas and return the database handle, or None on failure."""
    try:
        client = MongoClient(MONGODB_ATLAS_URI, server_api=ServerApi('1'))
        db = client[DB_NAME]
        # Verify the connection with a ping
        client.admin.command('ping')
        return db
    except Exception as e:
        st.sidebar.error(f"❌ MongoDB Error: {str(e)[:100]}")
        return None

# ========== FUNGSI LOAD DATA ==========
@st.cache_data(ttl=30)
def load_mongodb_data():
    """Load tweets, CCTV logs, alerts and schools from MongoDB as lists of dicts.

    Falls back to dummy data when the connection or a query fails.
    """
    db = init_connection()
    
    if db is None:
        st.warning("⚠️ Menggunakan data dummy karena tidak bisa konek ke MongoDB")
        # create_dummy_data() already returns the full 4-tuple
        return create_dummy_data()
    
    try:
        # Fetch each collection and print debug info to the server console
        print("🔍 Loading data from MongoDB...")
        
        # Tweets (only documents flagged processed=True)
        tweets = list(db[COLLECTION_TWEETS].find({"processed": True}))
        print(f"   • Tweets loaded: {len(tweets)}")
        if tweets:
            print(f"   • Tweet fields: {list(tweets[0].keys())[:10]}...")
        
        # CCTV: fetch everything, no filter
        cctv = list(db[COLLECTION_CCTV].find())
        print(f"   • CCTV logs loaded: {len(cctv)}")
        if cctv:
            print(f"   • CCTV fields: {list(cctv[0].keys())}")
        
        # Alerts
        alerts = list(db[COLLECTION_ALERTS].find())
        print(f"   • Alerts loaded: {len(alerts)}")
        
        # Schools
        schools = list(db[COLLECTION_SCHOOLS].find())
        print(f"   • Schools loaded: {len(schools)}")
        
        return tweets, cctv, alerts, schools
        
    except Exception as e:
        st.error(f"Error loading data: {e}")
        import traceback
        traceback.print_exc()
        # create_dummy_data() already returns the full 4-tuple
        return create_dummy_data()

def create_dummy_data():
    """Build dummy data for when MongoDB is unavailable.

    Returns (tweets, cctv_logs, alerts, schools) as lists, mirroring load_mongodb_data().
    """
    print("⚠️ Membuat data dummy...")
    
    cities = list(CITY_COORDINATES.keys())[:10]
    sentiments = ['positif', 'netral', 'negatif']
    risk_levels = ['merah', 'kuning', 'hijau', 'aman']
    categories = ['korban_direct', 'pelaku', 'saksi', 'support', 'report', 'positif_umum']
    locations = ['gerbang', 'lorong', 'kantin', 'lapangan', 'parkir', 'toilet', 'kelas']
    
    # Buat dummy tweets
    tweets_data = []
    for i in range(500):
        city = random.choice(cities)
        sentiment = random.choices(sentiments, weights=[0.2, 0.3, 0.5])[0]
        risk_level = random.choices(risk_levels, weights=[0.3, 0.4, 0.2, 0.1])[0]
        category = random.choice(categories)
        
        tweets_data.append({
            'tweet_id': f'dummy_tweet_{i}',
            'text': f'Contoh tweet tentang bullying di sekolah {i} di kota {city}',
            'city': city,
            'sentiment': sentiment,
            'risk_level': risk_level,
            'risk_score': random.randint(1, 20),
            'bullying_detected': risk_level in ['merah', 'kuning'],
            'created_at': datetime.now() - timedelta(days=random.randint(0, 30)),
            'school': f'SMP Negeri {random.randint(1, 5)} {city}',
            'category': category,
            'processed': True
        })
    
    # Buat dummy CCTV logs
    cctv_data = []
    for i in range(100):
        city = random.choice(cities)
        location = random.choice(locations)
        
        cctv_data.append({
            'log_id': f'cctv_log_{i}',
            'cctv_id': f'cctv_{random.randint(1, 20)}',
            'school': f'SMP Negeri {random.randint(1, 5)} {city}',
            'city': city,
            'location': location,
            'timestamp': datetime.now() - timedelta(hours=random.randint(0, 72)),
            'crowd_level': random.randint(1, 100),
            'noise_level': random.randint(30, 90),
            'is_anomaly': random.choice([True, False, False, False]),  # 25% anomaly
            'warning_level': random.choice(['merah', 'kuning', 'hijau']),
            'processed': False
        })
    
    return tweets_data, cctv_data, [], []
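

# Sketch only (not part of the original notebook): one way to normalise
# whatever a loader returns into a DataFrame. A DataFrame cannot be used in a
# boolean test (`if df:` raises ValueError), so the check is done by type
# instead of truthiness. `to_frame` is a hypothetical helper name.
import pandas as pd


def to_frame(records) -> pd.DataFrame:
    """Convert None, a list of dicts, or a DataFrame into a DataFrame."""
    if records is None:
        return pd.DataFrame()
    if isinstance(records, pd.DataFrame):
        return records
    return pd.DataFrame(list(records))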

# ========== FUNGSI UNTUK PETA ==========
def create_indonesia_heatmap(tweets_df, cctv_df):
    """Buat heatmap peta Indonesia seperti di notebook"""
    if tweets_df.empty and cctv_df.empty:
        return None
    
    heat_data = []
    
    # 1. Hitung tweet risiko tinggi per kota
    if not tweets_df.empty and 'city' in tweets_df.columns and 'risk_level' in tweets_df.columns:
        high_risk_tweets = tweets_df[tweets_df['risk_level'].isin(['merah', 'kuning'])]
        if not high_risk_tweets.empty:
            tweet_counts = high_risk_tweets['city'].value_counts()
            for city, count in tweet_counts.items():
                if city in CITY_COORDINATES:
                    heat_data.append({
                        'city': city,
                        'lat': CITY_COORDINATES[city]['lat'],
                        'lon': CITY_COORDINATES[city]['lon'],
                        'count': count,
                        'type': 'tweet',
                        'label': f'Tweet: {count} risiko tinggi'
                    })
    
    # 2. Hitung anomali CCTV per kota
    if not cctv_df.empty and 'city' in cctv_df.columns and 'is_anomaly' in cctv_df.columns:
        anomaly_df = cctv_df[cctv_df['is_anomaly'] == True]
        if not anomaly_df.empty:
            cctv_counts = anomaly_df['city'].value_counts()
            for city, count in cctv_counts.items():
                if city in CITY_COORDINATES:
                    # Cek apakah kota sudah ada di data
                    found = False
                    for item in heat_data:
                        if item['city'] == city:
                            item['count'] += count * 2  # CCTV lebih berat
                            item['label'] = f"{item['label']} + CCTV: {count}"
                            found = True
                            break
                    if not found:
                        heat_data.append({
                            'city': city,
                            'lat': CITY_COORDINATES[city]['lat'],
                            'lon': CITY_COORDINATES[city]['lon'],
                            'count': count * 2,
                            'type': 'cctv',
                            'label': f'CCTV: {count} anomali'
                        })
    
    if not heat_data:
        return None
    
    heat_df = pd.DataFrame(heat_data)
    
    # Buat scatter map
    fig = px.scatter_geo(
        heat_df,
        lat='lat',
        lon='lon',
        size='count',
        color='count',
        hover_name='city',
        hover_data={'count': True, 'label': True, 'lat': False, 'lon': False},
        size_max=40,
        projection='natural earth',
        title='🗺️ Heatmap Risiko Bullying & Anomali CCTV di Indonesia',
        color_continuous_scale='RdYlGn_r',
        color_continuous_midpoint=heat_df['count'].median(),
        scope='asia',
        center={'lat': -2.5, 'lon': 118},
        template='plotly_white'
    )
    
    # Update geos settings
    fig.update_geos(
        resolution=50,
        showcoastlines=True,
        coastlinecolor="Black",
        showland=True,
        landcolor="lightgray",
        showocean=True,
        oceancolor="lightblue",
        showcountries=True,
        countrycolor="black",
        showlakes=True,
        lakecolor="lightblue"
    )
    
    # Update layout
    fig.update_layout(
        height=550,
        margin={"r": 0, "t": 60, "l": 0, "b": 0},
        title_x=0.5,
        title_font_size=18,
        geo=dict(
            projection_scale=5,
            center=dict(lat=-2.5, lon=118)
        )
    )
    
    return fig
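

# Sketch only (not part of the original notebook): the per-city weighting done
# by the nested loops in create_indonesia_heatmap(), expressed with pandas
# alignment. High-risk tweet counts and CCTV anomaly counts are summed per
# city, with CCTV anomalies weighted double as in the loop version.
# `combine_city_counts` and `cctv_weight` are hypothetical names; unlike the
# function above, this sketch does not filter cities against CITY_COORDINATES.
import pandas as pd


def combine_city_counts(tweet_counts: pd.Series, cctv_counts: pd.Series,
                        cctv_weight: int = 2) -> pd.Series:
    """Return tweet_counts + cctv_weight * cctv_counts over the union of cities."""
    weighted_cctv = cctv_counts * cctv_weight
    # fill_value=0 treats a city missing from one side as having zero events
    return tweet_counts.add(weighted_cctv, fill_value=0).astype(int)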

# ========== FUNGSI VISUALISASI ==========
def create_matching_sentiment_chart(tweets_df):
    """Buat chart sentimen SAMA dengan notebook"""
    if tweets_df.empty or 'sentiment' not in tweets_df.columns:
        return None
    
    sentiment_counts = tweets_df['sentiment'].value_counts()
    
    fig = px.pie(
        values=sentiment_counts.values,
        names=sentiment_counts.index,
        title='Distribusi Sentimen Tweet',
        color=sentiment_counts.index,
        color_discrete_map={'positif': 'green', 'netral': 'blue', 'negatif': 'red'},
        hole=0.3
    )
    
    fig.update_layout(
        title_x=0.5,
        height=400,
        showlegend=True
    )
    
    return fig

def create_matching_risk_chart(tweets_df):
    """Buat chart risk level SAMA dengan notebook"""
    if tweets_df.empty or 'risk_level' not in tweets_df.columns:
        return None
    
    risk_counts = tweets_df['risk_level'].value_counts()
    
    fig = px.bar(
        x=risk_counts.index,
        y=risk_counts.values,
        title='Distribusi Level Risiko',
        labels={'x': 'Level Risiko', 'y': 'Jumlah Tweet'},
        color=risk_counts.index,
        color_discrete_map={'merah': 'red', 'kuning': 'yellow', 'hijau': 'green', 'aman': 'blue'},
        text=risk_counts.values
    )
    
    fig.update_traces(texttemplate='%{text}', textposition='outside')
    fig.update_layout(
        title_x=0.5,
        height=400,
        showlegend=False
    )
    
    return fig

def create_matching_complete_dashboard(tweets_df, cctv_df, alerts_df):
    """Buat dashboard lengkap SAMA dengan notebook"""
    if tweets_df.empty:
        return None
    
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Distribusi Sentimen', 'Level Risiko per Kota',
                       'Trend Alert 7 Hari Terakhir', 'Anomali CCTV per Lokasi'),
        specs=[[{'type': 'pie'}, {'type': 'bar'}],
               [{'type': 'scatter'}, {'type': 'bar'}]],
        vertical_spacing=0.12,
        horizontal_spacing=0.1
    )
    
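    # 1. Sentiment pie chart
    if 'sentiment' in tweets_df.columns:
        sentiment_counts = tweets_df['sentiment'].value_counts()
        # Colour each slice by its label: value_counts() orders by frequency,
        # so a fixed colour list would not line up with the labels
        sentiment_colors = {'positif': 'green', 'netral': 'blue', 'negatif': 'red'}
        fig.add_trace(
            go.Pie(labels=sentiment_counts.index, values=sentiment_counts.values,
                   name="Sentimen",
                   marker_colors=[sentiment_colors.get(s, 'gray') for s in sentiment_counts.index]),
            row=1, col=1
        )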
    
    # 2. Risk level per kota
    if 'city' in tweets_df.columns and 'risk_level' in tweets_df.columns:
        # Ambil top 8 kota
        top_cities = tweets_df['city'].value_counts().head(8).index
        tweets_top = tweets_df[tweets_df['city'].isin(top_cities)]
        risk_by_city = tweets_top.groupby(['city', 'risk_level']).size().unstack(fill_value=0)
        
        colors = {'merah': 'red', 'kuning': 'yellow', 'hijau': 'green', 'aman': 'blue'}
        
        for risk_level in ['merah', 'kuning', 'hijau', 'aman']:
            if risk_level in risk_by_city.columns:
                fig.add_trace(
                    go.Bar(x=risk_by_city.index, y=risk_by_city[risk_level],
                           name=f'Risiko {risk_level}', marker_color=colors[risk_level]),
                    row=1, col=2
                )
    
    # 3. Alert trend over the last 7 days
    if not alerts_df.empty and 'created_at' in alerts_df.columns:
        # Parse into a local Series so the caller's DataFrame is not mutated
        alert_times = pd.to_datetime(alerts_df['created_at'])
        last_7_days = datetime.now() - timedelta(days=7)
        # .copy() avoids SettingWithCopyWarning when the 'date' column is added
        recent_alerts = alerts_df[alert_times >= last_7_days].copy()
        
        if not recent_alerts.empty:
            recent_alerts['date'] = pd.to_datetime(recent_alerts['created_at']).dt.date
            daily_recent = recent_alerts.groupby('date').size().reset_index(name='alert_count')
            
            fig.add_trace(
                go.Scatter(x=daily_recent['date'], y=daily_recent['alert_count'],
                          mode='lines+markers', name='Alert Harian',
                          line=dict(color='red', width=2)),
                row=2, col=1
            )
    
    # 4. CCTV anomalies by location
    if not cctv_df.empty and 'is_anomaly' in cctv_df.columns:
        anomaly_locations = cctv_df[cctv_df['is_anomaly'] == True]
        
        if not anomaly_locations.empty and 'location' in anomaly_locations.columns:
            location_counts = anomaly_locations['location'].value_counts()
            
            fig.add_trace(
                go.Bar(x=location_counts.index, y=location_counts.values,
                       name='Anomali per Lokasi', marker_color='orange'),
                row=2, col=2
            )
    
    fig.update_layout(
        height=800,
        title_text="Dashboard Monitoring Bullying - Konsisten dengan Notebook",
        showlegend=True,
        template='plotly_white'
    )
    
    fig.update_xaxes(title_text="Kota", row=1, col=2)
    fig.update_yaxes(title_text="Jumlah Tweet", row=1, col=2)
    
    fig.update_xaxes(title_text="Tanggal", row=2, col=1)
    fig.update_yaxes(title_text="Jumlah Alert", row=2, col=1)
    
    fig.update_xaxes(title_text="Lokasi", row=2, col=2)
    fig.update_yaxes(title_text="Jumlah Anomali", row=2, col=2)
    
    return fig
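

# Sketch only (not part of the original notebook): the pagination arithmetic
# used by the tweet and CCTV log tabs in main(), pulled into a pure function
# so it can be checked in isolation. `page_bounds` is a hypothetical helper
# name. Clamping the page number into the valid range is an extra safety step
# assumed here; the dashboard relies on the number_input limits instead.
from typing import Tuple


def page_bounds(n_items: int, per_page: int, page: int) -> Tuple[int, int, int]:
    """Return (start_idx, end_idx, total_pages) for a 1-based page number."""
    # Ceiling division, with at least one page even when there are no items
    total_pages = max(1, (n_items + per_page - 1) // per_page)
    page = min(max(page, 1), total_pages)
    start_idx = (page - 1) * per_page
    end_idx = min(start_idx + per_page, n_items)
    return start_idx, end_idx, total_pages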

# ========== FUNGSI UTAMA DASHBOARD ==========
def main():
    # Header
    st.markdown('<h1 class="main-header">🚨 Sistem Deteksi Bullying 🚨</h1>', unsafe_allow_html=True)
    st.markdown("**Dashboard dengan Peta Heatmap Indonesia**")
    
    # Load data
    tweets_data, cctv_data, alerts_data, schools_data = load_mongodb_data()
    
    # Convert to DataFrame
    tweets_df = pd.DataFrame(tweets_data) if tweets_data else pd.DataFrame()
    cctv_df = pd.DataFrame(cctv_data) if cctv_data else pd.DataFrame()
    alerts_df = pd.DataFrame(alerts_data) if alerts_data else pd.DataFrame()
    schools_df = pd.DataFrame(schools_data) if schools_data else pd.DataFrame()
    
    # Debug info in the sidebar
    with st.sidebar.expander("🔍 Debug Info", expanded=False):
        st.write("**Data Loaded:**")
        st.write(f"• Tweets: {len(tweets_df)} rows")
        st.write(f"• CCTV Logs: {len(cctv_df)} rows")
        st.write(f"• Alerts: {len(alerts_df)} rows")
        
        if not tweets_df.empty:
            st.write(f"**Tweet Columns:** {list(tweets_df.columns)[:10]}")
        
        if not cctv_df.empty:
            st.write(f"**CCTV Columns:** {list(cctv_df.columns)}")
    
    # ========== SIDEBAR ==========
    st.sidebar.title("⚙️ Kontrol Dashboard")
    
    # Refresh button
    if st.sidebar.button("🔄 Refresh Data", use_container_width=True):
        st.cache_data.clear()
        # Also drop the cached connection so a failed (None) connection is retried
        st.cache_resource.clear()
        st.rerun()
    
    st.sidebar.markdown("---")
    st.sidebar.title("📊 Statistik Data")
    
    st.sidebar.write(f"**Total Tweet:** {len(tweets_df)}")
    
    if not tweets_df.empty:
        if 'sentiment' in tweets_df.columns:
            neg_count = len(tweets_df[tweets_df['sentiment'] == 'negatif'])
            st.sidebar.write(f"**Sentimen Negatif:** {neg_count}")
        
        if 'risk_level' in tweets_df.columns:
            high_risk = len(tweets_df[tweets_df['risk_level'].isin(['merah', 'kuning'])])
            st.sidebar.write(f"**High Risk:** {high_risk}")
    
    if not cctv_df.empty and 'is_anomaly' in cctv_df.columns:
        anomalies = len(cctv_df[cctv_df['is_anomaly'] == True])
        st.sidebar.write(f"**Anomali CCTV:** {anomalies}")
    
    st.sidebar.markdown("---")
    st.sidebar.caption(f"🕒 Terakhir update: {datetime.now().strftime('%H:%M:%S')}")
    
    # ========== METRICS ==========
    st.markdown('<div class="sub-header">📊 Metrics Real-time</div>', unsafe_allow_html=True)
    
    col1, col2, col3, col4 = st.columns(4)
    
    with col1:
        st.markdown('<div class="metric-card">', unsafe_allow_html=True)
        st.metric("📝 Total Tweet", len(tweets_df))
        st.markdown('</div>', unsafe_allow_html=True)
    
    with col2:
        st.markdown('<div class="metric-card">', unsafe_allow_html=True)
        if not tweets_df.empty and 'sentiment' in tweets_df.columns:
            neg_count = len(tweets_df[tweets_df['sentiment'] == 'negatif'])
            st.metric("😔 Sentimen Negatif", neg_count)
        else:
            st.metric("😔 Sentimen Negatif", 0)
        st.markdown('</div>', unsafe_allow_html=True)
    
    with col3:
        st.markdown('<div class="metric-card">', unsafe_allow_html=True)
        if not tweets_df.empty and 'risk_level' in tweets_df.columns:
            high_risk = len(tweets_df[tweets_df['risk_level'].isin(['merah', 'kuning'])])
            st.metric("🚨 High Risk", high_risk)
        else:
            st.metric("🚨 High Risk", 0)
        st.markdown('</div>', unsafe_allow_html=True)
    
    with col4:
        st.markdown('<div class="metric-card">', unsafe_allow_html=True)
        if not cctv_df.empty and 'is_anomaly' in cctv_df.columns:
            anomalies = len(cctv_df[cctv_df['is_anomaly'] == True])
            st.metric("📹 Anomali CCTV", anomalies)
        else:
            st.metric("📹 Anomali CCTV", 0)
        st.markdown('</div>', unsafe_allow_html=True)
    
    # ========== TABS ==========
    tab1, tab2, tab3, tab4, tab5 = st.tabs([
        "🗺️ Peta Heatmap", 
        "📊 Visualisasi", 
        "📈 Dashboard Lengkap",
        "📝 Tweet & CCTV Log",
        "📋 Data Detail"
    ])
    
    with tab1:
        st.markdown('<div class="sub-header">🗺️ Peta Heatmap Indonesia</div>', unsafe_allow_html=True)
        
        # Buat peta heatmap
        heatmap_fig = create_indonesia_heatmap(tweets_df, cctv_df)
        
        if heatmap_fig:
            st.plotly_chart(heatmap_fig, use_container_width=True)
            
            # Stats di bawah peta
            col1, col2, col3 = st.columns(3)
            with col1:
                if not tweets_df.empty and 'risk_level' in tweets_df.columns:
                    red_tweets = len(tweets_df[tweets_df['risk_level'] == 'merah'])
                    st.metric("🔴 Tweet Merah", red_tweets)
            
            with col2:
                if not tweets_df.empty and 'risk_level' in tweets_df.columns:
                    yellow_tweets = len(tweets_df[tweets_df['risk_level'] == 'kuning'])
                    st.metric("🟡 Tweet Kuning", yellow_tweets)
            
            with col3:
                if not cctv_df.empty and 'is_anomaly' in cctv_df.columns:
                    total_anomalies = len(cctv_df[cctv_df['is_anomaly'] == True])
                    st.metric("📹 Total Anomali", total_anomalies)
        else:
            st.info("Data tidak cukup untuk membuat peta heatmap")
            
            # Fallback: bar chart per kota
            if not tweets_df.empty and 'city' in tweets_df.columns:
                st.subheader("Distribusi per Kota")
                city_counts = tweets_df['city'].value_counts().head(10).reset_index()
                city_counts.columns = ['city', 'count']
                
                fig_bar = px.bar(
                    city_counts,
                    x='city',
                    y='count',
                    title='Jumlah Tweet per Kota',
                    color='count',
                    color_continuous_scale='Reds'
                )
                st.plotly_chart(fig_bar, use_container_width=True)
    
    with tab2:
        st.markdown('<div class="sub-header">📈 Diagram Individual</div>', unsafe_allow_html=True)
        
        col1, col2 = st.columns(2)
        
        with col1:
            fig1 = create_matching_sentiment_chart(tweets_df)
            if fig1:
                st.plotly_chart(fig1, use_container_width=True)
                st.caption("**Distribusi Sentimen Tweet**")
            else:
                st.info("Data sentimen tidak tersedia")
        
        with col2:
            fig2 = create_matching_risk_chart(tweets_df)
            if fig2:
                st.plotly_chart(fig2, use_container_width=True)
                st.caption("**Distribusi Level Risiko**")
            else:
                st.info("Data risk level tidak tersedia")
        
        # Tambahan: Trend waktu
        if not tweets_df.empty and 'created_at' in tweets_df.columns:
            st.subheader("📅 Trend Harian")
            
            try:
                tweets_df['date'] = pd.to_datetime(tweets_df['created_at']).dt.date
                daily_counts = tweets_df.groupby('date').size().reset_index(name='count')
                
                fig_trend = px.line(
                    daily_counts,
                    x='date',
                    y='count',
                    title='Jumlah Tweet per Hari',
                    markers=True
                )
                st.plotly_chart(fig_trend, use_container_width=True)
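            except (ValueError, TypeError) as e:
                # Unparseable timestamps: skip the trend chart but report why
                st.caption(f"Trend harian tidak dapat ditampilkan: {e}")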
    
    with tab3:
        st.markdown('<div class="sub-header">📊 Dashboard Lengkap (2x2 Subplots)</div>', unsafe_allow_html=True)
        
        fig3 = create_matching_complete_dashboard(tweets_df, cctv_df, alerts_df)
        if fig3:
            st.plotly_chart(fig3, use_container_width=True)
            st.caption("**Dashboard lengkap dengan 4 visualisasi**")
        else:
            st.info("Data tidak cukup untuk membuat dashboard lengkap")
            
            # Fallback: simple dashboard
            if not tweets_df.empty:
                col1, col2 = st.columns(2)
                with col1:
                    fig_fallback1 = create_matching_sentiment_chart(tweets_df)
                    if fig_fallback1:
                        st.plotly_chart(fig_fallback1, use_container_width=True)
                
                with col2:
                    fig_fallback2 = create_matching_risk_chart(tweets_df)
                    if fig_fallback2:
                        st.plotly_chart(fig_fallback2, use_container_width=True)
    
    # Tab 4: Tweet & CCTV Log - FIXED VERSION
    with tab4:
        st.markdown('<div class="sub-header">📝 Data Tweet & CCTV Log Lengkap</div>', unsafe_allow_html=True)
        
        # Sub-tabs for tweets and CCTV logs
        sub_tab1, sub_tab2 = st.tabs(["📨 Semua Tweet", "📹 CCTV Log"])
        
        with sub_tab1:
            if tweets_df.empty:
                st.info("📭 Tidak ada data tweet yang tersedia")
                st.write("Jalankan pipeline di notebook untuk generate data tweet")
            else:
                st.markdown("**🔍 Filter Data Tweet:**")
                
                # Dapatkan unique values untuk filter
                unique_cities = ["Semua"]
                if 'city' in tweets_df.columns:
                    city_list = sorted([str(c) for c in tweets_df['city'].dropna().unique()])
                    unique_cities += city_list[:15]  # Batasi ke 15 kota pertama
                
                col1, col2, col3 = st.columns(3)
                with col1:
                    selected_city = st.selectbox(
                        "Pilih Kota:",
                        unique_cities,
                        key="city_filter_tab4"
                    )
                
                with col2:
                    selected_risk = st.selectbox(
                        "Pilih Risk Level:",
                        ["Semua", "merah", "kuning", "hijau", "aman"],
                        key="risk_filter_tab4"
                    )
                
                with col3:
                    selected_sentiment = st.selectbox(
                        "Pilih Sentimen:",
                        ["Semua", "positif", "netral", "negatif"],
                        key="sentiment_filter_tab4"
                    )
                
                # Filter data
                filtered_tweets = tweets_df.copy()
                
                if selected_city != "Semua" and 'city' in filtered_tweets.columns:
                    filtered_tweets = filtered_tweets[filtered_tweets['city'] == selected_city]
                
                if selected_risk != "Semua" and 'risk_level' in filtered_tweets.columns:
                    filtered_tweets = filtered_tweets[filtered_tweets['risk_level'] == selected_risk]
                
                if selected_sentiment != "Semua" and 'sentiment' in filtered_tweets.columns:
                    filtered_tweets = filtered_tweets[filtered_tweets['sentiment'] == selected_sentiment]
                
                # Tampilkan jumlah hasil
                st.markdown(f"**📊 Menampilkan {len(filtered_tweets)} dari {len(tweets_df)} tweet**")
                
                if not filtered_tweets.empty:
                    # Pagination
                    items_per_page = st.selectbox(
                        "Items per page:",
                        [10, 25, 50, 100],
                        index=1,
                        key="tweet_pagination_tab4"
                    )
                    
                    total_pages = max(1, (len(filtered_tweets) + items_per_page - 1) // items_per_page)
                    page_number = st.number_input(
                        "Page:",
                        min_value=1,
                        max_value=total_pages,
                        value=1,
                        step=1,
                        key="tweet_page_tab4"
                    )
                    
                    start_idx = (page_number - 1) * items_per_page
                    end_idx = min(start_idx + items_per_page, len(filtered_tweets))
                    
                    st.write(f"**Halaman {page_number}/{total_pages}** (Item {start_idx+1}-{end_idx})")
                    
                    # Container dengan scroll
                    st.markdown('<div class="data-container">', unsafe_allow_html=True)
                    
                    for idx in range(start_idx, end_idx):
                        tweet = filtered_tweets.iloc[idx]
                        
                        # Format date jika ada
                        created_at = tweet.get('created_at', 'N/A')
                        if isinstance(created_at, (datetime, pd.Timestamp)):
                            created_at_str = created_at.strftime('%Y-%m-%d %H:%M')
                        else:
                            created_at_str = str(created_at)
                        
                        # Format city
                        city = str(tweet.get('city', 'N/A'))
                        
                        # Risk color mapping
                        risk_level = tweet.get('risk_level', 'aman')
                        risk_color = {
                            'merah': '🔴',
                            'kuning': '🟡', 
                            'hijau': '🟢',
                            'aman': '🔵'
                        }.get(risk_level, '⚪')
                        
                        # Buat expander
                        with st.expander(f"Tweet dari {city} - {created_at_str}", expanded=False):
                            col_a, col_b = st.columns([3, 1])
                            
                            with col_a:
                                # Text (potong jika terlalu panjang)
                                text = str(tweet.get('text', 'N/A'))
                                if len(text) > 300:
                                    text = text[:300] + "..."
                                st.write(f"**💬 Text:** {text}")
                                
                                # School
                                school = str(tweet.get('school', 'N/A'))
                                st.write(f"**🏫 Sekolah:** {school}")
                            
                            with col_b:
                                st.write(f"{risk_color} **Risk:** {risk_level}")
                                
                                # Sentiment dengan emoji
                                sentiment = tweet.get('sentiment', 'N/A')
                                sentiment_emoji = {
                                    'positif': '😊',
                                    'netral': '😐', 
                                    'negatif': '😔'
                                }.get(sentiment, '❓')
                                st.write(f"{sentiment_emoji} **Sentimen:** {sentiment}")
                                
                                # Risk score
                                risk_score = tweet.get('risk_score', 'N/A')
                                st.write(f"📊 **Score:** {risk_score}/20")
                            
                            # Footer info
                            st.caption(f"ID: {tweet.get('tweet_id', 'N/A')} • Category: {tweet.get('category', 'N/A')}")
                    
                    st.markdown('</div>', unsafe_allow_html=True)
                    
                    # Download button
                    if 'text' in filtered_tweets.columns:
                        download_cols = ['text', 'city', 'school', 'sentiment', 'risk_level', 'risk_score', 'created_at']
                        available_cols = [col for col in download_cols if col in filtered_tweets.columns]
                        
                        if available_cols:
                            csv_tweets = filtered_tweets[available_cols].to_csv(index=False)
                            st.download_button(
                                label="📥 Download Data Tweet (CSV)",
                                data=csv_tweets,
                                file_name=f"tweet_data_{datetime.now().strftime('%Y%m%d_%H%M')}.csv",
                                mime="text/csv",
                                key="download_tweets_tab4"
                            )
                else:
                    st.info("Tidak ada tweet yang sesuai dengan filter")
        
        with sub_tab2:
            if cctv_df.empty:
                st.info("📭 Tidak ada data CCTV yang tersedia")
                st.write("Jalankan fungsi `generate_cctv_data()` di notebook untuk membuat data CCTV")
            else:
                st.markdown("**🔍 Filter CCTV Log:**")
                
                # Dapatkan unique values untuk filter
                cctv_cities = ["Semua"]
                if 'city' in cctv_df.columns:
                    city_list = sorted([str(c) for c in cctv_df['city'].dropna().unique()])
                    cctv_cities += city_list[:10]  # Batasi ke 10 kota pertama
                
                cctv_locations = ["Semua"]
                if 'location' in cctv_df.columns:
                    location_list = sorted([str(l) for l in cctv_df['location'].dropna().unique()])
                    cctv_locations += location_list
                
                col1, col2, col3 = st.columns(3)
                with col1:
                    cctv_city_filter = st.selectbox(
                        "Pilih Kota CCTV:",
                        cctv_cities,
                        key="cctv_city_filter_tab4"
                    )
                
                with col2:
                    cctv_location_filter = st.selectbox(
                        "Pilih Lokasi:",
                        cctv_locations,
                        key="cctv_location_filter_tab4"
                    )
                
                with col3:
                    cctv_anomaly_filter = st.selectbox(
                        "Status Anomali:",
                        ["Semua", "Anomali", "Normal"],
                        key="cctv_anomaly_filter_tab4"
                    )
                
                # Filter CCTV data
                filtered_cctv = cctv_df.copy()
                
                if cctv_city_filter != "Semua" and 'city' in filtered_cctv.columns:
                    filtered_cctv = filtered_cctv[filtered_cctv['city'] == cctv_city_filter]
                
                if cctv_location_filter != "Semua" and 'location' in filtered_cctv.columns:
                    filtered_cctv = filtered_cctv[filtered_cctv['location'] == cctv_location_filter]
                
                # Only filter on anomaly status when the column actually exists
                if cctv_anomaly_filter != "Semua" and 'is_anomaly' in filtered_cctv.columns:
                    if cctv_anomaly_filter == "Anomali":
                        filtered_cctv = filtered_cctv[filtered_cctv['is_anomaly'] == True]
                    else:
                        filtered_cctv = filtered_cctv[filtered_cctv['is_anomaly'] == False]
                
                # Show the number of results
                st.markdown(f"**📊 Menampilkan {len(filtered_cctv)} dari {len(cctv_df)} log CCTV**")
                
                if not filtered_cctv.empty:
                    # Pagination untuk CCTV
                    cctv_items_per_page = st.selectbox(
                        "Items per page CCTV:",
                        [10, 25, 50, 100],
                        index=1,
                        key="cctv_items_tab4"
                    )
                    
                    cctv_total_pages = max(1, (len(filtered_cctv) + cctv_items_per_page - 1) // cctv_items_per_page)
                    cctv_page_number = st.number_input(
                        "Page CCTV:",
                        min_value=1,
                        max_value=cctv_total_pages,
                        value=1,
                        step=1,
                        key="cctv_page_tab4"
                    )
                    
                    cctv_start_idx = (cctv_page_number - 1) * cctv_items_per_page
                    cctv_end_idx = min(cctv_start_idx + cctv_items_per_page, len(filtered_cctv))
                    
                    st.write(f"**Halaman {cctv_page_number}/{cctv_total_pages}** (Item {cctv_start_idx+1}-{cctv_end_idx})")
                    
                    # Container dengan scroll
                    st.markdown('<div class="data-container">', unsafe_allow_html=True)
                    
                    for idx in range(cctv_start_idx, cctv_end_idx):
                        log = filtered_cctv.iloc[idx]
                        
                        # Format timestamp
                        timestamp = log.get('timestamp', 'N/A')
                        if isinstance(timestamp, (datetime, pd.Timestamp)):
                            timestamp_str = timestamp.strftime('%Y-%m-%d %H:%M')
                        else:
                            timestamp_str = str(timestamp)
                        
                        # Anomaly status
                        is_anomaly = log.get('is_anomaly', False)
                        anomaly_status = "🔴 ANOMALI" if is_anomaly else "🟢 NORMAL"
                        anomaly_color = "🔴" if is_anomaly else "🟢"
                        
                        # CCTV ID - coba beberapa kemungkinan field
                        cctv_id = log.get('cctv_id', 'N/A')
                        if cctv_id == 'N/A':
                            cctv_id = log.get('log_id', 'N/A')  # Coba field alternatif
                        
                        # Buat expander
                        with st.expander(f"{anomaly_color} CCTV {cctv_id} - {timestamp_str}", expanded=False):
                            col_a, col_b = st.columns(2)
                            
                            with col_a:
                                # School
                                school = str(log.get('school', 'N/A'))
                                st.write(f"**🏫 Sekolah:** {school}")
                                
                                # Location
                                location = str(log.get('location', 'N/A'))
                                st.write(f"**📍 Lokasi:** {location}")
                                
                                # City
                                city = str(log.get('city', 'N/A'))
                                st.write(f"**🌆 Kota:** {city}")
                            
                            with col_b:
                                # Status
                                st.write(f"**📊 Status:** {anomaly_status}")
                                
                                # Warning level
                                warning_level = log.get('warning_level', 'N/A')
                                warning_emoji = {
                                    'merah': '🔴',
                                    'kuning': '🟡',
                                    'hijau': '🟢'
                                }.get(warning_level, '⚪')
                                st.write(f"{warning_emoji} **Warning:** {warning_level}")
                                
                                # Metrics
                                crowd_level = log.get('crowd_level', 'N/A')
                                noise_level = log.get('noise_level', 'N/A')
                                st.write(f"**👥 Keramaian:** {crowd_level} orang")
                                st.write(f"**🔊 Kebisingan:** {noise_level} dB")
                    
                    st.markdown('</div>', unsafe_allow_html=True)
                    
                    # Download button untuk CCTV
                    available_cctv_cols = []
                    possible_cols = ['timestamp', 'cctv_id', 'log_id', 'school', 'city', 'location', 
                                   'crowd_level', 'noise_level', 'is_anomaly', 'warning_level']
                    
                    for col in possible_cols:
                        if col in filtered_cctv.columns:
                            available_cctv_cols.append(col)
                    
                    if available_cctv_cols:
                        # Buat copy untuk download
                        download_cctv = filtered_cctv[available_cctv_cols].copy()
                        
                        # Format timestamp untuk CSV
                        if 'timestamp' in download_cctv.columns:
                            download_cctv['timestamp'] = download_cctv['timestamp'].apply(
                                lambda x: x.strftime('%Y-%m-%d %H:%M') if isinstance(x, (datetime, pd.Timestamp)) else str(x)
                            )
                        
                        csv_cctv = download_cctv.to_csv(index=False)
                        st.download_button(
                            label="📥 Download Data CCTV (CSV)",
                            data=csv_cctv,
                            file_name=f"cctv_logs_{datetime.now().strftime('%Y%m%d_%H%M')}.csv",
                            mime="text/csv",
                            key="download_cctv_tab4"
                        )
                else:
                    st.info("Tidak ada log CCTV yang sesuai dengan filter")
    
    # Tab 5: Data Detail
    with tab5:
        st.markdown('<div class="sub-header">📋 Data Detail dari MongoDB</div>', unsafe_allow_html=True)
        
        if not tweets_df.empty:
            # Tampilkan distribusi
            col1, col2, col3 = st.columns(3)
            
            with col1:
                st.write("**📊 Distribusi Sentimen:**")
                if 'sentiment' in tweets_df.columns:
                    sentiment_counts = tweets_df['sentiment'].value_counts()
                    for sent, count in sentiment_counts.items():
                        percentage = (count / len(tweets_df)) * 100
                        st.write(f"- {sent}: {count} ({percentage:.1f}%)")
                else:
                    st.write("Kolom 'sentiment' tidak ditemukan")
            
            with col2:
                st.write("**⚠️ Distribusi Risk Level:**")
                if 'risk_level' in tweets_df.columns:
                    risk_counts = tweets_df['risk_level'].value_counts()
                    for risk, count in risk_counts.items():
                        percentage = (count / len(tweets_df)) * 100
                        st.write(f"- {risk}: {count} ({percentage:.1f}%)")
                else:
                    st.write("Kolom 'risk_level' tidak ditemukan")
            
            with col3:
                st.write("**📍 Top 5 Kota:**")
                if 'city' in tweets_df.columns:
                    city_counts = tweets_df['city'].value_counts().head(5)
                    for city, count in city_counts.items():
                        percentage = (count / len(tweets_df)) * 100
                        st.write(f"- {city}: {count} ({percentage:.1f}%)")
                else:
                    st.write("Kolom 'city' tidak ditemukan")
            
            # Show sample data
            st.subheader("📊 Sample Data Tweet (10 terbaru)")
            
            # Sort by date jika ada
            if 'created_at' in tweets_df.columns:
                tweets_df_sorted = tweets_df.sort_values('created_at', ascending=False)
            else:
                tweets_df_sorted = tweets_df
            
            # Pilih kolom untuk ditampilkan
            show_cols = ['text', 'city', 'school', 'sentiment', 'risk_level', 'risk_score', 'category', 'created_at']
            available_cols = [col for col in show_cols if col in tweets_df_sorted.columns]
            
            if available_cols:
                sample_df = tweets_df_sorted[available_cols].head(10).copy()
                
                # Format tanggal
                if 'created_at' in sample_df.columns:
                    sample_df['created_at'] = pd.to_datetime(sample_df['created_at']).dt.strftime('%Y-%m-%d %H:%M')
                
                # Truncate long text; str(x) is applied before slicing so non-string values do not raise
                if 'text' in sample_df.columns:
                    sample_df['text'] = sample_df['text'].apply(lambda x: str(x)[:100] + '...' if len(str(x)) > 100 else str(x))
                
                # Tampilkan dataframe
                st.dataframe(sample_df, use_container_width=True, height=400)
                
                # Download button
                csv = sample_df.to_csv(index=False)
                st.download_button(
                    label="📥 Download Sample Data (CSV)",
                    data=csv,
                    file_name=f"sample_tweets_{datetime.now().strftime('%Y%m%d')}.csv",
                    mime="text/csv",
                    key="download_sample_tab5"
                )
            else:
                st.info("Kolom data tidak tersedia")
        else:
            st.info("Tidak ada data tweet yang tersedia")
        
        # CCTV Data Section
        st.subheader("📹 Data CCTV Log")
        
        if not cctv_df.empty:
            # Tampilkan distribusi CCTV
            col1, col2 = st.columns(2)
            
            with col1:
                st.write("**📍 Lokasi CCTV:**")
                if 'location' in cctv_df.columns:
                    location_counts = cctv_df['location'].value_counts()
                    for loc, count in location_counts.items():
                        percentage = (count / len(cctv_df)) * 100
                        st.write(f"- {loc}: {count} ({percentage:.1f}%)")
            
            with col2:
                st.write("**⚠️ Status Anomali:**")
                if 'is_anomaly' in cctv_df.columns:
                    anomaly_counts = cctv_df['is_anomaly'].value_counts()
                    total = len(cctv_df)
                    for status, count in anomaly_counts.items():
                        status_text = "Anomali" if status else "Normal"
                        percentage = (count / total) * 100
                        st.write(f"- {status_text}: {count} ({percentage:.1f}%)")
            
            # Tampilkan sample CCTV data
            st.write("**Sample CCTV Logs (10 terbaru):**")
            
            # Sort by timestamp jika ada
            if 'timestamp' in cctv_df.columns:
                cctv_df_sorted = cctv_df.sort_values('timestamp', ascending=False)
            else:
                cctv_df_sorted = cctv_df
            
            # Pilih kolom CCTV untuk ditampilkan
            cctv_show_cols = ['cctv_id', 'log_id', 'school', 'city', 'location', 'crowd_level', 'noise_level', 'is_anomaly', 'timestamp']
            cctv_available_cols = [col for col in cctv_show_cols if col in cctv_df_sorted.columns]
            
            if cctv_available_cols:
                cctv_sample = cctv_df_sorted[cctv_available_cols].head(10).copy()
                
                # Format timestamp
                if 'timestamp' in cctv_sample.columns:
                    cctv_sample['timestamp'] = pd.to_datetime(cctv_sample['timestamp']).dt.strftime('%Y-%m-%d %H:%M')
                
                # Format the anomaly boolean for display
                if 'is_anomaly' in cctv_sample.columns:
                    cctv_sample['is_anomaly'] = cctv_sample['is_anomaly'].apply(lambda x: '✅ YA' if x else '❌ TIDAK')
                
                st.dataframe(cctv_sample, use_container_width=True, height=300)
                
                # Download button untuk CCTV
                csv_cctv = cctv_sample.to_csv(index=False)
                st.download_button(
                    label="📥 Download Sample CCTV (CSV)",
                    data=csv_cctv,
                    file_name=f"sample_cctv_{datetime.now().strftime('%Y%m%d')}.csv",
                    mime="text/csv",
                    key="download_cctv_tab5"
                )
            else:
                st.info("Kolom data CCTV tidak tersedia")
        else:
            st.info("Tidak ada data CCTV yang tersedia")
    
    # ========== FOOTER ==========
    st.markdown("---")
    st.markdown("**Sistem Deteksi Bullying** • Teknik Informatika UNRAM • © 2025")
    st.caption(f"Dashboard terakhir di-load: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

if __name__ == "__main__":
    main()'''
    
    with open('dashboard_bullying.py', 'w', encoding='utf-8') as f:
        f.write(dashboard_code)
    
    print("✅ Dashboard dengan PETA berhasil dibuat: dashboard_bullying.py")
    print("🎯 Jalankan dengan: streamlit run dashboard_bullying.py")
    print("")
    print("✨ **PERBAIKAN UTAMA YANG DITERAPKAN:**")
    print("1. ✅ **Tab 4 (Tweet & CCTV Log):**")
    print("   • Pagination untuk data besar")
    print("   • Error handling untuk DataFrame kosong")
    print("   • Format datetime yang benar")
    print("   • Key unik untuk semua Streamlit widgets")
    print("   • Download button untuk CSV")
    
    print("\n2. ✅ **Tab 5 (Data Detail):**")
    print("   • Tampilkan distribusi dengan persentase")
    print("   • Sample data yang informatif")
    print("   • Section terpisah untuk Tweet dan CCTV")
    print("   • Debug info di sidebar")
    
    print("\n3. ✅ **Fungsionalitas Lengkap:**")
    print("   • Peta heatmap Indonesia")
    print("   • Visualisasi sama dengan notebook")
    print("   • Data dummy jika MongoDB error")
    print("   • CSS styling yang konsisten")
    print("   • Refresh button untuk update data")
    
    return dashboard_code
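
The page arithmetic used for the CCTV log list in Tab 4 of the generated dashboard can be checked in isolation. The sketch below restates it as a pure function. `page_bounds` is a hypothetical helper name that does not appear in the dashboard, and clamping an out-of-range page number is an added assumption (in the dashboard, `st.number_input` enforces the range through `min_value` and `max_value`).

```python
from typing import Tuple


def page_bounds(total_items: int, per_page: int, page: int) -> Tuple[int, int, int]:
    """Return (total_pages, start_idx, end_idx) for a 1-based page number."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    # Ceiling division without floats; an empty list still counts as one page.
    total_pages = max(1, (total_items + per_page - 1) // per_page)
    # Assumption: clamp the requested page into the valid range.
    page = min(max(page, 1), total_pages)
    start_idx = (page - 1) * per_page
    # The last page may be shorter than per_page.
    end_idx = min(start_idx + per_page, total_items)
    return total_pages, start_idx, end_idx


# 53 logs at 25 per page: page 3 covers rows 50..52
print(page_bounds(total_items=53, per_page=25, page=3))  # (3, 50, 53)
```

The half-open range `start_idx:end_idx` maps directly onto `DataFrame.iloc`, which is why the dashboard displays `start_idx + 1` as the first item number.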

## Data Export Functions

In [61]:
def setup_export_folders(base_folder="hasil_analisis"):
    """Create the folder structure used for exported data."""
    
    folders = {
        'base': base_folder,
        'tweets': os.path.join(base_folder, "data_tweet"),
        'cctv': os.path.join(base_folder, "data_cctv"),
        'alerts': os.path.join(base_folder, "data_alert"),
        'visualisasi': os.path.join(base_folder, "data_visualisasi"),
        'sample': os.path.join(base_folder, "sample_data"),
        'dashboard': os.path.join(base_folder, "dashboard_assets")
    }
    
    # Create every folder that does not exist yet
    for folder_name, folder_path in folders.items():
        if not os.path.exists(folder_path):
            # exist_ok avoids an error if the folder appears between the check and the call
            os.makedirs(folder_path, exist_ok=True)
            print(f"📁 Folder '{folder_name}' dibuat: {folder_path}")
        else:
            print(f"📁 Folder '{folder_name}' sudah ada: {folder_path}")
    
    return folders
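
A minimal sketch of the same idea as `setup_export_folders`, assuming it is acceptable to skip the per-folder status messages and let `os.makedirs(..., exist_ok=True)` handle folders that already exist. `ensure_folders` is a hypothetical name, and the example runs inside a temporary directory so nothing is written to the project folder.

```python
import os
import tempfile


def ensure_folders(base_folder, subfolders):
    """Create base_folder and each subfolder; return a {name: path} mapping."""
    paths = {"base": base_folder}
    for name in subfolders:
        paths[name] = os.path.join(base_folder, name)
    for path in paths.values():
        # exist_ok=True makes the call safe to repeat
        os.makedirs(path, exist_ok=True)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "hasil_analisis")
    first = ensure_folders(base, ["data_tweet", "data_cctv"])
    second = ensure_folders(base, ["data_tweet", "data_cctv"])  # second call is a no-op
    created = sorted(os.listdir(base))
    print(created)
```

Calling the function twice returns the same mapping, which is the property the export functions rely on when each of them calls `setup_export_folders` again.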


In [62]:
def export_data_to_csv(db, base_folder="hasil_analisis", export_tweets=True, export_cctv=True, export_alerts=True):
    """Export data from MongoDB to CSV files, one folder per data type."""
    print("\n" + "="*60)
    print("📁 EXPORT DATA KE CSV - FOLDER TERPISAH")
    print("="*60)
    
    # Setup folder structure
    folders = setup_export_folders(base_folder)
    
    # Buat timestamp untuk nama file
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    date_only = datetime.now().strftime("%Y%m%d")
    
    try:
        # 1. Export tweets to the /data_tweet/ folder
        if export_tweets:
            print(f"\n🔸 Export Tweet ke: {folders['tweets']}")
            tweets = list(db[COLLECTION_TWEETS].find({"processed": True}))
            if tweets:
                tweets_df = pd.DataFrame(tweets)
                
                # Columns relevant for export
                tweet_export_cols = [
                    'tweet_id', 'text', 'city', 'school', 'sentiment', 
                    'risk_level', 'risk_score', 'category', 'bullying_detected',
                    'created_at', 'processed_at', 'retweet_count', 'like_count'
                ]
                
                # Keep only the columns present in the DataFrame
                available_cols = [col for col in tweet_export_cols if col in tweets_df.columns]
                
                # Copy so the formatting below does not write into a slice of tweets_df
                tweet_export_df = tweets_df[available_cols].copy()
                
                # Format dates
                date_cols = ['created_at', 'processed_at']
                for col in date_cols:
                    if col in tweet_export_df.columns:
                        tweet_export_df[col] = pd.to_datetime(tweet_export_df[col]).dt.strftime('%Y-%m-%d %H:%M:%S')
                
                # Save to CSV in the tweets folder
                tweet_filename = os.path.join(folders['tweets'], f"tweets_export_{timestamp}.csv")
                tweet_summary_filename = os.path.join(folders['tweets'], f"tweets_summary_{date_only}.csv")
                
                tweet_export_df.to_csv(tweet_filename, index=False, encoding='utf-8')
                
                # Summary file: key columns only, at most the first 100 rows
                summary_cols = ['text', 'city', 'sentiment', 'risk_level', 'risk_score', 'created_at']
                summary_cols = [col for col in summary_cols if col in tweet_export_df.columns]
                
                print(f"✅ Tweets exported:")
                print(f"   • File lengkap: {tweet_filename} ({len(tweet_export_df)} rows)")
                if summary_cols:
                    summary_df = tweet_export_df[summary_cols].head(100)
                    summary_df.to_csv(tweet_summary_filename, index=False, encoding='utf-8')
                    # Report the actual row count, which may be below 100
                    print(f"   • File ringkasan: {tweet_summary_filename} ({len(summary_df)} rows)")
                print(f"   • Columns: {len(tweet_export_df.columns)} kolom")
                
                # Risk-level statistics
                if 'risk_level' in tweet_export_df.columns:
                    risk_stats = tweet_export_df['risk_level'].value_counts()
                    print(f"   • Distribusi risiko: {dict(risk_stats)}")
            else:
                print("⚠️ No tweets data to export")
        
        # 2. Export CCTV logs to the /data_cctv/ folder
        if export_cctv:
            print(f"\n🔸 Export CCTV Logs ke: {folders['cctv']}")
            cctv_logs = list(db[COLLECTION_CCTV].find())
            if cctv_logs:
                cctv_df = pd.DataFrame(cctv_logs)
                
                # Columns relevant for export
                cctv_export_cols = [
                    'log_id', 'cctv_id', 'school', 'city', 'location',
                    'timestamp', 'crowd_level', 'noise_level', 
                    'is_anomaly', 'warning_level', 'processed'
                ]
                
                # Keep only the columns that exist
                available_cols = [col for col in cctv_export_cols if col in cctv_df.columns]
                
                # Copy so the formatting below does not write into a slice of cctv_df
                cctv_export_df = cctv_df[available_cols].copy()
                
                # Format dates
                if 'timestamp' in cctv_export_df.columns:
                    cctv_export_df['timestamp'] = pd.to_datetime(cctv_export_df['timestamp']).dt.strftime('%Y-%m-%d %H:%M:%S')
                
                # Format booleans
                if 'is_anomaly' in cctv_export_df.columns:
                    cctv_export_df['is_anomaly'] = cctv_export_df['is_anomaly'].apply(lambda x: 'YA' if x else 'TIDAK')
                
                # Save to CSV in the cctv folder
                cctv_filename = os.path.join(folders['cctv'], f"cctv_logs_export_{timestamp}.csv")
                cctv_export_df.to_csv(cctv_filename, index=False, encoding='utf-8')
                
                print(f"✅ CCTV logs exported:")
                print(f"   • File: {cctv_filename} ({len(cctv_export_df)} rows)")
                print(f"   • Columns: {list(cctv_export_df.columns)}")
                
                # Count anomalies
                if 'is_anomaly' in cctv_export_df.columns:
                    anomaly_count = len(cctv_export_df[cctv_export_df['is_anomaly'] == 'YA'])
                    print(f"   • Anomali terdeteksi: {anomaly_count} ({anomaly_count/len(cctv_export_df)*100:.1f}%)")
            else:
                print("⚠️ No CCTV data to export")
        
        # 3. Export alerts to the /data_alert/ folder
        if export_alerts:
            print(f"\n🔸 Export Alerts ke: {folders['alerts']}")
            alerts = list(db[COLLECTION_ALERTS].find())
            if alerts:
                alerts_df = pd.DataFrame(alerts)
                
                # Columns relevant for export
                alert_export_cols = [
                    'alert_id', 'tweet_id', 'school', 'city', 'risk_level',
                    'risk_score', 'sentiment', 'category', 'status',
                    'created_at', 'alert_type', 'priority'
                ]
                
                # Keep only the columns that exist
                available_cols = [col for col in alert_export_cols if col in alerts_df.columns]
                
                # Copy so the formatting below does not write into a slice of alerts_df
                alert_export_df = alerts_df[available_cols].copy()
                
                # Format dates
                if 'created_at' in alert_export_df.columns:
                    alert_export_df['created_at'] = pd.to_datetime(alert_export_df['created_at']).dt.strftime('%Y-%m-%d %H:%M:%S')
                
                # Save to CSV in the alerts folder
                alert_filename = os.path.join(folders['alerts'], f"alerts_export_{timestamp}.csv")
                alert_export_df.to_csv(alert_filename, index=False, encoding='utf-8')
                
                print(f"✅ Alerts exported:")
                print(f"   • File: {alert_filename} ({len(alert_export_df)} rows)")
                
                # Count alerts with risk level 'merah' or 'kuning'
                if 'risk_level' in alert_export_df.columns:
                    high_priority = len(alert_export_df[alert_export_df['risk_level'].isin(['merah', 'kuning'])])
                    print(f"   • High priority alerts: {high_priority}")
            else:
                print("⚠️ No alerts data to export")
        
        # 4. Write a README file in the base folder
        readme_path = os.path.join(folders['base'], "README.txt")
        with open(readme_path, 'w', encoding='utf-8') as f:
            f.write("="*60 + "\n")
            f.write("HASIL ANALISIS SISTEM DETEKSI BULLYING\n")
            f.write("="*60 + "\n\n")
            f.write(f"Tanggal Ekspor: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
            f.write(f"Folder Base: {folders['base']}\n\n")
            
            f.write("📂 STRUKTUR FOLDER:\n")
            f.write(f"  ├── 📁 data_tweet/      - File CSV data tweet\n")
            f.write(f"  ├── 📁 data_cctv/       - File CSV data CCTV\n")
            f.write(f"  ├── 📁 data_alert/      - File CSV data alert\n")
            f.write(f"  ├── 📁 data_visualisasi/- Data untuk chart\n")
            f.write(f"  ├── 📁 sample_data/     - Data sample\n")
            f.write(f"  └── 📁 dashboard_assets/- Asset untuk dashboard\n\n")
            
            f.write("📊 DATA YANG DIEXPORT:\n")
            if export_tweets and tweets:
                f.write(f"  • Tweet: {len(tweets)} data\n")
            if export_cctv and cctv_logs:
                f.write(f"  • CCTV Logs: {len(cctv_logs)} data\n")
            if export_alerts and alerts:
                f.write(f"  • Alerts: {len(alerts)} data\n")
        
        print(f"\n" + "="*60)
        print(f"✅ SEMUA EKSPOR SELESAI!")
        print(f"📁 Semua file disimpan di: {folders['base']}")
        print(f"📄 File README: {readme_path}")
        print("="*60)
        
        return folders
        
    except Exception as e:
        print(f"❌ Error exporting data: {e}")
        import traceback
        traceback.print_exc()
        return None
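
The select-columns-then-format step that `export_data_to_csv` repeats for tweets, CCTV logs, and alerts can also be written without pandas, since MongoDB documents arrive as plain dictionaries. The sketch below is an alternative using the standard library's `csv.DictWriter`, not the notebook's method. `rows_to_csv` and the two sample documents are hypothetical and exist only for illustration.

```python
import csv
import io
from datetime import datetime


def rows_to_csv(docs, wanted_cols, date_cols=()):
    """Return CSV text with only the wanted columns that occur in the data."""
    present = {key for doc in docs for key in doc}
    columns = [col for col in wanted_cols if col in present]
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys such as "_id" that are not exported
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for doc in docs:
        row = dict(doc)  # shallow copy so the input documents stay untouched
        for col in date_cols:
            if isinstance(row.get(col), datetime):
                row[col] = row[col].strftime("%Y-%m-%d %H:%M:%S")
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical documents shaped like the tweet collection
docs = [
    {"_id": 1, "text": "contoh a", "created_at": datetime(2025, 1, 2, 3, 4, 5)},
    {"_id": 2, "text": "contoh b", "created_at": datetime(2025, 6, 7, 8, 9, 10)},
]
csv_text = rows_to_csv(docs, ["tweet_id", "text", "created_at"], date_cols=["created_at"])
print(csv_text)
```

Because `tweet_id` is absent from the sample documents, it is left out of the header, mirroring the `available_cols` filter in the export function.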

In [63]:
def export_sample_data(db, base_folder="hasil_analisis", sample_size=100):
    """Export sample data for testing to the /sample_data/ folder."""
    print("\n" + "="*60)
    print("📊 EXPORT SAMPLE DATA - FOLDER TERPISAH")
    print("="*60)
    
    # Setup folder structure
    folders = setup_export_folders(base_folder)
    
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    date_only = datetime.now().strftime("%Y%m%d")
    
    try:
        # 1. Sample tweets to the /sample_data/ folder
        print(f"\n🔸 Sample Tweet ke: {folders['sample']}")
        tweets = list(db[COLLECTION_TWEETS].find({"processed": True}).limit(sample_size))
        if tweets:
            tweets_df = pd.DataFrame(tweets)
            
            # Key columns
            tweet_cols = ['text', 'city', 'school', 'sentiment', 'risk_level', 'risk_score', 'category', 'created_at']
            available_cols = [col for col in tweet_cols if col in tweets_df.columns]
            
            # Copy so the formatting below does not write into a slice of tweets_df
            sample_tweets_df = tweets_df[available_cols].copy()
            
            # Format dates
            if 'created_at' in sample_tweets_df.columns:
                sample_tweets_df['created_at'] = pd.to_datetime(sample_tweets_df['created_at']).dt.strftime('%Y-%m-%d %H:%M')
            
            # Save to the sample folder
            sample_tweet_file = os.path.join(folders['sample'], f"sample_tweets_{timestamp}.csv")
            sample_tweets_df.to_csv(sample_tweet_file, index=False, encoding='utf-8')
            
            # Also write a mini sample (at most the first 10 rows)
            mini_sample_file = os.path.join(folders['sample'], f"mini_sample_tweets_{date_only}.csv")
            sample_tweets_df.head(10).to_csv(mini_sample_file, index=False, encoding='utf-8')
            
            print(f"✅ Sample tweets:")
            print(f"   • File lengkap: {sample_tweet_file} ({len(sample_tweets_df)} rows)")
            print(f"   • File mini: {mini_sample_file} ({min(10, len(sample_tweets_df))} rows)")
        
        # 2. Sample CCTV logs to the /sample_data/ folder
        print(f"\n🔸 Sample CCTV Logs ke: {folders['sample']}")
        cctv_logs = list(db[COLLECTION_CCTV].find().limit(sample_size))
        if cctv_logs:
            cctv_df = pd.DataFrame(cctv_logs)
            
            # Key columns
            cctv_cols = ['cctv_id', 'school', 'city', 'location', 'timestamp', 'crowd_level', 'noise_level', 'is_anomaly']
            available_cols = [col for col in cctv_cols if col in cctv_df.columns]
            
            # Copy so the formatting below does not write into a slice of cctv_df
            sample_cctv_df = cctv_df[available_cols].copy()
            
            # Format dates
            if 'timestamp' in sample_cctv_df.columns:
                sample_cctv_df['timestamp'] = pd.to_datetime(sample_cctv_df['timestamp']).dt.strftime('%Y-%m-%d %H:%M')
            
            # Format booleans
            if 'is_anomaly' in sample_cctv_df.columns:
                sample_cctv_df['is_anomaly'] = sample_cctv_df['is_anomaly'].apply(lambda x: 'YA' if x else 'TIDAK')
            
            # Save to the sample folder
            sample_cctv_file = os.path.join(folders['sample'], f"sample_cctv_logs_{timestamp}.csv")
            sample_cctv_df.to_csv(sample_cctv_file, index=False, encoding='utf-8')
            
            # Also write a mini sample (at most the first 10 rows)
            mini_cctv_file = os.path.join(folders['sample'], f"mini_sample_cctv_{date_only}.csv")
            sample_cctv_df.head(10).to_csv(mini_cctv_file, index=False, encoding='utf-8')
            
            print(f"✅ Sample CCTV logs:")
            print(f"   • File lengkap: {sample_cctv_file} ({len(sample_cctv_df)} rows)")
            print(f"   • File mini: {mini_cctv_file} ({min(10, len(sample_cctv_df))} rows)")
        
        print(f"\n✅ Sample data exported successfully!")
        print(f"📁 Semua sample disimpan di: {folders['sample']}")
        
        return True
        
    except Exception as e:
        print(f"❌ Error exporting sample data: {e}")
        return False
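
Note that `find(...).limit(sample_size)` returns the first `sample_size` documents in the order the server yields them, not a random draw. If a random sample is wanted instead, one option is MongoDB's `$sample` aggregation stage. The sketch below only builds the pipeline so that it can be inspected without a database connection. `build_sample_pipeline` is a hypothetical helper, and the commented usage line assumes the notebook's `db` and `COLLECTION_TWEETS` names.

```python
def build_sample_pipeline(sample_size, only_processed=True):
    """Build an aggregation pipeline that draws a random sample of documents."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    pipeline = []
    if only_processed:
        # Same filter as the find() call in export_sample_data
        pipeline.append({"$match": {"processed": True}})
    # $sample randomly selects the requested number of documents
    pipeline.append({"$sample": {"size": sample_size}})
    return pipeline


pipeline = build_sample_pipeline(100)
print(pipeline)

# Assumed usage against the notebook's collection (not executed here):
# tweets = list(db[COLLECTION_TWEETS].aggregate(pipeline))
```

Placing `$match` before `$sample` means the random draw is taken only from processed tweets.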

In [64]:
def export_visualization_data(tweets_df, cctv_df, alerts_df, base_folder="hasil_analisis"):
    """Export processed data for visualization to the /data_visualisasi/ folder."""
    
    # Setup folder structure
    folders = setup_export_folders(base_folder)
    
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    date_only = datetime.now().strftime("%Y%m%d")
    
    print(f"\n" + "="*60)
    print(f"📈 EXPORT DATA VISUALISASI")
    print(f"📁 Folder: {folders['visualisasi']}")
    print("="*60)
    
    try:
        # 1. Export aggregated chart data to the visualization folder
        if not tweets_df.empty:
            print(f"\n🔸 Export data tweet untuk visualisasi...")
            
            # Data for the sentiment pie chart (guarded like the other columns)
            if 'sentiment' in tweets_df.columns:
                sentiment_counts = tweets_df['sentiment'].value_counts().reset_index()
                sentiment_counts.columns = ['sentiment', 'count']
                sentiment_file = os.path.join(folders['visualisasi'], f"sentiment_distribution_{timestamp}.csv")
                sentiment_counts.to_csv(sentiment_file, index=False)
                print(f"   ✅ Sentiment distribution: {sentiment_file}")
            
            # Data for the risk level chart
            if 'risk_level' in tweets_df.columns:
                risk_counts = tweets_df['risk_level'].value_counts().reset_index()
                risk_counts.columns = ['risk_level', 'count']
                risk_file = os.path.join(folders['visualisasi'], f"risk_level_distribution_{timestamp}.csv")
                risk_counts.to_csv(risk_file, index=False)
                print(f"   ✅ Risk level distribution: {risk_file}")
            
            # Data for categories
            if 'category' in tweets_df.columns:
                category_counts = tweets_df['category'].value_counts().reset_index()
                category_counts.columns = ['category', 'count']
                category_file = os.path.join(folders['visualisasi'], f"category_distribution_{timestamp}.csv")
                category_counts.to_csv(category_file, index=False)
                print(f"   ✅ Category distribution: {category_file}")
            
            # Data for the time trend
            if 'created_at' in tweets_df.columns:
                tweets_df['date'] = pd.to_datetime(tweets_df['created_at']).dt.date
                daily_tweets = tweets_df.groupby('date').size().reset_index(name='tweet_count')
                trend_file = os.path.join(folders['visualisasi'], f"tweet_trend_{date_only}.csv")
                daily_tweets.to_csv(trend_file, index=False)
                print(f"   ✅ Tweet trend: {trend_file}")
        
        # 2. Export CCTV anomaly data to the visualisasi folder
        if not cctv_df.empty and 'is_anomaly' in cctv_df.columns:
            print("\n🔸 Exporting CCTV data for visualization...")
            # Copy the filtered rows so adding a 'date' column below does not
            # trigger pandas' SettingWithCopyWarning
            anomaly_data = cctv_df[cctv_df['is_anomaly'] == True].copy()

            if not anomaly_data.empty:
                # Anomalies per location
                if 'location' in anomaly_data.columns:
                    location_counts = anomaly_data['location'].value_counts().reset_index()
                    location_counts.columns = ['location', 'anomaly_count']
                    location_file = os.path.join(folders['visualisasi'], f"cctv_anomaly_location_{timestamp}.csv")
                    location_counts.to_csv(location_file, index=False)
                    print(f"   ✅ CCTV anomaly by location: {location_file}")

                # Anomalies per city
                if 'city' in anomaly_data.columns:
                    city_counts = anomaly_data['city'].value_counts().reset_index()
                    city_counts.columns = ['city', 'anomaly_count']
                    city_file = os.path.join(folders['visualisasi'], f"cctv_anomaly_city_{timestamp}.csv")
                    city_counts.to_csv(city_file, index=False)
                    print(f"   ✅ CCTV anomaly by city: {city_file}")

                # Daily anomaly trend
                if 'timestamp' in anomaly_data.columns:
                    anomaly_data['date'] = pd.to_datetime(anomaly_data['timestamp']).dt.date
                    daily_anomalies = anomaly_data.groupby('date').size().reset_index(name='anomaly_count')
                    anomaly_trend_file = os.path.join(folders['visualisasi'], f"anomaly_trend_{date_only}.csv")
                    daily_anomalies.to_csv(anomaly_trend_file, index=False)
                    print(f"   ✅ Anomaly trend: {anomaly_trend_file}")
        
        # 3. Export alert trend data to the visualisasi folder
        if not alerts_df.empty and 'created_at' in alerts_df.columns:
            print("\n🔸 Exporting alert data for visualization...")
            alerts_df['date'] = pd.to_datetime(alerts_df['created_at']).dt.date
            daily_alerts = alerts_df.groupby('date').size().reset_index(name='alert_count')
            alerts_trend_file = os.path.join(folders['visualisasi'], f"alerts_trend_{timestamp}.csv")
            daily_alerts.to_csv(alerts_trend_file, index=False)
            print(f"   ✅ Alerts trend: {alerts_trend_file}")

            # Alerts by risk level
            if 'risk_level' in alerts_df.columns:
                alert_by_risk = alerts_df['risk_level'].value_counts().reset_index()
                alert_by_risk.columns = ['risk_level', 'count']
                alert_risk_file = os.path.join(folders['visualisasi'], f"alert_by_risk_{timestamp}.csv")
                alert_by_risk.to_csv(alert_risk_file, index=False)
                print(f"   ✅ Alerts by risk level: {alert_risk_file}")
        
        # 4. Write the visualization config file
        config_file = os.path.join(folders['visualisasi'], f"visualization_config_{date_only}.json")
        config_data = {
            "export_date": datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            "tweet_count": len(tweets_df) if not tweets_df.empty else 0,
            "cctv_count": len(cctv_df) if not cctv_df.empty else 0,
            "alert_count": len(alerts_df) if not alerts_df.empty else 0,
            # Static list of base names: the files on disk carry a timestamp or
            # date suffix, and each one is written only when its source column exists
            "files_generated": [
                "sentiment_distribution.csv",
                "risk_level_distribution.csv",
                "category_distribution.csv",
                "tweet_trend.csv",
                "cctv_anomaly_location.csv",
                "cctv_anomaly_city.csv",
                "alerts_trend.csv"
            ]
        }

        # json is already imported in the setup cell
        with open(config_file, 'w', encoding='utf-8') as f:
            json.dump(config_data, f, indent=2, ensure_ascii=False)

        print("\n✅ Visualization data exported successfully!")
        print(f"📁 All visualization files are in: {folders['visualisasi']}")
        print(f"📄 Config file: {config_file}")

        return True

    except Exception as e:
        print(f"❌ Error exporting visualization data: {e}")
        return False
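
The export steps above repeat one pattern: count the values of a column, rename the two columns, write a CSV. A minimal sketch of how that pattern could be factored into a helper follows. The name `export_value_counts` and its parameters are hypothetical and not part of the notebook. Because the helper returns the path it wrote (or `None` when it skips), the caller could build `files_generated` from the files that were actually written instead of a fixed list.

```python
import os
import tempfile

import pandas as pd


def export_value_counts(df, column, out_dir, stem, timestamp, count_name="count"):
    """Write the value counts of `column` to a CSV.

    Returns the path of the written file, or None when the DataFrame is
    empty or the column is missing (hypothetical helper, not in the notebook).
    """
    if df.empty or column not in df.columns:
        return None
    counts = df[column].value_counts().reset_index()
    counts.columns = [column, count_name]
    path = os.path.join(out_dir, f"{stem}_{timestamp}.csv")
    counts.to_csv(path, index=False)
    return path


# Small demonstration with made-up rows and a temporary folder
demo_df = pd.DataFrame({"sentiment": ["negatif", "netral", "negatif"]})
with tempfile.TemporaryDirectory() as tmp:
    written = [
        export_value_counts(demo_df, "sentiment", tmp, "sentiment_distribution", "20250101_000000"),
        # 'risk_level' is absent from demo_df, so this call is skipped
        export_value_counts(demo_df, "risk_level", tmp, "risk_level_distribution", "20250101_000000"),
    ]
    files_generated = [os.path.basename(p) for p in written if p is not None]
```

In this sketch `files_generated` contains only the sentiment file, since the second call finds no `risk_level` column.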


In [65]:
def export_dashboard_assets(tweets_df, cctv_df, base_folder="hasil_analisis"):
    """Export dashboard assets to the /dashboard_assets/ folder."""

    # Set up the folder structure
    folders = setup_export_folders(base_folder)

    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

    print("\n" + "="*60)
    print("🎨 EXPORT DASHBOARD ASSETS")
    print(f"📁 Folder: {folders['dashboard']}")
    print("="*60)
    
    try:
        # 1. Export data for the heatmap
        if not tweets_df.empty and 'city' in tweets_df.columns:
            # Count tweets per city
            city_tweet_counts = tweets_df['city'].value_counts().reset_index()
            city_tweet_counts.columns = ['city', 'tweet_count']

            # Add coordinates when the CITY_COORDINATES mapping is defined.
            # Cities missing from the mapping fall back to lat=0, lon=0.
            if 'CITY_COORDINATES' in globals():
                city_coords = globals()['CITY_COORDINATES']
                city_tweet_counts['lat'] = city_tweet_counts['city'].apply(lambda x: city_coords.get(x, {}).get('lat', 0))
                city_tweet_counts['lon'] = city_tweet_counts['city'].apply(lambda x: city_coords.get(x, {}).get('lon', 0))

            heatmap_file = os.path.join(folders['dashboard'], f"heatmap_data_{timestamp}.csv")
            city_tweet_counts.to_csv(heatmap_file, index=False)
            print(f"✅ Heatmap data: {heatmap_file}")

        # 2. Export the top 20 schools by tweet count
        if not tweets_df.empty and 'school' in tweets_df.columns:
            school_counts = tweets_df['school'].value_counts().head(20).reset_index()
            school_counts.columns = ['school', 'tweet_count']
            school_file = os.path.join(folders['dashboard'], f"top_schools_{timestamp}.csv")
            school_counts.to_csv(school_file, index=False)
            print(f"✅ Top schools data: {school_file}")

        # 3. Export summary metrics ('high_risk_tweets' counts merah and kuning)
        metrics = {
            'total_tweets': len(tweets_df) if not tweets_df.empty else 0,
            'total_cctv_logs': len(cctv_df) if not cctv_df.empty else 0,
            'high_risk_tweets': len(tweets_df[tweets_df['risk_level'].isin(['merah', 'kuning'])]) if not tweets_df.empty and 'risk_level' in tweets_df.columns else 0,
            'cctv_anomalies': len(cctv_df[cctv_df['is_anomaly'] == True]) if not cctv_df.empty and 'is_anomaly' in cctv_df.columns else 0,
            'export_time': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        }

        # json is already imported in the setup cell
        metrics_file = os.path.join(folders['dashboard'], f"dashboard_metrics_{timestamp}.json")
        with open(metrics_file, 'w', encoding='utf-8') as f:
            json.dump(metrics, f, indent=2, ensure_ascii=False)

        print(f"✅ Dashboard metrics: {metrics_file}")

        print("\n🎨 Dashboard assets exported successfully!")
        print(f"📁 All assets are in: {folders['dashboard']}")

        return True

    except Exception as e:
        print(f"❌ Error exporting dashboard assets: {e}")
        return False
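
The heatmap export falls back to latitude 0 and longitude 0 for any city without known coordinates, which would place those rows at a point in the Gulf of Guinea on a map. One possible alternative, sketched below under assumptions, is to leave unknown coordinates missing and drop those rows before plotting. The `city_coords` dictionary here is a hypothetical stand-in for the notebook's `CITY_COORDINATES`, and the coordinate values and the city name "Unknown City" are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for CITY_COORDINATES (approximate, illustrative values)
city_coords = {"Mataram": {"lat": -8.58, "lon": 116.12}}

city_tweet_counts = pd.DataFrame(
    {"city": ["Mataram", "Unknown City"], "tweet_count": [3, 1]}
)

# dict.get without a default returns None for unmapped cities, so the
# coordinate stays missing instead of silently becoming 0
city_tweet_counts["lat"] = city_tweet_counts["city"].map(
    lambda c: city_coords.get(c, {}).get("lat")
)
city_tweet_counts["lon"] = city_tweet_counts["city"].map(
    lambda c: city_coords.get(c, {}).get("lon")
)

# Keep only rows that can be placed on the map
plottable = city_tweet_counts.dropna(subset=["lat", "lon"])
```

Whether dropping such rows is appropriate depends on how the dashboard consumes the CSV, so treat this as one option rather than a required change.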

## Main Function to Run Everything

In [66]:
def run_complete_system():
    """Run the whole system from start to finish."""
    print("=" * 60)
    print("BULLYING DETECTION SYSTEM - WEB SCRAPING & NLP")
    print("=" * 60)

    # Run the main pipeline
    print("\n🚀 STARTING ANALYSIS PIPELINE...")
    result = main_pipeline()

    if result:
        db, processed_tweets, alerts, cctv_logs, schools_data = result

        # ========== SHOW SAMPLE DATA IN THE NOTEBOOK ==========
        print("\n" + "=" * 60)
        print("📋 SAMPLE DATA IN NOTEBOOK")
        print("=" * 60)

        # Fetch data from MongoDB for display
        tweets_collection = db[COLLECTION_TWEETS]
        cctv_collection = db[COLLECTION_CCTV]
        alerts_collection = db[COLLECTION_ALERTS]

        # Fetch the 10 most recent records of each kind
        latest_tweets = list(tweets_collection.find({"processed": True})
                            .sort("created_at", -1)
                            .limit(10))

        latest_cctv = list(cctv_collection.find()
                          .sort("timestamp", -1)
                          .limit(10))

        print("\n🔹 **10 LATEST TWEETS:**")
        print("-" * 100)
        for i, tweet in enumerate(latest_tweets, 1):
            print(f"{i}. [{tweet.get('created_at', 'N/A')}]")
            print(f"   📍 City: {tweet.get('city', 'N/A')}")
            print(f"   🏫 School: {tweet.get('school', 'N/A')}")
            # str(... or 'N/A') guards against a stored None before slicing
            print(f"   💬 Text: {str(tweet.get('text') or 'N/A')[:80]}...")
            print(f"   😊 Sentiment: {tweet.get('sentiment', 'N/A')}")
            print(f"   ⚠️ Risk Level: {tweet.get('risk_level', 'N/A')}")
            print(f"   📊 Score: {tweet.get('risk_score', 'N/A')}/20")
            print(f"   🔍 Category: {tweet.get('category', 'N/A')}")
            print("-" * 100)

        print("\n🔹 **10 LATEST CCTV LOGS:**")
        print("-" * 100)
        for i, log in enumerate(latest_cctv, 1):
            print(f"{i}. [ID: {log.get('cctv_id', log.get('log_id', 'N/A'))}]")
            print(f"   🕐 Time: {log.get('timestamp', 'N/A')}")
            print(f"   📍 Location: {log.get('location', 'N/A')}")
            print(f"   🏫 School: {log.get('school', 'N/A')}")
            print(f"   👥 Crowd: {log.get('crowd_level', 'N/A')} people")
            print(f"   🔊 Noise: {log.get('noise_level', 'N/A')} dB")
            print(f"   ⚠️ Anomaly: {'✅ YES' if log.get('is_anomaly', False) else '❌ NO'}")
            print(f"   🚨 Warning Level: {log.get('warning_level', 'N/A')}")
            print("-" * 100)
        
        # ========== VISUALIZATIONS IN THE NOTEBOOK ==========
        print("\n" + "=" * 60)
        print("📊 CREATING VISUALIZATIONS IN NOTEBOOK")
        print("=" * 60)

        tweets_df, cctv_df, alerts_df, schools_df = create_visualizations(db)

        # ========== SHOW STATISTICS IN THE NOTEBOOK ==========
        print("\n" + "=" * 60)
        print("📈 FULL STATISTICS IN NOTEBOOK")
        print("=" * 60)

        if not tweets_df.empty:
            print("\n📊 TWEET STATISTICS:")
            print(f"   • Total tweets processed: {len(tweets_df)}")

            if 'sentiment' in tweets_df.columns:
                sentiment_counts = tweets_df['sentiment'].value_counts()
                print("   • Sentiment Distribution:")
                for sentiment, count in sentiment_counts.items():
                    percentage = (count / len(tweets_df)) * 100
                    print(f"     - {sentiment}: {count} ({percentage:.1f}%)")

            if 'risk_level' in tweets_df.columns:
                risk_counts = tweets_df['risk_level'].value_counts()
                print("   • Risk Level Distribution:")
                for risk, count in risk_counts.items():
                    percentage = (count / len(tweets_df)) * 100
                    print(f"     - {risk}: {count} ({percentage:.1f}%)")

            if 'category' in tweets_df.columns:
                category_counts = tweets_df['category'].value_counts().head(5)
                print("   • Top 5 Categories:")
                for category, count in category_counts.items():
                    percentage = (count / len(tweets_df)) * 100
                    print(f"     - {category}: {count} ({percentage:.1f}%)")

            if 'city' in tweets_df.columns:
                city_counts = tweets_df['city'].value_counts().head(5)
                print("   • Top 5 Cities:")
                for city, count in city_counts.items():
                    percentage = (count / len(tweets_df)) * 100
                    print(f"     - {city}: {count} ({percentage:.1f}%)")

        if not cctv_df.empty:
            print("\n📹 CCTV STATISTICS:")
            print(f"   • Total CCTV logs: {len(cctv_df)}")

            if 'is_anomaly' in cctv_df.columns:
                anomalies = len(cctv_df[cctv_df['is_anomaly'] == True])
                percentage = (anomalies / len(cctv_df)) * 100 if len(cctv_df) > 0 else 0
                print(f"   • Anomalies detected: {anomalies} ({percentage:.1f}%)")

            if 'location' in cctv_df.columns:
                location_counts = cctv_df['location'].value_counts()
                print("   • Location Distribution:")
                for location, count in location_counts.items():
                    percentage = (count / len(cctv_df)) * 100
                    print(f"     - {location}: {count} ({percentage:.1f}%)")
        
        # ========== SHOW DATAFRAME PREVIEWS IN THE NOTEBOOK ==========
        print("\n" + "=" * 60)
        print("📋 DATAFRAME PREVIEW IN NOTEBOOK")
        print("=" * 60)

        # Show the tweets DataFrame
        if not tweets_df.empty:
            print("\n📝 Dataframe Tweets (5 rows):")
            # Choose the columns to display
            tweet_display_cols = ['text', 'city', 'school', 'sentiment', 'risk_level', 'risk_score', 'category', 'created_at']
            available_cols = [col for col in tweet_display_cols if col in tweets_df.columns]

            if available_cols:
                # Work on a copy so the display formatting does not alter tweets_df
                display_tweets = tweets_df[available_cols].head(5).copy()

                # Truncate long text to 50 characters
                if 'text' in display_tweets.columns:
                    display_tweets['text'] = display_tweets['text'].apply(lambda x: str(x)[:50] + '...' if len(str(x)) > 50 else str(x))

                # Format the date
                if 'created_at' in display_tweets.columns:
                    display_tweets['created_at'] = pd.to_datetime(display_tweets['created_at']).dt.strftime('%Y-%m-%d %H:%M')

                # Print as a table
                print(display_tweets.to_string(index=False))
            else:
                print("   No columns available for display")

        # Show the CCTV DataFrame
        if not cctv_df.empty:
            print("\n📹 Dataframe CCTV Logs (5 rows):")
            # Choose the columns to display
            cctv_display_cols = ['cctv_id', 'school', 'city', 'location', 'timestamp', 'crowd_level', 'noise_level', 'is_anomaly']
            available_cols = [col for col in cctv_display_cols if col in cctv_df.columns]

            if available_cols:
                # Work on a copy so the display formatting does not alter cctv_df
                display_cctv = cctv_df[available_cols].head(5).copy()

                # Format the date
                if 'timestamp' in display_cctv.columns:
                    display_cctv['timestamp'] = pd.to_datetime(display_cctv['timestamp']).dt.strftime('%Y-%m-%d %H:%M')

                # Render booleans as check marks
                if 'is_anomaly' in display_cctv.columns:
                    display_cctv['is_anomaly'] = display_cctv['is_anomaly'].apply(lambda x: '✅' if x else '❌')

                # Print as a table
                print(display_cctv.to_string(index=False))
            else:
                print("   No columns available for display")
        
        # ========== EXPORT TO CSV (AFTER DISPLAY IN THE NOTEBOOK) ==========
        print("\n" + "=" * 60)
        print("📁 EXPORT DATA TO CSV FILES")
        print("=" * 60)

        # Export the full collections from MongoDB
        folders = export_data_to_csv(
            db,
            base_folder="HASIL_ANALISIS_BULLYING",  # Name of the main output folder
            export_tweets=True,
            export_cctv=True,
            export_alerts=True
        )

        # Export the visualization data as well
        export_visualization_data(tweets_df, cctv_df, alerts_df, base_folder="HASIL_ANALISIS_BULLYING")

        # Export sample data
        export_sample_data(db, base_folder="HASIL_ANALISIS_BULLYING", sample_size=50)

        # Export dashboard assets
        export_dashboard_assets(tweets_df, cctv_df, base_folder="HASIL_ANALISIS_BULLYING")

        # ========== CREATE THE STREAMLIT DASHBOARD ==========
        print("\n" + "=" * 60)
        print("🚀 CREATING STREAMLIT DASHBOARD")
        print("=" * 60)

        create_streamlit_dashboard()

        # ========== FINAL SUMMARY ==========
        print("\n" + "=" * 60)
        print("🎉 SYSTEM RAN SUCCESSFULLY!")
        print("=" * 60)

        print("\n📁 OUTPUT PRODUCED:")
        print("   📊 IN THE NOTEBOOK (already shown):")
        print("     • Charts and diagrams")
        print("     • Sample tweet & CCTV data")
        print("     • Full statistics")
        print("     • DataFrame preview")

        print("\n   💾 CSV FILES (for further analysis):")
        print("     • tweets_export_YYYYMMDD_HHMMSS.csv")
        print("     • cctv_logs_export_YYYYMMDD_HHMMSS.csv")
        print("     • alerts_export_YYYYMMDD_HHMMSS.csv")
        print("     • sample_tweets_YYYYMMDD.csv")
        print("     • sample_cctv_logs_YYYYMMDD.csv")

        print("\n   📈 VISUALIZATION DATA:")
        print("     • sentiment_distribution_YYYYMMDD_HHMMSS.csv")
        print("     • risk_level_distribution_YYYYMMDD_HHMMSS.csv")
        print("     • cctv_anomaly_location_YYYYMMDD_HHMMSS.csv")

        print("\n   🌐 STREAMLIT DASHBOARD:")
        print("     • dashboard_bullying.py")

        print("\n" + "=" * 60)
        print("🚀 NEXT STEPS:")
        print("1. The CSV files can be opened in Excel/Google Sheets")
        print("2. Dashboard: streamlit run dashboard_bullying.py")
        print("3. Open a browser at http://localhost:8501")
        print("=" * 60)

        return True

    return False
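
The statistics section of `run_complete_system` repeats the same count-and-percentage loop for sentiment, risk level, category, city, and location. A minimal sketch of a shared formatter follows. The name `format_distribution` is hypothetical and not part of the notebook, and the sample labels are made up for the demonstration. As in the loops above, the percentage is taken over all rows of the series.

```python
import pandas as pd


def format_distribution(series, top=None):
    """Return 'label: count (pct%)' lines for a column (hypothetical helper).

    `top` limits the output to the most frequent labels, mirroring the
    .head(5) calls in the notebook. Percentages use the full series length.
    """
    counts = series.value_counts()
    if top is not None:
        counts = counts.head(top)
    total = len(series)
    return [
        f"{label}: {count} ({count / total * 100:.1f}%)"
        for label, count in counts.items()
    ]


# Made-up risk levels for demonstration
demo_risk = pd.Series(["merah", "merah", "merah", "aman", "aman", "kuning"])
lines = format_distribution(demo_risk)
top_one = format_distribution(demo_risk, top=1)
```

Each of the five blocks could then print its heading and loop over the returned lines.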

In [67]:
run_complete_system()

BULLYING DETECTION SYSTEM - WEB SCRAPING & NLP

🚀 STARTING ANALYSIS PIPELINE...
STARTING BULLYING ANALYSIS PIPELINE

1. Connecting to MongoDB...
🔄 Trying to connect to MongoDB Atlas...
✅ Connection successful!
📁 Collection 'tweets' already exists
📁 Collection 'cctv_logs' already exists
📁 Collection 'schools' already exists
📁 Collection 'alerts' already exists
🔍 Creating indexes for fast queries...
✅ All indexes created successfully!

2. Generating dummy data...
   - Generating tweets...
✅ Generated 1500 tweets with distribution:
   • korban_direct: 375 (25.0%)
   • pelaku: 225 (15.0%)
   • report: 150 (10.0%)
   • saksi: 300 (20.0%)
   • support: 150 (10.0%)
   • positif_umum: 150 (10.0%)
   • korban_potensial: 150 (10.0%)
   - Generating CCTV logs...

3. Saving raw data to MongoDB...
Saved 500 documents to tweets
Saved 300 documents to cctv_logs

4. Processing tweets with NLP...

📝 Processing: kenapa ya aku selalu jadi bahan olokan? capek ment...
   Sentiment: netral, Score: 


2. ⚠️ Risk Level Distribution...
   📊 Risk level distribution (correct order):
   - merah: 5706 (57.1%)
   - kuning: 955 (9.6%)
   - hijau: 164 (1.6%)
   - aman: 3175 (31.8%)



3. 🏷️ Bullying Category Distribution...
   📊 Detected categories:
   - unknown: 5488 (54.9%)
   - saksi: 3040 (30.4%)
   - korban: 731 (7.3%)
   - report: 442 (4.4%)
   - pelaku: 299 (3.0%)



4. 📍 CCTV Anomaly Heatmap...
   📊 CCTV anomalies detected: 2487 records
   📍 Distribution by city (top 5):
   - Bekasi: 195 anomalies
   - Depok: 185 anomalies
   - Mataram: 175 anomalies
   - Surabaya: 174 anomalies
   - Makassar: 170 anomalies



5. 📅 Daily Alert Trend...
   📊 Total days with alerts: 2



6. 🎨 Building Full Interactive Dashboard...



✅ VISUALIZATION COMPLETE WITH CORRECT LOGIC!

📋 SUMMARY OF FIXES APPLIED:
1. ✅ Sentiment: 3 categories (positif, netral, negatif)
2. ✅ Risk Level: correct order (merah→kuning→hijau→aman)
3. ✅ Categories: 8+ detailed categories for in-depth analysis
4. ✅ Colors: consistent with meaning (merah/red = high, hijau/green = low)
5. ✅ Dashboard: more informative 3x2 layout

📈 FULL STATISTICS IN NOTEBOOK

📊 TWEET STATISTICS:
   • Total tweets processed: 10000
   • Sentiment Distribution:
     - negatif: 5304 (53.0%)
     - netral: 4582 (45.8%)
     - positif: 114 (1.1%)
   • Risk Level Distribution:
     - merah: 5706 (57.1%)
     - aman: 3175 (31.8%)
     - kuning: 955 (9.6%)
     - hijau: 164 (1.6%)
   • Top 5 Categories:
     - unknown: 5488 (54.9%)
     - saksi: 3040 (30.4%)
     - korban: 731 (7.3%)
     - report: 442 (4.4%)
     - pelaku: 299 (3.0%)
   • Top 5 Cities:
     - Jakarta: 716 (7.2%)
     - Bekasi: 702 (7.0%)
     - Surakarta: 692 (6.9%)
     - Denpas

True