# YouTube Comment Crawling System (.ipynb)
# Runs in Jupyter, Google Colab, or Visual Studio Code

A script for extracting comments from YouTube videos using the YouTube Data API v3. It is designed for academic research and data analysis.

## Key Features:
- Automatic extraction of comments and replies
- Collection of video metadata (title, duration, views, etc.)
- Sentiment analysis and comment statistics
- Data export to Excel for further analysis
- Robust, user-friendly error handling

## System Requirements:
1. An API key from Google Cloud Console
2. YouTube Data API v3 enabled for the project
3. The required Python libraries (see the installation section below)
4. A stable internet connection

## Academic Uses:
- Sentiment analysis research
- Studies of social media user behavior
- Engagement analysis of digital content
- Data mining for communication research

---

## 1. Installing the Required Libraries

The cell below lists the pip commands for every Python library this notebook needs. The cell itself contains only comments, so run the commands in a terminal if any package is missing:

In [9]:
# 📦 INSTALLING THE REQUIRED PYTHON PACKAGES
# ==========================================
# This cell contains comments only, for reference.

# If the packages are not installed yet, run pip install in a terminal:
# pip install ipywidgets pandas google-api-python-client matplotlib seaborn wordcloud textblob openpyxl xlrd

# Or install them one at a time as needed:
# pip install ipywidgets              # Interactive widgets
# pip install pandas                  # Data manipulation
# pip install google-api-python-client # YouTube API client
# pip install matplotlib              # Data visualization
# pip install seaborn                 # Statistical charts
# pip install wordcloud               # Word frequency analysis
# pip install textblob                # Sentiment analysis
# pip install openpyxl                # Excel .xlsx files
# pip install xlrd                    # Excel .xls files


print("✅ This cell contains comments only, for reference.")

✅ This cell contains comments only, for reference.
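
Because the installation cell does not install anything by itself, it can help to check which packages are actually importable before continuing. The sketch below is an optional addition, not part of the original notebook. The `REQUIRED` mapping and the `missing_packages` helper are hypothetical names, and the mapping assumes the import names shown (for example, `google-api-python-client` is imported as `googleapiclient`).

```python
from importlib.util import find_spec

# pip package name -> importable module name (they differ for some packages)
REQUIRED = {
    "ipywidgets": "ipywidgets",
    "pandas": "pandas",
    "google-api-python-client": "googleapiclient",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "wordcloud": "wordcloud",
    "textblob": "textblob",
    "openpyxl": "openpyxl",
    "xlrd": "xlrd",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose module cannot be found in this environment."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages. Install them with:")
    print("pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```

`find_spec` only looks the module up without importing it, so the check is fast and has no side effects.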


## 2. Importing Libraries and Dependencies

Import all libraries needed by the comment extraction system:

In [10]:
# LIBRARY & DEPENDENCY IMPORTS
# ==========================================

# Core libraries for data handling
import re
import time
import uuid
import json
import pandas as pd
from datetime import datetime
import sys

# YouTube API client for data extraction
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

# Jupyter widgets for the interactive interface
import ipywidgets as widgets
from IPython.display import display, clear_output

print("LIBRARY VERSIONS IN USE:")
print("=" * 45)

# Python and main library versions
print(f"Python: {sys.version.split()[0]}")
print(f"Pandas: {pd.__version__}")
print(f"IPyWidgets: {widgets.__version__}")

# Data visualization libraries (optional, for analysis)
try:
    import matplotlib
    import matplotlib.pyplot as plt
    import seaborn as sns
    print(f"Matplotlib: {matplotlib.__version__}")
    print(f"Seaborn: {sns.__version__}")
    print(">> Visualization libraries loaded")
except ImportError:
    print(">> Visualization libraries not available")

# Text processing libraries for sentiment analysis (optional)
try:
    from textblob import TextBlob
    from wordcloud import WordCloud
    print("TextBlob: available")
    print("WordCloud: available")
    print(">> Text analysis libraries loaded")
except ImportError:
    print(">> Text analysis libraries not available")

# Excel library
try:
    import openpyxl
    print(f"OpenPyXL: {openpyxl.__version__}")
except ImportError:
    print(">> OpenPyXL not available")

print("=" * 45)
print("Library import complete!")
print(f"Time: {datetime.now().strftime('%H:%M:%S')}")

LIBRARY VERSIONS IN USE:
Python: 3.13.1
Pandas: 2.3.1
IPyWidgets: 8.1.7
Matplotlib: 3.10.3
Seaborn: 0.13.2
>> Visualization libraries loaded
TextBlob: available
WordCloud: available
>> Text analysis libraries loaded
OpenPyXL: 3.1.5
Library import complete!
Time: 15:36:56


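## 3. Utility and Helper Functions

This section defines the YouTube URLs Manager: helper functions for extracting and validating video IDs, importing URLs from Excel files, and tracking undo/redo history, together with the widget interface.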
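
URL handling in this section rests on `extract_video_id`, which matches regular expressions against the raw URL string. As a cross-check, the same IDs can be recovered by parsing the URL structure with the standard library. The sketch below is an alternative approach, not the notebook's implementation. The name `video_id_from_url` and the accepted host list in `YOUTUBE_HOSTS` are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse, parse_qs

VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com"}  # assumed host list


def video_id_from_url(url):
    """Return the 11-character video ID, or None if none can be found."""
    if not isinstance(url, str):
        return None
    url = url.strip()
    if VIDEO_ID_RE.fullmatch(url):  # the input is already a bare ID
        return url
    if "://" not in url:  # allow inputs such as "youtu.be/<id>"
        url = "https://" + url
    try:
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
    except ValueError:  # malformed URL
        return None
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        candidate = parsed.path.strip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS and parsed.path == "/watch":
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    elif host in YOUTUBE_HOSTS:
        parts = parsed.path.strip("/").split("/")
        is_embed = len(parts) >= 2 and parts[0] in ("embed", "v")
        candidate = parts[1] if is_embed else ""
    else:
        return None
    return candidate if VIDEO_ID_RE.fullmatch(candidate) else None


for sample in ("https://www.youtube.com/watch?v=dQw4w9WgXcQ",
               "https://youtu.be/9bZkp7q19f0",
               "https://example.com/watch?v=dQw4w9WgXcQ"):
    print(sample, "->", video_id_from_url(sample))
```

Checking the host explicitly rejects look-alike domains, and reading the `v` query parameter works regardless of where it appears in the query string.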

In [11]:
# =========================================================================
# YOUTUBE URLS MANAGER - FULL VERSION WITH MULTIPLE FILES SUPPORT
# =========================================================================

import ipywidgets as widgets
from IPython.display import display, clear_output, HTML
import pandas as pd
import io
import os
from datetime import datetime
import re

# ===== GLOBAL VARIABLES =====
video_data_list = []
action_history = []
history_position = -1
MAX_HISTORY = 20
total_excel_files_uploaded = 0  # Running total of Excel files uploaded so far
uploaded_filenames = set()  # Names of files that have already been uploaded

print("🚀 Loading YouTube URLs Manager...")

# ===== CORE YOUTUBE FUNCTIONS =====
def extract_video_id(url):
    """Extract the video ID from the supported YouTube URL formats"""
    if not url or not isinstance(url, str):
        return None
    
    url = url.strip()
    patterns = [
        r'(?:youtube\.com\/watch\?v=|youtu\.be\/|youtube\.com\/embed\/|youtube\.com\/v\/)([a-zA-Z0-9_-]{11})',
        r'youtube\.com\/watch\?.*v=([a-zA-Z0-9_-]{11})',
        r'^([a-zA-Z0-9_-]{11})$'
    ]
    
    for pattern in patterns:
        match = re.search(pattern, url)
        if match:
            return match.group(1)
    return None

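def validate_video_id(video_id):
    """Check that a YouTube video ID has the expected format; returns a bool"""
    if not isinstance(video_id, str):
        return False
    # fullmatch rejects a trailing newline and bool() avoids returning a Match object
    return bool(re.fullmatch(r'[a-zA-Z0-9_-]{11}', video_id))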

# ===== EXCEL TEMPLATE FUNCTION =====
def create_excel_template():
    """Create an Excel template for uploading YouTube URLs"""
    try:
        template_data = {
            'youtube_url': [
                'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
                'https://youtu.be/9bZkp7q19f0', 
                'https://www.youtube.com/watch?v=kJQP7kiw5Fk',
                '',
                ''
            ]
        }
        
        df = pd.DataFrame(template_data)
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"youtube_urls_template_{timestamp}.xlsx"
        df.to_excel(filename, index=False, engine='openpyxl')
        
        return filename
    except Exception as e:
        print(f"❌ Error creating template: {e}")
        return None

# ===== HISTORY MANAGEMENT FUNCTIONS =====
def save_action_to_history():
    """Save the current textarea state to the history for undo/redo"""
    global action_history, history_position
    
    current_state = {
        'urls': url_textarea.value,
        'timestamp': datetime.now().strftime("%H:%M:%S"),
        'counter': len([url.strip() for url in url_textarea.value.split('\n') if url.strip()])
    }
    
    # Drop any history after the current position (discards unused redo steps)
    action_history = action_history[:history_position + 1]
    
    # Append the new state
    action_history.append(current_state)
    history_position = len(action_history) - 1
    
    # Cap the history length
    if len(action_history) > MAX_HISTORY:
        action_history = action_history[-MAX_HISTORY:]
        history_position = len(action_history) - 1

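def perform_undo():
    """Undo to the previous state"""
    global history_position
    
    # Handlers save the state *before* they change the textarea, so the latest
    # text may not be in the history yet. Record it first so that undo steps
    # back exactly one action and redo can return to it.
    if action_history and action_history[history_position]['urls'] != url_textarea.value:
        save_action_to_history()
    
    if history_position > 0:
        history_position -= 1
        state = action_history[history_position]
        url_textarea.value = state['urls']
        update_url_counter()
        
        with output_area:
            clear_output(wait=True)
            print(f"↶ Undo successful! Back to state {state['timestamp']} ({state['counter']} URLs)")
    else:
        with output_area:
            clear_output(wait=True)
            print("❌ Nothing to undo")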

def perform_redo():
    """Redo to the next state"""
    global history_position
    
    if history_position < len(action_history) - 1:
        history_position += 1
        state = action_history[history_position]
        url_textarea.value = state['urls']
        update_url_counter()
        
        with output_area:
            clear_output(wait=True)
            print(f"↷ Redo successful! Forward to state {state['timestamp']} ({state['counter']} URLs)")
    else:
        with output_area:
            clear_output(wait=True)
            print("❌ Nothing to redo")

# ===== MULTIPLE FILES EXCEL PROCESSING =====
def process_multiple_excel_files(change):
    """Handle uploads of one or more Excel files and collect their URLs"""
    global total_excel_files_uploaded
    
    if not hasattr(excel_upload, 'value') or not excel_upload.value:
        excel_status.value = "📊 Excel: Ready for upload"
        update_excel_file_counter()
        return
    
    # Count the uploaded files
    uploaded_files = excel_upload.value
    
    if isinstance(uploaded_files, (tuple, list, dict)):
        num_files = len(uploaded_files)
    else:
        num_files = 1 if uploaded_files else 0
    
    if num_files == 0:
        excel_status.value = "📊 Excel: Ready for upload"
        update_excel_file_counter()
        return
    
    # Update the counter first, then the status, because
    # update_excel_file_counter() also writes to excel_status
    total_excel_files_uploaded += num_files
    save_action_to_history()
    update_excel_file_counter()
    excel_status.value = f"📊 Excel: Processing {num_files} file(s)..."
    
    with output_area:
        clear_output(wait=True)
        print("📊 PROCESSING EXCEL FILES")
        print("=" * 50)
        print(f"📁 Number of files: {num_files}")
        print(f"🔢 Total Excel files uploaded: {total_excel_files_uploaded}")
        
        all_urls_collected = []
        successful_files = 0
        
        try:
            # Handle different FileUpload formats
            if isinstance(uploaded_files, (tuple, list)):
                file_items = []
                for i, file_data in enumerate(uploaded_files):
                    if hasattr(file_data, 'name'):
                        filename = file_data.name
                    else:
                        filename = f"file_{i+1}.xlsx"
                    file_items.append((filename, file_data))
            elif isinstance(uploaded_files, dict):
                file_items = list(uploaded_files.items())
            else:
                file_items = [("single_file.xlsx", uploaded_files)]
            
            for file_index, (filename, file_data) in enumerate(file_items, 1):
                print(f"\n📋 FILE {file_index}/{len(file_items)}: {filename}")
                print("-" * 40)
                
                try:
                    # Extract file content
                    file_content = None
                    
                    if hasattr(file_data, 'content'):
                        file_content = file_data.content
                    elif hasattr(file_data, 'data'):
                        file_content = file_data.data
                    elif isinstance(file_data, dict):
                        if 'content' in file_data:
                            file_content = file_data['content']
                        elif 'data' in file_data:
                            file_content = file_data['data']
                        else:
                            # Use first available key
                            first_key = list(file_data.keys())[0] if file_data.keys() else None
                            if first_key:
                                file_content = file_data[first_key]
                    else:
                        file_content = file_data
                    
                    if file_content is None:
                        print("❌ Could not extract file content")
                        continue
                    
                    # Convert memoryview to bytes if needed
                    if isinstance(file_content, memoryview):
                        file_content = file_content.tobytes()
                    
                    # Wrap the content in a seekable buffer; only report the
                    # size for byte content, since file-like objects have no len()
                    if isinstance(file_content, (bytes, bytearray)):
                        print(f"📊 File size: {len(file_content)} bytes")
                        excel_buffer = io.BytesIO(file_content)
                    elif hasattr(file_content, 'read'):
                        excel_buffer = file_content
                    else:
                        print(f"❌ Unsupported file content type: {type(file_content)}")
                        continue
                        
                    df = None
                    
                    # Try different engines
                    for engine in ['openpyxl', None, 'xlrd']:
                        try:
                            excel_buffer.seek(0)
                            if engine:
                                df = pd.read_excel(excel_buffer, engine=engine)
                            else:
                                df = pd.read_excel(excel_buffer)
                            print(f"✅ Successfully read with engine: {engine or 'default'}")
                            break
                        except Exception as e:
                            if engine == 'xlrd':  # Last engine
                                raise Exception(f"Failed to read with all engines: {e}")
                            continue
                    
                    if df is None:
                        print("❌ Failed to read Excel file")
                        continue
                        
                    print(f"📋 Data: {len(df)} rows, Columns: {list(df.columns)}")
                    
                    # Find URL column
                    url_column = None
                    for col in df.columns:
                        col_str = str(col).lower()
                        if 'youtube_url' in col_str:
                            url_column = col
                            break
                        elif any(keyword in col_str for keyword in ['url', 'youtube', 'link']):
                            url_column = col
                            break
                    
                    if url_column is None:
                        url_column = df.columns[0]
                        print(f"⚠️ Using first column: {url_column}")
                    else:
                        print(f"🎯 URL column found: {url_column}")
                    
                    # Extract URLs from this file
                    file_urls = []
                    for idx, value in df[url_column].items():
                        if pd.isna(value) or value is None:
                            continue
                        
                        url_str = str(value).strip()
                        if not url_str or url_str.lower() in ['nan', 'none', '', 'null']:
                            continue
                        
                        # Check if it's a YouTube URL
                        if ('youtube.com' in url_str.lower() or 
                            'youtu.be' in url_str.lower() or
                            (len(url_str) == 11 and url_str.replace('-', '').replace('_', '').isalnum())):
                            
                            # Simple approach: add all URLs (no duplicate checking)
                            file_urls.append(url_str)
                            all_urls_collected.append(url_str)
                    
                    if file_urls:
                        print(f"✅ Found {len(file_urls)} URLs:")
                        for i, url in enumerate(file_urls[:3], 1):
                            print(f"   {i}. {url}")
                        if len(file_urls) > 3:
                            print(f"   ... and {len(file_urls) - 3} more URLs")
                        successful_files += 1
                    else:
                        print("❌ No valid URLs found")
                        
                except Exception as file_error:
                    print(f"❌ Error processing {filename}: {file_error}")
                    continue
            
            # Final summary and update textarea
            print(f"\n" + "=" * 50)
            print(f"📊 SUMMARY:")
            print(f"   • Files processed: {len(file_items)}")
            print(f"   • Successful: {successful_files}")
            print(f"   • Failed: {len(file_items) - successful_files}")
            print(f"   • Total URLs found: {len(all_urls_collected)}")
            
            # ALWAYS UPDATE TEXTAREA
            if all_urls_collected:
                current_text = url_textarea.value.strip()
                new_urls_text = '\n'.join(all_urls_collected)
                
                if current_text:
                    final_text = current_text + '\n' + new_urls_text
                else:
                    final_text = new_urls_text
                
                url_textarea.value = final_text
                update_url_counter()
                
                print(f"\n🎉 SUCCESS!")
                print(f"✅ {len(all_urls_collected)} URLs added to textarea")
                excel_status.value = f"✅ Excel: {len(all_urls_collected)} URLs from {successful_files} files imported!"
                
            else:
                print("❌ No valid URLs found in any files")
                excel_status.value = f"❌ Excel: No valid URLs found"
                
        except Exception as e:
            print(f"❌ GENERAL ERROR: {e}")
            excel_status.value = f"❌ Excel: Error - {type(e).__name__}"

# ===== UTILITY FUNCTIONS =====
def update_url_counter():
    """Update the URL count display"""
    urls = [url.strip() for url in url_textarea.value.split('\n') if url.strip()]
    count = len(urls)
    url_counter.value = f"<b style='color: #2196F3;'>📊 URLs: {count}</b>"

def update_excel_file_counter():
    """Update the display of the total number of Excel files uploaded"""
    try:
        # Show the running total of uploaded Excel files
        excel_file_counter.value = f"<span style='color: #4CAF50; font-weight: bold;'>📁 Upload Excel: ({total_excel_files_uploaded})</span>"
            
        # Update the status for the currently selected files (if any)
        if hasattr(excel_upload, 'value') and excel_upload.value:
            uploaded_files = excel_upload.value
            
            # Handle different formats of uploaded files
            if isinstance(uploaded_files, dict):
                current_file_count = len(uploaded_files)
            elif isinstance(uploaded_files, (tuple, list)) and len(uploaded_files) > 0:
                current_file_count = len(uploaded_files)
            else:
                current_file_count = 0
                
            if current_file_count > 0:
                excel_status.value = f"📊 Excel: {current_file_count} file(s) ready to process"
            else:
                excel_status.value = "📊 Excel: Ready for upload"
        else:
            excel_status.value = "📊 Excel: Ready for upload"
            
    except Exception as e:
        excel_file_counter.value = "<span style='color: #f44336;'>📁 Upload Excel: (Error)</span>"
        print(f"Error updating excel counter: {e}")

def on_textarea_change(change):
    """Handler called when the textarea content changes"""
    update_url_counter()

# ===== BUTTON HANDLERS =====
def handle_add_url(b):
    """Handler for adding a single URL"""
    new_url = add_url_input.value.strip()
    
    if new_url:
        # Save the pre-change state only when there is something to add,
        # so empty clicks do not fill the history with duplicate entries
        save_action_to_history()
        current_text = url_textarea.value
        if current_text:
            url_textarea.value = current_text + '\n' + new_url
        else:
            url_textarea.value = new_url
        
        add_url_input.value = ''
        update_url_counter()
        
        with output_area:
            clear_output(wait=True)
            print(f"✅ URL added: {new_url}")

def handle_clear_all(b):
    """Handler that clears all input and resets the counter"""
    global total_excel_files_uploaded
    
    save_action_to_history()
    url_textarea.value = ""
    
    # Clear the upload widget with the expected empty value (an empty tuple)
    excel_upload.value = ()
    
    # Reset the total file counter to 0
    total_excel_files_uploaded = 0
    
    # Update the counter and status
    update_excel_file_counter()
    excel_status.value = "📊 Excel: Ready for upload"
    
    with output_area:
        clear_output(wait=True)
        print("🧹 All input cleared!")
        print("📊 Total Excel file counter reset to 0")

def handle_download_template(b):
    """Handler for the Download Template button"""
    excel_status.value = "📥 Template: Creating..."
    
    with output_area:
        clear_output(wait=True)
        filepath = create_excel_template()
        
        if filepath:
            print(f"✅ Excel template created: {filepath}")
            print("\n🎯 HOW TO USE:")
            print("1. Open the Excel file that was created")
            print("2. Fill the 'youtube_url' column with your URLs")
            print("3. Delete or replace the example URLs")
            print("4. Save the Excel file")
            print("5. Upload it with the widget above")
            
            excel_status.value = f"✅ Template: Created! ({filepath})"
        else:
            print("❌ Failed to create the Excel template!")
            excel_status.value = "❌ Template: Failed to create!"

def handle_load_samples(b):
    """Handler that loads the sample URLs"""
    save_action_to_history()
    
    sample_urls = """https://www.youtube.com/watch?v=dQw4w9WgXcQ
https://youtu.be/9bZkp7q19f0
https://www.youtube.com/watch?v=kJQP7kiw5Fk"""
    
    url_textarea.value = sample_urls
    update_url_counter()
    
    with output_area:
        clear_output(wait=True)
        print("📝 Sample URLs loaded!")

def handle_undo(b):
    """Handler for the undo button"""
    perform_undo()

def handle_redo(b):
    """Handler for the redo button"""
    perform_redo()

def handle_process_urls(b):
    """Handler that processes and validates all URLs"""
    global video_data_list
    
    with output_area:
        clear_output(wait=True)
        
        urls = [url.strip() for url in url_textarea.value.split('\n') if url.strip()]
        
        if not urls:
            print("❌ Please enter at least 1 YouTube URL!")
            return
        
        print(f"🔍 PROCESSING {len(urls)} URLs...")
        print("=" * 60)
        
        video_data_list = []
        valid_videos = 0
        
        for i, url in enumerate(urls, 1):
            print(f"\n📹 URL {i}/{len(urls)}: {url}")
            
            video_id = extract_video_id(url)
            is_valid = validate_video_id(video_id)
            
            if video_id and is_valid:
                clean_url = f"https://www.youtube.com/watch?v={video_id}"
                video_data = {
                    'index': i,
                    'original_url': url,
                    'video_id': video_id,
                    'clean_url': clean_url,
                    'status': 'valid'
                }
                video_data_list.append(video_data)
                valid_videos += 1
                
                print(f"   ✅ Valid - Video ID: {video_id}")
                
            else:
                video_data = {
                    'index': i,
                    'original_url': url,
                    'video_id': None,
                    'clean_url': None,
                    'status': 'invalid'
                }
                video_data_list.append(video_data)
                print(f"   ❌ Invalid URL")
        
        print(f"\n" + "=" * 60)
        print(f"📊 VALIDATION RESULTS:")
        print(f"   🎯 Total URLs: {len(urls)}")
        print(f"   ✅ Valid: {valid_videos}")
        print(f"   ❌ Invalid: {len(urls) - valid_videos}")
        
        if valid_videos > 0:
            main_status.value = f"<b style='color: green;'>Status: {valid_videos}/{len(urls)} URLs are valid and ready for crawling!</b>"
        else:
            main_status.value = "<b style='color: red;'>Status: No valid URLs!</b>"

# ===== WIDGET DEFINITIONS =====

# Main textarea
url_textarea = widgets.Textarea(
    value='',
    placeholder='Enter YouTube URLs (one per line)...\n\nExample:\nhttps://www.youtube.com/watch?v=dQw4w9WgXcQ\nhttps://youtu.be/9bZkp7q19f0',
    description='YouTube URLs:',
    layout=widgets.Layout(width='100%', height='140px'),
    style={'description_width': '120px'}
)

# Add URL section
add_url_input = widgets.Text(
    value='',
    placeholder='Enter a YouTube URL...',
    description='Add URL:',
    layout=widgets.Layout(width='70%'),
    style={'description_width': '100px'}
)

add_url_btn = widgets.Button(
    description='➕ Add',
    button_style='success',
    layout=widgets.Layout(width='28%', height='35px')
)

# Excel upload section
excel_upload = widgets.FileUpload(
    accept='.xlsx,.xls',
    multiple=True,  # Enable multiple files
    description='Upload Excel:',
    style={'description_width': '100px'},
    layout=widgets.Layout(width='45%')
)

excel_file_counter = widgets.HTML(
    value="<span style='color: #666; font-style: italic;'>📁 No files selected</span>",
    layout=widgets.Layout(width='50%', margin='5px 0px')
)

download_template_btn = widgets.Button(
    description='📥 Download Template',
    button_style='info',
    layout=widgets.Layout(width='50%', height='35px')
)

# Main action buttons
process_urls_btn = widgets.Button(
    description='▶️ Process & Validate URLs',
    button_style='primary',
    layout=widgets.Layout(width='48%', height='40px')
)

clear_all_btn = widgets.Button(
    description='🗑️ Clear All',
    button_style='warning',
    layout=widgets.Layout(width='48%', height='40px')
)

# Navigation buttons
undo_btn = widgets.Button(
    description='↶ Undo',
    button_style='',
    layout=widgets.Layout(width='32%', height='35px'),
    tooltip='Undo last action'
)

redo_btn = widgets.Button(
    description='↷ Redo',
    button_style='',
    layout=widgets.Layout(width='32%', height='35px'),
    tooltip='Redo next action'
)

load_samples_btn = widgets.Button(
    description='📝 Load Samples',
    button_style='',
    layout=widgets.Layout(width='32%', height='35px'),
    tooltip='Load sample YouTube URLs'
)

# Status widgets
url_counter = widgets.HTML(value="<b style='color: #2196F3;'>📊 URLs: 0</b>")
excel_status = widgets.HTML(value="📊 Excel: Ready for upload")
main_status = widgets.HTML(value="<b>Status:</b> Enter URLs to get started")

# Output area with responsive height
output_area = widgets.Output(layout=widgets.Layout(height='200px', border='1px solid #ddd', padding='10px', width='100%'))

# ===== EVENT BINDINGS =====
add_url_btn.on_click(handle_add_url)
clear_all_btn.on_click(handle_clear_all)
download_template_btn.on_click(handle_download_template)
load_samples_btn.on_click(handle_load_samples)
undo_btn.on_click(handle_undo)
redo_btn.on_click(handle_redo)
process_urls_btn.on_click(handle_process_urls)

excel_upload.observe(process_multiple_excel_files, names='value')
url_textarea.observe(on_textarea_change, names='value')

# ===== LAYOUT CONSTRUCTION =====

# Add URL row
add_url_row = widgets.HBox([
    add_url_input,
    add_url_btn
], layout=widgets.Layout(width='100%', justify_content='space-between'))

# Excel upload row
excel_row = widgets.VBox([
    widgets.HBox([
        excel_upload,
        download_template_btn
    ], layout=widgets.Layout(width='100%', justify_content='space-between')),
    excel_file_counter
], layout=widgets.Layout(width='100%'))

# Main buttons row
main_buttons_row = widgets.HBox([
    process_urls_btn,
    clear_all_btn
], layout=widgets.Layout(width='100%', justify_content='space-between'))

# Navigation buttons row
nav_buttons_row = widgets.HBox([
    undo_btn,
    redo_btn,
    load_samples_btn
], layout=widgets.Layout(width='100%', justify_content='space-between'))

# Status row
status_row = widgets.VBox([
    widgets.HBox([url_counter, excel_status], layout=widgets.Layout(justify_content='space-between', width='100%')),
    main_status
], layout=widgets.Layout(width='100%'))

# Main widget container with responsive design
main_container = widgets.VBox([
    widgets.HTML("<h3 style='color: #1976D2; margin-bottom: 20px;'>🎬 YouTube URLs Manager - Multiple Files Support</h3>"),
    url_textarea,
    widgets.HTML("<h4 style='margin: 15px 0 10px 0;'>➕ Add Individual URL</h4>"),
    add_url_row,
    widgets.HTML("<h4 style='margin: 15px 0 10px 0;'>📊 Excel Upload (Multiple Files)</h4>"),
    excel_row,
    widgets.HTML("<h4 style='margin: 15px 0 10px 0;'>🎯 Actions</h4>"),
    main_buttons_row,
    nav_buttons_row,
    widgets.HTML("<h4 style='margin: 15px 0 10px 0;'>📊 Status</h4>"),
    status_row,
    widgets.HTML("<h4 style='margin: 15px 0 10px 0;'>📋 Output</h4>"),
    output_area
], layout=widgets.Layout(
    border='2px solid #1976D2', 
    padding='25px', 
    margin='10px', 
    width='100%',
    max_width='1200px',
    background_color='#fafafa'
))

# Display the widget
display(main_container)

# Initialize
save_action_to_history()  # Save initial state
update_url_counter()
update_excel_file_counter()  # Set initial excel file counter

print("✅ YouTube URLs Manager loaded successfully!")
print("🎯 Available features:")
print("   • ✅ Multiple Excel file upload with a running file counter")
print("   • ✅ Undo/Redo")
print("   • ✅ Proportional, responsive layout")
print("   • ✅ Excel template download")
print("   • ✅ URL validation and processing")
print("   • ✅ Sample URL loading")
print("\n💡 The file counter shows:")
print("   📁 Upload Excel: (N), where N is the total number of Excel files uploaded")

🚀 Loading YouTube URLs Manager...


VBox(children=(HTML(value="<h3 style='color: #1976D2; margin-bottom: 20px;'>🎬 YouTube URLs Manager - Multiple …

✅ YouTube URLs Manager loaded successfully!
🎯 Available features:
   • ✅ Multiple Excel file upload with a running file counter
   • ✅ Undo/Redo
   • ✅ Proportional, responsive layout
   • ✅ Excel template download
   • ✅ URL validation and processing
   • ✅ Sample URL loading

💡 The file counter shows:
   📁 Upload Excel: (N), where N is the total number of Excel files uploaded
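
The undo/redo feature in this section keeps a list of textarea snapshots plus a position index. A minimal, widget-free sketch of that bookkeeping follows. The `History` class and its method names are hypothetical, and the sketch records the state reached *after* each action, which is one common way to make undo and redo symmetric. It is an illustration under those assumptions, not the notebook's own code.

```python
class History:
    """Bounded undo/redo stack over immutable snapshots (for example, strings)."""

    def __init__(self, initial, limit=20):
        self._states = [initial]
        self._pos = 0
        self._limit = limit

    def record(self, state):
        """Store the state reached after an action and drop any redo branch."""
        if state == self._states[self._pos]:
            return  # nothing changed, so do not add a duplicate entry
        self._states = self._states[: self._pos + 1]
        self._states.append(state)
        if len(self._states) > self._limit:
            self._states = self._states[-self._limit:]
        self._pos = len(self._states) - 1

    def undo(self):
        """Step back one snapshot if possible and return the current one."""
        if self._pos > 0:
            self._pos -= 1
        return self._states[self._pos]

    def redo(self):
        """Step forward one snapshot if possible and return the current one."""
        if self._pos < len(self._states) - 1:
            self._pos += 1
        return self._states[self._pos]


history = History("")
history.record("url-1")
history.record("url-1\nurl-2")
print(repr(history.undo()))  # back to the single-URL state
print(repr(history.redo()))  # forward again to two URLs
```

Recording a new state after an undo discards the redo branch, which matches the truncation step in `save_action_to_history`.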


## 4. YouTube API Configuration

Configuration for the YouTube Data API v3. **Use your own API key!**

### How to get an API key:
1. Open [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select an existing one
3. Enable YouTube Data API v3
4. Create credentials (API key)
5. Paste the API key into the input field of the configuration widget in this section; after a successful test it is stored in the `API_KEY` variable

### Important Notes:
- An API key is required to access YouTube data legitimately
- Use it for research and educational purposes
- Stay within the quota and rate limits set by Google
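
To make the role of the API key concrete, here is a hedged sketch of how a validated key might be used to request top-level comments through `googleapiclient`. The helper names `flatten_comment_threads` and `fetch_top_level_comments`, the `max_pages` cap, and the chosen output columns are assumptions for illustration. The response field names follow the `commentThreads` resource as commonly documented and should be checked against the official API reference before relying on them.

```python
def flatten_comment_threads(response):
    """Turn one commentThreads.list response (a dict) into flat row dicts."""
    rows = []
    for item in response.get("items", []):
        thread = item.get("snippet", {})
        top = thread.get("topLevelComment", {})
        comment = top.get("snippet", {})
        rows.append({
            "comment_id": top.get("id"),
            "author": comment.get("authorDisplayName"),
            "text": comment.get("textDisplay"),
            "likes": comment.get("likeCount", 0),
            "published_at": comment.get("publishedAt"),
            "reply_count": thread.get("totalReplyCount", 0),
        })
    return rows


def fetch_top_level_comments(api_key, video_id, max_pages=1):
    """Fetch up to max_pages pages of top-level comments for one video.

    Each page is one API request, so max_pages also bounds quota usage.
    """
    # Imported here so the flattening helper stays usable without the client
    from googleapiclient.discovery import build

    youtube = build("youtube", "v3", developerKey=api_key)
    rows, page_token = [], None
    for _ in range(max_pages):
        params = {
            "part": "snippet",
            "videoId": video_id,
            "maxResults": 100,
            "textFormat": "plainText",
        }
        if page_token:
            params["pageToken"] = page_token
        response = youtube.commentThreads().list(**params).execute()
        rows.extend(flatten_comment_threads(response))
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return rows
```

A call such as `fetch_top_level_comments(API_KEY, "dQw4w9WgXcQ", max_pages=2)` would return a list of dictionaries that can be passed directly to `pd.DataFrame`. Keeping `max_pages` small during testing helps stay within the quota mentioned above.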

In [12]:
# 🔐 YOUTUBE API KEY CONFIGURATION WIDGET
# ========================================

import subprocess
import sys

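try:
    import ipywidgets as widgets
    from IPython.display import display, clear_output
    print("✅ ipywidgets is available")
except ImportError:
    print("📦 Installing ipywidgets...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", "ipywidgets"])
    # Import again so the widget definitions below do not fail with NameError
    import ipywidgets as widgets
    from IPython.display import display, clear_output
    print("✅ ipywidgets installed! If the widgets do not render, restart the kernel and run this cell again.")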

print("🔐 YOUTUBE API KEY CONFIGURATION")
print("=" * 50)

# Input widget for the API key
api_key_text = widgets.Password(
    value='',
    placeholder='Enter your YouTube Data API v3 key...',
    description='API Key:',
    style={'description_width': '100px'},
    layout=widgets.Layout(width='500px', margin='10px 0px')
)

# Buttons for test, save, and clear
test_api_btn = widgets.Button(
    description='🧪 Test API',
    button_style='primary',
    layout=widgets.Layout(width='120px', margin='0px 10px 0px 0px'),
    style={'font_size': '11px'}
)

save_api_btn = widgets.Button(
    description='🔒 Validate & Save',
    button_style='success',
    layout=widgets.Layout(width='140px', margin='0px 10px 0px 0px'),
    style={'font_size': '10px'}
)

clear_api_btn = widgets.Button(
    description='🗑️ Clear',
    button_style='warning',
    layout=widgets.Layout(width='100px'),
    style={'font_size': '11px'}
)

# Output area for test results
api_output = widgets.Output()

# API status
api_status = widgets.HTML(value="<b>Status:</b> ⚠️ API key not set")

# Global variable that stores the API key
API_KEY = ""

def test_api_key(b):
    """Handler that tests the API key and auto-saves it if valid"""
    global API_KEY

    with api_output:
        clear_output(wait=True)

        input_key = api_key_text.value.strip()
        if not input_key:
            print("❌ Please enter an API key first!")
            api_status.value = "<b>Status:</b> ❌ API key is empty"
            return

        if input_key == "YOUR_YOUTUBE_API_KEY_HERE":
            print("❌ Please replace the placeholder with a valid API key!")
            api_status.value = "<b>Status:</b> ❌ API key is the default placeholder"
            return

        print("🧪 TESTING API KEY FOR RESEARCH USE...")
        print("-" * 40)
        print(f"🔑 API Key: {input_key[:8]}...{input_key[-4:]} (partially hidden)")

        # Test the connection to the YouTube API
        success = test_api_connection(input_key)

        if success:
            # AUTO-SAVE when the test succeeds
            API_KEY = input_key

            print("\n🎉 API KEY IS VALID FOR RESEARCH USE!")
            print("✅ Connection to the YouTube API succeeded")
            print("✅ Ready to extract comment data")
            print("\n💾 AUTO-SAVE SUCCEEDED!")
            print("🔑 The API key was saved to the global variable")
            print("💡 You can now continue to the data extraction section")
            api_status.value = "<b>Status:</b> ✅ API key valid and saved"
        else:
            print("\n❌ API KEY PROBLEM!")
            print("🔧 See the error diagnosis above for a specific fix")
            print("⚠️ The API key was not saved because it is not valid")
            api_status.value = "<b>Status:</b> ❌ API key invalid"

def save_api_key(b):
    """Handler for manually saving the API key, with mandatory validation"""
    global API_KEY

    input_key = api_key_text.value.strip()
    if not input_key:
        with api_output:
            clear_output(wait=True)
            print("❌ Please enter an API key first!")
        api_status.value = "<b>Status:</b> ❌ API key is empty"
        return

    if input_key == "YOUR_YOUTUBE_API_KEY_HERE":
        with api_output:
            clear_output(wait=True)
            print("❌ Please replace the placeholder with a valid API key!")
        api_status.value = "<b>Status:</b> ❌ API key is the default placeholder"
        return

    # Check whether this key is already the saved, validated key
    if API_KEY == input_key:
        with api_output:
            clear_output(wait=True)
            print("✅ API KEY ALREADY SAVED!")
            print("-" * 30)
            print(f"🔑 API Key: {input_key[:8]}...{input_key[-4:]} (partially hidden)")
            print("💡 This API key is already active and ready to use")
            print("🎯 You can continue straight to the crawling section")
        api_status.value = "<b>Status:</b> ✅ API key already saved and valid"
        return

    # MANDATORY VALIDATION before saving
    with api_output:
        clear_output(wait=True)
        print("🔐 VALIDATING API KEY BEFORE SAVING...")
        print("-" * 40)
        print(f"🔑 API Key: {input_key[:8]}...{input_key[-4:]} (partially hidden)")
        print("🧪 Running the validation test...")

        # Test the API key connection
        success = test_api_connection(input_key)

        if success:
            # Save only if valid
            API_KEY = input_key

            print("\n💾 API KEY SAVED SUCCESSFULLY!")
            print("✅ The API key is valid and has been saved")
            print("✅ Ready to crawl comments")
            print("💡 You can continue straight to the crawling section")
            api_status.value = "<b>Status:</b> ✅ API key valid and saved"
        else:
            print("\n❌ FAILED TO SAVE API KEY!")
            print("🚫 The API key is not valid and will not be saved")
            print("⚠️ Fix the API key first")
            print("💡 See the error diagnosis above for a fix")
            print("🔧 Or try a different API key")
            api_status.value = "<b>Status:</b> ❌ API key invalid, not saved"

def clear_api_key(b):
    """Handler that clears the API key"""
    global API_KEY

    API_KEY = ""
    api_key_text.value = ''

    with api_output:
        clear_output()

    api_status.value = "<b>Status:</b> 🗑️ API key cleared"

# YouTube API configuration
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

# Bind event handlers
test_api_btn.on_click(test_api_key)
save_api_btn.on_click(save_api_key)
clear_api_btn.on_click(clear_api_key)

# Layout widgets
api_button_box = widgets.HBox([test_api_btn, save_api_btn, clear_api_btn])

api_widget_box = widgets.VBox([
    widgets.HTML("<h3>🔐 YouTube API Key Configuration</h3>"),
    widgets.HTML("<div style='color: #666; font-size: 12px; margin: -10px 0 15px 0; font-style: italic;'>Developed by Ferdian Bangkit Wijaya, Universitas Sultan Ageng Tirtayasa</div>"),
    api_key_text,
    api_button_box,
    api_status,
    api_output
], layout=widgets.Layout(border='2px solid #ddd', padding='20px', margin='10px', width='600px'))

# Display the widget
display(api_widget_box)

print("\n📝 API KEY INSTRUCTIONS:")
print("1. 🔑 Enter your YouTube Data API v3 key in the password field")
print("2. 🧪 Click 'Test API' to verify that the key is valid")
print("3. ✅ The API key is saved automatically if the test succeeds")
print("4. 💾 'Validate & Save' always validates the key before saving it")
print("5. 🎯 The status line shows whether the API key is ready to use")

print("\n🚀 VALIDATION RULES:")
print("• ✅ Test succeeds = key saved automatically to the global variable")
print("• ❌ Test fails = API key is not saved (safe)")
print("• 🔒 Validate & Save = validation is mandatory before saving")
print("• 🚫 Invalid API key = never saved")

print("\n🔒 SECURITY:")
print("• Password field: the API key is hidden while you type")
print("• Partial display: only part of the key is shown")
print("• Session only: the API key is kept only while the session is active")
print("• Mandatory validation: only a valid API key can be saved")

print("\n💡 TIPS:")
print("• The API key must be active and have access to YouTube Data API v3")
print("• Always test/validate to make sure the API key is valid")
print("• If validation fails, the API key is not saved")
print("• The status line reflects the current state of the API key")

print("\n🎯 API KEY VALIDATION RESULTS:")
print("=" * 50)
print("✅ IF THE API KEY IS VALID:")
print("   • Status: 'API key valid and saved'")
print("   • The API key is saved automatically to the global variable")
print("   • Ready to continue to the crawling section")
print("   • No errors or warnings")

print("\n❌ IF THE API KEY IS NOT VALID:")
print("   • Status: 'API key invalid, not saved'")
print("   • The API key is NOT saved")
print("   • An automatic error diagnosis is shown:")
print("     - IP Restriction: remove the IP restriction")
print("     - Quota Exceeded: wait for the reset or request more quota")
print("     - Key Invalid: create a new API key")
print("     - Access Not Configured: enable YouTube Data API v3")

print("\n🔧 AUTOMATIC TROUBLESHOOTING:")
print("• The widget shows a specific error diagnosis")
print("• A fix is suggested for each type of error")
print("• The current IP address is shown to help with debugging")
print("• A link to Google Cloud Console is provided for quick fixes")

print("\n📊 POSSIBLE STATUS VALUES:")
print("⚠️ 'API key not set' = No API key entered yet")
print("❌ 'API key is empty' = The field is still empty")
print("❌ 'API key is the default placeholder' = Placeholder still in use")
print("❌ 'API key invalid, not saved' = Validation failed")
print("✅ 'API key valid and saved' = Ready to crawl")
print("✅ 'API key already saved and valid' = Already active")

# YouTube API Helper Functions
import requests
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

def check_current_ip():
    """Return the current public IP address, or a fallback message"""
    # Try each service in turn; move on if one fails, times out, or returns bad data
    services = [
        ('https://api.ipify.org?format=json', 'ip'),
        ('https://httpbin.org/ip', 'origin'),
    ]
    for url, field in services:
        try:
            response = requests.get(url, timeout=5)
            if response.status_code == 200:
                return response.json()[field]
        except (requests.RequestException, ValueError, KeyError):
            continue
    return "Unable to detect IP"

def diagnose_api_key_error(error_message):
    """Diagnose an API key error for the key entered in the widget"""
    print("🔍 API KEY ERROR DIAGNOSIS")
    print("=" * 40)

    error_str = str(error_message).lower()

    # Check specific reasons first: quota and "API not enabled" errors are also
    # returned with HTTP 403, so the generic 403 check has to come last.
    if "quotaexceeded" in error_str or "quota" in error_str:
        print("❌ PROBLEM: API Quota Exhausted")
        print("\n📋 CAUSE:")
        print("   • The daily YouTube Data API quota has been used up")
        print("   • The project has reached its request limit")
        print("\n✅ SOLUTION:")
        print("1. Wait until the daily quota resets (midnight Pacific Time)")
        print("2. Or use an API key from another project")
        print("3. Request a higher quota from Google if the project needs it")

    elif ("accessnotconfigured" in error_str or "not enabled" in error_str
          or "has not been used" in error_str):
        print("❌ PROBLEM: YouTube Data API v3 Is Not Enabled")
        print("\n📋 CAUSE:")
        print("   • YouTube Data API v3 has not been enabled in the project")
        print("\n✅ SOLUTION:")
        print("1. Open https://console.cloud.google.com")
        print("2. Select the correct project")
        print("3. APIs & Services → Library")
        print("4. Search for 'YouTube Data API v3'")
        print("5. Click Enable")

    elif "billing" in error_str or "payment" in error_str:
        print("❌ PROBLEM: Billing Account Required")
        print("\n📋 CAUSE:")
        print("   • The project is not linked to a billing account")
        print("\n✅ SOLUTION:")
        print("1. Set up a billing account in Google Cloud Console")
        print("2. Link the project to the billing account")
        print("3. Verify the credit/debit card")

    elif "400" in error_str or "keyinvalid" in error_str or "invalid" in error_str:
        print("❌ PROBLEM: API Key Not Valid")
        print("\n📋 POSSIBLE CAUSES:")
        print("   • The API key is wrong or was mistyped during copy-paste")
        print("   • The API key has been disabled or deleted")
        print("   • The API key format is incorrect")
        print("\n✅ SOLUTION:")
        print("1. Check the API key again in Google Cloud Console")
        print("2. Copy and paste it again carefully")
        print("3. Make sure there are no leading or trailing spaces")
        print("4. Create a new API key if needed")

    elif ("403" in error_str or "forbidden" in error_str
          or "ip address" in error_str or "referer" in error_str):
        print("❌ PROBLEM: API Key Restricted (IP/Referrer)")
        print("\n📋 CAUSE:")
        print("   • Your API key is restricted to specific IP addresses")
        print("   • Referrer restrictions are active")
        print(f"\n🌐 CURRENT IP ADDRESS: {check_current_ip()}")
        print("\n✅ QUICK FIX:")
        print("1. Open https://console.cloud.google.com/apis/credentials")
        print("2. Edit your API key")
        print("3. Application restrictions → select 'None'")
        print("4. Save and test again")

    else:
        print("❌ PROBLEM: Other Error")
        print("\n📝 ERROR DETAILS:")
        print(f"   {error_message}")
        print("\n💡 GENERAL SOLUTIONS:")
        print("1. Make sure the API key format is correct (typically 39 characters)")
        print("2. Check that the API key belongs to the correct project")
        print("3. Try creating a new API key")
        print("4. Restart the kernel and try again")

    print("\n" + "=" * 40)

def create_youtube_service(api_key):
    """
    Create the YouTube API service
    
    Args:
        api_key (str): YouTube Data API v3 key
        
    Returns:
        googleapiclient.discovery.Resource: YouTube service object
    """
    try:
        youtube = build(
            YOUTUBE_API_SERVICE_NAME, 
            YOUTUBE_API_VERSION, 
            developerKey=api_key
        )
        return youtube
    except Exception as e:
        print(f"❌ Error creating YouTube service: {e}")
        return None

def test_api_connection(api_key):
    """
    Test the connection to the YouTube API, with troubleshooting output

    Args:
        api_key (str): YouTube Data API v3 key

    Returns:
        bool: True on success, False on failure
    """
    if not api_key or api_key == "YOUR_YOUTUBE_API_KEY_HERE":
        print("❌ API key not set! Please enter your own API key.")
        return False

    try:
        youtube = create_youtube_service(api_key)
        if youtube is None:
            return False

        # Test by fetching information about the YouTube channel
        request = youtube.channels().list(
            part="snippet",
            forUsername="YouTube"
        )
        response = request.execute()

        print("✅ API connection successful!")
        print(f"📡 Service: {YOUTUBE_API_SERVICE_NAME} {YOUTUBE_API_VERSION}")
        print(f"🌐 IP Address: {check_current_ip()}")
        return True

    except HttpError as e:
        # The error text can include the request URL, which may contain the key,
        # so redact the key before printing or diagnosing
        safe_message = str(e).replace(api_key, "[REDACTED]")
        print(f"❌ HTTP Error: {safe_message}")
        diagnose_api_key_error(safe_message)
        return False
    except Exception as e:
        print(f"❌ Error: {e}")
        return False

# The API key becomes available once it is set through the widget above
print("\n🎯 API KEY SETUP READY")
print("=" * 30)
print("✅ API configuration loaded")
print("💡 Use the widget above to set the API key")
print("🔑 The API key is stored in the global variable once it has been saved")
print("📱 Continue to the next section to start crawling")

✅ ipywidgets is available
🔐 YOUTUBE API KEY CONFIGURATION


VBox(children=(HTML(value='<h3>🔐 YouTube API Key Configuration</h3>'), HTML(value="<div style='color: #666; fo…


📝 API KEY INSTRUCTIONS:
1. 🔑 Enter your YouTube Data API v3 key in the password field
2. 🧪 Click 'Test API' to verify that the key is valid
3. ✅ The API key is saved automatically if the test succeeds
4. 💾 'Validate & Save' always validates the key before saving it
5. 🎯 The status line shows whether the API key is ready to use

🚀 VALIDATION RULES:
• ✅ Test succeeds = key saved automatically to the global variable
• ❌ Test fails = API key is not saved (safe)
• 🔒 Validate & Save = validation is mandatory before saving
• 🚫 Invalid API key = never saved

🔒 SECURITY:
• Password field: the API key is hidden while you type
• Partial display: only part of the key is shown
• Session only: the API key is kept only while the session is active
• Mandatory validation: only a valid API key can be saved

💡 TIPS:
• The API key must be active and have access to YouTube Data API v3
• Always test/validate to make sure the API key is valid
• If validation fails, the API key is not saved
• The status line reflects the current state of the API key


## 5. Fetching Video Information

Functions that extract the YouTube video metadata needed for the research analysis:
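
The API reports video length as an ISO 8601 duration such as `PT4M13S`, and very long videos or streams can also carry a day component such as `P1DT2H`. The sketch below shows one way to convert both forms to seconds and to a readable string. The names `iso_duration_to_seconds` and `format_seconds` are hypothetical and are not the helpers defined in the cell that follows.

```python
import re

# P[nD][T[nH][nM][nS]]: the subset of ISO 8601 durations assumed in this sketch
_ISO_DURATION = re.compile(
    r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$"
)


def iso_duration_to_seconds(duration):
    """Return the duration in seconds, or None if the string is not recognised."""
    match = _ISO_DURATION.match(duration or "")
    if not match or not any(match.groups()):
        return None
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def format_seconds(total):
    """Format seconds as H:MM:SS, or M:SS when shorter than one hour."""
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"
```

Keeping the value in seconds as well as in formatted form makes it easier to sort or aggregate durations later in a spreadsheet.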

In [13]:
# =========================================================================
# 5. FETCHING VIDEO INFORMATION
# =========================================================================

import re
import sys
from io import StringIO
from datetime import datetime
from IPython.display import clear_output

# Unique flag to prevent duplicate runs
VIDEO_INFO_COMPLETE_DONE = 'video_info_complete_done_20250729'

if VIDEO_INFO_COMPLETE_DONE in globals():
    print("🔄 Complete video info has already been fetched!")
    print("💡 Reset with: del globals()['video_info_complete_done_20250729']")
else:
    # CAPTURE all output for full control
    original_stdout = sys.stdout
    captured_output = StringIO()

    try:
        # Redirect stdout to the buffer
        sys.stdout = captured_output
        
        def clean_text_safe(text):
            """Safely clean text"""
            if not text:
                return ""
            try:
                return str(text).encode('utf-8', 'ignore').decode('utf-8').strip()
            except Exception:
                return str(text)

        def parse_duration(duration):
            """Convert an ISO 8601 duration to a readable format"""
            if not duration:
                return "Unknown"
            
            pattern = r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?'
            match = re.match(pattern, duration)
            
            if not match:
                return duration
            
            hours, minutes, seconds = match.groups()
            hours = int(hours) if hours else 0
            minutes = int(minutes) if minutes else 0
            seconds = int(seconds) if seconds else 0
            
            if hours > 0:
                return f"{hours}:{minutes:02d}:{seconds:02d}"
            elif minutes > 0:
                return f"{minutes}:{seconds:02d}"
            else:
                return f"0:{seconds:02d}"

        def format_publish_date(iso_date):
            """Format the publish date"""
            if not iso_date:
                return "Unknown"
            try:
                dt = datetime.fromisoformat(iso_date.replace('Z', '+00:00'))
                return dt.strftime("%d %b %Y")
            except Exception:
                return iso_date

        def get_category_name(category_id):
            """Map a category ID to its category name"""
            categories = {
                '1': 'Film & Animation', '2': 'Autos & Vehicles', '10': 'Music',
                '15': 'Pets & Animals', '17': 'Sports', '19': 'Travel & Events',
                '20': 'Gaming', '22': 'People & Blogs', '23': 'Comedy',
                '24': 'Entertainment', '25': 'News & Politics', '26': 'Howto & Style',
                '27': 'Education', '28': 'Science & Technology'
            }
            return categories.get(category_id, f"Category {category_id}")

        def format_number(num):
            """Format a number with a K, M, or B suffix"""
            if num >= 1_000_000_000:
                return f"{num/1_000_000_000:.1f}B"
            elif num >= 1_000_000:
                return f"{num/1_000_000:.1f}M"
            elif num >= 1_000:
                return f"{num/1_000:.1f}K"
            else:
                return str(num)

        def get_video_info_complete(video_ids, api_key):
            """Fetch complete video info with all available data"""
            try:
                youtube = create_youtube_service(api_key)
                if not youtube:
                    return {}

                # videos.list accepts at most 50 IDs per call, so request in batches.
                # maxResults is omitted because it does not apply when filtering by id.
                items = []
                for start in range(0, len(video_ids), 50):
                    request = youtube.videos().list(
                        part="snippet,statistics,contentDetails,status",
                        id=','.join(video_ids[start:start + 50])
                    )
                    items.extend(request.execute().get('items', []))

                if not items:
                    return {}

                # Process results with complete data
                results = {}
                for item in items:
                    video_id = item['id']
                    snippet = item['snippet']
                    stats = item['statistics']
                    details = item['contentDetails']
                    status = item.get('status', {})
                    
                    # Compute the engagement rate: (likes + comments) / views, as a percentage
                    views = int(stats.get('viewCount', 0))
                    likes = int(stats.get('likeCount', 0))
                    comments = int(stats.get('commentCount', 0))
                    engagement = ((likes + comments) / views * 100) if views > 0 else 0
                    
                    results[video_id] = {
                        'video_id': video_id,
                        'video_url': f'https://www.youtube.com/watch?v={video_id}',
                        'title': clean_text_safe(snippet.get('title', '')),
                        'channel_title': clean_text_safe(snippet.get('channelTitle', '')),
                        'channel_id': snippet.get('channelId', ''),
                        'channel_url': f'https://www.youtube.com/channel/{snippet.get("channelId", "")}' if snippet.get("channelId") else '',
                        'published_at': snippet.get('publishedAt', ''),
                        'published_formatted': format_publish_date(snippet.get('publishedAt', '')),
                        'duration': details.get('duration', ''),
                        'duration_formatted': parse_duration(details.get('duration', '')),
                        'view_count': views,
                        'like_count': likes,
                        'comment_count': comments,
                        'engagement_rate': engagement,
                        'description': clean_text_safe(snippet.get('description', '')),
                        'tags': snippet.get('tags', []),
                        'category_id': snippet.get('categoryId', ''),
                        'category_name': get_category_name(snippet.get('categoryId', '')),
                        'thumbnail_url': snippet.get('thumbnails', {}).get('high', {}).get('url', ''),
                        'default_language': snippet.get('defaultLanguage', ''),
                        'privacy_status': status.get('privacyStatus', ''),
                        'upload_status': status.get('uploadStatus', ''),
                        'made_for_kids': status.get('madeForKids', False),
                        'dimension': details.get('dimension', ''),
                        'definition': details.get('definition', ''),
                        # The API returns caption as the string 'true' or 'false', so compare explicitly
                        'caption': details.get('caption', 'false') == 'true',
                        'licensed_content': details.get('licensedContent', False)
                    }
                
                return results
            except Exception as e:
                print(f"❌ Error: {e}")
                return {}

        # MAIN PROCESS - SINGLE EXECUTION ONLY
        print("📺 FETCHING COMPLETE VIDEO INFORMATION")
        print("=" * 60)

        if 'video_data_list' not in globals() or not video_data_list:
            print("❌ No video data found. Run section 3 first.")
        else:
            # Get valid videos
            valid_videos = [v for v in video_data_list if v.get('status') == 'valid']

            if not valid_videos:
                print("❌ No valid videos found")
            else:
                # Extract IDs and process
                ids = [v['video_id'] for v in valid_videos]
                print(f"🎯 Processing {len(ids)} videos with complete information...")
                print()
                
                # Get complete info in single batch
                results = get_video_info_complete(ids, API_KEY)
                
                if results:
                    success_count = 0
                    
                    # Update each video and display its complete information
                    for video in valid_videos:
                        video_id = video['video_id']
                        
                        if video_id in results:
                            info = results[video_id]
                            video['fetched_info'] = info
                            success_count += 1
                            
                            print(f"✅ {video_id}")
                            print(f"   🎬 {info['title']}")
                            print(f"   📺 {info['channel_title']} | 🏷️ {info['category_name']}")
                            print(f"   📅 {info['published_formatted']} | ⏱️ {info['duration_formatted']}")
                            print(f"   👀 {format_number(info['view_count'])} ({info['view_count']:,}) views")
                            print(f"   👍 {format_number(info['like_count'])} ({info['like_count']:,}) likes")
                            print(f"   💬 {format_number(info['comment_count'])} ({info['comment_count']:,}) comments")
                            print(f"   📊 Engagement: {info['engagement_rate']:.2f}%")
                            print(f"   🎥 Quality: {info['definition'].upper()} | Dimension: {info['dimension']}")
                            
                            # Tags (if any)
                            if info['tags']:
                                tags_display = ', '.join(info['tags'][:5])
                                if len(info['tags']) > 5:
                                    tags_display += f" (+{len(info['tags'])-5} more)"
                                print(f"   🏷️ Tags: {tags_display}")
                            
                            # Status info
                            status_info = []
                            if info['caption']:
                                status_info.append("📝 CC")
                            if info['made_for_kids']:
                                status_info.append("👶 Kids")
                            if info['licensed_content']:
                                status_info.append("©️ Licensed")
                            if status_info:
                                print(f"   ℹ️ Status: {' | '.join(status_info)}")
                            
                            print("=" * 60)
                        else:
                            print(f"❌ {video_id} - Not found")
                    
                    # Summary
                    print(f"\n🎉 SUCCEEDED: {success_count}/{len(valid_videos)} videos")
                    
                    if success_count > 0:
                        processed = [v for v in valid_videos if 'fetched_info' in v]
                        total_views = sum(v['fetched_info']['view_count'] for v in processed)
                        total_likes = sum(v['fetched_info']['like_count'] for v in processed)
                        total_comments = sum(v['fetched_info']['comment_count'] for v in processed)
                        avg_engagement = sum(v['fetched_info']['engagement_rate'] for v in processed) / len(processed)
                        
                        print(f"📈 TOTAL VIEWS: {format_number(total_views)} ({total_views:,})")
                        print(f"👍 TOTAL LIKES: {format_number(total_likes)} ({total_likes:,})")
                        print(f"💬 TOTAL COMMENTS: {format_number(total_comments)} ({total_comments:,})")
                        print(f"📊 AVG ENGAGEMENT: {avg_engagement:.2f}%")
                        print("📝 Complete data is ready for in-depth analysis!")
                else:
                    print("❌ Failed to fetch data")

        print("\n✅ DONE")
        
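    finally:
        # Restore stdout and emit the output exactly once
        sys.stdout = original_stdout

        # Get captured content
        output_content = captured_output.getvalue()
        captured_output.close()

        # Clear all previous output
        clear_output(wait=True)

        # Print the output once
        print(output_content, end='')

        # Mark as done only when at least one video was enriched,
        # so a failed run (e.g. section 3 not yet executed) can simply be re-run
        if any('fetched_info' in v for v in (globals().get('video_data_list') or [])):
            globals()[VIDEO_INFO_COMPLETE_DONE] = True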

🔄 Complete video info has already been fetched!
💡 Reset with: del globals()['video_info_complete_done_20250729']


## 6. Crawling Attribute Configuration

Choose which attributes to extract from YouTube comments for your research analysis.

### 📊 Available Data Categories:

#### 🎯 **Main Comment Data**
- **Comment Text**: Main content of the comment, used for sentiment analysis
- **Author Name**: Display name of the user who commented
- **Publish Date**: Date and time the comment was posted
- **Like Count**: Number of likes on the comment
- **Reply Count**: Number of replies to the comment

#### 📺 **Video Metadata**
- **Video Title**: Title of the video
- **Channel Name**: Name of the channel that owns the video
- **Video Duration**: Length of the video
- **View Count**: Number of views of the video
- **Video Category**: Category of the video (Music, Education, etc.)

#### 💬 **Reply Data (Replies)**
- **Reply Text**: Text of the reply
- **Reply Author**: Name of the reply's author
- **Reply Date**: Time the reply was posted
- **Reply Likes**: Number of likes on the reply

#### 🔍 **Additional Analysis Data**
- **Comment Position**: Ordering of the comment (top, recent)
- **Language Detection**: Detected language of the comment
- **Sentiment Score**: Sentiment score (positive/negative/neutral)
- **Word Count**: Number of words in the comment
- **Has Links**: Whether the comment contains a link
- **Has Mentions**: Whether the comment mentions another user
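
Several of the additional attributes can be derived from the comment text alone, without extra API calls. The sketch below covers word count, links, and mentions. `derive_text_features` and both regular expressions are hypothetical, simplified heuristics and not the notebook's implementation. Language detection and sentiment scoring need a dedicated library and are left out here.

```python
import re

# Simplified patterns, assumed for illustration only
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
MENTION_RE = re.compile(r"(?<!\w)@[\w.\-]+")  # '@name' not preceded by a word character


def derive_text_features(text):
    """Compute word_count, has_links, and has_mentions for one comment."""
    text = text or ""
    return {
        "word_count": len(text.split()),
        "has_links": bool(URL_RE.search(text)),
        "has_mentions": bool(MENTION_RE.search(text)),
    }
```

The lookbehind in `MENTION_RE` keeps e-mail addresses such as `a@b.com` from being counted as mentions.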

### 🎯 **Research Uses:**

| Attribute | Sentiment Analysis | Engagement Analysis | Social Network | Content Analysis |
|---------|-------------------|-------------------|----------------|-----------------|
| Comment Text | ✅ Primary | ✅ Content | ✅ Interaction | ✅ Primary |
| Like Count | ✅ Popularity | ✅ Primary | ❌ | ✅ Engagement |
| Reply Count | ✅ Controversy | ✅ Primary | ✅ Discussion | ✅ Interaction |
| Author Name | ❌ | ✅ User Behavior | ✅ Primary | ✅ Demographics |
| Publish Date | ✅ Trend | ✅ Timeline | ✅ Temporal | ✅ Primary |
| Video Metadata | ✅ Context | ✅ Context | ❌ | ✅ Context |

### ⚡ **Crawling Modes:**

#### 🚀 **Fast Mode**
- Top-level comments only (no replies)
- Essential data: text, author, date, likes
- Suitable for: basic sentiment analysis, engagement overview

#### 🔍 **Complete Mode**  
- Top-level comments plus all replies
- All metadata available
- Suitable for: in-depth analysis, social network analysis

#### 🎯 **Custom Mode**
- Choose attributes to match the research need
- Trade off speed against data completeness
- Suitable for: specific research with a particular focus
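
The three modes can be thought of as presets over one set of attribute flags. The sketch below illustrates that idea with the attribute keys used by the notebook's `crawling_config`. `build_config` and the preset constants are hypothetical, and the notebook's own defaults for fast mode enable a few more fields than the four listed above.

```python
ALL_ATTRIBUTES = (
    "comment_text", "author_name", "publish_date", "like_count", "reply_count",
    "video_title", "video_url", "channel_name", "channel_url", "video_duration",
    "view_count", "video_category", "reply_text", "reply_author", "reply_date",
    "reply_likes", "comment_position", "language_detection", "sentiment_score",
    "word_count", "has_links", "has_mentions",
)
# Assumed preset for this sketch: the four essentials named in the mode description
FAST_ATTRIBUTES = {"comment_text", "author_name", "publish_date", "like_count"}
REPLY_ATTRIBUTES = {"reply_text", "reply_author", "reply_date", "reply_likes"}


def build_config(mode, custom_attributes=None, max_comments=100):
    """Return a config dict for 'fast', 'complete', or 'custom' mode."""
    if mode == "fast":
        enabled = set(FAST_ATTRIBUTES)
    elif mode == "complete":
        enabled = set(ALL_ATTRIBUTES)
    elif mode == "custom":
        enabled = set(custom_attributes or ())
        unknown = enabled - set(ALL_ATTRIBUTES)
        if unknown:
            raise ValueError(f"Unknown attributes: {sorted(unknown)}")
    else:
        raise ValueError(f"Unknown mode: {mode!r}")

    return {
        "mode": mode,
        # Replies are only fetched when at least one reply field is requested
        "include_replies": bool(enabled & REPLY_ATTRIBUTES),
        "max_comments": max_comments,
        "attributes": {name: name in enabled for name in ALL_ATTRIBUTES},
    }
```

Deriving `include_replies` from the selected fields means a custom selection never fetches replies it would then discard.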

### 💡 **Tips for Choosing Attributes:**

1. **For sentiment analysis**: Focus on comment text, like count, and reply count
2. **For engagement analysis**: Prioritize like count, reply count, and publish date
3. **For social network analysis**: You need author name, reply data, and mention detection
4. **For content analysis**: All text data plus video metadata

### ⚠️ **Important Considerations:**

- **API quota**: Collecting more data, replies and video metadata in particular, generally means more API calls
- **Processing time**: Complete mode takes longer
- **Storage**: Complete data produces a larger Excel file
- **Research ethics**: Make sure the collection complies with your institution's research guidelines
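
A rough quota budget can be estimated before crawling. The sketch below assumes the commonly documented figures for the YouTube Data API: a list call costs 1 quota unit, a page holds at most 100 comments, and the default allocation is 10,000 units per day. These are assumptions to verify against the current quota documentation for your project, and `estimate_quota_units` is a hypothetical helper.

```python
import math

PAGE_SIZE = 100               # assumed maximum page size for comment list calls
UNITS_PER_LIST_CALL = 1       # assumed quota cost of one list call
DEFAULT_DAILY_QUOTA = 10_000  # assumed default allocation; your project may differ


def estimate_quota_units(top_level_comments, threads_with_replies=0,
                         avg_replies_per_thread=0):
    """Estimate quota units for paging through comments and, optionally, replies."""
    thread_pages = math.ceil(top_level_comments / PAGE_SIZE)
    # Each thread whose replies are fetched separately needs its own paged calls
    pages_per_thread = math.ceil(avg_replies_per_thread / PAGE_SIZE)
    reply_pages = threads_with_replies * pages_per_thread
    return (thread_pages + reply_pages) * UNITS_PER_LIST_CALL
```

Under these assumptions, 1,000 top-level comments cost about 10 units, while fetching the replies of 200 threads separately adds at least 200 more, which is why replies dominate the budget in complete mode.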

---

**Next**: Configure the attribute selection using the interactive widget in the next cell.

In [14]:
# =========================================================================
# 6. CRAWLING ATTRIBUTE CONFIGURATION WIDGET
# =========================================================================

import ipywidgets as widgets
from IPython.display import display, clear_output, HTML

print("⚙️ Loading crawling attribute configuration...")

# ===== GLOBAL VARIABLES =====
crawling_config = {
    'mode': 'fast',
    'include_replies': False,
    'max_comments': 100,
    'attributes': {
        'comment_text': True,
        'author_name': True,
        'publish_date': True,
        'like_count': True,
        'reply_count': True,
        'video_title': True,
        'video_url': True,
        'channel_name': True,
        'channel_url': False,
        'video_duration': False,
        'view_count': False,
        'video_category': False,
        'reply_text': False,
        'reply_author': False,
        'reply_date': False,
        'reply_likes': False,
        'comment_position': False,
        'language_detection': False,
        'sentiment_score': False,
        'word_count': False,
        'has_links': False,
        'has_mentions': False
    }
}

# ===== ATTRIBUTE DEFINITIONS =====
attribute_definitions = {
    # Data Komentar Utama
    'comment_text': {
        'name': 'Teks Komentar',
        'description': 'Konten utama komentar untuk analisis sentimen dan penelitian konten',
        'category': 'Komentar Utama',
        'research_use': 'Analisis sentimen, content analysis, text mining',
        'api_cost': 'Low'
    },
    'author_name': {
        'name': 'Nama Author',
        'description': 'Nama pengguna yang membuat komentar',
        'category': 'Komentar Utama',
        'research_use': 'Social network analysis, user behavior study',
        'api_cost': 'Low'
    },
    'publish_date': {
        'name': 'Tanggal Publish',
        'description': 'Waktu komentar dibuat (timestamp)',
        'category': 'Komentar Utama',
        'research_use': 'Temporal analysis, trend analysis',
        'api_cost': 'Low'
    },
    'like_count': {
        'name': 'Jumlah Like',
        'description': 'Jumlah like/thumbs up pada komentar',
        'category': 'Komentar Utama',
        'research_use': 'Engagement analysis, popularity measurement',
        'api_cost': 'Low'
    },
    'reply_count': {
        'name': 'Jumlah Balasan',
        'description': 'Jumlah reply terhadap komentar utama',
        'category': 'Komentar Utama',
        'research_use': 'Discussion analysis, controversial content detection',
        'api_cost': 'Low'
    },
    
    # Metadata Video
    'video_title': {
        'name': 'Judul Video',
        'description': 'Judul video YouTube sebagai context',
        'category': 'Metadata Video',
        'research_use': 'Context analysis, topic modeling',
        'api_cost': 'Low'
    },
    'video_url': {
        'name': 'URL Video',
        'description': 'Link lengkap ke video YouTube (https://youtube.com/watch?v=...)',
        'category': 'Metadata Video',
        'research_use': 'Reference linking, citation purposes, data verification',
        'api_cost': 'Low'
    },
    'channel_name': {
        'name': 'Nama Channel',
        'description': 'Nama channel pemilik video',
        'category': 'Metadata Video',
        'research_use': 'Content creator analysis, brand analysis',
        'api_cost': 'Low'
    },
    'channel_url': {
        'name': 'URL Channel',
        'description': 'Link lengkap ke channel YouTube (https://youtube.com/channel/...)',
        'category': 'Metadata Video',
        'research_use': 'Creator profiling, channel analysis, network mapping',
        'api_cost': 'Low'
    },
    'video_duration': {
        'name': 'Durasi Video',
        'description': 'Panjang video dalam detik/menit',
        'category': 'Metadata Video',
        'research_use': 'Content length vs engagement correlation',
        'api_cost': 'Low'
    },
    'view_count': {
        'name': 'Jumlah Views',
        'description': 'Total views video saat crawling',
        'category': 'Metadata Video',
        'research_use': 'Popularity analysis, viral content study',
        'api_cost': 'Low'
    },
    'video_category': {
        'name': 'Kategori Video',
        'description': 'Kategori YouTube (Music, Education, dll)',
        'category': 'Metadata Video',
        'research_use': 'Content categorization, domain-specific analysis',
        'api_cost': 'Low'
    },
    
    # Data Balasan
    'reply_text': {
        'name': 'Teks Balasan',
        'description': 'Konten balasan terhadap komentar utama',
        'category': 'Data Balasan',
        'research_use': 'Deep conversation analysis, thread analysis',
        'api_cost': 'High'
    },
    'reply_author': {
        'name': 'Author Balasan',
        'description': 'Nama pengguna yang membuat balasan',
        'category': 'Data Balasan',
        'research_use': 'Social network mapping, user interaction',
        'api_cost': 'High'
    },
    'reply_date': {
        'name': 'Tanggal Balasan',
        'description': 'Timestamp balasan dibuat',
        'category': 'Data Balasan',
        'research_use': 'Conversation timeline, response patterns',
        'api_cost': 'High'
    },
    'reply_likes': {
        'name': 'Like Balasan',
        'description': 'Jumlah like pada balasan',
        'category': 'Data Balasan',
        'research_use': 'Reply popularity, conversation quality',
        'api_cost': 'High'
    },
    
    # Data Analisis Tambahan
    'comment_position': {
        'name': 'Posisi Komentar',
        'description': 'Urutan komentar (top, recent, relevance)',
        'category': 'Analisis Tambahan',
        'research_use': 'Comment ranking analysis, algorithm study',
        'api_cost': 'Medium'
    },
    'language_detection': {
        'name': 'Deteksi Bahasa',
        'description': 'Bahasa komentar (auto-detected)',
        'category': 'Analisis Tambahan',
        'research_use': 'Multilingual analysis, language distribution',
        'api_cost': 'Medium'
    },
    'sentiment_score': {
        'name': 'Skor Sentimen',
        'description': 'Skor sentimen otomatis (positive/negative/neutral)',
        'category': 'Analisis Tambahan',
        'research_use': 'Automated sentiment analysis, emotion study',
        'api_cost': 'Medium'
    },
    'word_count': {
        'name': 'Jumlah Kata',
        'description': 'Jumlah kata dalam komentar',
        'category': 'Analisis Tambahan',
        'research_use': 'Text length analysis, verbosity measurement',
        'api_cost': 'Low'
    },
    'has_links': {
        'name': 'Mengandung Link',
        'description': 'Boolean: apakah komentar mengandung URL',
        'category': 'Analisis Tambahan',
        'research_use': 'Spam detection, external reference analysis',
        'api_cost': 'Low'
    },
    'has_mentions': {
        'name': 'Menyebut User',
        'description': 'Boolean: apakah komentar menyebut user lain (@username)',
        'category': 'Analisis Tambahan',
        'research_use': 'User interaction analysis, mention patterns',
        'api_cost': 'Low'
    }
}

# ===== MODE PRESETS =====
mode_presets = {
    'fast': {
        'name': 'Mode Cepat',
        'description': 'Ekstraksi cepat dengan data essential',
        'include_replies': False,
        'max_comments': 100,
        'attributes': {
            'comment_text': True, 'author_name': True, 'publish_date': True,
            'like_count': True, 'reply_count': True, 'video_title': True,
            'video_url': True, 'channel_name': True, 'channel_url': False, 'video_duration': False, 'view_count': False,
            'video_category': False, 'reply_text': False, 'reply_author': False,
            'reply_date': False, 'reply_likes': False, 'comment_position': False,
            'language_detection': False, 'sentiment_score': False, 'word_count': False,
            'has_links': False, 'has_mentions': False
        }
    },
    'complete': {
        'name': 'Mode Lengkap',
        'description': 'Ekstraksi lengkap dengan semua data termasuk replies',
        'include_replies': True,
        'max_comments': 200,
        'attributes': {
            'comment_text': True, 'author_name': True, 'publish_date': True,
            'like_count': True, 'reply_count': True, 'video_title': True,
            'video_url': True, 'channel_name': True, 'channel_url': True, 'video_duration': True, 'view_count': True,
            'video_category': True, 'reply_text': True, 'reply_author': True,
            'reply_date': True, 'reply_likes': True, 'comment_position': True,
            'language_detection': True, 'sentiment_score': True, 'word_count': True,
            'has_links': True, 'has_mentions': True
        }
    },
    'custom': {
        'name': 'Mode Kustom',
        'description': 'Pilih atribut sesuai kebutuhan penelitian spesifik',
        'include_replies': False,
        'max_comments': 150,
        'attributes': {
            'comment_text': True, 'author_name': True, 'publish_date': True,
            'like_count': True, 'reply_count': True, 'video_title': True,
            'video_url': True, 'channel_name': True, 'channel_url': False, 'video_duration': False, 'view_count': False,
            'video_category': False, 'reply_text': False, 'reply_author': False,
            'reply_date': False, 'reply_likes': False, 'comment_position': False,
            'language_detection': False, 'sentiment_score': False, 'word_count': False,
            'has_links': False, 'has_mentions': False
        }
    }
}

# ===== WIDGET CREATION =====

# Mode selection
mode_selector = widgets.RadioButtons(
    options=[('🚀 Mode Cepat (Fast)', 'fast'), 
             ('🔍 Mode Lengkap (Complete)', 'complete'), 
             ('🎯 Mode Kustom (Custom)', 'custom')],
    value='fast',
    description='Mode Crawling:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='400px')
)

# General settings
include_replies_checkbox = widgets.Checkbox(
    value=False,
    description='Sertakan Balasan (Replies)',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='250px')
)

max_comments_slider = widgets.IntSlider(
    value=100,
    min=10,
    max=500,
    step=10,
    description='Max Komentar:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='400px')
)

# Attribute checkboxes organized by category
attr_widgets = {}

# Komentar Utama
comment_main_widgets = []
for attr in ['comment_text', 'author_name', 'publish_date', 'like_count', 'reply_count']:
    widget = widgets.Checkbox(
        value=crawling_config['attributes'][attr],
        description=attribute_definitions[attr]['name'],
        style={'description_width': 'initial'},
        layout=widgets.Layout(width='200px')
    )
    attr_widgets[attr] = widget
    comment_main_widgets.append(widget)

# Metadata Video
video_meta_widgets = []
for attr in ['video_title', 'video_url', 'channel_name', 'channel_url', 'video_duration', 'view_count', 'video_category']:
    widget = widgets.Checkbox(
        value=crawling_config['attributes'][attr],
        description=attribute_definitions[attr]['name'],
        style={'description_width': 'initial'},
        layout=widgets.Layout(width='200px')
    )
    attr_widgets[attr] = widget
    video_meta_widgets.append(widget)

# Data Balasan
reply_widgets = []
for attr in ['reply_text', 'reply_author', 'reply_date', 'reply_likes']:
    widget = widgets.Checkbox(
        value=crawling_config['attributes'][attr],
        description=attribute_definitions[attr]['name'],
        style={'description_width': 'initial'},
        layout=widgets.Layout(width='200px')
    )
    attr_widgets[attr] = widget
    reply_widgets.append(widget)

# Analisis Tambahan
analysis_widgets = []
for attr in ['comment_position', 'language_detection', 'sentiment_score', 'word_count', 'has_links', 'has_mentions']:
    widget = widgets.Checkbox(
        value=crawling_config['attributes'][attr],
        description=attribute_definitions[attr]['name'],
        style={'description_width': 'initial'},
        layout=widgets.Layout(width='200px')
    )
    attr_widgets[attr] = widget
    analysis_widgets.append(widget)

# Apply and Reset buttons
apply_config_btn = widgets.Button(
    description='✅ Terapkan Konfigurasi',
    button_style='success',
    layout=widgets.Layout(width='200px', height='40px')
)

reset_config_btn = widgets.Button(
    description='🔄 Reset ke Default',
    button_style='warning',
    layout=widgets.Layout(width='200px', height='40px')
)

save_preset_btn = widgets.Button(
    description='💾 Simpan Preset',
    button_style='info',
    layout=widgets.Layout(width='200px', height='40px')
)

# Output area
config_output = widgets.Output(layout=widgets.Layout(height='300px', border='1px solid #ddd', padding='10px'))

# Status
config_status = widgets.HTML(value="<b>Status:</b> Konfigurasi default dimuat")

# ===== EVENT HANDLERS =====

def update_config_from_mode(mode):
    """Update konfigurasi berdasarkan mode yang dipilih"""
    preset = mode_presets[mode]
    
    # Update general settings
    include_replies_checkbox.value = preset['include_replies']
    max_comments_slider.value = preset['max_comments']
    
    # Update attribute checkboxes
    for attr, value in preset['attributes'].items():
        if attr in attr_widgets:
            attr_widgets[attr].value = value

def on_mode_change(change):
    """Handler ketika mode berubah"""
    if change['name'] == 'value':
        mode = change['new']
        update_config_from_mode(mode)
        crawling_config['mode'] = mode
        
        with config_output:
            clear_output(wait=True)
            preset = mode_presets[mode]
            print(f"🔄 MODE CHANGED: {preset['name']}")
            print("=" * 50)
            print(f"📝 {preset['description']}")
            print(f"💬 Include Replies: {'✅ Ya' if preset['include_replies'] else '❌ Tidak'}")
            print(f"📊 Max Comments: {preset['max_comments']}")
            print("\n🎯 ATRIBUT YANG AKTIF:")
            
            active_attrs = [attr for attr, active in preset['attributes'].items() if active]
            for category in ['Komentar Utama', 'Metadata Video', 'Data Balasan', 'Analisis Tambahan']:
                category_attrs = [attr for attr in active_attrs 
                                if attribute_definitions[attr]['category'] == category]
                if category_attrs:
                    print(f"\n📋 {category}:")
                    for attr in category_attrs:
                        attr_def = attribute_definitions[attr]
                        print(f"   ✅ {attr_def['name']} - {attr_def['description']}")

def apply_configuration(b):
    """Handler apply konfigurasi"""
    global crawling_config
    
    # Update crawling_config
    crawling_config['mode'] = mode_selector.value
    crawling_config['include_replies'] = include_replies_checkbox.value
    crawling_config['max_comments'] = max_comments_slider.value
    
    for attr, widget in attr_widgets.items():
        crawling_config['attributes'][attr] = widget.value
    
    with config_output:
        clear_output(wait=True)
        print("✅ KONFIGURASI BERHASIL DITERAPKAN!")
        print("=" * 60)
        
        # General info
        print(f"🎯 Mode: {mode_presets[crawling_config['mode']]['name']}")
        print(f"💬 Include Replies: {'✅ Ya' if crawling_config['include_replies'] else '❌ Tidak'}")
        print(f"📊 Max Comments: {crawling_config['max_comments']}")
        
        # Active attributes by category
        active_attrs = [attr for attr, active in crawling_config['attributes'].items() if active]
        total_active = len(active_attrs)
        
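        # Derive the denominator from attribute_definitions (22 entries) instead of hard-coding 20
        print(f"\n📊 TOTAL ATRIBUT AKTIF: {total_active}/{len(attribute_definitions)}")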
        
        for category in ['Komentar Utama', 'Metadata Video', 'Data Balasan', 'Analisis Tambahan']:
            category_attrs = [attr for attr in active_attrs 
                            if attribute_definitions[attr]['category'] == category]
            if category_attrs:
                print(f"\n📋 {category} ({len(category_attrs)} atribut):")
                for attr in category_attrs:
                    attr_def = attribute_definitions[attr]
                    print(f"   ✅ {attr_def['name']}")
                    print(f"      📝 {attr_def['description']}")
                    print(f"      🎯 Kegunaan: {attr_def['research_use']}")
                    print(f"      💰 API Cost: {attr_def['api_cost']}")
                    print()
        
        # Estimate API usage
        api_cost_estimate = 'Low'
        high_cost_attrs = [attr for attr in active_attrs 
                          if attribute_definitions[attr]['api_cost'] == 'High']
        medium_cost_attrs = [attr for attr in active_attrs 
                           if attribute_definitions[attr]['api_cost'] == 'Medium']
        
        if high_cost_attrs:
            api_cost_estimate = 'High'
        elif medium_cost_attrs:
            api_cost_estimate = 'Medium'
        
        print("💰 ESTIMASI PENGGUNAAN API:")
        print(f"   📊 Cost Level: {api_cost_estimate}")
        if crawling_config['include_replies']:
            print("   ⚠️ Include replies akan meningkatkan penggunaan API secara signifikan")
        # commentThreads.list returns at most 100 threads per call: ceil(max_comments / 100)
        print(f"   📈 Estimasi API calls per video: {-(-crawling_config['max_comments'] // 100)}")
        
        print("\n🎉 Konfigurasi siap! Lanjutkan ke tahap crawling.")
    
    config_status.value = f"<b>Status:</b> ✅ Konfigurasi aktif - {total_active} atribut dipilih"

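def reset_configuration(b):
    """Reset the widgets and crawling_config to the Fast Mode preset"""
    mode_selector.value = 'fast'
    update_config_from_mode('fast')
    
    # Sync crawling_config too; otherwise the crawler keeps the previously applied values
    preset = mode_presets['fast']
    crawling_config['mode'] = 'fast'
    crawling_config['include_replies'] = preset['include_replies']
    crawling_config['max_comments'] = preset['max_comments']
    crawling_config['attributes'] = dict(preset['attributes'])
    
    with config_output:
        clear_output(wait=True)
        print("🔄 Konfigurasi direset ke Mode Cepat default")
    
    config_status.value = "<b>Status:</b> 🔄 Konfigurasi direset ke default"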

def save_preset(b):
    """Handler simpan preset"""
    with config_output:
        clear_output(wait=True)
        print("💾 PRESET KONFIGURASI TERSIMPAN!")
        print("=" * 40)
        print("📋 Konfigurasi saat ini:")
        print(f"   Mode: {crawling_config['mode']}")
        print(f"   Include Replies: {crawling_config['include_replies']}")
        print(f"   Max Comments: {crawling_config['max_comments']}")
        
        active_attrs = [attr for attr, active in crawling_config['attributes'].items() if active]
        print(f"   Active Attributes: {len(active_attrs)}")
        
        print("\n💡 Preset tersimpan di variabel global 'crawling_config'")
        print("🎯 Siap digunakan untuk crawling!")

# ===== EVENT BINDING =====
mode_selector.observe(on_mode_change, names='value')
apply_config_btn.on_click(apply_configuration)
reset_config_btn.on_click(reset_configuration)
save_preset_btn.on_click(save_preset)

# ===== LAYOUT CONSTRUCTION =====

# Mode and general settings
mode_section = widgets.VBox([
    widgets.HTML("<h4>🎯 Pilih Mode Crawling</h4>"),
    mode_selector,
    widgets.HTML("<h4>⚙️ Pengaturan Umum</h4>"),
    widgets.HBox([include_replies_checkbox, widgets.HTML("<span style='margin-left: 20px;'></span>")]),
    max_comments_slider
], layout=widgets.Layout(margin='10px 0px'))

# Attribute selection sections
comment_section = widgets.VBox([
    widgets.HTML("<h4>🎯 Data Komentar Utama</h4>"),
    widgets.HBox(comment_main_widgets[:3]),
    widgets.HBox(comment_main_widgets[3:])
])

video_section = widgets.VBox([
    widgets.HTML("<h4>📺 Metadata Video</h4>"),
    widgets.HBox(video_meta_widgets[:3]),
    widgets.HBox(video_meta_widgets[3:])
])

reply_section = widgets.VBox([
    widgets.HTML("<h4>💬 Data Balasan</h4>"),
    widgets.HBox(reply_widgets)
])

analysis_section = widgets.VBox([
    widgets.HTML("<h4>🔍 Analisis Tambahan</h4>"),
    widgets.HBox(analysis_widgets[:3]),
    widgets.HBox(analysis_widgets[3:])
])

# Action buttons
action_buttons = widgets.HBox([
    apply_config_btn,
    reset_config_btn,
    save_preset_btn
], layout=widgets.Layout(justify_content='space-between', width='100%'))

# Main container
main_config_container = widgets.VBox([
    widgets.HTML("<h3 style='color: #1976D2; margin-bottom: 20px;'>⚙️ Konfigurasi Atribut Crawling</h3>"),
    widgets.HTML("<div style='color: #666; font-size: 12px; margin: -15px 0 20px 0; font-style: italic;'>Developed by Ferdian Bangkit Wijaya, Universitas Sultan Ageng Tirtayasa</div>"),
    mode_section,
    widgets.HTML("<hr style='margin: 20px 0;'>"),
    comment_section,
    video_section,
    reply_section,
    analysis_section,
    widgets.HTML("<hr style='margin: 20px 0;'>"),
    action_buttons,
    config_status,
    widgets.HTML("<h4 style='margin: 15px 0 10px 0;'>📋 Output Konfigurasi</h4>"),
    config_output
], layout=widgets.Layout(
    border='2px solid #1976D2',
    padding='25px',
    margin='10px',
    width='100%',
    max_width='1200px',
    background_color='#fafafa'
))

# Display the widget
display(main_config_container)

# Initialize with default mode
on_mode_change({'name': 'value', 'new': 'fast'})

print("✅ Widget Konfigurasi Atribut Crawling berhasil dimuat!")
print("\n🎯 FITUR YANG TERSEDIA:")
print("   • 🚀 3 Mode preset (Fast, Complete, Custom)")
print("   • ⚙️ Konfigurasi atribut detail per kategori")
print("   • 💬 Opsi include/exclude replies")
print("   • 📊 Slider max comments (10-500)")
print("   • 💰 Estimasi penggunaan API otomatis")
print("   • 📋 Penjelasan detail setiap atribut")
print("   • 💾 Save/reset konfigurasi")
print("\n💡 CARA PENGGUNAAN:")
print("1. 🎯 Pilih mode sesuai kebutuhan penelitian")
print("2. ⚙️ Sesuaikan pengaturan umum jika perlu")
print("3. ✅ Centang/uncheck atribut sesuai keinginan (untuk Custom mode)")
print("4. ✅ Klik 'Terapkan Konfigurasi' untuk menyimpan")
print("5. 🎉 Lanjutkan ke tahap crawling komentar!")

⚙️ Memuat Konfigurasi Atribut Crawling...


VBox(children=(HTML(value="<h3 style='color: #1976D2; margin-bottom: 20px;'>⚙️ Konfigurasi Atribut Crawling</h…

✅ Widget Konfigurasi Atribut Crawling berhasil dimuat!

🎯 FITUR YANG TERSEDIA:
   • 🚀 3 Mode preset (Fast, Complete, Custom)
   • ⚙️ Konfigurasi atribut detail per kategori
   • 💬 Opsi include/exclude replies
   • 📊 Slider max comments (10-500)
   • 💰 Estimasi penggunaan API otomatis
   • 📋 Penjelasan detail setiap atribut
   • 💾 Save/reset konfigurasi

💡 CARA PENGGUNAAN:
1. 🎯 Pilih mode sesuai kebutuhan penelitian
2. ⚙️ Sesuaikan pengaturan umum jika perlu
3. ✅ Centang/uncheck atribut sesuai keinginan (untuk Custom mode)
4. ✅ Klik 'Terapkan Konfigurasi' untuk menyimpan
5. 🎉 Lanjutkan ke tahap crawling komentar!


## 7. YouTube Comment Crawling Process

The main stage: extracting comments from YouTube videos according to the selected configuration.

### 🎯 **Crawling Process:**

#### 📥 **Required Inputs:**
- ✅ **Video URLs**: from section 3 (URLs Manager)
- ✅ **API Key**: from section 4 (API Configuration)
- ✅ **Video Info**: from section 5 (Video Information)
- ✅ **Attribute Config**: from section 6 (Attribute Configuration)

#### 🔄 **Crawling Flow:**
1. **Validate Prerequisites**: check that every required component is present
2. **Iterate Videos**: loop over each valid video
3. **Extract Comments**: fetch comments according to the configuration
4. **Process Replies**: extract replies when enabled
5. **Apply Attributes**: compute the additional attributes that were selected
6. **Real-time Progress**: monitor progress and handle errors
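
The flow above can be sketched as a plain loop. This is an illustration under assumptions, not the crawler defined in the next cell: `crawl_videos` is a hypothetical name, `extract` stands in for any function that returns a list of comments, and the `video_id` key on each video record is assumed (only the `status` key appears in this notebook section).

```python
def crawl_videos(videos, extract, config):
    """Run `extract` over every valid video and collect results (illustrative only)."""
    results, errors = [], []
    for video in videos:
        if video.get("status") != "valid":
            continue  # step 2: only valid videos are crawled
        try:
            # steps 3-5 are delegated to the extractor
            comments = extract(
                video["video_id"], config["max_comments"], config["include_replies"]
            )
        except Exception as exc:
            # step 6: record the failure and move on to the next video
            errors.append((video["video_id"], str(exc)))
            continue
        results.append({"video_id": video["video_id"], "comments": comments})
    return results, errors
```

Collecting errors per video instead of aborting lets one private or deleted video fail without losing the rest of the run.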

### 📊 **Crawling Features:**

#### ⚡ **Smart Crawling:**
- **Batch Processing**: fewer API calls through batched requests
- **Error Recovery**: automatic retry for failed requests
- **Rate Limiting**: respects YouTube API rate limits
- **Progress Tracking**: real-time progress bar and statistics

#### 🛡️ **Error Handling:**
- **API Quota Management**: monitors quota usage and raises alerts
- **Video Availability**: handles private or deleted videos
- **Network Issues**: retry mechanism for failed connections
- **Data Validation**: validates the extracted data

#### 📈 **Real-time Monitoring:**
- **Progress Bar**: visual progress for each video
- **Statistics**: live counts of comments, replies, and errors
- **Time Estimates**: completion ETA based on current speed
- **API Usage**: real-time quota monitoring
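
Automatic retry is commonly implemented with exponential backoff. The sketch below shows the general technique and is not taken from the crawler cell. `call_with_retry` is a hypothetical helper, and it retries on any exception for brevity; a real crawler would more likely retry only on transient failures such as HTTP 5xx responses or network errors.

```python
import random
import time

def call_with_retry(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 1s, 2s, 4s, ... plus a little jitter to avoid synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
```

A request object could then be executed as `call_with_retry(request.execute)`.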
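
A completion ETA can be derived from the average speed so far. The sketch below assumes progress is counted in finished videos and that speed stays roughly constant; `estimate_eta` is a hypothetical name.

```python
from datetime import timedelta

def estimate_eta(done, total, elapsed_seconds):
    """Return the estimated remaining time, or None before any progress exists."""
    if done <= 0 or elapsed_seconds <= 0:
        return None
    remaining = max(total - done, 0)
    # remaining items * average seconds per finished item
    return timedelta(seconds=round(remaining * elapsed_seconds / done))

print(estimate_eta(done=3, total=10, elapsed_seconds=332))  # 0:12:55
```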

### 🎮 **Crawling Controls:**

#### ▶️ **Start Crawling:**
- Start crawling with the selected configuration
- Option to continue an interrupted crawl
- Pause/resume support

#### ⏸️ **Pause/Stop:**
- Pause temporarily to save quota
- Stop completely, with an option to save progress
- Emergency stop for urgent situations

#### 🔄 **Resume/Retry:**
- Resume crawling from the point of interruption
- Selectively retry failed videos
- Skip problematic videos and continue
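
Resuming from the point of interruption requires progress to be persisted somewhere. One simple option, sketched below, is a JSON checkpoint file. The helper names and the file layout are assumptions; the keys `current_video_index` and `results` mirror the `crawl_data` structure used by the crawler.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write state to path atomically so an interrupted save cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, ensure_ascii=False)
    os.replace(tmp_path, path)  # atomic rename over the old checkpoint

def load_checkpoint(path):
    """Return the saved state, or a fresh state when no checkpoint exists."""
    if not os.path.exists(path):
        return {"current_video_index": 0, "results": []}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

On restart, the crawl loop would begin at `load_checkpoint(path)['current_video_index']` rather than at 0.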

### 📊 **Real-time Output:**

#### 📈 **Live Statistics:**
```
🎬 Video Progress: 3/10 (30%)
💬 Comments Extracted: 2,847
🔄 Replies Processed: 1,205
⏱️ Time Elapsed: 05:32
🕐 ETA Completion: 12:45
💰 API Calls Used: 127/10,000
```

#### 🎯 **Per-Video Status:**
```
✅ Video 1: "Tutorial Python" - 256 comments (Done)
⏳ Video 2: "Data Science Tips" - 198/300 comments (66%)
⏸️ Video 3: "Machine Learning" - Paused at 45 comments
❌ Video 4: "Private Video" - Skipped (Private)
⏳ Video 5: "AI Tutorial" - Starting...
```

### 🔍 **Quality Control:**

#### ✅ **Data Validation:**
- **Text Encoding**: Proper UTF-8 handling
- **Date Formats**: Consistent timestamp formatting
- **Numeric Data**: validation for likes, views, and counts
- **Missing Data**: graceful handling of missing data
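
These checks can be applied record by record. The sketch below normalizes one comment dictionary under the assumption that it uses the attribute keys from section 6 and that timestamps arrive in the API's ISO 8601 form (for example `2024-01-31T12:00:00Z`). `normalize_comment` is a hypothetical helper.

```python
from datetime import datetime

def normalize_comment(raw):
    """Coerce one comment dict into consistent types; missing fields get safe defaults."""
    def to_count(value):
        try:
            return max(int(value), 0)
        except (TypeError, ValueError):
            return 0  # missing or malformed counts fall back to 0

    published = str(raw.get("publish_date") or "")
    try:
        # fromisoformat needs an explicit offset instead of the trailing "Z"
        publish_date = datetime.fromisoformat(published.replace("Z", "+00:00")).isoformat()
    except ValueError:
        publish_date = None

    return {
        "comment_text": str(raw.get("comment_text") or "").strip(),
        "publish_date": publish_date,
        "like_count": to_count(raw.get("like_count")),
        "reply_count": to_count(raw.get("reply_count")),
    }
```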

#### 🚨 **Error Detection:**
- **Spam Comments**: Basic spam detection
- **Duplicate Removal**: Auto-deduplication
- **Invalid Characters**: Clean problematic characters
- **API Response Validation**: Verify response integrity
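
Deduplication can be as simple as keeping the first occurrence of each author and text pair. The sketch below assumes the records carry `author_name` and `comment_text`, and the matching rule (case-insensitive, whitespace-normalized) is an illustrative choice rather than the notebook's definition of a duplicate.

```python
def deduplicate_comments(comments):
    """Keep the first occurrence of each (author, normalized text) pair."""
    seen = set()
    unique = []
    for comment in comments:
        key = (
            comment.get("author_name", ""),
            # collapse whitespace and ignore case when comparing texts
            " ".join(comment.get("comment_text", "").split()).casefold(),
        )
        if key in seen:
            continue
        seen.add(key)
        unique.append(comment)
    return unique
```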

### ⚠️ **Considerations:**

#### 💰 **API Quota Management:**
- **Daily Limits**: the YouTube Data API has a default daily quota of 10,000 units per project
- **Cost per Request**: each `commentThreads.list` or `comments.list` call costs 1 unit
- **Optimization**: batch requests for efficiency
- **Monitoring**: real-time quota usage tracking

#### 🕐 **Time Considerations:**
- **Processing Speed**: roughly 50-100 comments per minute
- **Video Popularity Impact**: popular videos have more comments and take longer
- **Network Speed**: internet speed affects crawl speed
- **API Response Time**: YouTube server response times vary

#### 📊 **Data Size:**
- **Memory Usage**: large datasets need enough RAM
- **Storage Space**: Excel files can exceed 50 MB
- **Processing Time**: post-processing the data takes time
- **Export Format**: multiple output formats (Excel, CSV, JSON)
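
For the CSV and JSON outputs, the standard library is sufficient. The sketch below is one possible export routine, not the notebook's exporter; `export_comments` is a hypothetical name, and the records are assumed to be flat dictionaries (nested reply lists would need flattening first).

```python
import csv
import json

def export_comments(comments, csv_path, json_path):
    """Write flat comment dicts to CSV and JSON; returns the CSV column order."""
    fieldnames = sorted({key for row in comments for key in row})
    # utf-8-sig adds a BOM so that Excel detects the encoding when opening the CSV
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(comments)
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(comments, handle, ensure_ascii=False, indent=2)
    return fieldnames
```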

### 🎯 **Best Practices:**

1. **🧪 Start Small**: begin with 1-2 videos as a test
2. **⏰ Monitor Quota**: check API usage regularly
3. **💾 Save Progress**: save progress at regular intervals
4. **🔄 Batch Processing**: process in batches for efficiency
5. **📊 Validate Data**: always check the quality of the crawled data

### 🚀 **Ready to Crawl:**

Make sure every prerequisite is met:
- ✅ Video URLs have been validated
- ✅ API Key has been tested and is valid
- ✅ Attribute configuration has been applied
- ✅ Internet connection is stable

---

**Next**: Use the crawling widget in the next cell to start extracting comments.

In [15]:
# =========================================================================
# 7. YOUTUBE COMMENTS CRAWLER - MAIN CRAWLING ENGINE
# =========================================================================

import ipywidgets as widgets
from IPython.display import display, clear_output, HTML
import pandas as pd
import time
import threading
from datetime import datetime, timedelta
import json
import re
from textblob import TextBlob

print("🚀 Memuat YouTube Comments Crawler...")

# ===== GLOBAL VARIABLES =====
crawl_data = {
    'results': [],
    'stats': {
        'total_videos': 0,
        'completed_videos': 0,
        'total_comments': 0,
        'total_replies': 0,
        'api_calls_used': 0,
        'errors': 0,
        'start_time': None,
        'current_video': None
    },
    'status': 'ready',  # ready, running, paused, completed, error
    'current_video_index': 0,
    'thread': None
}

# ===== PREREQUISITE VALIDATION =====
def validate_prerequisites():
    """Validasi semua prerequisite sebelum crawling"""
    errors = []
    warnings = []
    
    # Check API Key
    if not globals().get('API_KEY'):  # covers both an undefined and an empty key
        errors.append("❌ API Key belum diset. Jalankan bagian 4 terlebih dahulu.")
    
    # Check video data
    if not globals().get('video_data_list') or not video_data_list:
        errors.append("❌ Data video tidak ditemukan. Jalankan bagian 3 terlebih dahulu.")
    else:
        valid_videos = [v for v in video_data_list if v.get('status') == 'valid']
        if not valid_videos:
            errors.append("❌ Tidak ada video yang valid untuk di-crawl.")
        elif len(valid_videos) > 50:
            warnings.append(f"⚠️ {len(valid_videos)} video akan di-crawl. Ini akan menggunakan banyak API quota.")
    
    # Check crawling config
    if not globals().get('crawling_config'):
        errors.append("❌ Konfigurasi crawling belum diset. Jalankan bagian 6 terlebih dahulu.")
    else:
        active_attrs = sum(1 for v in crawling_config['attributes'].values() if v)
        if active_attrs == 0:
            errors.append("❌ Tidak ada atribut yang dipilih untuk di-crawl.")
    
    # Check API service (skipped when no key is set; that case is already reported above)
    api_key = globals().get('API_KEY')
    if api_key:
        try:
            youtube_service = create_youtube_service(api_key)
            if not youtube_service:
                errors.append("❌ Tidak dapat membuat YouTube service. Periksa API Key.")
        except Exception as e:
            errors.append(f"❌ Error testing API: {str(e)}")
    
    return errors, warnings

# ===== CRAWLING FUNCTIONS =====
def extract_comments_from_video(video_id, max_comments=100, include_replies=False):
    """Ekstrak komentar dari satu video"""
    try:
        youtube = create_youtube_service(API_KEY)
        if not youtube:
            return [], 0
        
        comments = []
        api_calls = 0
        next_page_token = None
        
        while len(comments) < max_comments:
            try:
                # Request comments
                request = youtube.commentThreads().list(
                    part='snippet,replies',
                    videoId=video_id,
                    maxResults=min(100, max_comments - len(comments)),
                    order='relevance',
                    pageToken=next_page_token,
                    textFormat='plainText'
                )
                
                response = request.execute()
                api_calls += 1
                
                if not response.get('items'):
                    break
                
                for item in response['items']:
                    comment_data = process_comment_item(item, include_replies)
                    comments.append(comment_data)
                    
                    # Update real-time stats
                    crawl_data['stats']['total_comments'] += 1
                    if include_replies and 'replies' in comment_data:
                        crawl_data['stats']['total_replies'] += len(comment_data['replies'])
                
                next_page_token = response.get('nextPageToken')
                if not next_page_token:
                    break
                    
                # Small delay to respect rate limits
                time.sleep(0.1)
                
            except Exception as e:
                print(f"⚠️ Error getting comments: {e}")
                break
        
        return comments, api_calls
        
    except Exception as e:
        print(f"❌ Error in extract_comments_from_video: {e}")
        return [], 0

def process_comment_item(item, include_replies=False):
    """Process single comment item dengan atribut yang dipilih"""
    snippet = item['snippet']['topLevelComment']['snippet']
    comment_data = {}
    
    # Get selected attributes from config
    attrs = crawling_config['attributes']
    
    # Basic comment data
    if attrs.get('comment_text'):
        comment_data['comment_text'] = clean_text_safe(snippet.get('textDisplay', ''))
    
    if attrs.get('author_name'):
        comment_data['author_name'] = clean_text_safe(snippet.get('authorDisplayName', ''))
    
    if attrs.get('publish_date'):
        comment_data['publish_date'] = snippet.get('publishedAt', '')
    
    if attrs.get('like_count'):
        comment_data['like_count'] = snippet.get('likeCount', 0)
    
    if attrs.get('reply_count'):
        comment_data['reply_count'] = item['snippet'].get('totalReplyCount', 0)
    
    # Additional analysis attributes
    if attrs.get('word_count') and comment_data.get('comment_text'):
        comment_data['word_count'] = len(comment_data['comment_text'].split())
    
    if attrs.get('has_links') and comment_data.get('comment_text'):
        comment_data['has_links'] = bool(re.search(r'http[s]?://|www\.', comment_data['comment_text']))
    
    if attrs.get('has_mentions') and comment_data.get('comment_text'):
        comment_data['has_mentions'] = bool(re.search(r'@\w+', comment_data['comment_text']))
    
    if attrs.get('sentiment_score') and comment_data.get('comment_text'):
        try:
            # Compute polarity once, then map it to a label with a ±0.1 neutral band
            polarity = TextBlob(comment_data['comment_text']).sentiment.polarity
            if polarity > 0.1:
                label = 'positive'
            elif polarity < -0.1:
                label = 'negative'
            else:
                label = 'neutral'
            comment_data['sentiment_score'] = polarity
            comment_data['sentiment_label'] = label
        except Exception:
            comment_data['sentiment_score'] = 0
            comment_data['sentiment_label'] = 'neutral'
    
    if attrs.get('language_detection') and comment_data.get('comment_text'):
        try:
            # NOTE: detect_language() is deprecated/removed in recent TextBlob releases;
            # where it is unavailable, the fallback below records 'unknown'
            blob = TextBlob(comment_data['comment_text'])
            comment_data['detected_language'] = blob.detect_language()
        except Exception:
            comment_data['detected_language'] = 'unknown'
    
    # Process replies if enabled
    if include_replies and 'replies' in item and attrs.get('reply_text'):
        replies = []
        for reply_item in item['replies']['comments']:
            reply_snippet = reply_item['snippet']
            reply_data = {}
            
            if attrs.get('reply_text'):
                reply_data['reply_text'] = clean_text_safe(reply_snippet.get('textDisplay', ''))
            if attrs.get('reply_author'):
                reply_data['reply_author'] = clean_text_safe(reply_snippet.get('authorDisplayName', ''))
            if attrs.get('reply_date'):
                reply_data['reply_date'] = reply_snippet.get('publishedAt', '')
            if attrs.get('reply_likes'):
                reply_data['reply_likes'] = reply_snippet.get('likeCount', 0)
            
            replies.append(reply_data)
        
        comment_data['replies'] = replies
    
    return comment_data

def clean_text_safe(text):
    """Safe text cleaning"""
    if not text:
        return ""
    try:
        # Remove excessive whitespace and newlines
        cleaned = re.sub(r'\s+', ' ', str(text)).strip()
        return cleaned
    except Exception:
        return str(text)

def crawl_single_video(video_info, video_index, total_videos):
    """Crawl a single video and update the progress stats"""
    try:
        video_id = video_info['video_id']
        crawl_data['stats']['current_video'] = video_info.get('fetched_info', {}).get('title', video_id)
        
        # Extract comments
        max_comments = crawling_config['max_comments']
        include_replies = crawling_config['include_replies']
        
        comments, api_calls = extract_comments_from_video(video_id, max_comments, include_replies)
        crawl_data['stats']['api_calls_used'] += api_calls
        
        # Create video result
        video_result = {
            'video_id': video_id,
            'comments': comments,
            'comment_count': len(comments),
            'crawl_timestamp': datetime.now().isoformat()
        }
        
        # Add video metadata if selected
        if video_info.get('fetched_info'):
            video_meta = video_info['fetched_info']
            attrs = crawling_config['attributes']
            
            if attrs.get('video_title'):
                video_result['video_title'] = video_meta.get('title', '')
            if attrs.get('video_url'):
                video_result['video_url'] = video_meta.get('video_url', f'https://www.youtube.com/watch?v={video_id}')
            if attrs.get('channel_name'):
                video_result['channel_name'] = video_meta.get('channel_title', '')
            if attrs.get('channel_url'):
                video_result['channel_url'] = video_meta.get('channel_url', '')
            if attrs.get('video_duration'):
                video_result['video_duration'] = video_meta.get('duration_formatted', '')
            if attrs.get('view_count'):
                video_result['view_count'] = video_meta.get('view_count', 0)
            if attrs.get('video_category'):
                video_result['video_category'] = video_meta.get('category_name', '')
        
        crawl_data['results'].append(video_result)
        crawl_data['stats']['completed_videos'] += 1
        
        return True, len(comments)
        
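    except Exception as e:
        crawl_data['stats']['errors'] += 1
        # Read the ID via .get() so a missing 'video_id' key cannot raise NameError here
        print(f"❌ Error crawling video {video_info.get('video_id', 'unknown')}: {e}")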
        return False, 0

def crawling_worker():
    """Main crawling worker thread"""
    try:
        crawl_data['status'] = 'running'
        crawl_data['stats']['start_time'] = datetime.now()
        
        # Get valid videos
        valid_videos = [v for v in video_data_list if v.get('status') == 'valid']
        crawl_data['stats']['total_videos'] = len(valid_videos)
        
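        for i, video_info in enumerate(valid_videos):
            # Wait here while paused so that Resume continues with the next video
            while crawl_data['status'] == 'paused':
                time.sleep(0.2)
            
            # Any other non-running status (e.g. 'stopped') ends the loop
            if crawl_data['status'] != 'running':
                break
            
            crawl_data['current_video_index'] = i
            success, comment_count = crawl_single_video(video_info, i + 1, len(valid_videos))
            
            # Update progress display
            update_progress_display()
            
            # Small delay between videos
            time.sleep(0.5)
        
        # Mark the run as completed only if the user did not stop it
        if crawl_data['status'] == 'running':
            crawl_data['status'] = 'completed'
        update_progress_display()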
        
    except Exception as e:
        crawl_data['status'] = 'error'
        crawl_data['error_message'] = str(e)
        print(f"❌ Crawling error: {e}")

# ===== WIDGET CREATION =====

# Status and control widgets
crawl_status = widgets.HTML(value="""<b>Status:</b> Siap untuk crawling<br/>
<span style='color: #666; font-size: 11px;'>
💡 <b>Tips:</b> Pastikan semua konfigurasi sudah sesuai sebelum memulai crawling
</span>""")
progress_bar = widgets.IntProgress(value=0, min=0, max=100, description='Progress:', bar_style='info')

# Control buttons
start_crawl_btn = widgets.Button(
    description='🚀 Start Crawling',
    button_style='success',
    layout=widgets.Layout(width='150px', height='40px')
)

pause_crawl_btn = widgets.Button(
    description='⏸️ Pause',
    button_style='warning',
    layout=widgets.Layout(width='100px', height='40px'),
    disabled=True
)

stop_crawl_btn = widgets.Button(
    description='⏹️ Stop',
    button_style='danger',
    layout=widgets.Layout(width='100px', height='40px'),
    disabled=True
)

export_data_btn = widgets.Button(
    description='💾 Export (3 Formats)',
    button_style='info',
    layout=widgets.Layout(width='150px', height='40px'),
    disabled=True
)

# Statistics widgets
video_progress = widgets.HTML(value="<b>Video:</b> 0/0")
comment_stats = widgets.HTML(value="<b>Comments:</b> 0")
api_usage = widgets.HTML(value="<b>API Calls:</b> 0")
time_elapsed = widgets.HTML(value="<b>Time:</b> 00:00")

# Output area
crawl_output = widgets.Output(layout=widgets.Layout(height='300px', border='1px solid #ddd', padding='10px'))

# ===== EVENT HANDLERS =====

def start_crawling(b):
    """Start crawling process"""
    # Validate prerequisites
    errors, warnings = validate_prerequisites()
    
    with crawl_output:
        clear_output(wait=True)
        
        if errors:
            print("❌ CRAWLING TIDAK DAPAT DIMULAI!")
            print("=" * 50)
            for error in errors:
                print(error)
            return
        
        if warnings:
            print("⚠️ PERINGATAN:")
            for warning in warnings:
                print(warning)
            print("\n")
        
        print("🚀 MEMULAI CRAWLING...")
        print("=" * 50)
        
        # Reset stats
        crawl_data['results'] = []
        crawl_data['stats'] = {
            'total_videos': 0, 'completed_videos': 0, 'total_comments': 0,
            'total_replies': 0, 'api_calls_used': 0, 'errors': 0,
            'start_time': None, 'current_video': None
        }
        crawl_data['current_video_index'] = 0
        
        # Update UI
        start_crawl_btn.disabled = True
        pause_crawl_btn.disabled = False
        stop_crawl_btn.disabled = False
        export_data_btn.disabled = True
        
        # Show current config
        print("⚙️ KONFIGURASI CRAWLING:")
        print(f"   Mode: {crawling_config['mode']}")
        print(f"   Max Comments: {crawling_config['max_comments']}")
        print(f"   Include Replies: {crawling_config['include_replies']}")
        
        active_attrs = [k for k, v in crawling_config['attributes'].items() if v]
        print(f"   Active Attributes: {len(active_attrs)}")
        
        print("\n🎬 MEMPROSES VIDEO...")
        
        # Start crawling thread
        crawl_data['thread'] = threading.Thread(target=crawling_worker)
        crawl_data['thread'].start()

def pause_crawling(b):
    """Pause/Resume crawling"""
    if crawl_data['status'] == 'running':
        crawl_data['status'] = 'paused'
        pause_crawl_btn.description = '▶️ Resume'
        crawl_status.value = """<b>Status:</b> ⏸️ Crawling dijeda<br/>
        <span style='color: #FF9800; font-size: 12px;'>
        💡 Klik "▶️ Resume" untuk melanjutkan, atau "⏹️ Stop" untuk mengakhiri
        </span>"""
    elif crawl_data['status'] == 'paused':
        crawl_data['status'] = 'running'
        pause_crawl_btn.description = '⏸️ Pause'
        crawl_status.value = "<b>Status:</b> 🔄 Crawling berlanjut..."

def stop_crawling(b):
    """Stop crawling completely"""
    crawl_data['status'] = 'stopped'
    
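    # Reset UI (including the pause button label, in case we were paused)
    start_crawl_btn.disabled = False
    pause_crawl_btn.disabled = True
    pause_crawl_btn.description = '⏸️ Pause'
    stop_crawl_btn.disabled = True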
    
    if crawl_data['stats']['completed_videos'] > 0:
        export_data_btn.disabled = False
        crawl_status.value = """<b>Status:</b> ⏹️ Crawling dihentikan<br/>
        <span style='color: #FF9800; font-size: 12px;'>
        💡 <b>Ada data parsial yang bisa di-export!</b><br/>
        • Klik "💾 Export (3 Formats)" untuk menyimpan progress<br/>
        • Untuk crawling ulang: Sesuaikan konfigurasi di <b>Cell 14</b> lalu start lagi<br/>
        • Untuk ganti video: Kembali ke <b>Cell 7 (URLs Manager)</b>
        </span>"""
    else:
        crawl_status.value = """<b>Status:</b> ⏹️ Crawling dihentikan<br/>
        <span style='color: #666; font-size: 12px;'>
        💡 Tidak ada data untuk di-export. Untuk mengulang:<br/>
        • Periksa konfigurasi di <b>Cell 14 (Konfigurasi Atribut)</b><br/>
        • Atau ganti video di <b>Cell 7 (URLs Manager)</b>
        </span>"""
    
    with crawl_output:
        print(f"\n⏹️ CRAWLING DIHENTIKAN!")
        print(f"📊 Progress: {crawl_data['stats']['completed_videos']}/{crawl_data['stats']['total_videos']} video")
        print(f"💬 Comments: {crawl_data['stats']['total_comments']}")

def export_crawl_data(b):
    """Export crawling results to Excel, JSON, and TXT formats"""
    if not crawl_data['results']:
        with crawl_output:
            print("❌ Tidak ada data untuk di-export!")
        return
    
    try:
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        base_filename = f"youtube_comments_crawl_{timestamp}"
        
        with crawl_output:
            print(f"\n💾 MENGEKSPOR DATA KE 3 FORMAT...")
            print("=" * 60)
        
        # ===== 1. EXCEL FORMAT (.xlsx) =====
        excel_data = []
        
        for video_result in crawl_data['results']:
            video_base = {
                'video_id': video_result['video_id'],
                'video_title': video_result.get('video_title', ''),
                'video_url': video_result.get('video_url', ''),
                'channel_name': video_result.get('channel_name', ''),
                'channel_url': video_result.get('channel_url', ''),
                'video_duration': video_result.get('video_duration', ''),
                'view_count': video_result.get('view_count', 0),
                'video_category': video_result.get('video_category', ''),
                'crawl_timestamp': video_result['crawl_timestamp']
            }
            
            for comment in video_result['comments']:
                row = {**video_base, **comment}
                
                # Handle replies
                if 'replies' in comment and comment['replies']:
                    for reply in comment['replies']:
                        reply_row = {**row, **reply}
                        reply_row['is_reply'] = True
                        excel_data.append(reply_row)
                else:
                    row['is_reply'] = False
                    excel_data.append(row)
        
        # Export to Excel
        df = pd.DataFrame(excel_data)
        excel_filename = f"{base_filename}.xlsx"
        df.to_excel(excel_filename, index=False, engine='openpyxl')
        
        with crawl_output:
            print(f"✅ EXCEL (.xlsx): {excel_filename}")
            print(f"   📊 Rows: {len(df):,} | Columns: {len(df.columns)}")
        
        # ===== 2. JSON FORMAT (.json) =====
        # Prepare structured JSON data
        json_data = {
            'export_info': {
                'timestamp': timestamp,
                'total_videos': len(crawl_data['results']),
                'total_comments': crawl_data['stats']['total_comments'],
                'total_replies': crawl_data['stats']['total_replies'],
                'crawling_config': crawling_config,
                'api_calls_used': crawl_data['stats']['api_calls_used']
            },
            'videos': []
        }
        
        for video_result in crawl_data['results']:
            video_data = {
                'video_info': {
                    'video_id': video_result['video_id'],
                    'video_title': video_result.get('video_title', ''),
                    'video_url': video_result.get('video_url', ''),
                    'channel_name': video_result.get('channel_name', ''),
                    'channel_url': video_result.get('channel_url', ''),
                    'video_duration': video_result.get('video_duration', ''),
                    'view_count': video_result.get('view_count', 0),
                    'video_category': video_result.get('video_category', ''),
                    'crawl_timestamp': video_result['crawl_timestamp'],
                    'comment_count': video_result['comment_count']
                },
                'comments': video_result['comments']
            }
            json_data['videos'].append(video_data)
        
        # Export to JSON
        json_filename = f"{base_filename}.json"
        with open(json_filename, 'w', encoding='utf-8') as f:
            json.dump(json_data, f, ensure_ascii=False, indent=2, default=str)
        
        with crawl_output:
            print(f"✅ JSON (.json): {json_filename}")
            print(f"   🎬 Videos: {len(json_data['videos'])} | Structure: Hierarchical")
        
        # ===== 3. TEXT FORMAT (.txt) =====
        txt_filename = f"{base_filename}.txt"
        
        with open(txt_filename, 'w', encoding='utf-8') as f:
            # Header
            f.write("=" * 80 + "\n")
            f.write("YOUTUBE COMMENTS CRAWLING RESULTS\n")
            f.write("=" * 80 + "\n")
            f.write(f"Export Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
            f.write(f"Total Videos: {len(crawl_data['results'])}\n")
            f.write(f"Total Comments: {crawl_data['stats']['total_comments']:,}\n")
            f.write(f"Total Replies: {crawl_data['stats']['total_replies']:,}\n")
            f.write(f"API Calls Used: {crawl_data['stats']['api_calls_used']:,}\n")
            f.write(f"Crawling Mode: {crawling_config['mode']}\n")
            f.write("=" * 80 + "\n\n")
            
            # Data for each video
            for i, video_result in enumerate(crawl_data['results'], 1):
                f.write(f"VIDEO {i}: {video_result.get('video_title', 'Unknown Title')}\n")
                f.write("-" * 60 + "\n")
                f.write(f"Video ID: {video_result['video_id']}\n")
                f.write(f"Video URL: {video_result.get('video_url', 'N/A')}\n")
                f.write(f"Channel: {video_result.get('channel_name', 'Unknown')}\n")
                f.write(f"Channel URL: {video_result.get('channel_url', 'N/A')}\n")
                f.write(f"Duration: {video_result.get('video_duration', 'Unknown')}\n")
                f.write(f"Views: {video_result.get('view_count', 0):,}\n")
                f.write(f"Category: {video_result.get('video_category', 'Unknown')}\n")
                f.write(f"Comments Extracted: {len(video_result['comments'])}\n")
                f.write(f"Crawl Time: {video_result['crawl_timestamp']}\n")
                f.write("\n")
                
                # Comments
                f.write("COMMENTS:\n")
                f.write("-" * 40 + "\n")
                
                for j, comment in enumerate(video_result['comments'], 1):
                    f.write(f"\nComment #{j}:\n")
                    f.write(f"Author: {comment.get('author_name', 'Unknown')}\n")
                    f.write(f"Date: {comment.get('publish_date', 'Unknown')}\n")
                    f.write(f"Likes: {comment.get('like_count', 0)}\n")
                    f.write(f"Replies: {comment.get('reply_count', 0)}\n")
                    
                    if comment.get('sentiment_label'):
                        f.write(f"Sentiment: {comment['sentiment_label']} ({comment.get('sentiment_score', 0):.2f})\n")
                    
                    f.write(f"Text: {comment.get('comment_text', 'No text')}\n")
                    
                    # Replies if available
                    if 'replies' in comment and comment['replies']:
                        f.write(f"\n  REPLIES ({len(comment['replies'])}):\n")
                        for k, reply in enumerate(comment['replies'], 1):
                            f.write(f"    Reply #{k}:\n")
                            f.write(f"    Author: {reply.get('reply_author', 'Unknown')}\n")
                            f.write(f"    Date: {reply.get('reply_date', 'Unknown')}\n")
                            f.write(f"    Likes: {reply.get('reply_likes', 0)}\n")
                            f.write(f"    Text: {reply.get('reply_text', 'No text')}\n\n")
                    
                    f.write("-" * 40 + "\n")
                
                f.write("\n" + "=" * 80 + "\n\n")
        
        with crawl_output:
            print(f"✅ TEXT (.txt): {txt_filename}")
            print(f"   📝 Format: Human-readable | Full content with structure")
        
        # ===== SUMMARY =====
        with crawl_output:
            print("\n" + "=" * 60)
            print("🎉 EXPORT BERHASIL - 3 FORMAT FILE!")
            print("=" * 60)
            print(f"📁 Base filename: {base_filename}")
            print(f"📊 Total data points: {len(excel_data):,}")
            print("\n📋 FILES CREATED:")
            print(f"   📗 {excel_filename} - For statistical analysis (Excel/Pandas)")
            print(f"   📙 {json_filename} - For programming/API integration")
            print(f"   📝 {txt_filename} - For manual reading/review")
            print("\n💡 KEGUNAAN SETIAP FORMAT:")
            print("   📗 XLSX: Import ke Excel, SPSS, R untuk analisis statistik")
            print("   📙 JSON: Programming, database import, API integration")
            print("   📝 TXT: Manual review, qualitative analysis, documentation")
            print("\n✅ Semua file siap untuk analisis dan penelitian!")
            
    except Exception as e:
        with crawl_output:
            print(f"❌ Error exporting data: {e}")
            print("🔧 Periksa disk space dan permissions folder")

def update_progress_display():
    """Update progress display"""
    stats = crawl_data['stats']
    
    # Progress bar
    if stats['total_videos'] > 0:
        progress_percentage = int((stats['completed_videos'] / stats['total_videos']) * 100)
        progress_bar.value = progress_percentage
    
    # Stats
    video_progress.value = f"<b>Video:</b> {stats['completed_videos']}/{stats['total_videos']}"
    comment_stats.value = f"<b>Comments:</b> {stats['total_comments']} (+{stats['total_replies']} replies)"
    api_usage.value = f"<b>API Calls:</b> {stats['api_calls_used']}"
    
    # Time elapsed
    if stats['start_time']:
        elapsed = datetime.now() - stats['start_time']
        elapsed_str = str(elapsed).split('.')[0]  # Remove microseconds
        time_elapsed.value = f"<b>Time:</b> {elapsed_str}"
    
    # Status
    if crawl_data['status'] == 'running':
        # 'current_video' exists but is None right after a reset, so fall back explicitly
        current_video = stats.get('current_video') or 'Unknown'
        crawl_status.value = f"<b>Status:</b> 🔄 Processing: {current_video}"
    elif crawl_data['status'] == 'completed':
        crawl_status.value = """<b>Status:</b> ✅ Crawling selesai!<br/>
        <span style='color: #2196F3; font-size: 12px;'>
        💡 <b>Langkah selanjutnya:</b><br/>
        • Klik "💾 Export (3 Formats)" untuk menyimpan data<br/>
        • Untuk mengubah konfigurasi mode: Kembali ke <b>Cell 14 (Konfigurasi Atribut)</b><br/>
        • Untuk crawling video lain: Kembali ke <b>Cell 7 (URLs Manager)</b><br/>
        • Untuk analisis data: Gunakan file yang sudah di-export
        </span>"""
        start_crawl_btn.disabled = False
        pause_crawl_btn.disabled = True
        stop_crawl_btn.disabled = True
        export_data_btn.disabled = False

# ===== EVENT BINDING =====
start_crawl_btn.on_click(start_crawling)
pause_crawl_btn.on_click(pause_crawling)
stop_crawl_btn.on_click(stop_crawling)
export_data_btn.on_click(export_crawl_data)

# ===== LAYOUT CONSTRUCTION =====

# Control section
control_section = widgets.VBox([
    widgets.HTML("<h4>🎮 Kontrol Crawling</h4>"),
    widgets.HBox([start_crawl_btn, pause_crawl_btn, stop_crawl_btn, export_data_btn]),
    crawl_status
])

# Progress section
progress_section = widgets.VBox([
    widgets.HTML("<h4>📊 Progress Monitoring</h4>"),
    progress_bar,
    widgets.HBox([video_progress, comment_stats]),
    widgets.HBox([api_usage, time_elapsed])
])

# Main container
main_crawl_container = widgets.VBox([
    widgets.HTML("<h3 style='color: #1976D2; margin-bottom: 20px;'>🚀 YouTube Comments Crawler</h3>"),
    widgets.HTML("<div style='color: #666; font-size: 12px; margin: -15px 0 20px 0; font-style: italic;'>Developed by Ferdian Bangkit Wijaya, Universitas Sultan Ageng Tirtayasa</div>"),
    control_section,
    widgets.HTML("<hr style='margin: 15px 0;'>"),
    progress_section,
    widgets.HTML("<h4 style='margin: 15px 0 10px 0;'>📋 Crawling Output</h4>"),
    crawl_output
], layout=widgets.Layout(
    border='2px solid #1976D2',
    padding='25px',
    margin='10px',
    width='100%',
    max_width='1200px',
    background_color='#fafafa'
))

# Display the widget
display(main_crawl_container)

# Initial validation
errors, warnings = validate_prerequisites()
with crawl_output:
    print("🔍 VALIDASI PREREQUISITE")
    print("=" * 50)
    
    if errors:
        print("❌ MASALAH YANG HARUS DIPERBAIKI:")
        for error in errors:
            print(f"   {error}")
        print("\n💡 Perbaiki masalah di atas sebelum memulai crawling.")
        start_crawl_btn.disabled = True
    else:
        print("✅ SEMUA PREREQUISITE TERPENUHI!")
        print("🎯 Siap untuk memulai crawling.")
        
        if warnings:
            print("\n⚠️ PERINGATAN:")
            for warning in warnings:
                print(f"   {warning}")
        
        # Show summary
        valid_videos = [v for v in video_data_list if v.get('status') == 'valid']
        active_attrs = sum(1 for v in crawling_config['attributes'].values() if v)
        
        print(f"\n📊 RINGKASAN:")
        print(f"   🎬 Video valid: {len(valid_videos)}")
        print(f"   ⚙️ Atribut aktif: {active_attrs}")
        print(f"   🔧 Mode: {crawling_config['mode']}")
        print(f"   💬 Max comments per video: {crawling_config['max_comments']}")
        print(f"   🔄 Include replies: {crawling_config['include_replies']}")
        
        estimated_api_calls = len(valid_videos) * (crawling_config['max_comments'] // 50 + 1)
        print(f"   💰 Estimasi API calls: ~{estimated_api_calls}")

print("\n✅ YouTube Comments Crawler berhasil dimuat!")
print("\n🎯 FITUR YANG TERSEDIA:")
print("   • 🚀 Start/Pause/Stop crawling dengan kontrol penuh")
print("   • 📊 Real-time progress monitoring dan statistik")
print("   • 💰 API usage tracking dan quota management")
print("   • ⚡ Multi-threaded crawling untuk performa optimal")
print("   • 🛡️ Error handling dan recovery mechanism")
print("   • 💾 Export data ke 3 format: Excel (.xlsx), JSON (.json), Text (.txt)")
print("   • 🎯 Konfigurasi atribut sesuai kebutuhan penelitian")
print("\n💡 SIAP UNTUK CRAWLING!")

🚀 Memuat YouTube Comments Crawler...




✅ YouTube Comments Crawler berhasil dimuat!

🎯 FITUR YANG TERSEDIA:
   • 🚀 Start/Pause/Stop crawling dengan kontrol penuh
   • 📊 Real-time progress monitoring dan statistik
   • 💰 API usage tracking dan quota management
   • ⚡ Multi-threaded crawling untuk performa optimal
   • 🛡️ Error handling dan recovery mechanism
   • 💾 Export data ke 3 format: Excel (.xlsx), JSON (.json), Text (.txt)
   • 🎯 Konfigurasi atribut sesuai kebutuhan penelitian

💡 SIAP UNTUK CRAWLING!


## 8. DATA PREVIEW & ANALYSIS DASHBOARD

Once crawling finishes, this section previews the collected data in a format that is easy to read and analyze.

### 🔍 **Data Preview Features**

#### **📊 DataFrame Display**
- **Tabular View**: Data is shown as a structured pandas DataFrame
- **Filtering & Sorting**: Filter and sort the data by specific columns
- **Pagination**: Page-based navigation for large datasets
- **Export Preview**: Preview the data before exporting it to the various formats
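
The pagination described above can be sketched as follows. This is a minimal illustration rather than the notebook's exact code: `get_page` and the sample data are hypothetical names, and the fallback to the first page for out-of-range indexes is an assumption modeled on how the preview cell behaves.

```python
import pandas as pd

def get_page(df: pd.DataFrame, page: int, page_size: int = 10) -> pd.DataFrame:
    """Return the rows for a zero-based page index (hypothetical helper).

    Out-of-range page numbers fall back to the first page.
    """
    # At least one page, even for an empty DataFrame
    total_pages = max(1, (len(df) - 1) // page_size + 1)
    if page < 0 or page >= total_pages:
        page = 0
    start = page * page_size
    return df.iloc[start:start + page_size]

# Hypothetical sample data standing in for crawled comments
sample = pd.DataFrame({"comment_text": [f"comment {i}" for i in range(25)]})
first_page = get_page(sample, page=0)
last_page = get_page(sample, page=2)
```

Slicing with `iloc` keeps the original index, so a row on any page can still be traced back to its position in the full table.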

#### **📈 Summary Statistics**
- **Video Statistics**: Total number of videos, average comments per video
- **Comment Analytics**: Total comments, sentiment distribution, engagement metrics
- **Time Analysis**: Crawling time, API usage efficiency
- **Data Quality**: Percentage of complete data, missing values, error rate
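
As a rough sketch of how a few of these figures could be derived from a flat comment table with pandas: the `summarize` helper and the sample rows are hypothetical, and the column names (`video_id`, `sentiment_label`, `like_count`) are assumed to match the exported data.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> dict:
    """Compute a few summary figures from a flat comment table (hypothetical helper)."""
    per_video = df.groupby("video_id").size()  # number of rows per video
    return {
        "total_videos": int(per_video.shape[0]),
        "total_comments": int(len(df)),
        "avg_comments_per_video": float(per_video.mean()),
        "sentiment_distribution": df["sentiment_label"].value_counts().to_dict(),
        # Share of missing cells across the whole table, in percent
        "missing_pct": float(df.isna().mean().mean() * 100),
    }

# Hypothetical sample rows; one like_count is missing on purpose
sample = pd.DataFrame({
    "video_id": ["a", "a", "a", "b"],
    "sentiment_label": ["positive", "neutral", "positive", "negative"],
    "like_count": [3, 0, None, 1],
})
stats = summarize(sample)
```

Here `missing_pct` averages the per-column missing rate, which equals the overall share of empty cells because every column has the same number of rows.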

#### **🎯 Filters & Analysis**
- **Date Range**: Filter comments by date range
- **Sentiment Filter**: Show comments by sentiment (positive/negative/neutral)
- **Engagement Level**: Filter by number of likes and replies
- **Video Metadata**: Filter by channel, video duration, category
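
A minimal sketch of combining a date range, a sentiment label, and a minimum like count into one boolean mask. The helper name `filter_comments` and the sample data are hypothetical, and `publish_date` is assumed to hold ISO 8601 strings as returned in the API's `publishedAt` field.

```python
import pandas as pd

def filter_comments(df, start=None, end=None, sentiment=None, min_likes=0):
    """Filter a flat comment table by date range, sentiment, and likes (hypothetical helper)."""
    # Parse ISO 8601 strings into timezone-aware timestamps; bad values become NaT
    dates = pd.to_datetime(df["publish_date"], utc=True, errors="coerce")
    mask = pd.Series(True, index=df.index)
    if start is not None:
        mask &= dates >= pd.to_datetime(start, utc=True)
    if end is not None:
        mask &= dates <= pd.to_datetime(end, utc=True)
    if sentiment is not None:
        mask &= df["sentiment_label"] == sentiment
    # Coerce like counts to numbers so string or missing values do not break the comparison
    likes = pd.to_numeric(df["like_count"], errors="coerce").fillna(0)
    mask &= likes >= min_likes
    return df[mask]

# Hypothetical sample rows
sample = pd.DataFrame({
    "publish_date": ["2024-01-05T10:00:00Z", "2024-02-10T08:30:00Z", "2024-03-01T12:00:00Z"],
    "sentiment_label": ["positive", "negative", "positive"],
    "like_count": [5, 2, 0],
})
result = filter_comments(sample, start="2024-01-01", end="2024-02-28", min_likes=1)
```

Rows whose date cannot be parsed compare as false against either bound, so they drop out whenever a date bound is set.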

#### **💡 Use Cases**
- **Data Validation**: Verify the quality and completeness of the crawled data
- **Quick Analysis**: Quickly spot patterns and trends in the data
- **Sample Export**: Export a subset of the data for testing
- **Research Preview**: Preview the data for academic research purposes

### ⚠️ **Important Notes**
- The preview appears automatically once crawling finishes
- Data can be filtered and sorted to suit the needs of the research
- Use the export feature to save the analysis results
- The data preview helps validate the data before in-depth analysis

In [16]:
# ===== Preview Hasil Crawling =====

print("📊 Memuat Data Preview Dashboard...")

import pandas as pd
from IPython.display import display, HTML

# ===== GLOBAL PREVIEW DATA =====
preview_data = {
    'df': None,
    'filtered_df': None,
    'current_page': 0,
    'page_size': 10,
    'total_pages': 0
}

# ===== FAST PREVIEW FUNCTIONS =====

def create_fast_preview():
    """Create fast preview using existing export data structure"""
    if not crawl_data or not crawl_data.get('results'):
        return None
    
    # Re-use the exact same logic as Excel export for consistency
    excel_data = []
    
    for video_result in crawl_data['results']:
        video_base = {
            'video_id': video_result['video_id'],
            'video_title': video_result.get('video_title', ''),
            'video_url': video_result.get('video_url', ''),
            'channel_name': video_result.get('channel_name', ''),
            'channel_url': video_result.get('channel_url', ''),
            'video_duration': video_result.get('video_duration', ''),
            'view_count': video_result.get('view_count', 0),
            'video_category': video_result.get('video_category', ''),
            'crawl_timestamp': video_result['crawl_timestamp']
        }
        
        for comment in video_result['comments']:
            row = {**video_base, **comment}
            
            # Handle replies
            if 'replies' in comment and comment['replies']:
                for reply in comment['replies']:
                    reply_row = {**row, **reply}
                    reply_row['is_reply'] = True
                    excel_data.append(reply_row)
            else:
                row['is_reply'] = False
                excel_data.append(row)
    
    # Create DataFrame
    df = pd.DataFrame(excel_data)
    return df

def apply_filters():
    """Apply filters to the data"""
    if preview_data['df'] is None:
        return None
    
    df = preview_data['df'].copy()
    
    # Text filter
    if text_filter.value.strip():
        if 'comment_text' in df.columns:
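            # regex=False treats the input as literal text, so characters like '(' or '?' cannot raise a regex error
            df = df[df['comment_text'].astype(str).str.contains(text_filter.value, case=False, na=False, regex=False)]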
    
    # Video title filter
    if video_filter.value != 'All Videos':
        if 'video_title' in df.columns:
            df = df[df['video_title'] == video_filter.value]
    
    # Sentiment filter
    if sentiment_filter.value != 'All':
        if 'sentiment_label' in df.columns:
            df = df[df['sentiment_label'] == sentiment_filter.value]
    
    # Likes filter
    if 'like_count' in df.columns:
        # Convert to numeric first
        df['like_count'] = pd.to_numeric(df['like_count'], errors='coerce').fillna(0)
        df = df[df['like_count'] >= likes_filter.value]
    
    # Reply filter
    if reply_filter.value != 'All':
        if 'is_reply' in df.columns:
            show_replies = reply_filter.value == 'Replies Only'
            df = df[df['is_reply'] == show_replies]
    
    return df

def show_preview_data():
    """Show preview data with current filters and pagination"""
    # Apply filters
    filtered_df = apply_filters()
    if filtered_df is None:
        with output_area:
            output_area.clear_output()
            print("❌ Tidak ada data. Klik 'Refresh Data' terlebih dahulu.")
        return
    
    preview_data['filtered_df'] = filtered_df
    
    if len(filtered_df) == 0:
        with output_area:
            output_area.clear_output()
            print("❌ Tidak ada data yang sesuai dengan filter")
            print(f"🔍 Filter aktif:")
            print(f"   • Text: '{text_filter.value}'")
            print(f"   • Video: '{video_filter.value}'")
            print(f"   • Sentiment: '{sentiment_filter.value}'")
            print(f"   • Min Likes: {likes_filter.value}")
            print(f"   • Type: '{reply_filter.value}'")
        return
    
    # Calculate pagination
    total_rows = len(filtered_df)
    start_idx = preview_data['current_page'] * preview_data['page_size']
    end_idx = min(start_idx + preview_data['page_size'], total_rows)
    preview_data['total_pages'] = (total_rows - 1) // preview_data['page_size'] + 1
    
    # Adjust current page if needed
    if preview_data['current_page'] >= preview_data['total_pages']:
        preview_data['current_page'] = 0
        start_idx = 0
        end_idx = min(preview_data['page_size'], total_rows)
    
    # Get page data
    page_data = filtered_df.iloc[start_idx:end_idx]
    
    with output_area:
        output_area.clear_output()
        
        # Show summary
        print(f"📊 PREVIEW DATA - Page {preview_data['current_page'] + 1}/{preview_data['total_pages']}")
        print(f"📋 Showing {len(page_data)} of {total_rows} filtered records (from {len(preview_data['df'])} total)")
        
        # Show active filters
        active_filters = []
        if text_filter.value.strip():
            active_filters.append(f"Text: '{text_filter.value}'")
        if video_filter.value != 'All Videos':
            active_filters.append(f"Video: '{video_filter.value[:30]}...'")
        if sentiment_filter.value != 'All':
            active_filters.append(f"Sentiment: {sentiment_filter.value}")
        if likes_filter.value > 0:
            active_filters.append(f"Min Likes: {likes_filter.value}")
        if reply_filter.value != 'All':
            active_filters.append(f"Type: {reply_filter.value}")
        
        if active_filters:
            print(f"🔍 Active filters: {' | '.join(active_filters)}")
        
        print("=" * 80)
        
        # Select and display columns
        display_cols = []
        all_cols = filtered_df.columns.tolist()
        
        # Priority columns including URLs
        priority = ['video_title', 'video_url', 'channel_name', 'channel_url', 
                   'comment_text', 'author_name', 'publish_date', 'like_count',
                   'reply_count', 'sentiment_label', 'word_count', 'view_count', 'is_reply']
        
        for col in priority:
            if col in all_cols:
                display_cols.append(col)
        
        # Add remaining important columns
        for col in all_cols:
            if col not in display_cols and col not in ['video_id', 'crawl_timestamp', 'replies']:
                display_cols.append(col)
        
        # Create display dataframe
        display_df = page_data[display_cols].copy()
        
        # Truncate long text, appending an ellipsis only when something was cut
        for col in ['video_title', 'comment_text']:
            if col in display_df.columns:
                text = display_df[col].astype(str)
                display_df[col] = text.where(text.str.len() <= 50, text.str[:50] + '...')
        
        # Show as HTML table with scroll; escape cell text because
        # comment content is untrusted and may contain HTML markup
        html_table = display_df.to_html(
            index=False, 
            escape=True,
            classes='preview-table',
            table_id='preview-table'
        )
        
        # Style and display
        styled_html = f"""
        <div style="overflow-x: auto; max-height: 400px; border: 1px solid #ddd; border-radius: 4px;">
            <style>
                .preview-table {{
                    width: 100%;
                    border-collapse: collapse;
                    font-size: 11px;
                }}
                .preview-table th {{
                    background: #f8f9fa;
                    padding: 6px;
                    text-align: left;
                    border-bottom: 2px solid #dee2e6;
                    position: sticky;
                    top: 0;
                    white-space: nowrap;
                }}
                .preview-table td {{
                    padding: 4px 6px;
                    border-bottom: 1px solid #dee2e6;
                    max-width: 200px;
                    word-wrap: break-word;
                }}
                .preview-table tr:hover {{
                    background: #f5f5f5;
                }}
            </style>
            {html_table}
        </div>
        """
        
        display(HTML(styled_html))
        
        # Show column info
        print(f"\n📋 Columns shown: {', '.join(display_cols)}")
        url_cols = [col for col in display_cols if 'url' in col.lower()]
        print(f"🔗 URL columns: {', '.join(url_cols) if url_cols else 'None'}")

# ===== UI WIDGETS =====

# Refresh button
refresh_btn = widgets.Button(
    description='🔄 Refresh Data',
    button_style='primary',
    layout=widgets.Layout(width='120px')
)

# Filter widgets
text_filter = widgets.Text(
    placeholder='Search in comments...',
    description='🔍 Text:',
    layout=widgets.Layout(width='250px')
)

video_filter = widgets.Dropdown(
    options=[('All Videos', 'All Videos')],
    value='All Videos',
    description='🎬 Video:',
    layout=widgets.Layout(width='250px')
)

sentiment_filter = widgets.Dropdown(
    options=[('All', 'All'), ('Positive', 'positive'), ('Neutral', 'neutral'), ('Negative', 'negative')],
    value='All',
    description='😊 Sentiment:',
    layout=widgets.Layout(width='180px')
)

likes_filter = widgets.IntSlider(
    value=0,
    min=0,
    max=100,
    step=1,
    description='👍 Min Likes:',
    layout=widgets.Layout(width='250px')
)

reply_filter = widgets.Dropdown(
    options=[('All', 'All'), ('Comments Only', 'Comments Only'), ('Replies Only', 'Replies Only')],
    value='All',
    description='💬 Type:',
    layout=widgets.Layout(width='180px')
)

# Page size selector
page_size_select = widgets.Dropdown(
    options=[('10 rows', 10), ('25 rows', 25), ('50 rows', 50)],
    value=10,
    description='📄 Show:',
    layout=widgets.Layout(width='120px')
)

# Navigation buttons
prev_btn = widgets.Button(description='◀ Prev', layout=widgets.Layout(width='80px'))
next_btn = widgets.Button(description='Next ▶', layout=widgets.Layout(width='80px'))
page_info = widgets.HTML(value="<b>Page: 0 / 0</b>")

# Output area
output_area = widgets.Output()

# ===== EVENT HANDLERS =====

def refresh_data(b=None):
    """Refresh preview data"""
    with output_area:
        output_area.clear_output()
        print("🔄 Loading data...")
        
        try:
            if not crawl_data or not crawl_data.get('results'):
                print("❌ No crawl data available. Please run crawling first.")
                return
            
            # Create DataFrame
            df = create_fast_preview()
            if df is None or len(df) == 0:
                print("❌ Failed to create preview data.")
                return
            
            preview_data['df'] = df
            preview_data['current_page'] = 0
            preview_data['page_size'] = page_size_select.value
            
            # Update filter options
            if 'video_title' in df.columns:
                video_titles = ['All Videos'] + sorted(df['video_title'].unique().tolist())
                video_filter.options = [(title[:60] + '...', title) if len(title) > 60 else (title, title) for title in video_titles]
            
            if 'like_count' in df.columns:
                # Coerce to numeric first so string-typed counts still set the slider range
                likes = pd.to_numeric(df['like_count'], errors='coerce')
                if likes.notna().any():
                    likes_filter.max = max(int(likes.max()), 0)
            
            print("✅ Data loaded successfully!")
            print(f"📊 Total: {len(df)} records from {df['video_id'].nunique()} video(s)")
            print(f"🔗 URL fields: {'✅ Available' if any('url' in col for col in df.columns) else '❌ Not found'}")
            print("\n")
            
            # Show first page
            show_preview_data()
            update_navigation()
            
        except Exception as e:
            print(f"❌ Error: {str(e)}")

def apply_filters_and_update(change=None):
    """Apply filters and update display"""
    if preview_data['df'] is not None:
        preview_data['current_page'] = 0  # Reset to first page when filtering
        show_preview_data()
        update_navigation()

def prev_page(b):
    """Go to previous page"""
    if preview_data['current_page'] > 0:
        preview_data['current_page'] -= 1
        show_preview_data()
        update_navigation()

def next_page(b):
    """Go to next page"""
    if preview_data['current_page'] < preview_data['total_pages'] - 1:
        preview_data['current_page'] += 1
        show_preview_data()
        update_navigation()

def page_size_changed(change):
    """Handle page size change"""
    if change['name'] == 'value' and preview_data['df'] is not None:
        preview_data['page_size'] = change['new']
        preview_data['current_page'] = 0
        show_preview_data()
        update_navigation()

def update_navigation():
    """Update navigation buttons and info"""
    if preview_data.get('filtered_df') is not None:
        page_info.value = f"<b>Page: {preview_data['current_page'] + 1} / {preview_data['total_pages']}</b>"
        prev_btn.disabled = preview_data['current_page'] <= 0
        next_btn.disabled = preview_data['current_page'] >= preview_data['total_pages'] - 1
    else:
        page_info.value = "<b>Page: 0 / 0</b>"
        prev_btn.disabled = True
        next_btn.disabled = True

# ===== BIND EVENTS =====
refresh_btn.on_click(refresh_data)
text_filter.observe(apply_filters_and_update, names='value')
video_filter.observe(apply_filters_and_update, names='value')
sentiment_filter.observe(apply_filters_and_update, names='value')
likes_filter.observe(apply_filters_and_update, names='value')
reply_filter.observe(apply_filters_and_update, names='value')
prev_btn.on_click(prev_page)
next_btn.on_click(next_page)
page_size_select.observe(page_size_changed, names='value')

# ===== UI LAYOUT =====

# Filter section
filter_row1 = widgets.HBox([text_filter, video_filter])
filter_row2 = widgets.HBox([sentiment_filter, reply_filter, likes_filter])
filter_section = widgets.VBox([
    widgets.HTML("<h4 style='margin: 10px 0;'>🔍 Filter Data</h4>"),
    filter_row1,
    filter_row2
])

# Control row
control_row = widgets.HBox([
    refresh_btn,
    widgets.HTML("&nbsp;" * 3),
    page_size_select,
    widgets.HTML("&nbsp;" * 3),
    prev_btn,
    page_info,
    next_btn
])

# Main container
main_container = widgets.VBox([
    widgets.HTML("<h3 style='color: #1976D2;'>📊 Crawl Results Preview</h3>"),
    widgets.HTML("""
    <div style='background: #e3f2fd; padding: 12px; border-radius: 6px; margin-bottom: 15px; border-left: 4px solid #1976D2;'>
        <p style='margin: 0 0 8px 0; color: #1565C0; font-weight: bold;'>💡 Preview & Analysis Dashboard</p>
        <p style='margin: 0; color: #1565C0; font-size: 14px;'>
        Analyze crawl results with multi-criteria filters and page navigation. 
        All fields, including video and channel URLs, are shown according to the selected crawling mode configuration.
        The preview uses the same data as the Excel, JSON, and Text exports.
        </p>
    </div>
    """),
    widgets.HTML("""
    <div style='background: #f3e5f5; padding: 8px 12px; border-radius: 4px; margin-bottom: 10px; border-left: 3px solid #9c27b0;'>
        <p style='margin: 0; color: #7b1fa2; font-size: 12px; font-style: italic;'>
        👨‍💻 Developed by: <strong>Ferdian Bangkit Wijaya</strong> | 🏫 Universitas Sultan Ageng Tirtayasa
        </p>
    </div>
    """),
    filter_section,
    widgets.HTML("<hr style='margin: 15px 0;'>"),
    control_row,
    output_area
])

# Display
display(main_container)

print("✅ Fast Preview Dashboard with Filters loaded!")
print("\n🎯 FEATURES:")
print("   • 📊 Fast preview using Excel export data structure")
print("   • 🔍 Multiple filters: text, video, sentiment, likes, reply type")
print("   • 🔗 All configured fields including video_url & channel_url")
print("   • 📄 Smart pagination for large datasets")
print("   • ⚡ Optimized for speed with real-time filtering")

print("\n💡 FILTERS AVAILABLE:")
print("   • 🔍 Text: Search in comment text")
print("   • 🎬 Video: Filter by specific video title")
print("   • 😊 Sentiment: Filter by sentiment (positive/neutral/negative)")
print("   • 👍 Min Likes: Minimum number of likes")
print("   • 💬 Type: Comments only, replies only, or all")

print("\n💡 USAGE:")
print("   1. Click '🔄 Refresh Data' to load crawling results")
print("   2. Use filters to narrow down data")
print("   3. Navigate pages to browse large datasets")
print("   4. All filters update results in real-time")

print("\n💡 READY FOR FILTERED PREVIEW!")

# ===== DEVELOPER INFO =====
print("\n" + "="*60)
print("👨‍💻 Developed by: Ferdian Bangkit Wijaya")
print("🏫 Universitas Sultan Ageng Tirtayasa")
print("="*60)
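
The page arithmetic in `show_preview_data()` can be checked in isolation. The sketch below is a minimal, standalone version of that logic; the helper name `page_bounds` is hypothetical and not part of the notebook. It uses `(total_rows + page_size - 1) // page_size`, which gives the same page count as the notebook's `(total_rows - 1) // page_size + 1`, and it falls back to the first page when the requested page no longer exists.

```python
def page_bounds(total_rows: int, page_size: int, current_page: int) -> tuple[int, int, int, int]:
    """Return (current_page, total_pages, start_idx, end_idx) for a zero-based page.

    Hypothetical helper that mirrors the pagination logic in show_preview_data().
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")

    # Integer ceiling division; an empty table yields zero pages.
    total_pages = (total_rows + page_size - 1) // page_size

    # Reset to the first page if the requested page is out of range,
    # for example after a filter shrinks the result set.
    if current_page < 0 or current_page >= total_pages:
        current_page = 0

    start_idx = current_page * page_size
    end_idx = min(start_idx + page_size, total_rows)
    return current_page, total_pages, start_idx, end_idx


if __name__ == "__main__":
    # 95 filtered rows, 10 per page, last page requested.
    print(page_bounds(95, 10, 9))
    # Same rows after switching to 25 per page: page 9 no longer exists.
    print(page_bounds(95, 25, 9))
```

The slice `filtered_df.iloc[start_idx:end_idx]` then holds exactly the rows for the returned page, and it is empty when there are no rows at all.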


---

## 📋 **Researcher Information & Usage Policy**

### 👨‍💻 **Author Information**

**Author:** Ferdian Bangkit Wijaya  
**Affiliation:** Universitas Sultan Ageng Tirtayasa  
**Email:** ferdian.bangkit@untirta.ac.id  
**Version:** 1.0  
**Release Date:** 29 July 2025  

---

### 📖 **About This Tool**

**YouTube Comment Crawler v1.0** is a research tool that helps academics, researchers, and students collect and analyze YouTube comment data for ethical and responsible academic research.

#### 🔧 **Key Features:**
- ✅ YouTube comment crawling in several modes (Fast, Complete, Custom)
- ✅ Automatic sentiment analysis
- ✅ Data export to Excel, JSON, and Text formats
- ✅ Preview dashboard with interactive filters
- ✅ Efficient API quota management
- ✅ URL tracking for videos and channels

---

### ⚖️ **Usage Policy & Ethics**

#### 📝 **Terms & Conditions:**

1. **Academic Use:** This tool is intended **ONLY** for academic research, education, and ethical data analysis.

2. **Privacy & Data:** 
   - Respect the privacy of YouTube users
   - Do not misuse the personal data collected
   - Comply with the GDPR and other applicable data privacy regulations

3. **YouTube API Compliance:**
   - You must use a **valid, legitimately obtained YouTube API key**
   - Comply with the [YouTube API Terms of Service](https://developers.google.com/youtube/terms/api-services-terms-of-service)
   - Respect the rate limits and quotas set by Google

4. **Data Use:**
   - Collected data is for analysis only, not for commercial purposes
   - Do not disclose users' personal information
   - Store crawled data securely
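
One way to act on the privacy points above is to replace author names with stable pseudonyms before sharing or publishing a dataset. The sketch below is only an illustration under stated assumptions, not a feature of this tool: the function names are hypothetical, the `author_name` field matches the column shown in the preview dashboard, and the secret key is a placeholder that you would keep private and out of the exported files. A keyed HMAC-SHA256 is used so the same author always maps to the same token, while the token cannot be recomputed without the key.

```python
import hashlib
import hmac


def pseudonymize(author_name: str, secret_key: bytes, length: int = 12) -> str:
    """Map an author name to a stable token using keyed HMAC-SHA256."""
    digest = hmac.new(secret_key, author_name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:length]


def pseudonymize_records(records: list[dict], secret_key: bytes,
                         field: str = "author_name") -> list[dict]:
    """Return copies of the records with `field` replaced by its pseudonym."""
    cleaned = []
    for record in records:
        copy = dict(record)  # leave the original record untouched
        if copy.get(field) is not None:
            copy[field] = pseudonymize(str(copy[field]), secret_key)
        cleaned.append(copy)
    return cleaned


if __name__ == "__main__":
    key = b"replace-with-a-private-random-key"  # placeholder, not a real key
    rows = [
        {"author_name": "Alice", "comment_text": "Great video"},
        {"author_name": "Bob", "comment_text": "Thanks"},
        {"author_name": "Alice", "comment_text": "Second comment"},
    ]
    for row in pseudonymize_records(rows, key):
        print(row["author_name"], "|", row["comment_text"])
```

Pseudonymizing names does not anonymize the comment text itself, so quoted comments may still identify a person and should be handled with the same care.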

#### ❌ **Prohibited Uses:**

- ⛔ Spamming or harassing content creators
- ⛔ Collecting data for commercial purposes without permission
- ⛔ Violating the YouTube platform's terms of service
- ⛔ Disclosing personal or sensitive information

#### ✅ **Recommended Uses:**

- 📊 Sentiment analysis for communication research
- 📈 Studies of social media user behavior
- 🎓 Academic research and undergraduate or master's theses
- 📝 Content analysis for educational purposes

---

### 🛡️ **Disclaimer & Limitation of Liability**

1. **Data Accuracy:** The author does not guarantee that data collected from the YouTube API is 100% accurate.

2. **API Changes:** This tool depends on the YouTube Data API v3, which Google may change at any time.

3. **User Responsibility:** Users are fully responsible for how they use the tool and the data collected.

4. **Legal Compliance:** Users must comply with the laws and regulations that apply in their jurisdiction.

---

### 📞 **Contact & Support**

**Email:** ferdian.bangkit@untirta.ac.id  
**Institution:** Universitas Sultan Ageng Tirtayasa  
**Faculty:** Engineering - Statistics

For questions, suggestions, or bug reports, please use the email address above.

---

### 🔄 **Updates & Maintenance**

**Current Version:** 1.0 (29 July 2025)  
**Status:** Active and maintained  
**Last Updated:** 29 July 2025  

#### 📝 **Changelog v1.0:**
- ✅ Multi-mode crawling system (Fast, Complete, Custom)
- ✅ Interactive attribute configuration dashboard
- ✅ Multi-format export (Excel, JSON, Text)
- ✅ Data preview with filters and pagination
- ✅ Integrated sentiment analysis
- ✅ URL tracking for videos and channels
- ✅ Automatic API quota management

---

### 🤝 **Contributions & Development**

This tool was developed as part of an effort to support academic research in Indonesia. Contributions, suggestions, and feedback are welcome for the development of future versions.

---

**© 2025 Ferdian Bangkit Wijaya - Universitas Sultan Ageng Tirtayasa**  
**License:** Academic and Research | **Version:** 1.0

---

> **⚠️ IMPORTANT:** By using this tool, you agree to all of the usage policies and ethics guidelines above. Use of this tool is entirely the user's responsibility.