# TikTok Sentiment Analysis: Public Perception of Web3 Social Issues

An implementation of the research methodology described in the proposal. The scraping function in Stage 1 is a template that still has to be completed with the chosen data source.

**Stages** (after setup and library imports):
1. Data collection (TikTok scraping)
2. Data preprocessing
3. Sentiment analysis (rule-based)
4. Trending topic analysis
5. Wordcloud visualization
6. Critical analysis
7. Summary and conclusions

**All outputs are saved to the `output/` folder.**

In [2]:
# Install required packages
# !pip install pandas numpy matplotlib seaborn wordcloud TikTokApi playwright requests

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud
import re
import json
import os
from datetime import datetime
from collections import Counter
import warnings
warnings.filterwarnings('ignore')

# Setup matplotlib
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')
%matplotlib inline

# Create output directories
os.makedirs('output', exist_ok=True)
os.makedirs('output/wordclouds', exist_ok=True)
os.makedirs('output/graphs', exist_ok=True)
os.makedirs('output/data', exist_ok=True)

print('✓ Setup complete')
print('✓ Output folders created')

✓ Setup complete
✓ Output folders created


## STAGE 1: Collecting Data from TikTok

**Target Hashtags:**
- #AIethics
- #blockchain
- #sustainability
- #web3
- #digitalfreedom
- #cryptocurrency
- #NFT
- #metaverse
- #privacy

In [None]:
# TikTok scraping function
def scrape_tiktok_data(hashtags, videos_per_hashtag=100):
    """
    Scrape TikTok data for the given hashtags.

    Parameters:
    - hashtags: list of strings
    - videos_per_hashtag: int, number of videos per hashtag

    Returns:
    - DataFrame with columns: video_id, username, caption, comment, likes, hashtags, date
      (one row per comment; the video fields are repeated on every comment row)
    """

    # SCRAPING IMPLEMENTATION
    # Choose one method:
    # 1. TikTokApi (unofficial)
    # 2. RapidAPI TikTok Scraper
    # 3. Apify TikTok Scraper
    # 4. Manual scraping with Playwright/Selenium
    
    all_data = []
    
    for hashtag in hashtags:
        print(f'Scraping #{hashtag}...')
        
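        # TODO: implement scraping with the chosen API.
        # Example structure (an outline only: TikTokApi's interface and attribute
        # names change between versions, and recent releases are async, so check
        # the documentation of the installed version):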
        # from TikTokApi import TikTokApi
        # api = TikTokApi()
        # tag = api.hashtag(name=hashtag)
        # videos = tag.videos(count=videos_per_hashtag)
        # 
        # for video in videos:
        #     video_data = {
        #         'video_id': video.id,
        #         'username': video.author.username,
        #         'caption': video.desc,
        #         'likes': video.stats.diggCount,
        #         'hashtags': ' '.join([f'#{tag}' for tag in video.challenges]),
        #         'date': datetime.fromtimestamp(video.createTime)
        #     }
        #     
        #     # Get comments
        #     comments = video.comments(count=50)
        #     for comment in comments:
        #         comment_data = video_data.copy()
        #         comment_data['comment'] = comment.text
        #         all_data.append(comment_data)
        
        pass
    
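    # Pass explicit columns so later cells still find the expected columns
    # even when nothing was scraped (e.g. while the TODO above is still open)
    df = pd.DataFrame(all_data, columns=['video_id', 'username', 'caption', 'comment',
                                         'likes', 'hashtags', 'date'])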
    return df

# Target hashtags
hashtags = [
    'AIethics', 'blockchain', 'sustainability', 'web3',
    'digitalfreedom', 'cryptocurrency', 'NFT', 'metaverse', 'privacy'
]

# Scrape data
print('Starting scraping...')
df_raw = scrape_tiktok_data(hashtags, videos_per_hashtag=100)

# Save raw data
df_raw.to_csv('output/data/raw_data.csv', index=False)
print(f'\n✓ Scraping finished: {len(df_raw)} rows')
print('✓ Raw data saved: output/data/raw_data.csv')
print('\nData preview:')
df_raw.head()

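## STAGE 2: Data Preprocessing

**Steps (in the order `preprocess_text` applies them):**
1. Case folding
2. Filtering (URLs, mentions, hashtags, emoji, punctuation, numbers)
3. Tokenization
4. Normalization (slang → formal)
5. Stopword removal (tokens shorter than 3 characters are also dropped)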
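To make the order concrete, the sketch below traces the steps on a single made-up comment and records the intermediate result of each one. It is a minimal, self-contained illustration. `SLANG`, `STOPWORDS`, and the sample text are hypothetical miniature stand-ins for the notebook's full dictionaries. The explicit emoji range is left out here because the punctuation pattern `[^\w\s]` already drops most emoji. The number filter removes only standalone numbers, so a term such as `web3` survives.

```python
import re

# Hypothetical miniature resources, for illustration only
SLANG = {'gak': 'tidak', 'bgt': 'sangat', 'ribet': 'rumit'}
STOPWORDS = {'yang', 'ini', 'di'}

def trace_preprocessing(text):
    """Run the five steps in order and record the intermediate result of each."""
    steps = {}
    # 1. Case folding
    text = text.lower()
    steps['case_folding'] = text
    # 2. Filtering: URLs, mentions/hashtags, punctuation, standalone numbers
    text = re.sub(r'https?://\S+|www\.\S+', ' ', text)
    text = re.sub(r'[@#]\w+', ' ', text)
    text = re.sub(r'[^\w\s]', ' ', text)
    text = re.sub(r'\b\d+\b', ' ', text)
    text = re.sub(r'\s+', ' ', text).strip()
    steps['filtering'] = text
    # 3. Tokenization on whitespace
    tokens = text.split()
    steps['tokens'] = tokens
    # 4. Slang normalization
    tokens = [SLANG.get(t, t) for t in tokens]
    steps['normalized'] = tokens
    # 5. Stopword removal, plus the "longer than 2 characters" filter
    tokens = [t for t in tokens if t not in STOPWORDS and len(t) > 2]
    steps['final'] = tokens
    return steps

result = trace_preprocessing('Web3 ini ribet bgt di 2024, gak paham! @user #web3 https://t.co/x')
for step, value in result.items():
    print(f'{step:>12}: {value}')
```

Normalization runs before stopword removal, so a slang token that maps onto a stopword is also removed.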

In [None]:
# Indonesian stopwords (a set, for fast membership tests).
# 'tidak', 'belum' and 'masih' are deliberately NOT listed: the sentiment lexicon
# uses them ('tidak aman', 'belum', 'masih'), so removing them here would make
# those lexicon entries unreachable.
stopwords_id = {
    'yang', 'dan', 'di', 'ke', 'dari', 'untuk', 'pada', 'dengan', 'adalah',
    'ini', 'itu', 'atau', 'juga', 'dalam', 'ada', 'akan', 'oleh',
    'saya', 'kamu', 'dia', 'kita', 'mereka', 'kami', 'anda', 'nya', 'mu',
    'ku', 'si', 'para', 'sang', 'pak', 'bu', 'bapak', 'ibu', 'mas', 'mbak',
    'sudah', 'telah', 'dapat', 'bisa', 'harus', 'ingin',
    'mau', 'jadi', 'jangan', 'kalau', 'kalo', 'bila', 'ketika',
    'saat', 'waktu', 'hari', 'tahun', 'bulan', 'minggu', 'jam', 'menit'
}

# TikTok slang normalization dictionary
slang_dict = {
    'gak': 'tidak', 'ga': 'tidak', 'gk': 'tidak',
    'banget': 'sangat', 'bgt': 'sangat', 'bingit': 'sangat',
    'keren': 'bagus', 'mantap': 'bagus', 'mantul': 'bagus',
    'gimana': 'bagaimana', 'gmn': 'bagaimana', 'gmana': 'bagaimana',
    'udah': 'sudah', 'dah': 'sudah', 'udh': 'sudah',
    'emang': 'memang', 'emg': 'memang',
    'doang': 'saja', 'aja': 'saja', 'aj': 'saja',
    'cuma': 'hanya', 'cm': 'hanya',
    'hype': 'populer', 'viral': 'populer',
    'scam': 'penipuan', 'tipu': 'penipuan',
    'bingung': 'membingungkan', 'ribet': 'rumit',
    'susah': 'sulit', 'gampang': 'mudah',
    'mahal': 'mahal', 'murah': 'murah',
    'jelek': 'buruk', 'bagus': 'baik',
    'kece': 'bagus', 'oke': 'baik', 'ok': 'baik'
}

def preprocess_text(text):
    """
    Preprocess text following the methodology in the proposal.
    """
    if pd.isna(text) or text == '':
        return ''

    # 1. Case folding
    text = text.lower()

    # 2. Remove URLs
    text = re.sub(r'https?://\S+|www\.\S+', '', text)

    # 3. Remove mentions (@username)
    text = re.sub(r'@\w+', '', text)

    # 4. Remove hashtags (they are kept separately in the 'hashtags' column)
    text = re.sub(r'#\w+', '', text)

    # 5. Remove emoji and pictographs
    #    (the last range is very wide and also strips e.g. CJK characters)
    emoji_pattern = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags
        u"\U00002702-\U000027B0"
        u"\U000024C2-\U0001F251"
        "]+", flags=re.UNICODE)
    text = emoji_pattern.sub(r'', text)

    # 6. Replace remaining punctuation and symbols with spaces
    text = re.sub(r'[^\w\s]', ' ', text)

    # 7. Remove standalone numbers only, so terms such as 'web3' survive
    text = re.sub(r'\b\d+\b', '', text)

    # 8. Collapse repeated whitespace
    text = re.sub(r'\s+', ' ', text).strip()

    # 9. Tokenization
    tokens = text.split()

    # 10. Slang normalization
    tokens = [slang_dict.get(word, word) for word in tokens]

    # 11. Stopword removal (also drops tokens shorter than 3 characters)
    tokens = [word for word in tokens if word not in stopwords_id and len(word) > 2]

    return ' '.join(tokens)

# Apply preprocessing
print('Processing captions...')
df_raw['processed_caption'] = df_raw['caption'].apply(preprocess_text)

print('Processing comments...')
df_raw['processed_comment'] = df_raw['comment'].apply(preprocess_text)

# Combine caption and comment into a single text for analysis
df_raw['processed_text'] = df_raw['processed_caption'] + ' ' + df_raw['processed_comment']
df_raw['processed_text'] = df_raw['processed_text'].str.strip()

# Remove empty rows
df_clean = df_raw[df_raw['processed_text'] != ''].copy()

# Save preprocessed data
df_clean.to_csv('output/data/preprocessed_data.csv', index=False)

print('\n✓ Preprocessing complete')
print(f'✓ Clean rows: {len(df_clean)}')
print('✓ Data saved: output/data/preprocessed_data.csv')
print('\nSample preprocessing results:')
df_clean[['caption', 'processed_text']].head()

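## STAGE 3: Sentiment Analysis (Rule-based)

**Method:** Lexicon-based sentiment analysis

**Algorithm:**
```
Sentiment score = Σ(weights of positive terms found) − Σ(weights of negative terms found)
```

Each occurrence of a lexicon term adds its weight, so a term that appears twice counts twice.

**Classification:**
- Score > 0 → Positive (`positif`)
- Score = 0 → Neutral (`netral`)
- Score < 0 → Negative (`negatif`)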
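Here is the formula worked through with a toy lexicon. The entries and weights are a hypothetical subset, not the full dictionaries. The sketch also handles one detail worth noticing: some lexicon entries, such as `tidak aman` and `masa depan`, contain a space, so a plain word-by-word lookup can never match them. The sketch checks for a two-word entry before falling back to single words. With that rule, `tidak aman` scores −3 as one negative phrase, instead of +2 for `aman` on its own.

```python
# Toy lexicons (hypothetical subset of the notebook's dictionaries)
POSITIVE = {'aman': 2, 'bagus': 2, 'masa depan': 2}
NEGATIVE = {'tidak aman': 3, 'rumit': 2}

def score(text):
    """Return (total_score, label) using greedy two-word-first lexicon matching."""
    words = text.split()
    pos = neg = 0
    i = 0
    while i < len(words):
        # Prefer a two-word entry (e.g. 'tidak aman') over its single words
        bigram = ' '.join(words[i:i + 2])
        if i + 1 < len(words) and (bigram in POSITIVE or bigram in NEGATIVE):
            term, step = bigram, 2
        else:
            term, step = words[i], 1
        pos += POSITIVE.get(term, 0)
        neg += NEGATIVE.get(term, 0)
        i += step
    total = pos - neg
    label = 'positif' if total > 0 else 'negatif' if total < 0 else 'netral'
    return total, label

print(score('blockchain masa depan bagus'))  # → (4, 'positif')
print(score('wallet tidak aman rumit'))      # → (-5, 'negatif')
print(score('aman tapi rumit'))              # → (0, 'netral')
```

The last example shows a limitation of pure score summation: opposing terms cancel out, and the comment is labelled neutral.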

In [None]:
# Positive sentiment lexicon with weights
positive_words = {
    # Weight 3 (very positive)
    'hebat': 3, 'luar biasa': 3, 'revolusioner': 3, 'sempurna': 3,
    'fantastis': 3, 'menakjubkan': 3, 'brilian': 3,
    
    # Weight 2 (positive)
    'bagus': 2, 'baik': 2, 'inovatif': 2, 'transparan': 2,
    'terdesentralisasi': 2, 'efisien': 2, 'aman': 2, 'solusi': 2,
    'masa depan': 2, 'maju': 2, 'modern': 2, 'canggih': 2,
    'menarik': 2, 'berguna': 2, 'bermanfaat': 2, 'penting': 2,
    'mudah': 2, 'cepat': 2, 'praktis': 2, 'efektif': 2,
    'terpercaya': 2, 'kredibel': 2, 'legitimate': 2,
    
    # Weight 1 (slightly positive)
    'setuju': 1, 'suka': 1, 'senang': 1, 'tertarik': 1,
    'optimis': 1, 'harapan': 1, 'potensi': 1, 'peluang': 1,
    'berkembang': 1, 'tumbuh': 1, 'meningkat': 1
}

# Negative sentiment lexicon with weights
negative_words = {
    # Weight 3 (very negative)
    'penipuan': 3, 'scam': 3, 'rugi': 3, 'berbahaya': 3,
    'manipulasi': 3, 'tidak aman': 3, 'buruk': 3, 'jelek': 3,
    'gagal': 3, 'hancur': 3, 'crash': 3, 'bohong': 3,
    
    # Weight 2 (negative)
    'membingungkan': 2, 'rumit': 2, 'sulit': 2, 'mahal': 2,
    'lambat': 2, 'risiko': 2, 'bahaya': 2, 'ancaman': 2,
    'terancam': 2, 'khawatir': 2, 'takut': 2, 'ragu': 2,
    'tidak jelas': 2, 'tidak transparan': 2, 'spekulasi': 2,
    'volatil': 2, 'tidak stabil': 2,
    
    # Weight 1 (slightly negative)
    'kompleks': 1, 'susah': 1, 'bingung': 1, 'kurang': 1,
    'belum': 1, 'masih': 1, 'terbatas': 1, 'lemah': 1
}

# Save lexicons
with open('output/data/positive_lexicon.json', 'w', encoding='utf-8') as f:
    json.dump(positive_words, f, indent=2, ensure_ascii=False)

with open('output/data/negative_lexicon.json', 'w', encoding='utf-8') as f:
    json.dump(negative_words, f, indent=2, ensure_ascii=False)

print('✓ Sentiment lexicons created and saved')
print(f'  - Positive terms: {len(positive_words)}')
print(f'  - Negative terms: {len(negative_words)}')

In [None]:
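def calculate_sentiment(text):
    """
    Compute a lexicon-based sentiment score.

    Two-word lexicon entries (e.g. 'luar biasa', 'tidak aman') are matched
    before single words. Splitting on whitespace alone would never match them,
    and 'tidak aman' should count as negative rather than +2 for 'aman'.

    Returns:
    - score: int, sentiment score
    - label: str, sentiment label (positif/netral/negatif)
    """
    if pd.isna(text) or text == '':
        return 0, 'netral'

    words = text.split()
    positive_score = 0
    negative_score = 0

    i = 0
    while i < len(words):
        # Prefer a two-word lexicon entry over its individual words
        bigram = ' '.join(words[i:i + 2])
        if i + 1 < len(words) and (bigram in positive_words or bigram in negative_words):
            term, step = bigram, 2
        else:
            term, step = words[i], 1
        positive_score += positive_words.get(term, 0)
        negative_score += negative_words.get(term, 0)
        i += step

    # Total score
    total_score = positive_score - negative_score

    # Classification
    if total_score > 0:
        label = 'positif'
    elif total_score < 0:
        label = 'negatif'
    else:
        label = 'netral'

    return total_score, label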

# Apply sentiment analysis
print('Running sentiment analysis...')
df_clean[['sentiment_score', 'sentiment_label']] = df_clean['processed_text'].apply(
    lambda x: pd.Series(calculate_sentiment(x))
)

# Save results
df_clean.to_csv('output/data/sentiment_results.csv', index=False)

print('\n✓ Sentiment analysis complete')
print('✓ Results saved: output/data/sentiment_results.csv')
print('\nSentiment distribution:')
print(df_clean['sentiment_label'].value_counts())
print('\nSentiment score statistics:')
print(df_clean['sentiment_score'].describe())

In [None]:
# 1. Bar chart: sentiment distribution
fig, ax = plt.subplots(figsize=(10, 6))
sentiment_counts = df_clean['sentiment_label'].value_counts()
colors = {'positif': '#2ecc71', 'netral': '#95a5a6', 'negatif': '#e74c3c'}
bars = ax.bar(sentiment_counts.index, sentiment_counts.values, 
              color=[colors[label] for label in sentiment_counts.index])

ax.set_xlabel('Sentiment', fontsize=12)
ax.set_ylabel('Count', fontsize=12)
ax.set_title('Distribution of Public Sentiment on Web3 Issues', fontsize=14, fontweight='bold')

# Add value labels
for bar in bars:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{int(height)}\n({height/len(df_clean)*100:.1f}%)',
            ha='center', va='bottom')

plt.tight_layout()
plt.savefig('output/graphs/sentiment_distribution.png', dpi=300, bbox_inches='tight')
plt.show()

print('✓ Sentiment distribution chart saved: output/graphs/sentiment_distribution.png')

## STAGE 4: Trending Topic Analysis

**Methods:**
1. Frequency analysis
2. TF-IDF (manual implementation)
3. Temporal trend identification
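A small worked example makes the TF-IDF definitions concrete. It uses the same formulas as the manual implementation in this stage: tf(w, d) = count of w in d / length of d, and idf(w) = ln(N / df(w)), with no smoothing. The three-document corpus is made up for illustration. A word that occurs in every document gets idf = ln(1) = 0, so it never ranks highly by TF-IDF however frequent it is. This is one reason the frequency ranking and the TF-IDF ranking can differ.

```python
import math
from collections import Counter

# Made-up, already-preprocessed three-document corpus
docs = [
    'crypto penipuan crypto',
    'crypto aman',
    'nft crypto mahal',
]

N = len(docs)
# Document frequency: each word counted at most once per document
df = Counter()
for doc in docs:
    df.update(set(doc.split()))
idf = {w: math.log(N / n) for w, n in df.items()}

def tfidf(doc):
    """TF-IDF of every word in one document."""
    words = doc.split()
    counts = Counter(words)
    return {w: (c / len(words)) * idf[w] for w, c in counts.items()}

scores = [tfidf(d) for d in docs]
for doc, s in zip(docs, scores):
    print(doc, '->', {w: round(v, 3) for w, v in s.items()})
```

Here `crypto` is the most frequent word but scores 0 everywhere, while `aman` (tf = 1/2, idf = ln 3) gets the highest TF-IDF.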

In [None]:
# 1. Frequency Analysis
all_words = ' '.join(df_clean['processed_text']).split()
word_freq = Counter(all_words)
top_50_words = word_freq.most_common(50)

# Save frequency data
freq_df = pd.DataFrame(top_50_words, columns=['word', 'frequency'])
freq_df.to_csv('output/data/word_frequency.csv', index=False)

print('✓ Frequency analysis complete')
print('✓ Top 50 words saved: output/data/word_frequency.csv')
print('\nTop 20 most frequent words:')
print(freq_df.head(20))

In [None]:
# 2. TF-IDF Manual Implementation
import math

def calculate_tf(text):
    """Calculate Term Frequency"""
    words = text.split()
    word_count = Counter(words)
    total_words = len(words)
    tf = {word: count/total_words for word, count in word_count.items()}
    return tf

def calculate_idf(documents):
    """Calculate Inverse Document Frequency: idf(w) = log(N / df(w))"""
    N = len(documents)

    # Document frequency: count each word at most once per document.
    # A single pass over the corpus instead of re-splitting every document
    # for every unique word.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc.split()))

    return {word: math.log(N / df) for word, df in doc_freq.items()}

def calculate_tfidf(documents):
    """Calculate TF-IDF scores"""
    idf = calculate_idf(documents)
    tfidf_scores = []
    
    for doc in documents:
        tf = calculate_tf(doc)
        tfidf = {word: tf_score * idf.get(word, 0) for word, tf_score in tf.items()}
        tfidf_scores.append(tfidf)
    
    return tfidf_scores, idf

# Calculate TF-IDF
print('Computing TF-IDF...')
documents = df_clean['processed_text'].tolist()
tfidf_scores, idf_scores = calculate_tfidf(documents)

# Get top TF-IDF words across all documents
all_tfidf = {}
for tfidf in tfidf_scores:
    for word, score in tfidf.items():
        all_tfidf[word] = all_tfidf.get(word, 0) + score

top_tfidf = sorted(all_tfidf.items(), key=lambda x: x[1], reverse=True)[:50]
tfidf_df = pd.DataFrame(top_tfidf, columns=['word', 'tfidf_score'])
tfidf_df.to_csv('output/data/tfidf_scores.csv', index=False)

print('✓ TF-IDF complete')
print('✓ Results saved: output/data/tfidf_scores.csv')
print('\nTop 20 words by TF-IDF:')
print(tfidf_df.head(20))

In [None]:
# 3. Top-word visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Frequency bar chart
top_20_freq = freq_df.head(20)
ax1.barh(top_20_freq['word'], top_20_freq['frequency'], color='steelblue')
ax1.set_xlabel('Frequency', fontsize=11)
ax1.set_title('Top 20 Words (Frequency)', fontsize=13, fontweight='bold')
ax1.invert_yaxis()

# TF-IDF bar chart
top_20_tfidf = tfidf_df.head(20)
ax2.barh(top_20_tfidf['word'], top_20_tfidf['tfidf_score'], color='coral')
ax2.set_xlabel('TF-IDF Score', fontsize=11)
ax2.set_title('Top 20 Words (TF-IDF)', fontsize=13, fontweight='bold')
ax2.invert_yaxis()

plt.tight_layout()
plt.savefig('output/graphs/top_words_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print('✓ Top-words chart saved: output/graphs/top_words_comparison.png')

## STAGE 5: Wordcloud Visualization

**Wordcloud types:**
1. Overall wordcloud
2. Wordcloud per sentiment (positive, negative, neutral)
3. Wordcloud per topic
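A per-sentiment wordcloud is, underneath, a word count within each sentiment group. The sketch below does that grouping with the standard library on made-up rows. The resulting word-to-count mappings are the kind of input that `WordCloud.generate_from_frequencies` accepts. That method is an alternative to joining the text and calling `generate`, which re-tokenizes the text and applies the library's default (English) stopword list.

```python
from collections import Counter, defaultdict

# Made-up (processed_text, sentiment_label) rows for illustration
rows = [
    ('blockchain masa depan bagus', 'positif'),
    ('crypto penipuan rugi', 'negatif'),
    ('crypto bagus aman', 'positif'),
    ('nft mahal', 'negatif'),
    ('web3 apa', 'netral'),
]

def frequencies_by_label(rows):
    """Return {label: Counter(word -> count)} for each sentiment label."""
    freqs = defaultdict(Counter)
    for text, label in rows:
        freqs[label].update(text.split())
    return dict(freqs)

freqs = frequencies_by_label(rows)
for label in ('positif', 'netral', 'negatif'):
    print(label, freqs.get(label, Counter()).most_common(3))

# With the wordcloud package installed, each Counter can be passed directly:
# WordCloud(background_color='white').generate_from_frequencies(freqs['positif'])
```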

In [None]:
# 1. Overall wordcloud
all_text = ' '.join(df_clean['processed_text'])

wordcloud_all = WordCloud(
    width=1600,
    height=800,
    background_color='white',
    colormap='viridis',
    max_words=100,
    relative_scaling=0.5,
    min_font_size=10
).generate(all_text)

plt.figure(figsize=(20, 10))
plt.imshow(wordcloud_all, interpolation='bilinear')
plt.axis('off')
plt.title('Overall Wordcloud: Web3 Issues on TikTok', fontsize=20, fontweight='bold', pad=20)
plt.savefig('output/wordclouds/wordcloud_overall.png', dpi=300, bbox_inches='tight')
plt.show()

print('✓ Overall wordcloud saved: output/wordclouds/wordcloud_overall.png')

In [None]:
# 2. Wordcloud per sentiment
fig, axes = plt.subplots(1, 3, figsize=(24, 8))

sentiments = [
    ('positif', 'Greens', axes[0]),
    ('netral', 'Greys', axes[1]),
    ('negatif', 'Reds', axes[2])
]

for sentiment, colormap, ax in sentiments:
    text_data = ' '.join(df_clean[df_clean['sentiment_label'] == sentiment]['processed_text'])
    
    if text_data.strip():  # Only create if there's data
        wc = WordCloud(
            width=1200,
            height=800,
            background_color='white',
            colormap=colormap,
            max_words=80,
            relative_scaling=0.5,
            min_font_size=10
        ).generate(text_data)
        
        ax.imshow(wc, interpolation='bilinear')
    
    ax.axis('off')
    ax.set_title(f'Sentiment: {sentiment.upper()}', fontsize=16, fontweight='bold')

plt.tight_layout()
plt.savefig('output/wordclouds/wordcloud_by_sentiment.png', dpi=300, bbox_inches='tight')
plt.show()

print('✓ Per-sentiment wordcloud saved: output/wordclouds/wordcloud_by_sentiment.png')

In [None]:
# 3. Wordcloud per topic
# Topic categorization based on hashtags
def categorize_topic(hashtags):
    """Assign one topic category per row; the first matching rule wins.

    Hashtags are compared as whole tags, because substring checks misfire:
    'ai' is a substring of 'blockchain' and 'sustainability'.
    """
    if not isinstance(hashtags, str):
        return 'Web3 General'
    tags = set(re.findall(r'\w+', hashtags.lower()))
    if tags & {'ai', 'aiethics', 'ethics'}:
        return 'AI Ethics'
    elif tags & {'blockchain', 'crypto', 'cryptocurrency'}:
        return 'Blockchain & Crypto'
    elif tags & {'sustainability', 'green'}:
        return 'Sustainability'
    elif tags & {'nft', 'metaverse'}:
        return 'NFT & Metaverse'
    elif tags & {'privacy', 'security'}:
        return 'Privacy & Security'
    else:
        return 'Web3 General'

df_clean['topic_category'] = df_clean['hashtags'].apply(categorize_topic)

# Save categorized data
df_clean.to_csv('output/data/final_data_with_topics.csv', index=False)

print('✓ Topic categorization complete')
print('\nTopic distribution:')
print(df_clean['topic_category'].value_counts())

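Matching topic keywords as raw substrings of the hashtag string is fragile. The two letters `ai` occur inside `blockchain` and `sustainability`, so a check like `'ai' in hashtags` sends those posts to the AI Ethics bucket. The short check below demonstrates the pitfall and a whole-tag comparison that avoids it. The sample strings and the `{'ai', 'aiethics'}` tag set are illustrative.

```python
import re

def tags_of(hashtags):
    """Split a string like '#Blockchain #web3' into a set of lowercase tags."""
    return set(re.findall(r'\w+', hashtags.lower()))

samples = ['#blockchain #web3', '#sustainability', '#AIethics']

for s in samples:
    substring_hit = 'ai' in s.lower()
    whole_tag_hit = bool(tags_of(s) & {'ai', 'aiethics'})
    print(f'{s:20} substring: {substring_hit!s:5}  whole tag: {whole_tag_hit}')
```

The substring check matches all three samples. The whole-tag check matches only `#AIethics`.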
In [None]:
# Create wordcloud for each topic
topics = df_clean['topic_category'].unique()
n_topics = len(topics)
cols = 3
rows = (n_topics + cols - 1) // cols

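# squeeze=False always returns a 2-D array of axes, so flatten() works for any
# number of topics (with cols=3, [axes] would wrap an array when n_topics == 1)
fig, axes = plt.subplots(rows, cols, figsize=(20, 6*rows), squeeze=False)
axes = axes.flatten()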

for idx, topic in enumerate(topics):
    text_data = ' '.join(df_clean[df_clean['topic_category'] == topic]['processed_text'])
    
    if text_data.strip():
        wc = WordCloud(
            width=1200,
            height=600,
            background_color='white',
            colormap='plasma',
            max_words=60,
            relative_scaling=0.5,
            min_font_size=10
        ).generate(text_data)
        
        axes[idx].imshow(wc, interpolation='bilinear')
    
    axes[idx].axis('off')
    axes[idx].set_title(topic, fontsize=14, fontweight='bold')

# Hide unused subplots
for idx in range(n_topics, len(axes)):
    axes[idx].axis('off')

plt.tight_layout()
plt.savefig('output/wordclouds/wordcloud_by_topic.png', dpi=300, bbox_inches='tight')
plt.show()

print('✓ Per-topic wordcloud saved: output/wordclouds/wordcloud_by_topic.png')

## STAGE 6: Critical Analysis

**Analysis focus (in the order of the cells below):**
1. Web3 awareness
2. Sentiment per topic
3. Temporal trends
4. Opinion polarization
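The polarization cell uses the standard deviation of sentiment scores per topic as its index. The quick illustration below, with made-up scores, shows what that index captures. A topic whose comments split into strong positives and strong negatives gets a large std. Uniformly lukewarm comments get a small one. `statistics.stdev` is the sample standard deviation (ddof = 1), which matches the pandas default behind `.agg(['std'])`. Keep in mind that std measures spread, not bimodality, so a wide but single-peaked distribution also scores high.

```python
import statistics

# Made-up sentiment scores for two hypothetical topics
split_topic = [-3, -3, 3, 3]   # opinions split into two camps
mild_topic = [-1, 0, 0, 1]     # mostly lukewarm

for name, scores in [('split', split_topic), ('mild', mild_topic)]:
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample std, like pandas .std()
    print(f'{name:5} mean={mean:+.2f} std={std:.2f}')
```

Both topics have a mean of 0. Only the spread distinguishes them: std ≈ 3.46 versus ≈ 0.82.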

In [None]:
# 1. Web3 awareness analysis
web3_keywords = ['web3', 'blockchain', 'crypto', 'cryptocurrency', 'decentralized', 
                 'terdesentralisasi', 'nft', 'metaverse']

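# Count rows whose processed text contains each term. This is a substring match
# ('crypto' also matches 'cryptocurrency'), and hashtags were removed during
# preprocessing, so only mentions in the text itself are counted.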
keyword_mentions = {}
for keyword in web3_keywords:
    count = sum(1 for text in df_clean['processed_text'] if keyword in text.lower())
    keyword_mentions[keyword] = count

# Create DataFrame
awareness_df = pd.DataFrame(list(keyword_mentions.items()), 
                           columns=['keyword', 'mentions'])
awareness_df = awareness_df.sort_values('mentions', ascending=False)
awareness_df['percentage'] = (awareness_df['mentions'] / len(df_clean) * 100).round(2)

# Save
awareness_df.to_csv('output/data/web3_awareness.csv', index=False)

print('✓ Web3 awareness analysis complete')
print('✓ Results saved: output/data/web3_awareness.csv')
print('\nMentions of Web3 terms:')
print(awareness_df)

In [None]:
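# 2. Sentiment per topic
sentiment_by_topic = pd.crosstab(df_clean['topic_category'],
                                 df_clean['sentiment_label'],
                                 normalize='index') * 100
# Fix the column order so it matches the colors and legend labels below,
# even if one sentiment class is absent from the data
sentiment_by_topic = sentiment_by_topic.reindex(columns=['negatif', 'netral', 'positif'],
                                                fill_value=0)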

# Save
sentiment_by_topic.to_csv('output/data/sentiment_by_topic.csv')

# Visualization
fig, ax = plt.subplots(figsize=(12, 6))
sentiment_by_topic.plot(kind='bar', stacked=True, ax=ax,
                       color=['#e74c3c', '#95a5a6', '#2ecc71'])
ax.set_xlabel('Topic', fontsize=12)
ax.set_ylabel('Percentage (%)', fontsize=12)
ax.set_title('Sentiment Distribution per Topic', fontsize=14, fontweight='bold')
ax.legend(title='Sentiment', labels=['Negative', 'Neutral', 'Positive'])
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, ha='right')

plt.tight_layout()
plt.savefig('output/graphs/sentiment_by_topic.png', dpi=300, bbox_inches='tight')
plt.show()

print('✓ Sentiment-per-topic analysis complete')
print('✓ Chart saved: output/graphs/sentiment_by_topic.png')
print('\nSentiment per topic (%):')
print(sentiment_by_topic.round(2))

In [None]:
# 3. Temporal trend (only if date data is available)
if 'date' in df_clean.columns:
    df_clean['date'] = pd.to_datetime(df_clean['date'])
    df_clean['week'] = df_clean['date'].dt.to_period('W')
    
    # Sentiment counts per week
    weekly_sentiment = df_clean.groupby(['week', 'sentiment_label']).size().unstack(fill_value=0)
    
    # Plot
    fig, ax = plt.subplots(figsize=(14, 6))
    weekly_sentiment.plot(ax=ax, marker='o', linewidth=2)
    ax.set_xlabel('Week', fontsize=12)
    ax.set_ylabel('Count', fontsize=12)
    ax.set_title('Sentiment Trend over Time', fontsize=14, fontweight='bold')
    ax.legend(title='Sentiment')
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('output/graphs/sentiment_trend.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    # Save
    weekly_sentiment.to_csv('output/data/weekly_sentiment_trend.csv')
    
    print('✓ Temporal trend analysis complete')
    print('✓ Chart saved: output/graphs/sentiment_trend.png')
else:
    print('⚠ Date data not available; skipping temporal analysis')

In [None]:
# 4. Opinion polarization
# Compute dispersion statistics of the sentiment score per topic
polarization = df_clean.groupby('topic_category').agg({
    'sentiment_score': ['mean', 'std', 'min', 'max']
}).round(2)

polarization.columns = ['mean_score', 'std_score', 'min_score', 'max_score']
polarization['polarization_index'] = polarization['std_score']  # Std (spread of scores) as a proxy for polarization
polarization = polarization.sort_values('polarization_index', ascending=False)

# Save
polarization.to_csv('output/data/polarization_analysis.csv')

print('✓ Polarization analysis complete')
print('✓ Results saved: output/data/polarization_analysis.csv')
print('\nPolarization index per topic:')
print(polarization)

## STAGE 7: Summary and Conclusions

In [None]:
# Generate Summary Report
summary_report = f"""
=================================================================
TIKTOK SENTIMENT ANALYSIS REPORT: WEB3 ISSUES
=================================================================

1. DATA SUMMARY
   - Total rows collected: {len(df_raw)}
   - Rows after preprocessing: {len(df_clean)}
   - Data period: {(str(df_clean['date'].min()) + ' to ' + str(df_clean['date'].max())) if 'date' in df_clean.columns else 'N/A'}

2. SENTIMENT DISTRIBUTION
{df_clean['sentiment_label'].value_counts().to_string()}

Percentage:
{(df_clean['sentiment_label'].value_counts() / len(df_clean) * 100).round(2).to_string()}

3. SENTIMENT SCORE STATISTICS
{df_clean['sentiment_score'].describe().to_string()}

4. TOP 10 MOST FREQUENT WORDS
{freq_df.head(10).to_string(index=False)}

5. TOPIC DISTRIBUTION
{df_clean['topic_category'].value_counts().to_string()}

6. WEB3 AWARENESS
{awareness_df.to_string(index=False)}

7. SENTIMENT PER TOPIC (%)
{sentiment_by_topic.round(2).to_string()}

8. OPINION POLARIZATION
{polarization.to_string()}

=================================================================
CONCLUSIONS
=================================================================

Based on the TikTok data on Web3 issues:

1. Dominant sentiment:
   - {df_clean['sentiment_label'].value_counts().index[0]} sentiment dominates
     with {df_clean['sentiment_label'].value_counts().values[0]} rows
     ({df_clean['sentiment_label'].value_counts().values[0]/len(df_clean)*100:.1f}%)

2. Most discussed topic:
   - {df_clean['topic_category'].value_counts().index[0]}
     ({df_clean['topic_category'].value_counts().values[0]} rows)

3. Web3 awareness:
   - Most frequently mentioned term: {awareness_df.iloc[0]['keyword']}
     (in {awareness_df.iloc[0]['mentions']} rows, {awareness_df.iloc[0]['percentage']:.1f}%)

4. Polarization:
   - Topic with the widest spread of sentiment scores: {polarization.index[0]}
     (std: {polarization.iloc[0]['std_score']})

=================================================================
OUTPUT FILES GENERATED
=================================================================

DATA:
  ✓ output/data/raw_data.csv
  ✓ output/data/preprocessed_data.csv
  ✓ output/data/sentiment_results.csv
  ✓ output/data/final_data_with_topics.csv
  ✓ output/data/word_frequency.csv
  ✓ output/data/tfidf_scores.csv
  ✓ output/data/web3_awareness.csv
  ✓ output/data/sentiment_by_topic.csv
  ✓ output/data/polarization_analysis.csv
  ✓ output/data/weekly_sentiment_trend.csv (if temporal data is available)
  ✓ output/data/positive_lexicon.json
  ✓ output/data/negative_lexicon.json

GRAPHS:
  ✓ output/graphs/sentiment_distribution.png
  ✓ output/graphs/top_words_comparison.png
  ✓ output/graphs/sentiment_by_topic.png
  ✓ output/graphs/sentiment_trend.png (if temporal data is available)

WORDCLOUDS:
  ✓ output/wordclouds/wordcloud_overall.png
  ✓ output/wordclouds/wordcloud_by_sentiment.png
  ✓ output/wordclouds/wordcloud_by_topic.png

=================================================================
"""

# Save summary report
with open('output/SUMMARY_REPORT.txt', 'w', encoding='utf-8') as f:
    f.write(summary_report)

print(summary_report)
print('\n✓ Summary report saved: output/SUMMARY_REPORT.txt')
print('\n' + '='*65)
print('ANALYSIS COMPLETE!')
print('='*65)

---

## Implementation Notes

### For Scraping Real Data:

1. **TikTokApi (Unofficial)**
   ```bash
   pip install TikTokApi
   playwright install
   ```

2. **RapidAPI TikTok Scraper**
   - Register at https://rapidapi.com
   - Subscribe to a TikTok scraper API
   - Use the API key to authenticate requests

3. **Apify TikTok Scraper**
   - Register at https://apify.com
   - Use a TikTok Scraper actor
   - Export the results to CSV/JSON

### Required Modifications:

- Replace the body of `scrape_tiktok_data()` with an implementation for the chosen API
- Map the API response fields to the expected columns (video_id, username, caption, comment, likes, hashtags, date)
- Add error handling and rate limiting
- Implement data privacy measures (encrypt usernames)
### References:

- Proposal: `tiktok_sentiment_proposal.md`
- TikTokApi Docs: https://github.com/davidteather/TikTok-Api
- WordCloud Docs: https://amueller.github.io/word_cloud/

---

**Built according to the research methodology in the proposal**

**All outputs are saved in the `output/` folder**