In [1]:
# 1. Import required libraries
import os
import sys
sys.path.append(os.path.abspath("../src"))
import pandas as pd
import time
import altair as alt
import re
import json

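# --- GROQ API key setup ---
from openai import OpenAI
# Read the key from the environment instead of hardcoding a secret in the notebook.
if not os.environ.get("GROQ_API_KEY"):
    raise RuntimeError("Set the GROQ_API_KEY environment variable before running this cell.")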

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1"
)

In [7]:
test_response = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
)
print(test_response.choices[0].message.content)

Nice to meet you! I am LLaMA, an AI assistant developed by Meta AI that can understand and respond to human input in a conversational manner. I'm not a human, but a computer program designed to simulate conversation, answer questions, and even generate text. I'm constantly learning and improving, so bear with me if I make any mistakes! How can I assist you today?


In [8]:
emotion_valence = {
    "joy": 1.00, "love": 0.95, "affection": 0.90, "gratitude": 0.88, "excitement": 0.85,
    "amusement": 0.82, "relief": 0.80, "pride": 0.78, "confidence": 0.75, "ambition": 0.73,
    "protectiveness": 0.70, "determination": 0.68, "anticipation": 0.65, "respect": 0.63,
    "curiosity": 0.60, "surprise": 0.58, "incredulity": 0.55, "authority": 0.52,
    "calm": 0.50, "neutral": 0.50, "neutrality": 0.50, "seriousness": 0.48,
    "caution": 0.45, "concern": 0.42, "nostalgia": 0.40, "awe": 0.38,
}

In [9]:
def parse_srt_file(filepath):
    import srt
    with open(filepath, 'r', encoding='utf-8') as f:
        srt_text = f.read()
    subtitles = list(srt.parse(srt_text))
    return [
        {
            "id": i + 1,
            "start": sub.start.total_seconds(),
            "end": sub.end.total_seconds(),
            "text": sub.content.strip()
        }
        for i, sub in enumerate(subtitles)
    ]
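The captured output further down shows that some subtitle lines carry inline markup such as `<i>...</i>` and embedded line breaks, which `parse_srt_file` passes through unchanged. A minimal sketch of a cleanup step follows; `clean_subtitle_text` is a hypothetical helper that is not part of this notebook, and it assumes only simple HTML-style tags need to be removed.

```python
import re

# Hypothetical helper (not in the original notebook): matches simple tags like <i> or </i>.
_TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")

def clean_subtitle_text(text):
    """Drop inline markup tags and collapse all whitespace, including line breaks."""
    without_tags = _TAG_RE.sub("", text)
    return " ".join(without_tags.split())

print(clean_subtitle_text("<i>You grow so fast.\nHold your breath.</i>"))
# prints: You grow so fast. Hold your breath.
```

If used, it could be applied to `sub.content` before the text is stored, so the model sees plain dialogue rather than markup.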

In [10]:
def analyze_subtitle(text, emotion_valence, model="llama3-70b-8192"):
    emotion_labels = list(emotion_valence.keys())
    prompt = f"""
    You are an assistant analyzing movie dialogue.
    You MUST select zero or more emotions from the following list ONLY:
    {emotion_labels}
    Return only a valid JSON in this format:
    {{
    "emotions": [ ... ], 
    "situation": "short summary", 
    "situation_type": "category"
    }}
    Text: "{text}"
    """
    
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7
        )
        response_text = response.choices[0].message.content

        match = re.search(r"\{.*\}", response_text, re.DOTALL)
        if match:
            return json.loads(match.group())

    except Exception as e:
        print(f"[!] API call or JSON parsing failed: {e}\n→ input text: {text}")
    
    return {"emotions": [], "situation": "unknown", "situation_type": "unknown"}
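The prompt asks the model to pick emotions from the given list only, but nothing in `analyze_subtitle` enforces that on the returned JSON. The sketch below shows one way the parsed result could be checked; `sanitize_result` is a hypothetical helper, and the choice to lowercase labels, drop duplicates, and default missing fields to `"unknown"` is an assumption rather than the notebook's behavior.

```python
def sanitize_result(result, allowed_labels):
    """Keep only known emotion labels and fill in missing keys with defaults."""
    allowed = {label.lower() for label in allowed_labels}
    source = result if isinstance(result, dict) else {}
    raw = source.get("emotions", [])
    if not isinstance(raw, list):
        raw = []
    emotions = []
    for item in raw:
        if isinstance(item, str):
            label = item.strip().lower()
            # Discard labels outside the allowed list and repeated labels.
            if label in allowed and label not in emotions:
                emotions.append(label)
    return {
        "emotions": emotions,
        "situation": source.get("situation", "unknown"),
        "situation_type": source.get("situation_type", "unknown"),
    }

example = sanitize_result(
    {"emotions": ["Joy", "anger", "joy"], "situation": "greeting"},
    ["joy", "calm"],
)
print(example)
# prints: {'emotions': ['joy'], 'situation': 'greeting', 'situation_type': 'unknown'}
```

Passing `list(emotion_valence.keys())` as `allowed_labels` would tie the check to the same table used in the prompt.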

In [13]:
SAMPLE_START = 9657   # window start, in seconds
SAMPLE_END = 14207    # window end, in seconds
# 1. Load subtitles
srt_path = "../data/raw/A.Brighter.Summer.Day.srt"
subs = parse_srt_file(srt_path)

# Keep only subtitles that start inside the sample window
subs = [sub for sub in subs if SAMPLE_START <= sub["start"] <= SAMPLE_END]

# 2. Emotion/situation tagging
labeled = []
for i, sub in enumerate(subs):
    print(f"[{i+1}/{len(subs)}] '{sub['text'][:40]}...' analyzing")  # show progress
    res = analyze_subtitle(sub["text"], emotion_valence)
    sub.update(res)
    labeled.append(sub)
    time.sleep(2.1)  # pause between API calls

subs_df = pd.DataFrame(labeled)
subs_df.to_json("../data/output/brighter_llm_srt.json", force_ascii=False, indent=2)
subs_df.to_csv("../data/output/brighter_llm_srt.csv", index=False)
print("Subtitle emotion analysis saved!")

[1/738] '<i>You grow so fast.
Hold your breath.</...' analyzing
[2/738] '<i>Remind me to let it out
for you tonig...' analyzing
[3/738] '<i>What happened with your old man?</i>...' analyzing
[4/738] '<i>What do you mean?</i>...' analyzing
[5/738] '<i>That night after the fight, did he no...' analyzing
[6/738] 'No, he'd gone to a movie
with my mom....' analyzing
[7/738] 'Wow, your sister's really cool!...' analyzing
[8/738] 'Thanks....' analyzing
[9/738] 'Move....' analyzing
[10/738] 'I found that picture
with the knife....' analyzing
[11/738] 'She must have used it
in a love suicide....' analyzing
[12/738] 'Quit staring at it.
Take it if you want....' analyzing
[13/738] 'Hey, I talked to Horsecart!
- What for?...' analyzing
[14/738] 'I told him I'm your buddy.
He'll help us...' analyzing
[15/738] 'Stand up straight....' analyzing
[16/738] 'Don't make me repeat myself....' analyzing
[17/738] 'You got that?...' analyzing
[18/738] 'There's no one here....' analyzing
[19/738] 'I knew you'd be here....' analyzing
[20/738] 'What you told me last time......' analyzing
[21/738] 'did you real

In [8]:
def calc_valence(emotions):
    if not emotions:
        return 0.5
    scores = [emotion_valence.get(e.lower(), 0.5) for e in emotions]
    return sum(scores) / len(scores)

subs_df["valence"] = subs_df["emotions"].apply(calc_valence)
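To make the averaging in `calc_valence` concrete, here is a small standalone sketch of the same rule. The names `demo_table` and `mean_valence` are hypothetical and used only so the block runs on its own without touching the notebook's variables; the three scores are copied from the `emotion_valence` table above.

```python
# Three entries copied from the notebook's emotion_valence table.
demo_table = {"joy": 1.00, "calm": 0.50, "concern": 0.42}

def mean_valence(emotions, table):
    """Average the scores of the given labels; unknown labels and empty lists count as 0.5."""
    if not emotions:
        return 0.5
    scores = [table.get(e.lower(), 0.5) for e in emotions]
    return sum(scores) / len(scores)

# Two known labels: the mean of 1.00 and 0.42.
print(round(mean_valence(["joy", "concern"], demo_table), 2))
# A label missing from the table falls back to 0.5, the same score as "calm".
print(mean_valence(["anger"], demo_table))
# No labels at all also yields 0.5.
print(mean_valence([], demo_table))
```

One consequence of the 0.5 fallback is that any label missing from the table, including negative emotions that the table does not list, is plotted at the neutral midpoint.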

In [9]:
chart = alt.Chart(subs_df).mark_circle(size=70, opacity=0.8).encode(
    x=alt.X('start', title='Time (s)'),
    y=alt.Y('valence', title='Emotion Valence', scale=alt.Scale(domain=[0,1])),
    color=alt.Color('valence:Q', scale=alt.Scale(scheme='turbo')),
    tooltip=['id','text','emotions','valence','situation','situation_type']
).properties(width=850, height=350, title="Subtitle Emotion Timeline (No Scene)")

chart.interactive().show()