# linkinthe.video - Open Source Pipeline

**Models:**
- Whisper - Transcription (fallback)
- Mistral 7B - Product extraction
- LLaVA 1.6 - Frame analysis

**Flow:**
1. Get the transcript (YouTube captions or Whisper)
2. Extract products with the LLM → found[] + lost[]
3. If any products are lost → download the video → extract frames → identify them with vision

**Lazy Evaluation:** The video is downloaded only when it is needed.

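The flow above can be sketched as a small driver in which every stage is passed in as a callable. All names here (`run_pipeline`, `fetch_captions`, `download`, `transcribe`, `extract_products`, `resolve_with_vision`) are hypothetical placeholders, not functions from this notebook. The sketch only illustrates the lazy part: the download runs at most once, and only if the Whisper fallback or the vision step asks for the video.

```python
from typing import Callable, Optional


def run_pipeline(
    fetch_captions: Callable[[], Optional[str]],
    download: Callable[[], str],
    transcribe: Callable[[str], str],
    extract_products: Callable[[str], dict],
    resolve_with_vision: Callable[[str, list], list],
) -> dict:
    """Run the three steps, downloading the video lazily (hypothetical sketch)."""
    video_path: Optional[str] = None

    def get_video() -> str:
        # Download on first use only; later callers reuse the cached path.
        nonlocal video_path
        if video_path is None:
            video_path = download()
        return video_path

    # Step 1: captions first, transcription of the video as the fallback.
    transcript = fetch_captions()
    if transcript is None:
        transcript = transcribe(get_video())

    # Step 2: the LLM splits products into found and lost.
    products = extract_products(transcript)
    found = list(products.get("found", []))
    lost = list(products.get("lost", []))

    # Step 3: only lost products trigger the vision stage (and the download).
    if lost:
        resolved = resolve_with_vision(get_video(), lost)
        found.extend(resolved)
        lost = [p for p in lost if p not in resolved]

    return {"found": found, "lost": lost, "video_downloaded": video_path is not None}
```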
---

<cell_type>markdown</cell_type>## 0. Setup

**IMPORTANT:** Select Runtime → Change runtime type → T4 GPU!

In [None]:
# Check for a GPU
!nvidia-smi

import torch
if torch.cuda.is_available():
    print(f"\n✅ GPU: {torch.cuda.get_device_name(0)}")
else:
    print("❌ No GPU! Select Runtime → Change runtime type → T4 GPU!")

In [None]:
# Install packages
!pip install -q youtube-transcript-api openai-whisper
!pip install -q transformers accelerate bitsandbytes sentencepiece protobuf

print("✅ Installation complete!")

In [None]:
import os
import json
import time
import requests

# Working directory
WORK_DIR = "/content/linkinthe_test"
os.makedirs(WORK_DIR, exist_ok=True)

# Test video
VIDEO_URL = "https://www.youtube.com/watch?v=WlgjElhWD-U"
VIDEO_ID = VIDEO_URL.split("v=")[-1].split("&")[0]

# State
local_video_path = None
transcript_segments = None  # with timestamps
transcript_text = None

print(f"Video ID: {VIDEO_ID}")
print(f"Video URL: {VIDEO_URL}")

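The `split("v=")` expression above works for the test URL, but for a short link such as `https://youtu.be/<id>` it would return the whole URL, because there is no `v=` to split on. A slightly more defensive variant, using only the standard library, might look like the sketch below. The function name `extract_video_id` and the set of accepted hosts are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed set of hosts that serve regular YouTube pages.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the YouTube video ID from common URL shapes, or None (sketch)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None

    if host in YOUTUBE_HOSTS:
        # Standard watch links: https://www.youtube.com/watch?v=<id>&t=10s
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        # Path-based links: /shorts/<id>, /embed/<id>
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) >= 2 and parts[0] in {"shorts", "embed"}:
            return parts[1]

    return None
```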
<cell_type>markdown</cell_type>---
## 1. Get the Transcript

In [None]:
from youtube_transcript_api import YouTubeTranscriptApi

# Timestamped format: "[0:35] text"
def format_timestamp(seconds):
    m, s = divmod(int(seconds), 60)
    return f"{m}:{s:02d}"

# Try YouTube captions first.
# Note: get_transcript() is the static call from older youtube-transcript-api releases;
# newer releases may require YouTubeTranscriptApi().fetch(...) instead.
try:
    transcript_segments = YouTubeTranscriptApi.get_transcript(VIDEO_ID, languages=['tr', 'en'])

    transcript_with_ts = "\n".join([
        f"[{format_timestamp(s['start'])}] {s['text']}"
        for s in transcript_segments
    ])
    transcript_text = " ".join([s['text'] for s in transcript_segments])

    print(f"✅ YouTube captions found! ({len(transcript_segments)} segments)")
    print("\n" + "="*60)
    print(transcript_with_ts[:2000])
    print("="*60)
except Exception as e:
    print(f"❌ No YouTube captions: {e}")
    print("→ Upload the video and use Whisper")
    transcript_segments = None
    transcript_with_ts = None
    transcript_text = None

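YouTube caption dicts and Whisper segments both carry `start` and `text` keys, so one formatter can serve either source. The sketch below is a variant of the formatting above, not the notebook's own code: `format_ts` and `segments_to_text` are hypothetical names, and the `h:mm:ss` form for videos longer than an hour is an added assumption.

```python
def format_ts(seconds: float) -> str:
    """Format seconds as m:ss, or h:mm:ss once the video passes one hour."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def segments_to_text(segments: list) -> str:
    """Join segments into '[m:ss] text' lines; needs only 'start' and 'text' keys."""
    return "\n".join(
        f"[{format_ts(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    )
```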
<cell_type>markdown</cell_type>---
## 2. Video (only if there are no captions)

**SKIP** this step if YouTube captions are available.

In [None]:
# Video file - put it on Drive or upload it to Colab
candidate_path = f"{WORK_DIR}/sample.webm"

if os.path.exists(candidate_path):
    local_video_path = candidate_path
    size_mb = os.path.getsize(local_video_path) / (1024 * 1024)
    print(f"✅ Video available: {local_video_path} ({size_mb:.1f} MB)")
else:
    # Keep local_video_path as None so later steps know there is no video yet
    local_video_path = None
    print(f"❌ Video not found: {candidate_path}")
    print("   Copy it from Drive or upload it")
In [None]:
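The later cells call `download_video(VIDEO_URL, VIDEO_ID)`, but the notebook never defines it. The sketch below is one possible definition, assuming the `yt-dlp` command-line tool is installed (it is not in the notebook's `pip install` lines, so `pip install yt-dlp` would be needed first). The helper `build_download_command`, the `runner` parameter, and the 720p format selector are illustrative choices, not part of the original.

```python
import glob
import os
import subprocess
from typing import Callable, List, Optional


def build_download_command(url: str, video_id: str, work_dir: str) -> List[str]:
    """Build a yt-dlp command that writes <work_dir>/<video_id>.<ext>."""
    return [
        "yt-dlp",
        "--no-playlist",
        "-f", "bestvideo[height<=720]+bestaudio/best[height<=720]/best",
        "--merge-output-format", "mp4",
        "-o", os.path.join(work_dir, f"{video_id}.%(ext)s"),
        url,
    ]


def _existing_download(video_id: str, work_dir: str) -> Optional[str]:
    # Ignore partial files left behind by an interrupted download.
    matches = sorted(
        p for p in glob.glob(os.path.join(work_dir, f"{video_id}.*"))
        if not p.endswith(".part")
    )
    return matches[0] if matches else None


def download_video(
    url: str,
    video_id: str,
    work_dir: str = "/content/linkinthe_test",
    runner: Callable[..., object] = subprocess.run,
) -> Optional[str]:
    """Download the video once and return its path, or None on failure."""
    cached = _existing_download(video_id, work_dir)
    if cached:
        return cached  # already downloaded, so the step stays lazy
    os.makedirs(work_dir, exist_ok=True)
    try:
        runner(build_download_command(url, video_id, work_dir), check=True)
    except (subprocess.CalledProcessError, FileNotFoundError) as exc:
        print(f"Download failed: {exc}")
        return None
    return _existing_download(video_id, work_dir)
```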
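# No transcript → get the video + run Whisper
# Note: download_video() is not defined in this notebook; define it before running this cell.
if not transcript_text:
    # Reuse an uploaded video if there is one, otherwise download it
    if not (local_video_path and os.path.exists(local_video_path)):
        local_video_path = download_video(VIDEO_URL, VIDEO_ID)

    if local_video_path:
        # Extract the audio track
        audio_path = f"{WORK_DIR}/audio.wav"
        !ffmpeg -i "{local_video_path}" -vn -acodec pcm_s16le -ar 16000 -ac 1 "{audio_path}" -y -loglevel error
        print("✅ Audio extracted")

        # Whisper
        import whisper
        print("Loading Whisper...")
        model = whisper.load_model("medium")

        print("Transcribing...")
        result = model.transcribe(audio_path)

        transcript_text = result['text']
        transcript_segments = result['segments']
        # Rebuild the timestamped transcript so step 3 can still return timestamps
        transcript_with_ts = "\n".join(
            f"[{int(s['start']) // 60}:{int(s['start']) % 60:02d}] {s['text'].strip()}"
            for s in transcript_segments
        )
        print(f"✅ Whisper finished! ({len(transcript_text)} characters)")

        del model
        torch.cuda.empty_cache()
else:
    print("⏭️ YouTube captions are available, no need to download the video.")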

<cell_type>markdown</cell_type>---
## 3. Extract Products with the LLM (found/lost)

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Mistral 7B - good instruction following
MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"

print("Loading Mistral 7B...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
llm_model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
print("✅ Mistral loaded!")

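A rough back-of-the-envelope calculation shows why 4-bit loading matters on a 16 GB T4. The parameter count below (about 7.2 billion) is an approximation. The estimate covers weights only, ignoring activations, the KV cache, and the small overhead of quantization constants, so treat the numbers as indicative rather than measured.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS_7B = 7.2e9  # assumed approximate parameter count for a 7B-class model

fp16_gb = weight_memory_gb(PARAMS_7B, 16)
nf4_gb = weight_memory_gb(PARAMS_7B, 4)
print(f"fp16 weights:  ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{nf4_gb:.1f} GB")
```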
In [None]:
def ask_llm(prompt, max_tokens=1500):
    """Ask Mistral a question."""
    # Mistral does not support a system role, only user
    messages = [
        {"role": "user", "content": prompt}
    ]
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(llm_model.device)
    outputs = llm_model.generate(input_ids, max_new_tokens=max_tokens, do_sample=True, temperature=0.1, pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Product extraction prompt - strict rules
PRODUCT_PROMPT = f"""This is the transcript of a YouTube video. Find the products mentioned in the video.

== FOUND (ONLY if there is a brand/model) ==
A BRAND is REQUIRED to put an item in FOUND:
✓ "Logitech G Pro X" → FOUND (has brand + model)
✓ "Nintendo Switch" → FOUND (has brand + product name)
✓ "iPad 8th generation" → FOUND (has brand + generation)
✓ "Funko Pop Yoda" → FOUND (has brand + character)

== LOST (NO brand/model) ==
If there is no brand, put it in LOST, no exceptions:
✗ "gaming mouse" → LOST (which brand?)
✗ "tripod" → LOST (which tripod?)
✗ "headset" → LOST (no brand)
✗ "wireless earbuds" → LOST (no model)
✗ "stress ball" → LOST (no brand)
✗ "water bottle" → LOST (no brand)

== NEVER ADD ==
- Prices: $100, "150 dollar item"
- Places: Japan, Tokyo, Amazon
- Boxes: box, mystery box, returns box
- Abstract concepts: disadvantage, stumbling block
- Services: YouTube, Underdog
- People: names, YouTubers

RULE: No brand → LOST. No exceptions.

JSON format:
{{"found": [{{"name": "...", "category": "...", "timestamp": "..."}}], "lost": [{{"name": "...", "category": "...", "timestamp": "..."}}]}}

Transcript:
---
{(transcript_with_ts or transcript_text or "")[:6000]}
---

Return ONLY JSON."""

print("Extracting products...")
start = time.time()
llm_response = ask_llm(PRODUCT_PROMPT)
print(f"✅ Done! ({time.time()-start:.1f} s)")
print(llm_response)

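The prompt above passes only the first 6000 characters of the transcript, so products mentioned later in a long video never reach the model. One way to cover the whole transcript is to split it on line boundaries and run the prompt once per chunk. The helper below, `chunk_transcript`, is a hypothetical sketch of that splitting step only; merging the per-chunk results is left out.

```python
from typing import List


def chunk_transcript(text: str, max_chars: int = 6000) -> List[str]:
    """Split text into chunks of at most max_chars, breaking on line boundaries."""
    chunks: List[str] = []
    current = ""
    for line in text.splitlines():
        # A single over-long line is hard-split so no chunk exceeds the limit.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        candidate = f"{current}\n{line}" if current else line
        if len(candidate) > max_chars:
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```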
In [None]:
# Parse the JSON
def parse_llm_json(text):
    clean = text
    if "```json" in clean:
        clean = clean.split("```json")[1].split("```")[0]
    elif "```" in clean:
        clean = clean.split("```")[1].split("```")[0]
    start_idx = clean.find("{")
    end_idx = clean.rfind("}") + 1
    if start_idx != -1 and end_idx > start_idx:
        clean = clean[start_idx:end_idx]
    return json.loads(clean)

try:
    data = parse_llm_json(llm_response)
    found = data.get("found", [])
    lost = data.get("lost", [])

    # Timestamps are "m:ss" strings copied from the transcript, not seconds
    print(f"✅ FOUND ({len(found)} products):")
    for p in found:
        print(f"   🟢 {p.get('name', '?')} ({p.get('category', '?')}) @ {p.get('timestamp', '?')}")

    print(f"\n⚠️ LOST ({len(lost)} products) - to be resolved with vision:")
    for p in lost:
        print(f"   🟡 {p.get('name', '?')} ({p.get('category', '?')}) @ {p.get('timestamp', '?')}")
except Exception as e:
    print(f"❌ Parse error: {e}")
    found, lost = [], []

# Free the LLM
import gc
del llm_model, tokenizer
gc.collect()
torch.cuda.empty_cache()
print("\n✅ Mistral removed from memory.")

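The parsing cell assumes every entry in `found` and `lost` is a dict with a `name` key. A model may instead return bare strings, omit fields, or keep the square brackets around a timestamp. The sketch below shows one way to coerce such output into a uniform shape before the vision step. `normalize_items` is a hypothetical helper, and using `"?"` as the default category is an arbitrary choice.

```python
from typing import Any, Dict, List


def normalize_items(items: Any) -> List[Dict[str, str]]:
    """Coerce LLM output into a list of {name, category, timestamp} dicts."""
    normalized: List[Dict[str, str]] = []
    if not isinstance(items, list):
        return normalized
    for item in items:
        if isinstance(item, str):
            # Treat a bare string as a product name with no other fields.
            item = {"name": item}
        if not isinstance(item, dict):
            continue
        name = str(item.get("name") or "").strip()
        if not name:
            continue  # an entry without a name cannot be looked up later
        normalized.append({
            "name": name,
            "category": str(item.get("category") or "?"),
            "timestamp": str(item.get("timestamp") or "").strip("[] "),
        })
    return normalized
```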
<cell_type>markdown</cell_type>---
## 4. Resolve Lost Products with Vision

The video is downloaded only if there are lost products.

In [None]:
if not lost:
    print("✅ Nothing lost, no need for vision!")
    vision_model = None
else:
    # Download the video (if it is not available yet)
    # Note: download_video() is not defined in this notebook; define it before running this cell.
    if not (local_video_path and os.path.exists(local_video_path)):
        print("Lost products found, downloading the video...")
        local_video_path = download_video(VIDEO_URL, VIDEO_ID)

    if local_video_path:
        # Load LLaVA
        from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration, BitsAndBytesConfig
        from PIL import Image as PILImage

        print("Loading LLaVA...")
        VISION_MODEL = "llava-hf/llava-v1.6-mistral-7b-hf"
        vision_processor = LlavaNextProcessor.from_pretrained(VISION_MODEL)
        vision_model = LlavaNextForConditionalGeneration.from_pretrained(
            VISION_MODEL,
            torch_dtype=torch.float16,
            device_map="auto",
            # Pass 4-bit loading through a quantization config instead of a bare load_in_4bit flag
            quantization_config=BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_compute_dtype=torch.float16,
            ),
        )
        print("✅ LLaVA loaded!")
    else:
        print("❌ Could not download the video, skipping vision")
        vision_model = None

In [None]:
def extract_frame(video_path, timestamp_sec):
    """Extract one frame from the video at the given timestamp (seconds or m:ss)."""
    # Keep colons out of the file name
    safe_ts = str(timestamp_sec).replace(":", "-")
    frame_path = f"{WORK_DIR}/frame_{safe_ts}.jpg"
    !ffmpeg -ss {timestamp_sec} -i "{video_path}" -vframes 1 -q:v 2 "{frame_path}" -y -loglevel error
    return frame_path if os.path.exists(frame_path) else None

def analyze_frame(image_path, product_hint):
    """Analyze a frame with LLaVA."""
    image = PILImage.open(image_path)
    prompt = f"Can you identify the product referred to as '{product_hint}' in this frame? State the brand and model name. If you cannot identify it, say 'UNKNOWN'."
    
    conversation = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": prompt}]}]
    formatted = vision_processor.apply_chat_template(conversation, add_generation_prompt=True)
    inputs = vision_processor(images=image, text=formatted, return_tensors="pt").to(vision_model.device)
    output = vision_model.generate(**inputs, max_new_tokens=100, do_sample=False)
    result = vision_processor.decode(output[0], skip_special_tokens=True)
    if "[/INST]" in result:
        result = result.split("[/INST]")[-1].strip()
    return result

# Resolve lost products
if lost and vision_model:
    print(f"\nAnalyzing {len(lost)} lost products with vision...\n")
    resolved = []

    for product in lost:
        # Timestamps come from the transcript as "m:ss"; drop any brackets the LLM kept
        ts = str(product.get("timestamp") or 0).strip("[] ")
        name = product.get("name", "product")
        print(f"🔍 '{name}' @ {ts} ...", end=" ")

        frame_path = extract_frame(local_video_path, ts)
        if frame_path:
            guess = analyze_frame(frame_path, name)
            print(f"→ {guess[:50]}")

            if guess and "UNKNOWN" not in guess.upper():
                product["name"] = guess.split("\n")[0][:100]  # First line, max 100 chars
                product["resolved_by"] = "vision"
                found.append(product)
                resolved.append(product)
        else:
            print("→ no frame extracted")

    # Remove resolved products from lost
    for r in resolved:
        if r in lost:
            lost.remove(r)

    print(f"\n✅ {len(resolved)} products resolved with vision!")

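Because the transcript is fed to the LLM with `[m:ss]` markers, the `timestamp` values it returns are usually strings such as `"0:35"` rather than numbers. If a numeric offset is needed (for example to sort products or to sample frames a few seconds around a mention), a small converter like the hypothetical `timestamp_to_seconds` below could be used. Falling back to 0.0 for unparseable input is an assumption of this sketch.

```python
def timestamp_to_seconds(value) -> float:
    """Convert '0:35', '[1:02:05]', 35 or '35.5' to seconds; 0.0 if unparseable."""
    if isinstance(value, (int, float)):
        return float(value)
    # Drop brackets, spaces and a trailing "s" unit before parsing.
    text = str(value or "").strip("[] s")
    try:
        seconds = 0.0
        for part in text.split(":"):
            seconds = seconds * 60 + float(part)
        return seconds
    except ValueError:
        return 0.0
```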
<cell_type>markdown</cell_type>---
## 5. Results

In [ ]:
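print("="*60)
print("RESULTS")
print("="*60)

print(f"\n✅ FOUND ({len(found)} products):")
for i, p in enumerate(found, 1):
    tag = " [Vision]" if p.get("resolved_by") == "vision" else ""
    print(f"   {i}. {p['name']} ({p.get('category', '?')}){tag}")

print(f"\n❌ LOST ({len(lost)} products) - could not be identified:")
for i, p in enumerate(lost, 1):
    print(f"   {i}. {p['name']} ({p.get('category', '?')})")

# The video may also be fetched for the vision step, so local_video_path alone
# does not reveal the transcript source. Whisper segments carry an "end" key,
# YouTube caption segments do not.
video_available = bool(local_video_path and os.path.exists(local_video_path))
from_whisper = bool(transcript_segments) and "end" in transcript_segments[0]

print("\n" + "="*60)
print(f"Video available locally: {'Yes' if video_available else 'No'}")
print(f"Transcript source: {'Whisper' if from_whisper else 'YouTube'}")
print("="*60)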

---
## 6. Frame Analysis with LLaVA

LLaVA is an open-source vision-language model.

In [None]:
def analyze_frame_with_llava(image_path, prompt):
    """Analyze a frame with LLaVA."""
    image = PILImage.open(image_path)
    
    conversation = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": prompt},
            ],
        },
    ]
    
    formatted = vision_processor.apply_chat_template(
        conversation, add_generation_prompt=True
    )
    
    inputs = vision_processor(
        images=image,
        text=formatted,
        return_tensors="pt"
    ).to(vision_model.device)
    
    output = vision_model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
    )
    
    result = vision_processor.decode(output[0], skip_special_tokens=True)
    # Keep only the answer
    if "[/INST]" in result:
        result = result.split("[/INST]")[-1].strip()
    return result

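# Test
import glob
from IPython.display import Image, display

# `frames` is not defined elsewhere in this notebook; use the frames written by extract_frame()
frames = sorted(glob.glob(f"{WORK_DIR}/frame_*.jpg"))
if frames and vision_model:
    print("Running a test frame analysis...")
    display(Image(filename=frames[0], width=300))

    test_result = analyze_frame_with_llava(
        frames[0],
        "What products or devices do you see in this image? List them briefly."
    )
    print(f"\nResult: {test_result}")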

---
## 7. Merge Results

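The source leaves this section empty. One plausible reading is that products found in the transcript and products resolved by vision are combined into a single list without duplicates. The sketch below illustrates that idea only; `merge_products` is a hypothetical helper, and matching on a lower-cased, whitespace-normalized name is an assumption rather than the notebook's method.

```python
from typing import Dict, List


def merge_products(*sources: List[Dict]) -> List[Dict]:
    """Merge product lists, keeping the first entry for each normalized name."""
    merged: Dict[str, Dict] = {}
    for source in sources:
        for product in source:
            # Normalize case and whitespace so near-identical names collide.
            key = " ".join(str(product.get("name", "")).lower().split())
            if key and key not in merged:
                merged[key] = product
    return list(merged.values())
```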
---
## 8. Performance and Cost Summary

---
## Notes and Evaluation

### Quality Assessment

| Model | Quality (/10) | Notes |
|-------|---------------|-------|
| Whisper | /10 | |
| Mistral 7B | /10 | |
| LLaVA | /10 | |

### Are the Found Products Correct?
- [ ] Yes, mostly correct
- [ ] About half
- [ ] Mostly wrong

### Missed Products:
- ...

### False Detections:
- ...

### API vs Open Source:
- [ ] Open source is sufficient
- [ ] API is better, but open source is acceptable
- [ ] API is required, open source is insufficient

### Next Steps:
- ...