## Day 7 Checkpoint 2: Scheduler & Tasks Test

<img style="float: right;" src="../img/logo.png" width="120"><br>

<div style="text-align: right"> <b>Research Curator Team</b></div>
<div style="text-align: right"> Initial issue: 2025.12.04 </div>
<div style="text-align: right"> Last update: 2025.12.04 </div>

Revision history  
- `2025.12.04` : Scheduler and scheduled tasks test

In [None]:
import sys
from pathlib import Path

# Add project root to path
project_root = Path.cwd().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

from dotenv import load_dotenv

load_dotenv()

### 1. Scheduler Setup & Status

In [None]:
from app.scheduler.main import (
    start_scheduler,
    stop_scheduler,
    get_scheduler_status,
    trigger_job_manually,
    scheduler,
)

print("Scheduler imported successfully!")

In [None]:
# Check initial status
status = get_scheduler_status()
print(f"Scheduler running: {status['running']}")
print(f"Timezone: {status['timezone']}")
print(f"Current time: {status['current_time']}")
print(f"Jobs: {len(status['jobs'])}")

In [None]:
# Start the scheduler
start_scheduler()

# Check status again
status = get_scheduler_status()
print(f"\n✅ Scheduler started: {status['running']}")
print("\nScheduled Jobs:")
for job in status['jobs']:
    print(f"\n{job['name']} (ID: {job['id']})")
    print(f"  Next run: {job['next_run_time']}")
    print(f"  Trigger: {job['trigger']}")
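
Instead of only reading the printed job list, the registered jobs can be checked programmatically. The sketch below assumes the status dict has the shape used above (a `jobs` list whose entries carry an `id`) and that the expected IDs are the three named in the summary of this notebook. `missing_job_ids` and `sample_status` are hypothetical names introduced for illustration, not part of `app.scheduler.main`.

```python
# Job IDs this checkpoint expects to find (taken from the notebook summary).
EXPECTED_JOB_IDS = {"collect_data", "process_articles", "send_digests"}


def missing_job_ids(status, expected=EXPECTED_JOB_IDS):
    """Return the expected job IDs that are absent from a status dict, sorted."""
    registered = {job["id"] for job in status.get("jobs", [])}
    return sorted(set(expected) - registered)


# Hypothetical status payload, for illustration only.
sample_status = {
    "running": True,
    "jobs": [
        {"id": "collect_data", "name": "Collect data"},
        {"id": "send_digests", "name": "Send digests"},
    ],
}

print(missing_job_ids(sample_status))  # ['process_articles']
```

In the notebook itself, `assert not missing_job_ids(get_scheduler_status())` would turn the visual check into a hard failure when a job is missing.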

### 2. Test Individual Tasks

Before running scheduled tasks, let's prepare test data.

In [None]:
from app.db import crud
from app.db.session import SessionLocal

db = SessionLocal()

# Create a test user with preferences
test_user = crud.create_user(
    db=db,
    email="test_scheduler@example.com",
    name="Test Scheduler User"
)

# Create preferences
crud.create_user_preference(
    db=db,
    user_id=test_user.id,
    research_fields=["Machine Learning", "Natural Language Processing"],
    keywords=["LLM", "transformer", "GPT"],
    email_time="09:00",
    daily_limit=5,
    email_enabled=False,  # Disable email for testing
)

print(f"✅ Created test user: {test_user.email}")
print(f"User ID: {test_user.id}")

db.close()
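
The cells in this notebook open a session and close it by hand, which leaves the session open if a CRUD call raises in between. One possible alternative is a small context manager that always closes the session. This is a minimal sketch: `session_scope` is a hypothetical helper, and `FakeSession` is a stand-in so the example runs without the project's database.

```python
from contextlib import contextmanager


@contextmanager
def session_scope(session_factory):
    """Yield a session from the factory and always close it afterwards."""
    session = session_factory()
    try:
        yield session
    finally:
        # Runs on normal exit and when the body raises.
        session.close()


class FakeSession:
    """Stand-in for a real DB session; records whether close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


with session_scope(FakeSession) as db:
    held = db
    print(db.closed)  # False while the block is running

print(held.closed)  # True once the block has exited
```

With the real factory, the pattern would read `with session_scope(SessionLocal) as db:` in place of the explicit `SessionLocal()` / `db.close()` pair.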

#### Task 1: Data Collection

Test the data collection task manually.

In [None]:
from app.scheduler.tasks import collect_data_task

print("Running data collection task...\n")
collect_data_task()
print("\n✅ Data collection task completed!")

In [None]:
# Check collected articles
db = SessionLocal()
articles = crud.list_articles(db, limit=10)
print(f"Total articles collected: {len(articles)}\n")

for i, article in enumerate(articles[:5], 1):
    print(f"{i}. {article.title[:70]}...")
    print(f"   Type: {article.source_type} | URL: {article.source_url[:50]}...")
    print()

db.close()

#### Task 2: Process Articles

Test the article processing task (summarization, importance scoring, etc.).

In [None]:
from app.scheduler.tasks import process_articles_task

print("Running article processing task...\n")
process_articles_task()
print("\n✅ Article processing task completed!")

In [None]:
# Check processed articles
db = SessionLocal()
articles = crud.list_articles(db, limit=5)
print("Processed articles:\n")

for i, article in enumerate(articles, 1):
    print(f"{i}. {article.title[:60]}...")
    print(f"   Summary: {article.summary[:80] if article.summary else 'None'}...")
    print(f"   Importance: {article.importance_score}")
    print(f"   Category: {article.category}")
    print(f"   Vector ID: {article.vector_id[:20] if article.vector_id else 'None'}...")
    print()

db.close()
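
The inspection cells slice each field and then append `...` unconditionally, so short titles and the literal `None` also get an ellipsis. A small helper can handle both cases in one place. `truncate` is a hypothetical name introduced here for illustration.

```python
def truncate(text, limit, suffix="..."):
    """Shorten text to `limit` characters, adding the suffix only when cut."""
    if text is None:
        return "None"
    if len(text) <= limit:
        return text
    return text[:limit] + suffix


print(truncate("Short title", 60))  # Short title
print(truncate("abcdefgh", 5))      # abcde...
print(truncate(None, 80))           # None
```

In the cell above, `truncate(article.summary, 80)` would replace the inline conditional expression.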

#### Task 3: Send Email Digests

**Note:** This will NOT actually send emails since we set `email_enabled=False` for the test user.

In [None]:
from app.scheduler.tasks import send_digest_task

print("Running email digest task...\n")
send_digest_task()
print("\n✅ Email digest task completed!")

### 3. Test Manual Job Triggering

In [None]:
# List all jobs
status = get_scheduler_status()
print("Available jobs:\n")
for job in status['jobs']:
    print(f"- {job['id']}: {job['name']}")

In [None]:
# Manually trigger a job (uncomment to test)
# WARNING: This will actually run the task!

# success = trigger_job_manually("collect_data")
# print(f"Manual trigger success: {success}")

### 4. Test Scheduler Lifecycle

In [None]:
import time

# Check status
status = get_scheduler_status()
print(f"Scheduler running: {status['running']}")

# Stop scheduler
print("\nStopping scheduler...")
stop_scheduler()
time.sleep(1)

status = get_scheduler_status()
print(f"Scheduler running: {status['running']}")

# Start again
print("\nStarting scheduler again...")
start_scheduler()
time.sleep(1)

status = get_scheduler_status()
print(f"Scheduler running: {status['running']}")
print(f"Jobs: {len(status['jobs'])}")
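
The fixed `time.sleep(1)` calls above assume the scheduler changes state within one second. A polling helper with a timeout is one way to make the lifecycle check less timing-dependent. This is a sketch only: `wait_until` is a hypothetical helper, and the fake predicate below stands in for a call to `get_scheduler_status()`.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.1):
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Fake condition that becomes true on the third check (illustration only).
calls = {"n": 0}


def ready_after_three_checks():
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(ready_after_three_checks, timeout=2.0, interval=0.01))  # True
```

After `stop_scheduler()`, the check could then be written as `wait_until(lambda: not get_scheduler_status()["running"])`.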

### 5. Verify Vector DB Integration

In [None]:
from app.vector_db.client import QdrantManager

vector_db = QdrantManager()

# Check collection info
collection_info = vector_db.get_collection_info()
print(f"Collection: {collection_info['collection_name']}")
print(f"Vector count: {collection_info['vectors_count']}")
print(f"Points count: {collection_info['points_count']}")

In [None]:
# Test semantic search
from app.processors.embedder import generate_embedding

query = "Large Language Models and transformers"
query_embedding = generate_embedding(query)

results = vector_db.search(query_vector=query_embedding, limit=3)

print(f"Search results for: '{query}'\n")
for i, result in enumerate(results, 1):
    print(f"{i}. {result['title'][:60]}...")
    print(f"   Score: {result['score']:.4f}")
    print(f"   Category: {result.get('category', 'N/A')}")
    print()
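
To help interpret the `Score` values: if the collection is configured with cosine distance (an assumption, since the collection settings are not shown in this notebook), the score is the cosine similarity between the query embedding and each stored vector. The pure-Python sketch below shows that computation on toy vectors. `cosine_similarity` is an illustrative helper, not part of `QdrantManager`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(round(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 4))  # 1.0 (same direction)
```

Under this assumption, scores closer to 1 indicate articles whose embeddings point in nearly the same direction as the query.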

### 6. Cleanup

In [None]:
# Stop the scheduler
stop_scheduler()
print("✅ Scheduler stopped")

In [None]:
# Clean up test data (uncomment to execute)
# db = SessionLocal()
# crud.delete_user(db, test_user.id)  # Cascade deletes preferences
# print("✅ Test user deleted")
# db.close()

### Summary

✅ **Checkpoint 2 complete!**

Tested items:
1. Scheduler initialization and status check
2. Registration of the 3 scheduled jobs
   - `collect_data`: data collection (01:00 KST)
   - `process_articles`: article processing (01:30 KST)
   - `send_digests`: email delivery (08:00 KST)
3. Manual execution and verification of individual tasks
4. Scheduler start/stop lifecycle
5. Vector DB integration check
6. Manual job triggering
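
The `Next run` values printed in section 1 can be cross-checked against the schedule listed above. The sketch below assumes each job fires once a day at the stated KST wall-clock time (the trigger definitions themselves are not shown in this notebook). `next_daily_run` is a hypothetical helper, and KST is modelled as a fixed UTC+9 offset.

```python
import datetime as dt

# Korea Standard Time has no daylight saving, so a fixed offset is sufficient.
KST = dt.timezone(dt.timedelta(hours=9), name="KST")


def next_daily_run(now, run_at):
    """Return the next KST occurrence of `run_at` strictly after `now`."""
    now_kst = now.astimezone(KST)
    candidate = dt.datetime.combine(now_kst.date(), run_at, tzinfo=KST)
    if candidate <= now_kst:
        candidate += dt.timedelta(days=1)
    return candidate


# Times taken from the summary list; the "daily" cadence is an assumption.
schedule = {
    "collect_data": dt.time(1, 0),
    "process_articles": dt.time(1, 30),
    "send_digests": dt.time(8, 0),
}

now = dt.datetime(2025, 12, 4, 7, 30, tzinfo=KST)
for job_id, run_at in schedule.items():
    print(job_id, next_daily_run(now, run_at).isoformat())
# collect_data 2025-12-05T01:00:00+09:00
# process_articles 2025-12-05T01:30:00+09:00
# send_digests 2025-12-04T08:00:00+09:00
```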

All scheduler functions are working as expected!