# News-Watch API Reference

This notebook walks through the key functions of the news-watch Python API, with a runnable example for each.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/okkymabruri/news-watch/blob/main/notebook/api-reference.ipynb)

## Installation

First, install news-watch and its dependencies:

In [None]:
# Install news-watch from PyPI
!pip install news-watch
# Download the Chromium browser used by Playwright
!playwright install chromium

## Import and Setup

In [None]:
import newswatch as nw
import pandas as pd
from datetime import datetime, timedelta

print("News-watch API Reference")
print("=" * 40)

## 1. Basic Functions

### 1.1 list_scrapers() - Get Available News Sources

In [None]:
# Get list of all available news sources
available_scrapers = nw.list_scrapers()
print("Available news sources:")
for i, scraper in enumerate(available_scrapers, 1):
    print(f"  {i:2d}. {scraper}")

print(f"\nTotal: {len(available_scrapers)} news sources")
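
The examples in this notebook pass source names to `scrapers` as a comma-separated string such as `"detik,kompas"`. A minimal sketch of checking such a string against the output of `list_scrapers()` before starting a scrape follows. The helper `find_unknown_scrapers` and the sample list are hypothetical and not part of news-watch, and treating `"auto"` as a pass-through value is an assumption.

```python
def find_unknown_scrapers(requested, available):
    """Return requested scraper names that are not in `available`.

    Hypothetical helper, not part of news-watch. `requested` is a
    comma-separated string such as "detik,kompas".
    """
    if requested.strip().lower() == "auto":
        return []  # assumption: "auto" lets the library choose sources
    names = [name.strip().lower() for name in requested.split(",") if name.strip()]
    known = {name.lower() for name in available}
    return [name for name in names if name not in known]


# Sample list for illustration; in the notebook use nw.list_scrapers()
sample_available = ["antaranews", "detik", "kompas", "tempo"]

print(find_unknown_scrapers("detik, kompas", sample_available))   # []
print(find_unknown_scrapers("detik,kompass", sample_available))   # ['kompass']
```

Catching a misspelled name up front is cheaper than discovering it after a scrape returns nothing.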

### 1.2 scrape() - Basic Article Scraping

In [None]:
# Basic scraping - returns list of article dictionaries
articles = nw.scrape(
    keywords="ekonomi",
    start_date="2025-01-15",
    scrapers="kompas",  # Use a single reliable source for the demo
    verbose=True,
)

print(f"Found {len(articles)} articles")

# Show structure of first article
if articles:
    print("\nFirst article structure:")
    sample_article = articles[0]
    for key, value in sample_article.items():
        print(f"  {key}: {str(value)[:60]}{'...' if len(str(value)) > 60 else ''}")
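
Because `scrape()` returns plain dictionaries, the standard library is enough for simple summaries. The sketch below counts articles per source and filters by a word in the title. The records are hand-written stand-ins, and the `title`, `source`, and `publish_date` keys are taken from the DataFrame columns used elsewhere in this notebook; real results may carry additional or differently named fields.

```python
from collections import Counter

# Hand-written sample records standing in for the output of nw.scrape()
sample_articles = [
    {"title": "Ekonomi tumbuh 5 persen", "source": "kompas", "publish_date": "2025-01-16"},
    {"title": "Rupiah menguat", "source": "detik", "publish_date": "2025-01-16"},
    {"title": "Prospek ekonomi digital", "source": "kompas", "publish_date": "2025-01-17"},
]

# Count articles per source; .get() tolerates records without the key
per_source = Counter(article.get("source", "unknown") for article in sample_articles)
print(per_source.most_common())

# Keep only articles whose title mentions the keyword (case-insensitive)
matching = [a for a in sample_articles if "ekonomi" in a.get("title", "").lower()]
print(len(matching))
```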

### 1.3 scrape_to_dataframe() - Get Results as pandas DataFrame

In [None]:
# Get results as pandas DataFrame for analysis
df = nw.scrape_to_dataframe(
    keywords="teknologi,digital", start_date="2025-01-15", scrapers="detik,kompas"
)

print(f"DataFrame shape: {df.shape}")
print(f"Columns: {list(df.columns)}")

if not df.empty:
    print("\nDataFrame info:")
    df.info()  # info() prints its report itself and returns None

    print("\nFirst 3 rows:")
    print(df.head(3)[["title", "source", "publish_date"]].to_string())

    print("\nSource distribution:")
    print(df["source"].value_counts())

### 1.4 scrape_to_file() - Save Results Directly to File

In [None]:
# Save directly to Excel file
nw.scrape_to_file(
    keywords="pendidikan",
    start_date="2025-01-15",
    output_path="education_news.xlsx",
    output_format="xlsx",
    scrapers="tempo,antaranews",
)

print("✅ Education news saved to education_news.xlsx")

# Save to CSV
nw.scrape_to_file(
    keywords="kesehatan",
    start_date="2025-01-15",
    output_path="health_news.csv",
    output_format="csv",
    scrapers="kompas",
)

print("✅ Health news saved to health_news.csv")

# Save to JSON for API integration
nw.scrape_to_file(
    keywords="teknologi",
    start_date="2025-01-15",
    output_path="tech_news.json",
    output_format="json",
    scrapers="detik",
)

print("✅ Tech news saved to tech_news.json")

### 1.5 Output Format Options

news-watch supports three output formats:

- **XLSX (Excel)**: Suited to reports that will be read or edited in a spreadsheet application
- **CSV**: Suited to analysis with pandas, R, or other data science tools
- **JSON**: Suited to API integration, web applications, and other programmatic processing

All three formats contain the same data fields, so choose the one that matches the tool that will consume the file.
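
To make the point about shared fields concrete, here is a small round trip using only the Python standard library. The records are hand-written samples, not scraper output, and the layout news-watch actually writes (for example the JSON structure) is an assumption here, namely one flat record per article.

```python
import csv
import io
import json

# Hand-written sample records; field names are illustrative
records = [
    {"title": "Berita A", "source": "kompas", "publish_date": "2025-01-16"},
    {"title": "Berita B", "source": "detik", "publish_date": "2025-01-17"},
]

# CSV: one header row plus one row per article; every value becomes text
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=list(records[0]))
writer.writeheader()
writer.writerows(records)
csv_buffer.seek(0)
csv_rows = list(csv.DictReader(csv_buffer))

# JSON: a list of objects, assuming one object per article
json_rows = json.loads(json.dumps(records, ensure_ascii=False))

# Both formats carry the same field names
print(list(csv_rows[0].keys()) == list(json_rows[0].keys()))
```

One practical difference is that CSV stores every value as text, so dates and numbers need parsing on load, whereas JSON keeps numbers and nested values typed.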

## 2. Convenience Functions

### 2.1 quick_scrape() - Get Recent News Easily

In [None]:
# Get recent news without specifying exact dates
recent_politics = nw.quick_scrape(
    keywords="politik",
    days_back=3,  # Last 3 days
    scrapers="auto",
)

print(f"Found {len(recent_politics)} political articles from last 3 days")

if not recent_politics.empty:
    print("\nMost recent articles:")
    recent_sorted = recent_politics.sort_values("publish_date", ascending=False)
    for _, article in recent_sorted.head(3).iterrows():
        print(f"  • {article['title'][:60]}... ({article['source']})")
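
`quick_scrape()` takes a look-back window instead of an explicit date. For readers who want to pass the equivalent `start_date` to `scrape()` themselves, the conversion can be sketched as follows. The helper name is hypothetical, and whether news-watch computes the window in exactly this way is an assumption.

```python
from datetime import date, timedelta


def start_date_from_days_back(days_back, today=None):
    """Return the date `days_back` days before `today` as YYYY-MM-DD.

    Hypothetical helper; `today` defaults to the current local date.
    """
    if days_back < 0:
        raise ValueError("days_back must be non-negative")
    today = today or date.today()
    return (today - timedelta(days=days_back)).strftime("%Y-%m-%d")


# Fixed reference date so the result is reproducible
print(start_date_from_days_back(3, today=date(2025, 1, 18)))  # 2025-01-15
```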

### 2.2 scrape_ihsg_news() - Specialized Stock Market News

In [None]:
# Get Indonesian stock market (IHSG) specific news
stock_news = nw.scrape_ihsg_news(days_back=5)

print(f"Found {len(stock_news)} IHSG-related articles from last 5 days")

if not stock_news.empty:
    # Count titles containing simple positive or negative keywords
    # (a rough keyword tally, not true sentiment analysis)
    positive_words = ["naik", "menguat", "positif", "bullish"]
    negative_words = ["turun", "melemah", "negatif", "bearish"]

    # na=False treats missing titles as non-matches
    positive_count = (
        stock_news["title"]
        .str.contains("|".join(positive_words), case=False, na=False)
        .sum()
    )
    negative_count = (
        stock_news["title"]
        .str.contains("|".join(negative_words), case=False, na=False)
        .sum()
    )

    print("\nKeyword-based sentiment indicators:")
    print(f"  Positive sentiment indicators: {positive_count} articles")
    print(f"  Negative sentiment indicators: {negative_count} articles")

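    print("\nDaily IHSG news volume:")
    # Convert to datetime first so .dt works even if dates arrive as strings
    publish_dates = pd.to_datetime(stock_news["publish_date"], errors="coerce")
    daily_counts = stock_news.groupby(publish_dates.dt.date).size()
    print(daily_counts)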
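
The keyword counts in this cell use plain substring matching, so a short word such as `naik` also matches inside longer words like `menaikkan`. A hedged sketch of a stricter variant using word boundaries from the standard `re` module follows. The titles are hand-written samples, the helper name is hypothetical, and this remains a rough keyword tally rather than real sentiment analysis.

```python
import re

positive_words = ["naik", "menguat", "positif", "bullish"]
negative_words = ["turun", "melemah", "negatif", "bearish"]


def count_titles_with(words, titles):
    """Count titles that contain any of `words` as a whole word."""
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )
    return sum(1 for title in titles if pattern.search(title))


# Hand-written sample titles for illustration
sample_titles = [
    "IHSG naik 1 persen di sesi pertama",
    "Rupiah melemah, IHSG turun tipis",
    "Emiten berencana menaikkan dividen",
]

print(count_titles_with(positive_words, sample_titles))  # 1
print(count_titles_with(negative_words, sample_titles))  # 1
```

The same pattern string can be passed to `Series.str.contains` in pandas if the DataFrame workflow is preferred.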