# News Watch Scraping Notebook

This notebook demonstrates how to use the "news-watch" package to scrape Indonesian news articles. Its command-line arguments let you choose the keywords, the start date, the news sites to scrape, and the output format.

## Overview

The news-watch package enables you to:
- Scrape articles from various Indonesian news websites.
- Filter articles based on keywords.
- Specify a start date for scraping.
- Choose output formats (CSV or XLSX).
- Run scraping silently, without printing to the console.

This notebook is designed for Google Colab. Any output files generated (CSV or XLSX) will appear under the "Files" tab on the left side.


### Key command-line arguments
The CLI accepts the following arguments:

- `-k` or `--keywords`: Specifies a comma-separated list of keywords. For example, `-k ihsg,bank,keuangan` or `--keywords ihsg,bank,keuangan`.
- `-sd` or `--start_date`: Specifies the start date for scraping in the format YYYY-MM-DD. For example, `-sd 2025-01-01`.
- `-s` or `--scrapers`: (Optional) Specifies a comma-separated list of news websites to scrape. When not provided, all supported scrapers will be used.
- `-of` or `--output_format`: (Optional) Specifies the output format: `csv` or `xlsx`.
- `--silent`: (Optional) Runs the scraper without printing output to the console.
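These flags map directly onto a Python argument list, which helps when you want to build commands programmatically. The sketch below is a minimal illustration. The helper name `build_newswatch_args` is hypothetical and is not part of news-watch. It emits only the long-form flags documented above and rejects start dates that do not parse as YYYY-MM-DD.

```python
from datetime import datetime
from typing import List, Optional, Sequence


def build_newswatch_args(
    keywords: Sequence[str],
    start_date: str,
    scrapers: Optional[Sequence[str]] = None,
    output_format: Optional[str] = None,
    silent: bool = False,
) -> List[str]:
    """Hypothetical helper (not part of news-watch): build an argv list
    for the newswatch CLI using only the flags documented in this notebook."""
    # Raises ValueError if the date does not match YYYY-MM-DD.
    datetime.strptime(start_date, "%Y-%m-%d")
    # Drop surrounding whitespace and empty entries before comma-joining.
    cleaned = [k.strip() for k in keywords if k.strip()]
    if not cleaned:
        raise ValueError("at least one keyword is required")
    args = ["newswatch", "--keywords", ",".join(cleaned), "--start_date", start_date]
    if scrapers:
        args += ["--scrapers", ",".join(scrapers)]
    if output_format:
        args += ["--output_format", output_format]
    if silent:
        args.append("--silent")
    return args


args = build_newswatch_args(["ihsg", "bank", "keuangan"], "2025-01-01", silent=True)
print(args)
```

The resulting list can be passed directly to `subprocess.run(args, check=True)`. Because each argument is a separate list element, no shell quoting is needed.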

#### Running Shell Commands in Colab
In Google Colab and Jupyter notebooks, prefixing a line with `!` runs it as a shell command instead of interpreting it as Python code. Because news-watch is a command-line tool, every command in this notebook starts with `!`.

Note:
- `pip`: installs and manages Python packages.
- `newswatch`: the command-line interface provided by the news-watch package for scraping news.
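Outside a notebook, or when you want a command's output in a Python variable, the standard library's `subprocess` module can stand in for the `!` prefix. The sketch below is a minimal illustration. `run_cli` is a hypothetical helper name. The final lines assume the `newswatch` executable is on `PATH` once the package is installed.

```python
import shutil
import subprocess
import sys
from typing import List


def run_cli(args: List[str]) -> subprocess.CompletedProcess:
    """Hypothetical helper: run a CLI from plain Python, roughly what a `!` line does,
    but with stdout/stderr captured as text instead of streamed to the cell."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError(f"{args[0]!r} is not on PATH; is the package installed?")
    # check=True raises CalledProcessError if the command exits with a non-zero code.
    return subprocess.run(args, capture_output=True, text=True, check=True)


if shutil.which("newswatch"):
    print(run_cli(["newswatch", "--help"]).stdout)
else:
    print("newswatch not found; run the pip install cell first", file=sys.stderr)
```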

Install (or upgrade) the news-watch package with pip:

```
!pip install news-watch --upgrade
```
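To confirm that the install worked, you can query the installed distribution and look for the console script. The sketch below is a minimal illustration. `installed_version` is a hypothetical helper name, and the distribution name `news-watch` is taken from the pip command above.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Hypothetical helper: return the installed version of a pip distribution, or None."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print("news-watch version:", installed_version("news-watch"))
# The examples below call the `newswatch` console script, so it should be on PATH.
print("newswatch executable:", shutil.which("newswatch"))
```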

### Example Commands

- Display help information:

```
!newswatch --help
```

- Scrape articles with the keyword "ihsg" starting from February 1, 2025:

```
!newswatch --keywords ihsg --start_date 2025-02-01
```
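Instead of hard-coding the date, you can compute a relative start date in Python and format it as YYYY-MM-DD. The sketch below is a minimal illustration. `start_date_days_ago` is a hypothetical helper, not part of news-watch.

```python
from datetime import date, timedelta
from typing import Optional


def start_date_days_ago(days: int, today: Optional[date] = None) -> str:
    """Hypothetical helper: the date `days` days before `today`, as YYYY-MM-DD."""
    if days < 0:
        raise ValueError("days must be non-negative")
    base = today if today is not None else date.today()
    # date.isoformat() yields the zero-padded YYYY-MM-DD form.
    return (base - timedelta(days=days)).isoformat()


start = start_date_days_ago(30)
print(start)
```

IPython expands `{expression}` inside `!` commands, so a following cell can pass the value with `!newswatch --keywords ihsg --start_date {start}`.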

- Scrape articles for multiple keywords (ihsg, bank, keuangan) and suppress console output:

```
!newswatch -k "ihsg,bank,keuangan" -sd 2025-01-01 --silent
```

- Scrape articles from specific news websites (bisnisindonesia and detik), save them as an Excel (XLSX) file, and suppress console output:

```
!newswatch -k "ihsg" -sd 2025-01-01 -s "bisnisindonesia,detik" --output_format xlsx --silent
```
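After a run finishes, you can inspect the results from Python. The sketch below is a minimal illustration. It assumes newswatch writes its CSV/XLSX files to the current working directory, which is the folder the Colab "Files" tab shows (`/content` by default). The exact file names and column names are not documented here, so the hypothetical helpers `list_output_files` and `preview_csv` match files by extension only and read whatever header the CSV contains.

```python
import csv
from itertools import islice
from pathlib import Path
from typing import Dict, List, Sequence


def list_output_files(directory: str = ".", extensions: Sequence[str] = (".csv", ".xlsx")) -> List[Path]:
    """Hypothetical helper: CSV/XLSX files in `directory`, newest first."""
    files = [p for p in Path(directory).iterdir() if p.is_file() and p.suffix.lower() in extensions]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


def preview_csv(path: Path, n_rows: int = 5) -> List[Dict[str, str]]:
    """Hypothetical helper: first `n_rows` rows of a CSV, keyed by its header row.
    utf-8-sig is an assumption; it also handles a leading byte-order mark."""
    with path.open(newline="", encoding="utf-8-sig") as fh:
        return list(islice(csv.DictReader(fh), n_rows))


outputs = list_output_files()
print([p.name for p in outputs])
csv_files = [p for p in outputs if p.suffix.lower() == ".csv"]
if csv_files:
    for row in preview_csv(csv_files[0]):
        print(row)
```

For XLSX output, `pandas.read_excel` can load the file (it uses the openpyxl engine for `.xlsx`). `pandas.read_csv` is an alternative to the `csv` module preview.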