Extract, Transform and Load Articles from News Websites

How to use it:

1️⃣ Download the repository.

2️⃣ Install required libraries:

pip install -r requirements.txt

3️⃣ To start the scraping and ETL process, run this in the terminal:

python pipeline.py

✅ It's done!

The script will:

Extract: Scrape articles from the front pages of these websites:
1. El Universal
2. CNN en Español
Transform: Remove records with empty values and enrich the data through tokenization, i.e. split the title and the body into individual words for later analysis.
Load: Load the data into a local SQLite database.
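
The Transform and Load steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code in this repository: the function names (`tokenize`, `transform`, `load`), the `articles` table layout, and the choice to store token counts are all assumptions made for the example.

```python
import re
import sqlite3


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def transform(articles):
    """Drop articles with an empty title or body and add token counts.

    `articles` is assumed to be a list of dicts with "title" and "body" keys.
    """
    cleaned = []
    for article in articles:
        title = (article.get("title") or "").strip()
        body = (article.get("body") or "").strip()
        if not title or not body:
            continue  # skip records with empty values
        cleaned.append({
            "title": title,
            "body": body,
            "n_tokens_title": len(tokenize(title)),
            "n_tokens_body": len(tokenize(body)),
        })
    return cleaned


def load(articles, db_path=":memory:"):
    """Insert the transformed articles into a SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "title TEXT, body TEXT, "
        "n_tokens_title INTEGER, n_tokens_body INTEGER)"
    )
    conn.executemany(
        "INSERT INTO articles VALUES "
        "(:title, :body, :n_tokens_title, :n_tokens_body)",
        articles,
    )
    conn.commit()
    return conn
```

For example, `load(transform(scraped), "newspaper.db")` would write the cleaned records to a local file, where `scraped` stands for whatever the Extract step returns and `newspaper.db` is a placeholder file name.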
