♻️ Pipeline to extract, transform and load articles from news websites into an SQLite database.

mariajosemv/ETL-for-news-websites

Extract, Transform and Load Articles from News Websites

How to use it:

1️⃣ Download the repository.

2️⃣ Install required libraries:

pip install -r requirements.txt

3️⃣ To start the scraping and ETL process, run this in the terminal:

python pipeline.py

✅ It's done!

The script will:

  • Extract: Scrape articles from the front pages of these websites:
    1. El Universal
    2. CNN en Español
  • Transform: Remove records with empty values and enrich the data with tokenization, i.e. split the title and the body into individual words for later analysis.
  • Load: Load the data into a local SQLite database.
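
The Transform and Load steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code in pipeline.py: the function names (tokenize, transform, load), the dictionary keys, the table schema, and the regex-based tokenizer are all assumptions made for the example.

```python
import re
import sqlite3


def tokenize(text):
    """Split text into lowercase word tokens (a simple regex stand-in for a real tokenizer)."""
    return re.findall(r"\w+", text.lower())


def transform(articles):
    """Drop articles with an empty title or body, then attach token lists."""
    cleaned = []
    for article in articles:
        title = (article.get("title") or "").strip()
        body = (article.get("body") or "").strip()
        if not title or not body:
            continue  # skip records with empty values
        cleaned.append({
            "title": title,
            "body": body,
            "title_tokens": tokenize(title),
            "body_tokens": tokenize(body),
        })
    return cleaned


def load(articles, db_path=":memory:"):
    """Insert transformed articles into an SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "title TEXT, body TEXT, title_tokens TEXT, body_tokens TEXT)"
    )
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?, ?)",
        [
            (a["title"], a["body"],
             " ".join(a["title_tokens"]), " ".join(a["body_tokens"]))
            for a in articles
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    # Made-up sample records standing in for scraped articles.
    sample = [
        {"title": "Noticias de hoy", "body": "El clima es bueno."},
        {"title": "", "body": "Sin título"},
    ]
    conn = load(transform(sample))
    print(conn.execute("SELECT title, title_tokens FROM articles").fetchall())
```

Tokens are stored here as space-joined text so they fit in a plain TEXT column. Passing a file path as db_path instead of ":memory:" would persist the table to a local database file.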