Automated data pipeline connected to an AWS cloud database

its-philipp/datapipeline_project

Data Engineering - Automated Data Pipeline

Project Objective

In this project I used web scraping and API calls to collect data from sources such as Wikipedia, OpenWeatherMap, and RapidAPI. I cleaned the pulled data and transformed it into tables, then pushed those tables to a SQL database that I had created beforehand and connected to the AWS cloud. In the last step I created AWS Lambda functions and scheduled them with Amazon EventBridge so that the data pipeline is triggered automatically once a day. For more detailed steps and information, please have a look at my article on Medium.
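
The clean, transform, and load steps described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the sample payload, its field names, the `weather` table layout, and the helper names `clean_city_name`, `transform`, and `load` are all assumptions. To stay runnable without network access or extra installs, it uses the standard-library `sqlite3` module as a stand-in for the AWS-hosted SQL database.

```python
import re
import sqlite3
from datetime import datetime, timezone

# Hypothetical sample shaped like a weather forecast API response.
# The field names are assumptions made for this sketch.
SAMPLE_RESPONSE = {
    "city": {"name": "Berlin[1]"},
    "list": [
        {"dt": 1700000000, "main": {"temp": 8.4},
         "weather": [{"description": "light rain"}]},
        {"dt": 1700010800, "main": {"temp": 7.9},
         "weather": [{"description": "overcast clouds"}]},
    ],
}


def clean_city_name(raw):
    """Strip footnote markers such as '[1]' that scraped text often carries."""
    return re.sub(r"\[\d+\]", "", raw).strip()


def transform(payload):
    """Flatten the nested payload into rows that match the target table."""
    city = clean_city_name(payload["city"]["name"])
    rows = []
    for entry in payload["list"]:
        forecast_time = datetime.fromtimestamp(entry["dt"], tz=timezone.utc)
        rows.append((
            city,
            forecast_time.isoformat(),
            entry["main"]["temp"],
            entry["weather"][0]["description"],
        ))
    return rows


def load(conn, rows):
    """Create the (assumed) weather table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather ("
        "city TEXT, forecast_time TEXT, temperature REAL, outlook TEXT)"
    )
    conn.executemany("INSERT INTO weather VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # stand-in for the cloud database
    print(load(connection, transform(SAMPLE_RESPONSE)), "rows written")
```

In the real pipeline the payload would come from an HTTP request and the connection would point at the cloud database; the structure of the three steps stays the same.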

Libraries and Dependencies

  • pandas
  • requests
  • bs4
  • json
  • datetime
  • sqlalchemy
  • os
  • re
  • pytz
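
As a rough idea of how a daily scheduled run could be wired up with some of the libraries listed above, here is a hedged sketch of a Lambda entry point. The `lambda_handler(event, context)` signature follows the usual AWS Lambda convention for Python; the `DB_URL` environment variable and the `run_pipeline` placeholder are hypothetical names introduced only for this example.

```python
import json
import os
from datetime import datetime, timezone


def run_pipeline(db_url):
    """Hypothetical placeholder for the scrape, clean, and load steps.

    Returns the number of rows written; this stub writes nothing.
    """
    return 0


def lambda_handler(event, context):
    """Entry point invoked by the scheduler once per run."""
    # DB_URL is an assumed variable name for the database connection string.
    db_url = os.environ.get("DB_URL", "")
    if not db_url:
        return {
            "statusCode": 500,
            "body": json.dumps({"error": "DB_URL is not set"}),
        }

    rows_written = run_pipeline(db_url)
    return {
        "statusCode": 200,
        "body": json.dumps({
            "rows_written": rows_written,
            "ran_at": datetime.now(timezone.utc).isoformat(),
        }),
    }
```

Reading the connection string from an environment variable keeps credentials out of the function code, which is one common way to configure a Lambda function.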