Create a serverless scraping architecture

This repository contains the code for the tutorial "Create a serverless scraping architecture", which uses Scaleway Messaging and Queuing (SQS), Serverless Functions, and Managed Database.

In this tutorial, we show how to set up a simple application that reads Hacker News and asynchronously processes the articles it finds there. To do so, we use Scaleway serverless products and deploy two functions:

A producer function, activated by a recurring cron trigger, that scrapes Hacker News for articles published in the last 15 minutes and pushes the title and URL of each article to an SQS queue created with Scaleway Messaging and Queuing.
A consumer function, triggered by each new message on the SQS queue, that consumes messages published to the queue, scrapes some data from the linked article, and then writes the data into a Scaleway Managed Database.
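
The data flow between the two functions can be sketched in plain Python. This is a minimal illustration, not the repository's actual handlers: the names `select_recent`, `produce`, and `consume`, the `published_at` field, and the `send`, `scrape`, and `store` callables are all hypothetical, and the sketch assumes that the queue message reaches the consumer as a JSON string in `event["body"]`. In a real deployment, `send` would typically wrap an SQS client call and `store` would write to the Managed Database.

```python
import json
from datetime import datetime, timedelta, timezone

# The 15-minute window described for the producer above.
WINDOW = timedelta(minutes=15)


def select_recent(articles, now, window=WINDOW):
    """Keep only articles published within `window` before `now`."""
    return [
        a for a in articles
        if timedelta(0) <= now - a["published_at"] <= window
    ]


def produce(articles, now, send):
    """Push one JSON message (title and URL) per recent article.

    `send` is a hypothetical callable standing in for the queue client.
    Returns the number of messages sent.
    """
    recent = select_recent(articles, now)
    for article in recent:
        send(json.dumps({"title": article["title"], "url": article["url"]}))
    return len(recent)


def consume(event, scrape, store):
    """Decode one queue message, scrape the linked page, store the result.

    Assumption: the message body arrives as a JSON string in event["body"].
    `scrape` returns a dict of extra fields; `store` persists the row.
    """
    message = json.loads(event["body"])
    row = {"title": message["title"], "url": message["url"]}
    row.update(scrape(message["url"]))
    store(row)
    return row


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    articles = [
        {"title": "Fresh", "url": "https://example.com/fresh",
         "published_at": now - timedelta(minutes=5)},
        {"title": "Stale", "url": "https://example.com/stale",
         "published_at": now - timedelta(hours=2)},
    ]
    queue, table = [], []  # in-memory stand-ins for the queue and the database
    produce(articles, now, queue.append)
    for body in queue:
        consume({"body": body}, lambda url: {"url_length": len(url)}, table.append)
    print(table)
```

Passing the queue and database operations in as callables keeps the filtering and message format testable without network access; only the thin wrappers around the real services would need credentials.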

Requirements

This example assumes you are familiar with how serverless functions work. If needed, you can check the official Scaleway documentation.

You will also need Python and Terraform.

Running

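# Package the producer (scraper) function with its dependencies
cd scraper
pip install -r requirements.txt --target ./package
zip -r functions.zip handlers/ package/
# Package the consumer function with its dependencies
cd ../consumer
pip install -r requirements.txt --target ./package
zip -r functions.zip handlers/ package/
# Deploy the infrastructure and both functions
cd ../terraform
terraform init
terraform apply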
