Fake scrapper

Fake service that scrapes data from web pages, normally an intensive task, dependent on system resources and a reliable network connection. The goal here is to be able to request a number of URLs to be scraped, and have a GenStage data pipeline take care of the workload.

This is the result of following the book Concurrent Data Processing in Elixir Fast, Resilient Applications with OTP, GenStage, Flow, and Broadway by Svilen Gospodinov

Producer

The scraper will scrape data from web pages. At the beginning of our pipeline there's a producer stage that will be responsible for what web pages will be scraped. In other words, it will produce URLs for consumers interested in getting web pages to scrape.

Handling demand

Demand can be changed on the PageConsumer module.

By configuring demand two things can be controlled: max_demand will set the maximum number of events the consumer can ask for, and min_demand will determine the lowest number of events available to the consumer before it can ask for more.

If we set min_demand to 0 and max_demand to 3, we are ensuring that the consumer will take at least one event when it is available, up to three at a time.

If we set max_demand to 1 the events will be requested one by one, so it won't be processed in batches.

It all depends on the configuration that works best for the specific project

This project uses multiple supervised consumers to process events one by one, concurrently.

How to interact with this project

Run the app with iex -S mix. The consumer will send demand as soon as it is initialized.

pages = ["google.com", "facebook.com", "apple.com", "netflix.com", "amazon.com"]

PageProducer.scrape_pages(pages)

We can see that PageConsumer immediately received the first three pages, which took a bit of time to process, judging by the timestamps. Since only two pages were available next, the consumer realized that it has capacity for one more page, so it immediately issued demand for another page, while starting work on the other two.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
lib		lib
test		test
.formatter.exs		.formatter.exs
.gitignore		.gitignore
README.md		README.md
mix.exs		mix.exs
mix.lock		mix.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fake scrapper

Producer

Handling demand

How to interact with this project

About

Languages

hudsonbay/fake_scraper

Folders and files

Latest commit

History

Repository files navigation

Fake scrapper

Producer

Handling demand

How to interact with this project

About

Topics

Resources

Stars

Watchers

Forks

Languages