
Feature/benchmark #116

Closed

Conversation

filipevarjao
Contributor

@filipevarjao filipevarjao commented Jul 2, 2020

This PR adds the ability to run a local benchmark of web scraping tool performance. The metrics collected are the number of requests and items per minute, memory usage, and the number of reductions on the spider process.

It starts a dummy local HTTP server and generates several URLs in order to perform concurrent requests per domain.
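The per-process metrics mentioned above (memory and reductions) are standard `Process.info/2` items, so a minimal sketch of how they could be sampled might look like this (the module and function names are illustrative, not part of this PR):

```elixir
defmodule BenchMetrics do
  @doc """
  Returns the memory footprint (bytes) and reduction count of a process.
  Both `:memory` and `:reductions` are standard `Process.info/2` keys.
  """
  def sample(spider_pid) when is_pid(spider_pid) do
    [{:memory, memory}, {:reductions, reductions}] =
      Process.info(spider_pid, [:memory, :reductions])

    %{memory: memory, reductions: reductions}
  end
end

# Usage: sample the calling process itself.
BenchMetrics.sample(self())
```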

@@ -0,0 +1,39 @@
defmodule BenchTest do
Collaborator


Why do we need this test?

Contributor Author


To keep the test coverage up to 80%, and to show that it is possible to run the benchmark with a different spider.

@@ -0,0 +1,48 @@
defmodule Features.Manager.TestSpider do
Collaborator


I like the idea of splitting this code as you're suggesting. But could we address it in a separate PR?

@@ -1,6 +1,8 @@
defmodule ManagerTest do
Collaborator


Maybe this one is unrelated to bench as well.

@filipevarjao filipevarjao force-pushed the feature/benchmark branch 3 times, most recently from c1d32df to d98043c Compare July 6, 2020 19:22
Logger.info("Adding 10 workers for #{name}")

Enum.map(1..10, fn _x ->
DynamicSupervisor.start_child(name, {Crawly.Worker, [name]})
Collaborator


I would prefer to have this inside Crawly, doing something like Crawly.Manager.add_worker(spider_name).
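The suggested helper could be sketched as follows. This is hypothetical, it is the reviewer's proposed API, not code that exists in Crawly, and it assumes the spider name is also the name under which its DynamicSupervisor is registered:

```elixir
defmodule Crawly.Manager do
  @doc """
  Hypothetical helper suggested in the review: starts one extra worker
  under the spider's DynamicSupervisor.
  """
  def add_worker(spider_name) do
    DynamicSupervisor.start_child(spider_name, {Crawly.Worker, [spider_name]})
  end
end
```

With such a helper, the benchmark loop above would become `Enum.map(1..10, fn _ -> Crawly.Manager.add_worker(name) end)`.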


{:stored_requests, req_count} = Crawly.RequestsStorage.stats(name)

{_, pid, :worker, _} =
Collaborator

@oltarasenko oltarasenko Jul 7, 2020


I would want to extend the API to get a specific manager, using the following semantics:
Crawly.Engine.get_manager(spider_name)

Supervisor.which_children(Map.get(spiders, name))
|> Enum.find(&match?({Crawly.Manager, _, :worker, [Crawly.Manager]}, &1))
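The `get_manager/1` API the reviewer proposes could wrap exactly that lookup. A hypothetical sketch (assuming `spiders` is the engine's map of spider names to supervisor pids, e.g. held in GenServer state):

```elixir
# Hypothetical Crawly.Engine.get_manager/1, per the review suggestion.
# Returns the manager pid, or nil if the spider is not running.
def get_manager(spiders, spider_name) do
  spiders
  |> Map.get(spider_name)
  |> Supervisor.which_children()
  |> Enum.find_value(fn
    {Crawly.Manager, pid, :worker, [Crawly.Manager]} -> pid
    _other -> nil
  end)
end
```

Using `Enum.find_value/2` with a multi-clause function returns the pid directly and avoids raising a `MatchError` on non-manager children.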

{:info, info} = GenServer.call(pid, :collect_metrics)
Collaborator


This could be an API of Crawly.Manager as well. Let's extend it in a separate PR, please.
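If the metrics collection were moved into the manager as suggested, the `GenServer.call(pid, :collect_metrics)` above would be backed by a callback along these lines (a sketch of a possible follow-up, not code from this PR):

```elixir
# Hypothetical handle_call in Crawly.Manager for a follow-up PR:
# replies with the manager process's own memory and reduction counters.
def handle_call(:collect_metrics, _from, state) do
  [{:memory, memory}, {:reductions, reductions}] =
    Process.info(self(), [:memory, :reductions])

  {:reply, {:info, %{memory: memory, reductions: reductions}}, state}
end
```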

@@ -0,0 +1,31 @@
defmodule Crawly.Bench.BenchRouter do
Collaborator


I wanted to check whether it's possible (or whether we should do it) to abstract this into a separate node, e.g. having an .exs file that is started separately from Crawly. That way it could potentially run as a standalone process, avoiding any possible collision.
