-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/benchmark #116
Feature/benchmark #116
Conversation
@@ -0,0 +1,39 @@ | |||
defmodule BenchTest do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to keep the test coverage up to 80% and to show it hat is possible to run the benchmark with a different spider
test/features/managet_test_spider.ex
Outdated
@@ -0,0 +1,48 @@ | |||
defmodule Features.Manager.TestSpider do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of splitting this code as you're suggesting. But could we address it in a separate PR?
test/manager_test.exs
Outdated
@@ -1,6 +1,8 @@ | |||
defmodule ManagerTest do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this one is unrelated to bench as well.
c1d32df
to
d98043c
Compare
d98043c
to
3651809
Compare
Logger.info("Adding 10 workers for #{name}") | ||
|
||
Enum.map(1..10, fn _x -> | ||
DynamicSupervisor.start_child(name, {Crawly.Worker, [name]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather prefer to have it inside crawly. To do something like Crawly.Manager.add_worker/spider_name
|
||
{:stored_requests, req_count} = Crawly.RequestsStorage.stats(name) | ||
|
||
{_, pid, :worker, _} = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would want to extend API to get specific manager using the following semantics:
Crawly.Engine.get_manager(spider_name)
Supervisor.which_children(Map.get(spiders, name)) | ||
|> Enum.find(&({Crawly.Manager, _, :worker, [Crawly.Manager]} = &1)) | ||
|
||
{:info, info} = GenServer.call(pid, :collect_metrics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be an API of CrawlyManager as well. Lets extend it as a separate PR please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,31 @@ | |||
defmodule Crawly.Bench.BenchRouter do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to check if it's possible (or if we should do it) to abstract it to a separate node? E.g. having an exs file, which is started separately from Crawly. So potentially it's possible to run it as a standalone process, to avoid any possible collision.
This PR adds the possibility to run a local benchmark for web scraping tool performance, the measurement collected is the number of requests and items per minute, memory usage, and the number of reductions on the spider process.
It starts a dummy local HTTP server, generating severals URLs in order to perform concurrent requests per domain