Adding some more information on the added statistics.
Also changing some of the default options related to statistics.
nietaki committed May 15, 2017
1 parent 355a58c commit b0cf085
Showing 3 changed files with 17 additions and 3 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -19,6 +19,10 @@ Here's a rough diagram:

![crawlie architecture diagram](assets/crawlie_arch_v0.2.0.png)

## Statistics

If you're interested in the crawling statistics or want to track the progress in real time, see [`Crawlie.crawl_and_track_stats/3`](https://hexdocs.pm/crawlie/Crawlie.html#crawl_and_track_stats/3). It starts a [`Stats GenServer`](https://hexdocs.pm/crawlie/Crawlie.Stats.Server.html) in Crawlie's supervision tree, which accumulates the statistics for the crawling session.
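
For illustration, a minimal usage sketch. The `MyParserLogic` module, the `{results, stats_ref}` return shape, and the `Crawlie.Stats.Server.get_stats/1` call are assumptions based on the linked docs, not taken verbatim from this commit:

```elixir
# Sketch only - MyParserLogic is a hypothetical Crawlie.ParserLogic implementation;
# the {results, stats_ref} shape and get_stats/1 call are assumptions.
urls = ["https://example.com/"]

{results, stats_ref} = Crawlie.crawl_and_track_stats(urls, MyParserLogic)

# consume the results, then read the statistics accumulated for this session
results |> Enum.to_list() |> IO.inspect()
stats_ref |> Crawlie.Stats.Server.get_stats() |> IO.inspect()
```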

## Configuration

See [the docs](https://hexdocs.pm/crawlie/Crawlie.html#crawl/3) for supported options.
12 changes: 11 additions & 1 deletion lib/crawlie.ex
@@ -23,6 +23,7 @@ defmodule Crawlie do
the options for [HTTPoison](https://hexdocs.pm/httpoison/HTTPoison.html#request/5),
as well as Crawlie-specific options.
It is perfectly fine to run multiple crawling sessions at the same time; they're independent.
## Arguments
- `source` - a `Stream` or an `Enum` containing the urls to crawl
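
For example, a `source` could be as simple as a list of url strings, or a lazy `Stream` (the file name below is made up):

```elixir
# any Enum of url strings works as a source...
urls = ["https://example.com/", "https://example.com/about"]

# ...as does a lazy Stream, e.g. reading seed urls from a hypothetical file,
# one url per line, skipping blank lines
urls_stream =
  File.stream!("seed_urls.txt")
  |> Stream.map(&String.trim/1)
  |> Stream.reject(&(&1 == ""))
```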
@@ -64,7 +65,16 @@ defmodule Crawlie do
@doc """
Crawls the urls provided in `source` using the given `Crawlie.ParserLogic`, and collects the crawling statistics.
The statistics are accumulated independently, per `Crawlie.crawl_and_track_stats/3` call.
See `Crawlie.crawl/3` for details.
## Additional options
All the options from `Crawlie.crawl/3` apply here as well.
- `:max_fetch_failed_uris_tracked` - `100` by default. The maximum number of uris kept in the `Crawlie.Stats.Server` for which fetching failed.
- `:max_parse_failed_uris_tracked` - `100` by default. The maximum number of uris kept in the `Crawlie.Stats.Server` for which parsing failed.
"""
def crawl_and_track_stats(source, parser_logic, options \\ []) do
ref = StatsServer.start_new()
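
To make the new limits concrete, here is a sketch of overriding them for a single crawling session. The option names come from this commit; `MyParserLogic` and the `{results, stats_ref}` return shape are assumptions:

```elixir
# Sketch: keep more failed uris in the Stats.Server than the new default (100).
# MyParserLogic and the return shape are assumptions, not from this commit.
urls = ["https://example.com/"]

{results, stats_ref} =
  Crawlie.crawl_and_track_stats(
    urls,
    MyParserLogic,
    max_fetch_failed_uris_tracked: 500,
    max_parse_failed_uris_tracked: 500
  )
```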
4 changes: 2 additions & 2 deletions lib/crawlie/options.ex
@@ -29,8 +29,8 @@ defmodule Crawlie.Options do
stages: core_count(),
],
pqueue_module: :pqueue3,
-max_fetch_failed_uris_tracked: 1000,
-max_parse_failed_uris_tracked: 1000,
+max_fetch_failed_uris_tracked: 100,
+max_parse_failed_uris_tracked: 100,
]
end
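
As a rough sketch of how defaults like these usually take effect, user-supplied options typically win over the defaults via a keyword merge. The merge below is an illustration only, not necessarily how `Crawlie.Options` combines them internally:

```elixir
# Illustration only - not taken from Crawlie.Options itself.
defaults = [
  pqueue_module: :pqueue3,
  max_fetch_failed_uris_tracked: 100,
  max_parse_failed_uris_tracked: 100
]

user_options = [max_fetch_failed_uris_tracked: 500]

# keys from the second list win, so user options override the defaults
merged = Keyword.merge(defaults, user_options)
# => [pqueue_module: :pqueue3, max_parse_failed_uris_tracked: 100,
#     max_fetch_failed_uris_tracked: 500]
```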

