Commit
1. Extend DataStorage with an inspect function that allows inspecting worker states.
2. Add a Preview item pipeline that stores extracted items as part of the DataStorage state.
3. Update the HTML part so it's possible to navigate the preview.
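Item 1 is about reading state back out of a storage worker. The DataStorage changes are not among the files shown here, so the following is only a minimal sketch of the idea. It assumes a GenServer-based worker that keeps a plain map as its state. `HypotheticalStorageWorker` and `inspect_field/2` are hypothetical names, not Crawly's API.

```elixir
defmodule HypotheticalStorageWorker do
  # Hypothetical stand-in for a per-spider storage worker; NOT Crawly's real API.
  use GenServer

  def start_link(initial_state), do: GenServer.start_link(__MODULE__, initial_state)

  # Returns the value stored under `field` in the worker state, or nil if absent.
  # Named inspect_field/2 to avoid clashing with Kernel.inspect/2.
  def inspect_field(pid, field), do: GenServer.call(pid, {:inspect, field})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:inspect, field}, _from, state) do
    # Read-only: reply with the requested field and keep the state unchanged.
    {:reply, Map.get(state, field), state}
  end
end

# A worker whose state already holds preview items under the pipeline's module key.
{:ok, pid} =
  HypotheticalStorageWorker.start_link(%{
    Crawly.Pipelines.Experimental.Preview => [%{title: "a"}]
  })

preview = HypotheticalStorageWorker.inspect_field(pid, Crawly.Pipelines.Experimental.Preview)
IO.inspect(preview)
```

Keying the preview by the pipeline's module name (an atom) mirrors how the Preview pipeline below stores its items, so a reader only needs to know which pipeline it is asking about.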
Parent: a5d34e7
Commit: 17c89fc
Showing 9 changed files with 188 additions and 2 deletions.
````elixir
defmodule Crawly.Pipelines.Experimental.Preview do
  @moduledoc """
  Allows previewing the items extracted by the spider so far.

  Stores previewable items in the pipeline state under the
  `Crawly.Pipelines.Experimental.Preview` key (the module name atom).

  ### Options
  - `limit` (optional, defaults to 20): restricts the number of items
    visible in the preview.

  It is best placed before the CSV/JSON encoders, so that items are stored
  as maps rather than as already-encoded strings.

  ### Example usage in Crawly config
  ```
  pipelines: [
    {Crawly.Pipelines.Experimental.Preview, limit: 10},

    # The encoders run after the preview pipeline
    Crawly.Pipelines.JSONEncoder,
    {Crawly.Pipelines.WriteToFile, extension: "jl", folder: "/tmp"}
  ]
  ```
  """
  @behaviour Crawly.Pipeline

  # Default cap on the number of items stored in the worker state
  @limit 20

  require Logger

  @impl Crawly.Pipeline
  def run(item, state, opts \\ []) do
    preview = Map.get(state, __MODULE__, [])
    limit = Keyword.get(opts, :limit, @limit)

    case Enum.count(preview) >= limit do
      true ->
        # Limit reached: pass the item through without storing it
        {item, state}

      false ->
        # Prepend the item, so the newest item comes first
        new_preview = [item | preview]
        new_state = Map.put(state, __MODULE__, new_preview)
        {item, new_state}
    end
  end
end
````
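The capped accumulation in `run/3` can be exercised without Crawly. The sketch below copies that logic into a hypothetical `PreviewSketch` module (storing under a `:preview` key instead of the module name, for brevity) and threads the state through three items by hand, in place of `Crawly.Utils.pipe/3`.

```elixir
defmodule PreviewSketch do
  # Hypothetical standalone copy of the Preview pipeline's capping logic,
  # so it runs without Crawly. Uses a :preview key instead of __MODULE__.
  @limit 20

  def run(item, state, opts \\ []) do
    preview = Map.get(state, :preview, [])
    limit = Keyword.get(opts, :limit, @limit)

    if Enum.count(preview) >= limit do
      # Cap reached: the item still flows on, but is not stored
      {item, state}
    else
      {item, Map.put(state, :preview, [item | preview])}
    end
  end
end

# Pipe three items through with limit: 2, threading the state manually.
final_state =
  Enum.reduce([%{n: 1}, %{n: 2}, %{n: 3}], %{}, fn item, state ->
    # The pipeline never alters the item itself
    {^item, new_state} = PreviewSketch.run(item, state, limit: 2)
    new_state
  end)

IO.inspect(final_state.preview)
```

Two consequences follow from this design: items beyond the limit still reach later pipelines unchanged, and the stored list is newest-first, so `Enum.reverse/1` restores extraction order if that is what a page should show.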
```eex
<div class="row">
  <div id="status">
    <div>
      <%= if not preview_enabled? do %>
        The items previewer requires the Crawly.Pipelines.Experimental.Preview
        item pipeline in your config.
      <% end %>
    </div>
  </div>

  <div class="leftcolumn">
    <div class="card">
      <h3>
        Extracted items: <%= spider_name %>
        <a href="/">Back</a>
      </h3>
      <table>
        <tr>
          <th>item</th>
        </tr>

        <%= for item <- items do %>
          <tr>
            <td>
              <ul>
                <%= for {key, value} <- item do %>
                  <li><%= key %>: <%= value %></li>
                <% end %>
              </ul>
            </td>
          </tr>
        <% end %>
      </table>
    </div>
  </div>

  <div class="rightcolumn">
  </div>
</div>
```
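The template's inner `for {key, value} <- item` loop is why the Preview pipeline should run before the encoders: each stored item must be enumerable as key/value pairs. A minimal check with plain `EEx.eval_string`, using a cut-down, single-line version of that loop (how Crawly itself renders the full page is not shown in this diff):

```elixir
# EEx ships with Elixir, so no extra dependency is needed.
template =
  "<%= for item <- items do %><ul><%= for {key, value} <- item do %>" <>
    "<li><%= key %>: <%= value %></li><% end %></ul><% end %>"

# A map item renders one <li> per key/value pair.
html = EEx.eval_string(template, items: [%{title: "Foo", price: "10"}])
IO.puts(html)

# An item that was already JSON-encoded is a binary, not a map, so the
# inner comprehension cannot iterate it and raises Protocol.UndefinedError.
encoded_failure? =
  try do
    EEx.eval_string(template, items: [~s({"title":"Foo"})])
    false
  rescue
    Protocol.UndefinedError -> true
  end

IO.inspect(encoded_failure?)
```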
```elixir
defmodule Pipelines.PreviewTest do
  use ExUnit.Case, async: false

  @item %{first: "some", second: "data"}
  @state %{spider_name: Test, crawl_id: "test"}

  test "Preview items are stored in state" do
    pipelines = [{Crawly.Pipelines.Experimental.Preview}]

    {item, state} = Crawly.Utils.pipe(pipelines, @item, @state)

    # The pipeline must not modify the item
    assert item == @item

    preview = Map.get(state, :"Elixir.Crawly.Pipelines.Experimental.Preview")
    assert [@item] == preview
  end

  test "It's possible to restrict the number of stored items" do
    pipelines = [{Crawly.Pipelines.Experimental.Preview, limit: 2}]

    # Pipe 3 items through a pipeline limited to 2; only 2 should be kept
    {_item, state0} = Crawly.Utils.pipe(pipelines, @item, @state)
    {_item, state1} = Crawly.Utils.pipe(pipelines, @item, state0)
    {_item, state2} = Crawly.Utils.pipe(pipelines, @item, state1)

    preview = Map.get(state2, :"Elixir.Crawly.Pipelines.Experimental.Preview")
    assert Enum.count(preview) == 2
  end
end
```