Commit
* Standalone

* Standalone Crawly implementation

Allows running Crawly and spiders without installing Elixir or creating a project:

1. Create a Crawly release
2. Load spiders from SPIDERS_DIR
3. Configure Crawly via crawly.config
4. Allow forcing a reload of the spider list after adding new spiders
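With the standalone release, spiders are no longer part of a Mix project; they are loaded from a directory (SPIDERS_DIR in the message above, and the Dockerfile in this commit creates /app/spiders). The helper below is a hypothetical host-side sketch that lists the spider files such a directory would expose. It assumes spiders are plain `.ex` source files and that the directory is resolved as: first argument, then `$SPIDERS_DIR`, then `/app/spiders`. Both the extension and the fallback order are assumptions, not behaviour confirmed by the commit.

```sh
#!/bin/sh
# Hypothetical helper: list the spider source files (assumed *.ex) that a
# standalone Crawly container would find in its spiders directory.
# Directory resolution (assumption): first argument, then $SPIDERS_DIR,
# then /app/spiders (the directory the Dockerfile creates).
list_spiders() {
  dir="${1:-${SPIDERS_DIR:-/app/spiders}}"
  if [ ! -d "$dir" ]; then
    echo "spiders directory not found: $dir" >&2
    return 1
  fi
  # Print one file name per line, sorted; print nothing if no *.ex files exist.
  for f in "$dir"/*.ex; do
    [ -e "$f" ] || continue   # the glob matched nothing
    basename "$f"
  done | sort
}
```

Running `list_spiders ./spiders` before starting a container is a cheap way to confirm the files you plan to mount are where you expect them.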
1 parent e736aa8, commit 5eeeb2a

Showing 16 changed files with 295 additions and 5 deletions.
@@ -0,0 +1,61 @@

```dockerfile
# ===================== base =====================
FROM elixir:alpine as build

# install build dependencies
RUN apk add --update git make gcc libc-dev autoconf libtool automake

# set build dir
WORKDIR /app

# install hex + rebar
RUN mix local.hex --force && \
    mix local.rebar --force

ENV MIX_ENV=standalone_crawly

# install mix dependencies
COPY mix.exs mix.lock /app/
COPY priv /app/priv/
COPY rel /app/rel



RUN mix deps.get
RUN mix local.rebar --force
RUN mix deps.compile
RUN mix deps.compile

# build project code
COPY config/config.exs config/
COPY config/crawly.config config/
COPY config/standalone_crawly.exs config/

# Create default config file
# COPY config/app.config /app/config/app.config

# COPY config/runtime.exs config/
COPY lib lib

RUN mix compile

COPY rel rel

## build release
RUN mix release

# =================== release ====================
FROM alpine:latest AS release

RUN apk add --update openssl make gcc libc-dev autoconf libtool automake

WORKDIR /app

RUN apk add --update bash
COPY --from=build /app/_build/standalone_crawly/rel/crawly ./
COPY --from=build /app/config /app/config

RUN mkdir /app/spiders

EXPOSE 4001

ENTRYPOINT [ "/app/bin/crawly", "start_iex" ]
```
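The Dockerfile above copies the release to /app, exposes port 4001, and uses `bin/crawly start_iex` as its entrypoint, which starts the release with an IEx shell attached, so the container wants an interactive terminal. The sketch below only assembles the `docker build` and `docker run` commands. The image tag `crawly-standalone`, the host paths, and passing SPIDERS_DIR as an environment variable are assumptions for illustration; set `DRY_RUN=1` to print the commands instead of executing them.

```sh
#!/bin/sh
# Sketch: build and run the standalone Crawly image.
# Assumptions (not from the commit): image tag, host paths, SPIDERS_DIR env var.
IMAGE="${IMAGE:-crawly-standalone}"               # hypothetical tag
HOST_SPIDERS="${HOST_SPIDERS:-$PWD/spiders}"      # host dir with spider files
HOST_CONFIG="${HOST_CONFIG:-$PWD/crawly.config}"  # host Erlang-term config file

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_image() {
  run docker build -t "$IMAGE" .
}

start_container() {
  # -it: the entrypoint `bin/crawly start_iex` attaches an IEx shell.
  # 4001 is the port the Dockerfile EXPOSEs.
  run docker run -it --rm \
    -p 4001:4001 \
    -v "$HOST_SPIDERS:/app/spiders" \
    -v "$HOST_CONFIG:/app/config/crawly.config" \
    -e SPIDERS_DIR=/app/spiders \
    "$IMAGE"
}
```

Mounting the config file over /app/config/crawly.config keeps the image generic: the same build can run different Crawly settings without a rebuild.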
@@ -0,0 +1 @@

```erlang
[].
```
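The default crawly.config above is `[].`, an empty list in Erlang term format, which the release reads through the `-config /app/config/crawly.config` VM flag that appears later in this commit. Settings are written as `{Application, [{Key, Value}]}` tuples and the whole term ends with a period. The sketch writes such a file from the shell; the two keys mirror settings from the Elixir config in this commit, the values are illustrative, and whether this file overrides the compiled config (rather than the other way round) is an assumption worth verifying.

```sh
#!/bin/sh
# Sketch: write a crawly.config in Erlang term format ({App, [{Key, Value}]}).
# Keys mirror the commit's Elixir config; values are illustrative only.
write_crawly_config() {
  target="$1"
  itemcount="${2:-500}"
  timeout="${3:-20}"
  cat > "$target" <<EOF
[{crawly, [
    {closespider_itemcount, $itemcount},
    {closespider_timeout, $timeout}
]}].
EOF
}
```

For example, `write_crawly_config ./crawly.config 100 10` produces a file that can be mounted into the container in place of the empty default.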
@@ -0,0 +1,38 @@

```elixir
# This file is responsible for configuring your application
# and its dependencies with the aid of the Config module.
import Config

config :logger, :console, truncate: :infinity

config :crawly,
  fetcher: {Crawly.Fetchers.HTTPoisonFetcher, []},
  retry: [
    retry_codes: [400],
    max_retries: 3,
    ignored_middlewares: [Crawly.Middlewares.UniqueRequest]
  ],

  # Stop the spider after scraping a certain number of items
  closespider_itemcount: 500,
  # Stop the spider if it does not crawl fast enough
  closespider_timeout: 20,
  concurrent_requests_per_domain: 5,

  # Request middlewares
  middlewares: [
    Crawly.Middlewares.DomainFilter,
    Crawly.Middlewares.UniqueRequest,
    Crawly.Middlewares.RobotsTxt,
    {Crawly.Middlewares.UserAgent,
     user_agents: [
       "Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0",
       "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
       "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36 OPR/38.0.2220.41"
     ]}
  ],
  pipelines: [
    {Crawly.Pipelines.Validate, fields: [:title, :price, :url]},
    {Crawly.Pipelines.DuplicatesFilter, item_id: :title},
    {Crawly.Pipelines.Experimental.Preview, limit: 100},
    Crawly.Pipelines.JSONEncoder
  ]
```
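Two groups of settings in this config encode simple decision rules. With `retry_codes: [400]` and `max_retries: 3`, a response whose status is in the list is scheduled again until the retry budget is used up; `Crawly.Middlewares.UniqueRequest` sits in `ignored_middlewares` so the repeated URL is not discarded as a duplicate. `closespider_itemcount: 500` stops the spider once 500 items have been scraped, and `closespider_timeout: 20` stops it when fewer than 20 items arrive during a check interval. The shell functions below model only these rules; the exact retry boundary (`<` versus `<=`) and the length of the check interval are assumptions, not taken from Crawly's source.

```sh
#!/bin/sh
# Illustrative model of two Crawly settings from the config above.
MAX_RETRIES=3
RETRY_CODES="400"
CLOSESPIDER_ITEMCOUNT=500
CLOSESPIDER_TIMEOUT=20

# should_retry STATUS RETRIES_SO_FAR -> prints "retry" or "no-retry".
# Assumption: a request is retried while RETRIES_SO_FAR < MAX_RETRIES.
should_retry() {
  status="$1"; retries="$2"
  for code in $RETRY_CODES; do
    if [ "$status" -eq "$code" ] && [ "$retries" -lt "$MAX_RETRIES" ]; then
      echo retry
      return 0
    fi
  done
  echo no-retry
}

# spider_status TOTAL_ITEMS ITEMS_IN_LAST_INTERVAL -> prints
# "stop:itemcount", "stop:timeout" or "running".
spider_status() {
  total="$1"; recent="$2"
  if [ "$total" -ge "$CLOSESPIDER_ITEMCOUNT" ]; then
    echo stop:itemcount
  elif [ "$recent" -lt "$CLOSESPIDER_TIMEOUT" ]; then
    echo stop:timeout
  else
    echo running
  fi
}
```

Under this model, `should_retry 400 3` prints `no-retry` and `spider_status 120 5` prints `stop:timeout`: a slow spider is stopped long before it reaches the item cap.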
@@ -0,0 +1,8 @@

```bat
@echo off
rem Set the release to load code on demand (interactive) instead of preloading (embedded).
rem set RELEASE_MODE=interactive

rem Set the release to work across nodes.
rem RELEASE_DISTRIBUTION must be "sname" (local), "name" (distributed) or "none".
rem set RELEASE_DISTRIBUTION=name
rem set RELEASE_NODE=<%= @release.name %>
```
@@ -0,0 +1,20 @@

```sh
#!/bin/sh

# # Sets and enables heart (recommended only in daemon mode)
# case $RELEASE_COMMAND in
#   daemon*)
#     HEART_COMMAND="$RELEASE_ROOT/bin/$RELEASE_NAME $RELEASE_COMMAND"
#     export HEART_COMMAND
#     export ELIXIR_ERL_OPTIONS="-heart"
#     ;;
#   *)
#     ;;
# esac

# # Set the release to load code on demand (interactive) instead of preloading (embedded).
# export RELEASE_MODE=interactive

# # Set the release to work across nodes.
# # RELEASE_DISTRIBUTION must be "sname" (local), "name" (distributed) or "none".
# export RELEASE_DISTRIBUTION=name
# export RELEASE_NODE=<%= @release.name %>
```
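The template comments restrict RELEASE_DISTRIBUTION to three values. If those lines are ever uncommented, a guard like the hypothetical `check_distribution` function below could reject a typo before the VM boots. The function name, the error message, and the fallback to `sname` (the default used by Mix releases) are illustrative; the release start scripts may already perform a similar check.

```sh
#!/bin/sh
# Hypothetical guard for env.sh.eex: RELEASE_DISTRIBUTION must be
# "sname" (local), "name" (distributed) or "none".
# Checks the first argument, else $RELEASE_DISTRIBUTION, else "sname".
check_distribution() {
  value="${1:-${RELEASE_DISTRIBUTION:-sname}}"
  case "$value" in
    sname|name|none) return 0 ;;
    *)
      echo "invalid RELEASE_DISTRIBUTION: $value (expected sname, name or none)" >&2
      return 1
      ;;
  esac
}
```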
@@ -0,0 +1,8 @@

```
## Customize flags given to the VM: https://www.erlang.org/doc/man/erl.html
## -mode/-name/-sname/-setcookie are configured via env vars, do not set them here

## Increase number of concurrent ports/sockets
##+Q 65536

## Tweak GC to run more often
##-env ERL_FULLSWEEP_AFTER 10
```
@@ -0,0 +1,10 @@

```
## Customize flags given to the VM: https://www.erlang.org/doc/man/erl.html
## -mode/-name/-sname/-setcookie are configured via env vars, do not set them here

## Increase number of concurrent ports/sockets
##+Q 65536

## Tweak GC to run more often
##-env ERL_FULLSWEEP_AFTER 10

-config /app/config/crawly.config
```
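The last line is what ties the release to the mounted configuration: `-config /app/config/crawly.config` tells the Erlang VM to load that term file at boot. As a debugging aid, the hypothetical helper below extracts the `-config` value from a vm.args-style file, treating everything after `#` as a comment, so you can confirm which file a container will read. The helper is an illustration and does not handle flags split across lines.

```sh
#!/bin/sh
# Hypothetical helper: print the file passed via -config in a vm.args file.
# Text from '#' to end of line is treated as a comment and ignored.
vm_args_config() {
  awk '{
    i = index($0, "#")
    if (i > 0) $0 = substr($0, 1, i - 1)
    if ($1 == "-config" && NF >= 2) print $2
  }' "$1"
}
```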