A crawler wrapper that keeps a record of each crawl. It also lets you save a screenshot whenever the crawler fails.
Add to your Gemfile:
gem "power_client"
Then run:
bundle install
Then, run the installer:
rails generate power_client:install
Use the client generator to create a new Client:
rails generate power_client:client Login
This will generate the following files:
# app/clients/login_client.rb
class LoginClient < PowerClient::ChromeClient
  URL = 'www.platan.us'

  def perform
    # client logic goes in here
  end
end
The corresponding parser:
# app/parsers/login_parser.rb
class LoginParser
  PARSERS_MATCHERS = {
    # complete with the necessary regex matchers
  }

  def parse_data(data)
    # parser logic goes in here
  end
end
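As an illustration of how the generated parser might be filled in, here is a minimal sketch. The matcher names (`username`, `last_login`), the regexes, and the sample input are all hypothetical; the generator only provides the empty `PARSERS_MATCHERS` hash and the `parse_data` method.

```ruby
# Hypothetical parser: matcher names, regexes and input format are assumptions
# made for this example, not something the generator produces.
class LoginParser
  PARSERS_MATCHERS = {
    username: /Welcome,\s+(?<value>[\w.]+)/,
    last_login: /Last login:\s+(?<value>\d{4}-\d{2}-\d{2})/
  }.freeze

  # Applies every matcher to the raw text and returns a hash with one entry
  # per matcher. An entry is nil when its regex does not match.
  def parse_data(data)
    PARSERS_MATCHERS.transform_values do |regex|
      match = regex.match(data.to_s)
      match && match[:value]
    end
  end
end

raw = "Welcome, jane.doe! Last login: 2020-03-14"
result = LoginParser.new.parse_data(raw)
puts result[:username]
puts result[:last_login]
```

Keeping the regexes in one hash means that a change in the crawled page usually only requires touching `PARSERS_MATCHERS`, not the parsing logic.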
And its corresponding rspec file:
# spec/parsers/login_parser_spec.rb
require 'rails_helper'

RSpec.describe LoginParser, type: :parser do
  pending 'describe what perform does here'
end
The corresponding Job:
# app/jobs/clients/login_job.rb
class Clients::LoginJob < PowerClient::ChromeClientJob
  def perform
    raw_data = client.perform
    formatted_data = parser.parse_data(raw_data)
  end

  private

  def client
    @client ||= LoginClient.new
  end

  def parser
    @parser ||= LoginParser.new
  end
end
And its corresponding rspec file:
# spec/jobs/clients/login_job_spec.rb
require 'rails_helper'

describe Clients::LoginJob do
  def perform(*_args)
    described_class.for(*_args)
  end

  pending "describe what perform does here"
end
You can add the following flags to the generate command:
--specs false: creates no RSpec files
--parser false: creates no Parser file
--job false: creates no Job file
Each client should inherit from ChromeClient and run each action inside an ensuring_browser_closure block:
class BankMovementsClient < PowerClient::ChromeClient
  def get_movements
    ensuring_browser_closure do
      login
      goto_bank_movements_page
      results = []
      for_each_transaction_table do |table|
        results.append(
          table.search('tbody').map { |row| transaction_row_to_hash row }
        )
      end
      results.flatten
    end
  end
end
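To show why wrapping actions in such a block matters, here is a minimal sketch of the general pattern: a method that yields and always closes the browser afterwards, even if the block raises. This is not the gem's implementation. `FakeBrowser` and `SketchClient` are hypothetical stand-ins so the example runs without Selenium.

```ruby
# Stand-in for a real browser session; it only records whether it was closed.
class FakeBrowser
  attr_reader :closed

  def initialize
    @closed = false
  end

  def quit
    @closed = true
  end
end

# Hypothetical client used only to illustrate the pattern.
class SketchClient
  attr_reader :browser

  def initialize(browser)
    @browser = browser
  end

  # Yields to the block and always quits the browser afterwards,
  # even when the block raises. The block's value is returned.
  def ensuring_browser_closure
    yield
  ensure
    browser.quit
  end
end

# Successful run: the block's value is returned and the browser is closed.
ok_browser = FakeBrowser.new
value = SketchClient.new(ok_browser).ensuring_browser_closure { :done }

# Failing run: the error still propagates, but the browser is closed first.
failing_browser = FakeBrowser.new
begin
  SketchClient.new(failing_browser).ensuring_browser_closure { raise "crawl failed" }
rescue RuntimeError => e
  error_message = e.message
end
```

Without the `ensure` clause, a failed crawl could leave a browser process running after the job finishes.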
To keep a record of each crawl, this engine uses the active_job_log engine.
By default, Selenium::WebDriver will be initialized with the headless argument. To change this behaviour, you'll need to set the headless_webdriver? environment variable to false.
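A toggle like this can be read in a few lines of Ruby. The sketch below is an assumption about how such a check could look, not the gem's actual code: the helper names `headless?` and `chrome_arguments` are hypothetical, and the environment is passed in as a plain hash so the example can run on its own.

```ruby
# Hypothetical helper: headless stays on unless the variable is exactly "false".
def headless?(env = ENV)
  env.fetch('headless_webdriver?', 'true') != 'false'
end

# Hypothetical helper: builds the list of Chrome arguments for the driver.
def chrome_arguments(env = ENV)
  args = []
  args << '--headless' if headless?(env)
  args
end

default_args = chrome_arguments({})
visible_args = chrome_arguments({ 'headless_webdriver?' => 'false' })
```

Treating every value other than `false` as "headless" keeps the default behaviour when the variable is missing or misspelled.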
To run the specs you need to execute, in the root path of the gem, the following command:
bundle exec guard
You need to put all your tests in the /power_client/spec/dummy/spec/ directory.
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request
Thank you contributors!
Power Client is maintained by platanus.
Power Client is © 2020 platanus, spa. It is free software and may be redistributed under the terms specified in the LICENSE file.