diff --git a/.travis.yml b/.travis.yml
index 2fb1e73..e51155e 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -5,7 +5,7 @@ script: "rake spec"
 rvm:
   - 2.2.4
   - 2.3.3
-  - 2.4.0
+  - 2.4.1
   - ruby-head
 matrix:
   allow_failures:
diff --git a/README.md b/README.md
index 79ce2ef..f974aca 100644
--- a/README.md
+++ b/README.md
@@ -51,7 +51,7 @@ manager = ProxyFetcher::Manager.new # will immediately load proxy list from the
 manager.proxies

 #=> [#, ... ]
+ # @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
 ```

 You can initialize proxy manager without immediate load of proxy list from the remote server by passing `refresh: false` on initialization:
@@ -75,8 +75,8 @@ Get raw proxy URLs as Strings:
 manager = ProxyFetcher::Manager.new
 manager.raw_proxies

- # => ["http://97.77.104.22:3128", "http://94.23.205.32:3128", "http://209.79.65.140:8080",
- # "http://91.217.42.2:8080", "http://97.77.104.22:80", "http://165.234.102.177:8080", ...]
+ # => ["97.77.104.22:3128", "94.23.205.32:3128", "209.79.65.140:8080",
+ # "91.217.42.2:8080", "97.77.104.22:80", "165.234.102.177:8080", ...]
 ```

 If `ProxyFetcher::Manager` was already initialized somewhere, you can refresh the proxy list by calling `#refresh_list!` method:
@@ -85,7 +85,7 @@ If `ProxyFetcher::Manager` was already initialized somewhere, you can refresh th
 manager.refresh_list! # or manager.fetch!

 #=> [#, ... ]
+ # @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
 ```

 If you need to filter proxy list, for example, by country or response time and selected provider supports filtering by GET params, then you
@@ -117,19 +117,23 @@ then you already have Ruby 2.3 installed. In other cases you can install it with
 Just install the gem by running `gem install proxy_fetcher` in your terminal and run it:

 ```bash
-proxy_fetcher >> proxies.txt # Will download proxies, validate them and write to file
+proxy_fetcher >> proxies.txt # Will download proxies from the default provider, validate them and write to file
 ```

-If you need a list of proxies in JSON then pass `--json` argument to the command:
+If you need a list of proxies from some specific provider, then you need to pass it's name with `-p` option:
+
+```bash
+proxy_fetcher -p proxy_docker >> proxies.txt # Will download proxies from the default provider, validate them and write to file
+```
+
+If you need a list of proxies in JSON format just pass a `--json` option to the command:

 ```bash
 proxy_fetcher --json

 # Will print:
-# {"proxies":["https://120.26.206.178:8888","https://119.61.13.242:1080","https://117.40.213.26:1080","https://92.62.72.242:1080",
-# "https://58.20.41.172:1080","https://204.116.192.151:35923","https://190.5.96.58:1080","https://170.250.109.97:35923",
-# "https://121.41.82.99:1080","https://77.53.105.155:35923"]}
-
+# {"proxies":["120.26.206.178:80","119.61.13.242:1080","117.40.213.26:80","92.62.72.242:1080","77.53.105.155:3124"
+# "58.20.41.172:35923","204.116.192.151:35923","190.5.96.58:1080","170.250.109.97:35923","121.41.82.99:1080"]}
 ```

 To get all the possible options run:
@@ -144,10 +148,9 @@ Every proxy is a `ProxyFetcher::Proxy` object that has next readers (instance va

 * `addr` (IP address)
 * `port`
+* `type` (proxy type, can be HTTP, HTTPS, SOCKS4 or/and SOCKS5)
 * `country` (USA or Brazil for example)
 * `response_time` (5217 for example)
-* `speed` (`:slow`, `:medium` or `:fast`. **Note:** depends on the proxy provider and can be `nil`)
-* `type` (URI schema, HTTP or HTTPS)
 * `anonymity` (`Low`, `Elite proxy` or `High +KA` for example)

 Also you can call next instance methods for every Proxy object:
@@ -155,15 +158,11 @@ Also you can call next instance methods for every Proxy object:
 * `connectable?` (whether proxy server is available)
 * `http?` (whether proxy server has a HTTP protocol)
 * `https?` (whether proxy server has a HTTPS protocol)
+* `socks4?`
+* `socks5?`
 * `uri` (returns `URI::Generic` object)
 * `url` (returns a formatted URL like "_http://IP:PORT_" )

-You can sort or find any proxy by speed using next 3 instance methods (if it is available for the specific provider):
-
-* `fast?`
-* `medium?`
-* `slow?`'
-
 ## Configuration

 To change open/read timeout for `cleanup!` and `connectable?` methods you need to change ProxyFetcher.config:
@@ -188,10 +187,6 @@ class MyHTTPClient
   def self.fetch(url)
     # ... some magic to return proper HTML ...
   end
-
-  def self.connectable?(url)
-    # ... some magic to check if url is connectable ...
-  end
 end

 ProxyFetcher.config.http_client = MyHTTPClient
@@ -200,11 +195,34 @@ manager = ProxyFetcher::Manager.new
 manager.proxies

 #=> [#, ... ]
+ # @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
 ```

 You can take a look at the [lib/proxy_fetcher/utils/http_client.rb](lib/proxy_fetcher/utils/http_client.rb) for an example.

+Moreover, you can write your own proxy validator to check if proxy is valid or not:
+
+```ruby
+class MyProxyValidator
+  # [IMPORTANT]: below methods are required!
+  def self.connectable?(proxy_addr, proxy_port)
+    # ... some magic to check if proxy is valid ...
+  end
+end
+
+ProxyFetcher.config.proxy_validator = MyProxyValidator
+
+manager = ProxyFetcher::Manager.new
+manager.proxies
+
+#=> [#, ... ]
+
+manager.validate!
+
+ #=> [ ... ]
+```
+
 ## Providers

 Currently ProxyFetcher can deal with next proxy providers (services):
diff --git a/lib/proxy_fetcher/providers/base.rb b/lib/proxy_fetcher/providers/base.rb
index 3e73e9c..c5afc97 100644
--- a/lib/proxy_fetcher/providers/base.rb
+++ b/lib/proxy_fetcher/providers/base.rb
@@ -7,11 +7,6 @@ class Base

       def_delegators ProxyFetcher::HTML, :clear, :convert_to_int

-      PROXY_TYPES = [
-        HTTP = 'HTTP'.freeze,
-        HTTPS = 'HTTPS'.freeze
-      ].freeze
-
       attr_reader :proxy

       def fetch_proxies!(filters = {})
@@ -45,8 +40,8 @@ def to_proxy(*)
       end

       # Return normalized HTML element content by selector
-      def parse_element(element, selector, method = :at_xpath)
-        clear(element.public_send(method, selector).content)
+      def parse_element(parent, selector, method = :at_xpath)
+        clear(parent.public_send(method, selector).content)
       end
     end
   end
diff --git a/lib/proxy_fetcher/providers/free_proxy_list.rb b/lib/proxy_fetcher/providers/free_proxy_list.rb
index 46af728..1d572ca 100644
--- a/lib/proxy_fetcher/providers/free_proxy_list.rb
+++ b/lib/proxy_fetcher/providers/free_proxy_list.rb
@@ -22,8 +22,8 @@ def to_proxy(html_element)
       private

       def parse_type(element)
-        type = parse_element(element, 'td[6]')
-        type && type.casecmp('yes').zero? ? HTTPS : HTTP
+        https = parse_element(element, 'td[6]')
+        https && https.casecmp('yes').zero? ? ProxyFetcher::Proxy::HTTPS : ProxyFetcher::Proxy::HTTP
       end
     end

diff --git a/lib/proxy_fetcher/providers/free_proxy_list_ssl.rb b/lib/proxy_fetcher/providers/free_proxy_list_ssl.rb
index 5972a42..b2ec7c3 100644
--- a/lib/proxy_fetcher/providers/free_proxy_list_ssl.rb
+++ b/lib/proxy_fetcher/providers/free_proxy_list_ssl.rb
@@ -15,7 +15,7 @@ def to_proxy(html_element)
           proxy.port = convert_to_int(parse_element(html_element, 'td[2]'))
           proxy.country = parse_element(html_element, 'td[4]')
           proxy.anonymity = parse_element(html_element, 'td[5]')
-          proxy.type = HTTPS
+          proxy.type = ProxyFetcher::Proxy::HTTPS
         end
       end
     end
diff --git a/lib/proxy_fetcher/providers/hide_my_name.rb b/lib/proxy_fetcher/providers/hide_my_name.rb
index 43e2c17..3450e80 100644
--- a/lib/proxy_fetcher/providers/hide_my_name.rb
+++ b/lib/proxy_fetcher/providers/hide_my_name.rb
@@ -13,14 +13,9 @@ def to_proxy(html_element)
           proxy.addr = parse_element(html_element, 'td[1]')
           proxy.port = convert_to_int(parse_element(html_element, 'td[2]'))
           proxy.anonymity = parse_element(html_element, 'td[6]')
-
           proxy.country = parse_country(html_element)
-          proxy.type = parse_type(html_element)
-
-          response_time = parse_response_time(html_element)
-
-          proxy.response_time = response_time
-          proxy.speed = speed_from_response_time(response_time)
+          proxy.type = parse_element(html_element, 'td[5]')
+          proxy.response_time = parse_response_time(html_element)
         end
       end

@@ -30,29 +25,9 @@ def parse_country(element)
         clear(element.at_xpath('*//span[1]/following-sibling::text()[1]').content)
       end

-      def parse_type(element)
-        schemas = parse_element(element, 'td[5]')
-
-        if schemas && schemas.downcase.include?('https')
-          HTTPS
-        else
-          HTTP
-        end
-      end
-
       def parse_response_time(element)
         convert_to_int(element.at_xpath('td[4]').content.strip[/\d+/])
       end
-
-      def speed_from_response_time(response_time)
-        if response_time < 1500
-          :fast
-        elsif response_time < 3000
-          :medium
-        else
-          :slow
-        end
-      end
     end

     ProxyFetcher::Configuration.register_provider(:hide_my_name, HideMyName)
diff --git a/lib/proxy_fetcher/providers/xroxy.rb b/lib/proxy_fetcher/providers/xroxy.rb
index c743b2e..9a59568 100644
--- a/lib/proxy_fetcher/providers/xroxy.rb
+++ b/lib/proxy_fetcher/providers/xroxy.rb
@@ -13,11 +13,18 @@ def to_proxy(html_element)
           proxy.addr = parse_element(html_element, 'td[2]')
           proxy.port = convert_to_int(parse_element(html_element, 'td[3]'))
           proxy.anonymity = parse_element(html_element, 'td[4]')
-          proxy.type = parse_element(html_element, 'td[5]').casecmp('true').zero? ? HTTPS : HTTP
           proxy.country = parse_element(html_element, 'td[6]')
           proxy.response_time = convert_to_int(parse_element(html_element, 'td[7]'))
+          proxy.type = parse_type(html_element)
         end
       end
+
+      private
+
+      def parse_type(element)
+        https = parse_element(element, 'td[5]')
+        https.casecmp('true').zero? ? ProxyFetcher::Proxy::HTTPS : ProxyFetcher::Proxy::HTTP
+      end
     end

     ProxyFetcher::Configuration.register_provider(:xroxy, XRoxy)
diff --git a/lib/proxy_fetcher/proxy.rb b/lib/proxy_fetcher/proxy.rb
index b6607e5..1d97512 100644
--- a/lib/proxy_fetcher/proxy.rb
+++ b/lib/proxy_fetcher/proxy.rb
@@ -1,31 +1,34 @@
 module ProxyFetcher
-  class Proxy < OpenStruct
-    def connectable?
-      ProxyFetcher.config.http_client.connectable?(url)
-    end
+  class Proxy
+    attr_accessor :addr, :port, :type, :country, :response_time, :anonymity

-    alias valid? connectable?
+    TYPES = [
+      HTTP = 'HTTP'.freeze,
+      HTTPS = 'HTTPS'.freeze,
+      SOCKS4 = 'SOCKS4'.freeze,
+      SOCKS5 = 'SOCKS5'.freeze
+    ].freeze

-    %i[slow medium fast].each do |method|
-      define_method "#{method}?" do
-        speed == method
+    TYPES.each do |proxy_type|
+      define_method "#{proxy_type.downcase}?" do
+        type && type.upcase.include?(proxy_type)
       end
     end

-    def http?
-      type.casecmp('http').zero?
-    end
+    alias ssl? https?

-    def https?
-      type.casecmp('https').zero?
+    def connectable?
+      ProxyFetcher.config.proxy_validator.connectable?(addr, port)
     end

+    alias valid? connectable?
+
     def uri
-      URI::Generic.build(host: addr, port: port, scheme: type)
+      URI::Generic.build(host: addr, port: port)
     end

     def url
-      uri.to_s
+      "#{addr}:#{port}"
     end
   end
 end
diff --git a/lib/proxy_fetcher/version.rb b/lib/proxy_fetcher/version.rb
index 181b0fe..637141d 100644
--- a/lib/proxy_fetcher/version.rb
+++ b/lib/proxy_fetcher/version.rb
@@ -9,7 +9,7 @@ module VERSION
     # Minor version number
     MINOR = 3
     # Smallest version number
-    TINY = 0
+    TINY = 1

     # Full version number
     STRING = [MAJOR, MINOR, TINY].compact.join('.')
diff --git a/spec/proxy_fetcher/configuration_spec.rb b/spec/proxy_fetcher/configuration_spec.rb
index 9d68c75..0f9884d 100644
--- a/spec/proxy_fetcher/configuration_spec.rb
+++ b/spec/proxy_fetcher/configuration_spec.rb
@@ -10,20 +10,35 @@ class MyHTTPClient
         def self.fetch(url)
           url
         end
+      end
+
+      expect { ProxyFetcher.config.http_client = MyHTTPClient }.not_to raise_error
+    end

+    it 'failed on setup if required methods are missing' do
+      MyWrongHTTPClient = Class.new
+
+      expect { ProxyFetcher.config.http_client = MyWrongHTTPClient }
+        .to raise_error(ProxyFetcher::Configuration::WrongCustomClass)
+    end
+  end
+
+  context 'custom proxy validator' do
+    it 'successfully setups if class has all the required methods' do
+      class MyProxyValidator
         def self.connectable?(*)
           true
         end
       end

-      expect { ProxyFetcher.config.http_client = MyHTTPClient }.not_to raise_error
+      expect { ProxyFetcher.config.proxy_validator = MyProxyValidator }.not_to raise_error
     end

     it 'failed on setup if required methods are missing' do
-      MyWrongHTTPClient = Class.new
+      MyWrongProxyValidator = Class.new

-      expect { ProxyFetcher.config.http_client = MyWrongHTTPClient }
-        .to raise_error(ProxyFetcher::Configuration::WrongHttpClient)
+      expect { ProxyFetcher.config.proxy_validator = MyWrongProxyValidator }
+        .to raise_error(ProxyFetcher::Configuration::WrongCustomClass)
     end
   end

diff --git a/spec/proxy_fetcher/proxy_spec.rb b/spec/proxy_fetcher/proxy_spec.rb
index 7d1bc7c..3617ce6 100644
--- a/spec/proxy_fetcher/proxy_spec.rb
+++ b/spec/proxy_fetcher/proxy_spec.rb
@@ -12,13 +12,16 @@
   let(:proxy) { @manager.proxies.first.dup }

   it 'checks schema' do
-    proxy.type = ProxyFetcher::Providers::Base::HTTP
+    proxy.type = ProxyFetcher::Proxy::HTTP
     expect(proxy.http?).to be_truthy
     expect(proxy.https?).to be_falsey

-    proxy.type = ProxyFetcher::Providers::Base::HTTPS
+    proxy.type = ProxyFetcher::Proxy::HTTPS
     expect(proxy.https?).to be_truthy
-    expect(proxy.http?).to be_falsey
+    expect(proxy.http?).to be_truthy
+
+    proxy.type = ProxyFetcher::Proxy::SOCKS5
+    expect(proxy.socks5?).to be_truthy
   end

   it 'not connectable if IP addr is wrong' do
@@ -44,15 +47,4 @@
   it 'returns URL' do
     expect(proxy.url).to be_a(String)
   end
-
-  it 'checks speed' do
-    proxy.speed = :fast
-    expect(proxy.fast?).to be_truthy
-
-    proxy.speed = :slow
-    expect(proxy.slow?).to be_truthy
-
-    proxy.speed = :medium
-    expect(proxy.medium?).to be_truthy
-  end
 end
diff --git a/spec/support/manager_examples.rb b/spec/support/manager_examples.rb
index 8b62cab..a8e3946 100644
--- a/spec/support/manager_examples.rb
+++ b/spec/support/manager_examples.rb
@@ -9,12 +9,12 @@
     expect(manager.proxies).to be_empty
   end

-  it 'can returns Proxy objects' do
+  it 'returns Proxy objects' do
     manager = ProxyFetcher::Manager.new
     expect(manager.proxies).to all(be_a(ProxyFetcher::Proxy))
   end

-  it 'can returns raw proxies' do
+  it 'returns raw proxies (HOST:PORT)' do
     manager = ProxyFetcher::Manager.new
     expect(manager.raw_proxies).to all(be_a(String))
   end
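
Review note on the `proxy_spec.rb` change from `be_falsey` to `be_truthy` for `http?` on an HTTPS proxy: this appears to follow from the new predicates in `lib/proxy_fetcher/proxy.rb`, which match by substring (`type.upcase.include?(proxy_type)`) instead of exact comparison. The sketch below reproduces only that predicate logic in isolation. `SketchProxy` is a hypothetical stand-in class, not part of the gem, and the combined `"HTTP, SOCKS5"` value is an assumed example of a provider string listing several types.

```ruby
# Hypothetical stand-in for ProxyFetcher::Proxy, reduced to the type
# predicates introduced in this diff. Not the gem's actual class.
class SketchProxy
  attr_accessor :type

  TYPES = [
    HTTP = 'HTTP'.freeze,
    HTTPS = 'HTTPS'.freeze,
    SOCKS4 = 'SOCKS4'.freeze,
    SOCKS5 = 'SOCKS5'.freeze
  ].freeze

  # Defines http?, https?, socks4? and socks5? using a substring match,
  # mirroring the logic shown in the diff.
  TYPES.each do |proxy_type|
    define_method "#{proxy_type.downcase}?" do
      type && type.upcase.include?(proxy_type)
    end
  end

  alias ssl? https?
end

https_proxy = SketchProxy.new
https_proxy.type = SketchProxy::HTTPS
puts https_proxy.https? # true
puts https_proxy.http?  # true, because "HTTPS" contains the substring "HTTP"

# Assumed example: a provider value that lists more than one type.
multi_proxy = SketchProxy.new
multi_proxy.type = 'HTTP, SOCKS5'
puts multi_proxy.socks5? # true
puts multi_proxy.socks4? # false

# With no type set, the predicates return nil (falsey) instead of raising.
untyped_proxy = SketchProxy.new
puts untyped_proxy.http?.inspect # nil
```

Callers that need to tell plain HTTP apart from HTTPS would therefore have to check `https?` (or `ssl?`) first, since `http?` alone is truthy for both under this matching rule.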