Refactor proxy & providers
nbulaj committed Aug 23, 2017
1 parent 0d4460b commit e861803
Showing 12 changed files with 103 additions and 98 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -5,7 +5,7 @@ script: "rake spec"
rvm:
- 2.2.4
- 2.3.3
- 2.4.0
- 2.4.1
- ruby-head
matrix:
allow_failures:
64 changes: 41 additions & 23 deletions README.md
@@ -51,7 +51,7 @@ manager = ProxyFetcher::Manager.new # will immediately load proxy list from the
manager.proxies

#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
# @response_time=5217, @speed=48, @type="HTTP", @anonymity="High">, ... ]
# @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
```

You can initialize proxy manager without immediate load of proxy list from the remote server by passing `refresh: false` on initialization:
@@ -75,8 +75,8 @@ Get raw proxy URLs as Strings:
manager = ProxyFetcher::Manager.new
manager.raw_proxies

# => ["http://97.77.104.22:3128", "http://94.23.205.32:3128", "http://209.79.65.140:8080",
# "http://91.217.42.2:8080", "http://97.77.104.22:80", "http://165.234.102.177:8080", ...]
# => ["97.77.104.22:3128", "94.23.205.32:3128", "209.79.65.140:8080",
# "91.217.42.2:8080", "97.77.104.22:80", "165.234.102.177:8080", ...]
```

If `ProxyFetcher::Manager` was already initialized somewhere, you can refresh the proxy list by calling `#refresh_list!` method:
@@ -85,7 +85,7 @@ If `ProxyFetcher::Manager` was already initialized somewhere, you can refresh th
manager.refresh_list! # or manager.fetch!

#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
# @response_time=5217, @speed=48, @type="HTTP", @anonymity="High">, ... ]
# @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
```

If you need to filter proxy list, for example, by country or response time and selected provider supports filtering by GET params, then you
@@ -117,19 +117,23 @@ then you already have Ruby 2.3 installed. In other cases you can install it with
Just install the gem by running `gem install proxy_fetcher` in your terminal and run it:

```bash
proxy_fetcher >> proxies.txt # Will download proxies, validate them and write to file
proxy_fetcher >> proxies.txt # Will download proxies from the default provider, validate them and write to file
```

If you need a list of proxies in JSON then pass `--json` argument to the command:
If you need a list of proxies from some specific provider, then you need to pass its name with the `-p` option:

```bash
proxy_fetcher -p proxy_docker >> proxies.txt # Will download proxies from the specified provider, validate them and write to file
```

If you need a list of proxies in JSON format just pass a `--json` option to the command:

```bash
proxy_fetcher --json

# Will print:
# {"proxies":["https://120.26.206.178:8888","https://119.61.13.242:1080","https://117.40.213.26:1080","https://92.62.72.242:1080",
# "https://58.20.41.172:1080","https://204.116.192.151:35923","https://190.5.96.58:1080","https://170.250.109.97:35923",
# "https://121.41.82.99:1080","https://77.53.105.155:35923"]}

# {"proxies":["120.26.206.178:80","119.61.13.242:1080","117.40.213.26:80","92.62.72.242:1080","77.53.105.155:3124",
# "58.20.41.172:35923","204.116.192.151:35923","190.5.96.58:1080","170.250.109.97:35923","121.41.82.99:1080"]}
```

To get all the possible options run:
@@ -144,26 +148,21 @@ Every proxy is a `ProxyFetcher::Proxy` object that has next readers (instance va

* `addr` (IP address)
* `port`
* `type` (proxy type, can be HTTP, HTTPS, SOCKS4 or/and SOCKS5)
* `country` (USA or Brazil for example)
* `response_time` (5217 for example)
* `speed` (`:slow`, `:medium` or `:fast`. **Note:** depends on the proxy provider and can be `nil`)
* `type` (URI schema, HTTP or HTTPS)
* `anonymity` (`Low`, `Elite proxy` or `High +KA` for example)

Also you can call next instance methods for every Proxy object:

* `connectable?` (whether proxy server is available)
* `http?` (whether proxy server has a HTTP protocol)
* `https?` (whether proxy server has a HTTPS protocol)
* `socks4?`
* `socks5?`
* `uri` (returns `URI::Generic` object)
* `url` (returns a formatted URL like "_http://IP:PORT_" )

You can sort or find any proxy by speed using next 3 instance methods (if it is available for the specific provider):

* `fast?`
* `medium?`
* `slow?`'

## Configuration

To change open/read timeout for `cleanup!` and `connectable?` methods you need to change ProxyFetcher.config:
@@ -188,10 +187,6 @@ class MyHTTPClient
def self.fetch(url)
# ... some magic to return proper HTML ...
end

def self.connectable?(url)
# ... some magic to check if url is connectable ...
end
end

ProxyFetcher.config.http_client = MyHTTPClient
@@ -200,11 +195,34 @@ manager = ProxyFetcher::Manager.new
manager.proxies

#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
# @response_time=5217, @speed=48, @type="HTTP", @anonymity="High">, ... ]
# @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
```

You can take a look at the [lib/proxy_fetcher/utils/http_client.rb](lib/proxy_fetcher/utils/http_client.rb) for an example.

Moreover, you can write your own proxy validator to check if proxy is valid or not:

```ruby
class MyProxyValidator
# [IMPORTANT]: below methods are required!
def self.connectable?(proxy_addr, proxy_port)
# ... some magic to check if proxy is valid ...
end
end

ProxyFetcher.config.proxy_validator = MyProxyValidator

manager = ProxyFetcher::Manager.new
manager.proxies

#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
# @response_time=5217, @type="HTTP", @anonymity="High">, ... ]

manager.validate!

#=> [ ... ]
```

## Providers

Currently ProxyFetcher can deal with next proxy providers (services):
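The README changes in this commit introduce a `ProxyFetcher.config.proxy_validator` hook whose only required method is `self.connectable?(proxy_addr, proxy_port)`. As a minimal sketch (not the gem's built-in validator), a validator could simply attempt a TCP connection with a short timeout using Ruby's standard `socket` library. The class name `TcpProxyValidator` and the 3-second timeout are assumptions made for illustration.

```ruby
require 'socket'

# Hypothetical validator (not the gem's built-in one). It reports a proxy as
# connectable when a plain TCP connection to addr:port succeeds within
# CONNECT_TIMEOUT seconds. It does NOT check that the remote end actually
# speaks HTTP or SOCKS.
class TcpProxyValidator
  CONNECT_TIMEOUT = 3 # seconds; an assumed value

  # The single class method the README lists as required for proxy validators.
  def self.connectable?(proxy_addr, proxy_port)
    # The block form closes the socket automatically and returns the block's value.
    Socket.tcp(proxy_addr, proxy_port, connect_timeout: CONNECT_TIMEOUT) { true }
  rescue SystemCallError, SocketError, IOError
    # Connection refused, host unreachable, DNS failure or timeout.
    false
  end
end
```

It would be wired in the same way as the README example: `ProxyFetcher.config.proxy_validator = TcpProxyValidator`. A bare TCP check is cheap but only proves that the port is open; a validator that sends a real request through the proxy would be slower but more meaningful.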
9 changes: 2 additions & 7 deletions lib/proxy_fetcher/providers/base.rb
@@ -7,11 +7,6 @@ class Base

def_delegators ProxyFetcher::HTML, :clear, :convert_to_int

PROXY_TYPES = [
HTTP = 'HTTP'.freeze,
HTTPS = 'HTTPS'.freeze
].freeze

attr_reader :proxy

def fetch_proxies!(filters = {})
@@ -45,8 +40,8 @@ def to_proxy(*)
end

# Return normalized HTML element content by selector
def parse_element(element, selector, method = :at_xpath)
clear(element.public_send(method, selector).content)
def parse_element(parent, selector, method = :at_xpath)
clear(parent.public_send(method, selector).content)
end
end
end
4 changes: 2 additions & 2 deletions lib/proxy_fetcher/providers/free_proxy_list.rb
@@ -22,8 +22,8 @@ def to_proxy(html_element)
private

def parse_type(element)
type = parse_element(element, 'td[6]')
type && type.casecmp('yes').zero? ? HTTPS : HTTP
https = parse_element(element, 'td[6]')
https && https.casecmp('yes').zero? ? ProxyFetcher::Proxy::HTTPS : ProxyFetcher::Proxy::HTTP
end
end

2 changes: 1 addition & 1 deletion lib/proxy_fetcher/providers/free_proxy_list_ssl.rb
@@ -15,7 +15,7 @@ def to_proxy(html_element)
proxy.port = convert_to_int(parse_element(html_element, 'td[2]'))
proxy.country = parse_element(html_element, 'td[4]')
proxy.anonymity = parse_element(html_element, 'td[5]')
proxy.type = HTTPS
proxy.type = ProxyFetcher::Proxy::HTTPS
end
end
end
29 changes: 2 additions & 27 deletions lib/proxy_fetcher/providers/hide_my_name.rb
@@ -13,14 +13,9 @@ def to_proxy(html_element)
proxy.addr = parse_element(html_element, 'td[1]')
proxy.port = convert_to_int(parse_element(html_element, 'td[2]'))
proxy.anonymity = parse_element(html_element, 'td[6]')

proxy.country = parse_country(html_element)
proxy.type = parse_type(html_element)

response_time = parse_response_time(html_element)

proxy.response_time = response_time
proxy.speed = speed_from_response_time(response_time)
proxy.type = parse_element(html_element, 'td[5]')
proxy.response_time = parse_response_time(html_element)
end
end

@@ -30,29 +25,9 @@ def parse_country(element)
clear(element.at_xpath('*//span[1]/following-sibling::text()[1]').content)
end

def parse_type(element)
schemas = parse_element(element, 'td[5]')

if schemas && schemas.downcase.include?('https')
HTTPS
else
HTTP
end
end

def parse_response_time(element)
convert_to_int(element.at_xpath('td[4]').content.strip[/\d+/])
end

def speed_from_response_time(response_time)
if response_time < 1500
:fast
elsif response_time < 3000
:medium
else
:slow
end
end
end

ProxyFetcher::Configuration.register_provider(:hide_my_name, HideMyName)
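This commit drops `Proxy#speed`, the `fast?`/`medium?`/`slow?` predicates and `HideMyName#speed_from_response_time`. If callers still want coarse speed buckets, one option is to derive them on the client side from `response_time`, which `Proxy` still exposes. The sketch below reuses the thresholds from the removed helper (below 1500 is fast, below 3000 is medium, otherwise slow); `speed_bucket` is a hypothetical name, and since those thresholds were tuned for hide_my_name's response-time column they may not carry over to other providers.

```ruby
# Hypothetical helper: buckets a response time using the thresholds from the
# removed HideMyName#speed_from_response_time. Returns nil when the provider
# did not supply a response time.
def speed_bucket(response_time)
  return nil if response_time.nil?

  if response_time < 1500
    :fast
  elsif response_time < 3000
    :medium
  else
    :slow
  end
end

# Possible usage with a manager (assumes a configured ProxyFetcher setup):
#   fast = manager.proxies.select { |proxy| speed_bucket(proxy.response_time) == :fast }
```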
9 changes: 8 additions & 1 deletion lib/proxy_fetcher/providers/xroxy.rb
@@ -13,11 +13,18 @@ def to_proxy(html_element)
proxy.addr = parse_element(html_element, 'td[2]')
proxy.port = convert_to_int(parse_element(html_element, 'td[3]'))
proxy.anonymity = parse_element(html_element, 'td[4]')
proxy.type = parse_element(html_element, 'td[5]').casecmp('true').zero? ? HTTPS : HTTP
proxy.country = parse_element(html_element, 'td[6]')
proxy.response_time = convert_to_int(parse_element(html_element, 'td[7]'))
proxy.type = parse_type(html_element)
end
end

private

def parse_type(element)
https = parse_element(element, 'td[5]')
https.casecmp('true').zero? ? ProxyFetcher::Proxy::HTTPS : ProxyFetcher::Proxy::HTTP
end
end

ProxyFetcher::Configuration.register_provider(:xroxy, XRoxy)
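`FreeProxyList#parse_type` and the new `XRoxy#parse_type` both map a yes/no style table cell onto `ProxyFetcher::Proxy::HTTPS` or `ProxyFetcher::Proxy::HTTP`, but only the FreeProxyList version guards with `https && ...`. If `parse_element` can ever return `nil` (the FreeProxyList guard suggests the author expected that it might), the XRoxy version would raise `NoMethodError` on `casecmp`. A shared, nil-safe helper could look like the sketch below; `ProxyTypeSketch` and `https_flag_to_type` are hypothetical names, and the local `HTTP`/`HTTPS` constants stand in for `ProxyFetcher::Proxy::HTTP`/`HTTPS` so the snippet runs on its own.

```ruby
module ProxyTypeSketch
  HTTP  = 'HTTP'.freeze  # stands in for ProxyFetcher::Proxy::HTTP
  HTTPS = 'HTTPS'.freeze # stands in for ProxyFetcher::Proxy::HTTPS

  module_function

  # Returns HTTPS when the cell text case-insensitively equals the provider's
  # "truthy" marker ('yes' for FreeProxyList, 'true' for XRoxy), and HTTP
  # otherwise, including when the cell text is nil.
  def https_flag_to_type(cell_text, truthy_marker)
    if cell_text && cell_text.casecmp(truthy_marker).zero?
      HTTPS
    else
      HTTP
    end
  end
end
```

Inside a provider, the XRoxy call might then read `https_flag_to_type(parse_element(element, 'td[5]'), 'true')`, keeping the nil check in one place.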
33 changes: 18 additions & 15 deletions lib/proxy_fetcher/proxy.rb
@@ -1,31 +1,34 @@
module ProxyFetcher
class Proxy < OpenStruct
def connectable?
ProxyFetcher.config.http_client.connectable?(url)
end
class Proxy
attr_accessor :addr, :port, :type, :country, :response_time, :anonymity

alias valid? connectable?
TYPES = [
HTTP = 'HTTP'.freeze,
HTTPS = 'HTTPS'.freeze,
SOCKS4 = 'SOCKS4'.freeze,
SOCKS5 = 'SOCKS5'.freeze
].freeze

%i[slow medium fast].each do |method|
define_method "#{method}?" do
speed == method
TYPES.each do |proxy_type|
define_method "#{proxy_type.downcase}?" do
type && type.upcase.include?(proxy_type)
end
end

def http?
type.casecmp('http').zero?
end
alias ssl? https?

def https?
type.casecmp('https').zero?
def connectable?
ProxyFetcher.config.proxy_validator.connectable?(addr, port)
end

alias valid? connectable?

def uri
URI::Generic.build(host: addr, port: port, scheme: type)
URI::Generic.build(host: addr, port: port)
end

def url
uri.to_s
"#{addr}:#{port}"
end
end
end
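The rewritten `Proxy` class generates one predicate per entry in `TYPES` (`http?`, `https?`, `socks4?`, `socks5?`), each checking `type.upcase.include?(proxy_type)`. Because this is a substring test, an `'HTTPS'` proxy also answers truthy to `http?`, which matches what the updated `proxy_spec.rb` in this commit asserts. The standalone sketch below reproduces just that mechanism in a minimal class (`ProxyTypeDemo` is a made-up name, not part of the gem) so the behaviour can be checked in isolation.

```ruby
# Minimal stand-in for ProxyFetcher::Proxy that reproduces only the
# type-predicate generation introduced in this commit.
class ProxyTypeDemo
  attr_accessor :type

  TYPES = [
    HTTP = 'HTTP'.freeze,
    HTTPS = 'HTTPS'.freeze,
    SOCKS4 = 'SOCKS4'.freeze,
    SOCKS5 = 'SOCKS5'.freeze
  ].freeze

  # Defines http?, https?, socks4? and socks5?. The check is a case-insensitive
  # substring match, so a type of 'HTTPS' also satisfies http?.
  TYPES.each do |proxy_type|
    define_method "#{proxy_type.downcase}?" do
      type && type.upcase.include?(proxy_type)
    end
  end

  # Must come after the loop so that https? already exists.
  alias ssl? https?
end
```

One side effect of the new approach: when `type` is `nil`, every predicate returns `nil` (falsy) instead of raising, whereas the removed `type.casecmp(...)` versions of `http?`/`https?` would have raised `NoMethodError`.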
2 changes: 1 addition & 1 deletion lib/proxy_fetcher/version.rb
@@ -9,7 +9,7 @@ module VERSION
# Minor version number
MINOR = 3
# Smallest version number
TINY = 0
TINY = 1

# Full version number
STRING = [MAJOR, MINOR, TINY].compact.join('.')
23 changes: 19 additions & 4 deletions spec/proxy_fetcher/configuration_spec.rb
@@ -10,20 +10,35 @@ class MyHTTPClient
def self.fetch(url)
url
end
end

expect { ProxyFetcher.config.http_client = MyHTTPClient }.not_to raise_error
end

it 'failed on setup if required methods are missing' do
MyWrongHTTPClient = Class.new

expect { ProxyFetcher.config.http_client = MyWrongHTTPClient }
.to raise_error(ProxyFetcher::Configuration::WrongCustomClass)
end
end

context 'custom proxy validator' do
it 'successfully setups if class has all the required methods' do
class MyProxyValidator
def self.connectable?(*)
true
end
end

expect { ProxyFetcher.config.http_client = MyHTTPClient }.not_to raise_error
expect { ProxyFetcher.config.proxy_validator = MyProxyValidator }.not_to raise_error
end

it 'failed on setup if required methods are missing' do
MyWrongHTTPClient = Class.new
MyWrongProxyValidator = Class.new

expect { ProxyFetcher.config.http_client = MyWrongHTTPClient }
.to raise_error(ProxyFetcher::Configuration::WrongHttpClient)
expect { ProxyFetcher.config.proxy_validator = MyWrongProxyValidator }
.to raise_error(ProxyFetcher::Configuration::WrongCustomClass)
end
end

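The configuration spec expects `ProxyFetcher.config.http_client=` and `ProxyFetcher.config.proxy_validator=` to raise `ProxyFetcher::Configuration::WrongCustomClass` when the assigned class lacks its required methods. `configuration.rb` is not among the hunks shown here, so the following is only a guess at how such a check could be written with `respond_to?`. `ConfigSketch` and `setup_custom_class` are hypothetical and are not the gem's actual code; the required methods (`fetch` for the HTTP client, `connectable?` for the validator) are taken from the README examples in this commit.

```ruby
# Hypothetical configuration object illustrating a "required class methods"
# check similar to what configuration_spec.rb exercises.
class ConfigSketch
  class WrongCustomClass < StandardError
    def initialize(klass, methods)
      super("#{klass} must respond to #{methods.join(', ')}")
    end
  end

  attr_reader :http_client, :proxy_validator

  def http_client=(klass)
    @http_client = setup_custom_class(klass, required_methods: :fetch)
  end

  def proxy_validator=(klass)
    @proxy_validator = setup_custom_class(klass, required_methods: :connectable?)
  end

  private

  # Returns klass if it responds to every required class-level method,
  # otherwise raises WrongCustomClass.
  def setup_custom_class(klass, required_methods: [])
    methods = Array(required_methods)
    raise WrongCustomClass.new(klass, methods) unless methods.all? { |m| klass.respond_to?(m) }

    klass
  end
end
```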
20 changes: 6 additions & 14 deletions spec/proxy_fetcher/proxy_spec.rb
@@ -12,13 +12,16 @@
let(:proxy) { @manager.proxies.first.dup }

it 'checks schema' do
proxy.type = ProxyFetcher::Providers::Base::HTTP
proxy.type = ProxyFetcher::Proxy::HTTP
expect(proxy.http?).to be_truthy
expect(proxy.https?).to be_falsey

proxy.type = ProxyFetcher::Providers::Base::HTTPS
proxy.type = ProxyFetcher::Proxy::HTTPS
expect(proxy.https?).to be_truthy
expect(proxy.http?).to be_falsey
expect(proxy.http?).to be_truthy

proxy.type = ProxyFetcher::Proxy::SOCKS5
expect(proxy.socks5?).to be_truthy
end

it 'not connectable if IP addr is wrong' do
@@ -44,15 +47,4 @@
it 'returns URL' do
expect(proxy.url).to be_a(String)
end

it 'checks speed' do
proxy.speed = :fast
expect(proxy.fast?).to be_truthy

proxy.speed = :slow
expect(proxy.slow?).to be_truthy

proxy.speed = :medium
expect(proxy.medium?).to be_truthy
end
end
