Skip to content

implemented fake-useragent package#49

Merged
pgaref merged 10 commits intopgaref:masterfrom
la55u:master
Jul 8, 2018
Merged

implemented fake-useragent package#49
pgaref merged 10 commits intopgaref:masterfrom
la55u:master

Conversation

@la55u
Copy link
Copy Markdown
Contributor

@la55u la55u commented Jul 6, 2018

The default way the program gets user agents is now from an online, up-to-date database
with the help of the fake-useragent package. Reading useragents from a custom local
file is still available as a parameter to the UserAgentManager class. Solves #28.

la55u added 7 commits July 4, 2018 16:05
Proxies can now be filtered with an optional parameter in the RequestProxy constructor.
For this the parsers have to set the protocol for each proxy object they build.
Protocols are stored in the ProxyObject.Protocol enum.
PremProxy: parsing was broken because the port number was 'encrypted' and no longer stored in CSS.
They are now obtained from a javascript file that holds a function to the key-port pairs.
The default way the program gets user agents is now from an online, up-to-date database
with the help of the fake-useragent package. Reading useragents from a custom local
file is still available as a parameter to the UserAgentManager class. Solves pgaref#28.
@coveralls
Copy link
Copy Markdown

coveralls commented Jul 6, 2018

Coverage Status

Coverage increased (+1.02%) to 62.344% when pulling 58d69f5 on la55u:master into fac206a on pgaref:master.

Copy link
Copy Markdown
Owner

@pgaref pgaref left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @la55u !
Looks good in general - check some inline comments.

def __init__(self, agent_file=os.path.join(os.path.dirname(__file__), '../data/user_agents.txt')):
self.agent_file = agent_file
self.useragents = self.load_user_agents(self.agent_file)
def __init__(self, fallback=None, file=None):
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is fallback used here exactly?

Copy link
Copy Markdown
Contributor Author

@la55u la55u Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user can specify an arbitrary user agent as a fallback. It would be used in the rare occasion when some error happens in fake-useragent while processing the request (network error etc)
Or if we explicitly set a default fallback that would be better?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @la55u
Just took a better look at the library, my suggestions:

  • Let's avoid storing collected data at the os temp dir (keep everything in memory) by using cache=False argument
  • Also lets always use a fallback browser (could be the most popular one) as we dont want out library to fail for any reason

else:
logger.info('Using fake-useragent package for user agents.')
fb = fallback
self.fakeuseragent = FakeUserAgent(fallback=fb)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

user_agent = random.choice(self.useragents)
return user_agent.decode('utf-8')
else:
return self.fakeuseragent.random
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - are we sure the fakeuseragent.random is in utf-8 format?
We had issues with that in the past


def get_first_user_agent(self):
return self.useragents[0].decode('utf-8')
return self.useragents[0].decode('utf-8') if self.agent_file else None
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we print a WARNING message when this method is called with fakeuseragent?

Copy link
Copy Markdown
Contributor Author

@la55u la55u Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should, but I just noticed the logging doesn't work in this class. Can you help me why?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get rid of the logger.setLevel part - this is done in the caller/main class


def get_last_user_agent(self):
return self.useragents[-1].decode('utf-8')
return self.useragents[-1].decode('utf-8') if self.agent_file else None
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above


def get_len_user_agent(self):
return len(self.useragents)
return len(self.useragents) if self.agent_file else None
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

Comment thread setup.py
'requests >= 2.18.4',
'pyOpenSSL >= 17.5.0'
'pyOpenSSL >= 17.5.0',
'fake-useragent >= 0.1.10'
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if self.agent_file:
return self.useragents[0].decode('utf-8')
else:
logger.warning('User-Agents file not set')
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
We could make the message a bit more descriptive like:
"Fake useragent lib does not support operaration blah - change to user-agent file?"

from fake_useragent import FakeUserAgent
import logging

logger = logging.getLogger(__name__)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

@pgaref pgaref merged commit 5322364 into pgaref:master Jul 8, 2018
@pgaref
Copy link
Copy Markdown
Owner

pgaref commented Jul 8, 2018

Merged! Thanks again for the PR @la55u !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants