Creation of HarEntry class #39
- Adds HarEntry class which will allow for easier entry parsing
- Adds Request and Response sub-classes for HarEntry
- One test left to fix
Sync with master
- Made it so HarEntry, HarEntry.Request, and HarEntry.Response act like dicts
- Changed type evaluations to isinstance
- Renamed sub_classes to http
- Added Mixins for shared functions
- Removed try/except where appropriate
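For readers following along, here is a minimal sketch of how one shared mixin can make several wrapper classes act like read-only dicts. It assumes each wrapper keeps its original JSON in a raw_entry attribute (that attribute name appears in the traceback further down); the names DictLikeMixin and Response here are hypothetical illustrations, not the PR's code.

```python
from collections.abc import Mapping
from typing import Any, Iterator


class DictLikeMixin(Mapping):
    """Hypothetical shared mixin: read-only dict access to self.raw_entry.

    Subclasses only need to set raw_entry; Mapping then supplies keys(),
    items(), values(), get(), ==, and `in` on top of these three methods.
    """

    def __getitem__(self, key: str) -> Any:
        return self.raw_entry[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self.raw_entry)

    def __len__(self) -> int:
        return len(self.raw_entry)


class Response(DictLikeMixin):
    """Illustrative wrapper that also exposes a common value as an attribute."""

    def __init__(self, raw_entry: dict) -> None:
        self.raw_entry = raw_entry
        self.status = raw_entry.get("status")
```

With this design, Response({"status": 200})["status"] and Response({"status": 200}).status both return 200. One side effect worth knowing: Mapping defines __eq__, so instances compare equal to plain dicts with the same contents but are not hashable.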
Great work on that, thanks for taking the time! Super close to being ready to merge 👍
- Changed README.rst
- Changed six requirement to 1.13.0
- Fixed weird variable name
if isinstance(changed_args[0], dict):
    changed_args[0] = HarEntry(changed_args[0])
# In some cases HarParser is the first argument and the entry is the second
if isinstance(changed_args[0], HarParser):
    changed_args[1] = HarEntry(changed_args[1])
@mrname
Don't know if you know a better way to do this. When I was running the tests, the parser match_headers and match_request_type tests were failing because the first argument was the HarParser, with the entry dict as the second.
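One possible alternative, sketched under assumptions rather than taken from the PR: when convert_to_entry decorates an instance method, the first positional argument is always self (the HarParser), so the decorator can name the entry parameter explicitly instead of checking changed_args by position. The simplified HarEntry stand-in and the toy Parser class below are hypothetical; unlike the PR's version, this sketch only handles methods, not plain functions whose first argument is the entry.

```python
import functools
import re
from types import SimpleNamespace


class HarEntry:
    """Simplified, hypothetical stand-in for the PR's HarEntry."""

    def __init__(self, raw_entry: dict) -> None:
        self.raw_entry = raw_entry
        self.request = SimpleNamespace(**raw_entry.get("request", {}))


def convert_to_entry(func):
    """Wrap a method so a raw dict passed as `entry` becomes a HarEntry.

    For methods the first positional argument is always `self`, so the
    entry is the second one; naming it avoids index checks on *args.
    """
    @functools.wraps(func)
    def inner(self, entry, *args, **kwargs):
        if isinstance(entry, dict):
            entry = HarEntry(entry)
        return func(self, entry, *args, **kwargs)
    return inner


class Parser:
    """Toy class standing in for HarParser."""

    @convert_to_entry
    def match_request_type(self, entry, request_type, regex=True):
        if regex:
            return re.search(request_type, entry.request.method,
                             flags=re.IGNORECASE) is not None
        return entry.request.method == request_type
```

Because inner names its second parameter entry, callers can pass the entry either positionally or as entry=..., and an object that is already a HarEntry passes through unchanged.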
Can you post the traceback? I just ran the test suite and hit only a single failure:
========================================================== FAILURES ==========================================================
______________________________________________ test_init_entry_with_no_pageref _______________________________________________
self = <[KeyError("'url'") raised in repr()] HarEntry object at 0x10bd98748>
@cached_property
def startTime(self):
try:
> return parser.parse(self.raw_entry.get("startedDateTime", ""))
haralyzer/assets.py:680:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
timestr = '', parserinfo = None, kwargs = {}
def parse(timestr, parserinfo=None, **kwargs):
"""
Parse a string in one of the supported formats, using the
``parserinfo`` parameters.
:param timestr:
A string containing a date/time stamp.
:param parserinfo:
A :class:`parserinfo` object containing parameters for the parser.
If ``None``, the default arguments to the :class:`parserinfo`
constructor are used.
The ``**kwargs`` parameter takes the following keyword arguments:
:param default:
The default datetime object, if this is a datetime object and not
``None``, elements specified in ``timestr`` replace elements in the
default object.
:param ignoretz:
If set ``True``, time zones in parsed strings are ignored and a naive
:class:`datetime` object is returned.
:param tzinfos:
Additional time zone names / aliases which may be present in the
string. This argument maps time zone names (and optionally offsets
from those time zones) to time zones. This parameter can be a
dictionary with timezone aliases mapping time zone names to time
zones or a function taking two parameters (``tzname`` and
``tzoffset``) and returning a time zone.
The timezones to which the names are mapped can be an integer
offset from UTC in seconds or a :class:`tzinfo` object.
.. doctest::
:options: +NORMALIZE_WHITESPACE
>>> from dateutil.parser import parse
>>> from dateutil.tz import gettz
>>> tzinfos = {"BRST": -7200, "CST": gettz("America/Chicago")}
>>> parse("2012-01-19 17:21:00 BRST", tzinfos=tzinfos)
datetime.datetime(2012, 1, 19, 17, 21, tzinfo=tzoffset(u'BRST', -7200))
>>> parse("2012-01-19 17:21:00 CST", tzinfos=tzinfos)
datetime.datetime(2012, 1, 19, 17, 21,
tzinfo=tzfile('/usr/share/zoneinfo/America/Chicago'))
This parameter is ignored if ``ignoretz`` is set.
:param dayfirst:
Whether to interpret the first value in an ambiguous 3-integer date
(e.g. 01/05/09) as the day (``True``) or month (``False``). If
``yearfirst`` is set to ``True``, this distinguishes between YDM and
YMD. If set to ``None``, this value is retrieved from the current
:class:`parserinfo` object (which itself defaults to ``False``).
:param yearfirst:
Whether to interpret the first value in an ambiguous 3-integer date
(e.g. 01/05/09) as the year. If ``True``, the first number is taken to
be the year, otherwise the last number is taken to be the year. If
this is set to ``None``, the value is retrieved from the current
:class:`parserinfo` object (which itself defaults to ``False``).
:param fuzzy:
Whether to allow fuzzy parsing, allowing for string like "Today is
January 1, 2047 at 8:21:00AM".
:param fuzzy_with_tokens:
If ``True``, ``fuzzy`` is automatically set to True, and the parser
will return a tuple where the first element is the parsed
:class:`datetime.datetime` datetimestamp and the second element is
a tuple containing the portions of the string which were ignored:
.. doctest::
>>> from dateutil.parser import parse
>>> parse("Today is January 1, 2047 at 8:21:00AM", fuzzy_with_tokens=True)
(datetime.datetime(2047, 1, 1, 8, 21), (u'Today is ', u' ', u'at '))
:return:
Returns a :class:`datetime.datetime` object or, if the
``fuzzy_with_tokens`` option is ``True``, returns a tuple, the
first element being a :class:`datetime.datetime` object, the second
a tuple containing the fuzzy tokens.
:raises ValueError:
Raised for invalid or unknown string format, if the provided
:class:`tzinfo` is not in a valid format, or if an invalid date
would be created.
:raises OverflowError:
Raised if the parsed date exceeds the largest valid C integer on
your system.
"""
if parserinfo:
return parser(parserinfo).parse(timestr, **kwargs)
else:
> return DEFAULTPARSER.parse(timestr, **kwargs)
../../.pyenv/versions/3.6.0/envs/haralyzer/lib/python3.6/site-packages/dateutil/parser/_parser.py:1356:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <dateutil.parser._parser.parser object at 0x10bd07c50>, timestr = '', default = datetime.datetime(2020, 9, 13, 0, 0)
ignoretz = False, tzinfos = None, kwargs = {}, res = _result(), skipped_tokens = None
def parse(self, timestr, default=None,
ignoretz=False, tzinfos=None, **kwargs):
"""
Parse the date/time string into a :class:`datetime.datetime` object.
:param timestr:
Any date/time string using the supported formats.
:param default:
The default datetime object, if this is a datetime object and not
``None``, elements specified in ``timestr`` replace elements in the
default object.
:param ignoretz:
If set ``True``, time zones in parsed strings are ignored and a
naive :class:`datetime.datetime` object is returned.
:param tzinfos:
Additional time zone names / aliases which may be present in the
string. This argument maps time zone names (and optionally offsets
from those time zones) to time zones. This parameter can be a
dictionary with timezone aliases mapping time zone names to time
zones or a function taking two parameters (``tzname`` and
``tzoffset``) and returning a time zone.
The timezones to which the names are mapped can be an integer
offset from UTC in seconds or a :class:`tzinfo` object.
.. doctest::
:options: +NORMALIZE_WHITESPACE
>>> from dateutil.parser import parse
>>> from dateutil.tz import gettz
>>> tzinfos = {"BRST": -7200, "CST": gettz("America/Chicago")}
>>> parse("2012-01-19 17:21:00 BRST", tzinfos=tzinfos)
datetime.datetime(2012, 1, 19, 17, 21, tzinfo=tzoffset(u'BRST', -7200))
>>> parse("2012-01-19 17:21:00 CST", tzinfos=tzinfos)
datetime.datetime(2012, 1, 19, 17, 21,
tzinfo=tzfile('/usr/share/zoneinfo/America/Chicago'))
This parameter is ignored if ``ignoretz`` is set.
:param \\*\\*kwargs:
Keyword arguments as passed to ``_parse()``.
:return:
Returns a :class:`datetime.datetime` object or, if the
``fuzzy_with_tokens`` option is ``True``, returns a tuple, the
first element being a :class:`datetime.datetime` object, the second
a tuple containing the fuzzy tokens.
:raises ValueError:
Raised for invalid or unknown string format, if the provided
:class:`tzinfo` is not in a valid format, or if an invalid date
would be created.
:raises TypeError:
Raised for non-string or character stream input.
:raises OverflowError:
Raised if the parsed date exceeds the largest valid C integer on
your system.
"""
if default is None:
default = datetime.datetime.now().replace(hour=0, minute=0,
second=0, microsecond=0)
res, skipped_tokens = self._parse(timestr, **kwargs)
if res is None:
raise ValueError("Unknown string format:", timestr)
if len(res) == 0:
> raise ValueError("String does not contain a date:", timestr)
E ValueError: ('String does not contain a date:', '')
../../.pyenv/versions/3.6.0/envs/haralyzer/lib/python3.6/site-packages/dateutil/parser/_parser.py:651: ValueError
During handling of the above exception, another exception occurred:
har_data = <function har_data.<locals>.load_doc at 0x10b431e18>
def test_init_entry_with_no_pageref(har_data):
'''
If we find an entry with no pageref it should end up in a HarPage object
with page ID of unknown
'''
data = har_data('missing_pageref.har')
har_parser = HarParser(data)
# We should have two pages. One is defined in the pages key of the har file
# but has no entries. The other should be our unknown page, with a single
# entry
assert len(har_parser.pages) == 2
page = [p for p in har_parser.pages if p.page_id == 'unknown'][0]
> assert len(page.entries) == 1
tests/test_parser.py:47:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../.pyenv/versions/3.6.0/envs/haralyzer/lib/python3.6/site-packages/cached_property.py:35: in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
haralyzer/assets.py:469: in entries
if all(x.startTime for x in page_entries):
haralyzer/assets.py:469: in <genexpr>
if all(x.startTime for x in page_entries):
../../.pyenv/versions/3.6.0/envs/haralyzer/lib/python3.6/site-packages/cached_property.py:35: in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <[KeyError("'url'") raised in repr()] HarEntry object at 0x10bd98748>
@cached_property
def startTime(self):
try:
return parser.parse(self.raw_entry.get("startedDateTime", ""))
> except parser._parser.ParserError:
E AttributeError: module 'dateutil.parser._parser' has no attribute 'ParserError'
haralyzer/assets.py:681: AttributeError
Oddly, though, the Travis CI build passed... something weird in my environment, maybe?
Here is the full error if you comment out these lines:
if isinstance(changed_args[0], HarParser):
changed_args[1] = HarEntry(changed_args[1])
============================================================================================================== FAILURES ===============================================================================================================
_________________________________________________________________________________________________________ test_match_headers __________________________________________________________________________________________________________
har_data = <function har_data.<locals>.load_doc at 0x000001CEF6530DC0>
def test_match_headers(har_data):
# The HarParser does not work without a full har file, but we only want
# to test a piece, so this initial load is just so we can get the object
# loaded, we don't care about the data in that HAR file.
init_data = har_data('humanssuck.net.har')
har_parser = HarParser(init_data)
raw_headers = har_data('single_entry.har')
# Make sure that bad things happen if we don't give it response/request
test_data = {'captain beefheart':
{'accept': '.*text/html,application/xhtml.*',
'host': 'humanssuck.*',
'accept-encoding': '.*deflate',
},
}
with pytest.raises(ValueError):
_headers_test(har_parser, raw_headers, test_data, True, True)
# TEST THE REGEX FEATURE FIRST #
# These should all be True
test_data = {'request':
{'accept': '.*text/html,application/xhtml.*',
'host': 'humanssuck.*',
'accept-encoding': '.*deflate',
},
'response':
{'server': 'nginx',
'content-type': 'text.*',
'connection': '.*alive',
},
}
> _headers_test(har_parser, raw_headers, test_data, True, True)
tests\test_parser.py:85:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests\test_parser.py:229: in _headers_test
is_match = parser.match_headers(
haralyzer\assets.py:34: in inner
return func(*tuple(changed_args), **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <haralyzer.assets.HarParser object at 0x000001CEF67060A0>
entry = {'cache': {}, 'connection': '80', 'pageref': 'page_3', 'request': {'bodySize': -1, 'cookies': [], 'headers': [{'name':...pt-Encoding', 'value': 'gzip, deflate'}, {'name': 'Connection', 'value': 'keep-alive'}], 'headersSize':
292, ...}, ...}
header_type = 'request', header = 'accept', value = '.*text/html,application/xhtml.*', regex = True
@convert_to_entry
def match_headers(self, entry, header_type, header, value, regex=True):
"""
Function to match headers.
Since the output of headers might use different case, like:
'content-type' vs 'Content-Type'
This function is case-insensitive
:param entry: ``HarEntry`` object to analyze
:param header_type: ``str`` of header type. Valid values:
* 'request'
* 'response'
:param header: ``str`` of the header to search for
:param value: ``str`` of value to search for
:param regex: ``bool`` indicating whether to use regex or exact match
:returns: a ``bool`` indicating whether a match was found
"""
if header_type not in ["request", "response"]:
raise ValueError('Invalid header_type, should be either:\n\n'
'* \'request\'\n*\'response\'')
# TODO - headers are empty in some HAR data.... need fallbacks here
> for h in getattr(entry, header_type).headers:
E AttributeError: 'dict' object has no attribute 'request'
haralyzer\assets.py:84: AttributeError
_______________________________________________________________________________________________________ test_match_request_type _______________________________________________________________________________________________________
har_data = <function har_data.<locals>.load_doc at 0x000001CEF6593040>
def test_match_request_type(har_data):
"""
Tests the ability of the parser to match a request type.
"""
# The HarParser does not work without a full har file, but we only want
# to test a piece, so this initial load is just so we can get the object
# loaded, we don't care about the data in that HAR file.
init_data = har_data('humanssuck.net.har')
har_parser = HarParser(init_data)
entry = har_data('single_entry.har')
# TEST THE REGEX FEATURE FIRST #
> assert har_parser.match_request_type(entry, '.*ET')
tests\test_parser.py:146:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
haralyzer\assets.py:34: in inner
return func(*tuple(changed_args), **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <haralyzer.assets.HarParser object at 0x000001CEF6543CD0>
entry = {'cache': {}, 'connection': '80', 'pageref': 'page_3', 'request': {'bodySize': -1, 'cookies': [], 'headers': [{'name':...pt-Encoding', 'value': 'gzip, deflate'}, {'name': 'Connection', 'value': 'keep-alive'}], 'headersSize':
292, ...}, ...}
request_type = '.*ET', regex = True
@convert_to_entry
def match_request_type(self, entry, request_type, regex=True):
"""
Helper function that returns entries with a request type
matching the given `request_type` argument.
:param entry: ``HarEntry`` object to analyze
:param request_type: ``str`` of request type to match
:param regex: ``bool`` indicating whether to use a regex or string match
"""
if regex:
> return re.search(request_type, entry.request.method,
flags=re.IGNORECASE) is not None
E AttributeError: 'dict' object has no attribute 'request'
haralyzer\assets.py:122: AttributeError
_______________________________________________________________________________________________________ test_match_status_code ________________________________________________________________________________________________________
har_data = <function har_data.<locals>.load_doc at 0x000001CEF6AC1DC0>
def test_match_status_code(har_data):
"""
Tests the ability of the parser to match status codes.
"""
init_data = har_data('humanssuck.net.har')
har_parser = HarParser(init_data)
entry = har_data('single_entry.har')
# TEST THE REGEX FEATURE FIRST #
> assert har_parser.match_status_code(entry, '2.*')
tests\test_parser.py:163:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
haralyzer\assets.py:34: in inner
return func(*tuple(changed_args), **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <haralyzer.assets.HarParser object at 0x000001CEF66A0760>
entry = {'cache': {}, 'connection': '80', 'pageref': 'page_3', 'request': {'bodySize': -1, 'cookies': [], 'headers': [{'name':...pt-Encoding', 'value': 'gzip, deflate'}, {'name': 'Connection', 'value': 'keep-alive'}], 'headersSize':
292, ...}, ...}
status_code = '2.*', regex = True
@convert_to_entry
def match_status_code(self, entry, status_code, regex=True):
"""
Helper function that returns entries with a status code matching
then given `status_code` argument.
NOTE: This is doing a STRING comparison NOT NUMERICAL
:param entry: entry object to analyze
:param status_code: ``str`` of status code to search for
:param regex: ``bool`` indicating whether to use a regex or string match
"""
if regex:
return re.search(status_code,
> str(entry.response.status)) is not None
E AttributeError: 'dict' object has no attribute 'response'
haralyzer\assets.py:159: AttributeError
======================================================================================================= short test summary info =======================================================================================================
FAILED tests/test_parser.py::test_match_headers - AttributeError: 'dict' object has no attribute 'request'
FAILED tests/test_parser.py::test_match_request_type - AttributeError: 'dict' object has no attribute 'request'
FAILED tests/test_parser.py::test_match_status_code - AttributeError: 'dict' object has no attribute 'response'
For your errors, I would check the Python version and the version of dateutil.
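A minimal sketch of a version-independent alternative to catching parser._parser.ParserError: in newer python-dateutil releases ParserError is a subclass of ValueError, and the older release in the first traceback raises a plain ValueError for an empty string, so catching ValueError covers both. The function name and the None fallback are assumptions (the traceback does not show what the original except branch returns), and the optional parse argument exists only so the sketch can be exercised without dateutil installed.

```python
from datetime import datetime
from typing import Callable, Optional


def parse_started_date_time(
    raw_entry: dict,
    parse: Optional[Callable[[str], datetime]] = None,
) -> Optional[datetime]:
    """Return startedDateTime as a datetime, or None if missing/unparseable."""
    if parse is None:
        # Lazy import keeps the sketch usable without python-dateutil
        # installed when a different parse function is supplied.
        from dateutil import parser as date_parser
        parse = date_parser.parse
    try:
        return parse(raw_entry.get("startedDateTime", ""))
    except ValueError:
        # Older dateutil raises a plain ValueError for '' ("String does not
        # contain a date"); newer releases raise ParserError, a ValueError
        # subclass, so this one clause handles both versions.
        return None
```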
@Cyb3r-Jak3 unfortunately I am going to be AFK for at least the next two weeks or so. This is super close to being ready to merge, so I will get back to reviewing ASAP.
@mrname
I have also created these tests, which check that haralyzer works with these browsers and ensure future changes do not break that support. I realize this is a lot of data to add for this PR, but I feel that it is worth it.
@Cyb3r-Jak3 sorry for the delay, and I totally understand the need to fork. I am going to merge and release this soon. Would you like to be a maintainer on this project instead of managing a fork?
@mrname No worries about the delay. I would definitely like to become a maintainer.
@mrname Thank you for giving me permission. Please let me know if you want me to change anything before merging this.
@Cyb3r-Jak3 I will be honest that I won't have time to do another thorough review soon, but I can tell that you have been very careful and put a lot of effort into this 👍. Feel free to merge when you feel comfortable doing so. What is your username on PyPI? I can add you so that you are able to release as well. There is currently no CI around releasing or anything (feel free to add this if you are inspired). I normally just merge, manually bump the version, and then release with setup.py.
Nice work! Thanks for the contribution, as well as volunteering to help maintain this package!
@mrname My username on PyPI is Cyber_Jake. I have built automated deployment systems before and even have a template repo for it, so I will do that. (Also, can you make me an admin on the repo so I can add the correct secrets to create automated deployments?)
@Cyb3r-Jak3 it looks like collaborator is the highest level of access I can give you. Happy to set anything up on the repo level you need, or we could even transfer/fork the repo to your account if you prefer.
@mrname Sounds good. I will merge this, then create a new PR for the CI deployment.
This PR creates a new class, HarEntry(), which is an object for each entry in a page. This makes working with many entries easier because you don't have to parse the JSON for each one. Common values are set as attributes, and the class is easy to extend for further use cases.
Notable Changes:
- HarEntry() class in assets.py
- Request and Response classes in new sub_classes file
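To illustrate the idea of parsing the JSON once and keeping common values as attributes, here is a minimal sketch. The simplified HarEntry and the entries_by_page helper are hypothetical, not the PR's implementation; grouping entries without a pageref under "unknown" follows the behaviour described in test_init_entry_with_no_pageref above.

```python
import json
from collections import defaultdict
from typing import Dict, List


class HarEntry:
    """Hypothetical, simplified HarEntry: common values parsed once."""

    def __init__(self, raw_entry: dict) -> None:
        self.raw_entry = raw_entry
        request = raw_entry.get("request", {})
        response = raw_entry.get("response", {})
        self.pageref = raw_entry.get("pageref", "unknown")
        self.method = request.get("method")
        self.url = request.get("url")
        self.status = response.get("status")

    def __repr__(self) -> str:
        # Uses the .get()-based attributes, so a missing url cannot raise here.
        return f"HarEntry({self.method} {self.url} -> {self.status})"


def entries_by_page(har_text: str) -> Dict[str, List[HarEntry]]:
    """Parse the HAR JSON once and group the wrapped entries by pageref."""
    log = json.loads(har_text)["log"]
    pages: Dict[str, List[HarEntry]] = defaultdict(list)
    for raw in log.get("entries", []):
        entry = HarEntry(raw)
        pages[entry.pageref].append(entry)
    return dict(pages)
```

Because __repr__ only reads values fetched with .get(), it does not fail for entries without a url, unlike the KeyError('url') raised in repr() visible in the first traceback.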