Creation of HarEntry class #39

Merged Dec 19, 2020 (32 commits; the diff below shows changes from 20 commits)

Commits
b5860a2
First Big Change with new HarEntry class
Cyb3r-Jak3 Aug 30, 2020
1c8e3d9
More changes to sub_classes
Cyb3r-Jak3 Aug 30, 2020
ea82283
Fixed last test
Cyb3r-Jak3 Aug 30, 2020
ce1a799
Merge pull request #1 from mrname/master
Cyb3r-Jak3 Aug 30, 2020
ea0db7c
More work on 2.7
Cyb3r-Jak3 Aug 30, 2020
8e40ccf
Working on 2.7 again
Cyb3r-Jak3 Aug 30, 2020
1391564
Added documentation and added a HarEntry status attribute
Cyb3r-Jak3 Aug 31, 2020
61c14e5
Had to remove all typing for 2.7 :(
Cyb3r-Jak3 Aug 31, 2020
651339a
Added tests for entry
Cyb3r-Jak3 Aug 31, 2020
71dc482
Found more typing
Cyb3r-Jak3 Aug 31, 2020
2c2d5e4
Seeing if this works for 2.7
Cyb3r-Jak3 Aug 31, 2020
6af96ef
Updated doc string and added test for request.cookies and entry heade…
Cyb3r-Jak3 Aug 31, 2020
858579f
Clean Up of Code:
Cyb3r-Jak3 Sep 5, 2020
3529e6e
Added more methods to mimic a dict
Cyb3r-Jak3 Sep 5, 2020
9bcc2db
Fixed tests for 2.7
Cyb3r-Jak3 Sep 5, 2020
0418234
Updated the coverage
Cyb3r-Jak3 Sep 5, 2020
bc56274
Better MimicDict and New Mixin
Cyb3r-Jak3 Sep 6, 2020
bca31b4
Fixes 2.7 Issues:
Cyb3r-Jak3 Sep 6, 2020
cc15335
More changes for 2.7
Cyb3r-Jak3 Sep 6, 2020
431ff84
Couple of tweaks to improve tests
Cyb3r-Jak3 Sep 6, 2020
4ee61e0
Updated Changes
Cyb3r-Jak3 Sep 12, 2020
edfdff5
Added convert_to_entry decorator
Cyb3r-Jak3 Sep 12, 2020
8fa1dd4
Added Tests for Firefox and Chrome to make sure future changes aren't…
Cyb3r-Jak3 Sep 25, 2020
f1c5461
Working on travis debugging
Cyb3r-Jak3 Sep 25, 2020
487c24f
Trying out tox
Cyb3r-Jak3 Sep 25, 2020
999df88
Tox should be a seperate PR
Cyb3r-Jak3 Sep 25, 2020
1547658
Fixed the travis tests
Cyb3r-Jak3 Sep 25, 2020
a6dfee3
Makes HarPage oject iterable
Cyb3r-Jak3 Oct 11, 2020
b6a8df2
Added test for next to up the coverage
Cyb3r-Jak3 Oct 11, 2020
de9b811
Adds in pypy for testing
Cyb3r-Jak3 Oct 11, 2020
c2e1ce9
Working with python 2
Cyb3r-Jak3 Oct 11, 2020
07e866e
Drops pypy from testing. The version installed on travis too out of date
Cyb3r-Jak3 Oct 11, 2020
4 changes: 4 additions & 0 deletions .gitignore
@@ -8,6 +8,7 @@ __pycache__/
# Distribution / packaging
.Python
env/
.venv
build/
develop-eggs/
dist/
@@ -57,3 +58,6 @@ target/

# Vim stuff
.ropeproject/

# Pycharm settings
.idea
2 changes: 2 additions & 0 deletions .travis.yml
@@ -6,6 +6,8 @@ matrix:
python:
- "2.7"
- "3.6"
- "3.7"
- "3.8"
install:
- "pip install ."
- "pip install -r requirements_dev.txt"
102 changes: 100 additions & 2 deletions README.rst
@@ -19,9 +19,11 @@ A Python Framework For Using HAR Files To Analyze Web Pages.
Overview
--------

The haralyzer module contains two classes for analyzing web pages based
The haralyzer module contains three classes for analyzing web pages based
on a HAR file. ``HarParser()`` represents a full file (which might have
multiple pages), and ``HarPage()`` represents a single page from said file.
multiple pages). ``HarPage()`` represents a single page from said file.
``HarEntry()`` represents an entry in a ``HarPage()``; there are multiple entries per page.
Each ``HarEntry`` has a request and a response that contain items such as the headers, status code, and timings.

``HarParser`` has a couple of helpful methods for analyzing single entries
from a HAR file, but most of the pertinent functions are inside of the page
@@ -119,6 +121,102 @@ to a page, an additional page will be created with an ID of `unknown`. This
not have attributes for things like time to first byte or page load, and will
return `None`.

HarEntry
++++++++

The ``HarEntry()`` object contains useful information for each request. Its main purpose is ease of use, as it exposes many attributes.
Each entry also contains a ``Request()`` and a ``Response()``, which are styled after the requests library::

import json
from haralyzer import HarPage

with open("humanssuck.net.har", 'r') as f:
har_page = HarPage('page_3', har_data=json.loads(f.read()))

### GET BASIC INFO
print(har_page.hostname)
# 'humanssuck.net'
print(har_page.url)
# 'http://humanssuck.net/'

### GET LIST OF ENTRIES
print(har_page.entries)
# [HarEntry for http://humanssuck.net/, HarEntry for http://humanssuck.net/test.css, ...]

### WORKING WITH ENTRIES
single_entry = har_page.entries[0]

### REQUEST HEADERS
print(single_entry.request.headers)
# [{'name': 'Host', 'value': 'humanssuck.net'}, {'name': 'User-Agent', 'value': 'Mozilla/5.0 (X11; Linux i686 on x86_64; rv:25.0) Gecko/20100101 Firefox/25.0'}, ...]

### RESPONSE HEADERS
print(single_entry.response.headers)
# [{'name': 'Server', 'value': 'nginx'}, {'name': 'Date', 'value': 'Mon, 23 Feb 2015 03:28:12 GMT'}, ...]

### RESPONSE CODE
print(single_entry.response.status)
# 200

# GET THE VALUE OF A REQUEST OR RESPONSE HEADER
print(single_entry.request.get_header_value("accept"))
# text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

# ALL ATTRIBUTES OF AN ENTRY

single_entry.cache -> Dictionary of cached content
single_entry.cookies -> List of combined cookies for request and response
single_entry.headers -> List of combined headers for request and response
single_entry.pageref -> String of the pageref
single_entry.port -> Integer of the port number for the server
single_entry.request -> Request object
single_entry.response -> Response object
single_entry.secure -> Bool if secure is set
single_entry.serverAddress -> String of the server IP
single_entry.startTime -> Datetime of the start time
single_entry.time -> Integer of total time for entry
single_entry.timings -> Dictionary of the timings for a request
single_entry.url -> String of the request url

# ALL ATTRIBUTES OF A REQUEST

single_entry.request.accept -> String of the ``Accept`` header
single_entry.request.bodySize -> Integer of the body size for the request
single_entry.request.cacheControl -> String of the ``Cache-Control`` header
single_entry.request.cookies -> List of cookies
single_entry.request.encoding -> String of the ``Accept-Encoding`` header
single_entry.request.headers -> List of headers
single_entry.request.headersSize -> Integer of the size of the headers
single_entry.request.host -> String of the ``Host`` header
single_entry.request.httpVersion -> String of the http version used
single_entry.request.language -> String of the ``Accept-Language`` header
single_entry.request.method -> String of the HTTP method used
single_entry.request.queryString -> List of query string used
single_entry.request.url -> String of the URL
single_entry.request.userAgent -> String of the User-Agent

# ALL ATTRIBUTES OF A RESPONSE
single_entry.response.bodySize -> Integer of the body size for the response
single_entry.response.cacheControl -> String of the ``Cache-Control`` header
single_entry.response.contentSecurityPolicy -> String of the ``Content-Security-Policy`` header
single_entry.response.contentSize -> Integer of the content size
single_entry.response.contentType -> String of the ``content-type`` header
single_entry.response.date -> String of the ``date`` header
single_entry.response.headers -> List of headers
single_entry.response.headersSize -> Integer of the size of the headers
single_entry.response.httpVersion -> String of the http version used
single_entry.response.lastModified -> String of the ``last-modified`` header
single_entry.response.mimeType -> String of the mimeType of the content
single_entry.response.redirectURL -> String of the redirect URL or None
single_entry.response.status -> Integer of the HTTP status code
single_entry.response.statusText -> String of HTTP status
single_entry.response.text -> String of content received

** You are still able to access items like a dictionary.
print(single_entry["connection"])
# "80"
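
The combination of dictionary-style access and a case-insensitive header lookup described above can be illustrated with a small standalone sketch. This is not haralyzer's actual implementation: the class name ``DictAccessSketch`` and its ``raw`` attribute are hypothetical, chosen only for illustration, and the sample data is made up. It avoids type hints so that it also runs on Python 2.7, in keeping with the versions tested in this PR.

```python
class DictAccessSketch(object):
    """Hypothetical stand-in for an entry-like object (not haralyzer's class)."""

    def __init__(self, raw):
        # Assumption: `raw` is the plain dict parsed from the HAR JSON.
        self.raw = raw

    def __getitem__(self, key):
        # Allows obj["key"], like a dictionary.
        return self.raw[key]

    def __contains__(self, key):
        # Allows "key" in obj.
        return key in self.raw

    def __len__(self):
        return len(self.raw)

    def get(self, key, default=None):
        # Mirrors dict.get for keys that may be absent.
        return self.raw.get(key, default)

    def get_header_value(self, name):
        # HAR stores headers as a list of {"name": ..., "value": ...} dicts,
        # so the lookup scans the list and compares names case-insensitively.
        for header in self.raw.get("headers", []):
            if header["name"].lower() == name.lower():
                return header["value"]
        return None


request = DictAccessSketch({
    "method": "GET",
    "headers": [
        {"name": "Host", "value": "humanssuck.net"},
        {"name": "Accept", "value": "text/html"},
    ],
})

print(request["method"])                   # GET
print(request.get_header_value("accept"))  # text/html
```

Wrapping the raw dictionary rather than subclassing ``dict`` is one possible design: existing code that indexes entries by key keeps working, while named attributes can be layered on top.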


MultiHarParser
++++++++++++++

5 changes: 4 additions & 1 deletion haralyzer/__init__.py
@@ -1,7 +1,10 @@
"""
Module for analyzing web pages using HAR files
"""
from .assets import HarParser, HarPage
from .assets import HarParser, HarPage, HarEntry


from .multihar import MultiHarParser


__all__ = ["HarPage", "HarParser", "MultiHarParser", "HarEntry"]