Merge de9b811 into 3680f4f
Cyb3r-Jak3 committed Oct 11, 2020
2 parents 3680f4f + de9b811 commit d05187f
Showing 22 changed files with 21,576 additions and 91 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -8,6 +8,7 @@ __pycache__/
# Distribution / packaging
.Python
env/
.venv
build/
develop-eggs/
dist/
@@ -57,3 +58,6 @@ target/

# Vim stuff
.ropeproject/

# Pycharm settings
.idea
6 changes: 5 additions & 1 deletion .travis.yml
@@ -6,9 +6,13 @@ matrix:
python:
- "2.7"
- "3.6"
- "3.7"
- "3.8"
- "pypy"
- "pypy3.6-7.3.1"
install:
- "pip install ."
- "pip install -r requirements_dev.txt"
script: py.test --cov haralyzer tests/
script: py.test --cov haralyzer tests/ -vv
after_success:
- coveralls
102 changes: 100 additions & 2 deletions README.rst
@@ -19,9 +19,11 @@ A Python Framework For Using HAR Files To Analyze Web Pages.
Overview
--------

The haralyzer module contains two classes for analyzing web pages based
The haralyzer module contains three classes for analyzing web pages based
on a HAR file. ``HarParser()`` represents a full file (which might have
multiple pages), and ``HarPage()`` represents a single page from said file.
multiple pages). ``HarPage()`` represents a single page from said file.
``HarEntry()`` represents an entry in a ``HarPage()``, and there are multiple entries per page.
Each ``HarEntry`` has a request and a response that contain items such as the headers, status code, and timings.
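
The file, page, and entry relationship described above can be sketched with plain dictionaries. This is a minimal illustration that does not use haralyzer: the sample data and the ``entries_by_page`` helper are hypothetical. It assumes only the standard HAR layout (``log.pages``, ``log.entries``, and a ``pageref`` on each entry) and mirrors the README's convention of filing entries without a page under ``unknown``.

```python
from collections import defaultdict

# Hypothetical, hand-written HAR data; a real file holds many more fields.
har_data = {
    "log": {
        "pages": [{"id": "page_1"}, {"id": "page_2"}],
        "entries": [
            {"pageref": "page_1", "request": {"url": "http://example.com/"}},
            {"pageref": "page_1", "request": {"url": "http://example.com/a.css"}},
            {"pageref": "page_2", "request": {"url": "http://example.com/next"}},
            # No pageref: this entry is grouped under "unknown".
            {"request": {"url": "http://example.com/orphan.js"}},
        ],
    }
}


def entries_by_page(har):
    """Group raw HAR entries by the page they reference (illustrative helper)."""
    grouped = defaultdict(list)
    for entry in har["log"]["entries"]:
        grouped[entry.get("pageref", "unknown")].append(entry)
    return dict(grouped)


grouped = entries_by_page(har_data)
for page_id, entries in grouped.items():
    print(page_id, [e["request"]["url"] for e in entries])
```

In the library itself this grouping is what ``HarPage`` and ``HarEntry`` are for; the sketch only shows the shape of the underlying data.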

``HarParser`` has a couple of helpful methods for analyzing single entries
from a HAR file, but most of the pertinent functions are inside of the page
@@ -119,6 +121,102 @@ to a page, an additional page will be created with an ID of `unknown`. This
not have attributes for things like time to first byte or page load, and will
return `None`.

HarEntry
++++++++

The ``HarEntry()`` object contains useful information for each request. Its main purpose is ease of use, as it exposes many attributes.
Each entry also contains a ``Request()`` and a ``Response()``, which are styled after the requests library::

import json
from haralyzer import HarPage

with open("humanssuck.net.har", 'r') as f:
    har_page = HarPage('page_3', har_data=json.loads(f.read()))

### GET BASIC INFO
print(har_page.hostname)
# 'humanssuck.net'
print(har_page.url)
# 'http://humanssuck.net/'

### GET LIST OF ENTRIES
print(har_page.entries)
# [HarEntry for http://humanssuck.net/, HarEntry for http://humanssuck.net/test.css, ...]

### WORKING WITH ENTRIES
single_entry = har_page.entries[0]

### REQUEST HEADERS
print(single_entry.request.headers)
# [{'name': 'Host', 'value': 'humanssuck.net'}, {'name': 'User-Agent', 'value': 'Mozilla/5.0 (X11; Linux i686 on x86_64; rv:25.0) Gecko/20100101 Firefox/25.0'}, ...]

### RESPONSE HEADERS
print(single_entry.response.headers)
# [{'name': 'Server', 'value': 'nginx'}, {'name': 'Date', 'value': 'Mon, 23 Feb 2015 03:28:12 GMT'}, ...]

### RESPONSE CODE
print(single_entry.response.status)
# 200

# GET THE VALUE OF A REQUEST OR RESPONSE HEADER
print(single_entry.request.get_header_value("accept"))
# text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

# ALL ATTRIBUTES OF AN ENTRY

single_entry.cache -> Dictionary of cached content
single_entry.cookies -> List of combined cookies for request and response
single_entry.headers -> List of combined headers for request and response
single_entry.pageref -> String of the pageref
single_entry.port -> Integer of the port number for the server
single_entry.request -> Request object
single_entry.response -> Response object
single_entry.secure -> Bool if secure is set
single_entry.serverAddress -> String of the server IP
single_entry.startTime -> Datetime of the start time
single_entry.time -> Integer of total time for entry
single_entry.timings -> Dictionary of the timings for a request
single_entry.url -> String of the request url

# ALL ATTRIBUTES OF A REQUEST

single_entry.request.accept -> String of the ``Accept`` header
single_entry.request.bodySize -> Integer of the body size for the request
single_entry.request.cacheControl -> String of the ``Cache-Control`` header
single_entry.request.cookies -> List of cookies
single_entry.request.encoding -> String of the ``Accept-Encoding`` header
single_entry.request.headers -> List of headers
single_entry.request.headersSize -> Integer of the size of the headers
single_entry.request.host -> String of the ``Host`` header
single_entry.request.httpVersion -> String of the http version used
single_entry.request.language -> String of the ``Accept-Language`` header
single_entry.request.method -> String of the HTTP method used
single_entry.request.queryString -> List of query string parameters
single_entry.request.url -> String of the URL
single_entry.request.userAgent -> String of the User-Agent

# ALL ATTRIBUTES OF A RESPONSE
single_entry.response.bodySize -> Integer of the body size for the response
single_entry.response.cacheControl -> String of the ``Cache-Control`` header
single_entry.response.contentSecurityPolicy -> String of the ``Content-Security-Policy`` header
single_entry.response.contentSize -> Integer of the content size
single_entry.response.contentType -> String of the ``content-type`` header
single_entry.response.date -> String of the ``date`` header
single_entry.response.headers -> List of headers
single_entry.response.headersSize -> Integer of the size of the headers
single_entry.response.httpVersion -> String of the http version used
single_entry.response.lastModified -> String of the ``last-modified`` header
single_entry.response.mimeType -> String of the mimeType of the content
single_entry.response.redirectURL -> String of the redirect URL or None
single_entry.response.status -> Integer of the HTTP status code
single_entry.response.statusText -> String of HTTP status
single_entry.response.text -> String of content received

# You can still access items like a dictionary.
print(single_entry["connection"])
# "80"
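
The two access styles shown above (a header lookup by name and dictionary-style indexing) can be sketched without haralyzer. ``EntrySketch`` below is a hypothetical stand-in, not the library's ``HarEntry``, and its method body is an assumption about how such a lookup might work. It relies only on the header format printed earlier, a list of ``{'name': ..., 'value': ...}`` dictionaries, and on the fact that HTTP header names are case-insensitive.

```python
class EntrySketch:
    """Hypothetical wrapper around a raw HAR entry; not haralyzer's HarEntry."""

    def __init__(self, raw):
        self.raw = raw  # the raw HAR entry dictionary

    def __getitem__(self, key):
        # Dictionary-style access falls through to the raw HAR data.
        return self.raw[key]

    def get_header_value(self, name):
        # Compare lowercased names because HTTP header names are case-insensitive.
        for header in self.raw["request"]["headers"]:
            if header["name"].lower() == name.lower():
                return header["value"]
        return None  # header not present


# Hand-written sample entry (assumed values, trimmed to the fields used here).
raw_entry = {
    "connection": "80",
    "request": {
        "headers": [
            {"name": "Host", "value": "humanssuck.net"},
            {"name": "Accept", "value": "text/html"},
        ]
    },
}

entry = EntrySketch(raw_entry)
print(entry.get_header_value("accept"))
print(entry["connection"])
```

Keeping ``__getitem__`` as a thin pass-through is one way to add attribute-style helpers without breaking code that already indexes entries as dictionaries.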


MultiHarParser
++++++++++++++

5 changes: 4 additions & 1 deletion haralyzer/__init__.py
@@ -1,7 +1,10 @@
"""
Module for analyzing web pages using HAR files
"""
from .assets import HarParser, HarPage
from .assets import HarParser, HarPage, HarEntry


from .multihar import MultiHarParser


__all__ = ["HarPage", "HarParser", "MultiHarParser", "HarEntry"]
