This repository has been archived by the owner on Dec 28, 2020. It is now read-only.

Updated docs for the analysis classes
palewire committed Jul 22, 2014
1 parent 00b5def commit 29b9102
Showing 3 changed files with 50 additions and 9 deletions.
1 change: 1 addition & 0 deletions bin/storytracker-links2csv
@@ -17,6 +17,7 @@ kwargs, args = p.parse_args()

if sys.stdin:
    o = storytracker.ArchivedURL(None, None, sys.stdin.read())
    print o
    f = six.BytesIO()
    f = o.write_hyperlinks_csv_to_file(f)
    sys.stdout.write(f.getvalue())
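
The script above hands an in-memory bytes buffer to `write_hyperlinks_csv_to_file` and then copies the buffer to standard output. A minimal sketch of that "write CSV into a provided file object and return it" pattern, using only the standard library, might look like the following. The function name `write_rows_csv_to_file` and the sample rows are hypothetical stand-ins; this is not storytracker's actual implementation, and the column layout is an assumption.

```python
import csv
import io


def write_rows_csv_to_file(rows, file, encoding="utf-8"):
    """Write rows as CSV into a binary file object and return that object.

    Hypothetical stand-in for the documented write_hyperlinks_csv_to_file;
    it is not storytracker's actual implementation.
    """
    # csv.writer needs a text stream, so wrap the binary file temporarily.
    text = io.TextIOWrapper(file, encoding=encoding, newline="")
    writer = csv.writer(text)
    writer.writerows(rows)
    text.flush()
    # Detach so the caller's file object stays open once the wrapper is dropped.
    text.detach()
    return file


# Made-up rows standing in for extracted hyperlinks: href, string, index.
rows = [
    ["http://www.example.com/a", "First story", 0],
    ["http://www.example.com/b", "Second story", 1],
]
buffer = write_rows_csv_to_file(rows, io.BytesIO())
csv_bytes = buffer.getvalue()
```

Returning the same file object lets the caller chain the call or, as in the script, read the buffer back with `getvalue()`.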
57 changes: 49 additions & 8 deletions docs/analysis.rst
@@ -1,14 +1,15 @@
Analysis
========


ArchivedURL
-----------

A URL's archived HTML with tools for analysis.

.. py:class:: ArchivedURL(url, timestamp, html)
**Initialization arguments**

.. py:attribute:: url
The URL archived
@@ -21,6 +22,8 @@ A URL's archived HTML with tools for analysis.
The HTML archived

**Other attributes**

.. py:attribute:: gzip
Returns the archived HTML as a stream of gzipped data
@@ -37,6 +40,17 @@ A URL's archived HTML with tools for analysis.
A list of all the hyperlinks extracted from the HTML

.. py:attribute:: images
A list of all the images extracted from the HTML

**Output methods**

.. py:method:: write_hyperlinks_csv_to_file(file, encoding="utf-8")
Returns the provided file object with a ready-to-serve CSV list of
all hyperlinks extracted from the HTML.

.. py:method:: write_gzip_to_directory(path)
Writes gzipped HTML data to a file in the provided directory path
@@ -58,7 +72,6 @@ Example usage:
>>> obj.timestamp
datetime.datetime(2014, 7, 6, 16, 31, 57, 697250)
ArchivedURLSet
--------------

@@ -82,22 +95,50 @@ Example usage:
>>> obj_list[1].timestamp
datetime.datetime(2014, 7, 6, 16, 31, 57, 697250)
Hyperlink
---------

A hyperlink extracted from an :py:class:`ArchivedURL` object.

.. py:class:: Hyperlink
.. py:class:: Hyperlink(href, string, index, images=[])
**Initialization arguments**

.. py:attribute:: href
The URL the hyperlink references

.. py:attribute:: string
The string contents of the anchor tag

.. py:attribute:: index
The index value of the link's order within its source HTML. Starts counting at zero.

.. py:attribute:: images
A list of the :py:class:`Image` objects extracted from the HTML

.. py:attribute:: contents
**Other attributes**

The contents of the anchor tag
.. py:attribute:: __csv__
Returns a list of values ready to be written to a CSV file object

.. py:attribute:: domain
The domain of the href

.. py:attribute:: href
Image
-----

The URL the hyperlink references
.. py:class:: Image(src)
An image extracted from an archived URL.

**Initialization arguments**

.. py:attribute:: src
The ``src`` attribute of the image tag
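
To make the documented `Hyperlink` and `Image` attributes concrete, here is a minimal standard-library sketch of classes with the same constructor arguments. It is an illustration only, not the library's code: deriving `domain` with `urllib.parse.urlparse` and the column order returned by `__csv__` are both assumptions.

```python
from urllib.parse import urlparse


class Image:
    """An image extracted from archived HTML (illustrative stand-in)."""

    def __init__(self, src):
        self.src = src


class Hyperlink:
    """A hyperlink extracted from archived HTML (illustrative stand-in)."""

    def __init__(self, href, string, index, images=None):
        self.href = href
        self.string = string
        self.index = index
        # Use None instead of a mutable [] default so instances never share a list.
        self.images = images or []

    @property
    def domain(self):
        # Assumption: the domain is the network location part of the href.
        return urlparse(self.href).netloc

    @property
    def __csv__(self):
        # Assumption: this column order is invented for the example.
        return [self.href, self.domain, self.string, self.index, len(self.images)]


link = Hyperlink(
    "http://www.latimes.com/sports/",
    "Sports",
    0,
    [Image("http://www.latimes.com/logo.png")],
)
row = link.__csv__
```

The documented signature uses `images=[]`; the sketch swaps in `None` because a mutable default argument is shared across calls in Python.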
1 change: 0 additions & 1 deletion docs/archiving.rst
@@ -74,7 +74,6 @@ Example usage:
# Which of course can be piped into other commands like anything else
$ storytracker-archive http://www.latimes.com -cm | grep lakers
get
---
