Updated docs for the analysis classes

palewire · Jul 22, 2014 · 29b9102
1 parent 00b5def
Showing 3 changed files with 50 additions and 9 deletions.
diff --git a/bin/storytracker-links2csv b/bin/storytracker-links2csv
@@ -17,6 +17,7 @@ kwargs, args = p.parse_args()
 
 if sys.stdin:
     o = storytracker.ArchivedURL(None, None, sys.stdin.read())
+    print o
     f = six.BytesIO()
     f = o.write_hyperlinks_csv_to_file(f)
     sys.stdout.write(f.getvalue())

diff --git a/docs/analysis.rst b/docs/analysis.rst
@@ -1,14 +1,15 @@
 Analysis
 ========
 
-
 ArchivedURL
 -----------
 
 An URL's archived HTML with tools for analysis.
 
 .. py:class:: ArchivedURL(url, timestamp, html)
 
+    **Initialization arguments**
+
     .. py:attribute:: url
 
         The url archived
@@ -21,6 +22,8 @@ An URL's archived HTML with tools for analysis.
 
         The HTML archived
 
+    **Other attributes**
+
     .. py:attribute:: gzip
 
         Returns the archived HTML as a stream of gzipped data
@@ -37,6 +40,17 @@ An URL's archived HTML with tools for analysis.
 
         A list of all the hyperlinks extracted from the HTML
 
+    .. py:attribute:: images
+
+        A list of all the images extracted from the HTML
+
+    **Output methods**
+
+    .. py:method:: write_hyperlinks_csv_to_file(file, encoding="utf-8")
+
+        Returns the provided file object with a ready-to-serve CSV list of
+        all hyperlinks extracted from the HTML.
+
     .. py:method:: write_gzip_to_directory(path)
 
         Writes gzipped HTML data to a file in the provided directory path
@@ -58,7 +72,6 @@ Example usage:
     >>> obj.timestamp
     datetime.datetime(2014, 7, 6, 16, 31, 57, 697250)
 
-
 ArchivedURLSet
 --------------
 
@@ -82,22 +95,50 @@ Example usage:
     >>> obj_list[1].timestamp
     datetime.datetime(2014, 7, 6, 16, 31, 57, 697250)
 
-
 Hyperlink
 ---------
 
 A hyperlink extracted from an :py:class:`ArchivedURL` object.
 
-.. py:class:: Hyperlink
+.. py:class:: Hyperlink(href, string, index, images=[])
+
+    **Initialization arguments**
+
+    .. py:attribute:: href
+
+        The URL the hyperlink references
+
+    .. py:attribute:: string
+
+        The string contents of the anchor tag
+
+    .. py:attribute:: index
+
+        The index of the link's position within its source HTML. Counting starts at zero.
+
+    .. py:attribute:: images
+
+        A list of the :py:class:`Image` objects extracted from the HTML
 
-    .. py:attribute:: contents
+    **Other attributes**
 
-        The contents of the anchor tag
+    .. py:attribute:: __csv__
+
+        Returns a list of values ready to be written to a CSV file object
 
     .. py:attribute:: domain
 
         The domain of the href
 
-    .. py:attribute:: href
+Image
+-----
 
-        The URL the hyperlink references
+.. py:class:: Image(src)
+
+    An image extracted from an archived URL.
+
+    **Initialization arguments**
+
+    .. py:attribute:: src
+
+        The ``src`` attribute of the image tag
diff --git a/docs/archiving.rst b/docs/archiving.rst
@@ -74,7 +74,6 @@ Example usage:
     # Which of course can be piped into other commands like anything else
     $ storytracker-archive http://www.latimes.com -cm | grep lakers
 
-
 get
 ---
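
The ``Hyperlink`` and ``Image`` classes documented in this commit, together with the CSV output method, can be illustrated with a minimal Python 3 sketch. The classes below are hypothetical stand-ins written only for illustration; they are not the storytracker implementation. The column order returned by ``__csv__``, the header row, and the module-level ``write_hyperlinks_csv_to_file`` helper are all assumptions. The sketch also uses ``images=None`` rather than the documented ``images=[]`` default so that instances do not share one list.

```python
import csv
import io
from urllib.parse import urlparse


class Image:
    """Hypothetical stand-in for the documented Image(src) class."""

    def __init__(self, src):
        self.src = src


class Hyperlink:
    """Hypothetical stand-in for Hyperlink(href, string, index, images)."""

    def __init__(self, href, string, index, images=None):
        self.href = href
        self.string = string
        self.index = index
        # Avoid a shared mutable default by creating a fresh list per instance.
        self.images = images if images is not None else []

    @property
    def domain(self):
        # The network location of the href, e.g. "www.latimes.com".
        return urlparse(self.href).netloc

    @property
    def __csv__(self):
        # Assumed column order; the real library may differ.
        return [self.href, self.domain, self.string, self.index, len(self.images)]


def write_hyperlinks_csv_to_file(hyperlinks, file):
    """Write one CSV row per hyperlink and return the same file object."""
    writer = csv.writer(file)
    writer.writerow(["href", "domain", "string", "index", "image_count"])
    for link in hyperlinks:
        writer.writerow(link.__csv__)
    return file


links = [
    Hyperlink("http://www.latimes.com/sports/", "Sports", 0),
    Hyperlink(
        "http://example.com/a.html",
        "Example",
        1,
        images=[Image("http://example.com/a.png")],
    ),
]
buffer = write_hyperlinks_csv_to_file(links, io.StringIO())
print(buffer.getvalue())
```

Returning the file object mirrors the documented behavior of ``write_hyperlinks_csv_to_file``, which lets a caller pass in an in-memory buffer and read the result back with ``getvalue()``.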