Update documentation to explain titles?, images? and so on access patterns.
lethain committed on Nov 24, 2012: commit 8b93a3b4d738c473c5f77948f35caa23855bc6ee (1 parent: e847d2d)
Showing 4 changed files with 34 additions and 2 deletions.
  1. +3 −1 .gitignore
  2. +1 −0 CHANGES.txt
  3. +28 −0 README.rst
  4. +2 −1 setup.py
.gitignore
@@ -3,4 +3,6 @@ env/
*~
.#*
*#
-build/
+build/
+dist/
+MANIFEST
CHANGES.txt
@@ -1,3 +1,4 @@
+v0.1.2, 11/23/2012 -- Add html5lib to dependencies to ensure parsing is possible.
v0.1.2, 11/23/2012 -- Update setup.py dependencies for saner installation, again.
v0.1.1, 11/23/2012 -- Update setup.py dependencies for saner installation.
v0.1, 11/17/2012 -- Initial release.
README.rst
@@ -31,6 +31,7 @@ An extremely simple example of using `extraction` is::
>>> extracted.title
>>> "Social Hierarchies in Engineering Organizations - Irrational Exuberance"
>>> print extracted.title, extracted.description, extracted.image, extracted.url
+ >>> print extracted.titles, extracted.descriptions, extracted.images, extracted.urls
Note that `source_url` is optional in extract, but is recommended
as it makes it possible to rewrite relative urls and image urls
@@ -190,6 +191,33 @@ The simplest possible example is the "Hello World" example from above::
>>> "Social Hierarchies in Engineering Organizations - Irrational Exuberance"
>>> print extracted.title, extracted.description, extracted.image, extracted.url
+You can get the best title, description and such out of an `Extracted`
+instance (which are returned by `Extractor.extract`) by::
+
+ >>> print extracted.title
+ >>> print extracted.description
+ >>> print extracted.url
+ >>> print extracted.image
+ >>> print extracted.feed
+
+You can get the full list of extracted values using the plural versions::
+
+ >>> print extracted.titles
+ >>> print extracted.descriptions
+ >>> print extracted.urls
+ >>> print extracted.images
+ >>> print extracted.feeds
+
+If you're looking for data which is being extracted but doesn't fall into
+one of those categories (perhaps using a custom technique), then
+take a look at the `Extracted._unexpected_values` dictionary::
+
+ >>> print extracted._unexpected_values
+
+Any type of metadata which isn't anticipated is stored there
+(look at `Subclassing Extracted to Extract New Types of Data`
+if this is something you're running into frequently).
+
Using Custom Techniques and Changing Technique Ordering
-------------------------------------------------------
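The README hunk above describes two access patterns on `Extracted`. The singular attributes (`title`, `description`, `image`, `url`, `feed`) give the best value, and the plural attributes (`titles`, `descriptions`, `images`, `urls`, `feeds`) give every value found. Anything unanticipated lands in `_unexpected_values`. The sketch below is a minimal, hypothetical stand-in class (`ExtractedSketch` is not part of the `extraction` package) that illustrates that pattern. It assumes the "best" value is simply the first entry of the corresponding list. The sample values, including the `authors` category, are made up for illustration.

```python
# Hypothetical stand-in for extraction's Extracted class, used only to
# illustrate the singular/plural access pattern; not the library's code.
# Assumption: the "best" value is the first entry of each plural list.


class ExtractedSketch(object):
    # Plural attribute names for the categories the README lists.
    KNOWN = ("titles", "descriptions", "images", "urls", "feeds")

    def __init__(self, **values):
        # Store each known category as a list (empty if not supplied).
        for name in self.KNOWN:
            setattr(self, name, list(values.pop(name, [])))
        # Whatever is left over was not anticipated; keep it as-is.
        self._unexpected_values = values

    @staticmethod
    def _first(items):
        # The "best" value, or None when nothing was extracted.
        return items[0] if items else None

    @property
    def title(self):
        return self._first(self.titles)

    @property
    def description(self):
        return self._first(self.descriptions)

    @property
    def image(self):
        return self._first(self.images)

    @property
    def url(self):
        return self._first(self.urls)

    @property
    def feed(self):
        return self._first(self.feeds)


# Hypothetical values, as if several techniques had each found a title
# and a custom technique had reported an "authors" category.
extracted = ExtractedSketch(
    titles=[
        "Social Hierarchies in Engineering Organizations - Irrational Exuberance",
        "Irrational Exuberance",
    ],
    authors=["Example Author"],
)

print(extracted.title)               # best title: the first one found
print(extracted.titles)              # every title found
print(extracted.image)               # None: no images were supplied
print(extracted._unexpected_values)  # {'authors': ['Example Author']}
```

In this sketch, returning `None` for an empty list keeps the singular attributes safe to print for pages that lack a given kind of metadata. Whether the real `Extracted` behaves the same way is an assumption, not something this commit states.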
setup.py
@@ -2,7 +2,7 @@
setup(
name='extraction',
- version='0.1.2',
+ version='0.1.3',
author='Will Larson',
author_email='lethain@gmail.com',
packages=['extraction', 'extraction.tests', 'extraction.examples'],
@@ -12,5 +12,6 @@
long_description=open('README.rst').read(),
install_requires=[
"beautifulsoup4 >= 4.1.3",
+ "html5lib",
],
)
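The `setup.py` and `CHANGES.txt` hunks add `html5lib` to `install_requires` so that parsing is possible after a plain install. The sketch below shows one way a caller could check that the html5lib parser is importable before asking BeautifulSoup for it. `pick_parser` is a hypothetical helper, not part of `extraction` or `beautifulsoup4`. Falling back to the standard-library `"html.parser"` is an assumption about what a caller would want. The code uses `importlib.util.find_spec`, so it needs Python 3.4 or later.

```python
import importlib.util


def pick_parser(find_spec=importlib.util.find_spec):
    """Return a BeautifulSoup parser name, preferring html5lib.

    Hypothetical helper: returns "html5lib" if the html5lib package can
    be found, otherwise Python's built-in "html.parser".
    """
    # find_spec returns None when no top-level module by that name exists.
    if find_spec("html5lib") is not None:
        return "html5lib"
    return "html.parser"


print(pick_parser())
```

The returned name can be passed as the second (`features`) argument to BeautifulSoup, for example `BeautifulSoup(html, pick_parser())`.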
