
Add devbin with the first dev script parse

devbin/ is supposed to be home to developer scripts.

The first script added is `parse`, which is intended for experimenting
with or testing GoogleParser in isolation, without other moving parts.
It makes the responses saved by --debug far more useful, because we can
parse a saved response repeatedly rather than just eyeballing the HTML.
1 parent 93ea7b0, commit 2788ec968e0a59a569b39924844b10851e55c61f; zmwangx committed Jun 5, 2016
Showing with 26 additions and 0 deletions.
  1. +1 −0 devbin/googler.py
  2. +25 −0 devbin/parse
@@ -0,0 +1,25 @@
+#!/usr/bin/env python3
+
+"""Parse saved responses with GoogleParser."""
+
+import argparse
+import json
+
+import googler
+
+def main():
+    argparser = argparse.ArgumentParser(description='Parse Google responses.')
+    argparser.add_argument('-N', '--news', action='store_true',
+                           help='parse as Google News responses')
+    argparser.add_argument('files', nargs='+', metavar='FILE',
+                           help="HTML file with Google's response body")
+    args = argparser.parse_args()
+    for fn in args.files:
+        with open(fn, encoding='utf-8') as fp:
+            htmlparser = googler.GoogleParser(news=args.news)
+            htmlparser.feed(fp.read())
+            results_object = [r.jsonizable_object() for r in htmlparser.results]
+            print(json.dumps(results_object, indent=2, sort_keys=True, ensure_ascii=False))
+
+if __name__ == '__main__':
+    main()
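
For readers without a googler checkout, the feed-then-dump workflow the script follows can be sketched in a self-contained way. This is only an illustration under stated assumptions: `LinkParser` is a hypothetical stand-in built on the standard library's `html.parser`, not googler's actual GoogleParser, and `SAMPLE` is made-up markup rather than a real saved response. The point is that a saved body can be parsed offline as many times as needed, with the same result each time.

```python
import json
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Hypothetical stand-in for googler.GoogleParser: collects <a href> links."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.results.append({'title': ''.join(self._text).strip(),
                                 'url': self._href})
            self._href = None


def parse_saved(html):
    """Parse one saved response body and return a JSON-ready list of dicts."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.results


def dump(results):
    """Serialize with the same json.dumps options the script in this commit uses."""
    return json.dumps(results, indent=2, sort_keys=True, ensure_ascii=False)


# Made-up markup standing in for a saved response body.
SAMPLE = '<div><a href="https://example.com/">Example <b>Domain</b></a></div>'

if __name__ == '__main__':
    print(dump(parse_saved(SAMPLE)))
```

Going by the argparse setup in the diff, the real script would presumably be invoked as `devbin/parse FILE...`, with `-N` or `--news` added for Google News responses.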