Add vidscraper-cmd. #7

Closed

@willkg

This adds the beginnings of a vidscraper-cmd. It supports auto-scraping
videos, but nothing else, yet. However, the code is written such that
adding support for other actions is easy to do.

The video subcommand supports specifying fields to be returned and also
api keys (though the latter is untested).

It returns data in JSON format--which required adding items() and
to_json() methods to the Video class.

Also adds a new topic in the documentation.

Closes #6.

@paulswartz
  • can't be automatically merged.
  • tests? At the very least, the new video methods should be tested, but it shouldn't be hard to test the CommandHandler either.
  • I don't see anything new in the documentation.
@paulswartz paulswartz commented on an outdated diff Apr 3, 2012
vidscraper/__init__.py
@@ -112,3 +115,76 @@ def auto_search(query, fields=None, order_by=None, crawl=False,
suites[suite] = search
return suites
+
+
+# fetchvideo -> auto_scrape(url, fields, api_keys)
+
+
+class CommandHandler(object):
@paulswartz
paulswartz Apr 3, 2012

For clarity, perhaps VidscraperCommandHandler?

@paulswartz paulswartz commented on an outdated diff Apr 3, 2012
vidscraper/__init__.py
+                          "e.g. --fields=a,b,c")
+        parser.add_option("--apikeys", dest="api_keys",
+                          help="api keys comma separated. "
+                          "e.g. --apikeys=key:val,key2:val")
+        (options, args) = parser.parse_args()
+
+        if len(args) == 0:
+            parser.error("URL needed.")
+
+        if options.fields:
+            fields = options.fields.split(",")
+        else:
+            fields = None
+
+        if options.api_keys:
+            api_keys = dict(mem.split(":")
@paulswartz
paulswartz Apr 3, 2012

Probably doesn't affect too much, but this should be mem.split(':', 1) in case there's a ":" in the API key.
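
A minimal sketch of the difference, not taken from the PR: `parse_api_keys` is a hypothetical helper name and the key names are made up. Splitting on the first ":" only keeps any later colons inside the value, whereas a plain `split(":")` produces three parts for such a pair and `dict()` rejects it.

```python
def parse_api_keys(raw):
    """Parse 'key:val,key2:val' into a dict.

    Hypothetical helper for illustration. Each pair is split on the
    first ':' only, so a value may itself contain ':'.
    """
    return dict(pair.split(":", 1) for pair in raw.split(","))


# Made-up keys; the second colon stays part of the first value.
api_keys = parse_api_keys("some_key:abc:123,other_key:xyz")
print(api_keys["some_key"])
```

With the unbounded split, the same input raises a `ValueError` inside `dict()`, because the first pair has three elements instead of two.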

@paulswartz paulswartz commented on the diff Apr 3, 2012
vidscraper/__init__.py
+
+        if options.fields:
+            fields = options.fields.split(",")
+        else:
+            fields = None
+
+        if options.api_keys:
+            api_keys = dict(mem.split(":")
+                            for mem in options.api_keys.split(","))
+        else:
+            api_keys = None
+
+        for arg in args:
+            print "Scraping %s" % arg
+            video = auto_scrape(arg, fields=fields, api_keys=api_keys)
+            print video.to_json(indent=2, sort_keys=True)
@paulswartz
paulswartz Apr 3, 2012

While sys.exit(None) does the right thing, for clarity this should return 0.
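
One way this could look, as a sketch only: `main` is a hypothetical entry point, not the handler from the diff, and the scraping itself is replaced by a print. The point is that the function states its exit status explicitly instead of falling off the end and returning `None`.

```python
import sys


def main(argv):
    """Hypothetical entry point; returns the process exit status."""
    if not argv:
        sys.stderr.write("URL needed.\n")
        return 2  # non-zero status for a usage error
    for url in argv:
        # Stand-in for the real scraping work.
        print("Scraping %s" % url)
    return 0  # explicit success, rather than an implicit None


# In a script this would typically be wired up as:
#     if __name__ == "__main__":
#         sys.exit(main(sys.argv[1:]))
```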

@paulswartz

Comments aside, I'm excited that other people find Vidscraper useful; thanks for the request!

@willkg

Cool. Sounds like the general structure is good. I'll fix up the issues you mentioned and clean some other things up (and include the topic file I wrote but forgot to add to the commit).

@paulswartz

Yes, the structure looks good. I like that you've made it easy to extend with new commands in the future.

@melinath

Looks good. Only thing that's bothering me atm is the use of mem as a filler variable where a more verbose name would be more readable - for example, for mem in self.fields could be rewritten as for field in self.fields.

@willkg

Rebased from master so it should apply cleanly now.

Addressed other issues from your comments.

@melinath

The branch needs to be rebased onto develop, not master, unfortunately. :-)

@willkg

Whoops. This project is different than all the other ones I work on and I forgot.

@willkg

Rebased against develop and re-pushed.

@paulswartz

Are you sure it's rebased against our develop? Github says it still can't be merged, and the list of commits (https://github.com/pculture/vidscraper/pull/7/commits) includes a bunch that have already happened.

@paulswartz paulswartz commented on the diff Apr 5, 2012
vidscraper/tests/unit/test_video.py
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+# THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+import unittest
+
+from nose.tools import eq_
+
+from vidscraper import auto_scrape
+from vidscraper.compat import json
+from vidscraper.suites.base import Video
+
+
+class VideoTestCase(unittest.TestCase):
+    def test_items(self):
+        video = auto_scrape("http://www.youtube.com/watch?v=J_DV9b0x7v4")
@paulswartz
paulswartz Apr 5, 2012

Yay tests! Doing it this way, though, requires that Vidscraper actually go to YouTube and grab some data, and you aren't actually looking at the data. Can you test this by just making a plain Video instance?

@willkg
willkg Apr 5, 2012

Show me an example of how to do that with your system and I'll change the code.

I'm really running out of time I can spend on the nit-picky stuff.

@paulswartz
paulswartz Apr 5, 2012

vidscraper.videos.Video(url) won't actually load the URL.
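
A rough sketch of what a network-free test could look like. `StubVideo` is a stand-in written for this example, not vidscraper's real `Video` class; its constructor arguments, `_all_fields` contents, and `items()` behavior are assumptions modeled on what the diff and this thread describe. In the real test the stub would be replaced by the library's own class.

```python
import unittest


class StubVideo(object):
    """Stand-in for vidscraper's Video class (an assumption, not the real API)."""

    _all_fields = ("title", "description", "user")

    def __init__(self, url, fields=None):
        # Only stores the URL; nothing is fetched.
        self.url = url
        if fields is not None:
            self.fields = list(fields)
        else:
            self.fields = list(self._all_fields)
        for field in self._all_fields:
            setattr(self, field, None)

    def items(self):
        """Yield (field, value) pairs in field order."""
        for field in self.fields:
            yield field, getattr(self, field)


class VideoItemsTestCase(unittest.TestCase):
    def test_items(self):
        video = StubVideo("http://www.youtube.com/watch?v=J_DV9b0x7v4")
        video.title = "A title"  # set data by hand instead of scraping it
        items = list(video.items())
        self.assertEqual([name for name, _ in items],
                         list(StubVideo._all_fields))
        self.assertEqual(dict(items)["title"], "A title")

    def test_items_with_fields(self):
        fields = ["title", "user"]
        video = StubVideo("http://www.youtube.com/watch?v=J_DV9b0x7v4",
                          fields)
        self.assertEqual([name for name, _ in video.items()], fields)
```

Because the values are set by hand, the test can also check the data itself, which the scraping version does not.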

@paulswartz paulswartz commented on the diff Apr 5, 2012
vidscraper/tests/unit/test_video.py
+            eq_(item[0], Video._all_fields[i])
+
+    def test_items_with_fields(self):
+        fields = ['title', 'user']
+        video = auto_scrape("http://www.youtube.com/watch?v=J_DV9b0x7v4",
+                            fields)
+
+        # Make sure items can be iterated over and that there's one
+        # for every field.
+        for i, item in enumerate(video.items()):
+            eq_(item[0], fields[i])
+
+    def test_to_json(self):
+        video = auto_scrape("http://www.youtube.com/watch?v=J_DV9b0x7v4")
+
+        data_json = video.to_json()
@paulswartz
paulswartz Apr 5, 2012

Should this verify that the keys in the JSON are the same as from items()?

@willkg
willkg Apr 5, 2012

I don't think so. All to_json() does is take what items() gives it and serialize it. Therefore the only thing I think is worth testing is that the output serializes and can be deserialized again.

If you want to make it do that, then you should add that after this lands.
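
For reference, both checks discussed here are small. The sketch below uses a hypothetical module-level `to_json` function as a stand-in for `Video.to_json()`, assuming (as described above) that it simply serializes the pairs from `items()`; the field values are made up.

```python
import json


def to_json(items, **kwargs):
    """Hypothetical stand-in for Video.to_json(): serialize (field, value) pairs."""
    return json.dumps(dict(items), **kwargs)


# Made-up data in the shape items() is described as returning.
items = [("title", "A video"), ("user", "someone")]
data_json = to_json(items, indent=2, sort_keys=True)

# Round trip: the output is valid JSON and can be loaded again.
loaded = json.loads(data_json)

# Key check: the JSON object has exactly the fields from items().
assert sorted(loaded.keys()) == sorted(name for name, _ in items)
```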

@paulswartz paulswartz commented on the diff Apr 5, 2012
vidscraper/tests/unit/test_video.py
+# documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+# OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+# IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+# NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+# THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+import unittest
+
+from nose.tools import eq_
@paulswartz
paulswartz Apr 5, 2012

We don't depend on nose at the moment; can you use self.assertEqual instead?

@willkg
willkg Apr 5, 2012

The docs say you use nose. So... I used nose. If you want to change it, go for it.
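
For whoever makes that change, the translation is mechanical: each `eq_(a, b)` becomes `self.assertEqual(a, b)`. A sketch of the loop from the diff in that form, with made-up pairs standing in for `video.items()` (the test class name is hypothetical):

```python
import unittest


class FieldOrderTestCase(unittest.TestCase):
    def test_field_order(self):
        fields = ["title", "user"]
        # Made-up (field, value) pairs standing in for video.items().
        items = [("title", "A video"), ("user", "someone")]
        for i, item in enumerate(items):
            # nose version: eq_(item[0], fields[i])
            self.assertEqual(item[0], fields[i])
```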

Will Kahn-Gr... added some commits Apr 3, 2012
Will Kahn-Greene Add vidscraper-cmd.
This adds the beginnings of a vidscraper-cmd. It supports auto-scraping
videos, but nothing else, yet. However, the code is written such that
adding support for other actions is easy to do.

The video subcommand supports specifying fields to be returned and also
api keys (though the latter is untested).

It returns data in JSON format--which required adding ``items()`` and
``to_json()`` methods to the Video class.

Also adds a new topic in the documentation.

Closes #6.
d864cc0
Will Kahn-Greene Address comments from reviews
* adds missing files
* improves documentation
* adds tests for new Video methods
* handles datetime.datetime serialization issues with json.dumps.
b62371a
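
The commit itself is not shown on this page, so the following is only a sketch of one common way to handle `datetime.datetime` values with `json.dumps`, not necessarily what b62371a does. It uses the standard `default=` hook; `json_default` and the `publish_datetime` field name are assumptions for illustration.

```python
import datetime
import json


def json_default(obj):
    """Fallback for json.dumps: render datetimes as ISO 8601 strings."""
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    # Anything else is still an error, as it would be without the hook.
    raise TypeError("%r is not JSON serializable" % (obj,))


# Made-up video data with a datetime value.
data = {
    "title": "A video",
    "publish_datetime": datetime.datetime(2012, 4, 3, 12, 30),
}
data_json = json.dumps(data, default=json_default, sort_keys=True)
print(data_json)
```

Loading the result back gives the date as a plain string, so a consumer that needs a `datetime` again has to parse it.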
@willkg

That should fix the rebasing issue.

I'm kind of out of time I can spend on upstreaming this. So if it's not good as is, I'll just maintain my fork with this and future things I want to do. Then you can cherry-pick whatever is interesting to you.

@paulswartz

Still not mergeable, but I've rebased it and filed #8 with the changes.

@paulswartz paulswartz closed this Apr 5, 2012