Improve parsing of RSS 2.0 feeds with Yahoo Media enclosures #731

Closed
jdragojevic opened this Issue Jul 18, 2013 · 9 comments

jdragojevic commented Jul 18, 2013

An item in a Kaltura-generated RSS feed looks something like this:

<item>
      <title>Kaltura Video Solutions for Media Companies</title>
      <link>http://qa.pculture.org/amara_tests/1_zr7niumr</link>
      <media:content url="http://cdnbakmi.kaltura.com/p/1492321/sp/149232100/serveFlavor/entryId/1_zr7niumr/flavorId/1_djpnqf7y/name/a.mp4">
        <media:title>Kaltura Video Solutions for Media Companies</media:title>
        <media:description>Kaltura’s groundbreaking media middleware platform gives you everything you need to manage live and on-demand rich-media. From video ingestion and transcoding, through moderation and metadata management, to distribution, monetization and content analysis – our Video Platform covers it all.</media:description>
        <media:keywords>kaltura, mobile, monetization, media, devices, ads, multi-platform</media:keywords>
        <media:thumbnail url="http://cdnbakmi.kaltura.com/p/1492321/sp/149232100/thumbnail/entry_id/1_zr7niumr/version/100000"></media:thumbnail>
        <media:rating scheme="urn:simple"></media:rating>
      </media:content>
</item>

We need to improve our parsing of these feeds so that we set the video title, description and thumbnail when importing videos to teams.
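A minimal sketch of pulling those fields out of an item with Python's standard xml.etree.ElementTree. It assumes the feed declares the Yahoo Media RSS namespace (http://search.yahoo.com/mrss/) on its root element, which the excerpt above omits. The function name parse_media_item and the returned dict keys are hypothetical, not the project's actual importer; media:* values are assumed to take precedence over the plain RSS <title>/<description>.

```python
import xml.etree.ElementTree as ET

# Yahoo Media RSS namespace, assumed to be declared on the feed's <rss> element.
NS = {"media": "http://search.yahoo.com/mrss/"}


def parse_media_item(item):
    """Return title, description and thumbnail URL for an RSS <item> element.

    Starts from the plain RSS <title>/<description>, then overrides them with
    the media:title/media:description inside media:content when present.
    """
    info = {
        "title": item.findtext("title"),
        "description": item.findtext("description"),
        "thumbnail": None,
    }
    # ".//" also finds media:content nested inside a <media:group>.
    content = item.find(".//media:content", NS)
    if content is None:
        return info
    title = content.findtext("media:title", namespaces=NS)
    description = content.findtext("media:description", namespaces=NS)
    thumbnail = content.find("media:thumbnail", NS)
    if title:
        info["title"] = title.strip()
    if description:
        info["description"] = description.strip()
    if thumbnail is not None and thumbnail.get("url"):
        info["thumbnail"] = thumbnail.get("url")
    return info


# A trimmed copy of the item above, wrapped in a namespaced <rss> element.
FEED = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
<channel><item>
  <title>Kaltura Video Solutions for Media Companies</title>
  <link>http://qa.pculture.org/amara_tests/1_zr7niumr</link>
  <media:content url="http://cdnbakmi.kaltura.com/p/1492321/sp/149232100/serveFlavor/entryId/1_zr7niumr/flavorId/1_djpnqf7y/name/a.mp4">
    <media:title>Kaltura Video Solutions for Media Companies</media:title>
    <media:description>Kaltura's groundbreaking media middleware platform gives you everything you need to manage live and on-demand rich-media.</media:description>
    <media:thumbnail url="http://cdnbakmi.kaltura.com/p/1492321/sp/149232100/thumbnail/entry_id/1_zr7niumr/version/100000"/>
  </media:content>
</item></channel></rss>"""

if __name__ == "__main__":
    item = ET.fromstring(FEED).find("channel/item")
    print(parse_media_item(item))
```

Items without any media:content fall back to the plain RSS fields, so ordinary RSS 2.0 feeds keep working.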

Additionally, the media:content url includes the Kaltura unique ID; we may want to grab this value and store it for future syncing of subs.

For example, in http://....entryId/1_zr7niumr/, 1_zr7niumr is the piece we may want to store as a unique ID.
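If we do want the ID, a regular expression over the media:content URL is probably enough. This sketch assumes the ID always follows an /entryId/ path segment, as in the example URL above; that is inferred from one URL, not from Kaltura documentation. The thumbnail URL in the same item spells the segment entry_id, so this pattern only targets the media:content URL. kaltura_entry_id is a hypothetical helper name.

```python
import re

# Captures the path segment after "/entryId/", e.g. "1_zr7niumr" in
# .../serveFlavor/entryId/1_zr7niumr/flavorId/... (pattern assumed from the example).
ENTRY_ID_RE = re.compile(r"/entryId/([^/]+)")


def kaltura_entry_id(url):
    """Return the Kaltura entry ID embedded in url, or None if there isn't one."""
    match = ENTRY_ID_RE.search(url)
    return match.group(1) if match else None
```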

Member

bendk commented Aug 6, 2013

One question for this one: should the feed data overwrite the title/description we normally get? For example, for YouTube videos we scrape the title/description from YouTube.

I'm going to make the feed data overwrite them, but tell me if it shouldn't.
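The precedence proposed here (feed values overwrite scraped ones) can be sketched as a plain dictionary merge. The function name and the title/description keys are hypothetical. Treating an empty feed value as "not provided" is an assumption, chosen so that a blank <media:description> does not wipe out a good scraped description.

```python
def merge_video_metadata(scraped, feed):
    """Combine scraped metadata (e.g. from YouTube) with metadata from a feed.

    Non-empty feed values win; scraped values survive only for fields the
    feed omits or leaves blank.
    """
    merged = dict(scraped)
    for key, value in feed.items():
        if value:  # skip None and empty strings coming from the feed
            merged[key] = value
    return merged
```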

Member

bendk commented Aug 6, 2013

How can we know that a video URL is from Kaltura? I'm going to assume that if the hostname ends with kaltura.com and we can scrape the ID correctly, then it's a Kaltura URL. Is that okay?

bendk added a commit that referenced this issue Aug 6, 2013

Get Kaltura IDs from Video URLs (#731)
Refactored the VideoUrl class a bit to make it easier to get the video
type from it.  Added an extra_info() method to VideoUrl and VideoType
so that we can get type-specific info.  Made a KalturaVideoType class.
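The commit above describes an extra_info() method on VideoUrl and VideoType plus a KalturaVideoType class. Below is a rough sketch of that shape combined with the hostname heuristic proposed earlier; it is not the actual Amara code. The matches() classmethod, the video_type_for_url() helper and the "kaltura_id" key are hypothetical, and the /entryId/ pattern is inferred from the example URL in this issue.

```python
import re
from urllib.parse import urlparse

ENTRY_ID_RE = re.compile(r"/entryId/([^/]+)")


class VideoType(object):
    """Generic video type: carries no type-specific info."""

    abbreviation = None  # short type code stored with the URL

    def __init__(self, url):
        self.url = url

    @classmethod
    def matches(cls, url):
        return False

    def extra_info(self):
        return {}


class KalturaVideoType(VideoType):
    abbreviation = "K"  # the Kaltura type code mentioned in this thread

    @classmethod
    def matches(cls, url):
        host = urlparse(url).hostname or ""
        # Compare on a dot boundary: a bare host.endswith("kaltura.com")
        # would also accept a host such as "notkaltura.com".
        on_kaltura = host == "kaltura.com" or host.endswith(".kaltura.com")
        return on_kaltura and ENTRY_ID_RE.search(url) is not None

    def extra_info(self):
        # Parsed from the URL on demand, so nothing extra is stored in the DB.
        return {"kaltura_id": ENTRY_ID_RE.search(self.url).group(1)}


def video_type_for_url(url):
    """Return the first matching specific type, else a generic VideoType."""
    for cls in (KalturaVideoType,):
        if cls.matches(url):
            return cls(url)
    return VideoType(url)
```

Keeping the ID parsing inside the type class means callers only ever ask for extra_info() and never need to know which host a URL came from.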
Member

bendk commented Aug 6, 2013

A couple of notes on the video IDs:

  • We don't need to store this value in the DB because we can parse it from the URL.
  • We will only parse it from the URL if the VideoUrl object has type="K" (for Kaltura). This should happen for all Kaltura URLs going forward, no matter how they get added to the system, but it won't work for Kaltura URLs that were already in the system before these changes. If we want to support those, we'll need a database migration.
  • The parsing depends on us being able to identify Kaltura URLs; hopefully the method I proposed above is good.
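The database migration mentioned in the second note would amount to re-tagging existing URL rows. Here is a framework-free sketch over plain dictionaries standing in for VideoUrl rows; a real version would be written with whatever migration tool the project uses. backfill_video_type and its parameters are hypothetical, and the Kaltura-detection rule is passed in as a predicate so it can reuse whichever check the code settles on.

```python
def backfill_video_type(url_rows, matches, new_type="K"):
    """Re-tag existing URL rows recognised by `matches`; return the count changed.

    url_rows: iterable of mutable mappings with "url" and "type" keys.
    Rows already tagged new_type are skipped, so re-running is harmless.
    """
    changed = 0
    for row in url_rows:
        if row["type"] != new_type and matches(row["url"]):
            row["type"] = new_type
            changed += 1
    return changed
```

Returning the number of changed rows lets a migration log how many pre-existing Kaltura URLs it picked up; a second run should report zero.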
Contributor

jdragojevic commented Aug 7, 2013

@bendk - I am not getting any title/description values on these videos when I import this feed:

http://www.kaltura.com/api_v3/getFeed.php?partnerId=1492321&feedId=0_py3x4ruz

Contributor

jdragojevic commented Aug 7, 2013

@bendk - I have three test failures in check_api_v2.test_video_url_resource; they all pass against the latest version of the staging branch.

It could be how I am creating the videos: I use a factory, and then use

video_url = self.test_video.get_video_url()

to get the URL stored for the video, which is coming back as None.

It could be that I need to update something in the tests, but I'm not sure what would be correct.

======================================================================
FAIL: Verify video urls for a particular video are listed.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "apps/webdriver_testing/check_api_v2/test_video_url_resource.py", line 38, in test_list
    self.assertEqual(video_url, response['objects'][0]['url'])
AssertionError: None != u'http://unisubs.example.com/0.mp4'
-------------------- >> begin captured logging << --------------------
test_steps: INFO: Video: DEmsCHxz9zsQ
test_steps: INFO: testcase: webdriver_testing.check_api_v2.test_video_url_resource.TestCaseVideoUrl.test_list
test_steps: INFO: description: Verify video urls for a particular video are listed.
--------------------- >> end captured logging << ---------------------

======================================================================
FAIL: Add an additional new url.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "apps/webdriver_testing/check_api_v2/test_video_url_resource.py", line 57, in test_url__post
    self.assertIn(video_url, response['all_urls'], response)
AssertionError: None not found in [u'http://unisubs.example.com/0.mp4', u'http://unisubs.example.com/newurl.mp4'] : {u'description': u'Greatest Video ever made', u'all_urls': [u'http://unisubs.example.com/0.mp4', u'http://unisubs.example.com/newurl.mp4'], u'created': u'2013-08-07T05:52:39.854527', u'title': u'Test Video 0', u'site_url': u'http://unisubs.example.com:9000/videos/DEmsCHxz9zsQ/info/', u'languages': [], u'thumbnail': u'', u'resource_uri': u'/api2/partners/videos/DEmsCHxz9zsQ/', u'team': None, u'duration': None, u'original_language': None, u'id': u'DEmsCHxz9zsQ', u'metadata': {}}
-------------------- >> begin captured logging << --------------------
test_steps: INFO: testcase: webdriver_testing.check_api_v2.test_video_url_resource.TestCaseVideoUrl.test_url__post
test_steps: INFO: description: Add an additional new url.
--------------------- >> end captured logging << ---------------------

======================================================================
FAIL: Verify video urls for a particular video are listed.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "apps/webdriver_testing/check_api_v2/test_video_url_resource.py", line 103, in test_url__put_primary
    self.assertEqual('http://unisubs.example.com/newerurl.mp4', self.test_video.get_video_url())
AssertionError: 'http://unisubs.example.com/newerurl.mp4' != None
-------------------- >> begin captured logging << --------------------
test_steps: INFO: testcase: webdriver_testing.check_api_v2.test_video_url_resource.TestCaseVideoUrl.test_url__put_primary
test_steps: INFO: description: Verify video urls for a particular video are listed.
--------------------- >> end captured logging << ---------------------

Contributor

jdragojevic commented Aug 7, 2013

BTW, in the Kaltura feed you are forced to enter a base URL value that's used as the link, e.g.:

<link>http://qa.amara.org/1_zlgl6ut8</link>

but it's not a valid value, so we shouldn't use it; we should use the media:content url instead.
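A sketch of choosing the import URL along those lines, again using xml.etree.ElementTree. The fallback order (media:content, then the RSS 2.0 <enclosure>, then <link> only as a last resort for feeds that carry neither) is an assumption, not necessarily what the fix implements; video_url_for_item is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"media": "http://search.yahoo.com/mrss/"}


def video_url_for_item(item):
    """Return the URL to import for an RSS <item>, or None if it has none."""
    # 1. Yahoo Media RSS <media:content url="...">, possibly inside <media:group>.
    content = item.find(".//media:content", NS)
    if content is not None and content.get("url"):
        return content.get("url")
    # 2. Standard RSS 2.0 <enclosure url="...">.
    enclosure = item.find("enclosure")
    if enclosure is not None and enclosure.get("url"):
        return enclosure.get("url")
    # 3. Last resort: <link>. In Kaltura feeds this is only a placeholder
    #    base URL, which is why it never wins over a media URL.
    link = item.findtext("link")
    return link.strip() if link else None
```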

Member

bendk commented Aug 7, 2013

I just pushed a fix for all of the above issues.

Contributor

jdragojevic commented Aug 8, 2013

@bendk

  1. Verified that the selenium tests are passing now. Ran tests for check_api_v2, check_create_page, check_videos, check_teams...
  2. Checked on the gh-31 demo that we can add iTunes and Yahoo RSS feeds to the site and that they are displayed with the title/description metadata. Agree with the comment that we should use the values in the feed rather than the ones we can scrape.
  3. Checked locally that we can add the feed to teams and the title / description values are stored for the videos.
  4. Commented in #837 that we will need that fixed relatively soon, since the teams that use the Kaltura integration will be regularly updating their feeds and they will need to be associated with the team.

jdragojevic referenced this issue Aug 8, 2013

Merged

Gh 731 #840

ehazlett added a commit that referenced this issue Aug 8, 2013

jdragojevic closed this Aug 9, 2013
