

@dsubak (Contributor) commented Sep 16, 2025

What are the relevant tickets?

Fixes https://github.com/mitodl/hq/issues/4803

Description (What does it do?)

Adds a browser User-Agent header to the HTTP requests made by the extract task so that we can access the lockthequill RSS feed, which could not be fetched without it.
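A minimal sketch of the idea, using only Python's standard library. The helper names build_feed_request and fetch_feed are hypothetical, and the actual extract task may use a different HTTP client. Without an explicit header, urllib identifies itself as Python-urllib/3.x, which some feed hosts refuse; overriding User-Agent makes the request look like an ordinary browser visit.

```python
import urllib.request

# Browser-style User-Agent: the Chrome 39 string shown in this PR's diff.
# Any current browser string would serve the same purpose.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/39.0.2171.95 Safari/537.36"
)


def build_feed_request(url: str) -> urllib.request.Request:
    """Build a GET request for a feed that identifies itself as a browser.

    Hypothetical helper for illustration; not the PR's actual code.
    """
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_USER_AGENT})


def fetch_feed(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw feed bytes, sending the browser User-Agent."""
    with urllib.request.urlopen(build_feed_request(url), timeout=timeout) as response:
        return response.read()
```

Keeping the string in a single constant means that swapping in a newer browser string (as suggested later in review) is a one-line change.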

How can this be tested?

  • Check out this branch.
  • In backend.local.env, set OPEN_PODCAST_DATA_BRANCH=master and set GITHUB_ACCESS_TOKEN to your GitHub access token. This lets you pull the config values from the YAML files in https://github.com/mitodl/open-podcast-data/tree/master
  • Run ./manage.py backpopulate_podcast_data from your web container.
  • Once that completes, run the following code, also from the web container (in a Django shell):
from learning_resources.models import LearningResource
from learning_resources.models import LearningResourceType

# Print the URL of every podcast learning resource
podcasts = LearningResource.objects.filter(resource_type=LearningResourceType.podcast.name)
for podcast in podcasts:
    print(podcast.url)

Confirm that the https://lockthequill.buzzsprout.com/ resource appears in the output. If fetching a feed raises an HTTPError, the task simply skips that entry, so a missing lockthequill URL means the request is still failing. Example output:

In [8]: for podcast in podcasts:
   ...:     print(podcast.url)
   ...: 
https://biology.mit.edu/news/biogenesis-podcast/
https://chalk-radio.simplecast.com/
https://cap.csail.mit.edu/podcasts
https://us.ivoox.com/es/podcast-the-digital-transformation-journey_sq_f1908594_1.html
https://www.technologyreview.com/supertopic/curious-coincidence/
https://mitsloan.mit.edu/podcast
https://anchor.fm/misti-comm
https://soundcloud.com/jwel-wpl-podcast
https://lockthequill.buzzsprout.com/
https://www.povertyactionlab.org/page/j-pal-voices-impact-and-promise-summer-jobs-united-states
https://ctl.mit.edu/podcasts
http://glimpse.mit.edu/
https://news.mit.edu/podcasts/curiosity-unbounded
https://entrepreneurship.mit.edu/podcast/
https://medical.mit.edu/podcast
https://cisr.mit.edu/research-library?sort=date&view=list&filters=1&pub_type%5B0%5D=12
http://www.sloansportsconference.com/
https://lgo.mit.edu/
https://mitpress.mit.edu/podcasts
https://news.mit.edu/podcasts/mit-news
https://climate.mit.edu/
https://soundcloud.com/mitstudents
https://mitsmr.com/3co75FC
https://bootcamp.mit.edu/
http://mitxpro.libsyn.com/
https://alum.mit.edu/topic/podcast
https://sloanreview.mit.edu/audio-series/counterpoints/
https://teachlabpodcast.com/
https://energy.mit.edu/audio
https://cmsw.mit.edu/category/media/podcasts/
https://sloanreview.mit.edu/audio-series/three-big-points/
https://open.mit.edu/c/themove
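The skip-on-HTTPError behaviour described above can be pictured with a small sketch. It is hypothetical: fetch_all_feeds is an invented name, the fetcher is passed in as a plain callable, and it catches urllib's HTTPError (the real task may use another client with its own HTTPError type).

```python
import urllib.error
from typing import Callable, Iterable


def fetch_all_feeds(
    urls: Iterable[str], fetch: Callable[[str], bytes]
) -> dict[str, bytes]:
    """Fetch each feed, skipping any that raise an HTTPError.

    Hypothetical helper: `fetch` is any callable that returns the raw feed
    bytes for a URL (for example, a User-Agent-aware fetcher).
    """
    results: dict[str, bytes] = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except urllib.error.HTTPError:
            # A failed feed is skipped rather than aborting the whole run,
            # which is why a blocked feed simply doesn't show up afterwards.
            continue
    return results
```

Because failures are swallowed per feed, checking the resulting list of URLs (as in the test steps) is the quickest way to notice a feed that is still being refused.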

Additional Questions

  • While testing, I deleted the podcast learning resources with backpopulate_podcast_data --delete to get a clean environment, and afterwards my worker logged a LOT of errors from learning_resources_search.tasks.upsert_learning_resource. Is that expected?
  • It looks like Brave New Planet's RSS feed URL now returns a 404; should we remove its config YAML file?

@dsubak marked this pull request as ready for review September 16, 2025 18:59
@dsubak requested a review from Copilot September 16, 2025 19:25

@Copilot (AI) left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@dsubak added the Needs Review label (an open pull request that is ready for review) Sep 16, 2025
@dsubak requested a review from Copilot September 17, 2025 12:52

@Copilot (AI) left a comment


Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.



Comment on lines +23 to +25
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/39.0.2171.95 Safari/537.36"

Copilot AI Sep 17, 2025


The user-agent string references Chrome version 39.0.2171.95 from 2014, which is extremely outdated. Consider using a more recent user-agent string to avoid potential blocking by servers that filter out very old browsers.

Suggested change
- "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) "
- "AppleWebKit/537.36 (KHTML, like Gecko) "
- "Chrome/39.0.2171.95 Safari/537.36"
+ "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
+ "AppleWebKit/537.36 (KHTML, like Gecko) "
+ "Chrome/120.0.0.0 Safari/537.36"

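To make the age gap in Copilot's comment concrete, the hypothetical helper below extracts the Chrome major version from each string (39 in the current diff, 120 in the suggestion). It is an illustration only: nothing in the PR parses User-Agent strings, and whether a given server actually filters on browser version is an assumption.

```python
import re
from typing import Optional

# The two strings under discussion: the PR's current value and Copilot's suggestion.
CURRENT_USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/39.0.2171.95 Safari/537.36"
)
SUGGESTED_USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

# Matches the "Chrome/<major>." token and captures the major version number.
_CHROME_VERSION = re.compile(r"Chrome/(\d+)\.")


def chrome_major_version(user_agent: str) -> Optional[int]:
    """Return the Chrome major version in a User-Agent string, or None if absent."""
    match = _CHROME_VERSION.search(user_agent)
    return int(match.group(1)) if match else None
```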

@abeglova (Contributor) left a comment


LGTM

@dsubak (Contributor, Author) commented Sep 17, 2025

@abeglova Any advice on verifying this after merge but before release? It looks like get_podcast_data runs once every 2 hours, which seems like a long time to hold things up. If that's okay, I could log in to a web server in RC and run backpopulate_podcast_data to trigger it out of band.

@abeglova (Contributor)

Yeah, once this reaches RC, log in to the RC web server and run the same steps you described for local testing.

There's no need to wait for the scheduled job to run.

@dsubak merged commit 4891d1c into main Sep 19, 2025
13 checks passed
@dsubak deleted the dansubak/202509_fix_buzzspout_503 branch September 19, 2025 12:48
This was referenced Sep 19, 2025