
@mbertrand (Member) commented Jul 8, 2024

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/4807
Closes https://github.com/mitodl/hq/issues/4834

Description (What does it do?)

  • Changes the readable_id value for podcasts and episodes to equal the guid value returned by the podcast RSS feeds. Note: readable_id is a misleading name for the field; it should probably be unique_id instead, because it's not necessarily a user-friendly ID. That's a potential issue for another day.
  • Makes sure that all unpublished podcasts and episodes are removed from the search index (I noticed this was missing while working on the above).
  • Because this will unpublish all existing podcasts/episodes and create new ones with different IDs, I included a new management command, transfer_list_resources. It updates any learning paths/userlists that contain the old unpublished podcasts and episodes so that they point to the new resources matched on a specified field (in this case, url), and optionally deletes the old unpublished resources afterward. This could also be useful for any other resource types whose IDs need to change. I might adapt this for another PR that is currently blocked for unrelated reasons (Professional Education ETL Pipeline #1210).
# Usage: transfer_list_resources <resource_type> <matching_field> <from_source> <to_source> [--delete]
./manage.py transfer_list_resources podcast_episode url podcast podcast
./manage.py transfer_list_resources podcast url podcast podcast
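The matching step that transfer_list_resources performs can be sketched in plain Python roughly as follows. This is a minimal in-memory illustration, not the command's actual implementation: the `Resource` and `ListItem` dataclasses and `transfer_list_items` are hypothetical stand-ins for the Django models and command. It assumes at most one published resource per matching-field value exists in the target source, and that the delete option removes only old resources that found a replacement.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical stand-in for a learning resource row."""
    id: int
    resource_type: str
    source: str  # hypothetical: the ETL source the resource came from
    url: str
    published: bool


@dataclass
class ListItem:
    """Hypothetical stand-in for a learning path/userlist membership."""
    parent_id: int
    child_id: int


def transfer_list_items(resources, items, resource_type, matching_field,
                        from_source, to_source, delete=False):
    """Repoint list items from unpublished resources to published matches.

    Returns (remaining_resources, number_of_matched_old_resources).
    """
    # Index published resources in the target source by the matching field.
    published = {
        getattr(r, matching_field): r
        for r in resources
        if r.published and r.resource_type == resource_type
        and r.source == to_source
    }
    matched_ids = set()
    for old in resources:
        if old.published or old.resource_type != resource_type:
            continue
        if old.source != from_source:
            continue
        new = published.get(getattr(old, matching_field))
        if new is None:
            continue  # no published replacement: leave its list items alone
        matched_ids.add(old.id)
        for item in items:
            if item.child_id == old.id:
                item.child_id = new.id
    if delete:
        # Only drop old resources that actually found a replacement.
        resources = [r for r in resources if r.id not in matched_ids]
    return resources, len(matched_ids)
```

Indexing the published resources by the matching field once avoids a separate lookup for each unpublished resource.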

How can this be tested?

  • On the main branch, if you don't already have podcasts imported, run ./manage.py backpopulate_podcast_data
  • Add some podcasts and episodes to learning paths and userlists. Note which ones you add to which paths/lists so you can check on them later.
  • Switch to this branch. Run ./manage.py backpopulate_podcast_data again, followed by these commands, with or without the delete option:
    ./manage.py transfer_list_resources podcast_episode url podcast podcast [--delete]
    ./manage.py transfer_list_resources podcast url podcast podcast [--delete]
    
  • Check the relevant paths/lists and make sure they include the podcasts/episodes you added before, and that they now point to new resources with the correct readable_id values.
  • Search for podcasts/episodes by title; you should not get any duplicates in the results.
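The duplicate check in the last step can also be done programmatically. A minimal sketch, assuming the search results have already been fetched and parsed into a list of dicts with `title` and `readable_id` keys (the field names used in the resource JSON in this thread); `find_duplicate_titles` is a hypothetical helper, and fetching the results is left out:

```python
from collections import defaultdict


def find_duplicate_titles(results):
    """Map each title that appears more than once to its readable_ids.

    `results` is assumed to be a list of dicts with "title" and
    "readable_id" keys, e.g. parsed from a search API response.
    """
    by_title = defaultdict(list)
    for result in results:
        by_title[result["title"]].append(result["readable_id"])
    return {title: ids for title, ids in by_title.items() if len(ids) > 1}
```

An empty dict is the expected outcome. A non-empty one is worth inspecting, though two distinct episodes can legitimately share a title.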

@mbertrand force-pushed the mb/podcast_ep_ids branch from f8158a3 to 13a1d96 on July 9, 2024 17:10
@mbertrand added the Needs Review label and removed the Work in Progress label on Jul 9, 2024
@mbertrand force-pushed the mb/podcast_ep_ids branch 2 times, most recently from 6887b98 to 243330f, on July 9, 2024 17:51
@gumaerc self-assigned this on Jul 9, 2024
@gumaerc (Contributor) commented Jul 9, 2024

@mbertrand I'm not exactly sure this did what it was supposed to do when I ran it, but we should make sure it isn't something in my local data that caused the issue. I first backpopulated my podcasts on main as you described, then switched over to your branch and got this output from the commands you suggested:

gumaerc@gumaerc-work:~/Code/mit-open$ docker compose exec web ./manage.py transfer_list_resources podcast_episode url podcast podcast --delete
Migrate podcast_episode relationships from podcast to podcast, matching on url
Processed 684 resources and found 4 published matches, took 2.104389 seconds
gumaerc@gumaerc-work:~/Code/mit-open$ docker compose exec web ./manage.py transfer_list_resources podcast url podcast podcast --delete
Migrate podcast relationships from podcast to podcast, matching on url
Processed 10 resources and found 0 published matches, took 0.032394 seconds

It seems kind of odd that there would be only 4 published matches, no? I added a podcast to a learning path a while back as part of some testing, and it is still part of that list, but its readable_id seems to be in the old format:

https://open-api.c4103.com/api/v1/learningpaths/5987/items/

{
            "id": 2625,
            "resource": {
                "id": 6014,
                "topics": [],
                "offered_by": null,
                "platform": {
                    "code": "podcast",
                    "name": "Podcast"
                },
                "course_feature": [],
                "departments": [],
                "certification": false,
                "certification_type": {
                    "code": "none",
                    "name": "No Certificate"
                },
                "prices": [
                    "0.00"
                ],
                "runs": [],
                "image": {
                    "id": 4663,
                    "url": "https://megaphone.imgix.net/podcasts/b0ad37e4-2754-11ef-9e47-276f6a2f4171/image/730be6c9cb033f01c6d89d6f5d025c0d.jpg?ixlib=rails-4.3.1&max-w=3000&max-h=3000&fit=crop&auto=format,compress",
                    "description": null,
                    "alt": null
                },
                "learning_path_parents": [],
                "user_list_parents": [],
                "views": 0,
                "learning_format": [
                    {
                        "code": "online",
                        "name": "Online"
                    }
                ],
                "free": true,
                "resource_category": "learning_material",
                "resource_type": "podcast_episode",
                "podcast_episode": {
                    "id": 2449,
                    "transcript": "",
                    "episode_link": "https://sloanreview.mit.edu/audio/building-connections-through-open-research-metas-joelle-pineau",
                    "duration": "2015",
                    "rss": "<item>\n <title>\n  Building Connections Through Open Research: Meta’s Joelle Pineau\n </title>\n <link>\n  https://sloanreview.mit.edu/audio/building-connections-through-open-research-metas-joelle-pineau\n </link>\n <description>\n  Joelle Pineau’s curiosity led her to pursue a doctorate in engineering with a focus on robotics, which she describes as her “gateway into AI.” As vice president of AI research at Meta, Joelle leads a team committed to openness in the service of high-quality research, responsible AI development, and community contribution.\nIn this episode, Joelle, who is also a professor at McGill University, weighs the advantages industry and academia each have for conducting artificial intelligence research. She also describes specific AI research projects Meta is working on, including scientific discovery initiatives focused on addressing societal problems like carbon capture. Read the episode transcript here.\nGuest bio:\nJoelle Pineau is vice president of AI research at Meta and a professor at McGill University. Her research focuses primarily on developing new models and algorithms for planning and learning in complex, partially observable domains. She also applies these algorithms to robotics, health care, games, and conversational agents. Pineau serves on the board of the &lt;cite&gt;Journal of Artificial Intelligence Research&lt;/cite&gt; and the &lt;cite&gt;Journal of Machine Learning Research&lt;/cite&gt;. She has a bachelor’s degree in engineering from the University of Waterloo and master’s degree and doctorate in robotics from Carnegie Mellon University.\n Me, Myself, and AI is a collaborative podcast from MIT Sloan Management Review and Boston Consulting Group and is hosted by Sam Ransbotham and Shervin Khodabandeh. Our engineer is David Lishansky, and the coordinating producers are Allison Ryder and Andy Goffin.\nStay in touch with us by joining our LinkedIn group, AI for Leaders at mitsmr.com/AIforLeaders or by following Me, Myself, and AI on LinkedIn.\nWe encourage you to rate and review our show. Your comments may be used in Me, Myself, and AI materials.\n </description>\n <pubDate>\n  Tue, 25 Jun 2024 07:00:00 -0000\n </pubDate>\n <itunes:title>\n  Building Connections Through Open Research: Meta’s Joelle Pineau\n </itunes:title>\n <itunes:episodeType>\n  full\n </itunes:episodeType>\n <itunes:season>\n  8\n </itunes:season>\n <itunes:episode>\n  8\n </itunes:episode>\n <itunes:author>\n  MIT Sloan Management Review and Boston Consulting Group (BCG)\n </itunes:author>\n <itunes:image href=\"https://megaphone.imgix.net/podcasts/b0ad37e4-2754-11ef-9e47-276f6a2f4171/image/730be6c9cb033f01c6d89d6f5d025c0d.jpg?ixlib=rails-4.3.1&amp;max-w=3000&amp;max-h=3000&amp;fit=crop&amp;auto=format,compress\"/>\n <itunes:subtitle/>\n <itunes:summary>\n  Joelle Pineau’s curiosity led her to pursue a doctorate in engineering with a focus on robotics, which she describes as her “gateway into AI.” As vice president of AI research at Meta, Joelle leads a team committed to openness in the service of high-quality research, responsible AI development, and community contribution.\nIn this episode, Joelle, who is also a professor at McGill University, weighs the advantages industry and academia each have for conducting artificial intelligence research. She also describes specific AI research projects Meta is working on, including scientific discovery initiatives focused on addressing societal problems like carbon capture. Read the episode transcript here.\nGuest bio:\nJoelle Pineau is vice president of AI research at Meta and a professor at McGill University. Her research focuses primarily on developing new models and algorithms for planning and learning in complex, partially observable domains. She also applies these algorithms to robotics, health care, games, and conversational agents. Pineau serves on the board of the &lt;cite&gt;Journal of Artificial Intelligence Research&lt;/cite&gt; and the &lt;cite&gt;Journal of Machine Learning Research&lt;/cite&gt;. She has a bachelor’s degree in engineering from the University of Waterloo and master’s degree and doctorate in robotics from Carnegie Mellon University.\n Me, Myself, and AI is a collaborative podcast from MIT Sloan Management Review and Boston Consulting Group and is hosted by Sam Ransbotham and Shervin Khodabandeh. Our engineer is David Lishansky, and the coordinating producers are Allison Ryder and Andy Goffin.\nStay in touch with us by joining our LinkedIn group, AI for Leaders at mitsmr.com/AIforLeaders or by following Me, Myself, and AI on LinkedIn.\nWe encourage you to rate and review our show. Your comments may be used in Me, Myself, and AI materials.\n </itunes:summary>\n <content:encoded>\n  &lt;p&gt;Joelle Pineau’s curiosity led her to pursue a doctorate in engineering with a focus on robotics, which she describes as her “gateway into AI.” As vice president of AI research at Meta, Joelle leads a team committed to openness in the service of high-quality research, responsible AI development, and community contribution.&lt;/p&gt;&lt;p&gt;In this episode, Joelle, who is also a professor at McGill University, weighs the advantages industry and academia each have for conducting artificial intelligence research. She also describes specific AI research projects Meta is working on, including scientific discovery initiatives focused on addressing societal problems like carbon capture. Read the episode transcript &lt;a href=\"https://mitsmr.com/3XsUquf\"&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;Guest bio:&lt;/p&gt;&lt;p&gt;Joelle Pineau is vice president of AI research at Meta and a professor at McGill University. Her research focuses primarily on developing new models and algorithms for planning and learning in complex, partially observable domains. She also applies these algorithms to robotics, health care, games, and conversational agents. Pineau serves on the board of the &amp;lt;cite&amp;gt;Journal of Artificial Intelligence Research&amp;lt;/cite&amp;gt; and the &amp;lt;cite&amp;gt;Journal of Machine Learning Research&amp;lt;/cite&amp;gt;. She has a bachelor’s degree in engineering from the University of Waterloo and master’s degree and doctorate in robotics from Carnegie Mellon University.&lt;/p&gt;&lt;p&gt;&lt;strong&gt; &lt;/strong&gt;&lt;em&gt;Me, Myself, and AI&lt;/em&gt; is a collaborative podcast from &lt;em&gt;MIT Sloan Management Review&lt;/em&gt; and Boston Consulting Group and is hosted by Sam Ransbotham and Shervin Khodabandeh. Our engineer is David Lishansky, and the coordinating producers are Allison Ryder and Andy Goffin.&lt;/p&gt;&lt;p&gt;Stay in touch with us by joining our LinkedIn group, AI for Leaders at &lt;a href=\"https://cms.megaphone.fm/organizations/d9d31c72-668c-11ed-91bd-c3d63f9b708b/podcasts/f17e8dc0-0c38-11ec-bf53-9f4a62c5ca79/episodes/a079ef90-0ee4-11ee-8628-8776634095b4/mitsmr.com/AIforLeaders\"&gt;mitsmr.com/AIforLeaders&lt;/a&gt; or by following &lt;em&gt;Me, Myself, and AI&lt;/em&gt; on &lt;a href=\"https://www.linkedin.com/showcase/me-myself-and-ai/\"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;We encourage you to rate and review our show. Your comments may be used in &lt;em&gt;Me, Myself, and AI &lt;/em&gt;materials.&lt;/p&gt;\n </content:encoded>\n <itunes:duration>\n  2015\n </itunes:duration>\n <itunes:explicit>\n  no\n </itunes:explicit>\n <guid isPermaLink=\"false\">\n  me-myself-and-ai02788c41db63357d971e926ddca3bdb3: b0ad37e4-2754-11ef-9e47-276f6a2f4171\n </guid>\n <enclosure length=\"0\" type=\"audio/mpeg\" url=\"https://pdst.fm/e/chrt.fm/track/2481B9/traffic.megaphone.fm/AMMTO4810589869.mp3?updated=1718730321\"/>\n</item>\n"
                },
                "readable_id": "building-connections-through-open-research-metas-joelle-pineau8eb5d7a51ed433d2a5ac175fa287943e",
                "title": "Building Connections Through Open Research: Meta’s Joelle Pineau",
                "description": "Joelle Pineau’s curiosity led her to pursue a doctorate in engineering with a focus on robotics, which she describes as her “gateway into AI.” As vice president of AI research at Meta, Joelle leads a team committed to openness in the service of high-quality research, responsible AI development, and community contribution.\nIn this episode, Joelle, who is also a professor at McGill University, weighs the advantages industry and academia each have for conducting artificial intelligence research. She also describes specific AI research projects Meta is working on, including scientific discovery initiatives focused on addressing societal problems like carbon capture. Read the episode transcript here.\nGuest bio:\nJoelle Pineau is vice president of AI research at Meta and a professor at McGill University. Her research focuses primarily on developing new models and algorithms for planning and learning in complex, partially observable domains. She also applies these algorithms to robotics, health care, games, and conversational agents. Pineau serves on the board of the <cite>Journal of Artificial Intelligence Research</cite> and the <cite>Journal of Machine Learning Research</cite>. She has a bachelor’s degree in engineering from the University of Waterloo and master’s degree and doctorate in robotics from Carnegie Mellon University.\n&nbsp;Me, Myself, and AI&nbsp;is a collaborative podcast from&nbsp;MIT Sloan Management Review&nbsp;and Boston Consulting Group and is hosted by Sam Ransbotham and Shervin Khodabandeh. Our engineer is David Lishansky, and&nbsp;the coordinating&nbsp;producers are Allison Ryder and Andy Goffin.\nStay in touch with us by joining our LinkedIn group, AI for Leaders at mitsmr.com/AIforLeaders or by following Me, Myself, and AI on LinkedIn.\nWe encourage you to rate and review our show. Your comments may be used in Me, Myself, and AI materials.",
                "full_description": null,
                "last_modified": "2024-06-25T07:00:00Z",
                "published": true,
                "languages": null,
                "url": "https://pdst.fm/e/chrt.fm/track/2481B9/traffic.megaphone.fm/AMMTO4810589869.mp3?updated=1718730321",
                "professional": false,
                "next_start_date": null
            },
            "position": 0,
            "parent": 5987,
            "child": 6014
        },

I checked some other random podcasts and they also seem to have the old-style generated readable_id with the slugified name and a generated UUID at the end. I tried a search on podcasts with sortby=new to see if any of them had been updated with new IDs, and it doesn't seem so:

https://open-api.c4103.com/api/v1/learning_resources_search/?aggregations=resource_type&aggregations=certification_type&aggregations=learning_format&aggregations=department&aggregations=topic&aggregations=offered_by&aggregations=free&aggregations=professional&aggregations=resource_category&offset=0&q=podcast&resource_category=learning_material&sortby=new
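To spot-check which resources still carry the old format, a regular expression is one option. This sketch assumes, based only on the single readable_id in the JSON above, that old-style IDs are a lowercase slug followed directly by 32 hexadecimal characters. `looks_old_style` is a hypothetical helper, and the pattern could miss some legacy IDs or match a new-style ID that happens to have the same shape.

```python
import re

# Assumed old format (inferred from one example): a lowercase slug made of
# letters, digits and hyphens, immediately followed by 32 hex characters.
OLD_STYLE_ID = re.compile(r"[a-z0-9-]+?[0-9a-f]{32}")


def looks_old_style(readable_id: str) -> bool:
    """Return True if readable_id matches the assumed old slug+hex format."""
    return OLD_STYLE_ID.fullmatch(readable_id) is not None
```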

For what it's worth, I don't think readable_id is really supposed to mean "human readable" from a user perspective. I think it means more that the ID isn't just a number or GUID; part of it is readable, so it can be looked up or reconstructed. For example, when we integrate mit-open functionality into OCW, we plan to construct readable_id from the course metadata properties that make up OCW readable IDs.

Anyway, I'm not exactly sure why my IDs weren't updated. Any ideas?

@mbertrand (Member, Author)

@gumaerc did you run ./manage.py backpopulate_podcast_data again after switching to this branch? That is the step that creates new resources with the new ID format and unpublishes the old podcasts/episodes. It needs to run before the transfer_list_resources commands, which only update the learning paths/userlists that the old unpublished podcasts happen to be on.

However, I noticed the number of published podcasts may be off after running this. I'm looking into it now, so I'd recommend holding off on trying again until I sort that out. If you run backpopulate_podcast_data too many times, you may hit a GitHub rate limit exception (though using a VPN gets around this).

mitodl@67b0efc40d31:/src$ ./manage.py backpopulate_podcast_data
Started task 6c1b7629-b5f9-4a83-af14-0f91957f2ce6 to get podcast data
Waiting on task...
Population of podcast data finished, took 366.829355 seconds
mitodl@67b0efc40d31:/src$ ./manage.py transfer_list_resources podcast_episode url podcast podcast 
Migrate podcast_episode relationships from podcast to podcast, matching on url
Processed 1770 resources and found 1770 published matches, took 12.849384 seconds
mitodl@67b0efc40d31:/src$ ./manage.py transfer_list_resources podcast url podcast podcast
Migrate podcast relationships from podcast to podcast, matching on url
Processed 31 resources and found 3 published matches, took 0.06731 seconds

@mbertrand (Member, Author)

@gumaerc I made some changes: it turns out lots of podcast RSS feeds don't have a guid, so I switched to an RSS-URL-based readable_id instead.
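The URL-based approach can be illustrated with a small sketch. It assumes the ID is derived by hashing the podcast's RSS feed URL, combined with the episode's own URL for episodes; the function name, the SHA-256 choice and the `|` separator are all hypothetical, and whether the actual change hashes the URL or uses it directly is not stated here. The property it shows is that an ID derived from stable URLs comes out the same on every backpopulate_podcast_data run, even for feeds without a guid.

```python
import hashlib


def url_based_readable_id(feed_url, episode_url=None):
    """Hypothetical: derive a deterministic ID from RSS URLs.

    A podcast uses only its feed URL; an episode also mixes in its own URL,
    so the same feed and episode always produce the same ID.
    """
    parts = [feed_url.strip()]
    if episode_url:
        parts.append(episode_url.strip())
    # SHA-256 hex digest of the "|"-joined parts (64 hex characters)
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```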

@gumaerc (Contributor) left a comment


👍 Looks good after the latest changes. For what it's worth, I did my original testing wrong: I missed that I was supposed to run backpopulate_podcast_data again after switching to your branch, but that makes sense.

@mbertrand force-pushed the mb/podcast_ep_ids branch from 33a93bf to 9fbb9de on July 11, 2024 16:37
@mbertrand merged commit d9cbccc into main on Jul 11, 2024
This was referenced Jul 11, 2024
@mbertrand deleted the mb/podcast_ep_ids branch on October 23, 2024 12:52

Labels: Needs Review (an open Pull Request that is ready for review)

Projects: None yet


3 participants