# Chat Podcast

Author: Kenneth Leung

## 01. RSS Podcast Download
- Download metadata from RSS feed for podcast episodes
- Download audio files from MP3 URLs

### (1) Import dependencies

In [17]:
import pandas as pd
import requests
import feedparser
from urllib.request import urlopen

In [4]:
# URL to RSS - Me, Myself and AI Podcast (Public RSS Link)
rss_url = 'https://feeds.megaphone.fm/TPG7603691495'

In [5]:
# Fetch and parse the RSS feed (feedparser is already imported above)
feeds = feedparser.parse(rss_url)

In [6]:
feed = feeds.entries[3]
feed

{'title': "Helping Doctors Make Better Decisions With Data: UC Berkeley's Ziad Obermeyer",
 'title_detail': {'type': 'text/plain',
  'language': None,
  'base': 'https://feeds.megaphone.fm/TPG7603691495',
  'value': "Helping Doctors Make Better Decisions With Data: UC Berkeley's Ziad Obermeyer"},
 'links': [{'rel': 'alternate',
   'type': 'text/html',
   'href': 'https://sloanreview.mit.edu/audio/helping-doctors-make-better-decisions-with-data-uc-berkeleys-ziad-obermeyer'},
  {'length': '0',
   'type': 'audio/mpeg',
   'href': 'https://pdst.fm/e/chrt.fm/track/2481B9/traffic.megaphone.fm/AMMTO4998268065.mp3?updated=1676039895',
   'rel': 'enclosure'}],
 'link': 'https://sloanreview.mit.edu/audio/helping-doctors-make-better-decisions-with-data-uc-berkeleys-ziad-obermeyer',
 'summary': 'When Ziad Obermeyer was a resident in an emergency medicine program, he found himself lying awake at night worrying about the complex elements of patient diagnoses that physicians could miss. He subsequent

### (2) Extract all metadata

In [7]:
df_metadata = pd.DataFrame()

In [8]:
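# Collect one record per episode, then build the DataFrame in a single call
# (avoids repeatedly concatenating onto an empty DataFrame inside the loop)
records = []
for feed in feeds.entries:
    records.append({'UID': feed['id'],
                    'Title': feed['title'],
                    # Second link is the audio enclosure; drop the query string
                    'URL': feed['links'][1]['href'].split('?')[0],
                    'Date': feed['published'],
                    'Summary': feed['summary']})
df_metadata = pd.DataFrame(records)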
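
The audio URL of an entry sits in its `links` list, in the item whose `rel` is `enclosure` (see the sample entry in section (1)). As a minimal sketch, assuming every entry follows that structure, a hypothetical helper `enclosure_url` (not part of the original notebook) could look the link up by its `rel` value rather than by its position in the list. The sample dictionary below copies only the `links` field from the entry shown earlier.

```python
def enclosure_url(entry):
    """Return the enclosure URL of a feed entry without its query string.

    Hypothetical helper: returns None when the entry has no enclosure link.
    """
    for link in entry.get('links', []):
        if link.get('rel') == 'enclosure':
            # Drop tracking parameters such as '?updated=...'
            return link['href'].split('?')[0]
    return None


# 'links' field copied from the sample entry printed in section (1)
sample_entry = {
    'links': [
        {'rel': 'alternate',
         'type': 'text/html',
         'href': 'https://sloanreview.mit.edu/audio/helping-doctors-make-better-decisions-with-data-uc-berkeleys-ziad-obermeyer'},
        {'length': '0',
         'type': 'audio/mpeg',
         'href': 'https://pdst.fm/e/chrt.fm/track/2481B9/traffic.megaphone.fm/AMMTO4998268065.mp3?updated=1676039895',
         'rel': 'enclosure'},
    ]
}

print(enclosure_url(sample_entry))
```

Returning `None` for entries without an enclosure makes it possible to skip such episodes explicitly instead of failing on a missing list index.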

In [9]:
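# Parse the publication date strings into datetimes
df_metadata['Date'] = pd.to_datetime(df_metadata['Date'])
# Replace colons in titles (the titles are reused as file names in section (3))
df_metadata['Title'] = df_metadata['Title'].str.replace(':', ' -', regex=False)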
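
Because the titles are later used as MP3 file names, other characters besides the colon could also cause trouble on some file systems. The following is a minimal sketch, assuming the goal is file names that are valid on Windows as well as Unix-like systems; `safe_filename` is a hypothetical helper, not part of the original notebook.

```python
import re


def safe_filename(title):
    """Turn an episode title into a string that is safe to use as a file name."""
    # Same substitution as the notebook: a colon becomes ' -'
    cleaned = title.replace(':', ' -')
    # Remove the remaining characters that Windows rejects in file names
    cleaned = re.sub(r'[<>"/\\|?*]', '', cleaned)
    return cleaned.strip()


print(safe_filename("Helping Doctors Make Better Decisions With Data: UC Berkeley's Ziad Obermeyer"))
```

It could be applied to the whole column with `df_metadata['Title'].map(safe_filename)`.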

In [10]:
df_metadata

| | UID | Title | URL | Date | Summary |
|---|---|---|---|---|---|
| 0 | 24094f6e-af17-11ed-893a-5fb27a12c51b | A One-Stop Data Shop - The Lego Group’s Anders... | https://pdst.fm/e/chrt.fm/track/2481B9/traffic... | 2023-03-28 07:00:00+00:00 | Anders Butzbach Christensen began his career i... |
| 1 | 63697ace-8c55-11ed-9e17-57edbf799c14 | A Third Path to Talent Development - Delta’s M... | https://pdst.fm/e/chrt.fm/track/2481B9/traffic... | 2023-03-14 07:00:00+00:00 | Michelle McCrackin, senior manager of analytic... |
| 2 | 346f76cc-9cf6-11ed-8601-7f5de014a9b1 | Out of the Lab and Into a Product - Microsoft’... | https://pdst.fm/e/chrt.fm/track/2481B9/traffic... | 2023-02-28 08:00:00+00:00 | As a partner with OpenAI — the company that re... |
| 3 | b071a528-8096-11ed-ba62-a3fabfa7610e | Helping Doctors Make Better Decisions With Dat... | https://pdst.fm/e/chrt.fm/track/2481B9/traffic... | 2023-02-14 08:00:00+00:00 | When Ziad Obermeyer was a resident in an emerg... |
| 4 | 1fc1fd00-7b20-11ed-a69d-af6f0040618e | Bonus Episode - How Encouraging AI Use Will Be... | https://pdst.fm/e/chrt.fm/track/2481B9/traffic... | 2023-01-10 08:00:00+00:00 | While Me, Myself, and AI is on winter break, w... |
| 5 | 37de4bc2-725c-11ed-a707-a3a8b9b7d75a | Bonus Episode - Learn to Make the Most of Your... | https://pdst.fm/e/chrt.fm/track/2481B9/traffic... | 2022-12-06 09:00:00+00:00 | While Me, Myself, and AI is on winter break, w... |
| 6 | c5bf0f9a-4722-11ed-8cda-9b328f07e86b | Digital First, Physical Second - Wayfair’s Fio... | https://pdst.fm/e/chrt.fm/track/2481B9/traffic... | 2022-11-08 11:00:00+00:00 | With a background in building enterprise platf... |
| 7 | 63ad275e-4008-11ed-9b78-676d7d28b461 | Investing in the Last Mile - PayPal’s Khatereh... | https://pdst.fm/e/chrt.fm/track/2481B9/traffic... | 2022-10-25 10:00:00+00:00 | Khatereh (KK) Khodavirdi is focused on using A... |
| 8 | db572508-23c9-11ed-a4e5-873e49641731 | Keeping Humans in the (Feedback) Loop - Orange... | https://pdst.fm/e/chrt.fm/track/2481B9/traffic... | 2022-10-11 10:00:00+00:00 | Ameen Kazerouni, chief data and analytics offi... |
| 9 | 312f6bfa-1986-11ed-a6f9-2f34ce8db758 | The Three Roles of the Chief Data Officer - AD... | https://pdst.fm/e/chrt.fm/track/2481B9/traffic... | 2022-09-27 10:00:00+00:00 | As chief data officer of payroll and benefits ... |


In [11]:
# df_metadata.to_csv('../podcast_metadata_raw.csv', index=False, 
#                    encoding = 'utf-8-sig')

___
### (3) Download MP3 Files

In [18]:
titles = list(df_metadata['Title'].values)
urls = list(df_metadata['URL'].values)

In [21]:
for title, url in zip(titles, urls):
    print(f'Downloading episode: {title}')
    
    with urlopen(url) as file:
        content = file.read()
        
    # Save to file
    with open(f"{title}.mp3", 'wb') as download:
        download.write(content)

Downloading episode: A One-Stop Data Shop - The Lego Group’s Anders Butzbach Christensen
Downloading episode: A Third Path to Talent Development - Delta’s Michelle McCrackin
Downloading episode: Out of the Lab and Into a Product - Microsoft’s Eric Boyd
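
The loop above reads each episode fully into memory before writing it to disk. As a sketch of an alternative, assuming memory use matters for long episodes, the response can be streamed to the file in chunks with `shutil.copyfileobj`. The helper name `download_file` and its `chunk_size` parameter are hypothetical and not part of the original notebook.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def download_file(url, destination, chunk_size=1024 * 1024):
    """Stream the content at `url` into `destination`, one chunk at a time."""
    destination = Path(destination)
    with urlopen(url) as response, open(destination, 'wb') as out_file:
        # Copies chunk_size bytes at a time instead of reading everything at once
        shutil.copyfileobj(response, out_file, chunk_size)
    return destination


# Possible use with the lists built above (commented out to avoid network access):
# for title, url in zip(titles, urls):
#     print(f'Downloading episode: {title}')
#     download_file(url, f'{title}.mp3')
```

The function returns the destination path, so the caller can log or verify each saved file.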
