# Nick Babcock
<b>Building an On-Demand Podcast Episode Summaries Application</b>
<p> This notebook builds an application that gives users a weekly summary of each podcast episode released that week. The application summarizes every episode of the chosen podcast, producing a personalized "newsletter" that helps the user decide whether an episode is of interest. This saves the time otherwise spent scrolling through pages of episode descriptions provided by the podcast itself: instead of clicking on each episode, the user enters the podcast they wish to explore and receives an on-demand, personalized summary of its episodes in one place.

## Approach

This project explores how to build and deploy LLM applications that are of value to many users.<p>
The approach to building this product is divided into three parts:

- Part 1: use a speech-to-text model to transcribe the podcast, paired with a Large Language Model (LLM) from OpenAI to build the information extraction functionality
- Part 2: use a cloud deployment provider to convert the information extraction function to run on demand (the app's backend)
- Part 3: use GPT-3.5 from OpenAI to create and deploy a front end that lets users experience the end-to-end functionality

## Part 1 - Podcast Transcription and Information Extraction

- This application can be built using a podcast of your choice, as long as you can access its RSS feed. The RSS feed contains up-to-date information published by the podcast, such as episode headlines, summaries, and links to the audio files.
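
To make the structure of an RSS feed concrete, here is a minimal sketch that parses a small, made-up feed with Python's standard library. The feed contents, the URLs, and the helper name `list_episodes` are illustrative assumptions; the notebook itself relies on `feedparser` for this step.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS document (not a real podcast feed).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 2</title>
      <description>Second episode summary.</description>
      <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Episode 1</title>
      <description>First episode summary.</description>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="456"/>
    </item>
  </channel>
</rss>"""


def list_episodes(rss_text):
    """Return a list of (title, description, audio_url) tuples, in feed order."""
    root = ET.fromstring(rss_text)
    episodes = []
    for item in root.iter("item"):
        # The <enclosure> element carries the link to the audio file.
        enclosure = item.find("enclosure")
        audio_url = enclosure.get("url") if enclosure is not None else None
        episodes.append((item.findtext("title"), item.findtext("description"), audio_url))
    return episodes


for title, description, audio_url in list_episodes(SAMPLE_RSS):
    print(title, "|", description, "|", audio_url)
```

Feeds usually list the newest episode first, which is why the notebook reads the first entry to get the latest episode.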

### <b> Retrieve the audio file using the podcast's RSS feed </b>

In [25]:
!pip install feedparser

Collecting feedparser
  Downloading feedparser-6.0.10-py3-none-any.whl (81 kB)
Collecting sgmllib3k (from feedparser)
  Downloading sgmllib3k-1.0.0.tar.gz (5.8 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: sgmllib3k
  Building wheel for sgmllib3k (setup.py) ... done
  Created wheel for sgmllib3k: filename=sgmllib3k-1.0.0-py3-none-any.whl size=6047 sha256=51e2f9e2c2a6bc76bf7d035e48b96a25ac5c6752433daad6d8aa7e8f68ad1f71
  Stored in directory: /root/.cache/pip/wheels/f0/69/93/a47e9d621be168e9e33c7ce60524393c0b92ae83cf6c6e89c5
Successfully built sgmllib3k
I

In [26]:
# Enter the RSS feed URL of the selected podcast here.

import feedparser
podcast_feed_url = "https://feeds.megaphone.fm/locked-on-mets"
podcast_feed = feedparser.parse(podcast_feed_url)

In [27]:
print ("The number of podcast entries is ", len(podcast_feed.entries))

The number of podcast entries is  1175


In [28]:
# Get the URL of the latest podcast episode and download the MP3 file to the
# Colab runtime's local disk

for item in podcast_feed.entries[0].links:
  if item.get('type') == 'audio/mpeg':
    episode_url = item.href
    break  # stop at the first MP3 link
!wget -O 'podcast_episode.mp3' "{episode_url}"

--2023-08-27 03:01:39--  https://www.podtrac.com/pts/redirect.mp3/chtbl.com/track/39A2A2/traffic.megaphone.fm/LKN6318206367.mp3?updated=1693026352
Resolving www.podtrac.com (www.podtrac.com)... 54.75.137.72, 54.77.159.220
Connecting to www.podtrac.com (www.podtrac.com)|54.75.137.72|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://chtbl.com/track/39A2A2/traffic.megaphone.fm/LKN6318206367.mp3?updated=1693026352 [following]
--2023-08-27 03:01:39--  https://chtbl.com/track/39A2A2/traffic.megaphone.fm/LKN6318206367.mp3?updated=1693026352
Resolving chtbl.com (chtbl.com)... 13.227.219.76, 13.227.219.22, 13.227.219.12, ...
Connecting to chtbl.com (chtbl.com)|13.227.219.76|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://traffic.megaphone.fm/LKN6318206367.mp3?updated=1693026352 [following]
--2023-08-27 03:01:39--  https://traffic.megaphone.fm/LKN6318206367.mp3?updated=1693026352
Resolving traffic.megaphone.fm (traffic.m
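
The loop above assumes that the newest entry has a link of type `audio/mpeg`. A slightly more defensive variant is sketched below. The helper name `find_audio_url` and the sample link dictionaries are hypothetical; the dictionaries only mimic the `type` and `href` keys that the loop reads.

```python
def find_audio_url(links, preferred_type="audio/mpeg"):
    """Return the href of the first audio link, preferring `preferred_type`.

    `links` is a list of dict-like objects with optional 'type' and 'href' keys.
    Returns None when no audio link is present.
    """
    fallback = None
    for link in links:
        link_type = link.get("type") or ""
        href = link.get("href")
        if not href:
            continue
        if link_type == preferred_type:
            return href
        # Remember the first link of any other audio type in case no MP3 exists.
        if link_type.startswith("audio/") and fallback is None:
            fallback = href
    return fallback


# Hypothetical links, shaped like the ones read in the loop above.
sample_links = [
    {"type": "text/html", "href": "https://example.com/episode-page"},
    {"type": "audio/mpeg", "href": "https://example.com/episode.mp3"},
]
print(find_audio_url(sample_links))
```

With the notebook's feed this would be called as `find_audio_url(podcast_feed.entries[0].links)`. Checking for `None` before downloading avoids a `NameError` when an entry has no audio enclosure.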

### <b> Transcribe the audio file </b>

- Use the OpenAI speech-to-text model `medium` from the <b> Whisper </b> package to transcribe the podcast's audio to text

In [7]:
!pip install git+https://github.com/openai/whisper.git  -q

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for openai-whisper (pyproject.toml) ... done


In [8]:
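%%time
# Download the Whisper `medium` model weights (if not already downloaded) and store them on the Colab disk.
# Note: `whisper._download` and `whisper._MODELS` are underscore-prefixed (private) names and may change between releases.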

import pathlib
import whisper

model_path = pathlib.Path("/content/podcast/medium.pt")
if model_path.exists():
  print ("Model has already been downloaded")
else:
  print ("Begin download of Whisper Model")
  whisper._download(whisper._MODELS["medium"], '/content/podcast/', False)

Begin download of Whisper Model


100%|██████████████████████████████████████| 1.42G/1.42G [00:14<00:00, 106MiB/s]


CPU times: user 6.29 s, sys: 6.81 s, total: 13.1 s
Wall time: 26.5 s


In [29]:
# Load the model

model = whisper.load_model('medium', device='cuda', download_root='/content/podcast/')

In [30]:
%%time
# Pass the model the path of the podcast file to generate the transcript

result = model.transcribe("/content/podcast_episode.mp3")

CPU times: user 4min 10s, sys: 1.25 s, total: 4min 12s
Wall time: 4min 24s


In [38]:
# Extract the transcript text from the transcription result

podcast_transcript = result['text']
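
Besides the full text, the dictionary returned by `transcribe` also carries a `segments` list whose items have `start` and `end` times (in seconds) and a `text` field. As a minimal sketch, the helpers below format such segments as timestamped lines, which can help when spot-checking a transcript. The function names and the mock result are illustrative assumptions, not part of the original notebook.

```python
def format_timestamp(seconds):
    """Format a duration in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def timestamped_lines(result):
    """Turn a Whisper-style result dict into '[start - end] text' lines."""
    lines = []
    for segment in result.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {segment['text'].strip()}")
    return lines


# Mock result with the same shape as Whisper's output (values are made up).
mock_result = {
    "text": " Hello and welcome. Today we talk baseball.",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
        {"start": 2.5, "end": 65.0, "text": " Today we talk baseball."},
    ],
}

for line in timestamped_lines(mock_result):
    print(line)
```

In the notebook, `timestamped_lines(result)` could be printed for the first few segments instead of scanning the full transcript string.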

In [40]:
# Display the transcript to verify that the transcription was successful. To avoid re-running
# the cells above after the notebook times out, the transcript is then saved to a local variable

podcast_transcript

" It's the Locked On Podcast Network, your team every day. Football season is about to kick off and FanDuel is giving you the chance to win all season long because right now when you've been on a Super Bowl winner you can get bonus bets every time they win in the regular season. Just pick any team to win the Super Bowl and yes you'll get bonus bets for every victory. You can use your bonus bets on spreads, player props, over-unders and more. So visit FanDuel.com slash Locked On and start earning bonus bets with America's number one sports book. That's FanDuel.com slash Locked On. Who are the top ten prospects in the Mets farm system? I'm breaking down my list on today's edition, Locked On Mets. You are Locked On Mets, your daily New York Mets podcast. Part of the Locked On Podcast Network, your team every day. Hello to all you amazing Mets fans who are listening to Locked On Mets. Part of the Locked On Podcast Network, your team every day. Thank you for making Locked On Mets your first

In [None]:
# Saving transcript to a local variable

podcast_transcript = """ It's the Locked On Podcast Network. Your team every day. This is Corbin Smith of the Locked On Seahawks Podcast. US Cellular knows how important your kids' relationship with technology is, and they've made it their mission to help them establish good digital habits early on. That's why they've partnered with Screen Sanity, a nonprofit dedicated to helping kids navigate the digital landscape. And for a smarter start to the school year, US Cellular is also offering a free basic phone on new eligible lines, providing an alternative to a smartphone for children. Start smart with US Cellular. Visit uscellular.com slash built for us to find out more. Terms of clock. Metz outfielder DJ Stewart is turning heads coming off a great series this week against the Pittsburgh Pirates. Could he actually stick around beyond 2023? We'll discuss that on today's edition of Locked On Metz. You are Locked On Metz, your daily New York Metz podcast. Part of the Locked On Podcast Network. Your team every day. Hello to all you amazing Metz fans who are listening to Locked On Metz. Part of the Locked On Podcast Network. Your team every day. Thank you for making Locked On Metz your first listen every day. Locked On Metz is free and available on all platforms including YouTube. The Metz won a series on Wednesday in the first segment today. I will discuss a nice win and really DJ Stewart's role in it. He had a breakout game where he put his name on the map. So I want to talk about that. Then in the second segment, go a little bit more into his background if you're not aware where Stewart came from. And if there's a chance he can stick with the Metz beyond 2023. Then final segment, talk about the Metz versus the Cardinals and potentially more likely the last time the Metz ever had to face Adam Wainwright. What that could look like and then we'll take a little trip around the minor league affiliates. Before we get to any of it though, I'm HerosRyan Ficklestein. 
If you want to find any of my work, follow me on Twitter at FicklesteinRyan. You can also find some of my writing at JustBaseball.com where I work as the managing editor. Today we're going to lead off with a game recap because guess what? The Metz won a series. But really there's a story that came out of this game that's actually a feel-good one that maybe means something to this franchise moving forward. And that is the breakout of DJ Stewart. Now this does not mean that because of one two home run game, suddenly DJ Stewart is the starting right fielder for the New York Mets moving forward. But when you look at a team with the Rafael Ortega's of the world, Danny Mendick, Jonathan Aruz, there's a lot of guys that are playing baseball right now that you don't expect to be long term fixtures with the Mets or even guys that wear a Mets uniform beyond this season. Guys that are just eating innings for the Mets in a lost year. DJ Stewart is a tier above all those guys. Whether he's going to be a part of the 2024 Mets or not, which we will discuss in the next segment, he certainly looks like a big leaguer. He looks like a guy that could have a couple more years of big league service time ahead of him where he can go out and get some major league checks. And as he talked about the game, buy a lot of diapers for his daughter. As he said, diapers are expensive. That's what keeps him going. He's had a rough time over the last couple of years dealing with injuries and he's really having a nice year overall at the Mets. He was good in Syracuse and now he's putting up some good at bats with the big league club. But really, he was the reason they won this game and this series. He was great. And it was just a nice day Mets baseball game. For what should I go? It's a good watch. You can see your team go out and win. 
And I know there's starting to be contingent of Mets fans that are rooting for losses that are looking at the lottery and the Pirates are a team that now is ahead of them in the lottery. But for me, I feel like that stuff's going to handle itself. I just like watching the Mets win baseball games and I like watching a starting pitcher not be completely awful. And while I don't believe that Tyler McGill has suddenly turned a corner instead, I think he was just playing a bad team. But hey, he got through five innings and for that you deserve a steak dinner at this point. He gave up two runs, was a two run homer, five hits allowed, walked four so it wasn't pretty. But he got through it and thank you to the Pirates for allowing him to do that. DJ Stewart hit his first home run to get the Mets on the board in the second inning, a solo shot. The Mets actually scored a couple more runs that inning as Omar Nerviah is doubled. Rafael Ortega and Brandon Nemo each drew walks and then Francisco Lindor drove in a pair with a bases loaded RBI single. The three nothing Mets lead would hold for a very short time. I mean, the lead held, but that wide lead was short and quickly as Tyler McGill gave up that two run homer in the following inning. But the Mets got a run back on a Brandon Nemo single. They were up four to two and then DJ Stewart did something on the defensive side for the Mets. The Pirates had a man on first. It was Andrew McCutcheon, Jack Sawinski hit a double into the corner and DJ Stewart executed a perfect relay throw to Jeff McNeil, who made it even better throw home to gun down McCutcheon at the plate and get McGill out of that inning and out of the game. And then bottom half of the fifth, Stewart comes up, hits another home run. And the thing about these home runs from DJ Stewart, they aren't just the wall scrapers. He's hitting bombs. He's hitting the ball with authority. And it really makes you wonder if there's some staying power here. 
Now look at the rest of the game. Not that it matters too much. Phil Bickford gave up a run in the sixth. The Mets end up getting a couple back in the seventh. Pino Lanzo hits his thirty sixth home run of the season. And then Rafael Ortega has an RBI single. The Mets end up winning the game eight to three. But again, the big story of the day is Stewart, a guy that obviously had a great game. And when you have one great game, you've only played twenty five games in the season. Your numbers are going to shoot through the roof. But still, this is a man that hit sixteen home runs and fifty one games since Syracuse and now has four in the big leagues. And he looks the part. He looks athletic enough. He can play a corner spot in the outfield. And I think when you just try to zoom out a bit and think about the Mets next year and guys that you want to keep around and you're watching this team play right now, it really comes down to this for me. If you could tender a contract to one of these two guys, who are you picking? Going into next year, would you rather see DJ Stewart back or Daniel Vogel back back? Because Vogel back, you could keep them. You could tender them a contract, bring them on back down. And go through another year where he clogs up your spot and he's a great guy in the clubhouse. But the guy doesn't play a position that can't hit left handed pitching. DJ Stewart at least can play a spot in the outfield. He is certainly more athletic and the power to me looks better than that of Vogel back. And guess what? Stewart walks a lot, too. So when I look at the 2024 Mets, I'm starting to see a future where DJ Stewart could be a fourth outfielder for this team. And while it's not the biggest thing to find in the world in these final six, seven weeks of baseball, it's not nothing. And I want to talk about where Stewart came from a little bit more, get you some more details on his background if you're not aware of how he got to the Mets and what he did prior. 
And if there is actually something here at the Mets have found because guess what? Sometimes there are guys that just double upon that end up having a pretty nice impact for your franchise for a couple of years. And maybe they found something here with Stewart. So we're going to discuss all of that in a minute before we do, though. Today's episode is brought to you by Nutri-Full. Nutri-Full is the number one dermatologist recommended hair growth supplement clinically shown to improve your hair growth. Visible thickness and visible scalp coverage. Nutri-Full uses physician formulated natural science based ingredients. Their drug free patented technology provides consistent reliable results without compromising your sexual health. Go to Nutri-Full.com slash men to take their hair health wellness quiz to identify the causes of your thinning hair. And Nutri-Full will give you a personalized plan for better hair health through whole body wellness. Nutri-Full supports healthy hair growth from within by targeting the root causes of your thinning, such as stress, hormones, environment, nutrition, lifestyle and metabolism through whole body health. And it works in a clinical study. Eighty four percent of men showed improvement in their hair after six months taking Nutri-Full's men's hair growth supplements. Take the first step to visibly thicker, healthier hair. And for a limited time, Nutri-Full is offering our listeners ten dollars off your first month subscription and free shipping when you go to Nutri-Full.com slash men and enter the promo code locked on MLB. Find out why over four thousand health care professionals recommend Nutri-Full for healthier hair by going to Nutri-Full.com slash men. That's spelled N U T R A F O L dot com slash men. That's your promo code locked on MLB. That's Nutri-Full dot com slash men promo code locked on MLB. I am Jeff Carr from Lockdown Reds. 
You know, at the end of the first year, Discover credit cards automatically double all the cash back that you've earned. That's right. Everything that you've earned doubled all the cash back from eating at your favorite soup, dumpling restaurant, doubled all the cash back from that trip that you sort of learned how to snowboard, also doubled. And the best part, you don't have to do anything ridiculous to get it. Nope. Discover does it automatically. Seriously, though, see terms and check it out for yourself at Discover dot com slash match. The New York Mets put the St. Louis Cardinals tonight at seven fifteen Eastern Time. Catch every pitch in the Mets hometown broadcast with Sirius XM on the SXM app. Just search Mets. I want to give you a little more background on DJ Stewart. For those of you who aren't aware about where he was drafted or his playing career up to the point where now he is with the New York Mets. He was a first round pick back in 2015 by the Baltimore Orioles. There is a connection there with Buck Show Walter in his last season at Florida State before getting drafted. DJ Stewart hit three eighteen. He got on base at a five hundred clip and he slugged at a five ninety four clip with fifteen home runs, twelve stolen bases and fifty nine RBI and sixty four games. He ended up making his MLB debut in twenty eighteen playing seventeen games down the stretch for Buck Show Walter before he was let go by the Orioles. He posted an eight ninety OPS in those first forty seven plate appearances. Now twenty nineteen he was up and down forty four games in the big leagues. He only had a six ninety eight OPS but in sixty three games in triple A. Fared much better at two ninety one three ninety six on base five forty eight slug twelve home runs. Nineteen doubles. Twenty twenty he played about half of the sixty game season for the Orioles played in thirty one games had an eight oh nine OPS. So he was pretty solid. 
Twenty twenty one was his one real full year in the big leagues. Hundred games played hit two oh four had a three twenty four on base three seventy four slug six ninety eight OPS twelve home runs ten doubles thirty three RBI and three hundred eighteen plate appearances. Now last year he was hurt. He only played three games with the Orioles five games rehabbing in the Florida Complex League and then twenty nine games in triple A where he did have an eight seventy OPS this year. There's been some injuries too. He spoke about those a bit after the game today but in fifty one games in Syracuse as I already mentioned he had sixteen home runs. He had two twenty nine three sixty two on base five sixteen slug and the big thing for me is the plate discipline. He walked fifteen point three percent of the time struck out twenty point one percent of time with the Mets this year. He's walked twelve point three percent of time struck out twenty four point six percent of the time. Small sample size is twenty five games played but the numbers are solid right now. He's hitting two thirty four three thirty nine on base five thirty two slug got the OPS in the high eight hundreds. That's really good. And his WRC plus is one thirty seven. Now that again matters hitters based on a league average of one hundred. So he's thirty seven percent better than your league average hitter. That's an amazing number and it is largely ballooned by a two home run game because he went into the day with a one oh four WRC plus. But you hit two home runs and you're just barely over 50 played appearances on the season. He's at was it fifty seven right now. Yeah that's going to really really juice the stats. But you look at his career in the big leagues he's now played two hundred nineteen big league games six hundred and seventy nine or I guess two hundred and twenty. You add in today so you add in today changes numbers is before today if you look at fan graphs to comprise his whole big league career. 
It was two hundred nineteen games played six hundred and seventy five played appearances. He said two twelve three twenty seven on base four hundred slug. So that's a seven twenty seven OPS. That's not bad. His WRC plus at ninety nine about a league average hitter. He's walked in thirteen point two percent of his big league played appearances struck out twenty six point eight percent of the time. So that's his whole big league career. That is the full season in twenty twenty one. That's parts of twenty eighteen twenty nineteen twenty twenty and then what he's done in the small sample size this year. This is a guy that still hasn't really gotten his feet wet and really established himself as a big leader. But he is a former first round pick who clearly has town. I mean you watch him play you think all right there is something here. He also tore the cover off the ball in spring training too I might add. I think the Mets might have something here in D.J. Stewart is not to say OK I'm swayed by a single game but I do think if he can stay healthy down the stretch I want him in a lineup every single day over Rafael Ortega over Danny Mendick. I know they don't all play the same positions but my larger point is get D.J. Stewart in there let him face lefties. Let's just see because if you look at the Mets 40 man roster heading into next year I could see him on it. I really could and I mentioned the Daniel Vogelbach not comparison but just the thought exercise. If you go into next season you have to tender one of them a contract. Who do you tender a D.J. Stewart to me seems like a much more valuable player to a roster off the bench. I would hope that neither of those guys is penciled into a starting lineup on opening day and if they are the Mets got some issues. But you go into next season you think about the outfielders come opening day. Stale Marte so many he gets healthy and you hope he's more like the 2022 version himself. He's going to be the starting right fielder. 
Brandon was going to be the starting center fielder left fielders now gaping hole. Mark Hanna if he had stuck around they didn't trade him. Maybe the Mets would have picked up his club option. He's gone. We'll see maybe the Brewers don't and you can reunite with can unfree agency. Even then I don't know if fans would be necessarily happy with Mark Hanna being the opening day left fielder. Now question would be how do the young players factor in at this point unless something changes drastically over the next six and a half weeks. Whatever it is in the season. I don't think it's prep baby or Ronnie Mauricio. That's pencil into that spot. Can those guys win jobs and win playing time in spring training? Yes and are the Mets at the team that would leave that door open for one of those guys to grab some spots in spring training to grab some at bats. I think there's a chance that they are that team that's maybe not as all in and free agency as they've been in years past. But I still think they're going to add a pretty significant outfielder to start next year in free agency. I imagine someone's going to get added but are they going to add two guys? Maybe I don't know. I could see them going into next season with DJ Stewart as the fourth outfielder and you know you basically just see exactly how the season plays out. Stewart could be with him the whole year or he could start on the opening day roster and you know who knows maybe Drew Gilbert finds his way up to the big leagues at some point next season. Maybe Luis and Hala Cunha finds his way up and he's playing some outfield. We have no idea what the 2020 formats are going to look like. It is way too early to be really thinking about the opening day roster next year. 
But from what I've seen in the short sample size of all the guys you're going to watch play baseball down the stretch here, DJ Stewart is one to have your eye trained to a little bit more because there is a chance here that the Mets have found a guy that can be in their outfield rotation for the next couple of seasons. You have control of him if you want it up until 2027. You have three arbitration years if you want them. One attendor in the contract that is on the table for the Mets. And again based on the fact that he's a former first round pick who's put up pretty good numbers throughout his minor league career who has been about a league average hitter when he's been at the big league level. I think the Mets could probably do a lot worse than DJ Stewart moving forward. So I'm looking forward to seeing if he can keep this up and have a nice little finish to the season for the Mets and maybe earn himself a job next year. So we'll see what it looks like. But he's definitely the story of the day and it finally gave us something positive to talk about on the big league diamond which we just haven't had in a long time. Now the question is can the New York Mets leave a great final impression on Adam Wainwright because this will likely be the last time you ever have to watch Adam Wainwright pitch against the New York Mets in their series against the Cardinals this weekend. Well really this game will be on Thursday and I want to preview that match up a little bit and also take a trip around the minor league affiliate. 
So we're going to do all of that just a minute before we do these episodes brought to you by sleeper want the chance to win more money with less picks had to sleep or where you can win a hundred times your money on just two or more fancy baseball picks sleepers now offering up to a hundred times payout for their eight pick contest where you can choose as many as eight players that you like and pick more or less on your favorite baseball stats like home runs strikeouts hits and more get your picks right and you could win big there's built-in group chat functionality. Sleeper where you can see copier groups picks with the tap of a button and just can be made in 30 seconds or less it's that easy and they're safe and fast withdrawals so if you want to get in on the actual sleeper use the promo code locked on you'll get up to a hundred dollar match on your first deposit terms and conditions apply see sleepers terms of use for details currently operational in over 30 states check out sleeper today. NFL Sunday ticket is now on YouTube and YouTube TV which means that it just got easier to be an NFL fan even if you live far away like maybe you like the Bears but you're hibernating in Panthers territory but with NFL Sunday ticket you're out of market team is never more than a short distance away specifically the distance from you to remote control NFL Sunday ticket now on YouTube and YouTube TV go to YouTube dot com slash pre sale to get fifty dollars off terms and embargoes apply offer ends 919 no refunds no free tickets. The New York Mets play the St. Louis Cardinals 715 Eastern time tonight catch every pitch in the Mets hometown broadcast with Sirius XM on the SXM app just search Mets. 
Now if you're going to watch one game in this series against the Cardinals it's tonight's matchup it's Jose Kitana pitching against the team that he finished last year with first Adam Wainwright and this year Adam Wainwright has been an absolute disaster probably should not have come back for his age 41 season the velocity has been down and he has just looked awful last year he was great pitchers and he was a great player and he was a great player. He's been a great pitcher with three seven one ERA across 32 starts racked up a hundred and ninety one and two third innings pitched this year in 15 starts he's gotten eight seven eight ERA in 66 and two third innings he is three and seven he's one of the reasons why the Cardinals have been so bad this year and you look at his last three starts. He gets the Cubs on July 29th got through six innings gave up for earned August 4th against the Rockies gave up seven earned on nine hits pitch only three innings last start out against the Kansas City Royals Adam Wainwright in one inning gave up nine hits. And eight earned runs no walks no strikeouts woof we all know Adam Wainwright's history with the Mets I don't need to get into it but 20 20 geez 2006 and LCS a young Wainwright did something that we all like to forget now here we are how many years later is that cheese 2006 is it? 17 years later and he's clearly on his last legs finishing off what has been a great big league career and now he gets the pitch against this Mets team the Royals can knock him up for eight earned you'd hope the Mets could beat up Adam Wainwright and honestly I just don't know it's actually the Fox game why are the Mets playing so well? 
I guess you went into the season thinking oh middle of August Mets Cardinals that'll be a great matchup but man I don't know it's gonna be interesting to say the least you would think the Mets should win that game but I am just fascinated to enjoy it either way to either enjoy the Mets beating up Adam Wainwright on his way out or for Adam Wainwright to embarrass the Mets and I just get to laugh because it's better to laugh than cry. So we'll see how that one shakes out the rest of the series we don't know who's gonna pitch on Friday against Zach Thompson for the Cardinals left-handed pitcher with the 396 ERA this year the Mets are giving Kodai Senga an extra day rest and I imagine they do that throughout the rest of the season no reason to push Senga you know let him finish the year strong and always pitch with that extra day rest. Next year you can go back to trying to get him to pitch on regular rest but for now it makes sense and particularly because I guarantee you they're lining him up for Ghost Fork glow in the dark ball night which is not this Friday obviously but next Friday we'll talk about that promotion at more length later because it's against the Angels and Shohei Atani which I do not think is a coincidence but if they give him an extra day here like they did and they do that again Senga will start that Friday night blackout night city field you hope to get a big crowd I'm sure they do that so you'll see him pitch on Saturday it's Miles Michaelis who has a 427 ERA this year and then Carlos Carrasco will go back out there again to close out the series on Sunday against Dakota Hudson so that's what we got ahead Mets versus Cardinals four games set against two teams that are nowhere close to playoffs biggest disappointments in the National League who are playing more for the lottery balls. 
Then you know trying to put W's in the win column and no really directly competing in that regard so guess it's a win win series either you win the series or you win the lottery stuff let's take a quick trip around the affiliates always got to grab the big M to the rumble ponies hat before we do this one and you know let's start there because that's actually the one positive note I think of all the games that's true Gilbert drew Gilbert had a big big day to. Day to day to night geez I don't even know that enough they played a day game or night game I just looked at the box score I'm going to be completely honest so it's what I just say to tonight. Tonight drew Gilbert went through for five that with a double and two runs scored he has been awesome coming over to the rumble ponies we said hella Cunha has not been over for the walk and run scored believe you sitting below them and does the line a buck 40 and change that exactly what you like to see I think that's what we carried away about that one big M to the loss 10 to 7 Tyler Stewart had his second rough start in a row here if you remember when I was breaking down all the great pitching the Metro supposed to have in double a Stewart was one of these guys that had an unbelievable season. Up till about a week ago today where he was leading the minorities in the array is a huge guy six foot nine and just been having a lot of success and then gets knocked around his last time out and this time only pitches three innings gives up three earned on three hits and three walks did strike out six though of the nine outs that he recorded. Let's move down to Brooklyn looking at what the cyclones did today they lost seven to one Alex Ramirez and Ryan Clifford the two prospects really paying attention to on Brooklyn right now both went over for with two strikeouts a piece not a good day there. 
Finally Syracuse one seven to three quiet day for Brett Beatty and Ronnie Mauricio Ronnie was over for with a walk and two strikeouts Beatty one for four with a strikeout but hey the team won at least well for all you every dayers on tomorrow's show we're going to recap whatever happened against Adam Wainwright and you'll have a lot more farm report action we'll do our Friday farm report looking at the Mets minor leagues. I might do the updated top 10 so if you want to check that out make sure you file rate and review wherever you get your podcast follow me on Twitter at Finkelstein Ryan and follow the show Locked On Mets. Hey Prime members, you can listen to this Locked On podcast at free on Amazon Music download the Amazon Music app today.
 """

### <b> Create a summary of the podcast's transcript </b>

- Use the OpenAI model `gpt-3.5-turbo` to generate the summary. The `openai` library makes the calls to OpenAI's API, and the `tiktoken` library counts the tokens that will be sent. The token count indicates the cost of the API call and which model has a large enough context window for the input.
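- As a rough illustration of how a token count maps to cost, the sketch below multiplies the count by a per-1,000-token rate. The function name and the rate are placeholders chosen for this example; the rate is not an actual OpenAI price and should be replaced with the current published one.

```python
def estimate_prompt_cost(n_tokens: int, usd_per_1k_tokens: float) -> float:
    """Approximate input cost in USD for a prompt of n_tokens tokens."""
    if n_tokens < 0 or usd_per_1k_tokens < 0:
        raise ValueError("token count and rate must be non-negative")
    return n_tokens / 1000 * usd_per_1k_tokens


# PLACEHOLDER_RATE is a made-up number for illustration, not a real price.
PLACEHOLDER_RATE = 0.002
print(f"{estimate_prompt_cost(5000, PLACEHOLDER_RATE):.4f} USD")
```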

In [4]:
# Install the libraries mentioned above to access OpenAI's GPT models

!pip install openai
!pip install tiktoken

Collecting openai
  Downloading openai-0.27.9-py3-none-any.whl (75 kB)
Installing collected packages: openai
Successfully installed openai-0.27.9


In [None]:
# Use personal API key to gain access to OpenAI's API

import openai
from getpass import getpass

openai.api_key = getpass('Enter the OpenAI API Key in the cell  ')

Enter the OpenAI API Key in the cell  ··········


In [None]:
# Confirm the API key works by listing all available OpenAI models

models = openai.Model.list()
for model in models["data"]:
  print (model["root"])

gpt-3.5-turbo-16k-0613
text-davinci-001
text-search-curie-query-001
davinci
text-babbage-001
curie-instruct-beta
davinci-similarity
code-davinci-edit-001
text-similarity-curie-001
ada-code-search-text
babbage
text-search-ada-query-001
gpt-3.5-turbo-0613
babbage-search-query
ada-similarity
gpt-3.5-turbo
text-search-ada-doc-001
text-search-babbage-query-001
code-search-ada-code-001
curie-search-document
text-search-davinci-query-001
text-search-curie-doc-001
babbage-search-document
babbage-code-search-text
text-embedding-ada-002
davinci-instruct-beta
davinci-search-query
text-similarity-babbage-001
text-davinci-002
code-search-babbage-text-001
text-search-davinci-doc-001
code-search-ada-text-001
text-davinci-003
ada-search-query
text-similarity-ada-001
ada-code-search-code
whisper-1
ada
text-davinci-edit-001
davinci-search-document
curie-search-query
babbage-similarity
ada-search-document
text-ada-001
text-similarity-davinci-001
gpt-3.5-turbo-16k
curie
curie-similarity
babbage-code-searc

In [None]:
# Check the number of tokens in the transcript to confirm the gpt-3.5-turbo model
# can handle its size

import tiktoken
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
print ("Number of tokens in input prompt ", len(enc.encode(podcast_transcript)))

Number of tokens in input prompt  5750


- The transcript contains 5,750 tokens, which exceeds the 4,096-token limit of `gpt-3.5-turbo`, so the larger `gpt-3.5-turbo-16k` model is used instead.
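- The same decision can be made in code once the token count is known. This is a minimal sketch: `pick_model` and `reserved_for_reply` are names invented here, the 4,096 limit comes from the note above, and the 16,000 figure is an assumed, approximate window for the 16k model.

```python
# Context-window sizes in tokens. 4096 comes from the text above; the 16k
# figure is an approximate, assumed value for gpt-3.5-turbo-16k.
CONTEXT_WINDOWS = [
    ("gpt-3.5-turbo", 4096),
    ("gpt-3.5-turbo-16k", 16000),
]


def pick_model(prompt_tokens: int, reserved_for_reply: int = 1000) -> str:
    """Return the first (smallest) model whose window fits prompt plus reply."""
    needed = prompt_tokens + reserved_for_reply
    for name, window in CONTEXT_WINDOWS:
        if needed <= window:
            return name
    raise ValueError(f"{needed} tokens exceed every known context window")


print(pick_model(5750))  # the full transcript
print(pick_model(2198))  # a shorter excerpt
```

- Reserving room for the reply matters because the prompt and the generated summary share the same context window.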

In [None]:
# Instruct the model on how to behave

instructPrompt = """
You will be provided a transcript of a podcast episode. Provide a detailed summary of the episode, providing all key points and topics that are talked about.
List each individual person that is mentioned in this episode, and include details why they are talked about. Also include all significant performances and statistics
that are talked about during the episode. Also mention all controversial opinions, ideas, or questions that the host talked about during this episode. This should be
an informative and interesting summary of the episode which I provided the transcript for.
"""

request = instructPrompt + podcast_transcript

In [None]:
# Make the call to the API to create the summary

chatOutput = openai.ChatCompletion.create(model="gpt-3.5-turbo-16k",
                                            messages=[{"role": "system", "content": "You are a helpful assistant."},
                                                      {"role": "user", "content": request}
                                                      ]
                                            )

In [None]:
# Investigate the contents of the generated summary

podcastSummary = chatOutput.choices[0].message.content
podcastSummary

"In this episode of the Locked On Mets podcast, host Ryan Finklestein discusses the New York Mets' recent win against the Pittsburgh Pirates. He highlights outfielder DJ Stewart's breakout performance in the game, in which he hit two home runs and made a crucial defensive play. Finklestein notes that while Stewart's performance does not guarantee him a spot as a long-term fixture on the team, he looks like a solid player who could have a future in the major leagues. He compares Stewart to other players on the team who are only filling in for the season and suggests that Stewart has more potential and value. Finklestein also provides stats and background information on Stewart's career, highlighting his power and plate discipline. He suggests that Stewart could potentially be the fourth outfielder for the Mets in the 2024 season. Finklestein then previews the Mets' upcoming series against the St. Louis Cardinals, focusing on the matchup between Mets pitcher Jose Quijada and Cardinals pi

### <b> Use a function to extract additional information to provide more context on the episode </b>

- Using OpenAI's function calling, we want to include relevant information about the episode from an outside source to provide additional context to the user. In this case, we will look for a guest speaker and provide a summary about their background.

In [None]:
request = podcast_transcript[:10000]
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
print ("Number of tokens in input prompt ", len(enc.encode(request)))

Number of tokens in input prompt  2198
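- The slice above cuts the transcript at a fixed character count, which can end in the middle of a sentence. One possible refinement, shown here only as a sketch with a hypothetical helper name, is to back up to the last sentence boundary inside the character budget.

```python
def truncate_at_sentence(text: str, max_chars: int) -> str:
    """Return at most max_chars characters, cut at the last sentence end if one exists."""
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    # Look for the last sentence terminator followed by a space inside the window.
    cut = max(clipped.rfind(". "), clipped.rfind("? "), clipped.rfind("! "))
    if cut == -1:
        return clipped  # no sentence boundary found; fall back to the hard cut
    return clipped[:cut + 1]  # keep the terminator, drop the trailing space


sample = "First sentence. Second sentence. Third one here."
print(truncate_at_sentence(sample, 40))
```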


In [None]:
# Create and describe the function that extracts a guest speaker's information
# using OpenAI's function calling feature

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": request}],
    functions=[
    {
        "name": "get_podcast_guest_information",
        "description": """Get information on the podcast guest, or individual who is a talking point of the episode, using their full name
                          and the name of the organization they're a part of, to search for them on Wikipedia or Google.""",
        "parameters": {
            "type": "object",
            "properties": {
                "guest_name": {
                    "type": "string",
                    "description": "The full name of the guest or significant individual in the podcast",
                },
                "guest_organization": {
                    "type": "string",
                    "description": "The full name of the organization that the podcast guest or significant individual belongs to or runs",
                },
                "guest_title": {
                    "type": "string",
                    "description": "The title, designation or role of the podcast guest or significant individual in their organization",
                },
            },
            "required": ["guest_name"],
        },
    }
    ],
    function_call={"name": "get_podcast_guest_information"}
    )

In [None]:
completion

<OpenAIObject chat.completion id=chatcmpl-7olCmdxXel7u2CHaZHmLEpdVkajyp at 0x7d5e318a5440> JSON: {
  "id": "chatcmpl-7olCmdxXel7u2CHaZHmLEpdVkajyp",
  "object": "chat.completion",
  "created": 1692332212,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "get_podcast_guest_information",
          "arguments": "{\n  \"guest_name\": \"DJ Stewart\",\n  \"guest_organization\": \"New York Mets\"\n}"
        }
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 2356,
    "completion_tokens": 21,
    "total_tokens": 2377
  }
}

In [None]:
# Use the JSON format to extract a guest name

import json

podcast_guest = ""
podcast_guest_org = ""
podcast_guest_title = ""
response_message = completion["choices"][0]["message"]
if response_message.get("function_call"):
  function_name = response_message["function_call"]["name"]
  function_args = json.loads(response_message["function_call"]["arguments"])
  podcast_guest=function_args.get("guest_name")
  podcast_guest_org=function_args.get("guest_organization")
  podcast_guest_title=function_args.get("guest_title")

In [None]:
# Print the extracted guest or individual information

if podcast_guest_org is None:
  podcast_guest_org = ""
if podcast_guest_title is None:
  podcast_guest_title = ""

print (podcast_guest)
print (podcast_guest_org)
print (podcast_guest_title)

DJ Stewart
New York Mets



- Use the `wikipedia` Python library to query Wikipedia and find information about the identified guest
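- The lookup that follows builds its search string by concatenating the name, organization, and title with spaces, which leaves stray spaces when a field is empty. A small helper can join only the fields that are present. `build_wikipedia_query` is a hypothetical name used for illustration and is not part of the `wikipedia` library.

```python
def build_wikipedia_query(name, organization="", title=""):
    """Join the non-empty fields with single spaces; None counts as empty."""
    parts = [name, organization, title]
    return " ".join(p.strip() for p in parts if p and p.strip())


print(build_wikipedia_query("DJ Stewart", "New York Mets", None))
```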

In [27]:
!pip install wikipedia

Collecting wikipedia
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11678 sha256=2a41a676ff9fe53c284267f768465ef10aae0d05fc2aa0c77f249013645e6b32
  Stored in directory: /root/.cache/pip/wheels/5e/b6/c5/93f3dec388ae76edc830cb42901bb0232504dfc0df02fc50de
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


In [None]:
import wikipedia

# Look up the guest's page; the name `page` avoids shadowing the built-in input()
page = wikipedia.page(podcast_guest + " " + podcast_guest_org + " " + podcast_guest_title, auto_suggest=True)

In [None]:
page.summary

'Demetrius Jerome Stewart (born November 30, 1993) is an American professional baseball outfielder for the New York Mets of Major League Baseball (MLB). He has previously played in MLB for the Baltimore Orioles. Stewart played college baseball for the Florida State Seminoles.'

## Part 2 - Deploying to a cloud back-end service for on-demand use

- It is necessary to build both a back-end and a front-end service around an on-demand cloud function

In [1]:
!pip install requests



In [None]:
# Combine the steps above into a single function that can be passed to the cloud service

def get_transcribe_podcast(rss_url, local_path):
  print ("Starting Podcast Transcription Function")
  print ("Feed URL: ", rss_url)
  print ("Local Path:", local_path)

  # Read from the RSS Feed URL
  import feedparser
  intelligence_feed = feedparser.parse(rss_url)
  for item in intelligence_feed.entries[0].links:
    if (item['type'] == 'audio/mpeg'):
      episode_url = item.href
  episode_name = "podcast_episode.mp3"
  print ("RSS URL read and episode URL: ", episode_url)

  # Download the podcast episode by parsing the RSS feed
  from pathlib import Path
  p = Path(local_path)
  p.mkdir(exist_ok=True)

  print ("Downloading the podcast episode")
  import requests
  with requests.get(episode_url, stream=True) as r:
    r.raise_for_status()
    episode_path = p.joinpath(episode_name)
    with open(episode_path, 'wb') as f:
      for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)

  print ("Podcast Episode downloaded")

  # Load the Whisper model
  import os
  import whisper
  print ("Download and Load the Whisper model")
  model = whisper.load_model("medium")
  print (model.device)

  # Perform the transcription
  print ("Starting podcast transcription")
  result = model.transcribe(str(episode_path))

  # Return the transcribed text
  print ("Podcast transcription completed, returning results...")
  return result
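
- The function above only sets `episode_url` when the first entry has a link of type `audio/mpeg`, so a feed without one fails with an unbound variable. A more defensive selection could look like the sketch below. `find_audio_url` is a hypothetical helper; it assumes each link behaves like a dictionary with `type` and `href` keys, and unlike the loop above it returns the first match and accepts any `audio/*` type.

```python
def find_audio_url(links):
    """Return the href of the first link whose MIME type is audio/*, else None."""
    for link in links:
        if str(link.get("type", "")).startswith("audio/"):
            return link.get("href")
    return None


# Plain dictionaries stand in for the feed's link objects in this example.
example_links = [
    {"type": "text/html", "href": "https://example.com/episode"},
    {"type": "audio/mpeg", "href": "https://example.com/episode.mp3"},
]
print(find_audio_url(example_links))
```

- The caller can then check for `None` and report a clear error before attempting the download.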

In [None]:
# Test the function

output = get_transcribe_podcast("https://feeds.megaphone.fm/locked-on-mets", "/content/podcast/")

Starting Podcast Transcription Function
Feed URL:  https://feeds.megaphone.fm/locked-on-mets
Local Path: /content/podcast/
RSS URL read and episode URL:  https://www.podtrac.com/pts/redirect.mp3/chtbl.com/track/39A2A2/traffic.megaphone.fm/LKN3109355599.mp3?updated=1692249801
Downloading the podcast episode
Podcast Episode downloaded
Download and Load the Whisper model


100%|█████████████████████████████████████| 1.42G/1.42G [00:25<00:00, 60.8MiB/s]


cuda:0
Starting podcast transcription
Podcast transcription completed, returning results...


In [None]:
# Verify the function is working as expected

output['text'][:1000]

" It's the Locked On Podcast Network. Your team every day. This is Corbin Smith of the Locked On Seahawks Podcast. US Cellular knows how important your kids' relationship with technology is, and they've made it their mission to help them establish good digital habits early on. That's why they've partnered with Screen Sanity, a nonprofit dedicated to helping kids navigate the digital landscape. And for a smarter start to the school year, US Cellular is also offering a free basic phone on new eligible lines, providing an alternative to a smartphone for children. Start smart with US Cellular. Visit uscellular.com slash built for us to find out more. Terms of clock. Metz outfielder DJ Stewart is turning heads coming off a great series this week against the Pittsburgh Pirates. Could he actually stick around beyond 2023? We'll discuss that on today's edition of Locked On Metz. You are Locked On Metz, your daily New York Metz podcast. Part of the Locked On Podcast Network. Your team every day

### <b> Create a cloud transcription function using Modal Labs </b>

In [2]:
# Install Modal package

!pip install modal

Collecting modal
  Downloading modal-0.51.3202-py3-none-any.whl (284 kB)
Collecting aiostream (from modal)
  Downloading aiostream-0.4.5-py3-none-any.whl (35 kB)
Collecting asgiref (from modal)
  Downloading asgiref-3.7.2-py3-none-any.whl (24 kB)
Collecting fastapi (from modal)
  Downloading fastapi-0.103.0-py3-none-any.whl (66 kB)
Collecting grpclib==0.4.3 (from modal)
  Downloading grpclib-0.4.3.tar.gz (62 kB)

In [None]:
!modal token new --source corise > authenticationURL.txt

In [3]:
# Setting the Modal token for access to Modal's library

import getpass
import subprocess

def set_modal_token():
  token_id = getpass.getpass('Please enter your Modal token ID in the cell: ')
  token_secret = getpass.getpass('Please enter your Modal token secret in the cell:  ')

  # Run the Modal CLI directly; an argument list avoids shell quoting problems
  subprocess.run(["modal", "token", "set", "--token-id", token_id, "--token-secret", token_secret], check=True)

In [4]:
set_modal_token()

Please enter your Modal token ID in the cell: ··········
Please enter your Modal token secret in the cell:  ··········


In [5]:
# Creating a token for Modal to communicate with this notebook

!modal token new > authenticationURL200.txt

- Create a Python file containing the function to use with Modal, so that this Google Colab environment is not needed every time the function runs

In [10]:
# Create the Python file that contains a function to perform the transcribing and
# summarizing steps above, but running in the Modal cloud environment

%%writefile /content/podcast/podcast_backend.py
import modal

def download_whisper():
  # Load the Whisper model
  import os
  import whisper
  print ("Download the Whisper model")

  # Perform download only once and save to Container storage
  whisper._download(whisper._MODELS["medium"], '/content/podcast/', False)


stub = modal.Stub("corise-podcast-project")
corise_image = modal.Image.debian_slim().pip_install("feedparser",
                                                     "https://github.com/openai/whisper/archive/9f70a352f9f8630ab3aa0d06af5cb9532bd8c21d.tar.gz",
                                                     "requests",
                                                     "ffmpeg").apt_install("ffmpeg").run_function(download_whisper)

@stub.function(image=corise_image, gpu="any")
def get_transcribe_podcast(rss_url, local_path):
  print ("Starting Podcast Transcription Function")
  print ("Feed URL: ", rss_url)
  print ("Local Path:", local_path)

  # Read from the RSS Feed URL
  import feedparser
  intelligence_feed = feedparser.parse(rss_url)
  for item in intelligence_feed.entries[0].links:
    if (item['type'] == 'audio/mpeg'):
      episode_url = item.href
  episode_name = "podcast_episode.mp3"
  print ("RSS URL read and episode URL: ", episode_url)

  # Download the podcast episode by parsing the RSS feed
  from pathlib import Path
  p = Path(local_path)
  p.mkdir(exist_ok=True)

  print ("Downloading the podcast episode")
  import requests
  with requests.get(episode_url, stream=True) as r:
    r.raise_for_status()
    episode_path = p.joinpath(episode_name)
    with open(episode_path, 'wb') as f:
      for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)

  print ("Podcast Episode downloaded")

  # Load the Whisper model
  import os
  import whisper

  # Load model from saved location
  print ("Load the Whisper model")
  model = whisper.load_model('medium', device='cuda', download_root='/content/podcast/')

  # Perform the transcription
  print ("Starting podcast transcription")
  result = model.transcribe(str(episode_path))

  # Return the transcribed text
  print ("Podcast transcription completed, returning results...")
  return result

@stub.local_entrypoint()
def main(url, path):
  output = get_transcribe_podcast.call(url, path)
  print (output['text'])

Writing /content/podcast/podcast_backend.py


In [None]:
!modal run /content/podcast/podcast_backend.py --url https://feeds.megaphone.fm/locked-on-mets --path /content/podcast/

✓ Initialized. View app at https://modal.com/apps/ap-t4x1Uj4yOD6sSi6t3pQ69D
✓ Created objects.
├── 🔨 Created get_transcribe_podcast.
├── 🔨 Created mount /content/podcast

### <b> Create a cloud information extraction function using Modal Labs </b>

In [9]:
# Overwrite the previously created podcast_backend.py file to incorporate the information
# extraction function along with the podcast transcription function

# Since we will be utilizing OpenAI's API in this function, a parameter was created to
# use the secret connecting our Modal environment to our OpenAI account ("my-openai-secret-2")

%%writefile /content/podcast/podcast_backend.py

import modal

def download_whisper():
  # Load the Whisper model
  import os
  import whisper
  print ("Download the Whisper model")

  # Perform download only once and save to Container storage
  whisper._download(whisper._MODELS["medium"], '/content/podcast/', False)


stub = modal.Stub("corise-podcast-project")
corise_image = modal.Image.debian_slim().pip_install("feedparser",
                                                     "https://github.com/openai/whisper/archive/9f70a352f9f8630ab3aa0d06af5cb9532bd8c21d.tar.gz",
                                                     "requests",
                                                     "ffmpeg",
                                                     "openai",
                                                     "tiktoken",
                                                     "wikipedia",
                                                     "ffmpeg-python").apt_install("ffmpeg").run_function(download_whisper)

@stub.function(image=corise_image, gpu="any", timeout=600)
def get_transcribe_podcast(rss_url, local_path):
  print ("Starting Podcast Transcription Function")
  print ("Feed URL: ", rss_url)
  print ("Local Path:", local_path)

  # Read from the RSS Feed URL
  import feedparser
  intelligence_feed = feedparser.parse(rss_url)
  podcast_title = intelligence_feed['feed']['title']
  episode_title = intelligence_feed.entries[0]['title']
  episode_image = intelligence_feed['feed']['image'].href
  for item in intelligence_feed.entries[0].links:
    if (item['type'] == 'audio/mpeg'):
      episode_url = item.href
  episode_name = "podcast_episode.mp3"
  print ("RSS URL read and episode URL: ", episode_url)

  # Download the podcast episode by parsing the RSS feed
  from pathlib import Path
  p = Path(local_path)
  p.mkdir(exist_ok=True)

  print ("Downloading the podcast episode")
  import requests
  with requests.get(episode_url, stream=True) as r:
    r.raise_for_status()
    episode_path = p.joinpath(episode_name)
    with open(episode_path, 'wb') as f:
      for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)

  print ("Podcast Episode downloaded")

  # Load the Whisper model
  import os
  import whisper

  # Load model from saved location
  print ("Load the Whisper model")
  model = whisper.load_model('medium', device='cuda', download_root='/content/podcast/')

  # Perform the transcription
  print ("Starting podcast transcription")
  result = model.transcribe(str(episode_path))

  # Return the transcribed text
  print ("Podcast transcription completed, returning results...")
  output = {}
  output['podcast_title'] = podcast_title
  output['episode_title'] = episode_title
  output['episode_image'] = episode_image
  output['episode_transcript'] = result['text']
  return output

@stub.function(image=corise_image, secret=modal.Secret.from_name("my-openai-secret-2"))
def get_podcast_summary(podcast_transcript):
  import openai
  instructPrompt = """
    Below is a podcast episode transcript. Provide a detailed summary of the episode, providing all key points and topics that are talked about.
    List each individual person mentioned, including details why they are talked about. Also include all significant performances and statistics that are
    talked about during the episode. Also mention all controversial opinions, ideas, or questions that were talked about during this episode. This should
    be an informative and interesting summary of the episode.
    """
  request = instructPrompt + podcast_transcript
  chatOutput = openai.ChatCompletion.create(model="gpt-3.5-turbo-16k",
                                            messages=[{"role": "system", "content": "You are a helpful assistant."},
                                                      {"role": "user", "content": request}
                                                      ]
                                            )
  podcastSummary = chatOutput.choices[0].message.content
  return podcastSummary

@stub.function(image=corise_image, secret=modal.Secret.from_name("my-openai-secret-2"))
def get_podcast_guest(podcast_transcript):
  import openai
  import wikipedia
  import json
  request = podcast_transcript[:11000]
  completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": request}],
    functions=[
    {
        "name": "get_podcast_guest_information",
        "description": """Get information on the episode guest or individual who is a significant topic
                          of the episode. Provide their full name, the organization they belong to, and
                          their role. Don't use the host of the show and skip advertisements.""",
        "parameters": {
            "type": "object",
            "properties": {
                "guest_name": {
                    "type": "string",
                    "description": "The full name of the guest or significant individual in the podcast.",
                },
                "guest_organization": {
                    "type": "string",
                    "description": "The name of the organization that the podcast guest or significant individual belongs to.",
                },
                "guest_title": {
                    "type": "string",
                    "description": "The role of the podcast guest or significant individual in their organization.",
                },
            },
            "required": ["guest_name"],
        },
    }
    ],
    function_call={"name": "get_podcast_guest_information"}
    )
  response_message = completion["choices"][0]["message"]

  # Extract relevant information
  podcast_guest = ""
  podcast_guest_org = ""
  podcast_guest_title = ""

  if response_message.get("function_call"):
    function_name = response_message["function_call"]["name"]
    function_args = json.loads(response_message["function_call"]["arguments"])
    podcast_guest=function_args.get("guest_name")
    podcast_guest_org=function_args.get("guest_organization")
    podcast_guest_title=function_args.get("guest_title")
  # Treat both None and an empty string as "no guest found"
  if podcast_guest:
    if (podcast_guest_org is None):
      podcast_guest_org = ""
    if (podcast_guest_title is None):
      podcast_guest_title = ""
    try:
      input = wikipedia.page(podcast_guest + " " + podcast_guest_org + " " + podcast_guest_title, auto_suggest=True)
      podcast_guest_summary = input.summary
    except wikipedia.exceptions.PageError:
      print (f'The page for guest "{podcast_guest}" does not exist on Wikipedia.')
      print (f"Due to possible misspellings, let's see what Wikipedia thinks we meant and try that.")
      try:
        # Get suggestion from Wikipedia and try it
        suggestion = wikipedia.suggest(podcast_guest)
        print(f'Suggestion: {suggestion}')
        if (suggestion is None):
          suggestion = podcast_guest
        input = wikipedia.page(suggestion + " " + podcast_guest_org + " " + podcast_guest_title, auto_suggest=True)
        podcast_guest_summary = input.summary
      except wikipedia.exceptions.PageError:
        print (f'The page for guest "{suggestion}" does not exist on Wikipedia.')
        podcast_guest_summary = "Not Available"
      except wikipedia.exceptions.DisambiguationError as e:
        print (f'The page for guest "{suggestion}" is ambiguous. Possible matches are:')
        print(e.options)
        podcast_guest_summary = "Not Available"
    except wikipedia.exceptions.DisambiguationError as e:
      print (f'The page for guest "{podcast_guest}" is ambiguous. Possible matches are:')
      print(e.options)
      podcast_guest_summary = "Not Available"
  else:
    podcast_guest = "Not Available"
    podcast_guest_org = "Not Available"
    podcast_guest_title = "Not Available"
    podcast_guest_summary = "Not Available"

  podcastGuest = {}
  podcastGuest['name'] = podcast_guest
  podcastGuest['org'] = podcast_guest_org
  podcastGuest['title'] = podcast_guest_title
  podcastGuest['summary'] = podcast_guest_summary
  return podcastGuest

@stub.function(image=corise_image, secret=modal.Secret.from_name("my-openai-secret-2"), timeout=375)
def process_podcast(url, path):
  output = {}
  podcast_details = get_transcribe_podcast.call(url, path)
  podcast_summary = get_podcast_summary.call(podcast_details['episode_transcript'])
  podcast_guest = get_podcast_guest.call(podcast_details['episode_transcript'])
  output['podcast_details'] = podcast_details
  output['podcast_summary'] = podcast_summary
  output['podcast_guest'] = podcast_guest
  return output

@stub.local_entrypoint()
def test_method(url, path):
  output = {}
  podcast_details = get_transcribe_podcast.call(url, path)
  print ("Podcast Summary: ", get_podcast_summary.call(podcast_details['episode_transcript']))
  print ("Podcast Guest Information: ", get_podcast_guest.call(podcast_details['episode_transcript']))

Writing /content/podcast/podcast_backend.py


In [72]:
## TESTING MY WIKIPEDIA SEARCHING METHOD ##

import wikipedia

# Possible wikipedia page titles
podcast_guest = 'Stacey Gatsilias'
podcast_guest_org = ''
podcast_guest_title = ''


try:
    input = wikipedia.page(podcast_guest + " " + podcast_guest_org + " " + podcast_guest_title, auto_suggest=True)
    podcast_guest_summary = input.summary
except wikipedia.exceptions.PageError:
    try:
        # Get the suggestion and try again
        suggestion = wikipedia.suggest(podcast_guest)
        print(f'Suggestion: {suggestion}')
        if (suggestion is None):
          suggestion = podcast_guest
        input = wikipedia.page(suggestion + " " + podcast_guest_org + " " + podcast_guest_title, auto_suggest=True)
        podcast_guest_summary = input.summary
    except wikipedia.exceptions.PageError:
        print (f'The page for guest "{suggestion}" does not exist on Wikipedia.')
        podcast_guest_summary = "Not Available"
    except wikipedia.exceptions.DisambiguationError as e:
        print (f'The page for guest "{suggestion}" is ambiguous. Possible matches are:')
        print(e.options)
        podcast_guest_summary = "Not Available"
except wikipedia.exceptions.DisambiguationError as e:
  print (f'The page for guest "{podcast_guest}" is ambiguous. Possible matches are:')
  print(e.options)
  podcast_guest_summary = "Not Available"

print(podcast_guest_summary)

Suggestion: None
The page for guest "Stacey Gatsilias" does not exist on Wikipedia.
Not Available
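
- The retry logic tested above (try the full query, then fall back to a suggestion) can also be written as a loop over candidate queries. The sketch below is not tied to the `wikipedia` library: `first_successful` and `lookup` are hypothetical, and `lookup` is assumed to raise `LookupError` on a miss, so a real wrapper around the Wikipedia call would need to translate that library's exceptions.

```python
def first_successful(queries, lookup, default="Not Available"):
    """Try each non-empty query in order and return the first result found."""
    for query in queries:
        if not query:
            continue  # skip None and empty strings
        try:
            return lookup(query)
        except LookupError:
            continue  # this candidate missed; try the next one
    return default


# A dictionary stands in for the remote lookup in this example.
fake_pages = {"DJ Stewart": "An American professional baseball outfielder."}


def fake_lookup(query):
    return fake_pages[query]  # KeyError is a subclass of LookupError


print(first_successful(["DJ Stewart New York Mets", "DJ Stewart"], fake_lookup))
print(first_successful(["Stacey Gatsilias"], fake_lookup))
```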


In [17]:
# Run and test the new integrated function

!modal run /content/podcast/podcast_backend.py --url https://feeds.megaphone.fm/locked-on-mets --path /content/podcast/

✓ Initialized. View app at https://modal.com/apps/ap-5srGUq8Gbuk2kbZ9tOmtmv
Creating objects...
├── Creating get_transcribe_podcast...
├── 🔨 Created mount /content/podcast/podcast_backend.py
├── Creating download_whisper...

### <b> Deploying the information extraction function (application) to Modal Labs </b>
- Deploying the final function makes it the app's back end: it runs on Modal's cloud and can be called by the front-end service

In [10]:
# Deploy the new files containing all backend functions to the cloud

!modal deploy /content/podcast/podcast_backend.py

[?25l[34m⠋[0m Creating objects...[2K[34m⠸[0m Creating objects...
[37m├── [0m[34m⠋[0m Creating get_transcribe_podcast...
[37m└── [0m[34m⠋[0m Creating mount /content/podcast/podcast_backend.py: Uploaded 0/0 inspected
[2K[1A[2K[1A[2K[1A[2K[34m⠦[0m Creating objects...
[37m├── [0m[34m⠸[0m Creating get_transcribe_podcast...
[37m├── [0m[34m⠸[0m Creating mount /content/podcast/podcast_backend.py: Building mount
[37m├── [0m[34m⠋[0m Creating download_whisper...
[37m└── [0m[34m⠋[0m Creating mount /content/podcast/podcast_backend.py: Uploaded 0/0 inspected
[2K[1A[2K[1A[2K[1A[2K[1A[2K[1A[2K[34m⠏[0m Creating objects...
[37m├── [0m[34m⠦[0m Creating get_transcribe_podcast...
[37m├── [0m[32m🔨[0m Created mount /content/podcast/podcast_backend.py
[37m├── [0m[32m🔨[0m Created download_whisper.
[2K[1A[2K[1A[2K[1A[2K[1A[2K[34m⠹[0m Creating objects...
[37m├── [0m[32m🔨[0m Created get_transcribe_podcast.
[37m├── [0m[32m🔨[0m Crea

- Test the deployed app by calling it, first with the podcast used during development and then with new, unseen podcasts

In [65]:
# Save information about the podcast used when building the model

import modal
import json

f = modal.Function.lookup("corise-podcast-project", "process_podcast")
# `f.call(...)` was deprecated on 2023-08-16 and renamed to `f.remote(...)`
output = f.remote('https://feeds.megaphone.fm/locked-on-mets', '/content/podcast/')

# Save in JSON format to be used by the Streamlit app
with open("/content/podcast/podcast-1.json", "w") as outfile:
  json.dump(output, outfile)


In [50]:
# Testing the deployed application on a new podcast
# NY Giants

import modal
import json

f = modal.Function.lookup("corise-podcast-project", "process_podcast")
# `f.call(...)` was deprecated on 2023-08-16 and renamed to `f.remote(...)`
output = f.remote('https://www.omnycontent.com/d/playlist/0bdd4a2d-2e09-4198-8e56-aa4900702eb0/7b6324f5-0012-4f0a-a732-abdf00dae41d/1f6e9614-68e5-4030-b1dd-abdf00dae428/podcast.rss', '/content/podcast/')

# Save in JSON format to be used by the Streamlit app
with open("/content/podcast/podcast-3.json", "w") as outfile:
  json.dump(output, outfile)


In [92]:
# Testing the deployed application on an additional new podcast
# NY Yankees

import modal
import json

f = modal.Function.lookup("corise-podcast-project", "process_podcast")
# `f.call(...)` was deprecated on 2023-08-16 and renamed to `f.remote(...)`
output = f.remote('http://feeds.megaphone.fm/LKN8732581786', '/content/podcast/')

# Save in JSON format to be used by the Streamlit app
with open("/content/podcast/podcast-2.json", "w") as outfile:
  json.dump(output, outfile)


In [94]:
# Testing the deployed application on an additional new podcast
# NY Jets

import modal
import json

f = modal.Function.lookup("corise-podcast-project", "process_podcast")
# `f.call(...)` was deprecated on 2023-08-16 and renamed to `f.remote(...)`
output = f.remote('http://feeds.megaphone.fm/PPY3133279731', '/content/podcast/')

# Save in JSON format to be used by the Streamlit app
with open("/content/podcast/podcast-4.json", "w") as outfile:
  json.dump(output, outfile)
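
The four test cells above repeat the same lookup, call, and save steps. A loop could collapse them, sketched here under stated assumptions: `process_feeds` is a hypothetical helper, and its `process` argument stands in for the deployed function's `remote` method so that the sketch runs without a Modal account.

```python
import json
import os
from typing import Callable, Dict


def process_feeds(
    feeds: Dict[str, str],
    process: Callable[[str, str], dict],
    out_dir: str,
    work_path: str = "/content/podcast/",
) -> Dict[str, str]:
    """Run `process` on each feed URL and save each result as JSON.

    `feeds` maps an output file name (for example "podcast-1.json") to an
    RSS feed URL. Returns a mapping of file name to the path written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = {}
    for file_name, url in feeds.items():
        # `process` is assumed to take (url, path) like the notebook's calls.
        output = process(url, work_path)
        out_path = os.path.join(out_dir, file_name)
        with open(out_path, "w") as outfile:
            json.dump(output, outfile)
        written[file_name] = out_path
    return written
```

With the deployed function looked up as `f`, as in the cells above, this would be called as `process_feeds(feeds, f.remote, "/content/podcast/")`, where `feeds` pairs each JSON file name with its RSS URL.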


## Part 3 - Deploying the front-end application

- Streamlit is used to create a front-end application that other users can access. Users paste a podcast's RSS feed URL into the application to receive a summary of its most recent episode.

In [None]:
!pip install streamlit

Collecting streamlit
  Downloading streamlit-1.25.0-py2.py3-none-any.whl (8.1 MB)
Collecting pympler<2,>=0.9 (from streamlit)
  Downloading Pympler-1.0.1-py3-none-any.whl (164 kB)
Collecting tzlocal<5,>=1.1 (from streamlit)
  Downloading tzlocal-4.3.1-py3-none-any.whl (20 kB)
Collecting validators<1,>=0.2 (from streamlit)
  Downloading validators-0.21.2-py3-none-any.whl (25 kB)
Collecting gitpython!=3.1.19,<4,>=3.0.7 (from streamlit)
  Downloading GitPython-3.1.32-py3-none-any.whl (188 kB)
Collecting pydeck<1,>=0.8 (from streamlit)
  Downloading pydeck-0.8.0-py2.py3-none-any.whl (4.7 MB)

In [None]:
%%writefile /content/podcast/podcast_frontend.py
# Front end of the Streamlit app; this file is uploaded to GitHub.
# Note: the deployed version may differ, since later updates are made directly on GitHub.
# (The %%writefile magic has to be the first line of the cell, so these comments follow it.)
import streamlit as st
import modal
import json
import os

def main():
    st.title("Welcome to Nick's Dashboard for Generating Podcast Summaries")

    # Right section - Newsletter content
    st.header("Let's get started...")
    st.write("""Generate a summary about your favorite podcast's most recent episode on my page!
                Simply copy and paste the link to your podcast's RSS feed on the left side of your screen
                and click process. To find your podcast's RSS feed, search for your podcast's title on a
                podcast search engine website such as listennotes.com and find the section that says RSS
                feed. This should provide you with a URL to the podcast's RSS feed.""")

    available_podcast_info = create_dict_from_json_files('.')

    # Left section - Input fields
    st.sidebar.header("Podcast & RSS Feeds")

    # Dropdown box
    st.sidebar.subheader("Examples of How Your Summary Will Look")
    selected_podcast = st.sidebar.selectbox("Select from list of example podcast summaries below.", options=available_podcast_info.keys())

    if selected_podcast:
        podcast_info = available_podcast_info[selected_podcast]
        # Function to display podcast details
        display_podcast_details(podcast_info)

    # User Input box
    st.sidebar.subheader("Processing Your Podcast")
    url = st.sidebar.text_input("Paste the link to your desired podcast's RSS feed below.")

    process_button = st.sidebar.button("Process")
    st.sidebar.markdown("**Note**: Processing your podcast can take up to 5 minutes.")

    if process_button:
        # Call the backend to process the feed and retrieve the episode and guest information
        st.session_state.processed_podcast_info = process_podcast_info(url)

    # Display the most recently processed podcast once; it is kept in session
    # state so that it persists across Streamlit reruns
    if 'processed_podcast_info' in st.session_state:
        display_podcast_details(st.session_state.processed_podcast_info)

def create_dict_from_json_files(folder_path):
    json_files = [f for f in os.listdir(folder_path) if f.endswith('.json')]
    data_dict = {}

    for file_name in json_files:
        file_path = os.path.join(folder_path, file_name)
        with open(file_path, 'r') as file:
            podcast_info = json.load(file)
            podcast_name = podcast_info['podcast_details']['podcast_title']
            # Process the file data as needed
            data_dict[podcast_name] = podcast_info

    return data_dict

def display_podcast_details(podcast_info):
    # Display the podcast title
    st.subheader("Podcast Episode Title")
    st.write(podcast_info['podcast_details']['episode_title'])

    # Display the podcast summary and the cover image in a side-by-side layout
    col1, col2 = st.columns([8, 2])

    with col1:
        # Display the podcast episode summary
        st.subheader("Episode Summary")
        st.write(podcast_info['podcast_summary'])

    with col2:
        st.image(podcast_info['podcast_details']['episode_image'], caption="Podcast Cover", width=300, use_column_width=True)

    # Display the podcast guest and their details in a side-by-side layout
    col3, col4 = st.columns([4, 6])

    with col3:
        st.subheader("Episode Guest or Significant Person")
        st.write(podcast_info['podcast_guest']['name'])

    with col4:
        st.subheader("Who are they?")
        st.write(podcast_info["podcast_guest"]['summary'])

def process_podcast_info(url):
    f = modal.Function.lookup("corise-podcast-project", "process_podcast")
    # `f.call(...)` was deprecated on 2023-08-16 and renamed to `f.remote(...)`
    output = f.remote(url, '/content/podcast/')
    return output

if __name__ == '__main__':
    main()
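
The front end passes the sidebar text straight to the backend, so an empty or mistyped entry still triggers a remote call that can take minutes. A cheap local check could catch the obvious cases first. This is a sketch rather than part of the deployed app: `is_plausible_feed_url` is a hypothetical helper, and it only checks the shape of the URL, not whether it actually serves an RSS feed.

```python
from urllib.parse import urlparse


def is_plausible_feed_url(url: str) -> bool:
    """Return True if `url` looks like an absolute http(s) URL."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_plausible_feed_url("https://feeds.megaphone.fm/locked-on-mets"))  # True
print(is_plausible_feed_url("locked on mets"))  # False
print(is_plausible_feed_url(""))  # False
```

In `main()`, a check like this could guard the `if process_button:` branch and show a message with `st.sidebar.error(...)` when the input does not pass.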

In [10]:
# Download the created files locally to upload to github

from google.colab import files

In [None]:
# Download the front end file
files.download('/content/podcast/podcast_frontend.py')

In [None]:
%%writefile /content/podcast/requirements.txt
# Requirements file for the Streamlit app (the %%writefile magic has to be the first line of the cell)
streamlit
modal

Writing /content/podcast/requirements.txt


In [None]:
# Download the created requirements file locally to upload to github

files.download('/content/podcast/requirements.txt')


In [95]:
# Download the podcast JSON files to upload to GitHub. These populate the
# Streamlit app with example summaries without re-running the pipeline, which saves API credits.

files.download('/content/podcast/podcast-1.json')
files.download('/content/podcast/podcast-2.json')
files.download('/content/podcast/podcast-3.json')
files.download('/content/podcast/podcast-4.json')
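
The front end reads a fixed set of nested fields from each of these JSON files, so a file that lacks one would raise a `KeyError` when the app loads. A quick check before uploading could be sketched as follows. `REQUIRED_FIELDS` and `missing_fields` are hypothetical names; the field list is taken from the keys that `podcast_frontend.py` accesses.

```python
import json

# Nested keys read by create_dict_from_json_files and display_podcast_details.
REQUIRED_FIELDS = [
    ("podcast_details", "podcast_title"),
    ("podcast_details", "episode_title"),
    ("podcast_details", "episode_image"),
    ("podcast_summary",),
    ("podcast_guest", "name"),
    ("podcast_guest", "summary"),
]


def missing_fields(podcast_info: dict) -> list:
    """Return the dotted paths of required fields absent from `podcast_info`."""
    missing = []
    for path in REQUIRED_FIELDS:
        node = podcast_info
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


# Deliberately incomplete sample, for illustration only.
sample = json.loads('{"podcast_details": {"podcast_title": "Demo"}, "podcast_summary": "..."}')
print(missing_fields(sample))
```

An empty list means the file has every field the app expects.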


<b> NOTE:</b>  The remainder of the work required to deploy our front-end application is done on GitHub and Streamlit's website.