# Environment Setup 

### Installing and Loading Packages 

In [None]:
%pip install --upgrade openai

In [None]:
from openai import OpenAI
import getpass
import os
import boto3

# AWS SageMaker Setup - Making an S3 Bucket to Conveniently Store Our MP3 File

### First, run the command below to make a new bucket. Bucket names must be globally unique across all AWS accounts, so replace `qtm350finalproject` with a name of your own.

In [27]:
!aws s3 mb s3://qtm350finalproject

make_bucket: qtm350finalproject
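
If `aws s3 mb` rejects a name, it can help to check the name locally first. The sketch below is a hypothetical helper, `is_valid_bucket_name`, that covers only a subset of the S3 naming rules (length, allowed characters, no adjacent dots, not shaped like an IP address). It cannot tell whether a name is already taken; only AWS can answer that.

```python
import re

def is_valid_bucket_name(name: str) -> bool:
    """Local check against a subset of the S3 bucket naming rules."""
    # 3-63 characters: lowercase letters, digits, dots, hyphens;
    # the name must start and end with a letter or digit.
    if not re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", name):
        return False
    # Two adjacent dots are not allowed.
    if ".." in name:
        return False
    # The name must not be formatted like an IPv4 address.
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", name):
        return False
    return True

print(is_valid_bucket_name("qtm350finalproject"))
```

A name that passes this check can still fail at creation time if another account already owns it.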


### Next, run the command below to list the buckets in your account.

In [28]:
!aws s3 ls

2023-10-30 16:02:57 homeworknotebook-api-example-maggie
2023-10-19 14:25:24 image-api-example-maggie
2023-10-31 18:43:44 qtm347-project-recording
2023-11-30 02:40:42 qtm350finalproject
2023-09-25 18:51:31 qtm350maggie
2023-10-12 14:26:08 sagemaker-studio-263430346676-52bcb1wohbc


### Next, we need to add an MP3 Zoom recording to this bucket. First, using the JupyterLab file viewer on the left, click the up arrow symbol to upload the Zoom MP3 file to your SageMaker instance. For example, I uploaded my Zoom recording named `MLLectureRegularizationandSubsetSelection.mp3`. Now run the command below to move this file to your bucket.

In [32]:
!aws s3 mv MLLectureRegularizationandSubsetSelection.mp3 s3://qtm350finalproject

move: ./MLLectureRegularizationandSubsetSelection.mp3 to s3://qtm350finalproject/MLLectureRegularizationandSubsetSelection.mp3


### Because `aws s3 mv` deletes the local copy after uploading, the file should no longer appear in the JupyterLab file viewer. Let's make sure the MP3 file is now in our bucket!

In [33]:
!aws s3 ls qtm350finalproject

2023-11-30 02:44:36   15441726 MLLectureRegularizationandSubsetSelection.mp3
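
The same check can be done from Python. This is a minimal sketch, assuming a `boto3` S3 client is passed in; `object_size_bytes` is a hypothetical helper name. It uses the client's `head_object` call, which raises an error if the object does not exist.

```python
def object_size_bytes(s3_client, bucket: str, key: str) -> int:
    """Return the size of an S3 object in bytes using a HEAD request."""
    response = s3_client.head_object(Bucket=bucket, Key=key)
    return response["ContentLength"]

# Usage in the notebook (requires AWS credentials):
# import boto3
# size = object_size_bytes(
#     boto3.client("s3"),
#     "qtm350finalproject",
#     "MLLectureRegularizationandSubsetSelection.mp3",
# )
```

The returned size should match the byte count that `aws s3 ls` prints for the file.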


# API Utilization 

### To complete the section below, go to https://openai.com/ and make an account. Click on "API keys", then click on "+ Create new secret key" and copy the secret key. Run the cell below and paste the key into the prompt; `getpass` hides what you type, so the key never appears in the notebook.

In [40]:
api_key = getpass.getpass()

 ········


In [73]:
# Set the key for this session without echoing it in the cell output
os.environ["OPENAI_API_KEY"] = api_key
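
An API key should never be printed or saved in notebook output. One way to keep it out of sight is sketched below, assuming a hypothetical helper named `load_api_key`: it reuses `OPENAI_API_KEY` if the variable is already set and otherwise prompts with `getpass`, storing the value in the process environment only.

```python
import getpass
import os

def load_api_key(env=os.environ, prompt=getpass.getpass) -> str:
    """Return the key from OPENAI_API_KEY if set; otherwise prompt and store it."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        key = prompt("OpenAI API key: ")
        env["OPENAI_API_KEY"] = key  # kept in memory only, never printed
    return key

# Usage in the notebook:
# load_api_key()
# client = OpenAI()  # picks up OPENAI_API_KEY from the environment
```

The `env` and `prompt` parameters exist only so the helper can be exercised without a real key.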


### Now, using Python with the `boto3` library, run the code below to access and download our MP3 file!

In [75]:
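import boto3

# The OpenAI client reads the key from the OPENAI_API_KEY environment variable
client = OpenAI()

bucket_name = 'qtm350finalproject'
object_key = 'MLLectureRegularizationandSubsetSelection.mp3'

# Download the MP3 from S3 into the notebook's working directory
s3 = boto3.client('s3')
local_file_path = 'MLLectureRegularizationandSubsetSelection.mp3'
s3.download_file(bucket_name, object_key, local_file_path)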

### Using OpenAI Whisper to Transcribe Audio Files 

#### Here, we call the OpenAI Whisper API to transcribe the audio file. Setting `response_format="text"` returns the transcript as a plain string.

In [76]:
with open(local_file_path, 'rb') as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )
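
To transcribe several recordings, the call above can be wrapped in a function. The sketch below uses a hypothetical helper, `transcribe_file`, and assumes a 25 MB upload limit for the audio endpoint (`MAX_UPLOAD_BYTES` is our own constant, not part of the library; check the current OpenAI documentation for the actual limit).

```python
import os

# Assumed upload limit for the audio endpoint; verify against current docs.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024

def transcribe_file(client, path: str, model: str = "whisper-1") -> str:
    """Transcribe one audio file and return the transcript as plain text."""
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{path} is {size} bytes; split it before uploading")
    with open(path, "rb") as audio_file:
        return client.audio.transcriptions.create(
            model=model,
            file=audio_file,
            response_format="text",
        )

# Usage in the notebook:
# transcript = transcribe_file(client, local_file_path)
```

Checking the size first gives a clear local error instead of a failed upload.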

In [77]:
transcript

"Hi, everyone. So thank you very much for watching this video for the machine learning course. So for this lecture, I'm just going to be light and have a lab section that I hope this lab session can help you to finish the second homework. So in this lab section, we're going to learn how we can implement the success selection methods and regularization methods in Python. So the first part is we're going to see how we can implement the best success selection, forward selection, and backward selection in Python. Okay. So first, we're going to import the NumPy pandas and the matplotlib package. And then for this exercise, we're going to use the hitters data where we want to predict a baseball player's salary on the basis of various statistics associated with their performance in their previous year. So to start, let's first look at their data so we can load the data from the ISLP package. So what we want to do is we want to predict the salary based on some other variables. And then in this

### Here we write the transcript to a text file named after the audio file.

In [79]:
output_filename = 'MLLectureRegularizationandSubsetSelectionWhisperTranscript.txt'

with open(output_filename, "w", encoding="utf-8") as text_file:
    text_file.write(transcript)
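
With more than one recording, typing each output name by hand becomes error-prone. The sketch below derives the name from the audio file instead; `transcript_path` and `save_transcript` are hypothetical helpers, and the `WhisperTranscript` suffix simply mirrors the naming used in this notebook.

```python
from pathlib import Path

def transcript_path(audio_path: str, suffix: str = "WhisperTranscript") -> Path:
    """Build '<audio name><suffix>.txt' next to the audio file."""
    audio = Path(audio_path)
    return audio.with_name(f"{audio.stem}{suffix}.txt")

def save_transcript(text: str, audio_path: str) -> Path:
    """Write the transcript as UTF-8 text and return the output path."""
    out = transcript_path(audio_path)
    out.write_text(text, encoding="utf-8")
    return out

# Usage in the notebook:
# save_transcript(transcript, local_file_path)
```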

### Now, let's make a transcription bucket for our text files and move our text files into the S3 bucket! Make sure to also move your Zoom transcript into the bucket; it must already be uploaded to your SageMaker instance.

In [3]:
!aws s3 mb s3://finaltranscriptionsbucket

make_bucket: finaltranscriptionsbucket


In [8]:
!aws s3 mv MLLectureRegularizationandSubsetSelectionZoomTranscript.txt s3://finaltranscriptionsbucket

move: ./MLLectureRegularizationandSubsetSelectionZoomTranscript.txt to s3://finaltranscriptionsbucket/MLLectureRegularizationandSubsetSelectionZoomTranscript.txt


In [10]:
!aws s3 mv MLLectureRegularizationandSubsetSelectionWhisperTranscript.txt s3://finaltranscriptionsbucket

move: ./MLLectureRegularizationandSubsetSelectionWhisperTranscript.txt to s3://finaltranscriptionsbucket/MLLectureRegularizationandSubsetSelectionWhisperTranscript.txt
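
The `aws s3 mv` commands above can also be approximated in Python. This is a rough sketch, assuming a `boto3` S3 client is passed in; `move_to_bucket` is a hypothetical helper that uploads with the client's `upload_file` call and then deletes the local copy, which is what `mv` does.

```python
import os

def move_to_bucket(s3_client, local_path: str, bucket: str) -> str:
    """Upload a local file to the bucket root, then delete the local copy."""
    key = os.path.basename(local_path)
    s3_client.upload_file(local_path, bucket, key)
    os.remove(local_path)  # only reached if the upload did not raise
    return key

# Usage in the notebook (requires AWS credentials):
# import boto3
# move_to_bucket(
#     boto3.client("s3"),
#     "MLLectureRegularizationandSubsetSelectionWhisperTranscript.txt",
#     "finaltranscriptionsbucket",
# )
```

Because the local file is removed only after the upload returns, a failed upload leaves the file in place.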


### Let's check to see if our text files are there...

In [12]:
!aws s3 ls finaltranscriptionsbucket

2023-11-30 15:50:09      25563 MLLectureRegularizationandSubsetSelectionWhisperTranscript.txt
2023-11-30 15:49:14      30936 MLLectureRegularizationandSubsetSelectionZoomTranscript.txt
