<a href="https://colab.research.google.com/github/sudhir-voleti/MLBM/blob/master/AI_for_APIs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Class,

Namaste, Salaam and Satsriakal. Welcome to our first Colab drive-through.

Colab allows for a notebook-markdown form (just like the R markdowns we saw in Lec03). Below I shall use this markdown functionality to walk with you through this example.

Forward, ahoy!

Below are prompts used on [Gemini 2.0 Experimental with Search](https://gemini.google.com/app/963711579c055f7d?hl=en-IN)

**First prompt (P1)**
> I have obtained a free API key from the OpenWeatherMaps website.
I need you to construct and send a query to the API and receive and format the response into a neat table.
How should I pass the required info to you reg API documentation? Will sharing the url be enough? screenshots of the documentation will do?


What did your AI say?


**Follow-up prompt (P2)**
> Here is the URL for API documentation for current forecast: "https://openweathermap.org/current"
I have obtained an API key. Construct a query for a city in Python (to be run in colab). Code should also receive response and print a neat output dataframe. In the beginning, the program should ask for and accept two user inputs: the API key and the city. Tell me what other info you need from my side.

### Next steps
P2 yielded the code below. Ensure you use your own API key; don't use mine.

In [None]:
import requests
import pandas as pd

# User input for API key and city
api_key = input("Please enter your OpenWeatherMap API key: ")
city_name = input("Please enter the city name (e.g., Hyderabad): ")

# Construct API query URL
base_url = "https://api.openweathermap.org/data/2.5/weather"
params = {
    'q': city_name,
    'appid': api_key,
    'units': 'metric'  # For temperature in Celsius
}

# Send API request and receive response
try:
    response = requests.get(base_url, params=params)
    response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)
    weather_data = response.json()

    # Print neat output dataframe
    df = pd.DataFrame({
        'Parameter': [
            'City', 'Country', 'Latitude', 'Longitude', 'Weather Condition',
            'Description', 'Temperature (°C)', 'Feels Like (°C)', 'Humidity (%)',
            'Wind Speed (m/s)', 'Cloudiness (%)', 'Sunrise (UTC)', 'Sunset (UTC)'
        ],
        'Value': [
            weather_data['name'], weather_data['sys']['country'],
            weather_data['coord']['lat'], weather_data['coord']['lon'],
            weather_data['weather'][0]['main'], weather_data['weather'][0]['description'],
            weather_data['main']['temp'], weather_data['main']['feels_like'],
            weather_data['main']['humidity'], weather_data['wind']['speed'],
            weather_data['clouds']['all'], weather_data['sys']['sunrise'],
            weather_data['sys']['sunset']
        ]
    })

    print(df.to_string(index=False))

except requests.exceptions.HTTPError as http_err:
    print(f"HTTP error occurred: {http_err}")
except Exception as err:
    print(f"An error occurred: {err}")

Please enter your OpenWeatherMap API key: 13fc2c526a23adc910da505200aa84be
Please enter the city name (e.g., Hyderabad): Hyderabad
        Parameter      Value
             City  Hyderabad
          Country         IN
         Latitude    17.3753
        Longitude    78.4744
Weather Condition     Clouds
      Description few clouds
 Temperature (°C)      34.23
  Feels Like (°C)      31.95
     Humidity (%)         17
 Wind Speed (m/s)       3.09
   Cloudiness (%)         20
    Sunrise (UTC) 1741050157
     Sunset (UTC) 1741092804
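
The Sunrise and Sunset rows above are Unix timestamps (seconds since 1970-01-01 UTC), which are hard to read at a glance. Below is a minimal sketch of converting them to readable UTC strings with the standard library; the helper name `unix_to_utc` is mine, not something Gemini produced.

```python
from datetime import datetime, timezone

def unix_to_utc(ts):
    """Convert a Unix timestamp (seconds since 1970-01-01 UTC) to a readable UTC string.

    `unix_to_utc` is a hypothetical helper name used only for this illustration.
    """
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# Values copied from the Hyderabad output above
sunrise = unix_to_utc(1741050157)
sunset = unix_to_utc(1741092804)

print("Sunrise (UTC):", sunrise)  # 2025-03-04 01:02:37
print("Sunset (UTC): ", sunset)   # 2025-03-04 12:53:24
```

Hyderabad runs on IST (UTC+5:30), so that sunrise works out to about 06:32 local time, which looks plausible for early March.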


## Third prompt featuring a variation (P3)
> Nice, it works. Now, say I have a list of major cities in the world {Tokyo, Beijing, Delhi, Moscow, Paris, New York}.
I'd like to collect current weather data on each and put into a dataframe. Modify code to enable this pls.


In [None]:
import requests
import pandas as pd

# User input for API key
api_key = input("Please enter your OpenWeatherMap API key: ")

# List of major cities
cities = ["Tokyo", "Beijing", "Delhi", "Moscow", "Paris", "New York"]

# Base URL for OpenWeatherMap API
base_url = "https://api.openweathermap.org/data/2.5/weather"

# List to store weather data for all cities
all_cities_data = []

# Loop through each city and fetch weather data
for city_name in cities:
    params = {
        'q': city_name,
        'appid': api_key,
        'units': 'metric'  # For temperature in Celsius
    }

    try:
        response = requests.get(base_url, params=params)
        response.raise_for_status()
        weather_data = response.json()

        # Extract relevant weather parameters for each city
        city_data = {
            'City': weather_data['name'],
            'Country': weather_data['sys']['country'],
            'Latitude': weather_data['coord']['lat'],
            'Longitude': weather_data['coord']['lon'],
            'Weather Condition': weather_data['weather'][0]['main'],
            'Description': weather_data['weather'][0]['description'],
            'Temperature (°C)': weather_data['main']['temp'],
            'Feels Like (°C)': weather_data['main']['feels_like'],
            'Humidity (%)': weather_data['main']['humidity'],
            'Wind Speed (m/s)': weather_data['wind']['speed'],
            'Cloudiness (%)': weather_data['clouds']['all'],
            'Sunrise (UTC)': weather_data['sys']['sunrise'],
            'Sunset (UTC)': weather_data['sys']['sunset']
        }
        all_cities_data.append(city_data)

    except requests.exceptions.HTTPError as http_err:
        print(f"HTTP error occurred for {city_name}: {http_err}")
    except Exception as err:
        print(f"An error occurred for {city_name}: {err}")

# Create DataFrame from the collected data
if all_cities_data: # Check if there is any data to create dataframe
    df_all_cities = pd.DataFrame(all_cities_data)
    print(df_all_cities.to_string(index=False))
else:
    print("No weather data could be retrieved for any of the cities.")

Please enter your OpenWeatherMap API key: 13fc2c526a23adc910da505200aa84be
    City Country  Latitude  Longitude Weather Condition     Description  Temperature (°C)  Feels Like (°C)  Humidity (%)  Wind Speed (m/s)  Cloudiness (%)  Sunrise (UTC)  Sunset (UTC)
   Tokyo      JP   35.6895   139.6917              Snow            snow              0.60            -6.40            92             10.80              75     1741036073    1741077509
 Beijing      CN   39.9075   116.3972            Clouds      few clouds              5.94             3.28            22              3.54              13     1741041829    1741082932
   Delhi      IN   28.6667    77.2167              Haze            haze             22.05            21.43            43              3.09              20     1741050804    1741092761
  Moscow      RU   55.7522    37.6156            Clouds overcast clouds              4.16             0.39            68              4.79             100     1741061604    1741100967
   Pa

Neat, eh? And took like mere seconds or something?

The program collated and organized the response from the API into a neat, structured dataframe table which it displayed. Which btw brings up the Q....

### What does the API Response look like?

To find out, I asked Gemini the following:

> Pls read this colab notebook:"https://colab.research.google.com/drive/1UMHKg4bkCbUWc-cwJMAkBFc2SLBUA_V5?usp=sharing#scrollTo=P3NyAqTJnmre" I want to also display a few dozen lines of the JSON response received for demo purposes, after the second code chunk (the one for multiple cities). What would you suggest? I can add a code chunk after the second to accommodate this.


#### Real-time Debugging In-window
The code Gemini gave me in response to the above prompt had errors - a `NameError` and a `TypeError`. However, unlike in `R`, I didn't have to manually debug by copy-pasting the error message! Since Colab is from Google, the AI features internal to Colab analyzed the error and also suggested a workaround all within the same Colab window!

And here below is the final code I got. Jai ho.

In [None]:
import json

# Display sample JSON response (first few lines)
print("Sample JSON Response for the last city processed (for demo purposes):")
print(json.dumps(weather_data, indent=4, sort_keys=True)[:500]) # Prints first 500 characters of formatted JSON
print("-" * 30) # Separator for clarity

Sample JSON Response for the last city processed (for demo purposes):
{
    "base": "stations",
    "clouds": {
        "all": 100
    },
    "cod": 200,
    "coord": {
        "lat": 40.7143,
        "lon": -74.006
    },
    "dt": 1741093748,
    "id": 5128581,
    "main": {
        "feels_like": -1.05,
        "grnd_level": 1023,
        "humidity": 65,
        "pressure": 1025,
        "sea_level": 1025,
        "temp": 2.86,
        "temp_max": 3.45,
        "temp_min": 1.31
    },
    "name": "New York",
    "sys": {
        "country": "US",
        "id": 46
------------------------------
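
Notice how the response is nested: `coord`, `main` and `sys` are dictionaries inside the main dictionary. The earlier code chunks picked fields one by one, e.g. `weather_data['main']['temp']`. As a rough sketch of an alternative, the nested structure can be flattened into dotted keys in one go. The `flatten` helper and the trimmed `sample` dictionary below are my own illustration (values copied from the output above), not part of Gemini's code.

```python
def flatten(obj, prefix=""):
    """Recursively flatten nested dicts/lists into a single dict with dotted keys."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{i}."))
    else:
        # Leaf value: drop the trailing dot from the accumulated prefix
        flat[prefix.rstrip(".")] = obj
    return flat

# Trimmed sample, copied from the New York response shown above
sample = {
    "coord": {"lat": 40.7143, "lon": -74.006},
    "main": {"temp": 2.86, "feels_like": -1.05, "humidity": 65},
    "clouds": {"all": 100},
    "name": "New York",
    "sys": {"country": "US"},
}

flat = flatten(sample)
for key, value in flat.items():
    print(f"{key:>16} : {value}")
```

pandas has a built-in along the same lines, `pd.json_normalize`, which may be the handier choice once the data is headed into a dataframe anyway.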


### Downstream Processing - Visualization

I took a snapshot of the multi-city weather report table from the second code chunk and asked Gemini this:

**Detailed Visualization prompt**
> Attached is a picture of the output dataframe from the API.
In the dataframe above, write me python code to plot and display temperature and windspeed for the different cities.
X-axis should be cities (going east to west). On Y-axis, make a line plot for each of Temp and windspeed (in different colors).
Make it interactive with plotly, so that I can get the values when cursor points to a plot curve.
Make the plot aesthetically appealing (fonts, background colors, labels and legends etc.)

Take a minute and carefully read the prompt. Always nice to appreciate what all we are asking of the model.


Here's the code output Gemini gave me.

In [None]:
import pandas as pd
import plotly.graph_objects as go

# Data from the image (manually input, assuming this is the dataframe 'df_all_cities' from previous code)
data = {
    'City': ['Tokyo', 'Beijing', 'Delhi', 'Moscow', 'Paris', 'New York'],
    'Country': ['JP', 'CN', 'IN', 'RU', 'FR', 'US'],
    'Latitude': [35.6895, 39.9075, 28.6667, 55.7522, 48.8534, 40.7143],
    'Longitude': [139.6917, 116.3972, 77.2167, 37.6156, 2.3488, -74.0060],
    'Weather Condition': ['Snow', 'Clouds', 'Haze', 'Clouds', 'Clear', 'Clouds'],
    'Description': ['snow', 'few clouds', 'haze', 'overcast clouds', 'clear sky', 'overcast clouds'],
    'Temperature (°C)': [0.60, 5.94, 22.05, 4.16, 11.25, 2.86],
    'Feels Like (°C)': [-6.40, 3.28, 21.43, 0.39, 9.87, -1.05],
    'Humidity (%)': [92, 22, 43, 68, 55, 65],
    'Wind Speed (m/s)': [10.80, 3.54, 3.09, 4.79, 1.54, 4.47]
}
df_cities_weather = pd.DataFrame(data)

# Order cities from East to West based on Longitude (descending longitude)
df_cities_weather_sorted = df_cities_weather.sort_values(by='Longitude', ascending=False)

# City names and data for plotting
cities = df_cities_weather_sorted['City']
temperatures = df_cities_weather_sorted['Temperature (°C)']
wind_speeds = df_cities_weather_sorted['Wind Speed (m/s)']

# Create interactive line plot using Plotly
fig = go.Figure()

# Add trace for Temperature
fig.add_trace(go.Scatter(
    x=cities,
    y=temperatures,
    mode='lines+markers',
    name='Temperature (°C)',
    marker=dict(color='red', size=8),
    line=dict(color='red', width=2)
))

# Add trace for Wind Speed
fig.add_trace(go.Scatter(
    x=cities,
    y=wind_speeds,
    mode='lines+markers',
    name='Wind Speed (m/s)',
    marker=dict(color='blue', size=8),
    line=dict(color='blue', width=2)
))

# Update layout for aesthetic appeal
fig.update_layout(
    title={
        'text': '<b>Temperature and Wind Speed for Major Cities (East to West)</b>',
        'x':0.5, # Title position
        'xanchor': 'center', # Title position
        'yanchor': 'top',
        'font': dict(size=24, color='black', family="Arial, bold")
    },
    xaxis=dict(
        # Axis title as a nested dict; the deprecated `titlefont` shortcut fails on newer Plotly versions
        title=dict(
            text='Cities (East to West)',
            font=dict(size=16, color='black', family="Arial")
        ),
        tickfont=dict(size=12, color='black', family="Arial"),
        showline=True,
        linewidth=2,
        linecolor='black',
        gridcolor='lightgray'
    ),
    yaxis=dict(
        # Same nested-title form for the y-axis
        title=dict(
            text='Value',
            font=dict(size=16, color='black', family="Arial")
        ),
        tickfont=dict(size=12, color='black', family="Arial"),
        showline=True,
        linewidth=2,
        linecolor='black',
        gridcolor='lightgray'
    ),
    plot_bgcolor='aliceblue',
    paper_bgcolor='lavender',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1,
        font=dict(size=12, color='black', family="Arial")
    ),
    hovermode="x unified" # Show hover info for both traces together
)

# Display the plot
fig.show()

Well, what do you think?

The temps all seem low for early March, no?

Well, turns out it's *current* weather and the timestamp is ~6am, early morning.

P.S. I pulled data early morning whilst waiting for the Starship's 8th launch attempt - live coverage had 2M+ views on X! Launch got postponed however, sigh.

This colab can be shared easily, saved and modified and can do all those other neat things that R markdowns could do.

[Here is this colab's public link](https://colab.research.google.com/drive/1UMHKg4bkCbUWc-cwJMAkBFc2SLBUA_V5?usp=sharing) which can be obtained by clicking on that big, blue **Share** button at the top right of this application.

So when I say the future of sharing reports etc will be interactive and dynamic and reproducible, you can see where I'm coming from. Only.

## NewsAPI walk-with example

Am assuming you have already walked with me to [the newsAPI wala website](https://newsapi.org/), signed up and obtained your API key.

Again, I used Google's latest Experimental with Apps and Search wala model for this journey below.

**My first prompt**

> I have obtained an API key from https://newsapi.org/ and would like to send requests and receive data.
Here is the documentation url: "https://newsapi.org/docs"
This is the authentication and query construction page: "https://newsapi.org/docs/authentication"
Here is an example of a query: "GET https://newsapi.org/v2/everything?q=keyword&apiKey=68d5b47114834aca94a712d0a22b7305"
Can you tell me what kind of inputs the API seeks?
Can you write python code for Colab that I could run for any inputs such as keywords of my interests?

Here's what Gemini gave back:

In [None]:
import requests
import json

# Prompt for API key and keywords
api_key = input("Enter your NewsAPI API Key: ")
keywords = input("Enter keywords for your news search: ")

# Base URL for the NewsAPI everything endpoint
base_url = "https://newsapi.org/v2/everything"

# Parameters for the request
params = {
    'q': keywords,
    'apiKey': api_key
}

try:
    # Send GET request
    response = requests.get(base_url, params=params)
    response.raise_for_status() # Raise an exception for HTTP errors

    # Parse JSON response
    data = response.json()

    # Check for successful status
    if data['status'] == 'ok':
        print("Total articles found:", data['totalResults'])
        articles = data['articles']

        if articles:
            print("\nArticles:\n")
            for article in articles:
                print(f"Title: {article['title']}")
                print(f"URL: {article['url']}")
                print("-" * 40)
        else:
            print("No articles found matching your keywords.")
    else:
        print("Error in API request:")
        print(data) # Print the error response for debugging

except requests.exceptions.HTTPError as http_err:
    print(f"HTTP error occurred: {http_err}")
except Exception as err:
    print(f"An error occurred: {err}")

Enter your NewsAPI API Key: 68d5b47114834aca94a712d0a22b7305
Enter keywords for your news search: Zelensky
Total articles found: 8325

Articles:

Title: Starmer backs Zelensky after Trump criticism
URL: https://www.bbc.com/news/articles/clyrnjrjrr5o
----------------------------------------
Title: I'll back Ukraine in talks with Trump, Starmer tells Zelensky
URL: https://www.bbc.com/news/articles/c8rklvrnl7ro
----------------------------------------
Title: Zelensky calls for creation of an 'army of Europe'
URL: https://www.bbc.com/news/articles/cvgl27x74wpo
----------------------------------------
Title: Ukrainians back Zelensky after disastrous White House encounter
URL: https://www.bbc.com/news/articles/cgr2g4n4wvdo
----------------------------------------
Title: Zelensky says no peace deal without Ukraine involvement
URL: https://www.bbc.com/news/articles/ckg051z21zno
----------------------------------------
Title: 19 things Trump and his team did this week
URL: https://www.bbc.com/n

Neat, eh?
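
It may help to see what the `params` dictionary in the code above actually turns into. The sketch below builds the full request URL without sending anything, and joins two keywords with `AND`. NewsAPI's documentation describes `AND` / `OR` / `NOT` operators for `q`, but do confirm the exact behaviour there. The helper `build_news_query` and the placeholder key `YOUR_API_KEY` are assumptions of mine for illustration.

```python
from urllib.parse import urlencode

def build_news_query(terms, operator="AND"):
    """Join search terms into one `q` string, e.g. ['Trump', 'Zelensky'] -> 'Trump AND Zelensky'.

    Hypothetical helper; the operator syntax should be checked against the NewsAPI docs.
    """
    cleaned = [term.strip() for term in terms if term.strip()]
    return f" {operator} ".join(cleaned)

base_url = "https://newsapi.org/v2/everything"
params = {
    "q": build_news_query(["Trump", "Zelensky"]),
    "apiKey": "YOUR_API_KEY",  # placeholder, not a real key
}

# urlencode does the same percent/plus encoding that requests applies to `params`
full_url = f"{base_url}?{urlencode(params)}"
print(full_url)
# https://newsapi.org/v2/everything?q=Trump+AND+Zelensky&apiKey=YOUR_API_KEY
```

Printing the URL like this is a cheap way to sanity-check a query before spending any of the free tier's request quota.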

Now, can iterate as usual only.

**Follow-up Prompt**
> Say my keywords are 'Trump' and 'Zelensky'.
Prompt user to input API key and Keywords as before. Write code to display the top 10 rows of output as a neat HTML dataframe table that I could download as a csv if required.

Let's see what emerges.

In [None]:
import requests
import pandas as pd
import json
from IPython.display import HTML
from google.colab import files

def get_news_dataframe(api_key, keywords):
    """
    Fetches news articles from NewsAPI based on keywords and returns a Pandas DataFrame.

    Args:
        api_key: Your NewsAPI API key.
        keywords: Keywords to search for in news articles.

    Returns:
        pandas.DataFrame: DataFrame containing news articles, or None if an error occurs.
    """
    base_url = "https://newsapi.org/v2/everything"
    params = {
        "q": keywords,
        "apiKey": api_key,
        "pageSize": 100  # Increased page size to get more articles per request
    }
    articles_list = []
    try:
        response = requests.get(base_url, params=params)
        response.raise_for_status()
        data = response.json()

        if data['status'] == 'ok':
            articles = data['articles']
            for article in articles:
                articles_list.append({
                    'Title': article['title'],
                    'Description': article['description'],
                    'Source Name': article['source']['name'],
                    'Author': article['author'],
                    'Published At': article['publishedAt'],
                    'URL': article['url']
                })
        else:
            print("Error in API request:")
            print(data)
            return None

    except requests.exceptions.RequestException as e:
        print(f"Error fetching news: {e}")
        return None

    df = pd.DataFrame(articles_list)
    return df

# 1. Prompt user for API Key and Keywords
api_key = input("Enter your NewsAPI API Key: ")
keywords = input("Enter keywords for your news search (e.g., Trump Zelensky): ")

# 2. Fetch News Data and Create DataFrame
news_df = get_news_dataframe(api_key, keywords)

if news_df is not None:
    # 3. Display Top 12 Rows as HTML Table
    top_12_df = news_df.head(12) # Get the first 12 rows
    html_table = top_12_df.to_html(escape=False, index=False)
    display(HTML(html_table))

    # 4. Option to Download as CSV
    csv_data = news_df.to_csv(index=False)
    download_button = files.download(csv_data.encode('utf-8'), filename='news_articles.csv') # Corrected download call
    print("\nDownload CSV file (all rows) by clicking the button below:")
    display(download_button)

else:
    print("Could not retrieve news data.")

Enter your NewsAPI API Key: 68d5b47114834aca94a712d0a22b7305
Enter keywords for your news search (e.g., Trump Zelensky): Trump, Zelensky


Title,Description,Source Name,Author,Published At,URL
Starmer backs Zelensky after Trump criticism,"The prime minister spoke to Zelensky on the phone and expressed support for ""Ukraine's democratically elected leader"".",BBC News,,2025-02-19T20:48:09Z,https://www.bbc.com/news/articles/clyrnjrjrr5o
"I'll back Ukraine in talks with Trump, Starmer tells Zelensky",The prime minister called Ukraine's Volodymyr Zelensky for the second time in four days.,BBC News,,2025-02-22T16:13:52Z,https://www.bbc.com/news/articles/c8rklvrnl7ro
19 things Trump and his team did this week,"In his fifth week in office, Trump called Zelensky a dictator, sought more federal civil service cuts and dropped in on Nascar.",BBC News,,2025-02-22T03:36:13Z,https://www.bbc.com/news/articles/c93np4y0343o
Zelensky says no peace deal without Ukraine involvement,Ukraine's president says he will not be shut out of peace negotiations after Trump says he agreed to start talks with Putin.,BBC News,,2025-02-13T19:18:20Z,https://www.bbc.com/news/articles/ckg051z21zno
Watch in full: Angry exchange between Trump and Zelensky,The US president accused the Ukrainian leader of being disrespectful in a tense meeting in the Oval Office.,BBC News,,2025-02-28T18:15:09Z,https://www.bbc.com/news/videos/cdel2npwe50o
Trump commends Zelensky ahead of White House talks,"The US leader plays down earlier frictions and predicts a ""very good meeting"" on Friday.",BBC News,,2025-02-28T00:56:41Z,https://www.bbc.com/news/articles/cqjdd2ej4peo
How the Trump-Zelensky talks collapsed in 10 fiery minutes,There were major flashpoints throughout their meeting - here are four of the most animated.,BBC News,,2025-02-28T23:58:52Z,https://www.bbc.com/news/articles/c2erwgwy8vgo
Zelensky says Trump living in Russian 'disinformation space',Donald Trump's suggestion that Kyiv started the war and the Ukrainian leader is unpopular draws a sharp response.,BBC News,,2025-02-19T15:38:12Z,https://www.bbc.com/news/articles/cvg1508r8jro
Fact-checking Trump claims about war in Ukraine,President Donald Trump appeared to accuse Ukraine of being responsible for the war with Russia.,BBC News,,2025-02-19T12:37:01Z,https://www.bbc.com/news/articles/c9814k2jlxko
"Ukraine must strike minerals deal, says Trump adviser",The comments overshadowed Zelensky's meeting with US envoy Keith Kellogg in Kyiv.,BBC News,,2025-02-20T19:59:06Z,https://www.bbc.com/news/articles/ceqjn74gdwzo


TypeError: download() got multiple values for argument 'filename'

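Ok, some issues with the download button. The `TypeError` says `download()` got two values for `filename`: the function takes the name of a file already saved on the Colab machine as its first argument, whereas the code passed raw CSV bytes there and a `filename=` on top.

Unsurprising, since Colab is in the cloud - file I/O (input/output) with the local disk is sometimes tricky.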
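
One workaround for the `TypeError` above, as a minimal sketch: write the CSV text to a file first, then hand only the file's path to `files.download`. The helper name `save_csv_for_download` is mine, and the sample row is a cut-down copy of the output above. Outside Colab the `google.colab` import is unavailable, so the sketch simply reports where the file was saved.

```python
from pathlib import Path

def save_csv_for_download(csv_text, path="news_articles.csv"):
    """Write CSV text to disk, then trigger a browser download when running inside Colab.

    Hypothetical helper; assumes `files.download` accepts a single file path.
    """
    out = Path(path)
    out.write_text(csv_text, encoding="utf-8")
    try:
        from google.colab import files  # importable only inside a Colab runtime
    except ImportError:
        print(f"Not running in Colab; CSV saved at {out.resolve()}")
    else:
        files.download(str(out))  # pass the path only, no bytes
    return out

# In the notebook this string would be `news_df.to_csv(index=False)`
sample_csv = (
    "Title,URL\n"
    "Starmer backs Zelensky after Trump criticism,"
    "https://www.bbc.com/news/articles/clyrnjrjrr5o\n"
)
saved = save_csv_for_download(sample_csv)
```

In the notebook itself, the same idea is `news_df.to_csv('news_articles.csv', index=False)` followed by `files.download('news_articles.csv')`.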

**Qs to ponder upon:**

> What kind of downstream analysis for actionable insight etc could one do with the above output?

> Could we not make an app that takes API_KEY and Keywords as inputs and yields that cool dataframe as output, eh?

Chalo, dassit for now.

Voleti

### YouTube API access with AI handholding

Am I over-relying on the Gemini suite?

In the process of expanding thine exposure to different high quality AI models, kindly deign to try [Qwen2.5Chat](https://chat.qwenlm.ai/) for the following set of prompts. Pls sign in with gmail or something.

Walk with me over this prompt series pls:

First, exploratory prompt
> Is it possible to signup quickly and obtain API key for youtube API? is there a free tier in that API? can you guide me through the process and the usage limits in the free tier?

What did your Qwen instance say? Somehow I liked Qwen's no-nonsense and terse style, even though Gemini's highest tier is more accurate and reliable (from my limited experience with both).

Follow up prompt follows.
> Is the only input required that of the YT video URL? Can the API help fetch views, comments, captions etc. from a video?
Write python code for colab that asks for and accepts user input on the video URL, then uses the URL to create and send an API query request, receives response which it displays as a neat HTML dataframe.
Make sure the program first asks user for their API key as input, then asks for the video URL.
Ensure code implements functionality to fetch comments and their metadata and display as a neat HTML dataframe.

Parse the prompt. Look at the level of detail in workflow, instructions etc. given. Always helps to have some idea in our minds about what we want the final product to look like, and can iteratively guide the AI to that place, only.

Now walk with me through the [API signup process at this site](https://console.cloud.google.com/?spm=5aebb161.2ef5001f.0.0.14b0c921ZmjLrv) as per Qwen's response. Requires google credentials.

Let's use this video URL from tech podcaster Dwarkesh's recent interview of Satya Nadella as default: "https://www.youtube.com/watch?v=4GLSzuYXh6w"

Keep your API keys ready! Forward, Ahoy.


In [2]:
import requests
import pandas as pd
from IPython.display import display, HTML

# Function to extract video ID from URL
def extract_video_id(url):
    """
    Extracts the video ID from a YouTube URL.
    Example: https://www.youtube.com/watch?v=4GLSzuYXh6w -> 4GLSzuYXh6w
    """
    from urllib.parse import urlparse, parse_qs
    parsed_url = urlparse(url)
    if parsed_url.hostname == 'youtu.be':  # Shortened URL format
        return parsed_url.path[1:]
    if 'v' in parse_qs(parsed_url.query):  # Standard URL format
        return parse_qs(parsed_url.query)['v'][0]
    return None

# Get user input for the API key
api_key = input("Enter your YouTube Data API key: ")

# Get user input for the YouTube video URL
video_url = input("Enter the YouTube video URL: ")
video_id = extract_video_id(video_url)

if not api_key:
    print("Error: API key is required.")
elif not video_id:
    print("Invalid YouTube URL. Please provide a valid URL.")
else:
    # Construct the API request URL
    url = f"https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails,statistics&id={video_id}&key={api_key}"

    # Send GET request
    response = requests.get(url)

    # Check if the request was successful
    if response.status_code == 200:
        data = response.json()

        # Extract relevant information
        if data['items']:
            video = data['items'][0]
            snippet = video['snippet']
            statistics = video['statistics']

            # Create a dictionary for the video details
            video_details = {
                "Title": snippet.get('title', 'N/A'),
                "Channel": snippet.get('channelTitle', 'N/A'),
                "Published Date": snippet.get('publishedAt', 'N/A'),
                "Views": statistics.get('viewCount', 'N/A'),
                "Likes": statistics.get('likeCount', 'N/A'),
                "Comments": statistics.get('commentCount', 'N/A'),
                "Duration": video['contentDetails'].get('duration', 'N/A')
            }

            # Convert to DataFrame
            df = pd.DataFrame([video_details])

            # Display as an HTML table
            print("Video Details:")
            display(HTML(df.to_html(index=False)))
        else:
            print("No data found for the provided video ID.")
    else:
        print(f"Error: Unable to fetch data (Status Code: {response.status_code})")
        print(response.text)

Enter your YouTube Data API key: AIzaSyBGeAE30VL9uMUKGftYE6Fx8bn8LMRrVMo
Enter the YouTube video URL: https://www.youtube.com/watch?v=4GLSzuYXh6w
Video Details:


Title,Channel,Published Date,Views,Likes,Comments,Duration
Satya Nadella – Microsoft’s AGI Plan & Quantum Breakthrough,Dwarkesh Patel,2025-02-19T15:59:42Z,606008,13533,732,PT1H16M55S
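
That `PT1H16M55S` under Duration is an ISO 8601 duration, i.e. 1 hour, 16 minutes, 55 seconds. For any downstream number-crunching, seconds are easier to work with. Below is a minimal sketch of a parser covering the day/hour/minute/second forms only; the function name is my own, and the regex is an assumption about the shapes the API returns, not a full ISO 8601 implementation.

```python
import re

# Matches forms like PT45S, PT1H16M55S, P1DT2H (days, hours, minutes, seconds only)
_DURATION_RE = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")

def iso8601_duration_to_seconds(text):
    """Convert an ISO 8601 duration such as 'PT1H16M55S' to total seconds.

    Hypothetical helper for illustration; weeks/months/years are not handled.
    """
    match = _DURATION_RE.match(text)
    if not match:
        raise ValueError(f"Unrecognised duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

total = iso8601_duration_to_seconds("PT1H16M55S")
print(total)  # 4615

# Pretty-print as H:MM:SS
h, rem = divmod(total, 3600)
m, s = divmod(rem, 60)
pretty = f"{h}:{m:02d}:{s:02d}"
print(pretty)  # 1:16:55
```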


#### Modifying code to fetch YT comments

Well, would be nice to also GET 'em comments, eh?

And metadata around 'em comments such as author name, timestamp, the works.

So wrote the following prompt query:

> Modify code to implement functionality to fetch comments and their metadata and display as a neat HTML dataframe.

P.S. Turns out YT video CCs (closed captions) require another API and a separate set of permissions, so drop it for now.

Below code was what it gave me.

In [3]:
import requests
import pandas as pd
from IPython.display import display, HTML

# Function to extract video ID from URL
def extract_video_id(url):
    """
    Extracts the video ID from a YouTube URL.
    Example: https://www.youtube.com/watch?v=4GLSzuYXh6w -> 4GLSzuYXh6w
    """
    from urllib.parse import urlparse, parse_qs
    parsed_url = urlparse(url)
    if parsed_url.hostname == 'youtu.be':  # Shortened URL format
        return parsed_url.path[1:]
    if 'v' in parse_qs(parsed_url.query):  # Standard URL format
        return parse_qs(parsed_url.query)['v'][0]
    return None

# Get user input for the API key
api_key = input("Enter your YouTube Data API key: ")

# Get user input for the YouTube video URL
video_url = input("Enter the YouTube video URL: ")
video_id = extract_video_id(video_url)

if not api_key:
    print("Error: API key is required.")
elif not video_id:
    print("Invalid YouTube URL. Please provide a valid URL.")
else:
    # Step 1: Fetch video details
    video_details_url = f"https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails,statistics&id={video_id}&key={api_key}"
    response = requests.get(video_details_url)

    if response.status_code != 200:
        print(f"Error fetching video details (Status Code: {response.status_code})")
        print(response.text)
    else:
        data = response.json()

        if not data['items']:
            print("No data found for the provided video ID.")
        else:
            video = data['items'][0]
            snippet = video['snippet']
            statistics = video['statistics']

            # Create a dictionary for the video details
            video_details = {
                "Title": snippet.get('title', 'N/A'),
                "Channel": snippet.get('channelTitle', 'N/A'),
                "Published Date": snippet.get('publishedAt', 'N/A'),
                "Views": statistics.get('viewCount', 'N/A'),
                "Likes": statistics.get('likeCount', 'N/A'),
                "Comments": statistics.get('commentCount', 'N/A'),
                "Duration": video['contentDetails'].get('duration', 'N/A')
            }

            # Display video details
            df_video = pd.DataFrame([video_details])
            print("Video Details:")
            display(HTML(df_video.to_html(index=False)))

            # Step 2: Fetch comments
            comments_url = f"https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&videoId={video_id}&key={api_key}&maxResults=10"
            comments_response = requests.get(comments_url)

            if comments_response.status_code != 200:
                print(f"Error fetching comments (Status Code: {comments_response.status_code})")
                print(comments_response.text)
            else:
                comments_data = comments_response.json()

                if not comments_data['items']:
                    print("No comments found for this video.")
                else:
                    # Extract comment details
                    comments_list = []
                    for item in comments_data['items']:
                        comment_snippet = item['snippet']['topLevelComment']['snippet']
                        comments_list.append({
                            "Author": comment_snippet.get('authorDisplayName', 'N/A'),
                            "Comment": comment_snippet.get('textDisplay', 'N/A'),
                            "Published At": comment_snippet.get('publishedAt', 'N/A'),
                            "Like Count": comment_snippet.get('likeCount', 'N/A')
                        })

                    # Convert to DataFrame
                    df_comments = pd.DataFrame(comments_list)

                    # Display comments
                    print("\nTop Comments:")
                    display(HTML(df_comments.to_html(index=False)))

Enter your YouTube Data API key: AIzaSyBGeAE30VL9uMUKGftYE6Fx8bn8LMRrVMo
Enter the YouTube video URL: https://www.youtube.com/watch?v=4GLSzuYXh6w
Video Details:


Title,Channel,Published Date,Views,Likes,Comments,Duration
Satya Nadella – Microsoft’s AGI Plan & Quantum Breakthrough,Dwarkesh Patel,2025-02-19T15:59:42Z,606013,13533,732,PT1H16M55S



Top Comments:


Author,Comment,Published At,Like Count
@johnmayer2012,"I respect Nadella for a lot of things he has done for/through Microsoft, but this one throws me off. Sure, you can fool an average techie with quantum computing but you cannot fool others who know a little bit about quantum mechanics. The idea of an electron existing in infinite states is a theoretical riddle and not a practical utility. Holding a cryptic device and claiming that it does quantum computing is clearly an intent to wow the investors but nothing else. Furthermore, adding AGI on top of it makes this claim even more ridiculous.<br><br>Microsoft&#39;s board need a come-to-Jesus meeting on this whole topic. This one went a little overboard!",2025-03-04T17:41:25Z,1
@akandee,Loved most of it but especially where Satya brings sufficient context to the topic of cognitive labor - there would be room for higher level cognitive task/knowledge as exponentially knowledge workers are indoctrinated.,2025-03-03T22:33:37Z,0
@serbricksoncole7218,Dwarkesh &gt; Lex Friedman,2025-03-03T14:37:35Z,0
@ghormulla,Satya is boring,2025-03-03T12:45:46Z,0
@ParasaraSridharDuggirala,"Its absolutely disgusting how very very intelligent people like Dwarakesh just swallow hype-garbage 100% when coming from a billionaire. Please read the Majorana paper in Nature, the scientific claims laid in the paper are absolutely nowhere nowhere close to what was made out to be by Microsoft.",2025-03-03T01:08:05Z,0
@sh4dow-fan,this guy keeps offshoring the US workforce to India. Not a US friendly guy for sure!,2025-03-01T20:46:54Z,0
@laxmiheengama6528,Did you see that witchy lady,2025-03-01T07:09:50Z,0
@laxmiheengama6528,You saw that,2025-03-01T06:54:18Z,0
@laxmiheengama6528,R u going to do something cowards,2025-03-01T06:37:53Z,0
@laxmiheengama6528,Mr co pilot,2025-03-01T06:28:45Z,0


### Wikipedia Trends Tracker

Because, why not?

Besides, think of the value of information about page view counts for specific wikipedia pages on a daily basis for specified date ranges....

Let's head there right away. Here's moi first exploratory prompt to the machine:

> Say we want to access the wikipedia API through Python via Colab. Know any packages we could use?

What did your AI say? My AI instance, in that instance, strongly recommended a package called (surprise!) `wikipedia`. Here's the follow-up prompt:

> Nice. Could you write self-contained python code executable in Colab that uses this wikipedia package you mention to ask for and accept a wikipedia URL as user input, query the API and output associated page view metrics as a neat HTML dataframe. Make sure the program first asks user for the URL as input.

Note the level of detail in outlining a blueprint of the functionality or interface we are building.

**Note** Use this as our sample wiki URL: "https://en.wikipedia.org/wiki/Indian_School_of_Business"

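Anyway, the code below is from the AI. Note that it ended up calling the Wikimedia Pageviews REST API directly via `requests`, not the `wikipedia` package: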

In [4]:
import requests
import pandas as pd
from datetime import datetime, timedelta
from IPython.display import display, HTML

# Function to extract article title from Wikipedia URL
def extract_article_title(url):
    """
    Extracts the article title from a Wikipedia URL.
    Example: https://en.wikipedia.org/wiki/Indian_School_of_Business -> Indian_School_of_Business
    """
    from urllib.parse import urlparse
    parsed_url = urlparse(url)
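    # hostname is None when the URL has no scheme (e.g. "en.wikipedia.org/wiki/X"), so guard against that
    if (parsed_url.hostname or '').endswith('wikipedia.org') and parsed_url.path.startswith('/wiki/'):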
        return parsed_url.path.split('/wiki/')[-1]
    return None

# Get user input for the Wikipedia URL
wiki_url = input("Enter the Wikipedia URL: ")
article_title = extract_article_title(wiki_url)

if not article_title:
    print("Invalid Wikipedia URL. Please provide a valid URL.")
else:
    # Define the Wikimedia Pageviews API endpoint
    api_url = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/{project}/{access}/{agent}/{article}/daily/{start_date}/{end_date}"

    # Parameters
    project = "en.wikipedia"  # English Wikipedia
    access = "all-access"     # All access methods (desktop, mobile, etc.)
    agent = "user"            # Exclude bots
    start_date = (datetime.now() - timedelta(days=30)).strftime('%Y%m%d')  # Start date: 30 days ago
    end_date = datetime.now().strftime('%Y%m%d')                           # End date: today

    # Construct the API request URL
    url = api_url.format(
        project=project,
        access=access,
        agent=agent,
        article=article_title,
        start_date=start_date,
        end_date=end_date
    )

    # Send GET request
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }
    response = requests.get(url, headers=headers)

    # Check if the request was successful
    if response.status_code != 200:
        print(f"Error fetching page views (Status Code: {response.status_code})")
        print(response.text)
    else:
        data = response.json()

        if 'items' not in data:
            print("No page view data found for this article.")
        else:
            # Extract daily page views
            page_views = []
            for item in data['items']:
                date = item['timestamp'][:8]  # Extract YYYYMMDD
                views = item['views']
                page_views.append({
                    "Date": datetime.strptime(date, '%Y%m%d').strftime('%Y-%m-%d'),
                    "Views": views
                })

            # Convert to DataFrame
            df = pd.DataFrame(page_views)

            # Display as an HTML table
            print(f"Daily Page Views for '{article_title.replace('_', ' ')}' (Past Month):")
            display(HTML(df.to_html(index=False)))

Enter the Wikipedia URL: https://en.wikipedia.org/wiki/Indian_School_of_Business
Daily Page Views for 'Indian School of Business' (Past Month):


Date,Views
2025-02-03,290
2025-02-04,291
2025-02-05,297
2025-02-06,266
2025-02-07,276
2025-02-08,239
2025-02-09,226
2025-02-10,256
2025-02-11,289
2025-02-12,248


Next, let's modify the code to give better-organized output. Hence this prompt:

> Nice. Modify code above to ask for and accept user input for a date range (start date and end-date) after the URL.
Is there any way we could add a calendar widget into the start and end date selection fields? If so, incorporate.
Display only the top dozen rows of the output table.

Note the new features we asked for and added.
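
The date handling this prompt asks for boils down to two steps: check that the range is valid, then format both dates as YYYYMMDD for the Pageviews endpoint. Here is a minimal sketch of that logic, kept separate from the widgets so it can be tried on its own. The helper name `build_pageviews_url` is hypothetical and not part of the AI-generated code; the endpoint template is the one used in this notebook.

```python
from datetime import date

# Endpoint template copied from the generated code in this notebook.
API_TEMPLATE = (
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
    "{project}/{access}/{agent}/{article}/daily/{start}/{end}"
)

def build_pageviews_url(article, start, end,
                        project="en.wikipedia", access="all-access", agent="user"):
    """Validate the date range and return the request URL (hypothetical helper)."""
    if start > end:
        raise ValueError("Start date cannot be later than end date.")
    return API_TEMPLATE.format(
        project=project, access=access, agent=agent, article=article,
        start=start.strftime("%Y%m%d"),  # the API expects YYYYMMDD
        end=end.strftime("%Y%m%d"),
    )

url = build_pageviews_url("Bitcoin", date(2025, 1, 1), date(2025, 2, 10))
print(url)
```

Keeping the URL construction in a plain function like this makes it easy to check the request before wiring it to a button click.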

For this exercise, let's use this URL: "https://en.wikipedia.org/wiki/Bitcoin", for the date range 01-Jan to today.

Here's the code-gen result:

In [7]:
import requests
import pandas as pd
from datetime import datetime
from IPython.display import display, HTML
import ipywidgets as widgets

# Function to extract article title from Wikipedia URL
def extract_article_title(url):
    """
    Extracts the article title from a Wikipedia URL.
    Example: https://en.wikipedia.org/wiki/Indian_School_of_Business -> Indian_School_of_Business
    """
    from urllib.parse import urlparse
    parsed_url = urlparse(url)
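    # hostname is None when the URL has no scheme (e.g. "en.wikipedia.org/wiki/X"), so guard against that
    if (parsed_url.hostname or '').endswith('wikipedia.org') and parsed_url.path.startswith('/wiki/'):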
        return parsed_url.path.split('/wiki/')[-1]
    return None

# Get user input for the Wikipedia URL
wiki_url = input("Enter the Wikipedia URL: ")
article_title = extract_article_title(wiki_url)

if not article_title:
    print("Invalid Wikipedia URL. Please provide a valid URL.")
else:
    # Define calendar widgets for date selection
    print("Please select a date range:")
    start_date_picker = widgets.DatePicker(description='Start Date', disabled=False)
    end_date_picker = widgets.DatePicker(description='End Date', disabled=False)

    # Display the date pickers
    display(start_date_picker)
    display(end_date_picker)

    # Callback function to handle date submission
    def on_submit(button):
        # Get selected dates
        start_date = start_date_picker.value
        end_date = end_date_picker.value

        if start_date is None or end_date is None:
            print("Please select both start and end dates.")
            return

        if start_date > end_date:
            print("Start date cannot be later than end date.")
            return

        # Format dates as YYYYMMDD
        start_date_str = start_date.strftime('%Y%m%d')
        end_date_str = end_date.strftime('%Y%m%d')

        # Define the Wikimedia Pageviews API endpoint
        api_url = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/{project}/{access}/{agent}/{article}/daily/{start_date}/{end_date}"

        # Parameters
        project = "en.wikipedia"  # English Wikipedia
        access = "all-access"     # All access methods (desktop, mobile, etc.)
        agent = "user"            # Exclude bots

        # Construct the API request URL
        url = api_url.format(
            project=project,
            access=access,
            agent=agent,
            article=article_title,
            start_date=start_date_str,
            end_date=end_date_str
        )

        # Send GET request
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
        }
        response = requests.get(url, headers=headers)

        # Check if the request was successful
        if response.status_code != 200:
            print(f"Error fetching page views (Status Code: {response.status_code})")
            print(response.text)
        else:
            data = response.json()

            if 'items' not in data:
                print("No page view data found for this article.")
            else:
                # Extract daily page views
                page_views = []
                for item in data['items']:
                    date = item['timestamp'][:8]  # Extract YYYYMMDD
                    views = item['views']
                    page_views.append({
                        "Date": datetime.strptime(date, '%Y%m%d').strftime('%Y-%m-%d'),
                        "Views": views
                    })

                # Convert to DataFrame
                df = pd.DataFrame(page_views)

                # Display only the top 12 rows
                print(f"Top 12 Daily Page Views for '{article_title.replace('_', ' ')}':")
                display(HTML(df.head(12).to_html(index=False)))

    # Create a submit button
    submit_button = widgets.Button(description="Submit Dates")
    submit_button.on_click(on_submit)
    display(submit_button)

Enter the Wikipedia URL: https://en.wikipedia.org/wiki/Bitcoin
Please select a date range:


DatePicker(value=None, description='Start Date')

DatePicker(value=None, description='End Date')

Button(description='Submit Dates', style=ButtonStyle())

Top 12 Daily Page Views for 'Bitcoin':


Date,Views
2025-02-01,6333
2025-02-02,6275
2025-02-03,7934
2025-02-04,7306
2025-02-05,6675
2025-02-06,6611
2025-02-07,5549
2025-02-08,5001
2025-02-09,6032
2025-02-10,6735


And one can then plot neat visualizations for insight and so on, as we've done before.

Or perhaps that can be a nice in-class exercise.
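
Before any plotting, a quick numeric summary is often a useful first look. The sketch below uses only the ten Bitcoin rows shown in the output above; the `summarize` helper is a made-up name for illustration, and the same numbers could equally be obtained from the `df` DataFrame with pandas.

```python
from statistics import mean

# The ten rows of the Bitcoin output shown above.
page_views = [
    {"Date": "2025-02-01", "Views": 6333},
    {"Date": "2025-02-02", "Views": 6275},
    {"Date": "2025-02-03", "Views": 7934},
    {"Date": "2025-02-04", "Views": 7306},
    {"Date": "2025-02-05", "Views": 6675},
    {"Date": "2025-02-06", "Views": 6611},
    {"Date": "2025-02-07", "Views": 5549},
    {"Date": "2025-02-08", "Views": 5001},
    {"Date": "2025-02-09", "Views": 6032},
    {"Date": "2025-02-10", "Views": 6735},
]

def summarize(rows):
    """Return mean daily views plus the busiest and quietest days (hypothetical helper)."""
    peak = max(rows, key=lambda r: r["Views"])
    low = min(rows, key=lambda r: r["Views"])
    return {
        "mean_views": round(mean(r["Views"] for r in rows), 1),
        "peak_day": peak["Date"],
        "peak_views": peak["Views"],
        "low_day": low["Date"],
        "low_views": low["Views"],
    }

summary = summarize(page_views)
print(summary)
```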

#### In-class mini exercise

Say you want to plot a graph of those daily pageview numbers using code-gen by AI. How would you go about it?

Step 1 is always to visualize in your mind what the output should look like, and work backwards from there. Ask yourself Qs such as:

1. What kind of plot? Lineplot, barplot, piechart etc.

2. What would be X-axis? Days?

3. What would be the Y-axis? Page views?

4. Any other aesthetics you want to specify or can leave to AI's defaults (e.g., colors, fonts, legend, title, interactivity, etc.)

**Your Task**

> Craft a prompt to take output of the last code chunk and visualize it aesthetically, in an interactive manner.

You have 5 minutes. Team up with next seat neighbors and test drive this.
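
For reference once you have tried it yourselves, here is one plausible shape of the code such a prompt might yield. It is a sketch under assumptions, not the AI's actual answer: it assumes a line plot with dates on the X-axis and page views on the Y-axis, uses `plotly` (preinstalled in Colab at the time of writing) for interactivity, and the helper names `to_plot_series` and `plot_page_views` are made up for illustration.

```python
from datetime import date

def to_plot_series(page_views):
    """Split [{'Date': 'YYYY-MM-DD', 'Views': n}, ...] into date-sorted x and y lists."""
    rows = sorted(page_views, key=lambda r: r["Date"])
    x = [date.fromisoformat(r["Date"]) for r in rows]
    y = [int(r["Views"]) for r in rows]
    return x, y

def plot_page_views(page_views, title="Daily page views"):
    """Build an interactive line chart; plotly is imported lazily so the
    data-prep helper above still works where plotly is not installed."""
    import plotly.graph_objects as go
    x, y = to_plot_series(page_views)
    fig = go.Figure(go.Scatter(x=x, y=y, mode="lines+markers", name="Views"))
    fig.update_layout(
        title=title,
        xaxis_title="Date",
        yaxis_title="Page views",
        hovermode="x unified",  # one hover box per date
    )
    return fig

# First five rows of the Bitcoin output shown earlier.
sample = [
    {"Date": "2025-02-01", "Views": 6333},
    {"Date": "2025-02-02", "Views": 6275},
    {"Date": "2025-02-03", "Views": 7934},
    {"Date": "2025-02-04", "Views": 7306},
    {"Date": "2025-02-05", "Views": 6675},
]
x, y = to_plot_series(sample)
print(x[0], y[0])
```

In Colab, `plot_page_views(sample, "Bitcoin").show()` would render the chart with hover and zoom. Note how each answer from the checklist (plot type, axes, aesthetics) maps to one line of the plotting function.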

## An AI's API key with AI assist

Rubber meets road now... time to use an API key that supplies not data but intelligence on tap.

First things first, pls go [to the Mistral site](https://console.mistral.ai/home) and walk with me to sign up for individual API keys.

I then asked Qwen2.5 for some suggestions on simple applications and use-cases that can use an AI in the back-end to handle tasks.

> I have signed up for a free API key with mistral.  I want to sample and test-drive a few simple use-cases for using an LLM's API keys in a business or consumer setting. Suggest how.

Qwen gave me a list of 8 use-cases, of varying idea quality. I picked one to expand on. See below.

> Let's say I want try this idea: "Personalized Content Generation".
Write python code for colab that [1] asks for and accepts user inputs viz. API key,
[2] then creates a chatbot that asks for and accepts user input in natural language on target consumer profile or preferences (e.g., "young professionals interested in fitness"),
[3] and finally, shows recommended items for target consumer as a dataframe with 2 columns - 'items recommended' and 'reasoning'.
Here is the API documentation for chat interfaces in Mistral, just in case: "https://docs.mistral.ai/api/"

Note how we structured the input prompt further this time into numbered subtasks. Note also how it is a good idea to supply the URL of the API documentation page.

Had to do an iteration or two to debug errors and the final code I got is below:

In [9]:
# Import necessary libraries
import requests
import pandas as pd

# [1] Accept API Key from the user
API_KEY = input("Please enter your Mistral API key: ").strip()
API_URL = "https://api.mistral.ai/v1/chat/completions"  # Correct endpoint from Mistral API docs

# Function to call the Mistral API
def call_mistral_api(prompt, max_tokens=200):
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "mistral-small",  # Specify the model (adjust based on available models)
        "messages": [
            {"role": "user", "content": prompt}
        ],
        "max_tokens": max_tokens
    }
    response = requests.post(API_URL, headers=headers, json=data)
    if response.status_code == 200:
        return response.json().get("choices", [{}])[0].get("message", {}).get("content", "No response generated.")
    else:
        print(f"Error: {response.status_code}, {response.text}")
        return None

# [2] Create a chatbot to accept user input about target consumer profile/preferences
def get_user_input():
    print("\nWelcome to the Personalized Content Generator!")
    print("Please describe the target consumer profile or preferences in natural language.")
    user_input = input("Example: 'Young professionals interested in fitness': ")
    return user_input.strip()

# Generate recommendations based on user input
def generate_recommendations(user_input):
    # Construct the prompt for the LLM
    prompt = (
        f"Based on the following target consumer profile or preferences: '{user_input}', "
        f"recommend 5 items or services and provide reasoning for each recommendation. "
        f"Format the output as a list of items with their reasoning, separated by colons (:)."
    )

    # Call the Mistral API
    response = call_mistral_api(prompt, max_tokens=200)
    if not response:
        return None

    # Parse the response into a DataFrame
    try:
        # Split the response into items and reasoning
        lines = response.split("\n")
        items = []
        reasoning = []
        for line in lines:
            if ":" in line:
                item, reason = line.split(":", 1)
                items.append(item.strip())
                reasoning.append(reason.strip())

        # Create a DataFrame
        df = pd.DataFrame({
            "items recommended": items,
            "reasoning": reasoning
        })
        return df
    except Exception as e:
        print(f"Error parsing response: {e}")
        return None

# Main execution flow
if __name__ == "__main__":
    # Step 1: Get API key
    if not API_KEY:
        print("API key is required to proceed.")
    else:
        # Step 2: Get user input
        user_input = get_user_input()

        # Step 3: Generate recommendations
        recommendations_df = generate_recommendations(user_input)

        # Display results
        if recommendations_df is not None:
            print("\nHere are the personalized recommendations:")
            display(recommendations_df)
        else:
            print("Failed to generate recommendations. Please try again.")

Please enter your Mistral API key: XpgdlVkOMEjExPwFkUNHfLy8mz7kKR70

Welcome to the Personalized Content Generator!
Please describe the target consumer profile or preferences in natural language.
Example: 'Young professionals interested in fitness': Middle aged desk jockey looking for 90s memorabilia

Here are the personalized recommendations:


Unnamed: 0,items recommended,reasoning
0,1. **Nostalgic 90s Desk Accessories**,"As a middle-aged desk jockey, this individual ..."
1,2. **Vintage 90s T-Shirts**,A vintage 90s t-shirt featuring a favorite ban...
2,3. **90s Music on Vinyl or Cassette**,"If the desk jockey is a music fan, they might ..."


Pay heed to the usage rates and so on when using APIs, esp. in AI situations (costs have fallen steeply of late, so this is less of a concern these days, I'd say).
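
One way to keep an eye on usage is to read the token counts that come back with each reply. The sketch below assumes the Mistral chat-completions response carries an OpenAI-style `usage` object and a `finish_reason` on each choice; do check the API docs linked above for the exact fields. The helper name `usage_report` and the sample reply are made up for illustration.

```python
def usage_report(response_json):
    """Pull token counts and the stop reason out of a chat-completions reply."""
    usage = response_json.get("usage", {})
    choices = response_json.get("choices") or [{}]
    finish_reason = choices[0].get("finish_reason")
    return {
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
        # "length" means the reply hit max_tokens and was cut short
        "truncated": finish_reason == "length",
    }

# Made-up reply for illustration only; not real API output.
sample_reply = {
    "choices": [{"message": {"content": "..."}, "finish_reason": "length"}],
    "usage": {"prompt_tokens": 60, "completion_tokens": 200, "total_tokens": 260},
}
report = usage_report(sample_reply)
print(report)
```

A truncated reply would be one plausible explanation for why the table above shows three recommendations when the prompt asked for five, given `max_tokens=200`; raising that limit is worth a try if it happens to you.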

Downstream analyses and applications abound.
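
Downstream work usually wants clean columns, and the table above still carries list numbers and Markdown bold markers (e.g. `1. **...**`) from the model's reply. Below is a minimal cleanup sketch, assuming the reply keeps the "item: reasoning" line format the prompt asked for. The helper names `clean_item` and `parse_recommendations` and the sample reply text are made up for illustration.

```python
import re

def clean_item(text):
    """Strip a leading list number or bullet, plus Markdown bold markers."""
    text = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s+", "", text)
    return text.replace("**", "").strip()

def parse_recommendations(response_text):
    """Turn 'item: reasoning' lines into a list of row dicts."""
    rows = []
    for line in response_text.splitlines():
        if ":" not in line:
            continue  # skip headers, blank lines, and chatter
        item, reason = line.split(":", 1)
        item, reason = clean_item(item), clean_item(reason)
        if item and reason:
            rows.append({"items recommended": item, "reasoning": reason})
    return rows

# Made-up reply text for illustration only; not real model output.
sample_text = (
    "Here are some ideas\n"
    "1. **Retro Desk Lamp**: Fits a nostalgic workspace.\n"
    "2. **Mixtape Poster:** Evokes the cassette era.\n"
)
rows = parse_recommendations(sample_text)
print(rows)
```

The resulting list can go straight into `pd.DataFrame(rows)` to get the same two-column table, minus the clutter.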

And so on and so forth. Will stop here for now. Back to the slides!

Sudhir Voleti, ISB