<a href="https://colab.research.google.com/github/mogbuehi/youtubeproject/blob/search_youtube/search_youtube.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is a follow-up to Tina Huang's project outlined here:
- https://github.com/fiverrhellotinah/youtubeproject/tree/main
- Video: https://youtu.be/bnZAsuB_Ltk?si=Wnbw2ZLINsslt_d4

This notebook walks through how to incorporate search into the GPT-4 Turbo API. This is accomplished via three key concepts:
1. JSON mode
2. Serp API YouTube Search Endpoint (API Key found here: https://serpapi.com/)
3. Prompt Engineering to ensure search results are incorporated into the study plan



## Install Dependencies

In [2]:
!pip install -q openai requests  # -q: quiet mode install
from openai import OpenAI
import json

## Functions

### `get_search_terms`
- Returns a JSON string with search terms that will be parsed by another function.
- Requires you to define your query and provide your OpenAI API key.
- JSON mode in the chat completions endpoint with GPT-3.5 is fairly reliable as long as the schema uses plain English terms. The model gets confused by non-English terms or gibberish in the schema. I have observed that this can sometimes be fixed with in-context learning, but feel free to experiment for yourself!
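
Since the function returns the model's raw JSON string, downstream code has to trust that the `search_terms` key is actually there. The following is a minimal sketch of a defensive parser, assuming the `{'search_terms': [...]}` schema requested in the system message. `parse_search_terms` is a hypothetical helper name and is not part of the original notebook.

```python
import json


def parse_search_terms(raw_json):
    """Parse the model's JSON string and return a clean list of search terms.

    Hypothetical helper (not in the original notebook). Raises ValueError
    if the payload does not look like {"search_terms": [str, ...]}.
    """
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError as err:
        raise ValueError(f"Model did not return valid JSON: {err}") from err

    terms = payload.get("search_terms") if isinstance(payload, dict) else None
    if not isinstance(terms, list):
        raise ValueError("Expected a 'search_terms' list in the model output")

    # Keep only non-empty strings, stripped of surrounding whitespace
    return [t.strip() for t in terms if isinstance(t, str) and t.strip()]
```

Failing loudly here is a design choice: a malformed response is caught before any Serp API credits are spent on bad search terms.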

In [13]:
def get_search_terms(query, openai_api_key, expert='', project_type=''):
  '''Returns a JSON object with search terms that will be parsed by another function.
  Requires you to define your query and provide your OpenAI API key '''
  client = OpenAI(api_key=openai_api_key)
  system_message = f'''Suppose you are {expert} and an expert in creating relevant search terms.
  You also work as a tutor of {project_type} and create study plans to help people learn different topics.
  You will be provided with the goal of the student, their time commitment, and resource preferences.
  You will create a study plan with timelines and links to resources.
  Only include relevant resources because time is limited. '''
  system_message += '''Your task is to ONLY produce a list of relevant search
  terms in JSON format with the following schema. Follow this schema EXACTLY as shown below:
    ```{'search_terms': ['search_term_1','search_term_2', 'search_term_3', ...]}``` '''

  messages=[
      {"role": "system", "content":  system_message},
      {"role": "user", "content": query}
    ]

  completion = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    messages=messages,
    response_format={ "type": "json_object" },
    temperature=0.0,
    max_tokens=2000
  )

  search_terms_json = completion.choices[0].message.content
  return search_terms_json

## ---Test it out!
# project_type = 'Coding'
# expert = 'Senior Engineer'
# query='Build an AI application'
# openai_api_key = 'YOUR_OPENAI_API_KEY'

# search_terms_json = get_search_terms(
#     query=query,
#     project_type=project_type,
#     expert=expert,
#     openai_api_key=openai_api_key)
# print(search_terms_json)

### `search_yt` function
- Conducts YouTube video search using Serp API
- Returns a str with links based on search terms from `get_search_terms()`.
- Requires API key from Serp API

In [18]:
# Perform a search with Serp API for each term in the list, and return a str with the combined results
# Google Search API
# from serpapi import GoogleSearch

import requests
def search_yt(search_terms_json, serp_api_key):
  '''Conducts YouTube video search using Serp API
  Returns a str with links based on submitted search terms from `get_search_terms()`.
  Requires API key from Serp API '''
  videos_str = ''
  # Define the URL and parameters
  url = "https://serpapi.com/search.json"
  # print(type(result_json))
  result_json = json.loads(search_terms_json)
  for search_term in result_json['search_terms']:
    videos_str += f'''\n# Search Term: {search_term}\n'''
    print('search term: ', search_term)
    params = {
      "engine": "youtube",
      "search_query": search_term ,
      "api_key": serp_api_key
    }

    # Make the GET request
    response = requests.get(url, params=params)

    # Check if the request was successful
    if response.status_code == 200:
      # Process the response
      data = response.json()
      video_results = data.get("video_results", [])
      # Keep only the top 5 videos for each search term
      for video in video_results[0:5]:
        link = video.get('link')
        title = video.get('title')
        description = video.get('description')

        videos_str += f'''## Title: {title}
  - Link: {link}
  - Description: {description}\n'''
    else:
      print("Failed to retrieve data:", response.status_code)
  return videos_str

## Generate study plan from GPT-4

### Call both functions
- Call `get_search_terms()` and pass the results to `search_yt()`
- Pass the results of `search_yt()` (a `str`) to both the system _AND_ the user prompts so that the correct links are incorporated and the model doesn't hallucinate
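
The idea of repeating the search results in both roles can be sketched as a small helper. This is a minimal sketch, not the notebook's own code: `build_messages` is a hypothetical name, the delimiter format is simplified from the prompts used later, and the video link is a made-up placeholder.

```python
def build_messages(system_prompt, user_query, videos_str):
    """Return a chat `messages` list with the search results in both roles.

    Hypothetical helper (not in the original notebook).
    """
    results_block = f"<<<<Search results\n---\n{videos_str}\n>>>>"
    return [
        # One dict per message, each with its own 'role' and 'content'
        {"role": "system", "content": f"{system_prompt}\n{results_block}"},
        {"role": "user", "content": f"{user_query}\n{results_block}"},
    ]


# Placeholder inputs for illustration only
messages = build_messages(
    system_prompt="You are a tutor who builds study plans.",
    user_query="Build an AI application. I can study for 1 hour.",
    videos_str="## Title: Example video\n  - Link: https://www.youtube.com/watch?v=EXAMPLE\n",
)
print([m["role"] for m in messages])
```

The returned list can be passed directly as the `messages` argument of `client.chat.completions.create(...)`.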

In [19]:
''' Generate the search terms, then run the YouTube search for each of them.
Note that the parameters set here (`expert`, `project_type`) are reused in the cells below. '''
# call get_search_terms
project_type = 'Coding'
expert = 'Senior Engineer'
query='Build an AI application'
openai_api_key = 'YOUR_OPENAI_API_KEY'

search_terms_json = get_search_terms(
    query=query,
    project_type=project_type,
    expert=expert,
    openai_api_key=openai_api_key)

# call search_yt
serp_api_key = "YOUR_SERP_API_KEY"
videos_str = search_yt(search_terms_json=search_terms_json, serp_api_key=serp_api_key )

search term:  AI application development
search term:  machine learning algorithms
search term:  neural network programming
search term:  AI software tools


### Prompting Strategies
- Two different flavors of prompting were used to ensure the correct links were incorporated:
  1. Default
  2. With Emotional Prompting





### Default Prompting


In [23]:
''' Pass the results back to gpt-4-turbo and check whether the links are clickable.
Note that the parameters set in the cells above (`expert`, `project_type`) are the same here as well. '''
client = OpenAI(api_key=openai_api_key)
system_prompt = f'''Suppose you are {expert}. You also work as a tutor of {project_type} and create study plans to help people learn different topics.
    You will be provided with the goal of the student, their time commitment, and resource preferences.
    You will create a study plan with timelines and links to resources.
    Only include relevant resources because time is limited.
    From these search results below, please select and return the most relevant ones in a study plan. Your response must be accurate and contain no mistakes. Be sure to show your thought process and think this out step by step.
```<<<<Search results
---
{(videos_str)}

>>>>```

'''
timeframe = '1 hours'
reference_preference = 'videos' # don't change this for now because this only has code for youtube search specifically
goal = 'Build an AI application'


# Putting the search results in both the system message and at the end of the user message seems to ensure that the model produces the correct links.
query = f'''
{goal}. I can study on this topic for {timeframe}. I only want {reference_preference} as references.
Make a {timeframe.split()[-1].replace('s', '')}-by-{timeframe.split()[-1].replace('s', '')} plan with detailed steps and references.

Please use these search results when making the study plan
```<<<<Search results
---
{(videos_str)}

>>>>```

'''
response = client.chat.completions.create(
            model='gpt-4-1106-preview', # big context window
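            # One dict per message; duplicate keys in a single dict would silently drop the system prompt
            messages=[
                {'role': 'system', 'content': system_prompt},
                {'role': 'user', 'content': query}
            ],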
            max_tokens=2000,
            temperature=0.0
        )

print(response.choices[0].message.content)

Based on the search results provided, here is a one-hour study plan with detailed steps and references to help you build an AI application. Each step includes a video reference that you should watch and study.

**Hour-long Study Plan for Building an AI Application**

**Minute 0-10: Introduction to AI Application Development**
- Watch the video "Masterclass: AI-driven Development for Programmers" to get an overview of how AI tools like ChatGPT and GPT-4 are changing the programming landscape.
  - [Masterclass: AI-driven Development for Programmers](https://www.youtube.com/watch?v=iO1mwxPNP5A)

**Minute 10-20: Exploring AI Tools for App Development**
- Watch "I Created A Mobile App Using These Simple Tools!" to understand how simple tools can be used to create a mobile app, which will give you insights into the practical aspects of AI application development.
  - [I Created A Mobile App Using These Simple Tools!](https://www.youtube.com/watch?v=_g4BiBcYdZQ)

**Minute 20-30: Understanding

### With Emotional Prompting

In [26]:
client = OpenAI(api_key=openai_api_key)
system_prompt = f'''Imagine you are a passionate {expert} and tutor in {project_type}, recognized for your deep understanding and ability to inspire students.
From these search results below, please select and return the most relevant ones. Your response must be accurate and contain no mistakes. Be sure to show your thought process and think this out step by step.
```<<<<Search results
---
{(videos_str)}

>>>>```

'''
timeframe = '1 hours'
reference_preference = 'videos' # don't change this for now because this only has code for youtube search specifically
goal = 'Build an AI application'

query = f'''
{goal}. I can study on this topic for {timeframe}. I only want {reference_preference} as references.
Make a {timeframe.split()[-1].replace('s', '')}-by-{timeframe.split()[-1].replace('s', '')} plan with detailed steps and references.

I have asked you 10 times already and you kept hallucinating and giving me incorrect links that were not from the search results listed above. PLEASE follow instructions. It is very important that you do so.
```<<<<Search results
---
{(videos_str)}

>>>>```

You keep making mistakes...Please stop doing that. Something bad will happen to me if you mess up again. Please only use links from below:
```<<<<Search results
---
{(videos_str)}

>>>>```
'''
response = client.chat.completions.create(
            model='gpt-4-1106-preview', # big context window
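            # One dict per message; duplicate keys in a single dict would silently drop the system prompt
            messages=[
                {'role': 'system', 'content': system_prompt},
                {'role': 'user', 'content': query}
            ],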
            max_tokens=2000,
            temperature=0.0
        )

print(response.choices[0].message.content)

Certainly! Here is a one-hour study plan focusing on AI application development, machine learning algorithms, neural network programming, and AI software tools, using only the video references provided:

**Hour-long Study Plan on AI Application Development**

**Minute 0-10: Overview of AI Application Development**
- Watch the first 10 minutes of "Masterclass: AI-driven Development for Programmers" to get an overview of how AI tools are changing programming.
  - [Masterclass: AI-driven Development for Programmers](https://www.youtube.com/watch?v=iO1mwxPNP5A)

**Minute 10-15: Quick AI App Development**
- Spend 5 minutes watching "Build an AI App in 5 Minutes" to understand the rapid development process of an AI application.
  - [Build an AI App in 5 Minutes](https://www.youtube.com/watch?v=ieOsenjeV8Q)

**Minute 15-20: No-Code AI App Creation**
- Take 5 minutes to learn about no-code AI app creation with "Make AI APP in 5 Minutes With NO Code - Imagica AI Tutorial."
  - [Make AI APP in 5

## Conclusion
- More prompt engineering is needed to perfect the format of the lesson plan.
- Emotional prompting seemed to get the model to respond correctly, but further experimentation showed that passing the search results into both the system and user messages ensured that the model reproduced the links correctly.
- This approach can be extended to searches beyond YouTube.
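
Whether the model reproduced the links correctly can also be checked programmatically rather than by eye. The sketch below assumes the plan and the search results are plain strings containing `http(s)` URLs; `find_unlisted_links` and the regular expression are hypothetical additions, not part of the original notebook, and the URL pattern is deliberately simple.

```python
import re

# Simple URL matcher: stops at whitespace, closing brackets, and double quotes
URL_PATTERN = re.compile(r'https?://[^\s)\]>"]+')


def _extract_links(text):
    """Return all URLs in `text`, without trailing sentence punctuation."""
    return [url.rstrip(".,;:") for url in URL_PATTERN.findall(text)]


def find_unlisted_links(plan_text, videos_str):
    """Return links in the plan that do not appear in the search results.

    Hypothetical helper (not in the original notebook). An empty list means
    every link in the plan was copied from the search results.
    """
    allowed = set(_extract_links(videos_str))
    found = _extract_links(plan_text)
    # Preserve first-seen order and drop duplicates
    return list(dict.fromkeys(link for link in found if link not in allowed))
```

For example, `find_unlisted_links(response.choices[0].message.content, videos_str)` would list any link in the generated plan that did not come from `search_yt()`.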