# Capstone project
## Description
This project generates a summary from the collection of videos from the Wizeline's ChatGPT course,
generates a summary and then lets the user ask some questions and receive the answers in audio.

It is organized in a way that it could be used for any other course (Even a course about how to take care of cats 😺).

In [21]:
# Import
!pip install -r requirements.txt



In [95]:
from openai import OpenAI
from IPython.display import Audio, display
import time
from pydub import AudioSegment
from io import BytesIO


KEY = ""
client = OpenAI(
    api_key=KEY,
)

def get_chat_gpt_completition(prompt):
    response = client.chat.completions.create(
            messages=[
                {
                    "role": "user",
                    "content": prompt,
                }
            ],
            model="gpt-3.5-turbo",
        )
    return response.choices[0].message.content

### 1. Convert course folder into text

In [80]:
"""Recommended folder structure
This should be placed in the files/ folder
- course_name_1
  - module 1
  - module 2
Note: Only 1 course is supported

Yes, I could have manually created the text from the videos, but that wouldn't have been as enjoyable. :)
https://miro.medium.com/v2/resize:fit:1400/format:webp/1*6Oyig2ACF-unC3R-CXT8jw.jpeg
"""
import os
import time


def get_file_names_from_folder(folder_path):
    return [entry for entry in os.listdir(folder_path) if entry != ".DS_Store"]


def get_course_modules_summary(folder_path, course_name):
    folder_path = f"{folder_path}/{course_name}"
    modules = get_file_names_from_folder(folder_path)
    for module in modules:
        classes = get_file_names_from_folder(f"{folder_path}/{module}")
        print(f"Loading module content: {module} ...")
        for c in classes:
            print(f"Loading class content: {c} ...")
            audio_file= open(f"{folder_path}/{module}/{c}", "rb")
            transcript = client.audio.transcriptions.create(
              model="whisper-1", 
              file=audio_file
            )
            content: str = transcript.text
            section = f"Module name: {module} Class name: {c}"
            with open("data.txt", "a") as file:
                file.write(f"{section}\n")
                file.write(f"Content: {content}\n")
                
            with open("resumed_data.txt", "a") as file:
                file.write(f"{section}\n")
                file.write(f"{get_chat_gpt_completition(f'Give me a summary of this text, please use only 10 lines and do not include examples out of the main concept of the class: AI')}\n")
            
            

def generate_human_readable_course_name(course_name):
    return get_chat_gpt_completition(f"Generate a human readable name for this folder with the name {course_name}. This is the name of a course, so please reflect that.")
    

def load_course_content_into_file():
    folder_path = "files"
    
    if not os.path.exists(folder_path):
        raise NotImplementedError(f"{folder_path} doesn't exist.")

    courses = get_file_names_from_folder(folder_path)
    if len(courses) != 1:
        raise NotImplementedError("Check if the files/ folder follows the correct structure")

    course_name = courses[0]
    human_readable_course_name = generate_human_readable_course_name(course_name)

    # Create a file to hold the course content
    file_names = ["data.txt", "resumed_data.txt"]
    for file_name in file_names:
        if os.path.exists(file_name):
            os.remove(file_name)
        
        with open(file_name, "w") as file:
            file.write(f"Course Name: {human_readable_course_name}\n")
        

    # Modules
    course_modules = get_course_modules_summary(folder_path, course_name)

start_time = time.time()
load_course_content_into_file()
end_time = time.time()
diff = end_time - start_time

print(f"Content added in {diff:.5f} seconds.")


Loading module content: 1_nlp_and_genAI ...
Loading class content: 1.1 AI Overview.mp3 ...
Loading class content: 1.2 Gen AI Intro.mp3 ...
Loading class content: 1.3 NLP.mp3 ...
Loading class content: 1.4 LLM Intro.mp3 ...
Content added in 113.70544 seconds.


At this point, a new `data.txt` file is created and it has the content of the course.
And as it could be really large, and the next step is to generate the summary, to prevent a long content file that can contain more tokens than the allowed by chatGPT, a `resumend_data.txt` is also created to be used to generate the summary

## Generate the summary

In [83]:
from IPython.display import display_markdown



with open("resumed_data.txt", "r") as file:
    course_content = file.read()

prompt = f"""
    Write a nice response and I'll tip you 5000$.
    This is a course content. It is divided by: Course name, module name and class names (Ignore the .mp3 portion of the class name).
    For the modules and classes generate a human readable name (Don't use special characters).
    Generate each class summary and present the information in a markdown style, you can enumerate information if you want or create bullet points.

    At the end generate a set of 5 questions and answers (They should be around the scope of the course: "A course for teaching developers how to use and understand ChatGPT and
    the AI theory needed to understand it")
    -----
    This is the course content: {course_content}
    """
    
display_markdown(get_chat_gpt_completition(prompt), raw=True)

# Wizeline ChatGPT Masterclass Course Content

## Module 1: NLP and Gen AI
### Class 1.1: AI Overview

Artificial Intelligence (AI) refers to the development of computer systems that can perform tasks typically requiring human intelligence. This includes visual perception, speech recognition, and decision-making. AI involves creating algorithms and models for machines to comprehend information, learn from experience, and make predictions or decisions. It is divided into two categories: narrow AI, designed for specific tasks, and general AI, aiming to replicate human-like intelligence.

- AI enables machines to process information, learn, and make decisions.
- It encompasses narrow AI and general AI.
- Common AI techniques include machine learning, natural language processing, and computer vision.
- AI finds applications in healthcare, finance, transportation, and entertainment.
- Ethical implications, job displacement, and algorithm biases are concerns associated with AI.

### Class 1.2: Gen AI Intro

This class provides an overview of AI (Artificial Intelligence) without specific examples. AI refers to the intelligence demonstrated by artificial systems, particularly computer systems. It involves machines learning from data, performing tasks requiring human intelligence, and simulating human cognition.

- AI improves efficiency and revolutionizes industries.
- Machine learning and deep learning are integral to AI.
- Neural networks play a significant role in AI.
- AI finds applications in healthcare, finance, and transportation.
- AI is an evolving field driving innovation and transforming technology.

### Class 1.3: NLP

This class presents a concise summary of AI, emphasizing natural language processing (NLP) as an essential subfield. AI involves creating intelligent machines capable of tasks usually requiring human intelligence. Machine learning and NLP enable machines to learn from data and understand human language.

- AI encompasses machine learning and natural language processing.
- AI systems range from reactive to memory-based and planning systems.
- AI is utilized in real-world applications like self-driving cars and virtual assistants.
- Advancements in AI continue to face challenges in simulating human-like intelligence and ensuring ethical use.

### Class 1.4: LLM Intro

This class provides a brief summary of AI concepts without specific examples. AI involves developing intelligent machines capable of tasks that typically require human intelligence. It emphasizes the importance of algorithms and data in enabling machines to learn and make decisions.

- AI involves simulating human intelligence in machines.
- Learning, reasoning, and problem-solving are central to AI.
- Subfields within AI include machine learning, natural language processing, and computer vision.
- AI has applications in healthcare, finance, and transportation.

---

## Questions and Answers:

1. What is the main objective of AI?
   - The main objective of AI is to develop computer systems capable of performing tasks that typically require human intelligence.

2. What are the two categories of AI?
   - The two categories of AI are narrow AI, designed for specific tasks, and general AI, aiming to replicate human-like intelligence.

3. What are some common AI techniques?
   - Some common AI techniques include machine learning, natural language processing, and computer vision.

4. What are the concerns associated with AI?
   - The concerns associated with AI include ethical implications, job displacement, and biases in algorithms.

5. Which subfield of AI focuses on understanding human language?
   - Natural language processing (NLP) is the subfield of AI that focuses on understanding human language.

## Given the resume you can ask some questions (use the QA presented above)

In [93]:
def ask_chat_gpt(message, callback):
    with open("resumed_data.txt", "r") as file:
        course_content = file.read()
    stream = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": f"Answer this question and feel free to add more information, and generate 1 paragraph: {message} -------- Based on this knoweldge: {course_content}",
            }
        ],
        model="gpt-3.5-turbo",
        stream=True
    )
    cache = ""
    for chunk in stream:
        if a := chunk.choices[0].delta.content:
            cache += a
            if a in ",.;":
                callback(cache)
                cache = ""
    if cache:
        callback(cache)
        
def get_audio_duration():
    from mutagen.mp3 import MP3
    audio = MP3("temp.mp3")
    return audio.info.length
    
def play_mp3_from_bytes(mp3_content):
    global aux
    aux  =mp3_content
    audio = Audio(data=mp3_content, autoplay=True)
    display(audio)
    duration = get_audio_duration()
    time.sleep(abs(int(duration)-0.2))
    
def play_audio_from_text(text):
    res = client.audio.speech.create(input=text, model="tts-1", voice="shimmer", response_format="mp3")
    res.stream_to_file("temp.mp3")
    play_mp3_from_bytes(res.content)


In [94]:
question = "Which subfield of AI focuses on understanding human language?"
ask_chat_gpt(question, play_audio_from_text)

KeyboardInterrupt: 