### Installing necessary packages

In [None]:
%pip install youtube-transcript-api google-generativeai chromadb

### Importing Modules 

In [4]:

from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter

import google.generativeai as genai

import chromadb
from chromadb.utils import embedding_functions

import os


### Setting up api-key

In [5]:
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
genai.configure(api_key=GOOGLE_API_KEY)

### Model and Vector database set-up

In [14]:
genai_model = genai.GenerativeModel('models/gemini-1.5-flash')
chroma_client = chromadb.PersistentClient(path="the_vectordb")
gemini_ef  = embedding_functions.GoogleGenerativeAiEmbeddingFunction(api_key=GOOGLE_API_KEY)
chroma_collection = chroma_client.get_or_create_collection(name='youtube_notes', embedding_function=gemini_ef)

In [67]:
youtube_video_id = 'eMlx5fFNoYc'
y_vid_id2 = 'bBC-nXj3Ng4'
prompt = "Extract key notes from video transcript: "

### Youtube transcript acquisition

The transcript of a video is collected with the aid of YouTubeTranscriptApi and is written on the file.

In [68]:
transcript = YouTubeTranscriptApi.get_transcript(y_vid_id2, languages=['en','en-US','en-GB'])
transcript = TextFormatter().format_transcript(transcript)
# print(transcript)

with open("temp_transcript.txt", "w") as file:
    file.write(transcript)

### Note generation

We utilize the generative model and the initial prompt to extract key points from the YouTube video's transcript, which we gathered in the previous step.

In [69]:
response = genai_model.generate_content(prompt + transcript, stream=False)

# Escape all asterisks in the response text
escaped_response = response.text.replace('*', '')

# Format the escaped response text
formatted_response = f"Generated Notes:\n{'='*20}\n{escaped_response}\n{'='*20}\n"

# Write the formatted response to the file
with open("temp_notes.txt", "w") as file:
    file.write(formatted_response)

### Populating Chroma-db
we insert the notes into our chroma-db collection

In [70]:
with open("temp_notes.txt", "r") as file:
    notes = file.read()

chroma_collection.upsert(
    documents=[notes],
    ids=[youtube_video_id]
)

# result = chroma_collection.get(y_vid_id2, include=['documents'])
# result

### Example queries and their result 

In [74]:
query_text = "what's blockchain?"
n_results = 5

results = chroma_collection.query(
    query_texts=[query_text],
    n_results=n_results,
    include=['documents','distances','metadatas']
)
# print(results)
for i in range(len(results['ids'][0])):
    id = results['ids'][0][i]
    document = results['documents'][0][i]

    print(f"\nResult {i + 1}")
    print("---------------------------")
    print(f"Video URL: https://youtu.be/{id}")
    print("---------------------------")
    # print(f"Document:\n{document}")
    print("---------------------------")


Result 1
---------------------------
Video URL: https://youtu.be/eMlx5fFNoYc
---------------------------
---------------------------

Result 2
---------------------------
Video URL: https://youtu.be/wjZofJX0v4M
---------------------------
---------------------------

Result 3
---------------------------
Video URL: https://youtu.be/hQH4-5o0BMM
---------------------------
---------------------------


In [73]:
prompt = "Answer the following QUESTION using DOCUMENT as context."
for i in range(len(results['documents'][0])):
    prompt += f"QUESTION: {query_text}"
    prompt += f"DOCUMENT: {results['documents'][0][i]}"
    
    response = genai_model.generate_content(prompt, stream=False)
    print(response.text)

Based on the provided text, a blockchain is a distributed, tamper-proof ledger.  It's a chain of "blocks," each containing a batch of transactions.  These blocks are linked together cryptographically, making it extremely difficult to alter any single block without recalculating the hashes for all subsequent blocks.  The security and trustworthiness of the blockchain stem from the immense computational effort ("proof of work") required to create and add new blocks, making fraudulent transactions extremely improbable.  Everyone maintains a copy of the ledger, and consensus on the correct version is achieved by selecting the longest chain (the one requiring the most computational work).

The provided text does not explain blockchain.  The first document describes Bitcoin and uses blockchain as a component of its explanation, but the second document focuses entirely on the architecture of transformer neural networks and has no mention of blockchain.

None of the provided documents contain 