### 라마인덱스 준비

In [1]:
# 패키지 설치
!pip install llama-index==0.11.20
!pip install llama-index-llms-gemini
!pip install llama-index-embeddings-huggingface



In [2]:
import os
from google.colab import userdata

# 환경 변수 준비(좌측 하단의 열쇠 아이콘으로 GOOGLE_API_KEY 설정)
os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY")

In [3]:
import logging
import sys

# 로그 레벨 설정
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, force=True)

In [4]:
from llama_index.core import Settings
from llama_index.llms.gemini import Gemini
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# LLM 준비
Settings.llm = Gemini(
    model_name="models/gemini-1.5-flash",
    safety_settings={
        "HARM_CATEGORY_HARASSMENT": "BLOCK_NONE",
        "HARM_CATEGORY_HATE_SPEECH": "BLOCK_NONE",
        "HARM_CATEGORY_SEXUALLY_EXPLICIT" : "BLOCK_NONE",
        "HARM_CATEGORY_DANGEROUS_CONTENT" : "BLOCK_NONE"
    }
)

# 임베딩 모델 준비
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-m3"
)

INFO:numexpr.utils:NumExpr defaulting to 2 threads.
DEBUG:asyncio:Using selector: EpollSelector
DEBUG:pydot:pydot initializing
DEBUG:pydot:pydot 3.0.2
DEBUG:pydot.dot_parser:pydot dot_parser module initializing
DEBUG:pydot.core:pydot core module initializing
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
DEBUG:jax._src.path:etils.epath found. Using etils.epath for file I/O.
DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:43053
INFO:tornado.access:200 GET /v1beta/models/gemini-1.5-flash?%24alt=json%3Benum-encoding%3Dint (127.0.0.1) 1470.03ms
DEBUG:urllib3.co

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /BAAI/bge-m3/resolve/main/config_sentence_transformers.json HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /BAAI/bge-m3/resolve/main/README.md HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /BAAI/bge-m3/resolve/main/modules.json HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /BAAI/bge-m3/resolve/main/sentence_bert_config.json HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /BAAI/bge-m3/resolve/main/config.json HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /BAAI/bge-m3/resolve/main/model.safetensors HTTP/1.1" 404 0
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /BAAI/bge-m3/resolve/main/tokenizer_config.json HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "GET /api/models/BAAI/bge-m3/revision/main HTTP/1.1" 200 4411
DEBUG:urllib3.conne

### 웹 페이지를 사용한 질의 응답

In [5]:
# 패키지 설치
!pip install llama-index-readers-web

Collecting pyee==12.0.0 (from playwright<2.0,>=1.30->llama-index-readers-web)
  Using cached pyee-12.0.0-py3-none-any.whl.metadata (2.8 kB)
Using cached pyee-12.0.0-py3-none-any.whl (14 kB)
Installing collected packages: pyee
  Attempting uninstall: pyee
    Found existing installation: pyee 11.1.1
    Uninstalling pyee-11.1.1:
      Successfully uninstalled pyee-11.1.1
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pyppeteer 2.0.0 requires pyee<12.0.0,>=11.0.0, but you have pyee 12.0.0 which is incompatible.[0m[31m
[0mSuccessfully installed pyee-12.0.0


In [13]:
from llama_index.readers.web import BeautifulSoupWebReader

# 데이터 로더 준비
reader = BeautifulSoupWebReader()

# 문서 불러오기
documents = reader.load_data(urls=["https://deepmind.google/about/"])

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): deepmind.google:443
DEBUG:urllib3.connectionpool:https://deepmind.google:443 "GET /about/ HTTP/1.1" 200 None


In [14]:
from llama_index.core import VectorStoreIndex

# 인덱스와 쿼리 엔진 준비
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

DEBUG:llama_index.core.node_parser.node_utils:> Adding chunk: About - Google DeepMind























...
DEBUG:llama_index.core.node_parser.node_utils:> Adding chunk: Publications
          
          
        

 —...
DEBUG:llama_index.core.node_parser.node_utils:> Adding chunk: Episode 1
Unreasonably Effective AI with Demis ...
DEBUG:llama_index.core.node_parser.node_utils:> Adding chunk: Imagen 3
Our highest quality text-to-image mode...


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

In [15]:
# 질의 응답
response = query_engine.query("Google DeepMind에 대해 알려주세요.")
print(response)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

DEBUG:llama_index.core.indices.utils:> Top 2 nodes:
> [Node 7a1d75d3-dd2b-447b-8998-f83b2b9da2fe] [Similarity score:             0.742006] About - Google DeepMind






























      Jump to Content
    















G...
> [Node 098df942-a78b-4f62-ad18-f5ea2fc7a700] [Similarity score:             0.703026] Episode 1
Unreasonably Effective AI with Demis Hassabis



              Watch on YouTube
       ...
DEBUG:urllib3.connectionpool:Resetting dropped connection: localhost
INFO:tornado.access:200 POST /v1beta/models/gemini-1.5-flash:generateContent?%24alt=json%3Benum-encoding%3Dint (127.0.0.1) 2507.36ms
DEBUG:urllib3.connectionpool:http://localhost:43053 "POST /v1beta/models/gemini-1.5-flash:generateContent?%24alt=json%3Benum-encoding%3Dint HTTP/1.1" 200 1236
Google DeepMind는 인류에게 도움이 되는 AI를 만드는 것을 목표로 하는 회사입니다. Google DeepMind는 AI가 세상에 도움이 되도록 만들어지고 사용되어야 한다고 생각하며, AI 생태계를 사회를 더 잘 반영하도록 만드는 것을 목표로 합니다. 또한, Google DeepMind는 AI 분야의 가장 복잡하고 흥미로운 과제들을 연구하고 있으며, AI 분야의 혁

### 유튜브 동영상을 사용한 질의 응답

In [9]:
# 패키지 설치
!pip install llama-hub-youtube-transcript
!pip install llama-index-readers-youtube-transcript

Collecting pyee<12.0.0,>=11.0.0 (from pyppeteer>=0.0.14->requests-html->llama-hub-youtube-transcript)
  Using cached pyee-11.1.1-py3-none-any.whl.metadata (2.8 kB)
Using cached pyee-11.1.1-py3-none-any.whl (15 kB)
Installing collected packages: pyee
  Attempting uninstall: pyee
    Found existing installation: pyee 12.0.0
    Uninstalling pyee-12.0.0:
      Successfully uninstalled pyee-12.0.0
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
playwright 1.48.0 requires pyee==12.0.0, but you have pyee 11.1.1 which is incompatible.[0m[31m
[0mSuccessfully installed pyee-11.1.1




In [10]:
from llama_index.readers.youtube_transcript import YoutubeTranscriptReader

# 데이터 로더 준비
reader = YoutubeTranscriptReader()

# 동영상 불러오기
documents = reader.load_data(
    ytlinks=["https://www.youtube.com/watch?v=jV1vkHv4zq8"]
)

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.youtube.com:443
DEBUG:urllib3.connectionpool:https://www.youtube.com:443 "GET /watch?v=jV1vkHv4zq8 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.youtube.com:443
DEBUG:urllib3.connectionpool:https://www.youtube.com:443 "GET /api/timedtext?v=jV1vkHv4zq8&ei=FykeZ4PkFp7XsfIPsYH3gQ0&caps=asr&opi=112496729&exp=xbt&xoaf=5&hl=en&ip=0.0.0.0&ipbits=0&expire=1730055047&sparams=ip,ipbits,expire,v,ei,caps,opi,exp,xoaf&signature=5916FC0BF1C846A2A44446BA774BD8880D8395AD.2CF38CF0E44A0B08C76588AF13A1E3A08E9E1769&key=yt8&lang=en HTTP/1.1" 200 None


In [11]:
from llama_index.core import VectorStoreIndex

# 인덱스와 쿼리 엔진 준비
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

DEBUG:llama_index.core.node_parser.node_utils:> Adding chunk: [soft music begins] [Sundar Pichai
speaking] Yo...
DEBUG:llama_index.core.node_parser.node_utils:> Adding chunk: [Lila Ibrahim speaking]
Safety and responsibili...


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

In [12]:
# 질의 응답
response = query_engine.query("이 동영상이 전달하고 싶은 내용은 무엇인가요?")
print(response)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

DEBUG:llama_index.core.indices.utils:> Top 2 nodes:
> [Node a4229a15-7c94-4efe-8e35-3e5cb18d68dc] [Similarity score:             0.493245] [Lila Ibrahim speaking]
Safety and responsibility
has to be built-in from the beginning.
And at G...
> [Node 0e7377f6-6612-4f1e-b894-0e0ad0749f6c] [Similarity score:             0.473339] [soft music begins] [Sundar Pichai
speaking] You know, one of the reasons
we got interested in AI...
DEBUG:urllib3.connectionpool:Resetting dropped connection: localhost
DEBUG:urllib3.connectionpool:http://localhost:43053 "POST /v1beta/models/gemini-1.5-flash:generateContent?%24alt=json%3Benum-encoding%3Dint HTTP/1.1" 200 1562
이 동영상은 구글이 개발한 새로운 인공지능 모델인 '제미니'를 소개하고 있습니다. 제미니는 텍스트, 코드, 오디오, 이미지, 비디오 등 다양한 형태의 정보를 이해하고 처리할 수 있는 다중 모달 모델입니다. 제미니는 기존 모델보다 훨씬 뛰어난 성능을 보여주며, 다양한 분야에서 전문가 수준의 능력을 발휘합니다. 또한, 구글은 제미니를 안전하고 책임감 있게 개발하기 위해 노력하고 있으며, 다양한 정책과 테스트를 통해 부정적인 결과를 방지하고 있습니다. 
INFO:tornado.access:200 POST /v1beta/models/gemini-1.5-flash:generateContent?%24alt=json%3B