<a href="https://colab.research.google.com/github/jaydenchoe/ragas-test/blob/main/generate_RAGAS_QnA_samples.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Generate RAGAS synthetic Q&A test samples**

In [1]:
!pip install pyarrow==14.0.1
!pip install requests==2.31.0
!pip install cudf-cu12==24.4.1 ibis-framework==8.0.0 google-colab==1.0.0
!pip install datasets==2.19.0
!pip install --upgrade langchain-openai
!pip install pypdf



In [2]:
!pip install --quiet \
  chromadb \
  langchain \
  langchain_chroma \
  optuna \
  plotly \
  polars \
  ragas

In [3]:
# Importing the packages
from functools import reduce
import json
import os
import requests
import warnings

import chromadb
from chromadb.api.models.Collection import Collection as ChromaCollection
from datasets import load_dataset, Dataset
from getpass import getpass
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_core.runnables.base import RunnableSequence
from langchain_community.document_loaders import WebBaseLoader, PolarsDataFrameLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from operator import itemgetter
import optuna
import pandas as pd
import plotly.express as px
import polars as pl
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
    answer_correctness
)
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context, conditional



In [4]:
# Providing the OpenAI API key
# (google.colab is imported only inside the Colab branch below, so this cell also runs outside Colab)

# Managing secrets
# - If using Colab please use Colab Secrets
# - If running outside Colab, please provide secrets as environment variables
COLAB = os.getenv("COLAB_RELEASE_TAG") is not None

if COLAB:
  from google.colab import userdata, data_table
  # Secrets
  OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
  os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
  runtime_info = "Colab runtime"

  # Enabling Colab's data formatter for pandas
  data_table.enable_dataframe_formatter()
elif OPENAI_API_KEY := os.environ.get('OPENAI_API_KEY'):
  # Secrets
  runtime_info = "Non Colab runtime"
else:
  OPENAI_API_KEY = getpass("OPENAI_API_KEY")
  os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
  runtime_info = "Non Colab runtime"

print(runtime_info)

Colab runtime
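The secrets cell checks Colab Secrets, then an existing environment variable, then an interactive prompt. A slightly more forgiving variant of that precedence, which falls through to the next source whenever one comes back empty, can be sketched as a small helper. `resolve_secret` and its parameters are hypothetical names for illustration; the Colab lookup is passed in as an optional callable instead of being imported, so the function also runs where `google.colab` is not installed.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping, Optional


def resolve_secret(
    name: str,
    colab_get: Optional[Callable[[str], Optional[str]]] = None,
    env: Optional[MutableMapping[str, str]] = None,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Hypothetical helper: resolve a secret from Colab, the environment, or a prompt.

    1. A Colab Secrets lookup (e.g. google.colab.userdata.get), if provided.
    2. An existing environment variable.
    3. An interactive prompt.
    The resolved value is written back to the environment mapping so that
    libraries reading the variable (such as the OpenAI client) can find it.
    """
    env = os.environ if env is None else env
    value = colab_get(name) if colab_get is not None else None
    if not value:
        value = env.get(name)
    if not value:
        value = prompt(name)
    env[name] = value
    return value
```

In the notebook it could be called as `OPENAI_API_KEY = resolve_secret("OPENAI_API_KEY", colab_get=userdata.get if COLAB else None)`; the conditional expression only touches `userdata` when running in Colab.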


In [5]:
# Loading example documents: a Wikipedia page and a local PDF
urls = ["https://en.wikipedia.org/wiki/Large_language_model"]

wikis_loader = WebBaseLoader(urls)
wikis = wikis_loader.load()
#wikis[0]

from langchain_community.document_loaders import PyPDFLoader

# Path to the PDF file; change this to the actual path.
pdf_path = "ENN SDK combined document.pdf"

# Load the PDF file with PyPDFLoader.
pdf_loader = PyPDFLoader(pdf_path)

# Load the PDF contents (one Document per page).
pdf_pages = pdf_loader.load()

# Print the contents of the first page (optional).
print(pdf_pages[0].page_content)

ENN SDK Quick Start Guide
Abstract
This guide provides basic instructions for using Exynos Neural Network Software Development Kit (ENN SDK). This guide explains the method to convert Neural Network (NN) models to Neural Network Container (NNC) models. It also describes the execution of NNC models on Exynos devices.
Introduction
ENN SDK allows users to convert the trained TensorFlow Lite neural network models to a format that can run efficiently in Samsung Exynos hardware. ENN SDK contains ENN SDK service to convert trained NN models and ENN framework for executing converted models on Exynos platforms.
This guide covers the basics of using ENN SDK service and executing NN models with ENN framework.
Basic Workflow
Following figure illustrates the three steps for converting and executing an NN model:
Model Conversion
To convert TensorFlow Lite models, ENN SDK provides an online conversion tool through the Samsung Exynos Developer Society. This online conversion tool allows users
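PDF text extraction often introduces noise such as doubled spaces and line breaks in the middle of sentences, and long PDFs can contain near-empty pages (covers, figure-only pages). A light cleanup pass before test set generation may help; the sketch below collapses whitespace runs and drops pages below a character threshold. `clean_pages` and the `min_chars=200` default are hypothetical choices, not part of ragas or LangChain. The function works on any object with a mutable `page_content` attribute, such as the LangChain `Document` objects returned by `PyPDFLoader`.

```python
import re
from typing import List, Protocol


class HasPageContent(Protocol):
    page_content: str


def clean_pages(pages: List[HasPageContent], min_chars: int = 200) -> List[HasPageContent]:
    """Hypothetical pre-processing step: normalize whitespace, drop near-empty pages.

    Rewrites page_content in place and returns only the pages whose cleaned
    text has at least `min_chars` characters.
    """
    kept = []
    for page in pages:
        # Collapse runs of spaces, tabs, and newlines into a single space.
        page.page_content = re.sub(r"\s+", " ", page.page_content).strip()
        if len(page.page_content) >= min_chars:
            kept.append(page)
    return kept
```

Usage would be `pdf_pages = clean_pages(pdf_pages)` before calling the generator. Collapsing every newline also removes paragraph boundaries; if those matter for chunking, a variant could preserve `\n\n` and only collapse other whitespace.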

In [14]:
# Examining the question evolution types available in the ragas library
llm35 = ChatOpenAI(model="gpt-3.5-turbo")
llm4 = ChatOpenAI(model="gpt-4-turbo")
generator_llm = llm35
critic_llm = llm4
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", deployment="text-embedding-3-small")

example_generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings,
    chunk_size=1024
)

# One distribution per evolution type, so each generation run yields questions of a single type
list_of_distributions = [{simple: 1}, {reasoning: 1}, {multi_context: 1}, {conditional: 1}]
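Each dictionary in `list_of_distributions` maps an evolution type to its share of the generated questions, so `{simple: 1}` asks for simple questions only. A mixed distribution such as `{simple: 0.5, reasoning: 0.25, multi_context: 0.25}` would instead split `test_size` across types. The sketch below illustrates that split with a largest-remainder rounding rule. It is a hypothetical illustration that uses string keys in place of the ragas evolution objects; ragas's own rounding and sampling may differ, so real per-type counts can vary.

```python
import math
from typing import Dict


def split_test_size(test_size: int, distribution: Dict[str, float]) -> Dict[str, int]:
    """Split test_size into integer per-type counts proportional to the weights.

    Uses the largest-remainder method so the counts always sum to test_size.
    Hypothetical illustration only; not the ragas implementation.
    """
    total = sum(distribution.values())
    if total <= 0:
        raise ValueError("distribution weights must sum to a positive number")
    # Ideal (fractional) number of questions per type.
    exact = {k: test_size * w / total for k, w in distribution.items()}
    counts = {k: math.floor(v) for k, v in exact.items()}
    leftover = test_size - sum(counts.values())
    # Hand out the remaining questions to the largest fractional parts first
    # (sorted() is stable, so ties keep the dictionary's insertion order).
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


print(split_test_size(10, {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25}))
# → {'simple': 5, 'reasoning': 3, 'multi_context': 2}
```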

In [None]:
# This step calls the OpenAI API and COSTS money.
# Generating the example evolutions (set avoid_costs = True to download precomputed examples instead)
#avoid_costs = True
avoid_costs = False

if not avoid_costs:
  # Running ragas once per distribution to get examples of each question evolution type
  question_evolution_types = [
    example_generator.generate_with_langchain_docs(pdf_pages, test_size=10, distributions=dist)
    for dist in list_of_distributions
  ]
  print(question_evolution_types)
  # Stacking the per-type results into a single pandas DataFrame
  question_evolution_types_pd = pd.concat([x.to_pandas() for x in question_evolution_types], axis=0)
  print(question_evolution_types_pd)
  question_evolution_types_pd = question_evolution_types_pd.loc[:, ["evolution_type", "question", "ground_truth"]]
else:
  # Downloading the example question evolutions discussed in the article
  question_evolution_types_pd = pl.read_csv(
    "https://gist.github.com/gox6/bfd422a6f203ba73f081b08c9bb25e66/raw/example-question-evolution-types-in-ragas.csv",
    separator=",",
  ).drop("index").to_pandas()

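Because every generation run is billed, it can be worth saving the combined DataFrame right after it is built and reloading it on later runs, much as the `avoid_costs` branch downloads a precomputed CSV. The helper below is a hypothetical sketch: `load_or_generate`, its `cache_path` argument, and the `generate` callable are illustrative names, not part of ragas.

```python
from pathlib import Path
from typing import Callable, Union

import pandas as pd


def load_or_generate(
    cache_path: Union[str, Path],
    generate: Callable[[], pd.DataFrame],
) -> pd.DataFrame:
    """Return a cached DataFrame if the CSV exists; otherwise generate and cache it."""
    path = Path(cache_path)
    if path.exists():
        # Reuse earlier (already paid-for) results instead of calling the API again.
        return pd.read_csv(path)
    df = generate()
    path.parent.mkdir(parents=True, exist_ok=True)
    # index=False keeps the CSV free of an extra unnamed index column.
    df.to_csv(path, index=False)
    return df
```

One way to use it is to wrap the generation code in a function that returns the trimmed `evolution_type` / `question` / `ground_truth` frame and pass that function as `generate`. Caching the trimmed frame is deliberate: a CSV round trip turns list columns such as `contexts` into plain strings.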



In [None]:
if COLAB:
  display(data_table.DataTable(question_evolution_types_pd, include_index=False, num_rows_per_page=100))
else:
  display(question_evolution_types_pd)
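To check that each run produced the intended question type, a quick tally per `evolution_type` is useful. The sketch assumes the three-column frame (`evolution_type`, `question`, `ground_truth`) built above; the helper name `summarize_evolutions` and the character-length statistic are hypothetical additions for illustration.

```python
import pandas as pd


def summarize_evolutions(df: pd.DataFrame) -> pd.DataFrame:
    """Count questions per evolution type and report their mean length in characters."""
    return (
        df.groupby("evolution_type")
        .agg(
            n_questions=("question", "size"),
            mean_question_chars=("question", lambda s: s.str.len().mean()),
        )
        .sort_values("n_questions", ascending=False)
    )
```

Calling `summarize_evolutions(question_evolution_types_pd)` should show one row per evolution type; with `test_size=10` per run, each type would ideally have about ten questions, though ragas may return fewer if some generations are filtered out.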