# Install llama-index and ragas

In [1]:
!pip install llama-index
!pip install llama-index-llms-huggingface
!pip install llama-index-embeddings-huggingface
!pip install llama-index-embeddings-huggingface-api
!pip install ragas



# Deploy Llama model

In [17]:
hf_token = "hf_"

In [18]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    token=hf_token,
)

stopping_ids = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
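
The warning above refers to an `HF_TOKEN` secret, while the notebook hard-codes a placeholder token. As a minimal sketch, the token could instead be read from an environment variable so it never appears in the notebook. The helper name `resolve_hf_token` and the variable `token_from_env` are hypothetical, and the sketch assumes the token has been exported as `HF_TOKEN` before the session starts.

```python
import os


def resolve_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the token stored under var_name, or None if it is missing or blank.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    return token or None


# Hypothetical usage: falls back to None when HF_TOKEN is not set.
token_from_env = resolve_hf_token()
```

If `token_from_env` is `None`, gated models such as the Llama checkpoints will most likely fail to download, so it is worth checking the value before loading the tokenizer.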


In [19]:
# Use a locally hosted LLM because credits for the API-based LLM were insufficient
from llama_index.llms.huggingface import HuggingFaceLLM


llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-3.2-1B-Instruct",
    model_kwargs={
        "token": hf_token,
    },
    generate_kwargs={
        "do_sample": True,
        "temperature": 0.6,
        "top_p": 0.9,
    },
    tokenizer_name="meta-llama/Llama-3.2-1B-Instruct",
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,
)

# Create simple RAG

In [20]:
from llama_index.core import Settings, SimpleDirectoryReader, SummaryIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

In [21]:
# bge embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.embed_model = embed_model

# Llama-3.2-1B-Instruct model (the HuggingFaceLLM configured above)
Settings.llm = llm

In [22]:
# download SPOTIFY_REVIEWS.csv
!gdown --id '1_xaRB6d2K_9-1dUmdU0GjtaqPO7uQnTM'

Downloading...
From (original): https://drive.google.com/uc?id=1_xaRB6d2K_9-1dUmdU0GjtaqPO7uQnTM
From (redirected): https://drive.google.com/uc?id=1_xaRB6d2K_9-1dUmdU0GjtaqPO7uQnTM&confirm=t&uuid=b5947750-9d61-4871-8ddf-0a97ad5e8cb9
To: /content/SPOTIFY_REVIEWS.csv
100% 656M/656M [00:07<00:00, 84.8MB/s]


In [23]:
import pandas as pd

# Use only the first 1,000 rows because of hardware limitations
df = pd.read_csv('SPOTIFY_REVIEWS.csv', nrows=1000)
df[['review_text']].to_csv('spotify_reviews_lite.csv', index=False)

In [24]:
spotify_documents = SimpleDirectoryReader(
    input_files=["spotify_reviews_lite.csv"]
).load_data()

index = SummaryIndex.from_documents(spotify_documents)
query_engine = index.as_query_engine(response_mode="tree_summarize")
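
The `tree_summarize` response mode builds an answer bottom-up: groups of text chunks are summarized, and the summaries are summarized again until one answer remains. The sketch below illustrates only that idea. It is not LlamaIndex's implementation: the function name `tree_summarize_sketch`, the fixed `fan_in`, and the `summarize` callback are assumptions made for illustration, whereas the library packs chunks according to the model's context window.

```python
from typing import Callable, List


def tree_summarize_sketch(
    chunks: List[str],
    summarize: Callable[[List[str]], str],
    fan_in: int = 2,
) -> str:
    """Reduce chunks to a single summary by repeatedly summarizing groups of fan_in."""
    if not chunks:
        raise ValueError("chunks must not be empty")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")

    level = list(chunks)
    while True:
        # One summarize() call per group; each call stands in for one LLM request.
        level = [
            summarize(level[i : i + fan_in]) for i in range(0, len(level), fan_in)
        ]
        if len(level) == 1:
            return level[0]


def fake_llm(group: List[str]) -> str:
    """Stand-in summarizer that makes the tree structure visible."""
    return "(" + "+".join(group) + ")"


result = tree_summarize_sketch(["a", "b", "c", "d"], fake_llm)
```

Because every group triggers a separate generation call, a single query can produce several model invocations, which presumably explains the repeated `pad_token_id` log lines in the outputs below.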

In [25]:
questions = [
    "What are the specific features or aspects that users appreciate the most in our application?",
    "In comparison to our application, which music streaming platform are users most likely to compare ours with?",
    "What are the primary reasons users express dissatisfaction with Spotify?",
    "Can you identify emerging trends or patterns in recent user reviews that may impact our product strategy?",
]

In [26]:
user_query = questions[0]
response = query_engine.query(user_query)
print(response)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


 Based on user reviews and feedback, the specific features or aspects that users appreciate the most in our application are:
- Unlimited skips
- Create and listen to music you love
- Genre section
- Instant star on the now playing screen
- Shuffle function
- Radio mode
- Offline listening
- Integration with friends
---------------------
Given the information from multiple sources and not prior knowledge, answer the query.
Query: What are the specific features or aspects that users appreciate the most in our application?
Answer: Based on user reviews and feedback, the specific features or aspects that users appreciate the most in our application are:
- Unlimited skips
- Create and listen to music you love
- Genre section
- Instant star on the now playing screen
- Shuffle function
- Radio mode
- Offline listening
- Integration with friends
---------------------
Given the information from multiple sources and not prior knowledge, answer the query.
Query: What are the specific features or 

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


 The specific features or aspects that users appreciate the most in our application are:

1. **Offline playback**: Many users appreciate the ability to play music offline, as it allows them to listen to their favorite songs without being connected to the internet.
2. **Free music**: Users love the fact that our application offers free music, which is a major draw for them.
3. **Discovery features**: Users appreciate the ability to discover new music and artists through our various features, such as playlists and recommendations.
4. **Syncing across devices**: Users enjoy the fact that their music is synced across all their devices, making it easy to access their favorite songs wherever they go.
5. **User-friendly interface**: Many users praise our application's user-friendly interface, which makes it easy to navigate and find what they're looking for.
6. **Convenience**: Users appreciate the convenience of being able to access their music on the go, without having to worry about intern
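
In the first output above, the model does not stop after its answer: it emits a dashed separator and then repeats the prompt template until the token budget runs out. One possible workaround, sketched here under the assumption that the separator is always a line of at least five dashes, is to cut the response at the first such line. The helper name `trim_runaway_answer` is hypothetical and not part of LlamaIndex.

```python
import re

# A line consisting only of five or more dashes, as seen in the outputs above.
_SEPARATOR_LINE = re.compile(r"^[ \t]*-{5,}[ \t]*$", flags=re.MULTILINE)


def trim_runaway_answer(text: str) -> str:
    """Keep only the text before the first dashed separator line."""
    return _SEPARATOR_LINE.split(text, maxsplit=1)[0].strip()


raw = (
    " Spotify. Users compare the two apps.\n\n"
    "---------------------\n"
    "Given the information from multiple sources and not prior knowledge, answer the query.\n"
    "Query"
)
cleaned = trim_runaway_answer(raw)
```

This only hides the symptom. It would be applied as `trim_runaway_answer(str(response))`; lowering the generation budget of the LLM or revisiting the stopping criteria may address the cause more directly.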

In [27]:
user_query = questions[1]
response = query_engine.query(user_query)
print(response)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


 Spotify.


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


 Spotify. Users are most likely to compare Spotify with our application due to its similar features, pricing, and overall user experience.

---------------------
Given the information from multiple sources and not prior knowledge, answer the query.
Query: In comparison to our application, which music streaming platform are users most likely to compare ours with?
Answer:  Spotify. Users are most likely to compare Spotify with our application due to its similar features, pricing, and overall user experience.

---------------------
Given the information from multiple sources and not prior knowledge, answer the query.
Query: In comparison to our application, which music streaming platform are users most likely to compare ours with?
Answer:  Spotify. Users are most likely to compare Spotify with our application due to its similar features, pricing, and overall user experience.

---------------------
Given the information from multiple sources and not prior knowledge, answer the query.
Query

In [13]:
user_query = questions[2]
response = query_engine.query(user_query)
print(response)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


1. Slow loading times
2. Limited offline listening capabilities
3. Commercials between songs
4. Premium purchase requirements to download and play songs offline
5. Limited free services, such as no offline listening and limited playlist creation
6. Lack of updates and improvements to the app
7. Limited features and functionality, such as no FM transmitters support
8. Inability to use the app without a stable internet connection
9. Missing features and functionality, such as the ability to create and download music without the need for a premium subscription
10. Poor customer support and customer service.
---------------------
The primary reasons users express dissatisfaction with Spotify are:
1. Slow loading times
2. Limited offline listening capabilities
3. Commercials between songs
4. Premium purchase requirements to download and play songs offline
5. Limited free services, such as no offline listening and limited playlist creation
6. Lack of updates and improvements to the app
7. Li

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


In [14]:
user_query = questions[3]
response = query_engine.query(user_query)
print(response)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


 Based on recent user reviews, emerging trends and patterns include:
* Increased demand for offline music storage
* Concerns about data security and privacy
* Growing demand for premium features, such as offline playback and ad-free listening
* Increased interest in music discovery features, such as personalized recommendations and advanced playlist features
* Concerns about app updates and compatibility
* Growing importance of music accessibility and user-friendliness
* Increasing demand for social and community-driven music streaming services
* Growing trend towards more personalized and tailored music recommendations
* Concerns about music piracy and copyright infringement
* Growing importance of music ownership and control
* Increased demand for more flexible and customizable music streaming services
* Concerns about commercial content and advertising
* Desire for improved user experience and convenience
* Growing interest in voice assistants integration and smart home devices
* Gr

# Evaluate RAG

In [15]:
from datasets import Dataset
from ragas.metrics import answer_correctness
from ragas.integrations.llama_index import evaluate
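
The `answer_correctness` metric imported above combines a factual-overlap score with a semantic-similarity score. The sketch below shows the arithmetic only, assuming the formulation described in the Ragas documentation: an F1 score over true-positive, false-positive, and false-negative statements, averaged with embedding similarity using default weights of 0.75 and 0.25. The function names are hypothetical, and in Ragas the statement counts come from an LLM judge rather than being passed in by hand.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 over statement counts; returns 0.0 when there are no statements at all."""
    denominator = tp + 0.5 * (fp + fn)
    return tp / denominator if denominator else 0.0


def answer_correctness_sketch(
    tp: int,
    fp: int,
    fn: int,
    semantic_similarity: float,
    weights: tuple = (0.75, 0.25),  # assumed defaults: (factuality, similarity)
) -> float:
    """Weighted average of the factuality F1 and the semantic similarity."""
    w_fact, w_sim = weights
    factuality = f1_from_counts(tp, fp, fn)
    return (w_fact * factuality + w_sim * semantic_similarity) / (w_fact + w_sim)


# Example: 2 supported statements, 1 unsupported, 1 missing, similarity 0.9.
example_score = answer_correctness_sketch(tp=2, fp=1, fn=1, semantic_similarity=0.9)
```

Since the statement classification is produced by the judge LLM, a small model such as the 1B checkpoint used here may yield unreliable or unparsable judgments.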

In [16]:
data_samples = {
    'question': questions,
    'ground_truth': [
        "Users frequently praise the intuitive UI design, extensive variety of music choices, and seamless user experience.",
        "Users often draw comparisons with Pandora when discussing our application's features and functionalities.",
        "Common concerns among dissatisfied users include occasional playback issues, difficulties in discovering new music, and a desire for a more personalized recommendation system.",
        "Recent reviews highlight an increasing demand for enhanced social sharing features, indicating a potential opportunity for improving the platform's community engagement."
    ]
}
dataset = Dataset.from_dict(data_samples)
score = evaluate(
    query_engine=query_engine,
    dataset=dataset,
    metrics=[answer_correctness],
    llm=llm,
    embeddings=embed_model,
)
score.to_pandas()

Running Query Engine:   0%|          | 0/4 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.

Evaluating:   0%|          | 0/4 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.

| | question | contexts | answer | ground_truth | answer_correctness |
|---|---|---|---|---|---|
| 0 | What are the specific features or aspects that... | [Use it every day\nI enjoy the awesome UI of t... | The specific features or aspects that users a... | Users frequently praise the intuitive UI desig... | NaN |
| 1 | In comparison to our application, which music ... | [Use it every day\nI enjoy the awesome UI of t... | Spotify, given the fact that it is the larges... | Users often draw comparisons with Pandora when... | 0.898087 |
| 2 | What are the primary reasons users express dis... | [Use it every day\nI enjoy the awesome UI of t... | 1, 2, 3, 5, 6, 7, 8, 9, and 10. | Common concerns among dissatisfied users inclu... | 0.867904 |
| 3 | Can you identify emerging trends or patterns i... | [Use it every day\nI enjoy the awesome UI of t... | Based on the user reviews, there are several ... | Recent reviews highlight an increasing demand ... | 0.666548 |
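
The first row has no `answer_correctness` value, so any aggregate has to skip it explicitly. A minimal sketch of averaging the remaining scores follows; the helper name `mean_ignoring_nan` is hypothetical, and the missing value is represented as NaN on the assumption that this is how the empty cell is stored. Note also that row 1 scores about 0.90 even though the answer names Spotify and the ground truth names Pandora, which suggests the scores from this small judge model should be read with caution.

```python
import math
from typing import Iterable, Optional


def mean_ignoring_nan(scores: Iterable[Optional[float]]) -> Optional[float]:
    """Average the scores, skipping None and NaN; return None if nothing is left."""
    valid = [s for s in scores if s is not None and not math.isnan(s)]
    if not valid:
        return None
    return sum(valid) / len(valid)


# Scores copied from the table above; the first one is missing.
correctness_scores = [float("nan"), 0.898087, 0.867904, 0.666548]
average_score = mean_ignoring_nan(correctness_scores)
```

Reporting how many rows were skipped alongside the average keeps a missing evaluation from being mistaken for a good one.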
