# Hotel Review Analyzer with Llama 3.2

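This notebook analyzes hotel reviews with retrieval-augmented generation (RAG). Reviews are split into chunks, embedded with sentence-transformers, and indexed with FAISS. The chunks most relevant to a question are then summarized by Llama 3.2 (1B or 3B).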

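## Setup Requirements:
1. Install required packages (see cell below)
2. Put `tripadvisor_hotel_reviews.csv` (with a `Review` column) in the working directory
3. Install Ollama (https://ollama.ai) and pull the model with `ollama pull llama3.2` (Step 4 connects to it)
4. Only if you load Llama from Hugging Face instead of Ollama: accept the license at https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct and log in with `huggingface-cli login`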

In [None]:
# Install required packages
import sys
!{sys.executable} -m pip install langchain-community langchain
!{sys.executable} -m pip install pandas
!{sys.executable} -m pip install sentence-transformers
!{sys.executable} -m pip install faiss-cpu
!{sys.executable} -m pip install transformers
!{sys.executable} -m pip install accelerate
!{sys.executable} -m pip install torch



## Step 1: Load and Prepare Data

In [2]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
import pandas as pd

# Load reviews
print("Loading reviews...")
df = pd.read_csv("tripadvisor_hotel_reviews.csv")

# Split text into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
docs = splitter.create_documents(df["Review"].tolist())

print(f"Total documents: {len(docs)}")
print(f"Sample: {docs[0].page_content[:100]}...")

Loading reviews...
Total documents: 65399
Sample: nice hotel expensive parking got good deal stay hotel anniversary, arrived late evening took advice ...
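The splitter works in characters. `chunk_size=300` caps each chunk at 300 characters, and `chunk_overlap=50` lets neighbouring chunks share up to 50 characters, so text near a boundary appears in both chunks. `RecursiveCharacterTextSplitter` also prefers to break at paragraph, line, and word boundaries. The simplified sketch below ignores those separators and uses fixed windows. It only illustrates what the two parameters mean and is not LangChain's algorithm. `fixed_window_chunks` is a hypothetical helper.

```python
def fixed_window_chunks(text: str, chunk_size: int = 300, chunk_overlap: int = 50) -> list[str]:
    """Split text into character windows of at most chunk_size that overlap by chunk_overlap.

    Simplified illustration only: unlike LangChain's RecursiveCharacterTextSplitter,
    this ignores separators and may cut words in half.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each window advances
    # Stop once the rest of the text is already inside the previous window's overlap.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


review = "".join(chr(ord("a") + i % 26) for i in range(700))  # 700 placeholder characters
chunks = fixed_window_chunks(review)
print([len(c) for c in chunks])  # → [300, 300, 200]
```

Consecutive windows start 250 characters apart (300 - 50), which is why the chunk count grows faster than the review count.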


## Step 2: Create Embeddings and FAISS Index

In [6]:
from sentence_transformers import SentenceTransformer
import numpy as np
import torch
import pickle
import os

# Extract text from documents
docs_text = [doc.page_content for doc in docs]

# Load embedding model
print("Loading embedding model...")
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Check if we have cached embeddings
cache_file = 'embeddings_cache.pkl'
if os.path.exists(cache_file):
    print("Loading cached embeddings...")
    with open(cache_file, 'rb') as f:
        embeddings = pickle.load(f)
    print("✓ Loaded cached embeddings!")
else:
    # Create embeddings (about 18 minutes on CPU in the run below), then cache them
    print("Creating embeddings (this may take a few minutes)...")
    print("Note: This will be cached for future runs")
    embeddings = embedding_model.encode(
        docs_text, 
        show_progress_bar=True,
        device='cpu',
        convert_to_tensor=True  # Get as tensor first
    )
    
    # Convert via a Python list instead of tensor.numpy(): this works around PyTorch builds
    # that cannot hand tensors to the installed NumPy ("Numpy is not available"). FAISS expects float32.
    embeddings = np.array(embeddings.cpu().tolist(), dtype='float32')
    
    # Save to cache
    print("Saving embeddings to cache...")
    with open(cache_file, 'wb') as f:
        pickle.dump(embeddings, f)
    print("✓ Embeddings cached for future runs!")

print(f"Embeddings shape: {embeddings.shape}")

Loading embedding model...
Creating embeddings (this may take a few minutes)...
Note: This will be cached for future runs


Batches: 100%|██████████| 2044/2044 [17:53<00:00,  1.90it/s]


Saving embeddings to cache...
✓ Embeddings cached for future runs!
Embeddings shape: (65399, 384)
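The cache above is reused whenever `embeddings_cache.pkl` exists. That includes cases where the CSV, the chunking parameters, or the embedding model changed after the file was written, which would silently pair old vectors with new chunks. One way to guard against this is to name the cache file after a hash of the model name and every input text. The sketch below does that with only the standard library. The helper names (`cache_key`, `load_or_compute`) are hypothetical, and `compute_fn` stands in for the `embedding_model.encode(...)` call plus the NumPy conversion above.

```python
import hashlib
import os
import pickle
import tempfile
from typing import Any, Callable, Sequence


def cache_key(texts: Sequence[str], model_name: str) -> str:
    """Hash the model name and all texts; any change to either gives a new key."""
    h = hashlib.sha256(model_name.encode("utf-8"))
    for text in texts:
        data = text.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))  # length prefix keeps ["ab", "c"] != ["a", "bc"]
        h.update(data)
    return h.hexdigest()[:16]


def load_or_compute(texts: Sequence[str], model_name: str,
                    compute_fn: Callable[[Sequence[str]], Any],
                    cache_dir: str = ".") -> tuple[Any, bool]:
    """Return (result, cache_hit), computing and pickling the result on a miss."""
    path = os.path.join(cache_dir, f"embeddings_{cache_key(texts, model_name)}.pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f), True
    result = compute_fn(texts)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result, False


# Demo with a stand-in "embedding" (text lengths) in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    texts = ["clean room", "friendly staff"]
    first, hit1 = load_or_compute(texts, "all-MiniLM-L6-v2", lambda ts: [len(t) for t in ts], tmp)
    second, hit2 = load_or_compute(texts, "all-MiniLM-L6-v2", lambda ts: [len(t) for t in ts], tmp)
    print(first, hit1, hit2)  # → [10, 14] False True
```

In the notebook, `compute_fn` could be `lambda ts: np.array(embedding_model.encode(ts, device='cpu', convert_to_tensor=True).cpu().tolist(), dtype='float32')`. Hashing 65k short strings takes a fraction of a second, which is small next to re-encoding.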


In [7]:
import faiss

# Create FAISS index
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)

# Keep mapping from FAISS id → original text
id_to_text = {i: doc.page_content for i, doc in enumerate(docs)}

print(f"Index created with {index.ntotal} vectors of dimension {dimension}")

Index created with 65399 vectors of dimension 384
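`IndexFlatL2` performs exact, brute-force search. Every query is compared with all stored vectors, and `index.search` returns the k smallest *squared* Euclidean distances together with their ids. The pure-Python sketch below mirrors that behaviour on toy 2-D vectors. It is for illustration only and is far slower than FAISS. It also checks a useful identity: for unit-length vectors, squared L2 distance equals 2 - 2·cosine similarity, so ranking by L2 gives the same order as ranking by cosine. That applies here if your embeddings are normalized. all-MiniLM-L6-v2's sentence-transformers configuration includes a normalization step, but you can confirm it with `np.linalg.norm(embeddings, axis=1)`.

```python
import math


def flat_l2_search(database, queries, k):
    """Exact k-nearest-neighbour search by brute force, returning (distances, ids)
    shaped like FAISS: one row per query, squared L2 distances in ascending order."""
    all_distances, all_ids = [], []
    for q in queries:
        scored = sorted(
            (sum((a - b) ** 2 for a, b in zip(q, v)), i) for i, v in enumerate(database)
        )[:k]
        all_distances.append([d for d, _ in scored])
        all_ids.append([i for _, i in scored])
    return all_distances, all_ids


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


database = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, ids = flat_l2_search(database, [[0.9, 0.1]], k=2)
print(ids)  # → [[1, 0]]

# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
a, b = normalize([3.0, 4.0]), normalize([1.0, 2.0])
sq_l2 = sum((x - y) ** 2 for x, y in zip(a, b))
cos = sum(x * y for x, y in zip(a, b))
print(math.isclose(sq_l2, 2 - 2 * cos))  # → True
```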


## Step 3: Define Semantic Search Function

In [12]:
def semantic_search(query, k=10):
    """Search for top-k most relevant reviews"""
    # Encode query with tensor conversion workaround
    query_tensor = embedding_model.encode([query], convert_to_tensor=True, device='cpu')
    query_emb = np.array(query_tensor.cpu().tolist(), dtype='float32')
    distances, indices = index.search(query_emb, k)
    return [id_to_text[i] for i in indices[0]]

# Test the search
test_results = semantic_search("clean rooms and friendly staff", k=3)
print("Test search results:")
for i, result in enumerate(test_results, 1):
    print(f"{i}. {result[:100]}...")

Test search results:
1. staff efficient friendly lacked warm touch.the room 5th floor small dark clean nicely furnished, vie...
2. hotel staff friendly helpful, rooms small not comfortable clean, say hotel just okay price right,...
3. friendly clean great location room clean great location helpful friendly staff.our room overlooking ...
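`semantic_search` looks up every returned id directly in `id_to_text`, which leaves two edge cases. First, FAISS pads the result with id `-1` when fewer than `k` vectors are available, and that would raise a `KeyError` here. Second, identical chunk texts (if the data contains duplicated reviews) can fill several of the `k` slots. The hypothetical helper below skips both while keeping the ranking order.

```python
from typing import Iterable, Mapping


def ids_to_unique_texts(ids: Iterable[int], id_to_text: Mapping[int, str]) -> list[str]:
    """Map FAISS ids to texts in rank order, skipping -1 padding and repeated texts."""
    seen: set[str] = set()
    texts = []
    for idx in ids:
        idx = int(idx)  # FAISS returns NumPy int64 values
        if idx < 0:  # -1 means "no result" for this slot
            continue
        text = id_to_text[idx]
        if text not in seen:
            seen.add(text)
            texts.append(text)
    return texts


print(ids_to_unique_texts([2, 0, 1, -1], {0: "clean room", 1: "clean room", 2: "great view"}))
# → ['great view', 'clean room']
```

Inside `semantic_search` this would replace the list comprehension with `return ids_to_unique_texts(indices[0], id_to_text)`. The result can then hold fewer than `k` texts.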


## Step 4: Load Llama 3.2 Model

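**Note:** The cell below runs Llama 3.2 through Ollama. Pull the model once in a terminal first, with `ollama pull llama3.2` for the 3B model (Ollama's default tag) or `ollama pull llama3.2:1b` for the 1B model. If you pull the 1B model, set `model=` in the cell to match. Expect a download of roughly 1 to 2 GB.

**Options:**
- `llama3.2:1b` (`meta-llama/Llama-3.2-1B-Instruct` on Hugging Face) - Lightweight, fast (recommended for laptops)
- `llama3.2` (`meta-llama/Llama-3.2-3B-Instruct` on Hugging Face) - Better quality, slower

**Warning:** The first download and CPU generation may take several minutes, and the process could crash if you don't have enough RAM.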

In [13]:
!{sys.executable} -m pip install llama-stack


In [None]:
from langchain_community.llms import Ollama

# Use Ollama (no Hugging Face authentication needed; requires a running Ollama server with llama3.2 pulled)
print("Connecting to Ollama...")
llm = Ollama(model="llama3.2")

# Test the connection
print("Testing Ollama connection...")
test_response = llm.invoke("Say 'Hello, I'm ready!' in one sentence.")
print(f"Test response: {test_response}")

print("✓ Ollama connected successfully!")

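If the connection test fails, the usual causes are that the Ollama server is not running or that the model has not been pulled. The standard-library sketch below asks the local server which models it has. It uses Ollama's REST endpoint `GET /api/tags` on its default address `http://localhost:11434`, which you should adjust if you run Ollama elsewhere. The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Return the names of models pulled into the local Ollama server (GET /api/tags).

    Raises OSError (e.g. urllib.error.URLError) if the server is not reachable.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]


def has_model(model_names: list[str], wanted: str) -> bool:
    """True if `wanted` is available; an untagged name like 'llama3.2' means ':latest'."""
    if ":" not in wanted:
        wanted += ":latest"
    return wanted in model_names


def check_ollama(model: str = "llama3.2") -> str:
    """Return a short status message instead of raising."""
    try:
        names = list_local_models()
    except OSError as exc:
        return f"Ollama not reachable at {OLLAMA_URL}: {exc}"
    if not has_model(names, model):
        return f"Model missing; run: ollama pull {model}"
    return f"{model} is ready"
```

Running `print(check_ollama())` before the connection test points to whichever of the two problems applies.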

## Step 5: Define Summarization Functions

In [None]:
def llama_summarize(reviews_text, question):
    """Use Llama (via Ollama) to summarize reviews and answer questions"""
    
    # Simpler prompt for Ollama
    prompt = f"""You are analyzing hotel reviews. Provide concise bullet-point summaries.

Based on these hotel reviews:
{reviews_text}

Question: {question}

Provide key insights in bullet points only."""
    
    # Generate response using Ollama
    response = llm.invoke(prompt)
    
    return response
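`llama_summarize` receives the reviews as one string joined with `---`. The hypothetical `build_review_prompt` below is one possible variant. It numbers each retrieved review and asks the model to stay within them and cite review numbers, which makes it easier to check which reviews a bullet came from. Test whether a 1B or 3B model follows the citation instruction reliably rather than assuming it does.

```python
def build_review_prompt(reviews: list[str], question: str) -> str:
    """Build a prompt that numbers each review so the answer can cite them, e.g. [2]."""
    numbered = "\n".join(f"[{i}] {review.strip()}" for i, review in enumerate(reviews, start=1))
    return (
        "You are analyzing hotel reviews. Provide concise bullet-point summaries.\n\n"
        f"Based on these hotel reviews:\n{numbered}\n\n"
        f"Question: {question}\n\n"
        "Use only the reviews above and cite review numbers like [2] after each point. "
        "Provide key insights in bullet points only."
    )


prompt = build_review_prompt(["staff friendly helpful ", "room small but clean"], "How is the staff?")
print(prompt)
```

With the notebook's objects, the call would be `llm.invoke(build_review_prompt(top_reviews, query))`, passing the list from `semantic_search` instead of the joined string.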

In [None]:
def rag_query(query, top_k=10, max_chars=1500):
    """
    RAG pipeline:
    1. Retrieve top-k relevant reviews
    2. Truncate to fit context window
    3. Generate summary with Llama
    """
    # Step 1: Retrieve relevant reviews
    print(f"🔍 Searching for: '{query}'...")
    top_reviews = semantic_search(query, k=top_k)
    
    # Step 2: Combine and truncate reviews
    combined_reviews = "\n---\n".join(top_reviews)
    if len(combined_reviews) > max_chars:
        combined_reviews = combined_reviews[:max_chars] + "..."
    
    # Step 3: Generate summary
    print("🤖 Generating summary...")
    summary = llama_summarize(combined_reviews, query)
    
    return summary
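`rag_query` truncates the joined text at exactly `max_chars`, which usually cuts the last review mid-sentence. The sketch below is an alternative that keeps only whole reviews within the same budget. It adds reviews in ranking order and stops at the first one that no longer fits. The name `pack_reviews` is hypothetical.

```python
def pack_reviews(reviews: list[str], max_chars: int = 1500, sep: str = "\n---\n") -> str:
    """Join whole reviews with `sep`, in rank order, without exceeding max_chars."""
    packed: list[str] = []
    used = 0
    for review in reviews:
        extra = len(review) + (len(sep) if packed else 0)  # separator only between reviews
        if used + extra > max_chars:
            break  # stop at the first review that does not fit, keeping rank order
        packed.append(review)
        used += extra
    if not packed and reviews:
        # Even the top review is too long: fall back to a hard cut, as rag_query does.
        return reviews[0][:max_chars] + "..."
    return sep.join(packed)


top = ["staff friendly helpful", "room small clean", "great location near station"]
print(pack_reviews(top, max_chars=45))
```

In `rag_query`, Step 2 would then become `combined_reviews = pack_reviews(top_reviews, max_chars)`. Stopping at the first oversized review, rather than skipping it, keeps the most relevant chunks together at the cost of sometimes leaving budget unused.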

## Step 6: Test with Example Queries

In [None]:
# Query 1: Positive reviews
query = "What do most positive reviews have in common?"
print(f"Query: {query}\n")
result = rag_query(query, top_k=10)
print(f"\nAnswer:\n{result}")

In [None]:
# Query 2: Negative reviews
query = "What do most negative reviews complain about?"
print(f"Query: {query}\n")
result = rag_query(query, top_k=10)
print(f"\nAnswer:\n{result}")

In [None]:
# Query 3: Hotel staff
query = "How is the hotel staff described?"
print(f"Query: {query}\n")
result = rag_query(query, top_k=10)
print(f"\nAnswer:\n{result}")

In [None]:
# Query 4: Cleanliness
query = "What about cleanliness and room conditions?"
print(f"Query: {query}\n")
result = rag_query(query, top_k=10)
print(f"\nAnswer:\n{result}")
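Queries like "What do most positive reviews have in common?" are matched by meaning against the question text, so the retrieved chunks are not guaranteed to come from positive reviews. If the CSV also has a star-rating column, one option is to carry each review's rating onto its chunks and filter the search results. The TripAdvisor dataset commonly used here has a `Rating` column, but check `df.columns`. The sketch assumes a list `chunk_ratings` aligned with the FAISS ids. In the notebook, you could build it by passing `metadatas=[{"rating": r} for r in df["Rating"]]` to `splitter.create_documents(...)` and then reading `doc.metadata["rating"]` from each chunk. `filter_by_rating` is a hypothetical helper.

```python
from typing import Optional, Sequence


def filter_by_rating(ids: Sequence[int], chunk_ratings: Sequence[int],
                     min_rating: Optional[int] = None, max_rating: Optional[int] = None,
                     k: int = 10) -> list[int]:
    """Keep up to k retrieved ids whose rating lies in [min_rating, max_rating], in rank order."""
    kept = []
    for idx in ids:
        idx = int(idx)
        if idx < 0:  # FAISS padding
            continue
        rating = chunk_ratings[idx]
        if min_rating is not None and rating < min_rating:
            continue
        if max_rating is not None and rating > max_rating:
            continue
        kept.append(idx)
        if len(kept) == k:
            break
    return kept


# Toy example: ids in rank order from a search, and the rating of chunks 0..3.
candidate_ids = [3, 1, 0, 2, -1]
ratings = [5, 1, 4, 2]
print(filter_by_rating(candidate_ids, ratings, min_rating=4))  # → [0, 2]
```

Because filtering discards candidates, over-fetch first. For example, call `index.search(query_emb, 100)` and then keep the best 10 ids with `min_rating=4` for positive reviews or `max_rating=2` for negative ones.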

## Step 7: Ask Your Own Questions

In [None]:
# Ask your own question
my_question = "Is the hotel good for families with kids?"

result = rag_query(my_question, top_k=10)
print(f"\nAnswer:\n{result}")

---

## Notes and Tips

### Model Options:
- **Llama-3.2-1B-Instruct** (Ollama: `llama3.2:1b`): Lightweight (~1GB RAM), fast inference, good for laptops
- **Llama-3.2-3B-Instruct** (Ollama: `llama3.2`, the default tag): Better quality (~3GB RAM), moderate speed

### Authentication:
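Only needed if you load the models from Hugging Face (for example with `transformers`); the Ollama setup does not use it. To access the gated Llama repos, you need to: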
1. Accept the license at https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct
2. Login with: `huggingface-cli login` (or set `HF_TOKEN` environment variable)

### Performance Tips:
- **Reduce `top_k`** if generation is slow (try 5-7 instead of 10)
- **Use `max_chars`** to limit context size (default: 1500)
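- **Adjust `max_new_tokens`** (transformers) or `num_predict` (Ollama) for shorter/longer responses
- **Use GPU** if available: with transformers, load the model with `device_map="auto"` (and `torch_dtype=torch.float16` to halve memory versus float32); Ollama uses a supported GPU automatically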
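`max_new_tokens` and `repetition_penalty` are Hugging Face `transformers` generation parameters. The Ollama setup used in Step 4 has its own equivalents, `num_predict` and `repeat_penalty`, which are passed as model options. The sketch below shows them in a direct call to Ollama's REST endpoint `POST /api/generate` using only the standard library. The values are illustrative, not tuned, and the helper names are hypothetical. LangChain's `Ollama` class accepts fields with the same names, for example `Ollama(model="llama3.2", num_predict=256)`, but check its documentation for your version.

```python
import json
import urllib.request


def build_generate_payload(model: str, prompt: str, num_predict: int = 256,
                           repeat_penalty: float = 1.2, temperature: float = 0.2) -> dict:
    """Request body for Ollama's /api/generate; values here are illustrative defaults."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON object instead of streamed chunks
        "options": {
            "num_predict": num_predict,        # max tokens to generate (cf. max_new_tokens)
            "repeat_penalty": repeat_penalty,  # cf. transformers' repetition_penalty
            "temperature": temperature,
        },
    }


def ollama_generate(payload: dict, base_url: str = "http://localhost:11434",
                    timeout: float = 300.0) -> str:
    """POST the payload to a running Ollama server and return the generated text."""
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

With a server running, `ollama_generate(build_generate_payload("llama3.2", prompt, num_predict=150))` returns a shorter answer than the default settings would.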

### Troubleshooting:
- **Kernel crashes**: Usually means your system ran out of RAM. Try the 1B model instead of 3B.
- **Slow generation**: Reduce `top_k` to 5-7 or lower `max_chars` so the prompt is shorter.
- **Repetitive output**: Increase `repetition_penalty` (transformers) or `repeat_penalty` (Ollama) to 1.2-1.3.
- **Authentication errors**: Make sure you've accepted the license and logged in.

### Using Ollama (Easier!)
The Step 4 cell uses Ollama, which avoids the complexity of HuggingFace authentication. To set it up:

```python
# 1. Install Ollama: https://ollama.ai
# 2. Run in terminal: ollama pull llama3.2
# 3. Use in code:

from langchain_community.llms import Ollama
llm = Ollama(model="llama3.2")
response = llm.invoke("Summarize these reviews: ...")
```