# Summary

This Jupyter Notebook is designed to analyze and recommend restaurants based on taste profiles extracted from Yelp reviews. The workflow involves several key steps:

1. **Data Preparation**:
    - Import necessary libraries and modules.
    - Define the `TasteProfile` class to encapsulate various taste, texture, dietary, and ambiance dimensions.
    - Download necessary NLTK resources for text processing.

2. **Taste Profile Analysis**:
    - Define methods to update taste profiles based on review text.
    - Extract mentions of different taste, texture, dietary, and ambiance aspects from reviews.
    - Analyze sentiment of review text to adjust scores accordingly.

3. **Review Processing**:
    - Process a list of sample reviews to update the taste profile.
    - View the current taste profile and recommend top restaurants based on cuisine parameters.
    - Define hyper-specific cuisine profiles for various cuisines like Italian, Mexican, Japanese, etc.

4. **Yelp Data Analysis**:
    - Define the `YelpAnalyzer` class to handle Yelp data analysis.
    - Implement methods to load, filter, clean, and analyze Yelp data.
    - Extract and analyze taste profiles, perform clustering, and validate results.

5. **Recommendations and Similarity**:
    - Recommend top 10 restaurants based on cuisine profiles.
    - Find the 10 most similar dishes to the initialized taste profile.

The notebook leverages NLP techniques, sentiment analysis, and clustering algorithms to provide insights and recommendations based on user reviews.

In [6]:
from dataclasses import dataclass, field
from typing import Dict, List
from transformers import pipeline
from torch import cuda
import re
import nltk

nltk.download('punkt')  # Ensure the punkt tokenizer is downloaded

@dataclass
class TasteProfile:
    # Core taste dimensions - all start at neutral 0.5
    sweet: float = 0.5      
    salty: float = 0.5      
    spicy: float = 0.5      
    savory: float = 0.5     
    bitter: float = 0.5     
    sour: float = 0.5 

    # Texture dimensions - all start at neutral 0.5
    crunchiness: float = 0.5
    smoothness: float = 0.5
    chewiness: float = 0.5
    creaminess: float = 0.5
    firmness: float = 0.5
    juiciness: float = 0.5
    softness: float = 0.5    
    
    # Dietary and health dimensions
    gluten_free: float = 0.0
    dairy_free: float = 0.0
    vegan: float = 0.0
    vegetarian: float = 0.0
    nut_free: float = 0.0
    shellfish_free: float = 0.0
    price_sensitivity: float = 0.0
    health_consciousness: float = 0.0
    spice_tolerance: float = 0.0
    sustainability_consciousness: float = 0.0
    low_carb: float = 0.0
    low_fat: float = 0.0
    low_sugar: float = 0.0
    organic: float = 0.0
    halal: float = 0.0
    kosher: float = 0.0
    
    # Ambiance and presentation - start at neutral 0.5
    lighting_quality: float = 0.5
    noise_level: float = 0.5
    seating_comfort: float = 0.5
    plating_aesthetics: float = 0.5
    portion_size: float = 0.5
    service_speed: float = 0.5
    cleanliness: float = 0.5
    temperature: float = 0.5
    accessibility: float = 0.5
    friendly_staff: float = 0.5
    family_friendly: float = 0.5
    romantic_ambiance: float = 0.5
    cuisine_profiles: Dict[str, float] = field(default_factory=dict)  # Add this line, float] = field(default_factory=dict)  # Add this line


    review_count: int = 0  # Counter for the number of reviews processed

    def update_scores(self, review_text: str):
        """Update taste profile scores based on review text analysis"""
        print(f"Updating scores based on review: {review_text}")
        
        # Split review into sentences
        sentences = nltk.sent_tokenize(review_text)
        
        # Process each sentence individually
        for sentence in sentences:
            sentiment_scores = self._analyze_sentiment(sentence)
            print(f"Sentiment scores for sentence: {sentiment_scores}")
            
            # Extract mentions and intensities for different aspects
            taste_mentions = self._extract_taste_mentions(sentence)
            texture_mentions = self._extract_texture_mentions(sentence)
            dietary_mentions = self._extract_dietary_mentions(sentence)
            ambiance_mentions = self._extract_ambiance_mentions(sentence)
            
            # Update all dimensions using the review count for weighting
            self._update_scores(taste_mentions, 'taste', sentiment_scores)
            self._update_scores(texture_mentions, 'texture', sentiment_scores)
            self._update_scores(dietary_mentions, 'dietary', sentiment_scores)
            self._update_scores(ambiance_mentions, 'ambiance', sentiment_scores)

        # Increment the review count
        self.review_count += 1

    def _update_scores(self, mentions: List[tuple], category: str, sentiment_scores: dict):
        """Update scores based on mentions for a specific category"""
        for aspect, match in mentions:
            intensity = self._calculate_intensity(match, match.string, sentiment_scores)
            current = getattr(self, aspect)
            # Calculate the weight based on the number of reviews
            weight = 1 / (self.review_count + 1)  # New review affects the score less as count increases
            new_value = (weight * intensity) + ((1 - weight) * current)
            # Adjust for negative sentiment if applicable
            if sentiment_scores['negative'] > 0:
                new_value = max(0.0, new_value - sentiment_scores['negative'])
            setattr(self, aspect, min(max(new_value, 0.0), 1.0))
            print(f"Updated {aspect}: {current} -> {new_value}")

    def _analyze_sentiment(self, text: str) -> dict:
        """Analyze sentiment in review text"""
        print(f"Analyzing sentiment for text: {text}")

        # Initialize sentiment analysis pipeline
        if TasteProfile._sentiment_pipeline is None:
            TasteProfile._sentiment_pipeline = pipeline(
                "sentiment-analysis",
                model="cardiffnlp/twitter-roberta-base-sentiment-latest",
                device=0 if cuda.is_available() else "cpu"
            )

        # Get sentiment predictions
        predictions = TasteProfile._sentiment_pipeline(text)

        # Extract relevant scores and labels
        sentiment_scores = {
            "positive": 0.0,
            "negative": 0.0,
            "neutral": 0.0
        }

        # Analyze text for negative indicators
        negative_indicators = ["too much", "too little", "not very", "lacking"]
        has_negative = any(indicator in text.lower() for indicator in negative_indicators)

        for prediction in predictions:
            if has_negative and prediction['label'] == 'positive':
                # Flip positive to negative if negative indicators present
                sentiment_scores['negative'] = prediction['score']
            else:
                sentiment_scores[prediction['label']] = prediction['score']

        return sentiment_scores

    def _extract_mentions(self, text: str) -> List[tuple]:
        """Extract mentions of tastes, textures, ambiance, and health aspects with their intensities"""
        taste_mentions = self._extract_taste_mentions(text)
        texture_mentions = self._extract_texture_mentions(text)
        ambiance_mentions = self._extract_ambiance_mentions(text)
        health_mentions = self._extract_health_mentions(text)

        return taste_mentions + texture_mentions + ambiance_mentions + health_mentions

    def _extract_taste_mentions(self, text: str) -> List[tuple]:
        """Extract taste-related mentions and their intensities"""
        keywords = self._get_keywords()["taste"]
        matches = []
        for aspect, pattern in keywords.items():
            if found := re.finditer(pattern, text, re.I):
                for match in found:
                    matches.append((aspect, match))
        return matches

    def _extract_texture_mentions(self, text: str) -> List[tuple]:
        """Extract texture-related mentions and their intensities"""
        keywords = self._get_keywords()["texture"]
        matches = []
        for aspect, pattern in keywords.items():
            if found := re.finditer(pattern, text, re.I):
                for match in found:
                    matches.append((aspect, match))
        return matches

    def _extract_ambiance_mentions(self, text: str) -> List[tuple]:
        """Extract ambiance-related mentions and their intensities"""
        keywords = self._get_keywords()["ambiance"]
        matches = []
        for aspect, pattern in keywords.items():
            if found := re.finditer(pattern, text, re.I):
                for match in found:
                    matches.append((aspect, match))
        return matches

    def _extract_dietary_mentions(self, text: str) -> List[tuple]:
        """Extract health-related mentions and their intensities"""
        keywords = self._get_keywords()["health"]
        matches = []
        for aspect, pattern in keywords.items():
            if found := re.finditer(pattern, text, re.I):
                for match in found:
                    matches.append((aspect, match))
        return matches

    def _get_keywords(self) -> dict:
        """Return the keywords for different categories"""
        return {
            "taste": {
                "sweet": r"\b(sweet|sugary|honeyed|fruity|cloying|maple|butterscotch)\b",
                "salty": r"\b(salty|briny|savory)\b",
                "spicy": r"\b(spicy|hot|peppery|fiery)\b",
                "savory": r"\b(savory|umami|flavorful|tasty)\b",
                "bitter": r"\b(bitter|acrid|sharp|harsh)\b",
                "sour": r"\b(sour|tart|acidic|vinegary|citrusy)\b",
            },
            "texture": {
                "crunchiness": r"\b(crunchy|crisp|soggy)\b",
                "smoothness": r"\b(smooth|silky|creamy|buttery)\b",
                "chewiness": r"\b(chewy|gummy|tender|elastic)\b",
                "creaminess": r"\b(creamy|rich|thick|watery)\b",
                "firmness": r"\b(firm|solid|sturdy|dense)\b",
                "juiciness": r"\b(juicy|succulent|moist|dry)\b",
                "softness": r"\b(soft|tender|fluffy|light)\b",
            },
            "ambiance": {
                "lighting_quality": r"\blighting\b",
                "noise_level": r"\b(noisy|quiet|loud|silent|hushed|cacophonous|muffled|boisterous|calm)\b",
                "seating_comfort": r"\b(comfortable|uncomfortable|cozy|cramped|spacious|tight|relaxing|uninviting|plush|hard)\b",
                "plating_aesthetics": r"\b(plating|presentation|arrangement|display|garnish|decor)\b",
                "portion_size": r"\b(portion|serving|helping|quantity|size|amount)\b",
                "service_speed": r"\b(speed|slow|quick|prompt|leisurely|fast|delayed|efficient|inefficient)\b",
                "cleanliness": r"\b(clean|dirty|spotless|filthy|neat|messy|pristine|unclean|tidy|disheveled)\b",
                "temperature": r"\b(temperature|hot|cold|warm|chilly|cool|scalding|freezing|tepid)\b",
                "accessibility": r"\b(accessible|inaccessible|convenient|difficult|easy|hard|user-friendly|barrier-free)\b",
                "friendly_staff": r"\b(friendly|unfriendly|welcoming|hostile|polite|rude|approachable|dismissive|attentive|indifferent)\b",
                "family_friendly": r"\b(family|child-friendly|adult-only|kid-friendly|family-oriented|inclusive|welcoming|safe)\b",
                "romantic_ambiance": r"\b(romantic|intimate|cozy|cold|unromantic|passionate|lovely|charming|affectionate|sentimental)\b",
            },
            "health": {
                "gluten_free": r"\b(gluten free|gluten-free|without gluten|no gluten|glutenless)\b",
                "dairy_free": r"\b(dairy free|dairy-free|lactose free|lactose-free|without dairy|no dairy|dairyless)\b",
                "vegan": r"\b(vegan|plant-based|animal-free|no animal products|plant derived)\b",
                "vegetarian": r"\b(vegetarian|meat-free|no meat|meatless|non-meat)\b",
                "nut_free": r"\b(nut free|nut-free|without nuts|no nuts|nutless)\b",
                "shellfish_free": r"\b(shellfish free|shellfish-free|without shellfish|no shellfish|shellfishless)\b",
                "price_sensitivity": r"\b(price|cost-sensitive|budget-conscious|affordable|inexpensive|expensive|pricey)\b",
                "health_consciousness": r"\b(healthy|nutritious|wellness|fit|unhealthy|harmful|health-aware|health-focused)\b",
                "spice_tolerance": r"\b(spicy|mild|hot|bland|spiced|seasoned)\b",
                "sustainability_consciousness": r"\b(sustainable|eco-friendly|unsustainable|harmful to the environment|green|environmentally friendly)\b",
                "low_carb": r"\b(low carb|low-carbohydrate|high carb|carb-heavy|carb-light|reduced carb)\b",
                "low_fat": r"\b(low fat|low-fat|high fat|fatty|reduced fat|light fat)\b",
                "low_sugar": r"\b(low sugar|low-sugar|high sugar|sugary|sugar-free|no sugar)\b",
                "organic": r"\b(organic|non-organic|chemical-free|natural)\b",
                "halal": r"\b(halal|non-halal)\b",
                "kosher": r"\b(kosher|non-kosher|treif|glatt|permitted)\b",
            }
        }

    # Sentiment pipeline instance shared across all instances
    _sentiment_pipeline = None

    def __post_init__(self):
        """Initialize the sentiment pipeline if not already done"""
        if TasteProfile._sentiment_pipeline is None:
            TasteProfile._sentiment_pipeline = pipeline(
                "sentiment-analysis",
                model="cardiffnlp/twitter-roberta-base-sentiment-latest",
                device=0 if cuda.is_available() else "cpu"
            )

    def update_from_review(self, review_text: str, alpha: float = 0.3) -> None:
        """Update profile based on a single review"""
        print(f"Updating from review: {review_text}")
        # Clean text
        review_text = review_text.lower().strip()
        
        # Get sentiment
        sentiment = self._get_sentiment(review_text)
        print(f"Sentiment score: {sentiment}")
        
        # Update dimensions based on keyword matches
        for category, patterns in self._get_keywords().items():
            for aspect, pattern in patterns.items():
                if matches := re.finditer(pattern, review_text, re.I):
                    for match in matches:
                        intensity = self._calculate_intensity(match, review_text, sentiment)
                        current = getattr(self, aspect)
                        # Calculate the weight based on the number of reviews
                        weight = 1 / (self.review_count + 1)  # New review affects the score less as count increases
                        new_value = (weight * intensity) + ((1 - weight) * current)
                        setattr(self, aspect, min(max(new_value, 0.0), 1.0))
                        print(f"Updated {aspect}: {current} -> {new_value}")

    def _get_sentiment(self, text: str) -> float:
        """Get sentiment score from -1 to 1"""
        result = self._sentiment_pipeline(text)[0]
        if result["label"] == "positive":
            return result["score"]
        elif result["label"] == "negative":
            return -result["score"]
        return 0.0

    def _calculate_intensity(self, match: re.Match, text: str, sentiment_scores: dict) -> float:
        """Calculate intensity of a match based on context and sentiment"""
        # Get surrounding context
        start, end = match.span()
        context = text[max(0, start-20):min(len(text), end+20)]
        
        # Base intensity from sentiment
        intensity = abs(sentiment_scores['positive'] - sentiment_scores['negative'])
        
        # Boost for intensifiers
        intensifiers = ["very", "really", "extremely", "super"]
        if any(i in context for i in intensifiers):
            intensity *= 1.5
            
        return min(intensity, 1.0)

    def to_dict(self) -> Dict[str, float]:
        """Convert profile to dictionary"""
        return {
            k: v for k, v in self.__dict__.items() 
            if isinstance(v, float)
        }

# Sample reviews for testing
SAMPLE_REVIEWS = [
    "The food was incredibly sweet and sugary, almost too much so. Very creamy texture.",
    "A perfectly spicy dish with great savory flavors. The meat was tender and juicy.",
    "Excellent vegan options and gluten-free menu. The staff was super friendly.",
    "The lighting was too dim and it was very noisy. The service was slow.",
    "Fresh and crispy vegetables, perfectly seasoned. Clean and comfortable environment.",
    "Not very flavorful and quite bitter. The texture was too chewy.",
    "Amazing dairy-free alternatives. The dessert was smooth and silky.",
    "The portions were huge and the price was reasonable. Very family-friendly place.",
    "Authentic halal options. The meat was tender and well-spiced.",
    "Great kosher menu with organic ingredients. The ambiance was romantic.",
    "The sushi was fresh but a bit too salty. The seating was uncomfortable.",
    "Perfect balance of sour and sweet in their signature cocktails.",
    "The vegetables were crisp and the sauce was rich and creamy.",
    "Very health-conscious menu with low-carb options. Clean environment.",
    "Sustainable practices and eco-friendly packaging. Food was delicious.",
    "The bread was soft and fluffy. Great vegetarian selection.",
    "Excellent spice tolerance options from mild to very hot.",
    "Beautiful plating and presentation. The food was lukewarm though.",
    "Quick service and friendly staff. The restaurant was spotless.",
    "Perfect for date night with intimate lighting and quiet atmosphere."
]


[nltk_data] Downloading package punkt to /Users/msgfrom96/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [5]:
# I now want to review the taste profile for this area

# Function to view the taste profile
def view_taste_profile(taste_profile: TasteProfile) -> None:
    print("Current Taste Profile:")
    for attribute, score in taste_profile.to_dict().items():
        print(f"{attribute.capitalize()}: {score:.2f}")

# Function to recommend top 10 restaurants based on cuisine parameters
def recommend_restaurants(cuisine_profiles: dict, top_n: int = 10) -> list:
    recommendations = []
    for cuisine, profile in cuisine_profiles.items():
        # Assuming a function `get_top_restaurants` exists that fetches restaurants based on cuisine profile
        top_restaurants = get_top_restaurants(cuisine, profile)
        recommendations.extend(top_restaurants)
    
    # Sort and return the top N recommendations
    return sorted(recommendations, key=lambda x: x['rating'], reverse=True)[:top_n]

# Function to find the 10 most similar dishes to the initialized TasteProfile
def find_similar_dishes(taste_profile: TasteProfile, top_n: int = 10) -> list:
    similar_dishes = []
    for cuisine, dishes in HYPER_SPECIFIC_CUISINES.items():
        for dish_name, scores in dishes.items():
            # Calculate similarity based on the distance between taste profile and dish scores
            similarity_score = sum(
                1 - abs(taste_profile.scores[attribute] - score) for attribute, score in zip(taste_profile.scores.keys(), scores)
            )
            similar_dishes.append((dish_name, similarity_score))
    
    # Sort by similarity score and return the top N similar dishes
    return sorted(similar_dishes, key=lambda x: x[1], reverse=True)[:top_n]

# Viewing the taste profile
view_taste_profile(taste_profile)

# Getting recommendations
top_restaurants = recommend_restaurants(taste_profile.cuisine_profiles)
print("Top 10 Recommended Restaurants:")
for restaurant in top_restaurants:
    print(f"{restaurant['name']} - Rating: {restaurant['rating']}")

# Define a more compact way to handle hyper-specific cuisines
HYPER_SPECIFIC_CUISINES = {
    "Italian": {
        "Neapolitan Pizza": [0.3, 0.5, 0.1, 0.9, 0.2, 0.4, 0.6, 0.5, 0.5, 0.4, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 0.4, 0.6, 0.8, 0.5, 0.5, 0.9, 0.5, 0.5, 0.5, 0.8, 0.6, 0.7],
        "Pasta Carbonara": [0.4, 0.6, 0.2, 0.8, 0.1, 0.3, 0.5, 0.6, 0.7, 0.5, 0.6, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.7, 0.5, 0.5, 0.8, 0.5, 0.5, 0.5, 0.7, 0.6, 0.5],
        "Lasagna": [0.5, 0.5, 0.3, 0.9, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 0.4, 0.6, 0.8, 0.5, 0.5, 0.9, 0.5, 0.5, 0.5, 0.8, 0.6, 0.7],
        "Fettuccine Alfredo": [0.6, 0.4, 0.2, 0.8, 0.3, 0.5, 0.5, 0.7, 0.6, 0.4, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.7, 0.5, 0.5, 0.8, 0.5, 0.5, 0.5, 0.7, 0.6, 0.5],
        "Bruschetta": [0.3, 0.5, 0.1, 0.9, 0.2, 0.4, 0.6, 0.5, 0.5, 0.4, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 0.4, 0.6, 0.8, 0.5, 0.5, 0.9, 0.5, 0.5, 0.5, 0.8, 0.6, 0.7]
    },
    "Mexican": {
        "Tacos": [0.2, 0.5, 0.8, 0.7, 0.1, 0.6, 0.6, 0.4, 0.5, 0.5, 0.5, 0.7, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.8, 0.0, 0.8, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5],
        "Mole": [0.6, 0.4, 0.7, 0.8, 0.4, 0.2, 0.3, 0.8, 0.5, 0.6, 0.4, 0.7, 0.6, 0.8, 0.0, 0.0, 0.0, 0.0, 1.0, 0.6, 0.5, 0.7, 0.6, 0.3, 0.4, 0.3, 0.7, 0.6, 0.5, 0.6, 0.5, 0.9, 0.7, 0.4, 0.8, 0.7, 0.6, 0.7],
        "Enchiladas": [0.5, 0.5, 0.6, 0.7, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6, 0.0, 0.6, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5],
        "Chiles Rellenos": [0.4, 0.6, 0.5, 0.8, 0.3, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.5, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5],
        "Tamales": [0.3, 0.5, 0.4, 0.6, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5]
    },
    "Japanese": {
        "Sushi": [0.5, 0.4, 0.2, 0.9, 0.3, 0.5, 0.5, 0.7, 0.6, 0.4, 0.5, 0.5, 0.5, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.2, 0.9, 0.3, 0.7, 0.7, 0.8, 0.8, 0.7, 0.6, 0.5, 0.7, 0.4, 0.9, 0.6, 0.7, 0.9, 0.8, 0.6, 0.7],
        "Ramen": [0.4, 0.5, 0.3, 0.8, 0.2, 0.5, 0.4, 0.6, 0.7, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.8, 0.0, 0.3, 0.5, 0.5, 0.5, 0.3, 0.4, 0.7, 0.4, 0.6, 0.5, 0.6, 0.6, 0.8, 0.8, 0.7, 0.8, 0.9, 0.6, 0.5],
        "Tempura": [0.5, 0.5, 0.4, 0.6, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5],
        "Sashimi": [0.4, 0.5, 0.3, 0.8, 0.2, 0.5, 0.4, 0.6, 0.7, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.8, 0.0, 0.3, 0.5, 0.5, 0.5, 0.3, 0.4, 0.7, 0.4, 0.6, 0.5, 0.6, 0.6, 0.8, 0.8, 0.7, 0.8, 0.9, 0.6, 0.5],
        "Udon": [0.3, 0.5, 0.4, 0.6, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5]
    },
    "Indian": {
        "Curry": [0.4, 0.5, 0.9, 0.7, 0.2, 0.5, 0.5, 0.5, 0.5, 0.6, 0.5, 0.5, 0.5, 0.8, 0.0, 0.0, 0.7, 0.7, 1.0, 0.7, 0.6, 0.9, 0.6, 0.4, 0.5, 0.6, 0.5, 0.8, 0.5, 0.6, 0.5, 0.7, 0.7, 0.5, 0.8, 0.8, 0.6, 0.6],
        "Dosa": [0.3, 0.4, 0.7, 0.6, 0.2, 0.4, 0.7, 0.4, 0.6, 0.3, 0.6, 0.4, 0.5, 0.0, 1.0, 1.0, 1.0, 0.8, 1.0, 0.8, 0.7, 0.7, 0.7, 0.5, 0.6, 0.5, 0.6, 0.8, 0.5, 0.5, 0.6, 0.8, 0.7, 0.6, 0.7, 0.7, 0.7, 0.5],
        "Biryani": [0.5, 0.5, 0.8, 0.7, 0.3, 0.5, 0.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.5, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5],
        "Paneer Tikka": [0.4, 0.5, 0.6, 0.7, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5],
        "Samosa": [0.3, 0.5, 0.7, 0.6, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5]
    },
    "Thai": {
        "Pad Thai": [0.5, 0.5, 0.7, 0.6, 0.2, 0.8, 0.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 0.5, 0.7, 0.5, 0.3, 0.4, 0.7, 0.5, 0.5, 0.5, 0.6, 0.6, 0.7, 0.7, 0.8, 0.7, 0.6, 0.6, 0.5],
        "Green Curry": [0.4, 0.5, 0.8, 0.7, 0.2, 0.4, 0.4, 0.8, 0.5, 0.7, 0.4, 0.6, 0.6, 1.0, 0.0, 0.0, 0.0, 0.8, 0.0, 0.6, 0.7, 0.8, 0.6, 0.5, 0.6, 0.5, 0.7, 0.5, 0.5, 0.6, 0.5, 0.8, 0.7, 0.6, 0.8, 0.8, 0.6, 0.6],
        "Tom Yum Soup": [0.5, 0.5, 0.8, 0.6, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6, 0.0, 0.6, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5],
        "Massaman Curry": [0.4, 0.5, 0.7, 0.6, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5],
        "Spring Rolls": [0.3, 0.5, 0.6, 0.7, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5]
    },
    "French": {
        "Bistro": [0.6, 0.5, 0.2, 0.8, 0.3, 0.4, 0.5, 0.7, 0.5, 0.8, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.7, 0.0, 0.2, 0.5, 0.2, 0.5, 0.3, 0.3, 0.4, 0.6, 0.5, 0.5, 0.8, 0.4, 0.9, 0.7, 0.4, 0.9, 0.8, 0.6, 0.9],
        "Coq au Vin": [0.3, 0.5, 0.2, 0.9, 0.2, 0.3, 0.4, 0.7, 0.6, 0.6, 0.5, 0.7, 0.7, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.3, 0.6, 0.2, 0.5, 0.5, 0.4, 0.5, 0.6, 0.5, 0.5, 0.8, 0.4, 0.9, 0.8, 0.4, 0.8, 0.7, 0.6, 0.8],
        "Ratatouille": [0.5, 0.5, 0.3, 0.8, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5],
        "Bouillabaisse": [0.4, 0.5, 0.4, 0.6, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5],
        "Crêpes": [0.3, 0.5, 0.5, 0.6, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5]
    },
    "Ethiopian": {
        "Injera with Wat": [0.2, 0.4, 0.7, 0.8, 0.3, 0.6, 0.4, 0.6, 0.7, 0.4, 0.5, 0.6, 0.8, 0.0, 0.0, 0.7, 0.7, 0.8, 1.0, 0.8, 0.7, 0.7, 0.7, 0.4, 0.5, 0.5, 0.7, 0.8, 0.5, 0.6, 0.6, 0.7, 0.8, 0.5, 0.7, 0.7, 0.7, 0.6],
        "Tibs": [0.2, 0.5, 0.6, 0.9, 0.2, 0.4, 0.5, 0.5, 0.7, 0.3, 0.7, 0.6, 0.5, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.7, 0.6, 0.6, 0.6, 0.6, 0.5, 0.6, 0.7, 0.8, 0.5, 0.6, 0.5, 0.8, 0.7, 0.5, 0.7, 0.8, 0.6, 0.6],
        "Doro Wat": [0.3, 0.5, 0.6, 0.8, 0.2, 0.4, 0.5, 0.5, 0.7, 0.3, 0.7, 0.6, 0.5, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.7, 0.6, 0.6, 0.6, 0.6, 0.5, 0.6, 0.7, 0.8, 0.5, 0.6, 0.5, 0.8, 0.7, 0.5, 0.7, 0.8, 0.6, 0.6],
        "Kitfo": [0.4, 0.5, 0.7, 0.6, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5],
        "Shiro": [0.3, 0.5, 0.6, 0.7, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5]
    },
    "Peruvian": {
        "Ceviche": [0.2, 0.4, 0.6, 0.7, 0.1, 0.9, 0.5, 0.6, 0.6, 0.3, 0.6, 0.8, 0.5, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.6, 0.8, 0.6, 0.8, 0.8, 0.8, 0.8, 0.8, 0.5, 0.5, 0.7, 0.5, 0.9, 0.6, 0.7, 0.8, 0.7, 0.6, 0.7],
        "Lomo Saltado": [0.3, 0.6, 0.5, 0.8, 0.2, 0.4, 0.6, 0.5, 0.7, 0.4, 0.7, 0.7, 0.5, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.6, 0.6, 0.5, 0.6, 0.4, 0.5, 0.6, 0.6, 0.5, 0.5, 0.6, 0.6, 0.8, 0.8, 0.7, 0.7, 0.8, 0.6, 0.6],
        "Aji de Gallina": [0.4, 0.5, 0.6, 0.7, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5],
        "Pollo a la Brasa": [0.5, 0.5, 0.7, 0.6, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5],
        "Tacu Tacu": [0.3, 0.5, 0.6, 0.7, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5]
    },
    "Korean": {
        "Bibimbap": [0.3, 0.5, 0.7, 0.8, 0.3, 0.5, 0.7, 0.5, 0.6, 0.4, 0.6, 0.6, 0.5, 0.0, 0.0, 0.0, 0.7, 0.8, 1.0, 0.7, 0.8, 0.7, 0.7, 0.5, 0.6, 0.6, 0.7, 0.5, 0.5, 0.6, 0.6, 0.9, 0.8, 0.7, 0.8, 0.7, 0.7, 0.5],
        "Korean BBQ": [0.3, 0.6, 0.6, 0.9, 0.2, 0.4, 0.6, 0.4, 0.8, 0.3, 0.8, 0.7, 0.4, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.5, 0.5, 0.6, 0.5, 0.6, 0.4, 0.5, 0.5, 0.5, 0.5, 0.7, 0.7, 0.8, 0.7, 0.6, 0.8, 0.7, 0.6, 0.7],
        "Kimchi": [0.4, 0.5, 0.6, 0.7, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5],
        "Japchae": [0.5, 0.5, 0.7, 0.6, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5],
        "Sundubu Jjigae": [0.3, 0.5, 0.6, 0.7, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0, 0.0, 0.0, 0.6, 0.5, 0.5, 0.5, 0.7, 0.8, 0.5, 0.5, 0.8, 0.7, 0.5, 0.5, 0.5]
    },
    # Additional cuisines can be added here following the same structure
}

# Function to update cuisine profiles based on taste profiles
def update_cuisine_profiles(taste_profile: TasteProfile) -> None:
    for cuisine, dishes in HYPER_SPECIFIC_CUISINES.items():
        for dish_name, scores in dishes.items():
            taste_profile.cuisine_profiles[cuisine] = {
                "name": dish_name,
                "scores": { 
                    "sweet": scores[0],
                    "salty": scores[1],
                    "spicy": scores[2],
                    "savory": scores[3],
                    "bitter": scores[4],
                    "sour": scores[5],
                    "crunchiness": scores[6],
                    "smoothness": scores[7],
                    "chewiness": scores[8],
                    "creaminess": scores[9],
                    "firmness": scores[10],
                    "juiciness": scores[11],
                    "softness": scores[12],
                    "gluten_free": scores[13],
                    "dairy_free": scores[14],
                    "vegan": scores[15],
                    "vegetarian": scores[16],
                    "nut_free": scores[17],
                    "shellfish_free": scores[18],
                    "price_sensitivity": scores[19],
                    "health_consciousness": scores[20],
                    "spice_tolerance": scores[21],
                    "sustainability_consciousness": scores[22],
                    "low_carb": scores[23],
                    "low_fat": scores[24],
                    "low_sugar": scores[25],
                    "organic": scores[26],
                    "halal": scores[27],
                    "kosher": scores[28],
                    "lighting_quality": scores[29],
                    "noise_level": scores[30],
                    "seating_comfort": scores[31],
                    "plating_aesthetics": scores[32],
                    "portion_size": scores[33],
                    "service_speed": scores[34],
                    "cleanliness": scores[35],
                    "temperature": scores[36],
                    "accessibility": scores[37],
                    "friendly_staff": scores[38],
                    "family_friendly": scores[39],
                    "romantic_ambiance": scores[40],
                }
            }

# Update the taste profile with hyper-specific cuisines
update_cuisine_profiles(taste_profile)

# Finding similar dishes
similar_dishes = find_similar_dishes(taste_profile)
print("Top 10 Similar Dishes:")
for dish, score in similar_dishes:
    print(f"{dish} - Similarity Score: {score:.2f}")

Current Taste Profile:
Sweet: 0.08
Salty: 0.67
Spicy: 0.75
Savory: 0.00
Bitter: 0.00
Sour: 0.54
Crunchiness: 0.53
Smoothness: 0.90
Chewiness: 0.10
Creaminess: 0.91
Firmness: 0.50
Juiciness: 0.66
Softness: 0.71
Gluten_free: 0.33
Dairy_free: 0.14
Vegan: 0.33
Vegetarian: 0.06
Nut_free: 0.00
Shellfish_free: 0.00
Price_sensitivity: 0.11
Health_consciousness: 0.00
Spice_tolerance: 0.66
Sustainability_consciousness: 0.09
Low_carb: 0.00
Low_fat: 0.00
Low_sugar: 0.00
Organic: 0.10
Halal: 0.00
Kosher: 0.10
Lighting_quality: 0.05
Noise_level: 0.05
Seating_comfort: 0.00
Plating_aesthetics: 0.55
Portion_size: 0.50
Service_speed: 0.05
Cleanliness: 0.00
Temperature: 0.53
Accessibility: 0.50
Friendly_staff: 0.71
Family_friendly: 0.56
Romantic_ambiance: 0.57
Top 10 Recommended Restaurants:


IndexError: list index out of range

In [None]:
class YelpAnalyzer:
    def __init__(self, file_path: str):
        self.file_path = file_path
        self.data = None 
        self.clusters = None
        self.cuisine_profiles = {}  # Maps cuisine types to typical taste profiles
        self.neighborhood_profiles = {}  # Maps neighborhoods to aggregate taste profiles
        
    def run_analysis(self, metro_area: str):
        """Main function to run the full analysis pipeline
        
        TODO:
        1. Implement proper error handling and logging
        2. Add progress tracking
        3. Consider parallel processing for large datasets
        4. Add caching of intermediate results
        """
        # Load and preprocess data
        self.data = load_yelp_data(self.file_path)
        self.data = filter_metropolitan_area(self.data, metro_area)
        self.data = clean_data(self.data)
        
        # Run exploratory analysis
        exploratory_data_analysis(self.data)
        
        # Extract and analyze taste profiles
        taste_profiles = extract_taste_profiles(self.data)
        self.cuisine_profiles = analyze_cuisine_profiles(taste_profiles, self.data)
        
        # Perform clustering and analysis
        self.clusters = cluster_restaurants(self.data, taste_profiles)
        self.neighborhood_profiles = analyze_neighborhood_profiles(self.clusters)
        
        # Validate results
        known_boundaries = self._load_neighborhood_boundaries()
        validation_results = validate_clusters(self.clusters, known_boundaries)
        
        return {
            'clusters': self.clusters,
            'cuisine_profiles': self.cuisine_profiles,
            'neighborhood_profiles': self.neighborhood_profiles,
            'validation': validation_results
        }
        
    def recommend_cuisines(self, location: tuple):
        """Recommend cuisine types for a given location
        
        TODO:
        1. Implement location-based profile matching
        2. Add confidence scores to recommendations
        3. Consider market saturation in recommendations
        4. Add business viability metrics
        """
        pass
        
    def _load_neighborhood_boundaries(self):
        """Load geographic boundary data for neighborhoods
        
        TODO:
        1. Implement GeoJSON/shapefile parsing
        2. Add caching of boundary data
        3. Consider multiple data sources
        """
        pass

def load_yelp_data(file_path: str):
    """Load and validate Yelp dataset
    
    TODO:
    1. Implement efficient data loading for large files
    2. Add data validation checks
    3. Handle multiple file formats
    4. Add data sampling options
    """
    pass

def filter_metropolitan_area(data, area: str):
    """Filter dataset for specific metro area
    
    TODO:
    1. Implement geographic boundary checking
    2. Add support for multiple areas
    3. Consider demographic data integration
    """
    pass

def clean_data(data):
    """Clean and preprocess dataset
    
    TODO:
    1. Implement text cleaning
    2. Handle missing values
    3. Remove duplicates
    4. Normalize formats
    """
    pass

def exploratory_data_analysis(data):
    """Perform EDA on dataset
    
    TODO:
    1. Generate basic statistics
    2. Create visualizations
    3. Identify outliers
    4. Analyze data distributions
    """
    pass

def extract_taste_profiles(data):
    """Extract taste profiles from reviews
    
    TODO:
    1. Implement NLP pipeline
    2. Add sentiment analysis
    3. Consider review weights
    4. Handle multilingual content
    """
    profiles = []
    for review in data['reviews']:
        profile = TasteProfile()
        profile.update_scores(review)
        profiles.append(profile)
    return profiles

def analyze_cuisine_profiles(profiles, data):
    """Analyze typical taste profiles for different cuisines
    
    TODO:
    1. Implement cuisine categorization
    2. Calculate aggregate profiles
    3. Identify distinctive features
    4. Consider regional variations
    """
    pass

def cluster_restaurants(data, profiles):
    """Cluster restaurants by taste and location
    
    TODO:
    1. Implement clustering algorithm
    2. Optimize parameters
    3. Handle geographic constraints
    4. Add cluster evaluation metrics
    """
    pass

def analyze_neighborhood_profiles(clusters):
    """Analyze taste profiles by neighborhood
    
    TODO:
    1. Implement profile aggregation
    2. Consider temporal trends
    3. Add demographic analysis
    4. Identify unique characteristics
    """
    pass

def validate_clusters(clusters, known_boundaries):
    """Validate clustering results
    
    TODO:
    1. Implement validation metrics
    2. Compare with external data
    3. Add statistical tests
    4. Generate validation reports
    """
    pass
