## **What is the average cost of dining in Bangalore restaurants? Does the median cost differ significantly from the mean?**

In [3]:
import pandas as pd

df = pd.read_csv("C:\\Projects\\DataAnalysisProjects\\ZomatoRestaurantDataAnalysis\\CleanedData\\cleaned_data.csv")

mean_cost = df["cost"].mean()
median_cost = df["cost"].median()

print(f"Mean Cost: {mean_cost}")
print(f"Median Cost: {median_cost}")

if mean_cost > median_cost:
    print("The average cost is higher than the middle value, meaning there are some expensive restaurants raising the average.")
elif mean_cost < median_cost:
    print("The average cost is lower than the middle value, meaning there are more affordable restaurants pulling the average down.")
else:
    print("The average and middle cost are the same, meaning restaurant prices are evenly distributed.")


Mean Cost: 625.1171910330124
Median Cost: 500.0
The average cost is higher than the middle value, meaning there are some expensive restaurants raising the average.
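
To put a number on how far the mean sits from the median, one option is to report the relative gap and a simple skewness measure. With the values printed above, the gap is (625.12 - 500) / 500, or roughly 25% of the median. The sketch below is illustrative only: `mean_median_gap` and the toy cost list are hypothetical and not part of the original notebook. It uses Pearson's second skewness coefficient, 3 × (mean - median) / standard deviation.

```python
import statistics

def mean_median_gap(values):
    """Return (mean, median, relative gap, Pearson's second skewness coefficient)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    stdev = statistics.stdev(values)        # sample standard deviation
    rel_gap = (mean - median) / median      # gap as a fraction of the median
    skew2 = 3 * (mean - median) / stdev     # > 0 suggests a right-skewed distribution
    return mean, median, rel_gap, skew2

# Hypothetical toy costs with one expensive outlier (not the Zomato data)
toy_costs = [200, 300, 400, 400, 500, 600, 800, 3000]
mean, median, rel_gap, skew2 = mean_median_gap(toy_costs)
print(f"Mean: {mean}, Median: {median}")
print(f"Relative gap: {rel_gap:.1%}, Skewness (Pearson 2): {skew2:.2f}")
```

Assuming the `cost` column has no missing values, `mean_median_gap(df["cost"].tolist())` would apply the same calculation to the real data.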


## **Which restaurant cost is the most common among customers?**

In [13]:
import statistics

most_common_cost = statistics.mode(df['cost'])
common_restaurants = df[df['cost'] == most_common_cost]['restaurants'].unique()[:10]  # Limit to 10 restaurants

print(f"The most common restaurant cost among customers is {most_common_cost}.")
print("Up to 10 restaurants with this cost:")
for restaurant in common_restaurants:
    print(restaurant)

The most common restaurant cost among customers is 400.
Up to 10 restaurants with this cost:
360 Atoms Restaurant And Cafe
The Vintage Cafe
Fast And Fresh
Hotboxit
Corner House Ice Cream
XO Belgian Waffle
Kabab Magic
Frozen Bottle
Polar Bear
Floured-Baked With Love
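
`statistics.mode` returns a single value; if several costs tie for most frequent, it returns the first one encountered (Python 3.8 and later). A minimal sketch of how one might check for ties and see what share of restaurants the mode covers is shown below. The toy list is hypothetical, and this is not part of the original analysis.

```python
import statistics
from collections import Counter

# Hypothetical toy costs in which 400 and 500 tie for most frequent
toy_costs = [300, 400, 400, 500, 500, 800]

modes = statistics.multimode(toy_costs)   # every value tied for the highest count
counts = Counter(toy_costs)
share = counts[modes[0]] / len(toy_costs) # fraction of rows at the modal cost

print(f"Modes: {modes}")
print(f"Share of rows at each mode: {share:.1%}")
```

On the real data, `statistics.multimode(df['cost'])` would reveal whether 400 is the only mode.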


## **What is the cheapest and most expensive restaurant cost in the dataset?**

In [15]:
min_cost = df['cost'].min()
max_cost = df['cost'].max()

cheapest_restaurants = df[df['cost'] == min_cost]['restaurants'].unique()[:10]  # Limiting to 10 names
expensive_restaurants = df[df['cost'] == max_cost]['restaurants'].unique()[:10]  # Limiting to 10 names

print(f"The cheapest restaurant cost is {min_cost}.")
print("Restaurants with the cheapest cost:")
for restaurant in cheapest_restaurants:
    print(restaurant)

print(f"\nThe most expensive restaurant cost is {max_cost}.")
print("Restaurants with the most expensive cost:")
for restaurant in expensive_restaurants:
    print(restaurant)


The cheapest restaurant cost is 40.
Restaurants with the cheapest cost:
Srinidhi Sagar Food Line
Srinidhi Sagar
Srinidhi Sagar Deluxe

The most expensive restaurant cost is 6000.
Restaurants with the most expensive cost:
Le Cirque Signature - The Leela Palace


## **1. Does a higher cost of dining lead to higher restaurant ratings?**

In [11]:
import pandas as pd
from scipy.stats import ttest_ind

df=pd.read_csv("C:\\Projects\\DataAnalysisProjects\\ZomatoRestaurantDataAnalysis\\CleanedData\\cleaned_data.csv")
median_cost=df['cost'].median()
high_cost=df[df['cost']>median_cost]['rating']
low_cost=df[df['cost']<=median_cost]['rating']

t_stat,p_value=ttest_ind(high_cost,low_cost,equal_var=False)

print(f"T-Statistic: {t_stat:.4f}")
print(f"P-Value: {p_value:.4f}")

if p_value<0.05:
    print("There is a significant relationship between dining cost and restaurant rating.")
else:
    print("There is no significant relationship between dining cost and restaurant rating.")


T-Statistic: 47.6845
P-Value: 0.0000
There is a significant relationship between dining cost and restaurant rating.


# Data Analysis Report: Relationship Between Dining Cost and Restaurant Ratings

### Data Source:
The data used in this analysis comes from a cleaned Zomato restaurant dataset. Key columns:
- **Cost**: The cost of dining at the restaurant.
- **Rating**: The rating given to the restaurant.

### Methodology:
1. **Data Preprocessing**:  
   - The dataset was cleaned to remove any missing or invalid values.
   - We split restaurants into two groups based on the median cost:
     - **High-Cost Restaurants**: Restaurants with a cost higher than the median.
     - **Low-Cost Restaurants**: Restaurants with a cost less than or equal to the median.

2. **Statistical Test**:  
   A **two-sample t-test** was performed to check if there's a significant difference between the ratings of high-cost and low-cost restaurants. We used `equal_var=False` to account for unequal variances between the groups.
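
For reference, `ttest_ind` with `equal_var=False` performs Welch's t-test. The notation here is introduced only for illustration: $\bar{x}$ is a group's mean rating, $s^2$ its sample variance, and $n$ its size, with subscripts $H$ and $L$ for the high-cost and low-cost groups.

$$
t = \frac{\bar{x}_H - \bar{x}_L}{\sqrt{\dfrac{s_H^2}{n_H} + \dfrac{s_L^2}{n_L}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_H^2}{n_H} + \dfrac{s_L^2}{n_L}\right)^2}{\dfrac{\left(s_H^2/n_H\right)^2}{n_H - 1} + \dfrac{\left(s_L^2/n_L\right)^2}{n_L - 1}}
$$

Here $\nu$ is the Welch-Satterthwaite approximation to the degrees of freedom. Because the denominator shrinks as the groups grow, a large dataset can produce a very large $t$ even when the difference in mean ratings is small.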

### Results:
- **T-Statistic**: 47.6845
- **P-Value**: 0.0000 (below 0.0001; shown rounded to four decimal places)

Since the p-value is much smaller than 0.05, we reject the null hypothesis, which means there is a significant difference in ratings between high-cost and low-cost restaurants.
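
A very small p-value says the difference is unlikely to be zero, but not how large it is. One common way to express the size is Cohen's d, the difference in means divided by the pooled standard deviation. The sketch below is a hedged illustration: `cohens_d` and the toy ratings are hypothetical and were not part of the original analysis.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical toy ratings, not taken from the Zomato data
toy_high = [4.1, 4.3, 3.9, 4.5]
toy_low = [3.6, 3.8, 3.5, 3.9]
d = cohens_d(toy_high, toy_low)
print(f"Cohen's d: {d:.2f}")
```

Commonly cited rough benchmarks are 0.2 (small), 0.5 (medium), and 0.8 (large). On the real data, `cohens_d(high_cost.tolist(), low_cost.tolist())` would use the two groups defined in the cell above.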

### Conclusion:
There **is a significant relationship** between dining cost and restaurant rating. High-cost restaurants tend to have higher ratings compared to low-cost restaurants.

### Implications:
- **For restaurant owners**: Higher-cost restaurants tend to be rated higher, but this test shows an association, not that raising prices will raise ratings; other factors (like service and food quality) matter too.
- **For consumers**: Higher-cost restaurants may have higher ratings, but it's important to consider other factors when choosing a restaurant.
- **For marketers**: Emphasize the value of dining experiences in higher-cost restaurants to attract customers.

## **2. Which cuisines receive the highest and lowest ratings?**

In [26]:
from scipy.stats import f_oneway

grouped_cuisines=[df[df['cuisines']==cuisine]['rating'] for cuisine in df['cuisines'].unique()]

f_stat,p_value=f_oneway(*grouped_cuisines)

cuisine_ratings=df.groupby('cuisines')['rating'].mean()
highest_rated_cuisine=cuisine_ratings.idxmax()
lowest_rated_cuisine=cuisine_ratings.idxmin()

print(f"F-statistic: {f_stat:.4f}")
print(f"P-Value: {p_value:.4f}")

if p_value < 0.05:
    print("There is a significant difference in ratings between different cuisines.")
else:
    print("There is no significant difference in ratings between different cuisines.")

print(f"Highest Rated Cuisine: {highest_rated_cuisine} ({cuisine_ratings.max():.2f})")
print(f"Lowest Rated Cuisine: {lowest_rated_cuisine} ({cuisine_ratings.min():.2f})")


F-statistic: 5.7339
P-Value: 0.0000
There is a significant difference in ratings between different cuisines.
Highest Rated Cuisine: Asian, Chinese, Thai, Momos (4.90)
Lowest Rated Cuisine: American, Cafe, Chinese, Italian, Desserts (0.00)


# Data Analysis Report: Which Cuisines Receive the Highest and Lowest Ratings


### 1. Methodology
- An **ANOVA test** was performed to check if cuisine types significantly impact restaurant ratings.
- The **mean rating** for each cuisine was calculated.
- The cuisines with the **highest** and **lowest** ratings were identified.
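
A plain mean per cuisine can be dominated by groups that contain only one or two restaurants, or by placeholder ratings. The sketch below shows one possible way to make the ranking step more robust. The function `rank_cuisines`, its `min_count` and `unrated_value` parameters, and the toy rows are all hypothetical; in particular, treating 0.0 as "unrated" is an assumption about the data that should be checked.

```python
from collections import defaultdict
from statistics import mean

def rank_cuisines(rows, min_count=3, unrated_value=0.0):
    """Return (best, worst, means) over cuisines with at least min_count rated rows."""
    groups = defaultdict(list)
    for cuisine, rating in rows:
        if rating == unrated_value:   # assumption: this value marks an unrated listing
            continue
        groups[cuisine].append(rating)
    # Keep only cuisines with enough ratings for the mean to be meaningful
    means = {c: mean(r) for c, r in groups.items() if len(r) >= min_count}
    best = max(means, key=means.get)
    worst = min(means, key=means.get)
    return best, worst, means

# Hypothetical toy rows of (cuisine label, rating)
toy_rows = [
    ("North Indian", 4.0), ("North Indian", 3.8), ("North Indian", 4.2),
    ("Cafe", 3.5), ("Cafe", 3.7), ("Cafe", 3.6),
    ("Thai", 4.9),                     # single rating, dropped by min_count
    ("Desserts", 0.0),                 # skipped as unrated
    ("Desserts", 4.1), ("Desserts", 4.3), ("Desserts", 4.2),
]
best, worst, means = rank_cuisines(toy_rows)
print(f"Highest: {best}, Lowest: {worst}")
```

On the real data, the rows could come from `zip(df['cuisines'], df['rating'])`.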

### 2. Results
#### ANOVA Test Results
- **F-Statistic:** 5.7339  
- **P-Value:** 0.0000  
- **Conclusion:** Since the p-value is **less than 0.05**, we conclude that there is a **significant difference** in ratings between different cuisines.
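
The statistic returned by `f_oneway` is an F ratio: the variance between group means divided by the variance within groups. A related effect size, eta squared, gives the share of total variance in ratings associated with the grouping. The following is a minimal sketch with a hypothetical helper `one_way_anova` and toy groups; it is meant to show the calculation, not to replace `f_oneway`.

```python
def one_way_anova(groups):
    """Return (F statistic, eta squared) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared

# Hypothetical toy groups, not the Zomato ratings
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
toy_f, toy_eta = one_way_anova(toy_groups)
print(f"F = {toy_f:.2f}, eta squared = {toy_eta:.4f}")
```

With many cuisine groups and many rows, a modest F value can still be highly significant, so eta squared helps judge how much of the rating variation the cuisine label actually accounts for.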

### 3. Highest and Lowest Rated Cuisines
- **Highest Rated Cuisine:** Asian, Chinese, Thai, Momos (**4.90**)  
- **Lowest Rated Cuisine:** American, Cafe, Chinese, Italian, Desserts (**0.00**)

### 4. Conclusion
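- The results indicate that **mean ratings differ significantly across cuisine types**. Note that each `cuisines` value is a combination of cuisines, so the groups compared are combinations rather than single cuisines.
- Restaurants listed under **Asian, Chinese, Thai, Momos** have the **highest mean rating (4.90)**.
- Restaurants listed under **American, Cafe, Chinese, Italian, Desserts** have the **lowest mean rating (0.00)**. A mean of exactly 0.00 may reflect unrated listings rather than customer dissatisfaction, so it should be checked before drawing conclusions.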

### 5. Recommendations
- Restaurants serving **high-rated cuisines** can **leverage this preference** to attract more customers.
- Businesses offering **low-rated cuisines** should investigate **customer feedback** to improve their ratings.
- Future analysis could include **a post-hoc test** to determine specific differences between cuisines.


## **3. Are expensive restaurants more engaging (getting more votes)?**

In [35]:
from scipy.stats import pearsonr

correlation,p_value=pearsonr(df['cost'],df['votes'])

print(f"Correlation Coefficient: {correlation:.4f}")
print(f"P-Value: {p_value:.4f}")

if p_value<0.05:
    print("There is a significant correlation between restaurant cost and engagement (votes).")
else:
    print("There is no significant correlation between restaurant cost and engagement (votes).")
    

Correlation Coefficient: 0.3799
P-Value: 0.0000
There is a significant correlation between restaurant cost and engagement (votes).


## Analysis of Restaurant Cost and Engagement (Votes)

### 1. **Objective**
To determine if there is a significant correlation between the **cost of a restaurant** and the **number of votes** (`votes`) it receives.

### 2. **Methodology**
- A **Pearson correlation test** was performed to examine the relationship between:
  - The **cost** of the restaurant (`cost`)
  - The **number of votes** (`votes`)
- The correlation coefficient and p-value were computed to assess the strength and statistical significance of the relationship.

### 3. **Results**
#### **Pearson Correlation Test Results**
- **Correlation Coefficient:** 0.3799  
- **P-Value:** 0.0000  
- **Conclusion:** Since the p-value is **less than 0.05**, we conclude that there is a **significant correlation** between the cost of a restaurant and the number of votes it receives.

### 4. **Interpretation**
- The **positive correlation** of 0.3799 indicates that as the **cost** of a restaurant increases, the **number of votes** it receives tends to increase as well, though the relationship is **moderate**.
- This suggests that **expensive restaurants** are likely to receive **more votes**, but the relationship is not very strong.
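
One way to read the strength of the correlation is through its square: 0.3799² ≈ 0.144, so about 14% of the variance in votes is linearly associated with cost. Because vote counts are often heavily skewed, a rank-based (Spearman) correlation can be a useful cross-check. The sketch below is illustrative: the helper functions and toy values are hypothetical and were not part of the original analysis.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical toy data with one expensive, heavily voted restaurant
toy_cost = [200, 400, 600, 800, 3000]
toy_votes = [10, 50, 40, 300, 5000]
r = pearson_r(toy_cost, toy_votes)
rho = spearman_rho(toy_cost, toy_votes)
print(f"Pearson r: {r:.4f} (r squared: {r * r:.4f})")
print(f"Spearman rho: {rho:.4f}")
```

On the real data, `spearman_rho(df['cost'].tolist(), df['votes'].tolist())` would give the rank-based counterpart to the coefficient reported above.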

### 5. **Conclusion**
- There is a **statistically significant moderate positive correlation** between restaurant cost and the number of votes.
- **Higher-cost restaurants** tend to receive **more votes**, but the correlation is not strong enough to say that cost alone influences the number of votes.

### 6. **Recommendations**
- **Expensive restaurants** tend to receive more votes, but the correlation does not show that higher cost causes this; **other factors** (such as quality, service, and reputation) likely also play a role in generating customer engagement.
- Future analyses could explore how **restaurant quality**, **service reviews**, and **location** influence engagement (votes) to better understand what drives it.