<a href="https://colab.research.google.com/github/itzdineshx/cognifyz_internship/blob/main/level2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **COGNIFYZ DATA SCIENCE INTERNSHIP**

## **LEVEL 2**

## About the Level

Level 2 of the Cognifyz Data Science Internship focuses on the following engaging tasks:

1. **Table Booking and Online Delivery**
2. **Price Range Analysis**
3. **Feature Engineering**

### Task 1: Table Booking and Online Delivery

- Determine the percentage of restaurants that offer table booking and online delivery, adding a new layer of business insight to the dataset.
- Compare the average ratings of restaurants with table booking and those without to uncover hidden customer preferences.
- Analyse the availability of online delivery among restaurants with different price ranges, which could reveal how pricing strategies impact delivery services.

### Task 2: Price Range Analysis

- Determine the most common price range among all the restaurants to understand general market positioning.
- Calculate the average rating for each price range, shedding light on how price affects perceived quality.
- Identify the colour that represents the highest average rating among different price ranges, utilizing visualization techniques for better clarity.

### Task 3: Feature Engineering

- Extract additional features from existing columns, such as the length of the restaurant name or address, adding unique perspectives to the analysis.
- Create new features like "Has Table Booking" or "Has Online Delivery" by encoding categorical variables, enhancing the dataset's predictive power.


# **Task 1: Table Booking and Online Delivery**

In [None]:
#importing all the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [None]:
#accessing the data
df = pd.read_csv("/content/Dataset .csv")
df.head()

In [None]:
#checking for null values
df.isnull().sum()

In [None]:
df.describe()

In [None]:
#accessing the labels
df.columns

In [None]:
#filling the missing values
# assign the result back: fillna(..., inplace=True) returns None, so storing it in a variable keeps nothing
df['Cuisines'] = df['Cuisines'].fillna('Unknown')

In [None]:
#re-checking for null values
df.isnull().sum()

In [None]:
table_booking = df['Has Table booking'].value_counts()
table_booking

In [None]:
# count the restaurants that offer table booking
table_booking_yes = df[df['Has Table booking']=='Yes']
len(table_booking_yes)

In [None]:
# count the restaurants that do not offer table booking
table_booking_No = df[df['Has Table booking']=='No']
len(table_booking_No)

In [None]:
table_booking_yes_perc = (len(df[df['Has Table booking']=='Yes']) / len(df)) * 100
print(f"table booking yes percentage: {table_booking_yes_perc:.2f}%")


In [None]:
table_booking_No_perc = (len(df[df['Has Table booking']=='No']) / len(df)) * 100
print(f"table booking No percentage : {table_booking_No_perc:.2f}%")

In [None]:
# Determine the percentage of restaurants that offer table booking
table_booking_percentage = df['Has Table booking'].value_counts(normalize=True) * 100
print("Percentage of restaurants with and without table booking:")
print(table_booking_percentage)


In [None]:
#Create a pie chart for 'Has Table booking'
table_booking_counts = df['Has Table booking'].value_counts()
plt.figure(figsize=(8, 6))
plt.pie(table_booking_counts, labels=table_booking_counts.index, autopct='%1.1f%%', startangle=90)
plt.title('Percentage of Restaurants with Table Booking')
plt.show()


# **Task 2: Price Range Analysis**

In [None]:
# Determine the percentage of restaurants that offer online delivery
online_delivery_percentage = df['Has Online delivery'].value_counts(normalize=True) * 100
print("Percentage of restaurants with and without online delivery:")
print(online_delivery_percentage)

In [None]:
#Create a pie chart for 'Has Online delivery'
online_delivery_counts = df['Has Online delivery'].value_counts()
plt.figure(figsize=(8, 6))
plt.pie(online_delivery_counts, labels=online_delivery_counts.index, autopct='%1.1f%%', startangle=90)
plt.title('Percentage of Restaurants with Online Delivery')
plt.show()


Compare the average ratings of restaurants with table booking and those without.

In [None]:
# Compare the average ratings of restaurants with table booking and those without
average_rating_with_table_booking = df[df['Has Table booking'] == 'Yes']['Aggregate rating'].mean()

average_rating_without_table_booking = df[df['Has Table booking'] == 'No']['Aggregate rating'].mean()

In [None]:
# ratings are on a 0-5 scale, not percentages, so no '%' suffix
print(f"Average rating of restaurants with table booking: {average_rating_with_table_booking:.2f}")

print(f"Average rating of restaurants without table booking: {average_rating_without_table_booking:.2f}")


In [None]:
# availability of online delivery among restaurants
online_delivery_by_price_range = df.groupby('Price range')['Has Online delivery'].value_counts(normalize=True).unstack() * 100

print("Share of restaurants with and without online delivery by price range (%):")
print(online_delivery_by_price_range)

In [None]:
# Plot the availability of online delivery by price range
online_delivery_by_price_range.plot(kind='bar')
plt.xlabel('Price Range')
plt.ylabel('Percentage of Restaurants')
plt.title('Percentage of Restaurants Offering Online Delivery by Price Range')
plt.show()

Determine the most common price range among all the restaurants.

In [None]:
most_common_price_range = df['Price range'].mode().iloc[0]
print(f"The most common price range among all restaurants is: {most_common_price_range}")


Calculate the average rating for each price range.


In [None]:
# Calculating the average rating for each price range.
# use 'Aggregate rating' here; 'Votes' would give the average vote count instead
average_rating_by_price_range = df.groupby('Price range')['Aggregate rating'].mean()
print("Average Rating by Price Range:")
print(average_rating_by_price_range)

Identify the color that represents the highest average rating among different price ranges.

In [None]:
# idxmax() returns the price range (the index label) with the highest average, not a color
highest_ratings = average_rating_by_price_range.idxmax()
print(f"Price range with the highest average rating: {highest_ratings}")

In [None]:
# Plotting the average rating by price range
average_rating_by_price_range.plot(kind='bar')
plt.xlabel('Price range')
plt.ylabel('Aggregate rating')
plt.title('Average Rating by Price Range')
plt.show()

# **Task 3: Feature Engineering**

In [None]:
# Create a new column for the length of the restaurant name
df['Restaurant Name Length'] = df['Restaurant Name'].str.len()

In [None]:
# Create a new column for the length of the restaurant address
df['Restaurant Address Length'] = df['Address'].str.len()

In [None]:
# Display the updated DataFrame with the new columns
print(df[['Restaurant Name', 'Restaurant Name Length', 'Address', 'Restaurant Address Length']].head())

In [None]:
df.columns

In [None]:
# Create new features using one-hot encoding for 'Has Table booking' and 'Has Online delivery'
df = pd.get_dummies(df, columns=['Has Table booking', 'Has Online delivery'], prefix=['TableBooking', 'OnlineDelivery'])

# Display the updated DataFrame with the new features
print(df.head())


In [None]:
df['Rating color'].values

In [None]:
# Creating a pie chart for 'Rating color'
rating_color_counts = df['Rating color'].value_counts()
plt.figure(figsize=(8, 6))
plt.pie(rating_color_counts, labels=rating_color_counts.index, autopct='%1.1f%%', startangle=90)
plt.title('Percentage of Restaurants by Rating Color')
plt.show()


# **RESULTS**

### Task 1: Table Booking and Online Delivery

**12.12%** of restaurants offer table booking, while **25.66%** provide online delivery, so online delivery is roughly twice as common as table booking. Restaurants with table booking have a **higher average rating (3.44)** than those without (**2.56**). This is an association rather than evidence that table booking itself raises ratings; other factors, such as price range, may explain part of the gap.
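
As a minimal sketch of how figures like these can be derived, the snippet below uses a small made-up DataFrame (not the internship dataset) that reuses the notebook's column names. The helper `share_of_yes` is a hypothetical name introduced only for this illustration.

```python
import pandas as pd

# Toy data for illustration only; these values are made up.
toy = pd.DataFrame({
    "Has Table booking": ["Yes", "No", "No", "No"],
    "Has Online delivery": ["Yes", "Yes", "No", "No"],
    "Aggregate rating": [4.0, 3.0, 2.0, 4.0],
})


def share_of_yes(frame: pd.DataFrame, column: str) -> float:
    """Return the percentage of rows whose value in `column` is 'Yes'."""
    # The mean of a boolean mask is the fraction of True values.
    return float((frame[column] == "Yes").mean() * 100)


booking_pct = share_of_yes(toy, "Has Table booking")
delivery_pct = share_of_yes(toy, "Has Online delivery")

# Mean rating for each table-booking group ('No' and 'Yes').
rating_by_booking = toy.groupby("Has Table booking")["Aggregate rating"].mean()

print(f"Table booking: {booking_pct:.2f}%")
print(f"Online delivery: {delivery_pct:.2f}%")
print(rating_by_booking)
```

Grouping once with `groupby` gives both averages in a single Series, which avoids filtering the DataFrame separately for "Yes" and "No".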

Moreover, the availability of online delivery is notably higher among restaurants in the medium price range, compared to those with low and high prices. This insight can be critical for businesses deciding whether to offer delivery services based on their price range.
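
The comparison across price ranges can be sketched with a row-normalised cross-tabulation. The data below is made up for illustration. `pd.crosstab` is used here as an alternative to the `groupby(...).value_counts(normalize=True).unstack()` chain in the notebook; it fills missing combinations with 0 rather than NaN.

```python
import pandas as pd

# Toy data for illustration only; values are made up.
toy = pd.DataFrame({
    "Price range": [1, 1, 2, 2, 2, 2, 3, 3],
    "Has Online delivery": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No"],
})

# normalize="index" divides each row by its own total,
# so every price range sums to 100 after scaling.
delivery_share = pd.crosstab(
    toy["Price range"], toy["Has Online delivery"], normalize="index"
) * 100

print(delivery_share.round(1))
```

Reading the "Yes" column top to bottom then shows directly how delivery availability changes with price range.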

Below is a bar plot to visually represent the data (visualization not included in the text version).

### Task 2: Price Range Analysis

The most common price range among all the restaurants is **1**. However, price range **4** boasts the highest average rating of **3.82**, followed by price range **3** with an average rating of **3.68**. Price range **2** has an average rating of **2.94**, and price range **1** has the lowest average rating of **2.00**.
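
The three quantities in this paragraph (most common price range, average rating per range, and the best-rated range) can be sketched as follows, again on made-up data rather than the real dataset.

```python
import pandas as pd

# Toy data for illustration only; values are made up.
toy = pd.DataFrame({
    "Price range": [1, 1, 1, 2, 2, 3],
    "Aggregate rating": [2.0, 3.0, 1.0, 3.0, 4.0, 4.5],
})

# mode() returns a Series (there can be ties); take the first entry.
most_common = toy["Price range"].mode().iloc[0]

# Average the rating column, not the vote count, within each price range.
avg_rating = toy.groupby("Price range")["Aggregate rating"].mean()

# idxmax() gives the index label (the price range) of the highest average.
best_range = avg_rating.idxmax()

print(f"Most common price range: {most_common}")
print(avg_rating)
print(f"Best-rated price range: {best_range}")
```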

A bar plot further visualizes these ratings, with the highest average rating indicated in red (visualization not included in the text version).
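
One way such highlighting could be done is to build a list with one colour per bar and pass it to the plot call. This is a sketch under assumptions: the averages are made up, and `plot_highlighted` is a hypothetical helper, not code taken from the notebook.

```python
import pandas as pd

# Made-up averages for illustration; not the notebook's results.
avg_rating = pd.Series({1: 2.1, 2: 3.0, 3: 3.6, 4: 3.9}, name="Aggregate rating")
avg_rating.index.name = "Price range"

# Red for the best-rated price range, grey for all others.
best_range = avg_rating.idxmax()
bar_colors = ["red" if pr == best_range else "grey" for pr in avg_rating.index]


def plot_highlighted(series: pd.Series, colors: list) -> None:
    """Draw a bar chart with one colour per bar (requires matplotlib)."""
    # Imported here so the colour logic above runs without matplotlib.
    import matplotlib.pyplot as plt

    ax = series.plot(kind="bar", color=colors)
    ax.set_xlabel("Price range")
    ax.set_ylabel("Average aggregate rating")
    ax.set_title("Average Rating by Price Range")
    plt.show()


print(bar_colors)
```

Calling `plot_highlighted(avg_rating, bar_colors)` would draw the chart; the colour list is kept separate so it can be inspected without rendering anything.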

### Task 3: Feature Engineering

In this task, I created two new columns, **"Restaurant Name Length"** and **"Restaurant Address Length"**, which hold the character counts of restaurant names and addresses, respectively. These features may serve as simple candidate predictors, although their usefulness has not been evaluated here.
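
A minimal sketch of the length features on made-up rows is shown below. It also illustrates an assumption worth checking on real data: `.str.len()` yields a missing value, not 0, when the text itself is missing.

```python
import pandas as pd

# Toy data for illustration only; names and addresses are made up.
toy = pd.DataFrame({
    "Restaurant Name": ["Cafe Uno", "Spice Hut"],
    "Address": ["12 Main Street", None],
})

# Character counts, including spaces.
toy["Restaurant Name Length"] = toy["Restaurant Name"].str.len()
toy["Restaurant Address Length"] = toy["Address"].str.len()

print(toy[["Restaurant Name Length", "Restaurant Address Length"]])
```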

I also encoded the columns **"Has Table booking"** and **"Has Online delivery"** with one-hot encoding (`pd.get_dummies`), which replaces each column with a pair of indicator columns (for example `TableBooking_Yes` and `TableBooking_No`), so the features can be used directly in predictive modeling.
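
The two common ways of encoding a Yes/No column can be compared on a small made-up DataFrame. The mapping dictionary `yes_no_to_int` is a hypothetical name for this sketch. The `dtype=int` argument is an addition so that the indicator columns contain 0/1 instead of booleans.

```python
import pandas as pd

# Toy data for illustration only; values are made up.
toy = pd.DataFrame({
    "Has Table booking": ["Yes", "No", "No"],
    "Has Online delivery": ["No", "Yes", "No"],
})

# Option 1: a single binary column per feature (1 = Yes, 0 = No).
yes_no_to_int = {"Yes": 1, "No": 0}
encoded = toy.copy()
for column in ["Has Table booking", "Has Online delivery"]:
    encoded[column] = encoded[column].map(yes_no_to_int)

# Option 2: one-hot indicator columns, one per category.
one_hot = pd.get_dummies(
    toy,
    columns=["Has Table booking", "Has Online delivery"],
    prefix=["TableBooking", "OnlineDelivery"],
    dtype=int,
)

print(encoded)
print(one_hot)
```

For a two-valued column the single binary column carries the same information as the pair of indicators, so Option 1 keeps the feature set smaller.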


# **Conclusion**

This level of the project applied exploratory analysis and feature engineering to the restaurant dataset. The price range analysis identified both the most common price range and the one with the highest average rating, which can inform pricing and positioning decisions.

Additionally, **feature engineering** added candidate predictors to the dataset for the predictive models that will be developed. Their effect on model performance and interpretability remains to be evaluated.
