# **Exploratory Data Analysis with Ramen Ratings**

**Author: Komal Gupta**

**Problem statement**
* Everyone loves instant noodles for their convenience, low cost and taste. But what makes instant noodles great? Are there particular factors that make one brand of instant noodles conclusively better than another? More importantly, can we use machine learning to predict which instant noodles will taste best, even without tasting them?

**Content**
* Columns in this dataset include Brand, Variety (the product name), Country, and Style (the packaging, for example cup, bowl or tray). Stars is the reviewer's rating of the ramen on a 5-point scale.

**The dataset is available on Kaggle.**

# **Aim of the project**
* Globalization and easier cross-border shipping have widened consumer choice. Given the large number of instant ramen options available, the aim of this project is to identify the highest- and lowest-rated ramen varieties and brands according to the reviews.

**Importing libraries**
* pandas
* numpy
* matplotlib
* seaborn

In [None]:
# NumPy: efficient multi-dimensional arrays and numerical routines
import numpy as np

# pandas: loading, cleaning and manipulating tabular data
import pandas as pd

# Plotting libraries

# Matplotlib: Python's core 2D plotting library
import matplotlib.pyplot as plt

# Seaborn: high-level interface for drawing attractive and informative statistical graphics
import seaborn as sns

In [None]:
# Load the ramen-ratings dataset into a DataFrame
df = pd.read_csv("../input/ramen-ratings/ramen-ratings.csv")

In [None]:
# checking the first five rows
df.head()

**shape**
* `df.shape` gives the number of rows and columns in the dataset
* `shape[0]` is the number of rows
* `shape[1]` is the number of columns

In [None]:
print("There are {} rows and {} columns in the dataset".format(df.shape[0], df.shape[1]))

**As we can see, the dataset has 2580 rows and 7 columns.**

**info()**
* `df.info()` summarizes the dataset: each column's name, non-null count and dtype

In [None]:
df.info()
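As a complement to `info()`, missing values can also be counted per column directly. The sketch below uses a small made-up DataFrame (the rows are illustrative, not taken from the dataset) and shows that a text placeholder such as `'Unrated'` is not counted as missing until it is converted to NaN.

```python
import pandas as pd

# Made-up sample rows that mimic a few of the dataset's columns (not real data)
sample = pd.DataFrame({
    "Brand": ["A", "B", "C"],
    "Style": ["Pack", None, "Cup"],
    "Stars": ["3.5", "Unrated", "5"],
})

# Count missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

On the real data the same call would be `df.isna().sum()`.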

In [None]:
df.dtypes

In [None]:
df.describe(include = 'all')

**Asian countries and the USA have the most ramen ratings. The bar charts of ratings per country further below show this: the first plots every country, which looks cluttered, so the second is limited to the top 10 for a cleaner view.**

In [None]:
df['Stars'].value_counts()

The output above shows that some rows in the Stars column contain the value 'Unrated', so we replace it with NaN.

In [None]:
df = df.replace({'Stars':{'Unrated':np.nan}})
df['Stars'].value_counts()

In [None]:
# Convert the Stars column to floats so it can be averaged and sorted numerically
df['Stars'] = df['Stars'].astype('float')
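The replace-then-cast steps above can also be done in a single call. This is a minimal sketch of that alternative using `pd.to_numeric` on made-up values. It is not the notebook's method, and it silently turns every unparseable value into NaN, not only `'Unrated'`.

```python
import pandas as pd

# Made-up ratings stored as text, as they might look right after reading the CSV
stars_raw = pd.Series(["3.75", "5", "Unrated", "0.5"], name="Stars")

# errors="coerce" turns anything that cannot be parsed as a number into NaN
stars = pd.to_numeric(stars_raw, errors="coerce")

print(stars.dtype)
print(stars.isna().sum())
```

The explicit replacement used in the notebook is stricter: any unexpected text other than `'Unrated'` would make `astype('float')` raise an error, which can be useful for catching dirty data.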

In [None]:
# Count the number of ratings per country (sorted in descending order) and plot them
df["Country"].value_counts().plot.bar()

In [None]:
df["Country"].value_counts().head(10).plot.bar()

In [None]:
df["Variety"].value_counts().head(10).plot.bar()

**Different styles of Ramen**
* This plot shows that most of the ramen comes in the pack style, followed by bowl and cup. Box, can and bar are nearly nonexistent.

In [None]:
df["Style"].value_counts().plot.bar()
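To put numbers on how dominant one style is, the counts can be expressed as percentages. The sketch below uses made-up style labels purely for illustration, so the shares it prints say nothing about the real dataset.

```python
import pandas as pd

# Made-up style labels (illustrative only, not the real distribution)
styles = pd.Series(
    ["Pack", "Pack", "Pack", "Bowl", "Cup", "Cup", "Tray", "Box"],
    name="Style",
)

# normalize=True returns fractions instead of counts; convert them to percentages
style_share = styles.value_counts(normalize=True).mul(100).round(1)
print(style_share)
```

On the real data the equivalent would be `df["Style"].value_counts(normalize=True)`.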

In [None]:
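# Mean of the numeric columns per brand; numeric_only=True skips text columns,
# which newer pandas versions refuse to average
df2 = df.groupby('Brand').mean(numeric_only=True)
# Top 10 brands by mean Stars
df2.sort_values('Stars', ascending = False).head(10)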

In [None]:
df2 = df.groupby('Brand').mean(numeric_only=True)
# Bottom 5 brands by mean Stars (any brand whose mean is NaN sorts last)
df2.sort_values('Stars', ascending = False).tail(5)
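A plain per-brand mean treats a brand with one review the same as a brand with many, so a single 5-star review can put a brand at the top. The sketch below shows one possible way to guard against that by requiring a minimum number of reviews. The sample rows and the `MIN_REVIEWS` threshold are made up for illustration and are not part of the original analysis.

```python
import pandas as pd

# Made-up sample rows (not real data)
sample = pd.DataFrame({
    "Brand": ["A", "A", "A", "B", "C", "C"],
    "Stars": [4.0, 4.5, 5.0, 5.0, 2.0, 3.0],
})

MIN_REVIEWS = 2  # hypothetical threshold, chosen only for this example

# Mean rating and number of rated reviews per brand
brand_stats = sample.groupby("Brand")["Stars"].agg(mean_stars="mean", n_reviews="count")

# Keep brands with enough reviews, then rank by mean rating
ranked = brand_stats[brand_stats["n_reviews"] >= MIN_REVIEWS].sort_values(
    "mean_stars", ascending=False
)
print(ranked)
```

Here brand B has the highest mean but only one review, so it is excluded from the ranking.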

In [None]:
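# Mean of the numeric columns per variety
df3 = df.groupby('Variety').mean(numeric_only=True)
# Note: this sorts by 'Review #' (the review's sequence number), not by 'Stars'
df3.sort_values('Review #', ascending = False).head(10)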

In [None]:
df3 = df.groupby('Variety').mean(numeric_only=True)
# Note: this sorts by 'Review #' (the review's sequence number), not by 'Stars'
df3.sort_values('Review #', ascending = False).tail(5)
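The two cells above order the varieties by `Review #`, which reflects the order in which reviews were written rather than how good the ramen was. The sketch below, on made-up rows with hypothetical variety names, shows how the top entry can differ depending on the sort column.

```python
import pandas as pd

# Made-up sample rows with hypothetical variety names (not real data)
sample = pd.DataFrame({
    "Review #": [4, 3, 2, 1],
    "Variety": ["Spicy Beef", "Miso", "Tom Yum", "Chicken"],
    "Stars": [2.0, 5.0, 3.5, 4.0],
})

variety_means = sample.groupby("Variety").mean(numeric_only=True)

# Sorting by review number puts the most recently reviewed variety first
by_review = variety_means.sort_values("Review #", ascending=False)

# Sorting by Stars puts the best-rated variety first
by_stars = variety_means.sort_values("Stars", ascending=False)

print(by_review.index[0])
print(by_stars.index[0])
```

To rank varieties by rating on the real data, the sort column would be `'Stars'` instead of `'Review #'`.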

In [None]:
# Correlation between the numeric per-brand means, drawn as an annotated heatmap
d5 = df2.corr()
sns.heatmap(d5, annot = True)

# **Conclusion**
* The highest-rated brand of ramen is Kimura and the lowest-rated is US Canning.
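* The variety tables are sorted by `Review #` rather than `Stars`, so they show that T's Restaurant Tantanmen has the highest review number and Tom yum chilli flavour the lowest. They do not by themselves identify the highest- and lowest-rated varieties.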