# Introduction

> * How many calories does the average McDonald's value meal contain?
> * How much do beverages, like soda or coffee, contribute to the overall caloric intake? 
> * Does ordering grilled chicken instead of crispy increase a sandwich's nutritional value?
> * What about ordering egg whites instead of whole eggs? 
> * What is the least number of items you could order from the menu to meet one day's nutritional requirements?

<font color=red>
Contents:
    
 <font color=blue>   
 1. [Load and check data](#1)
 1. [Variable description](#2)
      * [Univariate Variable Analysis](#3)
        * [Categorical Variable Analysis](#4)
        * [Numerical Variable Analysis](#5)
 1. [Basic Data Analysis](#6)
 1. [Outlier Detection](#7)
 

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt #for visualizing
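# 'seaborn-whitegrid' was renamed 'seaborn-v0_8-whitegrid' in matplotlib 3.6 (old name removed in 3.8)
try:
    plt.style.use('seaborn-v0_8-whitegrid')  # style with grid
except OSError:
    plt.style.use('seaborn-whitegrid')  # older matplotlib versions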
import seaborn as sns
from collections import Counter
import warnings
warnings.filterwarnings("ignore") #suppress warnings
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

<a id="1"></a>
## **Load and check data**

In [None]:
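from IPython.display import display
data = pd.read_csv('/kaggle/input/nutrition-facts/menu.csv')
# show the first 5 rows (display() is needed: only a cell's last expression is rendered)
display(data.head())
# show all rows, from the seventh column (position 6) to the last column
data.iloc[:, 6:]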
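The cell above loads the file but does not yet check it. A minimal sketch of such a check, run here on a small made-up DataFrame instead of menu.csv (the rows and the helper name `check_data` are hypothetical), reports the shape, the missing values per column and the number of exact duplicate rows; on the real data one would call `check_data(data)`.

```python
import pandas as pd

# Hypothetical stand-in for menu.csv: three made-up rows, four of its columns
sample = pd.DataFrame({
    "Category": ["Breakfast", "Beverages", "Desserts"],
    "Item": ["Item A", "Item B", "Item C"],
    "Calories": [300, 0, 250],
    "Total Fat": [13.0, 0.0, None],  # one deliberately missing value
})

def check_data(df):
    """Summarize shape, missing values and duplicate rows of a DataFrame."""
    return {
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        # number of NaN cells in each column
        "missing_per_column": df.isnull().sum().to_dict(),
        # number of rows that exactly repeat an earlier row
        "duplicated_rows": int(df.duplicated().sum()),
    }

report = check_data(sample)
print(report)
```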

<a id="2"></a>
## Variable description

1. **Category:** menu category, with the number of items in each:
    * Coffee & Tea          95
    * Breakfast             42
    * Smoothies & Shakes    28
    * Beverages             27
    * Chicken & Fish        27
    * Beef & Pork           15
    * Snacks & Sides        13
    * Desserts               7
    * Salads                 6
1. **Item:** products available at McDonald's (total: 260 items)
1. **Serving Size:** serving size of the item as listed on the menu
1. **Calories:** calories in the item
1. **Calories from Fat:** calories contributed by the item's fat
1. **Total Fat:** total fat in the item
1. **Total Fat (% Daily Value):** total fat per serving as a percentage of the recommended daily value
1. **Saturated Fat:** saturated fat in the item
1. **Saturated Fat (% Daily Value):** saturated fat per serving as a percentage of the recommended daily value
1. **Trans Fat:** trans fat in the item
1. **Cholesterol:** cholesterol in the item
1. **Cholesterol (% Daily Value):** cholesterol per serving as a percentage of the recommended daily value
1. **Sodium:** sodium in the item
1. **Sodium (% Daily Value):** sodium per serving as a percentage of the recommended daily value
1. **Carbohydrates:** carbohydrates in the item
1. **Carbohydrates (% Daily Value):** carbohydrates per serving as a percentage of the recommended daily value
1. **Dietary Fiber:** dietary fiber in the item
1. **Dietary Fiber (% Daily Value):** dietary fiber per serving as a percentage of the recommended daily value
1. **Sugars:** sugars in the item
1. **Protein:** protein in the item
1. **Vitamin A (% Daily Value):** vitamin A per serving as a percentage of the recommended daily value
1. **Vitamin C (% Daily Value):** vitamin C per serving as a percentage of the recommended daily value
1. **Calcium (% Daily Value):** calcium per serving as a percentage of the recommended daily value
1. **Iron (% Daily Value):** iron per serving as a percentage of the recommended daily value
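Two of these columns are linked: fat supplies about 9 kcal per gram, so **Calories from Fat** should be roughly 9 × **Total Fat**. A minimal sketch of that sanity check on made-up rows; the 10 kcal tolerance for label rounding is an assumption, not a documented value.

```python
import pandas as pd

# Hypothetical rows; the real values would come from menu.csv
sample = pd.DataFrame({
    "Item": ["Item A", "Item B", "Item C"],
    "Total Fat": [13.0, 0.0, 8.0],
    "Calories from Fat": [120, 0, 120],
})

# Fat provides about 9 kcal per gram
expected_fat_kcal = 9 * sample["Total Fat"]
sample["fat_kcal_gap"] = (sample["Calories from Fat"] - expected_fat_kcal).abs()

# Assumed tolerance that absorbs rounding on nutrition labels
TOLERANCE_KCAL = 10
inconsistent = sample[sample["fat_kcal_gap"] > TOLERANCE_KCAL]
print(inconsistent)
```

Here Item A (120 vs. 117 kcal) passes, while Item C (120 vs. 72 kcal) is flagged.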

**DATA TYPES**

*float64 (3):* Total Fat, Saturated Fat, Trans Fat

*int64 (18):* Calories, Calories from Fat, Total Fat (% Daily Value), Saturated Fat (% Daily Value), Cholesterol, Cholesterol (% Daily Value), Sodium, Sodium (% Daily Value), Carbohydrates, Carbohydrates (% Daily Value), Dietary Fiber, Dietary Fiber (% Daily Value), Sugars, Protein, Vitamin A (% Daily Value), Vitamin C (% Daily Value), Calcium (% Daily Value), Iron (% Daily Value)

*object (3):* Category, Item, Serving Size
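These counts can be read directly off the DataFrame rather than tallied by hand. A sketch on made-up rows; on the real data `data.dtypes.astype(str).value_counts()` should reproduce the groups above, although depending on the pandas version text columns may be reported as `object` or as a string dtype.

```python
import pandas as pd

# Hypothetical rows with a few of the menu columns
sample = pd.DataFrame({
    "Category": ["Breakfast", "Beverages"],
    "Item": ["Item A", "Item B"],
    "Serving Size": ["1 item", "16 fl oz cup"],
    "Calories": [300, 0],
    "Total Fat": [13.0, 0.0],
})

# Number of columns per dtype
dtype_counts = sample.dtypes.astype(str).value_counts()
print(dtype_counts)

# Column names grouped by dtype
columns_by_dtype = {}
for col, dt in sample.dtypes.items():
    columns_by_dtype.setdefault(str(dt), []).append(col)
print(columns_by_dtype)
```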

In [None]:
data.info()

<a id="3"></a>
## Univariate Variable Analysis
* Categorical Variable Analysis: Category, Item, Serving Size
* Numerical Variable Analysis: Calories, Calories from Fat, Total Fat, Total Fat (% Daily Value), Saturated Fat, Saturated Fat (% Daily Value), Trans Fat, Cholesterol, Cholesterol (% Daily Value), Sodium, Sodium (% Daily Value), Carbohydrates, Carbohydrates (% Daily Value), Dietary Fiber, Dietary Fiber (% Daily Value), Sugars, Protein, Vitamin A (% Daily Value), Vitamin C (% Daily Value), Calcium (% Daily Value), Iron (% Daily Value)
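Because this split follows the column dtypes, the two lists can be derived instead of typed out. A sketch with `select_dtypes` on made-up rows; on the real data, replace `sample` with `data` (before the Category column is encoded as integers, otherwise it would move to the numerical list).

```python
import pandas as pd

# Hypothetical rows with a few of the menu columns
sample = pd.DataFrame({
    "Category": ["Breakfast", "Beverages"],
    "Item": ["Item A", "Item B"],
    "Serving Size": ["1 item", "16 fl oz cup"],
    "Calories": [300, 0],
    "Total Fat": [13.0, 0.0],
})

# Numerical variables: every int or float column
numerical = sample.select_dtypes(include="number").columns.tolist()
# Categorical variables: all remaining (text) columns
categorical = sample.select_dtypes(exclude="number").columns.tolist()
print(categorical, numerical)
```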

<a id="4"></a>
### Categorical Variable Analysis

In [None]:
def bar_plot(variable):
    """
        input: variable, e.g. "Category"
        output: bar plot and value counts
    """
    var = data[variable]
    # count occurrences of each category
    varValue = var.value_counts()
    # visualize
    plt.figure(figsize=(13, 3))
    plt.bar(varValue.index, varValue)
    # rotate labels so long names (e.g. serving sizes) do not overlap
    plt.xticks(rotation=90)
    plt.ylabel("Frequency")
    plt.title(variable)
    plt.show()
    print("{}: \n{}".format(variable, varValue))

In [None]:
categories=["Category", "Serving Size"]
for c in categories:
    bar_plot(c)
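Besides raw frequencies, the share of the menu taken by each category is often easier to compare. A sketch using the item counts listed in the variable description; on the real data, `data["Category"].value_counts(normalize=True)` computes the same proportions directly.

```python
import pandas as pd

# Item counts per category, as listed in the variable description
counts = pd.Series({
    "Coffee & Tea": 95, "Breakfast": 42, "Smoothies & Shakes": 28,
    "Beverages": 27, "Chicken & Fish": 27, "Beef & Pork": 15,
    "Snacks & Sides": 13, "Desserts": 7, "Salads": 6,
})

# Fraction of all menu items in each category, rounded to 3 decimals
shares = (counts / counts.sum()).round(3)
print(shares)
```

Coffee & Tea alone accounts for roughly 36.5% of the 260 items.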

<a id="5"></a>
### Numerical Variable Analysis

In [None]:
def plot_hist(variable):
    """
        input: variable, e.g. "Calories"
        output: histogram of the variable
    """
    plt.figure(figsize=(9,3))
    plt.hist(data[variable], bins=100)
    plt.xlabel(variable)
    plt.ylabel("Frequency")
    plt.title("{} distribution (histogram)".format(variable))
    plt.show()

In [None]:
numericVar=["Calories", "Cholesterol", "Protein","Total Fat","Sugars","Carbohydrates" ]
for n in numericVar:
    plot_hist(n)
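The histograms can be complemented with a few numbers that describe their shape: a mean well above the median and a positive skew indicate a long right tail. A sketch on made-up values (on the real data, use `data[numericVar]` in place of `sample`):

```python
import pandas as pd

# Made-up values standing in for two of the numerical columns
sample = pd.DataFrame({
    "Calories": [0, 150, 300, 450, 1880],
    "Sugars": [0, 10, 20, 30, 40],
})

# Mean, median (50%), range and skewness for each column
summary = sample.describe().T[["mean", "50%", "min", "max"]].assign(skew=sample.skew())
print(summary)
```

Here Calories has a mean of 556 against a median of 300 (right-skewed), while the evenly spaced Sugars values have zero skew.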

<a id="6"></a>
## Basic Data Analysis
* Category vs. Calories
* Category vs. Protein
* Category vs. Sugars
* Category vs. Total Fat

In [None]:
# Encode the Category column as integer codes (only this column is changed)
data["Category"] = data["Category"].map({"Coffee & Tea": 0, "Breakfast": 1, "Smoothies & Shakes": 2, "Beverages": 3,
                                         "Chicken & Fish": 4, "Beef & Pork": 5, "Snacks & Sides": 6, "Desserts": 7, "Salads": 8})
data
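With integer codes, the grouped tables below are only readable with the legend repeated in each comment. A minimal sketch, on made-up rows, that keeps an inverse mapping and relabels a grouped result with category names; `CATEGORY_CODES` and `CODE_TO_CATEGORY` are hypothetical names holding the same codes as the encoding cell.

```python
import pandas as pd

# Same codes as the encoding cell
CATEGORY_CODES = {"Coffee & Tea": 0, "Breakfast": 1, "Smoothies & Shakes": 2, "Beverages": 3,
                  "Chicken & Fish": 4, "Beef & Pork": 5, "Snacks & Sides": 6, "Desserts": 7, "Salads": 8}
# Inverse table: code -> category name
CODE_TO_CATEGORY = {code: name for name, code in CATEGORY_CODES.items()}

# Made-up rows
sample = pd.DataFrame({"Category": ["Salads", "Breakfast", "Salads"],
                       "Calories": [140, 300, 220]})

# Encode only the Category column
sample["Category"] = sample["Category"].map(CATEGORY_CODES)

# Group on the codes, then relabel the index with names for display
mean_calories = sample.groupby("Category")["Calories"].mean().rename(index=CODE_TO_CATEGORY)
print(mean_calories)
```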

In [None]:
#Category- Calories
#Coffee & Tea:0      #Breakfast:1     #Smoothies & Shakes:2    #Beverages:3
#Chicken & Fish: 4   #Beef & Pork:5   #Snacks & Sides: 6       #Desserts:7    #Salads:8
data[["Category", "Calories"]].groupby(["Category"],as_index=False).mean().sort_values(by="Calories", ascending=False)

In [None]:
#Category- Proteins
#Coffee & Tea:0      #Breakfast:1     #Smoothies & Shakes:2    #Beverages:3
#Chicken & Fish: 4   #Beef & Pork:5   #Snacks & Sides: 6       #Desserts:7    #Salads:8
data[["Category", "Protein"]].groupby(["Category"],as_index=False).mean().sort_values(by="Protein", ascending=False)

In [None]:
#Category- Sugars
#Coffee & Tea:0      #Breakfast:1     #Smoothies & Shakes:2    #Beverages:3
#Chicken & Fish: 4   #Beef & Pork:5   #Snacks & Sides: 6       #Desserts:7    #Salads:8
data[["Category", "Sugars"]].groupby(["Category"],as_index=False).mean().sort_values(by="Sugars", ascending=False)

In [None]:
#Category- Total Fat
#Coffee & Tea:0      #Breakfast:1     #Smoothies & Shakes:2    #Beverages:3
#Chicken & Fish: 4   #Beef & Pork:5   #Snacks & Sides: 6       #Desserts:7    #Salads:8
data[["Category", "Total Fat"]].groupby(["Category"],as_index=False).mean().sort_values(by="Total Fat", ascending=False)
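The four cells above repeat one pattern, so a single `groupby` can compute all four means in one table. A sketch on made-up rows, with categories given as the integer codes from the legend:

```python
import pandas as pd

# Made-up rows; Category uses the integer codes from the legend
sample = pd.DataFrame({
    "Category": [1, 1, 3, 3, 8],
    "Calories": [300, 500, 0, 200, 450],
    "Protein": [12, 20, 0, 1, 30],
    "Sugars": [3, 5, 0, 50, 6],
    "Total Fat": [13.0, 25.0, 0.0, 0.0, 20.0],
})

# Mean of every metric per category, highest mean Calories first
means = (sample.groupby("Category")[["Calories", "Protein", "Sugars", "Total Fat"]]
         .mean()
         .sort_values(by="Calories", ascending=False))
print(means)
```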

<a id="7"></a>
## Outlier Detection

In [None]:
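def detection_outliers(df, features):
    """
        input: DataFrame and a list of numerical feature names
        output: indices of rows that are IQR outliers in more than 2 of the features
    """
    outlier_indices = []

    for c in features:
        # 1st quartile (25th percentile)
        Q1 = np.percentile(df[c], 25)
        # 3rd quartile (75th percentile)
        Q3 = np.percentile(df[c], 75)
        # interquartile range
        IQR = Q3 - Q1
        # outlier step (Tukey's 1.5 * IQR rule)
        outlier_step = IQR * 1.5
        # indices of rows outside [Q1 - step, Q3 + step]
        outlier_list_col = df[(df[c] < Q1 - outlier_step) | (df[c] > Q3 + outlier_step)].index
        # store indices
        outlier_indices.extend(outlier_list_col)

    # keep only rows flagged in more than 2 features
    outlier_indices = Counter(outlier_indices)
    multiple_outliers = [i for i, v in outlier_indices.items() if v > 2]

    return multiple_outliers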
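Tukey's rule used in `detection_outliers` flags values outside [Q1 - 1.5·IQR, Q3 + 1.5·IQR]. A small worked example on made-up numbers, which also shows why Q3 has to be the 75th percentile: if both quartiles are taken at the 25th percentile, the IQR is 0 and almost every value is flagged.

```python
import numpy as np

# Made-up values with one obvious outlier
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100])

# Quartiles with numpy's default linear interpolation
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Tukey's rule: anything outside [lower, upper] is an outlier
outliers = values[(values < lower) | (values > upper)]
print(q1, q3, lower, upper, outliers)

# With both quartiles at the 25th percentile the IQR is 0,
# so every value different from Q1 is flagged
q_same = np.percentile(values, 25)
flagged_with_zero_iqr = values[(values < q_same) | (values > q_same)]
print(len(flagged_with_zero_iqr))
```

Here Q1 = 3, Q3 = 7 and the fences are -3 and 13, so only 100 is an outlier; with a zero IQR, 8 of the 9 values would be flagged.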

In [None]:
data.loc[detection_outliers(data,["Protein","Sugars","Calories","Total Fat"])]

In [None]:
#drop outliers
data=data.drop(detection_outliers(data,["Protein","Sugars","Calories","Total Fat"]),axis=0).reset_index(drop=True)