# **Exploratory Data Analysis: Worldwide Meat Consumption**

## Domain: Food and Beverage

## Introduction:

Meat consumption is related to living standards, diet, livestock production and consumer prices, as well as macroeconomic uncertainty and shocks to GDP. Compared to other commodities, meat is characterised by high production costs and high output prices. Meat demand is associated with higher incomes and a shift - due to urbanisation - to food consumption changes that favour increased proteins from animal sources in diets. While the global meat industry provides food and a livelihood for billions of people, it also has significant environmental and health consequences for the planet. This indicator is presented for beef and veal, pig, poultry, and sheep. 

Meat consumption is measured in thousand tonnes of carcass weight (except for poultry expressed as ready to cook weight) and in kilograms of retail weight per capita. Carcass weight to retail weight conversion factors are: 0.7 for beef and veal, 0.78 for pigmeat, and 0.88 for both sheep meat and poultry meat.

(Source: https://data.world/oecd/meat-consumption/workspace/project-summary?agentid=oecd&datasetid=meat-consumption)
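
As a worked illustration of the conversion factors quoted above, the sketch below converts a carcass weight to a retail weight. The function name `carcass_to_retail` and the input figure of 100 are made up for illustration; only the factors come from the text.

```python
# Carcass-to-retail conversion factors quoted in the introduction.
CONVERSION_FACTORS = {
    "BEEF": 0.70,     # beef and veal
    "PIG": 0.78,      # pigmeat
    "POULTRY": 0.88,  # poultry meat
    "SHEEP": 0.88,    # sheep meat
}


def carcass_to_retail(carcass_weight, subject):
    """Return the retail weight for a carcass weight given in the same unit."""
    return carcass_weight * CONVERSION_FACTORS[subject]


# Hypothetical input: 100 units of carcass weight for each meat type.
retail = {s: round(carcass_to_retail(100.0, s), 2) for s in CONVERSION_FACTORS}
print(retail)
```

The same factor applies whatever the unit, so the function works for thousand tonnes as well as kilograms.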


## Objective
Perform exploratory data analysis of worldwide meat consumption to gain insights from the data.

## Data Fields
1. Location: The country code
1. Subject: The type of meat ('BEEF', 'PIG', 'POULTRY', 'SHEEP')
1. Measure:
  1. KG_CAP: Kilograms per person per year
  1. THND_TONNE: Annual consumption in thousands of tonnes
1. Time: The year the data was recorded
1. Value: The value, according to the Measure
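
Because `MEASURE` distinguishes two units that share the single `Value` column, it can help to view them side by side. This is a minimal sketch on made-up rows (the numbers are not from the dataset), assuming the column names `LOCATION`, `SUBJECT`, `MEASURE`, `TIME` and `Value` used by the code in this notebook:

```python
import pandas as pd

# Made-up rows in the same long layout as the dataset (values are illustrative).
sample = pd.DataFrame({
    "LOCATION": ["AUS", "AUS", "AUS", "AUS"],
    "SUBJECT": ["BEEF", "BEEF", "PIG", "PIG"],
    "MEASURE": ["KG_CAP", "THND_TONNE", "KG_CAP", "THND_TONNE"],
    "TIME": [1991, 1991, 1991, 1991],
    "Value": [27.0, 660.0, 15.0, 340.0],
})

# One row per (LOCATION, SUBJECT, TIME); each measure becomes its own column.
wide = sample.pivot_table(
    index=["LOCATION", "SUBJECT", "TIME"],
    columns="MEASURE",
    values="Value",
).reset_index()
print(wide)
```

Keeping the two measures in separate columns avoids accidentally summing kilograms per capita together with thousands of tonnes.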

# Importing Libraries

In [None]:
# !pip install plotly # ==4.2.1

In [None]:
# Importing libraries

import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import numpy as np # linear algebra
import matplotlib.pyplot as plt
%matplotlib inline 

import seaborn as sns
import plotly.express as px
from IPython.display import HTML

import cufflinks as cf
cf.go_offline(connected=None)

# Loading Data

In [None]:
# # Mounting Google drive to access the file

# from google.colab import drive
# drive.mount('/content/drive', force_remount=True)

# print("Google Drive Mounted Successfully!")

In [None]:
# Reading the csv file and creating dataframe

df=pd.read_csv("../input/meatconsumption/meat_consumption_worldwide.csv")


In [None]:
# # Unmounting Google drive

# drive.flush_and_unmount()
# print("Google Drive Unmounted Successfully!")

In [None]:
# printing sample data

df.head()

# Exploratory Data Analysis (EDA)


## Information about dataset

In [None]:
# size of dataset

df.shape

In [None]:
# information about columns, datatype etc.

df.info()

Observation:
1. The dataset has 13,760 rows and 5 columns.
1. The columns Location, Subject and Measure are of object datatype.
1. The Time column is of int64 datatype and the Value column is of float64 datatype.
1. No column contains null values.

## Statistical analysis (Numeric columns only)

In [None]:
# Select duplicate rows except first occurrence based on all columns

duplicateRowsDf = df[df.duplicated()]
 
print("Duplicate Rows except first occurrence based on all columns are :")
print(len(duplicateRowsDf))

In [None]:
# Statistical analysis of numerical columns

df.describe()

Observation:
1. The dataset does not have any duplicate rows.
1. Both numeric columns have the same number of values, i.e. 13,760, so there are no missing values in either column.
1. The Time column ranges from 1990 to 2026, so the dataset also contains projected consumption values.
1. The Value column may contain outliers, which may require treatment.
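
One common way to follow up on the outlier suspicion is Tukey's 1.5 × IQR rule. The sketch below uses made-up numbers rather than the dataset, and the rule is a generic heuristic, not something the data source prescribes.

```python
import pandas as pd

# Made-up values with one obviously large entry (not taken from the dataset).
values = pd.Series([10.0, 12.0, 11.0, 13.0, 12.5, 11.5, 300.0], name="Value")

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Tukey's rule: points more than 1.5 * IQR beyond the quartiles are flagged.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]
print(outliers.tolist())
```

On the real data the rule would typically be applied within each `MEASURE`, since the two units are on very different scales.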

## Analysis of Non-Numeric columns

In [None]:
# Analysis of Non-numerical columns

df.describe(include=['O'])

Observation:
1. There are 3 non-numeric columns, i.e. Location, Subject and Measure.
1. All 3 columns have a count of 13,760, which means there are no missing values in any of them.
1. Indonesia has the highest number of records.
1. Poultry is the most frequent meat type, and thousand tonnes is the most frequent measure.
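
`describe()` on object columns reports only the single most frequent value (`top`) and its count (`freq`), so ties are hidden. The following sketch, on a made-up column rather than the dataset, shows how `value_counts()` gives the full frequency table:

```python
import pandas as pd

# Made-up category column (not the dataset) with a tie for the most frequent value.
subject = pd.Series(["PIG", "POULTRY", "PIG", "POULTRY", "BEEF"], name="SUBJECT")

summary = subject.describe()     # count, unique, top, freq
counts = subject.value_counts()  # full frequency table, most frequent first

print(summary)
print(counts)
```

When several values share the highest count, `top` shows only one of them, so the frequency table is the safer basis for a statement about which value occurs most often.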

## NULL analysis

In [None]:
# checking for number of null records

df.isnull().sum()

Observation:
1. None of the columns have any null values

## General Analysis

In [None]:
# Number of unique values for each column

df.nunique()

In [None]:
# Number of unique countries

print(df['LOCATION'].unique())
print("Number of unique countries: %s" % (df['LOCATION'].nunique()))

In [None]:
# Types of meat

print(df['SUBJECT'].unique())
print("Number of meat types: %s" % (df['SUBJECT'].nunique()))

## Data Visualization

In [None]:
sns.histplot(df['Value'], kde=False)  # histplot replaces the deprecated distplot
plt.title('Distribution of Values of meat consumption')
plt.show()

In [None]:
sns.histplot(np.log1p(df['Value']), kde=True, stat='density')  # log1p compresses the long right tail
plt.show()

In [None]:
dfx = pd.get_dummies(df,columns=['MEASURE'])
dfx

In [None]:
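# Correlate only the numeric and dummy columns; LOCATION and SUBJECT are strings
sns.heatmap(dfx.select_dtypes(include=['number', 'bool']).corr(), annot=True)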
plt.show()

In [None]:
# Total meat consumption meat type-wise

import random
import matplotlib.colors as mcolors

by_c = df.groupby('SUBJECT')[['Value']].sum().reset_index().sort_values('Value',ascending=False)

labels = by_c["SUBJECT"]
sections = by_c["Value"]
colors = None # random.choices(list(mcolors.CSS4_COLORS.values()),k = 4) # This is to generate random colours

plt.pie(sections, labels=labels,
        startangle=90,
        explode=[0.1] * len(by_c),  # one offset per meat type
        autopct = '%1.2f%%',
        # shadow=True,
        radius=2, # size of the pie chart
        colors=colors,
        # wedgeprops = {'linewidth': 1},
        rotatelabels = False)

plt.axis('equal') # Try commenting this out.
plt.title('Total meat consumption meat type-wise')
plt.show()




Observation:
1. Pig is the most consumed meat type, followed by Poultry and Beef.
1. The least consumed meat type is Sheep.

In [None]:
# Location-wise Total Meat Consumption

dfx = df.groupby('LOCATION')[['Value']].sum().reset_index().sort_values('Value',ascending=False)

fig = px.bar(dfx, x='LOCATION', y='Value', hover_name='LOCATION', hover_data=['LOCATION'], color='LOCATION', title='Location-wise Total Meat Consumption.')
# HTML(fig.to_html()) # for colab
fig.show() # generally

Observation:
1. WLD has the highest total consumption, followed by BRICS and OECD. These codes are aggregate groupings rather than individual countries.
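
Since `WLD`, `BRICS` and `OECD` are aggregates (a later cell excludes them together with `EU28`), and the sum above also mixes the two measures, a country ranking is easier to interpret when both are filtered first. This is a hedged sketch on made-up rows, not a result from the dataset:

```python
import pandas as pd

# Made-up rows (values are illustrative, not from the dataset).
sample = pd.DataFrame({
    "LOCATION": ["WLD", "OECD", "CHN", "USA", "CHN", "USA"],
    "SUBJECT": ["PIG"] * 6,
    "MEASURE": ["THND_TONNE"] * 4 + ["KG_CAP"] * 2,
    "TIME": [2000] * 6,
    "Value": [90000.0, 35000.0, 40000.0, 8000.0, 30.0, 28.0],
})

# Aggregate codes, the same ones excluded later in this notebook.
AGGREGATES = ["WLD", "BRICS", "OECD", "EU28"]

# Keep individual countries and a single unit before summing.
countries = sample[
    ~sample["LOCATION"].isin(AGGREGATES) & (sample["MEASURE"] == "THND_TONNE")
]
totals = countries.groupby("LOCATION")["Value"].sum().sort_values(ascending=False)
print(totals)
```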

In [None]:
# Meat type-wise Total Meat Consumption

by_c = df.groupby('SUBJECT')[['Value']].sum().reset_index().sort_values('Value',ascending=False)

fig = px.bar(by_c, x='SUBJECT', y='Value', hover_name='SUBJECT', hover_data=['SUBJECT'], color='SUBJECT', title='Meat type-wise Total Meat Consumption.')
# HTML(fig.to_html()) # for colab
fig.show() # generally

Observation:
1. The highest consumption of meat is Pig followed by Poultry, Beef and Sheep.

In [None]:
# Meat Consumption Change through the Years

fig = px.scatter(df, x="TIME", y="Value", hover_name='LOCATION',hover_data=['MEASURE'],color='SUBJECT',title='Meat Consumption Change through the Years')
# HTML(fig.to_html()) # for colab
fig.show() # generally

Observation:
1. Meat consumption is projected to increase in the future.
2. Consumption of Pig and Poultry is estimated to increase two-fold by 2025.
3. Consumption of Pig meat increases after 2015.
4. Consumption of Beef is projected to remain almost unchanged.
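
A claim such as a two-fold increase can be checked numerically by dividing the total for a later year by the total for an earlier one. The sketch below uses made-up totals; the years and values are placeholders, not results from the dataset.

```python
import pandas as pd

# Made-up yearly totals in thousand tonnes (illustrative only).
sample = pd.DataFrame({
    "SUBJECT": ["PIG", "PIG", "POULTRY", "POULTRY", "BEEF", "BEEF"],
    "TIME": [1990, 2025, 1990, 2025, 1990, 2025],
    "Value": [50.0, 100.0, 30.0, 75.0, 40.0, 44.0],
})

# Rows: meat type, columns: year.
by_year = sample.pivot_table(
    index="SUBJECT", columns="TIME", values="Value", aggfunc="sum"
)

# Fold change between the two years in the table.
fold_change = by_year[2025] / by_year[1990]
print(fold_change.round(2))
```

On the real data the comparison would be made within one `MEASURE` and one `LOCATION` (or after excluding aggregates), so that the ratio compares like with like.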

In [None]:
# Exclude aggregate groupings so that only individual countries remain
df2 = df[~df['LOCATION'].isin(['WLD', 'BRICS', 'OECD', 'EU28'])]

fig = px.scatter(df2, x="TIME", y="Value", symbol='SUBJECT', hover_data=['MEASURE'], color='LOCATION', hover_name='SUBJECT',
                 title='Meat Consumption by Country and Type')
# HTML(fig.to_html()) # for colab
fig.show() # for general

Observation:
1. Pig meat consumption is the highest and is projected to remain so.
2. Poultry meat consumption is projected to remain the second highest.
3. Pig meat consumption is projected to be highest in CHN (China).

## Change of meat eating habit

In [None]:
# Change of meat eating habit

dfMEH = df.loc[df['MEASURE'] == 'THND_TONNE']

# (start year, end year) of each period compared below, both inclusive
periods = [(1991, 1995), (1996, 2005), (2006, 2010), (2011, 2015), (2021, 2025)]

fig, axes = plt.subplots(1, len(periods), figsize=(18, 7))
fig.suptitle('Change of meat consumption habit over 35 years', size=22)

for ax, (start, end) in zip(axes, periods):
    # Total consumption per meat type within the period
    totals = dfMEH.loc[dfMEH['TIME'].between(start, end)].groupby('SUBJECT')['Value'].sum()
    totals.plot.pie(ax=ax, autopct='%1.0f%%')
    ax.set_title(f'From {start} to {end}')
    ax.set_ylabel("")

plt.show()

Observation:
1. The share of Pig meat declines across the periods, from 47% to 38%.
2. The share of Beef increased during 1996 to 2005 and remained constant thereafter.
3. The share of Poultry meat increased over most of the period.
4. The share of Sheep meat remained broadly the same over the entire 35-year period.
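
The percentages read off the pie charts can also be computed directly, which makes comparison between periods easier. This is a minimal sketch on made-up rows (not the dataset), using two hypothetical periods:

```python
import pandas as pd

# Made-up consumption in thousand tonnes (illustrative only).
sample = pd.DataFrame({
    "SUBJECT": ["PIG", "POULTRY", "PIG", "POULTRY"],
    "TIME": [1992, 1992, 2022, 2022],
    "Value": [60.0, 40.0, 90.0, 110.0],
})

# Hypothetical period boundaries (inclusive), labelled for readability.
periods = {"1991-1995": (1991, 1995), "2021-2025": (2021, 2025)}

shares = {}
for label, (start, end) in periods.items():
    in_period = sample[sample["TIME"].between(start, end)]
    totals = in_period.groupby("SUBJECT")["Value"].sum()
    # Percentage share of each meat type within the period.
    shares[label] = (100 * totals / totals.sum()).round(1)

share_table = pd.DataFrame(shares)
print(share_table)
```

A table like this shows each meat type's share per period in one place, so a rise or fall can be stated with exact figures.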

In [None]:
# Heatmap

dfx = df.pivot_table(index='TIME', columns='SUBJECT', values='Value', aggfunc='sum')
fig, ax = plt.subplots(figsize=(15, 10))  # figure size in inches
sns.heatmap(dfx, annot=True, linewidths=.5, ax=ax)
plt.show()


In [None]:
print("Notebook completed.")