Write a Pandas program to filter the records from the world alcohol consumption dataset where the average consumption of beverages per person (beer, spirit, and wine combined) falls within the range of 0.5 to 2.5. Retrieve the records that include the country, average consumption, region, and GDP per capita.

In [174]:
import pandas as pd

df = pd.read_csv("dataset/World_Alcohol_Dataset - MAIN.csv")

In [175]:
df

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,life_expentancy,region,gdp_per_capita
0,Afghanistan,0,0,0,65.0,ASIA (EX. NEAR EAST),700
1,Albania,89,132,54,77.8,EASTERN EUROPE,4500
2,Algeria,25,0,14,75.6,NORTHERN AFRICA,6000
3,Angola,217,57,45,52.4,SUB-SAHARAN AFRICA,1900
4,Antigua and Barbuda,102,128,45,76.4,LATIN AMER. & CARIB,11000
...,...,...,...,...,...,...,...
172,Venezuela,333,100,3,74.1,LATIN AMER. & CARIB,4800
173,Vietnam,111,2,1,76.0,ASIA (EX. NEAR EAST),2500
174,Yemen,6,0,0,65.7,NEAR EAST,650
175,Zambia,32,19,4,61.8,SUB-SAHARAN AFRICA,1000


In [176]:
df.dtypes

country             object
beer_servings       object
spirit_servings     object
wine_servings       object
life_expentancy    float64
region              object
gdp_per_capita       int64
dtype: object

This is the same dataset as before, so we already know which rows are defective.

In [177]:
df[df["wine_servings"].str.contains(r"\?")]  # raw string keeps "\?" a valid regex escape; shows the row holding the "?" character

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,life_expentancy,region,gdp_per_capita
10,Bahamas,122,176,?,76.1,LATIN AMER. & CARIB,16700


In [178]:
df[df["beer_servings"].str.contains(r"\?")]

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,life_expentancy,region,gdp_per_capita
96,Macedonia,?,27,86,75.7,EASTERN EUROPE,6700


In [179]:
df[df["spirit_servings"].str.contains(r"\?")]

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,life_expentancy,region,gdp_per_capita
44,Denmark,224,?,278,86.0,WESTERN EUROPE,31100
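
The three lookups above can also be done in a single pass. The following is a minimal sketch, not part of the original notebook: `sample` is a small stand-in frame built from rows shown above, and the names `serving_cols`, `has_placeholder`, and `defective` are chosen here for illustration.

```python
import pandas as pd

# Stand-in frame: one clean row plus the three defective rows shown above.
sample = pd.DataFrame({
    "country": ["Albania", "Bahamas", "Macedonia", "Denmark"],
    "beer_servings": ["89", "122", "?", "224"],
    "spirit_servings": ["132", "176", "27", "?"],
    "wine_servings": ["54", "?", "86", "278"],
})

serving_cols = ["beer_servings", "spirit_servings", "wine_servings"]

# eq("?") compares every cell with the literal string "?";
# any(axis=1) flags a row when at least one servings column holds it.
has_placeholder = sample[serving_cols].eq("?").any(axis=1)

defective = sample[has_placeholder]
print(defective)
```

Negating the mask, `sample[~has_placeholder]`, would give the clean rows directly.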


In [180]:
df1 = df.copy()  # work on a copy so the original dataframe stays untouched
df1 = df1[df1["wine_servings"] != "?"]  # drop the row whose wine_servings is "?"
df1

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,life_expentancy,region,gdp_per_capita
0,Afghanistan,0,0,0,65.0,ASIA (EX. NEAR EAST),700
1,Albania,89,132,54,77.8,EASTERN EUROPE,4500
2,Algeria,25,0,14,75.6,NORTHERN AFRICA,6000
3,Angola,217,57,45,52.4,SUB-SAHARAN AFRICA,1900
4,Antigua and Barbuda,102,128,45,76.4,LATIN AMER. & CARIB,11000
...,...,...,...,...,...,...,...
172,Venezuela,333,100,3,74.1,LATIN AMER. & CARIB,4800
173,Vietnam,111,2,1,76.0,ASIA (EX. NEAR EAST),2500
174,Yemen,6,0,0,65.7,NEAR EAST,650
175,Zambia,32,19,4,61.8,SUB-SAHARAN AFRICA,1000


In [181]:
df1 = df1[df1["beer_servings"] != "?"]
df1 = df1[df1["spirit_servings"] != "?"].copy()  # .copy() avoids a SettingWithCopyWarning on later column assignments

In [182]:
df1.dtypes

country             object
beer_servings       object
spirit_servings     object
wine_servings       object
life_expentancy    float64
region              object
gdp_per_capita       int64
dtype: object

In [183]:
# convert the three servings columns from object to int
df1["beer_servings"] = df1["beer_servings"].astype(int)
df1["wine_servings"] = df1["wine_servings"].astype(int)
df1["spirit_servings"] = df1["spirit_servings"].astype(int)

In [184]:
df1.dtypes

country             object
beer_servings        int64
spirit_servings      int64
wine_servings        int64
life_expentancy    float64
region              object
gdp_per_capita       int64
dtype: object
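
As an alternative to dropping the "?" rows by hand and then casting, the cleaning and conversion can be combined with `pd.to_numeric(errors="coerce")`, which turns any unparseable value into `NaN`. This is only a sketch on a stand-in frame (`sample`, `serving_cols`, and `cleaned` are names assumed here), and it swaps in a different technique from the one used in the notebook.

```python
import pandas as pd

# Stand-in frame: one clean row plus the three defective rows found earlier.
sample = pd.DataFrame({
    "country": ["Albania", "Bahamas", "Macedonia", "Denmark"],
    "beer_servings": ["89", "122", "?", "224"],
    "spirit_servings": ["132", "176", "27", "?"],
    "wine_servings": ["54", "?", "86", "278"],
})

serving_cols = ["beer_servings", "spirit_servings", "wine_servings"]

cleaned = (
    sample
    # Parse each servings column; "?" cannot be parsed and becomes NaN.
    .assign(**{col: pd.to_numeric(sample[col], errors="coerce") for col in serving_cols})
    # Drop every row that has NaN in any servings column.
    .dropna(subset=serving_cols)
    # The remaining values are whole numbers, so cast them back to int.
    .astype({col: int for col in serving_cols})
)

print(cleaned.dtypes)
```

Unlike the explicit `!= "?"` filter, this also removes any other non-numeric entry, which may or may not be what is wanted.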

In [185]:
# new column "average_consumption": mean of the wine, beer, and spirit servings
df1["average_consumption"] = (df1["wine_servings"] + df1["beer_servings"] + df1["spirit_servings"]) / 3
df1

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,life_expentancy,region,gdp_per_capita,average_consumption
0,Afghanistan,0,0,0,65.0,ASIA (EX. NEAR EAST),700,0.000000
1,Albania,89,132,54,77.8,EASTERN EUROPE,4500,91.666667
2,Algeria,25,0,14,75.6,NORTHERN AFRICA,6000,13.000000
3,Angola,217,57,45,52.4,SUB-SAHARAN AFRICA,1900,106.333333
4,Antigua and Barbuda,102,128,45,76.4,LATIN AMER. & CARIB,11000,91.666667
...,...,...,...,...,...,...,...,...
172,Venezuela,333,100,3,74.1,LATIN AMER. & CARIB,4800,145.333333
173,Vietnam,111,2,1,76.0,ASIA (EX. NEAR EAST),2500,38.000000
174,Yemen,6,0,0,65.7,NEAR EAST,650,2.000000
175,Zambia,32,19,4,61.8,SUB-SAHARAN AFRICA,1000,18.333333


In [186]:
# keep rows whose average consumption lies in [0.5, 2.5], both bounds included
filtered_df1 = df1[(df1["average_consumption"] >= 0.5) & (df1["average_consumption"] <= 2.5)]
# show only the requested columns
filtered_df1[["country", "average_consumption", "region", "gdp_per_capita"]]

Unnamed: 0,country,average_consumption,region,gdp_per_capita
36,Comoros,1.666667,SUB-SAHARAN AFRICA,700
74,Indonesia,2.0,ASIA (EX. NEAR EAST),3200
101,Mali,2.333333,SUB-SAHARAN AFRICA,900
110,Myanmar,2.0,ASIA (EX. NEAR EAST),1800
116,Niger,2.0,SUB-SAHARAN AFRICA,800
134,Saudi Arabia,1.666667,NEAR EAST,11800
157,East Timor,2.0,ASIA (EX. NEAR EAST),500
174,Yemen,2.0,NEAR EAST,650
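
The averaging and range filter can be written more compactly with `DataFrame.mean(axis=1)` and `Series.between`, whose bounds are inclusive by default. Below is a minimal sketch on four rows taken from the tables above; `sample`, `in_range`, and `result` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Stand-in frame using rows that appear in the notebook output above.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Algeria", "Yemen", "Zambia"],
    "beer_servings": [0, 25, 6, 32],
    "spirit_servings": [0, 0, 0, 19],
    "wine_servings": [0, 14, 0, 4],
    "region": ["ASIA (EX. NEAR EAST)", "NORTHERN AFRICA", "NEAR EAST", "SUB-SAHARAN AFRICA"],
    "gdp_per_capita": [700, 6000, 650, 1000],
})

serving_cols = ["beer_servings", "spirit_servings", "wine_servings"]

# Row-wise mean of the three servings columns.
sample["average_consumption"] = sample[serving_cols].mean(axis=1)

# between(0.5, 2.5) is True where 0.5 <= value <= 2.5.
in_range = sample["average_consumption"].between(0.5, 2.5)

# .loc selects the matching rows and the requested columns in one step.
result = sample.loc[in_range, ["country", "average_consumption", "region", "gdp_per_capita"]]
print(result)
```

On this sample only Yemen (average 2.0) falls inside the range, which agrees with its appearance in the full result above.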
