## Finding outliers using IQR
Outliers can strongly affect statistics like the mean, as well as statistics that rely on the mean, such as variance and standard deviation. The interquartile range, or IQR, is another measure of spread that is less influenced by outliers. IQR is also often used to find outliers. If a value is less than $Q1 - 1.5 \times \text{IQR}$ or greater than $Q3 + 1.5 \times \text{IQR}$, it's considered an outlier. In fact, these cutoffs also determine how far the whiskers in a matplotlib box plot extend by default.
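
To make the robustness claim concrete, here is a minimal sketch on made-up numbers (the `values` array is hypothetical and not taken from the food consumption dataset). It compares how much the standard deviation and the IQR change when one extreme value is added, then applies the 1.5 × IQR rule to flag that value.

```python
import numpy as np

# Hypothetical toy data, not from the food_consumption dataset
values = np.array([10, 12, 13, 14, 15, 16, 18])
with_outlier = np.append(values, 100)

def iqr_of(x):
    """Return Q3 - Q1 using NumPy's default (linear) quantile method."""
    q1, q3 = np.quantile(x, [0.25, 0.75])
    return q3 - q1

# Spread with and without the extreme value
std_before, std_after = np.std(values), np.std(with_outlier)
iqr_before, iqr_after = iqr_of(values), iqr_of(with_outlier)
print("std before/after:", std_before, std_after)
print("IQR before/after:", iqr_before, iqr_after)

# Flag values outside Q1 - 1.5 * IQR and Q3 + 1.5 * IQR
q1, q3 = np.quantile(with_outlier, [0.25, 0.75])
low_cut = q1 - 1.5 * (q3 - q1)
high_cut = q3 + 1.5 * (q3 - q1)
flagged = with_outlier[(with_outlier < low_cut) | (with_outlier > high_cut)]
print("Flagged:", flagged)
```

On this toy data the standard deviation grows many times over, while the IQR moves only slightly, which is why the IQR-based cutoffs stay stable enough to flag the extreme value.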

Calculate the total co2_emission per country by grouping by country and taking the sum of co2_emission. Store the resulting DataFrame as emissions_by_country.

In [1]:
import numpy as np
import pandas as pd

# Load dataset
food_consumption = pd.read_csv(
    'https://assets.datacamp.com/production/repositories/5786/datasets/49f6356966016c70a9f63a0474942675377bdcf2/food_consumption.csv',
    encoding='latin1'
)

# Total co2_emission per country: group by country, then sum
emissions_by_country = food_consumption.groupby('country')['co2_emission'].sum()

# Print the result
print(emissions_by_country)


country
Albania      1777.85
Algeria       707.88
Angola        412.99
Argentina    2172.40
Armenia      1109.93
              ...   
Uruguay      1634.91
Venezuela    1104.10
Vietnam       641.51
Zambia        225.30
Zimbabwe      350.33
Name: co2_emission, Length: 130, dtype: float64


Compute the first and third quartiles of emissions_by_country and store these as q1 and q3.
Calculate the interquartile range of emissions_by_country and store it as iqr.

In [2]:
# Compute first (Q1) and third (Q3) quartiles
q1 = np.quantile(emissions_by_country, 0.25)
q3 = np.quantile(emissions_by_country, 0.75)

# Calculate interquartile range (IQR)
iqr = q3 - q1

# Print results
print("Q1 (25th percentile):", q1)
print("Q3 (75th percentile):", q3)
print("Interquartile Range (IQR):", iqr)

Q1 (25th percentile): 446.66
Q3 (75th percentile): 1111.1525000000001
Interquartile Range (IQR): 664.4925000000001
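
As an aside, the same quartiles can be obtained with the pandas `Series.quantile` method instead of `np.quantile`. The sketch below uses a small hypothetical Series (`totals`) as a stand-in for `emissions_by_country`, so it runs without downloading the dataset. Both functions use linear interpolation by default, so the results should agree.

```python
import numpy as np
import pandas as pd

# Hypothetical totals standing in for emissions_by_country
totals = pd.Series(
    [225.30, 350.33, 641.51, 1104.10, 1634.91, 2172.40],
    name="co2_emission",
)

# Series.quantile accepts a list and returns both quartiles at once
q1_pd, q3_pd = totals.quantile([0.25, 0.75])
iqr_pd = q3_pd - q1_pd

# Compare against the NumPy approach used above
q1_np = np.quantile(totals, 0.25)
q3_np = np.quantile(totals, 0.75)

print("pandas:", q1_pd, q3_pd, iqr_pd)
print("numpy: ", q1_np, q3_np, q3_np - q1_np)
```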


Calculate the lower and upper cutoffs for outliers of emissions_by_country, and store these as lower and upper.

In [4]:
# Calculate outlier cutoffs
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Print results
print("Q1 (25th percentile):", q1)
print("Q3 (75th percentile):", q3)
print("IQR:", iqr)
print("Lower cutoff for outliers:", lower)
print("Upper cutoff for outliers:", upper)

Q1 (25th percentile): 446.66
Q3 (75th percentile): 1111.1525000000001
IQR: 664.4925000000001
Lower cutoff for outliers: -550.0787500000001
Upper cutoff for outliers: 2107.89125
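
To see how these cutoffs relate to box plot whiskers, the following sketch uses `matplotlib.cbook.boxplot_stats`, the helper matplotlib uses to compute box plot statistics. The `sample` values are hypothetical. With the default `whis=1.5`, each whisker ends at the most extreme data point that still lies within the cutoffs, and anything beyond is reported as a flier.

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

# Hypothetical sample with one extreme value
sample = np.array([400.0, 450.0, 500.0, 550.0, 600.0, 3000.0])

# Manual cutoffs, computed as in the cell above
q1_s, q3_s = np.quantile(sample, [0.25, 0.75])
iqr_s = q3_s - q1_s
lower_s = q1_s - 1.5 * iqr_s
upper_s = q3_s + 1.5 * iqr_s

# Statistics matplotlib would use to draw the box plot
stats = boxplot_stats(sample, whis=1.5)[0]

print("Manual cutoffs:", lower_s, upper_s)
print("Whisker ends:  ", stats["whislo"], stats["whishi"])
print("Fliers:        ", stats["fliers"])
```

Note that the whisker ends are actual data points inside the cutoffs, not the cutoff values themselves.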


Subset emissions_by_country to get countries with a total emission greater than the upper cutoff or a total emission less than the lower cutoff.

In [5]:
# Subset countries with emissions outside the cutoff range
outliers = emissions_by_country[(emissions_by_country < lower) | (emissions_by_country > upper)]

# Display results
print("Countries with outlier total CO₂ emissions:")
print(outliers)

Countries with outlier total CO₂ emissions:
country
Argentina    2172.4
Name: co2_emission, dtype: float64
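
The steps above can be wrapped into a small reusable helper. This is only a sketch: the function name `iqr_outliers`, its `k` parameter, and the `sample` Series are illustrative assumptions, not part of the original exercise.

```python
import pandas as pd

def iqr_outliers(series, k=1.5):
    """Return the entries of `series` outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return series[(series < lower) | (series > upper)]

# Hypothetical per-country totals (labels A to F are placeholders)
sample = pd.Series(
    {"A": 400.0, "B": 450.0, "C": 500.0, "D": 550.0, "E": 600.0, "F": 3000.0}
)

flagged = iqr_outliers(sample)
print(flagged)
```

With the real data, the equivalent call would be `iqr_outliers(emissions_by_country)`. Raising `k` (for example to 3) makes the rule stricter about what counts as an outlier.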
