In [1]:
import numpy as np
import pandas as pd 

#### Methane is responsible for around 30% of the rise in global temperatures since the Industrial Revolution, and rapid and sustained reductions in methane emissions are key to limiting near-term global warming and improving air quality. The energy sector – including oil, natural gas, coal and bioenergy – accounts for nearly 40% of methane emissions from human activity.


The following dataset has information about methane gas emissions globally. Details about the columns are as follows.

- region - Region of the world.
- country - Country of emission.
- emissions - Methane emissions in kt.
- type - Sector from which the emissions occur.
- segment - Sub-sector from which the emissions occur.
- reason - The reason for the emission.
- baseYear - Base year for the tracking of emissions.
- notes - The source of the data.

### <font color='red'>* Try answering the questions in Markdown from the dataset. *</font>

In [2]:
df=pd.read_csv("Methane_final.csv")

In [3]:
df.head()

Unnamed: 0.1,Unnamed: 0,region,country,emissions,type,segment,reason,baseYear,notes
0,0,Africa,Algeria,257.611206,Agriculture,Total,All,2019-2021,Average based on United Nations Framework Conv...
1,1,Africa,Algeria,0.052,Energy,Bioenergy,All,2022,Estimates from end-uses are for 2020 or 2021 (...
2,2,Africa,Algeria,130.798996,Energy,Gas pipelines and LNG facilities,Fugitive,2022,Not available
3,3,Africa,Algeria,69.741898,Energy,Gas pipelines and LNG facilities,Vented,2022,Not available
4,4,Africa,Algeria,213.987,Energy,Onshore gas,Fugitive,2022,Not available


#### 1) How many numerical and categorical columns are there in the dataset?

In [4]:

# The summary below shows 2 numerical columns (int64 and float64) and 7 categorical (object) columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1548 entries, 0 to 1547
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  1548 non-null   int64  
 1   region      1548 non-null   object 
 2   country     1548 non-null   object 
 3   emissions   1548 non-null   float64
 4   type        1548 non-null   object 
 5   segment     1548 non-null   object 
 6   reason      1548 non-null   object 
 7   baseYear    1548 non-null   object 
 8   notes       1548 non-null   object 
dtypes: float64(1), int64(1), object(7)
memory usage: 109.0+ KB
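The count can also be derived from the dtypes rather than read off the summary. This is a minimal sketch, not the notebook's own solution. `sample` is a hypothetical stand-in built from the first rows shown by `df.head()` and reduced to four columns; on the full data the same calls would be applied to `df`.

```python
import pandas as pd

# Hypothetical stand-in for df: three rows from df.head(), four columns only.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "country": ["Algeria", "Algeria", "Algeria"],
    "emissions": [257.611206, 0.052, 130.798996],
    "type": ["Agriculture", "Energy", "Energy"],
})

# Split the columns by dtype: numeric (int/float) versus everything else.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(len(numeric_cols), numeric_cols)
print(len(categorical_cols), categorical_cols)
```

Note that `Unnamed: 0` counts as numerical only because of its dtype; it looks like a saved copy of the row index rather than a measurement.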


#### 2) Display the emission levels in the last 100 rows of the data.

In [5]:
df.index,df.shape

(RangeIndex(start=0, stop=1548, step=1), (1548, 9))

Unnamed: 0,emissions
1448,762.098083
1449,0.604000
1450,103.285004
1451,60.720402
1452,32.376099
...,...
1543,3102.500000
1544,30296.500000
1545,133350.984375
1546,9737.874023
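One way to produce an output like the one above is `Series.tail`. The sketch below assumes a hypothetical five-row stand-in (`sample`) holding the emissions values from `df.head()`, so it takes the last 3 rows; on the full data the call would be `df["emissions"].tail(100)`.

```python
import pandas as pd

# Hypothetical stand-in: the emissions values from df.head(), in order.
sample = pd.DataFrame(
    {"emissions": [257.611206, 0.052, 130.798996, 69.741898, 213.987]}
)

# tail(n) returns the last n rows and keeps their original index labels.
last_rows = sample["emissions"].tail(3)
print(last_rows)
```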


#### 3) How many unique sectors in the data are responsible for emissions and which sector is captured most in the data?



In [38]:

# 4 unique sectors are responsible for emissions

array(['Agriculture', 'Energy', 'Other', 'Waste'], dtype=object)

In [8]:

# The Energy sector is captured most often in the data

0    Energy
Name: type, dtype: object
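The two outputs above have the shape of `Series.unique()` and `Series.mode()` results. The following is a sketch under that assumption, using a hypothetical stand-in (`sample`) with the `type` values from `df.head()`. The sample contains only 2 sectors, whereas the full data has 4.

```python
import pandas as pd

# Hypothetical stand-in: the 'type' column of the rows shown by df.head().
sample = pd.DataFrame(
    {"type": ["Agriculture", "Energy", "Energy", "Energy", "Energy"]}
)

sectors = sample["type"].unique()     # distinct sector names
n_sectors = sample["type"].nunique()  # how many distinct sectors
most_common = sample["type"].mode()   # most frequent value(s), as a Series

print(n_sectors, sectors)
print(most_common)
```

`mode()` returns a Series because ties are possible, which is why the notebook output carries an index label of `0`.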

#### 4) What are the minimum and maximum emission levels, rounded to 3 decimal places, for a country such as 'Algeria'?

0.004

2669.195
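A possible way to compute these two values is to filter on `country` and then round the minimum and maximum. The sketch uses a hypothetical stand-in (`sample`) made of a few rows that appear elsewhere in this notebook. The Algerian row behind the 0.004 minimum is not among them, so the sample minimum differs from the full-data answer.

```python
import pandas as pd

# Hypothetical stand-in: a few rows that appear in the outputs of this notebook.
sample = pd.DataFrame({
    "country": ["Algeria", "Algeria", "Algeria", "Egypt"],
    "emissions": [0.052, 213.987, 2669.194580, 684.532227],
})

# Keep only the rows for the country of interest.
algeria = sample[sample["country"] == "Algeria"]

min_emission = round(float(algeria["emissions"].min()), 3)
max_emission = round(float(algeria["emissions"].max()), 3)

print(min_emission)
print(max_emission)
```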

#### 5) Remove any leading and trailing spaces from the 'notes' column for all rows, and replace "Not available" with "NA".

0       Average based on United Nations Framework Conv...
1       Estimates from end-uses are for 2020 or 2021 (...
2                                                      NA
3                                                      NA
4                                                      NA
                              ...                        
1543                                                   NA
1544                                                   NA
1545    Estimates from end-uses are for 2020 or 2021 (...
1546    Average based on United Nations Framework Conv...
1547    Average based on United Nations Framework Conv...
Name: notes, Length: 1548, dtype: object
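The cleaning step can be sketched with `Series.str.strip()` followed by an exact-match `Series.replace()`. The `notes` values below are hypothetical, with padding added on purpose to show the effect; on the full data the result would be assigned back to `df["notes"]`.

```python
import pandas as pd

# Hypothetical values: padding added on purpose to demonstrate stripping.
notes = pd.Series(["  Not available ", "Not available", " Some other note "])

# Strip whitespace first so padded entries also match the exact replacement.
cleaned = notes.str.strip().replace("Not available", "NA")
print(cleaned)
```

The order matters here: replacing before stripping would leave the padded `"  Not available "` entry untouched, because `replace` matches whole values.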

#### 6) Display all rows where the emission level is above the mean emission level.

In [21]:
df.emissions.mean()

643.2559723044416

Unnamed: 0.1,Unnamed: 0,region,country,emissions,type,segment,reason,baseYear,notes
8,8,Africa,Algeria,1154.119995,Energy,Onshore oil,Vented,2022,Not available
12,12,Africa,Algeria,2669.194580,Energy,Total,All,2022,Estimates from end-uses are for 2020 or 2021 (...
48,48,Africa,Botswana,843.401672,Waste,Total,All,2019-2021,Average based on United Nations Framework Conv...
67,67,Africa,Chad,1223.932983,Agriculture,Total,All,2019-2021,Average based on United Nations Framework Conv...
118,118,Africa,Egypt,684.532227,Agriculture,Total,All,2019-2021,Average based on United Nations Framework Conv...
...,...,...,...,...,...,...,...,...,...
1543,1543,World,World,3102.500000,Energy,Satellite-detected large oil and gas emissions,All,2022,Not available
1544,1544,World,World,30296.500000,Energy,Steam coal,All,2022,Not available
1545,1545,World,World,133350.984375,Energy,Total,All,2022,Estimates from end-uses are for 2020 or 2021 (...
1546,1546,World,World,9737.874023,Other,Total,All,2019-2021,Average based on United Nations Framework Conv...
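A filter like this is usually written as a boolean mask compared against the column mean. The sketch below assumes a hypothetical four-row stand-in (`sample`) built from Algerian rows shown in this notebook, so its mean differs from the 643.26 reported for the full data.

```python
import pandas as pd

# Hypothetical stand-in: four Algerian rows shown in this notebook.
sample = pd.DataFrame({
    "country": ["Algeria"] * 4,
    "segment": ["Total", "Bioenergy", "Onshore oil", "Total"],
    "emissions": [257.611206, 0.052, 1154.119995, 2669.194580],
})

mean_emission = sample["emissions"].mean()

# Boolean mask: True for rows strictly above the mean.
above_mean = sample[sample["emissions"] > mean_emission]
print(above_mean)
```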


#### 7) Are there any countries which have emissions from the 'Bioenergy' segment in the year '2022'?

In [29]:
df.baseYear.value_counts()

2022         1233
2019-2021     315
Name: baseYear, dtype: int64

In [37]:

# Yes, there are countries which have emissions from the 'Bioenergy' segment in the year '2022'

array(['Algeria', 'Angola', 'Benin', 'Botswana', 'Cameroon', 'Congo',
       "Cote d'Ivoire", 'Democratic Republic of Congo', 'Egypt',
       'Equatorial Guinea', 'Eritrea', 'Ethiopia', 'Gabon', 'Ghana',
       'Kenya', 'Libya', 'Morocco', 'Mozambique', 'Namibia', 'Niger',
       'Nigeria', 'Senegal', 'Somalia', 'South Africa', 'South Sudan',
       'Sudan', 'Tanzania', 'Togo', 'Tunisia', 'Australia', 'Bangladesh',
       'China', 'India', 'Indonesia', 'Japan', 'Korea', 'Malaysia',
       'Mongolia', 'New Zealand', 'Other countries in Southeast Asia',
       'Pakistan', 'Philippines', 'Thailand', 'Vietnam', 'Argentina',
       'Bolivia', 'Brazil', 'Colombia', 'Cuba', 'Ecuador', 'Guyana',
       'Paraguay', 'Peru', 'Trinidad and Tobago', 'Uruguay', 'Venezuela',
       'Denmark', 'Estonia', 'European Union', 'France', 'Germany',
       'Israel', 'Italy', 'Netherlands', 'Norway',
       'Other countries in Europe', 'Other EU17 countries',
       'Other EU7 countries', 'Poland', 'Romania',
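A list like the one above can be obtained by combining two conditions with `&` and taking the unique countries. In the sketch, `sample` is a hypothetical stand-in; the Angola row is assumed only from Angola appearing in the output above. Because `baseYear` has dtype `object`, the comparison uses the string `"2022"`, not the integer.

```python
import pandas as pd

# Hypothetical stand-in: the Angola row is inferred from the output above.
sample = pd.DataFrame({
    "country": ["Algeria", "Algeria", "Algeria", "Angola"],
    "segment": ["Total", "Bioenergy", "Onshore gas", "Bioenergy"],
    "baseYear": ["2019-2021", "2022", "2022", "2022"],
})

# baseYear is stored as text, so compare against the string "2022".
mask = (sample["segment"] == "Bioenergy") & (sample["baseYear"] == "2022")
countries = sample.loc[mask, "country"].unique()
print(countries)
```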

#### 8) Among rows with emissions above the median emission level, which region has the most emissions from the 'Onshore gas' segment?

In [40]:
df['emissions'].median()

24.064668655395508

Unnamed: 0.1,Unnamed: 0,region,country,emissions,type,segment,reason,baseYear,notes
4,4,Africa,Algeria,213.987,Energy,Onshore gas,Fugitive,2022,Not available
5,5,Africa,Algeria,464.308014,Energy,Onshore gas,Vented,2022,Not available
128,128,Africa,Egypt,40.961601,Energy,Onshore gas,Vented,2022,Not available
231,231,Africa,Libya,48.9403,Energy,Onshore gas,Vented,2022,Not available
297,297,Africa,Nigeria,63.722599,Energy,Onshore gas,Fugitive,2022,Not available
298,298,Africa,Nigeria,138.264999,Energy,Onshore gas,Vented,2022,Not available
416,416,Asia Pacific,Australia,45.123402,Energy,Onshore gas,Fugitive,2022,Not available
417,417,Asia Pacific,Australia,97.908302,Energy,Onshore gas,Vented,2022,Not available
432,432,Asia Pacific,Bangladesh,39.7145,Energy,Onshore gas,Fugitive,2022,Not available
433,433,Asia Pacific,Bangladesh,86.172203,Energy,Onshore gas,Vented,2022,Not available


In [48]:

# The Middle East region has the most emissions from the 'Onshore gas' segment among rows above the median emission level

'Middle East'
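The question can be read in more than one way. The sketch below assumes "most emissions" means the largest summed emissions per region, after keeping only 'Onshore gas' rows above the overall median. `sample` is a hypothetical stand-in built from rows shown in this notebook; it contains no Middle East rows, so its answer differs from the full-data result.

```python
import pandas as pd

# Hypothetical stand-in: rows taken from the outputs shown in this notebook.
sample = pd.DataFrame({
    "region": ["Africa", "Africa", "Africa",
               "Asia Pacific", "Asia Pacific", "Asia Pacific"],
    "segment": ["Bioenergy", "Onshore gas", "Onshore gas",
                "Onshore gas", "Onshore gas", "Onshore gas"],
    "emissions": [0.052, 213.987, 464.308014,
                  45.123402, 97.908302, 39.7145],
})

median_emission = sample["emissions"].median()

# Keep 'Onshore gas' rows whose emissions exceed the overall median.
mask = (sample["segment"] == "Onshore gas") & (sample["emissions"] > median_emission)
subset = sample[mask]

# Assumed interpretation: rank regions by their summed emissions.
totals = subset.groupby("region")["emissions"].sum()
top_region = totals.idxmax()
print(top_region)
```

If the intended reading is the region with the most qualifying rows instead, `subset["region"].value_counts().idxmax()` would be the corresponding call.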