<a href="https://colab.research.google.com/github/polydiaguiar/sustainability-data-driven-projects/blob/main/Port_1_GEE.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
"""
Script for agricultural greenhouse gas (GHG) emissions.

This script loads a dataset from Google Drive,
performs an exploratory analysis, and saves the results.
"""

# Import the library for data manipulation
import pandas as pd
# Import the function to connect to Google Drive in Colab
from google.colab import drive

# Mount Google Drive to allow access to the project files.
# The 'force_remount=True' parameter ensures the drive is remounted
# even if a session is already active.
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [2]:
# Path to the dataset on greenhouse gas emissions from rice cultivation by country.
path = '/content/drive/MyDrive/bancos/rice-cultivation_country_emissions_v4_5_0.csv'

# Load the dataset into a pandas DataFrame to begin the analysis.
df = pd.read_csv(path)

In [3]:
# Inspect the first 5 rows of the dataset for a quick overview
df.head()

Unnamed: 0,iso3_country,sector,subsector,start_time,end_time,gas,emissions_quantity,emissions_quantity_units,temporal_granularity,created_date,modified_date
0,ABW,agriculture,rice-cultivation,2015-01-01 00:00:00,2015-12-31 00:00:00,co2e_100yr,0.0,,annual,,
1,ABW,agriculture,rice-cultivation,2016-01-01 00:00:00,2016-12-31 00:00:00,co2e_100yr,0.0,,annual,,
2,ABW,agriculture,rice-cultivation,2017-01-01 00:00:00,2017-12-31 00:00:00,co2e_100yr,0.0,,annual,,
3,ABW,agriculture,rice-cultivation,2018-01-01 00:00:00,2018-12-31 00:00:00,co2e_100yr,0.0,,annual,,
4,ABW,agriculture,rice-cultivation,2019-01-01 00:00:00,2019-12-31 00:00:00,co2e_100yr,0.0,,annual,,


In [4]:
# Inspect the exact column names to check for typos or extra spaces before cleaning.
print(df.columns)

Index(['iso3_country', 'sector', 'subsector', 'start_time', 'end_time', 'gas',
       'emissions_quantity', 'emissions_quantity_units',
       'temporal_granularity', 'created_date', 'modified_date'],
      dtype='object')


In [5]:
# Create a boolean mask to select 9 countries: Brazil and the world's top 8 rice producers
mask = df['iso3_country'].isin(['BRA', 'CHN', 'IND', 'BGD', 'IDN', 'VNM', 'THA', 'MMR', 'PHL'])

In [6]:
# Apply the mask to create a new DataFrame containing only the top 8 rice producers and Brazil
df_selected = df[mask]
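
A frame obtained by boolean indexing may later trigger pandas' `SettingWithCopyWarning` when it is modified in place. One common way to make the intent explicit is to take a copy at filter time. The sketch below uses a small made-up frame (`toy`, `toy_mask`, and `toy_selected` are illustrative names, not part of this notebook's data):

```python
import pandas as pd

# Toy frame standing in for df; the values are illustrative only.
toy = pd.DataFrame({
    "iso3_country": ["BRA", "ABW", "CHN"],
    "emissions_quantity": [1.0, 0.0, 2.0],
    "created_date": [None, None, None],
})

toy_mask = toy["iso3_country"].isin(["BRA", "CHN"])

# .copy() makes toy_selected an independent DataFrame,
# so in-place edits on it never touch (or warn about) toy.
toy_selected = toy[toy_mask].copy()
toy_selected.drop(columns=["created_date"], inplace=True)

print(toy_selected)
```

The original `toy` frame keeps all of its columns; only the copy is changed.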

In [7]:
# Get a technical summary of the filtered dataset
df_selected.info()

<class 'pandas.core.frame.DataFrame'>
Index: 99 entries, 242 to 2661
Data columns (total 11 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   iso3_country              99 non-null     object 
 1   sector                    99 non-null     object 
 2   subsector                 99 non-null     object 
 3   start_time                99 non-null     object 
 4   end_time                  99 non-null     object 
 5   gas                       99 non-null     object 
 6   emissions_quantity        99 non-null     float64
 7   emissions_quantity_units  0 non-null      float64
 8   temporal_granularity      99 non-null     object 
 9   created_date              0 non-null      float64
 10  modified_date             0 non-null      float64
dtypes: float64(4), object(7)
memory usage: 9.3+ KB
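
The summary above lists three columns with 0 non-null values. As a minimal sketch, columns that are entirely null can also be found programmatically instead of being read off the `info()` output. The frame `toy` below is made up for illustration:

```python
import pandas as pd

# Toy frame standing in for df_selected; the values are illustrative only.
toy = pd.DataFrame({
    "iso3_country": ["BRA", "CHN", "IND"],
    "emissions_quantity": [1.0, 2.0, 3.0],
    "created_date": [float("nan")] * 3,
})

# isna().all() is True for a column only when every value is missing.
null_mask = toy.isna().all()
all_null_cols = null_mask[null_mask].index.tolist()

print(all_null_cols)
```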


In [8]:
# Drop columns that carry no information (all values are null, per the summary above).
# Reassigning instead of using inplace=True avoids pandas' SettingWithCopyWarning.
df_selected = df_selected.drop(columns=['created_date', 'emissions_quantity_units', 'modified_date'])


In [9]:
# Function to get the head and tail of a DataFrame
def get_head_tail(dados):
    head_tail = pd.concat([dados.head(), dados.tail()], axis=0)
    print("✅ Print the 5 first and last ones lines \n")
    print(head_tail)
    # Return the combined frame so the caller's assignment holds data instead of None
    return head_tail

resultado = get_head_tail(df_selected)

✅ Print the 5 first and last ones lines 

     iso3_country       sector         subsector           start_time  \
242           BGD  agriculture  rice-cultivation  2015-01-01 00:00:00   
243           BGD  agriculture  rice-cultivation  2016-01-01 00:00:00   
244           BGD  agriculture  rice-cultivation  2017-01-01 00:00:00   
245           BGD  agriculture  rice-cultivation  2018-01-01 00:00:00   
246           BGD  agriculture  rice-cultivation  2019-01-01 00:00:00   
2657          VNM  agriculture  rice-cultivation  2021-01-01 00:00:00   
2658          VNM  agriculture  rice-cultivation  2022-01-01 00:00:00   
2659          VNM  agriculture  rice-cultivation  2023-01-01 00:00:00   
2660          VNM  agriculture  rice-cultivation  2024-01-01 00:00:00   
2661          VNM  agriculture  rice-cultivation  2025-01-01 00:00:00   

                 end_time         gas  emissions_quantity temporal_granularity  
242   2015-12-31 00:00:00  co2e_100yr        6.834218e+07               a

In [10]:
# Count unique values per column to spot constant columns
dic = pd.DataFrame({'Unique values': df_selected.nunique()})
dic

Unnamed: 0,Unique values
iso3_country,9
sector,1
subsector,1
start_time,11
end_time,11
gas,1
emissions_quantity,99
temporal_granularity,1
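
Columns with a single unique value can also be selected from the `nunique()` result rather than listed by hand. A minimal sketch on a made-up frame (`toy`, `constant_cols`, and `toy_reduced` are illustrative names):

```python
import pandas as pd

# Toy frame standing in for df_selected; the values are illustrative only.
toy = pd.DataFrame({
    "iso3_country": ["BRA", "CHN", "IND"],
    "sector": ["agriculture"] * 3,
    "gas": ["co2e_100yr"] * 3,
    "emissions_quantity": [1.0, 2.0, 3.0],
})

# A column is constant when it has exactly one distinct non-null value.
n_unique = toy.nunique()
constant_cols = n_unique[n_unique == 1].index.tolist()

# drop() without inplace returns a new DataFrame.
toy_reduced = toy.drop(columns=constant_cols)

print(constant_cols)
```

Note that `nunique()` ignores missing values by default, so an all-null column reports 0 rather than 1 and would not be caught by this filter.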


In [11]:
# Drop the constant columns identified above (one unique value each)
df_selected = df_selected.drop(columns=['sector', 'subsector', 'gas', 'temporal_granularity'])


In [12]:
resultado = get_head_tail(df_selected)

✅ Print the 5 first and last ones lines 

     iso3_country           start_time             end_time  \
242           BGD  2015-01-01 00:00:00  2015-12-31 00:00:00   
243           BGD  2016-01-01 00:00:00  2016-12-31 00:00:00   
244           BGD  2017-01-01 00:00:00  2017-12-31 00:00:00   
245           BGD  2018-01-01 00:00:00  2018-12-31 00:00:00   
246           BGD  2019-01-01 00:00:00  2019-12-31 00:00:00   
2657          VNM  2021-01-01 00:00:00  2021-12-31 00:00:00   
2658          VNM  2022-01-01 00:00:00  2022-12-31 00:00:00   
2659          VNM  2023-01-01 00:00:00  2023-12-31 00:00:00   
2660          VNM  2024-01-01 00:00:00  2024-12-31 00:00:00   
2661          VNM  2025-01-01 00:00:00  2025-12-31 00:00:00   

      emissions_quantity  
242         6.834218e+07  
243         6.644359e+07  
244         6.120540e+07  
245         6.246632e+07  
246         6.041476e+07  
2657        5.390754e+07  
2658        4.383173e+07  
2659        4.470394e+07  
2660        4.465137e

In [13]:
# Parse the timestamp strings and keep only the calendar date.
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning.
df_selected = df_selected.assign(
    end_date=pd.to_datetime(df_selected['end_time']).dt.date,
    start_date=pd.to_datetime(df_selected['start_time']).dt.date,
)
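
One detail worth knowing: `.dt.date` produces plain Python `date` objects, so the resulting column has `object` dtype and loses the vectorized `.dt` accessor. If later steps need date arithmetic or a year column, keeping a `datetime64` column may be more convenient. A minimal sketch on made-up data (`toy`, `start_day`, and `year` are illustrative names):

```python
import pandas as pd

# Toy frame with timestamp strings in the same format as start_time.
toy = pd.DataFrame({"start_time": ["2015-01-01 00:00:00", "2016-01-01 00:00:00"]})

parsed = pd.to_datetime(toy["start_time"])

toy = toy.assign(
    start_date=parsed.dt.date,       # Python date objects, object dtype
    start_day=parsed.dt.normalize(), # midnight timestamps, datetime64 dtype
    year=parsed.dt.year,             # integer year, handy for grouping
)

print(toy.dtypes)
```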


In [14]:
# Drop the original timestamp columns, now replaced by start_date and end_date
df_selected = df_selected.drop(columns=['start_time', 'end_time'])


In [15]:
df_selected.head()

Unnamed: 0,iso3_country,emissions_quantity,end_date,start_date
242,BGD,68342180.0,2015-12-31,2015-01-01
243,BGD,66443590.0,2016-12-31,2016-01-01
244,BGD,61205400.0,2017-12-31,2017-01-01
245,BGD,62466320.0,2018-12-31,2018-01-01
246,BGD,60414760.0,2019-12-31,2019-01-01
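
The filtered data has 99 rows for 9 countries and 11 distinct start dates, which is consistent with one row per country per year. A minimal sketch of checking that kind of balance explicitly, using a made-up frame (`toy`, `periods_per_country`, and `is_balanced` are illustrative names):

```python
from datetime import date

import pandas as pd

# Toy frame standing in for df_selected; the values are illustrative only.
toy = pd.DataFrame({
    "iso3_country": ["BRA", "BRA", "CHN", "CHN"],
    "start_date": [date(2015, 1, 1), date(2016, 1, 1),
                   date(2015, 1, 1), date(2016, 1, 1)],
})

# Number of distinct periods available for each country.
periods_per_country = toy.groupby("iso3_country")["start_date"].nunique()

# The panel is balanced when every country has the same number of periods.
is_balanced = periods_per_country.nunique() == 1

print(periods_per_country)
print(is_balanced)
```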


In [16]:
# Save the transformed data to Google Drive.
# to_csv() returns None when given a path, so its result is not assigned.
df_selected.to_csv('/content/drive/MyDrive/bancos/data_trated_rice', index=False)
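
`DataFrame.to_csv` returns `None` when it is given a path, so the saved file is best verified by reading it back. A minimal sketch of that round trip, written to a temporary directory instead of Google Drive (`toy` and the file name `toy_rice.csv` are hypothetical):

```python
import os
import tempfile

import pandas as pd

# Toy frame standing in for the transformed data; the values are illustrative only.
toy = pd.DataFrame({
    "iso3_country": ["BRA", "CHN"],
    "emissions_quantity": [1.5, 2.5],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "toy_rice.csv")  # hypothetical file name

    # With a path argument, to_csv writes the file and returns None.
    result = toy.to_csv(out_path, index=False)

    # Read the file back to confirm that what was written matches the frame.
    reloaded = pd.read_csv(out_path)

print(reloaded)
```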