In [96]:
import plotly.io as pio

pio.renderers.default = "vscode+jupyterlab+notebook_connected"

# Research Focus: Deforestation and Food Security

Deforestation, driven by industrialization and agricultural expansion, directly impacts global food security. Approximately **40% of the resources used in the food industry rely on nature**, including forests, which provide critical ecosystem services essential for sustainable agriculture. Forests regulate rainfall, prevent soil erosion, support biodiversity (like pollinators and pest controllers), and stabilize the climate. Their destruction disrupts food supply chains, decreases crop yields, and reduces dietary diversity, particularly in regions heavily reliant on forest ecosystems. These cascading effects highlight the urgent need to understand and address the relationship between deforestation and food security.

1. **Research Question**:  
   What is the relationship between deforestation and the Global Food Security Index across key sub-regions from 2012 to 2022?

2. **Hypothesis**:  
   Increasing rates of deforestation negatively impact the Global Food Security Index by:
   - Reducing agricultural productivity.
   - Disrupting food supply chains.
   - Decreasing dietary diversity.  

These effects are expected to vary significantly across sub-regions, with tropical regions experiencing greater vulnerabilities compared to temperate regions.

### Data Sources

1. **Forest Dataset**:  
   Sourced from [Global Forest Watch](https://www.globalforestwatch.org/dashboards/global), an organization dedicated to providing real-time data and tools for monitoring forests worldwide.

2. **Global Food Security Index Dataset**:  
   Obtained from [Economist Impact](https://impact.economist.com/sustainability/project/food-security-index/), which publishes the Global Food Security Index.

In [60]:
import pandas as pd
import plotly.express as px

## Load the **Global Food Security Index Data**

In [61]:
fsi = pd.read_excel("fsi.xlsx")
fsi

Unnamed: 0,year,index,Algeria,Angola,Argentina,Australia,Austria,Azerbaijan,Bahrain,Bangladesh,...,Ukraine,United Arab Emirates,United Kingdom,United States,Uruguay,Uzbekistan,Venezuela,Vietnam,Yemen,Zambia
0,2012,FOOD SECURITY ENVIRONMENT,50.5,42.9,63.5,70.8,74.4,56.9,64.7,47.1,...,55.8,63.2,71.6,76.7,60.9,50.4,47.5,54.5,40.0,45.3
1,2013,FOOD SECURITY ENVIRONMENT,47.8,41.9,63.4,73.8,74.0,61.6,64.5,51.9,...,54.3,61.4,74.9,77.3,66.9,51.4,48.4,60.8,42.5,46.5
2,2014,FOOD SECURITY ENVIRONMENT,52.7,41.9,63.6,76.1,76.0,63.8,65.9,53.0,...,56.8,62.1,73.9,78.1,67.0,49.9,47.3,64.6,41.2,46.0
3,2015,FOOD SECURITY ENVIRONMENT,54.9,43.3,60.0,74.8,77.6,65.4,65.2,52.9,...,53.1,61.5,76.5,76.5,68.6,52.3,47.7,64.7,46.2,45.6
4,2016,FOOD SECURITY ENVIRONMENT,57.9,42.4,64.4,75.6,77.4,62.8,65.5,54.2,...,49.5,60.3,77.0,78.9,69.8,55.0,50.7,66.2,43.3,47.3
5,2017,FOOD SECURITY ENVIRONMENT,58.3,39.2,64.0,76.1,78.0,59.0,66.1,57.1,...,54.5,63.9,77.7,79.3,70.5,57.7,48.2,64.0,39.4,41.5
6,2018,FOOD SECURITY ENVIRONMENT,58.5,40.6,64.5,77.1,77.5,58.2,69.5,56.5,...,52.4,71.6,76.9,78.9,75.3,52.1,47.5,67.3,38.9,45.5
7,2019,FOOD SECURITY ENVIRONMENT,58.1,42.4,62.3,75.7,78.2,62.4,69.4,54.8,...,55.5,72.9,78.4,78.7,74.2,51.4,45.4,65.6,38.1,45.5
8,2020,FOOD SECURITY ENVIRONMENT,61.1,43.6,65.5,73.8,78.9,63.3,68.6,54.2,...,57.8,73.7,78.8,79.1,74.1,53.1,45.3,65.5,39.9,46.6
9,2021,FOOD SECURITY ENVIRONMENT,62.5,45.5,64.7,70.7,77.7,60.8,69.3,53.6,...,60.6,73.6,79.3,78.7,69.2,54.5,44.0,62.7,39.8,44.7


## Explanation: Transforming the Global Food Security Index Dataset

The Global Food Security Index dataset is stored in a **wide format**, with one column per country. For analysis we reshape it into a **long format**, where each row holds one country-year observation and the country name sits in a single column. This makes the data easier to filter, group, and merge.

In [62]:
fsi_long = pd.melt(
    fsi,
    id_vars=["year", "index"],
    var_name="country_1",
    value_name="Value"
)
fsi_long

Unnamed: 0,year,index,country_1,Value
0,2012,FOOD SECURITY ENVIRONMENT,Algeria,50.5
1,2013,FOOD SECURITY ENVIRONMENT,Algeria,47.8
2,2014,FOOD SECURITY ENVIRONMENT,Algeria,52.7
3,2015,FOOD SECURITY ENVIRONMENT,Algeria,54.9
4,2016,FOOD SECURITY ENVIRONMENT,Algeria,57.9
...,...,...,...,...
1238,2018,FOOD SECURITY ENVIRONMENT,Zambia,45.5
1239,2019,FOOD SECURITY ENVIRONMENT,Zambia,45.5
1240,2020,FOOD SECURITY ENVIRONMENT,Zambia,46.6
1241,2021,FOOD SECURITY ENVIRONMENT,Zambia,44.7


## Load the **Deforestation Data**

In [80]:
deforestation = pd.read_csv('treecover_loss__ha.csv')
deforestation

Unnamed: 0,iso,umd_tree_cover_loss__year,umd_tree_cover_loss__ha,gfw_gross_emissions_co2e_all_gases__Mg
0,AFG,2001,88.092712,2.324262e+04
1,AGO,2001,101220.621525,3.747955e+07
2,AIA,2001,3.878461,6.495611e+02
3,ALA,2001,396.934826,7.429655e+04
4,ALB,2001,3729.021031,1.365790e+06
...,...,...,...,...
4566,XKO,2023,1465.438575,8.512107e+05
4567,XNC,2023,41.029104,1.414011e+04
4568,ZAF,2023,29571.219239,2.619368e+07
4569,ZMB,2023,190416.586825,7.545991e+07


### Data Preprocessing:

- The forest dataset from **Global Forest Watch** includes a column named `iso`, which contains country codes following the International Organization for Standardization (ISO) standard. However, it does not include country names or additional specifications.
- To address this, we need to identify and load an additional dataset containing ISO codes alongside country names and specifications. This will allow us to link the forest data to corresponding countries for further analysis.
- Based on the output of the `info()` function, we can confirm that the data types for each column are appropriately formatted for analysis.
- To improve readability, we need to rename the column `umd_tree_cover_loss__year` to `year`.
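
The data-type check mentioned above can be sketched as follows. This is a minimal illustration on a hypothetical three-row sample that mirrors the Global Forest Watch column names; the variable names `sample`, `year_is_integer`, and `loss_is_float` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample mirroring the Global Forest Watch columns (values copied from the preview above).
sample = pd.DataFrame({
    "iso": ["AFG", "AGO", "ALB"],
    "umd_tree_cover_loss__year": [2001, 2001, 2001],
    "umd_tree_cover_loss__ha": [88.092712, 101220.621525, 3729.021031],
})

# Prints column names, non-null counts, and dtypes.
sample.info()

# The same check expressed programmatically.
year_is_integer = pd.api.types.is_integer_dtype(sample["umd_tree_cover_loss__year"])
loss_is_float = pd.api.types.is_float_dtype(sample["umd_tree_cover_loss__ha"])
print(year_is_integer, loss_is_float)
```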

In [77]:
deforestation.rename(columns={
    'umd_tree_cover_loss__year': 'year',
}, inplace=True)
deforestation

Unnamed: 0,iso,year,umd_tree_cover_loss__ha,gfw_gross_emissions_co2e_all_gases__Mg
0,AFG,2001,88.092712,2.324262e+04
1,AGO,2001,101220.621525,3.747955e+07
2,AIA,2001,3.878461,6.495611e+02
3,ALA,2001,396.934826,7.429655e+04
4,ALB,2001,3729.021031,1.365790e+06
...,...,...,...,...
4566,XKO,2023,1465.438575,8.512107e+05
4567,XNC,2023,41.029104,1.414011e+04
4568,ZAF,2023,29571.219239,2.619368e+07
4569,ZMB,2023,190416.586825,7.545991e+07


### Preparing Data for Sub-Region Analysis

To analyze the relationship between deforestation and food security at the sub-region level, we need data that includes sub-region information. Since our deforestation dataset contains ISO codes, we can load a dataset with ISO code mappings to sub-regions and merge it with our deforestation data to include the necessary sub-region details.
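
Before merging, it can help to confirm that the mapping has exactly one row per ISO code, since duplicated codes would silently duplicate deforestation rows. The sketch below uses small hypothetical frames (`loss` and `regions`, with made-up loss values) and the `validate` argument of `merge`, which raises an error if the right-hand keys are not unique.

```python
import pandas as pd

# Hypothetical deforestation rows: several years per ISO code.
loss = pd.DataFrame({
    "iso": ["AFG", "AFG", "ZMB"],
    "year": [2001, 2002, 2001],
    "umd_tree_cover_loss__ha": [88.1, 78.3, 190416.6],
})

# Hypothetical mapping: one row per ISO code.
regions = pd.DataFrame({
    "alpha-3": ["AFG", "ZMB"],
    "continent_sub_region": ["Southern Asia", "Sub-Saharan Africa"],
})

# validate="many_to_one" raises MergeError if "alpha-3" contains duplicates.
merged = loss.merge(
    regions,
    left_on="iso",
    right_on="alpha-3",
    how="left",
    validate="many_to_one",
)
print(merged)
```

A left merge that passes this check keeps the row count of the deforestation data unchanged.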

In [81]:
iso = pd.read_csv("continents_region.csv")
iso

Unnamed: 0,alpha-3,name,continent_region,continent_sub_region
0,AFG,Afghanistan,Asia,Southern Asia
1,ALA,Åland Islands,Europe,Northern Europe
2,ALB,Albania,Europe,Southern Europe
3,DZA,Algeria,Africa,Northern Africa
4,ASM,American Samoa,Oceania,Polynesia
...,...,...,...,...
244,WLF,Wallis and Futuna,Oceania,Polynesia
245,ESH,Western Sahara,Africa,Northern Africa
246,YEM,Yemen,Asia,Western Asia
247,ZMB,Zambia,Africa,Sub-Saharan Africa


In [84]:
iso.rename(columns={
    'name': 'country',
}, inplace=True)
iso

Unnamed: 0,alpha-3,country,continent_region,continent_sub_region
0,AFG,Afghanistan,Asia,Southern Asia
1,ALA,Åland Islands,Europe,Northern Europe
2,ALB,Albania,Europe,Southern Europe
3,DZA,Algeria,Africa,Northern Africa
4,ASM,American Samoa,Oceania,Polynesia
...,...,...,...,...
244,WLF,Wallis and Futuna,Oceania,Polynesia
245,ESH,Western Sahara,Africa,Northern Africa
246,YEM,Yemen,Asia,Western Asia
247,ZMB,Zambia,Africa,Sub-Saharan Africa


#### We are now ready to merge the deforestation dataset with the continent data to include detailed regional information.

In [85]:
deforestation_iso = pd.merge(deforestation,
                             iso, left_on="iso", right_on="alpha-3",
                             how="left")
deforestation_iso

Unnamed: 0,iso,year,umd_tree_cover_loss__ha,gfw_gross_emissions_co2e_all_gases__Mg,alpha-3,country,continent_region,continent_sub_region
0,AFG,2001,88.092712,2.324262e+04,AFG,Afghanistan,Asia,Southern Asia
1,AGO,2001,101220.621525,3.747955e+07,AGO,Angola,Africa,Sub-Saharan Africa
2,AIA,2001,3.878461,6.495611e+02,AIA,Anguilla,Americas,Latin America and the Caribbean
3,ALA,2001,396.934826,7.429655e+04,ALA,Åland Islands,Europe,Northern Europe
4,ALB,2001,3729.021031,1.365790e+06,ALB,Albania,Europe,Southern Europe
...,...,...,...,...,...,...,...,...
4566,XKO,2023,1465.438575,8.512107e+05,,,,
4567,XNC,2023,41.029104,1.414011e+04,,,,
4568,ZAF,2023,29571.219239,2.619368e+07,ZAF,South Africa,Africa,Sub-Saharan Africa
4569,ZMB,2023,190416.586825,7.545991e+07,ZMB,Zambia,Africa,Sub-Saharan Africa


As observed, there are still several ISO codes that are not included in our region dataset. To address this, we need to identify these missing codes. If possible, we should supplement the dataset by filling in the appropriate country names and regional information from other reliable sources.

In [86]:
nan_data = deforestation_iso[deforestation_iso['continent_region'].isna()]
nan_data

Unnamed: 0,iso,year,umd_tree_cover_loss__ha,gfw_gross_emissions_co2e_all_gases__Mg,alpha-3,country,continent_region,continent_sub_region
200,XAD,2001,1.648989,382.712322,,,,
201,XCA,2001,9.735251,352.238546,,,,
202,XKO,2001,1122.205429,444640.900791,,,,
203,XNC,2001,17.661583,4908.941633,,,,
407,XAD,2002,0.507202,143.618419,,,,
...,...,...,...,...,...,...,...,...
4372,XKO,2022,784.685260,445103.190913,,,,
4373,XNC,2022,527.159030,75621.874984,,,,
4565,XCA,2023,0.989440,104.545602,,,,
4566,XKO,2023,1465.438575,851210.665131,,,,


In [87]:
nan_data['iso'].unique()

array(['XAD', 'XCA', 'XKO', 'XNC'], dtype=object)

According to ISO 3166-1, alpha-3 codes starting with "X" are reserved for user assignment and do not represent officially recognized countries. As far as I can tell, Global Forest Watch reports tree cover loss against GADM administrative boundaries, and GADM uses these codes for territories that lack an official ISO code. Under that convention the four unmatched codes appear to correspond to:

- **XAD**: **Akrotiri and Dhekelia**, the British Sovereign Base Areas on Cyprus.  
- **XCA**: the **Caspian Sea**, a water body rather than a country.  
- **XKO**: **Kosovo**.  
- **XNC**: **Northern Cyprus**.

Andorra and New Caledonia have their own ISO codes (`AND` and `NCL`), so they are not the source of these rows. Because **XCA** is not a country and has no sub-region, I have decided to drop it from the analysis.

In [88]:
# Region assignments for XAD and XNC follow Cyprus (Western Asia); this is an assumption.
update_values = {
    'XAD': {'country': 'Akrotiri and Dhekelia', 'continent_region': 'Asia', 'continent_sub_region': 'Western Asia'},
    'XKO': {'country': 'Kosovo', 'continent_region': 'Europe', 'continent_sub_region': 'Southern Europe'},
    'XNC': {'country': 'Northern Cyprus', 'continent_region': 'Asia', 'continent_sub_region': 'Western Asia'}
}

# Loop over `code` so the `iso` DataFrame is not overwritten, and write to the
# existing `country` column (it was renamed from `name` earlier).
columns = ['country', 'continent_region', 'continent_sub_region']
for code, values in update_values.items():
    deforestation_iso.loc[deforestation_iso['iso'] == code, columns] = [values[c] for c in columns]
deforestation_iso

Unnamed: 0,iso,year,umd_tree_cover_loss__ha,gfw_gross_emissions_co2e_all_gases__Mg,alpha-3,country,continent_region,continent_sub_region
0,AFG,2001,88.092712,2.324262e+04,AFG,Afghanistan,Asia,Southern Asia
1,AGO,2001,101220.621525,3.747955e+07,AGO,Angola,Africa,Sub-Saharan Africa
2,AIA,2001,3.878461,6.495611e+02,AIA,Anguilla,Americas,Latin America and the Caribbean
3,ALA,2001,396.934826,7.429655e+04,ALA,Åland Islands,Europe,Northern Europe
4,ALB,2001,3729.021031,1.365790e+06,ALB,Albania,Europe,Southern Europe
...,...,...,...,...,...,...,...,...
4566,XKO,2023,1465.438575,8.512107e+05,,Kosovo,Europe,Southern Europe
4567,XNC,2023,41.029104,1.414011e+04,,Northern Cyprus,Asia,Western Asia
4568,ZAF,2023,29571.219239,2.619368e+07,ZAF,South Africa,Africa,Sub-Saharan Africa
4569,ZMB,2023,190416.586825,7.545991e+07,ZMB,Zambia,Africa,Sub-Saharan Africa


#### Dropping the XCA Data

In [95]:
deforestation_iso = deforestation_iso[deforestation_iso['iso'] != 'XCA']
deforestation_iso

Unnamed: 0,iso,year,umd_tree_cover_loss__ha,gfw_gross_emissions_co2e_all_gases__Mg,alpha-3,country,continent_region,continent_sub_region
0,AFG,2001,88.092712,2.324262e+04,AFG,Afghanistan,Asia,Southern Asia
1,AGO,2001,101220.621525,3.747955e+07,AGO,Angola,Africa,Sub-Saharan Africa
2,AIA,2001,3.878461,6.495611e+02,AIA,Anguilla,Americas,Latin America and the Caribbean
3,ALA,2001,396.934826,7.429655e+04,ALA,Åland Islands,Europe,Northern Europe
4,ALB,2001,3729.021031,1.365790e+06,ALB,Albania,Europe,Southern Europe
...,...,...,...,...,...,...,...,...
4566,XKO,2023,1465.438575,8.512107e+05,,Kosovo,Europe,Southern Europe
4567,XNC,2023,41.029104,1.414011e+04,,Northern Cyprus,Asia,Western Asia
4568,ZAF,2023,29571.219239,2.619368e+07,ZAF,South Africa,Africa,Sub-Saharan Africa
4569,ZMB,2023,190416.586825,7.545991e+07,ZMB,Zambia,Africa,Sub-Saharan Africa


In [93]:
print(deforestation_iso.columns.tolist())

['iso', 'year', 'umd_tree_cover_loss__ha', 'gfw_gross_emissions_co2e_all_gases__Mg', 'alpha-3', 'country', 'continent_region', 'continent_sub_region']


In [14]:
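# Align the `country` column with `country_1` in fsi_long before joining on country and year.
data = pd.merge(deforestation_iso.rename(columns={"country": "country_1"}), fsi_long,
                on=["country_1", "year"], how="inner")
data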

Unnamed: 0,iso,year,umd_tree_cover_loss__ha,gfw_gross_emissions_co2e_all_gases__Mg,alpha-3,country_1,continent_region,continent_sub_region,index,Value
0,AGO,2012,180754.820684,5.826792e+07,AGO,Angola,Africa,Sub-Saharan Africa,FOOD SECURITY ENVIRONMENT,42.9
1,ARE,2012,0.000000,7.844774e+00,ARE,United Arab Emirates,Asia,Western Asia,FOOD SECURITY ENVIRONMENT,63.2
2,ARG,2012,478177.910644,1.164256e+08,ARG,Argentina,Americas,Latin America and the Caribbean,FOOD SECURITY ENVIRONMENT,63.5
3,AUS,2012,139802.757737,4.699907e+07,AUS,Australia,Oceania,Australia and New Zealand,FOOD SECURITY ENVIRONMENT,70.8
4,AUT,2012,11158.429234,5.684967e+06,AUT,Austria,Europe,Western Europe,FOOD SECURITY ENVIRONMENT,74.4
...,...,...,...,...,...,...,...,...,...,...
1114,UZB,2022,7.489417,2.529865e+03,UZB,Uzbekistan,Asia,Central Asia,FOOD SECURITY ENVIRONMENT,57.5
1115,VEN,2022,55789.615801,2.981618e+07,VEN,Venezuela,Americas,Latin America and the Caribbean,FOOD SECURITY ENVIRONMENT,42.6
1116,VNM,2022,165142.058464,1.420704e+08,VNM,Vietnam,Asia,South-eastern Asia,FOOD SECURITY ENVIRONMENT,67.9
1117,ZAF,2022,40979.718696,3.413527e+07,ZAF,South Africa,Africa,Sub-Saharan Africa,FOOD SECURITY ENVIRONMENT,61.7
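
Because this merge joins on country names rather than codes, an inner join silently drops any country whose spelling differs between the two sources. A minimal sketch of how such mismatches could be listed is shown below; the two sample frames and the spelling difference are hypothetical and only illustrate the technique, which relies on the `indicator` argument of `merge`.

```python
import pandas as pd

# Hypothetical samples: the same country spelled differently in each source.
forest = pd.DataFrame({
    "country_1": ["Angola", "Viet Nam"],
    "year": [2012, 2012],
    "umd_tree_cover_loss__ha": [180754.8, 165142.1],
})
fsi_sample = pd.DataFrame({
    "country_1": ["Angola", "Vietnam"],
    "year": [2012, 2012],
    "Value": [42.9, 54.5],
})

# A left join with indicator=True marks rows that found no partner as "left_only".
check = fsi_sample.merge(forest, on=["country_1", "year"], how="left", indicator=True)
unmatched = sorted(check.loc[check["_merge"] == "left_only", "country_1"].unique())
print(unmatched)
```

Any names reported this way could then be harmonized, or the merge could be switched to ISO codes if the food security data provides them.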


In [15]:
data.drop(columns=['gfw_gross_emissions_co2e_all_gases__Mg', 'alpha-3'], inplace=True)
data

Unnamed: 0,iso,year,umd_tree_cover_loss__ha,country_1,continent_region,continent_sub_region,index,Value
0,AGO,2012,180754.820684,Angola,Africa,Sub-Saharan Africa,FOOD SECURITY ENVIRONMENT,42.9
1,ARE,2012,0.000000,United Arab Emirates,Asia,Western Asia,FOOD SECURITY ENVIRONMENT,63.2
2,ARG,2012,478177.910644,Argentina,Americas,Latin America and the Caribbean,FOOD SECURITY ENVIRONMENT,63.5
3,AUS,2012,139802.757737,Australia,Oceania,Australia and New Zealand,FOOD SECURITY ENVIRONMENT,70.8
4,AUT,2012,11158.429234,Austria,Europe,Western Europe,FOOD SECURITY ENVIRONMENT,74.4
...,...,...,...,...,...,...,...,...
1114,UZB,2022,7.489417,Uzbekistan,Asia,Central Asia,FOOD SECURITY ENVIRONMENT,57.5
1115,VEN,2022,55789.615801,Venezuela,Americas,Latin America and the Caribbean,FOOD SECURITY ENVIRONMENT,42.6
1116,VNM,2022,165142.058464,Vietnam,Asia,South-eastern Asia,FOOD SECURITY ENVIRONMENT,67.9
1117,ZAF,2022,40979.718696,South Africa,Africa,Sub-Saharan Africa,FOOD SECURITY ENVIRONMENT,61.7


In [16]:
data.rename(columns={
    'Value': 'fsi',
}, inplace=True)
data

Unnamed: 0,iso,year,umd_tree_cover_loss__ha,country_1,continent_region,continent_sub_region,index,fsi
0,AGO,2012,180754.820684,Angola,Africa,Sub-Saharan Africa,FOOD SECURITY ENVIRONMENT,42.9
1,ARE,2012,0.000000,United Arab Emirates,Asia,Western Asia,FOOD SECURITY ENVIRONMENT,63.2
2,ARG,2012,478177.910644,Argentina,Americas,Latin America and the Caribbean,FOOD SECURITY ENVIRONMENT,63.5
3,AUS,2012,139802.757737,Australia,Oceania,Australia and New Zealand,FOOD SECURITY ENVIRONMENT,70.8
4,AUT,2012,11158.429234,Austria,Europe,Western Europe,FOOD SECURITY ENVIRONMENT,74.4
...,...,...,...,...,...,...,...,...
1114,UZB,2022,7.489417,Uzbekistan,Asia,Central Asia,FOOD SECURITY ENVIRONMENT,57.5
1115,VEN,2022,55789.615801,Venezuela,Americas,Latin America and the Caribbean,FOOD SECURITY ENVIRONMENT,42.6
1116,VNM,2022,165142.058464,Vietnam,Asia,South-eastern Asia,FOOD SECURITY ENVIRONMENT,67.9
1117,ZAF,2022,40979.718696,South Africa,Africa,Sub-Saharan Africa,FOOD SECURITY ENVIRONMENT,61.7


In [29]:
deforest_agg = data.groupby(['year', 'continent_sub_region']).agg(
    {'umd_tree_cover_loss__ha': 'sum', 'fsi': 'mean'}
).reset_index()

fig = px.line(
    deforest_agg,
    x="year", 
    y="umd_tree_cover_loss__ha", 
    color="continent_sub_region",
    labels={
        "year": "Year",
        "umd_tree_cover_loss__ha": "Tree Cover Loss (ha)",
        "continent_sub_region": "Sub-region",
    },
    title="Tree Cover Loss by Sub-Region Over Time"
)
fig.show()

In [30]:
deforest_agg

Unnamed: 0,year,continent_sub_region,umd_tree_cover_loss__ha,fsi
0,2012,Australia and New Zealand,2.093013e+05,71.700000
1,2012,Central Asia,3.822932e+03,53.400000
2,2012,Eastern Asia,6.673582e+05,66.333333
3,2012,Eastern Europe,5.616977e+06,64.066667
4,2012,Latin America and the Caribbean,5.429153e+06,58.533333
...,...,...,...,...
149,2022,Southern Asia,1.469098e+05,55.440000
150,2022,Southern Europe,2.276766e+05,72.400000
151,2022,Sub-Saharan Africa,1.886993e+06,47.384000
152,2022,Western Asia,5.408784e+04,59.050000


In [31]:
fig = px.line(
    deforest_agg,
    x="year", 
    y="fsi", 
    color="continent_sub_region",
    labels={
        "year": "Year",
        "fsi": "Average Food Security Index",
        "continent_sub_region": "Sub-region",
    },
    title="Average Food Security Index by Sub-Region Over Time"
)
fig.show()

In [32]:
fig = px.scatter(
    data,
    x="umd_tree_cover_loss__ha",
    y="fsi",
    color="continent_sub_region",
    trendline="ols",  # OLS trendlines require the statsmodels package
    title="Correlation Between Deforestation and Food Security Index",
    labels={
        "umd_tree_cover_loss__ha": "Tree Cover Loss (ha)",
        "fsi": "Food Security Index",
        "continent_sub_region": "Sub-region"
    }
)

fig.show()

In [77]:
data['umd_tree_cover_loss__ha'].describe()

count    1.119000e+03
mean     2.196162e+05
std      6.693427e+05
min      0.000000e+00
25%      3.717139e+03
50%      3.389930e+04
75%      1.574755e+05
max      6.518852e+06
Name: umd_tree_cover_loss__ha, dtype: float64

In [78]:
data['fsi'].describe()

count    1119.000000
mean       61.036729
std        12.577975
min        32.800000
25%        50.250000
50%        61.500000
75%        72.450000
max        84.300000
Name: fsi, dtype: float64

In [33]:
country_mean = data.groupby('country_1').agg({
    'umd_tree_cover_loss__ha':'mean',
    'fsi':'mean'
}).reset_index()
country_mean

Unnamed: 0,country_1,umd_tree_cover_loss__ha,fsi
0,Algeria,13282.078766,56.472727
1,Angola,224096.321401,42.490909
2,Argentina,244571.491840,63.700000
3,Australia,569075.851311,74.536364
4,Austria,17693.888186,77.072727
...,...,...,...
101,Uruguay,19906.976771,69.845455
102,Uzbekistan,21.907540,53.111111
103,Venezuela,110876.558798,46.781818
104,Vietnam,217915.164426,63.981818


In [34]:
fig = px.scatter(
    country_mean,
    x='umd_tree_cover_loss__ha',
    y='fsi',
    text='country_1',
    labels={
        'umd_tree_cover_loss__ha': 'Average Tree Cover Loss (ha)',
        'fsi': 'Average Food Security Index'},
    title='Average Tree Cover Loss vs Average Food Security Index by Country, 2012-2022')

fig.update_traces(marker=dict(size=10, opacity=0.7), textposition='top center')
fig.update_layout(
    title_font_size=16,
    xaxis_title='Average Tree Cover Loss (ha)',
    yaxis_title='Average Food Security Index',
    template='plotly_white'
)
fig.show()

In [35]:
from scipy.stats import pearsonr

results = []
for region, group in deforest_agg.groupby('continent_sub_region'):
    corr, p_value = pearsonr(group['umd_tree_cover_loss__ha'], group['fsi'])
    results.append({
        'continent_sub_region': region,
        'correlation': corr,
        'p_value': p_value,
        'is_significant': 'Yes' if p_value < 0.05 else 'No'
    })


correlations = pd.DataFrame(results)
correlations

Unnamed: 0,continent_sub_region,correlation,p_value,is_significant
0,Australia and New Zealand,0.223145,0.509545,No
1,Central Asia,-0.367052,0.266821,No
2,Eastern Asia,0.052871,0.877307,No
3,Eastern Europe,0.180454,0.595441,No
4,Latin America and the Caribbean,0.260004,0.440037,No
5,Northern Africa,0.310776,0.352273,No
6,Northern America,-0.454645,0.160042,No
7,Northern Europe,0.872534,0.000458,Yes
8,South-eastern Asia,-0.440085,0.175557,No
9,Southern Asia,0.671392,0.023692,Yes


In [38]:
fig = px.bar(
    correlations,
    x='continent_sub_region',
    y='correlation',
    title="Correlation Between Tree Cover Loss and Food Security Index by Sub-Region (Significance Highlighted)",
    labels={
        "continent_sub_region": "Sub-region",
        "correlation": "Correlation Coefficient"
    },
    color='is_significant',
    text='correlation',
    # custom_data is split per color trace, so each bar keeps its own significance flag
    custom_data=['is_significant'],
    color_discrete_map={'Yes': 'green', 'No': 'red'}
)

fig.update_traces(
    texttemplate='%{text:.2f} (%{customdata[0]})',
    textposition='outside'
)

fig.update_layout(
    showlegend=True,
    yaxis_title="Correlation Coefficient",
    legend_title="Significance",
)

fig.show()

In [40]:
from scipy.stats import linregress

regions = deforest_agg.groupby('continent_sub_region')

simple_regression_results = []


for region, group in regions:
    slope, intercept, r_value, p_value, std_err = linregress(
        group['umd_tree_cover_loss__ha'], group['fsi']
    )
    
    simple_regression_results.append({
        'region': region,
        'slope': slope,
        'intercept': intercept,
        'r_squared': r_value**2,
        'p_value': p_value,
        'significant': p_value < 0.05
    })

simple_regression_df = pd.DataFrame(simple_regression_results)
print(simple_regression_df)

                             region         slope  intercept  r_squared  \
0         Australia and New Zealand  5.067937e-07  75.198293   0.049794   
1                      Central Asia -1.159647e-03  61.207192   0.134727   
2                      Eastern Asia  1.325808e-06  71.063624   0.002795   
3                    Eastern Europe  3.562000e-07  66.355645   0.032564   
4   Latin America and the Caribbean  3.722344e-07  60.866095   0.067602   
5                   Northern Africa  5.385812e-05  52.598363   0.096582   
6                  Northern America -1.767842e-06  84.392878   0.206702   
7                   Northern Europe  1.116366e-05  72.357273   0.761315   
8                South-eastern Asia -1.467751e-06  64.201733   0.193675   
9                     Southern Asia  5.017202e-05  46.170550   0.450768   
10                  Southern Europe  1.689159e-05  68.644102   0.269762   
11               Sub-Saharan Africa  2.195476e-06  42.243900   0.275676   
12                     We
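
The slopes above are expressed in index points per hectare, which makes them look vanishingly small. A short sketch of rescaling them to index points per 100,000 ha is given below, using two slopes copied from the printed table; the dictionary names are assumptions made for the example.

```python
# Slopes copied from the regression output above (index points per hectare).
slopes_per_ha = {
    "Northern Europe": 1.116366e-05,
    "Southern Asia": 5.017202e-05,
}

# Rescale to index points per 100,000 hectares for easier reading.
slopes_per_100k_ha = {region: slope * 100_000 for region, slope in slopes_per_ha.items()}

for region, slope in slopes_per_100k_ha.items():
    print(f"{region}: {slope:.3f} index points per 100,000 ha")
```

On this scale, the Northern Europe fit associates each additional 100,000 ha of tree cover loss with roughly 1.1 index points. This describes the fitted line only and does not establish a causal effect.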

In [41]:
fig = px.bar(
    simple_regression_df,
    x='region',
    y='r_squared',
    color='significant',  # Highlight statistically significant regions
    title="R-squared and Statistical Significance by Region",
    labels={'region': 'Region', 'r_squared': 'R-squared Value'},
    text='r_squared'
)
fig.update_traces(texttemplate='%{text:.2f}', textposition='outside')
fig.show()