# Group Project Jerry Maleka

### Project Title: World Forest Area (km² and %)
#### Done By: Jerry Maleka

© ExploreAI 2024

---

<a id="cont"></a>
## Table of Contents

<a href=#BC>Project Overview</a>

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Data Collection and Description</a>

<a href=#three>3. Loading Data </a>

<a href=#four>4. Data Cleaning and Filtering</a>

<a href=#five>5. Exploratory Data Analysis (EDA)</a>

<a href=#six>6. Modeling </a>

<a href=#seven>7. Evaluation and Validation</a>

<a href=#eight>8. Final Model</a>

<a href=#nine>9. Conclusion and Future Work</a>

<a href=#ten>10. References</a>

---
 <a id="BC"></a>
## **Project Overview**
<a href=#cont>Back to Table of Contents</a>


The project aims to analyze changes in forest areas across different countries from 1990 to 2021. Two datasets are used:

1. Forest Area in Square Kilometers: Contains the total forest area in km² for each country, by year.
2. Forest Area as a Percentage of Total Land Area: Shows the percentage of each country's land area covered by forest, by year.


The analysis focuses on identifying trends, understanding regional differences, and providing insights into forest conservation efforts globally. The ultimate goal is to highlight countries with significant changes in forest area and suggest additional data that could enrich the analysis.


---
<a id="one"></a>
## **Importing Packages**
<a href=#cont>Back to Table of Contents</a>


Python packages are necessary for data manipulation, analysis, and visualization.

* **Purpose:** Set up the Python environment with necessary libraries and tools.
* **Details:** List and import all the Python packages that will be used throughout the project such as Pandas for data manipulation, Matplotlib/Seaborn for visualization, scikit-learn for modeling, etc.
---

In [2]:
#Importing all important packages

import pandas as pd                  # For data manipulation and analysis
import numpy as np                   # For numerical operations
import matplotlib.pyplot as plt      # For data visualization
import seaborn as sns                # For enhanced data visualization


---
<a id="two"></a>
## **Data Collection and Description**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Describe how the data was collected and provide an overview of its characteristics.
* **Details:** The data consists of three CSV files: `forest_area_km.csv` (forest area in km²), `forest_area_percent.csv` (forest area as a percentage of total land area), and `Population_per_countries_overtime.csv` (population per country over time). The forest area table is in wide format: one row per country, identified by `Country Name` and `Country Code`, and one column per year from 1990 to 2021.
---

In [None]:
#Please use code cells to code in and do not forget to comment your code.
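A quick structural summary can make the description above concrete. The following is a minimal sketch, not the project's own code: `describe_table` is a hypothetical helper, and `sample` is a toy stand-in that reuses a few values from the `head()` output of `forest_area_km` shown in the Loading Data section. It assumes year columns are labelled with digit-only strings such as `'1990'`.

```python
import pandas as pd

# Toy stand-in for forest_area_km (values copied from the head() output in this notebook).
sample = pd.DataFrame({
    "Country Name": ["Afghanistan", "Albania", "Algeria"],
    "Country Code": ["AFG", "ALB", "DZA"],
    "1990": [12084.4, 7888.0, 16670.0],
    "1991": [12084.4, 7868.5, 16582.0],
})


def describe_table(df: pd.DataFrame) -> dict:
    """Summarise a wide country-by-year table (hypothetical helper)."""
    # Assumption: every year column has a digit-only label such as '1990'.
    year_cols = [c for c in df.columns if c.isdigit()]
    return {
        "n_countries": int(df["Country Code"].nunique()),
        "first_year": min(year_cols),
        "last_year": max(year_cols),
        "n_missing": int(df[year_cols].isna().sum().sum()),
    }


summary = describe_table(sample)
print(summary)
```

Running the same helper on each loaded table would give a comparable overview of coverage and missing values.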

---
<a id="three"></a>
## **Loading Data**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Load the data into the notebook for manipulation and analysis.
* **Details:** Show the code used to load the data and display the first few rows to give a sense of what the raw data looks like.
---

In [7]:
#Data Loading
# Loading the forest area data in square kilometers

forest_area_km = pd.read_csv('forest_area_km.csv')

# Loading the forest area percentage data
forest_area_percent = pd.read_csv('forest_area_percent.csv')

# Loading the population per country over time data
Population_per_countries_overtime = pd.read_csv('Population_per_countries_overtime.csv')

# Display the first few rows of the DataFrame

print(forest_area_km.head())
print(forest_area_percent.head())
print(Population_per_countries_overtime.head())

     Country Name Country Code     1990      1991      1992      1993  \
0     Afghanistan          AFG  12084.4  12084.40  12084.40  12084.40   
1         Albania          ALB   7888.0   7868.50   7849.00   7829.50   
2         Algeria          DZA  16670.0  16582.00  16494.00  16406.00   
3  American Samoa          ASM    180.7    180.36    180.02    179.68   
4         Andorra          AND    160.0    160.00    160.00    160.00   

       1994     1995      1996      1997  ...      2012       2013      2014  \
0  12084.40  12084.4  12084.40  12084.40  ...  12084.40  12084.400  12084.40   
1   7810.00   7790.5   7771.00   7751.50  ...   7849.17   7863.405   7877.64   
2  16318.00  16230.0  16142.00  16054.00  ...  19332.00  19408.000  19484.00   
3    179.34    179.0    178.66    178.32  ...    173.70    173.400    173.10   
4    160.00    160.0    160.00    160.00  ...    160.00    160.000    160.00   

        2015     2016       2017     2018     2019     2020       2021  
0  1208
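The tables load in wide format (one column per year), which is convenient for reading but awkward for plotting and grouping. One possible way to reshape such a table into long format is `DataFrame.melt`. This is a sketch on a toy frame built from two rows of the output above; the names `wide`, `long_df`, and `forest_area_km2` are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# Two rows and two year columns copied from the head() output above.
wide = pd.DataFrame({
    "Country Name": ["Afghanistan", "Albania"],
    "Country Code": ["AFG", "ALB"],
    "1990": [12084.4, 7888.0],
    "1991": [12084.4, 7868.5],
})

# Turn every year column into a (Year, value) pair per country.
long_df = wide.melt(
    id_vars=["Country Name", "Country Code"],
    var_name="Year",
    value_name="forest_area_km2",  # hypothetical column name
)

# The year labels are strings after melting; convert them to integers.
long_df["Year"] = long_df["Year"].astype(int)
print(long_df)
```

In long format each row is one country-year observation, which is the shape most Seaborn plotting functions expect.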

---
<a id="four"></a>
## **Data Cleaning and Filtering**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Prepare the data for analysis by cleaning and filtering.
* **Details:** Include steps for handling missing values, removing outliers, correcting errors, and possibly reducing the data (filtering based on certain criteria or features).
---

In [8]:
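# 1. Checking for missing values:
# Missing values can skew analysis results, so identify them before deciding how to handle them.
# By default a notebook displays only the last expression in a cell, so the output
# below is for Population_per_countries_overtime; wrap each call in print() to see all three.
forest_area_km.isnull().sum()
forest_area_percent.isnull().sum()
Population_per_countries_overtime.isnull().sum()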


Country Name     0
Country Code     0
1990            44
1991            40
1992            11
1993             8
1994             8
1995             8
1996             8
1997             8
1998             8
1999             8
2000             6
2001             6
2002             6
2003             6
2004             6
2005             6
2006             4
2007             4
2008             4
2009             4
2010             4
2011             1
2012             0
2013             0
2014             0
2015             0
2016             0
2017             0
2018             0
2019             0
2020             0
2021             0
dtype: int64
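To compare missing-value counts across several tables at once, the per-column counts can be collected into a single frame. The sketch below uses made-up toy tables (hypothetical codes and values) rather than the real data; the dictionary name `tables` and its keys are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up toy tables with the same wide layout (hypothetical values).
tables = {
    "km": pd.DataFrame(
        {"Country Code": ["AAA", "BBB"], "1990": [np.nan, 2.0], "1991": [1.0, 2.0]}
    ),
    "percent": pd.DataFrame(
        {"Country Code": ["AAA", "BBB"], "1990": [np.nan, np.nan], "1991": [5.0, 6.0]}
    ),
}

# One column of missing-value counts per table, indexed by column label.
missing = pd.DataFrame({name: df.isnull().sum() for name, df in tables.items()})
print(missing)
```

With the real tables, the dictionary would hold `forest_area_km`, `forest_area_percent`, and `Population_per_countries_overtime`.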

In [9]:
# 2. Handling missing values:
# Depending on the nature of the missing data, either drop the affected rows/columns
# or fill them using forward fill, backward fill, or mean imputation.
# Note: with the default axis, forward fill works down each column, so a gap is filled
# from the row above (a different country), and gaps in the first row stay empty.
# DataFrame.ffill() replaces fillna(method='ffill'), which newer pandas versions deprecate.
forest_area_km.ffill(inplace=True)
forest_area_percent.ffill(inplace=True)
Population_per_countries_overtime.ffill(inplace=True)


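Because each row is a country and each column a year, a forward fill with the default axis copies values from the previous country rather than from the previous year. A possible alternative, sketched here on a made-up toy table with hypothetical countries and values, is to fill along the year axis within each row, restricted to the year columns so that text columns cannot leak into a numeric gap. Whether carrying a neighbouring year's value forward or backward is acceptable depends on the analysis; this is an assumption, not the notebook's method.

```python
import numpy as np
import pandas as pd

# Made-up toy table (hypothetical countries and values) in the same wide layout.
df = pd.DataFrame({
    "Country Name": ["Country A", "Country B"],
    "Country Code": ["AAA", "BBB"],
    "1990": [np.nan, 50.0],
    "1991": [10.0, np.nan],
    "1992": [12.0, 54.0],
})

# Default forward fill works down the rows: Country B's 1991 gap
# is filled with Country A's 1991 value.
down = df.ffill()

# Fill along the year axis instead, using only the year columns.
year_cols = [c for c in df.columns if c.isdigit()]
across = df.copy()
across[year_cols] = df[year_cols].ffill(axis=1).bfill(axis=1)

print(down[year_cols])
print(across[year_cols])
```

The trailing `bfill(axis=1)` handles gaps in the earliest years, which a forward fill alone cannot reach.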


In [10]:
# 3. Data type conversion:
# Ensure the year columns hold numeric values. Areas such as 180.7 km² are fractional,
# so float64 is used here; casting to int64 would truncate them and fails if any NaN remains.

forest_area_km = forest_area_km.astype({'1990': 'float64', '2021': 'float64'})
forest_area_percent = forest_area_percent.astype({'1990': 'float64', '2021': 'float64'})
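To convert every year column rather than two selected ones, `pd.to_numeric` can be applied column by column. The sketch below is illustrative: the two rows reuse values from the `head()` output earlier in this notebook, and the `1990` column is deliberately stored as text (an assumption made only to show the conversion). It also shows what an integer cast does to fractional areas.

```python
import pandas as pd

# Values for American Samoa and Andorra from the head() output;
# the 1990 column is stored as text on purpose (assumption for illustration).
df = pd.DataFrame({
    "Country Code": ["ASM", "AND"],
    "1990": ["180.7", "160.0"],
    "1991": [180.36, 160.0],
})

# Convert all year columns to numeric; unparseable entries become NaN.
year_cols = [c for c in df.columns if c.isdigit()]
df[year_cols] = df[year_cols].apply(pd.to_numeric, errors="coerce")

# An integer cast drops the fractional part of the area.
as_int = df["1990"].astype("int64")

print(df.dtypes)
print(as_int.tolist())
```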


In [11]:
# 4. Removing duplicates:
# Ensure there are no duplicate rows that might affect the analysis.

forest_area_km.drop_duplicates(inplace=True)
forest_area_percent.drop_duplicates(inplace=True)
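`drop_duplicates()` without arguments removes only rows that match in every column. It can be useful to also check for repeated country codes, since two rows for the same country with slightly different values would survive an exact-match check. A minimal sketch follows; the duplicated Albania row is constructed for illustration and does not come from the data.

```python
import pandas as pd

# Toy table with a constructed duplicate row for illustration.
df = pd.DataFrame({
    "Country Name": ["Albania", "Albania", "Algeria"],
    "Country Code": ["ALB", "ALB", "DZA"],
    "1990": [7888.0, 7888.0, 16670.0],
})

# Rows identical in every column.
n_exact = int(df.duplicated().sum())

# Rows that repeat a country code, whatever the other columns contain.
n_key = int(df.duplicated(subset="Country Code").sum())

# Keep the first row per country code.
deduped = df.drop_duplicates(subset="Country Code", keep="first").reset_index(drop=True)

print(n_exact, n_key, len(deduped))
```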


---
<a id="five"></a>
## **Exploratory Data Analysis (EDA)**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Explore and visualize the data to uncover patterns, trends, and relationships.
* **Details:** Use statistics and visualizations to explore the data. This may include histograms, box plots, scatter plots, and correlation matrices. Discuss any significant findings.
---


In [None]:
#Please use code cells to code in and do not forget to comment your code.
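One starting point for the exploration, in line with the stated goal of highlighting countries with significant changes, is to compute the absolute and relative change in forest area between two years and rank countries by it. The sketch below uses the five rows visible in the `head()` output and compares 1990 with 2014, because later columns are cut off in that output; the column names `change_km2` and `change_pct` are illustrative.

```python
import pandas as pd

# 1990 and 2014 values copied from the head() output of forest_area_km.
df = pd.DataFrame({
    "Country Name": ["Afghanistan", "Albania", "Algeria", "American Samoa", "Andorra"],
    "Country Code": ["AFG", "ALB", "DZA", "ASM", "AND"],
    "1990": [12084.4, 7888.0, 16670.0, 180.7, 160.0],
    "2014": [12084.40, 7877.64, 19484.00, 173.10, 160.00],
})

# Absolute change in km² and change relative to the 1990 area.
df["change_km2"] = df["2014"] - df["1990"]
df["change_pct"] = 100 * df["change_km2"] / df["1990"]

# Largest relative losses first, largest relative gains last.
ranked = df.sort_values("change_pct").reset_index(drop=True)
print(ranked[["Country Name", "change_km2", "change_pct"]])
```

Ranking by relative change keeps small territories comparable with large countries; ranking by `change_km2` instead would emphasise the largest absolute gains and losses.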

---
<a id="six"></a>
## **Modeling**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Develop and train predictive or statistical models.
* **Details:** Describe the choice of models, feature selection and engineering processes, and show how the models are trained. Include code for setting up the models and explanations of the model parameters.
---


In [None]:
#Please use code cells to code in and do not forget to comment your code.
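As one possible baseline model, a straight line can be fitted to a single country's forest area over time. The sketch below uses NumPy's `polyfit` rather than scikit-learn to stay dependency-light, and fits Algeria's 1990 to 1997 values from the `head()` output. It is a simple trend estimate under the assumption that change is linear over the fitted window, not a recommended final model; `predict` is a hypothetical helper.

```python
import numpy as np

# Algeria, 1990-1997, copied from the head() output of forest_area_km.
years = np.arange(1990, 1998)
area = np.array([16670.0, 16582.0, 16494.0, 16406.0,
                 16318.0, 16230.0, 16142.0, 16054.0])

# Centre the years on the first year to keep the fit well conditioned.
x = years - years[0]
slope, intercept = np.polyfit(x, area, deg=1)


def predict(year):
    """Predict forest area (km²) for a given year from the fitted line."""
    return slope * (year - years[0]) + intercept


pred_1998 = predict(1998)
print(f"slope: {slope:.2f} km² per year, 1998 prediction: {pred_1998:.1f} km²")
```

The slope gives an interpretable quantity (km² gained or lost per year) that can be compared across countries.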

---
<a id="seven"></a>
## **Evaluation and Validation**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Evaluate and validate the effectiveness and accuracy of the models.
* **Details:** Present metrics used to evaluate the models, such as accuracy, precision, recall, F1-score, etc. Discuss validation techniques employed, such as cross-validation or train/test split.
---

In [None]:
#Please use code cells to code in and do not forget to comment your code.
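Accuracy, precision, recall, and F1-score apply to classification. If the model predicts a numeric quantity such as forest area, error metrics like mean absolute error (MAE) and root mean squared error (RMSE) are the usual choice, and for time series the held-out data should come after the training period. The sketch below assumes a linear trend fitted with `np.polyfit` on Algeria's 1990 to 1997 values and evaluates it on the 2012 to 2014 values, all taken from the `head()` output; the split itself is an illustrative choice.

```python
import numpy as np

# Training period: Algeria 1990-1997 (from the head() output).
train_years = np.arange(1990, 1998)
train_area = np.array([16670.0, 16582.0, 16494.0, 16406.0,
                       16318.0, 16230.0, 16142.0, 16054.0])

# Held-out period: Algeria 2012-2014 (from the same output).
test_years = np.array([2012, 2013, 2014])
test_area = np.array([19332.0, 19408.0, 19484.0])

# Fit a straight line on the training period only.
slope, intercept = np.polyfit(train_years - 1990, train_area, deg=1)

# Predict the held-out years and measure the errors.
pred = slope * (test_years - 1990) + intercept
errors = test_area - pred
mae = float(np.mean(np.abs(errors)))
rmse = float(np.sqrt(np.mean(errors ** 2)))

print(f"MAE: {mae:.1f} km², RMSE: {rmse:.1f} km²")
```

The large errors here reflect that the declining trend of the 1990s did not continue for this country, which is the kind of failure a time-ordered holdout is meant to expose.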

---
<a id="eight"></a>
## **Final Model**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Present the final model and its performance.
* **Details:** Highlight the best-performing model and discuss its configuration, performance, and why it was chosen over others.
---


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="nine"></a>
## **Conclusion and Future Work**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Summarize the findings and discuss future directions.
* **Details:** Conclude with a summary of the results, insights gained, limitations of the study, and suggestions for future projects or improvements in methodology or data collection.
---


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="ten"></a>
## **References**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Provide citations and sources of external content.
* **Details:** List all the references and sources consulted during the project, including data sources, research papers, and documentation for tools and libraries used.
---

In [None]:
#Please use code cells to code in and do not forget to comment your code.

## Additional Sections to Consider

* ### Appendix: 
For any additional code, detailed tables, or extended data visualizations that are supplementary to the main content.

* ### Contributors: 
If this is a group project, list the contributors and their roles or contributions to the project.
