# QCTO - Workplace Module

### Project Title: Workplace Project 
#### Done By: Mfana Nkabinde

© ExploreAI 2024

---

<a id="cont"></a>
## Table of Contents

<a href=#BC> Background Context</a>

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Data Collection and Description</a>

<a href=#three>3. Loading Data </a>

<a href=#four>4. Data Cleaning and Filtering</a>

<a href=#five>5. Exploratory Data Analysis (EDA)</a>

<a href=#six>6. Modeling </a>

<a href=#seven>7. Evaluation and Validation</a>

<a href=#eight>8. Final Model</a>

<a href=#nine>9. Conclusion and Future Work</a>

<a href=#ten>10. References</a>

---
 <a id="BC"></a>
## **Background Context**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** The insurance industry in Africa, despite the continent being home to 17% of the global population, plays a relatively small role in the worldwide insurance market.
* **Details:** Include information about the problem domain, the specific questions or challenges the project aims to address, and any relevant background information that sets the stage for the work.
---

---
<a id="one"></a>
## **Importing Packages**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Set up the Python environment with necessary libraries and tools.
* **Details:** List and import all the Python packages that will be used throughout the project such as Pandas for data manipulation, Matplotlib/Seaborn for visualization, scikit-learn for modeling, etc.
---

In [1]:
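# Import the libraries used throughout the notebook: pandas/NumPy for data manipulation,
# Matplotlib/Seaborn for visualization and scikit-learn for modeling and evaluation.
import pandas                as pd
import numpy                 as np
import matplotlib.pyplot     as plt
import seaborn               as sns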
from sklearn.preprocessing        import MinMaxScaler
from sklearn.model_selection      import train_test_split
from sklearn.linear_model         import LinearRegression
from sklearn.tree                 import DecisionTreeRegressor
from sklearn.ensemble             import RandomForestRegressor
from sklearn.ensemble             import StackingRegressor
from sklearn.metrics              import mean_squared_error, r2_score

---
<a id="two"></a>
## **Data Collection and Description**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Describe how the data was collected and provide an overview of its characteristics.
* **Details:** Mention sources of the data, the methods used for collection (e.g., APIs, web scraping, datasets from repositories), and a general description of the dataset including size, scope, and types of data available (e.g., numerical, categorical).
---

In [4]:
# General description of the dataset (requires `df`, which is loaded in the Loading Data section)
def describe_dataset(df):
    description = {
        "Size": f"Rows: {df.shape[0]}, Columns: {df.shape[1]}",
        "Scope": "General description of the dataset's purpose and content",
        "Types of Data": df.dtypes.value_counts().to_dict()
    }
    return description

# Display the general description
description = describe_dataset(df)
print(description)

{'Size': 'Rows: 259, Columns: 34', 'Scope': "General description of the dataset's purpose and content", 'Types of Data': {dtype('float64'): 32, dtype('O'): 2}}
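The helper above reports only shape and dtype counts. A slightly fuller summary could also report the year span and which years have gaps. The sketch below assumes that year columns are the ones whose labels are all digits (as in this dataset); `describe_wide_dataset` is a hypothetical helper, and the two-row frame is illustrative sample data (the missing World values mirror the missing 1990/1991 entries seen later in the notebook).

```python
import numpy as np
import pandas as pd


def describe_wide_dataset(df: pd.DataFrame) -> dict:
    """Summarize a wide country-by-year table (hypothetical helper)."""
    # Assumption: year columns are labelled with digit-only strings such as "1990".
    year_cols = [c for c in df.columns if str(c).isdigit()]
    missing = df[year_cols].isna().sum()
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "year_span": (min(year_cols), max(year_cols)) if year_cols else None,
        "dtype_counts": df.dtypes.astype(str).value_counts().to_dict(),
        # Only report years that actually contain missing values.
        "missing_per_year": missing[missing > 0].to_dict(),
    }


# Illustrative sample in the same layout as forest_area_km.csv
sample = pd.DataFrame({
    "Country Name": ["Albania", "World"],
    "Country Code": ["ALB", "WLD"],
    "1990": [7888.0, np.nan],
    "1991": [7868.5, np.nan],
})
print(describe_wide_dataset(sample))
```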


---
<a id="three"></a>
## **Loading Data**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Load the data into the notebook for manipulation and analysis.
* **Details:** Show the code used to load the data and display the first few rows to give a sense of what the raw data looks like.
---

In [3]:
# Load the forest-area CSV into a DataFrame (read_csv already returns a DataFrame)
df = pd.read_csv("C:/Users/nkabinmf/OneDrive - Vodafone Group/Data Science Course/forest_area_km.csv")
df.head()  # display the first few rows to give a sense of what the raw data looks like

|   | Country Name | Country Code | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | ... | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | ... | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 | 12084.4 |
| 1 | Albania | ALB | 7888.0 | 7868.5 | 7849.0 | 7829.5 | 7810.0 | 7790.5 | 7771.0 | 7751.5 | ... | 7849.17 | 7863.405 | 7877.64 | 7891.875 | 7891.8 | 7889.025 | 7889.0 | 7889.0 | 7889.0 | 7889.0 |
| 2 | Algeria | DZA | 16670.0 | 16582.0 | 16494.0 | 16406.0 | 16318.0 | 16230.0 | 16142.0 | 16054.0 | ... | 19332.0 | 19408.0 | 19484.0 | 19560.0 | 19560.0 | 19430.0 | 19300.0 | 19390.0 | 19490.0 | 19583.333 |
| 3 | American Samoa | ASM | 180.7 | 180.36 | 180.02 | 179.68 | 179.34 | 179.0 | 178.66 | 178.32 | ... | 173.7 | 173.4 | 173.1 | 172.8 | 172.5 | 172.2 | 171.9 | 171.6 | 171.3 | 171.0 |
| 4 | Andorra | AND | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | ... | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 |
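The absolute OneDrive path only works on the author's machine. A more portable loader could take the path as an argument (for example a path relative to the notebook, which is an assumption here) and check that the identifier columns are present. `load_forest_data` is a hypothetical helper, and the temporary file stands in for the real CSV using values from the output above.

```python
import tempfile
from pathlib import Path

import pandas as pd

ID_COLUMNS = ["Country Name", "Country Code"]


def load_forest_data(path) -> pd.DataFrame:
    """Load a wide country-by-year CSV and verify its identifier columns (hypothetical helper)."""
    df = pd.read_csv(Path(path))
    missing = [c for c in ID_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


# Stand-in for forest_area_km.csv, written to a temporary folder for the demo
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "forest_area_km.csv"
    csv_path.write_text(
        "Country Name,Country Code,1990,1991\n"
        "Albania,ALB,7888.0,7868.5\n"
        "Andorra,AND,160.0,160.0\n"
    )
    demo = load_forest_data(csv_path)

print(demo.head())
```

In the notebook this would become `df = load_forest_data("forest_area_km.csv")`, assuming the CSV sits next to the notebook.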


In [5]:
df.shape

(259, 34)

In [7]:
df = df.drop(index=[0, 2])  # drop the rows labelled 0 (Afghanistan) and 2 (Algeria)
df.head()

|   | Country Name | Country Code | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | ... | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Albania | ALB | 7888.0 | 7868.5 | 7849.0 | 7829.5 | 7810.0 | 7790.5 | 7771.0 | 7751.5 | ... | 7849.17 | 7863.405 | 7877.64 | 7891.875 | 7891.8 | 7889.025 | 7889.0 | 7889.0 | 7889.0 | 7889.0 |
| 3 | American Samoa | ASM | 180.7 | 180.36 | 180.02 | 179.68 | 179.34 | 179.0 | 178.66 | 178.32 | ... | 173.7 | 173.4 | 173.1 | 172.8 | 172.5 | 172.2 | 171.9 | 171.6 | 171.3 | 171.0 |
| 4 | Andorra | AND | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | ... | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 | 160.0 |
| 5 | Angola | AGO | 792627.8 | 791073.63 | 789519.46 | 787965.29 | 786411.12 | 784856.95 | 783302.78 | 781748.61 | ... | 710478.76 | 704928.14 | 699377.52 | 693826.9 | 688276.2 | 682725.7 | 677175.1 | 671624.4 | 666073.8 | 660523.133 |
| 6 | Antigua and Barbuda | ATG | 101.1 | 100.44 | 99.78 | 99.12 | 98.46 | 97.8 | 97.14 | 96.48 | ... | 86.48 | 85.82 | 85.16 | 84.5 | 83.8 | 83.2 | 82.5 | 81.8 | 81.2 | 80.533 |


---
<a id="four"></a>
## **Data Cleaning and Filtering**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Prepare the data for analysis by cleaning and filtering.
* **Details:** Include steps for handling missing values, removing outliers, correcting errors, and possibly reducing the data (filtering based on certain criteria or features).
---

In [None]:
#Please use code cells to code in and do not forget to comment your code.
df.tail() # check the last few rows as part of data preparation

In [8]:
df.dtypes

Country Name     object
Country Code     object
1990            float64
1991            float64
1992            float64
1993            float64
1994            float64
1995            float64
1996            float64
1997            float64
1998            float64
1999            float64
2000            float64
2001            float64
2002            float64
2003            float64
2004            float64
2005            float64
2006            float64
2007            float64
2008            float64
2009            float64
2010            float64
2011            float64
2012            float64
2013            float64
2014            float64
2015            float64
2016            float64
2017            float64
2018            float64
2019            float64
2020            float64
2021            float64
dtype: object

In [None]:
df.describe()

In [17]:
df['Country Name'].unique().size

257

In [12]:
# Count non-null values per column for each country/region; a 0 marks a missing year
Area_Rows = df.groupby('Country Name').count()
Area_Rows

| Country Name | Country Code | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | ... | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Africa Eastern and Southern | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Africa Western and Central | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Albania | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| American Samoa | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Andorra | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| West Bank and Gaza | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| World | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Yemen, Rep. | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Zambia | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
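The grouped counts show that the table mixes individual countries with aggregates such as "Africa Eastern and Southern", "Africa Western and Central" and "World". Keeping both would double-count forest area in cross-country statistics. A minimal filtering sketch follows; `drop_aggregates` is a hypothetical helper and the name set is deliberately incomplete, so a full list would need to be checked against the data.

```python
import pandas as pd

# Illustrative, incomplete set of aggregate rows taken from the grouped counts above.
AGGREGATES = {"Africa Eastern and Southern", "Africa Western and Central", "World"}


def drop_aggregates(df: pd.DataFrame, aggregates=AGGREGATES) -> pd.DataFrame:
    """Keep only rows whose 'Country Name' is not a regional or global aggregate."""
    return df[~df["Country Name"].isin(aggregates)].copy()


sample = pd.DataFrame({
    "Country Name": ["Albania", "World", "Zambia", "Africa Western and Central"],
})
countries_only = drop_aggregates(sample)
print(countries_only["Country Name"].tolist())
```

Filtering by name is also more robust than dropping positional labels such as `index=[0, 2]`, because it keeps working if the row order of the file changes.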


In [None]:
df.isnull().any() #Checking null values 
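`isnull().any()` says which columns contain gaps but not which countries they belong to. A small sketch that counts missing years per country follows; `missing_years_by_country` is a hypothetical helper, and the sample reproduces the pattern in the grouped counts above, where World has no values for 1990 and 1991.

```python
import numpy as np
import pandas as pd


def missing_years_by_country(df: pd.DataFrame) -> pd.Series:
    """Number of missing year values per country, largest first (hypothetical helper)."""
    # Assumption: year columns are labelled with digit-only strings such as "1990".
    year_cols = [c for c in df.columns if str(c).isdigit()]
    counts = df.set_index("Country Name")[year_cols].isna().sum(axis=1)
    return counts[counts > 0].sort_values(ascending=False)


sample = pd.DataFrame({
    "Country Name": ["Albania", "World"],
    "1990": [7888.0, np.nan],
    "1991": [7868.5, np.nan],
})
gaps = missing_years_by_country(sample)
print(gaps)
```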

In [None]:
df_filled = df.fillna(0)  # replace missing values with 0

df_filled.head()

In [None]:
df_filled.isnull().any() #checking if there are null values left.
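Filling gaps with 0 records a country as having no forest in those years, which can distort sums, histograms and correlations. One alternative, sketched here as an assumption rather than the notebook's method, is to interpolate linearly across the year columns within each country; `limit_direction="both"` also fills leading or trailing gaps with the nearest observed value. `fill_year_gaps` is a hypothetical helper, the 1991 gap for Albania is artificial, and "Example" is a made-up row.

```python
import numpy as np
import pandas as pd


def fill_year_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing yearly values per country by linear interpolation across years."""
    year_cols = [c for c in df.columns if str(c).isdigit()]
    out = df.copy()
    # axis=1 interpolates along each row (one country) across its year columns.
    out[year_cols] = out[year_cols].interpolate(axis=1, limit_direction="both")
    return out


sample = pd.DataFrame({
    "Country Name": ["Albania", "Example"],
    "1990": [7888.0, np.nan],
    "1991": [np.nan, np.nan],   # artificial gap
    "1992": [7849.0, 50.0],
})
print(fill_year_gaps(sample))
```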

---
<a id="five"></a>
## **Exploratory Data Analysis (EDA)**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Explore and visualize the data to uncover patterns, trends, and relationships.
* **Details:** Use statistics and visualizations to explore the data. This may include histograms, box plots, scatter plots, and correlation matrices. Discuss any significant findings.
---


In [None]:
#Please use code cells to code in and do not forget to comment your code.
print(df_filled.describe()) # generate summary statistics to get an overview of the dataset
df_filled.info()            # info() prints directly and returns None, so it is not wrapped in print()

In [None]:
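# Transpose the DataFrame so that years become rows and countries become columns.
# After .T the index holds the former column labels, so the non-numeric
# 'Country Code' row is dropped by name (dropping labels 0 and 2 would raise a KeyError).
df_transposed = df.set_index('Country Name').T
df_dropped_rows = df_transposed.drop(index='Country Code').astype(float)

# Display the transposed DataFrame
print(df_dropped_rows)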
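An alternative to transposing, often more convenient for grouping and for Seaborn plots, is to reshape the wide table into a long table with one row per country-year using `pandas.melt`. The column name `forest_area_km2` assumes the values are in square kilometres, as the file name suggests; the sample values are taken from the outputs above.

```python
import pandas as pd

wide = pd.DataFrame({
    "Country Name": ["Albania", "Andorra"],
    "Country Code": ["ALB", "AND"],
    "1990": [7888.0, 160.0],
    "1991": [7868.5, 160.0],
})

# One row per (country, year); the former column labels become a 'Year' column.
long_df = wide.melt(
    id_vars=["Country Name", "Country Code"],
    var_name="Year",
    value_name="forest_area_km2",
)
long_df["Year"] = long_df["Year"].astype(int)
print(long_df)
```

With the long table, `sns.lineplot(data=long_df, x="Year", y="forest_area_km2", hue="Country Name")` draws one trend line per country.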

In [None]:
# Histograms for numerical features
df_filled.hist(figsize=(10, 8))
plt.show()

In [None]:
counts_area = df['Country Name'].value_counts()  # number of rows per country/region name
print(counts_area)

In [None]:
# Correlation matrix
corr_matrix = df_filled.corr(numeric_only=True)  # skip the text columns Country Name and Country Code

# Heatmap of the correlation matrix
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.show()
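A statistic that speaks directly to forest trends is each country's change between the first and last year. The sketch below assumes the year columns `"1990"` and `"2021"` exist, as in the loaded data, and labels the change `km2` on the assumption that values are in square kilometres; `forest_change` is a hypothetical helper, and the sample values are copied from the `df.head()` outputs above.

```python
import pandas as pd


def forest_change(df: pd.DataFrame, start: str = "1990", end: str = "2021") -> pd.DataFrame:
    """Absolute and percentage change in forest area between two year columns."""
    out = df[["Country Name", start, end]].copy()
    out["change_km2"] = out[end] - out[start]
    # Avoid division by zero: a 0 baseline yields NaN instead of inf.
    baseline = out[start].where(out[start] != 0)
    out["change_pct"] = 100 * out["change_km2"] / baseline
    # Largest relative losses first
    return out.sort_values("change_pct")


sample = pd.DataFrame({
    "Country Name": ["Albania", "Andorra", "Angola"],
    "1990": [7888.0, 160.0, 792627.8],
    "2021": [7889.0, 160.0, 660523.133],
})
print(forest_change(sample))
```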

---
<a id="six"></a>
## **Modeling**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Develop and train predictive or statistical models.
* **Details:** Describe the choice of models, feature selection and engineering processes, and show how the models are trained. Include code for setting up the models and explanations of the model parameters.
---


In [None]:
#Please use code cells to code in and do not forget to comment your code.
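The notebook imports several scikit-learn regressors but does not yet define a target. One possible framing, assumed here purely for illustration, is to predict each country's forest area in the final year from the preceding years and compare models on a held-out split. Synthetic data (a starting area plus a steady yearly trend per row) stands in for the cleaned table so the sketch runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
years = [str(y) for y in range(2012, 2022)]
# Synthetic stand-in: each row is a "country" with a starting area and a yearly growth rate.
start = rng.lognormal(mean=9, sigma=1.5, size=200)
rate = rng.normal(0.0, 0.01, size=200)
values = start[:, None] * (1 + rate[:, None]) ** np.arange(len(years))
data = pd.DataFrame(values, columns=years)

X = data[years[:-1]]   # features: all earlier years
y = data[years[-1]]    # target: the final year
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=42),
    "forest": RandomForestRegressor(n_estimators=200, random_state=42),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": float(r2_score(y_test, pred)),
    }
print(pd.DataFrame(scores).T)
```

On the real data, aggregates such as "World" would ideally be removed first so that a random split compares like with like.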

---
<a id="seven"></a>
## **Evaluation and Validation**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Evaluate and validate the effectiveness and accuracy of the models.
* **Details:** Present metrics used to evaluate the models, such as MSE/RMSE and R² for regression models (or accuracy, precision, recall and F1-score for classification). Discuss validation techniques employed, such as cross-validation or train/test split.
---

In [None]:
#Please use code cells to code in and do not forget to comment your code.
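Because the imported models are regressors, regression metrics such as R² and RMSE apply rather than accuracy or F1-score. A sketch of 5-fold cross-validation with `cross_validate` follows; `make_regression` generates synthetic data as a stand-in for the project's features, and the model choices simply mirror the imports.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in for the project's feature matrix and target
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=42)

results = {}
candidates = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=100, random_state=42),
}
for name, model in candidates.items():
    cv_scores = cross_validate(model, X, y, cv=cv,
                               scoring=("r2", "neg_root_mean_squared_error"))
    results[name] = {
        "r2_mean": cv_scores["test_r2"].mean(),
        # scikit-learn negates error scorers so that "higher is better"; flip the sign back.
        "rmse_mean": -cv_scores["test_neg_root_mean_squared_error"].mean(),
    }
print(results)
```

Averaging over folds gives a more stable estimate than a single train/test split, at the cost of fitting each model five times.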

---
<a id="eight"></a>
## **Final Model**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Present the final model and its performance.
* **Details:** Highlight the best-performing model and discuss its configuration, performance, and why it was chosen over others.
---


In [None]:
#Please use code cells to code in and do not forget to comment your code.
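`StackingRegressor` is imported but not yet used. If stacking were chosen as the final model (an assumption, not a result of this notebook), it could combine the three base regressors and let a linear model learn how to weight their out-of-fold predictions. Synthetic data again stands in for the real features.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the project's data
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

stack = StackingRegressor(
    estimators=[
        ("linear", LinearRegression()),
        ("tree", DecisionTreeRegressor(max_depth=5, random_state=42)),
        ("forest", RandomForestRegressor(n_estimators=100, random_state=42)),
    ],
    final_estimator=LinearRegression(),  # learns how much to trust each base model
    cv=5,                                # base-model predictions come from 5-fold CV
)
stack.fit(X_train, y_train)
stack_r2 = r2_score(y_test, stack.predict(X_test))
print(f"Stacked model R² on the test split: {stack_r2:.3f}")
```

Whether stacking beats the best single model should be checked with the same cross-validation used to compare the individual models.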

---
<a id="nine"></a>
## **Conclusion and Future Work**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Summarize the findings and discuss future directions.
* **Details:** Conclude with a summary of the results, insights gained, limitations of the study, and suggestions for future projects or improvements in methodology or data collection.
---


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="ten"></a>
## **References**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Provide citations and sources of external content.
* **Details:** List all the references and sources consulted during the project, including data sources, research papers, and documentation for tools and libraries used.
---

In [None]:
#Please use code cells to code in and do not forget to comment your code.

## Additional Sections to Consider

* ### Appendix: 
For any additional code, detailed tables, or extended data visualizations that are supplementary to the main content.

* ### Contributors: 
If this is a group project, list the contributors and their roles or contributions to the project.
