**Numerical Analysis**

After gaining insight from the CDP data, we use the AQUASTAT data from the Food and Agriculture Organization of the United Nations (FAO) to address the research questions.

**Import libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import naive_bayes
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

**Import the dataset**

In [None]:
aquastat = pd.read_csv('https://mda-project-poland.s3.eu-west-3.amazonaws.com/aquastat.csv') 
aquastat.set_index('Country', inplace = True)
aquastat.head()

**Data Description**

The data set to be analyzed consists of the following variables:
(Each variable can be described here.)

# **Data Preprocessing and exploration**

In [None]:
aquastat.shape

The data set has 200 rows and 16 columns.

Build a summary table that lists the data type, the number of missing values, the number of unique values, and the non-null count for each variable.

In [None]:
#Data types
datadict = pd.DataFrame(aquastat.dtypes)
#Missing values
datadict['MissingVal'] = aquastat.isnull().sum()
#Unique values
datadict['NUnique']=aquastat.nunique()
#Count of variable
datadict['Count']=aquastat.count()
#Rename 0 to datatype
datadict = datadict.rename(columns={0:'DataType'})
datadict

In [None]:
#Drop every row that has at least one missing value
aquastat=aquastat.dropna()
aquastat.shape

# ***Exploratory Data Analysis***

**Water Stress: SDG 6.4.2. Water Stress (%)**

From the initial analysis of the CDP data, we found that water stress is the biggest problem worldwide, so we explore this variable in more detail.
According to the FAO, a threshold of 25 percent has been identified as the upper limit for the full and unconditional safety of water stress as assessed by indicator 6.4.2. Water stress can be categorized into the following groups:
NO STRESS <25%,
LOW 25–50%,
MEDIUM 50–75%,
HIGH 75–100%,
CRITICAL >100%.

In [None]:
aquastat['SDG 6.4.2. Water Stress (%)'].describe()

In [None]:
#Group the water stress level according to FAO
def getlevel(stress):
    #Map a single water stress value (%) to its FAO category
    if stress < 25:
        return "NO STRESS"
    elif stress < 50:
        return "LOW"
    elif stress < 75:
        return "MEDIUM"
    elif stress < 100:
        return "HIGH"
    else:
        return "CRITICAL"
#Apply the mapping to the stress column only instead of passing whole rows
aquastat["water stress level"] = aquastat["SDG 6.4.2. Water Stress (%)"].apply(getlevel)
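The same FAO grouping can also be written with `pd.cut`. The following is a minimal sketch, not the notebook's own method: the `demo` frame and its values are made up for illustration, and the bin edges are assumed to be closed on the left so that the boundaries match the function above (for example, exactly 25% counts as LOW).

```python
import numpy as np
import pandas as pd

# FAO thresholds for SDG 6.4.2; bins are closed on the left: [.., 25), [25, 50), ...
bins = [-np.inf, 25, 50, 75, 100, np.inf]
labels = ["NO STRESS", "LOW", "MEDIUM", "HIGH", "CRITICAL"]

# Hypothetical values for illustration only (not taken from the AQUASTAT data)
demo = pd.DataFrame(
    {"SDG 6.4.2. Water Stress (%)": [3.0, 25.0, 49.9, 75.0, 100.0, 140.0]}
)

# right=False makes each bin include its lower edge and exclude its upper edge
demo["water stress level"] = pd.cut(
    demo["SDG 6.4.2. Water Stress (%)"], bins=bins, labels=labels, right=False
)

# Count the rows per level without sorting by frequency
level_counts = demo["water stress level"].value_counts(sort=False)
print(level_counts)
```

One reason to consider this form is that `pd.cut` returns an ordered categorical, so later tables and plots can follow the severity order of the levels rather than alphabetical or frequency order.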


In [None]:
df_stress = aquastat.groupby("water stress level").size()
df_stress= pd.DataFrame(df_stress)
df_stress.columns = ["count"]
df_stress = df_stress.sort_values(by = ['count'], ascending = False)
df_stress = df_stress.reset_index()
df_stress.head()

In [None]:
#The counts are already aggregated (one row per level), so no error bars are requested
g = sns.catplot(
    data=df_stress, kind="bar", x = "water stress level", y="count", 
    palette="icefire", alpha=.6, height=6)
g.set_xticklabels(rotation=53)
g.set_axis_labels( "water stress level", "Count")

**Visualise the distribution of water stress on the world map**

In [None]:
%pip install pyecharts echarts-countries-pypkg

In [None]:
import pyecharts

In [None]:
from pyecharts.charts import Map
from pyecharts import options as opts

In [None]:
countries = list(aquastat.index)
stress = list(aquastat["SDG 6.4.2. Water Stress (%)"])
#Pair each country with its stress value; the name "list" is avoided because it shadows the built-in
stress_pairs = [[country, value] for country, value in zip(countries, stress)]

In [None]:
c = (
    Map(init_opts=opts.InitOpts(width="1000px", height="600px")) 
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Distribution of Water Stress, unit=%"),
        visualmap_opts=opts.VisualMapOpts(
            min_=0,
            max_=175,
            range_text = ['Water Stress Level:', ''],  
            is_piecewise=True,  
            pos_top= "middle",  
            pos_left="left",
            orient="vertical",
            split_number=7 
        )
    )
    .add("stress",stress_pairs,maptype="world")
    .render("Map1.html")
)


**Open the rendered file Map1.html to see the distribution of water stress across the world.**

# **Modelling**

We are going to explore the factors that influence water stress.

In [None]:
#Check the correlations
#Drop the last column (the categorical "water stress level") and keep numeric columns only
df = aquastat.iloc[:, :-1].select_dtypes(include="number")
corr = df.corr()
fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(111)
cax = ax.matshow(corr,cmap='coolwarm', vmin=-1, vmax=1)
fig.colorbar(cax)
ticks = np.arange(0,len(df.columns),1)
ax.set_xticks(ticks)
ax.set_yticks(ticks)
ax.set_xticklabels(df.columns, rotation=90)
ax.set_yticklabels(df.columns)
plt.show()
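A heatmap gives an overall picture, but it can be hard to read off which variables are most strongly related to water stress. As a minimal sketch, the target column of the correlation matrix can be sorted by absolute value. The data below is synthetic and every column name other than the target is hypothetical, so the numbers say nothing about the AQUASTAT variables.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; predictor names are hypothetical
rng = np.random.default_rng(0)
n = 500
withdrawal = rng.normal(size=n)
rainfall = rng.normal(size=n)
unrelated = rng.normal(size=n)

target = "SDG 6.4.2. Water Stress (%)"
demo = pd.DataFrame({
    "withdrawal (hypothetical)": withdrawal,
    "rainfall (hypothetical)": rainfall,
    "unrelated (hypothetical)": unrelated,
    # The target is built to rise with withdrawal and fall with rainfall
    target: 3.0 * withdrawal - 2.0 * rainfall + rng.normal(scale=0.5, size=n),
})

corr = demo.corr()

# Correlation of every other variable with the target,
# ordered from the strongest to the weakest absolute correlation
ranked = corr[target].drop(target).sort_values(key=abs, ascending=False)
print(ranked)
```

In the notebook, the same two lines could be applied to the `corr` matrix computed above. Correlation only captures linear association, so a low value does not rule out a nonlinear relationship.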