# Predicting Continents via Happiness Report
<div class="alert alert-block alert-info" style="margin-top: 20px">
1. [Introduction and Data Import](#0)<br>
2. [Analysis and Edit of Dataframe](#1)<br>
3. [Train And Test](#2)<br>
4. [Evaluation](#3)<br>
5. [Visualization](#4)
</div>
<hr>


# Introduction and Data Import <a id="0"></a>

The cell below imports DecisionTreeClassifier, which we will use to make our predictions.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.tree import DecisionTreeClassifier

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

Read the data and print its first 5 rows.

In [None]:
df=pd.read_csv("/kaggle/input/world-happiness/2015.csv")
df.head()

# Analysis and Edit of Dataframe <a id="1"></a>

Drop the columns that are not needed for the prediction from the DataFrame.

In [None]:
df.drop(columns=["Country","Happiness Rank","Happiness Score"],inplace=True)
df.head()

Check how the continents are named in the DataFrame. If needed, I will change them.

In [None]:
df["Region"].value_counts()

There are 7 continents on Earth, but as you can see there are 10 different region labels, so I need to edit them and reduce them to continent names (at most 7).<br>
The code below does that.

In [None]:
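# Map each region label to a continent name.
# The order of the checks matters: a label containing "Africa" is matched first.
continents = []
for region in df["Region"]:
    if "Africa" in region:
        continents.append("Africa")
    elif "Europe" in region:
        continents.append("Europe")
    elif "Antarctica" in region:
        continents.append("Antarctica")
    elif "Australia" in region:
        continents.append("Australia")
    elif "North America" in region:
        continents.append("North America")
    elif "Latin America" in region:
        continents.append("South America")
    elif "Asia" in region:
        continents.append("Asia")
    else:
        # Keep unmatched labels so the list stays as long as the DataFrame
        continents.append(region)
df["Region"] = continents
df.head()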
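
The substring checks above can also be written as an ordered lookup. This is only a minimal sketch: the sample labels and the names `KEYWORD_TO_CONTINENT` and `to_continent` are made up for illustration and are not part of the notebook.

```python
# Ordered (keyword, continent) pairs; the first keyword found in the label wins,
# mirroring the order of an if/elif chain.
KEYWORD_TO_CONTINENT = [
    ("Africa", "Africa"),
    ("Europe", "Europe"),
    ("Australia", "Australia"),
    ("North America", "North America"),
    ("Latin America", "South America"),
    ("Asia", "Asia"),
]

def to_continent(region):
    """Return the continent for a region label (hypothetical helper)."""
    for keyword, continent in KEYWORD_TO_CONTINENT:
        if keyword in region:
            return continent
    return region  # leave unknown labels unchanged

# Sample labels, assumed for illustration only.
sample_regions = [
    "Western Europe",
    "Middle East and Northern Africa",
    "Latin America and Caribbean",
    "Southeastern Asia",
    "Australia and New Zealand",
]
sample_continents = [to_continent(r) for r in sample_regions]
print(sample_continents)
```

With a DataFrame, the same helper could be applied as `df["Region"] = df["Region"].map(to_continent)`.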

# Train And Test <a id="2"></a>

As mentioned in the title of this kernel, I want to predict the continent name from all the other columns. X is therefore an array holding the values of all the other columns.

In [None]:
X = df[["Standard Error","Economy (GDP per Capita)","Family","Health (Life Expectancy)","Freedom","Trust (Government Corruption)","Generosity","Dystopia Residual"]].values
X[0:5]

And y is an array that contains the continent label of each row.

In [None]:
y = df["Region"].values
y[0:5]

Train and Test

In [None]:
from sklearn.model_selection import train_test_split

train_test_split returns four arrays. We will name them:
X_trainset, X_testset, y_trainset, y_testset

It takes the parameters:
X, y, test_size=0.3, and random_state=3.

X and y are the arrays to be split, test_size is the fraction of the data held out for testing (here 30%), and random_state fixes the shuffling so that we obtain the same split on every run.

In [None]:
X_trainset, X_testset, y_trainset, y_testset = train_test_split(X, y, test_size=0.3, random_state=3)
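
To make the effect of test_size and random_state concrete, here is a small sketch on toy data. The arrays `X_demo` and `y_demo` are invented for illustration and stand in for the real feature matrix and labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (10 samples, 2 features) standing in for X and y.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array(["a", "b"] * 5)

# test_size=0.3 holds out 30% of the rows; random_state fixes the shuffle.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=3
)
print(X_tr.shape, X_te.shape)  # (7, 2) (3, 2)

# Repeating the call with the same random_state reproduces the same split.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=3
)
same_split = np.array_equal(X_te, X_te2)
print(same_split)  # True
```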


Modeling
We first create an instance of DecisionTreeClassifier called ContinentTree.
Inside the classifier, we specify criterion="entropy" so that splits are chosen by information gain and each node reports its entropy, and max_depth=5 to limit how deep the tree can grow.

In [None]:
ContinentTree = DecisionTreeClassifier(criterion="entropy", max_depth=5)
ContinentTree # displays the classifier and its parameters
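
For readers unfamiliar with the entropy criterion, the sketch below computes entropy and information gain by hand for a single split. It is plain Python on made-up labels, not scikit-learn's internal implementation, and the function names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    total = len(parent)
    weighted = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
    return entropy(parent) - weighted

# Toy labels, assumed for illustration: a split that separates the classes perfectly.
parent = ["Europe", "Europe", "Asia", "Asia"]
left, right = ["Europe", "Europe"], ["Asia", "Asia"]

print(entropy(parent))                        # 1.0
print(information_gain(parent, left, right))  # 1.0
```

A split that leaves both children as mixed as the parent would have an information gain of 0, so the tree prefers splits like the one above.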

Next, we fit the classifier on the training feature matrix X_trainset and the training response vector y_trainset.

In [None]:
ContinentTree.fit(X_trainset,y_trainset)

I want to see the depth of the fitted tree.

In [None]:
ContinentTree.get_depth()

Prediction
Let's make some predictions on the testing dataset and store them in a variable called predTree.

In [None]:
predTree = ContinentTree.predict(X_testset)

You can print out predTree and y_testset if you want to visually compare the prediction to the actual values.

In [None]:
print(predTree[0:5])
print(y_testset[0:5])

# Evaluation <a id="3"></a><br>
Next, let's import metrics from sklearn and check the accuracy of our model.

In [None]:
from sklearn import metrics
import matplotlib.pyplot as plt
print("DecisionTree's Accuracy: ", metrics.accuracy_score(y_testset, predTree))

The model is not very accurate, actually.
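
A single accuracy number does not show which continents get confused with each other. The sketch below builds a confusion matrix on made-up labels; `y_true_demo` and `y_pred_demo` are invented for illustration and are not the notebook's results.

```python
from sklearn import metrics

# Made-up true and predicted labels, for illustration only.
y_true_demo = ["Africa", "Africa", "Asia", "Europe", "Europe", "Europe"]
y_pred_demo = ["Africa", "Asia", "Asia", "Europe", "Europe", "Africa"]
labels = ["Africa", "Asia", "Europe"]

# Rows are the true classes, columns the predicted classes, both in `labels` order.
cm = metrics.confusion_matrix(y_true_demo, y_pred_demo, labels=labels)
print(cm)

# The accuracy is the share of predictions on the diagonal.
acc = metrics.accuracy_score(y_true_demo, y_pred_demo)
print(acc)
```

On the real data, the same calls could be made with y_testset and predTree to see where the tree goes wrong.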

# Visualization <a id="4"></a><br>
Let's visualize the tree.

In [None]:
!pip install pydotplus

In [None]:
from io import StringIO # sklearn.externals.six was removed from recent scikit-learn releases
import matplotlib.image as mpimg
from sklearn import tree
%matplotlib inline
import pydotplus

In [None]:
dot_data = StringIO()
filename = "continenttree.png"
# Feature names in the same order as the columns used to build X (everything except "Region")
featureNames = list(df.columns[1:])
# Class names must follow the order of the fitted classes, which classes_ provides
classNames = list(ContinentTree.classes_)
tree.export_graphviz(ContinentTree, feature_names=featureNames, out_file=dot_data, class_names=classNames, filled=True, special_characters=True, rotate=False)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png(filename)
img = mpimg.imread(filename)
plt.figure(figsize=(100, 200))
plt.imshow(img, interpolation='nearest')
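
If installing pydotplus and Graphviz is not an option, scikit-learn can also print a fitted tree as text. The sketch below uses a tiny made-up dataset; `X_demo`, `y_demo`, `demo_tree` and the feature names are assumptions for illustration, not the notebook's data.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny made-up dataset: two features, two classes.
X_demo = [[0.1, 1.0], [0.2, 0.9], [0.8, 0.2], [0.9, 0.1]]
y_demo = ["Asia", "Asia", "Europe", "Europe"]

demo_tree = DecisionTreeClassifier(criterion="entropy", max_depth=5, random_state=0)
demo_tree.fit(X_demo, y_demo)

# export_text renders the fitted tree as indented if/else rules.
rules = export_text(demo_tree, feature_names=["feature_a", "feature_b"])
print(rules)
```

For the notebook's model, the equivalent call would pass ContinentTree and the list of feature column names. sklearn.tree.plot_tree is another option that draws the tree directly with matplotlib.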