### Loading data and libraries

In [18]:
import pandas as pd
tennis = pd.read_csv("tennis.csv")

In [19]:
tennis.head(n=14)

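|   | outlook | temp | humidity | windy | play |
|---|---------|------|----------|-------|------|
| 0 | sunny | hot | high | False | no |
| 1 | sunny | hot | high | True | no |
| 2 | overcast | hot | high | False | yes |
| 3 | rainy | mild | high | False | yes |
| 4 | rainy | cool | normal | False | yes |
| 5 | rainy | cool | normal | True | no |
| 6 | overcast | cool | normal | True | yes |
| 7 | sunny | mild | high | False | no |
| 8 | sunny | cool | normal | False | yes |
| 9 | rainy | mild | normal | False | yes |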


##### We do not need to one-hot encode this data, but the Naive Bayes classifier still expects the independent variables as numerical values. So we map each category to an integer code (0, 1, 2, ...) using the LabelEncoder class.

In [20]:

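# LabelEncoder assigns integer codes in sorted order of the values:
# outlook:  overcast = 0, rainy = 1, sunny = 2
# temp:     cool = 0, hot = 1, mild = 2
# humidity: high = 0, normal = 1
# windy:    False = 0, True = 1

from sklearn.preprocessing import LabelEncoder

# The same encoder is refit on each column, so after this cell it only
# remembers the classes of the last column ('windy').
enc = LabelEncoder()
tennis['outlook'] = enc.fit_transform(tennis['outlook'])
tennis['temp'] = enc.fit_transform(tennis['temp'])
tennis['humidity'] = enc.fit_transform(tennis['humidity'])
tennis['windy'] = enc.fit_transform(tennis['windy'])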


In [21]:
tennis.head(n=14)

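|   | outlook | temp | humidity | windy | play |
|---|---------|------|----------|-------|------|
| 0 | 2 | 1 | 0 | 0 | no |
| 1 | 2 | 1 | 0 | 1 | no |
| 2 | 0 | 1 | 0 | 0 | yes |
| 3 | 1 | 2 | 0 | 0 | yes |
| 4 | 1 | 0 | 1 | 0 | yes |
| 5 | 1 | 0 | 1 | 1 | no |
| 6 | 0 | 0 | 1 | 1 | yes |
| 7 | 2 | 2 | 0 | 0 | no |
| 8 | 2 | 0 | 1 | 0 | yes |
| 9 | 1 | 2 | 1 | 0 | yes |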


##### The encoder assigns codes in sorted (alphabetical) order of the string values, so for outlook, with values overcast, rainy, and sunny, the encoded values are 0, 1, and 2 respectively.
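To make the ordering rule concrete, here is a minimal plain-Python sketch of the same idea: sort the distinct values, then number them from 0. The helper name `label_encode` is hypothetical and is not part of scikit-learn; it only illustrates the mapping described above.

```python
def label_encode(values):
    """Map each distinct value to an integer code, in sorted order."""
    classes = sorted(set(values))  # e.g. ['overcast', 'rainy', 'sunny']
    mapping = {label: code for code, label in enumerate(classes)}
    codes = [mapping[v] for v in values]
    return codes, mapping


codes, mapping = label_encode(["sunny", "overcast", "rainy", "sunny"])
print(mapping)  # {'overcast': 0, 'rainy': 1, 'sunny': 2}
print(codes)    # [2, 0, 1, 2]
```

The codes carry no meaning beyond the sort order: sunny gets 2 only because it comes last alphabetically.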

##### The data is now ready for the Naive Bayes classifier. We first split the data into the independent variables (X) and the class variable (y).

In [22]:
X = tennis.iloc[:,0:4]
y = tennis.iloc[:,4]

##### We use the GaussianNB class to initialize and fit our model.

##### Because tennis has only 14 rows, we use all of them to train the model and create new data points to test it.

In [23]:
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X,y)

GaussianNB(priors=None, var_smoothing=1e-09)
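For intuition about what the fitted model holds, the sketch below reimplements the Gaussian Naive Bayes idea in plain Python: a prior per class, a mean and variance per feature and class, and a prediction that picks the class with the highest log posterior. It is an illustration under assumptions, not scikit-learn's code. It uses only the ten encoded rows displayed above (not all 14), the names `ROWS`, `fit_gaussian_nb`, `log_posterior`, and `predict` are hypothetical, and the smoothing term is an approximation of what `var_smoothing` does.

```python
import math

# The ten encoded rows displayed above: (outlook, temp, humidity, windy), play
ROWS = [
    ((2, 1, 0, 0), "no"), ((2, 1, 0, 1), "no"), ((0, 1, 0, 0), "yes"),
    ((1, 2, 0, 0), "yes"), ((1, 0, 1, 0), "yes"), ((1, 0, 1, 1), "no"),
    ((0, 0, 1, 1), "yes"), ((2, 2, 0, 0), "no"), ((2, 0, 1, 0), "yes"),
    ((1, 2, 1, 0), "yes"),
]


def _mean_var(column):
    """Mean and population variance of a list of numbers."""
    mean = sum(column) / len(column)
    var = sum((v - mean) ** 2 for v in column) / len(column)
    return mean, var


def fit_gaussian_nb(rows, var_smoothing=1e-9):
    """Estimate the class prior and per-feature mean/variance for each class."""
    n_features = len(rows[0][0])
    # Assumed smoothing: a small fraction of the largest overall feature variance
    eps = var_smoothing * max(
        _mean_var([x[j] for x, _ in rows])[1] for j in range(n_features)
    )
    params = {}
    for label in sorted({y for _, y in rows}):
        xs = [x for x, y in rows if y == label]
        stats = [_mean_var([x[j] for x in xs]) for j in range(n_features)]
        params[label] = {
            "prior": len(xs) / len(rows),
            "mean": [m for m, _ in stats],
            "var": [v + eps for _, v in stats],
        }
    return params


def log_posterior(params, x):
    """Unnormalized log posterior of each class for one feature vector."""
    scores = {}
    for label, p in params.items():
        score = math.log(p["prior"])
        for value, mean, var in zip(x, p["mean"], p["var"]):
            # Log of the Gaussian density for this feature, given the class
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        scores[label] = score
    return scores


def predict(params, x):
    """Return the class with the highest log posterior."""
    scores = log_posterior(params, x)
    return max(scores, key=scores.get)


params = fit_gaussian_nb(ROWS)
print(params["yes"]["prior"], params["no"]["prior"])
print(predict(params, [2, 0, 1, 0]))
```

Because each feature is modeled as a Gaussian, the integer codes are treated as points on a number line, so the arbitrary alphabetical ordering of the categories influences the fitted means and variances.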

##### Now we can predict whether tennis will be played under some given conditions.

* When outlook is sunny, temperature is cool, humidity is normal, and windy is false (2,0,1,0)
* When outlook is rainy, temperature is mild, humidity is high, and windy is true (1,2,0,1)
* When outlook is overcast, temperature is hot, humidity is high, and windy is true (0,1,0,1)
* When outlook is rainy, temperature is hot, humidity is normal, and windy is false (1,1,1,0)
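Writing the encoded vectors by hand is easy to get wrong, so a small lookup table can do the translation. This is a sketch only: `ENCODING` and `encode_conditions` are hypothetical names, and the codes are copied from the mapping listed earlier in this notebook.

```python
# Codes copied from the mapping listed earlier in this notebook
ENCODING = {
    "outlook": {"overcast": 0, "rainy": 1, "sunny": 2},
    "temp": {"cool": 0, "hot": 1, "mild": 2},
    "humidity": {"high": 0, "normal": 1},
    "windy": {False: 0, True: 1},
}


def encode_conditions(outlook, temp, humidity, windy):
    """Translate readable conditions into the encoded feature vector."""
    return [
        ENCODING["outlook"][outlook],
        ENCODING["temp"][temp],
        ENCODING["humidity"][humidity],
        ENCODING["windy"][windy],
    ]


print(encode_conditions("sunny", "cool", "normal", False))   # [2, 0, 1, 0]
print(encode_conditions("rainy", "mild", "high", True))      # [1, 2, 0, 1]
print(encode_conditions("overcast", "hot", "high", True))    # [0, 1, 0, 1]
print(encode_conditions("rainy", "hot", "normal", False))    # [1, 1, 1, 0]
```

A misspelled category raises a `KeyError` instead of silently producing a wrong vector.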

In [25]:
print(model.predict([[2,0,1,0]]))
print(model.predict([[1,2,0,1]]))
print(model.predict([[0,1,0,1]]))
print(model.predict([[1,1,1,0]]))

['yes']
['no']
['yes']
['yes']


##### Note that no data point in our training set resembles the 4th test point, yet the model still predicts yes for playing tennis. The data has 2 instances of rainy, normal humidity, and not 