# Chapter py_12 
 Statistics for Data Science and Analytics<br>
by Peter C. Bruce, Peter Gedeck, Janet F. Dobbins

Publisher: Wiley; 1st edition (2024) <br>
<!-- ISBN-13: 978-3031075650 -->

(c) 2024 Peter C. Bruce, Peter Gedeck, Janet F. Dobbins

The code needs to be executed in sequence.

Python packages and Python itself change over time. This can cause warnings or errors. 
"Warnings" are for information only and can usually be ignored. 
"Errors" will stop execution and need to be fixed in order to get results. 

If you come across an issue with the code, please follow these steps:

- Check the repository (https://gedeck.github.io/sdsa-code-solutions/) to see if the code has been updated. This might solve the problem.
- Report the problem using the issue tracker at https://github.com/gedeck/sdsa-code-solutions/issues
- Paste the error message into Google and see if someone else has already found a solution

In [2]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

mower_df = pd.read_csv("RidingMowers.csv")
outcome = "Ownership"
predictors = ["Income", "Lot_Size"]

X = mower_df[predictors]
y = mower_df[outcome]

scaler = StandardScaler()
X_normalized = scaler.fit_transform(X)

In [3]:
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_normalized, y)

In [4]:
new_customer = pd.DataFrame({"Income": 60, "Lot_Size": 20},
                        index=["New customer"])
new_customer_normalized = scaler.transform(new_customer)
pred_class = model.predict(new_customer_normalized)
print(f'Class predicted for the new customer: {pred_class[0]}')

Class predicted for the new customer: Owner


In [5]:
pred_prob = model.predict_proba(new_customer_normalized)
print(f'Class probabilities for the new customer: {pred_prob[0]}')

Class probabilities for the new customer: [0.2 0.8]
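
With the default uniform weights, `predict_proba` of a `KNeighborsClassifier` returns the fraction of the k nearest neighbors that fall in each class, ordered as in `classes_`. The following is a minimal sketch of that relationship on hypothetical toy data (the arrays `X_toy`, `y_toy`, and `query` are made up for illustration and are not taken from `RidingMowers.csv`).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data, not the RidingMowers data set
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0],
                  [1.1, 0.9], [0.9, 1.2], [1.2, 1.1], [0.3, 0.3]])
y_toy = np.array(["Nonowner", "Nonowner", "Nonowner", "Owner",
                  "Owner", "Owner", "Owner", "Nonowner"])

knn = KNeighborsClassifier(n_neighbors=5).fit(X_toy, y_toy)
query = np.array([[0.8, 0.8]])

# Look up the 5 nearest training points and their labels
_, neighbor_idx = knn.kneighbors(query)
neighbor_labels = y_toy[neighbor_idx[0]]

# Fraction of neighbors per class, in the order of knn.classes_
manual_proba = np.array([(neighbor_labels == c).mean() for c in knn.classes_])
sklearn_proba = knn.predict_proba(query)[0]

print(knn.classes_)
print(manual_proba, sklearn_proba)
```

In this toy example, four of the five nearest neighbors are labeled `Owner`, so both calculations give 0.2 and 0.8.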


In [6]:
from sklearn.metrics import accuracy_score
# Accuracy on the training data (same records that were used to fit the model)
accuracy = accuracy_score(mower_df[outcome], model.predict(X_normalized))
print(f'Accuracy: {accuracy:.2f}')

Accuracy: 0.83


In [7]:
from sklearn.pipeline import Pipeline
model = Pipeline(steps=[
    ('normalize', StandardScaler()),
    ('kNN', KNeighborsClassifier(n_neighbors=5))
])
print(model)

Pipeline(steps=[('normalize', StandardScaler()),
                ('kNN', KNeighborsClassifier())])


In [8]:
model.fit(X, y)

In [9]:
pred_class = model.predict(new_customer)
print(f'Class predicted for the new customer: {pred_class[0]}')

Class predicted for the new customer: Owner
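
The pipeline gives the same prediction as the manual approach because it applies the fitted scaler before passing data to the classifier. As a rough check of that equivalence, the sketch below compares both routes on hypothetical toy data (`X_toy`, `y_toy`, and `X_new` are invented values, not the book's data set).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two predictors on different scales
X_toy = np.array([[30.0, 14.0], [45.0, 16.0], [50.0, 15.0], [55.0, 17.0],
                  [60.0, 18.0], [70.0, 19.0], [80.0, 21.0], [85.0, 20.0],
                  [95.0, 22.0], [110.0, 23.0]])
y_toy = np.array(["Nonowner"] * 5 + ["Owner"] * 5)
X_new = np.array([[60.0, 20.0], [40.0, 15.0]])

# Route 1: scale manually, then fit and predict with kNN
scaler_toy = StandardScaler().fit(X_toy)
knn_toy = KNeighborsClassifier(n_neighbors=5)
knn_toy.fit(scaler_toy.transform(X_toy), y_toy)
manual_pred = knn_toy.predict(scaler_toy.transform(X_new))

# Route 2: let the pipeline handle scaling internally
pipe_toy = Pipeline(steps=[
    ('normalize', StandardScaler()),
    ('kNN', KNeighborsClassifier(n_neighbors=5)),
])
pipe_toy.fit(X_toy, y_toy)
pipe_pred = pipe_toy.predict(X_new)

print(np.array_equal(manual_pred, pipe_pred))
```

The practical benefit of the pipeline is that new data never has to be scaled by hand, which removes one common source of mistakes.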


In [10]:
accuracy = accuracy_score(mower_df[outcome], model.predict(X))
print(f'Accuracy: {accuracy:.2f}')

Accuracy: 0.83
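
The accuracy above is measured on the same records that were used to fit the model, so it is likely to be optimistic. One possible way to get a less biased estimate is cross-validation, which a pipeline supports directly because the scaler is refit on each training fold. The sketch below assumes synthetic data from `make_classification` and an arbitrary choice of 5 folds; neither comes from the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 40 records, 2 predictors, 2 classes
X_syn, y_syn = make_classification(n_samples=40, n_features=2,
                                   n_informative=2, n_redundant=0,
                                   random_state=0)

cv_model = Pipeline(steps=[
    ('normalize', StandardScaler()),
    ('kNN', KNeighborsClassifier(n_neighbors=5)),
])

# Each fold is held out in turn; scaling and kNN are fit on the rest
scores = cross_val_score(cv_model, X_syn, y_syn, cv=5, scoring='accuracy')
print(f'Cross-validated accuracy: {scores.mean():.2f}')
```

With the actual data, `X_syn` and `y_syn` would be replaced by `X` and `y`; the number of folds may need adjusting for a small data set.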
