<a href="https://colab.research.google.com/github/richeym-umich/ml-tutorials/blob/main/Machine_Learning_Tutorial_Getting_Started_with_SciKitLearn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

scikit-learn (imported as sklearn) is a widely used Python package for machine learning. It ships with many ready-to-use algorithms, as well as a few small sample datasets. To use the most basic scikit-learn algorithms, the user only needs to import the required packages and load their own data. In this tutorial, we will demonstrate loading and using one of these pre-built algorithms.

# Classification
First, we will demonstrate a classification example. In a classification problem, the user is trying to answer a question whose answer is one of a fixed set of categories. In this example, we will ask whether a baseball team should play a game under certain weather conditions.

We will load data about the weather and temperature conditions of past games, as well as whether the team played each game. We will then train a classifier on this information so that it learns to decide whether or not the team should play. Finally, we will test the classifier by giving it new conditions and seeing how it answers the question.

We are going to use the k-nearest neighbors classifier from scikit-learn, as well as the preprocessing package to put our data in the correct format. First, we must import the necessary packages.

In [None]:
from sklearn import preprocessing
from sklearn.neighbors import KNeighborsClassifier

Next, we need to organize the known data into features. The following lists represent actual weather conditions and whether or not the team played in those conditions. Weather and temperature are our features, while play is our "target", the answer we want the classifier to predict.

In [None]:
# Assign the feature and label variables
# First feature
weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
# Second feature
temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
        'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']

# Label or target variable
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

Next, we need to represent our data numerically so that it can be fed as input to our classifier. A LabelEncoder converts each distinct string into an integer code, assigned in sorted (alphabetical) order of the strings.

In [None]:
# Create the label encoder
le = preprocessing.LabelEncoder()
# Convert string labels into numbers
weather_encoded = le.fit_transform(weather)
print(weather_encoded)
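
To see which number stands for which weather condition, the fitted encoder can be inspected. The following is a minimal sketch; the name weather_le is ours, not part of the tutorial, and a separate encoder is used so the tutorial's le is left untouched.

```python
from sklearn import preprocessing

weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']

# weather_le is an illustrative name for a dedicated weather encoder
weather_le = preprocessing.LabelEncoder()
weather_le.fit(weather)

# classes_ holds the distinct labels in sorted order; each label's index is its code
mapping = {str(label): code for code, label in enumerate(weather_le.classes_)}
print(mapping)  # {'Overcast': 0, 'Rainy': 1, 'Sunny': 2}
```

Because the codes follow sorted order, Overcast becomes 0, Rainy becomes 1, and Sunny becomes 2.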

We convert the remaining feature and the labels into numerical representations using the same label encoder.

In [None]:
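# Convert the remaining feature and the target into numbers.
# Each fit_transform call refits le, so after this cell le only remembers the play mapping.
temp_encoded = le.fit_transform(temp)
label = le.fit_transform(play)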
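
Reusing one encoder works here, but every call to fit_transform replaces the previous mapping. One possible alternative, sketched below, keeps a separate encoder per column so that each mapping stays available. The names temp_le and play_le are illustrative, not part of the tutorial.

```python
from sklearn.preprocessing import LabelEncoder

temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
        'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

# One encoder per column (illustrative names), so no mapping is overwritten
temp_le = LabelEncoder()
play_le = LabelEncoder()
temp_encoded = temp_le.fit_transform(temp)
label = play_le.fit_transform(play)

# Both mappings can still be inspected after encoding
print(temp_le.classes_.tolist())  # ['Cool', 'Hot', 'Mild']
print(play_le.classes_.tolist())  # ['No', 'Yes']
```

With this layout, temp_le can still translate a temperature name into its code later, which a shared encoder could not do once it had been refit on play.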

Then, we combine our features into a single list with one (weather, temp) pair per game. You can combine as many features as you like in this way, and each one contributes to the overall determination.

In [None]:
# Combine weather and temp into a single list of tuples
features = list(zip(weather_encoded, temp_encoded))
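
An equivalent way to build the feature table, shown here only as a sketch, is to stack the encoded columns into a NumPy array. Using NumPy is our assumption; the tutorial itself uses a list of tuples. Each row is one game and each column is one feature, so further features could be appended as extra columns.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
        'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']

weather_encoded = LabelEncoder().fit_transform(weather)
temp_encoded = LabelEncoder().fit_transform(temp)

# Rows are games, columns are features (weather code, temp code)
X = np.column_stack([weather_encoded, temp_encoded])
print(X.shape)  # (14, 2)
print(X[0])     # [2 1], i.e. Sunny and Hot
```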

Now that we have our features ready, we can create and train our model.

In [None]:
model = KNeighborsClassifier(n_neighbors=3)

# Train the model using the training sets
model.fit(features,label)
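
To build intuition for what the classifier does with n_neighbors=3, here is a simplified pure-Python sketch of the idea: measure the distance from a query to every training point, keep the three closest, and take a majority vote. This is not scikit-learn's implementation, and knn_vote is a hypothetical helper. The encoded values are written out by hand from the data above (Overcast=0, Rainy=1, Sunny=2; Cool=0, Hot=1, Mild=2).

```python
from collections import Counter
import math

# Encoded (weather, temp) pairs and the matching play labels from this tutorial
features = [(2, 1), (2, 1), (0, 1), (1, 2), (1, 0), (1, 0), (0, 0),
            (2, 2), (2, 0), (1, 2), (2, 2), (0, 2), (0, 1), (1, 2)]
labels = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
          'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

def knn_vote(query, k=3):
    """Hypothetical helper: majority vote among the k closest training points."""
    # Sort training points by Euclidean distance to the query (ties keep input order)
    ranked = sorted(zip(features, labels), key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

print(knn_vote((0, 2)))  # Overcast and Mild -> Yes
```

Note that several training points are tied at the same distance from this query. This sketch breaks ties by input order, and scikit-learn may pick different tied neighbors, but the majority here is Yes either way.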

Finally, let's use our model to predict whether the team should play in certain conditions.

In [None]:
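# Codes follow LabelEncoder's sorted order: weather 0 = Overcast, temp 2 = Mild
predicted = model.predict([[0, 2]])
# le was last fit on play, so it decodes the prediction back to 'Yes' or 'No'
print("Should the team play in these conditions?: " + str(le.inverse_transform(predicted)))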
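
Writing the query as raw codes such as [0, 2] is easy to get wrong. The sketch below shows one way to query the model by name instead, assuming a separate encoder per column. The encoder names and the should_play helper are hypothetical and not part of the original notebook.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
        'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

# Separate encoders (illustrative names) keep every mapping available for queries
weather_le, temp_le, play_le = LabelEncoder(), LabelEncoder(), LabelEncoder()
features = list(zip(weather_le.fit_transform(weather), temp_le.fit_transform(temp)))
label = play_le.fit_transform(play)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(features, label)

def should_play(weather_name, temp_name):
    """Hypothetical helper: encode a query by name, predict, and decode the answer."""
    query = [[weather_le.transform([weather_name])[0],
              temp_le.transform([temp_name])[0]]]
    return play_le.inverse_transform(model.predict(query))[0]

print(should_play('Overcast', 'Mild'))
```

A condition name that never appeared in the training data makes transform raise an error, which is usually preferable to silently predicting from a wrong code.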