SOSNEP03NB01

<img src="https://raw.githubusercontent.com/microsoft/dataexposed/main/graphics/sosn-white-new-very-small.jpg" alt="Logo" height="150">

## Regression Classification with R

This example uses R and the RevoScaleR `rxLogit` function, a logistic regression comparable to `glm` with a binomial family, to predict a possible medical condition. The data is the built-in `kyphosis` data set from the `rpart` package: 81 observations of four variables (Age, Number, Kyphosis, Start) recorded for children following corrective spinal surgery. The variable Kyphosis reports the absence or presence of this deformity.

[Docs Reference](https://docs.microsoft.com/en-us/machine-learning-server/r/how-to-revoscaler-logistic-regression)

In [12]:
EXECUTE sp_execute_external_script @language = N'R'
    , @script = N'
library(rpart)
dataresults <- rxLogit(Kyphosis ~ Age + Start + Number, data = kyphosis)
print(dataresults)
'
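
For intuition about what a fitted logistic regression produces, the following Python sketch shows how a linear combination of Age, Start, and Number is turned into a probability that Kyphosis is present. The coefficient values are hypothetical and chosen only for illustration; they are not output from `rxLogit`.

```python
import math

def logistic(z):
    """Map a linear predictor (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for illustration only; real values would come
# from the summary printed by the fitted model.
coef = {"(Intercept)": -2.0, "Age": 0.01, "Start": -0.2, "Number": 0.4}

def predict_probability(age, start, number):
    """Probability that the condition is present for one observation."""
    z = (coef["(Intercept)"]
         + coef["Age"] * age
         + coef["Start"] * start
         + coef["Number"] * number)
    return logistic(z)

p = predict_probability(age=60, start=10, number=4)
print(f"Predicted probability: {p:.3f}")
```

A probability above a chosen cutoff (0.5 is a common default) would be read as "present", and anything below it as "absent".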

## Anomaly detection using SVM in Python

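Based on an example from the [scikit-learn Python package](https://scikit-learn.org/stable/auto_examples/svm/plot_separating_hyperplane.html#sphx-glr-auto-examples-svm-plot-separating-hyperplane-py), this cell fits a linear Support Vector Machine (SVM) classifier to two groups of points and plots the maximum-margin separating hyperplane, with the support vectors circled. The same family of models can be applied to anomaly detection through a one-class SVM, as described in the next link.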

Another example: [https://docs.microsoft.com/en-us/machine-learning-server/python-reference/microsoftml/rx-oneclass-svm](https://docs.microsoft.com/en-us/machine-learning-server/python-reference/microsoftml/rx-oneclass-svm)

And another: [https://analyticsindiamag.com/understanding-the-basics-of-svm-with-example-and-python-implementation/](https://analyticsindiamag.com/understanding-the-basics-of-svm-with-example-and-python-implementation/)

In [32]:
EXECUTE sp_execute_external_script @language = N'Python'
    , @script = N'

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs


# we create 40 separable points
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# fit the model, do not regularize for illustration purposes
clf = svm.SVC(kernel="linear", C=1000)
clf.fit(X, y)

plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)

# plot the decision function
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()

# create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)

# plot decision boundary and margins
ax.contour(XX, YY, Z, colors="k", levels=[-1, 0, 1], alpha=0.5,
           linestyles=["--", "-", "--"])
# plot support vectors
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=100,
           linewidth=1, facecolors="none", edgecolors="k")

plt.savefig("SOSONEP04PyPlot01.pdf") 
'
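
The cell above fits a two-class linear SVM. For anomaly detection specifically, one common approach is a one-class SVM, which learns the region occupied by "normal" data and flags points outside it. The following is a minimal sketch, assuming scikit-learn is available in the Python runtime; the data and the parameter values (`nu`, `gamma`) are made up for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(6)

# Synthetic "normal" observations clustered around the origin (made-up data)
X_train = 0.3 * rng.randn(200, 2)

# New observations: two near the cluster, two far away from it
X_new = np.array([[0.0, 0.1], [0.2, -0.1], [4.0, 4.0], [-5.0, 3.0]])

# nu is an upper bound on the fraction of training points treated as outliers
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
detector.fit(X_train)

# predict returns +1 for inliers and -1 for anomalies
labels = detector.predict(X_new)
for point, label in zip(X_new, labels):
    status = "anomaly" if label == -1 else "normal"
    print(point, status)
```

The code uses only double-quoted strings, so it could be pasted into the `@script` parameter of `sp_execute_external_script` without escaping.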

## Decision Trees in Python - Birth result predictions

Using the built-in `infert` dataset, this example calls `rx_fast_trees` from the Microsoft `microsoftml` package, which is an implementation of FastRank. FastRank is an efficient implementation of the MART gradient boosting algorithm. Gradient boosting is a machine learning technique for regression problems. It builds regression trees one at a time in a step-wise fashion, using an arbitrary differentiable loss function to measure the error at each step so that the next tree can correct for it. The resulting prediction model is therefore an ensemble of weaker prediction models.

[Docs Reference](https://docs.microsoft.com/en-us/machine-learning-server/python-reference/microsoftml/rx-fast-trees)

In [3]:
EXECUTE sp_execute_external_script @language = N'Python'
    , @script = N'
import numpy
import pandas
from microsoftml import rx_fast_trees, rx_predict
from revoscalepy.etl.RxDataStep import rx_data_step
from microsoftml.datasets.datasets import get_dataset

infert = get_dataset("infert")

# Import train_test_split from its current location, falling back to the
# module used by scikit-learn releases before 0.18
try:
    from sklearn.model_selection import train_test_split
except ImportError:
    from sklearn.cross_validation import train_test_split

infertdf = infert.as_df()
infertdf["isCase"] = infertdf.case == 1
data_train, data_test, y_train, y_test = train_test_split(infertdf, infertdf.isCase)

trees_model = rx_fast_trees(
    formula=" isCase ~ age + parity + education + spontaneous + induced ",
    data=data_train)
    
# RuntimeError: The type (RxTextData) for file is not supported.
score_ds = rx_predict(trees_model, data=data_test,
                     extra_vars_to_write=["isCase", "Score"])
                     
# Print the first five rows
print(rx_data_step(score_ds, number_rows_read=5))
'
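
The step-wise idea behind gradient boosting can be sketched in a few lines of NumPy. This is not how `rx_fast_trees` is implemented: it is a simplified illustration that assumes a squared-error loss, a single feature, and depth-1 trees (stumps), with made-up data. Each round fits a stump to the current residuals and adds a damped version of it to the ensemble.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single-threshold split on x that best fits the residuals."""
    best = None
    for threshold in np.unique(x)[:-1]:  # skip the max so both sides are non-empty
        left = residual[x <= threshold]
        right = residual[x > threshold]
        error = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def predict_stump(stump, x):
    """Return the left or right leaf value for each element of x."""
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting for squared-error loss using depth-1 trees."""
    base = y.mean()
    prediction = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_trees):
        # For the loss 0.5 * (y - f)^2 the negative gradient is the residual
        residual = y - prediction
        stump = fit_stump(x, residual)
        prediction += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, prediction

# Made-up regression data: a noisy sine curve
rng = np.random.RandomState(0)
x = np.linspace(0, 6, 80)
y = np.sin(x) + 0.1 * rng.randn(80)

base, stumps, fitted = boost(x, y)
mse_start = np.mean((y - base) ** 2)
mse_end = np.mean((y - fitted) ** 2)
print(f"MSE before boosting: {mse_start:.4f}, after: {mse_end:.4f}")
```

The learning rate trades off how much each tree contributes against how many trees are needed; smaller values usually call for more rounds.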

## Naïve Bayes in Data Mining in R

As an ongoing promotional strategy, the marketing department of the Adventure Works Cycles company has decided to target potential customers by mailing out fliers. To reduce costs, they want to send fliers only to those customers who are likely to respond. The company stores demographic information and the response to a previous mailing in a database. They want to use this data to see how demographics such as age and location can help predict the response to a promotion, by comparing potential customers to customers who have similar characteristics and who have purchased from the company in the past. Specifically, they want to see the differences between customers who bought a bicycle and customers who did not.

The code cell in this section demonstrates the Naive Bayes algorithm on the built-in R `iris` dataset rather than on the Adventure Works data.

[Docs Reference](https://docs.microsoft.com/en-us/analysis-services/data-mining/microsoft-naive-bayes-algorithm?view=asallproducts-allversions)

In [None]:
EXECUTE sp_execute_external_script @language = N'R'
    , @script = N'
# Required Packages
# install.packages("e1071")
# install.packages("caTools")
# install.packages("caret")
  
# Loading package
library(e1071)
library(caTools)
library(caret)
  
# Split data into training and test sets
set.seed(120)  # Set the seed so the split is reproducible
split <- sample.split(iris$Species, SplitRatio = 0.7)
train_cl <- subset(iris, split == TRUE)
test_cl <- subset(iris, split == FALSE)
  
# Feature scaling (shown for illustration; the model below is fit on the unscaled data)
train_scale <- scale(train_cl[, 1:4])
test_scale <- scale(test_cl[, 1:4])
  
# Fit a Naive Bayes model
# to the training dataset
classifier_cl <- naiveBayes(Species ~ ., data = train_cl)
print(classifier_cl)
  
# Predict using test data
y_pred <- predict(classifier_cl, newdata = test_cl)
  
# Create a Confusion Matrix
cm <- table(test_cl$Species, y_pred)
print(cm)
  
# Evaluate the model
print(confusionMatrix(cm))
'
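
To show what a Naive Bayes classifier computes for numeric predictors, here is a small from-scratch sketch in Python using NumPy. It assumes a Gaussian distribution for each feature within each class and treats features as independent given the class. The two-feature "buyer" and "non-buyer" data is made up for illustration and is not the Adventure Works data.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate the prior, feature means, and feature variances per class."""
    model = {}
    for label in np.unique(y):
        rows = X[y == label]
        # A tiny constant keeps the variance strictly positive
        model[label] = (len(rows) / len(X), rows.mean(axis=0), rows.var(axis=0) + 1e-9)
    return model

def predict_gaussian_nb(model, X):
    """Pick the class with the highest log posterior for each row of X."""
    labels = list(model)
    scores = []
    for label in labels:
        prior, mean, var = model[label]
        # Sum of per-feature Gaussian log densities (the "naive" independence step)
        log_likelihood = -0.5 * (np.log(2 * np.pi * var) + (X - mean) ** 2 / var).sum(axis=1)
        scores.append(np.log(prior) + log_likelihood)
    return np.array(labels)[np.argmax(np.array(scores), axis=0)]

# Made-up data: two features for two well-separated groups
rng = np.random.RandomState(120)
X = np.vstack([rng.normal([30, 2], 3, size=(50, 2)),
               rng.normal([55, 10], 3, size=(50, 2))])
y = np.array(["buyer"] * 50 + ["non-buyer"] * 50)

model = fit_gaussian_nb(X, y)
predicted = predict_gaussian_nb(model, X)
accuracy = (predicted == y).mean()
print(f"Training accuracy: {accuracy:.2f}")
```

Working in log space avoids numerical underflow when many small per-feature probabilities are multiplied together.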

## Neural Networks preview

[Galaxy classification with neural networks: a data science workflow](https://docs.microsoft.com/en-us/archive/blogs/mlserver/galaxy-classification-with-neural-networks-a-data-science-workflow)