# Sample R Notebook for dashDB Machine Learning - Naive Bayes

Before running the notebook, insert a credentials cell here. To do so, click "Find and Add Data" at the top right of the screen, select "Connection", and then select "Insert to code" for the dashDB system of your choice. Make sure you have set up a dashDB connection in your project beforehand.
<div> <img width = 370 height =286 src="https://ibm.box.com/shared/static/yc0airtlenm9ezywk3pigr453gkz3u1w.png"> </div>

Next, the ibmdbR push-down library for dashDB is loaded. It translates R data frame operations into SQL statements and machine learning routines that are executed inside dashDB.

In [None]:
# Load the ibmdbR package and make a connection
library(ibmdbR)
con <- idaConnect(paste("DASHDB", credentials_1["dsn"], sep=";"),'','')
idaInit(con)

### Creating a proxy data frame
Create an ida (in-database analytics) data frame as a proxy for the SAMPLES.CUSTOMER_CHURN sample table. The data itself remains in dashDB.
Then print a small sample of the data in that table.

In [None]:
churnDf <- ida.data.frame("SAMPLES.CUSTOMER_CHURN")
head(churnDf, 10)

The data in this table can be used to train a churn prediction model by analyzing how two variables correlate with churn: 1. whether the customer is in a business-to-business industry (IN_B2B_INDUSTRY), and 2. whether the number of products the customer bought (TOTAL_BUY) was less than three.

### Perform a few data transformations
These transformations are transient and are not (and do not need to be) written back to the database.
Then print a sample of the transformed data frame again.

In [None]:
# The CENSOR field is encoded as 0 or 1. Transform this to a string ('nochurn' or 'churn').
churnDf$CHURN <- ifelse(churnDf$CENSOR=='1','nochurn','churn')

# The IN_B2B_INDUSTRY field is also encoded as 0 or 1. Transform this to a string ('no' or 'yes').
churnDf$IN_B2B <- ifelse(churnDf$IN_B2B_INDUSTRY=='1','yes','no')

# Transform the value of the TOTAL_BUY field to a discrete value ('threeormore' or 'lessthanthree').
churnDf$TOTALBUY <- ifelse(churnDf$TOTAL_BUY>2,'threeormore','lessthanthree')

head(churnDf)
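
To see what the three recoding steps produce without a dashDB connection, the following sketch applies the same `ifelse()` logic to an ordinary in-memory data frame. The column names mirror the sample table, but the rows are invented for illustration and `toy` is a hypothetical name. The comparison uses the number 1 rather than the string '1', which is an assumption about how the values arrive locally.

```r
# Hypothetical rows: column names mirror SAMPLES.CUSTOMER_CHURN, values are invented.
toy <- data.frame(
  CUST_ID         = 1:4,
  CENSOR          = c(1, 0, 1, 0),
  IN_B2B_INDUSTRY = c(1, 1, 0, 0),
  TOTAL_BUY       = c(5, 1, 2, 3)
)

# Same recoding as in the notebook cell, applied locally instead of in-database.
toy$CHURN    <- ifelse(toy$CENSOR == 1, "nochurn", "churn")
toy$IN_B2B   <- ifelse(toy$IN_B2B_INDUSTRY == 1, "yes", "no")
toy$TOTALBUY <- ifelse(toy$TOTAL_BUY > 2, "threeormore", "lessthanthree")

# Show only the derived columns next to the customer ID.
print(toy[, c("CUST_ID", "CHURN", "IN_B2B", "TOTALBUY")])
```

Note that a TOTAL_BUY of exactly 2 falls into 'lessthanthree', while 3 falls into 'threeormore'.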

### Now train a Naive Bayes classification model for churn/nochurn prediction

In [None]:
nb <- idaNaiveBayes(CHURN~IN_B2B+TOTALBUY,churnDf,"CUST_ID", modelname='customer_churn_predictor')

# Print the model
print(nb)
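
For intuition about what a Naive Bayes model over these two categorical predictors computes, here is a minimal base R sketch on invented rows. It is not the in-database implementation behind `idaNaiveBayes`, which may differ in details such as smoothing. The names `train` and `nb_posterior` are hypothetical.

```r
# Hypothetical training rows (invented values) with the same three columns.
train <- data.frame(
  CHURN    = c("churn", "churn", "churn",
               "nochurn", "nochurn", "nochurn", "nochurn", "nochurn"),
  IN_B2B   = c("no", "no", "yes",
               "yes", "yes", "no", "yes", "yes"),
  TOTALBUY = c("lessthanthree", "lessthanthree", "threeormore",
               "threeormore", "threeormore", "threeormore",
               "lessthanthree", "threeormore")
)

# Class priors P(CHURN) as a plain named vector.
class_counts <- table(train$CHURN)
prior <- setNames(as.numeric(class_counts) / sum(class_counts),
                  names(class_counts))

# Per-class conditionals P(feature value | CHURN); each row sums to 1.
p_b2b <- prop.table(table(train$CHURN, train$IN_B2B), margin = 1)
p_buy <- prop.table(table(train$CHURN, train$TOTALBUY), margin = 1)

# Posterior for one customer: prior times the product of the conditionals,
# normalised over the classes. No smoothing is applied in this sketch.
nb_posterior <- function(in_b2b, totalbuy) {
  score <- prior * p_b2b[, in_b2b] * p_buy[, totalbuy]
  score / sum(score)
}

post <- nb_posterior("no", "lessthanthree")
print(round(post, 3))
print(names(which.max(post)))
```

The "naive" part is the product of per-feature conditionals, which treats IN_B2B and TOTALBUY as independent given the class.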

### List stored ML models in dashDB
Check that the newly created ML model appears in the list.

In [None]:
idaListModels()

### Run a test prediction
For simplicity, we use the same data frame as for training. The results are written to a table in dashDB, which is represented by the returned data frame object.

Then print a sample of the predicted result table.

In [None]:
# Use the predict method to make predictions
preds <- predict(nb,churnDf,"CUST_ID")

head(preds)
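
Once predictions are matched with the known labels, a confusion matrix and an accuracy figure are a natural next check. The sketch below uses two invented, already aligned label vectors in local memory. How to join the prediction table back to the CHURN column depends on the column names of the result table, which are not assumed here. Because the notebook predicts on the training data, any accuracy obtained this way is likely to be optimistic.

```r
# Hypothetical actual and predicted labels (invented), aligned by customer.
lv        <- c("churn", "nochurn")
actual    <- factor(c("churn", "churn", "nochurn", "nochurn", "nochurn", "churn"),
                    levels = lv)
predicted <- factor(c("churn", "nochurn", "nochurn", "nochurn", "churn", "churn"),
                    levels = lv)

# Rows are the true labels, columns the predicted labels.
confusion <- table(actual = actual, predicted = predicted)
print(confusion)

# Accuracy is the share of cases on the diagonal.
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

Fixing the factor levels up front keeps the matrix square even if one class never appears among the predictions.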

In [None]:
# Close the connection to the database
idaClose(con)