# Product revenue prediction with the SPSS Machine Learning API

This Scala 2.10 notebook shows you how to create a predictive model by using IBM SPSS algorithms on Apache Spark v1.6. You'll learn how to create an SPSS generalized linear (linear regression) model by using the IBM SPSS Machine Learning API, and how to view the model with IBM SPSS Model Viewer.

In this notebook, you'll build a model on Google Analytics data to predict product revenue, and review the summary statistics of that model. The dataset contains transactional data from an e-commerce site, collected through Google Analytics.
    
To get the most out of this notebook, you should have some familiarity with the Scala programming language.

## Contents 
This notebook contains the following main sections:

1. [Load the web analytics data with the DSX "Find and add Data" tab.](#overview)
1. [Build your model.](#prepare)
1. [Export the XML files (PMML, StatXML) for more detailed statistics.](#analyze)
1. [Use the visualization tool to view the model.](#view)

<a id="overview"></a>
## 1. Load the web analytics data with the DSX "Find and add Data" tab.

From the available Google Analytics dataset, we collected the variables needed to develop the predictive model:

    yitemrevenue: item revenue, in Rs
    xcartadd: number of instances added to cart
    xcartuniqadd: number of unique instances added to cart
    xcartaddtotalrs: total Rs value of products after they are added to cart
    xcartremove: number of instances removed from cart
    xcardtremovetotal: total number of instances removed from cart
    xcardtremovetotalrs: total Rs value after instances are removed from cart
    xproductviews: number of page views
    xuniqprodview: number of unique product views
    xprodviewinrs: Rs value at total number of page views
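
The loading cell that follows was removed for sharing, so its code is not reproduced here. As a minimal sketch of the record layout only, the ten variables above can be modelled as a plain Scala case class. `ProductRow` and `ProductRowParser` are hypothetical names, the field types are assumptions inferred from the sample output, and the notebook itself works with a Spark DataFrame rather than this class.

```scala
// Hypothetical row type mirroring the ten variables listed above.
// Field types are assumptions: revenue is fractional, the rest look like whole numbers.
case class ProductRow(
  yitemrevenue: Double,
  xcartadd: Long,
  xcartuniqadd: Long,
  xcartaddtotalrs: Long,
  xcartremove: Long,
  xcardtremovetotal: Long,
  xcardtremovetotalrs: Long,
  xproductviews: Long,
  xuniqprodview: Long,
  xprodviewinrs: Long
)

object ProductRowParser {
  // Parses one comma-separated record whose fields follow the column order above.
  def parse(line: String): ProductRow = {
    val f = line.split(",").map(_.trim)
    require(f.length == 10, "expected 10 fields but found " + f.length)
    ProductRow(f(0).toDouble, f(1).toLong, f(2).toLong, f(3).toLong, f(4).toLong,
      f(5).toLong, f(6).toLong, f(7).toLong, f(8).toLong, f(9).toLong)
  }

  // Values taken from the first row of the sample output in this section.
  val firstRow: ProductRow =
    parse("39215.93, 614, 503, 752186, 10, 10, 11990, 24306, 20498, 299")
}
```

`ProductRowParser.firstRow.yitemrevenue` is the response value for that record; the remaining nine fields are the explanatory values.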

In [9]:
// The code was removed by DSX for sharing.

+------------+--------+------------+---------------+-----------+-----------------+-------------------+-------------+-------------+-------------+
|yitemrevenue|xcartadd|xcartuniqadd|xcartaddtotalrs|xcartremove|xcardtremovetotal|xcardtremovetotalrs|xproductviews|xuniqprodview|xprodviewinrs|
+------------+--------+------------+---------------+-----------+-----------------+-------------------+-------------+-------------+-------------+
|    39215.93|     614|         503|         752186|         10|               10|              11990|        24306|        20498|          299|
|    23819.47|     833|         622|         425667|          8|                5|               3992|        11171|         8718|          571|
|     4415.45|     122|         101|         121878|          2|                2|               1998|         6926|         6017|      6919074|
|    57435.56|     239|         196|         609761|          2|                2|               4998|        11250|         9733|

<a id="prepare"></a>
## 2. Build your model.

After collecting the necessary data, we are ready to develop the predictive model. Regression modeling requires a response variable and one or more explanatory variables. The response variable is the quantity being predicted (here, `yitemrevenue`); the explanatory variables are the inputs used to predict it. In the next cell, we build the model with the SPSS generalized linear regression tool.
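
With a Normal distribution and an identity link, as used in this notebook, the prediction is simply the intercept plus a weighted sum of the explanatory variables. The sketch below illustrates that relationship in plain Scala, using the rounded coefficients that the Model Viewer reports in section 4 and the first data row from section 1. `LinearPredictorSketch` is a hypothetical name and this is not how the SPSS API computes predictions internally; it only shows the arithmetic.

```scala
object LinearPredictorSketch {
  // Rounded estimates copied from the "B" column of the parameter table in section 4.
  val intercept: Double = -355.587
  val coefficients: Seq[(String, Double)] = Seq(
    "xcartadd" -> 89.550,
    "xcartuniqadd" -> -161.128,
    "xcartaddtotalrs" -> 0.101,
    "xcartremove" -> 597.766,
    "xcardtremovetotal" -> -1096.090,
    "xcardtremovetotalrs" -> -0.291,
    "xproductviews" -> 6.797,
    "xuniqprodview" -> -7.940,
    "xprodviewinrs" -> 0.001
  )

  // Identity link: the prediction equals the linear predictor itself.
  def predict(row: Map[String, Double]): Double =
    intercept + coefficients.map { case (name, b) => b * row(name) }.sum

  // Explanatory values from the first row of the sample output in section 1.
  val firstRow: Map[String, Double] = Map(
    "xcartadd" -> 614.0,
    "xcartuniqadd" -> 503.0,
    "xcartaddtotalrs" -> 752186.0,
    "xcartremove" -> 10.0,
    "xcardtremovetotal" -> 10.0,
    "xcardtremovetotalrs" -> 11990.0,
    "xproductviews" -> 24306.0,
    "xuniqprodview" -> 20498.0,
    "xprodviewinrs" -> 299.0
  )

  // Approximately 43533.25 with these rounded coefficients.
  val firstRowPrediction: Double = predict(firstRow)
}
```

The fitted model reports 43788.41 for the same row. The gap of under 1% comes from the coefficients being rounded to three decimals in the viewer output; for predictors with very large values, such as `xprodviewinrs` in the third sample row, that rounding can distort a hand calculation much more.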

In [31]:
import com.ibm.spss.ml.classificationandregression.GeneralizedLinear
import com.ibm.spss.ml.classificationandregression.params.Effect

// Configure a generalized linear model. A Normal distribution with an
// identity link corresponds to ordinary linear regression.
// One effect is declared for each explanatory variable.
val linearRegression = GeneralizedLinear().
  setInputFieldList(Array("xcartadd", "xcartuniqadd", "xcartaddtotalrs", "xcartremove", "xcardtremovetotal", "xcardtremovetotalrs", "xproductviews", "xuniqprodview", "xprodviewinrs")).
  setTargetField("yitemrevenue").
  setDistribution("NORMAL").
  setLinkFunction("IDENTITY").
  setEffects(List(
    Effect(List("xcartadd"), List(0)),
    Effect(List("xcartuniqadd"), List(0)),
    Effect(List("xcartaddtotalrs"), List(0)),
    Effect(List("xcartremove"), List(0)),
    Effect(List("xcardtremovetotal"), List(0)),
    Effect(List("xcardtremovetotalrs"), List(0)),
    Effect(List("xproductviews"), List(0)),
    Effect(List("xuniqprodview"), List(0)),
    Effect(List("xprodviewinrs"), List(0))
  ))

// dfData1 is the input DataFrame; the cell that created it was removed for sharing.
val linearRegressionModel = linearRegression.fit(dfData1)

// Score the same data; the result includes a "prediction" column.
val predictions1 = linearRegressionModel.transform(dfData1)

// Cast the prediction column to double before displaying the results.
val predResultNew = predictions1.withColumn("prediction", predictions1("prediction").cast("double"))
predResultNew.show()

+------------+--------+------------+---------------+-----------+-----------------+-------------------+-------------+-------------+-------------+------------------+
|yitemrevenue|xcartadd|xcartuniqadd|xcartaddtotalrs|xcartremove|xcardtremovetotal|xcardtremovetotalrs|xproductviews|xuniqprodview|xprodviewinrs|        prediction|
+------------+--------+------------+---------------+-----------+-----------------+-------------------+-------------+-------------+-------------+------------------+
|    39215.93|     614|         503|         752186|         10|               10|              11990|        24306|        20498|          299|43788.412597407325|
|    23819.47|     833|         622|         425667|          8|                5|               3992|        11171|         8718|          571|22004.207857393754|
|     4415.45|     122|         101|         121878|          2|                2|               1998|         6926|         6017|      6919074| 8171.245497375272|
|    57435.56|  
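
To get a rough sense of how close the predictions are to the observed revenue, you can compare the two columns directly. The following is a minimal sketch in plain Scala that uses only the three complete rows visible in the output above, so the numbers say nothing about the full dataset of 92 records. `ResidualSketch` is a hypothetical name.

```scala
object ResidualSketch {
  // (actual yitemrevenue, prediction) pairs copied from the three complete rows above.
  val rows: Seq[(Double, Double)] = Seq(
    (39215.93, 43788.412597407325),
    (23819.47, 22004.207857393754),
    (4415.45, 8171.245497375272)
  )

  // Residual = observed value minus predicted value.
  val residuals: Seq[Double] = rows.map { case (actual, predicted) => actual - predicted }

  // Mean absolute error: the average size of the residuals, in Rs.
  val meanAbsoluteError: Double =
    residuals.map(r => math.abs(r)).sum / residuals.size

  // Root mean squared error: penalises large residuals more heavily.
  val rootMeanSquaredError: Double =
    math.sqrt(residuals.map(r => r * r).sum / residuals.size)
}
```

For these three rows, the mean absolute error is about Rs 3,381. The same calculation could be applied to every row of `predResultNew` to summarise the fit over the whole dataset.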

<a id="analyze"></a>
## 3. Export the XML files (PMML, StatXML) for more detailed statistics.
By exporting your results to formats such as Predictive Model Markup Language (PMML) or StatXML, you can share your statistical analyses outside of IBM Data Science Experience.

In [40]:
import java.io.PrintWriter

// Write the fitted model as PMML.
linearRegressionModel.toPMML("lRM_pmml.xml")

// Retrieve the StatXML document and write it to a file,
// closing the writer even if the write fails.
val statXML = linearRegressionModel.statXML()
val writer = new PrintWriter("StatXML.xml")
try writer.write(statXML) finally writer.close()

<a id="view"></a>
## 4. Use the visualization tool to view the model.

SPSS visualizations offer interactive tables and charts to help you evaluate and improve a predictive analytics model.

These SPSS visualizations provide one comprehensive set of output so that you don't need to create multiple charts and tables to determine model performance.


In [41]:
import com.ibm.spss.scala.ModelViewer

val html = ModelViewer.toHTML(linearRegressionModel)
kernel.magics.html(html)
 

| Item | Value |
|---|---|
| Target Field | yitemrevenue |
| Probability Distribution | Normal |
| Link Function | Identity |
| Model Building Method | Forced Entry |
| Number of Predictors Input | 9 |
| Number of Predictors in Final Model | 9 |
| Model Type | Linear Regression |
| Log Likelihood [1] | -675965840.991 |
| Deviance: Value | 1351931512.897 |
| Deviance: df | 82 |

[1] The full log likelihood function is displayed and used in computing information criteria.

| Records | Number | Percent |
|---|---|---|
| Included | 92 | 100.0 |
| Excluded | 0 | 0.0 |
| Total | 92 | 100.0 |

| Source | Type III Wald Chi-Square | Type III df | Type III Sig. |
|---|---|---|---|
| (Intercept) | 2328556.082 | 1 | 0.0 |
| xcartadd | 9869225.45 | 1 | 0.0 |
| xcartuniqadd | 17052520.102 | 1 | 0.0 |
| xcartaddtotalrs | 939438120.995 | 1 | 0.0 |
| xcartremove | 21114595.27 | 1 | 0.0 |
| xcardtremovetotal | 31196978.479 | 1 | 0.0 |
| xcardtremovetotalrs | 19597501.972 | 1 | 0.0 |
| xproductviews | 7545065.387 | 1 | 0.0 |
| xuniqprodview | 7973458.963 | 1 | 0.0 |
| xprodviewinrs | 22149177.089 | 1 | 0.0 |

| Parameter | B | Std. Error | 95% Wald Confidence Interval: Lower | 95% Wald Confidence Interval: Upper | Hypothesis Test: Wald Chi-Square | Hypothesis Test: df | Hypothesis Test: Sig. |
|---|---|---|---|---|---|---|---|
| (Intercept) | -355.587 | 0.233 | -356.044 | -355.13 | 2328556.082 | 1.0 | <0.0001 |
| xcartadd | 89.550 | 0.029 | 89.494 | 89.606 | 9869225.45 | 1.0 | <0.0001 |
| xcartuniqadd | -161.128 | 0.039 | -161.204 | -161.052 | 17052520.102 | 1.0 | <0.0001 |
| xcartaddtotalrs | 0.101 | 0.0 | 0.101 | 0.101 | 939438120.995 | 1.0 | <0.0001 |
| xcartremove | 597.766 | 0.13 | 597.511 | 598.021 | 21114595.27 | 1.0 | <0.0001 |
| xcardtremovetotal | -1096.090 | 0.196 | -1096.474 | -1095.705 | 31196978.479 | 1.0 | <0.0001 |
| xcardtremovetotalrs | -0.291 | 0.0 | -0.292 | -0.291 | 19597501.972 | 1.0 | <0.0001 |
| xproductviews | 6.797 | 0.002 | 6.792 | 6.802 | 7545065.387 | 1.0 | <0.0001 |
| xuniqprodview | -7.940 | 0.003 | -7.945 | -7.934 | 7973458.963 | 1.0 | <0.0001 |
| xprodviewinrs | 0.001 | 0.0 | 0.001 | 0.001 | 22149177.089 | 1.0 | <0.0001 |
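
The columns of the parameter table are related by standard Wald formulas: for a single parameter, the Wald chi-square is the squared ratio of the estimate to its standard error, and the 95% interval is the estimate plus or minus about 1.96 standard errors. As a hedged check, the sketch below recomputes both for the intercept row from the rounded `B` and `Std. Error` values. `WaldSketch` is a hypothetical name, and the results only approximate the table because the standard error is printed to three decimals.

```scala
object WaldSketch {
  // 97.5th percentile of the standard normal distribution (two-sided 95% interval).
  val z975: Double = 1.959964

  // Wald chi-square for one parameter: (estimate / standard error) squared.
  def waldChiSquare(b: Double, se: Double): Double = {
    val z = b / se
    z * z
  }

  // 95% Wald confidence interval: estimate plus or minus z * standard error.
  def confidenceInterval(b: Double, se: Double): (Double, Double) =
    (b - z975 * se, b + z975 * se)

  // Intercept row, using B and Std. Error as printed in the table.
  val interceptB: Double = -355.587
  val interceptSe: Double = 0.233

  val (lower, upper) = confidenceInterval(interceptB, interceptSe)
  val wald: Double = waldChiSquare(interceptB, interceptSe)
}
```

Rounded to three decimals, `lower` and `upper` agree with the reported interval of -356.044 to -355.13. `wald` lands within about 0.1% of the reported 2328556.082; the remaining difference is consistent with rounding of the standard error.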
