# StandardScaler
Standardizes features by scaling to unit variance and/or removing the mean using column summary statistics on the samples in the training set. This is a very common pre-processing step.  

For example, the RBF kernel of Support Vector Machines and the L1- and L2-regularized linear models typically work better when all features have unit variance and/or zero mean.  

Standardization can improve the convergence rate during the optimization process, and it also prevents features with very large variances from exerting an overly large influence during model training.  
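
To make the transformation concrete, here is a minimal plain-Scala sketch, independent of Spark, that computes per-column statistics for a toy dataset and rescales it. The names `StandardizeSketch`, `columnMeans`, `columnStds`, and `standardize` are illustrative and not part of any library. The sketch assumes the sample standard deviation (dividing by n - 1) and that no column is constant.

```scala
object StandardizeSketch {
  // Per-column mean of a non-empty, rectangular dataset.
  def columnMeans(rows: Seq[Array[Double]]): Array[Double] =
    Array.tabulate(rows.head.length)(j => rows.map(r => r(j)).sum / rows.length)

  // Per-column sample standard deviation (assumption: divide by n - 1).
  def columnStds(rows: Seq[Array[Double]], means: Array[Double]): Array[Double] =
    Array.tabulate(means.length) { j =>
      val sumSq = rows.map(r => (r(j) - means(j)) * (r(j) - means(j))).sum
      math.sqrt(sumSq / (rows.length - 1))
    }

  // Subtract each column's mean, then divide by its standard deviation.
  // Assumes every column has non-zero variance.
  def standardize(rows: Seq[Array[Double]]): Seq[Array[Double]] = {
    val means = columnMeans(rows)
    val stds = columnStds(rows, means)
    rows.map(r => Array.tabulate(r.length)(j => (r(j) - means(j)) / stds(j)))
  }

  def main(args: Array[String]): Unit = {
    // Toy data: the second column is on a 10x larger scale than the first.
    val toyRows = Seq(Array(1.0, 10.0), Array(2.0, 20.0), Array(3.0, 30.0))
    standardize(toyRows).foreach(r => println(r.mkString("[", ", ", "]")))
  }
}
```

On this toy input both columns come out as -1.0, 0.0, 1.0, so the difference in scale between them no longer matters to a downstream model.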
## Model Fitting
StandardScaler has the following parameters in the constructor:  


* withMean: false by default. Centers the data with the mean before scaling. This builds a dense output, so it does not work on sparse input and will raise an exception.
* withStd: true by default. Scales the data to unit standard deviation.  
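
The effect of the two flags, and the reason centering forces a dense result, can be illustrated without Spark. The sketch below is only an illustration: `applyFlags` is a hypothetical helper, and the `mean` and `std` arrays are made-up statistics rather than values learned from data.

```scala
object FlagsSketch {
  // Hypothetical helper: apply the optional centering and scaling steps to one row.
  // Assumes all entries of std are non-zero.
  def applyFlags(row: Array[Double], mean: Array[Double], std: Array[Double],
                 withMean: Boolean, withStd: Boolean): Array[Double] =
    row.indices.map { j =>
      val centered = if (withMean) row(j) - mean(j) else row(j)
      if (withStd) centered / std(j) else centered
    }.toArray

  def main(args: Array[String]): Unit = {
    val mean = Array(0.5, 2.0, 4.0)        // made-up column means
    val std = Array(0.5, 2.0, 4.0)         // made-up column standard deviations
    val mostlyZero = Array(0.0, 0.0, 8.0)  // a row that would typically be stored sparsely

    // Scaling only: zeros stay zero, so sparsity is preserved.
    println(applyFlags(mostlyZero, mean, std, withMean = false, withStd = true).mkString(", "))
    // Centering as well: every zero becomes a non-zero value, so the row is dense.
    println(applyFlags(mostlyZero, mean, std, withMean = true, withStd = true).mkString(", "))
  }
}
```

Dividing by the standard deviation maps 0.0 to 0.0, whereas subtracting a non-zero mean does not, which is why only the centering step is incompatible with sparse input.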

We provide a fit method in StandardScaler that takes an input of RDD[Vector], learns the summary statistics, and returns a model that can transform the input dataset into unit standard deviation and/or zero mean features, depending on how we configure the StandardScaler.  

This model implements VectorTransformer, which can apply the standardization to a Vector to produce a transformed Vector, or to an RDD[Vector] to produce a transformed RDD[Vector].  
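
As a rough sketch of both transform overloads, the following standalone program fits a model on a tiny in-memory RDD and applies it to a single Vector and to the whole RDD. The local master setting, the application name, the helper `fitAndTransform`, and the toy values are assumptions made for illustration. In a notebook or spark-shell you would reuse the existing `sc` instead of creating a new context.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

object TransformSketch {
  // Hypothetical helper: fit on the given rows, then transform one Vector and the full RDD.
  def fitAndTransform(rows: RDD[Vector], single: Vector): (Vector, Array[Vector]) = {
    val model = new StandardScaler(withMean = true, withStd = true).fit(rows)
    val one = model.transform(single)          // Vector => Vector
    val all = model.transform(rows).collect()  // RDD[Vector] => RDD[Vector], collected for printing
    (one, all)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: run locally with a fresh context.
    val conf = new SparkConf().setAppName("TransformSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rows = sc.parallelize(Seq(
        Vectors.dense(1.0, 10.0),
        Vectors.dense(2.0, 20.0),
        Vectors.dense(3.0, 30.0)))
      val (one, all) = fitAndTransform(rows, Vectors.dense(2.0, 20.0))
      println(one)
      all.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The input here is dense on purpose: with withMean = true, sparse vectors would need to be converted to dense ones before calling transform.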

Note that if the variance of a feature is zero, the model returns the default value 0.0 in the Vector for that feature.
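
The zero-variance rule can be sketched in plain Scala. This illustrates the behavior described above and is not Spark's actual implementation; `scaleColumnValue` is a hypothetical helper name.

```scala
object ZeroVarianceSketch {
  // Hypothetical helper: scale one value by its column's standard deviation,
  // returning 0.0 when the column is constant (std == 0.0) instead of dividing by zero.
  def scaleColumnValue(value: Double, std: Double): Double =
    if (std != 0.0) value / std else 0.0

  def main(args: Array[String]): Unit = {
    println(7.0 / 0.0)                   // unguarded division by a zero std is not finite
    println(scaleColumnValue(7.0, 0.0))  // guarded: a constant feature maps to 0.0
    println(scaleColumnValue(7.0, 2.0))  // an ordinary feature is scaled as usual
  }
}
```

In the example output later in this section, many entries of std are 0.0; those features fall under this rule.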
## Example
The example below demonstrates how to load a dataset in libsvm format, and standardize the features so that the new features have unit standard deviation and/or zero mean.

import org.apache.spark.SparkContext._
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.feature.StandardScalerModel
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.util.MLUtils

val PATH = "file:///Users/lzz/work/SparkML/"

val data = MLUtils.loadLibSVMFile(sc, PATH + "data/mllib/sample_libsvm_data.txt")

val scaler1 = new StandardScaler().fit(data.map(x => x.features))
val scaler2 = new StandardScaler(withMean = true, withStd = true).fit(data.map(x => x.features))
// scaler3 is an identical model to scaler2, and will produce identical transformations
// val scaler3 = new StandardScalerModel(scaler2.std, scaler2.mean)

println( "std:" + scaler2.std + " mean:" + scaler2.mean )
// data1 will be unit variance.
val data1 = data.map(x => (x.label, scaler1.transform(x.features)))

// Without converting the features into dense vectors, transformation with zero mean will raise
// exception on sparse vector.
// data2 will be unit variance and zero mean.
val data2 = data.map(x => (x.label, scaler2.transform(Vectors.dense(x.features.toArray))))

std:[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.6,24.7,13.631051153728224,20.275822792157904,30.988787480231448,28.782015989927537,23.063557331159405,7.2123000996403865,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.5,25.3,14.568802284333465,18.522975086648636,32.801482782448616,76.14906367769666,91.7901229254361,92.08137376222942,93.26592587484954,99.85391147252889,103.88438406481146,93.08538493710432,68.16821440044805,31.291492050515505,5.547399062951607,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.200000000000001,24.1,26.015760568928176,34.24242746044501,43.51674355575069,67.13289323078035,97.88882582921413,110.02198311099818,