[SPARK-13512][ML] add example and doc for MaxAbsScaler
## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-13512
Add example and doc for ml.feature.MaxAbsScaler.

## How was this patch tested?
Unit tests.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes apache#11392 from hhbyyh/maxabsdoc.
hhbyyh authored and roygao94 committed Mar 22, 2016
1 parent e079d5c commit 9a40f72
Showing 3 changed files with 133 additions and 0 deletions.
32 changes: 32 additions & 0 deletions docs/ml-features.md
@@ -773,6 +773,38 @@ for more details on the API.
</div>
</div>


## MaxAbsScaler

`MaxAbsScaler` transforms a dataset of `Vector` rows, rescaling each feature to the range [-1, 1]
by dividing through by the maximum absolute value of that feature. It does not shift or center the
data, and thus does not destroy any sparsity.

`MaxAbsScaler` computes summary statistics on a dataset and produces a `MaxAbsScalerModel`. The
model can then transform each feature individually to the range [-1, 1].
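
For example, a feature column containing the values 1.0, -8.0 and 4.0 has a maximum absolute value of 8.0, so `MaxAbsScaler` maps it to 0.125, -1.0 and 0.5. A minimal sketch of this per-feature arithmetic in plain Scala (an illustration only, not the Spark API):

```scala
// Illustration of the arithmetic behind MaxAbsScaler:
// divide each value by the feature's maximum absolute value.
val feature = Array(1.0, -8.0, 4.0)
val maxAbs = feature.map(math.abs).max // 8.0
val scaled = feature.map(_ / maxAbs)   // Array(0.125, -1.0, 0.5)
```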

The following example demonstrates how to load a dataset in libsvm format and then rescale each feature to [-1, 1].

<div class="codetabs">
<div data-lang="scala" markdown="1">

Refer to the [MaxAbsScaler Scala docs](api/scala/index.html#org.apache.spark.ml.feature.MaxAbsScaler)
and the [MaxAbsScalerModel Scala docs](api/scala/index.html#org.apache.spark.ml.feature.MaxAbsScalerModel)
for more details on the API.

{% include_example scala/org/apache/spark/examples/ml/MaxAbsScalerExample.scala %}
</div>

<div data-lang="java" markdown="1">

Refer to the [MaxAbsScaler Java docs](api/java/org/apache/spark/ml/feature/MaxAbsScaler.html)
and the [MaxAbsScalerModel Java docs](api/java/org/apache/spark/ml/feature/MaxAbsScalerModel.html)
for more details on the API.

{% include_example java/org/apache/spark/examples/ml/JavaMaxAbsScalerExample.java %}
</div>
</div>

## Bucketizer

`Bucketizer` transforms a column of continuous features to a column of feature buckets, where the buckets are specified by users. It takes a parameter:
52 changes: 52 additions & 0 deletions examples/src/main/java/org/apache/spark/examples/ml/JavaMaxAbsScalerExample.java
@@ -0,0 +1,52 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.spark.examples.ml;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
// $example on$
import org.apache.spark.ml.feature.MaxAbsScaler;
import org.apache.spark.ml.feature.MaxAbsScalerModel;
import org.apache.spark.sql.DataFrame;
// $example off$
import org.apache.spark.sql.SQLContext;

public class JavaMaxAbsScalerExample {

public static void main(String[] args) {
SparkConf conf = new SparkConf().setAppName("JavaMaxAbsScalerExample");
JavaSparkContext jsc = new JavaSparkContext(conf);
SQLContext jsql = new SQLContext(jsc);

// $example on$
DataFrame dataFrame = jsql.read().format("libsvm").load("data/mllib/sample_libsvm_data.txt");
MaxAbsScaler scaler = new MaxAbsScaler()
.setInputCol("features")
.setOutputCol("scaledFeatures");

// Compute summary statistics and generate MaxAbsScalerModel
MaxAbsScalerModel scalerModel = scaler.fit(dataFrame);

// Rescale each feature to the range [-1, 1].
DataFrame scaledData = scalerModel.transform(dataFrame);
scaledData.show();
// $example off$
jsc.stop();
}

}
49 changes: 49 additions & 0 deletions examples/src/main/scala/org/apache/spark/examples/ml/MaxAbsScalerExample.scala
@@ -0,0 +1,49 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

// scalastyle:off println
package org.apache.spark.examples.ml

import org.apache.spark.{SparkConf, SparkContext}
// $example on$
import org.apache.spark.ml.feature.MaxAbsScaler
// $example off$
import org.apache.spark.sql.SQLContext

object MaxAbsScalerExample {
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setAppName("MaxAbsScalerExample")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// $example on$
val dataFrame = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
val scaler = new MaxAbsScaler()
.setInputCol("features")
.setOutputCol("scaledFeatures")

// Compute summary statistics and generate MaxAbsScalerModel
val scalerModel = scaler.fit(dataFrame)

// Rescale each feature to the range [-1, 1].
val scaledData = scalerModel.transform(dataFrame)
scaledData.show()
// $example off$
sc.stop()
}
}
// scalastyle:on println
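
Both examples are included in the Spark source tree; assuming a standard Spark checkout with the examples built, they can be launched from the Spark root directory via the `bin/run-example` helper:

```
./bin/run-example ml.MaxAbsScalerExample
./bin/run-example ml.JavaMaxAbsScalerExample
```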
