Merge branch 'master' into housing
leahmcguire committed Jul 15, 2019
2 parents 6e92ff4 + 23b6e91 commit 9f48ecb
Showing 40 changed files with 903 additions and 316 deletions.
40 changes: 40 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,45 @@
# Changelog

## 0.6.0

Bug fixes:
- Quick Fix Alias Type Names [#346](https://github.com/salesforce/TransmogrifAI/pull/346)
- Forecast Evaluator - fixes SMAPE, adds MASE and Seasonal Error metrics [#342](https://github.com/salesforce/TransmogrifAI/pull/342)

New features / updates:
- Aggregate LOCOs of DateToUnitCircleTransformer. [#349](https://github.com/salesforce/TransmogrifAI/pull/349)
- Convert lambda functions into concrete classes to allow compatibility with Scala 2.12 [#357](https://github.com/salesforce/TransmogrifAI/pull/357)
- Replace mapValues with immutable Map where applicable [#363](https://github.com/salesforce/TransmogrifAI/pull/363)
- Aggregate spark metrics during run time instead of post processing by default [#358](https://github.com/salesforce/TransmogrifAI/pull/358)
- Allow customizing serialization for FeatureGenerator extract function [#352](https://github.com/salesforce/TransmogrifAI/pull/352)
- Update helloworld examples to be simple [#351](https://github.com/salesforce/TransmogrifAI/pull/351)
- Adding `key` ctor field in all RawFeatureFilter results [#348](https://github.com/salesforce/TransmogrifAI/pull/348)
- Forecast evaluator + SMAPE metric [#337](https://github.com/salesforce/TransmogrifAI/pull/337)
- Local scoring for model with features of all types [#340](https://github.com/salesforce/TransmogrifAI/pull/340)
- Remove local runner + update docs [#335](https://github.com/salesforce/TransmogrifAI/pull/335)
- Added missing test for java conversions [#334](https://github.com/salesforce/TransmogrifAI/pull/334)
- Get rid of scalaj-collections [#333](https://github.com/salesforce/TransmogrifAI/pull/333)
- Workflow independent model loading [#274](https://github.com/salesforce/TransmogrifAI/pull/274)
- Aggregated LOCOs of SmartTextVectorizer outputs [#308](https://github.com/salesforce/TransmogrifAI/pull/308)
- Added community projects docs section [#326](https://github.com/salesforce/TransmogrifAI/pull/326)
- Add FeatureBuilder.fromSchema [#325](https://github.com/salesforce/TransmogrifAI/pull/325)
- Improve WeekOfMonth in date transformers [#323](https://github.com/salesforce/TransmogrifAI/pull/323)
- Improved datetime unit transformer shortcuts - Part 2 [#319](https://github.com/salesforce/TransmogrifAI/pull/319)
- Correctly pass main class for CLI sub project [#321](https://github.com/salesforce/TransmogrifAI/pull/321)
- Serialize blacklisted map keys with the model + updated access on workflow/model members [#320](https://github.com/salesforce/TransmogrifAI/pull/320)
- Improved datetime unit transformer shortcuts [#316](https://github.com/salesforce/TransmogrifAI/pull/316)
- Improved OpScalarStandardScalerTest [#317](https://github.com/salesforce/TransmogrifAI/pull/317)
- Improved PercentileCalibratorTest [#318](https://github.com/salesforce/TransmogrifAI/pull/318)
- Added concrete wrappers for HashingTF, NGram and StopWordsRemover [#314](https://github.com/salesforce/TransmogrifAI/pull/314)
- Avoid singleton random generators [#312](https://github.com/salesforce/TransmogrifAI/pull/312)
- Remove free function aggregation with feature builders [#311](https://github.com/salesforce/TransmogrifAI/pull/311)
- Added util methods to create class/object by name + retrieve type tag by type name [#310](https://github.com/salesforce/TransmogrifAI/pull/310)

Dependency updates:
- Bump shadowjar plugin to 5.0.0 [#306](https://github.com/salesforce/TransmogrifAI/pull/306)
- Bump Apache Tika to 1.21 [#331](https://github.com/salesforce/TransmogrifAI/pull/331)
- Enable CircleCI version 2.1 [#353](https://github.com/salesforce/TransmogrifAI/pull/353)

## 0.5.3

Bug fixes:
14 changes: 7 additions & 7 deletions README.md
@@ -1,6 +1,6 @@
# TransmogrifAI

[![Maven Central](https://img.shields.io/maven-central/v/com.salesforce.transmogrifai/transmogrifai-core_2.11.svg?colorB=blue)](https://search.maven.org/search?q=g:com.salesforce.transmogrifai) [![Download](https://api.bintray.com/packages/salesforce/maven/TransmogrifAI/images/download.svg)](https://bintray.com/salesforce/maven/TransmogrifAI/_latestVersion) [![Javadocs](https://www.javadoc.io/badge/com.salesforce.transmogrifai/transmogrifai-core_2.11/0.5.3.svg?color=blue)](https://www.javadoc.io/doc/com.salesforce.transmogrifai/transmogrifai-core_2.11/0.5.3) [![Spark version](https://img.shields.io/badge/spark-2.3-brightgreen.svg)](https://spark.apache.org/downloads.html) [![Scala version](https://img.shields.io/badge/scala-2.11-brightgreen.svg)](https://www.scala-lang.org/download/2.11.12.html) [![License](http://img.shields.io/:license-BSD--3-blue.svg)](./LICENSE) [![Chat](https://badges.gitter.im/salesforce/TransmogrifAI.svg)](https://gitter.im/salesforce/TransmogrifAI?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![Maven Central](https://img.shields.io/maven-central/v/com.salesforce.transmogrifai/transmogrifai-core_2.11.svg?colorB=blue)](https://search.maven.org/search?q=g:com.salesforce.transmogrifai) [![Download](https://api.bintray.com/packages/salesforce/maven/TransmogrifAI/images/download.svg)](https://bintray.com/salesforce/maven/TransmogrifAI/_latestVersion) [![Javadocs](https://www.javadoc.io/badge/com.salesforce.transmogrifai/transmogrifai-core_2.11/0.6.0.svg?color=blue)](https://www.javadoc.io/doc/com.salesforce.transmogrifai/transmogrifai-core_2.11/0.6.0) [![Spark version](https://img.shields.io/badge/spark-2.3-brightgreen.svg)](https://spark.apache.org/downloads.html) [![Scala version](https://img.shields.io/badge/scala-2.11-brightgreen.svg)](https://www.scala-lang.org/download/2.11.12.html) [![License](http://img.shields.io/:license-BSD--3-blue.svg)](./LICENSE) [![Chat](https://badges.gitter.im/salesforce/TransmogrifAI.svg)](https://gitter.im/salesforce/TransmogrifAI?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

[![TravisCI Build Status](https://travis-ci.com/salesforce/TransmogrifAI.svg?token=Ex9czVEUD7AzPTmVh6iX&branch=master)](https://travis-ci.com/salesforce/TransmogrifAI) [![CircleCI Build Status](https://circleci.com/gh/salesforce/TransmogrifAI.svg?&style=shield&circle-token=e84c1037ae36652d38b49207728181ee85337e0b)](https://circleci.com/gh/salesforce/TransmogrifAI) [![Documentation Status](https://readthedocs.org/projects/transmogrifai/badge/?version=stable)](https://docs.transmogrif.ai/en/stable/?badge=stable) [![CII Best Practices](https://bestpractices.coreinfrastructure.org/projects/2557/badge)](https://bestpractices.coreinfrastructure.org/projects/2557) [![Codecov](https://codecov.io/gh/salesforce/TransmogrifAI/branch/master/graph/badge.svg)](https://codecov.io/gh/salesforce/TransmogrifAI) [![CodeFactor](https://www.codefactor.io/repository/github/salesforce/transmogrifai/badge)](https://www.codefactor.io/repository/github/salesforce/transmogrifai)

@@ -128,8 +128,8 @@ Start by picking TransmogrifAI version to match your project dependencies from t

| TransmogrifAI Version | Spark Version | Scala Version | Java Version |
|-------------------------------------------------|:-------------:|:-------------:|:------------:|
| 0.6.0 (unreleased, master) | 2.3 | 2.11 | 1.8 |
| **0.5.3 (stable)**, 0.5.2, 0.5.1, 0.5.0 | **2.3** | **2.11** | **1.8** |
| 0.6.1 (unreleased, master) | 2.3 | 2.11 | 1.8 |
| **0.6.0 (stable)**, 0.5.3, 0.5.2, 0.5.1, 0.5.0 | **2.3** | **2.11** | **1.8** |
| 0.4.0, 0.3.4 | 2.2 | 2.11 | 1.8 |

For Gradle in `build.gradle` add:
@@ -140,10 +140,10 @@ repositories {
}
dependencies {
// TransmogrifAI core dependency
compile 'com.salesforce.transmogrifai:transmogrifai-core_2.11:0.5.3'
compile 'com.salesforce.transmogrifai:transmogrifai-core_2.11:0.6.0'
// TransmogrifAI pretrained models, e.g. OpenNLP POS/NER models etc. (optional)
// compile 'com.salesforce.transmogrifai:transmogrifai-models_2.11:0.5.3'
// compile 'com.salesforce.transmogrifai:transmogrifai-models_2.11:0.6.0'
}
```

@@ -154,10 +154,10 @@ scalaVersion := "2.11.12"
resolvers += Resolver.jcenterRepo

// TransmogrifAI core dependency
libraryDependencies += "com.salesforce.transmogrifai" %% "transmogrifai-core" % "0.5.3"
libraryDependencies += "com.salesforce.transmogrifai" %% "transmogrifai-core" % "0.6.0"

// TransmogrifAI pretrained models, e.g. OpenNLP POS/NER models etc. (optional)
// libraryDependencies += "com.salesforce.transmogrifai" %% "transmogrifai-models" % "0.5.3"
// libraryDependencies += "com.salesforce.transmogrifai" %% "transmogrifai-models" % "0.6.0"
```

Then import TransmogrifAI into your code:
14 changes: 10 additions & 4 deletions core/src/main/scala/com/salesforce/op/dsl/RichDateFeature.scala
@@ -56,7 +56,7 @@ trait RichDateFeature {
f.transformWith(
new UnaryLambdaTransformer[Date, DateList](
operationName = "dateToList",
RichDateFeatureLambdas.toDateList
new RichDateFeatureLambdas.ToDateList
)
)
}
@@ -137,7 +137,7 @@ trait RichDateFeature {
f.transformWith(
new UnaryLambdaTransformer[DateTime, DateTimeList](
operationName = "dateTimeToList",
RichDateFeatureLambdas.toDateTimeList
new RichDateFeatureLambdas.ToDateTimeList
)
)
}
@@ -204,7 +204,13 @@ trait RichDateFeature {
}

object RichDateFeatureLambdas {
def toDateList: Date => DateList = (x: Date) => x.value.toSeq.toDateList

def toDateTimeList: DateTime => DateTimeList = (x: DateTime) => x.value.toSeq.toDateTimeList
class ToDateList extends Function1[Date, DateList] with Serializable {
def apply(v: Date): DateList = v.value.toSeq.toDateList
}

class ToDateTimeList extends Function1[Date, DateTimeList] with Serializable {
def apply(v: Date): DateTimeList = v.value.toSeq.toDateTimeList
}

}
20 changes: 13 additions & 7 deletions core/src/main/scala/com/salesforce/op/dsl/RichMapFeature.scala
@@ -30,7 +30,6 @@

package com.salesforce.op.dsl

import com.salesforce.op.dsl.RichMapFeatureLambdas._
import com.salesforce.op.features.FeatureLike
import com.salesforce.op.features.types._
import com.salesforce.op.stages.impl.feature._
@@ -1098,9 +1097,10 @@ trait RichMapFeature {
* @return prediction, rawPrediction, probability
*/
def tupled(): (FeatureLike[RealNN], FeatureLike[OPVector], FeatureLike[OPVector]) = {
(f.map[RealNN](predictionToRealNN),
f.map[OPVector](predictionToRaw),
f.map[OPVector](predictionToProbability)
import RichMapFeatureLambdas._
(f.map[RealNN](new PredictionToRealNN),
f.map[OPVector](new PredictionToRaw),
f.map[OPVector](new PredictionToProbability)
)
}

@@ -1121,11 +1121,17 @@ trait RichMapFeature {

object RichMapFeatureLambdas {

def predictionToRealNN: Prediction => RealNN = _.prediction.toRealNN
class PredictionToRealNN extends Function1[Prediction, RealNN] with Serializable {
def apply(p: Prediction): RealNN = p.prediction.toRealNN
}

def predictionToRaw: Prediction => OPVector = p => Vectors.dense(p.rawPrediction).toOPVector
class PredictionToRaw extends Function1[Prediction, OPVector] with Serializable {
def apply(p: Prediction): OPVector = Vectors.dense(p.rawPrediction).toOPVector
}

def predictionToProbability: Prediction => OPVector = p => Vectors.dense(p.probability).toOPVector
class PredictionToProbability extends Function1[Prediction, OPVector] with Serializable {
def apply(p: Prediction): OPVector = Vectors.dense(p.probability).toOPVector
}

}

56 changes: 38 additions & 18 deletions core/src/main/scala/com/salesforce/op/dsl/RichTextFeature.scala
@@ -37,6 +37,8 @@ import com.salesforce.op.stages.impl.feature._
import com.salesforce.op.utils.text._

import scala.reflect.runtime.universe.TypeTag


trait RichTextFeature {
self: RichFeature =>

@@ -48,7 +50,7 @@ trait RichTextFeature {
*
* @return A new MultiPickList feature
*/
def toMultiPickList: FeatureLike[MultiPickList] = f.map[MultiPickList](textToMultiPickList)
def toMultiPickList: FeatureLike[MultiPickList] = f.map[MultiPickList](new TextToMultiPickList)


/**
@@ -560,14 +562,14 @@ trait RichTextFeature {
*
* @return email prefix
*/
def toEmailPrefix: FeatureLike[Text] = f.map[Text](emailToPrefix, "prefix")
def toEmailPrefix: FeatureLike[Text] = f.map[Text](new EmailPrefixToText, "prefix")

/**
* Extract email domains
*
* @return email domain
*/
def toEmailDomain: FeatureLike[Text] = f.map[Text](emailToDomain, "domain")
def toEmailDomain: FeatureLike[Text] = f.map[Text](new EmailDomainToText, "domain")

/**
* Check if email is valid
@@ -600,7 +602,7 @@ trait RichTextFeature {
others: Array[FeatureLike[Email]] = Array.empty,
maxPctCardinality: Double = OpOneHotVectorizer.MaxPctCardinality
): FeatureLike[OPVector] = {
val domains = (f +: others).map(_.map[PickList](emailToPickList))
val domains = (f +: others).map(_.map[PickList](new EmailDomainToPickList))
domains.head.pivot(others = domains.tail, topK = topK, minSupport = minSupport, cleanText = cleanText,
trackNulls = trackNulls, maxPctCardinality = maxPctCardinality
)
@@ -613,19 +615,19 @@ trait RichTextFeature {
/**
* Extract url domain, i.e. salesforce.com, data.com etc.
*/
def toDomain: FeatureLike[Text] = f.map[Text](urlToDomain, "urlDomain")
def toDomain: FeatureLike[Text] = f.map[Text](new URLDomainToText, "urlDomain")

/**
* Extracts url protocol, i.e. http, https, ftp etc.
*/
def toProtocol: FeatureLike[Text] = f.map[Text](urlToProtocol, "urlProtocol")
def toProtocol: FeatureLike[Text] = f.map[Text](new URLProtocolToText, "urlProtocol")

/**
* Verifies if the url is of correct form of "Uniform Resource Identifiers (URI): Generic Syntax"
* RFC2396 (http://www.ietf.org/rfc/rfc2396.txt)
* Default valid protocols are: http, https, ftp.
*/
def isValidUrl: FeatureLike[Binary] = f.exists(urlIsValid)
def isValidUrl: FeatureLike[Binary] = f.exists(new URLIsValid)

/**
* Converts a sequence of [[URL]] features into a vector, extracting the domains of the valid urls
@@ -650,7 +652,7 @@ trait RichTextFeature {
others: Array[FeatureLike[URL]] = Array.empty,
maxPctCardinality: Double = OpOneHotVectorizer.MaxPctCardinality
): FeatureLike[OPVector] = {
val domains = (f +: others).map(_.map[PickList](urlToPickList))
val domains = (f +: others).map(_.map[PickList](new URLDomainToPickList))
domains.head.pivot(others = domains.tail, topK = topK, minSupport = minSupport, cleanText = cleanText,
trackNulls = trackNulls, maxPctCardinality = maxPctCardinality
)
@@ -697,7 +699,7 @@ trait RichTextFeature {
): FeatureLike[OPVector] = {

val feats: Array[FeatureLike[PickList]] =
(f +: others).map(_.detectMimeTypes(typeHint).map[PickList](textToPickList))
(f +: others).map(_.detectMimeTypes(typeHint).map[PickList](new TextToPickList))

feats.head.vectorize(
topK = topK, minSupport = minSupport, cleanText = cleanText, trackNulls = trackNulls, others = feats.tail,
@@ -801,22 +803,40 @@ trait RichTextFeature {

object RichTextFeatureLambdas {

def emailToPickList: Email => PickList = _.domain.toPickList
class EmailDomainToPickList extends Function1[Email, PickList] with Serializable {
def apply(v: Email): PickList = v.domain.toPickList
}

def emailToPrefix: Email => Text = _.prefix.toText
class EmailDomainToText extends Function1[Email, Text] with Serializable {
def apply(v: Email): Text = v.domain.toText
}

def emailToDomain: Email => Text = _.domain.toText
class EmailPrefixToText extends Function1[Email, Text] with Serializable {
def apply(v: Email): Text = v.prefix.toText
}

def urlToPickList: URL => PickList = (v: URL) => if (v.isValid) v.domain.toPickList else PickList.empty
class URLDomainToPickList extends Function1[URL, PickList] with Serializable {
def apply(v: URL): PickList = if (v.isValid) v.domain.toPickList else PickList.empty
}

def urlToDomain: URL => Text = _.domain.toText
class URLDomainToText extends Function1[URL, Text] with Serializable {
def apply(v: URL): Text = v.domain.toText
}

def urlToProtocol: URL => Text = _.protocol.toText
class URLProtocolToText extends Function1[URL, Text] with Serializable {
def apply(v: URL): Text = v.protocol.toText
}

def urlIsValid: URL => Boolean = _.isValid
class URLIsValid extends Function1[URL, Boolean] with Serializable {
def apply(v: URL): Boolean = v.isValid
}

def textToPickList: Text => PickList = _.value.toPickList
class TextToPickList extends Function1[Text, PickList] with Serializable {
def apply(v: Text): PickList = v.value.toPickList
}

def textToMultiPickList: Text => MultiPickList = _.value.toSet[String].toMultiPickList
class TextToMultiPickList extends Function1[Text, MultiPickList] with Serializable {
def apply(v: Text): MultiPickList = v.value.toSet[String].toMultiPickList
}

}
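
Note on the pattern above: the changelog entry for #357 gives Scala 2.12 compatibility as the reason for turning each lambda into a named `Function1` class. One plausible reading, not stated in this commit, is that a named class has a stable class name that can be serialized and re-instantiated by name, which an anonymous lambda does not guarantee. The following is a minimal sketch of that idea only. `EmailLike`, `TextLike`, `EmailDomainToTextLike`, `roundTrip` and `newByName` are hypothetical names introduced for illustration and are not TransmogrifAI types or utilities.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object ConcreteFunctionSketch {

  // Hypothetical stand-ins for feature types; NOT the TransmogrifAI classes.
  final case class TextLike(value: Option[String])
  final case class EmailLike(value: Option[String]) {
    // Everything after a single interior '@', if present.
    def domain: Option[String] = value.flatMap { s =>
      val at = s.indexOf('@')
      if (at > 0 && at < s.length - 1) Some(s.substring(at + 1)) else None
    }
  }

  // Named function class with the same shape as the classes in the diff:
  // it extends Function1 and Serializable and has a no-argument constructor.
  class EmailDomainToTextLike extends Function1[EmailLike, TextLike] with Serializable {
    def apply(v: EmailLike): TextLike = TextLike(v.domain)
  }

  // Java-serialization round trip, to show the instance survives being written and read back.
  def roundTrip[A <: AnyRef](a: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(a)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  // Re-creates a function from nothing but its class name (hypothetical helper).
  def newByName(className: String): Any =
    Class.forName(className).getDeclaredConstructor().newInstance()
}
```

A caller could store `classOf[ConcreteFunctionSketch.EmailDomainToTextLike].getName` alongside a saved model and later pass it to `newByName` to rebuild the function, which is the kind of by-name lookup a named class makes possible.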
@@ -232,10 +232,10 @@ private[op] class OpMultiClassificationEvaluator
ThresholdMetrics(
topNs = topNs,
thresholds = thresholds,
correctCounts = agg.mapValues { case (cor, _) => cor.toSeq },
incorrectCounts = agg.mapValues { case (_, incor) => incor.toSeq },
noPredictionCounts = agg.mapValues { case (cor, incor) =>
(Array.fill(nThresholds)(nRows) + cor.map(-_) + incor.map(-_)).toSeq
correctCounts = agg.map { case (k, (cor, _)) => k -> cor.toSeq },
incorrectCounts = agg.map { case (k, (_, incor)) => k -> incor.toSeq },
noPredictionCounts = agg.map { case (k, (cor, incor)) =>
k -> (Array.fill(nThresholds)(nRows) + cor.map(-_) + incor.map(-_)).toSeq
}
)
}
@@ -57,7 +57,7 @@ private[filters] case class PreparedFeatures
* @return pair consisting of response and predictor summaries (in this order)
*/
def summaries: (Map[FeatureKey, Summary], Map[FeatureKey, Summary]) =
responses.mapValues(Summary(_)) -> predictors.mapValues(Summary(_))
responses.map { case (k, s) => k -> Summary(s) } -> predictors.map { case (k, s) => k -> Summary(s) }

/**
* Computes vector of size responseKeys.length + predictorKeys.length. The first responses.length
@@ -41,10 +41,14 @@ class EmailToPickListMapTransformer(uid: String = UID[EmailToPickListMapTransfor
operationName = "emailToPickListMap",
transformer = new UnaryLambdaTransformer[Email, PickList](
operationName = "emailToPickList",
transformFn = EmailToPickListMapTransformer.emailToPickList
transformFn = new EmailToPickListMapTransformer.EmailToPickList
)
)

object EmailToPickListMapTransformer {
def emailToPickList: Email => PickList = email => email.domain.toPickList

class EmailToPickList extends Function1[Email, PickList] with Serializable {
def apply(v: Email): PickList = v.domain.toPickList
}

}
@@ -247,7 +247,7 @@ class IsValidPhoneMapDefaultCountry(uid: String = UID[IsValidPhoneMapDefaultCoun

phoneNumberMap.value
.mapValues(p => PhoneNumberParser.validate(p.toPhone, region, isStrict))
.collect{ case(k, v) if !v.isEmpty => k -> v.value.get }.toBinaryMap
.collect { case (k, SomeValue(Some(b))) => k -> b }.toBinaryMap
}
}
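
Note on the `collect` change above: the new line moves the emptiness check and the `.get` call into the pattern itself. The sketch below illustrates that style with an extractor object. `BinaryLike` and `NonEmpty` are hypothetical stand-ins written for this note; the real `SomeValue` extractor and feature types in TransmogrifAI may be defined differently.

```scala
object ExtractorCollectSketch {

  // Hypothetical stand-in for a nullable feature value; NOT a TransmogrifAI type.
  final case class BinaryLike(value: Option[Boolean]) {
    def isEmpty: Boolean = value.isEmpty
  }

  // Hypothetical extractor: matches only non-empty wrappers and exposes the inner Option.
  object NonEmpty {
    def unapply(b: BinaryLike): Option[Option[Boolean]] =
      if (b.isEmpty) None else Some(b.value)
  }

  // Old style: guard on emptiness, then call .get on the Option.
  def keepDefinedWithGuard(m: Map[String, BinaryLike]): Map[String, Boolean] =
    m.collect { case (k, v) if !v.isEmpty => k -> v.value.get }

  // New style: the nested pattern unwraps the value, so no .get is needed.
  def keepDefinedWithExtractor(m: Map[String, BinaryLike]): Map[String, Boolean] =
    m.collect { case (k, NonEmpty(Some(b))) => k -> b }
}
```

Both functions drop the empty entries and keep the rest; the extractor version simply cannot reach a `.get` on an empty `Option`.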

@@ -33,10 +33,8 @@ package com.salesforce.op.stages.impl.feature
import com.salesforce.op.UID
import com.salesforce.op.features.types._
import com.salesforce.op.stages.base.unary.UnaryTransformer
import com.salesforce.op.utils.json.{JsonLike, JsonUtils}
import com.salesforce.op.utils.json.JsonUtils
import org.apache.spark.sql.types.{Metadata, MetadataBuilder}
import org.json4s.JsonAST.{JField, JNothing}
import org.json4s.{CustomSerializer, JObject}

import scala.reflect.runtime.universe.TypeTag
import scala.util.{Failure, Try}
@@ -53,5 +53,5 @@ class TimePeriodMapTransformer[I <: DateMap]
) extends UnaryTransformer[I, IntegralMap](operationName = "dateMapToTimePeriod", uid = uid) {

override def transformFn: I => IntegralMap =
(i: I) => i.value.mapValues(t => period.extractIntFromMillis(t).toLong).toIntegralMap
(i: I) => i.value.map { case (k, t) => k -> period.extractIntFromMillis(t).toLong }.toIntegralMap
}
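
Note on the `mapValues` replacements in this commit (changelog entry #363): the commit does not say why `mapValues` was swapped for `map`. A likely motivation is that, in the Scala 2.11 and 2.12 standard library, `mapValues` returns a lazy view that re-applies the function on every lookup, whereas `map` builds a new strict immutable `Map` once. The sketch below counts function calls to make the difference visible. `MapValuesSketch` and `countCalls` are hypothetical names used only here.

```scala
object MapValuesSketch {

  // Returns (calls made through the mapValues result, calls made through the map result).
  def countCalls(): (Int, Int) = {
    val source = Map("a" -> 1)

    // mapValues wraps the source lazily: the function runs again on each lookup.
    var lazyCalls = 0
    val viaMapValues = source.mapValues { v => lazyCalls += 1; v * 10 }
    (1 to 3).foreach(_ => viaMapValues("a"))

    // map materializes a new Map: the function runs once per entry, at construction.
    var strictCalls = 0
    val viaMap = source.map { case (k, v) => strictCalls += 1; k -> v * 10 }
    (1 to 3).foreach(_ => viaMap("a"))

    (lazyCalls, strictCalls)
  }
}
```

With a single entry and three lookups, the first counter grows with each lookup while the second stays at the number of entries. On Scala 2.13 `mapValues` is deprecated but still lazy, so the sketch should behave the same there apart from a deprecation warning.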