sl/flatten batch non array #1

Closed
wants to merge 40 commits into from
127c70a
docs: add explanation dashboard integration example notebook (#1236)
imatiach-msft Nov 2, 2021
81f5f80
chore: release synapseml 0.9.2 (#1237)
serena-ruan Nov 3, 2021
5733b85
fix: performance issue in interpretability notebooks (#1238)
memoryz Nov 3, 2021
8adff5a
fix: min_data_in_leaf missing from dataset parameters in lightgbm (#1…
imatiach-msft Nov 3, 2021
6f601ff
docs: Adding document and notebook for Data Balance Analysis (#1226)
ms-kashyap Nov 3, 2021
ae4d058
fix: Broken link to notebook in Data Balance Analysis doc (#1240)
ms-kashyap Nov 3, 2021
5429d80
fix: fix publish to maven path and pypi env typo (#1242)
serena-ruan Nov 4, 2021
ba41290
fix: enable backwards compatibility foe old namespace (#1244)
mhamilton723 Nov 4, 2021
5a921cc
docs: Introduce Responsible AI section on website (Interpretability +…
ms-kashyap Nov 6, 2021
9831ee6
feat: Add redactedText to PIIV3 (#1247)
stuartleeks Nov 8, 2021
23a997b
chore: Add config to Text Api location (#1235)
stuartleeks Nov 8, 2021
8d1bd20
perf: website enhancement (#1245)
serena-ruan Nov 9, 2021
d72dd3e
docs: remove website whitepsace
mhamilton723 Nov 9, 2021
8406dd4
doc: add form demo and update website
mhamilton723 Nov 9, 2021
43870b6
docs: tweak announcement styling
mhamilton723 Nov 10, 2021
6459f3c
docs: fixup blog posts
mhamilton723 Nov 10, 2021
c0b516b
doc: fix broken website (#1251)
mhamilton723 Nov 10, 2021
b2751eb
refactor: automate adding tons of codeblock helpers in website sample…
serena-ruan Nov 12, 2021
c6660d0
docs: clean up some spacing issues in website (#1256)
mhamilton723 Nov 15, 2021
e6da4d5
chore: Update to SynapseML v0.9.4 (#1258)
mhamilton723 Nov 16, 2021
1c1edf6
fix: fix install instructions (#1259)
mhamilton723 Nov 16, 2021
b365858
chore: add website telemetry (#1260)
mhamilton723 Nov 16, 2021
b2a9c37
chore: fix website telemetry (#1261)
mhamilton723 Nov 16, 2021
7bfd0d7
docs: update and improve readme (#1262)
mhamilton723 Nov 17, 2021
16a9418
docs: add website announcement bar (#1263)
mhamilton723 Nov 17, 2021
d8e64b3
fix: remove some deps that cause sec issues (#1264)
mhamilton723 Nov 17, 2021
d0406c8
fix: Removing broadcast hint (#1255)
memoryz Nov 23, 2021
1bbcb7b
fix: fix installation instruction (#1268)
serena-ruan Nov 23, 2021
175fbc5
Fix a couple of links (#1266)
elswork Nov 24, 2021
db03b01
feat: add predict_disable_shape_check in LightGBM (#1273)
nhymxu Nov 30, 2021
eae0e15
fix: support Math expressions in markdown doc. (#1278)
memoryz Dec 1, 2021
82719fe
fix: add logging for number of columns and rows when creating dataset…
imatiach-msft Dec 1, 2021
c65423c
build: bump algoliasearch-helper from 3.6.1 to 3.6.2 in /website (#1270)
dependabot[bot] Dec 1, 2021
3898ad9
fix: hotfix for Rtests (#1283)
serena-ruan Dec 2, 2021
6ea8a9a
feat: Add TextAnalyze transformer to add support for Text Analytics `…
stuartleeks Dec 2, 2021
ff2aa02
perf: improve lightgbm training performance 4x-10x by setting num_thr…
imatiach-msft Dec 3, 2021
a58ad76
feat: add multivariate anomaly detection (#1272)
serena-ruan Dec 6, 2021
b3e91b7
feat: Add non-array value propagation to FlattenBatch
stuartleeks Dec 6, 2021
8752098
drop - temp test speedup
stuartleeks Dec 6, 2021
5fd33a1
improve null handling
stuartleeks Dec 6, 2021
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -56,4 +56,5 @@ node_modules/

# Misc
.bsp
website/.docusaurus
website/.docusaurus
null/
147 changes: 37 additions & 110 deletions README.md

Large diffs are not rendered by default.

17 changes: 12 additions & 5 deletions build.sbt
Original file line number Diff line number Diff line change
@@ -259,6 +259,7 @@ val settings = Seq(
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false),
autoAPIMappings := true,
pomPostProcess := pomPostFunc,
sbtPlugin := false
)
ThisBuild / publishMavenStyle := true

@@ -361,10 +362,16 @@ testWebsiteDocs := {
)
}

sonatypeProjectHosting := Some(
ThisBuild / sonatypeProjectHosting := Some(
GitHubHosting("Azure", "SynapseML", "mmlspark-support@microsot.com"))
homepage := Some(url("https://github.com/Microsoft/SynapseML"))
developers := List(
ThisBuild / homepage := Some(url("https://github.com/Microsoft/SynapseML"))
ThisBuild / scmInfo := Some(
ScmInfo(
url("https://github.com/Azure/SynapseML"),
"scm:git@github.com:Azure/SynapseML.git"
)
)
ThisBuild / developers := List(
Developer("mhamilton723", "Mark Hamilton",
"mmlspark-support@microsoft.com", url("https://github.com/mhamilton723")),
Developer("imatiach-msft", "Ilya Matiach",
@@ -373,9 +380,9 @@ developers := List(
"mmlspark-support@microsoft.com", url("https://github.com/drdarshan"))
)

licenses += ("MIT", url("https://github.com/Microsoft/SynapseML/blob/master/LICENSE"))
ThisBuild / licenses += ("MIT", url("https://github.com/Microsoft/SynapseML/blob/master/LICENSE"))

credentials += Credentials("Sonatype Nexus Repository Manager",
ThisBuild / credentials += Credentials("Sonatype Nexus Repository Manager",
"oss.sonatype.org",
Secrets.nexusUsername,
Secrets.nexusPassword)
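
Note on the `ThisBuild /` prefix added in this hunk: in sbt, a bare `key := value` at the top level of build.sbt is scoped to the root project only, while `ThisBuild / key := value` sets a build-wide default that subprojects pick up through scope delegation. A minimal sketch of the difference follows; the project names, the description string and the URL are hypothetical and not taken from this build.

```scala
// build.sbt sketch; `core`, `root` and the URL are hypothetical.

// Build-wide default: every subproject sees it unless it overrides the key.
ThisBuild / homepage := Some(url("https://example.invalid/project"))

lazy val core = (project in file("core"))

lazy val root = (project in file("."))
  .aggregate(core)
  .settings(
    // Project-level setting: applies to `root` only.
    description := "aggregate project"
  )
```

Under these assumptions, `show core / homepage` in the sbt shell resolves to the `ThisBuild` value, whereas a bare `homepage := ...` would have been visible on the root project alone.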
Original file line number Diff line number Diff line change
@@ -332,10 +332,4 @@ abstract class CognitiveServicesBaseNoHandler(val uid: String) extends Transform
}

abstract class CognitiveServicesBase(uid: String) extends
CognitiveServicesBaseNoHandler(uid) with HasHandler {
setDefault(handler -> HandlingUtils.advancedUDF(100)) //scalastyle:ignore magic.number

override def handlingFunc(client: CloseableHttpClient,
request: HTTPRequestData): HTTPResponseData =
getHandler(client, request)
}
CognitiveServicesBaseNoHandler(uid) with HasHandler
Original file line number Diff line number Diff line change
@@ -232,7 +232,7 @@ trait BasicAsyncReply extends HasAsyncReply {
request: HTTPRequestData): HTTPResponseData = {
val response = HandlingUtils.advanced(getBackoffs: _*)(client, request)
if (response.statusLine.statusCode == 202) {
val location = new URI(response.headers.filter(_.name == "Operation-Location").head.value)
val location = new URI(response.headers.filter(_.name.toLowerCase() == "operation-location").head.value)
val maxTries = getMaxPollingRetries
val key = request.headers.find(_.name == "Ocp-Apim-Subscription-Key").map(_.value)
val it = (0 to maxTries).toIterator.flatMap { _ =>
@@ -255,7 +255,6 @@ trait BasicAsyncReply extends HasAsyncReply {
}
}


trait HasAsyncReply extends Params {
val backoffs: IntArrayParam = new IntArrayParam(
this, "backoffs", "array of backoffs to use in the handler")
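
Note on the `Operation-Location` change in the first hunk: HTTP header field names are case-insensitive, so lowercasing the name before comparing is the relevant fix. The lookup still ends in `.head`, which throws `NoSuchElementException` if a 202 response arrives without that header. One possible alternative is sketched below; `Header` and `findHeader` are hypothetical stand-ins that assume only the `name` and `value` fields visible in the diff, not the project's actual types.

```scala
// Hypothetical stand-in for the header type used in the diff;
// only the `name` and `value` fields are assumed.
final case class Header(name: String, value: String)

object HeaderLookup {
  // Case-insensitive lookup that returns None instead of throwing
  // when the header is absent.
  def findHeader(headers: Seq[Header], name: String): Option[String] =
    headers.find(_.name.equalsIgnoreCase(name)).map(_.value)
}
```

A caller could then decide explicitly what a missing header means, for example by calling `HeaderLookup.findHeader(headers, "Operation-Location")` and handling the `None` case with a descriptive error.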
Original file line number Diff line number Diff line change
@@ -53,21 +53,23 @@ class FormOntologyLearner(override val uid: String) extends Estimator[FormOntolo
}

override def fit(dataset: Dataset[_]): FormOntologyTransformer = {
val fromRow = AnalyzeResponse.makeFromRowConverter

def combine(st1: StructType, st2: StructType): StructType = {
FormOntologyLearner.combineDataTypes(st1, st2).asInstanceOf[StructType]
}

val mergedSchema = dataset.toDF()
.select(col(getInputCol))
.map(extractOntology(fromRow))(Encoders.kryo[StructType])
.reduce(combine _)

new FormOntologyTransformer()
.setInputCol(getInputCol)
.setOutputCol(getOutputCol)
.setOntology(mergedSchema)
logFit({
val fromRow = AnalyzeResponse.makeFromRowConverter

def combine(st1: StructType, st2: StructType): StructType = {
FormOntologyLearner.combineDataTypes(st1, st2).asInstanceOf[StructType]
}

val mergedSchema = dataset.toDF()
.select(col(getInputCol))
.map(extractOntology(fromRow))(Encoders.kryo[StructType])
.reduce(combine _)

new FormOntologyTransformer()
.setInputCol(getInputCol)
.setOutputCol(getOutputCol)
.setOntology(mergedSchema)
})
}

override def copy(extra: ParamMap): Estimator[FormOntologyTransformer] = defaultCopy(extra)
@@ -109,11 +111,13 @@ class FormOntologyTransformer(override val uid: String) extends Model[FormOntolo
}

override def transform(dataset: Dataset[_]): DataFrame = {
val fromRow = AnalyzeResponse.makeFromRowConverter
val convertToOntologyUDF = UDFUtils.oldUdf(convertToOntology(fromRow) _, getOntology)
logTransform[DataFrame]({
val fromRow = AnalyzeResponse.makeFromRowConverter
val convertToOntologyUDF = UDFUtils.oldUdf(convertToOntology(fromRow) _, getOntology)

dataset.toDF()
.withColumn(getOutputCol, convertToOntologyUDF(col(getInputCol)))
dataset.toDF()
.withColumn(getOutputCol, convertToOntologyUDF(col(getInputCol)))
})
}

override def transformSchema(schema: StructType): StructType = {
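
Note on the `logFit` / `logTransform` wrapping in the two hunks above: their definitions are not part of this diff, so the following is only a sketch of the general pattern, a helper that takes the method body as a by-name block, logs around it, and returns the block's result unchanged. `LoggingSketch.logged` and its `sink` parameter are hypothetical and are not the project's logging API.

```scala
object LoggingSketch {
  // Hypothetical helper, not the project's logFit/logTransform.
  // `body` is passed by name, so it runs inside the try block.
  def logged[T](label: String, sink: String => Unit)(body: => T): T = {
    sink(s"$label: start")
    try {
      val result = body
      sink(s"$label: done")
      result
    } catch {
      case e: Exception =>
        // Record the failure, then rethrow so callers see the same error.
        sink(s"$label: failed with ${e.getClass.getName}")
        throw e
    }
  }
}
```

Because the block's value is returned as-is, wrapping an existing body this way leaves the method's return type and behaviour unchanged, which is consistent with the hunks above only re-indenting the original statements.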