Incorporate name detection into SmartTextVectorizer #456
…onoids instead of custom accumulators
… Still need to fix HLL serialization in Spark issue
…ix printing bug for NameDetectStats
…stead of moments of text length; Still need to fix no moments higher than the 1st being calculated
…Cleaned up test code
Codecov Report
@@             Coverage Diff              @@
##           master     #456      +/-    ##
===========================================
- Coverage      87%    74.72%   -12.28%
===========================================
  Files         341      341
  Lines       11485    11532       +47
  Branches      378      597      +219
===========================================
- Hits         9992     8617     -1375
- Misses       1493     2915     +1422
    dataset.map(_.map(computeTextStats(_, shouldCleanText)).toArray).reduce(_ + _),
    Array.fill[NameDetectStats](inN.length)(NameDetectStats.empty)
  )
} else {
Can you make the formatting of both if branches look the same since they're doing nearly the same thing?
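For illustration, here is a minimal sketch of the pattern the hunk above appears to use: per-column stats are merged element-wise, with an array of empty stats as the fallback. Everything in it is an assumption made for the example. `Stats` is a hypothetical stand-in for `NameDetectStats`, and `mergeColumns` and `aggregate` are made-up helper names, not code from this PR. Routing both `if` branches through one helper like this would be one way to make them read the same.

```scala
// Hypothetical stand-in for NameDetectStats: a mergeable count (assumption).
final case class Stats(count: Long) {
  // Monoid-style combine: merging two stats adds their counts.
  def +(other: Stats): Stats = Stats(count + other.count)
}

object Stats {
  // Identity element for the combine above.
  val empty: Stats = Stats(0L)
}

// Element-wise merge of two per-column stats arrays (hypothetical helper).
def mergeColumns(a: Array[Stats], b: Array[Stats]): Array[Stats] =
  a.zip(b).map { case (x, y) => x + y }

// Reduce all rows to one stats array; fall back to empty stats per column
// when there are no rows (hypothetical helper).
def aggregate(rows: Seq[Array[Stats]], numColumns: Int): Array[Stats] =
  rows.reduceOption(mergeColumns).getOrElse(Array.fill(numColumns)(Stats.empty))
```

With a helper of this shape, each branch would differ only in how it builds `rows`, so the two call sites could be formatted identically.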
// In which case create SensitiveFeatureInformation for all features
case ((feature: String, key: Option[String]), stats: NameDetectStats)
  if log.isDebugEnabled || computeTreatAsName(stats) =>
  val N = stats.dictCheckResult.count.toDouble
Please give N a more descriptive name.
Describe the proposed solution
Incorporates the changes in #445 and #457 into SmartTextVectorizer and SmartTextMapVectorizer.
Additional context
Merge #457 before merging this PR. Compare the diff between this PR and that one on my forked repo.
Changes from #455 need to be merged before this PR is ready.