Incorporate name detection into SmartTextVectorizer #456
Closed
142 commits
2acf3fc
Re-added unary estimator code and started porting logic to Algebird m…
MWYang b55c31e
Re-added JRC name dictionary and cleaned up names of methods
MWYang f443952
Fixed bug with AveragedValue computation; Trying to debug current Alg…
MWYang 557ef39
Fixed wrong inequality direction in guard checks
MWYang b6728ec
Added HLL back to monoid accumulator and code now compiles correctly;…
MWYang ff1b2ef
Fixed HLL in NameDetectStats not serializing correctly; Now need to f…
MWYang 33b77c9
Fixed NameDetectStats printing
MWYang 4caec75
Fixed guard stat calculation computing moments of number of tokens in…
MWYang b60dc4a
Fixed moments calculation and fixed divide by zero error when list of…
MWYang 469111b
Added gender identification code transforming; All previous tests now…
MWYang e5f169e
Undid SparkUtils changes, which are no longer necessary
MWYang b701612
Renamed class names to be more consistent + small fixes
MWYang 8342dae
Added honorific detection
MWYang 2e0e85a
Implemented RegEx checking for gender
MWYang b079a27
Implemented mixed gender identification strategies
MWYang a1197a7
Removed TODOs and extraneous functions in preparation for PR
MWYang 19bad0b
Updated documentation
MWYang 3172bde
Ignore null values in detecting names
MWYang 0d82eef
Added flag for ignoring nulls
MWYang 7812d12
Added sir/madam to list of honorifics
MWYang 345508f
Merge branch 'master' into my/unary-detect-names
MWYang a80e382
Fixed typo when adding sir/madam to list of honorifics that caused fa…
MWYang f7817d9
Fixed failing test due to divide by zero NA on some inputs
MWYang cf3eff0
Cleaned up redundant import in tests
MWYang eaaa23a
Added failing tests for STV
MWYang b928a49
Made small changes based on PR comments (updated inline comment and i…
MWYang 048c084
Created metadata case class per PR review; Added tests for metadata; …
MWYang 5d30e79
Added test for name threshold
MWYang 78c321c
Updated comment about NameDetectStats.toJson
MWYang a597d21
Added tests for new NameStats feature type
MWYang 12b3eae
Merge branch 'my/unary-detect-names' into my/stv-detect-names
MWYang e299677
Started porting over name detection code before wanting to try and si…
MWYang 997a132
Added private declaration to methods in NameDetectFun trait
MWYang 6b8a039
Abstracted out even more name detection logic into NameDetectUtils
MWYang 1250559
Added default dictionaries to NameDetectUtils object (for lazy and pe…
MWYang 2d804ee
Fixed tests sometimes failing because they were not using the same na…
MWYang f4ecea1
Merge branch 'my/unary-detect-names' into my/stv-detect-names
MWYang 6c34935
Using new util abstractions for name detection and fixed encoder issue
MWYang 9427cd3
Refactored STV changes to be cleaner
MWYang 11f5bf3
Figured out Algebird-fu so that we can perform both reduce operations…
MWYang 38bff0b
Using Algebird shortcuts again to reduce verbosity
MWYang e823ecb
Added custom enum for how to handle each column in SmartTextVectorizer
MWYang 5f4f592
Updated NameDetectStats.toJson to be less verbose and use custom seri…
MWYang 5336e71
Updated NameDetectStats.toJson to be less verbose and use custom seri…
MWYang 9e9c149
Updated STV.partition name to be more meaningful
MWYang 141f1af
Added back SensitiveFeatureInformation metadata files/changes
MWYang 33f6809
Delete accidentally committed temporary test file
MWYang 464fe52
Added shortcut for unary name detector
MWYang 6cc90de
Merge branch 'my/unary-detect-names' into my/stv-detect-names
MWYang 87c568d
Started to make Changes to SmartTextMapVectorizer but ran into proble…
MWYang bff4d1b
Delete accidentally committed temporary test file
MWYang 639af3d
Removed type parameter from NameDetectFun because of later conflict w…
MWYang a8b4423
Merge branch 'my/unary-detect-names' into my/stv-detect-names
MWYang e7795dc
Added first failing test for SmartTextMapVectorizer
MWYang 440068c
Removed Pythonic i.e. not Scala-ic index thing and added separate cas…
MWYang e6136f9
Removed extraneous case classes for dictionaries, per PR comment
MWYang 283d76e
Small fixes (updated comments, re-ordered things) per PR comments
MWYang 80d81ed
Removed usage of broadcast variables in transformer b/c it does not s…
MWYang e003eb2
Fixed serialization of GenderDetectStrategy, per PR recommendation to…
MWYang ee6c24b
Started merging my/unary-detect-names into my/stv-detect-names
MWYang 2fc0c3d
Restored GenderDetectStrategy after merge
MWYang b51e14b
Fixed missing plus sign in OpPipelineStageReaderWriter causing double…
MWYang 634b664
Merge branch 'my/unary-detect-names' into my/stv-detect-names
MWYang 03431e9
Cleaned up utils file by moving all implicit definitions to NameDetec…
MWYang cde7551
Passed first test for SmartTextMapVectorizer
MWYang 393275c
Synced changes from upstream feature branch for STV changes
MWYang ad9574c
Tidied up monoid definition for NameDetectStats after figuring out ho…
MWYang 72a1d48
Merge branch 'my/unary-detect-names' into my/stv-detect-names
MWYang a8d91de
Added more passing tests for SmartTextMapVectorizer
MWYang 0639af5
Started to add next test for excluding names from vector output
MWYang 4f316e5
Updated tests based on my new correct understanding that Text.empty =…
MWYang 714dcdc
Fixed failing test due to constructing to-be-compared estimators diff…
MWYang 27a6ce6
Added first failing metadata test
MWYang 9e7bfd2
Changed SensitiveFeatureInformation.Name to log gender detection stra…
MWYang 5d7716b
Abstracted out ordering of gender detection strategies into utils file
MWYang bc9bc63
Merge branch 'my/unary-detect-names' into my/stv-detect-names
MWYang 7c218e9
Added warning logging into SmartTextVectorizer
MWYang 500d16e
Passed first metadata test for SmartTextVectorizer; Started to re-wor…
MWYang 08af16b
Added first passing metadata test for STMapV
MWYang 573820c
Fixed OPVectorMetadataTest
MWYang eb5ff9b
Small fixes to tests
MWYang eaad6f8
Merge branch 'master' into my/unary-detect-names
MWYang 2bea311
Merge branch 'my/unary-detect-names' of https://github.com/MWYang/Tra…
MWYang 77f91a2
Small fixes (better Scala code, more safe, better patterns) from Matt…
MWYang b6b385a
Improved gender detection strategy tests to check that the correct st…
MWYang 8ed02bd
Broke out guard check numbers into their own params
MWYang e92101a
Merge branch 'my/unary-detect-names' into my/stv-detect-names
MWYang 799eb58
Added operationName as an argument to HumanNameDetectorModel for easi…
MWYang ca69ed8
Merge branch 'master' into my/unary-detect-names
MWYang b82440a
Merge branch 'my/unary-detect-names' of https://github.com/MWYang/Tra…
MWYang d747c92
Revert to using container Text class for NameDetectUtils per PR comments
MWYang 7d01d70
Added NameStats to FeatureBuilder
MWYang b4209c6
Added NameStats to a few more places
MWYang 23d7a57
Added NameStats to TestFeatureBuilder and RandomMap
MWYang d86f2d6
Merge branch 'my/unary-detect-names' into my/stv-detect-names
MWYang eea3a3c
Reordered tests to avoid flooding output with test logs
MWYang b9522de
Merge branch 'master' into my/unary-detect-names
tovbinm e4e3ddd
Incorporated PR comments (using enumeratum for NameStats map keys/val…
MWYang f551543
Merge branch 'my/unary-detect-names' into my/stv-detect-names
MWYang a9d95a1
Passed most SmartTextVectorizer tests after merging changes
MWYang 30476c8
Made all tests pass - Debuging wasn't being enabled due to non-intuit…
MWYang aa680b8
Fixed test to show that the output for SmartTextVectorizer is the sam…
MWYang a57aa29
Added all other metadata tests for SmartTextVectorizer
MWYang b9120a5
Removed some print statements
MWYang be2047d
Updated documentation
MWYang 8b02dff
Incorporated PR comments (renamed GenderStrings to GenderValues and r…
MWYang 8844ef5
Removed plural names from NameStats enums and factored out method in …
MWYang 0cc10e0
More small fixes from PR comments
MWYang a0d97c7
Merge branch 'my/unary-detect-names' into my/stv-detect-names
MWYang 3ad8ce1
Got previous tests working
Jauntbox b00775b
New test also working
Jauntbox 03ae4d0
Remove debug output
Jauntbox 9759747
Added test for TextList monoid
Jauntbox 63ad17a
Added all tests for SmartTextMapVectorizer
MWYang 89fa865
Removed print statements
MWYang 10644d8
Added another test and fixed sneaky metadata issue
Jauntbox 4fea007
Removed emptiness check
MWYang 8f7b125
Merge branch 'my/unary-detect-names' into my/stv-detect-names
MWYang 9317250
Merge branch 'master' of github.com:salesforce/TransmogrifAI into km/…
Jauntbox 6424463
Merge branch 'master' into my/stv-detect-names
MWYang a107db1
Pulled out SensitiveFeatureInformation metadata changes into its own …
MWYang 32f29ce
Merge branch 'my/sensitive-metadata' into my/stv-detect-names
MWYang acb873e
Fixed SensitiveFeatureInformation tests failing due to not changing t…
MWYang 973aabb
Merge branch 'my/sensitive-metadata' into my/stv-detect-names
MWYang 3ae735c
Addressing comments
Jauntbox bef2b22
Spelling
Jauntbox 069031c
Fixed failing test by making default behavior of SmartTextVectorizer …
Jauntbox d8f7f21
Merge branch 'km/token-lens-map' into my/sensitive-metadata
MWYang e13c4d1
Merge branch 'my/sensitive-metadata' into my/stv-detect-names
MWYang fe57c89
Merge branch 'km/token-lens-map' into my/sensitive-metadata
MWYang 458af4a
Merge branch 'my/sensitive-metadata' into my/stv-detect-names
MWYang 40cb64b
Made all tests pass after merge (Ignore in STMapV didn't handle empty…
MWYang fde5c8d
Merge branch 'master' into my/sensitive-metadata
MWYang 2572332
Merge branch 'my/sensitive-metadata' into my/stv-detect-names
MWYang a8504da
Removed enum from SensitiveFeatureInformation per PR comments
MWYang eac0e05
Using case class for GenderDetectionStrategy information
MWYang 26f30fb
Cleaning up tests per PR comments
MWYang f66d896
Merge branch 'my/sensitive-metadata' into my/stv-detect-names
MWYang 9a3f5af
Made fixes for metadata changes
MWYang bd7b90d
Merge branch 'master' into my/sensitive-metadata
MWYang 5c37c26
Merge branch 'my/sensitive-metadata' into my/stv-detect-names
MWYang 9344e5f
Merge branch 'master' into my/stv-detect-names
MWYang
Can you make the formatting of both if branches look the same since they're doing nearly the same thing?