Incorporate name detection into SmartTextVectorizer #456

Closed
wants to merge 142 commits into from

Commits on Dec 5, 2019

  1. Re-added unary estimator code and started porting logic to Algebird monoids instead of custom accumulators
    MWYang committed Dec 5, 2019
    2acf3fc

Commits on Dec 6, 2019

  1. b55c31e
  2. f443952
  3. 557ef39
  4. Added HLL back to monoid accumulator and code now compiles correctly; Still need to fix HLL serialization in Spark issue
    MWYang committed Dec 6, 2019
    b6728ec
  5. Fixed HLL in NameDetectStats not serializing correctly; Now need to fix printing bug for NameDetectStats
    MWYang committed Dec 6, 2019
    ff1b2ef
  6. 33b77c9
  7. Fixed guard stat calculation computing moments of number of tokens instead of moments of text length; Still need to fix no moments higher than the 1st being calculated
    MWYang committed Dec 6, 2019
    4caec75
  8. b60dc4a

Commits on Dec 7, 2019

  1. 469111b
  2. e5f169e

Commits on Dec 9, 2019

  1. b701612
  2. Added honorific detection
    MWYang committed Dec 9, 2019
    8342dae
  3. 2e0e85a
  4. b079a27
  5. a1197a7
  6. Updated documentation
    MWYang committed Dec 9, 2019
    19bad0b
  7. 3172bde
  8. Added flag for ignoring nulls
    MWYang committed Dec 9, 2019
    0d82eef
  9. 7812d12
  10. 345508f

Commits on Dec 10, 2019

  1. a80e382
  2. f7817d9
  3. cf3eff0
  4. Added failing tests for STV
    MWYang committed Dec 10, 2019
    eaaa23a
  5. b928a49
  6. Created metadata case class per PR review; Added tests for metadata; Cleaned up test code
    MWYang committed Dec 10, 2019
    048c084
  7. Added test for name threshold
    MWYang committed Dec 10, 2019
    5d30e79
  8. 78c321c
  9. a597d21
  10. 12b3eae

Commits on Dec 11, 2019

  1. Started porting over name detection code before wanting to try and simplify the shared code even further
    MWYang committed Dec 11, 2019
    e299677
  2. 997a132
  3. 6b8a039
  4. 1250559
  5. Fixed tests sometimes failing because they were not using the same name dictionary as NameDetectUtils; Removed now unnecessary change to RandomText test helper
    MWYang committed Dec 11, 2019
    2d804ee
  6. f4ecea1
  7. 6c34935
  8. 9427cd3
  9. 11f5bf3
  10. 38bff0b
  11. e823ecb
  12. 5f4f592
  13. 5336e71
  14. 9e9c149

Commits on Dec 12, 2019

  1. 141f1af
  2. 33f6809
  3. 464fe52
  4. 6cc90de
  5. Started to make changes to SmartTextMapVectorizer but ran into problem with utils and types
    MWYang committed Dec 12, 2019
    87c568d
  6. bff4d1b
  7. Removed type parameter from NameDetectFun because of later conflict with SmartTextMapVectorizer
    MWYang committed Dec 12, 2019
    639af3d
  8. a8b4423

Commits on Dec 13, 2019

  1. e7795dc
  2. Removed Pythonic (i.e. not Scala-ic) indexing and added separate case for checking the last token instead, per PR comment
    MWYang committed Dec 13, 2019
    440068c
  3. e6136f9
  4. 283d76e
  5. 80d81ed
  6. Fixed serialization of GenderDetectStrategy, per PR recommendation to test with local workflow
    MWYang committed Dec 13, 2019
    e003eb2
  7. ee6c24b
  8. 2fc0c3d
  9. Fixed missing plus sign in OpPipelineStageReaderWriter causing double serialization test to fail
    MWYang committed Dec 13, 2019
    b51e14b
  10. 634b664
  11. Cleaned up utils file by moving all implicit definitions to NameDetectStats object and started integrating name detection in SmartTextMapVectorizer
    MWYang committed Dec 13, 2019
    03431e9

Commits on Dec 14, 2019

  1. cde7551
  2. 393275c
  3. ad9574c
  4. 72a1d48

Commits on Dec 16, 2019

  1. a8d91de
  2. 0639af5

Commits on Dec 17, 2019

  1. Updated tests based on my new correct understanding that Text.empty => null, not empty string
    MWYang committed Dec 17, 2019
    4f316e5
  2. 714dcdc

Commits on Dec 18, 2019

  1. 27a6ce6
  2. Changed SensitiveFeatureInformation.Name to log gender detection strategies and started on passing first sensitive feature test
    MWYang committed Dec 18, 2019
    9e7bfd2
  3. 5d7716b
  4. bc9bc63
  5. 7c218e9

Commits on Dec 19, 2019

  1. Passed first metadata test for SmartTextVectorizer; Started to re-work SensitiveFeatureInformation for map feature types
    MWYang committed Dec 19, 2019
    500d16e

Commits on Dec 20, 2019

  1. 08af16b
  2. Fixed OPVectorMetadataTest
    MWYang committed Dec 20, 2019
    573820c
  3. Small fixes to tests
    MWYang committed Dec 20, 2019
    eb5ff9b
  4. eaad6f8
  5. 2bea311
  6. 77f91a2
  7. b6b385a

Commits on Dec 21, 2019

  1. 8ed02bd
  2. e92101a

Commits on Jan 6, 2020

  1. 799eb58
  2. ca69ed8
  3. b82440a
  4. d747c92
  5. 7d01d70

Commits on Jan 7, 2020

  1. b4209c6
  2. 23d7a57
  3. d86f2d6

Commits on Jan 8, 2020

  1. eea3a3c
  2. b9522de
  3. Incorporated PR comments (using enumeratum for NameStats map keys/values and proper camel casing for guard check params)
    MWYang committed Jan 8, 2020
    e4e3ddd
  4. f551543
  5. a9d95a1
  6. Made all tests pass - Debugging wasn't being enabled due to non-intuitive execution order for ScalaTest
    MWYang committed Jan 8, 2020
    30476c8
  7. Fixed test to show that the output for SmartTextVectorizer is the same with or without name entries
    MWYang committed Jan 8, 2020
    aa680b8

Commits on Jan 9, 2020

  1. a57aa29
  2. Removed some print statements
    MWYang committed Jan 9, 2020
    b9120a5
  3. Updated documentation
    MWYang committed Jan 9, 2020
    be2047d
  4. Incorporated PR comments (renamed GenderStrings to GenderValues and removed BooleanStrings enum)
    MWYang committed Jan 9, 2020
    8b02dff
  5. Removed plural names from NameStats enums and factored out method in identifyGender to hopefully reduce complexity
    MWYang committed Jan 9, 2020
    8844ef5

Commits on Jan 10, 2020

  1. 0cc10e0
  2. a0d97c7
  3. Got previous tests working
    Jauntbox committed Jan 10, 2020
    3ad8ce1
  4. New test also working
    Jauntbox committed Jan 10, 2020
    b00775b
  5. Remove debug output
    Jauntbox committed Jan 10, 2020
    03ae4d0
  6. 9759747

Commits on Jan 13, 2020

  1. 63ad17a
  2. Removed print statements
    MWYang committed Jan 13, 2020
    89fa865
  3. 10644d8

Commits on Jan 14, 2020

  1. Removed emptiness check
    MWYang committed Jan 14, 2020
    4fea007
  2. 8f7b125
  3. 9317250
  4. 6424463
  5. a107db1
  6. 32f29ce
  7. Fixed SensitiveFeatureInformation tests failing due to not changing the to/fromMetadata tests
    MWYang committed Jan 14, 2020
    acb873e
  8. 973aabb
  9. Addressing comments
    Jauntbox committed Jan 14, 2020
    3ae735c

Commits on Jan 15, 2020

  1. Spelling
    Jauntbox committed Jan 15, 2020
    bef2b22
  2. Fixed failing test by making default behavior of SmartTextVectorizer and SmartTextMapVectorizer the same as previous behavior - don't ignore any features
    Jauntbox committed Jan 15, 2020
    069031c

Commits on Jan 21, 2020

  1. d8f7f21
  2. e13c4d1
  3. fe57c89
  4. 458af4a

Commits on Jan 22, 2020

  1. 40cb64b
  2. fde5c8d
  3. 2572332

Commits on Jan 24, 2020

  1. a8504da
  2. eac0e05
  3. 26f30fb
  4. f66d896
  5. 9a3f5af
  6. bd7b90d
  7. 5c37c26

Commits on Jan 29, 2020

  1. 9344e5f