Tests look strange #47

justafucker · 2015-06-18T00:13:01Z

I'm looking at https://github.com/mailgun/talon/blob/master/tests/signature/learning/helpers_test.py

I don't understand why, in

'Sergey N.  Obukhov <serobnic@xxx.ru>': ['Sergey', 'Obukhov'],

the expected result doesn't include 'serobnic'.


justafucker · 2015-06-18T00:29:53Z

Something else is unclear to me. In https://github.com/afedosenko/talon/blob/master/tests/signature/learning/featurespace_test.py there is this test:

s = '''John Doe
VP Research and Development, Xxxx Xxxx Xxxxx
555-226-2345
john@example.com'''
sender = 'John <john@example.com>'
features = fs.features(sender)
result = fs.apply_features(s, features)
# note that we don't consider the first line because signatures don't
# usually take all the text, empty lines are not considered
eq_(result, [[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
             [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
             [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

The last line contains 'john', which means the last '0' in the final row should be '1'.

obukhov-sergey · 2015-09-14T09:34:06Z

Hi @justafucker. Sorry for the confusion, and thanks for your interest and questions. I'll try to answer them.

The 1st test checks that ['Sergey', 'Obukhov'] are among the extracted names, not that they are the only ones extracted. E.g. if you modify the test and add serobnic to the list, the test will pass as well.
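To illustrate the difference between an "is among" check and an "is exactly" check, here is a minimal sketch. It is not talon's actual test code: `missing_names` is a hypothetical helper, and the `extracted` list is an assumed extractor output chosen to be consistent with the explanation above.

```python
def missing_names(expected, extracted):
    """Return the expected names that are absent from `extracted`."""
    return [name for name in expected if name not in extracted]


# Assumed (hypothetical) extractor output for
# 'Sergey N.  Obukhov <serobnic@xxx.ru>'.
extracted = ["Sergey", "Obukhov", "serobnic"]

# A subset-style check passes with or without 'serobnic' in the expectation.
assert missing_names(["Sergey", "Obukhov"], extracted) == []
assert missing_names(["Sergey", "Obukhov", "serobnic"], extracted) == []

# It only fails when an expected name was not extracted at all.
assert missing_names(["Sergey", "Ivanov"], extracted) == ["Ivanov"]
```

An exact-equality assertion would be stricter, but it would also break whenever the extractor starts returning an additional valid name.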

There is a test that specifically checks that given sergey@xxx.ru we'll extract sergey: https://github.com/mailgun/talon/blob/master/tests/signature/learning/helpers_test.py#L103

But we definitely encourage you to submit a PR if you find tests / code confusing and wish to contribute / improve them.

Regarding your 2nd question: the algorithm looks for lines like "John Doe", "John", or "Doe", i.e. the line should end with an extracted name, or the extracted name should appear as a detached word. This requirement might seem strange with respect to "john@example.com", but in general it helps avoid false positives when an extracted name happens to be a common sequence of characters that might occur inside a line.
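The rule can be sketched roughly as follows. This is not talon's implementation: `line_mentions_name` is a hypothetical helper, and it assumes that "detached word" means the name is delimited by whitespace or by the start or end of the line.

```python
import re


def line_mentions_name(line, names):
    """Return True if any name occurs in `line` as a whitespace-delimited word.

    Assumption: a name counts only when it is bounded by whitespace or by the
    line boundaries, so 'john' inside 'john@example.com' does not count.
    """
    if not names:
        return False
    alternatives = "|".join(re.escape(name) for name in names)
    pattern = re.compile(r"(?:^|\s)(?:%s)(?:\s|$)" % alternatives)
    return pattern.search(line) is not None


names = ["John", "john"]  # assumed names for the sender 'John <john@example.com>'
print(line_mentions_name("John Doe", names))          # True
print(line_mentions_name("john@example.com", names))  # False
```

Under this assumption the e-mail line does not set the sender-name feature, which matches the trailing 0 in the last row of the expected result.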

justafucker changed the title ~~Helper tests look strange~~ Tests look strange Jun 18, 2015

obukhov-sergey closed this as completed Sep 14, 2015
