SUMO mappings #52

Open
vcvpaiva opened this issue Feb 11, 2017 · 3 comments

vcvpaiva commented Feb 11, 2017

To investigate (from @arademaker, Jan 25, 2017):
$ awk '$0 ~ /^[0-9]/ { print substr($10, length($10)) }' sentences.conllu | sort | uniq -c | sort -n -r

all mappings to SUMO by type:

38671 ?
29606 +
16477 =
171 @
67 [

The 67 [ mappings are negated meanings, a bad convention from SUMO, and all of them are worth investigating.
The 171 @ mappings are strange, though, as I expected them only for words like American, Asian, German.
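
For anyone who prefers to redo the tally above outside the shell, here is a minimal Python sketch of the same count. It assumes, as the awk command does, that token lines start with a digit and that the mapping sits in the tenth whitespace-separated column. The function name suffix_counts is hypothetical and not part of any project tooling.

```python
import re
from collections import Counter

# Same test as the awk pattern /^[0-9]/: the line must start with a digit.
TOKEN_LINE = re.compile(r"[0-9]")


def suffix_counts(lines):
    """Count the final character of column 10 on every token line."""
    counts = Counter()
    for line in lines:
        if not TOKEN_LINE.match(line):
            continue  # skip comment lines and blank lines
        fields = line.split()  # whitespace split, like awk's default
        if len(fields) < 10:
            continue  # unlike awk, short lines are ignored rather than counted
        counts[fields[9][-1]] += 1
    return counts
```

Passing an open handle on sentences.conllu to suffix_counts and calling most_common() on the result should give the counts in descending order, much like the sort | uniq -c | sort -n -r pipeline.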

$ awk '$0 ~ /^[0-9]/ && $10 ~ /[@[]$/ { print $2,$4,$10}' sentences.conllu | sort | uniq -c | sort -n -r
35 fish NOUN NN|08688076-n|Region@
22 snowboarder NOUN NN|10617665-n|SportsPosition@
16 one NOUN NN|13742573-n|PositiveInteger@
13 acoustic ADJ JJ|02868489-a|FieldOfStudy@
12 bull NOUN NN|08686332-n|Region@
10 lion NOUN NN|08686821-n|Region@
10 barefoot NOUN RB|00278078-r|Clothing[
9 American ADJ JJ|02927303-a|LandArea@
8 silent ADJ JJ|00942163-a|LinguisticCommunication[
8 poles NOUN NNS|13650921-n|LengthMeasure@
8 pole NOUN NN|13650921-n|LengthMeasure@
8 naked VERB JJ|00457998-a|Clothing[
5 new ADJ JJ|02584699-a|Damaging[
4 rod NOUN NN|13650921-n|LengthMeasure@
4 afghan ADV JJ|03003928-a|Nation@
4 Indian ADJ JJ|02928347-a|LandArea@
3 only ADV RB|00008600-r|SocialInteraction[
3 nude NOUN JJ|00457998-a|Clothing[
3 lone ADJ JJ|02251212-a|SocialInteraction[
3 bull ADJ NN|08686332-n|Region@
3 barefoot NOUN JJ|02156686-a|Clothing[
3 alone ADV RB|00157967-r|SocialInteraction[
3 alone ADJ RB|00157967-r|SocialInteraction[
3 African ADJ JJ|02941790-a|Continent@
2 shoeless NOUN JJ|02156686-a|Clothing[
2 shoeless ADJ JJ|02156686-a|Clothing[
2 perch NOUN NN|13650921-n|LengthMeasure@
2 naked ADJ JJ|00457998-a|Clothing[
2 motionless ADJ JJ|01564315-a|Motion[
2 american ADJ JJ|02927303-a|LandArea@
2 Fish PROPN NN|08688076-n|Region@
2 Egyptian ADJ JJ|02971469-a|Nation@
1 york NOUN NN|08159924-n|FamilyGroup@
1 unmanned VERB JJ|01479940-a|Human[
1 snowboarders NOUN NNS|10617665-n|SportsPosition@
1 silently ADV RB|00112090-r|LinguisticCommunication[
1 seattle ADJ NN|09154178-n|PortCity@
1 perches NOUN NNS|13650921-n|LengthMeasure@
1 nude ADJ JJ|00457998-a|Clothing[
1 motionlessly ADV RB|00404311-r|Motion[
1 lone NOUN JJ|02251212-a|SocialInteraction[
1 labrador NOUN NN|08819883-n|LandArea@
1 indian ADJ JJ|02928347-a|LandArea@
1 healthy NOUN JJ|01170243-a|DiseaseOrSyndrome[
1 harmlessly ADV RB|00310036-r|Damaging[
1 grand NOUN NN|13750844-n|PositiveInteger@
1 first ADV JJ|02186338-a|Integer@
1 cloudy ADJ JJ|00461311-a|RadiatingLight[
1 barefoot ADJ JJ|02156686-a|Clothing[
1 artificial ADJ JJ|01571363-a|OrganicObject[
1 african ADJ JJ|02941790-a|Continent@
1 Shepard PROPN NN|11297263-n|Man@
1 Egyptian PROPN JJ|02971469-a|Nation@
1 African PROPN JJ|02941790-a|Continent@
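
To review these by SUMO concept rather than by word form, the listing above can be regrouped. The sketch below is only an illustration: it assumes each mapping field has three |-separated parts (what looks like a POS tag, a WordNet synset offset, and a SUMO term ending in a one-character relation marker), which matches the rows shown here but is not confirmed for the whole file. The names SumoMapping, parse_mapping and totals_by_concept are hypothetical.

```python
from collections import Counter
from typing import NamedTuple, Optional


class SumoMapping(NamedTuple):
    tag: str       # first part, e.g. "NN" (assumed to be a POS tag)
    synset: str    # second part, e.g. "08688076-n"
    concept: str   # SUMO term without its marker, e.g. "Region"
    relation: str  # trailing marker, e.g. "@" or "["


def parse_mapping(misc: str) -> Optional[SumoMapping]:
    """Split a field like 'NN|08688076-n|Region@'; return None if it does not fit."""
    parts = misc.split("|")
    if len(parts) != 3 or len(parts[2]) < 2:
        return None
    tag, synset, term = parts
    return SumoMapping(tag, synset, term[:-1], term[-1])


def totals_by_concept(listing_lines):
    """Sum the counts of 'COUNT FORM UPOS MAPPING' rows per (concept, relation)."""
    totals = Counter()
    for line in listing_lines:
        fields = line.split()
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # not a row of the listing
        mapping = parse_mapping(fields[3])
        if mapping is None:
            continue
        totals[(mapping.concept, mapping.relation)] += int(fields[0])
    return totals
```

Feeding the rows of the listing to totals_by_concept puts all Region@ or Clothing[ rows under one key, which makes it easier to see which concepts account for most of the odd mappings.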

@vcvpaiva

fish NOUN NN|08688076-n|Region@
bull NOUN NN|08686332-n|Region@
lion NOUN NN|08686821-n|Region@
Fish PROPN NN|08688076-n|Region@
seem to be Zodiac signs.

@vcvpaiva

Adding here the list that @arademaker produced of words without SUMO concepts. Very few of these should have concepts, since we mostly don't expect SUMO concepts for prepositions, determiners, particles, etc.
Exceptions that show issues in PWN:
24 shirtless ADJ
21 motocross NOUN
17 biker NOUN
16 wheelie NOUN
13 scissors NOUN (lemmatization issue)
11 snowboarding VERB
9 bmx NOUN
9 biking VERB
8 corndogs NOUN
8 breaded VERB
8 backbends NOUN
7 wetsuit NOUN
7 tambourines NOUN
6 underwater NOUN
5 inflatable ADJ
5 bellbottoms NOUN
4 waterskiing VERB
4 t-ball NOUN
4 snowsuits NOUN
4 rollerbladers NOUN
4 pyramid-shaped ADJ
4 pajamas NOUN
4 kickboxing NOUN
4 jetski NOUN
4 gymnastic ADJ
4 gloved ADJ
4 coverall NOUN
3 upside-down NOUN
3 snowboarding NOUN
3 sleeved ADJ
3 skateboarding VERB
3 shirtless NOUN
3 preteen NOUN
3 pong ADV
3 piercings NOUN
3 mittened VERB
3 maracas NOUN
3 loris NOUN
3 kiddies NOUN
3 kickboxing VERB
3 footbag NOUN
3 deboning VERB
3 bmx ADP
3 Seadoo PROPN
3 Rollerbladers NOUN
3 ATVs PROPN
etc
MissingSUMOconcepts.txt

@vcvpaiva

related to issue #60

@vcvpaiva vcvpaiva changed the title Strange SUMO mappings SUMO mappings Mar 16, 2017