topology of PWN #142

Open
vcvpaiva opened this issue Feb 6, 2018 · 3 comments

Comments

@vcvpaiva
Member

vcvpaiva commented Feb 6, 2018

Could anyone tell me how many of the 117659 synsets have glosses? Not all do.

Can we add the corpus of glosses somewhere in the repo, in an inspectable form?
https://wordnet.princeton.edu/glosstag.shtml

@fcbr
Member

fcbr commented Feb 7, 2018

@vcvpaiva AFAIK all PWN synsets have glosses. For example, if we take the Prolog output of PWN and remove duplicate synset ids from the gloss file, we get 117659 entries.

$ cd prolog
$ cat wn_g.pl | awk -F, '{print $1}' | sort | uniq | wc -l
117659

Also, if we look at the tagged gloss corpus, it seems that every synset has a tagged gloss too:

$ cd glosstag/standoff
$ cat index.byid.tab | awk -F'\t' '{print $1}' | sort | uniq | wc -l
117659
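
Note that the counts above confirm one entry per synset id; they do not check that the gloss text itself is non-empty. A minimal sketch of that check, assuming each line of wn_g.pl has the form g(<synset_id>,'<gloss>'). The function name count_empty_glosses is made up for illustration, and "empty" here means "contains no letters or digits".

```shell
# Hypothetical helper: prints "<total entries><TAB><entries with an empty gloss>".
# Assumes lines of the form: g(100001740,'some gloss text').
count_empty_glosses() {
  awk '
    /^g\(/ {
      total++
      gloss = $0
      sub(/^g\([0-9]+,/, "", gloss)       # drop the "g(<id>," prefix
      sub(/\)\.[ \t\r]*$/, "", gloss)     # drop the closing ")."
      gsub(/[^A-Za-z0-9]/, "", gloss)     # keep only letters and digits
      if (gloss == "") empty++
    }
    END { printf "%d\t%d\n", total, empty }
  ' "$1"
}

# Usage (from the prolog directory): count_empty_glosses wn_g.pl
```

If the second number is 0, every synset has some gloss text, however short.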

@vcvpaiva
Member Author

vcvpaiva commented Feb 7, 2018

Thanks @fcbr!
This is odd, as I am sure I have often had the impression that a synset had no gloss.
Maybe that happens when the gloss is very short, like

05893261-n sine_qua_non, essential_condition | sine qua non
(a prerequisite)

what is a tagged gloss, please?

and questions on the topology of PWN:

  1. How many synsets are direct hyponyms of Entity? Do all synsets lead up to Entity?

  2. How many are two hops away?

  3. How many have a long hierarchy like kitty < domestic_cat < cat < feline < carnivore < placental_mammal < mammal < vertebrate < chordate < animal < organism < living_thing < entity?

  4. I seem to remember that you were calculating isolated nodes vs hierarchies? Where is that data now?
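
One possible way to get numbers for questions 1 to 3 is a breadth-first walk down from the root over the hypernym relation. This is only a sketch under assumptions: that the Prolog file wn_hyp.pl has lines of the form hyp(<hyponym_id>,<hypernym_id>)., that the Prolog id of entity is 100001740, and that instance hypernyms (kept in a separate file) are ignored. The function name depth_histogram is made up for illustration.

```shell
# Hypothetical helper: prints "<depth><TAB><number of synsets>" per line,
# where depth is the shortest number of hyp hops from the given root.
# Usage: depth_histogram FILE ROOT_ID
depth_histogram() {
  awk -v root="$2" '
    {
      # Reduce "hyp(child,parent)." to "child,parent".
      line = $0
      gsub(/^hyp\(|\)\.[ \t\r]*$/, "", line)
      if (split(line, ids, ",") != 2) next
      kids[ids[2]] = kids[ids[2]] " " ids[1]   # parent -> list of children
    }
    END {
      # Breadth-first search from the root down to its hyponyms.
      depth[root] = 0
      queue[1] = root; head = 1; tail = 1
      while (head <= tail) {
        node = queue[head++]
        n = split(kids[node], c, " ")
        for (i = 1; i <= n; i++) {
          if (!(c[i] in depth)) {
            depth[c[i]] = depth[node] + 1
            queue[++tail] = c[i]
          }
        }
      }
      for (id in depth) count[depth[id]]++
      for (d = 0; (d in count); d++) print d "\t" count[d]
    }
  ' "$1"
}

# Usage (from the prolog directory): depth_histogram wn_hyp.pl 100001740
```

The row for depth 1 would be the direct hyponyms of the root and the row for depth 2 the two-hop synsets. Synsets that never appear in the output do not reach the root through hyp edges at all. Because a synset can have several hypernyms, the depth reported is the shortest path, so some synsets may also have longer chains than the one counted here.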

@arademaker
Member

arademaker commented Feb 7, 2018

Yes, we do have glosses with only 1-2 words. The tagged gloss corpus is not complete: not all glosses are entirely tagged. I talked with Christiane Fellbaum about it. Completing the tagging is excellent work still waiting to be done.

corpus of glosses = tagged corpus

vcvpaiva changed the title from "glosses in PWN" to "topology of PWN" on May 20, 2018