Merging adj lexical entries #284

Closed
wants to merge 2 commits into from

bbou commented Feb 13, 2020

Following issues #180 and #283

Adjective lexical entries that differed only in adjective position have been merged, and the merged entries are named without the position marker (ewn-aware-p--a + ewn-aware-a -> ewn-aware-a). Each merged entry groups the senses that were previously split across several otherwise identical lexical entries.

The naming follows the format specification (see format).

Sense IDs are unchanged and still carry the adjective position (ewn-aware-p--a-00191603-01, ewn-aware-a-01984219-02). Their original order is preserved under the new grouping.
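As a rough illustration of the grouping described above, here is a minimal sketch using Python's standard-library ElementTree. It is not the merge-lexentries3.xsl transform. The ID pattern it relies on (the position marker written as `-a-`, `-p-` or `-ip-` just before the final `-a` suffix) is an assumption inferred from the two example IDs, and `merged_id` and `merge_adjective_entries` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the adjective position (a), (p) or (ip) is escaped as "-a-", "-p-"
# or "-ip-" right before the final "-a" POS suffix, as in "ewn-aware-p--a".
# Anchoring at "$" leaves lemmas such as "tete-a-tete" untouched.
POSITION_SUFFIX = re.compile(r"-(?:a|p|ip)--a$")


def merged_id(entry_id: str) -> str:
    """Drop the position marker: 'ewn-aware-p--a' -> 'ewn-aware-a'."""
    return POSITION_SUFFIX.sub("-a", entry_id)


def merge_adjective_entries(lexicon: ET.Element) -> None:
    """Fold LexicalEntry elements whose IDs differ only in adjective position
    into the first such entry encountered.

    Sense elements are moved as-is (their IDs keep the position marker) and are
    appended at the end of the surviving entry, in the order the original
    entries appear. Children other than Sense on a folded entry are discarded.
    """
    kept = {}  # merged id -> surviving LexicalEntry
    for entry in list(lexicon.findall("LexicalEntry")):
        new_id = merged_id(entry.get("id"))
        if new_id not in kept:
            entry.set("id", new_id)
            kept[new_id] = entry
            continue
        for sense in entry.findall("Sense"):
            kept[new_id].append(sense)
        lexicon.remove(entry)  # its senses now live in the surviving entry


# Simplified sample (no synset attributes) built from the IDs quoted above.
SAMPLE = """<Lexicon id="ewn">
  <LexicalEntry id="ewn-aware-p--a">
    <Lemma writtenForm="aware" partOfSpeech="a"/>
    <Sense id="ewn-aware-p--a-00191603-01"/>
  </LexicalEntry>
  <LexicalEntry id="ewn-aware-a">
    <Lemma writtenForm="aware" partOfSpeech="a"/>
    <Sense id="ewn-aware-a-01984219-02"/>
  </LexicalEntry>
</Lexicon>"""

lexicon = ET.fromstring(SAMPLE)
merge_adjective_entries(lexicon)
print(ET.tostring(lexicon, encoding="unicode"))
```

Keeping the first entry and appending later senses is just one way to satisfy "order preserved"; the XSLT may order things differently.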

The schema is unchanged: the adjective position appears only within the sense ID and has not been moved into an attribute or element. How about provisionally using 'note' or 'dc:type', which are both valid?
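To make the 'note' option concrete, a hedged sketch that copies the position out of each sense ID into a `note` attribute. It assumes the sense-ID layout shown above (`ewn-aware-p--a-00191603-01`) and that `note` is acceptable on `Sense`; `copy_position_to_note` is a hypothetical helper. `dc:type` would work the same way but needs the Dublin Core prefix declared by the schema, so the sketch sticks to the un-namespaced attribute.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: sense IDs look like "ewn-aware-p--a-00191603-01", i.e. the escaped
# position marker "-a-", "-p-" or "-ip-" sits just before "-a-<offset>-<number>".
SENSE_POSITION = re.compile(r"-(a|p|ip)--a-\d{8}-\d{2}$")


def copy_position_to_note(lexicon: ET.Element) -> int:
    """Copy the adjective position found in each Sense id into a 'note' attribute.

    Returns the number of senses tagged. Senses whose id carries no position
    marker (e.g. "ewn-aware-a-01984219-02") are left untouched.
    """
    tagged = 0
    for sense in lexicon.iter("Sense"):
        match = SENSE_POSITION.search(sense.get("id", ""))
        if match:
            sense.set("note", match.group(1))
            tagged += 1
    return tagged


lex = ET.fromstring(
    '<Lexicon><LexicalEntry id="ewn-aware-a">'
    '<Sense id="ewn-aware-p--a-00191603-01"/>'
    '<Sense id="ewn-aware-a-01984219-02"/>'
    "</LexicalEntry></Lexicon>"
)
copy_position_to_note(lex)
print(ET.tostring(lex, encoding="unicode"))
```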

Source code
https://github.com/1313ou/ewn-transformation3/blob/master/merge-lexentries3.xsl

This requires an XSLT 3.0 processor; Saxon was used.

Sorry, I don't know which pretty-print formatter you use, so I left the output as Saxon produced it.

jmccrae (Member) commented Feb 13, 2020

Unfortunately I am going to have to reject this PR as it reformats the source code in a way that would make it hard to track changes.

See Contribution Guidelines: "Avoid the use of automatic tools or formatters to keep commits small and trackable."
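One way to check whether a reformatted file changes layout only, and not content, is to compare canonical forms. A minimal sketch with Python's `xml.etree.ElementTree.canonicalize` (C14N 2.0, Python 3.8+): `strip_text=True` discards whitespace-only text between elements, so indentation differences disappear, and C14N also normalizes attribute order, quote style and `<e/>` versus `<e></e>`. Caveat: it also trims leading/trailing whitespace inside real text content, and it does nothing to keep the commit itself small. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def canonical(xml_text: str) -> str:
    """C14N 2.0 form with leading/trailing whitespace stripped from text nodes."""
    return ET.canonicalize(xml_text, strip_text=True)


def formatting_only_change(before: str, after: str) -> bool:
    """True if the two documents differ only in layout: indentation,
    attribute order, quote style or empty-element syntax."""
    return canonical(before) == canonical(after)


before = (
    '<Lexicon id="ewn">\n'
    '  <LexicalEntry id="ewn-aware-a">\n'
    '    <Lemma writtenForm="aware" partOfSpeech="a"/>\n'
    "  </LexicalEntry>\n"
    "</Lexicon>"
)
after = (
    "<Lexicon id='ewn'><LexicalEntry id='ewn-aware-a'>"
    "<Lemma partOfSpeech='a' writtenForm='aware'></Lemma>"
    "</LexicalEntry></Lexicon>"
)
print(formatting_only_change(before, after))
```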

1313ou (Contributor) commented Feb 13, 2020 via email

jmccrae (Member) commented Feb 15, 2020

Hi, I have reformatted this PR to match the existing format. You can see the diff here:

https://github.com/globalwordnet/english-wordnet/compare/master..f273f79697301481defb93563d60409e39def994

The diff uncovered a small bug in the lemmas for 'tete-a-tete', 'cock-a-hoop' and 'two-a-penny'.

A new PR for this is at #287.

jmccrae closed this Feb 15, 2020
1313ou (Contributor) commented Feb 15, 2020

- AFAIK there is no pretty-printing policy, or is there? The data looks like xmllint output, but it isn't, at least not with xmllint's default settings.
- Some of the spurious changes come from the DTD (lexicalized="true"). More on the DTD later.
- The transformer has been fixed to guard against this: link
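The DTD effect in the second bullet is easy to reproduce: a parser that reads an attribute declaration with a default value reports the defaulted attribute as if it were present in the source, so anything that re-serializes the tree writes lexicalized="true" everywhere. A minimal sketch with Python's expat bindings and a toy internal DTD subset (not the actual WN-LMF DTD); `synset_attributes` is a hypothetical helper. Expat's `specified_attributes` flag shows the difference between specified and defaulted attributes.

```python
import xml.parsers.expat

# Toy DTD: only the shape of the "lexicalized" default is modeled here.
DOC = """<?xml version="1.0"?>
<!DOCTYPE LexicalResource [
  <!ELEMENT LexicalResource (Synset*)>
  <!ELEMENT Synset EMPTY>
  <!ATTLIST Synset
    id ID #REQUIRED
    lexicalized (true|false) "true">
]>
<LexicalResource><Synset id="s1"/></LexicalResource>
"""


def synset_attributes(doc: str, specified_only: bool) -> list:
    """Return the attribute dict expat reports for each <Synset> element."""
    parser = xml.parsers.expat.ParserCreate()
    # False (expat's default): attributes filled in from <!ATTLIST> defaults are
    # reported alongside the ones actually written in the document.
    # True: only attributes present in the start tag are reported.
    parser.specified_attributes = specified_only
    seen = []

    def start(name, attrs):
        if name == "Synset":
            seen.append(dict(attrs))

    parser.StartElementHandler = start
    parser.Parse(doc, True)
    return seen


print(synset_attributes(DOC, specified_only=False))
print(synset_attributes(DOC, specified_only=True))
```

Whether an XSLT processor sees the defaults depends on how its parser is configured; this only shows the general mechanism.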
