Feature request: #auxfun macros (and other #funs too if feasible?) to distinguish word order #23

Open
inariksit opened this issue Feb 4, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@inariksit
Member

Current behaviour: phrases like "Section 10" (apposition) and "10 sections" are treated identically.

@inariksit inariksit added the enhancement New feature or request label Feb 7, 2022
@inariksit
Member Author

inariksit commented Feb 10, 2022

Another issue: attachment of modifiers. Suppose we have a phrase like

"each portion of a building separated by walls"

In dt, I get these two options:

#1
AdjCN
    ( AdvCN ( UseN portion_N )
        ( PrepNP of_Prep
            ( DetCN ( DetQuant IndefArt NumSg ) ( UseN building_N ) )
        )
    )
    ( PassVAgent separate_V
        ( DetCN (DetQuant IndefArt NumPl)  ( UseN wall_N ) )
    ): CN[2,3,4,5,6,8]

#LIN: "portion of a building separated by walls"

#2
AdvNP
    ( DetCN each_Det
        ( AdjCN ( UseN portion_N )
            ( PassVAgent separate_V
                ( DetCN (DetQuant IndefArt NumPl)  ( UseN wall_N ) )
            )
        )
    )
    ( PrepNP of_Prep
        ( DetCN ( DetQuant IndefArt NumSg ) ( UseN building_N ) )
    ): NP[1,2,3,4,5,6,8]
#LIN: "portion separated by walls of a building"

However, dt doesn't contain the NP version of 1, which would simply be DetCN each_Det applied to that tree. I wonder if some pruning step removes the NP version of 1 because it covers as many words as 2? (I tried to run the example without pruneDevTree, but this particular sentence is very long and the program was taking a long time. If you think that might be the reason, I can produce a shorter version of the sentence and try again.)

In any case, I assume that the NP version of 1 is also constructed, but is thrown away before it can be prioritised. And I would like to prioritise it, because the attachment matches the word order: both "of a building" and "separated by walls" modify "portion", but in 1, "building" is attached more immediately.

@inariksit
Member Author

I can solve this particular case with an #auxfun that says: whenever a NOUN has both an acl and an nmod child, attach nmod before acl. But this doesn't scale well.

With an explicit DISTANCE=-1* or similar, I could duplicate that rule to say that whatever is closer to the head in the original word order gets attached first in the tree. This is tedious, but finite: there is a finite number of relations, and a finite number of combinations that appear together in real-life texts.

Could one make a more fundamental change in the algorithm that wouldn't require explicit instructions about word order? For example, ranking higher those trees whose subtrees are attached according to distance in the original string. I don't know if this is feasible at all, or whether it requires too much rewriting. I can get by with auxfuns, just thinking aloud here.
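
To make the ranking idea concrete, here is a minimal sketch in Python. It is not gf-ud code and does not reflect how dt is actually built or pruned. It assumes, hypothetically, that each candidate tree can be summarised as the order in which the head's modifiers are attached (innermost first), and it penalises every pair where a farther modifier is attached before a closer one. The names `order_penalty` and `candidates` are made up for illustration.

```python
from itertools import combinations

def order_penalty(head, attached):
    """Count pairs of modifiers attached out of distance order.

    `head` is the token position of the head word.
    `attached` lists modifier positions in attachment order, innermost first.
    A pair is penalised when the earlier-attached modifier is farther from
    the head than the later-attached one.
    """
    penalty = 0
    for earlier, later in combinations(attached, 2):
        if abs(earlier - head) > abs(later - head):
            penalty += 1
    return penalty

HEAD = 2  # "portion"

# Hypothetical summaries of the two trees from the earlier comment:
# token 5 = "building" (nmod), token 6 = "separated" (acl).
candidates = {
    "tree 1": [5, 6],  # nmod attached first, then acl
    "tree 2": [6, 5],  # acl attached first, then nmod
}

# Lower penalty ranks first; sorted() is stable, so ties keep their order.
ranked = sorted(candidates, key=lambda name: order_penalty(HEAD, candidates[name]))
print(ranked)
```

One caveat with a pure distance measure: "each" (token 1) is the closest dependent of "portion" but has to attach last via DetCN, so a real implementation would probably need to compare only modifiers that compete for the same attachment level.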

@inariksit
Member Author

Here's a CoNLL-U file to test with:

1	Each	each	DET	DT	_	2	det	_	_
2	portion	portion	NOUN	NN	Number=Sing	10	nsubj	_	_
3	of	of	ADP	IN	_	5	case	_	_
4	a	a	DET	DT	Definite=Ind|PronType=Art	5	det	_	_
5	building	building	NOUN	NN	Number=Sing	2	nmod	_	_
6	separated	separate	VERB	VBN	Tense=Past|VerbForm=Part	2	acl	_	_
7	by	by	ADP	IN	_	8	case	_	_
8	walls	wall	NOUN	NNS	Number=Plur	6	obl	_	_
9	is	be	AUX	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	10	cop	_	_
10	separate	separate	ADJ	JJ	Degree=Pos	0	root	_	SpacesAfter=\n
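
For reference, the distances involved can be read straight off this file. The snippet below is a small Python sketch, independent of gf-ud, that lists the dependents of "portion" (token 2) sorted by their distance from the head. It assumes the columns can be split on whitespace, which holds for this sample; real CoNLL-U is tab-separated, so splitting on tabs would be stricter. The function names are hypothetical.

```python
# The sample sentence, with columns separated by single spaces for readability.
SAMPLE = r"""
1 Each each DET DT _ 2 det _ _
2 portion portion NOUN NN Number=Sing 10 nsubj _ _
3 of of ADP IN _ 5 case _ _
4 a a DET DT Definite=Ind|PronType=Art 5 det _ _
5 building building NOUN NN Number=Sing 2 nmod _ _
6 separated separate VERB VBN Tense=Past|VerbForm=Part 2 acl _ _
7 by by ADP IN _ 8 case _ _
8 walls wall NOUN NNS Number=Plur 6 obl _ _
9 is be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 10 cop _ _
10 separate separate ADJ JJ Degree=Pos 0 root _ SpacesAfter=\n
"""

def read_tokens(text):
    """Parse token lines into dicts with id, form, head and deprel."""
    tokens = []
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        fields = line.split()
        tokens.append({
            "id": int(fields[0]),
            "form": fields[1],
            "head": int(fields[6]),
            "deprel": fields[7],
        })
    return tokens

def dependents_by_distance(tokens, head_id):
    """Return the direct dependents of head_id, closest to the head first."""
    deps = [t for t in tokens if t["head"] == head_id]
    return sorted(deps, key=lambda t: abs(t["id"] - head_id))

tokens = read_tokens(SAMPLE)
for t in dependents_by_distance(tokens, 2):
    print(t["id"], t["form"], t["deprel"], "distance", abs(t["id"] - 2))
```

Among the two postmodifiers, nmod "building" (distance 3) is closer to "portion" than acl "separated" (distance 4), which is the ordering I would like the tree to reflect.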
