Rules Improvement for French #38

Pantalaymon · 2021-12-12T22:30:58Z

Hello,

As I will be using coreferee in a new project, I am still working on improving the rules.

I added a few more rules in lang/fr/language_rules.py, as well as a few tests in tests/fr to make sure they work as expected.
There are also some edits to the files in lang/fr/data, which are used by the rules.

Regarding the new rules: I don't know whether you plan to use the same rules for the spaCy-native solution you are developing, but I wanted to point out that, on top of the language-specific rules for noun-anaphor and anaphor-anaphor pairs, the system would greatly benefit from language-specific rules for coreferring noun-noun pairs. For instance, such rules could prevent a singular named entity (say, John Doe) from coreferring with a plural noun (say, the people) or with a gender-incompatible noun.
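A minimal sketch of the kind of noun-noun filter described above, in plain Python so that it runs without a spaCy model. It is not coreferee's API: the `Mention` class, `nouns_can_corefer` and the `PLURAL_TOPONYMS` list are hypothetical, and the feature values only mimic the Universal Dependencies `Number`/`Gender` labels (`Sing`, `Plur`, `Masc`, `Fem`) that spaCy reports. Unknown features never block a pair, and mixed-gender nouns such as *élève* carry both genders.

```python
from dataclasses import dataclass

# Hypothetical mention representation (not coreferee's API). Each feature is the
# set of values a mention may take; an empty set means "unknown".
@dataclass(frozen=True)
class Mention:
    head: str                        # head word or entity name
    number: frozenset = frozenset()  # e.g. {"Sing"} or {"Plur"}
    gender: frozenset = frozenset()  # e.g. {"Masc"}, {"Fem"}, or both for mixed-gender nouns

# Hypothetical exception list: toponyms that are plural in form but denote one entity.
PLURAL_TOPONYMS = {"États-Unis", "Pays-Bas"}

def _compatible(a: frozenset, b: frozenset) -> bool:
    """Two feature sets clash only if both are known and share no value."""
    return not a or not b or bool(a & b)

def _effective_number(mention: Mention) -> frozenset:
    # Plural toponyms are treated as number-neutral, so they are never filtered on number.
    return frozenset() if mention.head in PLURAL_TOPONYMS else mention.number

def nouns_can_corefer(first: Mention, second: Mention) -> bool:
    """Return False when number or gender rules out coreference between two nouns."""
    return _compatible(_effective_number(first), _effective_number(second)) and _compatible(
        first.gender, second.gender
    )

john = Mention("John Doe", frozenset({"Sing"}), frozenset({"Masc"}))
people = Mention("gens", frozenset({"Plur"}), frozenset({"Masc"}))
actress = Mention("actrice", frozenset({"Sing"}), frozenset({"Fem"}))
pupil = Mention("élève", frozenset({"Sing"}), frozenset({"Masc", "Fem"}))

print(nouns_can_corefer(john, people))   # False: singular vs plural
print(nouns_can_corefer(john, actress))  # False: masculine vs feminine
print(nouns_can_corefer(john, pupil))    # True: mixed-gender noun
```

Such a check would only veto candidate pairs; it says nothing about which of the remaining pairs actually corefer, which is still left to the rest of the system.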

Pantalaymon · 2022-01-20T16:50:56Z

Hi @richardpaulhudson
Are you still maintaining coreferee?
It would be really desirable for the latest French version to be updated, as these last commits fix a major issue with the output.

richardpaulhudson · 2022-01-31T08:55:01Z

Hi @Pantalaymon, thank you very much for this and please accept my sincere apologies for taking so unacceptably long to get back to you. Coreferee is still being maintained and will still be maintained in the future; since I changed employers, I seem to have missed the original PR notification in December.

I am currently doing experiments into ways of improving the accuracy specifically for English. The most likely outcome — although this is by no means set in stone — is that we will end up implementing a new library for English coreference. Coreferee will definitely still be supported for the other languages and it may well be that the results of the experiments point to some cross-language improvements that can be made to Coreferee as well.

Your suggestion to implement rules to filter noun-noun coreference sounds like a very good idea and I shall definitely look into this further.

Two questions about this PR:

1. Are the rules designed to be used with the existing model? Would it make sense to generate a new model?
2. If the rules improve the accuracy, would it make sense to specify the improved accuracy in https://github.com/msg-systems/coreferee#142-model-performance? (At the same time, I can see there is no easy way of measuring the new accuracy if we decide not to generate a new model.)

Pantalaymon · 2022-01-31T15:19:23Z

Hi @richardpaulhudson,

Very interesting. So it would be a new library independent of base spaCy?

Regarding my suggestion, I think it partly exceeds the original focus of coreferee, which was anaphora resolution, since noun-noun pairing currently operates mostly at a cross-language level and through a rule-based system. However, if you really plan to build on this project to implement a larger, multi-language coreference resolution solution for spaCy, I am 100% convinced that language-specific rules for noun-noun coreference would be worth designing.

Regarding your questions:

1. I have not retrained the model at all, so yes, the rules work with the existing model. That said, I think I have slightly modified the rules for mention definition (independent nouns and anaphors), so retraining the model might result in better accuracy... or not. I am not sure whether the noun-noun pairing rules affect the training of the neural ensemble at all; if they do, I will definitely retrain it and compare the results when I have time.
2. Well, I'm not sure, since this table is about the accuracy of the neural ensemble on potential anaphoric pairs, if I'm not mistaken, and not about whole coreference chains.

By the way, regarding the evaluation of whole coreference chains, I have been able to evaluate the tool for French with more standard metrics here by using the CoNLL format. The results are not so good, for the reasons set out below, but still OK.
I think the same method could be used to evaluate the other languages supported by coreferee, provided the corpus is converted to CoNLL. Only a few adaptations for each language (namely the separators in the CoNLL loader and the dependencies to exclude when building mention phrases) would then be required before running the coreference resolution scorer.
Depending on the genres in the test corpora, it could yield better results than I got for French.
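A minimal sketch of the conversion step mentioned above, assuming predicted chains are available as lists of inclusive token spans. `chains_to_conll_column` and `to_conll_document` are hypothetical helpers, not part of coreferee. The coreference column follows the CoNLL-2012 bracket notation (`(0)` for a one-token mention, `(0` and `0)` for the first and last tokens of a longer one, `|` between several values, `-` for none); the tab separator and the four-column layout are assumptions to adjust to whichever loader and scorer configuration is used.

```python
def chains_to_conll_column(num_tokens, chains):
    """Return one coreference-column value per token, CoNLL-2012 style.

    chains: list of chains; each chain is a list of (start, end) token spans
    with end inclusive. The chain's index in the list is used as its cluster ID.
    """
    cells = [[] for _ in range(num_tokens)]
    for cluster_id, chain in enumerate(chains):
        for start, end in chain:
            if start == end:
                cells[start].append(f"({cluster_id})")  # single-token mention
            else:
                cells[start].append(f"({cluster_id}")   # mention opens here
                cells[end].append(f"{cluster_id})")     # mention closes here
    # Several values in one cell are joined with "|"; empty cells become "-".
    return ["|".join(cell) if cell else "-" for cell in cells]


def to_conll_document(doc_name, tokens, chains):
    """Render one document as tab-separated lines (assumed layout: doc, index, token, coref)."""
    column = chains_to_conll_column(len(tokens), chains)
    lines = [f"#begin document ({doc_name}); part 000"]
    for index, (token, coref) in enumerate(zip(tokens, column)):
        lines.append(f"{doc_name}\t{index}\t{token}\t{coref}")
    lines.append("#end document")
    return "\n".join(lines) + "\n"


# "Le ministre dit qu' il viendra": "Le ministre" (tokens 0-1) and "il" (token 4) corefer.
print(chains_to_conll_column(6, [[(0, 1), (4, 4)]]))
# ['(0', '0)', '-', '-', '(0)', '-']
```

Writing the gold and predicted files with the same helper keeps token indices aligned, which the scorer relies on when comparing mentions.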

Pantalaymon added 6 commits December 9, 2021 19:36

6803985 rules regarding grammatical compatibility of noun pairs
ec82e98 plural toponym exceptions
0832d03 added a few mixed gender nouns
710ba0c grammatical and semantic rules for corefering nouns and anaphora
508d433 reduction of mixed gender person nouns
2c6f4a6 fixing mix gender noun compatibility
111e08e changing function names

richardpaulhudson merged commit 5ffaa37 into msg-systems:development Feb 11, 2022
