[nl] unseen error #7153

Open
ghost opened this issue Oct 2, 2022 · 2 comments
Comments

ghost commented Oct 2, 2022

Not detected: "Daar komt bij dat dit de aandacht afleid." => "afleidt" (roughly: "On top of that, this distracts attention.")

@ghost ghost changed the title unseen error [nl] unseen error Oct 2, 2022
@danielnaber danielnaber added the Dutch Especially for Dutch label Oct 2, 2022
ghost commented Oct 3, 2022

This specific case could be detected as a WKW:TGW:1EP form (first-person singular, present tense) at the end of a sentence that has no 'ik' in it. Exceptions would be needed for words whose form occasionally overlaps with other word forms.
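
A minimal sketch of that narrower idea in LanguageTool's XML rule format. The rule id, name and message are hypothetical, the exceptions for overlapping word forms are left out, and it assumes the tagger puts SENT_END on the closing punctuation token. The suggestion mirrors the form used in the rule posted later in this thread. Untested.

```xml
<!-- Hypothetical sketch: flag a first-person verb form directly before the sentence end. -->
<rule id="IK_VORM_EINDE_ZIN_SKETCH" name="Ik-vorm aan het einde van de zin, zonder 'ik'">
    <!-- Do nothing when 'ik' occurs anywhere in the sentence. -->
    <antipattern>
        <token skip="-1" postag="SENT_START"/>
        <token skip="-1">ik</token>
        <token postag="SENT_END"/>
    </antipattern>
    <pattern>
        <marker>
            <token postag="WKW:TGW:1EP"/>
        </marker>
        <!-- Assumption: the final punctuation mark carries the SENT_END tag. -->
        <token postag="SENT_END"/>
    </pattern>
    <message>Deze zin eindigt op een ik-vorm, maar er staat geen 'ik' in de zin.</message>
    <!-- Token 1 is the marked verb; ask for its third-person form. -->
    <suggestion><match no="1" postag="WKW:TGW:3EP"></match></suggestion>
    <example correction="afleidt">Daar komt bij dat dit de aandacht <marker>afleid</marker>.</example>
</rule>
```

Anchoring on the sentence end keeps the pattern small, but it would only catch verb-final clauses like the one reported here.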

@ghost ghost assigned ghost and LanguageTool-AS and unassigned ghost Oct 5, 2022
ghost commented Oct 7, 2022

A more general solution (it also flags a lot of junk from Wikipedia when testing, but it is a start):

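<rule id="IK_VORM_ZONDER_IK" name="Zin met een ik-vorm, zonder 'ik'">
    <!-- Skip sentences that contain 'ik' anywhere. -->
    <antipattern>
        <token skip="-1" postag="SENT_START"/>
        <token skip="-1">ik</token>
        <token postag="SENT_END"/>
    </antipattern>
    <!-- Skip ik-forms right after a conjunction, comma or semicolon. -->
    <antipattern>
        <token regexp="yes">en|of|,|;</token>
        <token postag="WKW:TGW:1EP"/>
    </antipattern>
    <!-- Skip inversion with 'je' or 'jij' after the verb. -->
    <antipattern>
        <token postag="WKW:TGW:1EP"/>
        <token regexp="yes">je|jij</token>
    </antipattern>
    <pattern>
        <!-- Token 1: any token that is not the sentence start and not 'ik'. -->
        <token><exception postag="SENT_START"/><exception>ik</exception></token>
        <marker>
            <!-- Token 2: an ik-form that carries none of the excepted tags and is not 'meer' or 'maar'. -->
            <token postag="WKW:TGW:1EP"><exception postag_regexp="yes" postag="ZNW.*|BNW.*|WKW:TGW:3EP|VGW|WKW:VTD:ONV|VNW.*|VRZ|ENM.*"/><exception regexp="yes">meer|maar</exception></token>
        </marker>
    </pattern>
    <!-- Message, roughly: "This sentence has an ik-form, but there is no 'ik' in it. Is this a dt-error, or was 'ik' left out to shorten the sentence?" -->
    <message>Deze zin heeft een ik-vorm, maar er staat geen 'ik' in de zin. Is dit een dt-foutje, of is de zin ingekort door 'ik' weg te laten?</message>
    <!-- Suggest the third-person form of token 2. -->
    <suggestion><match no="2" postag="WKW:TGW:3EP"></match></suggestion>
    <example correction="afleidt">Daar komt bij dat dit de aandacht <marker>afleid</marker>.</example>
</rule>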
