
How did you limit the amount of added data to approximately 10% of the original data? #2

Closed
XY-1 opened this issue Feb 14, 2020 · 0 comments

XY-1 commented Feb 14, 2020

I am deeply interested in your research paper, and I would like to conduct some follow-up research.
Could you tell me, in detail, how you limited the amount of added data to approximately 10% of the original data? I sketch the alternatives I have in mind in code after the questions below.

  1. Did you ignore every occurrence of a matched term that corresponds to a specific entry?
    -- For example, if the word "thank" is chosen as an entry to be ignored, did you ignore every occurrence of "thank" in the corpus?
    Or did you decide per occurrence whether to ignore a matched term?
    -- For example, "thank" in the first sentence is ignored, but "thank" in the second sentence may not be.

  2. Did you decide in advance which sentences would have their matched terms ignored?
    In other words, before term matching, did you split the sentences into a 90% set and a 10% set, and match terms only against the 10% set?
    Or, as a result of ignoring some matches, did the sentences containing term annotations simply end up amounting to approximately 10% of the original data?

  3. Is it possible to ignore specific matched terms when multiple terms match in one sentence?
    -- For example, if the words "thank", "common", and "vote" are all matched in one sentence, is it possible to ignore only "thank"?
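
To make the alternatives concrete, here is a minimal Python sketch of how I currently understand each option. This is my own hypothetical code, not taken from your paper or repository; the `<term>...</term>` markup, the `keep_prob` and `fraction` knobs, and all function names are placeholders I made up for illustration:

```python
import random

TERM_DICT = {"thank", "common", "vote"}  # toy dictionary entries

def annotate_token_level(sentence, keep_prob=0.1):
    """Alternative (1b): decide per occurrence. Each matched word is
    annotated with probability keep_prob, so "thank" may be annotated
    in one sentence and ignored in another."""
    out = []
    for tok in sentence.split():
        if tok in TERM_DICT and random.random() < keep_prob:
            out.append(f"<term>{tok}</term>")
        else:
            out.append(tok)
    return " ".join(out)

def annotate_presplit(corpus, fraction=0.1):
    """Alternative (2a): pre-split the corpus into a 90% plain set and
    a 10% set, and run term matching only on the 10% set."""
    corpus = corpus[:]
    random.shuffle(corpus)
    cut = max(1, int(fraction * len(corpus)))
    annotated = [annotate_token_level(s, keep_prob=1.0) for s in corpus[:cut]]
    return annotated + corpus[cut:]

def annotate_selective(sentence, ignore=frozenset({"thank"})):
    """Alternative (3): when several terms match in one sentence,
    annotate all of them except those in an explicit ignore set."""
    out = []
    for tok in sentence.split():
        if tok in TERM_DICT and tok not in ignore:
            out.append(f"<term>{tok}</term>")
        else:
            out.append(tok)
    return " ".join(out)

if __name__ == "__main__":
    corpus = ["thank you for the common vote", "please vote today"]
    print(annotate_presplit(corpus))
    print(annotate_selective("thank you for the common vote"))
```

Which of these (if any) corresponds to your actual procedure?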
