
similarity detection: filter unimportant words out #61

Closed
tstromberg opened this issue May 6, 2020 · 1 comment · Fixed by #66
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@tstromberg
Collaborator

We can get a higher hit rate for similarity if we filter out a basic list of words that don't change the nature of an issue or PR.

Certain common words come to mind, mostly articles, prepositions, auxiliary verbs, and filler adverbs:

a
about
an
and
are
be
by
can
completely
does
extremely
for
has
have
how
in
is
maybe
of
on
or
should
some
something
still
than
the
then
to
use
very
via
when
why
with

However, we need to be careful not to skew the scores so badly that the min-similarity index is no longer helpful.
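To make the skew concern concrete, here is a minimal sketch that compares a word-overlap score with and without filtering. It assumes a plain Jaccard similarity over word sets, which may not be the metric this project actually uses; `stopWords`, `wordSet`, and `jaccard` are hypothetical names, and the stop list is only a small subset of the words above.

```go
package main

import (
	"fmt"
	"strings"
)

// stopWords is a small illustrative subset of the proposed list.
var stopWords = map[string]bool{
	"a": true, "how": true, "is": true, "the": true,
	"to": true, "use": true, "very": true,
}

// wordSet lowercases s, splits it on whitespace, and returns the set of
// words. When filter is true, words found in stopWords are dropped.
func wordSet(s string, filter bool) map[string]bool {
	set := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(s)) {
		if filter && stopWords[w] {
			continue
		}
		set[w] = true
	}
	return set
}

// jaccard returns |a ∩ b| / |a ∪ b|, or 0 when both sets are empty.
// This metric is an assumption made for illustration only.
func jaccard(a, b map[string]bool) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 0
	}
	inter := 0
	for w := range a {
		if b[w] {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	return float64(inter) / float64(union)
}

func main() {
	pairs := [][2]string{
		{"how to use the dashboard", "how to use the cli"},
		{"the dashboard is very slow", "dashboard slow"},
	}
	for _, p := range pairs {
		raw := jaccard(wordSet(p[0], false), wordSet(p[1], false))
		filtered := jaccard(wordSet(p[0], true), wordSet(p[1], true))
		fmt.Printf("%q vs %q: unfiltered %.2f, filtered %.2f\n", p[0], p[1], raw, filtered)
	}
}
```

Under these assumptions the first pair drops from 0.67 to 0.00 and the second rises from 0.40 to 1.00, so scores can move sharply in both directions. A min-similarity threshold tuned on unfiltered titles would likely need to be re-tuned after filtering.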

@tstromberg changed the title from "similarity detection: implement initial unimportant word list" to "similarity detection: filter unimportant words out" on May 6, 2020
@tstromberg added the enhancement and help wanted labels on May 6, 2020
@tstromberg
Collaborator Author

If someone wants to take this on, I recommend adding the code within compressTitle:

func compressTitle(t string) string {
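The body of compressTitle is not shown here, so the following is only a rough sketch of a helper it could call, not the project's implementation. The names `unimportantWords` and `removeUnimportantWords` are hypothetical; the word list is the one proposed in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// unimportantWords is built from the candidate list in this issue.
var unimportantWords = func() map[string]struct{} {
	list := `a about an and are be by can completely does extremely for
		has have how in is maybe of on or should some something still
		than the then to use very via when why with`
	m := map[string]struct{}{}
	for _, w := range strings.Fields(list) {
		m[w] = struct{}{}
	}
	return m
}()

// removeUnimportantWords (hypothetical helper) splits t on whitespace,
// drops any word whose lowercase form is in unimportantWords, and joins
// the remaining words with single spaces. Original casing is preserved.
func removeUnimportantWords(t string) string {
	kept := []string{}
	for _, w := range strings.Fields(t) {
		if _, skip := unimportantWords[strings.ToLower(w)]; skip {
			continue
		}
		kept = append(kept, w)
	}
	return strings.Join(kept, " ")
}

func main() {
	fmt.Println(removeUnimportantWords("How to use the dashboard via a proxy"))
	// prints "dashboard proxy"
}
```

This sketch matches whole whitespace-separated tokens only, so a word with attached punctuation such as "the," would pass through unfiltered. Whether to strip punctuation first depends on what compressTitle already does to the title.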
