Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check if word exists #24

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

naidenovaleksei
Copy link
Contributor

@naidenovaleksei naidenovaleksei commented Nov 2, 2021

Try to solve #1
Details:

  • wordnet.synsets has been saved to check if word exist due to its using in other code (see link)
  • nltk.corpus.words isn't used to avoid another one dependency in nltk (nltk.download("words") will be required here)
  • class Game takes parameter vocab_path for reading all used English words from vocabulary
  • class Game takes parameter corpus_path for building vocabulary of all used words

naidenov-alex and others added 7 commits November 2, 2021 12:05
…ction-ml#23)

* remove dependency on google storage

* Modified return of LocalFasttextPlayer, added time and code fields

* Modifications for better separation of settings and code

* Better division of server vs local games

* Moved players into the main func

* delete empty server_settings.py from server folder

* Further separation of GAME_SCOPES

* Separation of flask_app players and Hat Game players
Copy link
Contributor

@aguschin aguschin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, @naidenovaleksei! 👍
I left some comments, let's discuss them.

self.used_words = corpus_to_words(corpus_path)
elif vocab_path is not None:
with open(vocab_path, encoding="utf-8") as f:
words = f.readlines()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use another variable instead of words here, because we use this name for another param in __init__.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 26e6f0d

existing_words = [w for w in words if len(wordnet.synsets(w)) > 0]
def remove_non_existing_words(self, words):
if self.used_words is not None:
existing_words = [w for w in words if w in self.used_words]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be a situation when initial words you pass to __init__ aren't all there in self.used_words. I.e. we guess the word which is not in self.used_words and will be removed. Let's add words to self.used_words if the later is not empty (in __init__ method).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this will fix this strange behaviour, but I would check if that's not wordnet.synsets filtering out those words, and can we fix it with using nltk.corpus.words.

Copy link
Contributor Author

@naidenovaleksei naidenovaleksei Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add words to self.used_words if the later is not empty (in init method).

Oh, sure. Now it seems like a bug. Thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add words to self.used_words if the later is not empty (in init method).

added in 26e6f0d

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this will fix this strange behaviour

yes, adding words for guessing to whitelist fixed this strange behaviour

I would check if that's not wordnet.synsets filtering out those words, and can we fix it with using nltk.corpus.words

wordnet.synsets really filter rare English words, but nltk.corpus.words do the same (see #1 (comment))

@@ -13,6 +13,7 @@
import the_hat_game.nltk_setup # noqa: F401
from the_hat_game.loggers import c_handler, logger
from the_hat_game.players import RemotePlayer
from data.utils import corpus_to_words
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data/ migrated to server/ folder. The idea was to split the logic into two parts (the-hat-game part - only game logic, don't know anything about server implementation; server - everything about working with cloud, uploading texts, etc).
So maybe we should move some data functions to the-hat-game/data.py. Those which can be used in the-hat-game and aren't related to server directly. Namely, sent_2_words, corpus_to_words, and maybe save_words (in this case we should remove default vocab_path arg value, because it's related to server implementation).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 0481ab4

sent_2_words, corpus_to_words were added, save_words not (it is not used anywhere)

@naidenovaleksei naidenovaleksei changed the title Check if word exists WIP: Check if word exists Nov 22, 2021
@naidenovaleksei naidenovaleksei changed the title WIP: Check if word exists Check if word exists Nov 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants