Hi there, I've been using Zeyrek to lemmatize a dataset of 250,000 Turkish tweets. It starts lemmatizing, but after 10 minutes or so I get the following error:
FYI: the error is triggered by two words, "ulemalık" and "nakliyatçılık" (I work with Turkish tweets). Whenever the lemmatizer reaches either of them, it raises this error. When I exclude them from the dataset, everything works fine, but I wanted to let you know. Thank you very much.
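Until the underlying bug is fixed, the exclusion workaround can be automated so that one bad word does not abort a long run. Below is a minimal sketch: `lemmatize_all` is a hypothetical helper, and `lemmatize` is assumed to be any callable that takes one text, such as zeyrek's `MorphAnalyzer().lemmatize`. A fake lemmatizer stands in for zeyrek so the sketch runs on its own.

```python
from typing import Any, Callable, List, Optional, Tuple


def lemmatize_all(
    texts: List[str], lemmatize: Callable[[str], Any]
) -> Tuple[List[Optional[Any]], List[int]]:
    """Lemmatize each text, recording which ones raise AttributeError.

    Hypothetical helper: a failing text gets None as its result (i.e. it is
    skipped, like excluding it from the dataset) instead of stopping the run.
    """
    results: List[Optional[Any]] = []
    failed: List[int] = []
    for i, text in enumerate(texts):
        try:
            results.append(lemmatize(text))
        except AttributeError:
            results.append(None)
            failed.append(i)
    return results, failed


def fake_lemmatize(text: str) -> List[str]:
    # Stand-in for the real lemmatizer: fails on one of the reported words.
    if "ulemalık" in text:
        raise AttributeError("'frozenset' object has no attribute 'add'")
    return text.split()


results, failed = lemmatize_all(["bugün hava güzel", "ulemalık zor"], fake_lemmatize)
print(failed)
```

The indices in `failed` can then be inspected or re-run once a fixed version is available, rather than silently losing those tweets.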
AttributeError Traceback (most recent call last)
in
~\AppData\Roaming\Python\Python39\site-packages\zeyrek\morphology.py in lemmatize(self, text)
137 words = _tokenize_text(text)
138 for word in words:
--> 139 analysis = self._parse(word)
140 if len(analysis) == 0:
141 word_lemmas = [word]
~\AppData\Roaming\Python\Python39\site-packages\zeyrek\morphology.py in _parse(self, word)
94 """ Parses a word and returns SingleAnalysis result. """
95 normalized_word = _normalize(word)
---> 96 return self.analyzer.analyze(normalized_word)
97
98 def _analyze_text(self, text, verbose=False):
~\AppData\Roaming\Python\Python39\site-packages\zeyrek\rulebasedanalyzer.py in analyze(self, word)
29 paths.append(SearchPath.initial(candidate, tail))
30 # search graph.
---> 31 result_paths = self.search(paths)
32
33 # generate results from successful paths.
~\AppData\Roaming\Python\Python39\site-packages\zeyrek\rulebasedanalyzer.py in search(self, current_paths)
59 continue
60 # Creates new paths with outgoing and matching transitions.
---> 61 new_paths = self.advance(path)
62 logging.debug(f"\n--\nNew paths are: ")
63 for p in new_paths:
~\AppData\Roaming\Python\Python39\site-packages\zeyrek\rulebasedanalyzer.py in advance(self, path)
123 last_token = transition.last_template_token
124 if last_token.type_ == 'LAST_VOICED':
--> 125 attributes.add(PhoneticAttribute.ExpectsConsonant)
126 elif last_token.type_ == 'LAST_NOT_VOICED':
127 attributes.add(PhoneticAttribute.ExpectsVowel)
AttributeError: 'frozenset' object has no attribute 'add'
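The last frame points at the cause: in `advance()`, `attributes` is a `frozenset` at that point, and frozensets are immutable, so they have no `.add()` method. The sketch below reproduces the failure in isolation and shows one possible fix, building a new set instead of mutating in place. `PhoneticAttribute` here is a hypothetical stand-in enum, not zeyrek's actual class, and this is not necessarily how the maintainers would fix it.

```python
from enum import Enum, auto


class PhoneticAttribute(Enum):
    # Hypothetical stand-in for zeyrek's PhoneticAttribute; only two members modeled.
    ExpectsConsonant = auto()
    ExpectsVowel = auto()


def add_attribute_broken(attributes, attr):
    # Mirrors the failing line: raises AttributeError if `attributes` is a frozenset.
    attributes.add(attr)
    return attributes


def add_attribute_fixed(attributes, attr):
    # Copies into a new mutable set, so it works for both set and frozenset inputs
    # and never mutates the original; the caller must use the returned value.
    updated = set(attributes)
    updated.add(attr)
    return updated


frozen = frozenset()
try:
    add_attribute_broken(frozen, PhoneticAttribute.ExpectsConsonant)
except AttributeError as exc:
    print(exc)  # 'frozenset' object has no attribute 'add'

print(add_attribute_fixed(frozen, PhoneticAttribute.ExpectsConsonant))
```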