AttributeError: 'frozenset' object has no attribute 'add' #24

Open
isiktopcu opened this issue Dec 28, 2022 · 3 comments · Fixed by #25

Comments

@isiktopcu

Hi there, I've been using Zeyrek to lemmatize a collection of Turkish tweets (len 250_000). Lemmatization starts fine, but after 10 minutes or so I get this error:


AttributeError                            Traceback (most recent call last)
in

~\AppData\Roaming\Python\Python39\site-packages\zeyrek\morphology.py in lemmatize(self, text)
    137         words = _tokenize_text(text)
    138         for word in words:
--> 139             analysis = self._parse(word)
    140             if len(analysis) == 0:
    141                 word_lemmas = [word]

~\AppData\Roaming\Python\Python39\site-packages\zeyrek\morphology.py in _parse(self, word)
     94         """ Parses a word and returns SingleAnalysis result. """
     95         normalized_word = _normalize(word)
---> 96         return self.analyzer.analyze(normalized_word)
     97
     98     def _analyze_text(self, text, verbose=False):

~\AppData\Roaming\Python\Python39\site-packages\zeyrek\rulebasedanalyzer.py in analyze(self, word)
     29             paths.append(SearchPath.initial(candidate, tail))
     30         # search graph.
---> 31         result_paths = self.search(paths)
     32
     33         # generate results from successful paths.

~\AppData\Roaming\Python\Python39\site-packages\zeyrek\rulebasedanalyzer.py in search(self, current_paths)
     59                 continue
     60             # Creates new paths with outgoing and matching transitions.
---> 61             new_paths = self.advance(path)
     62             logging.debug(f"\n--\nNew paths are: ")
     63             for p in new_paths:

~\AppData\Roaming\Python\Python39\site-packages\zeyrek\rulebasedanalyzer.py in advance(self, path)
    123             last_token = transition.last_template_token
    124             if last_token.type_ == 'LAST_VOICED':
--> 125                 attributes.add(PhoneticAttribute.ExpectsConsonant)
    126             elif last_token.type_ == 'LAST_NOT_VOICED':
    127                 attributes.add(PhoneticAttribute.ExpectsVowel)

AttributeError: 'frozenset' object has no attribute 'add'
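
For context, the last frame of the traceback calls `.add` on a value that is a `frozenset` at that point. A `frozenset` is immutable, so it has no `add` method. The sketch below reproduces the error in isolation and shows one possible way to avoid it: copy the attributes into a regular `set` before modifying them. This is only an illustration, not Zeyrek's code and not necessarily the fix that was merged. The `PhoneticAttribute` enum here is a stand-in with made-up members beyond the two named in the traceback, and `add_expectation` is a hypothetical helper.

```python
from enum import Enum, auto


class PhoneticAttribute(Enum):
    """Stand-in for Zeyrek's enum, for illustration only."""
    ExpectsConsonant = auto()
    ExpectsVowel = auto()
    LastLetterVowel = auto()  # arbitrary extra member used in the demo


def add_expectation(attributes, last_token_type):
    """Hypothetical helper: return a new mutable set with the expectation added.

    Accepts either a set or a frozenset and never mutates its input.
    """
    updated = set(attributes)  # copying makes the result mutable
    if last_token_type == 'LAST_VOICED':
        updated.add(PhoneticAttribute.ExpectsConsonant)
    elif last_token_type == 'LAST_NOT_VOICED':
        updated.add(PhoneticAttribute.ExpectsVowel)
    return updated


frozen = frozenset({PhoneticAttribute.LastLetterVowel})

# Reproduce the failure: frozenset has no add() method.
try:
    frozen.add(PhoneticAttribute.ExpectsConsonant)
except AttributeError as exc:
    error_message = str(exc)

# The copy-based helper works for frozenset input.
fixed = add_expectation(frozen, 'LAST_VOICED')
```

Copying costs a small allocation per call, but it keeps the original attribute set untouched, which matters if that set is shared between search paths.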

@obulat
Owner

obulat commented Dec 29, 2022

Thank you for reporting the issue, @isiktopcu! The PR I just merged should fix it. Please feel free to re-open if you still encounter it :)

@isiktopcu
Author

FYI: the error is triggered by two words: "ulemalık" and "nakliyatçılık" (I work with Turkish tweets). Whenever the lemmatizer sees either word, it raises this error. If I exclude them from the dataset, it works just fine, but I wanted to let you know. Thank you very much.
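
As an alternative to removing the two words by hand, a long batch run could catch the error per text and record which inputs failed. The sketch below is a rough illustration under assumptions: `lemmatize_safely` is a hypothetical helper, and `fake_lemmatize` is a stand-in that only simulates the failure on the two reported words so the example runs without Zeyrek installed. In practice the `lemmatize` argument would be whatever callable does the real work, for example the `lemmatize` method shown in the traceback.

```python
def lemmatize_safely(texts, lemmatize):
    """Hypothetical helper: lemmatize each text, collecting failures
    instead of aborting the whole run.

    Returns (results, failures). results[i] is None where texts[i] failed;
    failures holds (index, text, error message) tuples.
    """
    results = []
    failures = []
    for index, text in enumerate(texts):
        try:
            results.append(lemmatize(text))
        except AttributeError as exc:
            failures.append((index, text, str(exc)))
            results.append(None)
    return results, failures


def fake_lemmatize(text):
    """Stand-in for a real lemmatizer; fails on the two reported words."""
    if "ulemalık" in text or "nakliyatçılık" in text:
        raise AttributeError("'frozenset' object has no attribute 'add'")
    return text.split()


tweets = ["merhaba dünya", "ulemalık", "iyi günler"]
results, failures = lemmatize_safely(tweets, fake_lemmatize)
```

The `failures` list doubles as a source of reproducible examples for a bug report, since it keeps the exact input that triggered each error.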

@obulat
Owner

obulat commented Feb 9, 2023

Thank you for the examples, @isiktopcu! I'll look into it.

obulat reopened this Feb 9, 2023