* Exclude clusterings for words only seen 1 or 2 times, as their clusters are unreliable
syllog1sm committed Apr 17, 2015
1 parent cc4e395 commit 693c5a1
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion bin/init_model.py
@@ -46,7 +46,10 @@ def _read_clusters(loc):
             cluster, word, freq = line.split()
         except ValueError:
             continue
-        clusters[word] = cluster
+        # If the clusterer has only seen the word a few times, its cluster is
+        # unreliable.
+        if int(freq) >= 3:
+            clusters[word] = cluster
     return clusters
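
For readers who want to try the filtering behaviour outside the repository, here is a minimal standalone sketch of the same idea. It is not the code from bin/init_model.py: the function name read_clusters, the min_freq parameter, and the sample lines are hypothetical, and it assumes each input line holds three whitespace-separated fields (cluster, word, frequency), as the hunk above suggests. Unlike the hunk, the sketch also skips lines whose frequency field is not an integer.

```python
def read_clusters(lines, min_freq=3):
    """Map each word to its cluster ID, skipping rarely seen words.

    `lines` is any iterable of strings in "cluster word freq" form.
    `min_freq` is a hypothetical parameter; the commit hard-codes 3.
    """
    clusters = {}
    for line in lines:
        try:
            cluster, word, freq = line.split()
            count = int(freq)
        except ValueError:
            # Wrong number of fields or a non-numeric frequency: skip the line.
            continue
        # Clusters for words seen only once or twice are treated as unreliable.
        if count >= min_freq:
            clusters[word] = cluster
    return clusters


# Made-up sample data, for illustration only.
sample = [
    "0110 the 1500",
    "1011 zyzzyva 2",
    "malformed line",
    "1100 parser 3",
]
result = read_clusters(sample)
print(result)  # {'the': '0110', 'parser': '1100'}
```

With the default threshold, the word seen twice and the malformed line are both dropped, while a word seen exactly 3 times is kept.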

