Use marisa_trie for storing root word corpus
Storing the root word corpus in a plain list is an inefficient way to
store it, since membership checks on a list are linear scans. Here, I
switched it to marisa_trie, a memory-efficient trie data structure based
on the marisa-trie C++ library. It is expected to reduce both memory
usage and lookup time.
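
To illustrate why a trie helps here, the following is a minimal sketch of a dict-based trie. It is a toy written for this note, not how marisa_trie is implemented (that library uses a much more compact structure in C++), and the class name `ToyTrie` and the sample words are hypothetical. It only shows that a lookup walks one node per character, so its cost depends on the word length rather than on the size of the corpus.

```python
# Toy dict-based trie, for illustration only. marisa_trie itself uses a
# far more compact representation implemented in C++.


class ToyTrie:
    """Minimal trie supporting membership tests and prefix queries."""

    _END = object()  # sentinel key marking the end of a stored word

    def __init__(self, words=()):
        self._root = {}
        for word in words:
            self.add(word)

    def add(self, word):
        """Insert a word, creating one nested dict per character."""
        node = self._root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def __contains__(self, word):
        """Walk one node per character; cost is O(len(word))."""
        node = self._root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return False
        return self._END in node

    def prefixes(self, word):
        """Return every stored word that is a prefix of `word`."""
        found = []
        node = self._root
        for i, ch in enumerate(word):
            node = node.get(ch)
            if node is None:
                break
            if self._END in node:
                found.append(word[:i + 1])
        return found


# Hypothetical sample corpus, ASCII for readability.
roots = ToyTrie(["walk", "wal", "talk"])
print("walk" in roots)
print("walking" in roots)
print(roots.prefixes("walking"))
```

`marisa_trie.Trie` supports the same `in` test and also offers a `prefixes()` method, which is a natural fit for a stemmer that needs to find known root words at the start of an inflected form.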
balasankarc committed May 29, 2016
1 parent 551c1d7 commit 0e9f1b6
Showing 4 changed files with 11 additions and 4 deletions.
1 change: 1 addition & 0 deletions .travis.yml
@@ -4,6 +4,7 @@ python:
 - "3.4"
 - "3.5"
 install:
+- pip install -r requirements.txt
 - pip install -r test-requirements.txt
 script: make travis
 after_success: coveralls
9 changes: 6 additions & 3 deletions libindic/stemmer/__init__.py
@@ -23,6 +23,8 @@

 import os

+import marisa_trie
+

 class Malayalam:
     """
@@ -38,10 +40,11 @@ def __init__(self):
         self.dictionary = self.dictionary_file.readlines()
         self.dictionary_file.close()
         try:
-            self.dictionary = [x.strip().decode('utf-8')
-                               for x in self.dictionary]
+            self.dictionary = marisa_trie.Trie([x.strip().decode('utf-8')
+                                                for x in self.dictionary])
         except:
-            self.dictionary = [x.strip() for x in self.dictionary]
+            self.dictionary = marisa_trie.Trie(
+                [x.strip() for x in self.dictionary])

     def singleencode(self, word):
         '''
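
The try/except in the hunk above appears to cover two cases: lines that can be decoded as UTF-8 bytes (Python 2) and lines that are already text (Python 3). As a rough sketch of the same idea without a bare `except`, the helpers below normalise the lines explicitly before building the lookup structure. The names `normalize_words` and `build_lookup` are hypothetical and not part of libindic, and `frozenset` is only a stand-in container so the sketch runs without extra dependencies. Skipping empty lines is also an addition of this sketch, not something the commit does.

```python
def normalize_words(lines):
    """Strip each line and decode it to text if it arrived as bytes."""
    words = []
    for line in lines:
        line = line.strip()
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        if line:  # assumption of this sketch: drop empty lines
            words.append(line)
    return words


def build_lookup(lines, container=frozenset):
    """Build a membership structure from raw dictionary lines.

    `container` is any callable that accepts a list of strings.
    frozenset is the stand-in default; marisa_trie.Trie could be
    passed instead when that package is installed.
    """
    return container(normalize_words(lines))


# Mixed bytes and text input, as might come from differently opened files.
lookup = build_lookup([b"walk\n", "talk\n", "\n"])
print("walk" in lookup)
```

Passing the container as a parameter keeps the normalisation step independent of the storage choice, which is the part this commit changes.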
1 change: 1 addition & 0 deletions requirements.txt
@@ -0,0 +1 @@
+marisa_trie
4 changes: 3 additions & 1 deletion tox.ini
@@ -4,15 +4,17 @@
 # and then run "tox" from this directory.

 [tox]
-envlist = py27, py35, pypy, pep8
+envlist = py27, py35, pep8

 [testenv]
 commands = {envpython} setup.py test
 deps =
+    -rrequirements.txt
     -rtest-requirements.txt

 [testenv:pep8]
 deps=
+    -rrequirements.txt
     -rtest-requirements.txt
 commands=
     flake8 libindic
