extra-model

Code to run the Extra algorithm for unsupervised topic/aspect extraction on English texts.

Quick start

IMPORTANT:

When running Extra inside a Docker container, make sure that the Docker process has enough resources. For example, on Mac/Windows it should have at least 8 GB of RAM available to it. See RAM Requirements for details.
The GitHub repo does not come with GloVe embeddings. See the Downloading Embeddings section for how to download the required embeddings.

Using docker-compose

First, build the image:

docker-compose build

Then, run the following command to make sure that extra-model was installed correctly:

docker-compose run test

Downloading Embeddings

The next step is to download the embeddings (we use GloVe from Stanford in this project).

To download the required embeddings, run the following command:

docker-compose run --rm setup

The embeddings will be downloaded, unzipped and formatted into a space-efficient format. Files will be saved in the embeddings/ directory in the root of the project directory. If the process fails, it can be safely restarted. If you want to restart the process with new files, delete all files except README.md in the embeddings/ directory.
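If you prefer to script the "delete all files except README.md" step, a minimal sketch could look like the following. The helper name `clear_embeddings` is illustrative and not part of extra-model; it assumes the embeddings are stored as regular files directly inside the directory and leaves any subdirectories untouched.

```python
from pathlib import Path


def clear_embeddings(directory: Path, keep: str = "README.md") -> list:
    """Delete every regular file in `directory` except `keep`.

    Returns the names of the deleted files. Subdirectories are left alone.
    """
    removed = []
    for entry in sorted(directory.iterdir()):
        if entry.is_file() and entry.name != keep:
            entry.unlink()
            removed.append(entry.name)
    return removed


# Usage, assuming the script is run from the root of the project directory:
# clear_embeddings(Path("embeddings"))
```

After clearing the directory, the setup command can be run again to fetch fresh files.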

[Optional] Run `docker-compose build` again

After you've downloaded the embeddings, you may want to run docker-compose build again. This will build an image with embeddings already present inside the image.

The tradeoff here is that the image will be much bigger, but you won't spend ~2 minutes each time you run extra-model waiting for embeddings to be mounted into the container. On the other hand, building an image with embeddings in the context will increase build time from ~3 minutes to ~10 minutes.
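To put rough numbers on this tradeoff, the back-of-the-envelope sketch below uses only the approximate timings quoted above and assumes the image is rebuilt once. Actual times will vary by machine, so treat the result as an estimate.

```python
import math

BUILD_WITHOUT_EMBEDDINGS_MIN = 3   # approximate build time quoted above
BUILD_WITH_EMBEDDINGS_MIN = 10     # approximate build time quoted above
MOUNT_WAIT_PER_RUN_MIN = 2         # approximate per-run wait quoted above

# One-off cost of baking the embeddings into the image.
extra_build_minutes = BUILD_WITH_EMBEDDINGS_MIN - BUILD_WITHOUT_EMBEDDINGS_MIN

# Number of runs after which the rebuild has paid for itself.
break_even_runs = math.ceil(extra_build_minutes / MOUNT_WAIT_PER_RUN_MIN)

print(extra_build_minutes, break_even_runs)  # 7 4
```

Under these assumptions, rebuilding starts to save time from roughly the fourth run onwards.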

Run `extra-model`

Finally, running extra-model is as simple as:

docker-compose run extra-model /package/tests/resources/100_comments.csv

NOTE: when using this approach, the input file must be mounted inside the container. By default, everything in the extra-model folder is mounted to the /package/ folder. This can be changed in docker-compose.yaml.

This will produce a result.csv file in the /io/ folder (the default setting).

The output location can be changed by supplying a second path, e.g.:

docker-compose run extra-model /package/tests/resources/100_comments.csv /io/another_folder

If you want the output filename to be something other than result.csv, supply it as a third argument:

docker-compose run extra-model /package/tests/resources/100_comments.csv /io/another_folder another_filename.csv
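When the invocations above need to be scripted, the positional arguments can be assembled programmatically. The sketch below is not part of extra-model: `build_command` and `run_extra_model` are hypothetical helpers that only reproduce the command lines shown above and hand them to `subprocess.run`.

```python
import subprocess
from typing import List, Optional


def build_command(
    input_file: str,
    output_dir: Optional[str] = None,
    output_filename: Optional[str] = None,
) -> List[str]:
    """Assemble the docker-compose invocation shown above (hypothetical helper)."""
    if output_filename is not None and output_dir is None:
        # The filename is the third positional argument, so it needs a directory.
        raise ValueError("output_filename requires output_dir")
    command = ["docker-compose", "run", "extra-model", input_file]
    if output_dir is not None:
        command.append(output_dir)
    if output_filename is not None:
        command.append(output_filename)
    return command


def run_extra_model(*args, **kwargs) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(*args, **kwargs), check=True)


# Usage, assuming docker-compose is on PATH and the image has been built:
# run_extra_model("/package/tests/resources/100_comments.csv", "/io/another_folder")
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.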

More examples, as well as an explanation of the input and output, are available in the official documentation.

Using command line

Install `extra-model`

First, install extra-model via pip:

pip install extra-model

Downloading Embeddings

Next, run the following to download and set up the required embeddings (we use GloVe from Stanford in this project):

extra-model-setup

The embeddings will be downloaded, unzipped and formatted into a space-efficient format and saved in /embeddings.

If the process fails, it can be safely restarted. If you want to restart the process with new files, delete all files except README.md in the embeddings directory.

Run `extra-model`

Once set up, running extra-model is as simple as:

extra-model tests/resources/100_comments.csv

This will produce a result.csv file in /io. To change the output directory, provide it as a second argument to extra-model:

extra-model tests/resources/100_comments.csv /path/to/store/output

If you want the output filename to be something other than result.csv, supply it as a third argument to extra-model:

extra-model tests/resources/100_comments.csv /path/to/store/output another_filename.csv

Using as a Python package

Install `extra-model`

First, install extra-model via pip:

pip install extra-model

Downloading Embeddings

Next, use either the extra-model-setup CLI or docker-compose to download and set up the required embeddings (we use GloVe from Stanford in this project):

extra-model-setup

or

docker-compose run --rm setup

The embeddings will be downloaded, unzipped and formatted into a space-efficient format. For the Docker-based workflow, the embeddings will be saved to the embeddings directory. For the CLI workflow, files will be saved in /embeddings by default. You can choose another directory by passing it as an argument to extra-model-setup:

extra-model-setup /path/to/store/embeddings

If the process fails, it can be safely restarted. If you want to restart the process with new files, delete all files except README.md in the embeddings directory.

Use `extra-model` as a Python package

Once set up, you can use extra-model by calling the run() function in extra_model/_run.py:

from pathlib import Path

from extra_model._run import run

run(
    input_path=Path("input/path/file.csv"),
    output_path=Path("output/path")
)

This will process input/path/file.csv and produce a result.csv file in output/path. If you want the output filename to be something other than result.csv, you can do so by providing an additional argument to run():

from pathlib import Path

from extra_model._run import run

run(
    input_path=Path("input/path/file.csv"),
    output_path=Path("output/path"),
    output_filename=Path("output_filename.csv")
)
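Building on the call above, the following sketch processes every CSV file in a folder and writes one result file per input. It is only an illustration: `process_folder` and the `_result.csv` naming scheme are hypothetical, and the sketch assumes that run() accepts exactly the keyword arguments shown above. The runner is passed in as a parameter so the helper itself does not import extra_model.

```python
from pathlib import Path
from typing import Callable, List


def process_folder(
    input_dir: Path, output_dir: Path, runner: Callable[..., object]
) -> List[Path]:
    """Call `runner` once per CSV file in `input_dir` (hypothetical helper).

    `runner` is expected to accept the keyword arguments shown above:
    input_path, output_path and output_filename.
    """
    outputs = []
    for csv_file in sorted(input_dir.glob("*.csv")):
        # Name each result after its input, e.g. reviews.csv -> reviews_result.csv
        filename = Path(f"{csv_file.stem}_result.csv")
        runner(input_path=csv_file, output_path=output_dir, output_filename=filename)
        outputs.append(output_dir / filename)
    return outputs


# Usage, assuming extra-model is installed and the embeddings are set up:
# from extra_model._run import run
# process_folder(Path("input/path"), Path("output/path"), runner=run)
```

Giving each result its own filename prevents later runs from overwriting the default result.csv.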

More examples, as well as an explanation of the input and output, are available in the official documentation.

Authors

extra-model was written by mbalyasin@wayfair.com, mmozer@wayfair.com.
