Help loading model #3

Closed
elyase opened this issue Feb 17, 2016 · 12 comments

elyase commented Feb 17, 2016

I downloaded the trained model from:

https://index.spacy.io/models/reddit_vectors-1.0.1/archive.gz

How can I load this into a VectorMap or a gensim model in order to make similarity queries?

@henningpeters
Contributor

The easiest way to download and install the model is to run python -m sense2vec.download after installing sense2vec, e.g., via pip install -e git+git://github.com/spacy-io/sense2vec.git#egg=sense2vec. Note that you'll need the BLAS/ATLAS packages installed; on Red Hat these are atlas and atlas-devel. You can then load the model as follows:

import sputnik
from sense2vec import about
from sense2vec.vectors import VectorMap

package = sputnik.package(about.__title__, about.__version__, about.__default_model__)
vector_map = VectorMap(128)
vector_map.load(package.path)

The code is still a bit rough to use; this will change before we officially release it on PyPI. Also, we would love to hear about your use case. If you don't want to discuss this publicly, please get in contact with me at hp@spacy.io.

@elyase
Author

elyase commented Feb 17, 2016

Thanks for your answer. Unfortunately I am getting a ValueError:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-25a4baa64678> in <module>()
      5 package = sputnik.package(about.__title__, about.__version__, about.__default_model__)
      6 vector_map = VectorMap(128)
----> 7 vector_map.load(package.path)

/home/ubuntu/sense2vec/sense2vec/vectors.pyx in sense2vec.vectors.VectorMap.load (sense2vec/vectors.cpp:5276)()
     99         with open(path.join(data_dir, 'strings.json')) as file_:
    100             self.strings.load(file_)
--> 101         with open(path.join(data_dir, 'freqs.json')) as file_:
    102             freqs = json.load(file_)
    103         cdef uint64_t hashed

/home/ubuntu/sense2vec/sense2vec/vectors.pyx in sense2vec.vectors.VectorMap.load (sense2vec/vectors.cpp:5217)()
    100             self.strings.load(file_)
    101         with open(path.join(data_dir, 'freqs.json')) as file_:
--> 102             freqs = json.load(file_)
    103         cdef uint64_t hashed
    104         for hashed, freq in freqs:

ValueError: Value is too big

Any ideas?

I will write you an email with some details about our use case.

@henningpeters
Contributor

Hmm, that's odd. Can you provide some info on your system (OS, Python version, output of pip list, BLAS library, etc.)? Are you maybe on a 32-bit system? We have only tested on 64-bit so far. You can check with sys.maxsize > 2**32, which is True on a 64-bit Python. Another idea: try upgrading your ujson library; maybe it's outdated or broken.
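For anyone hitting the same error, the two checks suggested here can be gathered in one go. This is only a minimal sketch: describe_runtime is a hypothetical helper name, not part of sense2vec, and it assumes ujson exposes a __version__ attribute (it falls back to "unknown" if it does not).

```python
import sys


def describe_runtime():
    """Collect the interpreter word size and the installed ujson version."""
    info = {
        "python": sys.version.split()[0],
        # True on a 64-bit interpreter, False on a 32-bit one.
        "is_64bit": sys.maxsize > 2**32,
    }
    try:
        import ujson
        # Assumption: ujson exposes __version__; fall back if it does not.
        info["ujson"] = getattr(ujson, "__version__", "unknown")
    except ImportError:
        info["ujson"] = None  # ujson is not installed in this environment
    return info


print(describe_runtime())
```

Pasting the printed dictionary into the issue alongside the pip list output should cover most of the details asked for.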

@elyase
Author

elyase commented Feb 17, 2016

The machine is an M4 Deca Extra Large on EC2 running Ubuntu. I am using conda Python 3.5, without MKL. This is my environment:

$ conda list
# packages in environment at /home/ubuntu/miniconda3/envs/reddit_gensim:
#
anaconda-client           1.2.2                    py35_0
boto                      2.39.0                    <pip>
bz2file                   0.98                      <pip>
cloudpickle               0.1.1                    py35_0
clyent                    1.2.1                    py35_0
cymem                     1.30                     py35_0
cython                    0.23.4                   py35_0
decorator                 4.0.6                    py35_0
gensim                    0.12.4                    <pip>
httpretty                 0.8.10                    <pip>
ipykernel                 4.2.2                    py35_0
ipython                   4.1.1                    py35_0
ipython-genutils          0.1.0                     <pip>
ipython-notebook          4.0.4                    py35_0
ipython_genutils          0.1.0                    py35_0
jinja2                    2.8                      py35_0
joblib                    0.9.4                     <pip>
jsonschema                2.4.0                    py35_0
jupyter-client            4.1.1                     <pip>
jupyter-core              4.0.6                     <pip>
jupyter_client            4.1.1                    py35_0
jupyter_core              4.0.6                    py35_0
libgfortran               1.0                           0
libsodium                 1.0.3                         0
markupsafe                0.23                     py35_0
mistune                   0.7.1                    py35_0
murmurhash                0.26.0                    <pip>
nbconvert                 4.1.0                    py35_0
nbformat                  4.0.1                    py35_0
nomkl                     1.0                           0
nose                      1.3.7                    py35_0
notebook                  4.1.0                    py35_0
numexpr                   2.5             np110py35_nomkl_0  [nomkl]
numpy                     1.10.4             py35_nomkl_0  [nomkl]
openblas                  0.2.14                        3
openssl                   1.0.2f                        0
path.py                   8.1.2                    py35_1
pexpect                   3.3                      py35_0
pickleshare               0.5                      py35_0
pip                       8.0.2                    py35_0
plac                      0.9.1                    py35_0
preshed                   0.44                     py35_0
ptyprocess                0.5                      py35_0
pygments                  2.1                      py35_0
python                    3.5.1                         0
python-dateutil           2.4.2                    py35_0
pytz                      2015.7                   py35_0
pyyaml                    3.11                     py35_1
pyzmq                     15.2.0                   py35_0
readline                  6.2                           2
requests                  2.9.1                    py35_0
scikit-learn              0.17            np110py35_nomkl_2  [nomkl]
scipy                     0.17.0          np110py35_nomkl_1  [nomkl]
semver                    2.4.0                     <pip>
sense2vec (/home/ubuntu/sense2vec) 0.1.0                     <pip>
setuptools                19.6.2                   py35_0
simplegeneric             0.8.1                    py35_0
six                       1.10.0                   py35_0
smart-open                1.3.2                     <pip>
spacy                     0.99                np110py35_0
sputnik                   0.9.0                     <pip>
sqlite                    3.9.2                         0
terminado                 0.5                      py35_1
text-unidecode            1.0                      py35_0
thinc                     4.0.0                    py35_0
tk                        8.5.18                        0
toolz                     0.7.4                     <pip>
tornado                   4.3                      py35_0
traitlets                 4.1.0                    py35_0
ujson                     1.33                     py35_0
wheel                     0.29.0                   py35_0
xz                        5.0.5                         1
yaml                      0.1.6                         0
zeromq                    4.1.3                         0
zlib                      1.2.8                         0

I did something hacky. I installed OpenBLAS from source but then installed the headers with sudo apt-get install libopenblas-dev, because I couldn't get the missing "cblas.h" error to go away otherwise. This is what my setup.py looked like in the end, when it installed correctly:

compile_options =  {'msvc'  : ['/Ox', '/EHsc'],
                    'other' : ['-O3', '-Wno-unused-function',
                               '-fopenmp', '-fno-stack-protector',
                               '-I/OpenBLAS']}
link_options    =  {'msvc'  : [],
                    'other' : ['-Wl,--no-undefined',
                               '-fopenmp', '-fno-stack-protector',
                               '-L/home/ubuntu/miniconda3/envs/reddit_gensim/lib',
                               '-L/OpenBLAS',
                               '-lopenblas']}

@elyase
Author

elyase commented Feb 17, 2016

It would be ideal if the install worked with the MKL that ships with conda, but I couldn't find the headers.

@henningpeters
Contributor

I think we can probably rule out BLAS issues, as the exception occurred while just loading JSON. My ujson library is 1.35; I'm not sure whether that makes a difference. Also, I assume an M4 Deca Extra Large is 64-bit. Can you try replacing ujson with the standard json module to rule out ujson as the problem? Also, could you try to find out which value is too big and raises the exception?
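One way to do both at once is to parse freqs.json with the standard library and look at the largest integer in it. The sketch below is not sense2vec code: the helper names are hypothetical, and it assumes the file is a list of [hashed, freq] pairs, as the loop in the traceback suggests. The signed 64-bit check is just one possible reason a parser might reject a value; it is not confirmed in this thread.

```python
import json


def largest_int_in_freqs(freqs_path):
    """Parse freqs.json with the stdlib json module and return its largest integer."""
    with open(freqs_path) as file_:
        freqs = json.load(file_)  # stdlib json accepts arbitrarily large ints
    # Assumption: freqs is a list of [hashed, freq] pairs.
    return max(value for pair in freqs for value in pair if isinstance(value, int))


def fits_in_signed_64(value):
    """Return True if the value is representable as a signed 64-bit integer."""
    return -(2**63) <= value < 2**63
```

If the standard json module loads the file without complaint, the problem is most likely in the installed ujson build rather than in the model data.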

@henningpeters
Contributor

All spaCy packages are also available via conda (https://anaconda.org/spacy), so once sense2vec is ready for release you can expect us to maintain a conda package for it that "just works".

@elyase
Author

elyase commented Feb 17, 2016

Upgrading ujson did it. Thanks a lot! Apparently there is some issue with the conda version.

elyase closed this as completed Feb 17, 2016
henningpeters added a commit that referenced this issue Feb 17, 2016
@henningpeters
Contributor

Thanks. I was able to confirm (using pip) that only ujson >= 1.34 works, and I applied a patch. I don't think conda is to blame here.
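For reference, a check against that minimum version could look roughly like this. It is a sketch, not the patch referenced above: version_tuple and ujson_is_new_enough are hypothetical names, and the parsing only handles plain dotted numeric versions such as "1.33".

```python
MIN_UJSON = (1, 34)  # minimum version reported to work in this thread


def version_tuple(version):
    """Turn a dotted version string such as '1.34' into a tuple of ints."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def ujson_is_new_enough(version):
    """Return True if the given ujson version string is at least 1.34."""
    return version_tuple(version) >= MIN_UJSON
```

With the environment listed earlier in the thread, ujson_is_new_enough("1.33") would be False, which matches the failure that was reported.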

@tracek

tracek commented Mar 9, 2016

Is there a way to provide a path to the model? I am setting up a Jupyter Hub environment with customised kernels and somehow the model is not being picked up, despite successful download. Outside Jupyter Notebooks the model is loaded correctly.

Thanks,
Lucas

@henningpeters
Contributor

Sure, just leave out the sputnik calls:

from sense2vec.vectors import VectorMap
vector_map = VectorMap(128)
vector_map.load(here_goes_your_path)

In that path, VectorMap looks for the following files: vec.bin, strings.json and freqs.json.
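Before calling load, it may help to confirm that the directory really contains those three files, especially when a notebook kernel runs with a different working directory or environment. A minimal sketch, where missing_model_files is a hypothetical helper and not part of sense2vec:

```python
from os import path

# File names taken from the list above.
REQUIRED_FILES = ("vec.bin", "strings.json", "freqs.json")


def missing_model_files(data_dir):
    """Return the required model files that are not present in data_dir."""
    return [
        name for name in REQUIRED_FILES
        if not path.isfile(path.join(data_dir, name))
    ]
```

An empty list means all three files were found; anything else names the files to look for.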

Regarding the Jupyter bug: could you please open a new issue with an explanation of how to reproduce the problem?

@pathapatisivayya

Hi,

When I run

from sense2vec.vectors import VectorMap

I get this error:

ModuleNotFoundError: No module named 'sense2vec.vectors'

Name: sense2vec
Version: 1.0.0a9

Python version: 3.6.8
