Questions about problems in ingest related to pynndescent #133

Open
HelloWorldLTY opened this issue Jul 24, 2021 · 12 comments
Comments

@HelloWorldLTY

Sorry to bother you, but the ingest method in scanpy seems to run into a problem caused by pynndescent and numba. Here are the details:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/numba/core/caching.py in save(self, key, data)
486 # If key already exists, we will overwrite the file
--> 487 data_name = overloads[key]
488 except KeyError:

KeyError: ((array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C), array(float32, 2d, C), type(CPUDispatcher(<function squared_euclidean at 0x7fdefdb13830>)), array(int64, 1d, C), float64), ('x86_64-unknown-linux-gnu', 'broadwell', '+64bit,+adx,+aes,+avx,+avx2,-avx512bf16,-avx512bitalg,-avx512bw,-avx512cd,-avx512dq,-avx512er,-avx512f,-avx512ifma,-avx512pf,-avx512vbmi,-avx512vbmi2,-avx512vl,-avx512vnni,-avx512vpopcntdq,+bmi,+bmi2,-cldemote,-clflushopt,-clwb,-clzero,+cmov,+cx16,+cx8,-enqcmd,+f16c,+fma,-fma4,+fsgsbase,+fxsr,-gfni,+invpcid,-lwp,+lzcnt,+mmx,+movbe,-movdir64b,-movdiri,-mwaitx,+pclmul,-pconfig,-pku,+popcnt,-prefetchwt1,+prfchw,-ptwrite,-rdpid,+rdrnd,+rdseed,+rtm,+sahf,-sgx,-sha,-shstk,+sse,+sse2,+sse3,+sse4.1,+sse4.2,-sse4a,+ssse3,-tbm,-vaes,-vpclmulqdq,-waitpkg,-wbnoinvd,-xop,+xsave,-xsavec,+xsaveopt,-xsaves'), ('308c49885ad3c35a475c360e21af1359caa88c78eb495fa0f5e8c6676ae5019e', 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'))

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
13 frames
in ()
----> 1 sc.tl.ingest(adata, adata_ref, obs='louvain') #ingest

/usr/local/lib/python3.7/dist-packages/scanpy/tools/_ingest.py in ingest(adata, adata_ref, obs, embedding_method, labeling_method, neighbors_key, inplace, **kwargs)
131
132 if obs is not None:
--> 133 ing.neighbors(**kwargs)
134 for i, col in enumerate(obs):
135 ing.map_labels(col, labeling_method[i])

/usr/local/lib/python3.7/dist-packages/scanpy/tools/_ingest.py in neighbors(self, k, queue_size, epsilon, random_state)
469 self._nnd_idx.search_rng_state = rng_state
470
--> 471 self._indices, self._distances = self._nnd_idx.query(test, k, epsilon)
472
473 else:

/usr/local/lib/python3.7/dist-packages/pynndescent/pynndescent_.py in query(self, query_data, k, epsilon)
1564 """
1565 if not hasattr(self, "_search_graph"):
-> 1566 self._init_search_graph()
1567
1568 if not self._is_sparse:

/usr/local/lib/python3.7/dist-packages/pynndescent/pynndescent_.py in _init_search_graph(self)
1061 self._distance_func,
1062 self.rng_state,
-> 1063 self.diversify_prob,
1064 )
1065 reverse_graph.eliminate_zeros()

/usr/local/lib/python3.7/dist-packages/numba/core/dispatcher.py in _compile_for_args(self, *args, **kws)
432 e.patch_message('\n'.join((str(e).rstrip(), help_msg)))
433 # ignore the FULL_TRACEBACKS config, this needs reporting!
--> 434 raise e
435
436 def inspect_llvm(self, signature=None):

/usr/local/lib/python3.7/dist-packages/numba/core/dispatcher.py in _compile_for_args(self, *args, **kws)
365 argtypes.append(self.typeof_pyval(a))
366 try:
--> 367 return self.compile(tuple(argtypes))
368 except errors.ForceLiteralArg as e:
369 # Received request for compiler re-entry with the list of arguments

/usr/local/lib/python3.7/dist-packages/numba/core/compiler_lock.py in _acquire_compile_lock(*args, **kwargs)
30 def _acquire_compile_lock(*args, **kwargs):
31 with self:
---> 32 return func(*args, **kwargs)
33 return _acquire_compile_lock
34

/usr/local/lib/python3.7/dist-packages/numba/core/dispatcher.py in compile(self, sig)
823 raise e.bind_fold_arguments(folded)
824 self.add_overload(cres)
--> 825 self._cache.save_overload(sig, cres)
826 return cres.entry_point
827

/usr/local/lib/python3.7/dist-packages/numba/core/caching.py in save_overload(self, sig, data)
669 """
670 with self._guard_against_spurious_io_errors():
--> 671 self._save_overload(sig, data)
672
673 def _save_overload(self, sig, data):

/usr/local/lib/python3.7/dist-packages/numba/core/caching.py in _save_overload(self, sig, data)
679 key = self._index_key(sig, _get_codegen(data))
680 data = self._impl.reduce(data)
--> 681 self._cache_file.save(key, data)
682
683 @contextlib.contextmanager

/usr/local/lib/python3.7/dist-packages/numba/core/caching.py in save(self, key, data)
494 break
495 overloads[key] = data_name
--> 496 self._save_index(overloads)
497 self._save_data(data_name, data)
498

/usr/local/lib/python3.7/dist-packages/numba/core/caching.py in _save_index(self, overloads)
540 def _save_index(self, overloads):
541 data = self._source_stamp, overloads
--> 542 data = self._dump(data)
543 with self._open_for_write(self._index_path) as f:
544 pickle.dump(self._version, f, protocol=-1)

/usr/local/lib/python3.7/dist-packages/numba/core/caching.py in _dump(self, obj)
568
569 def _dump(self, obj):
--> 570 return pickle.dumps(obj, protocol=-1)
571
572 @contextlib.contextmanager

TypeError: can't pickle weakref objects

How can I solve this problem? Thanks.
The code comes from: https://scanpy-tutorials.readthedocs.io/en/latest/integrating-data-using-ingest.html

@HelloWorldLTY
Author

For more information, you can take a look at https://github.com/scverse/scanpy/issues/1951

@lmcinnes
Owner

This is a problem with numba's caching: it is failing to save / load compiled versions of functions. Are you running on Colab by any chance? I cannot reproduce this issue except on Colab, so if you have a specific system where you can reproduce it, that would be good to know. This can be worked around (albeit strangely). See issue #131

@HelloWorldLTY
Author

This happens on Colab. Here is my environment:
Package Version


absl-py 0.12.0
alabaster 0.7.12
albumentations 0.1.12
altair 4.1.0
anndata 0.7.6
anndata2ri 1.0.6
annoy 1.17.0
appdirs 1.4.4
argon2-cffi 20.1.0
arviz 0.11.2
astor 0.8.1
astropy 4.2.1
astunparse 1.6.3
async-generator 1.10
atari-py 0.2.9
atomicwrites 1.4.0
attrs 21.2.0
audioread 2.1.9
autograd 1.3
Babel 2.9.1
backcall 0.2.0
beautifulsoup4 4.6.3
bleach 3.3.0
blis 0.4.1
bokeh 2.3.3
Bottleneck 1.3.2
branca 0.4.2
bs4 0.0.1
CacheControl 0.12.6
cached-property 1.5.2
cachetools 4.2.2
catalogue 1.0.0
certifi 2021.5.30
cffi 1.14.6
cftime 1.5.0
chardet 3.0.4
charset-normalizer 2.0.2
click 7.1.2
cloudpickle 1.3.0
cmake 3.12.0
cmdstanpy 0.9.5
colorcet 2.0.6
colorlover 0.3.0
community 1.0.0b1
contextlib2 0.5.5
convertdate 2.3.2
coverage 3.7.1
coveralls 0.5
crcmod 1.7
cufflinks 0.17.3
cupy-cuda101 9.1.0
cvxopt 1.2.6
cvxpy 1.0.31
cycler 0.10.0
cymem 2.0.5
Cython 0.29.23
daft 0.0.4
dask 2.12.0
datascience 0.10.6
debugpy 1.0.0
decorator 4.4.2
defusedxml 0.7.1
Deprecated 1.2.12
descartes 1.1.0
dill 0.3.4
distributed 1.25.3
dlib 19.18.0
dm-tree 0.1.6
docopt 0.6.2
docutils 0.17.1
dopamine-rl 1.0.5
dunamai 1.5.5
earthengine-api 0.1.272
easydict 1.9
ecos 2.0.7.post1
editdistance 0.5.3
en-core-web-sm 2.2.5
entrypoints 0.3
ephem 4.0.0.2
et-xmlfile 1.1.0
fa2 0.3.5
fastai 1.0.61
fastdtw 0.3.4
fastprogress 1.0.0
fastrlock 0.6
fbprophet 0.7.1
feather-format 0.4.1
filelock 3.0.12
firebase-admin 4.4.0
fix-yahoo-finance 0.0.22
Flask 1.1.4
flatbuffers 1.12
folium 0.8.3
future 0.16.0
gast 0.4.0
GDAL 2.2.2
gdown 3.6.4
gensim 3.6.0
geographiclib 1.52
geopy 1.17.0
get-version 3.5
gin-config 0.4.0
glob2 0.7
google 2.0.3
google-api-core 1.26.3
google-api-python-client 1.12.8
google-auth 1.32.1
google-auth-httplib2 0.0.4
google-auth-oauthlib 0.4.4
google-cloud-bigquery 1.21.0
google-cloud-bigquery-storage 1.1.0
google-cloud-core 1.0.3
google-cloud-datastore 1.8.0
google-cloud-firestore 1.7.0
google-cloud-language 1.2.0
google-cloud-storage 1.18.1
google-cloud-translate 1.5.0
google-colab 1.0.0
google-pasta 0.2.0
google-resumable-media 0.4.1
googleapis-common-protos 1.53.0
googledrivedownloader 0.4
graphtools 1.5.2
graphviz 0.10.1
greenlet 1.1.0
grpcio 1.34.1
gspread 3.0.1
gspread-dataframe 3.0.8
gym 0.17.3
h5py 2.10.0
HeapDict 1.0.1
hijri-converter 2.1.3
holidays 0.10.5.2
holoviews 1.14.4
html5lib 1.0.1
httpimport 0.5.18
httplib2 0.17.4
httplib2shim 0.0.3
humanize 0.5.1
hyperopt 0.1.2
ideep4py 2.0.0.post3
idna 2.10
imageio 2.4.1
imagesize 1.2.0
imap 1.0.0
imbalanced-learn 0.4.3
imblearn 0.0
imgaug 0.2.9
importlib-metadata 4.6.1
importlib-resources 5.2.0
imutils 0.5.4
inflect 2.1.0
iniconfig 1.1.1
install 1.3.4
intel-openmp 2021.3.0
intervaltree 2.1.0
ipykernel 4.10.1
ipython 5.5.0
ipython-genutils 0.2.0
ipython-sql 0.3.9
ipywidgets 7.6.3
itsdangerous 1.1.0
jax 0.2.17
jaxlib 0.1.69+cuda110
jdcal 1.4.1
jedi 0.18.0
jieba 0.42.1
Jinja2 2.11.3
joblib 1.0.1
jpeg4py 0.1.4
jsonschema 2.6.0
jupyter 1.0.0
jupyter-client 5.3.5
jupyter-console 5.2.0
jupyter-core 4.7.1
jupyterlab-pygments 0.1.2
jupyterlab-widgets 1.0.0
kaggle 1.5.12
kapre 0.3.5
Keras 2.4.3
keras-nightly 2.5.0.dev2021032900
Keras-Preprocessing 1.1.2
keras-vis 0.4.1
kiwisolver 1.3.1
korean-lunar-calendar 0.2.1
librosa 0.8.1
lightgbm 2.2.3
llvmlite 0.34.0
lmdb 0.99
loompy 3.0.6
louvain 0.7.0
LunarCalendar 0.0.9
lxml 4.2.6
magic-impute 3.0.0
Markdown 3.3.4
MarkupSafe 2.0.1
matplotlib 3.2.2
matplotlib-inline 0.1.2
matplotlib-venn 0.11.6
memory-profiler 0.58.0
missingno 0.5.0
mistune 0.8.4
mizani 0.6.0
mkl 2019.0
mlxtend 0.14.0
mnnpy 0.1.9.5
more-itertools 8.8.0
moviepy 0.2.3.5
mpmath 1.2.1
msgpack 1.0.2
multiprocess 0.70.12.2
multitasking 0.0.9
murmurhash 1.0.5
music21 5.5.0
natsort 5.5.0
nbclient 0.5.3
nbconvert 5.6.1
nbformat 5.1.3
nest-asyncio 1.5.1
netCDF4 1.5.7
networkx 2.5.1
nibabel 3.0.2
nltk 3.2.5
notebook 5.3.1
numba 0.51.2
numexpr 2.7.3
numpy 1.18.1
numpy-groupies 0.9.13
nvidia-ml-py3 7.352.0
oauth2client 4.1.3
oauthlib 3.1.1
okgrade 0.4.3
opencv-contrib-python 4.1.2.30
opencv-python 4.1.2.30
openpyxl 2.5.9
opt-einsum 3.3.0
osqp 0.6.2.post0
packaging 21.0
palettable 3.3.0
pandas 1.1.5
pandas-datareader 0.9.0
pandas-gbq 0.13.3
pandas-profiling 1.4.1
pandocfilters 1.4.3
panel 0.11.3
param 1.11.1
parso 0.8.2
pathlib 1.0.1
patsy 0.5.1
pexpect 4.8.0
phate 1.0.7
pickleshare 0.7.5
Pillow 7.1.2
pip 21.1.3
pip-tools 4.5.1
plac 1.1.3
plotly 4.4.1
plotnine 0.6.0
pluggy 0.7.1
pooch 1.4.0
portpicker 1.3.9
prefetch-generator 1.0.1
preshed 3.0.5
prettytable 2.1.0
progressbar2 3.38.0
prometheus-client 0.11.0
promise 2.3
prompt-toolkit 1.0.18
protobuf 3.17.3
psutil 5.4.8
psycopg2 2.7.6.1
ptyprocess 0.7.0
py 1.10.0
pyarrow 3.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycocotools 2.0.2
pycparser 2.20
pyct 0.4.8
pydata-google-auth 1.2.0
pydot 1.3.0
pydot-ng 2.0.0
pydotplus 2.0.2
PyDrive 1.3.1
pyemd 0.5.1
pyerfa 2.0.0
pyglet 1.5.0
Pygments 2.6.1
pygobject 3.26.1
PyGSP 0.5.1
pymc3 3.11.2
PyMeeus 0.5.11
pymongo 3.11.4
pymystem3 0.2.0
pynndescent 0.5.4
PyOpenGL 3.1.5
pyparsing 2.4.7
pyrsistent 0.18.0
pysndfile 1.3.8
PySocks 1.7.1
pystan 2.19.1.1
pytest 3.6.4
python-apt 0.0.0
python-chess 0.23.11
python-dateutil 2.8.1
python-igraph 0.9.6
python-louvain 0.15
python-slugify 5.0.2
python-utils 2.5.6
pytz 2018.9
pyviz-comms 2.1.0
PyWavelets 1.1.1
PyYAML 3.13
pyzmq 22.1.0
qdldl 0.1.5.post0
qtconsole 5.1.1
QtPy 1.9.0
regex 2019.12.20
requests 2.23.0
requests-oauthlib 1.3.0
resampy 0.2.2
retrying 1.3.3
rpy2 3.4.5
rsa 4.7.2
s-gd2 1.8
scanpy 1.8.1
scIB 0.1.1
scikit-image 0.16.2
scikit-learn 0.22.2.post1
scikit-misc 0.1.4
scipy 1.4.1
scprep 1.1.0
screen-resolution-extra 0.0.0
scs 2.1.4
seaborn 0.11.1
semver 2.13.0
Send2Trash 1.7.1
setuptools 57.2.0
setuptools-git 1.2
Shapely 1.7.1
simplegeneric 0.8.1
sinfo 0.3.4
six 1.15.0
sklearn 0.0
sklearn-pandas 1.8.0
smart-open 5.1.0
snowballstemmer 2.1.0
sortedcontainers 2.4.0
SoundFile 0.10.3.post1
spacy 2.2.4
Sphinx 1.8.5
sphinxcontrib-serializinghtml 1.1.5
sphinxcontrib-websupport 1.2.4
SQLAlchemy 1.4.20
sqlparse 0.4.1
srsly 1.0.5
statsmodels 0.10.2
stdlib-list 0.8.0
sympy 1.7.1
tables 3.4.4
tabulate 0.8.9
tasklogger 1.1.0
tblib 1.7.0
tensorboard 2.5.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.0
tensorflow 2.5.0
tensorflow-datasets 4.0.1
tensorflow-estimator 2.5.0
tensorflow-gcs-config 2.5.0
tensorflow-hub 0.12.0
tensorflow-metadata 1.1.0
tensorflow-probability 0.13.0
termcolor 1.1.0
terminado 0.10.1
testpath 0.5.0
text-unidecode 1.3
textblob 0.15.3
texttable 1.6.4
Theano-PyMC 1.1.2
thinc 7.4.0
tifffile 2021.7.2
toml 0.10.2
toolz 0.11.1
torch 1.9.0+cu102
torchsummary 1.5.1
torchtext 0.10.0
torchvision 0.10.0+cu102
tornado 5.1.1
tqdm 4.41.1
traitlets 5.0.5
tweepy 3.10.0
typeguard 2.7.1
typing-extensions 3.7.4.3
tzlocal 1.5.1
umap-learn 0.5.1
uritemplate 3.0.1
urllib3 1.24.3
vega-datasets 0.9.0
wasabi 0.8.2
wcwidth 0.2.5
webencodings 0.5.1
Werkzeug 1.0.1
wheel 0.36.2
widgetsnbextension 3.5.1
wordcloud 1.5.0
wrapt 1.12.1
xarray 0.18.2
xgboost 0.90
xkit 0.0.0
xlrd 1.1.0
xlwt 1.3.0
yellowbrick 0.9.1
zict 2.0.0
zipp 3.5.0

@keherri

keherri commented Aug 3, 2021

Running into the same issue in a Jupyter notebook on AWS SageMaker.

@lmcinnes
Owner

lmcinnes commented Aug 3, 2021

I wish I had better answers, but this very much seems to be an issue with cloud services, and how they actually back the "local" storage that is used for caching numba compiled functions. I would strongly suggest you take this a little further upstream: either with the cloud providers, or with the numba team, or both, since this is definitely beyond my expertise.

@stuartarchibald

If you make a directory under /tmp, for example /tmp/numba_cache, and then set the environment variable NUMBA_CACHE_DIR to point to it, i.e. export NUMBA_CACHE_DIR=/tmp/numba_cache, does that help?

@HelloWorldLTY
Author

If you make a directory under /tmp, for example /tmp/numba_cache, and then set the environment variable NUMBA_CACHE_DIR to point to it, i.e. export NUMBA_CACHE_DIR=/tmp/numba_cache, does that help?

Hmm, I used this statement to change the cache dir:
IPython.paths.set_ipython_cache_dir = '/content/tmp/numba_cache'
But I got the same errors again, so it seems this approach does not solve the problem.

Could you please be more specific? What should I do to set NUMBA_CACHE_DIR=/tmp/numba_cache? Thanks

@HPLegion

HPLegion commented Aug 6, 2021

IPython.paths.set_ipython_cache_dir = '/content/tmp/numba_cache'

This issue is not about the IPython cache but about numba's cache. Messing with the IPython cache is probably something you want to avoid.

NUMBA_CACHE_DIR is a system environment variable that numba reads while it sets itself up. On POSIX systems you can usually set such variables with export NUMBA_CACHE_DIR=... (I don't know if Colab allows this through shell escapes), or you can set it using Python's os.environ. The important thing is to change it BEFORE you import numba. Then numba should try to use the given directory as its cache directory.
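
The ordering described above can be sketched roughly as follows. This is only an illustration, not something taken from numba or scanpy: the helper name set_numba_cache_dir is hypothetical, and the directory under the system temp folder stands in for the /tmp/numba_cache path suggested earlier in the thread. The sys.modules check is just a convenience to warn when numba was already imported, in which case the variable is presumably set too late.

```python
import os
import sys
import tempfile


def set_numba_cache_dir(path):
    """Hypothetical helper: create `path` and export it as NUMBA_CACHE_DIR.

    Returns True if numba has not been imported yet in this session,
    i.e. the setting is in place before numba sets itself up.
    """
    os.makedirs(path, exist_ok=True)          # make sure the directory exists
    os.environ["NUMBA_CACHE_DIR"] = path      # set ONE key, do not rebind os.environ
    return "numba" not in sys.modules


# Assumption: a folder under the system temp dir (usually /tmp on Linux).
cache_dir = os.path.join(tempfile.gettempdir(), "numba_cache")
set_in_time = set_numba_cache_dir(cache_dir)

if not set_in_time:
    print("numba is already imported; restart the runtime and run this first")

# Only import scanpy / pynndescent / numba after this point.
```

In a notebook this would need to be the first cell that runs after a runtime restart, before any package that pulls in numba.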

@HelloWorldLTY
Author

HelloWorldLTY commented Aug 6, 2021

IPython.paths.set_ipython_cache_dir = '/content/tmp/numba_cache'

This issue is not about the IPython cache but about numba's cache. Messing with the IPython cache is probably something you want to avoid.

NUMBA_CACHE_DIR is a system environment variable that numba reads while it sets itself up. On POSIX systems you can usually set such variables with export NUMBA_CACHE_DIR=... (I don't know if Colab allows this through shell escapes), or you can set it using Python's os.environ. The important thing is to change it BEFORE you import numba. Then numba should try to use the given directory as its cache directory.

Hmm, if I use os.environ = '/content/tmp/numba_cache', it seems to take longer to load the usual packages. Is there a better solution?

@HPLegion

HPLegion commented Aug 6, 2021

Hmm, if I use os.environ = '/content/tmp/numba_cache', it seems to take longer to load the usual packages. Is there a better solution?

Is this the code you use on COLAB or did you make a typo when copying it here?
It should be

import os

os.environ["NUMBA_CACHE_DIR"] = "/..."

If you did actually write os.environ = ..., then I would expect the Python interpreter session to crash, or at least to be very compromised.
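
To illustrate the difference between the two spellings, here is a small sketch that runs each one in a fresh child interpreter, so the current session's os.environ is left untouched. The run_child helper is hypothetical and the path is only an example; the point is that rebinding os.environ replaces the whole mapping with a string and sets no variable, while assigning a single key does what was intended.

```python
import subprocess
import sys


def run_child(snippet):
    """Hypothetical helper: run `snippet` in a separate Python process."""
    return subprocess.run(
        [sys.executable, "-c", snippet], capture_output=True, text=True
    )


# Rebinding the name: os.environ becomes a plain str, no variable is set,
# and later lookups such as os.getenv fail.
wrong = run_child(
    "import os\n"
    "os.environ = '/tmp/numba_cache'\n"
    "print(type(os.environ).__name__)\n"
    "os.getenv('NUMBA_CACHE_DIR')\n"
)

# Assigning one key: the mapping survives and the variable is set.
right = run_child(
    "import os\n"
    "os.environ['NUMBA_CACHE_DIR'] = '/tmp/numba_cache'\n"
    "print(os.getenv('NUMBA_CACHE_DIR'))\n"
)

print("rebinding:", wrong.stdout.strip(), "| failed:", wrong.returncode != 0)
print("key assignment:", right.stdout.strip(), "| exit code:", right.returncode)
```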

@HelloWorldLTY
Author

HelloWorldLTY commented Aug 6, 2021 via email

@HelloWorldLTY
Author

"NUMBA_CACHE_DIR"

Sorry to bother you again, but this method still does not seem to work.
image
