
.add_faiss_index and .add_elasticsearch_index raise ImportError in Google Colab #7456

@MapleBloom


Describe the bug

In Google Colab, `!pip install faiss-cpu` succeeds and `import faiss` raises no error, but
embeddings_dataset.add_faiss_index(column='embeddings')
raises

/usr/local/lib/python3.11/dist-packages/datasets/search.py in __init__(self, device, string_factory, metric_type, custom_index)
247 self.faiss_index = custom_index
248 if not _has_faiss:
--> 249 raise ImportError(
250 "You must install Faiss to use FaissIndex. To do so you can run conda install -c pytorch faiss-cpu or conda install -c pytorch faiss-gpu. "
251 "A community supported package is also available on pypi: pip install faiss-cpu or pip install faiss-gpu. "

This happens because
_has_faiss = importlib.util.find_spec("faiss") is not None
at the top of datasets/search.py evaluates to False, even though the same importlib.util.find_spec("faiss") call run in a Colab notebook cell returns
ModuleSpec(name='faiss', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7b7851449f50>, origin='/usr/local/lib/python3.11/dist-packages/faiss/__init__.py', submodule_search_locations=['/usr/local/lib/python3.11/dist-packages/faiss'])

However, running

import datasets
datasets.search._has_faiss

in the same Colab notebook also returns False.

The same happens with _has_elasticsearch.
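
One possible explanation, not confirmed anywhere in this report, is that datasets was already imported in the runtime before faiss-cpu was installed, so the module-level flag was computed once and never refreshed. The sketch below reproduces that pattern in isolation with two throwaway modules; the names demo_search_mod and demo_optional_dep are hypothetical stand-ins and are not part of datasets or faiss.

```python
import importlib
import importlib.util
import pathlib
import sys
import tempfile

# Hypothetical module names, used only for this demonstration.
CONSUMER = "demo_search_mod"
DEPENDENCY = "demo_optional_dep"

# Work in a fresh directory that is added to the import path.
workdir = pathlib.Path(tempfile.mkdtemp())
sys.path.insert(0, str(workdir))

# Stand-in for a library module that probes an optional dependency at import time.
(workdir / f"{CONSUMER}.py").write_text(
    "import importlib.util\n"
    f"_has_dep = importlib.util.find_spec({DEPENDENCY!r}) is not None\n"
)

importlib.invalidate_caches()
consumer = importlib.import_module(CONSUMER)
flag_before_install = consumer._has_dep  # dependency does not exist yet

# Simulate installing the dependency after the consumer was imported.
(workdir / f"{DEPENDENCY}.py").write_text("VALUE = 1\n")
importlib.invalidate_caches()

# A fresh lookup sees the dependency, but the flag computed at import time does not change.
live_check = importlib.util.find_spec(DEPENDENCY) is not None
flag_after_install = consumer._has_dep

# Re-executing the consumer module recomputes the flag.
consumer = importlib.reload(consumer)
flag_after_reload = consumer._has_dep

print(flag_before_install, live_check, flag_after_install, flag_after_reload)
```

If this is what happens in Colab, restarting the runtime after the install, so that datasets is imported fresh, would be expected to refresh the flag. That is an assumption to verify, not a confirmed fix.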

Steps to reproduce the bug

  1. Follow https://huggingface.co/learn/nlp-course/chapter5/6?fw=pt in Google Colab.
  2. Run the notebook up to embeddings_dataset.add_faiss_index(column='embeddings').
  3. Also try embeddings_dataset.add_elasticsearch_index(column='embeddings').
  4. Notebook: https://colab.research.google.com/drive/1h2cjuiClblqzbNQgrcoLYOC8zBqTLLcv#scrollTo=3ddzRp72auOF
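
To narrow the reproduction down, the live import lookup can be compared with the flag that datasets/search.py computed when it was imported. This is a minimal diagnostic sketch: the helper names live_lookup, cached_flag and compare are made up for illustration, and the only library attributes it reads are the _has_faiss and _has_elasticsearch flags mentioned above.

```python
import importlib
import importlib.util


def live_lookup(name):
    """Return True if the import system can locate `name` right now."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def cached_flag(module_name, attr):
    """Return a module-level flag, or None if the module or attribute is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, None)


def compare(dependency, module_name="datasets.search"):
    """Pair the live lookup with the flag computed when `module_name` was imported."""
    return {
        "dependency": dependency,
        "live_lookup": live_lookup(dependency),
        "cached_flag": cached_flag(module_name, f"_has_{dependency}"),
    }


if __name__ == "__main__":
    for dep in ("faiss", "elasticsearch"):
        print(compare(dep))
```

A row where live_lookup is True while cached_flag is False matches the behavior described in this report.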

Expected behavior

I've only just started the tutorial, so I'm not sure what the exact result should be, but embeddings_dataset.add_faiss_index(column='embeddings') should work without raising an ImportError.

Environment info

Google Colab notebook with default config
