
ImportError: Textractor pipeline is not available - install "pipeline" extra to enable #646

Closed
berkgungor opened this issue Jan 23, 2024 · 6 comments

@berkgungor

berkgungor commented Jan 23, 2024

When I run the code block below, it gives this error:

from txtai.pipeline import Textractor
textractor = Textractor()

ImportError: Textractor pipeline is not available - install "pipeline" extra to enable

Note: pip install txtai[pipeline] did not work.

@davidmezzetti
Member

What environment are you running on? I would rebuild your virtual environment and reinstall with pip install txtai[pipeline].
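A minimal sketch of that rebuild using Python's standard venv module. The function name fresh_env_python and the directory name are illustrative only, not part of txtai; the Windows/POSIX interpreter paths are the standard venv layouts.

```python
import os
import venv
from pathlib import Path


def fresh_env_python(env_dir, with_pip=True):
    """Create (or wipe and recreate) a virtual environment at env_dir and
    return the path to its Python interpreter.

    Hypothetical helper for illustration; not a txtai API.
    """
    env_path = Path(env_dir)
    # clear=True deletes anything already in env_dir, so this is a clean rebuild
    venv.create(env_path, clear=True, with_pip=with_pip)
    if os.name == "nt":
        # Windows layout: <env>\Scripts\python.exe
        return env_path / "Scripts" / "python.exe"
    # POSIX layout: <env>/bin/python
    return env_path / "bin" / "python"
```

The returned interpreter can then install the extra inside the new environment, for example with subprocess.run([str(python), "-m", "pip", "install", "txtai[pipeline]"], check=True). Keeping txtai in its own environment also sidesteps conflicts with other projects' packages.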

@Lanznx

Lanznx commented Jan 24, 2024

@berkgungor, are you trying to run the Colab script from here?

If you are running on macOS with zsh, you might need to change the command from !pip install git+https://github.com/neuml/txtai#egg=txtai[pipeline]

to !pip install 'git+https://github.com/neuml/txtai#egg=txtai[pipeline]'

The error you're seeing in the Z shell (zsh) is likely caused by the way zsh handles square brackets. In zsh, square brackets are glob (pattern-matching) characters, so when they appear unquoted in a command, zsh tries to expand them against file names. If no files match, zsh aborts the command with a "no matches found" error.
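To illustrate the bracket problem, here is a small sketch using Python's fnmatch, which implements shell-style bracket classes similar to (but not identical with) zsh globbing, plus shlex.quote, which produces the single-quoted form suggested above for POSIX shells.

```python
import fnmatch
import shlex

spec = "txtai[pipeline]"

# Read as a glob, "[pipeline]" is a character class: it matches exactly ONE
# character from the set {p, i, e, l, n}, not the literal text "[pipeline]".
for name in ["txtaip", "txtain", "txtai[pipeline]"]:
    print(name, fnmatch.fnmatchcase(name, spec))

# shlex.quote wraps the argument in single quotes for POSIX shells such as
# bash and zsh, so the brackets reach pip unchanged.
print("pip install " + shlex.quote(spec))
```

The first two names match the pattern while the literal string txtai[pipeline] does not, which is why zsh finds no matching file and stops before pip ever runs.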

@berkgungor
Author

@davidmezzetti I run the code in the venv I use for all my other LLM projects, so rebuilding it is not a good option for me. The code does work in Colab, though. I suspect one of the other LLM libraries in that environment (transformers, langchain, etc.) might be interfering with txtai.

@Lanznx I am on Windows, in a CUDA environment.

@davidmezzetti
Member

Following up on this: Textractor relies on Apache Tika, which is a Java library and requires Java to be installed.

If you install Java, does this solve the issue? I will update the FAQ and error message once confirmed.
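A quick way to check whether Java is reachable from Python, sketched under the assumption that Tika is started through a java executable found on PATH (an install that only sets JAVA_HOME may behave differently). The helper names are hypothetical.

```python
import shutil
import subprocess


def java_available():
    """Return True if a `java` executable can be found on PATH."""
    # Assumption: Tika is launched via the `java` found on PATH.
    return shutil.which("java") is not None


def java_version():
    """Return the `java -version` banner, or None if Java is not on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its banner to stderr, not stdout
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return result.stderr.strip()


print("java on PATH:", java_available())
print(java_version())
```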

@davidmezzetti davidmezzetti self-assigned this Feb 2, 2024
@davidmezzetti davidmezzetti added this to the v6.4.0 milestone Feb 2, 2024
@davidmezzetti davidmezzetti added bug Something isn't working and removed bug Something isn't working labels Feb 2, 2024
@davidmezzetti
Member

Thank you @Lanznx. After researching this further, I believe escaping the brackets is the solution for both macOS zsh and anyone using Windows PowerShell.
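An alternative to escaping (not the fix proposed in this thread) is to invoke pip from Python with the arguments passed as a list. Without shell=True, no shell parses the command, so neither zsh nor PowerShell ever sees the brackets. The helper names below are hypothetical.

```python
import subprocess
import sys


def pip_install_command(*specs):
    """Build a pip install command as an argument list.

    Passed to subprocess.run as a list (no shell=True), each argument goes
    straight to the new process, so square brackets need no escaping.
    """
    # sys.executable ties pip to the interpreter running this script,
    # i.e. the currently active virtual environment
    return [sys.executable, "-m", "pip", "install", *specs]


def install(*specs):
    """Run pip for the given requirement specs; raises CalledProcessError on failure."""
    subprocess.run(pip_install_command(*specs), check=True)
```

Calling install("txtai[pipeline]") would then perform the installation into the current environment.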

@genki-aik

genki-aik commented Sep 16, 2024

Manually installing these three packages worked for me (on Windows):
pip install beautifulsoup4 tika nltk

They are used in textractor.py and segmentation.py.
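A sketch for checking which of those three packages are importable in the active environment. The package list comes only from this comment and is not txtai's official definition of the pipeline extra. Note that beautifulsoup4 is imported as bs4.

```python
import importlib.util

# pip distribution name -> module name used in import statements.
# List taken from the comment above; txtai's extra may include more packages.
PIPELINE_PACKAGES = {
    "beautifulsoup4": "bs4",
    "tika": "tika",
    "nltk": "nltk",
}


def missing_packages(packages=PIPELINE_PACKAGES):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("pip install " + " ".join(missing))
else:
    print("All three packages are importable.")
```

Running this inside the affected venv shows whether the failure comes from missing packages rather than from a conflict with other libraries.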


4 participants