Add translation pipeline parameter to return selected models #383

Closed
davidmezzetti opened this issue Nov 19, 2022 · 7 comments · Fixed by #424

@davidmezzetti
Member

The translation pipeline seamlessly loads and uses a series of models to run the translations.

It would be beneficial to have a parameter to also return the associated models and detected languages to help with explainability and debugging.
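To make the request concrete, here is a minimal, self-contained Python sketch of the idea. It does not use txtai. The names `detect`, `es_to_en`, `MODELS`, `translate` and the `showmodels` flag are all hypothetical stand-ins, and a real pipeline's parameter name and return shape may differ. When the flag is set, each result becomes a `(translation, detected language, model)` tuple instead of a plain string.

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

def detect(text: str) -> str:
    """Toy language detector keyed on a few Spanish words (hypothetical stand-in)."""
    words = text.lower().split()
    return "es" if any(w in words for w in ("hola", "gracias")) else "en"

def es_to_en(text: str) -> str:
    """Toy word-substitution 'model' standing in for a real es->en translation model."""
    vocab = {"hola": "hello", "gracias": "thanks", "amigo": "friend"}
    return " ".join(vocab.get(w, w) for w in text.lower().split())

# Hypothetical registry mapping (source, target) pairs to (model name, model function)
MODELS: Dict[Tuple[str, str], Tuple[str, Callable[[str], str]]] = {
    ("es", "en"): ("hypothetical/es-en-model", es_to_en),
}

Result = Union[str, Tuple[str, str, Optional[str]]]

def translate(texts: List[str], target: str = "en", showmodels: bool = False) -> List[Result]:
    """Translate texts to target; with showmodels=True also return detected language and model."""
    results: List[Result] = []
    for text in texts:
        source = detect(text)
        if source == target:
            # Already in the target language: pass through, no model selected
            translation, model = text, None
        else:
            model, run = MODELS[(source, target)]
            translation = run(text)
        results.append((translation, source, model) if showmodels else translation)
    return results

print(translate(["Hola amigo", "Good morning"]))
print(translate(["Hola amigo", "Good morning"], showmodels=True))
```

Keeping the flag off by default preserves the existing return type for current callers, while the tuple form exposes exactly the two things the issue asks for: which language was detected and which model handled the text.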

@davidmezzetti davidmezzetti added this to the v5.2.0 milestone Nov 19, 2022
@davidmezzetti davidmezzetti self-assigned this Nov 19, 2022
@davidmezzetti davidmezzetti removed this from the v5.2.0 milestone Dec 12, 2022
@saucam
Contributor

saucam commented Feb 3, 2023

@davidmezzetti can I take a shot at this one?

@davidmezzetti
Member Author

Happy to accept PRs! Please just make sure to follow the coding conventions.

Guide available here: https://github.com/neuml/.github/blob/master/CONTRIBUTING.md

@saucam
Contributor

saucam commented Feb 4, 2023

@davidmezzetti I tried to run the unit tests after installing all dependencies as described in the README.
I got 52 failures, and most of them are errors like:

install "XXXXX" extra to enable. For example:

======================================================================
ERROR: testAnnoy (testann.TestANN)
Test Annoy backend
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/yashdatta/Documents/Workspace/txtai/test/python/testann.py", line 24, in testAnnoy
    self.runTests("annoy", None, False)
  File "/Users/yashdatta/Documents/Workspace/txtai/test/python/testann.py", line 113, in runTests
    self.assertEqual(self.backend(name, params).config["backend"], name)
  File "/Users/yashdatta/Documents/Workspace/txtai/test/python/testann.py", line 144, in backend
    model = ANNFactory.create(config)
  File "/Users/yashdatta/Documents/Workspace/txtai/src/python/txtai/ann/factory.py", line 35, in create
    ann = Annoy(config)
  File "/Users/yashdatta/Documents/Workspace/txtai/src/python/txtai/ann/annoy.py", line 26, in __init__
    raise ImportError('Annoy is not available - install "similarity" extra to enable')
ImportError: Annoy is not available - install "similarity" extra to enable

Did I miss a step? This is what I did:

conda create -n txtai
conda activate txtai
pip install -e .[dev]
pre-commit install
make coverage

@davidmezzetti
Member Author

Sorry, there was an oversight in the CONTRIBUTING.md file; it has been fixed. The install step should be:

pip install -e .[all,dev]
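Before rerunning the suite, a quick way to confirm the extras actually landed is to probe for a few optional modules. This is a minimal sketch using the standard library's `importlib.util.find_spec`. The module list is an assumption (only `annoy` comes from the traceback above), and `missing_modules` is a hypothetical helper, not part of txtai.

```python
import importlib.util
from typing import List

# Assumption: an illustrative sample of optional modules, not txtai's full extras list.
# "annoy" backs the Annoy ANN backend named in the traceback above.
OPTIONAL_MODULES = ["annoy", "PIL", "soundfile"]

def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(OPTIONAL_MODULES)
if missing:
    print("Missing optional modules:", ", ".join(missing))
else:
    print("All checked optional modules are installed")
```

`find_spec` only locates a module without importing it, so the check is cheap and has no side effects for top-level names.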

@saucam
Contributor

saucam commented Feb 5, 2023

@davidmezzetti thanks! That resolved those errors, but now I get a different set of errors.

======================================================================
ERROR: testImageWorkflow (testworkflow.TestWorkflow)
Tests an image task
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/yashdatta/Documents/Workspace/txtai/test/python/testworkflow.py", line 214, in testImageWorkflow
    self.assertEqual(results[0].size, (1024, 682))
AttributeError: 'str' object has no attribute 'size'


On my Mac, I also see failures in the API tests:

======================================================================
ERROR: testCaption (testapi.testpipelines.TestPipelines)
Test caption via API
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/yashdatta/Documents/Workspace/txtai/test/python/testapi/testpipelines.py", line 114, in testCaption
    caption = self.client.get(f"caption?file={Utils.PATH}/books.jpg").json()
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/testclient.py", line 488, in get
    return super().get(
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/httpx/_client.py", line 1045, in get
    return self.request(
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/testclient.py", line 454, in request
    return super().request(
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/httpx/_client.py", line 821, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/httpx/_client.py", line 908, in send
    response = self._send_handling_auth(
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/httpx/_client.py", line 936, in _send_handling_auth
    response = self._send_handling_redirects(
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/httpx/_client.py", line 973, in _send_handling_redirects
    response = self._send_single_request(request)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/httpx/_client.py", line 1009, in _send_single_request
    response = transport.handle_request(request)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/testclient.py", line 337, in handle_request
    raise exc
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/testclient.py", line 334, in handle_request
    portal.call(self.app, scope, receive, send)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/anyio/from_thread.py", line 283, in call
    return cast(T_Retval, self.start_task_soon(func, *args).result())
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/anyio/from_thread.py", line 219, in _call_func
    retval = await retval
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/fastapi/applications.py", line 270, in __call__
    await super().__call__(scope, receive, send)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/routing.py", line 706, in __call__
    await route.handle(scope, receive, send)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/fastapi/routing.py", line 165, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/Users/yashdatta/Documents/Workspace/txtai/src/python/txtai/api/routers/caption.py", line 26, in caption
    return application.get().pipeline("caption", (file,))
  File "/Users/yashdatta/Documents/Workspace/txtai/src/python/txtai/app/base.py", line 624, in pipeline
    return self.pipelines[name](*args)
  File "/Users/yashdatta/Documents/Workspace/txtai/src/python/txtai/pipeline/image/caption.py", line 46, in __call__
    values = [Image.open(image) if isinstance(image, str) else image for image in values]
  File "/Users/yashdatta/Documents/Workspace/txtai/src/python/txtai/pipeline/image/caption.py", line 46, in <listcomp>
    values = [Image.open(image) if isinstance(image, str) else image for image in values]
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/PIL/Image.py", line 3092, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/txtai/books.jpg'

Some tests end with a system error, like this one, which tries to open a .wav file:

======================================================================
ERROR: testTranscribe (testapi.testpipelines.TestPipelines)
Test transcribe via API
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/yashdatta/Documents/Workspace/txtai/test/python/testapi/testpipelines.py", line 296, in testTranscribe
    text = self.client.get(f"transcribe?file={Utils.PATH}/Make_huge_profits.wav").json()
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/testclient.py", line 488, in get
    return super().get(
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/httpx/_client.py", line 1045, in get
    return self.request(
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/testclient.py", line 454, in request
    return super().request(
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/httpx/_client.py", line 821, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/httpx/_client.py", line 908, in send
    response = self._send_handling_auth(
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/httpx/_client.py", line 936, in _send_handling_auth
    response = self._send_handling_redirects(
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/httpx/_client.py", line 973, in _send_handling_redirects
    response = self._send_single_request(request)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/httpx/_client.py", line 1009, in _send_single_request
    response = transport.handle_request(request)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/testclient.py", line 337, in handle_request
    raise exc
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/testclient.py", line 334, in handle_request
    portal.call(self.app, scope, receive, send)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/anyio/from_thread.py", line 283, in call
    return cast(T_Retval, self.start_task_soon(func, *args).result())
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/anyio/from_thread.py", line 219, in _call_func
    retval = await retval
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/fastapi/applications.py", line 270, in __call__
    await super().__call__(scope, receive, send)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/routing.py", line 706, in __call__
    await route.handle(scope, receive, send)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/fastapi/routing.py", line 165, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/Users/yashdatta/Documents/Workspace/txtai/src/python/txtai/api/routers/transcription.py", line 26, in transcribe
    return application.get().pipeline("transcription", (file,))
  File "/Users/yashdatta/Documents/Workspace/txtai/src/python/txtai/app/base.py", line 624, in pipeline
    return self.pipelines[name](*args)
  File "/Users/yashdatta/Documents/Workspace/txtai/src/python/txtai/pipeline/audio/transcription.py", line 52, in __call__
    speech = self.read(values, rate)
  File "/Users/yashdatta/Documents/Workspace/txtai/src/python/txtai/pipeline/audio/transcription.py", line 76, in read
    raw, samplerate = sf.read(x)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/soundfile.py", line 282, in read
    with SoundFile(file, 'r', samplerate, channels,
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/soundfile.py", line 655, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/Users/yashdatta/.pyenv/versions/3.8.8/lib/python3.8/site-packages/soundfile.py", line 1213, in _open
    raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening '/tmp/txtai/Make_huge_profits.wav': System error.

Any pointers would be very helpful!

@davidmezzetti
Member Author

I typically develop against the lowest supported version of Python, which is currently 3.7.

The GitHub Actions build workflow is another resource to check when encountering errors: https://github.com/neuml/txtai/blob/master/.github/workflows/build.yml. It runs the build on Windows, macOS and Linux with every check-in.

Regarding the errors, it looks like the test data is missing. This can be fixed with:

make data

I updated the CONTRIBUTING guide to make that clear. Thank you for your diligence in getting through the build step!
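The `FileNotFoundError` and `LibsndfileError` tracebacks above both point at files missing from `/tmp/txtai`. A small pre-flight check can catch this before a long test run. This is a minimal sketch: the directory is taken from the tracebacks, the two file names are only a sample of what the data step is expected to provide (not a complete list), and `missing_test_data` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Assumptions: /tmp/txtai is the test data directory (taken from the tracebacks above),
# and these files are a sample of the expected test data, not a complete list.
DATA_DIR = Path("/tmp/txtai")
EXPECTED_FILES = ["books.jpg", "Make_huge_profits.wav"]

def missing_test_data(directory: Path, names: List[str]) -> List[str]:
    """Return the expected file names that are not regular files in directory."""
    return [name for name in names if not (directory / name).is_file()]

missing = missing_test_data(DATA_DIR, EXPECTED_FILES)
if missing:
    print(f"Missing test data in {DATA_DIR}: {', '.join(missing)} (try `make data`)")
else:
    print("Sample test data files are present")
```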

@saucam
Contributor

saucam commented Feb 5, 2023

Thanks! It works perfectly now. Let me try to make some changes.
