
Add topk functionality to qa pipeline #480

Closed · wants to merge 3 commits
Conversation


@rcali21 rcali21 commented May 30, 2023

This is my first attempt. I'm having some trouble representing the data in a DataFrame as shown in #406. I also noticed that code was borrowed from https://github.com/neuml/txtai/blob/master/examples/20_Extractive_QA_to_build_structured_data.ipynb, so I assume this first attempt will need some work if that is the intended output.


rcali21 commented May 30, 2023

Additionally, as the issue in #406 discusses, the query needed to be changed to get Ibuprofen as a returned answer:

```python
context = ["The doctor administered a 7g dose of Acetarsol and a 16mg dose of Ibuprofen",
           "The patient took one Alosetron tablet"]

queries = ["What is the specific drug taken?", "What is the dose?"]
```

versus:

```python
context = ["The doctor administered a 7g dose of Acetarsol and a 16mg dose of Ibuprofen",
           "The patient took one Alosetron tablet"]

queries = ["What is the drug taken?", "What is the dose?"]
```

davidmezzetti (Member) commented

Thanks for taking on this issue! A couple of initial things:

  1. Did you run `make data && make coverage` locally? I think we'll need a new unit test to keep test coverage up.
  2. The coding style for txtai veers slightly from standard Python. For example, `top_k` would be `topk`, `answers_list` would be `answerslist`, and so forth.
  3. To be consistent with other pipelines, `topk` would be an optional argument to the `__call__` method instead of an `__init__` argument.
  4. Is it intended for this to work in a backwards compatible way? The hope is yes.
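The suggestion in points 3 and 4 above — `topk` as an optional `__call__` argument that preserves the existing single-answer behavior — can be sketched as follows. This is a minimal standalone sketch, not the actual txtai implementation: the stubbed candidate list stands in for the underlying QA model, and the return shapes are assumptions.

```python
# Minimal sketch: topk as an optional __call__ argument with a
# backwards compatible default. The stubbed candidates below stand in
# for the scored answers a real extractive QA model would produce.

class Questions:
    def __init__(self):
        # Candidate (answer, score) pairs per question, ordered best-first
        self.candidates = {
            "What is the drug taken?": [("Ibuprofen", 0.9), ("Acetarsol", 0.7)]
        }

    def __call__(self, questions, contexts, topk=1):
        results = []
        for question in questions:
            answers = self.candidates.get(question, [])[:topk]
            if topk == 1:
                # Backwards compatible: a single answer string per question
                results.append(answers[0][0] if answers else None)
            else:
                # New behavior: a list of (answer, score) pairs per question
                results.append(answers)
        return results

qa = Questions()
print(qa(["What is the drug taken?"], ["..."]))          # ['Ibuprofen']
print(qa(["What is the drug taken?"], ["..."], topk=2))  # [[('Ibuprofen', 0.9), ('Acetarsol', 0.7)]]
```

Because the default is `topk=1`, existing callers keep seeing the same single-answer output, which is what makes the change backwards compatible.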


rcali21 commented Jun 2, 2023

> Thanks for taking on this issue! A couple of initial things:
>
>   1. Did you run `make data && make coverage` locally? I think we'll need a new unit test to keep test coverage up.
>   2. The coding style for txtai veers slightly from standard Python. For example, `top_k` would be `topk`, `answers_list` would be `answerslist`, and so forth.
>   3. To be consistent with other pipelines, `topk` would be an optional argument to the `__call__` method instead of an `__init__` argument.
>   4. Is it intended for this to work in a backwards compatible way? The hope is yes.

Hi David,

I ran both of those a few weeks back when I initially cloned the repo. I just ran again to be certain, and `make data` returned:

```
(txtai_work) ➜ txtai git:(add_topk) ✗ make data
mkdir -p /tmp/txtai
wget -N https://github.com/neuml/txtai/releases/download/v3.5.0/tests.tar.gz -P /tmp
make: wget: No such file or directory
make: *** [data] Error 1
```

However, `make coverage` runs with some errors.

  1. Noted.
  2. Noted.
  3. Can you clarify in what way? Thanks.

davidmezzetti (Member) commented

Hi Ryan,

I just checked in a change that should show a better message when wget is not found. You'll need to install that package dependency at the OS level.

In terms of backwards compatibility, if the tests run without errors, then we should be good to go.


rcali21 commented Jun 7, 2023

> Hi Ryan,
>
> I just checked in a change that should show a better message when wget is not found. You'll need to install that package dependency at the OS level.
>
> In terms of backwards compatibility, if the tests run without errors, then we should be good to go.

I just ran `make data`, which now works, but `make coverage` now returns a segfault.

```
coverage run -m unittest discover -v -s ./test/python
testAnnoy (testann.TestANN)
Test Annoy backend ... ERROR
testAnnoyCustom (testann.TestANN)
Test Annoy backend with custom settings ... ERROR
testCustomBackend (testann.TestANN)
Test resolving a custom backend ...
make: *** [coverage] Segmentation fault: 11
```

Was this what you were referring to before, or was that a different case?

davidmezzetti (Member) commented

It appears you're running into this macOS issue: #377

I've long wanted to solve it, but I don't develop on macOS. From other discussions, it appears to be some sort of library conflict (see #300 (comment)).

Perhaps you could try reinstalling or reconfiguring the OpenMP library to see if that helps? What versions of Python and PyTorch are you using locally?

davidmezzetti (Member) commented

Hi Ryan,

With the summer coming to a close, I assume you didn't get time to come back around on this. Where do you think this stands? Is what you submitted complete, or should I close this?

davidmezzetti (Member) commented

Closing this due to inactivity. Feel free to re-open should there be future interest in completing this work.
