Make possible to pip install #2
Comments
Hi Craig. Thanks for developing these wrappers! I'm working to push a number of major updates (lots of changes, but mainly end-to-end retrieval and optimized inference) to the current bare-bones version of the repository soon. This sounds like a helpful component to add there. I will update here once done.
For reference, my code is at https://github.com/cmacdonald/pyterrier_bert/blob/master/pyterrier_bert/pyt_colbert.py. Being able to refer to the colbert package as a dependency would be advantageous.
Hi Craig. FYI, I just released v0.2.0 under a new branch. This is a near-complete rewrite with lots of added functionality and much faster inference, among other things. I added the setup file, although I haven't tested pip install yet. You will notice substantial (but not too massive) changes to the command-line interface. I will update here when I create new instructions for v0.2 (then will merge into master).
Thanks, looking forward to trying it.
I tried to use the v0.2 branch tonight, and didn't succeed. Here are some points:
I currently use only two functions from ColBERT in "v0.1":
In essence:
Thanks, Craig. All good points; I think I can address most of those today (PST). For the (current) dependencies, it seems like conda is important for faiss (end-to-end retrieval) efficiency since it builds the right setup, but maybe there's a way to get pip to match that. With conda, this is a minimal working conda env for v0.2
Is there a reason to prefer pip?
Thanks for the quick reply. I'll take another shot at the dependencies tomorrow. I suspect if you address the last two points:
then my immediate integration will be easier.
For the second point about
If you like, we could wrap these two lines in a function called rerank, but that would be slow-ish since it would keep re-creating the inference manager. One could cache that into the colbert object though. Let me know if you prefer that. The current code happens to be more tuned for command-line usage than internal-library usage, but I see the value of the latter (e.g., in the context of pyterrier), so I'm happy to see what would make things easier on your end. This should be fairly straightforward for "slow" re-ranking (as above). But it should be doable also for indexing, fast re-ranking, and/or end-to-end retrieval.
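The caching idea above could look roughly like the sketch below. Everything in it is hypothetical: `CachedReranker`, `make_toy_inference`, and the term-overlap scorer are stand-ins for illustration, not the ColBERT API. The only point is that the expensive object is built once and reused across `rerank` calls.

```python
class CachedReranker:
    """Hypothetical wrapper that builds an expensive scorer once and reuses it."""

    def __init__(self, make_inference):
        # make_inference stands in for whatever constructs the inference manager.
        self._make_inference = make_inference
        self._inference = None

    def _get_inference(self):
        # Build lazily on first use, then keep the same object around.
        if self._inference is None:
            self._inference = self._make_inference()
        return self._inference

    def rerank(self, query, docs):
        scores = self._get_inference()(query, docs)
        # Return (doc, score) pairs, highest score first.
        return sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)


def make_toy_inference():
    """Toy scorer (term overlap), used only to keep the sketch runnable."""
    def score(query, docs):
        query_terms = set(query.lower().split())
        return [len(query_terms & set(doc.lower().split())) for doc in docs]
    return score


reranker = CachedReranker(make_toy_inference)
ranked = reranker.rerank("neural ranking", ["cooking pasta", "neural ranking models"])
```

Repeated calls to `reranker.rerank(...)` reuse the same scorer, so the construction cost is paid only on the first call.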
Added conda_env.yml.
I don't have reliable experience installing faiss with pip (it works but could be compiled suboptimally for the architecture, not use MKL, etc.), so in general I recommend this way if end-to-end retrieval will be used.
I agree that lightweight logging makes sense for some use cases. Perhaps I can update the code to use these libraries iff they are installed. How critical do you find this?
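A minimal sketch of the "use these libraries iff they are installed" idea, using only the standard library. The helper name `optional_import` and the package name `some_tracking_lib` are placeholders for illustration, not names from the ColBERT code.

```python
import importlib
import importlib.util
import logging


def optional_import(name):
    """Return the top-level module `name` if it is installed, else None."""
    # find_spec returns None for a missing top-level module instead of raising.
    if importlib.util.find_spec(name) is None:
        return None
    return importlib.import_module(name)


# Placeholder package name; substitute the real optional dependency.
tracker = optional_import("some_tracking_lib")
logger = logging.getLogger("colbert_sketch")


def log_metric(key, value):
    """Always log with the stdlib logger; the optional library is used only if present."""
    logger.info("%s=%s", key, value)
    if tracker is not None:
        pass  # a call into the optional library would go here
```

With this pattern, a pip-only install without the heavier logging packages still imports cleanly.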
Thanks for the point about Colab. Python 3.7 dictionaries are ordered by default (this is most relevant for defaultdict objects, since otherwise one could rely on OrderedDict). The v0.2 code isn't tested without that assumption. This only affects loading related code (mostly, if not entirely, evaluation/loaders.py and the downstream code that uses it), though, which isn't too deep into the core code.
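For loader-style code that should not depend on the interpreter's dictionary ordering, one option is to build the mapping with `OrderedDict` and `setdefault` in place of a `defaultdict`. This is a sketch only: `group_in_order` and the (query id, passage id) pairs are made up for illustration and do not mirror evaluation/loaders.py.

```python
from collections import OrderedDict


def group_in_order(pairs):
    """Group values by key, keeping keys in first-seen order on any Python 3."""
    groups = OrderedDict()
    for key, value in pairs:
        # setdefault plays the role of defaultdict(list) here.
        groups.setdefault(key, []).append(value)
    return groups


rankings = group_in_order([("q2", "p7"), ("q1", "p3"), ("q2", "p9")])
```

Here `rankings` keeps `q2` before `q1` because that is the order in which the keys first appeared.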
Addressed separately above.
Extracted
Just because it's a familiar method for installing library dependencies.
The CPython implementation of Python 3.6 retains dictionary order; it's just not guaranteed across all Python implementations: https://stackoverflow.com/a/39980744.
I then got:
nullcontext only exists in Python 3.7 and later - I used a workaround from https://stackoverflow.com/a/45187287. You can see my fork. Finally, I wasn't able to load an existing ColBERT model. Is this expected?
Might this be because our models were trained with older huggingface versions?
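A nullcontext fallback for Python 3.6 can be written in a few lines. The sketch below is one common pattern and is not necessarily the exact code from the linked answer or the fork.

```python
try:
    from contextlib import nullcontext  # available from Python 3.7
except ImportError:
    from contextlib import contextmanager

    @contextmanager
    def nullcontext(enter_result=None):
        # Do nothing on enter or exit; just hand back the given value.
        yield enter_result


def run(step, context=None):
    """Run `step` inside `context` if one is given, else inside a no-op context."""
    with (context if context is not None else nullcontext()):
        return step()
```

This keeps the calling code identical whether or not a real context manager (for example, a mixed-precision context) is supplied.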
Thanks! Will address each. For the last item, it should work with transformers 3.0.2 as in the env. They introduced a breaking change in 3.1.
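Until the version pin is enforced at install time, a wrapper could fail early with a clear message when the installed version is too new. The sketch below is an assumption-laden illustration: `is_supported_version` is a hypothetical helper, and the 3.1 cutoff is taken from the comment above, not verified independently.

```python
import re


def is_supported_version(version, first_unsupported=(3, 1, 0)):
    """Return True if `version` (e.g. "3.0.2") is below `first_unsupported`."""
    # Keep only the first three numeric components, so "3.1.0rc1" -> (3, 1, 0).
    parts = tuple(int(p) for p in re.findall(r"\d+", version)[:3])
    return parts < first_unsupported


# Typical use (not executed here, to avoid importing transformers):
#   import transformers
#   if not is_supported_version(transformers.__version__):
#       raise RuntimeError("expected transformers < 3.1, e.g. 3.0.2")
```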
Ah, yes, thanks. Finally, the input text is type checked for list or tuple, while I gave a numpy array. I simply used .tolist(), but it seemed unnecessary? OK, pegging to 3.0.2 and using slow_rerank works for me. Thanks for your timely help!
I'm not sure about a numpy array of text. It could work, possibly. These assertions guard against string inputs (at least) because the downstream code wouldn't complain about them; it would just produce weird outcomes. By the way, be sure you select args.mask_punctuation = True (default happens to be false in v0.2, just because it's a flag --mask-punctuation, but I'll try to have a negative flag instead in the next version as the intention is it should be on unless someone wants to disable it).
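One way to accept array-like inputs while still rejecting a bare string is sketched below. `as_text_list` is a hypothetical helper, not part of ColBERT; it relies on duck typing (`tolist`) so it does not need to import numpy.

```python
def as_text_list(texts):
    """Coerce a list, tuple, or array-like of strings to a plain list of str."""
    # A bare string is iterable, so it would otherwise be split into characters.
    if isinstance(texts, (str, bytes)):
        raise TypeError("expected a sequence of strings, got a single string")
    # numpy arrays (and similar containers) expose tolist().
    if hasattr(texts, "tolist"):
        texts = texts.tolist()
    texts = list(texts)
    if not all(isinstance(t, str) for t in texts):
        raise TypeError("every item must be a str")
    return texts


docs = as_text_list(("first passage", "second passage"))
```

The string check comes first because a plain `list("abc")` would succeed silently and produce exactly the kind of weird downstream outcome described above.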
Got it - https://github.com/cmacdonald/pyterrier_bert/blob/colbert0.2/pyterrier_bert/pyt_colbert.py#L40 PR sent for ColBERT.
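The negative-flag idea for mask_punctuation could be expressed with argparse as below. This is a sketch under assumptions: the flag name `--no-mask-punctuation` is made up here and may differ from whatever a later ColBERT version adopts.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="negative-flag sketch")
    # store_false makes the option default to True and flip to False when passed.
    parser.add_argument(
        "--no-mask-punctuation",
        dest="mask_punctuation",
        action="store_false",
        help="disable punctuation masking (on by default)",
    )
    return parser


default_args = build_parser().parse_args([])
disabled_args = build_parser().parse_args(["--no-mask-punctuation"])
```

With this shape, library callers who build `args` by hand get masking enabled unless they explicitly turn it off.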
fixed assert in slow_rerank
Hello, thanks for your repository and SIGIR paper. We would like to develop wrappers on top of ColBERT. Would it be possible to make the repo compatible with pip? This would need: