adding first pass at Dockerfile #12
base: main
Dockerfile
Outdated
@@ -0,0 +1,18 @@
FROM ubuntu:latest
Any reason to use an ubuntu image rather than python:3-alpine, so you don't have to install python and the image can be smaller?
The function pthread_attr_setaffinity_np is not available on Alpine Linux, and PyTorch needs it. I am not sure if there is a slimmer distro than Ubuntu that works. Open to trying something else if you have any suggestions.
I'd prefer we use ubuntu and maybe even package the model weights in the dockerfile (you're going to have to download it anyway, and we pin the revision in docquery).
When developing locally, I mount my machine's Hugging Face cache. But for a general, easy-to-use Docker container, I think packaging the weights would simplify things and be respectful of Hugging Face's resources.
Do we want to build from source then? Or should I just pip install docquery?
I think it depends on how we want to publish it. We should probably get a better release cadence going and pin the Dockerfile to whatever the latest release is. I think the way to do that would be to somehow provide the version as a parameter to the Dockerfile and populate the parameter by looking at https://github.com/impira/docquery/blob/main/src/docquery/version.py.
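A rough sketch of how that parameter could be populated, assuming version.py holds a line of the form VERSION = "x.y.z" and that the Dockerfile declares a build argument. The names read_version, build_image, and DOCQUERY_VERSION are hypothetical and not taken from the repository:

```shell
#!/bin/sh
# Sketch only: derive the release version from version.py and hand it to
# docker build as a build argument.
# ASSUMPTION: version.py contains a line like: VERSION = "0.0.2"
# ASSUMPTION: the Dockerfile declares `ARG DOCQUERY_VERSION` (hypothetical name).

read_version() {
    # Print the quoted value from the first line that starts with VERSION.
    sed -n 's/^VERSION[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

build_image() {
    # $1 is the path to version.py; the build context is the current directory.
    version="$(read_version "$1")"
    if [ -z "$version" ]; then
        echo "could not find VERSION in $1" >&2
        return 1
    fi
    docker build \
        --build-arg "DOCQUERY_VERSION=$version" \
        --tag "docquery:$version" \
        .
}
```

Calling build_image src/docquery/version.py from the repository root would then tag the image with whatever version the file reports; the Dockerfile could use the argument to pin the pip install.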
docker run IMAGE_NAME docquery scan "What is the invoice number?" https://templates.invoicehome.com/invoice-template-us-neat-750px.png
Seems like a really nice way to use the tool.
Dockerfile
Outdated
FROM ubuntu:latest

RUN apt-get update \
&& apt-get install -y python3-pip python3-dev \
does this need to be specific about which python3 version to use? I think some of the dependencies here require > 3.6.
RUN apt-get update && apt-get install -y python3.9 python3.9-dev python3-pip
I switched to python:3.10-slim-bullseye. So we will be using 3.10.
Ideally 3.10
@amazingvince how does the scan command work from the docker container? Specifically, if you run something like
Yeah but what happens if you point to a file on your local filesystem?
@ankrgyl It would be something like
Oh cool, that looks good. I think as long as we document it that works. This is very exciting, and I think we are in the homestretch.
To land this we need to sort out a few more things:
- Add a command to make, like make docker, that builds the Dockerfile (with the appropriate container/tag names)
- Add documentation to the README that shows how to run the scan command and how to add local files
- Add something to the tests (maybe a sanity test that the container can be built, or a mode that builds the container and then runs the tests inside of it)
I've also filed #27 and #28, which we can address in follow-ups.
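For the README item, one way local files could be exposed to the container is a read-only bind mount. This is only a sketch under that assumption: the /data mount point and the helper names container_path and scan_local are hypothetical, and IMAGE_NAME is the same placeholder used earlier in this thread:

```shell
#!/bin/sh
# Sketch only: run `docquery scan` on a local file by bind-mounting its
# directory into the container.
# ASSUMPTION: /data is an arbitrary mount point chosen for illustration.
# ASSUMPTION: IMAGE_NAME stands in for whatever tag the image is built with.

container_path() {
    # Map a host file path to the path it will have inside the container.
    printf '/data/%s\n' "$(basename "$1")"
}

scan_local() {
    # $1 is the local file, $2 is the question to ask about it.
    file="$1"
    question="$2"
    # docker requires an absolute host path for bind mounts.
    dir="$(cd "$(dirname "$file")" && pwd)"
    docker run --rm \
        -v "$dir:/data:ro" \
        IMAGE_NAME docquery scan "$question" "$(container_path "$file")"
}
```

With this, scan_local ./invoices/invoice.png "What is the invoice number?" would mount ./invoices read-only and point the scan at /data/invoice.png inside the container.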
COPY ["src/", "./src"]

RUN pip install .
CMD ["python3"]
Can we make the default entrypoint python3 -m docquery.cmd (or just docquery)?
COPY ["README.md", "pyproject.toml", "setup.py", "./"]
COPY ["src/", "./src"]

RUN pip install .
I think we should either pip install .[all] or a select list of extensions, e.g. [donut] (I'm currently working on adding [web], which will contain extras for web scraping).
We should also manually install transformers (the same version that's suggested in the README).