Streamline the build and use of the Python image #284

Merged
merged 3 commits into from
May 25, 2024

Conversation

sergiolaverde0
Contributor

The first commit performs a series of very small refactors that I think make sense, such as moving the non configurable settings from the composefile to the image itself, and using a python base image instead of an ubuntu one just to install python on it immediately.

The second commit "fixes" a bug I encountered when testing the import book functionality where sortedItems would not be defined, most likely because sortMethod was neither of the options checked, which raised an exception when trying to access it later.

I made it so that it gets the default behaviour whenever the "spine" option is not selected, and isolated it on its own commit so it can be easily left out if you'd rather address the issue by different means.
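
A minimal sketch of the fallback described above. The function name, the tuple layout of `items`, and the meaning of "default" (keep the original order) are assumptions for illustration, not the project's actual code:

```python
def sort_items(items, spine_ids, sort_method="default"):
    """Return book items in reading order.

    `items` is a list of (item_id, content) tuples and `spine_ids` is the
    order listed in the book's spine. Both are hypothetical stand-ins.
    """
    if sort_method == "spine":
        position = {item_id: index for index, item_id in enumerate(spine_ids)}
        # Items that are missing from the spine are placed at the end.
        sorted_items = sorted(
            items, key=lambda item: position.get(item[0], len(position))
        )
    else:
        # Any other value, including unexpected ones, falls back to the
        # default behaviour, so sorted_items is always defined.
        sorted_items = list(items)
    return sorted_items


items = [("ch2", "two"), ("ch1", "one")]
by_spine = sort_items(items, ["ch1", "ch2"], "spine")
fallback = sort_items(items, ["ch1", "ch2"], "something-else")
```

Using a plain `else` instead of a second `elif` is what guarantees the variable exists for every value of `sort_method`.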

@sergiolaverde0 sergiolaverde0 marked this pull request as ready for review May 25, 2024 01:49
@simjanos-dev
Owner

The second commit "fixes" a bug I encountered when testing the import book functionality where sortedItems would not be defined, most likely because sortMethod was neither of the options checked, which raised an exception when trying to access it later.

That's weird. I'll check this on the other side of the program, this should not happen.

Can you please apply these changes to the dev dockerfile as well?

Thank you so much!

@mateuszmrw
Contributor

Hi,

I think we could improve the Docker further:

  • Since we use python:slim, I am not sure that ENV PYTHONPATH="/var/www/html/storage/app/model" is needed. I tried it locally without it and it seems fine. I am not experienced enough with Python to say it is completely unnecessary, but it is something to consider, since the Python image already sets the Python path.

  • We could migrate to using the requirements.txt inside the tools folder instead of installing the dependencies inside the Docker image. If a dependency changes and imports start failing, it could impact users and would be hard to fix or debug without creating a new build. Using requirements.txt would mitigate this risk.

  • We should copy the Tools inside the Docker image to the ./app directory directly instead of mounting it in docker compose. That would allow us to delete both mounted volumes inside docker-compose.yaml

@sergiolaverde0
Contributor Author

Can you please apply these changes to the dev dockerfile as well?

I did and this time I remembered to apply them to the other composefiles as well.

Since we use the python:slim I am not sure that ENV PYTHONPATH="/var/www/html/storage/app/model" is needed.

I'd rather leave it explicit for better readability. Some time ago simjanos mentioned a valid concern that Docker was becoming a bit of a black box to him, where he doesn't know what is going on inside. That line will remind him where inside the container the models for extra languages are installed, should he ever forget.

If a dependency changes and imports start failing, it could impact users and would be hard to fix or debug without creating a new build. Using requirements.txt would mitigate this risk.

Would this be actually helpful given that we don't pin the package version when installing? Having to bump them manually adds a bit of maintenance burden that this project cannot afford (at least not right now).

We should copy the Tools inside the Docker image to the ./app directory directly instead of mounting it in docker compose.

That is already done for the image that is deployed to the end user. You are probably looking at the composefile that simjanos uses for development, where it is mounted to allow for hot reloading any changes made without having to rebuild the image.

@mateuszmrw
Contributor

Yeah, I was looking at the wrong Docker image; now it makes sense to me.

I think the requirements.txt is mostly needed to freeze the deps. I get that it adds a maintenance burden, but not pinning could be risky if something goes wrong or a change affects the tokenizer code. But I think that is mostly up to @simjanos-dev and his preferences and needs at the time. If it works well now, it's something that could be revisited in the future.
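
To illustrate what "freezing" means here, a small sketch that turns a list of package names into pinned `name==version` lines using the standard library's `importlib.metadata`. The helper name and its `get_version` parameter are hypothetical, and the package list would have to be supplied by the project:

```python
from importlib import metadata


def pinned_requirements(package_names, get_version=metadata.version):
    """Build requirements.txt lines pinned to the installed versions."""
    lines = []
    for name in sorted(package_names, key=str.lower):
        try:
            lines.append(f"{name}=={get_version(name)}")
        except metadata.PackageNotFoundError:
            # Keep a visible note instead of silently dropping the package.
            lines.append(f"# {name} is not installed")
    return lines


# Example: pin whatever version of pip is installed in this environment.
print("\n".join(pinned_requirements(["pip"])))
```

Running something like this once inside a known-good image would capture the versions that currently work, at the cost of having to bump them by hand later, which is the maintenance burden discussed above.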

@simjanos-dev
Owner

Can you please resolve the conflicts? I'll merge this in after that.

I am not familiar with github and open source processes. If I get a PR and it gets a conflict, should I resolve it, or should I ask for it to be resolved? I do not want to push work to people that would be expected from me.

This commit switches the base image from ubuntu to python:slim as it
makes for a more sensible default that simplifies the build process
without compromising the final size.

It also moves the setup of the environment variable for the PYTHONPATH
and the command to run the main process inside the container to the
image itself, where they belong as they are not configurable and any
change would break the service.

This commit fixes `sortedItems` not being defined if `sortMethod` was
neither "default" nor "spine" which raised an exception, by having it
take the default value whenever `sortMethod` is different from "spine".

I'm uncertain of when or what introduced this bug.
@sergiolaverde0
Contributor Author

I don't think you will find a single answer to that, but I personally don't mind resolving the conflicts. However, I might mess up while doing so.

@simjanos-dev
Owner

Thank you so much!

however I might mess up while doing so

That's what I'm afraid of when I try to resolve conflicts too. But probably I should get more comfortable with it.

tzdata \

Will this part also be installed with the new version? I don't remember what, but I remember something needed it installed.

@simjanos-dev simjanos-dev merged commit fcaaad8 into simjanos-dev:dev May 25, 2024
@sergiolaverde0
Contributor Author

Yes, tzdata is installed by default on the image. It was Thai that needed it, never really understood why.

@simjanos-dev
Owner

simjanos-dev commented May 30, 2024

Although I merged this into dev, I didn't have the time to test it, and I have not rebuilt my image yet. @mateuszmrw wrote this to me on Discord. I will look into it, but in the meantime I thought I'd copy it to you too; maybe you know what's wrong just by looking at it.


I have this error after running:

docker compose -f docker-compose-dev-macos.yml up --build

I added --build to build a fresh container.
I have this error

ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'

I attached the whole log in the message.txt file:

linguacafe-python-service-dev  | Traceback (most recent call last):
linguacafe-python-service-dev  |   File "/var/www/html/storage/app/model/numpy/core/__init__.py", line 24, in <module>
linguacafe-python-service-dev  |     from . import multiarray
linguacafe-python-service-dev  |   File "/var/www/html/storage/app/model/numpy/core/multiarray.py", line 10, in <module>
linguacafe-python-service-dev  |     from . import overrides
linguacafe-python-service-dev  |   File "/var/www/html/storage/app/model/numpy/core/overrides.py", line 8, in <module>
linguacafe-python-service-dev  |     from numpy.core._multiarray_umath import (
linguacafe-python-service-dev  | ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
linguacafe-python-service-dev  | 
linguacafe-python-service-dev  | During handling of the above exception, another exception occurred:
linguacafe-python-service-dev  | 
linguacafe-python-service-dev  | Traceback (most recent call last):
linguacafe-python-service-dev  |   File "/var/www/html/storage/app/model/numpy/__init__.py", line 130, in <module>
linguacafe-python-service-dev  |     from numpy.__config__ import show as show_config
linguacafe-python-service-dev  |   File "/var/www/html/storage/app/model/numpy/__config__.py", line 4, in <module>
linguacafe-python-service-dev  |     from numpy.core._multiarray_umath import (
linguacafe-python-service-dev  |   File "/var/www/html/storage/app/model/numpy/core/__init__.py", line 50, in <module>
linguacafe-python-service-dev  |     raise ImportError(msg)
linguacafe-python-service-dev  | ImportError: 
linguacafe-python-service-dev  | 
linguacafe-python-service-dev  | IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
linguacafe-python-service-dev  | 
linguacafe-python-service-dev  | Importing the numpy C-extensions failed. This error can happen for
linguacafe-python-service-dev  | many reasons, often due to issues with your setup or how NumPy was
linguacafe-python-service-dev  | installed.
linguacafe-python-service-dev  | 
linguacafe-python-service-dev  | We have compiled some common reasons and troubleshooting tips at:
linguacafe-python-service-dev  | 
linguacafe-python-service-dev  |     https://numpy.org/devdocs/user/troubleshooting-importerror.html
linguacafe-python-service-dev  | 
linguacafe-python-service-dev  | Please note and check the following:
linguacafe-python-service-dev  | 
linguacafe-python-service-dev  |   * The Python version is: Python3.12 from "/usr/local/bin/python"
linguacafe-python-service-dev  |   * The NumPy version is: "1.26.4"
linguacafe-python-service-dev  | 
linguacafe-python-service-dev  | and make sure that they are the versions you expect.
linguacafe-python-service-dev  | Please carefully study the documentation linked above for further help.
linguacafe-python-service-dev  | 
linguacafe-python-service-dev  | Original error was: No module named 'numpy.core._multiarray_umath'
linguacafe-python-service-dev  | 
linguacafe-python-service-dev  | 
linguacafe-python-service-dev  | The above exception was the direct cause of the following exception:
linguacafe-python-service-dev  | 
linguacafe-python-service-dev  | Traceback (most recent call last):
linguacafe-python-service-dev  |   File "/app/tokenizer.py", line 3, in <module>
linguacafe-python-service-dev  |     from spacy.language import Language
linguacafe-python-service-dev  |   File "/var/www/html/storage/app/model/spacy/__init__.py", line 6, in <module>
linguacafe-python-service-dev  |     from .errors import setup_default_warnings
linguacafe-python-service-dev  |   File "/var/www/html/storage/app/model/spacy/errors.py", line 3, in <module>
linguacafe-python-service-dev  |     from .compat import Literal
linguacafe-python-service-dev  |   File "/var/www/html/storage/app/model/spacy/compat.py", line 4, in <module>
linguacafe-python-service-dev  |     from thinc.util import copy_array
linguacafe-python-service-dev  |   File "/var/www/html/storage/app/model/thinc/__init__.py", line 2, in <module>
linguacafe-python-service-dev  |     import numpy
linguacafe-python-service-dev  |   File "/var/www/html/storage/app/model/numpy/__init__.py", line 135, in <module>
linguacafe-python-service-dev  |     raise ImportError(msg) from e
linguacafe-python-service-dev  | ImportError: Error importing numpy: you should not try to import numpy from
linguacafe-python-service-dev  |         its source directory; please exit the numpy source tree, and relaunch
linguacafe-python-service-dev  |         your python interpreter from there.

@mateuszmrw
Contributor

Removing this line from Python Docker image fixes it:

ENV PYTHONPATH="/var/www/html/storage/app/model"

@sergiolaverde0
Contributor Author

I think removing it would break the extra languages, does it? If it does, try moving it after everything is installed, right before the CMD command.

I will check it more in a couple of hours.

@sergiolaverde0
Contributor Author

Could not reproduce: the image built fine from scratch (no cached layers) and I was able to install Japanese and import text in both it and Spanish. I see the issue is related to the macOS compose file, so it might stem from trying to build an AMD64 image on an ARM system, or at least that's my best guess.

@simjanos-dev
Owner

I got the same error after running these 3 commands:

docker builder prune
docker compose -f ./docker-compose-dev.yml build --no-cache
docker compose -f ./docker-compose-dev.yml up -d --force-recreate

I'm going to try this again with the change you suggested. It will take ~30 minutes; I'll report back:

FROM python:slim

WORKDIR /app

RUN addgroup --gid 1000 laravel \
    && adduser --ingroup laravel --disabled-password --gecos "" --shell /bin/sh laravel
USER laravel

RUN pip install -U --no-cache-dir \
        setuptools \
        wheel \
        lxml[html_clean] \
#youtube api
        youtube_transcript_api \
#ebook library
        ebooklib \
#kanji reading
        pykakasi \
#bottle
        bottle \
#spacy
        spacy \
#chinese reading
        pinyin \
#subtitle file parser
        pysub-parser \
#website text parser
        newspaper3k

RUN python3 -m spacy download de_core_news_sm \
    && python3 -m spacy download nb_core_news_sm \
    && python3 -m spacy download es_core_news_sm \
    && python3 -m spacy download nl_core_news_sm \
    && python3 -m spacy download fi_core_news_sm \
    && python3 -m spacy download fr_core_news_sm \
    && python3 -m spacy download it_core_news_sm \
    && python3 -m spacy download sv_core_news_sm \
    && python3 -m spacy download en_core_web_sm \
    && python3 -m spacy download el_core_news_sm \
    && python3 -m spacy download ca_core_news_sm \
    && python3 -m spacy download hr_core_news_sm \
    && python3 -m spacy download da_core_news_sm \
    && python3 -m spacy download lt_core_news_sm \
    && python3 -m spacy download mk_core_news_sm \
    && python3 -m spacy download pl_core_news_sm \
    && python3 -m spacy download pt_core_news_sm \
    && python3 -m spacy download ro_core_news_sm \
    && python3 -m spacy download sl_core_news_sm \
    && python3 -m spacy download xx_ent_wiki_sm


CMD [ "python", "/app/tokenizer.py" ]
ENV PYTHONPATH="/var/www/html/storage/app/model"

@simjanos-dev
Owner

Same.

2024-06-02 19:52:10 Traceback (most recent call last):
2024-06-02 19:52:10   File "/var/www/html/storage/app/model/numpy/core/__init__.py", line 24, in <module>
2024-06-02 19:52:10     from . import multiarray
2024-06-02 19:52:10   File "/var/www/html/storage/app/model/numpy/core/multiarray.py", line 10, in <module>
2024-06-02 19:52:10     from . import overrides
2024-06-02 19:52:10   File "/var/www/html/storage/app/model/numpy/core/overrides.py", line 8, in <module>
2024-06-02 19:52:10     from numpy.core._multiarray_umath import (
2024-06-02 19:52:10 ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
2024-06-02 19:52:10 
2024-06-02 19:52:10 During handling of the above exception, another exception occurred:
2024-06-02 19:52:10 
2024-06-02 19:52:10 Traceback (most recent call last):
2024-06-02 19:52:10   File "/var/www/html/storage/app/model/numpy/__init__.py", line 130, in <module>
2024-06-02 19:52:10     from numpy.__config__ import show as show_config
2024-06-02 19:52:10   File "/var/www/html/storage/app/model/numpy/__config__.py", line 4, in <module>
2024-06-02 19:52:10     from numpy.core._multiarray_umath import (
2024-06-02 19:52:10   File "/var/www/html/storage/app/model/numpy/core/__init__.py", line 50, in <module>
2024-06-02 19:52:10     raise ImportError(msg)
2024-06-02 19:52:10 ImportError: 
2024-06-02 19:52:10 
2024-06-02 19:52:10 IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
2024-06-02 19:52:10 
2024-06-02 19:52:10 Importing the numpy C-extensions failed. This error can happen for
2024-06-02 19:52:10 many reasons, often due to issues with your setup or how NumPy was
2024-06-02 19:52:10 installed.
2024-06-02 19:52:10 
2024-06-02 19:52:10 We have compiled some common reasons and troubleshooting tips at:
2024-06-02 19:52:10 
2024-06-02 19:52:10     https://numpy.org/devdocs/user/troubleshooting-importerror.html
2024-06-02 19:52:10 
2024-06-02 19:52:10 Please note and check the following:
2024-06-02 19:52:10 
2024-06-02 19:52:10   * The Python version is: Python3.12 from "/usr/local/bin/python"
2024-06-02 19:52:10   * The NumPy version is: "1.26.4"
2024-06-02 19:52:10 
2024-06-02 19:52:10 and make sure that they are the versions you expect.
2024-06-02 19:52:10 Please carefully study the documentation linked above for further help.
2024-06-02 19:52:10 
2024-06-02 19:52:10 Original error was: No module named 'numpy.core._multiarray_umath'
2024-06-02 19:52:10 
2024-06-02 19:52:10 
2024-06-02 19:52:10 The above exception was the direct cause of the following exception:
2024-06-02 19:52:10 
2024-06-02 19:52:10 Traceback (most recent call last):
2024-06-02 19:52:10   File "/app/tokenizer.py", line 3, in <module>
2024-06-02 19:52:10     from spacy.language import Language
2024-06-02 19:52:10   File "/var/www/html/storage/app/model/spacy/__init__.py", line 6, in <module>
2024-06-02 19:52:10     from .errors import setup_default_warnings
2024-06-02 19:52:10   File "/var/www/html/storage/app/model/spacy/errors.py", line 3, in <module>
2024-06-02 19:52:10     from .compat import Literal
2024-06-02 19:52:10   File "/var/www/html/storage/app/model/spacy/compat.py", line 4, in <module>
2024-06-02 19:52:10     from thinc.util import copy_array
2024-06-02 19:52:10   File "/var/www/html/storage/app/model/thinc/__init__.py", line 2, in <module>
2024-06-02 19:52:10     import numpy
2024-06-02 19:52:10   File "/var/www/html/storage/app/model/numpy/__init__.py", line 135, in <module>
2024-06-02 19:52:10     raise ImportError(msg) from e
2024-06-02 19:52:10 ImportError: Error importing numpy: you should not try to import numpy from
2024-06-02 19:52:10         its source directory; please exit the numpy source tree, and relaunch
2024-06-02 19:52:10         your python interpreter from there.

@simjanos-dev
Owner

Tried adding numpy to the pip install chain. I got the same error.

2024-06-02 20:06:02 Traceback (most recent call last):
2024-06-02 20:06:02   File "/var/www/html/storage/app/model/numpy/core/__init__.py", line 24, in <module>
2024-06-02 20:06:02     from . import multiarray
2024-06-02 20:06:02   File "/var/www/html/storage/app/model/numpy/core/multiarray.py", line 10, in <module>
2024-06-02 20:06:02     from . import overrides
2024-06-02 20:06:02   File "/var/www/html/storage/app/model/numpy/core/overrides.py", line 8, in <module>
2024-06-02 20:06:02     from numpy.core._multiarray_umath import (
2024-06-02 20:06:02 ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
2024-06-02 20:06:02 
2024-06-02 20:06:02 During handling of the above exception, another exception occurred:
2024-06-02 20:06:02 
2024-06-02 20:06:02 Traceback (most recent call last):
2024-06-02 20:06:02   File "/var/www/html/storage/app/model/numpy/__init__.py", line 130, in <module>
2024-06-02 20:06:02     from numpy.__config__ import show as show_config
2024-06-02 20:06:02   File "/var/www/html/storage/app/model/numpy/__config__.py", line 4, in <module>
2024-06-02 20:06:02     from numpy.core._multiarray_umath import (
2024-06-02 20:06:02   File "/var/www/html/storage/app/model/numpy/core/__init__.py", line 50, in <module>
2024-06-02 20:06:02     raise ImportError(msg)
2024-06-02 20:06:02 ImportError: 
2024-06-02 20:06:02 
2024-06-02 20:06:02 IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
2024-06-02 20:06:02 
2024-06-02 20:06:02 Importing the numpy C-extensions failed. This error can happen for
2024-06-02 20:06:02 many reasons, often due to issues with your setup or how NumPy was
2024-06-02 20:06:02 installed.
2024-06-02 20:06:02 
2024-06-02 20:06:02 We have compiled some common reasons and troubleshooting tips at:
2024-06-02 20:06:02 
2024-06-02 20:06:02     https://numpy.org/devdocs/user/troubleshooting-importerror.html
2024-06-02 20:06:02 
2024-06-02 20:06:02 Please note and check the following:
2024-06-02 20:06:02 
2024-06-02 20:06:02   * The Python version is: Python3.12 from "/usr/local/bin/python"
2024-06-02 20:06:02   * The NumPy version is: "1.26.4"
2024-06-02 20:06:02 
2024-06-02 20:06:02 and make sure that they are the versions you expect.
2024-06-02 20:06:02 Please carefully study the documentation linked above for further help.
2024-06-02 20:06:02 
2024-06-02 20:06:02 Original error was: No module named 'numpy.core._multiarray_umath'
2024-06-02 20:06:02 
2024-06-02 20:06:02 
2024-06-02 20:06:02 The above exception was the direct cause of the following exception:
2024-06-02 20:06:02 
2024-06-02 20:06:02 Traceback (most recent call last):
2024-06-02 20:06:02   File "/app/tokenizer.py", line 3, in <module>
2024-06-02 20:06:02     from spacy.language import Language
2024-06-02 20:06:02   File "/var/www/html/storage/app/model/spacy/__init__.py", line 6, in <module>
2024-06-02 20:06:02     from .errors import setup_default_warnings
2024-06-02 20:06:02   File "/var/www/html/storage/app/model/spacy/errors.py", line 3, in <module>
2024-06-02 20:06:02     from .compat import Literal
2024-06-02 20:06:02   File "/var/www/html/storage/app/model/spacy/compat.py", line 4, in <module>
2024-06-02 20:06:02     from thinc.util import copy_array
2024-06-02 20:06:02   File "/var/www/html/storage/app/model/thinc/__init__.py", line 2, in <module>
2024-06-02 20:06:02     import numpy
2024-06-02 20:06:02   File "/var/www/html/storage/app/model/numpy/__init__.py", line 135, in <module>
2024-06-02 20:06:02     raise ImportError(msg) from e
2024-06-02 20:06:02 ImportError: Error importing numpy: you should not try to import numpy from
2024-06-02 20:06:02         its source directory; please exit the numpy source tree, and relaunch
2024-06-02 20:06:02         your python interpreter from there.

@mateuszmrw
Contributor

mateuszmrw commented Jun 2, 2024

Running the same steps just for mac-os:

docker builder prune
docker compose -f ./docker-compose-dev-macos.yml build --no-cache
docker compose -f ./docker-compose-dev-macos.yml up -d --force-recreate

With this image:

FROM python:slim

WORKDIR /app

RUN addgroup --gid 1000 laravel \
    && adduser --ingroup laravel --disabled-password --gecos "" --shell /bin/sh laravel
USER laravel

RUN pip install -U --no-cache-dir \
        setuptools \
        wheel \
        lxml[html_clean] \
#youtube api
        youtube_transcript_api \
#ebook library
        ebooklib \
#kanji reading
        pykakasi \
#bottle
        bottle \
#spacy
        spacy \
#chinese reading
        pinyin \
#subtitle file parser
        pysub-parser \
#website text parser
        newspaper3k

RUN python3 -m spacy download de_core_news_sm \
    && python3 -m spacy download nb_core_news_sm \
    && python3 -m spacy download es_core_news_sm \
    && python3 -m spacy download nl_core_news_sm \
    && python3 -m spacy download fi_core_news_sm \
    && python3 -m spacy download fr_core_news_sm \
    && python3 -m spacy download it_core_news_sm \
    && python3 -m spacy download sv_core_news_sm \
    && python3 -m spacy download en_core_web_sm \
    && python3 -m spacy download el_core_news_sm \
    && python3 -m spacy download ca_core_news_sm \
    && python3 -m spacy download hr_core_news_sm \
    && python3 -m spacy download da_core_news_sm \
    && python3 -m spacy download lt_core_news_sm \
    && python3 -m spacy download mk_core_news_sm \
    && python3 -m spacy download pl_core_news_sm \
    && python3 -m spacy download pt_core_news_sm \
    && python3 -m spacy download ro_core_news_sm \
    && python3 -m spacy download sl_core_news_sm \
    && python3 -m spacy download xx_ent_wiki_sm


CMD [ "python", "/app/tokenizer.py" ]

This makes the image "work", except that installing other languages is fricked.
I am not that familiar with Python, but I think the issue is the PYTHONPATH ENV.
It seems to me that since PYTHONPATH is set to /var/www/html/storage/app/model, Python thinks that NumPy is installed there and tries to import a library from it, but that library does not exist there?
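
The lookup order behind this hypothesis can be checked with a small experiment. The sketch below starts a fresh interpreter with PYTHONPATH pointing at a temporary folder that contains a stale copy of a module; the standard-library module `colorsys` stands in for numpy, and the helper name is made up for illustration:

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def resolve_origin(module_name, pythonpath_dir):
    """Ask a fresh interpreter where it would load `module_name` from."""
    env = dict(os.environ, PYTHONPATH=str(pythonpath_dir))
    code = (
        "import importlib.util; "
        f"print(importlib.util.find_spec({module_name!r}).origin)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return Path(result.stdout.strip())


with tempfile.TemporaryDirectory() as model_dir:
    # A stale copy of a module, standing in for the numpy in the models folder.
    (Path(model_dir) / "colorsys.py").write_text("STALE = True\n")
    origin = resolve_origin("colorsys", model_dir)
    # True when the copy on PYTHONPATH wins over the interpreter's own copy.
    shadowed = origin.resolve().parent == Path(model_dir).resolve()

print("PYTHONPATH copy wins:", shadowed)
```

Directories listed in PYTHONPATH are searched before the interpreter's default locations, so a package that exists in both places is taken from the PYTHONPATH folder.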

@simjanos-dev
Owner

This does not make sense, but I'll try it anyway: maybe the problem is that the PYTHONPATH for some reason overwrites the default one instead of being added to it. But the old docker-compose file would have done the same.

I replaced it like this. I'll try it again.

ENV PYTHONPATH="${PYTHONPATH}:/var/www/html/storage/app/model"

@simjanos-dev
Owner

Nope, same result. But based on this:

File "/var/www/html/storage/app/model/numpy/__init__.py"

I think the problem is that it wants to import regular stuff from the /var/www/html/storage/app/model path. But we only store language models there; other Python stuff is installed in the image.
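
One way to test this suspicion is to list the packages that exist both in the models folder and in the image's own package directory. The sketch below does that for two arbitrary directories; the function names are hypothetical and the demo uses temporary folders rather than the real paths:

```python
import tempfile
from pathlib import Path


def top_level_names(directory):
    """Importable top-level names found directly inside `directory`."""
    names = set()
    for entry in Path(directory).iterdir():
        if entry.is_dir() and (entry / "__init__.py").exists():
            names.add(entry.name)
        elif entry.suffix == ".py":
            names.add(entry.stem)
    return names


def duplicated_packages(model_dir, site_packages_dir):
    """Names that could be imported from either directory."""
    return sorted(top_level_names(model_dir) & top_level_names(site_packages_dir))


with tempfile.TemporaryDirectory() as models, tempfile.TemporaryDirectory() as site:
    for base, package in [(models, "numpy"), (models, "ja_model"), (site, "numpy")]:
        package_dir = Path(base) / package
        package_dir.mkdir()
        (package_dir / "__init__.py").write_text("")
    duplicates = duplicated_packages(models, site)

print(duplicates)
```

Inside the container the second argument would be the real package directory, for example `sysconfig.get_paths()["purelib"]` or `site.getusersitepackages()`, depending on how pip installed the packages. Any name in the result is a candidate for being loaded from the models folder instead of the image.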

@simjanos-dev
Owner

I've tried it with this dockerfile; it did not work. However, I deleted my models folder, and it works perfectly now (still testing). Could it be because the new Python Docker image has a different Python version? Whatever the reason is, this would be a problem for existing users.

FROM python:slim

WORKDIR /app

RUN addgroup --gid 1000 laravel \
    && adduser --ingroup laravel --disabled-password --gecos "" --shell /bin/sh laravel
USER laravel

RUN pip install -U --no-cache-dir \
        setuptools \
        wheel \
        lxml[html_clean] \
#youtube api
        youtube_transcript_api \
#ebook library
        ebooklib \
#kanji reading
        pykakasi \
#bottle
        bottle \
#spacy
        spacy \
#chinese reading
        pinyin \
#subtitle file parser
        pysub-parser \
#website text parser
        newspaper3k

RUN python3 -m spacy download de_core_news_sm \
    && python3 -m spacy download nb_core_news_sm \
    && python3 -m spacy download es_core_news_sm \
    && python3 -m spacy download nl_core_news_sm \
    && python3 -m spacy download fi_core_news_sm \
    && python3 -m spacy download fr_core_news_sm \
    && python3 -m spacy download it_core_news_sm \
    && python3 -m spacy download sv_core_news_sm \
    && python3 -m spacy download en_core_web_sm \
    && python3 -m spacy download el_core_news_sm \
    && python3 -m spacy download ca_core_news_sm \
    && python3 -m spacy download hr_core_news_sm \
    && python3 -m spacy download da_core_news_sm \
    && python3 -m spacy download lt_core_news_sm \
    && python3 -m spacy download mk_core_news_sm \
    && python3 -m spacy download pl_core_news_sm \
    && python3 -m spacy download pt_core_news_sm \
    && python3 -m spacy download ro_core_news_sm \
    && python3 -m spacy download sl_core_news_sm \
    && python3 -m spacy download xx_ent_wiki_sm

ENV PYTHONPATH="${HOME}/.local/bin:${PYTHONPATH}:/var/www/html/storage/app/model"
CMD [ "python", "/app/tokenizer.py" ]

@sergiolaverde0
Contributor Author

I just realized this is the live dev environment, which means files are not copied to the image but mounted during runtime (I even mentioned it before) while I was only testing the full image all this time; no wonder I couldn't reproduce. At least I think it is safe for users upgrading.

I will partly revert the changes to the dev tooling to see if that solves it.

@simjanos-dev
Owner

Thank you!

I haven't had time yet, but I'll backup my prod folders, and test with those as well later.

@simjanos-dev
Owner

So... I was overconfident. The dev environment is working now, but I did not back up my prod folder. But I ran into the same issue. Created a dev image, pulled, started the server, and the same numpy error happened. Shut down my server, deleted the model folder, and it worked again. Just to be clear, this was on my prod environment with the dev image.

However I made a completely unrelated mistake, and the webserver cannot reach the database, so I'm building a new dev image now.

If you want to reproduce, I think these are the steps:

  • Fresh install from the v0.12.6 image
  • Install Japanese (maybe it's a language-specific issue)
  • Import something
  • Shut down your server
  • Add VERSION=dev to your .env file
  • Pull and update to v0.13's dev image
  • Start the server

Wait.... I'm an idiot. The docker-compose.yml file was modified, however, I did not update that in my prod environment.

Is that something by design, or something that you overlooked too? We do not have a process for automatically pulling the latest docker-compose file from github anymore, and I don't think it should be expected from people to check for changes before they update.

@sergiolaverde0
Contributor Author

I don't think it should be expected from people to check for changes before they update

My brain defaulted to this since that is what other "unstable" projects like Immich do, but now I realize Linguacafe aims for a completely different demographic and we should be more explicit about the need to download a new composefile.

I don't expect to introduce any more changes after this (moving the command to the image was the last important change) so hopefully this will be the last time we annoy users with something like this.

@simjanos-dev
Owner

Last time I told people (and believed) that it would be the last install process change. This PR does make the process more standardized, and I want to have it, but does not address any important bugs or add any important feature. I would like to revert it back, if it's okay with you.

I really don't want to bother people, and even if I create a message about this on the github update release and discord, I know there would be several people who would run into it, without knowing why.

If there are more breaking changes that require additional steps from users to update, I would like to introduce those in one update after Linguacafe is in a much more developed state, so we can avoid constantly asking people to migrate. One change like that I have is rewriting the database migrations, because they are a mess.

(By the way, this is my fault in the first place, because I did not have a proper docker process from the beginning.)

@sergiolaverde0
Contributor Author

I would like to revert it back, if it's okay with you

Yes, that's fine by me. Maybe gather all these small changes and save them for a hypothetical 1.0 release instead of dropping them little by little along the way.

@simjanos-dev
Owner

Maybe gather all these small changes and save them for a hypothetical 1.0 release instead of dropping them little by little along the way.

Yeah, that's what I thought too.

I've noticed there are some changes that #294 did not revert back from #284, like:

CMD [ "export PYTHONPATH=\"${HOME}/.local/bin:${PYTHONPATH}\"" ]

Could you please revert it back to the original working state, or if you would like to keep something, make sure it will work 100% for all users? I will delete my model folder before I update to it. I don't want to use my prod before it's fixed.

My laptop is dying, and freezes for 10+ minutes after I build a new docker image.

(Sorry for bothering you so much with it.)

@sergiolaverde0
Contributor Author

Sure, but I have a question, should we keep the second commit?

The second commit "fixes" a bug I encountered when testing the import book functionality where sortedItems would not be defined, most likely because sortMethod was neither of the options checked, which raised an exception when trying to access it later.

I want to know if that change is redundant and should be dropped or if otherwise you now depend on it and should be kept. The only other change worth keeping is the swap of the base image, the first line of both dockerfiles.

@simjanos-dev
Owner

Yes, we should definitely keep that.

Thank you, I really appreciate it!

@sergiolaverde0
Contributor Author

Done at #299.

simjanos-dev added a commit that referenced this pull request Jun 7, 2024
@simjanos-dev
Owner

Someone used the dev image I made today. They got the numpy error.

The only thing I can think of is the different python version.

I've just got a job, so I won't have a lot of time nowadays. Not sure when I'll get to testing this issue.

Docker Desktop got deleted from my PC, and now I'm using plain Docker. It stopped freezing after a build, so I will be able to test issues more easily.

@sergiolaverde0
Contributor Author

I suspect this is the extra languages backfiring on us. When they were originally installed, the whole system used a certain version of python and, by extension, of many other packages. But now the half inside the container has been updated, while the other half, which is stored in a mounted folder, is still stuck in the past.

Reverting to the old base will solve the issue in the short term, but it will eventually come back at us because we can't keep using python 3.11 forever. It is solved by uninstalling and reinstalling the extra languages, as you already saw. I think it can also be solved by forcing an update, so I will run some tests to create a function that updates all installed languages, and you could call it either on user prompt or in the background when the user makes a big update. Does that sound good?
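
A rough sketch of what such an "update everything" function could look like, assuming the extra languages are ordinary pip packages installed into one target folder (for example the `/var/www/html/storage/app/model` path mentioned earlier in this thread). The function names are hypothetical, and whether a forced upgrade actually clears the numpy error is untested here.

```python
import subprocess
import sys
from importlib import metadata


def installed_package_names(target_dir):
    """Return the sorted names of distributions found directly in target_dir."""
    names = set()
    for dist in metadata.distributions(path=[str(target_dir)]):
        name = dist.metadata["Name"]
        if name:
            names.add(name)
    return sorted(names)


def build_upgrade_command(target_dir, names):
    """Build a pip command that reinstalls the given packages into target_dir."""
    return [
        sys.executable, "-m", "pip", "install",
        "--upgrade",                  # replace what is already in target_dir
        "--target", str(target_dir),
        *names,
    ]


def upgrade_all(target_dir):
    """Upgrade every package found in target_dir with the running interpreter."""
    names = installed_package_names(target_dir)
    if names:
        subprocess.run(build_upgrade_command(target_dir, names), check=True)
    return names
```

Running pip through `sys.executable` means the packages are rebuilt or re-downloaded for whatever Python version the container currently ships, which is the mismatch suspected above.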

I've just got a job

Congratulations, by the way.

@simjanos-dev
Owner

Reverting to the old base will solve the issue in the short term

I'll do that, because I don't have much time and want to get the next update out.

but it will eventually come back at us because we can't keep using python 3.11 forever

I think it was an even older version. I'll check the main image. I think we should pin the version to the old one, update it every 1-2 years, and display a message inside the software for the admin saying that uninstalling and reinstalling the packages is necessary. I think that is acceptable, because it is an easy task inside the UI and has to be done rarely.
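
One possible way to decide when to show that admin message, sketched in Python: store the interpreter's major.minor version in a marker file next to the installed packages and compare it on startup. The marker file name and the function names are hypothetical, and nothing like this exists in the project as far as this thread shows.

```python
import sys
from pathlib import Path

MARKER_NAME = ".python-version-marker"  # hypothetical file name


def current_version():
    """Return the running interpreter's version as 'major.minor'."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


def record_version(model_dir):
    """Remember which Python version the packages in model_dir were installed with."""
    Path(model_dir, MARKER_NAME).write_text(current_version())


def reinstall_needed(model_dir, running=None):
    """Return True when the recorded version differs from the running one."""
    marker = Path(model_dir, MARKER_NAME)
    if not marker.exists():
        # Nothing recorded yet, so there is nothing to compare against.
        return False
    return marker.read_text().strip() != (running or current_version())
```

`record_version` would be called after each package install, and `reinstall_needed` on startup. Folders created before the marker existed have no record, so this sketch stays silent for them; treating a missing marker as "reinstall needed" would be the stricter choice.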

Congratulations, by the way.

Thank you! Also going to have a desktop PC on Monday, so I don't think I'll have docker problems anymore, I will be able to test and experiment with it more. I feel like my laptop is hours away from dying.

@simjanos-dev
Owner

Just when I was relieved that we finally found the issue... someone has the same error. He is using a fresh Windows install, and I'm 99% sure he is on the main image. I asked, and will leave a comment if he says otherwise.

@simjanos-dev
Owner

So. I did not want to add any breaking change before v1.0, but suddenly there are so many things I want to add, and I have a feeling we will have more. I want to add websockets, which needs another open port and optional containers for other stuff like self hosted translation service, and in the future AI.

Do you think it's reasonable to add updating the docker-compose file to the update process? I really don't want to make more problems for users.

(My new PC is so fast compared to what I had, I won't have docker problems anymore.)

@simjanos-dev
Owner

simjanos-dev commented Jun 28, 2024

@sergiolaverde0 I do not want to bother you, but what do you think about my last comment? I asked on Discord, and they are okay with it.

(Just saw the native image for ARM. Seems awesome!)

@sergiolaverde0
Contributor Author

I don't know how I missed both your previous comments, so it is good you mentioned me.

I want to add websockets, which needs another open port

Is this really breaking? I would expect users that don't update won't be able to access this feature but otherwise everything will keep working.

optional containers for other stuff like self hosted translation service

Probably same as above.

Do you think it's reasonable to add updating the docker-compose file to the update process?

Doing so every other version would probably be too annoying, but if we give ample warning for people to update we can make it work:

  • Develop any breaking change in its own branch to keep it away from non-breaking changes.
  • Announce the need for updates to the compose file several months in advance.
  • Mention this in the changelog of every non-breaking release in between.
  • Pin a message with the warning in Discord where appropriate so people see it when reaching out to you.
  • Pin an issue here with instructions on how to apply the changes so people see it when trying to report it.

Ideally, only break things once or twice a year. People will generally understand.

@simjanos-dev
Owner

Is this really breaking? I would expect users that don't update won't be able to access this feature but otherwise everything will keep working.

Yes. Importing texts will happen in the background in a job queue, and websockets will be used to update the opened book's chapters when an import has finished. Other than that, chapters' word counts will also be loaded with websockets, because that is the part that slows down opening a book, and importing dictionaries already uses websockets for reporting real-time progress in the latest version (features/websockets-vuex branch).
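
The pattern described above (enqueue the import, return immediately, push a message to the open page when the job finishes) can be illustrated with a small, language-agnostic sketch. It uses Python's `asyncio` only to show the flow. It is not the project's implementation, and the event name, the book id, and the in-memory "inbox" standing in for a websocket connection are all assumptions.

```python
import asyncio


async def import_worker(jobs, subscribers):
    """Process queued imports and push a message to every subscriber."""
    while True:
        book_id, chapter = await jobs.get()
        await asyncio.sleep(0)  # stand-in for the real import work
        for inbox in subscribers:
            inbox.put_nowait(
                {"event": "chapter-imported", "book": book_id, "chapter": chapter}
            )
        jobs.task_done()


async def demo():
    jobs = asyncio.Queue()
    client_inbox = asyncio.Queue()  # stands in for one open websocket connection
    worker = asyncio.create_task(import_worker(jobs, [client_inbox]))

    # The request handler only enqueues work and returns right away.
    for chapter in (1, 2):
        jobs.put_nowait((42, chapter))

    await jobs.join()  # wait until the background worker has finished both jobs
    worker.cancel()

    received = []
    while not client_inbox.empty():
        received.append(client_inbox.get_nowait())
    return received


received = asyncio.run(demo())
```

Without the push channel, the page would have to poll for finished chapters, which is presumably why the extra websocket port becomes a hard requirement rather than an optional feature.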

optional containers: Probably same as above.

This part would not break anything.

but if we give ample warning for people to update we can make it work:

These are great ideas, I'll try to do it this way. v0.14 will have the first batch of breaking docker-compose.yml changes, and there probably won't be many after that, but I'll tell people to check for breaking docker-compose changes before every update. Updates will be less frequent, because I only have limited time to work on linguacafe now that I have a job, and I'll mostly do it on the weekends. I'll also add this PR's changes again.

Other kinds of breaking changes are still being collected for v1.0.

@sergiolaverde0
Contributor Author

Actually, I think you should schedule the breaking changes for v0.15, and announce them as soon as you feel comfortable (you are technically promising a feature, which can backfire).

@simjanos-dev
Owner

I've already announced it on Discord, and it is getting close to being finished. Will probably add it to the top of the readme file as well. I'm also working on this and prioritizing it to learn for my job, so I kind of have to actually finish it. And like I said, the next update will probably take at least ~2 months due to my limited time, or more if needed. But I see your point, it will have to be well tested and work properly.
