Support loading models eagerly #762

Closed
elonzh opened this issue May 28, 2021 · 17 comments · May be fixed by #765

Comments

@elonzh
Contributor

elonzh commented May 28, 2021

Currently, the GROBID service loads its models when the first request comes in. This causes usability issues: the first request is always slow, and any API or task that depends on GROBID may time out.

I don't know why we need this lazy-loading mechanism, and I think we should support loading models eagerly.

@lfoppiano
Collaborator

Hi @elonzh, you can change the load strategy from the configuration file. See here: https://grobid.readthedocs.io/en/latest/Grobid-service/#model-loading-strategy

@elonzh
Contributor Author

elonzh commented May 28, 2021

Yes, I found it in #443 (comment).
Thanks for your quick reply and sorry for interrupting.

@elonzh elonzh closed this as completed May 28, 2021
@elonzh
Contributor Author

elonzh commented May 28, 2021

@lfoppiano Can we configure the service through environment variables, just like grobid.properties?

@lfoppiano
Collaborator

lfoppiano commented May 28, 2021

Yes, you can, but you need to modify the config file as follows:

modelPreload: ${MODEL_PRELOAD:- false}

and then

export MODEL_PRELOAD=true

See an example: https://github.com/lfoppiano/grobid-superconductors/blob/4bc75e47d42d43159a32b5c380a2ce16b94a1125/resources/config/config.yml#L7
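As a rough illustration of the `${NAME:-default}` pattern in the config line above, here is a minimal shell sketch. It is only an analogy: the service resolves the placeholder itself when it reads the config file, not the shell, and the function name `resolve_preload` is hypothetical. The idea is the same in both cases: the environment variable wins when it is set, and the default is used otherwise.

```shell
# Shell analogue of the ${NAME:-default} fallback used in the config line.
# resolve_preload is a hypothetical helper, not part of GROBID.
resolve_preload() {
  # Use MODEL_PRELOAD when it is set and non-empty, otherwise "false".
  printf '%s\n' "${MODEL_PRELOAD:-false}"
}

unset MODEL_PRELOAD
resolve_preload          # prints the default: false

export MODEL_PRELOAD=true
resolve_preload          # prints the exported value: true
```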

@elonzh
Contributor Author

elonzh commented May 28, 2021


This doesn't seem to work:

grobid  | Caused by: com.fasterxml.jackson.databind.exc.InvalidFormatException: Cannot deserialize value of type `boolean` from String "${GROBID__MODEL_PRELOAD:- true}": only "true" or "false" recognized
grobid  |  at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: org.grobid.service.GrobidServiceConfiguration["grobid"]->org.grobid.service.GrobidServicePropConfiguration["modelPreload"])
grobid  |       at com.fasterxml.jackson.databind.exc.InvalidFormatException.from(InvalidFormatException.java:67)
grobid  |       at com.fasterxml.jackson.databind.DeserializationContext.weirdStringException(DeserializationContext.java:1676)
grobid  |       at com.fasterxml.jackson.databind.DeserializationContext.handleWeirdStringValue(DeserializationContext.java:932)
grobid  |       at com.fasterxml.jackson.databind.deser.std.NumberDeserializers$BooleanDeserializer._parseBoolean(NumberDeserializers.java:251)
grobid  |       at com.fasterxml.jackson.databind.deser.std.NumberDeserializers$BooleanDeserializer.deserialize(NumberDeserializers.java:199)
grobid  |       at com.fasterxml.jackson.databind.deser.std.NumberDeserializers$BooleanDeserializer.deserialize(NumberDeserializers.java:175)
grobid  |       at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeAndSet(MethodProperty.java:129)
grobid  |       at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:288)
grobid  |       at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:151)
grobid  |       at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeAndSet(MethodProperty.java:129)
grobid  |       at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:288)
grobid  |       at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:151)
grobid  |       at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4173)
grobid  |       at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2467)
grobid  |       at io.dropwizard.configuration.BaseConfigurationFactory.build(BaseConfigurationFactory.java:127)

@lfoppiano
Collaborator

Could you share your config file?

@elonzh
Contributor Author

elonzh commented May 28, 2021

Just a one-line change:

modelPreload: ${GROBID__MODEL_PRELOAD:- true}
grobid:
  # NOTE: change these values to absolute paths when running on production
  grobidHome: "grobid-home"

  # how to load the models,
  # false -> models are loaded when needed (default), avoiding putting in memory useless models
  # true -> all the models are loaded into memory at the server startup, slow the start of the services and models not
  # used will take some memory
  modelPreload: ${GROBID__MODEL_PRELOAD:- true}

  # CORS configuration
  corsAllowedOrigins: "*"
  corsAllowedMethods: "OPTIONS,GET,PUT,POST,DELETE,HEAD"
  corsAllowedHeaders: "X-Requested-With,Content-Type,Accept,Origin"

server:
  type: custom
  applicationConnectors:
    - type: http
      port: 8070
  adminConnectors:
    - type: http
      port: 8071
  registerDefaultExceptionMappers: false

logging:
  level: INFO
  loggers:
    org.apache.pdfbox.pdmodel.font.PDSimpleFont: "OFF"
  appenders:
    - type: console
      threshold: ALL
      timeZone: UTC
    - type: file
      currentLogFilename: logs/grobid-service.log
      threshold: ALL
      archive: true
      archivedLogFilenamePattern: logs/grobid-service-%d.log
      archivedFileCount: 5
      timeZone: UTC
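The comments in the config above note that preloading slows down service startup, so anything that depends on the service may need to wait for it. A minimal sketch of such a wait loop follows. The helper name `wait_until_ready` is hypothetical, and the probe URL in the usage comment assumes the service listens on port 8070 (as in the config above) and exposes a `GET /api/isalive` endpoint.

```shell
# Retry a probe command until it succeeds or the attempts run out.
# Usage: wait_until_ready <attempts> <delay-seconds> <command...>
wait_until_ready() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # The probe's exit status decides readiness; its output is discarded.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (assumed port and endpoint, adjust to your deployment):
# wait_until_ready 60 5 curl -fsS http://localhost:8070/api/isalive
```

Passing the probe as a command keeps the loop independent of how readiness is checked.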

@lfoppiano
Collaborator

Can you try using quotes? "true"

@elonzh
Contributor Author

elonzh commented May 28, 2021

Cannot deserialize value of type `boolean` from String "${GROBID__MODEL_PRELOAD:- "true"}": only "true" or "false" recognized

@lfoppiano
Collaborator

OK, I need to run some more tests on this.

@lfoppiano
Collaborator

@elonzh Something was missing in the code.

The fix is in PR #765, or you can try it on the branch feature/enable-env-variables.
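For a build that does not yet include that change, one possible workaround is to resolve the placeholder yourself before starting the service, so that the config parser only ever sees a plain `true` or `false`. This is a minimal sketch, not the approach taken in the PR. The function name `render_preload` is hypothetical, and it assumes the placeholder is spelled `${GROBID__MODEL_PRELOAD:- ...}` as in the config posted above, with `true` as the fallback.

```shell
# Replace the ${GROBID__MODEL_PRELOAD:-...} placeholder in a config template
# with the value of the environment variable, or "true" when it is unset.
# render_preload is a hypothetical helper; it handles simple values only.
render_preload() {
  template="$1"
  value="${GROBID__MODEL_PRELOAD:-true}"
  # [$] matches a literal dollar sign; [^}]* skips the default inside the braces.
  sed "s|[$]{GROBID__MODEL_PRELOAD:-[^}]*}|${value}|" "$template"
}

# Example (file names are placeholders):
# render_preload config.template.yaml > config.yaml
```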

@elonzh
Contributor Author

elonzh commented May 31, 2021

Nice work! Though I think adding some automated tests would be a better way to verify this feature.

I already use the config file to preload models, so the issue is not a big deal for me.

@ishaqibrahimbot

Hi @lfoppiano! I was facing the same issue (wanted to load models eagerly) because of the slow start and found this issue thread.

I'm a bit confused as to how I can set model_preload=true if I'm using the docker image to run the grobid server. Is there a way I can include an argument (like an environment parameter) in the "docker run..." command?

Or should I try to edit the docker container and change the default parameter in the config file to true?

Thanks in advance!

@elonzh
Contributor Author

elonzh commented Jun 1, 2021


You should mount the config file to the right path.

A docker-compose.yml example:

version: "3.7"

services:
  grobid:
    image: lfoppiano/grobid:0.6.2
    environment:
      - JAVA_OPTS="-Xmx4096m"
    configs:
      - source: grobid-service-config
        target: /opt/grobid/grobid-service/config/config.yaml
configs:
  grobid-service-config:
    file: ./grobid-service-config.yml
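For a plain "docker run" invocation, the same idea can be expressed with a bind mount. The sketch below only prints the command so it can be reviewed before running. The helper name `grobid_run_cmd` is hypothetical; the image tag and in-container config path are copied from the compose example above, port 8070 comes from the config file posted earlier in this thread, and all three may differ for other image versions.

```shell
# Print a docker run command that bind-mounts a local config file read-only
# over the path used in the compose example. Nothing is executed here.
grobid_run_cmd() {
  config_path="$1"   # must be an absolute path on the host
  printf 'docker run --rm -p 8070:8070 -v %s:%s:ro %s\n' \
    "$config_path" \
    "/opt/grobid/grobid-service/config/config.yaml" \
    "lfoppiano/grobid:0.6.2"
}

grobid_run_cmd "$PWD/grobid-service-config.yml"
```

Set `modelPreload: true` in the mounted file to have the models loaded at startup.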

@kermitt2
Owner

kermitt2 commented Jun 1, 2021

@ishaqibrahimbot

@lfoppiano @kermitt2 Really appreciate how quickly you guys responded, and apologies for getting back to you after such a delay!

Just wanted to share an update: after reading through your suggestions and the Grobid docs a few times, I was able to mount the modified config.yaml into the docker container and get the models to preload. Thank you for the help!

@kermitt2
Owner

Closing this issue because all the points have been covered and are documented. Don't hesitate to re-open it if further action is needed.
