
[BUG] mlflow models serve fails with HTTP 500 instead of 400 on bad input #4897

Closed
2 of 23 tasks
mmaitre314 opened this issue Oct 13, 2021 · 9 comments
Labels
area/scoring MLflow Model server, model deployment tools, Spark UDFs bug Something isn't working

Comments

@mmaitre314


Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
  • No. I cannot contribute a bug fix at this time.

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • MLflow installed from (source or binary): binary
  • MLflow version (run mlflow --version): 1.20.2
  • Python version: 3.8
  • npm version, if running the dev UI: N/A
  • Exact command to reproduce:
    • Start server: mlflow models serve -m runs:/5f8aee52fcb442388368af4da658b398/model --no-conda
    • Submit an inference request: curl -i -X POST -d "{\"data\":0.0199132142]}" -H "Content-Type: application/json" http://localhost:5000/invocations

Describe the problem


Submitting an inference request with invalid content to the MLflow model server returns HTTP 500 'Internal Server Error' instead of HTTP 400 'Bad Request'. This prevents proper error handling on the client side and blocks REST API fuzzing.

Example:

curl -i -X POST -d "{\"data\":0.0199132142]}" -H "Content-Type: application/json" http://localhost:5000/invocations
HTTP/1.1 500 INTERNAL SERVER ERROR
Content-Length: 901
Content-Type: application/json
Date: Wed, 13 Oct 2021 22:16:44 GMT
Server: mlflow

{"error_code": "MALFORMED_REQUEST", "message": "Failed to parse input from JSON. Ensure that input is a valid JSON formatted string.", "stack_trace": "Traceback (most recent call last):\n  File \"C:\\Source\\local_training_mlflow_project\\.venv\\lib\\site-packages\\mlflow\\pyfunc\\scoring_server\\__init__.py\", line 81, in infer_and_parse_json_input\n    decoded_input = json.loads(json_input)\n  File \"C:\\Users\\mmaitre\\Anaconda3\\lib\\json\\__init__.py\", line 357, in loads\n    return _default_decoder.decode(s)\n  File \"C:\\Users\\mmaitre\\Anaconda3\\lib\\json\\decoder.py\", line 337, in decode\n    obj, end = self.raw_decode(s, idx=_w(s, 0).end())\n  File \"C:\\Users\\mmaitre\\Anaconda3\\lib\\json\\decoder.py\", line 353, in raw_decode\n    obj, end = self.scan_once(s, idx)\njson.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 21 (char 20)\n"}

Code to reproduce issue


curl -i -X POST -d "{\"data\":0.0199132142]}" -H "Content-Type: application/json" http://localhost:5000/invocations
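For scripted checks or fuzzing, the same request can be sent from Python. This is a minimal sketch using only the standard library; `MALFORMED_BODY`, `json_error_position`, and `post_invocation` are hypothetical names, and the URL assumes the server from the report is listening on localhost:5000. `json_error_position` also confirms on the client side that the body is invalid JSON at the same offset the server's traceback reports (char 20).

```python
import json
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Body from the report: "]" appears where "," or "}" is expected.
MALFORMED_BODY = '{"data":0.0199132142]}'


def json_error_position(body: str) -> Optional[Tuple[str, int]]:
    """Return (message, character offset) if body is not valid JSON, else None."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return exc.msg, exc.pos
    return None


def post_invocation(url: str, body: str) -> int:
    """POST body with a JSON content type and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx responses; the status is on .code.
        return exc.code
```

Against the 1.20.2 server described above, `post_invocation("http://localhost:5000/invocations", MALFORMED_BODY)` is expected to return 500, the same status curl shows; a fix for this issue should make it return 400.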

Other info / logs


What component(s), interfaces, languages, and integrations does this bug affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

mmaitre314 added the bug (Something isn't working) label on Oct 13, 2021
github-actions bot added the area/scoring (MLflow Model server, model deployment tools, Spark UDFs) label on Oct 13, 2021
@abatomunkuev
Contributor

Hello @mmaitre314! I would like to work on this issue.

Looking into the issue, it seems that we are passing an incorrect value for the error_code argument of the _handle_serving_error function:

try:
    decoded_input = json.loads(json_input)
except json.decoder.JSONDecodeError:
    _handle_serving_error(
        error_message=(
            "Failed to parse input from JSON. Ensure that input is a valid JSON"
            " formatted string."
        ),
        error_code=MALFORMED_REQUEST,
    )

The function _handle_serving_error then passes the value MALFORMED_REQUEST to a MlflowException constructor.

e = MlflowException(message=error_message, error_code=error_code, stack_trace=traceback_str)

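The MlflowException constructor converts error_code with ErrorCode.Name() and falls back to ErrorCode.Name(INTERNAL_ERROR) only if that conversion raises. MALFORMED_REQUEST is itself a valid ErrorCode (the response body above reports "error_code": "MALFORMED_REQUEST"), so this fallback does not appear to be what produces the 500: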

try:
    self.error_code = ErrorCode.Name(error_code)
except (ValueError, TypeError):
    self.error_code = ErrorCode.Name(INTERNAL_ERROR)

The problem is that MALFORMED_REQUEST is not imported in mlflow/exceptions.py and has no entry in ERROR_CODE_TO_HTTP_STATUS:

from mlflow.protos.databricks_pb2 import (
    INTERNAL_ERROR,
    TEMPORARILY_UNAVAILABLE,
    ENDPOINT_NOT_FOUND,
    PERMISSION_DENIED,
    REQUEST_LIMIT_EXCEEDED,
    BAD_REQUEST,
    INVALID_PARAMETER_VALUE,
    RESOURCE_DOES_NOT_EXIST,
    INVALID_STATE,
    RESOURCE_ALREADY_EXISTS,
    ErrorCode,
)

ERROR_CODE_TO_HTTP_STATUS = {
    ErrorCode.Name(INTERNAL_ERROR): 500,
    ErrorCode.Name(INVALID_STATE): 500,
    ErrorCode.Name(TEMPORARILY_UNAVAILABLE): 503,
    ErrorCode.Name(REQUEST_LIMIT_EXCEEDED): 429,
    ErrorCode.Name(ENDPOINT_NOT_FOUND): 404,
    ErrorCode.Name(RESOURCE_DOES_NOT_EXIST): 404,
    ErrorCode.Name(PERMISSION_DENIED): 403,
    ErrorCode.Name(BAD_REQUEST): 400,
    ErrorCode.Name(RESOURCE_ALREADY_EXISTS): 400,
    ErrorCode.Name(INVALID_PARAMETER_VALUE): 400,
}
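The two steps above (name conversion in the constructor, then status lookup) can be modeled in a few lines of plain Python to see where the 500 comes from. This is a simplified sketch, not MLflow's code: `ERROR_CODES` is a hypothetical stand-in for the protobuf `ErrorCode` enum with placeholder numbers (not the real values in `databricks_pb2`), `resolve_error_code` mirrors the constructor's try/except, and `http_status_for` assumes that names missing from `ERROR_CODE_TO_HTTP_STATUS` default to 500, which is consistent with the `HTTP/1.1 500` response in the report.

```python
# Hypothetical stand-in for the protobuf ErrorCode enum; the numbers are
# placeholders, not the real values from mlflow.protos.databricks_pb2.
ERROR_CODES = {
    "INTERNAL_ERROR": 1,
    "BAD_REQUEST": 2,
    "INVALID_PARAMETER_VALUE": 3,
    "MALFORMED_REQUEST": 4,
}
_NAMES_BY_NUMBER = {number: name for name, number in ERROR_CODES.items()}

# Subset of ERROR_CODE_TO_HTTP_STATUS as quoted above: no MALFORMED_REQUEST entry.
ERROR_CODE_TO_HTTP_STATUS = {
    "INTERNAL_ERROR": 500,
    "BAD_REQUEST": 400,
    "INVALID_PARAMETER_VALUE": 400,
}


def error_code_name(number: int) -> str:
    """Mimic ErrorCode.Name: raise ValueError for numbers the enum does not define."""
    try:
        return _NAMES_BY_NUMBER[number]
    except KeyError:
        raise ValueError(f"Enum has no name defined for value {number!r}") from None


def resolve_error_code(number: int) -> str:
    """Mirror the constructor: fall back to INTERNAL_ERROR if conversion fails."""
    try:
        return error_code_name(number)
    except (ValueError, TypeError):
        return error_code_name(ERROR_CODES["INTERNAL_ERROR"])


def http_status_for(name: str) -> int:
    """Assumed lookup: names missing from the mapping default to 500."""
    return ERROR_CODE_TO_HTTP_STATUS.get(name, 500)


name = resolve_error_code(ERROR_CODES["MALFORMED_REQUEST"])
print(name, http_status_for(name))  # MALFORMED_REQUEST 500
```

Under these assumptions, MALFORMED_REQUEST keeps its own name through the constructor, matching the "error_code" field in the response body, and the 500 comes from the missing mapping entry; the INTERNAL_ERROR fallback only applies to values the enum does not know.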

There are 2 ways to fix this issue:

  1. Instead of passing MALFORMED_REQUEST, we need to pass BAD_REQUEST. I would prefer this option.
  2. Add MALFORMED_REQUEST to the imports in mlflow/exceptions.py and map it to 400 in ERROR_CODE_TO_HTTP_STATUS.
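Both options lead to a 400, which can be checked on a simplified dict model. This is a sketch: the mapping is a hand-copied subset of ERROR_CODE_TO_HTTP_STATUS keyed by name, and `http_status_for` is a hypothetical helper that assumes a 500 default for unmapped names.

```python
# Hand-copied subset of ERROR_CODE_TO_HTTP_STATUS, keyed by error code name.
ERROR_CODE_TO_HTTP_STATUS = {
    "INTERNAL_ERROR": 500,
    "BAD_REQUEST": 400,
    "INVALID_PARAMETER_VALUE": 400,
}


def http_status_for(name: str, mapping: dict) -> int:
    """Assumed lookup: names missing from the mapping default to 500."""
    return mapping.get(name, 500)


# Current behavior: the scoring server raises with MALFORMED_REQUEST, which is unmapped.
before = http_status_for("MALFORMED_REQUEST", ERROR_CODE_TO_HTTP_STATUS)

# Option 1: raise with BAD_REQUEST instead; the shared mapping stays unchanged.
option_1 = http_status_for("BAD_REQUEST", ERROR_CODE_TO_HTTP_STATUS)

# Option 2: keep MALFORMED_REQUEST but add it to a copy of the mapping as a 400.
patched_mapping = {**ERROR_CODE_TO_HTTP_STATUS, "MALFORMED_REQUEST": 400}
option_2 = http_status_for("MALFORMED_REQUEST", patched_mapping)

print(before, option_1, option_2)  # 500 400 400
```

Option 1 only changes the call site in scoring_server/__init__.py, whereas option 2 changes the shared table in mlflow/exceptions.py that every MlflowException consults; the narrower change is one plausible reason to prefer option 1.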

@dbczumar
Collaborator

dbczumar commented Nov 3, 2021

Hi @abatomunkuev ! Thank you for the detailed root cause analysis and willingness to contribute. We'd be very excited about your contribution for a fix; I agree that solution #1 is better. Please feel free to file a pull request, and let me know if you have any questions!

@abatomunkuev
Contributor

Hello @dbczumar! I am currently having some issues reproducing the error.

It seems to me that to start the server, I need an ML model:
mlflow models serve -m runs:/5f8aee52fcb442388368af4da658b398/model --no-conda

Could you please guide me through serving a model properly, from building the model to serving it? I am trying to run this script: python sklearn_elasticnet_wine/train.py. However, I got the following dependency error:

(mlflow-dev-env) MacBook-Pro-Andrei:examples bork$ python sklearn_elasticnet_wine/train.py
Traceback (most recent call last):
  File "sklearn_elasticnet_wine/train.py", line 9, in <module>
    import pandas as pd
ImportError: No module named pandas
(mlflow-dev-env) MacBook-Pro-Andrei:examples bork$ 

I have gone through the contribution guidelines, created a conda environment, and installed the dependencies.

@dbczumar
Collaborator

dbczumar commented Nov 4, 2021

Hi @abatomunkuev , are you sure that MLflow and Pandas are installed in your conda environment?

@abatomunkuev
Contributor

@dbczumar

(mlflow-dev-env) MacBook-Pro-Andrei:mlflow bork$ pip install -r dev-requirements.txt
DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621
Requirement already satisfied: sphinx==3.5.4 in /usr/local/lib/python3.7/site-packages (from -r dev-requirements.txt (line 4)) (3.5.4)
Requirement already satisfied: sphinx-autobuild in /usr/local/lib/python3.7/site-packages (from -r dev-requirements.txt (line 5)) (2021.3.14)
Requirement already satisfied: sphinx-click in /usr/local/lib/python3.7/site-packages (from -r dev-requirements.txt (line 6)) (3.0.1)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.7/site-packages (from -r dev-requirements.txt (line 7)) (1.0.1)
Requirement already satisfied: scipy in /usr/local/lib/python3.7/site-packages (from -r dev-requirements.txt (line 8)) (1.7.1)
Requirement already satisfied: kubernetes in /usr/local/lib/python3.7/site-packages (from -r dev-requirements.txt (line 9)) (19.15.0)
Requirement already satisfied: docutils<0.17,>=0.12 in /usr/local/lib/python3.7/site-packages (from sphinx==3.5.4->-r dev-requirements.txt (line 4)) (0.16)
Requirement already satisfied: sphinxcontrib-qthelp in /usr/local/lib/python3.7/site-packages (from sphinx==3.5.4->-r dev-requirements.txt (line 4)) (1.0.3)
Requirement already satisfied: snowballstemmer>=1.1 in /usr/local/lib/python3.7/site-packages (from sphinx==3.5.4->-r dev-requirements.txt (line 4)) (2.1.0)
Requirement already satisfied: babel>=1.3 in /usr/local/lib/python3.7/site-packages (from sphinx==3.5.4->-r dev-requirements.txt (line 4)) (2.9.1)
Requirement already satisfied: imagesize in /usr/local/lib/python3.7/site-packages (from sphinx==3.5.4->-r dev-requirements.txt (line 4)) (1.2.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.7/site-packages (from sphinx==3.5.4->-r dev-requirements.txt (line 4)) (21.2)
Requirement already satisfied: requests>=2.5.0 in /usr/local/lib/python3.7/site-packages (from sphinx==3.5.4->-r dev-requirements.txt (line 4)) (2.26.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/site-packages (from sphinx==3.5.4->-r dev-requirements.txt (line 4)) (42.0.2)
Requirement already satisfied: Jinja2>=2.3 in /usr/local/lib/python3.7/site-packages (from sphinx==3.5.4->-r dev-requirements.txt (line 4)) (3.0.2)
Requirement already satisfied: sphinxcontrib-serializinghtml in /usr/local/lib/python3.7/site-packages (from sphinx==3.5.4->-r dev-requirements.txt (line 4)) (1.1.5)
Requirement already satisfied: sphinxcontrib-applehelp in /usr/local/lib/python3.7/site-packages (from sphinx==3.5.4->-r dev-requirements.txt (line 4)) (1.0.2)
Requirement already satisfied: sphinxcontrib-devhelp in /usr/local/lib/python3.7/site-packages (from sphinx==3.5.4->-r dev-requirements.txt (line 4)) (1.0.2)
Requirement already satisfied: Pygments>=2.0 in /usr/local/lib/python3.7/site-packages (from sphinx==3.5.4->-r dev-requirements.txt (line 4)) (2.10.0)
Requirement already satisfied: sphinxcontrib-htmlhelp in /usr/local/lib/python3.7/site-packages (from sphinx==3.5.4->-r dev-requirements.txt (line 4)) (2.0.0)
Requirement already satisfied: alabaster<0.8,>=0.7 in /usr/local/lib/python3.7/site-packages (from sphinx==3.5.4->-r dev-requirements.txt (line 4)) (0.7.12)
Requirement already satisfied: sphinxcontrib-jsmath in /usr/local/lib/python3.7/site-packages (from sphinx==3.5.4->-r dev-requirements.txt (line 4)) (1.0.1)
Requirement already satisfied: livereload in /usr/local/lib/python3.7/site-packages (from sphinx-autobuild->-r dev-requirements.txt (line 5)) (2.6.3)
Requirement already satisfied: colorama in /usr/local/lib/python3.7/site-packages (from sphinx-autobuild->-r dev-requirements.txt (line 5)) (0.4.4)
Requirement already satisfied: click>=7.0 in /usr/local/lib/python3.7/site-packages (from sphinx-click->-r dev-requirements.txt (line 6)) (8.0.3)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/site-packages (from scikit-learn->-r dev-requirements.txt (line 7)) (1.1.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/site-packages (from scikit-learn->-r dev-requirements.txt (line 7)) (3.0.0)
Requirement already satisfied: numpy>=1.14.6 in /usr/local/lib/python3.7/site-packages (from scikit-learn->-r dev-requirements.txt (line 7)) (1.19.5)
Requirement already satisfied: websocket-client!=0.40.0,!=0.41.*,!=0.42.*,>=0.32.0 in /usr/local/lib/python3.7/site-packages (from kubernetes->-r dev-requirements.txt (line 9)) (1.2.1)
Requirement already satisfied: certifi>=14.05.14 in /usr/local/lib/python3.7/site-packages (from kubernetes->-r dev-requirements.txt (line 9)) (2021.10.8)
Requirement already satisfied: python-dateutil>=2.5.3 in /usr/local/lib/python3.7/site-packages (from kubernetes->-r dev-requirements.txt (line 9)) (2.8.2)
Requirement already satisfied: requests-oauthlib in /usr/local/lib/python3.7/site-packages (from kubernetes->-r dev-requirements.txt (line 9)) (1.3.0)
Requirement already satisfied: pyyaml>=5.4.1 in /usr/local/lib/python3.7/site-packages (from kubernetes->-r dev-requirements.txt (line 9)) (6.0)
Requirement already satisfied: google-auth>=1.0.1 in /usr/local/lib/python3.7/site-packages (from kubernetes->-r dev-requirements.txt (line 9)) (2.3.3)
Requirement already satisfied: urllib3>=1.24.2 in /usr/local/lib/python3.7/site-packages (from kubernetes->-r dev-requirements.txt (line 9)) (1.26.7)
Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python3.7/site-packages (from kubernetes->-r dev-requirements.txt (line 9)) (1.15.0)
Requirement already satisfied: pytz>=2015.7 in /usr/local/lib/python3.7/site-packages (from babel>=1.3->sphinx==3.5.4->-r dev-requirements.txt (line 4)) (2021.3)
Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/site-packages (from click>=7.0->sphinx-click->-r dev-requirements.txt (line 6)) (4.8.1)
Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.7/site-packages (from google-auth>=1.0.1->kubernetes->-r dev-requirements.txt (line 9)) (4.7.2)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.7/site-packages (from google-auth>=1.0.1->kubernetes->-r dev-requirements.txt (line 9)) (4.2.4)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.7/site-packages (from google-auth>=1.0.1->kubernetes->-r dev-requirements.txt (line 9)) (0.2.8)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.7/site-packages (from Jinja2>=2.3->sphinx==3.5.4->-r dev-requirements.txt (line 4)) (2.0.1)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.7/site-packages (from requests>=2.5.0->sphinx==3.5.4->-r dev-requirements.txt (line 4)) (3.3)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.7/site-packages (from requests>=2.5.0->sphinx==3.5.4->-r dev-requirements.txt (line 4)) (2.0.7)
Requirement already satisfied: tornado in /usr/local/lib/python3.7/site-packages (from livereload->sphinx-autobuild->-r dev-requirements.txt (line 5)) (6.1)
Requirement already satisfied: pyparsing<3,>=2.0.2 in /usr/local/lib/python3.7/site-packages (from packaging->sphinx==3.5.4->-r dev-requirements.txt (line 4)) (2.4.7)
Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.7/site-packages (from requests-oauthlib->kubernetes->-r dev-requirements.txt (line 9)) (3.1.1)
Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth>=1.0.1->kubernetes->-r dev-requirements.txt (line 9)) (0.4.8)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/site-packages (from importlib-metadata->click>=7.0->sphinx-click->-r dev-requirements.txt (line 6)) (3.6.0)
Requirement already satisfied: typing-extensions>=3.6.4 in /usr/local/lib/python3.7/site-packages (from importlib-metadata->click>=7.0->sphinx-click->-r dev-requirements.txt (line 6)) (3.7.4.3)
(mlflow-dev-env) MacBook-Pro-Andrei:mlflow bork$ pip install pandas
DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621
Requirement already satisfied: pandas in /usr/local/lib/python3.7/site-packages (1.3.4)
Requirement already satisfied: numpy>=1.17.3 in /usr/local/lib/python3.7/site-packages (from pandas) (1.19.5)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/site-packages (from pandas) (2021.3)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/site-packages (from pandas) (2.8.2)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)
(mlflow-dev-env) MacBook-Pro-Andrei:mlflow bork$ cd examples/
(mlflow-dev-env) MacBook-Pro-Andrei:examples bork$ python sklearn_elasticnet_wine/train.py
Traceback (most recent call last):
  File "sklearn_elasticnet_wine/train.py", line 9, in <module>
    import pandas as pd
ImportError: No module named pandas
(mlflow-dev-env) MacBook-Pro-Andrei:examples bork$ 

@dbczumar
Collaborator

dbczumar commented Nov 4, 2021

@abatomunkuev if you run “which python” and “which pip”, do the resulting locations reside within the expected conda environment?
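The same check can also be run from inside the interpreter that executes train.py. This is a minimal Python 3 sketch using only the standard library; `environment_report` is a hypothetical helper, and it relies on the `CONDA_PREFIX` variable that `conda activate` sets. In the log above, pip reports packages under /usr/local/lib/python3.7/site-packages, which is the kind of mismatch this surfaces.

```python
import importlib.util
import os
import sys


def environment_report(module: str = "pandas") -> dict:
    """Summarize which interpreter is running and whether `module` is importable."""
    conda_prefix = os.environ.get("CONDA_PREFIX")  # set by `conda activate`
    return {
        "executable": sys.executable,
        "version": sys.version,
        "prefix": sys.prefix,
        "conda_prefix": conda_prefix,
        # True only when this interpreter lives in the activated conda environment.
        "inside_active_conda_env": (
            conda_prefix is not None
            and os.path.realpath(sys.prefix) == os.path.realpath(conda_prefix)
        ),
        "module_found": importlib.util.find_spec(module) is not None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `inside_active_conda_env` is False or `module_found` is False, the script is running under a different interpreter than the one pip installed into.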

@abatomunkuev
Contributor

abatomunkuev commented Nov 4, 2021

@dbczumar I got it working. I had to run the Python script from within the conda environment and also install the dependencies into that environment.

@abatomunkuev
Contributor

@dbczumar I have created a pull request. However, when I ran the tests with the following command:

pytest tests/pyfunc --large

Some tests failed because of my change to scoring_server/__init__.py, and I am a little confused. Should I change the existing tests in test_scoring_server.py? If so, could you help me find which test functions need to be updated to reflect the change?

Thank you.

@dbczumar
Collaborator

dbczumar commented Aug 9, 2022

Closing now that #5003 has been merged. Thanks @abatomunkuev !

dbczumar closed this as completed on Aug 9, 2022