
Parsing failure when uploading a pipeline with kfp cli(or SDK) #2764

Closed
jiyongjung0 opened this issue Dec 20, 2019 · 16 comments

Comments

jiyongjung0 (Contributor) commented Dec 20, 2019

What happened:
When I tried to upload a pipeline with a gzip-ed YAML file, the kfp CLI returned the following error after successfully uploading the pipeline:

$ kfp --endpoint=<<XXXXXX>>.notebooks.googleusercontent.com pipeline upload -p test <<XXXXXX>>.tar.gz
Parser must be a string or character stream, not dict
$ echo $?
1

I debugged a little and found that the RPC response message returned from the server is a bit odd.

{"id":"ec299e6a-b994-4b2c-9060-65ff48039e50","name":"test","parameters":[{"name":"pipeline-root","value":"pipelines/XXXXXXXX"}],"default_version":{"id":"ec299e6a-b994-4b2c-9060-65ff48039e50","name":"test","created_at":{"seconds":1576824474},"parameters":[{"name":"pipeline-root","value":"pipelines/XXXXXXXX"}],"resource_references":[{"key":{"type":3,"id":"ec299e6a-b994-4b2c-9060-65ff48039e50"},"relationship":1}]},"created_at":"2019-12-20T06:47:54Z"}

The created_at field in the default_version node is not a string but a JSON object, "created_at":{"seconds":1576824474}, which results in a parse error.

Interestingly, created_at at the root node has a string value. FYI, $ kfp pipeline list works without any problem. 😅
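For illustration, the two encodings describe the same instant. A minimal Python sketch (assuming only the protobuf-JSON {"seconds": ...} shape seen above) converts the nested form into the RFC 3339 string the root node uses:

```python
from datetime import datetime, timezone

def seconds_to_rfc3339(ts: dict) -> str:
    """Render a protobuf-JSON style timestamp as an RFC 3339 string."""
    dt = datetime.fromtimestamp(ts["seconds"], tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

# Matches the root-level created_at string in the response above.
print(seconds_to_rfc3339({"seconds": 1576824474}))  # 2019-12-20T06:47:54Z
```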

What did you expect to happen:
The created_at field should be an ISO 8601 string, and the kfp CLI should succeed.

What steps did you take:
I tested in GCP with KFP build commit b3171f0 for the server.
The client version is 0.1.37:

$ pip show kfp
Name: kfp
Version: 0.1.37

Anything else you would like to add:
I was actually testing TFX, which uses the KFP Python SDK, and hit the same problem. So it looks like a problem for the SDK, not only the CLI.

jiyongjung0 (Contributor, Author)

FYI: I tried to create a pipeline in the web UI and captured the RPC response with the Chrome inspector. It shows the same problem. (However, it doesn't cause any problem in the web UI.)

Response:

{"id":"a2e50e0a-7a71-4829-898a-62e3348ae280","name":"eee","default_version":{"id":"a2e50e0a-7a71-4829-898a-62e3348ae280","name":"eee","created_at":{"seconds":1576825669},"resource_references":[{"key":{"type":3,"id":"a2e50e0a-7a71-4829-898a-62e3348ae280"},"relationship":1}]},"created_at":"2019-12-20T07:07:49Z"}

@jiyongjung0 jiyongjung0 changed the title Parsing failure when upload a pipeline with kfp cli(or SDK) Parsing failure when uploading a pipeline with kfp cli(or SDK) Dec 20, 2019
jiyongjung0 (Contributor, Author)

It looks like this behavior comes from the special HTTP handling of pipeline uploads in https://github.com/kubeflow/pipelines/blob/master/backend/src/apiserver/server/pipeline_upload_server.go, because that code manually transforms only the outermost created_at field.

How about using jsonpb for the JSON encoding? It should be more concise while handling all the fields in the structure.

This is my simple proof of concept snippet.

	p := &api.Pipeline{
		CreatedAt: &timestamp.Timestamp{Seconds: 1539211232},
		DefaultVersion: &api.PipelineVersion{
			CreatedAt: &timestamp.Timestamp{Seconds: 1439241123},
		},
	}
	fmt.Println(p)
	m, _ := (&jsonpb.Marshaler{OrigName: true}).MarshalToString(p)
	fmt.Println(m)

This prints

created_at:<seconds:1539211232 > default_version:<created_at:<seconds:1439241123 > > 
{"created_at":"2018-10-10T22:40:32Z","default_version":{"created_at":"2015-08-10T21:12:03Z"}}
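For comparison, the bug pattern that the jsonpb approach avoids can be sketched in Python (a hypothetical illustration, not the actual Go server code): a hand-rolled serializer that rewrites only the outermost created_at leaves the nested timestamp in protobuf-JSON form.

```python
import json
from datetime import datetime, timezone

def to_rfc3339(seconds: int) -> str:
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

def serialize_shallow(pipeline: dict) -> str:
    # Converts only the top-level created_at; the nested
    # default_version.created_at stays a {"seconds": ...} dict,
    # which is exactly the malformed output reported in this issue.
    out = dict(pipeline)
    out["created_at"] = to_rfc3339(out["created_at"]["seconds"])
    return json.dumps(out)

resp = serialize_shallow({
    "created_at": {"seconds": 1576824474},
    "default_version": {"created_at": {"seconds": 1576824474}},
})
print(resp)
```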

Ark-kun (Contributor) commented Dec 28, 2019

@jingzhang36 This seems to be related to the recently introduced pipeline versions feature. Can you please take a look?

jiyongjung0 (Contributor, Author)

I'm sorry but is there any update?

rmgogogo (Contributor)

Would you share the pipeline zip file with me? You can ping me directly if you'd rather not share it here.

jingzhang36 (Contributor)

Hi Jiyong, we recently changed our API methods as we introduced the version concept into our system (FE, BE, and SDK). In particular, the BE change starts from 0.1.37 and the FE change starts from 0.1.38. From the response you posted, I believe the pipeline was created successfully, because it responds with a pipeline UUID, i.e., ec299e6a-b994-4b2c-9060-65ff48039e50 (among other information about the created pipeline, e.g., default version, pipeline name, creation time). In other words, the response looks expected to me. Is your script trying to parse the response and finding it different from the previous response format? If so, that is expected: as I said, our pipeline object now has versions in it.

jiyongjung0 (Contributor, Author)

Thank you for the response.

Yes, the pipeline was created successfully, but an error occurs when parsing the response from the server. I'm using the kfp SDK to parse the response.

I've tested with kfp server version 0.1.37 and client versions 0.1.37 and 0.1.40, which both fail.

IIUC, the created_at field should be a string according to the definition of the pipeline object, but the server response sets this field to a JSON object, for example {"seconds":1576824474}. (A full sample response is in the issue body.) So it seems the response generated by the server is not well-formed.

FYI, this is the client-side stack trace in version 0.1.37:

  File "/home/jupyter/venv/lib/python3.5/site-packages/kfp/_client.py", line 462, in upload_pipeline
    response = self._upload_api.upload_pipeline(pipeline_package_path, name=pipeline_name)
  File "/home/jupyter/venv/lib/python3.5/site-packages/kfp_server_api/api/pipeline_upload_service_api.py", line 55, in upload_pipeline
    (data) = self.upload_pipeline_with_http_info(uploadfile, **kwargs)  # noqa: E501
  File "/home/jupyter/venv/lib/python3.5/site-packages/kfp_server_api/api/pipeline_upload_service_api.py", line 135, in upload_pipeline_with_http_info
    collection_formats=collection_formats)
  File "/home/jupyter/venv/lib/python3.5/site-packages/kfp_server_api/api_client.py", line 330, in call_api
    _preload_content, _request_timeout)
  File "/home/jupyter/venv/lib/python3.5/site-packages/kfp_server_api/api_client.py", line 169, in __call_api
    return_data = self.deserialize(response_data, response_type)
  File "/home/jupyter/venv/lib/python3.5/site-packages/kfp_server_api/api_client.py", line 241, in deserialize
    return self.__deserialize(data, response_type)
  File "/home/jupyter/venv/lib/python3.5/site-packages/kfp_server_api/api_client.py", line 280, in __deserialize
    return self.__deserialize_model(data, klass)
  File "/home/jupyter/venv/lib/python3.5/site-packages/kfp_server_api/api_client.py", line 624, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
  File "/home/jupyter/venv/lib/python3.5/site-packages/kfp_server_api/api_client.py", line 280, in __deserialize
    return self.__deserialize_model(data, klass)
  File "/home/jupyter/venv/lib/python3.5/site-packages/kfp_server_api/api_client.py", line 624, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
  File "/home/jupyter/venv/lib/python3.5/site-packages/kfp_server_api/api_client.py", line 278, in __deserialize
    return self.__deserialize_datatime(data)
  File "/home/jupyter/venv/lib/python3.5/site-packages/kfp_server_api/api_client.py", line 590, in __deserialize_datatime
    return parse(string)
  File "/home/jupyter/venv/lib/python3.5/site-packages/dateutil/parser/_parser.py", line 1374, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/home/jupyter/venv/lib/python3.5/site-packages/dateutil/parser/_parser.py", line 646, in parse
    res, skipped_tokens = self._parse(timestr, **kwargs)
  File "/home/jupyter/venv/lib/python3.5/site-packages/dateutil/parser/_parser.py", line 725, in _parse
    l = _timelex.split(timestr)         # Splits the timestr into tokens
  File "/home/jupyter/venv/lib/python3.5/site-packages/dateutil/parser/_parser.py", line 207, in split
    return list(cls(s))
  File "/home/jupyter/venv/lib/python3.5/site-packages/dateutil/parser/_parser.py", line 76, in __init__
    '{itype}'.format(itype=instream.__class__.__name__))
TypeError: Parser must be a string or character stream, not dict
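The root cause of that trace can be reproduced in isolation. The hypothetical helper below mimics what the generated client's datetime deserializer assumes (an RFC 3339 string); feeding it the dict the server actually returned raises a TypeError analogous to the one above:

```python
from datetime import datetime

def deserialize_datetime(value):
    # Hypothetical stand-in for the generated client's datetime
    # deserializer, which expects an RFC 3339 string.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")

print(deserialize_datetime("2019-12-20T06:47:54Z"))  # well-formed: parses fine

try:
    deserialize_datetime({"seconds": 1576824474})    # malformed: a dict
except TypeError as err:
    print("parse failed:", err)
```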

Ark-kun (Contributor) commented Jan 14, 2020

we have a recent change to our API methods as we introduced version concept into our system (FE, BE and SDK). In particular, BE change starts from 0.1.37 and FE change starts from 0.1.38.

Correct me if I'm wrong, but as far as I remember there weren't supposed to be any breaking changes to the existing APIs.

k8s-ci-robot pushed a commit that referenced this issue Jan 30, 2020
* Make created_at value of default_version in UploadPiplineResponse follow RFC3339 time format.

It returned `created_at` value as a json dictionary, for example
`"created_at":{"seconds":1576824474}`, and broke SDK parser.  `jsonpb`
will be used to take care of all json marshaling including time format
conversion.

See #2764 for the detail.


* Set jsonpb marshaler to output ints for enums.

* Parse json response to check formatting of the `created_at` field.

* Lint: reformatted comments.
JohanWork (Contributor)

Is anyone working on this? I have done some digging, and it seems one timestamp is a Unix epoch value while its declared type is datetime. See the dict below; my knowledge here is limited when it comes to digging in the Go code, though.

{'id': '42927185-5d13-442a-826f-9069dadef063',
 'name': 'pipeline_test.zip_71bd614f-46a7-4d7c-a31d-9dcc41908c78',
 'created_at': {'seconds': 1583953778},
 'parameters': [{'name': 'filename',
   'value': 'gs://ml-pipeline-playground/shakespeare1.txt'}],
 'resource_references': [{'key': {'type': 3,
    'id': '42927185-5d13-442a-826f-9069dadef063'},
   'relationship': 1}]}

jiyongjung0 (Contributor, Author)

I think this issue is resolved in the latest version of KFP. 😄 Please report again if the error still exists.

JohanWork (Contributor)

@jiyongjung0, I am running kfp==0.2.5 and kfp-server-api==0.2.5, but the Kubeflow deployment was set up before this release, if that matters.

jiyongjung0 (Contributor, Author)

I think the version of the Kubeflow deployment matters in this case, because it is the server that emits the ill-formed response.

JohanWork (Contributor)

Can you specify what I need to upgrade? I am running 1.0 and can't see any more recent release.

jiyongjung0 (Contributor, Author)

Sorry, I meant the Kubeflow Pipelines installation specifically. You can get a hint from the KFP installation guide. The most recent release looks like 0.3.0.

numerology

+1 to @jiyongjung0's reply. The full-fledged Kubeflow deployment does not pick up the latest version of KFP.

eedorenko (Contributor)

And there is no normal way to upgrade it.

Jeffwan pushed a commit to Jeffwan/pipelines that referenced this issue Dec 9, 2020