bootstrapper cannot handle input with wildcards #995

Open
ptitzler opened this issue Oct 23, 2020 · 1 comment
Labels
component:pipeline-runtime issues related to pipeline runtimes e.g. kubeflow pipelines feedback:user kind:bug Something isn't working

Comments

@ptitzler
Member

version 1.3

  • create a notebook1 that produces several output files, e.g. data/file1.csv, data/file2.csv, ...
  • create a notebook2 (it doesn't matter what it does because it won't run)
  • create a pipeline notebook1 -> notebook2, configuring the output files for notebook1 as data/*
  • run the pipeline on kubeflow pipelines

Result:

  • processing of notebook1 succeeds and the output files are properly uploaded to the COS bucket
  • processing of notebook2 fails:
[I 23:08:27.715] 'test_load_viz-1023160715':'Data_Viz' - downloaded Data_Viz-2289c970-b214-418b-bf93-68d880326eb0.tar.gz from bucket: pipeline-artifacts, object: test_load_viz-1023160715/Data_Viz-2289c970-b214-418b-bf93-68d880326eb0.tar.gz (0.042 secs)
Traceback (most recent call last):
  File "bootstrapper.py", line 402, in <module>
    main()
  File "bootstrapper.py", line 393, in main
    file_op.process_dependencies()
  File "bootstrapper.py", line 97, in process_dependencies
    self.get_file_from_object_storage(file.strip())
  File "bootstrapper.py", line 137, in get_file_from_object_storage
    self.cos_client.fget_object(bucket_name=self.cos_bucket,
  File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 719, in fget_object
    stat = self.stat_object(bucket_name, object_name, sse)
  File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 1138, in stat_object
    response = self._url_open('HEAD', bucket_name=bucket_name,
  File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 2017, in _url_open
    raise ResponseError(response,
minio.error.NoSuchKey: NoSuchKey: message: The specified key does not exist.

The corresponding lines of code are

        if inputs:
            input_list = inputs.split(INOUT_SEPARATOR)
            for file in input_list:
                self.get_file_from_object_storage(file.strip())  # <------ FAIL

To figure out which input file couldn't be processed, I had to export the pipeline and inspect the generated bootstrapper invocation:

  - name: data-viz
    container:
      args: ['mkdir -p ./jupyter-work-dir/ && cd ./jupyter-work-dir/ && curl -H "Cache-Control:
          no-cache" -L https://raw.githubusercontent.com/elyra-ai/kfp-notebook/v0.13.0/etc/docker-scripts/bootstrapper.py
          --output bootstrapper.py && curl -H "Cache-Control: no-cache" -L https://raw.githubusercontent.com/elyra-ai/kfp-notebook/v0.13.0/etc/requirements-elyra.txt
          --output requirements-elyra.txt && python3 -m pip install  packaging &&
          python3 -m pip freeze > requirements-current.txt && python3 bootstrapper.py
          --cos-endpoint http://devises1.fyre.ibm.com:31323 --cos-bucket pipeline-artifacts
          --cos-directory "test_load_viz" --cos-dependencies-archive "Data_Viz-2289c970-b214-418b-bf93-68d880326eb0.tar.gz"
          --file "XAI/Elyra Build/Data_Viz.ipynb" --inputs "data/bank-additional/*" ']

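The input is set to data/bank-additional/*, which seems to cause the failure: the pattern appears to be passed to fget_object as a literal object key, and no object with that exact name exists in the bucket.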
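
Exporting the pipeline would not have been necessary if the failure had named the key. A minimal sketch of one way to do that, assuming the download call can be wrapped; `InputDownloadError` and `download_input` are hypothetical names and not part of bootstrapper.py:

```python
class InputDownloadError(Exception):
    """Raised when an input object cannot be fetched (hypothetical name)."""


def download_input(fetch, bucket, key, target):
    """Call fetch(bucket, key, target) and name the key if the call fails.

    `fetch` is any callable that downloads one object. In the bootstrapper it
    could presumably be a thin wrapper around cos_client.fget_object
    (assumption, not verified against the real code).
    """
    try:
        fetch(bucket, key, target)
    except Exception as err:  # a narrower type such as minio's NoSuchKey would be better
        raise InputDownloadError(
            f"could not download object '{key}' from bucket '{bucket}': {err}"
        ) from err
```

With a wrapper like this, the log for the failing run would have contained data/bank-additional/* next to the NoSuchKey message, which points directly at the wildcard.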

Issues:

  • when downloading the declared output files of an upstream node as inputs, special processing is required if the input "name" contains wildcards
  • the error message should identify the key that was used when an attempt was made to download an object from the bucket
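
A rough sketch of what the special processing for wildcards could look like: list the keys under the part of the pattern that precedes the first wildcard, then filter them with fnmatch. This is not the bootstrapper's code. `resolve_inputs`, `static_prefix`, `has_wildcard`, and the `list_keys` callable are hypothetical, the ";" separator is an assumed value for INOUT_SEPARATOR, and keys are assumed to be relative to the pipeline's directory in the bucket.

```python
import fnmatch

WILDCARD_CHARS = ("*", "?")


def has_wildcard(name):
    """True if the name contains a '*' or '?' wildcard."""
    return any(ch in name for ch in WILDCARD_CHARS)


def static_prefix(pattern):
    """Return the directory part of the pattern before its first wildcard."""
    first = min(pattern.find(ch) for ch in WILDCARD_CHARS if ch in pattern)
    head = pattern[:first]
    return head[: head.rfind("/") + 1]


def resolve_inputs(inputs, list_keys, separator=";"):
    """Expand wildcard entries of a separator-delimited input string.

    `list_keys(prefix)` must return the object keys that start with `prefix`.
    Names without wildcards are passed through unchanged.
    """
    resolved = []
    for name in (part.strip() for part in inputs.split(separator)):
        if not name:
            continue
        if has_wildcard(name):
            candidates = list_keys(static_prefix(name))
            # Note: fnmatch lets '*' match '/' as well, so 'data/*' also
            # matches keys in subdirectories of 'data/'.
            resolved.extend(sorted(k for k in candidates if fnmatch.fnmatchcase(k, name)))
        else:
            resolved.append(name)
    return resolved
```

With the minio client, `list_keys` could probably be built on `list_objects(bucket_name, prefix=..., recursive=True)` by collecting each result's `object_name`. Each resolved key would then be downloaded individually instead of passing the pattern to fget_object.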
@ptitzler ptitzler added component:pipeline-runtime issues related to pipeline runtimes e.g. kubeflow pipelines kind:bug Something isn't working feedback:user and removed status:Needs Triage labels Oct 23, 2020
@ptitzler
Member Author

Temporary workaround: don't use wildcards in output file declarations.
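
If the file names are not known up front, a small helper along these lines could be run once in the notebook's working directory to get the explicit list to declare. It is only a convenience sketch; `explicit_outputs` is a hypothetical name.

```python
import glob
import os


def explicit_outputs(pattern, root="."):
    """Expand a wildcard into sorted, root-relative file paths (directories skipped)."""
    matches = glob.glob(os.path.join(root, pattern))
    files = [os.path.relpath(path, root) for path in matches if os.path.isfile(path)]
    # Use forward slashes so the names look the same on every platform.
    return sorted(name.replace(os.sep, "/") for name in files)
```

For example, `print("\n".join(explicit_outputs("data/*")))` prints one file name per line, which can then be entered as individual output files.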
