KF v1.6.0-rc.1 - MNIST E2E on Kubeflow on Vanilla k8s - TypeError: write() argument must be str, not <class 'bytes'> #993

Open
julioo opened this issue Aug 24, 2022 · 4 comments

julioo commented Aug 24, 2022

Hello,

While testing the KF v1.6.0-rc.1 MNIST E2E example on vanilla K8s, I get an error when executing tfjob_client.wait_for_job:

from kubeflow.tfjob import TFJobClient
tfjob_client = TFJobClient()
tfjob_client.wait_for_job(train_name, namespace=namespace, watch=True)

With KF v1.5 this call succeeded, but with v1.6.0-rc.1 it raises an exception:

TypeError: write() argument must be str, not <class 'bytes'>

Replacing the iostream.py file with its previous version makes the execution succeed:
/opt/conda/lib/python3.8/site-packages/ipykernel/iostream.py

Comparing the previous version with the latest one, the following check seems to cause the issue:

    if not isinstance(string, str):
        raise TypeError(f"write() argument must be str, not {type(string)}")
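
For illustration, here is a minimal sketch of how a check like the one quoted above behaves when it receives bytes. It uses only the standard library. StrictTextStream is a hypothetical stand-in written for this example, not ipykernel's actual OutStream, and the sample row text is made up.

```python
import io


class StrictTextStream:
    """Hypothetical text-only stream; NOT ipykernel's OutStream."""

    def __init__(self):
        self._buffer = io.StringIO()

    def write(self, string):
        # Same type check as the snippet quoted above.
        if not isinstance(string, str):
            raise TypeError(f"write() argument must be str, not {type(string)}")
        return self._buffer.write(string)

    def getvalue(self):
        return self._buffer.getvalue()


stream = StrictTextStream()
stream.write("example row\n")  # a str is accepted

error_message = None
try:
    # Encoded bytes are rejected by the isinstance check.
    stream.write("example row\n".encode("utf-8"))
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Under these assumptions, any caller that encodes its text before calling write() would hit the same TypeError, while plain str output is unaffected.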

I don't know how to fix the issue, but my current workaround is to replace the iostream.py file with the previous version.

Thank you

jbottum commented Aug 24, 2022

@kubeflow/wg-training-leads any input on this?

johnugeorge (Member) commented Aug 25, 2022

There are no changes in the SDK with respect to dependencies. By the way, ipykernel is not a dependency of the training SDK: https://pypi.org/project/kubeflow-training/

This is also tested in CI: https://github.com/kubeflow/training-operator/blob/master/.github/workflows/integration-tests.yaml#L38

julioo (Author) commented Aug 25, 2022

@johnugeorge

> There are no changes in SDK with respect to dependencies. btw, ipykernel is not a dependency of training-sdk. https://pypi.org/project/kubeflow-training/
>
> And this is tested in CI as well https://github.com/kubeflow/training-operator/blob/master/.github/workflows/integration-tests.yaml#L38

I reproduced the same situation with JupyterLab version 3.4.3, using the jupyter-tensorflow-full:v1.6.0-rc.1 image.

Executing:

from kubeflow.tfjob import TFJobClient
tfjob_client = TFJobClient()
tfjob_client.wait_for_job(train_name, namespace=namespace, watch=True)

I get the error:

TypeError                                 Traceback (most recent call last)
Input In [18], in <cell line: 3>()
      1 from kubeflow.tfjob import TFJobClient
      2 tfjob_client = TFJobClient()
----> 3 tfjob_client.wait_for_job(train_name, namespace=namespace, watch=True)

File ~/git_tf-operator/sdk/python/kubeflow/tfjob/api/tf_job_client.py:220, in TFJobClient.wait_for_job(self, name, namespace, timeout_seconds, polling_interval, watch, status_callback)
    217   namespace = utils.get_default_target_namespace()
    219 if watch:
--> 220   tfjob_watch(
    221     name=name,
    222     namespace=namespace,
    223     timeout_seconds=timeout_seconds)
    224 else:
    225   return self.wait_for_condition(
    226     name,
    227     ["Succeeded", "Failed"],
   (...)
    230     polling_interval=polling_interval,
    231     status_callback=status_callback)

File ~/.local/lib/python3.8/site-packages/retrying.py:49, in retry.<locals>.wrap.<locals>.wrapped_f(*args, **kw)
     47 @six.wraps(f)
     48 def wrapped_f(*args, **kw):
---> 49     return Retrying(*dargs, **dkw).call(f, *args, **kw)

File ~/.local/lib/python3.8/site-packages/retrying.py:212, in Retrying.call(self, fn, *args, **kwargs)
    209 if self.stop(attempt_number, delay_since_first_attempt_ms):
    210     if not self._wrap_exception and attempt.has_exception:
    211         # get() on an attempt with an exception should cause it to be raised, but raise just in case
--> 212         raise attempt.get()
    213     else:
    214         raise RetryError(attempt)

File ~/.local/lib/python3.8/site-packages/retrying.py:247, in Attempt.get(self, wrap_exception)
    245         raise RetryError(self)
    246     else:
--> 247         six.reraise(self.value[0], self.value[1], self.value[2])
    248 else:
    249     return self.value

File /opt/conda/lib/python3.8/site-packages/six.py:703, in reraise(tp, value, tb)
    701     if value.__traceback__ is not tb:
    702         raise value.with_traceback(tb)
--> 703     raise value
    704 finally:
    705     value = None

File ~/.local/lib/python3.8/site-packages/retrying.py:200, in Retrying.call(self, fn, *args, **kwargs)
    198 while True:
    199     try:
--> 200         attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
    201     except:
    202         tb = sys.exc_info()

File ~/git_tf-operator/sdk/python/kubeflow/tfjob/api/tf_job_watch.py:55, in watch(name, namespace, timeout_seconds)
     52 status = last_condition.get('type', '')
     53 update_time = last_condition.get('lastTransitionTime', '')
---> 55 tbl(tfjob_name, status, update_time)
     57 if name == tfjob_name:
     58   if status == 'Succeeded' or status == 'Failed':

File /opt/conda/lib/python3.8/site-packages/table_logger/table_logger.py:204, in TableLogger.__call__(self, *args)
    200     raise ValueError('Expected number of columns is {}. Got {}.'.format(
    201         len(self.formatters), len(row_cells)))
    203 line = self.format_row(*row_cells)
--> 204 self.print_line(line)

File /opt/conda/lib/python3.8/site-packages/table_logger/table_logger.py:308, in TableLogger.print_line(self, text)
    307 def print_line(self, text):
--> 308     self.file.write(text.encode(self.encoding))
    309     self.file.write(b'\n')
    310     self.file.flush()

File /opt/conda/lib/python3.8/site-packages/ipykernel/iostream.py:529, in OutStream.write(self, string)
    519 """Write to current stream after encoding if necessary
    520 
    521 Returns
   (...)
    525 
    526 """
    528 if not isinstance(string, str):
--> 529     raise TypeError(f"write() argument must be str, not {type(string)}")
    531 if self.echo is not None:
    532     try:

TypeError: write() argument must be str, not <class 'bytes'>
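
The last two frames of this traceback show table_logger passing encoded bytes to a stream that only accepts str. One possible alternative to replacing iostream.py might be an adapter that decodes bytes before forwarding them. The sketch below is an assumption, not code from the SDK, table_logger, or ipykernel: BytesTolerantStream is a hypothetical name, and io.StringIO stands in for the notebook's output stream because it also rejects bytes. Whether table_logger can be pointed at such an adapter depends on how tf_job_watch.py constructs its logger, which this thread does not show.

```python
import io


class BytesTolerantStream:
    """Hypothetical adapter: decodes bytes, then forwards str to a text stream."""

    def __init__(self, target, encoding="utf-8"):
        self._target = target
        self._encoding = encoding

    def write(self, data):
        # Decode bytes-like input so the wrapped text stream only ever sees str.
        if isinstance(data, (bytes, bytearray)):
            data = bytes(data).decode(self._encoding)
        return self._target.write(data)

    def flush(self):
        self._target.flush()


# io.StringIO stands in for a str-only stream such as the notebook's stdout.
target = io.StringIO()
adapter = BytesTolerantStream(target)

# Mirrors the write pattern visible in the print_line frame of the traceback.
adapter.write("mnist-train-05e7 Succeeded".encode("utf-8"))
adapter.write(b"\n")
adapter.flush()

captured = target.getvalue()
print(captured, end="")
```

Keeping the decoding in a small wrapper leaves the installed packages untouched, at the cost of having to hand the wrapper to whatever code does the writing.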

After replacing the iostream.py file with the previous version, I get the proper result:

from kubeflow.tfjob import TFJobClient
tfjob_client = TFJobClient()
tfjob_client.wait_for_job(train_name, namespace=namespace, watch=True)
mnist-train-05e7               Created              2022-08-22T14:52:59Z          
mnist-train-05e7               Running              2022-08-22T14:53:08Z          
mnist-train-05e7               Running              2022-08-22T14:53:08Z          
mnist-train-05e7               Succeeded            2022-08-22T14:53:30Z  

Note that this error isn't blocking: the example is still deployed and served successfully.

jacklu2016 commented

Hi Julioo, I am new to Kubeflow and a little confused by this MNIST E2E on Kubeflow on vanilla k8s example. Please help.
First, should we run the jupyter-tensorflow-full:v1.6.0-rc.1 image on the k8s cluster where Kubeflow is installed, or can we run it anywhere in a Docker runtime?
Second, I see that the notebook starts by importing the kubernetes client, but I can't find a pip install of kubernetes anywhere. Also, don't we need to explicitly configure how to connect to our k8s cluster?
