This repository has been archived by the owner on Aug 17, 2023. It is now read-only.

Enhance the running time info in the debug mode #439

Closed
wants to merge 1 commit into from

Conversation

@xauthulei (Member) commented Jan 8, 2020

Our current debug info is too simple: the timestamp is confusing, and it doesn't show the location of the file that emitted the message. For example:
[I 200108 13:36:53 config:131] Using preprocessor: <kubeflow.fairing.preprocessors.base.BasePreProcessor object at 0x1017af9b0>
After the change, it will be more meaningful, for example:
INFO|2020-01-08 13:18:33|/Users/llhu/Library/Python/3.7/lib/python/site-packages/werkzeug/_internal.py|122| * Running on http://127.0.0.1:8080/
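For reference, the proposed layout (level|timestamp|source file|line number|message) corresponds to a logging format string roughly like the sketch below; this is an illustration of the idea, not the exact code in the patch:

import logging

# Sketch only: the field order mirrors the proposed example output above.
logging.basicConfig(
    format='%(levelname)s|%(asctime)s|%(pathname)s|%(lineno)d| %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    level=logging.INFO,
)
logging.info("Using preprocessor: ...")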



@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign karthikv2k
You can assign the PR to them by writing /assign @karthikv2k in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@xauthulei (Member Author)

/cc @jinchihe, could you review this enhancement? Many thanks.

@jinchihe (Member) commented Jan 8, 2020

Thanks @xauthulei for the contribution.

I'm thinking about whether we need a feature to set the LOGLEVEL from an environment variable. We could define the default value in the constants, as below, so users can set the log level from the env when they need to debug.

FAIRING_LOGLEVEL = os.environ.get('FAIRING_LOGLEVEL', 'INFO').upper()

What do you think?
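
A minimal sketch of how this could look in constants.py, reusing the FAIRING_LOG_* constant names that appear in the reviewed diff later in this thread; the format and date strings here are illustrative assumptions, not the exact values in the patch:

import os

# Default to INFO; users override it via the FAIRING_LOGLEVEL env variable.
FAIRING_LOG_LEVEL = os.environ.get('FAIRING_LOGLEVEL', 'INFO').upper()
# Illustrative values only; the actual patch may use different strings.
FAIRING_LOG_FORMAT = '%(levelname)s|%(asctime)s|%(pathname)s|%(lineno)d| %(message)s'
FAIRING_LOG_DATEFMT = '%Y-%m-%d %H:%M:%S'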

@jinchihe (Member) commented Jan 8, 2020

It seems the cluster has been removed:

Step #2: ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=404, message=Not found: projects/kubeflow-ci/zones/us-central1-a/clusters/kubeflow-ci-fairing.
Step #2: No cluster named 'kubeflow-ci-fairing' in kubeflow-ci.

I checked; there is no kubeflow-ci-fairing cluster. I will take a look at this, thanks.

[root@jinchi1 ~]# gcloud --project=kubeflow-ci container clusters list
NAME              LOCATION       MASTER_VERSION  MASTER_IP       MACHINE_TYPE   NODE_VERSION     NUM_NODES  STATUS
fairing-ci-0-7    us-central1-a  1.14.7-gke.23   35.224.29.239   n1-standard-8  1.14.6-gke.2 *   2          RUNNING
kf-vmaster-n00    us-east1-b     1.12.10-gke.17  104.196.52.228  n1-standard-8  1.11.8-gke.5 *   2          RUNNING
kubeflow-testing  us-east1-d     1.12.10-gke.17  35.196.213.148  n1-standard-8  1.9.3-gke.0 ***  10         RUNNING

@jinchihe (Member) commented Jan 8, 2020

@xauthulei the cluster has been changed to fairing-ci-0-7, so why is the test still using kubeflow-ci-fairing? Are you using the latest code? Could you please rebase and try again? Thanks.

@k8s-ci-robot added size/M and removed size/S labels on Jan 8, 2020
@xauthulei (Member Author)

Thanks @jinchihe. As you suggested, I have moved the logging module settings into constants.py and also enabled users to define the log level themselves.

@jinchihe (Member) commented Jan 9, 2020

Great, thanks @xauthulei
/retest

@xauthulei (Member Author)

The current error log output is confusing to me. Is there something wrong on my part? Thanks @jinchihe, and sorry for this.

@jinchihe (Member) commented Jan 9, 2020

@xauthulei I think that should be caused by a test environment problem.
/retest

@jinchihe (Member) commented Jan 9, 2020

The CI test hangs again...
/cc @abhi-g

@abhi-g (Member) commented Jan 9, 2020

/test kubeflow-fairing-presubmit

1 similar comment
@abhi-g (Member) commented Jan 9, 2020

/test kubeflow-fairing-presubmit

@abhi-g (Member) commented Jan 10, 2020 via email

@abhi-g (Member) commented Jan 10, 2020

/retest

@abhi-g (Member) commented Jan 14, 2020

@xauthulei please merge upstream/master into your branch to pick up CI test fixes

@xauthulei (Member Author)

/retest

@xauthulei (Member Author)

@abhi-g, thanks for your great efforts. I have refreshed my local branch, but it seems to fail in other parts; could you check it again?

@abhi-g (Member) commented Jan 14, 2020

/retest

@abhi-g (Member) commented Jan 14, 2020

For some reason your tests are failing with errors. For example, when running PyTorch jobs, the master fails with:
2020-01-14 05:46:52.053 PST
Traceback (most recent call last):
File "/app/function_shim.py", line 1, in
from kubeflow.fairing.constants import constants
ModuleNotFoundError: No module named 'kubeflow'
{
textPayload: "Traceback (most recent call last): File "/app/function_shim.py", line 1, in from kubeflow.fairing.constants import constants ModuleNotFoundError: No module named 'kubeflow' "
insertId: "99pc2kug3ujxoepyn"
resource: {2}
timestamp: "2020-01-14T13:46:52.053482016Z"
severity: "ERROR"
labels: {10}
logName: "projects/kubeflow-ci/logs/stderr"
receiveTimestamp: "2020-01-14T13:47:03.611855267Z"
}

@abhi-g (Member) commented Jan 14, 2020

Also, there are failures in the unit tests, which you should be able to run locally to verify the errors as well.

@abhi-g (Member) commented Jan 14, 2020

For example, one of the failures:

____________________ test_overwrite_file_for_multiple_runs _____________________
def test_overwrite_file_for_multiple_runs():
preprocessor = ConvertNotebookPreprocessor(notebook_file=NOTEBOOK_PATH)

  files = preprocessor.preprocess()

preprocessors/test_converted_notebook_preprocessor.py:27:


../../kubeflow/fairing/preprocessors/converted_notebook.py:121: in preprocess
contents, _ = exporter.from_filename(self.notebook_file)
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/exporter.py:179: in from_filename
return self.from_file(f, resources=resources, **kw)
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/exporter.py:197: in from_file
return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/templateexporter.py:357: in from_notebook_node
output = self.template.render(nb=nb_copy, resources=resources)
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/templateexporter.py:142: in template
self._template_cached = self._load_template()
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/templateexporter.py:328: in _load_template
return self.environment.get_template(template_file)
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/templateexporter.py:154: in environment
self._environment_cached = self._create_environment()
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/templateexporter.py:436: in _create_environment
paths = self.get_template_paths()
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/templateexporter.py:465: in get_template_paths
template_names = self.get_template_names()


self = <nbconvert.exporters.python.PythonExporter object at 0x7fce2cc330b8>
def get_template_names(self):
# finds a list of template name where each successive template name is the base template
template_names = []
root_dirs = self.get_prefix_root_dirs()
template_name = self.template_name
merged_conf = {} # the configuration once all conf files are merged
while template_name is not None:
template_names.append(template_name)
conf = {}
found_at_least_one = False
for root_dir in root_dirs:
template_dir = os.path.join(root_dir, 'nbconvert', 'templates', template_name)
if os.path.exists(template_dir):
found_at_least_one = True
conf_file = os.path.join(template_dir, 'conf.json')
if os.path.exists(conf_file):
with open(conf_file) as f:
conf = recursive_update(json.load(f), conf)
if not found_at_least_one:
paths = "\n\t".join(root_dirs)

          raise ValueError('No template sub-directory with name %r found in the following paths:\n\t%s' % (template_name, paths))

E ValueError: No template sub-directory with name 'python' found in the following paths:
E /root/.local/share/jupyter
E /usr/local/share/jupyter
E /usr/share/jupyter
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/templateexporter.py:512: ValueError

@abhi-g (Member) commented Jan 15, 2020

I'd suggest that you try running the unit tests locally on your desktop/laptop; that might help with debugging. I see some failures in the unit tests which shouldn't be happening. See the unit test logs at http://testing-argo.kubeflow.org/workflows/kubeflow-test-infra/kubeflow-fairing-presubmit-e2e-439-0ef38e1-9312-8e09?tab=workflow for the currently running test.

@abhi-g (Member) commented Jan 15, 2020

You should be able to run the unit tests locally by cd-ing to the kubeflow/fairing/tests dir and running the command pytest -vv --durations=10 unit/. In my case, on the master branch all these tests run locally and pass without errors.

@abhi-g (Member) commented Jan 15, 2020

/test kubeflow-fairing-presubmit

format=constants.FAIRING_LOG_FORMAT,
datefmt=constants.FAIRING_LOG_DATEFMT,
)
logging.getLogger().setLevel(constants.FAIRING_LOG_LEVEL)
Member

One of the things I noticed is that setLevel generally takes a numeric value defined in log.py, such as:
DEBUG = 1
INFO = 2
WARN = 3
ERROR = 4
FATAL = 5

Whereas with this change, it seems like it becomes a string value. Could that be causing issues?
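
For reference, Python's standard logging module defines the named levels as numbers (DEBUG=10, INFO=20, WARNING=30, ERROR=40, CRITICAL=50), and setLevel also accepts the level name as a string; a quick sketch of the two equivalent forms:

import logging

# Numeric constants defined by the stdlib logging module.
print(logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL)  # 10 20 30 40 50

logging.getLogger().setLevel(logging.ERROR)  # numeric constant
logging.getLogger().setLevel('ERROR')        # string name, resolved to the same number internally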

Member

@xauthulei Is it possible to set the level in basicConfig? Such as below:

logging.basicConfig(level=kubeflow.fairing.constants.FAIRING_LOG_LEVEL)

@xauthulei (Member Author)

@abhi-g, that works on my local machine, even if the level is a string value:

llhu@huleis-mbp fairing % python3 
Python 3.7.3 (default, Nov 15 2019, 04:04:52) 
[Clang 11.0.0 (clang-1100.0.33.16)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logging.getLogger().setLevel(logging.INFO)
>>> logging.info("HI")
INFO:root:HI
>>> logging.getLogger().setLevel('ERROR')
>>> logging.info("HI")
>>> logging.critical("HI")
CRITICAL:root:HI
>>> logging.error("HI")
ERROR:root:HI
>>> logging.info("HI")
>>> 

@jinchihe (Member)

@xauthulei That's strange; I executed the unit tests using your branch, and they passed.

(python36) [root@jinchi1 fairing_leilei]# git clone https://github.com/xauthulei/fairing.git
Cloning into 'fairing'...
remote: Enumerating objects: 2, done.
remote: Counting objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3682 (delta 0), reused 2 (delta 0), pack-reused 3680
Receiving objects: 100% (3682/3682), 9.60 MiB | 9.71 MiB/s, done.
Resolving deltas: 100% (1617/1617), done.
(python36) [root@jinchi1 fairing_leilei]# cd fairing/
(python36) [root@jinchi1 fairing]# git checkout correct_logging_message
Branch 'correct_logging_message' set up to track remote branch 'correct_logging_message' from 'origin'.
Switched to a new branch 'correct_logging_message'
(python36) [root@jinchi1 fairing]# git status
On branch correct_logging_message
Your branch is up to date with 'origin/correct_logging_message'.

nothing to commit, working tree clean

ow/python3/python36/lib/python3.6/site-packages/google/auth/_default.py:66
tests/unit/deployers/gcp/test_gcp.py::test_default_params
tests/unit/deployers/gcp/test_gcp.py::test_default_params
tests/unit/deployers/gcp/test_gcp.py::test_top_level_args
tests/unit/deployers/gcp/test_gcp.py::test_top_level_args

....
-- Docs: https://docs.pytest.org/en/latest/warnings.html
================================================= 52 passed, 92 warnings in 69.15s (0:01:09) =================================================

@k8s-ci-robot (Contributor)

@xauthulei: The following test failed, say /retest to rerun all failed tests:

Test name                   Commit   Details  Rerun command
kubeflow-fairing-presubmit  e8703f2  link     /test kubeflow-fairing-presubmit

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@xauthulei (Member Author)

@abhi-g, I found the same error when running nbconvert: the nbconvert-6.0.0a0-py3.6.egg in the testing env we use does not include its latest fix. Even if I hard-code the logging level, it still hits the timeout error in testing. Could you review PR #447?

@xauthulei (Member Author)

Closing this PR; I have continued this change in another PR, #447 (it has been merged). Thanks everyone here for the review efforts.
/close

@k8s-ci-robot (Contributor)

@xauthulei: Closed this PR.

In response to this:

Closing this PR; I have continued this change in another PR, #447 (it has been merged). Thanks everyone here for the review efforts.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
