This repository has been archived by the owner on Aug 17, 2023. It is now read-only.

Enhance the running time info in the debug mode #439

Closed
wants to merge 1 commit into from

Conversation

@xauthulei (Member) commented Jan 8, 2020

Our current debug info is too simple: the timestamp is confusing, and it doesn't show the location of the file that emitted the message. For example:
[I 200108 13:36:53 config:131] Using preprocessor: <kubeflow.fairing.preprocessors.base.BasePreProcessor object at 0x1017af9b0>
After the change, it will be more meaningful, for example:
INFO|2020-01-08 13:18:33|/Users/llhu/Library/Python/3.7/lib/python/site-packages/werkzeug/_internal.py|122| * Running on http://127.0.0.1:8080/
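For reference, the proposed layout (level|timestamp|source file|line number|message) corresponds to a logging format string roughly like the sketch below; this is an illustration of the idea, not the exact code in the patch:

import logging

# Sketch only: the field order mirrors the proposed example output above.
logging.basicConfig(
    format='%(levelname)s|%(asctime)s|%(pathname)s|%(lineno)d| %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    level=logging.INFO,
)
logging.info("Using preprocessor: ...")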



@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign karthikv2k
You can assign the PR to them by writing /assign @karthikv2k in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@xauthulei (Member Author)

/cc @jinchihe, could you review this enhancement? Many thanks.

@jinchihe (Member) commented Jan 8, 2020

Thanks @xauthulei for the contribution.

I'm thinking about whether we need a feature to set the LOGLEVEL from an environment variable. We could define the default value in the constants, as below, so users can set the log level from the env when they need to debug.

FAIRING_LOGLEVEL = os.environ.get('FAIRING_LOGLEVEL', 'INFO').upper()

What do you think?
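
A minimal sketch of how this could look in constants.py, reusing the FAIRING_LOG_* constant names that appear in the reviewed diff later in this thread; the format and date strings here are illustrative assumptions, not the exact values in the patch:

import os

# Default to INFO; users override it via the FAIRING_LOGLEVEL env variable.
FAIRING_LOG_LEVEL = os.environ.get('FAIRING_LOGLEVEL', 'INFO').upper()
# Illustrative values only; the actual patch may use different strings.
FAIRING_LOG_FORMAT = '%(levelname)s|%(asctime)s|%(pathname)s|%(lineno)d| %(message)s'
FAIRING_LOG_DATEFMT = '%Y-%m-%d %H:%M:%S'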

@jinchihe (Member) commented Jan 8, 2020

It seems the cluster has been removed:

Step #2: ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=404, message=Not found: projects/kubeflow-ci/zones/us-central1-a/clusters/kubeflow-ci-fairing.
Step #2: No cluster named 'kubeflow-ci-fairing' in kubeflow-ci.

I checked; there is no kubeflow-ci-fairing cluster. I will take a look at this, thanks.

[root@jinchi1 ~]# gcloud --project=kubeflow-ci container clusters list
NAME              LOCATION       MASTER_VERSION  MASTER_IP       MACHINE_TYPE   NODE_VERSION     NUM_NODES  STATUS
fairing-ci-0-7    us-central1-a  1.14.7-gke.23   35.224.29.239   n1-standard-8  1.14.6-gke.2 *   2          RUNNING
kf-vmaster-n00    us-east1-b     1.12.10-gke.17  104.196.52.228  n1-standard-8  1.11.8-gke.5 *   2          RUNNING
kubeflow-testing  us-east1-d     1.12.10-gke.17  35.196.213.148  n1-standard-8  1.9.3-gke.0 ***  10         RUNNING

@jinchihe (Member) commented Jan 8, 2020

@xauthulei the cluster has been changed to fairing-ci-0-7, so why is the test still using kubeflow-ci-fairing? Are you using the latest code? Could you please rebase and try again? Thanks.

@k8s-ci-robot added size/M and removed size/S labels on Jan 8, 2020
@xauthulei (Member Author)

Thanks @jinchihe. As you suggested, I have moved the logging module settings into constants.py and also enabled users to define the log level themselves.

@jinchihe (Member) commented Jan 9, 2020

Great, thanks @xauthulei
/retest

@xauthulei (Member Author)

The current error log output is confusing to me. Is there something wrong on my part? Thanks @jinchihe, and sorry for this.

@jinchihe (Member) commented Jan 9, 2020

@xauthulei I think that should be caused by a test environment problem.
/retest

@jinchihe (Member) commented Jan 9, 2020

The CI test hangs again...
/cc @abhi-g

@abhi-g (Member) commented Jan 9, 2020

/test kubeflow-fairing-presubmit

1 similar comment
@abhi-g (Member) commented Jan 9, 2020

/test kubeflow-fairing-presubmit

@abhi-g (Member) commented Jan 10, 2020 via email

@abhi-g (Member) commented Jan 10, 2020

/retest

@abhi-g (Member) commented Jan 14, 2020

@xauthulei please merge upstream/master into your branch to pick up CI test fixes

@xauthulei (Member Author)

/retest

@xauthulei (Member Author)

@abhi-g, thanks for your great efforts. I have refreshed my local branch, but it seems to fail in other parts; could you check it again?

@abhi-g (Member) commented Jan 14, 2020

/retest

@abhi-g (Member) commented Jan 14, 2020

For some reason your tests are failing with errors. For example, when running PyTorch jobs, the master fails with:
2020-01-14 05:46:52.053 PST
Traceback (most recent call last):
File "/app/function_shim.py", line 1, in
from kubeflow.fairing.constants import constants
ModuleNotFoundError: No module named 'kubeflow'
{
textPayload: "Traceback (most recent call last): File "/app/function_shim.py", line 1, in from kubeflow.fairing.constants import constants ModuleNotFoundError: No module named 'kubeflow' "
insertId: "99pc2kug3ujxoepyn"
resource: {2}
timestamp: "2020-01-14T13:46:52.053482016Z"
severity: "ERROR"
labels: {10}
logName: "projects/kubeflow-ci/logs/stderr"
receiveTimestamp: "2020-01-14T13:47:03.611855267Z"
}

@abhi-g (Member) commented Jan 14, 2020

Also, there are failures in the unit tests, which you should be able to run locally to verify the errors as well.

@abhi-g (Member) commented Jan 14, 2020

For example, one of the failures:

____________________ test_overwrite_file_for_multiple_runs _____________________
def test_overwrite_file_for_multiple_runs():
preprocessor = ConvertNotebookPreprocessor(notebook_file=NOTEBOOK_PATH)

  files = preprocessor.preprocess()

preprocessors/test_converted_notebook_preprocessor.py:27:


../../kubeflow/fairing/preprocessors/converted_notebook.py:121: in preprocess
contents, _ = exporter.from_filename(self.notebook_file)
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/exporter.py:179: in from_filename
return self.from_file(f, resources=resources, **kw)
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/exporter.py:197: in from_file
return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/templateexporter.py:357: in from_notebook_node
output = self.template.render(nb=nb_copy, resources=resources)
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/templateexporter.py:142: in template
self._template_cached = self._load_template()
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/templateexporter.py:328: in _load_template
return self.environment.get_template(template_file)
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/templateexporter.py:154: in environment
self._environment_cached = self._create_environment()
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/templateexporter.py:436: in _create_environment
paths = self.get_template_paths()
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/templateexporter.py:465: in get_template_paths
template_names = self.get_template_names()


self = <nbconvert.exporters.python.PythonExporter object at 0x7fce2cc330b8>
def get_template_names(self):
# finds a list of template name where each successive template name is the base template
template_names = []
root_dirs = self.get_prefix_root_dirs()
template_name = self.template_name
merged_conf = {} # the configuration once all conf files are merged
while template_name is not None:
template_names.append(template_name)
conf = {}
found_at_least_one = False
for root_dir in root_dirs:
template_dir = os.path.join(root_dir, 'nbconvert', 'templates', template_name)
if os.path.exists(template_dir):
found_at_least_one = True
conf_file = os.path.join(template_dir, 'conf.json')
if os.path.exists(conf_file):
with open(conf_file) as f:
conf = recursive_update(json.load(f), conf)
if not found_at_least_one:
paths = "\n\t".join(root_dirs)

          raise ValueError('No template sub-directory with name %r found in the following paths:\n\t%s' % (template_name, paths))

E ValueError: No template sub-directory with name 'python' found in the following paths:
E /root/.local/share/jupyter
E /usr/local/share/jupyter
E /usr/share/jupyter
/usr/local/lib/python3.6/site-packages/nbconvert-6.0.0a0-py3.6.egg/nbconvert/exporters/templateexporter.py:512: ValueError

@abhi-g (Member) commented Jan 15, 2020

I'd suggest that you try running the unit tests locally on your desktop/laptop; that might help with debugging. I see some failures in the unit tests which shouldn't be happening. See the unit test logs at http://testing-argo.kubeflow.org/workflows/kubeflow-test-infra/kubeflow-fairing-presubmit-e2e-439-0ef38e1-9312-8e09?tab=workflow for the currently running test.

@abhi-g (Member) commented Jan 15, 2020

You should be able to run the unit tests locally by cd-ing to the kubeflow/fairing/tests dir and running the command pytest -vv --durations=10 unit/. In my case, on the master branch all these tests run locally and pass without errors.

@abhi-g (Member) commented Jan 15, 2020

/test kubeflow-fairing-presubmit

format=constants.FAIRING_LOG_FORMAT,
datefmt=constants.FAIRING_LOG_DATEFMT,
)
logging.getLogger().setLevel(constants.FAIRING_LOG_LEVEL)
Member

One of the things I noticed is that setLevel generally takes a numeric value defined in log.py, such as:
DEBUG = 1
INFO = 2
WARN = 3
ERROR = 4
FATAL = 5

Whereas with this change, it seems like it becomes a string value. Could that be causing issues?
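
For reference, Python's standard logging module defines the named levels as numbers (DEBUG=10, INFO=20, WARNING=30, ERROR=40, CRITICAL=50), and setLevel also accepts the level name as a string; a quick sketch of the two equivalent forms:

import logging

# Numeric constants defined by the stdlib logging module.
print(logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL)  # 10 20 30 40 50

logging.getLogger().setLevel(logging.ERROR)  # numeric constant
logging.getLogger().setLevel('ERROR')        # string name, resolved to the same number internally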

Member

@xauthulei Is it possible to set the level in basicConfig? Such as below:

logging.basicConfig(level=kubeflow.fairing.constants.FAIRING_LOG_LEVEL)

@xauthulei (Member Author)

@abhi-g, that works on my local machine, even if the level is a string value:

llhu@huleis-mbp fairing % python3 
Python 3.7.3 (default, Nov 15 2019, 04:04:52) 
[Clang 11.0.0 (clang-1100.0.33.16)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logging.getLogger().setLevel(logging.INFO)
>>> logging.info("HI")
INFO:root:HI
>>> logging.getLogger().setLevel('ERROR')
>>> logging.info("HI")
>>> logging.critical("HI")
CRITICAL:root:HI
>>> logging.error("HI")
ERROR:root:HI
>>> logging.info("HI")
>>> 

@jinchihe (Member)

@xauthulei That's strange; I executed the unit tests using your branch, and they passed.

(python36) [root@jinchi1 fairing_leilei]# git clone https://github.com/xauthulei/fairing.git
Cloning into 'fairing'...
remote: Enumerating objects: 2, done.
remote: Counting objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3682 (delta 0), reused 2 (delta 0), pack-reused 3680
Receiving objects: 100% (3682/3682), 9.60 MiB | 9.71 MiB/s, done.
Resolving deltas: 100% (1617/1617), done.
(python36) [root@jinchi1 fairing_leilei]# cd fairing/
(python36) [root@jinchi1 fairing]# git checkout correct_logging_message
Branch 'correct_logging_message' set up to track remote branch 'correct_logging_message' from 'origin'.
Switched to a new branch 'correct_logging_message'
(python36) [root@jinchi1 fairing]# git status
On branch correct_logging_message
Your branch is up to date with 'origin/correct_logging_message'.

nothing to commit, working tree clean

ow/python3/python36/lib/python3.6/site-packages/google/auth/_default.py:66
tests/unit/deployers/gcp/test_gcp.py::test_default_params
tests/unit/deployers/gcp/test_gcp.py::test_default_params
tests/unit/deployers/gcp/test_gcp.py::test_top_level_args
tests/unit/deployers/gcp/test_gcp.py::test_top_level_args

....
-- Docs: https://docs.pytest.org/en/latest/warnings.html
================================================= 52 passed, 92 warnings in 69.15s (0:01:09) =================================================

@k8s-ci-robot (Contributor)

@xauthulei: The following test failed, say /retest to rerun all failed tests:

Test name                   Commit   Details  Rerun command
kubeflow-fairing-presubmit  e8703f2  link     /test kubeflow-fairing-presubmit

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@xauthulei (Member Author)

@abhi-g, I found the same error when running nbconvert: the nbconvert-6.0.0a0-py3.6.egg in the testing env we use does not include its latest fix. Even if I hard-code the logging level, it still hits the timeout error in testing. Could you review PR #447?

@xauthulei (Member Author)

Closing this PR; I have continued this change in another PR, #447 (it has been merged). Thanks everyone here for the review efforts.
/close

@k8s-ci-robot (Contributor)

@xauthulei: Closed this PR.

In response to this:

Closing this PR; I have continued this change in another PR, #447 (it has been merged). Thanks everyone here for the review efforts.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
