Contributor

@VitalyFedyunin VitalyFedyunin commented Jul 4, 2021

Stack from ghstack:

Differential Revision: D29588834

[ghstack-poisoned]
@facebook-github-bot
Contributor

facebook-github-bot commented Jul 4, 2021

💊 CI failures summary and remediations

As of commit 15bfa2c (more details on the Dr. CI page and at hud.pytorch.org/pr/61235):


  • 3/3 failures possibly* introduced in this PR
    • 1/3 non-scanned failure(s)

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (1/2)

Step: "Run tests"

Jul 07 01:07:08 Intel MKL ERROR: Parameter 5 was incorrect on entry to DLASCL.
Jul 07 01:07:08   if ((math.isinf(a) or math.isinf(b)) and a != b):
Jul 07 01:07:08 ok (0.059s)
Jul 07 01:07:08   test_cond_cpu_float32 (__main__.TestLinalgCPU) ... 
Jul 07 01:07:08 Intel MKL ERROR: Parameter 4 was incorrect on entry to DLASCL.
Jul 07 01:07:08 
Jul 07 01:07:08 Intel MKL ERROR: Parameter 5 was incorrect on entry to DLASCL.
Jul 07 01:07:08 ok (0.045s)
Jul 07 01:07:08   test_cond_cpu_float64 (__main__.TestLinalgCPU) ... 
Jul 07 01:07:08 Intel MKL ERROR: Parameter 4 was incorrect on entry to DLASCL.
Jul 07 01:07:08 
Jul 07 01:07:08 Intel MKL ERROR: Parameter 5 was incorrect on entry to DLASCL.
Jul 07 01:07:08 ok (0.038s)
Jul 07 01:07:08   test_cond_errors_and_warnings_cpu_complex128 (__main__.TestLinalgCPU) ... ok (0.067s)
Jul 07 01:07:08   test_cond_errors_and_warnings_cpu_complex64 (__main__.TestLinalgCPU) ... ok (0.066s)
Jul 07 01:07:08   test_cond_errors_and_warnings_cpu_float32 (__main__.TestLinalgCPU) ... ok (0.066s)
Jul 07 01:07:08   test_cond_errors_and_warnings_cpu_float64 (__main__.TestLinalgCPU) ... ok (0.066s)
Jul 07 01:07:08   test_cross_cpu_float32 (__main__.TestLinalgCPU) ... ok (0.004s)
Jul 07 01:07:08   test_cross_errors_cpu (__main__.TestLinalgCPU) ... ok (0.037s)
Jul 07 01:07:08   test_cross_with_and_without_dim_cpu_float32 (__main__.TestLinalgCPU) ... ok (0.003s)
Jul 07 01:07:08   test_det_cpu_complex128 (__main__.TestLinalgCPU) ... ok (0.026s)
Jul 07 01:07:08   test_det_cpu_float64 (__main__.TestLinalgCPU) ... ok (0.021s)

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test2 (2/2)

Step: "Run tests"

Jul 07 00:29:56 SUMMARY: UndefinedBehaviorSanit.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in
Jul 07 00:29:56     #9 0x55752c72f8f2 in PyEval_EvalCode /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/ceval.c:731
Jul 07 00:29:56     #10 0x55752c797cd5 in run_mod /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:1025
Jul 07 00:29:56     #11 0x55752c799d5d in PyRun_StringFlags /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:949
Jul 07 00:29:56     #12 0x55752c799dbb in PyRun_SimpleStringFlags /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:445
Jul 07 00:29:56     #13 0x55752c79a926 in run_command /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Modules/main.c:301
Jul 07 00:29:56     #14 0x55752c79a926 in Py_Main /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Modules/main.c:749
Jul 07 00:29:56     #15 0x55752c6d4196 in main /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Programs/python.c:69
Jul 07 00:29:56     #16 0x7fe38a7f883f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291
Jul 07 00:29:56     #17 0x55752c76433d in _start (/opt/conda/bin/python3.6+0x1a733d)
Jul 07 00:29:56 
Jul 07 00:29:56 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in 
Jul 07 00:29:56 + retcode=1
Jul 07 00:29:56 + set -e
Jul 07 00:29:56 + return 1
Jul 07 00:29:56 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 == *-NO_AVX-* ]]
Jul 07 00:29:56 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 == *-NO_AVX2-* ]]
Jul 07 00:29:56 + '[' -n https://github.com/pytorch/pytorch/pull/61235 ']'
Jul 07 00:29:56 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 != *coverage* ]]
Jul 07 00:29:56 ++ mktemp
Jul 07 00:29:56 + DETERMINE_FROM=/tmp/tmp.u9watWAmNd
Jul 07 00:29:56 + file_diff_from_base /tmp/tmp.u9watWAmNd

1 job timed out:

  • pytorch_linux_xenial_py3_6_gcc5_4_test

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

@VitalyFedyunin VitalyFedyunin requested a review from ejguan July 5, 2021 16:55
Contributor

@ejguan ejguan left a comment

LGTM

self.assertEqual(count, len(self.temp_files))


# TODO(VitalyFedyunin): Generates unclosed buffer warning, need to investigate
Contributor

I looked at this problem before but could not find a feasible solution.
Using tar as an example, we open the tar file here:

tar = tarfile.open(fileobj=cast(Optional[IO[bytes]], data_stream), mode="r:*")

and attach a reference to this tar file handle to each file extracted from it, so that the source tar file handle is not closed after the yield here:
# Add a reference of the source tarfile into extracted_fobj, so the source
# tarfile handle won't be released until all the extracted file objs are destroyed.
extracted_fobj.source_ref = tar # type: ignore[attr-defined]

I opened PR #58938 to close each file explicitly after it is read. However, that does not close the referenced tar file handle, so Python's garbage collector ends up closing the tar file stream, which generates this warning.
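
For illustration, here is a minimal standalone sketch of the situation described above, using only the standard library. The helper names `make_tar_bytes` and `read_members` are hypothetical and are not the datapipe code from this PR; the sketch only shows that closing an extracted file object leaves the source tar handle open.

```python
import io
import tarfile


def make_tar_bytes() -> bytes:
    """Build a tiny in-memory tar archive with one member (test fixture)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        payload = b"hello"
        info = tarfile.TarInfo(name="a.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_members(data_stream):
    """Yield (name, file object) pairs, mirroring the pattern quoted above."""
    tar = tarfile.open(fileobj=data_stream, mode="r:*")
    for info in tar:
        if not info.isfile():
            continue
        extracted = tar.extractfile(info)
        # Keep the source tar alive for as long as the extracted file exists.
        extracted.source_ref = tar
        yield info.name, extracted
    # Nothing here closes `tar`; cleanup is left to garbage collection.


stream = io.BytesIO(make_tar_bytes())
results = []
for name, fobj in read_members(stream):
    content = fobj.read()
    fobj.close()  # closes only the extracted member
    # Record whether the tar handle was closed by closing the member.
    results.append((name, content, fobj.source_ref.closed))
```

In this sketch the tar handle stays open after every member has been closed, so something other than the per-file `close()` has to release it.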

captured_connections.append(obj)
return stub_unpickler, ()

# TODO(VitalyFedyunin): Better do it as `with` context for safety
Contributor

Agree with the comment. We also need to take care of any existing reduce_ex_hook before we set our hook to capture the connected datapipes.
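
A rough sketch of what a `with`-based version could look like, assuming the hook is stored as a class-level attribute. `DataPipeStub` and `temporary_reduce_ex_hook` are hypothetical names used only for illustration; they are not the actual datapipe classes or API, and the real setter may behave differently (for example, it may reject overwriting an existing hook).

```python
from contextlib import contextmanager


class DataPipeStub:
    """Hypothetical stand-in for a datapipe class with a class-level hook."""

    reduce_ex_hook = None

    @classmethod
    def set_reduce_ex_hook(cls, hook_fn):
        cls.reduce_ex_hook = hook_fn


@contextmanager
def temporary_reduce_ex_hook(cls, hook_fn):
    """Install hook_fn for the duration of the block, then restore the old hook."""
    previous = cls.reduce_ex_hook  # remember whatever hook was already set
    cls.set_reduce_ex_hook(hook_fn)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, so the old hook always returns.
        cls.set_reduce_ex_hook(previous)
```

Usage would be along the lines of `with temporary_reduce_ex_hook(DataPipeStub, capture_hook): ...`, with the pickling call placed inside the block.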

@VitalyFedyunin
Contributor Author

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@VitalyFedyunin merged this pull request in f285788.

@facebook-github-bot facebook-github-bot deleted the gh/VitalyFedyunin/170/head branch July 16, 2021 14:17