This repository has been archived by the owner on Jul 17, 2023. It is now read-only.

Error while executing python code on devcloud: killed python file #34

Closed
sparshgup opened this issue Jul 26, 2021 · 0 comments

sparshgup commented Jul 26, 2021

I'm trying to run a Python file on DevCloud. The job script, job.sh, is as follows:

#!/bin/bash
source /opt/intel/inteloneapi/setvars.sh > /dev/null 2>&1
python master.py

I submit it with the following command from a Mac terminal:

qsub -l nodes=1:xeon:batch:ppn=2 -d . job.sh

The job ran for about 3 hours and produced two output files: job.sh.e934264 and job.sh.o934264.

The job.sh.e934264 file is as follows:

2021-07-26 03:49:45.014693: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /glob/development-tools/versions/oneapi/2021.3/inteloneapi/vpl/2021.4.0/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/tbb/2021.3.0/env/../lib/intel64/gcc4.8:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/rkcommon/1.6.1/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/ospray_studio/0.7.0/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/ospray/2.6.0/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/openvkl/0.13.0/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/oidn/1.4.0/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/mpi/2021.2.0//libfabric/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/mpi/2021.2.0//lib/release:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/mpi/2021.2.0//lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/mkl/2021.3.0/lib/intel64:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/itac/2021.3.0/slib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/ipp/2021.3.0/lib/intel64:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/ippcp/2021.3.0/lib/intel64:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/ipp/2021.3.0/lib/intel64:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/embree/3.13.0/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/dnnl/2021.3.0/cpu_dpcpp_gpu_dpcpp/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/debugger/10.1.2/gdb/intel64/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/debugger/10.1.2/libipt/intel64/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/debugger/10.1.2/dep/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/dal/2021.3.0/lib/intel64:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/x64:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/emu:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/oclfpga/host/linux64/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/oclfpga/linux64/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/compiler/lib/intel64_lin:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/ccl/2021.3.0/lib/cpu_gpu_dpcpp
2021-07-26 03:49:45.014777: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-07-26 03:49:50.062319: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /glob/development-tools/versions/oneapi/2021.3/inteloneapi/vpl/2021.4.0/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/tbb/2021.3.0/env/../lib/intel64/gcc4.8:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/rkcommon/1.6.1/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/ospray_studio/0.7.0/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/ospray/2.6.0/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/openvkl/0.13.0/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/oidn/1.4.0/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/mpi/2021.2.0//libfabric/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/mpi/2021.2.0//lib/release:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/mpi/2021.2.0//lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/mkl/2021.3.0/lib/intel64:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/itac/2021.3.0/slib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/ipp/2021.3.0/lib/intel64:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/ippcp/2021.3.0/lib/intel64:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/ipp/2021.3.0/lib/intel64:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/embree/3.13.0/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/dnnl/2021.3.0/cpu_dpcpp_gpu_dpcpp/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/debugger/10.1.2/gdb/intel64/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/debugger/10.1.2/libipt/intel64/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/debugger/10.1.2/dep/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/dal/2021.3.0/lib/intel64:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/x64:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/emu:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/oclfpga/host/linux64/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/oclfpga/linux64/lib:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/compiler/lib/intel64_lin:/glob/development-tools/versions/oneapi/2021.3/inteloneapi/ccl/2021.3.0/lib/cpu_gpu_dpcpp
2021-07-26 03:49:50.062403: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-07-26 03:49:50.062449: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (s001-n061): /proc/driver/nvidia/version does not exist
2021-07-26 03:49:50.062948: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-26 03:52:31.660446: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-07-26 03:52:31.679568: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 3400000000 Hz
/var/spool/torque/mom_priv/jobs/934264.v-qsvr-1.aidevcloud.SC: line 4: 110188 Killed python master.py
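
For context on the last message in the log: bash prints "Killed" when the command it launched was terminated by SIGKILL, which a process cannot catch. That would explain why no Python traceback appears. The log does not say what sent the signal. Common sources are the kernel's out-of-memory killer or the batch system enforcing a limit, but both are assumptions here. Bash reports such a termination as exit status 128 + N for signal N. The sketch below uses a hypothetical helper, `describe_exit_status`, to decode a status of that kind; the status itself could be captured by adding `echo "exit status: $?"` right after the `python master.py` line in job.sh.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Translate a bash exit status into a readable description.

    Bash reports 128 + N when a command is terminated by signal N.
    """
    if status > 128:
        try:
            name = signal.Signals(status - 128).name
        except ValueError:
            return f"exit status {status} (no matching signal)"
        return f"terminated by {name} (signal {status - 128})"
    return f"exited with status {status}"


# 137 = 128 + 9, the status bash would report for a SIGKILL-ed command.
print(describe_exit_status(137))  # terminated by SIGKILL (signal 9)
```
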

The job.sh.o934264 file is as follows:

########################################################################
Date: Mon 26 Jul 2021 03:49:38 AM PDT
Job ID: 934264.v-qsvr-1.aidevcloud
User: u65358
Resources: neednodes=1:xeon:batch:ppn=2,nodes=1:xeon:batch:ppn=2,walltime=06:00:00
########################################################################

########################################################################
End of output for job 934264.v-qsvr-1.aidevcloud
Date: Mon 26 Jul 2021 06:52:21 AM PDT
########################################################################
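
The two Date lines in this file bracket the run, and the Resources line shows walltime=06:00:00. As a quick check on whether the job could have hit that limit, the following minimal sketch computes the elapsed time from the two banner lines. The helper name `parse_banner_date` is made up for illustration, and the time-zone suffix is simply dropped because both lines use the same zone.

```python
from datetime import datetime, timedelta


def parse_banner_date(line: str) -> datetime:
    """Parse a 'Date: Mon 26 Jul 2021 03:49:38 AM PDT' banner line.

    The trailing time-zone abbreviation is dropped because strptime's %Z
    does not reliably accept names such as PDT; both lines share the zone,
    so the difference between them is unaffected.
    """
    text = line.split("Date:", 1)[1].strip()
    without_zone = text.rsplit(" ", 1)[0]
    return datetime.strptime(without_zone, "%a %d %b %Y %I:%M:%S %p")


start = parse_banner_date("Date: Mon 26 Jul 2021 03:49:38 AM PDT")
end = parse_banner_date("Date: Mon 26 Jul 2021 06:52:21 AM PDT")
elapsed = end - start
walltime = timedelta(hours=6)  # from walltime=06:00:00 in the Resources line

print(elapsed)             # 3:02:43
print(elapsed < walltime)  # True
```

The run ended after roughly three hours, well short of the six-hour walltime, so on this evidence the walltime limit alone does not account for the kill.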


The script did not produce its expected output, and the job ended with the error above. Can someone please help me with this? Thanks.
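
One way to narrow this down is to record memory use while the script runs, since a process that receives SIGKILL gets no chance to report anything on exit. The following is a minimal sketch, not a confirmed diagnosis: it assumes the compute node runs Linux (where `ru_maxrss` is reported in KiB) and that a call to the hypothetical `start_memory_logger` can be added near the top of master.py. Whether memory is actually the cause is an assumption that the log neither confirms nor rules out.

```python
import resource
import sys
import threading
import time


def peak_rss_mib() -> float:
    """Peak resident set size of this process in MiB (Linux reports ru_maxrss in KiB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def start_memory_logger(interval_s: float = 60.0) -> threading.Thread:
    """Print peak RSS to stderr every interval_s seconds from a daemon thread."""

    def _loop() -> None:
        while True:
            # flush=True so the line reaches the job's error file
            # even if the process is killed shortly afterwards.
            print(f"[mem] peak RSS: {peak_rss_mib():.0f} MiB",
                  file=sys.stderr, flush=True)
            time.sleep(interval_s)

    thread = threading.Thread(target=_loop, name="mem-logger", daemon=True)
    thread.start()
    return thread
```

Calling `start_memory_logger()` once at startup would leave a trail of `[mem]` lines in the job's error file; if the last values before "Killed" approach the node's memory, that points toward an out-of-memory kill.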
