This repository was archived by the owner on Nov 13, 2024. It is now read-only.

device_lib.list_local_devices() doesn't return in the CUDA build up to 2080 #5515

@Twenkid

Every batch script hangs. I traced it, and execution freezes inside TensorFlow at the call to device_lib.list_local_devices().
In: C:\DFL\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\DeepFaceLab\core\leras\device.py
GPU: GeForce GTX 750 Ti
OS: Windows 10

```python
import tensorflow as tf
from tensorflow.python.client import device_lib
print(f"list_local_devices()={device_lib.list_local_devices()}")
```
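Because the call blocks the whole interpreter, it can help to run the same snippet in a child process with a timeout, so a call that never returns can be told apart from one that is merely slow. The following is a minimal sketch, not part of DeepFaceLab or TensorFlow: `run_probe` and `PROBE_SNIPPET` are hypothetical names, and the approach assumes the child is started with the build's own python after setenv.bat has been run.

```python
import subprocess
import sys

# Hypothetical probe: the same two lines as the repro above, run in a child process.
PROBE_SNIPPET = (
    "from tensorflow.python.client import device_lib\n"
    "print(device_lib.list_local_devices())\n"
)


def run_probe(snippet, timeout_s):
    """Run `snippet` in a fresh interpreter.

    Returns (finished, output). `finished` is False when the child did not
    exit within `timeout_s` seconds; subprocess.run kills the child then.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # TensorFlow logs to stderr; merge it in
            universal_newlines=True,    # text mode, works on Python 3.6
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, ""
    return True, result.stdout
```

Calling `run_probe(PROBE_SNIPPET, timeout_s=600)` would then report either the device list or a timeout; the 600 second value is an arbitrary choice for illustration.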

I tried several things. I checked whether the newer CUDA installed on the system could be causing an incompatibility, but it shouldn't, since the build has its own directory and uses an old TensorFlow 1.13. The paths are set by setenv.bat, but I also added them to the system's Path, and I tried copying the .dll files into both the folder with the .bat files and the folder with main.py.

I've been using the DirectX12 version as an alternative. The GPU is a 750 Ti, and at first I thought it was simply too old, but I just discovered that it is supposed to work, since it supports newer CUDA versions. There is also no error message; the call to list_local_devices() simply doesn't return.

If I run setenv.bat, start the build's python, then import tensorflow and call list_local_devices() interactively, the function recognizes the GPU and prints correct output, but then the CLI session hangs. The system also has an integrated Intel HD 530 GPU.

I understand that this looks like a TensorFlow or driver issue, but has anyone solved it? Thanks.

```
c:\DFL\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8>python
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
c:\DFL\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
c:\DFL\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
c:\DFL\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
c:\DFL\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
c:\DFL\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
c:\DFL\DeepFaceLab_NVIDIA_up_to_RTX2080Ti\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
>>>
>>> from tensorflow.python.client import device_lib
>>> print(f"list_local_devices()={device_lib.list_local_devices()}")
2022-05-09 22:11:18.429936: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2022-05-09 22:11:18.551876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.0845
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 194.50MiB
2022-05-09 22:11:18.552651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
```
Excerpt from device.py with my debug prints added:

```python
    @staticmethod
    def _get_tf_devices_proc(q : multiprocessing.Queue):
        print("_get_tf_devices_proc")
        print(sys.platform[0:3])
        if sys.platform[0:3] == 'win':
            # Point the CUDA kernel cache at a dedicated folder under %APPDATA%
            compute_cache_path = Path(os.environ['APPDATA']) / 'NVIDIA' / ('ComputeCache_ALL')
            os.environ['CUDA_CACHE_PATH'] = str(compute_cache_path)
            print(f"CUDA_CACHE_PATH={os.environ['CUDA_CACHE_PATH']}")

            if not compute_cache_path.exists():
                io.log_info("Caching GPU kernels...")
                compute_cache_path.mkdir(parents=True, exist_ok=True)

        import tensorflow

        tf_version = tensorflow.version.VERSION
        print(f"tf_version={tf_version}")
        #if tf_version is None:
        #    tf_version = tensorflow.version.GIT_VERSION
        if tf_version[0] == 'v':
            tf_version = tf_version[1:]
        if tf_version[0] == '2':
            tf = tensorflow.compat.v1
        else:
            tf = tensorflow

        import logging
        # Disable tensorflow warnings
        tf_logger = logging.getLogger('tensorflow')
        tf_logger.setLevel(logging.ERROR)

        from tensorflow.python.client import device_lib
        print("AFTER: from tensorflow.python.client import device_lib")
        devices = []

        print(f"list_local_devices()={device_lib.list_local_devices()}")  ### HANGS HERE ###

        physical_devices = device_lib.list_local_devices()
        physical_devices_f = {}
        print("BEFORE: for dev in physical_devices:")
```
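The excerpt points CUDA_CACHE_PATH at a ComputeCache_ALL folder under %APPDATA%\NVIDIA and logs "Caching GPU kernels..." when that folder does not exist yet. A rough way to see whether anything is being written there while the call appears stuck is sketched below. `default_cache_path` and `describe_cache` are hypothetical helper names, and the idea that a growing cache folder means the call is still doing work is an assumption, not something confirmed in this report.

```python
import os
from pathlib import Path


def default_cache_path(env=os.environ):
    # Same location the excerpt builds: APPDATA / NVIDIA / ComputeCache_ALL
    return Path(env["APPDATA"]) / "NVIDIA" / "ComputeCache_ALL"


def describe_cache(path):
    """Return (exists, file_count, total_bytes) for a cache directory."""
    path = Path(path)
    if not path.exists():
        return False, 0, 0
    # Walk the folder recursively and count regular files only
    files = [p for p in path.rglob("*") if p.is_file()]
    return True, len(files), sum(p.stat().st_size for p in files)
```

Calling `describe_cache(default_cache_path())` a few times from a second console while the first one is stuck would show whether the file count or total size changes.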
