dvc pull: unsupported operand type(s) for +: 'builtin_function_or_method' and 'float' when pulling from HDFS #8642

@ghost

Bug Report

Description

When pulling from HDFS, dvc pull fails with a TypeError: unsupported operand type(s) for +: 'builtin_function_or_method' and 'float'. The error is raised at if total and n >= (total + 0.5) in tqdm/std.py, where total turns out to be <built-in method size of pyarrow.lib.NativeFile object at 0x7f60a7f4c8d0> instead of a number.

Here are some clues I found:

  1. In the get_file function of fsspec/spec.py, callback.set_size expects an integer or None, but getattr(f1, "size", None) returns a built-in method. See details below.
# rpath is like "/user/abc/dvc/ae/851170d5f32eedd5d3f1d3d2af73c6.dir" in HDFS
with self.open(rpath, "rb", **kwargs) as f1: 
    callback.set_size(getattr(f1, "size", None)) # built-in method size of pyarrow.lib.NativeFile object
    data = True
    while data:
        data = f1.read(self.blocksize)
        segment_len = outfile.write(data)
        if segment_len is None:
            segment_len = len(data)
        callback.relative_update(segment_len)
  2. The progress_bar, which is a Tqdm, then sets its total to that built-in method. When total + 0.5 is evaluated in the format_meter function, the error occurs.
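
The two steps above can be reproduced without HDFS. The following is a minimal sketch, not the fsspec or DVC fix: FakeNativeFile is a hypothetical stand-in for pyarrow.lib.NativeFile (whose size is a method rather than an attribute), and resolve_size is a hypothetical helper showing one way the value could be normalized before it reaches callback.set_size.

```python
import io


class FakeNativeFile(io.BytesIO):
    """Hypothetical stand-in for pyarrow.lib.NativeFile: `size` is a method."""

    def size(self):
        return len(self.getvalue())


def resolve_size(f):
    """Hypothetical helper: return the size as an int, or None if unknown."""
    size = getattr(f, "size", None)
    if callable(size):
        # File objects that expose size() as a method end up here.
        size = size()
    return size if isinstance(size, int) else None


f1 = FakeNativeFile(b"x" * 10)

# What get_file passes on: the bound method itself, not a number.
raw = getattr(f1, "size", None)

# The same kind of arithmetic tqdm performs on `total`.
try:
    raw + 0.5
    failed = False
except TypeError:
    failed = True

# Normalizing first yields a value that is safe to use as a progress total.
total = resolve_size(f1)
print(failed, total)
```

With a normalized value, the progress bar receives either an integer or None, which is what callback.set_size expects according to the description above.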

Reproduce

# under directory A
1. dvc init
2. Copy dataset.zip to the directory
3. dvc add
4. dvc remote add -d dvc_hdfs hdfs://abc@cluster-szth/user/abc/dvc
5. dvc push
# under directory B of the same machine
1. git clone ...
2. dvc pull

dvc pull then fails with: unsupported operand type(s) for +: 'builtin_function_or_method' and 'float'

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.34.2 (pip)
---------------------------------
Platform: Python 3.8.13 on Linux-5.4.0-124-generic-x86_64-with-glibc2.17
Subprojects:
	dvc_data = 0.25.3
	dvc_objects = 0.12.2
	dvc_render = 0.0.12
	dvc_task = 0.1.5
	dvclive = 1.0.1
	scmrepo = 0.1.3
Supports:
	hdfs (fsspec = 2022.11.0, pyarrow = 10.0.0),
	http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Caches: local
Remotes: hdfs
Workspace directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Repo: dvc, git
tqdm: None
self format 3: None

Additional Information (if any):

Labels: fs: hdfs (Related to the HDFS filesystem)