Closed as not planned
Labels: fs: hdfs (Related to the HDFS filesystem)

Bug Report
Description
When pulling from HDFS, the error `unsupported operand type(s) for +: 'builtin_function_or_method' and 'float'` is raised at `if total and n >= (total + 0.5)` in `tqdm/std.py`. The `total` turns out to be `<built-in method size of pyarrow.lib.NativeFile object at 0x7f60a7f4c8d0>`.
Here are some clues I found:
- In the function `get_file` of `fsspec/spec.py`, `callback.set_size` expects an integer or `None`, but `getattr(f1, "size", None)` returns a built-in method. See details below.
    # rpath is like "/user/abc/dvc/ae/851170d5f32eedd5d3f1d3d2af73c6.dir" in HDFS
    with self.open(rpath, "rb", **kwargs) as f1:
        callback.set_size(getattr(f1, "size", None))  # built-in method size of pyarrow.lib.NativeFile object
        data = True
        while data:
            data = f1.read(self.blocksize)
            segment_len = outfile.write(data)
            if segment_len is None:
                segment_len = len(data)
            callback.relative_update(segment_len)
- The `progress_bar`, which is a `Tqdm`, then has its `total` set to that built-in method. When `total + 0.5` is evaluated in the function `format_meter`, the error occurs.
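The two observations above can be reproduced without HDFS. The following is only a minimal sketch: `FakeNativeFile` is a hypothetical stand-in for `pyarrow.lib.NativeFile`, assumed to share the one relevant trait, namely that `size` is a method rather than an attribute. Because the stand-in is pure Python, the type name in the message is `'method'` rather than `'builtin_function_or_method'`.

```python
class FakeNativeFile:
    """Hypothetical stand-in: `size` is a method, not an integer attribute."""

    def __init__(self, payload: bytes):
        self._payload = payload

    def size(self) -> int:
        return len(self._payload)


f1 = FakeNativeFile(b"x" * 1024)

# Same lookup as in get_file: this yields the bound method, not an integer.
total = getattr(f1, "size", None)
is_method = callable(total)

n = 0
try:
    # Same comparison as the one quoted from tqdm/std.py. A method object is
    # truthy, so the right-hand side is evaluated and the addition fails.
    finished = bool(total and n >= (total + 0.5))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(is_method)
print(error_message)
```

Calling `total()` instead of using `total` directly would give the integer that the progress bar expects.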
Reproduce
# under directory A
1. dvc init
2. Copy dataset.zip to the directory
3. dvc add
4. dvc remote add -d dvc_hdfs hdfs://abc@cluster-szth/user/abc/dvc
5. dvc push
# under directory B of the same machine
1. git clone ...
2. dvc pull
`dvc pull` then fails with `unsupported operand type(s) for +: 'builtin_function_or_method' and 'float'`.
Environment information
Output of `dvc doctor`:
$ dvc doctor
DVC version: 2.34.2 (pip)
---------------------------------
Platform: Python 3.8.13 on Linux-5.4.0-124-generic-x86_64-with-glibc2.17
Subprojects:
    dvc_data = 0.25.3
    dvc_objects = 0.12.2
    dvc_render = 0.0.12
    dvc_task = 0.1.5
    dvclive = 1.0.1
    scmrepo = 0.1.3
Supports:
    hdfs (fsspec = 2022.11.0, pyarrow = 10.0.0),
    http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Caches: local
Remotes: hdfs
Workspace directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Repo: dvc, git
tqdm: None
self format 3: None
Additional Information (if any):
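One possible guard is to normalize the size before it is passed to `callback.set_size`. This is only a sketch under assumptions, not the fix adopted by fsspec or DVC: `resolve_size` is a hypothetical helper name, and it assumes that a callable `size` takes no arguments and returns the length in bytes.

```python
from typing import Any, Optional


def resolve_size(fileobj: Any) -> Optional[int]:
    """Return the file size as an int, or None when it cannot be determined.

    Hypothetical helper: handles both `size` attributes and `size()` methods.
    """
    size = getattr(fileobj, "size", None)
    if callable(size):
        # Some file objects expose size() as a method instead of an attribute.
        try:
            size = size()
        except Exception:
            return None
    # Anything that is not an integer is treated as "unknown".
    return size if isinstance(size, int) else None
```

With such a helper, the call in `get_file` would read `callback.set_size(resolve_size(f1))`, so the progress bar receives either an integer or `None`.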