Bug report
Bug description:
Hi, I found a minor bug in Lib/tarfile.py: _proc_gnulong strips trailing slashes from directory
names using removesuffix("/"), with a comment stating this is "to be
consistent with frombuf()". However, frombuf() (and _proc_builtin) use
rstrip("/"). The two are not equivalent:
- rstrip("/") removes all trailing slashes.
- removesuffix("/") removes at most one trailing slash.
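The difference can be seen with plain str methods alone, without tarfile (removesuffix needs Python 3.9+). This is only an illustration; compare is a made-up helper, not part of tarfile:

```python
def compare(name: str) -> tuple[str, str]:
    """Hypothetical helper: return (rstrip result, removesuffix result)."""
    return name.rstrip("/"), name.removesuffix("/")


for name in ["dir", "dir/", "dir//", "dir///"]:
    stripped, suffix_removed = compare(name)
    print(f"{name!r}: rstrip -> {stripped!r}, removesuffix -> {suffix_removed!r}")

# The line for 'dir//' reads: 'dir//': rstrip -> 'dir', removesuffix -> 'dir/'
```

The two only diverge once a name ends in two or more slashes, which is exactly the case described below.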
For tar archives in GNU_FORMAT, a directory entry whose name is long enough
to require a GNU long-name header AND ends with multiple slashes is normalized
differently from a short directory entry with the same trailing-slash pattern.
This contradicts the stated intent of the comment.
Code locations (line numbers from current main):
```python
# Lib/tarfile.py: frombuf
if obj.isdir():
    obj.name = obj.name.rstrip("/")

# Lib/tarfile.py: _proc_builtin
if self.isdir():
    self.name = self.name.rstrip("/")

# Lib/tarfile.py: _proc_gnulong (GNU long-name path) ← inconsistent
# Remove redundant slashes from directories. This is to be consistent
# with frombuf().
if next.isdir():
    next.name = next.name.removesuffix("/")
```
Reproducer
```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tf:
    # Long name (>100 chars) forces a GNU long-name header.
    long_name = ("d" * 150) + "//"
    info = tarfile.TarInfo(name=long_name)
    info.type = tarfile.DIRTYPE
    tf.addfile(info)

    # Short name fits in the regular 100-byte name field.
    short_name = "shortdir//"
    info2 = tarfile.TarInfo(name=short_name)
    info2.type = tarfile.DIRTYPE
    tf.addfile(info2)

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tf:
    for m in tf.getmembers():
        print(repr(m.name))
```
Actual output:
```
'dddd…dd/'  # 151 chars: only one slash stripped (via _proc_gnulong)
'shortdir'  # 8 chars: all slashes stripped (via frombuf / _proc_builtin)
```
Expected: both entries should be normalized identically (rstrip semantics,
matching what the comment claims).
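The output comments attribute each name to a different parser path. One way to confirm which header layout each entry actually gets is to read the type flag of the first 512-byte header block (byte offset 156 in the ustar layout): b"L" (tarfile.GNUTYPE_LONGNAME) means a GNU long-name pseudo-header precedes the real header, so the entry is read through _proc_gnulong. This is a sketch; build_dir_archive and first_typeflag are hypothetical helpers:

```python
import io
import tarfile

# name(100) + mode(8) + uid(8) + gid(8) + size(12) + mtime(12) + chksum(8)
TYPEFLAG_OFFSET = 156


def build_dir_archive(name: str) -> bytes:
    """Hypothetical helper: a GNU_FORMAT archive holding one directory entry."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tf:
        info = tarfile.TarInfo(name=name)
        info.type = tarfile.DIRTYPE
        tf.addfile(info)
    return buf.getvalue()


def first_typeflag(data: bytes) -> bytes:
    """Hypothetical helper: type flag byte of the archive's first header block."""
    return data[TYPEFLAG_OFFSET:TYPEFLAG_OFFSET + 1]


for name in [("d" * 150) + "//", "shortdir//"]:
    flag = first_typeflag(build_dir_archive(name))
    if flag == tarfile.GNUTYPE_LONGNAME:
        path = "_proc_gnulong"
    else:
        path = "frombuf / _proc_builtin"
    print(len(name), flag, path)
```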
Suggested fix
Change removesuffix("/") to rstrip("/") in _proc_gnulong, matching the
comment's stated intent and the behavior of frombuf / _proc_builtin.
A regression test using a long directory name with multiple trailing slashes
should be added to Lib/test/test_tarfile.py.
I'd be happy to submit a PR.
CPython versions tested on:
3.14
Operating systems tested on:
Windows
Linked PRs