join method for list and tuple #77395

Closed
JavierDehesa mannequin opened this issue Apr 3, 2018 · 10 comments
Labels
3.8 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@JavierDehesa
Mannequin

JavierDehesa mannequin commented Apr 3, 2018

BPO 33214
Nosy @tiran, @merwok, @serhiy-storchaka, @MojoVampire, @Savier

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2018-04-03.14:33:53.200>
labels = ['3.8', 'type-feature', 'library']
title = 'join method for list and tuple'
updated_at = <Date 2019-09-16.10:04:49.392>
user = 'https://bugs.python.org/JavierDehesa'

bugs.python.org fields:

activity = <Date 2019-09-16.10:04:49.392>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2018-04-03.14:33:53.200>
creator = 'Javier Dehesa'
dependencies = []
files = []
hgrepos = []
issue_num = 33214
keywords = []
message_count = 9.0
messages = ['314881', '314882', '314883', '314885', '352387', '352530', '352531', '352532', '352534']
nosy_count = 6.0
nosy_names = ['christian.heimes', 'eric.araujo', 'serhiy.storchaka', 'josh.r', 'Javier Dehesa', 'iamsav']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue33214'
versions = ['Python 3.8']

@JavierDehesa
Mannequin Author

JavierDehesa mannequin commented Apr 3, 2018

It is pretty trivial to concatenate a sequence of strings:

''.join([str1, str2, ...])

Concatenating a sequence of lists is for some reason significantly more convoluted. Some current options include:

    sum([lst1, lst2, ...], [])
    [x for y in [lst1, lst2, ...] for x in y]
    list(itertools.chain(lst1, lst2, ...))

The first is the least recommendable but the most intuitive, and the third is the fastest but the most cumbersome (see https://stackoverflow.com/questions/49631326/why-is-itertools-chain-faster-than-a-flattening-list-comprehension ). None of these looks like "the one obvious way to do it" to me. Furthermore, I feel a dedicated concatenation method could be more efficient than any of these approaches.
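For illustration, here is a small sketch that runs the three options on concrete data and confirms they produce the same list. The sample lists and variable names are made up for the example.

```python
import itertools

# Hypothetical sample data, not taken from the original report.
lst1, lst2, lst3 = [1, 2], [3], [4, 5]

# Option 1: sum() with an empty-list start value. Every addition builds a
# brand-new list, so the repeated copying becomes quadratic for many lists.
via_sum = sum([lst1, lst2, lst3], [])

# Option 2: nested list comprehension.
via_comprehension = [x for y in [lst1, lst2, lst3] for x in y]

# Option 3: itertools.chain, materialized into a list.
via_chain = list(itertools.chain(lst1, lst2, lst3))

print(via_sum, via_comprehension, via_chain)
```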

If we accept that ''.join(...) is an intuitive idiom, why not provide the syntax:

[].join([lst1, lst2, ...])

And while we are at it:

().join([tpl1, tpl2, ...])

Like with str, these methods should only accept sequences of objects of their own class (e.g. we could do [].join(list(s) for s in seqs) if seqs contains lists, tuples and generators). The use case for non-empty joiners would probably be less frequent than for strings, but it also solves a problem that has no clean solution with the current tools. Here is what I would probably do to join a sequence of lists with [None, 'STOP', None]:

lsts = [lst1, lst2, ...]
joiner = [None, 'STOP', None]
lsts_joined = list(itertools.chain.from_iterable(lst + joiner for lst in lsts))[:-len(joiner)]

Which is awful and inefficient (I am not saying this is the best or only possible way to solve it; it is just what I, a self-considered experienced Python developer, might write).
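As a rough sketch of the behaviour being asked for, the proposed method could be approximated today with a plain function. The name `join_lists` and the sample data are hypothetical; this is not an existing list method. Unlike the one-liner above, it also copes with an empty joiner, where the slice `[:-len(joiner)]` becomes `[:0]` and discards the whole result.

```python
def join_lists(joiner, lists):
    """Concatenate lists, inserting joiner's items between neighbours.

    Hypothetical helper approximating what the proposed
    joiner.join(lists) might do; not part of the standard library.
    """
    result = []
    for index, lst in enumerate(lists):
        if index:  # no joiner before the first list
            result.extend(joiner)
        result.extend(lst)
    return result


stop_joined = join_lists([None, 'STOP', None], [[1, 2], [3], [4, 5]])
plain_joined = join_lists([], [[1, 2], [3]])
print(stop_joined)
print(plain_joined)
```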

@JavierDehesa JavierDehesa mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Apr 3, 2018
@tiran
Member

tiran commented Apr 3, 2018

join() is a bad choice, because new developers will confuse list.join with str.join.

We could turn list.extend(iterable) into list.extend(*iterable). Or you could just use extend with a chain iterator:

>>> l = []
>>> l.extend(itertools.chain([1], [2], [3]))
>>> l
[1, 2, 3]

@tiran tiran added the 3.8 only security fixes label Apr 3, 2018
@JavierDehesa
Mannequin Author

JavierDehesa mannequin commented Apr 3, 2018

Thanks Christian. I thought of join precisely because it performs conceptually the same function as with str, so the parallel between ''.join(), [].join() and ().join() looked more obvious. Also there is os.path.join and PurePath.joinpath, so the verb seemed well-established. As for shared method names, index and count are present both in sequences and str, although it is true that these return the same kind of object in all cases.

I'm not saying your points aren't valid, though. Your proposed way with extend is, I guess, about the same as list(itertools.chain(...)), which could be considered enough. I just feel that it is not particularly convenient, especially for newer developers, who will probably gravitate towards sum(...) more than itertools or a nested generator expression, but I may be wrong.

@serhiy-storchaka
Member

String concatenation: f'{a}{b}{c}'
List concatenation: [*a, *b, *c]
Tuple concatenation: (*a, *b, *c)
Set union: {*a, *b, *c}
Dict merging: {**a, **b, **c}
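A runnable sketch of these unpacking forms, using made-up sample values, might look like this. Note that each form needs the operands to be spelled out individually in the source code.

```python
# Hypothetical sample values for illustration only.
a, b, c = [1, 2], [3], [4, 5]

as_list = [*a, *b, *c]
as_tuple = (*a, *b, *c)
as_set = {*a, *b, *c}

# Dict merging: for duplicate keys, the rightmost mapping wins.
d1, d2 = {'x': 1, 'y': 2}, {'y': 20, 'z': 30}
merged = {**d1, **d2}

# The f-string form, applied to three strings.
s1, s2, s3 = 'ab', 'cd', 'ef'
joined = f'{s1}{s2}{s3}'

print(as_list, as_tuple, as_set, merged, joined)
```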

@MojoVampire
Mannequin

MojoVampire mannequin commented Sep 13, 2019

Note that all of Serhiy's examples are for a known, fixed number of things to concatenate/union/merge. str.join's API can be used for that by wrapping the arguments in an anonymous tuple/list, but it's more naturally suited to a variable number of things, and the unpacking generalizations haven't reached the point where:

[*seq for seq in allsequences]

is allowed.

    list(itertools.chain.from_iterable(allsequences))

handles that just fine, but I could definitely see it being convenient to be able to do:

[].join(allsequences)

That said, a big reason str provides .join is because it's not uncommon to want to join strings with a repeated separator, e.g.:

# For not-really-csv-but-people-do-it-anyway
','.join(row_strings)

# Separate words with spaces
' '.join(words)

# Separate lines with newlines
'\n'.join(lines)

I'm not seeing even one motivating use case for list.join/tuple.join that would actually join on a non-empty list or tuple ([None, 'STOP', None] being rather contrived). If that's not needed, it might make more sense to do this with an alternate constructor (a classmethod), e.g.:

    list.concat(allsequences)

which would avoid the cost of creating an otherwise unused empty list (the empty tuple is a singleton, so no cost is avoided there). It would also work equally well with both tuple and list (where making list.extend take varargs wouldn't help tuple, though it's a perfectly worthy idea on its own).
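A minimal sketch of what such an alternate constructor could do, written as a free function because the built-in types cannot be patched. The name `concat`, its signature, and the choice to accept arbitrary iterables are assumptions for illustration; no such method exists on list or tuple.

```python
from itertools import chain


def concat(cls, sequences):
    """Build one cls instance from an iterable of iterables.

    Hypothetical stand-in for the suggested list.concat / tuple.concat
    classmethod; not an existing API.
    """
    return cls(chain.from_iterable(sequences))


allsequences = ([1, 2], (3,), range(4, 6))  # made-up sample input
concat_list = concat(list, allsequences)
concat_tuple = concat(tuple, allsequences)
print(concat_list)
print(concat_tuple)
```

Passing the target type as the first argument mirrors how a classmethod would receive `cls`, so the same body serves both list and tuple.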

Personally, I don't find using itertools.chain (or its from_iterable alternate constructor) all that problematic (though I almost always import it with from itertools import chain to reduce the verbosity, especially when using chain.from_iterable). I think promoting itertools more is a good idea; right now, the notes on concatenation for sequence types mention str.join, bytes.join, and replacing tuple concatenation with a list that you call extend on, but don't mention itertools.chain at all, which seems like a failure to make the best solution the discoverable/obvious solution.

@Savier
Mannequin

Savier mannequin commented Sep 16, 2019

In JavaScript, join() works the other way around:
['1','2','3'].join(', ')
so [].join() may confuse some people.

@tiran
Member

tiran commented Sep 16, 2019

> In JavaScript, join() works the other way around:
> ['1','2','3'].join(', ')
> so [].join() may confuse some people.

It would be too confusing to have two different approaches to joining strings in Python. Besides, ECMAScript 1 came out in 1997, 5 years after Python was first released. By that argument, it is JavaScript that should change.

@serhiy-storchaka
Member

How common is the case of a variable number of things to concatenate/union/merge?

From my experience, in most cases this looks like:

    result = []
    for ...:
        # many complex statements
        # may include continue and break
        result.extend(items) # may be intermixed with result.append(item)

So concatenating purely lists from some sequence is a very special case. And there are several ways to perform it.

    result = []
    for items in seq:
        result.extend(items)
        # nothing wrong with this simple code, really

    result = [x for items in seq for x in items]
    # may be less efficient for really long sublists,
    # but looks simple

    result = list(itertools.chain.from_iterable(seq))
    # if you are an itertools addict ;-)

@serhiy-storchaka
Member

It is history, but in 1997 Python had the same order of arguments as ECMAScript: string.join(words [, sep]). str.join() was added only in 1999 (226ae6c).

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@serhiy-storchaka
Member

I think this idea has no future.
