## Concatenating Sequences

Often we run into situations where we need to concatenate two or more sequences.

We can certainly use the `+` operator to do this:

In [1]:
l1 = list(range(4))
l2 = list(range(4, 8))
l3 = list(range(8, 12))

combo = l1 + l2 + l3

for el in combo:
    print(el, end=" ")

0 1 2 3 4 5 6 7 8 9 10 11 

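But this approach can be costly: each `+` builds a brand new list and copies every element accumulated so far, so the total work grows quadratically with the number of sequences being combined.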

Instead, we can use the `chain` function from the `itertools` module (as you may have guessed from my courses and other videos, `itertools` is one of my favorite standard library modules in Python!).

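Whereas repeated concatenation is roughly an O(n^2) operation (with n the number of similarly sized sequences), `chain` just iterates over each element once, which makes it an O(n) operation in the total number of elements.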
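To see where the quadratic behavior comes from, here is a rough sketch that counts how many element copies are needed to combine `k` lists of `m` elements each. The helper names `copies_for_repeated_add` and `copies_for_chain` are hypothetical, and the count is a simplified model of the copying involved, not a measurement:

```python
def copies_for_repeated_add(k, m):
    """Count element copies when k lists of m elements are added left to right."""
    total = 0
    size = m  # size of the running result, starting with the first list
    for _ in range(k - 1):
        size += m      # the new list holds the running result plus one more list
        total += size  # every element of the new list has to be copied into it
    return total


def copies_for_chain(k, m):
    """chain yields each element once, so building one list copies k * m elements."""
    return k * m


for k in (10, 100, 1_000):
    print(k, copies_for_repeated_add(k, 10), copies_for_chain(k, 10))
# 10 540 100
# 100 50490 1000
# 1000 5004990 10000
```

Multiplying the number of lists by 10 multiplies the copy count for repeated `+` by roughly 100, while the count for `chain` grows only by a factor of 10.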

In [2]:
from itertools import chain

In [3]:
for el in chain(l1, l2, l3):
    print(el, end=" ")

0 1 2 3 4 5 6 7 8 9 10 11 

In fact, `chain` is also a bit more versatile than concatenation, in that we can mix sequence types with `chain` - something we cannot do with concatenation:

In [4]:
l1 = [1, 2, 3]
t2 = (4, 5, 6)
s3 = "xyz"

try:
    l1 + t2 + s3
except TypeError as ex:
    print(f"TypeError: {ex}")

TypeError: can only concatenate list (not "tuple") to list


`chain` however can handle that no problem:

In [5]:
for el in chain(l1, t2, s3):
    print(el, end=" ")

1 2 3 4 5 6 x y z 
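One thing to keep in mind is that `chain` returns a lazy iterator rather than a new sequence, so it can only be consumed once. A minimal sketch (the variable names here are just for illustration):

```python
from itertools import chain

combined = chain([1, 2, 3], (4, 5, 6), "xyz")

first_pass = list(combined)   # consumes the iterator
second_pass = list(combined)  # nothing left to iterate over

print(first_pass)   # [1, 2, 3, 4, 5, 6, 'x', 'y', 'z']
print(second_pass)  # []
```

If we need to iterate more than once, we can either materialize the result with `list(...)` or simply call `chain` again.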

So, given the complexity difference, we should expect faster run times for the `chain` approach.

Let's try it out and see how that pans out for concatenating a large number of large lists.

In [6]:
from timeit import timeit

Let's create some large lists:

In [7]:
lists = [
    list(range(1_000))
    for _ in range(1_000)
]

Now, to concatenate these lists we're not going to use the `+` operator itself - instead, we are going to use the corresponding special method `__add__` that is used by the `+` operator.

In [8]:
lists[0].__add__(lists[1])

[0, 1, 2, ..., 998, 999, 0, 1, 2, ..., 998, 999]


We need to do this for all the lists, and to do that we'll use the `reduce` function from the `functools` module, and the `add` function from the `operator` module (which calls `__add__`):

In [9]:
from functools import reduce
from operator import add

In [10]:
concatenated = reduce(add, lists)
print(len(concatenated))

1000000


In [11]:
def concatenate(lists):
    return reduce(add, lists)

In [12]:
def chained(lists):
    return list(chain(*lists))

In [13]:
timeit("concatenate(lists)", globals=globals(), number=1)

1.101773957999285

In [14]:
timeit("chained(lists)", globals=globals(), number=1)

0.006431040999814286
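Note that `chain(*lists)` unpacks the outer list into individual arguments before `chain` starts. A closely related option is `chain.from_iterable`, which takes the outer iterable directly. The sketch below uses a hypothetical function name, `chained_from_iterable`, and much smaller lists than above, just to check that both forms agree:

```python
from itertools import chain


def chained_from_iterable(lists):
    # same result as list(chain(*lists)), without unpacking the outer list
    return list(chain.from_iterable(lists))


small_lists = [list(range(3)) for _ in range(4)]

result = chained_from_iterable(small_lists)
print(result)  # [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
print(result == list(chain(*small_lists)))  # True
```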

For the special case where we are dealing with character sequences (strings), we can also use the `join` method available for strings.

Let's see how it works:

In [15]:
l1 = "I'm a lumberjack"
l2 = "and I'm OK"
l3 = "I sleep all night"
l4 = "and I work all day"

We can concatenate these sequences this way:

In [16]:
''.join((l1, l2, l3, l4))

"I'm a lumberjackand I'm OKI sleep all nightand I work all day"

Let's time it and see how it performs compared to the other options.

In [17]:
import random
import string
from itertools import repeat

random.seed(0)
strings = [
    ''.join(random.choice(string.ascii_uppercase) for _ in repeat(None, 1_000))
    for _ in repeat(None, 1_000)
]

In [18]:
print(len(strings), len(strings[0]), strings[0][0:20])

1000 1000 MYNBIQPMZJPLSGQEJEYD


OK, so we now have a list of 1,000 strings, each containing 1,000 characters.

Let's write one more function to do the concatenation using `join`:

In [19]:
def joined(strings):
    return ''.join(strings)

Now, `chain` isn't really useful in this context (unless we just want to iterate over the characters of the "combined" strings): `chain` returns an iterator over the individual characters, which we would then have to join back into a string, and at that point we may as well join the strings directly. So we won't compare to `chain` for this particular case.
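To make that concrete, here is a small sketch (with made-up sample strings) showing that chaining strings yields single characters, and that joining those characters gives the same result as joining the strings directly:

```python
from itertools import chain

parts = ["abc", "de", "f"]

# chaining strings iterates over their individual characters
chars = list(chain(*parts))
print(chars)  # ['a', 'b', 'c', 'd', 'e', 'f']

# joining the characters is just a roundabout way of joining the strings
rebuilt = "".join(chars)
direct = "".join(parts)
print(rebuilt == direct)  # True
```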

Let's make sure each function returns what we expect:

In [20]:
result_concat = concatenate(strings)
result_joined = joined(strings)

In [21]:
assert result_concat == result_joined

Now, let's time it:

In [22]:
timeit("concatenate(strings)", globals=globals(), number=10)

0.12401341700024204

In [23]:
timeit("joined(strings)", globals=globals(), number=10)

0.0005822090006404324

As you can see, `join` is much faster than repeated concatenation for this example.
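Keep in mind that `join` accepts any iterable of strings (not just lists or tuples), but every item must already be a `str`. A minimal sketch, using made-up sample values:

```python
words = ["spam", "eggs", "ham"]

# join consumes any iterable of strings, including a generator expression
shouted = "".join(word.upper() for word in words)
print(shouted)  # SPAMEGGSHAM

# non-string items must be converted first, otherwise join raises a TypeError
numbers = [1, 2, 3]
try:
    "".join(numbers)
except TypeError as ex:
    print(f"TypeError: {ex}")

digits = "".join(str(n) for n in numbers)
print(digits)  # 123
```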

A special case of string concatenation is when we have a pre-determined (and relatively small) number of strings we want to concatenate.

For example, suppose we want to join these two strings, with a space in between:

In [24]:
s1 = "Python"
s2 = "rocks!"

We could do this using several approaches:

In [25]:
s1 + " " + s2

'Python rocks!'

In [26]:
' '.join((s1, s2))  # note we have to create a sequence out of s1 and s2 - so there's overhead!

'Python rocks!'

In [27]:
'{} {}'.format(s1, s2)

'Python rocks!'

In [28]:
"%s %s" % (s1, s2)  # note we have to create a tuple out of s1 and s2 - so there's overhead!

'Python rocks!'

In [29]:
f"{s1} {s2}"

'Python rocks!'

And timing these may surprise you:

In [30]:
timeit("f'{s1} {s2}'", globals=globals(), number=10_000_000)

0.4672245000001567

In [31]:
timeit("s1 + ' ' + s2", globals=globals(), number=10_000_000)

0.5262187500002256

In [32]:
timeit("' '.join((s1, s2))", globals=globals(), number=10_000_000)

0.6376680829998804

> Yes, `join` is slower because we have the overhead of creating the tuple `(s1, s2)` - for a small number of strings, this overhead negates any performance gains we might have had using `join` over `+`.

In [33]:
timeit("'%s %s' % (s1, s2)", globals=globals(), number=10_000_000)

0.9125240409994149

> Again, we have the overhead of creating that tuple

In [34]:
timeit("'{} {}'.format(s1, s2)", globals=globals(), number=10_000_000)

1.0964794589999656

As you can see, the f-string is actually the fastest approach here!
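Single `timeit` runs can be noisy, and the differences here are small. One common way to get steadier numbers is to use `timeit.repeat` and keep the best run. The sketch below is only an illustration with a reduced iteration count (the labels and dictionary names are my own), and the actual timings will vary by machine and Python version:

```python
from timeit import repeat

s1 = "Python"
s2 = "rocks!"

statements = {
    "f-string": "f'{s1} {s2}'",
    "+": "s1 + ' ' + s2",
    "join": "' '.join((s1, s2))",
}

best_times = {}
for label, stmt in statements.items():
    # run each statement 100,000 times, 5 times over, and keep the fastest run
    runs = repeat(stmt, globals={"s1": s1, "s2": s2}, number=100_000, repeat=5)
    best_times[label] = min(runs)

# print the approaches from fastest to slowest
for label, seconds in sorted(best_times.items(), key=lambda item: item[1]):
    print(f"{label:10} {seconds:.4f}s")
```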

The usual caveat I give when I discuss optimizing your code - **do not optimize prematurely**.

Write your code in the most readable manner possible (without a total disregard for efficiency of course!) - but don't start optimizing your code and refactoring until you understand **where** your code is slow. In the above example, we saved less than a second over 10 million iterations - but if your code takes 10 minutes to run, then shaving off one second might be meaningless (by itself).

**First** identify the bottlenecks in your code, **then** optimize your code.
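One way to find those bottlenecks is the standard library's `cProfile` module. The sketch below is only an illustration: the functions `slow_concat`, `fast_total`, and `main` are hypothetical stand-ins for your own code, and the list sizes are kept small so it runs quickly:

```python
import cProfile
import io
import pstats
from functools import reduce
from operator import add


def slow_concat(lists):
    # repeated concatenation: the expensive part of this toy program
    return reduce(add, lists)


def fast_total(lists):
    # a cheap operation, included for contrast
    return sum(len(lst) for lst in lists)


def main():
    lists = [list(range(200)) for _ in range(200)]
    slow_concat(lists)
    fast_total(lists)


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# collect the report in a string, sorted by cumulative time
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)  # only show the top 5 entries
report = buffer.getvalue()
print(report)
```

The entries near the top of the report are where the time is going, and those are the places worth optimizing first.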