# Concatenate and Split Operations

**Module 02 | Notebook 02**

---

## Objective
By the end of this notebook, you will master:
- Concatenating arrays along different axes
- Splitting arrays into multiple parts
- Append and insert operations
- Practical use cases for joining/splitting data

In [2]:
import numpy as np
np.set_printoptions(precision=2)

---
## 1. np.concatenate() - Join Arrays Along Axis

In [3]:
# 1D concatenation
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([7, 8, 9])

result = np.concatenate([a, b, c])
print(f"Concatenated 1D: {result}")

Concatenated 1D: [1 2 3 4 5 6 7 8 9]


In [4]:
# 2D concatenation - axis=0 (vertical/row-wise)
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

print(f"Array 1:\n{arr1}")
print(f"Array 2:\n{arr2}")

vertical = np.concatenate([arr1, arr2], axis=0)
print(f"Concatenate axis=0 (vertical):\n{vertical}")

Array 1:
[[1 2]
 [3 4]]
Array 2:
[[5 6]
 [7 8]]
Concatenate axis=0 (vertical):
[[1 2]
 [3 4]
 [5 6]
 [7 8]]


In [5]:
# 2D concatenation - axis=1 (horizontal/column-wise)
horizontal = np.concatenate([arr1, arr2], axis=1)
print(f"Concatenate axis=1 (horizontal):\n{horizontal}")

Concatenate axis=1 (horizontal):
[[1 2 5 6]
 [3 4 7 8]]


In [6]:
# Shape requirements: All dimensions except concat axis must match
a = np.zeros((2, 3))
b = np.zeros((2, 4))

# This works (axis=1, different columns)
result = np.concatenate([a, b], axis=1)
print(f"Shape: {result.shape}")  # (2, 7)

# This fails (axis=0, different columns)
try:
    np.concatenate([a, b], axis=0)
except ValueError as e:
    print(f"Error: {e}")

Shape: (2, 7)
Error: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 4
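The same rule covers the number of dimensions: a 1D vector cannot be joined to a 2D array until it is reshaped into a row or a column. A minimal sketch, using illustrative names (`matrix`, `new_row`, `new_col`) that are not part of the notebook:

```python
import numpy as np

matrix = np.zeros((2, 3))
new_row = np.array([1.0, 2.0, 3.0])   # shape (3,)
new_col = np.array([9.0, 9.0])        # shape (2,)

# The vector has 1 dimension and the matrix has 2, so this raises ValueError
try:
    np.concatenate([matrix, new_row], axis=0)
    ndim_mismatch_raised = False
except ValueError:
    ndim_mismatch_raised = True

# Reshape to (1, 3) to add the vector as a row
with_row = np.concatenate([matrix, new_row[np.newaxis, :]], axis=0)

# Reshape to (2, 1) to add the vector as a column
with_col = np.concatenate([matrix, new_col[:, np.newaxis]], axis=1)

print(ndim_mismatch_raised)  # True
print(with_row.shape)        # (3, 3)
print(with_col.shape)        # (2, 4)
```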


---
## 2. np.append() - Append Values

In [7]:
# 1D append
arr = np.array([1, 2, 3])

# Append single value
result = np.append(arr, 4)
print(f"Append 4: {result}")

# Append multiple values
result = np.append(arr, [4, 5, 6])
print(f"Append [4,5,6]: {result}")

Append 4: [1 2 3 4]
Append [4,5,6]: [1 2 3 4 5 6]


In [8]:
# 2D append WITHOUT axis - flattens!
arr = np.array([[1, 2], [3, 4]])
result = np.append(arr, [[5, 6]])  # No axis specified
print(f"Append without axis (flattened): {result}")

Append without axis (flattened): [1 2 3 4 5 6]


In [9]:
# 2D append WITH axis
arr = np.array([[1, 2], [3, 4]])

# Append row (axis=0)
result = np.append(arr, [[5, 6]], axis=0)
print(f"Append row:\n{result}")

# Append column (axis=1)
result = np.append(arr, [[5], [6]], axis=1)
print(f"Append column:\n{result}")

Append row:
[[1 2]
 [3 4]
 [5 6]]
Append column:
[[1 2 5]
 [3 4 6]]


### Warning: np.append is inefficient in loops!

In [10]:
# BAD: O(n^2) complexity
# result = np.array([])
# for i in range(1000):
#     result = np.append(result, i)  # Creates new array each time!

# GOOD: create the array in one call, pre-allocate, or use a list then convert
result = np.arange(1000)  # Single allocation, no Python loop

# Or use list first
data = []
for i in range(1000):
    data.append(i)
result = np.array(data)
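When the values have to be computed one at a time, pre-allocation can be sketched as below. The squared value is only a placeholder computation, and the `chunks` list is a hypothetical stand-in for arrays that arrive piece by piece:

```python
import numpy as np

n = 1000

# Pre-allocate once, then fill in place (no per-iteration copy)
result = np.empty(n, dtype=np.int64)
for i in range(n):
    result[i] = i * i   # placeholder computation, purely illustrative

# When data arrives as arrays, collect them in a list and join once
chunks = [np.arange(start, start + 100) for start in range(0, n, 100)]
joined = np.concatenate(chunks)

print(result[:5])      # [ 0  1  4  9 16]
print(joined.shape)    # (1000,)
```

Both patterns allocate the final array a single time, instead of once per iteration as `np.append` does.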

---
## 3. np.insert() and np.delete()

In [11]:
# np.insert - insert values at index
arr = np.array([1, 2, 3, 4, 5])

# Insert single value
result = np.insert(arr, 2, 99)  # Insert 99 at index 2
print(f"Insert 99 at index 2: {result}")

# Insert multiple values
result = np.insert(arr, 2, [99, 100])
print(f"Insert [99,100] at index 2: {result}")

Insert 99 at index 2: [ 1  2 99  3  4  5]
Insert [99,100] at index 2: [  1   2  99 100   3   4   5]


In [12]:
# 2D insert
arr = np.array([[1, 2], [3, 4], [5, 6]])
print(f"Original:\n{arr}")

# Insert row at index 1
result = np.insert(arr, 1, [99, 99], axis=0)
print(f"Insert row at index 1:\n{result}")

# Insert column at index 1
result = np.insert(arr, 1, [99, 99, 99], axis=1)
print(f"Insert column at index 1:\n{result}")

Original:
[[1 2]
 [3 4]
 [5 6]]
Insert row at index 1:
[[ 1  2]
 [99 99]
 [ 3  4]
 [ 5  6]]
Insert column at index 1:
[[ 1 99  2]
 [ 3 99  4]
 [ 5 99  6]]


In [13]:
# np.delete - remove elements
arr = np.array([1, 2, 3, 4, 5])

# Delete single index
result = np.delete(arr, 2)  # Remove index 2
print(f"Delete index 2: {result}")

# Delete multiple indices
result = np.delete(arr, [0, 4])  # Remove first and last
print(f"Delete indices 0 and 4: {result}")

Delete index 2: [1 2 4 5]
Delete indices 0 and 4: [2 3 4]


In [14]:
# 2D delete
arr = np.arange(12).reshape(3, 4)
print(f"Original:\n{arr}")

# Delete row
result = np.delete(arr, 1, axis=0)
print(f"Delete row 1:\n{result}")

# Delete column
result = np.delete(arr, 2, axis=1)
print(f"Delete column 2:\n{result}")

Original:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Delete row 1:
[[ 0  1  2  3]
 [ 8  9 10 11]]
Delete column 2:
[[ 0  1  3]
 [ 4  5  7]
 [ 8  9 11]]
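Every cell above assigns the return value to `result` because `np.append`, `np.insert`, and `np.delete` return a new array and leave their input untouched. A short check, with illustrative variable names:

```python
import numpy as np

original = np.array([1, 2, 3])

appended = np.append(original, 4)
inserted = np.insert(original, 1, 99)
deleted = np.delete(original, 0)

print(original)   # [1 2 3]  (unchanged by all three calls)
print(appended)   # [1 2 3 4]
print(inserted)   # [ 1 99  2  3]
print(deleted)    # [2 3]
```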


---
## 4. np.split() - Split Array Into Parts

In [15]:
# Split into equal parts
arr = np.arange(12)
print(f"Original: {arr}")

# Split into 3 equal parts
parts = np.split(arr, 3)
for i, part in enumerate(parts):
    print(f"Part {i}: {part}")

Original: [ 0  1  2  3  4  5  6  7  8  9 10 11]
Part 0: [0 1 2 3]
Part 1: [4 5 6 7]
Part 2: [ 8  9 10 11]


In [16]:
# Split at specific indices
arr = np.arange(10)
print(f"Original: {arr}")

# Split at indices 2 and 7
parts = np.split(arr, [2, 7])
for i, part in enumerate(parts):
    print(f"Part {i}: {part}")
# Result: [0,1], [2,3,4,5,6], [7,8,9]

Original: [0 1 2 3 4 5 6 7 8 9]
Part 0: [0 1]
Part 1: [2 3 4 5 6]
Part 2: [7 8 9]


In [17]:
# 2D split
arr = np.arange(16).reshape(4, 4)
print(f"Original:\n{arr}")

# Split into 2 parts along axis=0 (rows)
top, bottom = np.split(arr, 2, axis=0)
print(f"Top half:\n{top}")
print(f"Bottom half:\n{bottom}")

Original:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
Top half:
[[0 1 2 3]
 [4 5 6 7]]
Bottom half:
[[ 8  9 10 11]
 [12 13 14 15]]


In [18]:
# Split along columns
left, right = np.split(arr, 2, axis=1)
print(f"Left half:\n{left}")
print(f"Right half:\n{right}")

Left half:
[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
Right half:
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]
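Splitting and concatenating along the same axis undo each other. The sketch below also shows that the pieces returned by `np.split` are views into the source array, while `np.concatenate` produces an independent copy. The name `grid` is illustrative:

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)

left, right = np.split(grid, 2, axis=1)
rebuilt = np.concatenate([left, right], axis=1)
round_trip_ok = np.array_equal(rebuilt, grid)
print(round_trip_ok)                   # True

# The pieces are views: they share memory with the source array
shares = np.shares_memory(left, grid)
print(shares)                          # True
left[0, 0] = -1
print(grid[0, 0])                      # -1

# concatenate copied the data, so the rebuilt array is unaffected
print(rebuilt[0, 0])                   # 0
```

If the parts must be modified without touching the source, call `.copy()` on them first.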


---
## 5. np.array_split() - Unequal Splits

In [19]:
# np.split requires equal division, array_split doesn't
arr = np.arange(10)
print(f"Original: {arr}")

# Split 10 elements into 3 parts
parts = np.array_split(arr, 3)
for i, part in enumerate(parts):
    print(f"Part {i} (len={len(part)}): {part}")
# The first 10 % 3 = 1 part gets one extra element

Original: [0 1 2 3 4 5 6 7 8 9]
Part 0 (len=4): [0 1 2 3]
Part 1 (len=3): [4 5 6]
Part 2 (len=3): [7 8 9]


In [20]:
# Compare with np.split
try:
    np.split(arr, 3)  # 10 not divisible by 3
except ValueError as e:
    print(f"np.split error: {e}")

np.split error: array split does not result in an equal division
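The part sizes produced by `np.array_split` follow a simple rule: for `n` items and `k` parts, the first `n % k` parts have `n // k + 1` items and the rest have `n // k`. The helper `array_split_sizes` below is a hypothetical function written only to check that rule:

```python
import numpy as np

def array_split_sizes(n, k):
    """Expected part lengths when np.array_split divides n items into k parts."""
    base, extra = divmod(n, k)
    return [base + 1] * extra + [base] * (k - extra)

arr = np.arange(10)
actual = [len(part) for part in np.array_split(arr, 3)]

print(array_split_sizes(10, 3))  # [4, 3, 3]
print(actual)                    # [4, 3, 3]
```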


---
## 6. Specialized Split Functions

In [21]:
arr = np.arange(16).reshape(4, 4)
print(f"Original:\n{arr}")

Original:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


In [22]:
# hsplit - split horizontally (along columns)
left, right = np.hsplit(arr, 2)
print(f"hsplit - Left:\n{left}")
print(f"hsplit - Right:\n{right}")

hsplit - Left:
[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
hsplit - Right:
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


In [23]:
# vsplit - split vertically (along rows)
top, bottom = np.vsplit(arr, 2)
print(f"vsplit - Top:\n{top}")
print(f"vsplit - Bottom:\n{bottom}")

vsplit - Top:
[[0 1 2 3]
 [4 5 6 7]]
vsplit - Bottom:
[[ 8  9 10 11]
 [12 13 14 15]]


In [24]:
# dsplit - split along depth (3D arrays)
arr3d = np.arange(24).reshape(2, 3, 4)
print(f"3D shape: {arr3d.shape}")

parts = np.dsplit(arr3d, 2)
print(f"dsplit part 0 shape: {parts[0].shape}")
print(f"dsplit part 1 shape: {parts[1].shape}")

3D shape: (2, 3, 4)
dsplit part 0 shape: (2, 3, 2)
dsplit part 1 shape: (2, 3, 2)


---
## 7. Practical Examples

In [25]:
# Train/Test split
data = np.random.rand(100, 5)  # 100 samples, 5 features
labels = np.random.randint(0, 2, 100)  # Binary labels

# 80/20 split
split_idx = int(0.8 * len(data))
train_data, test_data = np.split(data, [split_idx])
train_labels, test_labels = np.split(labels, [split_idx])

print(f"Train: {train_data.shape}, Test: {test_data.shape}")

Train: (80, 5), Test: (20, 5)
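The split above takes the first 80 rows as they are, which can bias the result if the rows are stored in a meaningful order. One common remedy is to shuffle first; a minimal sketch, assuming a seeded `np.random.default_rng` generator (the seed value is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed, for reproducibility only

data = rng.random((100, 5))
labels = rng.integers(0, 2, size=100)

# One permutation applied to both arrays keeps rows and labels aligned
order = rng.permutation(len(data))
shuffled_data, shuffled_labels = data[order], labels[order]

split_idx = int(0.8 * len(data))
train_data, test_data = np.split(shuffled_data, [split_idx])
train_labels, test_labels = np.split(shuffled_labels, [split_idx])

print(train_data.shape, test_data.shape)      # (80, 5) (20, 5)
print(train_labels.shape, test_labels.shape)  # (80,) (20,)
```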


In [26]:
# Combine features from different sources
features_A = np.random.rand(100, 3)
features_B = np.random.rand(100, 5)
features_C = np.random.rand(100, 2)

all_features = np.concatenate([features_A, features_B, features_C], axis=1)
print(f"Combined features shape: {all_features.shape}")  # (100, 10)

Combined features shape: (100, 10)


In [27]:
# Batch processing - split data into batches
data = np.arange(100)
batch_size = 32

# Split points at every multiple of batch_size inside the array: [32, 64, 96]
split_points = np.arange(batch_size, len(data), batch_size)

# Splitting at these indices yields full batches plus one smaller final batch
batches = np.split(data, split_points)
print(f"Number of batches: {len(batches)}")
for i, batch in enumerate(batches):
    print(f"Batch {i}: size={len(batch)}")

Number of batches: 4
Batch 0: size=32
Batch 1: size=32
Batch 2: size=32
Batch 3: size=4
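The same tools can sketch k-fold cross-validation indices: shuffle the indices, cut them into `k` folds with `np.array_split`, and rebuild each training set with `np.concatenate`. The sample count, `k`, and the seed below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed
n_samples, k = 10, 3

indices = rng.permutation(n_samples)
folds = np.array_split(indices, k)    # fold sizes 4, 3, 3

for i, val_idx in enumerate(folds):
    # Training indices are every fold except the i-th
    train_idx = np.concatenate(folds[:i] + folds[i + 1:])
    print(f"Fold {i}: train={len(train_idx)}, val={len(val_idx)}")
```

This is plain (unstratified) k-fold; it does not balance classes across folds.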


---
## Key Points Summary

**Concatenation:**
| Function | Description |
|----------|-------------|
| `np.concatenate()` | Join arrays along existing axis |
| `np.append()` | Append values (avoid in loops!) |
| `np.insert()` | Insert at specific position |
| `np.delete()` | Remove at specific position |

**Splitting:**
| Function | Description |
|----------|-------------|
| `np.split()` | Equal splits (error if not divisible) |
| `np.array_split()` | Unequal splits allowed |
| `np.hsplit()` | Split along columns (axis=1) |
| `np.vsplit()` | Split along rows (axis=0) |
| `np.dsplit()` | Split along depth (axis=2) |

**Key Rules:**
- Non-concat dimensions must match
- Avoid `np.append` in loops (O(n^2)!)
- Use `array_split` for non-divisible splits

---
## Interview Tips

**Q1: What is the difference between concatenate and stack?**
> - `concatenate` joins along EXISTING axis
> - `stack` creates a NEW axis then joins
> - Example: Two (3,4) arrays -> concat axis=0: (6,4), stack: (2,3,4)

**Q2: Why is np.append in a loop bad?**
> It copies the whole array into a new one on every iteration, so building n elements costs O(n^2) total copying. Use list.append() and convert once at the end, or pre-allocate and fill by index.

**Q3: How do you split data for cross-validation?**
> Use `np.array_split(data, k)` to get k folds for k-fold CV (shuffle first if the rows are ordered), or calculate indices manually for stratified splits.

**Q4: What happens if split count doesn't divide evenly?**
> `np.split` raises ValueError. Use `np.array_split` which distributes elements as evenly as possible.

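The shape difference described in Q1 can be checked directly. A small sketch with two illustrative (3, 4) arrays:

```python
import numpy as np

a = np.zeros((3, 4))
b = np.ones((3, 4))

concat_rows = np.concatenate([a, b], axis=0)   # grows an existing axis
concat_cols = np.concatenate([a, b], axis=1)
stack_first = np.stack([a, b], axis=0)         # adds a new leading axis
stack_last = np.stack([a, b], axis=-1)         # adds a new trailing axis

print(concat_rows.shape)  # (6, 4)
print(concat_cols.shape)  # (3, 8)
print(stack_first.shape)  # (2, 3, 4)
print(stack_last.shape)   # (3, 4, 2)
```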
---
## Practice Exercises

### Exercise 1: Concatenate arrays with different shapes (where valid)

In [28]:
# Given arrays:
a = np.ones((2, 3))
b = np.zeros((2, 4))
c = np.ones((3, 3))

# Find valid concatenations


In [29]:
# Solution
a = np.ones((2, 3))
b = np.zeros((2, 4))
c = np.ones((3, 3))

# a and b: same rows, different cols -> axis=1 works
ab = np.concatenate([a, b], axis=1)
print(f"a+b axis=1: {ab.shape}")  # (2, 7)

# a and c: same cols, different rows -> axis=0 works
ac = np.concatenate([a, c], axis=0)
print(f"a+c axis=0: {ac.shape}")  # (5, 3)

a+b axis=1: (2, 7)
a+c axis=0: (5, 3)


### Exercise 2: Split array and process each part

In [30]:
# Split this array into 4 parts and find mean of each
arr = np.random.rand(100)


In [31]:
# Solution
arr = np.random.rand(100)
parts = np.split(arr, 4)

for i, part in enumerate(parts):
    print(f"Part {i} mean: {part.mean():.4f}")

Part 0 mean: 0.4855
Part 1 mean: 0.4856
Part 2 mean: 0.4672
Part 3 mean: 0.4763


### Exercise 3: Implement a simple sliding window

In [32]:
# Create overlapping windows of size 3 from array [1,2,3,4,5,6]
# Expected: [[1,2,3], [2,3,4], [3,4,5], [4,5,6]]


In [33]:
# Solution
arr = np.array([1, 2, 3, 4, 5, 6])
window_size = 3

# Method 1: Using list comprehension
windows = np.array([arr[i:i+window_size] for i in range(len(arr) - window_size + 1)])
print(f"Windows:\n{windows}")

# Method 2: Using sliding_window_view (advanced; requires NumPy >= 1.20, returns a read-only view)
from numpy.lib.stride_tricks import sliding_window_view
windows2 = sliding_window_view(arr, window_size)
print(f"Using sliding_window_view:\n{windows2}")

Windows:
[[1 2 3]
 [2 3 4]
 [3 4 5]
 [4 5 6]]
Using sliding_window_view:
[[1 2 3]
 [2 3 4]
 [3 4 5]
 [4 5 6]]


---
## Next Notebook
**03_stacking_and_tiling.ipynb** - Stack arrays along new dimensions and tile/repeat arrays.