# __Generating Permutations__

Let us run through different algorithms for generating all permutations of an `array` or `string`.

First, note that if we can generate every permutation of the integers $0, 1, \ldots, n-1$, we can map each one onto any other sequence by treating it as a list of indices.

For instance,

`A = [10, 21, 16, 8]`

`n = 4`

The integer __path__ `<0, 2, 3, 1>` corresponds to the array `[10, 16, 8, 21]`

Also, remember that there are `n!` ways to order `n` items.

In [1]:
def map_indices(path, array):
	return (array[i] for i in path)

def print_path(path):
	print(', '.join(str(x) for x in path))
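
As a quick check of the mapping described above, the sketch below applies the path `<0, 2, 3, 1>` to `A`. Since `map_indices` returns a generator, it is wrapped in `list(...)` here so the values can be inspected.

```python
def map_indices(path, array):
    # Lazily look up each index of the path in the array
    return (array[i] for i in path)

A = [10, 21, 16, 8]
path = [0, 2, 3, 1]

mapped = list(map_indices(path, A))
print(mapped)  # [10, 16, 8, 21]
```

The same helper works on strings, which is how the bonus section at the end uses it.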

## Table of Contents

- [BFS](#bfs)

- [DFS](#dfs)

- [DFS Optimizations](#optim)

- [Advanced Algorithms](#advanced-algorithms)

- [Empirical Time Tests](#time-tests)

- [Summary](#summary)

- [Bonus (Generate Unique)](#unique)

<h2 id="bfs">Tree Traversals – Breadth-First Search</h2>

In [2]:
from collections import deque

# Time:  O(n! * n^2)
# Space: O(n! * n)

def gen_bfs(n):
	queue = deque() # FIFO queue of partial paths to process
	queue.append([]) # Start with an empty path
	
	# While there are paths to process
	while queue:
		path = queue.popleft()
		
		# If the path is complete, print the permutation
		if len(path) == n:
			print_path(path)
			continue
		
		# Explore every unused number from 0 to n-1
		for i in range(n):
			if i not in path:
				new_path = path + [i]
				queue.append(new_path)

In [3]:
# Example usage
gen_bfs(3)

0, 1, 2
0, 2, 1
1, 0, 2
1, 2, 0
2, 0, 1
2, 1, 0


### Time Complexity Analysis

The function `gen_bfs(n)` generates all permutations of integers from $0$ to $n - 1$ using a **BFS**. We'll analyze the total time complexity in two parts: the number of paths processed and the work done per path.

---

**1. Number of Paths**

At each level $k$ $(0 \leq k \leq n)$ of the BFS tree, the algorithm generates all permutations of length $k$. The number of such partial paths is:

$$
\sum_{k=0}^{n} \frac{n!}{(n-k)!} = n! \cdot \sum_{k=0}^{n} \frac{1}{k!}
$$

The Taylor series for $e^x$ is $\displaystyle e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!}$

Therefore, $\displaystyle e^1 = \sum_{i=0}^{\infty} \frac{1}{i!}$

Then, we know $\displaystyle \sum_{k=0}^{n} \frac{1}{k!} < e$, a constant, so:

$$
\text{Total paths} = O(n! \cdot e) = O(n!)
$$
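
The bound can be checked numerically. The sketch below sums the level sizes $\frac{n!}{(n-k)!}$ directly and compares the total with $n!$; the helper name `count_partial_paths` is introduced here for illustration only.

```python
from math import e, factorial

def count_partial_paths(n):
    # Sum over levels k = 0..n of n! / (n - k)!
    return sum(factorial(n) // factorial(n - k) for k in range(n + 1))

for n in range(1, 8):
    total = count_partial_paths(n)
    # The ratio total / n! stays below e = 2.71828...
    print(n, total, round(total / factorial(n), 4))
```

For $n = 3$ the total is $1 + 3 + 6 + 6 = 16$ paths, against $3! = 6$ complete permutations.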

---

**2. Work Per Path**

The following loop is run for each partial path:

```python
for i in range(n):
    if i not in path:
        new_path = path + [i]
        queue.append(new_path)
```

Each line inside the loop contributes:

- `for i in range(n)`: runs everything $n$ times
	- `if i not in path`: linear search in a list of length up to $n$ $\Rightarrow O(n)$
	- `new_path = path + [i]`: list copy (size up to $n$) $\Rightarrow O(n)$
	- `queue.append(...)`: constant time $\Rightarrow O(1)$

So each iteration is $O(n)$, and with $n$ iterations total, the full block runs in:

$$
O(n) \cdot O(n + n + 1) = O(n^2)
$$

---

**3. Total Time Complexity**

We process $O(n!)$ partial and complete paths. Each path requires $O(n^2)$ work in the worst case.

Therefore, the total time complexity is:

$$
O(n!) \cdot O(n^2) = \boxed{O(n! \cdot n^2)}
$$


### Space Complexity Analysis

The `queue` peaks when it holds the last full level of the tree, so it stores $O(n!)$ paths at once.

Each path takes up to $O(n)$ space (a list of integers), so total queue space is:

$$
O(n!) \cdot O(n) = \boxed{O(n! \cdot n)}
$$
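
To see the queue growth concretely, here is a minimal sketch that repeats the traversal of `gen_bfs` but records the largest queue length instead of printing. The function name `bfs_peak_queue` is a hypothetical helper, not part of the notebook.

```python
from collections import deque
from math import factorial

def bfs_peak_queue(n):
    # Same traversal as gen_bfs, but tracks the largest queue length
    queue = deque([[]])
    peak = len(queue)
    while queue:
        path = queue.popleft()
        if len(path) == n:
            continue
        for i in range(n):
            if i not in path:
                queue.append(path + [i])
        peak = max(peak, len(queue))
    return peak

for n in range(1, 7):
    print(n, bfs_peak_queue(n), factorial(n))
```

For these small inputs the peak equals $n!$, with each stored path holding up to $n$ integers.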

<h2 id="dfs">Depth-First Search</h2>

In [4]:
# Time:  O(n! * n)
# Space: O(n)

def gen_dfs(n):
	used = [False] * n

	def advance(path):
		# New permutation
		if len(path) == n:
			# Do something here
			print_path(path) # -- we will just print the integer path from now on
			return
		# Go down all remaining pathways
		for i in range(n):
			if not used[i]:
				used[i] = True
				path.append(i)
				advance(path)
				path.pop() # Backtrack
				used[i] = False

	advance([])

In [5]:
# Example usage
gen_dfs(3) # Output: 3! = 6 permutations

0, 1, 2
0, 2, 1
1, 0, 2
1, 2, 0
2, 0, 1
2, 1, 0


### Time Complexity Analysis

The function `gen_dfs(n)` generates all permutations of integers from $0$ to $n - 1$ using a **DFS**. We'll analyze the time complexity in terms of:

- The number of recursive calls (i.e., number of paths visited)

- The work done per recursive call

---

**1. Number of Recursive Calls**

At each level $k$ of the recursion tree $(0 \leq k \leq n)$, we build partial permutations of length $k$.

At level $k$, the number of paths is:

$$
\frac{n!}{(n - k)!}
$$

Summing over all levels:

$$
\sum_{k=0}^{n} \frac{n!}{(n-k)!} = n! \cdot \sum_{k=0}^{n} \frac{1}{k!} = O(n!)
$$

So, total recursive calls (nodes in the DFS tree) is:

$$
O(n!)
$$

---

**2. Work Per Call**

Each recursive call does:

- `if len(path) == n`: $O(1)$
- Loop over all $n$ values:
  - `if not used[i]`: $O(1)$
  - `used[i] = True`, `path.append(i)`: $O(1)$
  - `advance(path)`: recursive call
  - `path.pop()`, `used[i] = False`: $O(1)$

Each call performs a loop of $n$ iterations, with $O(1)$ work per iteration (excluding the recursive call itself).

So, each call performs:

$$
O(n)
$$

---

**3. Total Time Complexity**

We perform $O(n!)$ recursive calls, each doing $O(n)$ work.

Hence, total time complexity is:

$$
O(n!) \cdot O(n) = \boxed{O(n! \cdot n)}
$$
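
The call count can be confirmed by instrumenting the recursion. This sketch mirrors `gen_dfs` but counts calls to `advance` instead of printing; `count_dfs_calls` and `predicted_calls` are names chosen here for illustration.

```python
from math import factorial

def count_dfs_calls(n):
    # Same recursion as gen_dfs, but counts calls instead of printing
    used = [False] * n
    calls = 0

    def advance(depth):
        nonlocal calls
        calls += 1
        if depth == n:
            return
        for i in range(n):
            if not used[i]:
                used[i] = True
                advance(depth + 1)
                used[i] = False

    advance(0)
    return calls

def predicted_calls(n):
    # Sum over levels k = 0..n of n! / (n - k)!
    return sum(factorial(n) // factorial(n - k) for k in range(n + 1))

for n in range(1, 7):
    print(n, count_dfs_calls(n), predicted_calls(n))
```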


### Space Complexity Analysis

**1. Recursive Call Stack**

The function makes recursive calls, with the depth of recursion being at most $n$—since we are building paths of length $n$.

Each recursive call adds a new frame to the call stack. Therefore, the space required for the recursive call stack is:

$$
O(n)
$$

---

**2. Auxiliary Data Structures**

- `used`: This is an array of size $n$, used to track which numbers have been used in the current path.

	It is shared between calls through closure. Therefore, it independently takes:

$$
O(n)
$$

- `path`: The path stores the current permutation being built. It's passed by reference, so only 1 array is used in memory.

	At the deepest level of recursion, the path contains $n$ integers. So, the space taken by the path is:

$$
O(n)
$$

---

**3. Total Space Complexity**

The total space complexity is the sum of:

- Space for the recursive call stack: $O(n)$
- Space for the `used` array: $O(n)$
- Space for the `path` array: $O(n)$

Thus, the total space complexity is:

$$
O(n) + O(n) + O(n) = \boxed{O(n)}
$$
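
The $O(n)$ bound relies on each permutation being consumed as soon as it is produced. If the permutations need to be kept, the shared `path` must be copied, and the stored results then take $O(n! \cdot n)$ space. A minimal sketch of that variant, using a hypothetical `collect_dfs` helper:

```python
from itertools import permutations

def collect_dfs(n):
    used = [False] * n
    path = []      # single shared list, as in gen_dfs
    results = []

    def advance():
        if len(path) == n:
            # Snapshot the path; appending `path` itself would store
            # n! references to the same list
            results.append(tuple(path))
            return
        for i in range(n):
            if not used[i]:
                used[i] = True
                path.append(i)
                advance()
                path.pop()
                used[i] = False

    advance()
    return results

print(collect_dfs(3) == list(permutations(range(3))))  # True
```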

<h2 id="optim">Optimizations</h2>

In the case of generating permutations, it seems that __DFS__ is superior to __BFS__ in terms of both time and memory.

Now, let us try to optimize the __DFS__ further:

In [6]:
# Time:  O(n! * n)
# Space: O(n) 
# We use an integer bitmask, instead of an array of size n

def gen_dfs_bitmask(n):
	def dfs(path, mask):
		if len(path) == n:
			print_path(path)
			return
		for i in range(n):
			if not (mask & (1 << i)):
				path.append(i)
				dfs(path, mask | (1 << i))
				path.pop()
	dfs([], 0)

In [7]:
# Example usage
gen_dfs_bitmask(3)

0, 1, 2
0, 2, 1
1, 0, 2
1, 2, 0
2, 0, 1
2, 1, 0
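
The two bit operations used in `gen_dfs_bitmask` can be looked at in isolation. In this sketch, `is_used` and `mark_used` are illustrative helper names wrapping the expressions `mask & (1 << i)` and `mask | (1 << i)`.

```python
def is_used(mask, i):
    # Bit i is set when integer i is already in the path
    return bool(mask & (1 << i))

def mark_used(mask, i):
    # Returns a new mask; the caller's value is unchanged,
    # so no explicit "undo" step is needed when backtracking
    return mask | (1 << i)

mask = 0
mask = mark_used(mask, 0)
mask = mark_used(mask, 2)
print(bin(mask))  # 0b101
print([i for i in range(4) if not is_used(mask, i)])  # [1, 3]
```

Because each recursive call receives its own `mask` value, the bitmask version needs no `used[i] = False` step after the call returns.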


We can also use an iterative __DFS__ approach instead of relying on recursion.

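This version has the same asymptotic time complexity,  
and it may reduce the constant overhead of setting up call frames during recursion.

One difference is memory: every unused sibling is pushed before descending, so the explicit stack can hold $O(n^2)$ pending entries rather than the $O(n)$ frames of the recursive version.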

In [8]:
# Time:  O(n! * n)
# Space: O(n^2)  The stack holds the unexplored siblings at every depth

def gen_dfs_iterative(n):
	stack = [(None, 0)] # Stack of (value_to_add_next, depth)
	path = []
	used = set()

	while stack:
		val, depth = stack.pop()

		if depth > 0:
			# Ensure anything at or above the current solution depth is removed
			while len(path) >= depth:
				x = path.pop()
				used.remove(x)

			# Add next integer to the path
			path.append(val)
			used.add(val)

			# Full permutation
			if depth == n:
				print_path(path)
				continue

		# Add next nodes to the stack
		for i in range(n-1, -1, -1):
			if i not in used:
				stack.append((i, depth + 1))

In [9]:
# Example usage
gen_dfs_iterative(3)

0, 1, 2
0, 2, 1
1, 0, 2
1, 2, 0
2, 0, 1
2, 1, 0
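
One detail worth checking empirically is how large the explicit stack gets, since every unused sibling is pushed before the search descends. The sketch below follows the same control flow as `gen_dfs_iterative` but records the peak stack length; `iterative_dfs_peak_stack` is a hypothetical name.

```python
def iterative_dfs_peak_stack(n):
    # Same control flow as gen_dfs_iterative, tracking the largest stack size
    stack = [(None, 0)]
    path = []
    used = set()
    peak = len(stack)

    while stack:
        val, depth = stack.pop()
        if depth > 0:
            while len(path) >= depth:
                used.remove(path.pop())
            path.append(val)
            used.add(val)
            if depth == n:
                continue
        for i in range(n - 1, -1, -1):
            if i not in used:
                stack.append((i, depth + 1))
        peak = max(peak, len(stack))

    return peak

for n in range(1, 8):
    print(n, iterative_dfs_peak_stack(n), n * (n - 1) // 2 + 1)
```

For these inputs the peak matches $\frac{n(n-1)}{2} + 1$, which grows quadratically in $n$.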


In [10]:
# Time:  O(n! * n)
# Space: O(n^2)  The stack holds the unexplored siblings at every depth

def gen_dfs_bitmask_iterative(n):
	stack = [(None, 0, 0)] # Stack of (value_to_add_next, depth, mask)
	path = [None] * n # Reuse this array

	while stack:
		val, depth, mask = stack.pop()

		if depth > 0:
			path[depth-1] = val

		if depth == n:
			print_path(path)
			continue

		for i in range(n-1, -1, -1):
			if not (mask & (1 << i)):
				stack.append((i, depth + 1, mask | (1 << i)))

In [11]:
# Example usage
gen_dfs_bitmask_iterative(3)

0, 1, 2
0, 2, 1
1, 0, 2
1, 2, 0
2, 0, 1
2, 1, 0


We can also use a generator instead of a function to __lazily__ compute permutations only when they are needed,  
rather than evaluating everything at once.

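The asymptotic time and memory usage match the previous recursive __DFS__,  
but we can choose to execute the algorithm in stages, spreading the workload over time.

Note that the generator yields the same `path` list each time, so a permutation must be copied if it is to be kept after the next one is requested.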

In [12]:
# Time:  O(n! * n)
# Space: O(n)

def gen_lazy(n):
	def dfs(path, mask):
		if len(path) == n:
			yield path
		else:
			for i in range(n):
				if not (mask & (1 << i)):
					path.append(i)
					# Start generating from a sub-generator (itself)
					yield from dfs(path, mask | (1 << i))
					path.pop()
	yield from dfs([], 0)

In [13]:
# Example usage
path_generator = gen_lazy(20)

# First 5 permutations
for _ in range(5):
	print_path(next(path_generator))

# Next 5 permutations (called later)
print('\nDo other things...\n')
for _ in range(5):
	print_path(next(path_generator))

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 18
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 17, 19
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 17
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 19, 17, 18

Do other things...

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 19, 18, 17
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 16, 18, 19
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 16, 19, 18
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 16, 19
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 16
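
A lazy generator combines naturally with `itertools.islice`. The sketch below repeats `gen_lazy` so that it runs on its own, then shows both a safe way to keep results and the aliasing that occurs when the yielded list is stored without copying.

```python
from itertools import islice

def gen_lazy(n):
    def dfs(path, mask):
        if len(path) == n:
            yield path
        else:
            for i in range(n):
                if not (mask & (1 << i)):
                    path.append(i)
                    yield from dfs(path, mask | (1 << i))
                    path.pop()
    yield from dfs([], 0)

# Take the first three permutations, copying each one as it arrives
first_three = [tuple(p) for p in islice(gen_lazy(3), 3)]
print(first_three)  # [(0, 1, 2), (0, 2, 1), (1, 0, 2)]

# Without copying, every entry is the same list object,
# which is empty again once the generator has finished
aliased = list(gen_lazy(3))
print(aliased)  # [[], [], [], [], [], []]
```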


<h2 id="advanced-algorithms">Advanced Algorithms</h2>

Next, we will look at some more advanced permutation-generating algorithms that may reduce the running time further or offer other benefits.

These algorithms all start with an initial permutation and modify it in-place, visiting every permutation exactly once.

- [Heap's Algorithm](https://en.wikipedia.org/wiki/Heap%27s_algorithm)

- [Knuth's Algorithm L](https://guptamukul.blogspot.com/2009/12/understanding-algorithm-l_05.html)

- [Steinhaus–Johnson–Trotter Algorithm](https://en.wikipedia.org/wiki/Steinhaus%E2%80%93Johnson%E2%80%93Trotter_algorithm)

In [14]:
# Time:  O(n! * n)
# Space: O(n)

def gen_heaps(n):
	path = list(range(n)) # First permutation

	def dfs(k):
		if k == 1:
			print_path(path)
			return
		for i in range(k):
			dfs(k-1)
			# In-place swapping
			if k % 2 == 0: # Even k
				path[i], path[k-1] = path[k-1], path[i]
			else: # Odd k
				path[0], path[k-1] = path[k-1], path[0]

	dfs(n)

In [15]:
# Example usage
gen_heaps(3) # Order NOT lexicographic, but very efficient

0, 1, 2
1, 0, 2
2, 0, 1
0, 2, 1
1, 2, 0
2, 1, 0
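
Since the output order of Heap's algorithm is not obvious at a glance, it helps to confirm that every permutation appears exactly once. This sketch uses the same swaps as `gen_heaps` but collects tuples; `heaps_collect` is an illustrative name.

```python
from itertools import permutations

def heaps_collect(n):
    # Same swaps as gen_heaps, but returns the permutations as tuples
    path = list(range(n))
    results = []

    def dfs(k):
        if k == 1:
            results.append(tuple(path))
            return
        for i in range(k):
            dfs(k - 1)
            if k % 2 == 0:   # Even k
                path[i], path[k - 1] = path[k - 1], path[i]
            else:            # Odd k
                path[0], path[k - 1] = path[k - 1], path[0]

    dfs(n)
    return results

perms = heaps_collect(4)
print(len(perms), len(set(perms)))  # 24 24
print(sorted(perms) == sorted(permutations(range(4))))  # True
```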


In [16]:
# Time:  O(n! * n)
# Space: O(n)  Just the permutation array (in-place modification)

# Guarantees lexicographic order of permutations

def gen_knuth_L(n):
	path = list(range(n)) # Start with the lowest lex permutation

	def next_permutation():
		# Step 1: Find largest i such that path[i] < path[i+1]
		i = n - 2
		while i >= 0 and path[i] >= path[i+1]:
			i -= 1
		if i < 0:
			return False # Last permutation reached
		
		# Step 2: Find largest j > i such that path[i] < path[j]
		j = n - 1
		while path[j] <= path[i]:
			j -= 1

		# Step 3: Swap elements at i and j
		path[i], path[j] = path[j], path[i]

		# Step 4: Reverse suffix
		path[i + 1:] = reversed(path[i + 1:])
		return True
	
	# First permutation
	print_path(path)
	while next_permutation():
		print_path(path)

In [17]:
# Example usage
gen_knuth_L(3) # Lexicographic order

0, 1, 2
0, 2, 1
1, 0, 2
1, 2, 0
2, 0, 1
2, 1, 0
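
The four steps can also be written as a standalone function that advances any list in place. The sketch below is one such version, with a worked step on `[1, 3, 2]`. Because the comparisons use `>=` and `<=`, it also handles repeated values, visiting each distinct arrangement once.

```python
def next_permutation(a):
    """Advance list `a` to its next lexicographic arrangement in place.
    Returns False, leaving `a` unchanged, when `a` is already the last one."""
    # Step 1: find the largest i with a[i] < a[i + 1]
    i = len(a) - 2
    while i >= 0 and a[i] >= a[i + 1]:
        i -= 1
    if i < 0:
        return False
    # Step 2: find the largest j > i with a[j] > a[i]
    j = len(a) - 1
    while a[j] <= a[i]:
        j -= 1
    # Steps 3 and 4: swap, then reverse the suffix after position i
    a[i], a[j] = a[j], a[i]
    a[i + 1:] = reversed(a[i + 1:])
    return True

a = [1, 3, 2]
next_permutation(a)
print(a)  # [2, 1, 3]

# With repeated values, each distinct arrangement appears once
b = [0, 0, 1]
seen = [tuple(b)]
while next_permutation(b):
    seen.append(tuple(b))
print(seen)  # [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
```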


In [18]:
# Time:  O(n! * n)
# Space: O(n)

# Generates permutations in "minimal-change order" 
# - where each new permutation differs from the previous by a single adjacent swap

def gen_sjt(n):
	path = list(range(n))
	dirs = [-1] * n # Directions: -1 = left, +1 = right

	def find_largest_mobile():
		largest = -1
		index = -1
		for i in range(n):
			j = i + dirs[i]
			if 0 <= j < n and path[i] > path[j]:
				if path[i] > largest:
					largest = path[i]
					index = i
		return index
	
	print_path(path)
	while True:
		i = find_largest_mobile()
		if i == -1:
			break # No mobile elements left

		j = i + dirs[i]
		path[i], path[j] = path[j], path[i]
		dirs[i], dirs[j] = dirs[j], dirs[i]

		# After move, reverse direction of all elements > moved one
		moved_val = path[j]
		for k in range(n):
			if path[k] > moved_val:
				dirs[k] *= -1

		print_path(path)

In [19]:
# Example usage
gen_sjt(3) # Order NOT lexicographic

0, 1, 2
0, 2, 1
2, 0, 1
2, 1, 0
1, 2, 0
1, 0, 2
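
The "minimal-change" property can be verified directly. This sketch follows the same steps as `gen_sjt` but collects tuples, then checks that each consecutive pair differs by one swap of neighbouring positions. Both function names are introduced here for illustration.

```python
def sjt_collect(n):
    # Same steps as gen_sjt, but returns the permutations as tuples
    path = list(range(n))
    dirs = [-1] * n  # Directions: -1 = left, +1 = right
    results = [tuple(path)]

    while True:
        # Find the largest mobile element (one pointing at a smaller neighbour)
        index = -1
        for i in range(n):
            j = i + dirs[i]
            if 0 <= j < n and path[i] > path[j]:
                if index == -1 or path[i] > path[index]:
                    index = i
        if index == -1:
            return results

        j = index + dirs[index]
        path[index], path[j] = path[j], path[index]
        dirs[index], dirs[j] = dirs[j], dirs[index]

        # Reverse the direction of every element larger than the one just moved
        for k in range(n):
            if path[k] > path[j]:
                dirs[k] *= -1

        results.append(tuple(path))

def differ_by_adjacent_swap(p, q):
    diff = [i for i in range(len(p)) if p[i] != q[i]]
    return (len(diff) == 2 and diff[1] == diff[0] + 1
            and p[diff[0]] == q[diff[1]] and p[diff[1]] == q[diff[0]])

perms = sjt_collect(4)
print(len(set(perms)))  # 24
print(all(differ_by_adjacent_swap(p, q) for p, q in zip(perms, perms[1:])))  # True
```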


<h2 id="time-tests">Empirical Time Tests</h2>

In [20]:
import time

# First, let us replace the output function with something trivial
def print_path(path):
	return None # Do nothing

def measure_time(func, n, iterations):
	''' Runs the `func` `iterations` times,  
		measuring only CPU processing time, and averaging the results
	'''
	total_time = 0
	for _ in range(iterations):
		start = time.process_time()
		func(n) # Call the algorithm with input n
		end = time.process_time()
		total_time += (end - start)

	avg_time = total_time / iterations
	return avg_time

def test_lazy_dfs(n):
	''' Helper for testing the lazy generator '''
	g = gen_lazy(n)
	for _ in g:
		pass

algorithms = {
		'gen_bfs': gen_bfs,
		'gen_dfs': gen_dfs,
		'gen_dfs_bitmask': gen_dfs_bitmask,
		'gen_dfs_iterative': gen_dfs_iterative,
		'gen_dfs_bitmask_iterative': gen_dfs_bitmask_iterative,
		'gen_lazy': test_lazy_dfs,
		'gen_heaps': gen_heaps,
		'gen_knuth_L': gen_knuth_L,
		'gen_sjt': gen_sjt
	}

def test_all(n, iterations=5):
	''' Measure the average processing time of each algorithm. '''
	results = {} # name -> avg_time (in seconds)

	print(f'Average times for generating {n}-permutations, {iterations} times each:\n')
	for name, func in algorithms.items():
		avg_time = measure_time(func, n, iterations)
		results[name] = avg_time
		print(f'{name:>26}: {avg_time:.10f}s')

	return results
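
As an alternative sketch (not the harness used for the results in this notebook), the standard library's `timeit.repeat` can do the repetition. It temporarily disables garbage collection while timing, and reporting the minimum alongside the average gives a sense of run-to-run noise. The `busy_work` function is a hypothetical stand-in workload so that the example runs on its own.

```python
import time
import timeit

def measure_with_timeit(func, n, iterations=5):
    # One call per measurement; process_time is passed as the timer
    # so that CPU time is measured, as in measure_time
    runs = timeit.repeat(lambda: func(n), timer=time.process_time,
                         repeat=iterations, number=1)
    return sum(runs) / len(runs), min(runs)

def busy_work(n):
    # Hypothetical stand-in workload
    return sum(i * i for i in range(n))

avg, best = measure_with_timeit(busy_work, 100_000)
print(f'avg={avg:.6f}s best={best:.6f}s')
```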

In [21]:
results = test_all(10)

Average times for generating 10-permutations, 5 times each:

                   gen_bfs: 12.7187500000s
                   gen_dfs: 7.8843750000s
           gen_dfs_bitmask: 10.5218750000s
         gen_dfs_iterative: 10.0968750000s
 gen_dfs_bitmask_iterative: 12.4656250000s
                  gen_lazy: 13.7968750000s
                 gen_heaps: 2.3281250000s
               gen_knuth_L: 3.3812500000s
                   gen_sjt: 10.0937500000s


In [22]:
import plotly.graph_objects as go

sorted_results = sorted(results.items(), key=lambda x: x[1])  # Sort by time

# Algorithm titles
label_map = {
	'gen_heaps': "Heap's Algorithm",
	'gen_knuth_L': "Knuth's Algorithm L",
	'gen_dfs': "DFS",
	'gen_dfs_bitmask': "DFS with bitmask",
	'gen_dfs_iterative': "Iterative DFS",
	'gen_dfs_bitmask_iterative': "Iterative DFS with bitmask",
	'gen_sjt': "Steinhaus–Johnson–Trotter",
	'gen_lazy': "Lazy DFS",
	'gen_bfs': "BFS"
}

# Prepare x and y values (new names, so the `algorithms` dict and the `time` module are not shadowed)
labels = [label_map[name] for name, _ in sorted_results]
times = [t for _, t in sorted_results]

# Create bar chart
fig = go.Figure(data=[
	go.Bar(
		x=labels,
		y=times,
		marker=dict(color='rgba(52, 152, 219, 0.65)'),
		text=[f'{t:.4f}s' for t in times],
		textposition='outside'
	)
])

# Layout config
fig.update_layout(
	title='Average Time to Generate All 10-Permutations (5 Runs)',
	xaxis_title='Algorithm',
	yaxis_title='Average Time (seconds)',
	yaxis=dict(tickformat='.2f'),
	margin=dict(l=40, r=40, t=60, b=100),
	height=500,
	showlegend=False
)

fig.show(config={
	'staticPlot': True
})

<h2 id="summary">Summary</h2>

| Algorithm                         | Time Complexity        | Space Complexity      | Lexicographic Order | Empirical Time for n=10 (avg. 5 reps) | Extra Notes                              |
|-----------------------------------|:----------------------:|:---------------------:|:-------------------:|--------------------------------------:|:----------------------------------------:|
| Heap's Algorithm                  | $O(n! \cdot n)$        | $O(n)$                | No                  | 2.4468750000s                         | In-place permutation                     |
| Knuth's Algorithm L               | $O(n! \cdot n)$        | $O(n)$                | Yes                 | 3.6437500000s                         | Lexicographic order                      |
| DFS                               | $O(n! \cdot n)$        | $O(n)$                | Yes                 | 6.3687500000s                         | Simple recursive DFS                     |
| DFS with bitmask                  | $O(n! \cdot n)$        | $O(n)$                | Yes                 | 9.8500000000s                         | Uses bitmask instead of `used` array     |
| Iterative DFS with bitmask        | $O(n! \cdot n)$        | $O(n^2)$              | Yes                 | 10.2562500000s                        |                                          |
| Iterative DFS                     | $O(n! \cdot n)$        | $O(n^2)$              | Yes                 | 10.6406250000s                        | No recursion, explicit stack             |
| Steinhaus–Johnson–Trotter         | $O(n! \cdot n)$        | $O(n)$                | No                  | 10.9468750000s                        | Minimal-change order                     |
| Lazy DFS                          | $O(n! \cdot n)$        | $O(n)$                | Yes                 | 12.2437500000s                        | Lazy generation (yield)                  |
| BFS                               | $O(n! \cdot n^2)$      | $O(n! \cdot n)$       | Yes                 | 12.6812500000s                        | Uses a queue (FIFO)                      |

#### __Conclusions__

All but __BFS__ have the same time complexity, yet we observed large empirical differences when running the algorithms,  
with __Heap's algorithm__ running about 5 times faster than __BFS__. 
	
This is a good lesson: asymptotic growth rates only describe how quickly an algorithm's cost increases as `n` grows large.  
They do not tell us how two algorithms with the same asymptotic behavior will compare in practice.  
And since every one of these algorithms must take $\Omega(n!)$ time, as $n!$ permutations must be generated,  
even small values of `n` take a long time to process.

Consequently, we never test an $n$ large enough for the extra factor of $n$ between $O(n! \cdot n^2)$ and $O(n! \cdot n)$  
to dominate, as seen in the small empirical gap between __BFS__ and __Iterative DFS__.
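
To put numbers on this, the two bounds differ only by a factor of $n$, which is just 10 at $n = 10$. The short sketch below prints the raw magnitudes involved; constant-factor differences between implementations can easily be of the same order.

```python
from math import factorial

for n in (5, 10, 12):
    base = factorial(n)
    print(f'n={n:>2}  n!={base:>11,}  n!*n={base * n:>13,}  n!*n^2={base * n * n:>14,}')
```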

<br>

- 🥇 __Best Performer – Heap’s Algorithm__

	Heap's algorithm is the fastest in these tests. Despite not generating permutations in lexicographic order,  
	it benefits from modifying a single array in place: each new permutation is reached through swaps alone, rather than by scanning all $n$ candidates at every node of a search tree.

- 📈 __Strong Runner-Up – Knuth’s Algorithm L__

	It generates permutations in lexicographic order without recursion, using only a scan, one swap, and a suffix reversal per step.  

- 🧠 __Recursive DFS Outperforms Iterative Variants__

	Surprisingly, the plain recursive __DFS__ outperforms both iterative __DFS__ and the bitmask versions. This is likely due to:

	- **Function Call Optimization**: 

		In CPython, calling a function follows a well-optimized interpreter path, so replacing recursion with manual bookkeeping does not automatically save time.  
		The list and tuple operations behind a manual stack are general-purpose, dynamically dispatched method calls,  
		which limits how cheap they can be. This is a plausible explanation rather than a measured one.

	- **Manual Stack Overhead**:  

		The iterative __DFS__ implementations allocate one tuple per pending node and push and pop it through a Python list,  
		and the set-based variant also pays for a `set.add` and a `set.remove` on every step.  
		Each of these operations can incur overhead from Python’s object management and dynamic resizing of memory.

	- **Bitmasking Performance**: 
	
		While bitmasking seems like a good optimization for space efficiency, it introduces extra arithmetic and logic operations,  
		which slow things down. 
		In Python, where integers are treated as objects with additional overhead (like type-checking)  
		compared to low-level languages, bitmasking isn't as fast as manipulating simple lists of booleans.

- 🐢 __Worst Performers – Lazy DFS and BFS__

	- __Lazy DFS__ suffers from generator overhead. Each recursion level is its own generator object, so with `yield from`  
		every permutation is passed up through a chain of about $n$ suspended generators before it reaches the caller.

	- __BFS__ is the slowest in the summary table, though only narrowly (in the run printed earlier, Lazy DFS was slower still). Its cost comes from its time growth rate, $O(n! \cdot n^2)$,  
		frequent list operations, and the construction and copying of partial paths (list objects) in the queue.

- ⚙️ __Optimizations Gone Wrong__

	Some optimizations (like bitmasking and iterative control) didn’t help performance here. 
	
	While conceptually they reduce space or improve control, Python’s high-level nature and dynamic typing mean  
	the cost of arithmetic and manual control structures outweighs their benefit—in contrast to languages with fixed-size primitive types, like C.

<h2 id="unique">Bonus – Generating Unique Permutations</h2>

Let's adapt our easy-to-implement and fairly efficient __DFS__ to avoid generating duplicate permutations when we have objects that are identical.

For instance, look at the permutations of `aab`.

In [23]:
string = 'aab'

def print_path(path):
	print(''.join(map_indices(path, string))) # Convert integer path to a sequence of objects

gen_dfs(len(string))

aab
aba
aab
aba
baa
baa


We have generated 3 duplicate permutations!

This is because for each level in the __DFS__ we can either use the _first_ `a` or the _second_.

The same possible branches stem from either decision since both cases use an `a`  
at a particular depth, and the other is leftover.

To illustrate this further, visualize 3 slots:

`_ _ _`

We have `a1`, `a2`, and `b` remaining.

We can fill the first slot with `a1` or `a2`.

- `a1 _ _` &ensp; with `a2` and `b` remaining.

- `a2 _ _` &ensp; with `a1` and `b` remaining.

Now, if we remove the identifiers, since `a1` and `a2` are identical, we are left with:

- `a _ _` &ensp; with `a` and `b` remaining.

- `a _ _` &ensp; with `a` and `b` remaining.

Two duplicate states!
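
The duplication can be counted with the standard library. `itertools.permutations` treats elements as distinct by position, just as our index-based __DFS__ does, so each arrangement of `aab` shows up twice:

```python
from collections import Counter
from itertools import permutations

counts = Counter(''.join(p) for p in permutations('aab'))
print(dict(counts))  # {'aab': 2, 'aba': 2, 'baa': 2}
```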

Let us prune duplicate branches before they are explored.

In [24]:
def gen_unique_dfs(objects):
	chars = sorted(objects) # Sort so that identical objects are adjacent
	used = [False] * len(objects)

	# Now, we will build the permutations of the objects `chars` directly
	def advance(p: list):
		if len(p) == len(chars): # Found permutation
			print(''.join(p))
			return
		for i in range(len(chars)):
			if used[i]: # Already used this object in current p (not including duplicates)
				continue
			# Skip branches that are duplicates
			# -- choose only one `a` from a situation with many that remain unused (in the current path)
			if i > 0 and chars[i] == chars[i - 1] and not used[i - 1]:
				continue
			used[i] = True
			p.append(chars[i])
			advance(p)
			p.pop()
			used[i] = False
	
	advance([])

In [25]:
# Example usage
gen_unique_dfs('aab')

aab
aba
baa


In [26]:
gen_unique_dfs('aabbcc')

aabbcc
aabcbc
aabccb
aacbbc
aacbcb
aaccbb
ababcc
abacbc
abaccb
abbacc
abbcac
abbcca
abcabc
abcacb
abcbac
abcbca
abccab
abccba
acabbc
acabcb
acacbb
acbabc
acbacb
acbbac
acbbca
acbcab
acbcba
accabb
accbab
accbba
baabcc
baacbc
baaccb
babacc
babcac
babcca
bacabc
bacacb
bacbac
bacbca
baccab
baccba
bbaacc
bbacac
bbacca
bbcaac
bbcaca
bbccaa
bcaabc
bcaacb
bcabac
bcabca
bcacab
bcacba
bcbaac
bcbaca
bcbcaa
bccaab
bccaba
bccbaa
caabbc
caabcb
caacbb
cababc
cabacb
cabbac
cabbca
cabcab
cabcba
cacabb
cacbab
cacbba
cbaabc
cbaacb
cbabac
cbabca
cbacab
cbacba
cbbaac
cbbaca
cbbcaa
cbcaab
cbcaba
cbcbaa
ccaabb
ccabab
ccabba
ccbaab
ccbaba
ccbbaa
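
The number of lines above can be checked against the multinomial formula $\frac{n!}{m_1! \, m_2! \cdots m_r!}$, where $m_1, \ldots, m_r$ are the multiplicities of the distinct objects. A minimal sketch, with `count_unique_permutations` as an illustrative helper name:

```python
from collections import Counter
from math import factorial

def count_unique_permutations(objects):
    # n! divided by the factorial of each distinct object's multiplicity
    total = factorial(len(objects))
    for multiplicity in Counter(objects).values():
        total //= factorial(multiplicity)
    return total

print(count_unique_permutations('aab'))     # 3
print(count_unique_permutations('aabbcc'))  # 90
```

For `aabbcc` this gives $\frac{6!}{2! \cdot 2! \cdot 2!} = 90$, matching the 90 strings printed by `gen_unique_dfs('aabbcc')`.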
