# Question 386

## Description

This problem was asked by Twitter.

Given a string, sort it in decreasing order based on the frequency of characters. If there are multiple possible solutions, return any of them.

For example, given the string `tweet`, return `tteew`. `eettw` would also be acceptable.
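
Because several outputs can be correct, it helps to have a way to check a candidate answer. The following is a minimal sketch of such a check; the helper name `is_valid_frequency_sort` is hypothetical, and it assumes that "acceptable" means the candidate uses the same characters as the input and that character frequencies never increase from left to right.

```python
from collections import Counter


def is_valid_frequency_sort(original, candidate):
    """Return True if candidate is an acceptable frequency ordering of original."""
    counts = Counter(original)
    # The candidate must use exactly the same characters, the same number of times.
    if Counter(candidate) != counts:
        return False
    # Reading left to right, the frequency of each character must never increase.
    freqs = [counts[ch] for ch in candidate]
    return all(a >= b for a, b in zip(freqs, freqs[1:]))


print(is_valid_frequency_sort("tweet", "tteew"))  # True
print(is_valid_frequency_sort("tweet", "eettw"))  # True
print(is_valid_frequency_sort("tweet", "wtete"))  # False
```

Note that this assumed definition does not require equal characters to be adjacent; a stricter check could add that condition.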


In [None]:
def sort_by_frequency(s):
    # Count the frequency of each character in the string
    frequency = {}
    for char in s:
        if char in frequency:
            frequency[char] += 1
        else:
            frequency[char] = 1

    # Sort the characters based on frequency (and alphabetically for equal frequencies)
    sorted_chars = sorted(frequency, key=lambda x: (-frequency[x], x))

    # Build the sorted string
    sorted_string = "".join(char * frequency[char] for char in sorted_chars)

    return sorted_string


# Test the function with the example 'tweet'
sort_by_frequency("tweet")

## Complexity Analysis

To analyze the complexity of the `sort_by_frequency` function, let's break it down into its major components:

1. **Counting the Frequency of Each Character**:

   - This step involves iterating over each character in the string `s`.
   - If the length of the string is \( n \), this step has a time complexity of \( O(n) \) since each character is visited once.
   - The space complexity for this part is \( O(k) \), where \( k \) is the number of distinct characters in the string. In the worst case (all characters are distinct), this would be \( O(n) \).

2. **Sorting the Characters**:

   - The sorting is based on the frequency of characters. The complexity of the sorting operation depends on the sorting algorithm used by Python's `sorted` function, which is Timsort.
   - The worst-case time complexity of Timsort is \( O(k \log k) \), where \( k \) is the number of distinct characters. In the worst case, this becomes \( O(n \log n) \) if all characters are distinct.
   - The space complexity of Timsort is \( O(k) \) for the temporary storage used in sorting.

3. **Building the Sorted String**:
   - The final string is constructed by concatenating characters based on their frequency.
   - This operation will again be \( O(n) \) in time complexity, as each character in the original string is processed once.
   - The space complexity for the final string is \( O(n) \), which is the size of the output string.

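Overall, the time complexity is \( O(n + k \log k) \): counting and building the output are linear in \( n \), and only the \( k \) distinct characters are sorted. In the worst case, where every character is distinct (\( k = n \)), this becomes \( O(n \log n) \). The space complexity is \( O(n) \), primarily due to the storage needed for the frequency dictionary and the final sorted string.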
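
The same count-then-sort idea can be written more compactly with the standard library. This is a sketch, not a replacement for the function above: the name `sort_by_frequency_counter` is illustrative, and ties are broken by first appearance in the string rather than alphabetically, so the output can differ while still being valid.

```python
from collections import Counter


def sort_by_frequency_counter(s):
    # Counter tallies each character in one O(n) pass.
    counts = Counter(s)
    # most_common() returns (char, count) pairs from most to least frequent;
    # only the k distinct characters are sorted, not all n characters.
    return "".join(char * count for char, count in counts.most_common())


print(sort_by_frequency_counter("tweet"))  # tteew
```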


An alternative way to sort a string by character frequency is to organize the characters with a heap (priority queue) instead of calling `sorted`. Popping every element from a heap is itself a heap sort, so the asymptotic cost is the same as sorting the distinct characters; a heap mainly pays off when only the few most frequent characters are needed. Here's how you can do it:

1. **Count Frequencies**: First, count the frequency of each character in the string, similar to the previous approach.

2. **Use a Max Heap**: Use a max heap (priority queue) where each element is a pair (frequency, character). Python's `heapq` module can be used, but since it provides a min heap, we can invert the frequencies to simulate a max heap.

3. **Build the Result String**: Extract elements from the heap and build the result string by appending each character times its frequency.

Like the first approach, this runs in \( O(n + k \log k) \) time, which is close to linear when the number of distinct characters \( k \) is much smaller than the length of the string \( n \).

Let's implement this approach and see how it performs.


In [None]:
import heapq

def sort_by_frequency_efficient(s):
    # Count the frequency of each character
    frequency = {}
    for char in s:
        frequency[char] = frequency.get(char, 0) + 1

    # Create a max heap based on frequency
    max_heap = [(-freq, char) for char, freq in frequency.items()]
    heapq.heapify(max_heap)

    # Build the sorted string
    sorted_string = []
    while max_heap:
        freq, char = heapq.heappop(max_heap)
        sorted_string.append(char * -freq)

    return ''.join(sorted_string)

# Test the efficient function with the example 'tweet'
sort_by_frequency_efficient('tweet')


The heap-based method sorts the string "tweet" into "eettw", which is a valid solution based on the frequency of characters.

### Complexity Analysis

1. **Count Frequencies**:

   - Time Complexity: \( O(n) \), iterating over each character in the string.
   - Space Complexity: \( O(k) \), where \( k \) is the number of distinct characters.

2. **Creating and Managing the Max Heap**:

   - The heap is created from \( k \) elements (distinct characters).
   - Time Complexity: Building the heap with `heapq.heapify` is \( O(k) \), since it transforms the list in place in linear time.
   - Extracting all elements from the heap involves \( k \) operations, each of which is \( O(\log k) \), leading to a total of \( O(k \log k) \).

3. **Building the Result String**:
   - Time Complexity: \( O(n) \), as each character is appended based on its frequency.
   - Space Complexity: \( O(n) \) for the output string.

### Overall Complexity

- Time Complexity: \( O(n + k \log k) \). This is the same bound as the initial approach, because extracting all \( k \) elements from the heap costs \( O(k \log k) \), just like sorting them.
- Space Complexity: \( O(n) \), mainly for the frequency map and the output string.

Both methods are close to linear in \( n \) when the number of distinct characters \( k \) is much smaller than the length of the string \( n \), which is often the case in practical scenarios.
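
One way to compare the two approaches empirically is a quick timing run. The sketch below re-implements both in compact form; the names `by_sorted` and `by_heap` and the random 26-letter sample are illustrative assumptions, not part of the solutions above. Actual timings depend on the machine and Python version, so no figures are shown here.

```python
import heapq
import random
import timeit
from collections import Counter


def by_sorted(s):
    # Sort the distinct characters by descending count, then alphabetically.
    counts = Counter(s)
    order = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return "".join(ch * counts[ch] for ch in order)


def by_heap(s):
    # Negate counts so that heapq's min heap pops the most frequent first.
    counts = Counter(s)
    heap = [(-freq, ch) for ch, freq in counts.items()]
    heapq.heapify(heap)
    parts = []
    while heap:
        freq, ch = heapq.heappop(heap)
        parts.append(ch * -freq)
    return "".join(parts)


random.seed(0)
sample = "".join(random.choices("abcdefghijklmnopqrstuvwxyz", k=100_000))

# Both versions break ties alphabetically, so their outputs should match.
print(by_sorted(sample) == by_heap(sample))

for func in (by_sorted, by_heap):
    elapsed = timeit.timeit(lambda: func(sample), number=20)
    print(f"{func.__name__}: {elapsed:.4f} s for 20 runs")
```

With only 26 distinct characters, the counting pass over the 100,000-character sample is expected to dominate both timings, since the sort or heap step touches just \( k \) items.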
