# Question 395

## Description

This problem was asked by Robinhood.

Given an array of strings, group anagrams together.

For example, given the following array:

`['eat', 'ate', 'apt', 'pat', 'tea', 'now']`

Return:

```python
[['eat', 'ate', 'tea'],
 ['apt', 'pat'],
 ['now']]
```


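```python
from collections import defaultdict


def group_anagrams(words):
    """Group words that are anagrams of each other (lowercase a-z only)."""
    anagrams = defaultdict(list)

    for word in words:
        # count characters in word; assumes every character is in 'a'..'z'
        char_count = [0] * 26
        for char in word:
            char_count[ord(char) - ord("a")] += 1
        # use the character count as the key (tuples are hashable, lists are not)
        anagrams[tuple(char_count)].append(word)

    # return the grouped anagrams, in the order each group was first seen
    return list(anagrams.values())
```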

```python
# Test the function with the provided example
test_words = ["eat", "ate", "apt", "pat", "tea", "now"]
print(group_anagrams(test_words))
# [['eat', 'ate', 'tea'], ['apt', 'pat'], ['now']]
```

Improving the efficiency of anagram grouping mainly comes down to reducing the cost of computing each word's key. A straightforward approach uses each word's sorted letters as the key, which costs \(O(k \log k)\) per word and \(O(n \cdot k \log k)\) overall, where \(n\) is the number of words and \(k\) is the average word length. The solution above avoids the sort by using a character count instead. It works as follows:

1. **Use Character Count as Key**: Instead of sorting each word, count the frequency of each character in the word. Assuming the input contains only lowercase English letters, this can be represented as a tuple of 26 elements, where each element counts one letter (a-z). For example, 'aab' is represented as (2, 1, 0, 0, ..., 0): two 'a's, one 'b', and zero of every other letter.

2. **Hash Map for Grouping**: Use this tuple as the key in the hash map. Words that are anagrams of each other will have the same character count and thus the same key.
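The key construction from step 1 can be isolated in a small helper to check that anagrams really do map to the same key. This is a minimal sketch; `char_count_key` is a hypothetical name not used in the original solution, and it makes the same lowercase a-z assumption.

```python
def char_count_key(word):
    """Return a 26-slot tuple of letter counts (hypothetical helper, a-z only)."""
    counts = [0] * 26
    for char in word:
        # characters outside 'a'..'z' either raise IndexError
        # or silently land in the wrong slot
        counts[ord(char) - ord("a")] += 1
    return tuple(counts)


print(char_count_key("aab")[:3])                          # (2, 1, 0)
print(char_count_key("eat") == char_count_key("tea"))    # True
print(char_count_key("eat") == char_count_key("apt"))    # False
```

Because the tuple has a fixed length of 26, it can be hashed and compared in constant time regardless of how long the word is.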

This method avoids the \(O(k \log k)\) sorting time for each word, replacing it with \(O(k)\) for counting characters, which can be significant, especially for longer words.
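For comparison, here is a sketch of the sorting-based grouping that this section contrasts against. The name `group_anagrams_sorted` is hypothetical; the only difference from the character-count version is how the key is built.

```python
from collections import defaultdict


def group_anagrams_sorted(words):
    """Group anagrams using each word's sorted letters as the key."""
    groups = defaultdict(list)
    for word in words:
        # sorting a k-letter word costs O(k log k)
        key = "".join(sorted(word))
        groups[key].append(word)
    return list(groups.values())


print(group_anagrams_sorted(["eat", "ate", "apt", "pat", "tea", "now"]))
# [['eat', 'ate', 'tea'], ['apt', 'pat'], ['now']]
```

Note that this version works for any characters, not just a-z, which is one reason it remains a common choice when words are short.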

The `group_anagrams` function above implements this method, and on the example input it correctly groups the anagrams:

`[['eat', 'ate', 'tea'], ['apt', 'pat'], ['now']]`

The updated complexity analysis is as follows:

### Time Complexity

1. **Iterating Through Each Word**: This remains \(O(n)\), where \(n\) is the number of words.

2. **Counting Characters in Each Word**: The time to count characters in each word is \(O(k)\), where \(k\) is the average length of a word. Since this is done for each word, the total time for this step is \(O(n \cdot k)\).

3. **Inserting into a Hash Map**: Hashing the fixed-size, 26-element key takes constant time, and insertion into a hash map is \(O(1)\) on average, so this step adds only \(O(n)\) in total and does not change the overall bound.

Overall, the time complexity is now \(O(n \cdot k)\), an improvement over the \(O(n \cdot k \log k)\) of the sorting-based approach that is most noticeable when \(k\) is large.
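Summing the per-word costs makes the comparison explicit. Here \(k_i\) is the length of the \(i\)-th word, so \(\sum_{i} k_i = n \cdot k\) with \(k\) the average length, and \(k_{\max}\) denotes the length of the longest word; the \(O(26)\) term covers allocating and hashing the fixed-size key:

\[
\begin{aligned}
T_{\text{count}} &= \sum_{i=1}^{n} \bigl( O(k_i) + O(26) \bigr) = O(n \cdot k + n), \\
T_{\text{sort}} &= \sum_{i=1}^{n} O(k_i \log k_i) \le O\Bigl(\log k_{\max} \sum_{i=1}^{n} k_i\Bigr) = O(n \cdot k \log k_{\max}).
\end{aligned}
\]

For non-empty words, \(n \le n \cdot k\), so the first line reduces to \(O(n \cdot k)\).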

### Space Complexity

The space complexity remains \(O(n \cdot k)\), because the space is primarily used to store the words in the hash map and the output list. The character-count keys add only \(O(26 \cdot n) = O(n)\), since each key has a fixed size of 26 regardless of word length.
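The fixed 26-slot key depends on the input being lowercase a-z. If words may contain other characters, one alternative (not part of the original solution) is to key on a `frozenset` of `collections.Counter` items. The key then grows with the number of distinct characters (at most \(k\) per word), so total space stays \(O(n \cdot k)\). `group_anagrams_general` is a hypothetical name for this sketch.

```python
from collections import Counter, defaultdict


def group_anagrams_general(words):
    """Group anagrams over arbitrary characters (case-sensitive)."""
    groups = defaultdict(list)
    for word in words:
        # frozenset of (character, count) pairs: hashable and order-independent
        key = frozenset(Counter(word).items())
        groups[key].append(word)
    return list(groups.values())


print(group_anagrams_general(["ab!", "!ba", "Zz", "zZ", "x"]))
# [['ab!', '!ba'], ['Zz', 'zZ'], ['x']]
```

Counting is still \(O(k)\) expected per word, but hashing a `frozenset` has a larger constant factor than hashing a fixed tuple of 26 integers.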

In summary:

- **Time Complexity**: \(O(n \cdot k)\)
- **Space Complexity**: \(O(n \cdot k)\)

The advantage over the sorting-based approach grows with word length \(k\); the number of words \(n\) affects both approaches linearly.
