#### [Python <img src="../../assets/pythonLogo.png" alt="py logo" style="height: 1em; vertical-align: sub;">](../README.md) | Medium 🟠 | [Arrays & Hashing](README.md)
# [49. Group Anagrams](https://leetcode.com/problems/group-anagrams/description/)

Given an array of strings `strs`, group **the anagrams** together. You can return the answer in **any order**.

An **Anagram** is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

**Example 1:**
> **Input**: `strs = ["eat","tea","tan","ate","nat","bat"]`  
> **Output**: `[["bat"],["nat","tan"],["ate","eat","tea"]]`  

**Example 2:**
> **Input**: `strs = [""]`  
> **Output**: `[[""]]`  

**Example 3:**
> **Input**: `strs = ["a"]`  
> **Output**: `[["a"]]`   

#### Constraints
- `1 <= strs.length <=` $10^4$
- `0 <= strs[i].length <= 100`
- `strs[i]` consists of lowercase English letters.

## Problem Explanation
- The problem "Group Anagrams" asks us to group together strings that are anagrams of each other. 
- An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, using all the original letters exactly once.
- For instance, "bat" and "tab" are anagrams. 
- The task is to create groups of anagrams from the given array of strings and return them in a list of lists.
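
As a quick illustration of the definition above, here is a minimal sketch of a pairwise anagram check. The helper name `is_anagram` is hypothetical and is not part of the solutions in this write-up; it only shows what "same letters, each used exactly once" means in code.

```python
from collections import Counter


def is_anagram(a: str, b: str) -> bool:
    """Return True if a and b contain the same letters with the same multiplicities."""
    # Counter builds a letter -> occurrence-count mapping; equal mappings mean anagrams.
    return Counter(a) == Counter(b)


print(is_anagram("bat", "tab"))  # True
print(is_anagram("bat", "bet"))  # False
```

Checking every pair this way would be quadratic in the number of strings, which is why the approaches below compute one key per string instead.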
***

# Approach 1: Character Count 
A solid approach to tackling this problem is counting the frequency of each character in a string and using that count as a key to group anagrams.

## Intuition
- The intuition behind this approach is that two strings are anagrams if and only if their character counts (for each letter) are the same. 
- Instead of sorting the strings to check if they are anagrams, we can count the occurrences of each character. 
- This count, represented as a tuple of 26 elements (for each letter in the English alphabet), serves as a unique identifier for each group of anagrams.
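
To make the idea of a count-based identifier concrete, the sketch below builds the 26-slot key for a few words. The helper name `count_key` is an assumption made for illustration; the full solution further down inlines the same logic.

```python
def count_key(s: str) -> tuple:
    """Build a 26-slot tuple of letter counts for a lowercase string."""
    counts = [0] * 26
    for c in s:
        counts[ord(c) - ord("a")] += 1  # slot 0 is 'a', slot 25 is 'z'
    return tuple(counts)  # tuples are hashable, so they can serve as dict keys


print(count_key("eat") == count_key("tea"))  # True: same letters, same counts
print(count_key("eat") == count_key("tan"))  # False: different letters
print(count_key("eat")[:5])                  # (1, 0, 0, 0, 1) -> one 'a', one 'e'
```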

## Algorithm
1. **Initialize:** a dictionary (hashmap) to store the lists of anagrams, where the keys will be the unique character count tuples and the values will be the lists of strings that share the same character count.
2. **Iterate over each string in the input list**
3. **Count characters:** For each string, create a list of 26 zeros, one for each letter. Then, increment the entry for each character in the string.
4. **Group anagrams:** Convert the list of counts to a tuple (lists are not hashable, so they cannot be dictionary keys) and use it as a key in the dictionary. Then append the current string to the list corresponding to this key.
5. **Return Groups:** Since the dictionary's values hold the grouped anagrams, we can return those.

## Code Implementation

In [9]:
from collections import defaultdict

class Solution:
    def groupAnagrams(self, strs):
        ans = defaultdict(list)  # Maps a 26-slot letter-count tuple to its anagram group

        for s in strs:  # Iterate through each string in the input array
            count = [0] * 26  # One counter per lowercase letter, 'a' through 'z'
            for c in s:
                count[ord(c) - ord("a")] += 1  # Increment the slot for this letter
            # Lists are unhashable, so convert the counts to a tuple to use as the key,
            # then append the current string to that key's group
            ans[tuple(count)].append(s)
        return list(ans.values())  # Return groups of anagrams

### Testing

In [10]:
def test_solution(solution_class):
    test_cases = [
        {
            "input": ["eat", "tea", "tan", "ate", "nat", "bat"],
            "expected": [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]],
        },
        {
            "input": [""],
            "expected": [[""]],
        },
        {
            "input": ["a"],
            "expected": [["a"]],
        },
    ]

    solution = solution_class()
    passed_count = 0

    for i, test_case in enumerate(test_cases):
        strs = test_case["input"]
        expected = test_case["expected"]
        result = solution.groupAnagrams(strs)

        print(f"Test Case {i+1}:")
        print("Input:", strs)
        print("Expected Output:", expected)
        print("Result:", result)
        
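        # Any order is accepted, so normalize both the order of the groups
        # and the order of the strings inside each group before comparing
        expected = sorted(sorted(group) for group in expected)
        result = sorted(sorted(group) for group in result)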
        
        if expected == result:
            print("Test Case Passed!\n")
            passed_count += 1
        else:
            print("Test Case Failed!\n")

    total_test_cases = len(test_cases)
    if passed_count == total_test_cases:
        print("All Test Cases Passed!")
    else:
        print(f"{passed_count}/{total_test_cases} Test Cases Passed.")
test_solution(Solution) 

Test Case 1:
Input: ['eat', 'tea', 'tan', 'ate', 'nat', 'bat']
Expected Output: [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
Result: [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
Test Case Passed!

Test Case 2:
Input: ['']
Expected Output: [['']]
Result: [['']]
Test Case Passed!

Test Case 3:
Input: ['a']
Expected Output: [['a']]
Result: [['a']]
Test Case Passed!

All Test Cases Passed!


## Complexity Analysis
- **Variables**:
    - $n$ is the number of strings in the input list.
    - $k$ is the maximum length of a string.

- ### Time Complexity: $O(n \cdot k)$ 
    - For each of the $n$ strings we make one pass over at most $k$ characters to fill the count list, then convert its 26 entries to a tuple. That is $O(k + 26)$ work per string, so $O(n \cdot k)$ overall. No sorting is involved.
- ### Space Complexity: $O(n \cdot k)$ 
    - The dictionary's values hold every input string exactly once, which takes $O(nk)$ space in the worst case.
    - The keys are count tuples of fixed length 26, so they add at most $O(26n) = O(n)$ space, and the overall space complexity stays $O(nk)$.
***

# Approach 2: Categorize by Sorted String


## Intuition
The idea is straightforward: if two strings are anagrams, sorting both of them alphabetically produces two identical strings. The sorted string therefore serves as an ideal identifier for the anagram group. This method relies on sorting to standardize the representation of the anagrams, which makes it easy to group them together.
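
A small sketch of this canonical form, using the example input from the problem statement. The helper name `sorted_key` is hypothetical and exists only for this illustration.

```python
def sorted_key(s: str) -> str:
    """Return the characters of s in alphabetical order, joined into one string."""
    return "".join(sorted(s))


for word in ["eat", "tea", "ate", "tan", "nat", "bat"]:
    print(word, "->", sorted_key(word))
# eat -> aet
# tea -> aet
# ate -> aet
# tan -> ant
# nat -> ant
# bat -> abt
```

Words that share a sorted form ("aet", "ant", "abt") end up in the same group.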

## Algorithm
1. **Initialize** a dictionary of lists that maps each sorted string (the key) to the list of original strings that sort to it.
2. **Iterate** over each string in the input list.
3. **Sort and Categorize:** For each string, sort the characters alphabetically and use the sorted string as a key in the dictionary. Append the original string to the list corresponding to the key.
4. **Return Groups:** The values of the dictionary are the desired groups of anagrams so return those values.

## Code Implementation

In [11]:
from collections import defaultdict

class Solution2:
    def groupAnagrams(self, strs):
        anagrams = defaultdict(list)  # Initializes a defaultdict to hold lists of anagrams
        
        for word in strs:  # Iterates over each string in the input list
            sorted_word = ''.join(sorted(word))  # Sorts the string and joins it back
            anagrams[sorted_word].append(word)  # Appends the original string to the list for that sorted key
        
        return list(anagrams.values())  # Returns all the lists of anagrams

### Testing

In [12]:
test_solution(Solution2) 

Test Case 1:
Input: ['eat', 'tea', 'tan', 'ate', 'nat', 'bat']
Expected Output: [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
Result: [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
Test Case Passed!

Test Case 2:
Input: ['']
Expected Output: [['']]
Result: [['']]
Test Case Passed!

Test Case 3:
Input: ['a']
Expected Output: [['a']]
Result: [['a']]
Test Case Passed!

All Test Cases Passed!


## Complexity Analysis
- **Variables**:
    - $n$ is the number of strings in the input list.
    - $k$ is the maximum length of a string.

- ### Time Complexity: $O(nk \log{k})$ 
    - We iterate over the $n$ strings and sort each one. Sorting a string of length $k$ takes $O(k \log k)$ time, which gives $O(nk \log k)$ overall.
- ### Space Complexity: $O(n \cdot k)$ 
    - Every input string is stored in the dictionary's lists, and each distinct sorted key takes up to $k$ characters. In the worst case, when no two strings are anagrams of each other, there are $n$ keys and $n$ single-string groups, for $O(nk)$ space in total.
***

# Approach 2.1: Sorting and Hashing


## Intuition
The core idea remains the same: sort each string and use the result as a key in a dictionary (or defaultdict), where each key maps to the list of strings that sort to that key. Two details of the implementation are worth pointing out:
- **Direct String Manipulation**: `sorted()` returns a list of characters, which cannot be used as a dictionary key. Rather than converting that list into a tuple, this approach joins the sorted characters back into a string, which makes the key more recognizable and easier to understand at a glance.
- **Simpler Transformation**: The transformation from the input string to the key is done in a single line (`sorted_word = ''.join(sorted(word))`).
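
The following sketch illustrates why the sorted characters need to be converted before they can act as a key. It is an illustration only, with variable names chosen for this example: a list is rejected as a dictionary key, while a tuple or a joined string is accepted.

```python
chars = sorted("tea")  # sorted() returns a list: ['a', 'e', 't']
groups = {}

try:
    groups[chars] = ["tea"]  # lists are mutable and therefore unhashable
except TypeError as exc:
    print("list key rejected:", exc)

groups[tuple(chars)] = ["tea"]    # a tuple of characters is a valid key
groups["".join(chars)] = ["tea"]  # a joined string is a valid key as well

print(list(groups))  # [('a', 'e', 't'), 'aet']
```

Both hashable forms group anagrams identically; the joined string is simply easier to read when inspecting the dictionary.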

## Algorithm
1. **Initialize a defaultdict:** A defaultdict with lists as default values is initialized to store groups of anagrams.
2. **Iterate over the input strings:** Go through each string in the provided list of strings.
3. **Sort each string:** For each string, sort its characters alphabetically to determine its anagram group. This sorted string acts as a key.
4. **Group anagrams:** Append the original string to the list in the defaultdict corresponding to the sorted string key.
5. **Return the grouped anagrams:** Convert the values in the defaultdict (each of which is a list of anagrams) into a list and return it.

## Code Implementation

In [13]:
from collections import defaultdict

class Solution2v1:
    def groupAnagrams(self, strs):
        anagrams = defaultdict(list)  # Initializes a defaultdict to hold lists of anagrams
        
        for word in strs:  # Iterates over each string in the input list
            sorted_word = ''.join(sorted(word))  # Sorts the string and joins it back
            anagrams[sorted_word].append(word)  # Appends the original string to the list for that sorted key
        
        return list(anagrams.values())  # Returns all the lists of anagrams


### Testing

In [14]:
test_solution(Solution2v1) 

Test Case 1:
Input: ['eat', 'tea', 'tan', 'ate', 'nat', 'bat']
Expected Output: [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
Result: [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
Test Case Passed!

Test Case 2:
Input: ['']
Expected Output: [['']]
Result: [['']]
Test Case Passed!

Test Case 3:
Input: ['a']
Expected Output: [['a']]
Result: [['a']]
Test Case Passed!

All Test Cases Passed!


## Complexity Analysis
Same as previous approach.
- ### Time Complexity: $O(nk \log{k})$ 
- ### Space Complexity: $O(n \cdot k)$ 
***

# Final Conclusion
The "Group Anagrams" problem is a classic example of categorization based on string properties, where the challenge is to group strings that are anagrams of each other. An anagram involves rearranging the letters of a word or phrase to form another, using all the original letters exactly once.

## Approach Overview

### Approach 1: Character Count
This method involves creating a character frequency count for each string, using this count as a unique identifier to group anagrams. It's efficient because it directly compares the composition of the strings without altering their content.

**Key Characteristics:** Utilizes character count as a direct comparison metric, avoids sorting.  
**Complexity:** Time complexity is $O(nk)$, and space complexity is $O(nk)$, making it highly efficient, especially for strings of moderate length.

### Approach 2: Categorize by Sorted String
Approach 2 sorts each string and uses the sorted string as a key in a defaultdict to group anagrams. It's straightforward and intuitive, relying on the fact that anagrams will always sort into the same sequence of characters.

**Key Characteristics:** Leverages sorting for easy group identification, intuitive and simple to implement.  
**Complexity:** With a time complexity of $O(nk \log k)$ and space complexity of $O(nk)$, it's slightly less efficient than Approach 1 due to the sorting operation.

### Approach 2.1: Sorting and Hashing (A slight variation of Approach 2)
Approach 2.1 is essentially the same as Approach 2, categorizing by sorted strings. The slight variation might be in implementation details, but the core idea and the algorithms' complexities remain the same.

**Key Characteristics:** Identical in logic to Approach 2, variations may lie in code semantics or structure.  
**Complexity:** Shares the same complexity analysis as Approach 2: $O(nk \log k)$ for time and $O(nk)$ for space.

## Comparison and Efficiency
- Approach 1 has the better asymptotic bound: its running time is linear in the length of the strings, so it avoids the overhead of sorting and its advantage grows as the strings get longer.  
- Approaches 2 and 2.1 are more intuitive and straightforward, making the solution easier to understand and implement at the cost of an extra $\log k$ factor from sorting. These approaches are particularly effective when the maximum string length ($k$) is relatively small: under this problem's constraint of $k \le 100$, $\log_2 k$ is below 7, so the sorting overhead stays modest.
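
One way to check this trade-off on a given machine is a small timing experiment. The sketch below is a rough harness under stated assumptions, not a rigorous benchmark: the function names `group_by_count`, `group_by_sort`, and `normalize` are hypothetical restatements of Approach 1 and Approach 2, and the input is random lowercase strings of length 0 to 100 (a smaller sample than the problem's upper limit, to keep the run short). Timings depend on the interpreter and hardware, so no expected numbers are given.

```python
import random
import string
import timeit
from collections import defaultdict


def group_by_count(strs):
    """Approach 1: key each string by its 26-slot letter-count tuple."""
    groups = defaultdict(list)
    for s in strs:
        counts = [0] * 26
        for c in s:
            counts[ord(c) - ord("a")] += 1
        groups[tuple(counts)].append(s)
    return list(groups.values())


def group_by_sort(strs):
    """Approach 2: key each string by its sorted characters."""
    groups = defaultdict(list)
    for s in strs:
        groups["".join(sorted(s))].append(s)
    return list(groups.values())


def normalize(groups):
    """Sort within and across groups so two groupings can be compared."""
    return sorted(sorted(g) for g in groups)


rng = random.Random(0)  # fixed seed so the generated input is reproducible
words = [
    "".join(rng.choices(string.ascii_lowercase, k=rng.randint(0, 100)))
    for _ in range(2_000)
]

# Both approaches must agree before their timings are worth comparing.
assert normalize(group_by_count(words)) == normalize(group_by_sort(words))

for fn in (group_by_count, group_by_sort):
    seconds = timeit.timeit(lambda: fn(words), number=5)
    print(f"{fn.__name__}: {seconds:.3f}s for 5 runs")
```

Because `sorted()` runs in optimized native code while the counting loop runs in the interpreter, the measured ordering may differ from what the asymptotic bounds suggest for strings this short.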

## Most Efficient Approach
For large datasets with relatively short strings, Approaches 2 and 2.1 offer a good balance of efficiency and simplicity.  
When dealing with longer strings or aiming for maximum performance, Approach 1 (Character Count) is more efficient due to its lower time complexity and direct approach to comparing string compositions.  
Both sorting-based approaches (2 and 2.1) and the character count approach (1) provide valuable techniques for solving problems involving string manipulation and categorization, with the choice of approach depending on specific problem constraints and performance requirements.
