## 721. Accounts Merge

### Description

Given a list of `accounts` where each element `accounts[i]` is a list of strings: the first element `accounts[i][0]` is a name, and the remaining elements are the emails of the account.

Now, we would like to merge these accounts. Two accounts definitely belong to the same person if there is some common email to both accounts. Note that even if two accounts have the same name, they may belong to different people as people could have the same name. A person can have any number of accounts initially, but all of their accounts definitely have the same name.

After merging the accounts, return the accounts in the following format: the first element of each account is the name, and the rest of the elements are emails in sorted order. The accounts themselves can be returned in any order.

### Examples

Input: `accounts = [["John","johnsmith@mail.com","john_newyork@mail.com"],["John","johnsmith@mail.com","john00@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]`

Output: `[["John","john00@mail.com","john_newyork@mail.com","johnsmith@mail.com"],["Mary","mary@mail.com"],["John","johnnybravo@mail.com"]]`

Explanation:

The first and second Johns are the same person because they share the email `johnsmith@mail.com`.
The third John and Mary are different people, as none of their email addresses are used by other accounts.
We could return these lists in any order; for example, the answer `[['Mary', 'mary@mail.com'], ['John', 'johnnybravo@mail.com'], ['John', 'john00@mail.com', 'john_newyork@mail.com', 'johnsmith@mail.com']]` would still be accepted.

### Solution (Find connected components)

In this problem, we are asked to group the emails in the given lists by owner. If we treat each email as a vertex and connect emails that appear in the same account, the problem becomes finding connected components in an undirected graph. But first, let's look at how to find connected components.

**How to find connected components in a graph**

In graph theory, a **connected component** is a maximal set of vertices in an undirected graph that are linked to each other by paths. For a subgraph $S$ to be connected, there must be a path between any two vertices $u, v$ in $S$. It follows that we can visit an entire connected component by starting a traversal from any one of its vertices. In other words, we obtain one component by repeatedly adding the nodes that are reachable from our source node. Does this sound familiar? Yes, this is **DFS**. To see exactly how it works, see the following implementation.

In [3]:
import collections

class Graph():
    def __init__(self):
        self.adjacency_list = collections.defaultdict(list)
    
    def add_edge(self, edges):
        for edge in edges:
            src, dest = edge[0], edge[1]
            self.adjacency_list[src].append(dest)
            self.adjacency_list[dest].append(src) 
            
    def DFS(self, component, v, visited):
        component.append(v)
        visited.add(v)
        for neighbor in self.adjacency_list[v]:
            if neighbor not in visited:
                self.DFS(component, neighbor, visited)
        return component
    
    def find_connected_component(self):
        vertices = list(self.adjacency_list.keys())
        components = []
        visited = set()
        for v in vertices:  # iterate over vertex labels, not list indices
            if v not in visited:
                component = self.DFS([], v, visited)
                components.append(component)
        return components
    
edges = [[0, 1], [2, 1], [3, 4]]
g = Graph()
g.add_edge(edges)
components = g.find_connected_component()

print("Components of g are:")
for component in components:
    print(component)

Components of g are:
[0, 1, 2]
[3, 4]
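
The same idea carries over to vertices that are not integers, which matters here because the vertices of our problem are email strings. The following is a minimal standalone sketch, not part of the `Graph` class above: the helper name `connected_components` is hypothetical, and it swaps the recursive DFS for a breadth-first traversal with a queue, which finds the same components.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return the connected components of an undirected graph given as edge pairs."""
    adjacency = defaultdict(list)
    for src, dest in edges:
        adjacency[src].append(dest)
        adjacency[dest].append(src)

    visited = set()
    components = []
    for start in list(adjacency):  # iterate over vertex labels
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        component = []
        while queue:
            v = queue.popleft()
            component.append(v)
            for neighbor in adjacency[v]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

print(connected_components([("a", "b"), ("c", "b"), ("d", "e")]))
# prints [['a', 'b', 'c'], ['d', 'e']]
```

Marking a vertex as visited when it is enqueued, rather than when it is dequeued, keeps each vertex in the queue at most once.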


Back to the problem: the solution should now look straightforward. One thing is left, though: how to construct the graph from the input array. Looking at the structure of the array, each account is always the owner's name followed by a series of emails that belong to that owner. As mentioned previously, emails are the vertices of the graph. So here is the strategy: first we initialize a dict of lists called `g` (a hashmap in other languages); then, for each `email` in `accounts[i][1:]`, we add `email` to `g[accounts[i][1]]` and `accounts[i][1]` to `g[email]`, which links every email of an account to the account's first email. Meanwhile, we maintain a dictionary `names` that maps each email to its owner, which is needed for the merge step at the end of the algorithm. For the example provided, `g` will look like this (duplicate entries omitted):

```
johnsmith@mail.com: [johnsmith@mail.com, john_newyork@mail.com, john00@mail.com]
john_newyork@mail.com: [johnsmith@mail.com]
john00@mail.com: [johnsmith@mail.com]
mary@mail.com: [mary@mail.com]
johnnybravo@mail.com: [johnnybravo@mail.com]
```
`names` will look like:

```
{johnsmith@mail.com: John, john_newyork@mail.com: John, john00@mail.com: John, johnnybravo@mail.com: John, mary@mail.com: Mary}
```
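
The construction step can be sketched on its own as follows. This is an illustration under stated assumptions rather than the implementation used further down: the helper name `build_graph` is hypothetical, it stores neighbors in sets instead of lists so that duplicates disappear, and it assumes every account contains at least one email (so `acc[1]` exists).

```python
from collections import defaultdict

accounts = [
    ["John", "johnsmith@mail.com", "john_newyork@mail.com"],
    ["John", "johnsmith@mail.com", "john00@mail.com"],
    ["Mary", "mary@mail.com"],
    ["John", "johnnybravo@mail.com"],
]

def build_graph(accounts):
    g = defaultdict(set)  # email -> set of adjacent emails (assumption: sets, not lists)
    names = {}            # email -> owner name
    for acc in accounts:
        name, first = acc[0], acc[1]
        for email in acc[1:]:
            # Link every email of the account to the account's first email.
            g[first].add(email)
            g[email].add(first)
            names[email] = name
    return g, names

g, names = build_graph(accounts)
for email, neighbors in g.items():
    print(email, sorted(neighbors))
```

Linking each email only to the first email of its account keeps the number of edges linear in the account length while still placing all emails of an account in one component.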

Now we can apply DFS to the graph to obtain the connected components. Finally, we attach the corresponding owner's name to each component.

### Pseudocode

```
func DFS(g, component, v, visited):
    component.add(v)
    visited.add(v)
    for neighbor in g[v]:
        if neighbor not in visited:
            DFS(g, component, neighbor, visited)
    return component

func merge_accounts(accounts):
    g = {}
    names = {}
    for acc in accounts:
        name = acc[0]
        for email in acc[1:]:
            g[acc[1]].add(email)
            g[email].add(acc[1])
            names[email] = name
    visited = set()
    components = []
    merged_accounts = []
    for v in g:
        if v not in visited:
            component = DFS(g, [], v, visited)
            components.add(sorted(component))

    for component in components:
        merged_accounts.add([names[component[0]]] + component)

    return merged_accounts
```

In [17]:
import collections

def accountsMerge(accounts):
    # Construct the graph: link every email of an account to its first email
    g = collections.defaultdict(list)
    names = {}
    for acc in accounts:
        name = acc[0]
        for email in acc[1:]:
            g[acc[1]].append(email)
            g[email].append(acc[1])
            names[email] = name

    # Dict to store the connected components of g, keyed by component index
    components = collections.defaultdict(list)
    visited = set()

    # DFS: collect every vertex reachable from v into component i
    def DFS(v, i):
        components[i].append(v)
        visited.add(v)
        for neighbor in g[v]:
            if neighbor not in visited:
                DFS(neighbor, i)

    # Driver of DFS: start a new component at every unvisited vertex
    i = 0
    for u in g.keys():
        if u not in visited:
            DFS(u, i)
            i += 1

    # Merge each component with the name of its owner
    merged_accounts = []
    for comp in components.values():
        merged_accounts.append([names[comp[0]]] + sorted(comp))
    return merged_accounts

accounts = [["John","johnsmith@mail.com","john_newyork@mail.com"],
            ["John","johnsmith@mail.com","john00@mail.com"],
            ["Mary","mary@mail.com"],
            ["John","johnnybravo@mail.com"]]

merged_acc = accountsMerge(accounts)
print(merged_acc)

[['John', 'john00@mail.com', 'john_newyork@mail.com', 'johnsmith@mail.com'], ['Mary', 'mary@mail.com'], ['John', 'johnnybravo@mail.com']]
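
One practical caveat with the recursive DFS: CPython limits recursion depth (the default limit is typically 1000), and a long chain of accounts that each share one email with the next can produce a traversal deeper than that. The sketch below is one possible variant, not the solution above: the function name `accounts_merge_iterative` is hypothetical, and it replaces the recursive call with an explicit stack while keeping the same graph construction.

```python
import collections

def accounts_merge_iterative(accounts):
    # Build the graph: link every email of an account to the account's first email.
    g = collections.defaultdict(list)
    names = {}
    for acc in accounts:
        for email in acc[1:]:
            g[acc[1]].append(email)
            g[email].append(acc[1])
            names[email] = acc[0]

    visited = set()
    merged = []
    for start in list(g):
        if start in visited:
            continue
        # An explicit stack replaces the recursive DFS call.
        stack = [start]
        visited.add(start)
        component = []
        while stack:
            v = stack.pop()
            component.append(v)
            for neighbor in g[v]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    stack.append(neighbor)
        merged.append([names[start]] + sorted(component))
    return merged

# A chain of 5000 accounts, each sharing one email with the next one.
chain = [["A", f"e{i}", f"e{i + 1}"] for i in range(5000)]
print(len(accounts_merge_iterative(chain)))  # prints 1
```

The order in which vertices are visited differs from the recursive version, but since every component is sorted before it is returned, the merged accounts come out the same.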


### Complexity Analysis

#### Time complexity

Suppose there are $n$ accounts and each account holds at most $k$ emails. The graph then has at most $nk$ vertices and $O(nk)$ edges, so building it and running DFS on it takes $O(nk)$ time. However, when we merge accounts, each connected component obtained from DFS is sorted. If every component has size about $k$, sorting one takes $O(k \log k)$, and there are at most $n$ components, for $O(nk \log k)$ in total. In the worst case, all emails belong to one person, so a single component has size $nk$ and sorting it takes $O(nk \log(nk))$. Putting it all together, we have $T(n) = O(nk) + O(nk \log(nk)) = O(nk \log(nk))$.
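
The sorting bound can be checked for any split of the emails into components, not just the two cases above. As a sketch, assume DFS yields $m$ components of sizes $c_1, \dots, c_m$ (symbols introduced here only for illustration), whose total is at most $nk$. Since each $c_i \le nk$:

$$\sum_{i=1}^{m} c_i \log c_i \;\le\; \log(nk) \sum_{i=1}^{m} c_i \;\le\; nk \log(nk)$$

So the total sorting cost is $O(nk \log(nk))$ however the emails are distributed, and the single-component case is the one that reaches this bound.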

#### Space complexity

$O(nk)$, where $n$ is the number of accounts and $k$ is the maximum number of emails in one account: the graph, `names`, `visited`, and the components each hold at most $O(nk)$ entries.


### Solution (Union Find)