Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LeetCode] 839. Similar String Groups #839

Open
grandyang opened this issue May 30, 2019 · 0 comments
Open

[LeetCode] 839. Similar String Groups #839

grandyang opened this issue May 30, 2019 · 0 comments

Comments

@grandyang
Copy link
Owner

grandyang commented May 30, 2019

Two strings X and Y are similar if we can swap two letters (in different positions) of X, so that it equals Y.

For example, "tars" and "rats" are similar (swapping at positions 0 and 2), and "rats" and "arts" are similar, but "star" is not similar to "tars""rats", or "arts".

Together, these form two connected groups by similarity: {"tars", "rats", "arts"} and {"star"}.  Notice that "tars" and "arts" are in the same group even though they are not similar.  Formally, each group is such that a word is in the group if and only if it is similar to at least one other word in the group.

We are given a list A of strings.  Every string in A is an anagram of every other string in A.  How many groups are there?

Example 1:

Input: ["tars","rats","arts","star"]
Output: 2

Note:

  1. A.length <= 2000
  2. A[i].length <= 1000
  3. A.length * A[i].length <= 20000
  4. All words in A consist of lowercase letters only.
  5. All words in A have the same length and are anagrams of each other.
  6. The judging time limit has been increased for this question.

这道题定义了字符串之间的一种相似关系,说是对于字符串X和Y,交换X中两个不同位置上的字符,若可以得到Y的话,就说明X和Y是相似的。现在给了我们一个字符串数组,要将相似的字符串放到一个群组里,这里同一个群组里的字符串不必任意两个都相似,而是只要能通过某些结点最终连着就行了,有点像连通图的感觉,将所有连通的结点算作一个群组,问整个数组可以分为多少个群组。由于这道题的本质就是求连通图求群组个数,既然是图,考察的就是遍历啦,就有 DFS 和 BFS 的解法。先来看 DFS 的解法,虽说本质是图的问题,但并不是真正的图,没有邻接链表啥的,这里判断两个结点是否相连其实就是判断是否相似。所以可以写一个判断是否相似的子函数,实现起来也非常的简单,只要按位置对比字符,若不相等则 diff 自增1,若 diff 大于2了直接返回 false,因为只有 diff 正好等于2或者0的时候才相似。题目中说了字符串之间都是异构词,说明字符的种类个数都一样,只是顺序不同,就不可能出现奇数的 diff,而两个字符串完全相等时也是满足要求的,是相似的。下面来进行 DFS 遍历,用一个 HashSet 来记录遍历过的字符串,对于遍历到的字符串,若已经在 HashSet 中存在了,直接跳过,否则结果 res 自增1,并调用递归函数。这里递归函数的作用是找出所有相似的字符串,首先还是判断当前字符串 str 是否访问过,是的话直接返回,否则加入 HashSet 中。然后再遍历一遍原字符串数组,每一个遍历到的字符串 word 都和 str 检测是否相似,相似的话就对这个 word 调用递归函数,这样就可以找出所有相似的字符串啦,参见代码如下:
解法一:

class Solution {
public:
    int numSimilarGroups(vector<string>& A) {
        int res = 0, n = A.size();
        unordered_set<string> visited;
        for (string str : A) {
			if (visited.count(str)) continue;
			++res;
			helper(A, str, visited);
		}
		return res;
   	}
   	void helper(vector<string>& A, string& str, unordered_set<string>& visited) {
   		if (visited.count(str)) return;
   		visited.insert(str);
   		for (string word : A) {
   			if (isSimilar(word, str)) {
   				helper(A, word, visited);
   			}
   		}
   	}
   	bool isSimilar(string& str1, string& str2) {
   		for (int i = 0, cnt = 0; i < str1.size(); ++i) {
   			if (str1[i] == str2[i]) continue;
   			if (++cnt > 2) return false;
   		}
   		return true;
   	}
};

我们也可以使用 BFS 遍历来做,用一个 bool 型数组来标记访问过的单词,同时用队列 queue 来辅助计算。遍历所有的单词,假如已经访问过了,则直接跳过,否则就要标记为 true,然后结果 res 自增1,这里跟上面 DFS 的解法原理一样,要一次找完和当前结点相连的所有结点,只不过这里用了迭代的 BFS 的写法。先将当前字符串加入队列 queue 中,然后进行 while 循环,取出队首字符串,再遍历一遍所有字符串,遇到访问过的就跳过,然后统计每个字符串和队首字符串之间的不同字符个数,假如最终 diff 为0的话,说明是一样的,此时不加入队列,但是要标记这个字符串为 true;若最终 diff 为2,说明是相似的,除了要标记字符串为 true,还要将其加入队列进行下一轮查找,参见代码如下:
解法二:

class Solution {
public:
    int numSimilarGroups(vector<string>& A) {
        int res = 0, n = A.size();
        vector<bool> visited(n);
        queue<string> q;
        for (int i = 0; i < n; ++i) {
        	if (visited[i]) continue;
        	visited[i] = true;
        	++res;
        	q.push(A[i]);
        	while (!q.empty()) {
        		string t = q.front(); q.pop();
        		for (int j = 0; j < n; ++j) {
        			if (visited[j]) continue;
        			int diff = 0;
        			for (int k = 0; k < A[j].size(); ++k) {
        				if (t[k] == A[j][k]) continue;
        				if (++diff > 2) break;
        			}
                    if (diff == 0) visited[j] = true;
        			if (diff == 2) {
        				visited[j] = true;
        				q.push(A[j]);
        			}
        		}
        	}
        }
        return res;
    }
};

对于这种群组归类问题,很适合使用联合查找 Union Find 来做,LeetCode 中也有其他用到这个思路的题目,比如 Friend CirclesAccounts MergeRedundant Connection IIRedundant ConnectionNumber of Islands IIGraph Valid Tree,和 Number of Connected Components in an Undirected Graph。都是要用一个 root 数组,每个点开始初始化为不同的值,如果两个点属于相同的组,就将其中一个点的 root 值赋值为另一个点的位置,这样只要是相同组里的两点,通过 getRoot 函数得到相同的值。所以这里对于每个结点 A[i],都遍历前面所有结点 A[j],假如二者不相似,直接跳过;否则将 A[j] 结点的 root 值更新为i,这样所有相连的结点的 root 值就相同了,一个群组中只有一个结点的 root 值会保留为其的初始值,所以最后只要统计到底还有多少个结点的 root 值还是初始值,就知道有多少个群组了,参见代码如下:
解法三:

class Solution {
public:
    int numSimilarGroups(vector<string>& A) {
        int res = 0, n = A.size();
        vector<int> root(n);
        for (int i = 0; i < n; ++i) root[i] = i;
        for (int i = 1; i < n; ++i) {
            for (int j = 0; j < i; ++j) {
                if (!isSimilar(A[i], A[j])) continue;
                root[getRoot(root, j)] = i;
            }
        }
        for (int i = 0; i < n; ++i) {
        	if (root[i] == i) ++res;
        }
        return res;
    }
    int getRoot(vector<int>& root, int i) {
        return (root[i] == i) ? i : getRoot(root, root[i]);
    }
    bool isSimilar(string& str1, string& str2) {
   		for (int i = 0, cnt = 0; i < str1.size(); ++i) {
   			if (str1[i] == str2[i]) continue;
   			if (++cnt > 2) return false;
   		}
   		return true;
   	}
};

Github 同步地址:

#839

类似题目:

Friend Circles

Accounts Merge

Redundant Connection II

Redundant Connection

Number of Islands II

Graph Valid Tree

Number of Connected Components in an Undirected Graph

参考资料:

https://leetcode.com/problems/similar-string-groups/

https://leetcode.com/problems/similar-string-groups/discuss/200934/My-Union-Find-Java-Solution

https://leetcode.com/problems/similar-string-groups/discuss/282212/Java-two-solutions-BFS-and-Union-Find

https://leetcode.com/problems/similar-string-groups/discuss/296251/DFS-with-Explanation-(Also-Highlighting-Misleading-Strategy)

LeetCode All in One 题目讲解汇总(持续更新中...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant