Command to remove bad aliases. #3

Closed
fimad opened this issue Oct 17, 2013 · 5 comments

@fimad
Collaborator

fimad commented Oct 17, 2013

It's possible to troll the nick clustering service into grouping all names together by creating a sequence of nicks, each within a small edit distance of the previous one. For now, the only recourse is to manually edit the persistent nick cluster file.

Example:
22:54 -!- Zhenya is now known as z
22:55 < z> nvm
22:55 <@jesse> almost
22:55 <@jesse> !alias z
22:55 < zhenya_bot> Zhenya, Zhenya_home, Zhenya_work, Zhenya, will, will_, z, zbot_will
22:55 <@jesse> oh weird
22:55 <@jesse> !alias Zhenya
22:55 < zhenya_bot> Zhenya, Zhenya_home, Zhenya_work,Zhenya, will, will_, z, zbot_will
22:55 <@jesse> lol
22:55 <@jesse> you made a z name
22:55 <@jesse> and it combined your group with wills
22:55 <@jesse> you ruined all of name clustering
22:55 <@jesse> hahahaha
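
For anyone who wants to reproduce the chaining effect outside the bot, here is a minimal Python sketch. It is not the bot's actual code: the function names, the threshold of 1, and the `alice`/`bobby` nicks are all made up for illustration. It assumes the service merges two clusters whenever any pair of nicks across them is within the edit-distance threshold (single linkage).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def single_linkage(nicks, threshold):
    """Merge clusters whenever ANY cross-cluster pair is within threshold."""
    clusters = [{n} for n in nicks]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(edit_distance(x, y) <= threshold
                       for x in clusters[i] for y in clusters[j]):
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters


# Hypothetical nicks: two identities that are far apart in edit distance.
base = ["alice", "alice_", "bobby", "bobby_"]
# A troll walks from one name to the other, one substitution at a time.
chain = ["blice", "boice", "bobce", "bobbe"]

print(single_linkage(base, threshold=1))
print(single_linkage(base + chain, threshold=1))
```

With only `base`, the two identities stay separate. Once the `chain` nicks exist, every step is within distance 1 of the next, so single linkage collapses everything into one cluster.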

@fimad
Collaborator Author

fimad commented Oct 21, 2013

Changed to complete linkage clustering instead of single linkage. This should stop the above from happening, but a more general solution is still needed.

https://en.wikipedia.org/wiki/Single-linkage_clustering
https://en.wikipedia.org/wiki/Complete-linkage_clustering
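
To illustrate why the switch helps, here is a rough Python sketch of threshold-based complete linkage. It is a guess at the general technique, not the code that was committed: the function names, the threshold, and the example nicks are hypothetical, and the nicks are assumed to be unique. Two clusters are merged only if every cross-cluster pair is within the threshold, so a chain of near-identical nicks can no longer pull distant names together.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def complete_linkage(nicks, threshold):
    """Greedily merge the closest pair of clusters, where the distance
    between clusters is the MAXIMUM pairwise edit distance."""
    clusters = [frozenset([n]) for n in nicks]
    while True:
        best = None
        for a, b in combinations(clusters, 2):
            d = max(edit_distance(x, y) for x in a for y in b)
            if d <= threshold and (best is None or d < best[0]):
                best = (d, a, b)
        if best is None:  # no pair of clusters is close enough to merge
            return [set(c) for c in clusters]
        _, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]


# Hypothetical nicks: two identities plus a chain of one-letter steps between them.
nicks = ["alice", "alice_", "bobby", "bobby_",
         "blice", "boice", "bobce", "bobbe"]
for cluster in complete_linkage(nicks, threshold=1):
    print(sorted(cluster))
```

Because every member of a cluster must be within the threshold of every other member, `alice` and `bobby` (edit distance 5) can never share a cluster here, however many intermediate nicks are created. Which of the intermediate nicks end up grouped together depends on tie-breaking, which this sketch does not try to control.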

@numberten
Owner

With complete linkage, we can still have two pre-existing clusters joined together, right? It just requires a stricter match between the two?

@fimad
Collaborator Author

fimad commented Oct 21, 2013

I don't think we will see something like the above. If we have the two clusters:

a, a_, a_c
b, b_, b_c

And then the nick c gets added, I think we would be more likely to see the two existing clusters broken apart like so:

a, a_
b, b_
c, a_c, b_c

@numberten
Owner

Seems like that fixes this issue, but opens up other problems with our clustering algorithm.

Since nick clusters represent a single identity, it doesn't really make sense for a new cluster to be formed with the pieces of others. Assuming the current clusters are correct (every nick is clustered into the cluster representing the correct identity), then all new aliases should either be grouped with a previously existing cluster, or form a group of their own.

My suggestion is that any new clustering algorithm obey this invariant:
- After clustering, every cluster group must be a superset of its previous iteration.

Assuming that any clustering algorithm has some likelihood of clustering incorrectly, I think this would make mistakes the easiest to correct, since fixing one would only require moving a single nick. What we've got now could distort any number of clusters each time a nick is introduced, depending on the similarities between groups.
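
A minimal Python sketch of what that invariant could look like in practice. This is one reading of it, not an agreed design: `is_superset_update` and `add_nick` are hypothetical names, the `distance` argument stands for whatever string distance the bot uses, and scoring a cluster by its farthest member is just one possible choice.

```python
def is_superset_update(before, after):
    """Check the invariant: every old cluster survives intact inside
    some cluster of the new state."""
    return all(any(old <= new for new in after) for old in before)


def add_nick(clusters, nick, threshold, distance):
    """Place a new nick into at most one existing cluster, or give it
    a cluster of its own. Existing clusters are never split or merged."""
    best_index, best_score = None, None
    for i, cluster in enumerate(clusters):
        # Score a cluster by its farthest member (complete-linkage style).
        score = max(distance(nick, member) for member in cluster)
        if score <= threshold and (best_score is None or score < best_score):
            best_index, best_score = i, score
    updated = [set(c) for c in clusters]
    if best_index is None:
        updated.append({nick})
    else:
        updated[best_index].add(nick)
    return updated
```

Since `add_nick` only ever grows one cluster or appends a singleton, any result it returns passes `is_superset_update` against its input. The regrouping described earlier in the thread (`a, a_` / `b, b_` / `c, a_c, b_c`) would fail the check, because the old `a, a_, a_c` cluster no longer exists as a subset of any new cluster.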

@ghost ghost assigned fimad Oct 25, 2013
@fimad
Collaborator Author

fimad commented May 4, 2014

Closing this issue because the original premise is solved. Opening a new issue for improving the algorithm: #22.
