Fix computing the attraction values #5
Merged
vikramalva
merged 70 commits into
proteinevolution:master
from
MartinGuehmann:FixComputingTheAttractionValues
Dec 9, 2020
Conversation
HashMap requires a key and a value, and both must be Objects; that means both are pointers in the HashMap and require 8 bytes each in 64-bit Java. On top of that comes the memory for the Objects themselves; if we use the same object for both key and value, we can save that memory. To achieve that, we need to be able to use MinimalHsp as a key in a HashMap. Since we only want to use query and hit of MinimalHsp, the overridden hashCode and equals methods should depend only on those. And query and hit should be final so that they cannot be changed once a MinimalHsp is in a HashMap, since changing them would corrupt the HashMap.
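The pattern described here can be sketched as follows. This is a minimal illustration, not the actual CLANS class: only the names MinimalHsp, query, and hit come from the text above; the constructor, the payload field, and the example values are assumptions.

```java
import java.util.HashMap;

class MinimalHsp {
    final int query;   // key field: final, so it cannot change while in a HashMap
    final int hit;     // key field: final for the same reason
    double value;      // mutable payload, deliberately ignored by equals/hashCode

    MinimalHsp(int query, int hit, double value) {
        this.query = query;
        this.hit = hit;
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MinimalHsp)) return false;
        MinimalHsp other = (MinimalHsp) o;
        return query == other.query && hit == other.hit;  // only the key fields
    }

    @Override
    public int hashCode() {
        return 31 * query + hit;  // depends only on the immutable key fields
    }

    public static void main(String[] args) {
        // Store the same object as both key and value: no separate key object
        // (such as a "query_hit" String) needs to be allocated per entry.
        HashMap<MinimalHsp, MinimalHsp> map = new HashMap<>();
        MinimalHsp hsp = new MinimalHsp(3, 7, 0.5);
        map.put(hsp, hsp);

        // A probe with the same query/hit finds the stored entry even
        // though its payload differs.
        MinimalHsp probe = new MinimalHsp(3, 7, 0.0);
        System.out.println(map.get(probe).value);  // prints 0.5
    }
}
```

Because equals and hashCode ignore the payload, a cheap probe object is enough to look up the stored entry.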
In particular, declare local variables as close as possible to where they are used, and keep them in the narrowest possible scope.
… key and value in the HashTable to save memory
…n the HashMaps in BlastVersion2.gethits
…n the HashMaps in FileHandling2.blast
…trings Strings need a lot of memory to represent two numbers separated by an underscore. However, the value for the key is already contained in the MinimalAttractionValue object itself. To use a MinimalAttractionValue as its own key, its equals and hashCode functions must depend on the values of query and hit. Since the att field is supposed to be the value part in the hash table, it is ignored by equals and hashCode. This is a bit weird, but HashMap does not allow primitive types; otherwise a long, or a pair of two ints, would be the natural choice of key, if Java allowed passing the value itself rather than a pointer to an object.
…n unused parameter from a function
…a String to save memory
…mory Internally, HashSet also uses a HashMap, which it fills with a pointer to a static dummy object; so we save memory on the value object but not on the pointer itself, which is not a smart design.
Temporary variables should not be members of a class, especially if they are only used locally.
…o set the members on construction
…f it contains a gap String.replaceAll can be implemented in a way that returns a new String even if the original String does not contain the gap character. This wastes time and memory allocating the new String.
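The workaround can be sketched like this: check for the gap character first and only replace when it is actually present. The class name, method name, and the assumption that '-' is the gap character are illustrative, not taken from the CLANS code.

```java
class GapStripper {
    // Hypothetical sketch: skip the replace entirely when the sequence
    // contains no gap character, so no new String is allocated in the
    // common case. Assumes '-' is the gap character.
    static String stripGaps(String seq) {
        if (seq.indexOf('-') < 0) {
            return seq;               // no gap: return the original, no allocation
        }
        return seq.replace("-", "");  // allocate a new String only when needed
    }

    public static void main(String[] args) {
        System.out.println(stripGaps("ACDE"));   // prints ACDE (same object returned)
        System.out.println(stripGaps("AC-DE"));  // prints ACDE (new String)
    }
}
```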
…ction.java to reduce memory usage However, since hashkeys[i] = i, this looks to be superfluous
This wasn't really a map; it just used memory without need and made the code harder to read.
… it is used as such anyway This simplifies the code and saves memory for the HashMap and the wrappers it requires for primitives. In fact, the Integer objects were basically used as indices.
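When the Integer keys are really just indices, the map can be replaced by a plain array. A minimal sketch of that general pattern (the mapped values are arbitrary stand-ins, not the CLANS data):

```java
import java.util.HashMap;

class IndexArrayDemo {
    // Before (sketch): Integer boxing for key and value, plus a HashMap
    // entry object per mapping, just to map index -> value.
    static HashMap<Integer, Integer> boxed(int n) {
        HashMap<Integer, Integer> m = new HashMap<>();
        for (int i = 0; i < n; i++) m.put(i, i * i);
        return m;
    }

    // After: a plain int[] indexed directly, no wrappers, no hashing.
    static int[] flat(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i * i;
        return a;
    }

    public static void main(String[] args) {
        System.out.println(boxed(5).get(3));  // prints 9
        System.out.println(flat(5)[3]);       // prints 9
    }
}
```

Both versions hold the same mapping; the array version avoids one wrapper object per key, one per value, and one entry object per mapping.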
…mplexity This avoids adding dummy objects or fields. In principle, this could reduce memory needs; however, HashSet internally uses a HashMap and fills the value part with a static dummy Object, which is not a very nice implementation. But this is now out of sight of the programmer, so other code issues become clearer.
…ith a capital "F" in line with all the other menu items
…uster.java Don't use "booleanVar == false"; use "!booleanVar" instead
… name better what it does
…nterests on ConvexClustering In the worst case, each node has an attraction value to every other node. That is O(N²) attraction values, if N is the number of nodes (aka sequences). The old version looked, for each node, at all the connections instead of just the connections of that particular node, which thus needed O(N²) loop iterations. The new implementation needs only O(N) iterations in the inner loop, which improves the overall algorithm from O(N⁴) to O(N³). This is a big improvement in speed. However, it costs O(N²) extra memory, which was transiently needed to load the data anyway.
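The per-node grouping described above can be sketched as follows. The Edge record, field names, and the per-node sum are illustrative stand-ins, not the CLANS code; the point is only the complexity change in the inner loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AdjacencyDemo {
    // Minimal stand-in for one attraction value between two nodes.
    static class Edge {
        final int a, b;
        final double att;
        Edge(int a, int b, double att) { this.a = a; this.b = b; this.att = att; }
    }

    // Sum the attraction values per node by first grouping edges by node,
    // so the inner loop visits only that node's own edges (O(N) per node
    // instead of scanning all O(N^2) edges for every node).
    static double[] nodeSums(int n, List<Edge> edges) {
        List<List<Edge>> byNode = new ArrayList<>();
        for (int i = 0; i < n; i++) byNode.add(new ArrayList<>());
        for (Edge e : edges) {            // one O(E) grouping pass, E <= N^2
            byNode.get(e.a).add(e);
            byNode.get(e.b).add(e);
        }
        double[] sums = new double[n];
        for (int node = 0; node < n; node++) {
            for (Edge e : byNode.get(node)) {
                sums[node] += e.att;      // only this node's edges
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(
            new Edge(0, 1, 1.0), new Edge(0, 2, 0.5), new Edge(1, 3, 0.25));
        System.out.println(Arrays.toString(nodeSums(4, edges)));
        // prints [1.5, 1.25, 0.5, 0.25]
    }
}
```

The grouping lists are the O(N²) extra memory mentioned above; they are built once and can be discarded after the pass.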
This is not only a useful feature for checking that the clustering algorithm produces the same output after a modification, for instance adding multithreading, but it is also useful for the user.
This helps to check whether changes, such as adding multithreading to the clustering code, indeed speed it up.
…elCasing, more concrete names
…s and premature declaration
…rDetectionResults
…sterDetectionResults
…dex shown in the cluster result window
…lue and ClusterMethods.computeComplexAttractionValue
… and computeComplexAttractionValue This value is already provided by the ClusterData object.
…king the copies more similar
This way it is easier to maintain
…_values The attraction values for the same edges seem to be supposed to be averaged. However, what the code did was something other than averaging. If there was only an edge between node A and B but not between B and A, then the attraction value would be only half the size it should be. In fact, it is questionable whether the attraction values should be treated differently if they come from two different HSPs than if they come from the same one.
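The fix amounts to dividing by the number of contributions actually present instead of always dividing by 2. A minimal sketch of that idea, with hypothetical names and values (not the ClusterData code):

```java
class AttractionAverageDemo {
    // Average the attraction values actually seen for an edge, instead of
    // always dividing by 2, which halves the value when only one of the
    // two directions A->B / B->A is present.
    static double average(double[] contributions) {
        double sum = 0;
        for (double c : contributions) sum += c;
        return sum / contributions.length;  // divide by the real count
    }

    public static void main(String[] args) {
        // Both directions present: (0.75 + 0.25) / 2 = 0.5
        System.out.println(average(new double[] {0.75, 0.25}));
        // Only one direction: 0.5 stays 0.5 instead of being halved to 0.25
        System.out.println(average(new double[] {0.5}));
    }
}
```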
Fix averaging the attraction values in ClusterData.compute_attraction_values
The attraction values for the same edges seem to be supposed to be averaged.
However, what the code did was something other than averaging.
If there was only an edge between node A and B but not between B and A, then
the attraction value would be only half the size it should be.
In fact, it is questionable whether the attraction values should be treated differently
if they come from two different HSPs than if they come from the same one.
Obviously, this changes the output of CLANS; however, the change seems to be only subtle. If accepted, this pull request will also merge #1, #2, and #4.