Use convex clustering without the GUI #6

Commits on Oct 7, 2020

  1. 07d9f14
  2. Prepare MinimalHsp so that it can be used as key in a HashMap

    HashMap requires a key and a value. Both must be Objects, which means both
    are stored as pointers in the HashMap and require 8 bytes each in 64-bit Java.

    On top of that comes the memory for the Objects themselves. If we use the same
    object for key and value, we can save that memory.

    To achieve that, we need to be able to use MinimalHsp as key in a HashMap. Since
    we only want to use query and hit of MinimalHsp, the overridden methods
    hashCode and equals should depend only on those.
    In addition, query and hit should be final so that they cannot be changed once
    MinimalHsp is in a HashMap, since this would corrupt the HashMap.
    MartinGuehmann committed Oct 7, 2020
    08694fe
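The idea in this commit message can be sketched as follows. This is a minimal illustration, not the project's code: `HspKey` and `HspStore` are hypothetical stand-ins for MinimalHsp and its containing map, and the `val` field is an assumed payload; only `query` and `hit` are named in the commit message.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for MinimalHsp.
final class HspKey {
    final int query; // final: must not change while the object is a key in a HashMap
    final int hit;   // final for the same reason
    double val;      // assumed mutable payload, deliberately ignored by equals/hashCode

    HspKey(int query, int hit, double val) {
        this.query = query;
        this.hit = hit;
        this.val = val;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof HspKey)) return false;
        HspKey o = (HspKey) other;
        return query == o.query && hit == o.hit;
    }

    @Override
    public int hashCode() {
        // Depends only on the two final fields, consistent with equals.
        return 31 * query + hit;
    }
}

final class HspStore {
    // The same object is used as key and as value, so no separate key object is allocated.
    private final Map<HspKey, HspKey> hsps = new HashMap<>();

    /** Stores hsp, or returns the object already stored for the same query/hit pair. */
    HspKey intern(HspKey hsp) {
        HspKey existing = hsps.putIfAbsent(hsp, hsp);
        return existing != null ? existing : hsp;
    }

    int size() {
        return hsps.size();
    }
}
```

Because `val` takes no part in hashing, it can be updated on the stored object without disturbing the map.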
  3. Clean up places for better HashMap usage, before the replacement

    In particular, declare local variables as close as possible to where they are used.
    Especially keep them in a local scope.
    MartinGuehmann committed Oct 7, 2020
    be2e352
  4. Make ClusterDataLoadHelper.parse_hsp_block use the same MinimalHsp as key and value in the HashTable to save memory
    MartinGuehmann committed Oct 7, 2020
    3f3485c
  5. Reduce memory by using the same MinimalHsp object for key and value in the HashMaps in BlastVersion2.gethits
    MartinGuehmann committed Oct 7, 2020
    ce0fbb4
  6. Reduce memory by using the same MinimalHsp object for key and value in the HashMaps in FileHandling2.blast
    MartinGuehmann committed Oct 7, 2020
    9bac3cb
  7. Use MinimalAttractionValue in HashMaps as keys to itself instead of Strings

    Strings need a lot of memory to represent two numbers separated by an
    underscore. However, the value for the key is already contained in the
    MinimalAttractionValue object itself.

    To use a MinimalAttractionValue as key to itself, its equals and hashCode
    functions must depend on its query and hit values. Since the att field is
    supposed to be the value part in the HashTable, it is ignored by equals and hashCode.

    This is a bit weird, but HashMap does not allow primitive types. Otherwise a long
    would be the choice of key, or a pair of two ints, if Java allowed passing
    the object itself rather than a pointer to it.
    MartinGuehmann committed Oct 7, 2020
    707d861
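The commit message mentions that a long would be the natural key if HashMap accepted primitives. The following sketch shows how two ints could be packed into one long. It illustrates that remark only and is not what the commit does; `PairKey` and its methods are hypothetical names.

```java
// Hypothetical helper: packs a (query, hit) pair into a single primitive long.
final class PairKey {
    private PairKey() {
    }

    /** Puts query into the upper 32 bits and hit into the lower 32 bits. */
    static long pack(int query, int hit) {
        // The mask prevents sign extension of a negative hit from overwriting the upper half.
        return ((long) query << 32) | (hit & 0xFFFFFFFFL);
    }

    /** Recovers the query from the upper 32 bits. */
    static int query(long key) {
        return (int) (key >>> 32);
    }

    /** Recovers the hit from the lower 32 bits. */
    static int hit(long key) {
        return (int) key;
    }
}
```

With java.util.HashMap such a key would still be boxed into a Long object, which is why the commit keys the map by the MinimalAttractionValue object itself.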
  8. Clean up code: initialize a local HashMap as late as possible, and remove an unused parameter from a function
    MartinGuehmann committed Oct 7, 2020
    3892993
  9. fc22539
  10. Use HashSets instead of HashMaps in SelectedSubsetHandling to save memory

    Internally, HashSet also uses a HashMap, in which every value is a pointer to
    a static dummy object. So we save the memory for the value object, but not for the
    pointer itself, which is not a smart design.
    MartinGuehmann committed Oct 7, 2020
    92b77f1
  11. Remove tmp members from IterationsComputerThread

    Temporary variables should not be members of a class, especially if they are only used locally.
    MartinGuehmann committed Oct 7, 2020
    9942c68
  12. Simplify code: Use the two-argument constructor of AminoAcidSequence to set the members on construction
    MartinGuehmann committed Oct 7, 2020
    b808005
  13. In ClusterMethods.removeGapsFromSequences only replace the sequence if it contains a gap

    String.replaceAll can be implemented in a way that it returns a new String even though the original
    String does not contain the gap character. This wastes time and memory on allocating the new
    String.
    MartinGuehmann committed Oct 7, 2020
    881d405
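A minimal sketch of the check described in this commit message, assuming the gap character is '-' (the message does not name it). `GapRemoval.removeGaps` is a hypothetical name, and the literal `String.replace` is used here instead of the regex-based `replaceAll`.

```java
// Hypothetical helper illustrating "only replace if the sequence contains a gap".
final class GapRemoval {
    private GapRemoval() {
    }

    /** Returns seq without '-' characters, or the same instance if there is no gap. */
    static String removeGaps(String seq) {
        if (seq.indexOf('-') < 0) {
            return seq; // nothing to remove, so no new String is requested
        }
        return seq.replace("-", "");
    }
}
```

The explicit `indexOf` check makes the no-allocation behavior independent of how the library implements replacement.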
  14. Use Integer instead of String as type for the HashMaps in ClusterDetection.java to reduce memory usage

    However, since hashkeys[i] = i, this looks superfluous.
    MartinGuehmann committed Oct 7, 2020
    28cdf87
  15. Remove hashkeys from ClusterDetection.java, since hashkeys[i] = i

    This wasn't really a map; it just used memory without need
    and made the code harder to read.
    MartinGuehmann committed Oct 7, 2020
    d646435
  16. d6ad91d
  17. 71f6358
  18. Turn clusterhash in ClusterDetection.multilinkage into a 2D-array, as it is used as such anyway

    This simplifies the code and saves the memory for the HashMap and the wrappers it requires for
    primitives. In fact, the Integer objects were basically used as indices.
    MartinGuehmann committed Oct 7, 2020
    a394ea5
  19. Replace HashMap by HashSet in ClusterDetection.java to reduce code complexity

    This avoids adding dummy objects or fields. In principle, this could reduce memory
    needs. However, HashSet internally uses a HashMap and fills the value part with a
    static dummy Object. That is not a very nice implementation.

    However, this is now out of sight of the programmer, so that other code issues become clearer.
    MartinGuehmann committed Oct 7, 2020
    70f12f9
  20. 5d82e11
  21. bf613bc
  22. f313999
  23. fa2059c
  24. 84140a6
  25. Have the "Find clusters" menu item in the "Windows" menu start with a capital "F", in line with all the other menu items
    MartinGuehmann committed Oct 7, 2020
    b336f7d

Commits on Oct 8, 2020

  1. bcd7387
  2. 9743c8f
  3. Clean up white space and add camelCasing for variables in SequenceCluster.java

    Don't use "booleanVar == false"; use "!booleanVar" instead.
    MartinGuehmann committed Oct 8, 2020
    c5265ec
  4. 1d295c5
  5. 9faab95
  6. 5b8e815
  7. cd671f4
  8. Turn attvals, sigmafac, minseqnum, and seqnum into members of ConvexClustering and pass them to the constructor
    MartinGuehmann committed Oct 8, 2020
    2752c7f
  9. c0e5574
  10. 4fa6e6e
  11. 0c648d2
  12. ffa262d
  13. d921188
  14. 4497d38
  15. 11f9cbc
  16. 346c04b
  17. cc796f5
  18. 06ae700
  19. ade2e40
  20. 64b7cf9
  21. fe03af5

Commits on Oct 9, 2020

  1. Loop only through the attraction values that belong to the nodes of interest in ConvexClustering

    In the worst case, each node has an attraction value to every other node. That is O(N²) attraction
    values, if N is the number of nodes (aka sequences). For each node, the old version looked at all the
    connections instead of just the connections of that particular node, and thus needed O(N²) loop
    iterations. The new implementation needs only O(N) iterations in the inner loop, which improves
    the overall algorithm from O(N⁴) to O(N³).

    This is a big improvement in speed. However, it costs O(N²) extra memory, although that much memory
    was already needed transiently to load the data.
    MartinGuehmann committed Oct 9, 2020
    0ba71a8
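The change described in this commit message can be illustrated with a per-node edge list. This is a sketch under assumptions, not the ConvexClustering code: `Edge`, `Adjacency`, `byNode`, and `totalAttraction` are hypothetical names, and summing the attraction values merely stands in for whatever the inner loop computes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical edge type: one attraction value between two nodes.
final class Edge {
    final int query;
    final int hit;
    final float att;

    Edge(int query, int hit, float att) {
        this.query = query;
        this.hit = hit;
        this.att = att;
    }
}

final class Adjacency {
    private Adjacency() {
    }

    /** Groups the edges by node once, at the cost of extra memory for the per-node lists. */
    static List<List<Edge>> byNode(int nodeCount, List<Edge> edges) {
        List<List<Edge>> perNode = new ArrayList<>(nodeCount);
        for (int i = 0; i < nodeCount; i++) {
            perNode.add(new ArrayList<>());
        }
        for (Edge e : edges) {
            perNode.get(e.query).add(e);
            if (e.hit != e.query) {
                perNode.get(e.hit).add(e);
            }
        }
        return perNode;
    }

    /** Visits only the edges of one node, at most O(N) of them, instead of all O(N²) edges. */
    static float totalAttraction(List<List<Edge>> perNode, int node) {
        float sum = 0f;
        for (Edge e : perNode.get(node)) {
            sum += e.att;
        }
        return sum;
    }
}
```

The grouping is built once, so each later per-node query avoids scanning the full edge list.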
  2. 7c086b3

Commits on Oct 10, 2020

  1. Add the number of found clusters to the cluster output window title bar

    This is not only a useful feature for checking that the cluster algorithm
    produces the same output after a modification, for instance adding
    multithreading, but is also useful for the user.
    MartinGuehmann committed Oct 10, 2020
    8f718d9
  2. Report to the command line how long clustering took

    This helps to check whether changes, such as adding multithreading
    to the clustering code, indeed speed it up.
    MartinGuehmann committed Oct 10, 2020
    47ab00e
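Timing a clustering run and reporting it on the command line could look roughly like this. The sketch assumes the clustering step can be wrapped in a Runnable; `ClusteringTimer`, `timeMillis`, and `report` are hypothetical names, and the message text is invented for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for measuring how long a task takes.
final class ClusteringTimer {
    private ClusteringTimer() {
    }

    /** Runs the task and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime(); // monotonic clock, suited to measuring intervals
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Runs the clustering task and prints its duration to standard output. */
    static void report(Runnable clustering) {
        long ms = timeMillis(clustering);
        System.out.println("Clustering took " + ms + " ms");
    }
}
```

`System.nanoTime` is used rather than `System.currentTimeMillis` because it is not affected by adjustments of the system clock.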
  3. de6e8f1
  4. 225b65c

Commits on Oct 11, 2020

  1. de93573
  2. Improve variable names in WindowClusterDetectionResults.java, use camelCasing, more concrete names
    MartinGuehmann committed Oct 11, 2020
    d76963e
  3. f1c00f0
  4. Clean up variables in WindowClusterDetectionResults further: names and premature declarations
    MartinGuehmann committed Oct 11, 2020
    150477c
  5. 0666831
  6. e096511
  7. 3c079c4
  8. Make the naming of the new sequence groups more flexible and preserve the index shown in the cluster result window
    MartinGuehmann committed Oct 11, 2020
    b3a489a
  9. e9e4b0b

Commits on Oct 13, 2020

  1. Remove unused parameter from ClusterMethods.computeSimpleAttractionValue and ClusterMethods.computeComplexAttractionValue
    MartinGuehmann committed Oct 13, 2020
    b1dd26b
  2. 44dc625
  3. bfa9844
  4. Remove minpal from the argument lists of computeSimpleAttractionValue and computeComplexAttractionValue

    This value is already provided by the ClusterData object.
    MartinGuehmann committed Oct 13, 2020
    eed57af
  5. Prepare to merge duplicated code in compute_attraction_values() by making the copies more similar
    MartinGuehmann committed Oct 13, 2020
    c7dcedb
  6. 08e6da3
  7. Merge duplicated code in ClusterData.compute_attraction_values

    This way it is easier to maintain.
    MartinGuehmann committed Oct 13, 2020
    15189d5
  8. bec46f2
  9. Fix averaging the attraction values in ClusterData.compute_attraction_values

    The attraction values for the same edge seem to be supposed to be averaged.
    However, what the code did was something other than averaging.

    If there was only an edge from node A to B but not from B to A, then
    the attraction value would be only half the size it would be if there were.

    In fact, it is questionable whether the attraction values should be treated differently
    if they come from two different HSPs than if they come from the same one.
    MartinGuehmann committed Oct 13, 2020
    5dbbb2e
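One way to read the fix in this commit message is: divide the summed attraction of an undirected edge by the number of values that actually contributed, rather than always by two. The sketch below shows only that reading. `AttractionAverager` is a hypothetical class, not ClusterData.compute_attraction_values, and it uses a boxed Long key for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical accumulator: averages the attraction values recorded per undirected edge.
final class AttractionAverager {
    // Value layout: [0] = sum of attraction values, [1] = number of contributions.
    private final Map<Long, float[]> sums = new HashMap<>();

    /** Builds an order-independent key so that (a, b) and (b, a) share one entry. */
    private static long key(int a, int b) {
        int lo = Math.min(a, b);
        int hi = Math.max(a, b);
        return ((long) lo << 32) | (hi & 0xFFFFFFFFL);
    }

    /** Records one attraction value for the edge between a and b, in either direction. */
    void add(int a, int b, float att) {
        float[] acc = sums.computeIfAbsent(key(a, b), k -> new float[2]);
        acc[0] += att;
        acc[1] += 1f;
    }

    /** Returns the mean of the recorded values, or 0 if the edge has none. */
    float average(int a, int b) {
        float[] acc = sums.get(key(a, b));
        return acc == null ? 0f : acc[0] / acc[1];
    }
}
```

With this scheme an edge seen in only one direction keeps its full value instead of being halved.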
  10. a801c31
  11. 7fe6d05