Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misleading Somoclu.get_bmus() result #145

Closed
giacomolanciano opened this issue Oct 16, 2019 · 3 comments
Closed

Misleading Somoclu.get_bmus() result #145

giacomolanciano opened this issue Oct 16, 2019 · 3 comments

Comments

@giacomolanciano
Copy link
Contributor

giacomolanciano commented Oct 16, 2019

Hi,

I'd like to point out a possible source of confusion in the implementation of Somoclu.get_bmus(). Here's an example output:

array([[3, 1],
       [3, 3],
       [3, 3],
       ...,
       [3, 0],
       [2, 0],
       [0, 1]])

The BMUs indices are returned as a 2D numpy array, each row indicating the BMU for the corresponding sample. Individual BMU indices are 1D numpy arrays, represented in a [column, row] format. Such format totally makes sense if you look at the map as a cartesian plan (I'm guessing this is your interpretation because of the naming convention used in the source code: X and Y over col and row, respectively).

Although, such design choice is not documented anywhere and, as in my case, it could be a source of problems if one uses the returned BMUs indices to access the corresponding map's cells (i.e., columns are treated as rows and vice-versa). If it was explicitly documented, one could have known that, for instance, a numpy.flip(..., axis=1) is needed to transform the result in advance.

Possibly, the method could be improved by adding an order parameter (in line with what numpy does) to let the user pick the desired format. The method's signature could be something like

def get_bmus(self, activation_map, order='F'):
    ...

with order accepting 'F' for column-major order (i.e., the current default) and 'C' for row-major order. In this way, referring to the example above, setting order='C' would produce the following output instead:

array([[1, 3],
       [3, 3],
       [3, 3],
       ...,
       [0, 3],
       [0, 2],
       [1, 0]])

Thanks for your consideration and for the effort put in developing this great package, I've found it very useful to produce the source code for a recent publication of mine.

xgdgsc added a commit that referenced this issue Oct 25, 2019
@xgdgsc
Copy link
Collaborator

xgdgsc commented Oct 25, 2019

Thanks. Could you check if it is the simple T or not T implemented in the above commit?

@giacomolanciano
Copy link
Contributor Author

Hi @xgdgsc,

thanks for following up on this. I have slightly updated the issue description to better explain what is my expected result.

In case of order='C', you should still transpose after inverting the order of X and Y, i.e.:

if order == 'F':
    return np.vstack((X, Y)).T
elif order == 'C':
    return np.vstack((Y, X)).T

For convenience, I have also implemented the fix in PR #146.

@xgdgsc
Copy link
Collaborator

xgdgsc commented Oct 27, 2019

Thanks!

@xgdgsc xgdgsc closed this as completed Oct 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants