Activation on the surface of the map #43

Closed
oliviaguest opened this issue Aug 11, 2016 · 46 comments

@oliviaguest
Contributor

I would like to obtain the activation of every SOM unit for a given input, i.e., not just the MAU/BMU.

I see from the comment here #39 (comment) by @ghost that obtaining the BMUs is relatively simple. This makes me assume/infer that the activation across the whole map is calculated by the codebook times the input - is this correct?

For example:

import numpy as np

def get_surface_state(som, X):
    # Dot product of every (flattened) codebook unit with every input vector
    W = som.codebook.reshape((-1, som.codebook.shape[2]))
    return np.dot(W, X.T).T

If yes, can D be used as the input to another SOM? Or would that be meaningless, in your opinion?

@oliviaguest changed the title from "Activation on the surface on the map" to "Activation on the surface of the map" on Aug 11, 2016
@peterwittek
Owner

This activation matrix is calculated explicitly only for the GPU kernel, and it is not directly accessible to the user. The solution you are suggesting is flawless; perhaps it is worth merging back into the Python code, although it is bound to be sluggish for large maps and very high-dimensional spaces (unless your numpy was compiled with a parallel BLAS/LAPACK).

I don't necessarily see what you mean by using D as the input to another SOM. This D is basically nothing but a special Gram matrix.
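
Concretely, a toy illustration of what D contains (made-up shapes, not Somoclu code):

import numpy as np

X = np.random.rand(5, 3)        # 5 data vectors in R^3
W = np.random.rand(10 * 12, 3)  # a 10x12 codebook, flattened to one unit per row
D = np.dot(X, W.T)              # D[i, k] = x_i . w_k: one activation map per sample, shape (5, 120)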

@oliviaguest
Contributor Author

I have to admit I had never heard of a Gramian matrix before, so I just looked it up on Wikipedia. I obviously don't fully understand it yet; however, maybe re-posing my question might help you understand my idea.

So I was wondering if I could create a second SOM, which takes D (the special Gram matrix - in what way is it special, by the way?) as input in order to self-organise. Is this a meaningful thing to attempt? It would be a hierarchical SOM, similar to, e.g., "Hierarchical SOM-Based Detection of Novel Behavior for 3D Human Tracking" by German Ignacio Parisi and Stefan Wermter.

Am I making any sense? 😄

@oliviaguest
Contributor Author

PS: If you want to merge it, I can set up a pull request; which .py file shall I put the function in, and what name would you like it to have? 😉

@peterwittek
Owner

I did not know about this hierarchical variant, but I think you need something different from D. So you have some n/10 nodes, where n is the number of data instances, and you initialize the map with randomly selected vectors from your data instances. Then, if I got it right, they identify the non-outliers, and either the BMU's weight vector or a non-outlier data instance is used to construct the feature vectors described in Eqs. (2-4) -- I am not sure which. In any case, D does not seem to play a role.

The helper functions go into train.py, preferably as part of the Somoclu object. Many thanks!

@oliviaguest
Contributor Author

OK, sorry that might have confused things more.

I am not trying to build on or replicate their work. I just wanted to use the activation on the map (as the function I will add to somoclu now does). Is your intuition that that will not work?

@peterwittek
Owner

Let me try to understand it. You get an #rows x #columns matrix as the activation map. Then what would be the training data for the next SOM?

@oliviaguest
Contributor Author

It would be trained on the activation for each input on the surface of the map below it.
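
Roughly this, as a sketch (untested, with stand-in random data; the get_surface_state helper from above is repeated for completeness):

import somoclu
import numpy as np

def get_surface_state(som, X):
    # Activation of every unit of a trained SOM for every input vector
    W = som.codebook.reshape((-1, som.codebook.shape[2]))
    return np.dot(W, X.T).T

data = np.float32(np.random.rand(150, 10))

# First SOM, trained on the raw data
som1 = somoclu.Somoclu(30, 20, data=data)
som1.train(epochs=10)

# One row per input: its activation over every unit of the first map
surface = np.float32(get_surface_state(som1, data))   # shape (150, 20 * 30)

# Second SOM, trained on those activation maps
som2 = somoclu.Somoclu(20, 15, data=surface)
som2.train(epochs=10)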

@oliviaguest
Contributor Author

PR for the above function: #44 (hope that is consistent with the way you have been doing things as much as possible) 😀

@peterwittek
Owner

Thanks, I merged it.

@oliviaguest
Contributor Author

One last silly question: When does np.argmax(map_state, axis=1) (where map_state is the output of get_surface_state) become equal to the BMUs? Sometimes never?

@peterwittek
Owner

I am looking into it now. They should always match. An obvious difference is that your get_surface_state calculates a dot product and not a distance, so that argmax looks suspicious.

@oliviaguest
Contributor Author

Oooh, how else do you locate the most active unit? I must have something conceptually wrong in my head.

@peterwittek
Owner

I added a new method view_activation_map to study what could possibly go wrong (commit aa4d017). As yet, I have no idea.

@peterwittek
Owner

Okay, the BMUs are fine. If I calculate the activation map with the Euclidean distance, the argmin coincides with the BMU. I forgot why, but in the BMUs, the second index is the row.
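
Here is a quick self-contained check (just a sketch; I am assuming that som.bmus stores (column, row) pairs, i.e. the second index is the row):

import somoclu
import numpy as np

data = np.float32(np.random.rand(100, 3))
som = somoclu.Somoclu(25, 15, data=data)
som.train(epochs=10)

codebook = som.codebook.reshape((-1, som.codebook.shape[2]))        # (n_rows * n_columns, dim)
dists = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)    # (n_samples, n_units)
winners = np.argmin(dists, axis=1)                                   # flat index of the closest unit
rows, cols = np.unravel_index(winners, som.codebook.shape[:2])
print(np.all(cols == som.bmus[:, 0]) and np.all(rows == som.bmus[:, 1]))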

@oliviaguest
Contributor Author

That makes absolute sense. Thanks.

@peterwittek
Owner

I still don't get it, though. The argmax on the dot product often gives the same value, even when the BMUs are different. I have to leave it here for now, but I'll try to get back to this.

@oliviaguest
Contributor Author

Yes, that bit doesn't make sense to me either. Have you managed to get that on a map on your end too?

@peterwittek
Owner

I keep forgetting that the codebook entries are initialized randomly between zero and one. This is fine as long as your data is also in that range. I played with random data initialized somewhere in [-1,1]^2, so the map itself was garbage; the activation map should have been the least of my problems. Once I work with data in [0,1]^n, the map is fine.

The next issue is that the dot product and the Euclidean distance should generate the same topology. Actually, this is only true if the vectors are normalized (both codebook and data vectors), as I myself pointed out in issue #30 and then duly forgot.
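
To spell that out with a tiny check (unit-norm toy vectors, nothing Somoclu-specific): since ||w - x||^2 = ||w||^2 + ||x||^2 - 2 w.x, the argmin over distances and the argmax over dot products pick the same unit whenever the norms are constant, e.g. all equal to one.

import numpy as np

rng = np.random.RandomState(0)
W = rng.rand(20, 3)                                   # toy "codebook", one unit per row
x = rng.rand(3)                                       # toy data vector
W = W / np.linalg.norm(W, axis=1, keepdims=True)      # normalize every codebook vector
x = x / np.linalg.norm(x)                             # normalize the data vector
dist = np.linalg.norm(W - x, axis=1)                  # Euclidean distances
print(np.argmin(dist) == np.argmax(np.dot(W, x)))     # True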

If you factor in these two observations, I think the results of the activation map should be fine. Without normalization, you still have many identical values for the argmax, but otherwise this bit of code generates nice maps and matching activation maps (only one point is labelled to avoid mess):

import somoclu
import numpy as np

# Three small 2D clusters as toy data
c1 = np.random.rand(50, 2)/5
c2 = (0.2, 0.5) + np.random.rand(50, 2)/5
c3 = (0.4, 0.1) + np.random.rand(50, 2)/5
data = np.float32(np.concatenate((c1, c2, c3)))
colors = ["red"] * 50
colors.extend(["green"] * 50)
colors.extend(["blue"] * 50)
n_rows, n_columns = 30, 50
som = somoclu.Somoclu(n_columns, n_rows, data=data, maptype="toroid",
                      gridtype="rectangular")
som.train(epochs=10)
# Label only the first point to avoid cluttering the plots
labels = ["" for _ in range(150)]
labels[0] = "0"
som.view_umatrix(bestmatches=True, bestmatchcolors=colors, labels=labels)
map_state = som.get_surface_state()
som.view_activation_map(map_state, 0, labels=labels)

@oliviaguest
Contributor Author

Thanks so much. 😄

@peterwittek
Owner

Honestly, I quite like this activation map. I just made the plotting a bit smarter, so you can pass an arbitrary vector and see the activation (commit b718f01). For instance, you can do this:

new_vector = np.array([0.2, 0.3])
som.view_activation_map(data_vector=new_vector)

If you have time, it might be worth changing the calculation of the activation to the Euclidean distance.

@oliviaguest
Contributor Author

Yes, that's great. I would really like to explore this more; I am up against a deadline (tomorrow and next week) but will when I have time.

@oliviaguest
Contributor Author

Done! See: #46
This also fixes everything with respect to my initial question!

@peterwittek
Owner

Many thanks. I pushed commit debac27 just now to achieve the same thing for new vectors when viewing the activation map. Well, I still have not quite grasped your original question, but if this fixes things for you, maybe we can close this.

@oliviaguest
Contributor Author

Yeah, sure - hopefully the reason I/we (on my project) came up with the idea for the surface map will become clear when I can discuss the model I am working on more freely. But I do agree with you that it is nonetheless useful more generally too.

When the time comes how would you like me to cite you/Somoclu? Tangentially, have you heard of CITATION files?

@peterwittek
Owner

Thanks for asking; this is a bit of a sad story. The paper on Somoclu was accepted for publication ages ago by the Journal of Statistical Software, after a two-year review process. As of today, the editor still refuses to say anything about the publication date. This is one reason why there is no CITATION file, the other being my lack of awareness.

The manuscript is on arXiv, so as for now, I would like to ask you to cite that version, as referenced in the Readme:

Peter Wittek, Shi Chao Gao, Ik Soo Lim, Li Zhao (2015). Somoclu: An Efficient Parallel Library for Self-Organizing Maps. arXiv:1305.1422.

Then Depsy theoretically scans manuscripts for links to PyPI or CRAN packages, so a link to Somoclu's PyPI page would also be welcome. Unfortunately, Depsy is not doing a great job, at least not yet.

You might want to consider making your result available as a notebook or as a repository. I am an ardent supporter of open science, and I believe that making the results of a paper easily reproducible is hugely important. Well, anyhow, no pressure, and thanks again for asking.

@peterwittek
Owner

To be on the safe side, I added the CITATION file. Thanks for the hint!

@oliviaguest
Contributor Author

That's so sad. Can't you withdraw it and send it to a more welcoming journal, perhaps?

@oliviaguest
Contributor Author

Oh, and with respect to my results, the GitHub repo I have written will be made available, as will a pre-print and an OSF repo.

@peterwittek
Owner

JSS looked like a nice open access journal... Awesome about the repo! I wish everybody else was doing it.

@oliviaguest
Contributor Author

oliviaguest commented Aug 30, 2016

I have been using seaborn a lot, and I wonder if this is in any way as useful to you as it is to me. I have been visualising the surface states using (dis)similarity matrices:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import pairwise_distances

def create_similarity_matrix(X, labels, save_as):

    # Calculate the pairwise correlations as a metric for similarity
    corrmat = 1 - pairwise_distances(X, metric="correlation")

    # Set up the matplotlib figure
    f, ax = plt.subplots(figsize=(12, 9))

    # Y axis has inverted labels (seaborn default, no idea why)
    yticklabels = np.atleast_2d(labels)
    yticklabels = np.fliplr(yticklabels)[0]

    # Draw the heatmap using seaborn
    sns.heatmap(corrmat, vmax=1, vmin=-1, center=0, square=True,
                xticklabels=labels, yticklabels=labels, cmap="RdBu_r", ax=ax)
    f.tight_layout()

    # This sets the ticks to a readable angle
    plt.yticks(rotation=0)
    plt.xticks(rotation=90)

    # This sets the labels for the two axes
    ax.set_yticklabels(yticklabels, ha='right', va='center', size=8)
    ax.set_xticklabels(labels, ha='center', va='top', size=8)

    # Save and close the figure
    plt.savefig(save_as + '_heatmap.png', bbox_inches='tight')
    plt.close()

create_similarity_matrix(som.get_surface_state(), labels, 'state')

@peterwittek
Owner

I love seaborn -- just importing it does magic to figures. I wish it was merged with matplotlib.

I am trying to grasp what this heatmap implies. This would be another way of looking at clustering structures between data points. Is this correct? We struggle a lot with evaluating cluster consistency and shifts between the clusters over evolving maps, so it is indeed useful, thanks. Perhaps you can add it to train.py as a helper function -- not as a method of the Somoclu class -- with minor changes to allow both saving to a file and inline visualization. No pressure, though, only if you have a bit of time.
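
Something along these lines is what I have in mind (just a rough sketch with a placeholder name, view_similarity_matrix; the eventual helper may look different):

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import pairwise_distances

def view_similarity_matrix(data, labels, filename=None, figsize=(12, 9)):
    """Plot a matrix of pairwise correlations; save it to a file or show it inline."""
    corrmat = 1 - pairwise_distances(data, metric="correlation")
    fig, ax = plt.subplots(figsize=figsize)
    sns.heatmap(corrmat, vmin=-1, vmax=1, center=0, square=True, cmap="RdBu_r",
                xticklabels=labels, yticklabels=labels, ax=ax)
    if filename is None:
        plt.show()          # inline visualization
    else:
        plt.savefig(filename, bbox_inches='tight')
    plt.close(fig)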

At the risk of being overly pedantic, allow me one tiny comment: scoping in Python is very simplistic. Local variables go out of scope in exactly one specific case: when the function returns. So the del at the end of the function definition is unnecessary, at least in principle.

@oliviaguest
Contributor Author

Yeah, sorry about the del; I was copy-pasting in a hurry from a non-function/script to turn this into a function for you.

I will have to ask you to look at a paper using (dis)similarity matrices. It's a very Googleable term. You could look at my recent preprint which explains them here. Hopefully that will help. Then please feel free to ask me questions if it's still unclear how that could be applied to SOMs, but it should be easier to understand. Perhaps running it on the example you have in the docs for somoclu might also help clarify the usefulness or indeed lack thereof. I have no idea if you will find it as useful as I do!

Tangent, but I actually presented SOMs connected in a hierarchy last week at a conference and I used these matrices to show their representational spaces.

@oliviaguest
Contributor Author

PS: Clusters in the heatmap would be detected by calculating the pairwise correlations between each pattern's activation on the surface of the map.

@peterwittek
Owner

Wow, and I thought I knew SOMs well. Thanks for the paper. I've seen some work on hierarchical SOMs, but I never made the connection to deep learning. Normally I think of SOMs as the shallowest possible learners, but this hierarchy changes the game. This heatmap is handy indeed.

@oliviaguest
Contributor Author

No, sorry, I'm confusing you because I was being brief.

So that preprint uses DSMs, but not SOMs. It uses convolutional artificial neural networks.

My hierarchical SOMs are not written up yet. Hopefully at some point.

Even though they are a little similar, they do not overlap other than through my involvement. Completely different PIs; two different projects.

@peterwittek
Owner

I think I got it -- the question is more about the representation. It is still very interesting, though.

@oliviaguest
Contributor Author

I really hope it's useful and clear. Otherwise please get back to me; I don't want you spending too much time on it. And thanks for the nice comments.

@oliviaguest
Contributor Author

Also I'd just like to say it's been really nice interacting with you as an open source developer who's so positive and welcoming to new ideas and contributions. Definitely get back to me if you want me to incorporate this idea into somoclu.

@oliviaguest
Contributor Author

PS: Tidied up the code above even more. 😄

@peterwittek
Owner

It looks great. Please add it to train.py and send a pull request when you have a moment. I want to make sure you get the credit for it. Then I'll do some minor changes to integrate it nicely with the rest of the module.

@oliviaguest
Contributor Author

My PR is very rough, apologies: #47

@peterwittek
Owner

Many thanks. I made some cosmetic changes and added a bit of documentation. I believe we are following the best practices of open source software development ;-)

@ghost

ghost commented Sep 6, 2016

I did become interested in the activation patterns. I was especially interested in whether I could find any activation pattern that has two distinct 'hot zones'; this doesn't appear to be possible.

Here are a few collages of 100 activation patterns each from a 44x44 codebook I trained with a toroidal hexagonal lattice:

http://imgur.com/a/yqueJ

Brighter = greater distance between the sample and that point in the map.

After some consideration I decided that isn't informative enough; here is the inverse (1/distance):

http://imgur.com/a/vQ5Oa

The two sets are not related; I used random sampling.

Code:

import math
import numpy as np
import matplotlib.pyplot as plt

def collage(arr):
    # Tile a stack of 2D arrays into one (roughly) square collage image
    def unblockshaped(arr, h, w):
        n, nrows, ncols = arr.shape
        return (arr.reshape(h//nrows, -1, nrows, ncols).swapaxes(1,2).reshape(h, w))

    sqrt_lastdim = math.sqrt(arr.shape[0])
    sqrt_ceil = int(math.ceil(sqrt_lastdim))
    if not sqrt_lastdim.is_integer():
        # Pad with empty tiles so the number of patterns is a perfect square
        diff = sqrt_ceil ** 2 - arr.shape[0]
        arr = np.concatenate((arr, np.zeros((diff, arr.shape[1], arr.shape[2]))), axis=0)

    grid = unblockshaped(arr, arr.shape[1]*sqrt_ceil, arr.shape[2]*sqrt_ceil)
    return grid

# data and codebook are the training data and the trained SOM's codebook
X = data
W = codebook.reshape((codebook.shape[0] * codebook.shape[1], codebook.shape[2]))
# Squared Euclidean distance between every codebook unit and every sample
D = -2*np.dot(W, X.T) + (W**2).sum(1)[:, None] + (X**2).sum(1)[:, None].T
D = D.swapaxes(0,1).reshape(-1, codebook.shape[0], codebook.shape[1])

grid = collage(codebook.swapaxes(0,2).swapaxes(1,2))
plt.imshow(grid)
plt.show()

for i in range(10):
    idx = np.random.choice(D.shape[0], 100)
    grid = collage(D[idx])

    plt.close()
    plt.gcf().set_size_inches(4, 4)
    plt.gca().axes.get_xaxis().set_visible(False)
    plt.gca().axes.get_yaxis().set_visible(False)
    plt.gca().set_aspect('equal', adjustable='box')

    plt.imshow(grid)
    plt.savefig(str(i)+'.png', bbox_inches='tight', pad_inches=0, dpi=256)

I'm fairly sure I confused the heck out of the x, y axes somewhere, so any non-square lattice may not work.

@peterwittek
Owner

This is interesting. I am not all that surprised that you have hot zones all over the place, since a plain, random-initialized SOM can only be expected to preserve the local topology.

Perhaps you can get fewer and more localized hot spots. Initialize the map with PCA, then train it for one epoch with a low radius and a low learning rate. This map will not look great, since in one epoch the training phase does not have a chance to pull apart the points that the PCA collapses to the same spot. The only purpose is to see whether only one region lights up in the activation map. If this is the case, from there it is a matter of tuning the training parameters to get what you want.
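
For instance, something like this (a rough sketch; I am assuming the PCA initialization option and the radius0/scale0 training arguments here, and the exact values will need tuning):

import somoclu
import numpy as np

data = np.float32(np.random.rand(300, 10))

som = somoclu.Somoclu(50, 30, data=data, initialization="pca")
# One epoch, small radius, small learning rate: the map will look poor, but it
# should be enough to see whether a single region lights up in the activation map.
som.train(epochs=1, radius0=2, scale0=0.02)
map_state = som.get_surface_state()
som.view_activation_map(map_state, 0)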

@ghost

ghost commented Sep 6, 2016

I actually feel somewhat stupid; all I had to do was make the values negative to make the images look more intuitive. Now brighter is actually better, without having to use the inverse:

http://imgur.com/a/rrXCu

The second album of the previous post does, however, show the absolute centers of influence. So I guess the inverse has its uses nevertheless.

I'm actually not sure how to interpret these results; what significance does the activation pattern hold? It is of higher dimensionality than the data was to begin with, but it is very constrained, much like the pixels of images. Quite interesting; I still have to think about it and how it may be of use. It might have potential when you need to project data into a higher-dimensional space, although I'm not sure how much use it would be, because it adopts the same problems that images have - being spatial.

It certainly ought to be far more useful than a BMU alone, considering that the BMU is a tiny fraction of the whole; lots of information gets discarded. I'm pretty sure the activation map each sample makes is significant when comparing samples.

@oliviaguest
Contributor Author

oliviaguest commented Sep 6, 2016

The activation pattern allows us to calculate the similarity matrix. This is my only use really. The similarity matrix depicts a representational space that could not be captured by the BMUs since they are by definition orthogonal.
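
To illustrate the point (a toy sketch with made-up numbers): two samples with different BMUs have one-hot representations that are exactly orthogonal, while their full surface states can still be highly similar.

import numpy as np

n_units = 6
one_hot_a = np.eye(n_units)[2]                            # sample A, BMU = unit 2
one_hot_b = np.eye(n_units)[3]                            # sample B, BMU = unit 3
print(np.dot(one_hot_a, one_hot_b))                       # 0.0: no graded similarity

surface_a = np.array([0.1, 0.3, 0.9, 0.7, 0.2, 0.1])      # made-up activation maps
surface_b = np.array([0.1, 0.2, 0.8, 0.9, 0.3, 0.1])
print(np.corrcoef(surface_a, surface_b)[0, 1])            # close to 1: graded similarity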
