string symbol markers ("scattertext" plot) #345

Closed
ddale opened this Issue Jun 20, 2011 · 5 comments


@ddale

Original report at SourceForge, opened Tue Jul 5 11:51:10 2005

Use the sample/index number, or a tuple of strings, as the plotting symbol
in a scatter plot, with an optional dot marker. Text alignment options should
include at least left-bottom or left-middle (useful with a dot marker) and
center-middle (no dot marker). The distance between marker and text might
also be an option.

An example of this:
http://www.tomasoberg.com/qspr/pca.gif

As discussed on the matplotlib-users mailing-list:
http://sourceforge.net/mailarchive/message.php?msg_id=12243982

Quote, John Hunter:
Yes, you could do this, but it would take a bit of work to get
everything right. Basically, you would like to add string symbol
markers to scatter, and have them colored with colormaps and support
variable sizing as well, right? The right way to do this, I think,
would be to implement a TextCollection, following the examples in
collections.py. Otherwise it would be extremely slow for large
numbers of markers. This would be a useful class anyhow to support
drawing of text with shared properties (e.g. tick labels), since text
drawing is slow and is a bottleneck in some applications.

@mdboom
Matplotlib Developers member

#347 also mentions the creation of a TextCollection.

@mdboom
Matplotlib Developers member

Pull request #400 allows for arbitrary mathtext as the marker itself. That's almost a solution to this problem but not quite. It seems the functionality of this bug is more closely related to annotate() than anything else.
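To make the annotate() connection concrete, here is a minimal sketch of what the original request could look like when built on annotate(). The helper name `scattertext` and its parameters (`offset`, `ha`, `va`, `dot`) are hypothetical, chosen only for illustration; nothing like this exists in matplotlib itself.

```python
import matplotlib.pyplot as plt
import numpy as np


def scattertext(ax, xs, ys, labels, offset=(5, 0), ha="left", va="center", dot=True):
    """Hypothetical helper: put one string label at each (x, y) point.

    offset is the marker-to-text distance in points; ha/va are the usual
    text alignment options. With dot=False the text sits on the point itself.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    if dot:
        ax.scatter(xs, ys, s=10, color="black")
    else:
        # Annotations do not extend the data limits, so do it by hand.
        ax.update_datalim(np.column_stack([xs, ys]))
        ax.autoscale_view()
    texts = []
    for x, y, label in zip(xs, ys, labels):
        texts.append(
            ax.annotate(
                str(label),
                xy=(x, y),
                xytext=offset if dot else (0, 0),
                textcoords="offset points",
                ha=ha,
                va=va,
            )
        )
    return texts


fig, ax = plt.subplots()
scattertext(ax, [0, 1, 2], [0, 1, 2], ["a", "b", "c"])
# Call plt.show() to display the figure.
```

For the "no dot marker" case from the original report, something like `scattertext(ax, xs, ys, labels, dot=False, ha="center", va="center")` would centre each string on its point. This is still one Text artist per point, so it does not address the performance concern raised in the quote from John Hunter.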

@pelson
Matplotlib Developers member

I'm not sure it fully covers the original request, but the ability to make paths from text (using TextPath), combined with the ability to use Paths as scatter markers, means that you can make text markers. A simple example of this:

import matplotlib.pyplot as plt
from matplotlib.textpath import TextPath

# Use the outline of the letter 'f' as the marker for every point.
plt.scatter(range(3), range(3), marker=TextPath((0, 0), 'f', size=10000), s=1000)
plt.show()

Pragmatically, there is little benefit to keeping this ticket (from 2005) hanging around, therefore I am closing.

@pelson pelson closed this Jun 30, 2012

@BrenBarn

I'd like to say that this issue isn't closed for me. The key issue is not the use of text markers per se, but the ability to specify the marker per scatter point. That is, I want to do something like this:

x = range(3)
y = range(3)
pyplot.scatter(x, y, marker=[str(a) + "," + str(b) for a, b in zip(x, y)])

To plot the text "0,0" at (0,0), "1,1" at (1,1), etc. Just being able to make markers out of text isn't sufficient, because there's still no way to use multiple markers in a single scatter.
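As a point of comparison, a workaround with the existing pieces is one scatter call per point, each with its own TextPath marker. The sketch below assumes that approach; the helper name `scatter_text_markers` is hypothetical, and this is not the single-call interface being asked for.

```python
import matplotlib.pyplot as plt
from matplotlib.textpath import TextPath


def scatter_text_markers(ax, xs, ys, labels, s=1000, **kwargs):
    """Hypothetical helper: draw each point with its own text-shaped marker.

    Issues one ax.scatter call per point, so it creates one collection
    per point and gains nothing from scatter's shared-geometry drawing.
    """
    # Without a fixed colour, each scatter call would take the next cycle colour.
    kwargs.setdefault("color", "black")
    collections = []
    for x, y, label in zip(xs, ys, labels):
        marker = TextPath((0, 0), str(label), size=10)
        collections.append(ax.scatter([x], [y], marker=marker, s=s, **kwargs))
    return collections


x = range(3)
y = range(3)
fig, ax = plt.subplots()
scatter_text_markers(ax, x, y, [str(a) + "," + str(b) for a, b in zip(x, y)])
# Call plt.show() to display the figure.
```

Because a path marker is rescaled to a unit size, longer strings come out in smaller glyphs for the same `s`, and the text appears to be anchored at its origin rather than centred on the point.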

@pelson
Matplotlib Developers member

> I'd like to say that this issue isn't closed for me.

Thanks for coming forward @BrenBarn. I have read and re-read the original request, and it now seems to me that the functionality being requested is a scattertext method on an axes, not text as the marker in a scatter plot.

From my limited understanding of the scatter function, some backends can apply optimisations that give a major performance improvement over the naive double for-loop approach. These optimisations boil down to the fact that some backends let you define a geometry once and reuse it at multiple locations. I believe this is why the scatter function does not accept an array of n markers for n points: it simply would not provide sufficient benefit over the naive approach (although fewer than n distinct markers might). On that basis, the scattertext functionality being requested would simply be a nested for loop along the lines of:

import matplotlib.pyplot as plt
import numpy

ax = plt.axes()
xs = numpy.linspace(0, 1, 6)
ys = numpy.linspace(0, 1, 3)

# Draw a dot at every grid point, then label each point with its coordinates.
ax.scatter(*numpy.meshgrid(xs, ys))
for x in xs:
    for y in ys:
        ax.text(x, y, '%s, %s' % (x, y))

plt.show()
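The remark about fewer than n distinct markers can be sketched as follows: group the points by marker and issue one scatter call per distinct marker, so each marker geometry is defined once and reused at every point in its group. The helper name `scatter_grouped` is hypothetical, and it assumes the markers are hashable specs such as 'o' or '^'.

```python
from collections import defaultdict

import matplotlib.pyplot as plt


def scatter_grouped(ax, xs, ys, markers, **kwargs):
    """Hypothetical helper: one ax.scatter call per distinct marker.

    markers holds one hashable marker spec per point, e.g. 'o' or '^'.
    Returns a dict mapping each distinct marker to its collection.
    """
    # Keep a single colour across groups instead of cycling per call.
    kwargs.setdefault("color", "black")
    groups = defaultdict(list)
    for x, y, marker in zip(xs, ys, markers):
        groups[marker].append((x, y))
    result = {}
    for marker, points in groups.items():
        gx, gy = zip(*points)
        result[marker] = ax.scatter(gx, gy, marker=marker, **kwargs)
    return result


fig, ax = plt.subplots()
scatter_grouped(ax, [0, 1, 2, 3], [0, 1, 2, 3], ["o", "^", "o", "s"])
# Call plt.show() to display the figure.
```

When every point has a different marker, this degenerates to one call per point, which is the naive approach described above.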

Given that there has been little to no progress in over 7 years, and that realistically the most likely way that a developer is going to implement such a thing is if they actually need such functionality too, I considered the issue redundant and hence closed it.

If anyone can see a performance benefit (or a benefit to users which outweighs the maintenance cost to mpl) of providing a vectorised interface (i.e. one that doesn't require naive for loops) to provide n distinct strings for n points, then I think a new ticket with a proposed interface and some implementation detail would be highly valuable.
