Request for comments: python: a new plugin to use arbitrary Python as a data source #1337

Merged
merged 3 commits into from Aug 6, 2012

4 participants

@rjw57

Edit: This is now a candidate for merging.

This pull request is a RFC about whether this feature would be useful. It is not yet ready for merging. (Or, perhaps, it is but it needs a lot of work before it should be built by default. The current issues are noted in the README.md file.) I've posted to the mailing list about this as well since I'm not sure which is the best place to discuss feature requests like this.

This plugin allows you to write data sources in the Python programming language. This is useful if you want to rapidly prototype a plugin, perform some custom manipulation on data or if you want to bind mapnik to a data source which is most conveniently accessed through Python. Finally it is of utility if you have theoretically 'infinite' data sources which can be rendered in finite areas. The README.md file associated with this patch has the example of a data source which consists of infinite concentric circles centred on a point.

Mapnik already has excellent Python bindings but they only directly support calling into mapnik from Python. This forces mapnik and its input plugins to be the lowest layer of the stack. The role of this plugin is to allow mapnik to call into Python itself. This allows mapnik to sit as rendering middleware between a custom Python frontend and a custom Python datasource. This increases the utility of mapnik as a component in a larger system.

Please see the README.md file for more information.

@lightmare

Interesting idea. Certainly deserves the request number it got ;-)

@springmeyer
Mapnik member

Impressive work. I've also played around with calling into Javascript from mapnik using node.js (mapnik/node-mapnik#28), but ultimately the benefit (rapid prototyping) did not outweigh the risks (immediate crashes if used concurrently). However my implementation was much more basic than what you have come up with!

I wonder if you could, upon datasource initialization, create a pool of python processes, to work around the potential gil bottleneck?

Also, can you describe more what you think is causing the deadlock when loading XML? Is lxml perhaps being used in your python program in this case, or any other is there any other python app in the same running process that relies on certain threading assumptions, like mod_wsgi?

@rjw57

I'm not sure what's causing the deadlock as I'm not making use of XML loading currently and so haven't looked. Assuming there's not likely to be an immediate NACK to it, I'm willing to do the work to turn this into a 'proper' implementation. My gut feeling is that I'm probably double-acquiring the GIL at some point. The deadlock happens with a trivial test program and is almost certainly the fault of the somewhat 'sprinkle magic until the prototype worked' approach to acquiring the GIL I went for :).

Creating a Python process pool is certainly possible although the 'right' solution would be to find a way to do that on the Python side I imagine. That way, at least, the interface is that of least surprise to the programmer. This could be done by making the interface on the Python side more asynchronous (allowing, e.g, concurrent.future process pools) rather than staying as faithful as possible to the C++ side. This would make 'naive' use of the plugin crash-safe but slow when used concurrently and multi-process aware use crash-safe and fast.

Since there appears to be some appetite for this, I'll work on another rev of the plugin and update the pull request with my progress.

@artemp
Mapnik member

@rjw57 - really cool! would be great to have this functionality

@rjw57

OK. I've tidied up the patch, fixed the deadlock and added some rendering tests for the plugin. I've also added the functionality to pass (string) parameters into the Python datasource. I think this is now at a point where it can be considered for merging.

Edit: I've rebased the commit for merging. Development can be seen in https://github.com/rjw57/mapnik/tree/rjw57-python-plugin-devel

Edit 2: Fixed a stupid path bug with the tests. Any further development will now take place as separate commits.

@springmeyer springmeyer and 1 other commented on an outdated diff Jul 31, 2012
plugins/input/python/Makefile
@@ -0,0 +1,34 @@
+CXX = g++
+
+CXXFLAGS = $(shell mapnik-config --cflags) -I/usr/include/python2.7 -fPIC
@springmeyer
Mapnik member

Make sense to either remove this Makefile or make generic the python version include?

@rjw57
rjw57 added a note Jul 31, 2012

OK, I'll remove

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@rjw57 rjw57 python: a new plugin to use arbitrary Python as a data source
This plugin allows you to write data sources in the Python programming language.
This is useful if you want to rapidly prototype a plugin, perform some custom
manipulation on data or if you want to bind mapnik to a datasource which is most
conveniently accessed through Python.

The plugin may be used from the existing mapnik Python bindings or it can embed
the Python interpreter directly allowing it to be used from C++, XML or even
JavaScript.

Mapnik already has excellent Python bindings but they only directly support
calling *into* mapnik *from* Python. This forces mapnik and its input plugins to
be the lowest layer of the stack. The role of this plugin is to allow mapnik to
call *into* Python itself. This allows mapnik to sit as rendering middleware
between a custom Python frontend and a custom Python datasource. This increases
the utility of mapnik as a component in a larger system.

There already exists MemoryDatasource which can be used to dynamically create
geometry in Python. It suffers from the problem that it does not allow
generating only the geometry which is seen by a particular query. Similarly the
entire geometry must exist in memory before rendering can progress. By using a
custom iterator object or by using generator expressions this plugin allows
geometry to be created on demand and to be destroyed after use. This can have a
great impact on memory efficiency. Since geometry is generated on-demand as
rendering progresses there can be arbitrarily complex 'cleverness' optimising
the geometry generated for a particular query. Obvious examples of this would
be generating only geometry within the query bounding box and generating
geometry with an appropriate level of detail for the output resolution.
156a759
@springmeyer springmeyer commented on the diff Jul 31, 2012
bindings/python/mapnik/__init__.py
+
+ For example. One might have a features() method in a derived class like the following:
+
+ def features(self, query):
+ # ... create WKB features feat1 and feat2
+
+ return mapnik.PythonDatasource.wkb_features(
+ keys = ( 'name', 'author' ),
+ features = [
+ (feat1, { 'name': 'feat1', 'author': 'alice' }),
+ (feat2, { 'name': 'feat2', 'author': 'bob' }),
+ ]
+ )
+
+ """
+ ctx = Context()
@springmeyer
Mapnik member

Seems like the Context should be a member of the PythonDatasource?. The shared pointer under the hood I guess should handle this automatically, but why not?

@rjw57
rjw57 added a note Jul 31, 2012

As I understand it, the Context is the place where the knowledge about what fields/labels will be present in the features is vested. wkb_features() is merely a convenience function to collect that information into a single function call when returning the features. You could make your own derived class which keeps the Context inside itself if you know ahead of time what the field names will be.

If the Context were placed into the PythonDatasource class itself, then you'd either have to pre-declare the fields you'll be using elsewhere when it is initialised or this convenience function would have to override you. This can of course be done but I thought collecting them together like this allows naive code to work without stopping someone doing it 'properly'.

I suppose the rationale for this is that the PythonDatasource class and wkb_features classmethod are just there to make it quick and 'Pythonic' to generate a Datasource. You could implement your own if you 'know better'.

@springmeyer
Mapnik member

okay, sounds reasonable.

@rjw57
rjw57 added a note Jul 31, 2012

I've just had a quick look at some other plugins and, for example, the PostGIS plugin also creates one Context for each call to features() like the PythonDatasource.wkb_features() implementation does.

Edit: /me stops wittering now it's deemed reasonable :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@artemp artemp was assigned Aug 3, 2012
@artemp artemp merged commit 189322e into mapnik:master Aug 6, 2012
@springmeyer
Mapnik member

@rjw57 - after the merge I'm seeing one test failure on os x 10.7. Perhaps the expected test image just needs updated?

http://cl.ly/image/2N3N1O2J1E1F (actual is the image from /tmp)

======================================================================
FAIL: python_tests.python_plugin_test.test_python_circle_rendering
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/nose-1.1.2-py2.7.egg/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/dane/projects/mapnik6/tests/python_tests/python_plugin_test.py", line 154, in test_python_circle_rendering
    'failed comparing actual (%s) and expected (%s)' % (actual,'tests/python_tests/'+ expected))
  File "/Library/Python/2.7/site-packages/nose-1.1.2-py2.7.egg/nose/tools.py", line 31, in eq_
    assert a == b, msg or "%r != %r" % (a, b)
AssertionError: failed comparing actual (/tmp/mapnik-python-circle-render1.png) and expected (tests/python_tests/images/support/mapnik-python-circle-render1.png)
@springmeyer
Mapnik member

@rjw57 - it would be great to avoid the test dependency on Shapely, simply to ensure the tests are easy to run on all systems. Mapnik itself has no dependency on GEOS (except for a plugin we plan to remove). Would you be amenable to re-writing the test to use just Mapnik?

Obviously Shapely would be good to use in real-world example of plugin usage, and to that goal it would be great if you would add the plugin to https://github.com/mapnik/mapnik/wiki/PluginArchitecture and create a wiki page at https://github.com/mapnik/mapnik/wiki/Python-Plugin.

@rjw57

@springmeyer I'll address these things over the weekend. Thanks.

@springmeyer
Mapnik member

bump @rjw57 - also please see: #1407

@rjw57

See pull request 1414 for removal of shapely dependency. I opened a new request because this one has already been merged.

@rjw57

Wiki updated.

@ghost

Sorry, I'm new to all this and so this might not be the place to comment on this because this is CLOSED, but I've got a fail in the tests when installing this on CentOS 6 using nose 1.2.1 and Mapnik 2.1.0:

======================================================================
FAIL: python_tests.python_plugin_test.test_python_point_rendering
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/root/mapnik-v2.1.0/tests/python_tests/python_plugin_test.py", line 139, in test_python_point_rendering
    'failed comparing actual (%s) and expected (%s)' % (actual,'tests/python_tests/'+ expected))
AssertionError: failed comparing actual (/tmp/mapnik-python-point-render1.png) and expected (tests/python_tests/images/support/mapnik-python-point-render1.png)

----------------------------------------------------------------------

Links to mapnik-python-point-render1.png image:

expected (from tests/python_tests/images/support)
actual (from /tmp)

Thank you!

@springmeyer
Mapnik member

Hi @jargylplath - those images looks basically the same so you are likely seeing a non-critical error. Are you trying to use the plugin and having trouble?

@ghost

Nope. Not even using the plugin, just saw the error and wanted to report it so that either the test or the code (depending on who is the culprit) could be revised in a future release. For my purposes, Mapnik continues to perform splendidly. ^_^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment