parallelize datasource.features() query #849

Closed
artemp opened this Issue Oct 11, 2011 · 17 comments

@artemp
Mapnik member

Rendering is serial, to ensure proper painting of features. But on machines with sufficient memory mapnik datasources could be queried in parallel before or just after rendering of the first layer begins. So instead of the normal query, render, query, render... approach we could create a pool of threads at render time, dispatch N datasource queries, and have each callback when they return a featureset. Rendering would commence in order of the layers and only pause if the next layer has not yet returned from the thread pool.

@kunitoki
Mapnik member

It would be cool if we implemented some sort of parallelization for fetching data. I will try to think about the best way to accomplish this, probably a synchronized thread pool with ordered jobs.

@kunitoki
Mapnik member

If we refactor feature_style_processor::apply_to_layer into sub-chunks such as fetch_layer_data and render_layer, I already have some working code that may apply (using http://threadpool.sourceforge.net/, which is based on boost::thread).

@kunitoki
Mapnik member

I've created a test bench here, and an initial version is coming along. It doesn't produce good results yet, because I split fetch and draw into two separate phases and wait for every fetch to complete before rendering, even though some layers could already render while the later ones are still fetching their data.

Normal mapnik:

//-- starting rendering timer...
10.51ms (cpu 10.00ms)       | rendering style #1 for layer: 'provincia-1' and style 'isolestyle'
percent rendered: 100% - 3 rendered for 3 queried for  layer 'provincia-1' and style 'isolestyle'
10.63ms (cpu 10.00ms)       | rendering total for layer: 'provincia-1'
18.70ms (cpu 30.00ms)       | rendering style #1 for layer: 'laguna-2' and style 'lagunastyle'
percent rendered: 100% - 55 rendered for 55 queried for  layer 'laguna-2' and style 'lagunastyle'
18.82ms (cpu 30.00ms)       | rendering total for layer: 'laguna-2'
14.24ms (cpu 10.00ms)       | rendering style #1 for layer: 'isole-3' and style 'isolestyle'
percent rendered: 100% - 431 rendered for 431 queried for  layer 'isole-3' and style 'isolestyle'
14.34ms (cpu 10.00ms)       | rendering total for layer: 'isole-3'
235.71ms (cpu 230.00ms)     | rendering style #1 for layer: 'strade-4' and style 'isolestyle'
percent rendered: 100% - 19874 rendered for 19874 queried for  layer 'strade-4' and style 'isolestyle'
235.81ms (cpu 230.00ms)     | rendering total for layer: 'strade-4'
6.07ms (cpu 10.00ms)        | rendering style #1 for layer: 'ponti-5' and style 'isolestyle'
percent rendered: 100% - 499 rendered for 499 queried for  layer 'ponti-5' and style 'isolestyle'
6.14ms (cpu 10.00ms)        | rendering total for layer: 'ponti-5'
11.01ms (cpu 10.00ms)       | rendering style #1 for layer: 'pontili-6' and style 'isolestyle'
percent rendered: 100% - 969 rendered for 969 queried for  layer 'pontili-6' and style 'isolestyle'
11.09ms (cpu 10.00ms)       | rendering total for layer: 'pontili-6'
68.58ms (cpu 70.00ms)       | rendering style #1 for layer: 'pavimentazione-7' and style 'stradestyle'
percent rendered: 100% - 965 rendered for 965 queried for  layer 'pavimentazione-7' and style 'stradestyle'
68.68ms (cpu 70.00ms)       | rendering total for layer: 'pavimentazione-7'
364.47ms (cpu 370.00ms)     | total map rendering
//-- rendering timer stopped...

real    0m0.714s
user    0m0.676s
sys     0m0.037s

Thread pool fetch:

//-- starting rendering timer...
35.95ms (cpu 40.00ms)       | fetching data for layer: 'provincia-1'
38.37ms (cpu 40.00ms)       | fetching data for layer: 'laguna-2'
49.83ms (cpu 60.00ms)       | fetching data for layer: 'isole-3'
16.39ms (cpu 30.00ms)       | fetching data for layer: 'ponti-5'
32.16ms (cpu 50.00ms)       | fetching data for layer: 'pontili-6'
44.33ms (cpu 70.00ms)       | fetching data for layer: 'pavimentazione-7'
170.68ms (cpu 210.00ms)     | fetching data for layer: 'strade-4'
173.47ms (cpu 220.00ms)     | fetching all features from map
171.41ms (cpu 210.00ms)     | spent waiting threads to finish
9.14ms (cpu 10.00ms)        | rendering style #1 for layer: 'provincia-1' and style 'isolestyle'
percent rendered: 100% - 3 rendered for 3 queried for  layer 'provincia-1' and style 'isolestyle'
16.77ms (cpu 10.00ms)       | rendering style #1 for layer: 'laguna-2' and style 'lagunastyle'
percent rendered: 100% - 55 rendered for 55 queried for  layer 'laguna-2' and style 'lagunastyle'
11.74ms (cpu 20.00ms)       | rendering style #1 for layer: 'isole-3' and style 'isolestyle'
percent rendered: 100% - 431 rendered for 431 queried for  layer 'isole-3' and style 'isolestyle'
156.04ms (cpu 150.00ms)     | rendering style #1 for layer: 'strade-4' and style 'isolestyle'
percent rendered: 100% - 19874 rendered for 19874 queried for  layer 'strade-4' and style 'isolestyle'
4.11ms (cpu 0.00ms)         | rendering style #1 for layer: 'ponti-5' and style 'isolestyle'
percent rendered: 100% - 499 rendered for 499 queried for  layer 'ponti-5' and style 'isolestyle'
7.50ms (cpu 10.00ms)        | rendering style #1 for layer: 'pontili-6' and style 'isolestyle'
percent rendered: 100% - 969 rendered for 969 queried for  layer 'pontili-6' and style 'isolestyle'
66.67ms (cpu 70.00ms)       | rendering style #1 for layer: 'pavimentazione-7' and style 'stradestyle'
percent rendered: 100% - 965 rendered for 965 queried for  layer 'pavimentazione-7' and style 'stradestyle'
290.54ms (cpu 290.00ms)     | rendering features from map
463.59ms (cpu 510.00ms)     | total map rendering
//-- rendering timer stopped...

real    0m0.815s
user    0m0.721s
sys     0m0.135s

As you can see, the threaded run is about 100 ms slower than serial native Mapnik (463.59 ms vs 364.47 ms for total map rendering), but there is surely a lot of room for improvement.
Anyway, some code is there so we can all see how to make this better. I'll create a branch where we can test and improve this.

@springmeyer
Mapnik member

wow, awesome progress! a couple notes 1) @kkaefer has also taken a look at this - would be cool for you two to compare notes, 2) a couple related ideas might be #834 and #833 - because we may be able to reduce the overhead of caching features to make things faster.

@kkaefer
Mapnik member

My branch isn't really complete or working.

@lexman

@kunitoki, I noticed that the very first query to PostGIS on a connection incurs an overhead; it could explain why your proof of concept is 100 ms slower. On my machine the overhead is about 10 ms. Look at this example:

#include <iostream>
#include <string>
#include <postgresql/libpq-fe.h>
#include <ctime>
#include <stdlib.h>
#include <sys/time.h> // for gettimeofday() on unix

using namespace std;

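// Wall-clock time in seconds, with microsecond resolution.
double time_now() {
    struct timeval t;
    gettimeofday(&t, NULL); // the timezone argument is obsolete
    return t.tv_sec + t.tv_usec * 1e-6;
}

// Run one query and print how long it took, in milliseconds.
void query_time(PGconn *conn, const string &query) {
    double begin = time_now();
    PGresult *res = PQexec(conn, query.c_str());
    std::cout << (time_now() - begin) * 1000.0 << " ms : " << query << "\n";
    if (PQresultStatus(res) != PGRES_TUPLES_OK) {
        std::cerr << "Query failed: " << PQerrorMessage(conn);
    }
    PQclear(res);
}

// Open a connection; the connection parameters are placeholders.
PGconn* init_connection() {
    PGconn *conn;
    conn = PQconnectdb("dbname=my_database host=my_host user=my_user password=my_password");
    if (PQstatus(conn) == CONNECTION_BAD) {
            std::cerr << "We were unable to connect to the database\n";
            exit(EXIT_FAILURE); // report failure with a non-zero exit code
    }
    return conn;
}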

int main() {
    PGconn *conn;

    conn = init_connection();
    query_time(conn, "SELECT PostGIS_Full_Version();"); 
    query_time(conn, "SELECT 'BOX3D(181405.0531517194 4943637.392652283,190043.1619932181 4952275.501493781)'::box3d"); 
    query_time(conn, "SELECT 'BOX3D(181405.0531517194 4943637.392652283,190043.1619932181 4952275.501493781)'::box3d"); 
    query_time(conn, "SELECT 'BOX3D(181405.0531517194 4943637.392652283,190043.1619932181 4952275.501493781)'::box3d"); 
    PQfinish(conn);

    std::cout << "New connection\n";
    conn = init_connection();
    query_time(conn, "SELECT 'BOX3D(181405.0531517194 4943637.392652283,190043.1619932181 4952275.501493781)'::box3d"); 
    query_time(conn, "SELECT 'BOX3D(181405.0531517194 4943637.392652283,190043.1619932181 4952275.501493781)'::box3d"); 
    query_time(conn, "SELECT 'BOX3D(181405.0531517194 4943637.392652283,190043.1619932181 4952275.501493781)'::box3d"); 
    query_time(conn, "SELECT 'BOX3D(181405.0531517194 4943637.392652283,190043.1619932181 4952275.501493781)'::box3d"); 
    PQfinish(conn);

}

Which produces this result :

17.94 ms : SELECT PostGIS_Full_Version();
0.699997 ms : SELECT 'BOX3D(181405.0531517194 4943637.392652283,190043.1619932181 4952275.501493781)'::box3d
0.384092 ms : SELECT 'BOX3D(181405.0531517194 4943637.392652283,190043.1619932181 4952275.501493781)'::box3d
0.359774 ms : SELECT 'BOX3D(181405.0531517194 4943637.392652283,190043.1619932181 4952275.501493781)'::box3d
New connection
9.15098 ms : SELECT 'BOX3D(181405.0531517194 4943637.392652283,190043.1619932181 4952275.501493781)'::box3d
0.484943 ms : SELECT 'BOX3D(181405.0531517194 4943637.392652283,190043.1619932181 4952275.501493781)'::box3d
0.387907 ms : SELECT 'BOX3D(181405.0531517194 4943637.392652283,190043.1619932181 4952275.501493781)'::box3d
0.431061 ms : SELECT 'BOX3D(181405.0531517194 4943637.392652283,190043.1619932181 4952275.501493781)'::box3d

@springmeyer
Mapnik member

perhaps look into using PQsetnonblocking (http://www.postgresql.org/docs/9.2/static/libpq-async.html)

@kunitoki
Mapnik member

No, the overhead comes from waiting for all PostGIS connections to finish fetching data before the image rendering starts. The whole concept can be taken further: drawing can start as soon as the first layer has its data, while the next layers are still fetching. We only have to respect drawing order, so fetch in parallel and cascade a signal to draw the layers in order. I have a proof of concept written in plain boost::thread (no pools) somewhere, but never started implementing it in Mapnik.

@kunitoki
Mapnik member

Here it is:

From my tests ( #849 (comment) ), most of the time is spent rendering, and there is no need to wait until all the layers have finished fetching their data. If layer 1 has finished fetching but layer 2 is still working, layer 1 can start rendering. Conversely, if layer 1 is still fetching but layer 2 has finished and wants to draw, layer 2 waits until layer 1 has finished drawing. Pretty easy with boost::thread_group too.

Here are some ideas:

/**
    Compile with:
        g++ -pthread -o test main.cpp -lboost_thread-mt
 */
#include <boost/thread.hpp>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/make_shared.hpp>

#include <iostream>
#include <sstream>  // std::stringstream
#include <string>   // std::string

static boost::mutex io_mutex;
void write(std::string s)
{
    boost::mutex::scoped_lock sl(io_mutex);
    std::cout << s;
}

class task;
typedef boost::shared_ptr<task> task_ptr;

typedef boost::shared_ptr<boost::mutex> mutex_ptr;

class task
{
public:
    task(task_ptr prev, int id)
      : prev_(prev),
        mutex_(boost::make_shared<boost::mutex>()),
        id_(id)
    {
    }

    mutex_ptr mutex()
    {
        return mutex_;
    }

    virtual void run()
    {
        {
            boost::lock_guard<boost::mutex> ourlock(*mutex());

            std::stringstream s1;
            s1 << "chained_task(" << id_ << ") executing parallel fetch" << std::endl;
            write(s1.str());
            boost::this_thread::sleep(boost::posix_time::seconds(id_ * 3));

            if (prev_)
            {
                std::stringstream s2;
                s2 << "chained_task(" << id_ << ") waiting previous to finish drawing" << std::endl;
                write(s2.str());
                boost::lock_guard<boost::mutex> prevlock(*prev_->mutex());
            }

            std::stringstream s3;
            s3 << "chained_task(" << id_ << ") drawing" << std::endl;
            write(s3.str());
            boost::this_thread::sleep(boost::posix_time::seconds(id_));
        }

        std::stringstream s4;
        s4 << "Next chained_task(" << id_ + 1 << ") should advance now" << std::endl;
        write(s4.str());
    }

protected:
    task_ptr prev_;
    mutex_ptr mutex_;
    int id_;
};

int main (int argc, char* argv[])
{
    boost::thread_group g;

    task_ptr t1 = boost::make_shared<task>(task_ptr(), 1);
    task_ptr t2 = boost::make_shared<task>(t1, 2);
    task_ptr t3 = boost::make_shared<task>(t2, 3);

    g.create_thread(boost::bind(&task::run, t1));
    g.create_thread(boost::bind(&task::run, t2));
    g.create_thread(boost::bind(&task::run, t3));

    g.join_all();

    return 0;
}

Related to this: we should test all the datasources more thoroughly, as I've encountered some problems with multi-threaded fetching of features from a SQLite database, which should be opened with the FULLMUTEX flag or with serialized mode enabled.

@kunitoki
Mapnik member

The task constructor takes a task_ptr as its first argument: the previous task, which this task should wait for.

Where you see "executing parallel fetch", every task runs that part in parallel, without waiting on any other task. But if there is a previous task, we wait for it to finish before entering the "drawing" part, which is the actual drawing. The Nth task must wait for the (N-1)th task to finish drawing before continuing (unless it has no previous task, like the first one).

@lexman

Well, there is a shortcut, specifically for the PostGIS driver: we could directly use the asynchronous Postgres functions: http://www.postgresql.org/docs/9.1/static/libpq-async.html. We would make several calls to PQsendQuery to send all the queries at once, and use PQgetResult() (which blocks until results arrive) when fetching the data to draw each layer.

This doesn't require threads, but can't be applied to file-based datasources like SQLite, OSM or GDAL.

@kunitoki
Mapnik member

Yes, that is a way, but I think that with the threading approach other datasources could benefit too (I use SQLite and Oracle a lot, for example)...

@lexman

A few thoughts about parallelism with threads...

  • How do we configure threads? Is there one main thread pool, or a pool per datasource / layer?
  • Is it healthy for a library that might itself be used inside threads to create its own threads?
  • Running datasource queries in parallel means accumulating features in buffers until it is time to render them. In the worst case, we could have to store in memory nearly everything we have to render. For the moment, Mapnik is optimised to draw features as they are retrieved, which saves memory. So setting Mapnik to work with parallel queries is a big change in its use of resources.
  • The option cache-features=false saves memory but loops several times over a datasource (slower in most cases). When using parallel queries, memory will be used anyway, so cache-features=false should be ignored.

@springmeyer
Mapnik member

A few thoughts on my end:

  • I too like the idea of something generic. However, having a postgres specific implementation might be a good way to test the overall viability of this idea (which is still completely un-proven as a benefit to performance).

  • Yes, I worry about fully caching features not being viable - you'd use too much memory and then not see the idealized performance benefits. I've wondered if datasources could be extended to calculate a total estimated feature count at initialization so that whether to fully cache could be better determined on the fly in feature_style_processor.

  • In regard to the other above questions that I've not commented on directly - yeah, very good questions :)

@lexman

Hello,

We started working on parallelization in general, and in the PostGIS driver specifically.

We've been working on v2.1, which is the version we use in production for the moment, but we are willing to make the effort to port it to master afterwards.
We'll give you feedback on performance improvements as soon as we can measure them on our servers.

To give you visibility on our work, I've pushed it to our repo: https://github.com/Mappy/mapnik/tree/parallel_queries

I've started a thread that explains the changes: Mappy#1

We'd be glad to have feedback.

@lexman

We made progress on this subject. The short story is: with a large enough database server, we save 25% of rendering time and still handle the same load.
If you want some details, you can read them here: Mappy#1 (comment)

@springmeyer
Mapnik member

landed thanks to @abonnasseau and team in #2001

@springmeyer springmeyer closed this Sep 6, 2014