Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent topology population introduce invalidities #16

Open
strk opened this issue Jun 12, 2020 · 26 comments
Open

Concurrent topology population introduce invalidities #16

strk opened this issue Jun 12, 2020 · 26 comments
Assignees

Comments

@strk
Copy link
Collaborator

strk commented Jun 12, 2020

While PostGIS Topology is known to not support concurrent population, as described in https://trac.osgeo.org/postgis/timeline?from=2020-05-08T06%3A13%3A24-07%3A00&precision=second, we want to try at working around this problem by ensuring a single transaction will ever only work on the same edges, faces and nodes at the same time.

This ticket is to track such task in the appropriate repository (here).

Work is undergoing about this in the trac_osgeo_org_postgis_ticket_4684 branch

@strk strk self-assigned this Jun 12, 2020
@strk
Copy link
Collaborator Author

strk commented Jun 12, 2020

I've read somewhere about deadlocks occurring, @larsop can you explain why this happens ?
Reading the code I see you're obtaining locks with multiple SELECTs, does obtaining them all with a SINGLE select help some ? Ideally every transaction should obtain all logs upfront, without holding any lock while waiting to obtain the others

@larsop
Copy link
Owner

larsop commented Jun 19, 2020

First I do block on cell in content based grid table, since this is a very small table.

Basically what I do here is to block all cells that intersects with that area that may be affected the current update. So if anther update at the same time intersects with some of the same cells it will just wait and try later and another job will start instead.

My hope with it doing it this way was to avoid to do row locks on very big tables and then totally avoid the next step.

In the next step I start to block rows in the topology tables. The reason why I this in many steps is that did not know about the thats it's best to this one single transaction.

@strk
Copy link
Collaborator Author

strk commented Jun 19, 2020

We're trying to find the root cause of the issue here, so increasing the complexity is the wrong way to go. We need to simplify as much as possible. See the description in https://trac.osgeo.org/postgis/ticket/4684#comment:12 -- can you craft a test in THIS repository to use THAT data ?

@larsop
Copy link
Owner

larsop commented Jun 19, 2020

OK, I will that.

@strk
Copy link
Collaborator Author

strk commented Jun 19, 2020

Another thing to do would be having the resolve_overlap_gap_run STOP as soon as an invalid topology is detected. Attempts at adding more elements to an already corrupted topology wont' help finding what introduced the invalidity in the first place. I'll see on monday if I can come up with a list of exceptions that might be interpreted as a corrupted topology. The exceptions I've added with the recent upstream PostGIS commit contained the "Corrupted topology" label but I'm afraid there' no clear way to determine what exception comes out of a corrupted topology and what not.

@larsop
Copy link
Owner

larsop commented Jun 23, 2020

We're trying to find the root cause of the issue here, so increasing the complexity is the wrong way to go. We need to simplify as much as possible. See the description in https://trac.osgeo.org/postgis/ticket/4684#comment:12 -- can you craft a test in THIS repository to use THAT data ?

Here is sample but it seems to work ok

select topology.CreateTopology ('test_topo_x2',25833,1.0);

-- Add F1/E1
select topology.TopoGeo_addLinestring('test_topo_x2', ST_SetSRID(ST_LineMerge('0105000020E9640000020000000102000000070000009ED5973A3CFA1141623BBDC97A90594110CE42F567F31141313BA66F6E905941AD63DB8F45CF11419D1E9F337D9359419F0F59AE92E711412D28DB46F9965941F4253E3202241241514E1E96669759415790A59724481241E56A25D25794594165E42779D72F12415561E9BEDB90594101020000000200000065E42779D72F12415561E9BEDB905941EA7D34089A241241D1087F6AC7905941'),25833),0.75); 

-- Add E2
select topology.TopoGeo_addLinestring('test_topo_x2', ST_SetSRID(ST_LineMerge('0102000020E9640000070000006E371694CB1D1241484CE8047F9259412CE6E0108E0E12413423164D09925941BF251729900012412C951E9EA192594194B682C4CF0112413630F9A6AF935941D607B8470D1112414A59CB5E2594594143C8812F0B1F124152E7C20D8D9359416E371694CB1D1241484CE8047F925941'),25833),0.75); 


DO
$body$
DECLARE 
stmts text[];	
BEGIN

-- Add E3
stmts[0] = 'select topology.TopoGeo_addLinestring('||quote_literal('test_topo_x2')||', ST_SetSRID(ST_LineMerge(0102000020E964000002000000EA7D34089A241241D1087F6AC79059419ED5973A3CFA1141623BBDC97A905941),25833),0.75)';
-- Add E4
stmts[1] = 'select topology.TopoGeo_addLinestring('||quote_literal('test_topo_x2')||', ST_SetSRID(ST_LineMerge(0102000020E9640000040000009ED5973A3CFA1141623BBDC97A905941E505D8A6E40E1241A75E8DC5AF8D5941A90EC3C7BE29124166D7BEBEE28D5941EA7D34089A241241D1087F6AC7905941),25833),0.75)';

perform execute_parallel(stmts,2,true);

 
END
$body$;

SELECT 'validation', * FROM topology.ValidateTopology('test_topo_x2');

@larsop
Copy link
Owner

larsop commented Jun 25, 2020

Another thing to do would be having the resolve_overlap_gap_run STOP as soon as an invalid topology is detected. Attempts at adding more elements to an already corrupted topology wont' help finding what introduced the invalidity in the first place. I'll see on monday if I can come up with a list of exceptions that might be interpreted as a corrupted topology. The exceptions I've added with the recent upstream PostGIS commit contained the "Corrupted topology" label but I'm afraid there' no clear way to determine what exception comes out of a corrupted topology and what not.

I done update in this commit where I only catch and log topo error when adding lines to the final Topology result.

When running this test

CALL resolve_overlap_gap_run(
('sl_esh.ar50_utvikling_flate','sl_sdeid','geo',25833,true),
('test_topo_ar50_t7',1.0),
resolve_overlap_data_clean_type_func(300,9,500,3,140,120,240,35,325),
25,1000)"

Is was 12 lines that was not added with this errors. All these errors was related to stage 3, which means adding lines that makes a connection between lines from different cells added in stage 2. Stage 3 is here running multithreaded.
(233250 edges was added with out error to the final Topology layer).

SELECT d_msg from test_topo_ar50_t7.ar50_utvikling_flate_no_cut_line_failed where line_geo_lost = true;
                                          d_msg                                          
-----------------------------------------------------------------------------------------
 Side-location conflict: new edge starts in face 0 and ends in face 77236
 Side-location conflict: new edge starts in face 77236 and ends in face 0
 Corrupted topology: adjacent edges 224269 and 232975 bind different face (77415 and 0)
 Side-location conflict: new edge starts in face 0 and ends in face 77790
 Side-location conflict: new edge starts in face 77790 and ends in face 0
 Side-location conflict: new edge starts in face 0 and ends in face 77909
 Side-location conflict: new edge starts in face 78158 and ends in face 0
 Side-location conflict: new edge starts in face 0 and ends in face 78317
 Side-location conflict: new edge starts in face 0 and ends in face 78328
 Side-location conflict: new edge starts in face 0 and ends in face 78381
 Side-location conflict: new edge starts in face 0 and ends in face 78385
 Corrupted topology: adjacent edges -184008 and 232135 bind different face (78452 and 0)
(12 rows)

When running validation (I did not wait for akk results)

SELECT 'validation', * FROM topology.ValidateTopology('test_topo_ar50_t7');
NOTICE:  00000: Corrupted topology: face 78626 could not be constructed only from edges knowing about it (like edge 231930).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78627 could not be constructed only from edges knowing about it (like edge 232006).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78628 could not be constructed only from edges knowing about it (like edge 231892).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78629 could not be constructed only from edges knowing about it (like edge 231896).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78633 could not be constructed only from edges knowing about it (like edge 231997).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78638 could not be constructed only from edges knowing about it (like edge 231840).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78639 could not be constructed only from edges knowing about it (like edge 231883).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78640 could not be constructed only from edges knowing about it (like edge 231981).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78642 could not be constructed only from edges knowing about it (like edge 232006).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78643 could not be constructed only from edges knowing about it (like edge 231877).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78644 could not be constructed only from edges knowing about it (like edge 231841).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78656 could not be constructed only from edges knowing about it (like edge 232253).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78657 could not be constructed only from edges knowing about it (like edge 232250).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78658 could not be constructed only from edges knowing about it (like edge 232222).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78611 could not be constructed only from edges knowing about it (like edge 231424).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78612 could not be constructed only from edges knowing about it (like edge 231461).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78599 could not be constructed only from edges knowing about it (like edge 231912).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78600 could not be constructed only from edges knowing about it (like edge 231786).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78601 could not be constructed only from edges knowing about it (like edge 231869).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78603 could not be constructed only from edges knowing about it (like edge 231790).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78604 could not be constructed only from edges knowing about it (like edge 231777).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78605 could not be constructed only from edges knowing about it (like edge 231776).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76451 could not be constructed only from edges knowing about it (like edge 231425).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76543 could not be constructed only from edges knowing about it (like edge 231542).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76547 could not be constructed only from edges knowing about it (like edge 231549).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76548 could not be constructed only from edges knowing about it (like edge 231550).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76783 could not be constructed only from edges knowing about it (like edge 231848).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76785 could not be constructed only from edges knowing about it (like edge 231847).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76801 could not be constructed only from edges knowing about it (like edge 231803).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76757 could not be constructed only from edges knowing about it (like edge 231777).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76803 could not be constructed only from edges knowing about it (like edge 231854).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76812 could not be constructed only from edges knowing about it (like edge 231851).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76798 could not be constructed only from edges knowing about it (like edge 231847).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76818 could not be constructed only from edges knowing about it (like edge 231895).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76792 could not be constructed only from edges knowing about it (like edge 231912).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76840 could not be constructed only from edges knowing about it (like edge 231877).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76858 could not be constructed only from edges knowing about it (like edge 231925).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76904 could not be constructed only from edges knowing about it (like edge 231979).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76903 could not be constructed only from edges knowing about it (like edge 231979).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76914 could not be constructed only from edges knowing about it (like edge 231997).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76806 could not be constructed only from edges knowing about it (like edge 232001).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76930 could not be constructed only from edges knowing about it (like edge 232001).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 76923 could not be constructed only from edges knowing about it (like edge 231976).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 77131 could not be constructed only from edges knowing about it (like edge 232234).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 77136 could not be constructed only from edges knowing about it (like edge 232224).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 77139 could not be constructed only from edges knowing about it (like edge 232243).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 77141 could not be constructed only from edges knowing about it (like edge 232243).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 77133 could not be constructed only from edges knowing about it (like edge 232245).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 77143 could not be constructed only from edges knowing about it (like edge 232247).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 77124 could not be constructed only from edges knowing about it (like edge 232227).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 77144 could not be constructed only from edges knowing about it (like edge 232240).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 77145 could not be constructed only from edges knowing about it (like edge 232249).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 77135 could not be constructed only from edges knowing about it (like edge 232250).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 77134 could not be constructed only from edges knowing about it (like edge 232251).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 77148 could not be constructed only from edges knowing about it (like edge 232252).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 77120 could not be constructed only from edges knowing about it (like edge 232253).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78607 could not be constructed only from edges knowing about it (like edge 232133).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78577 could not be constructed only from edges knowing about it (like edge 231539).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78578 could not be constructed only from edges knowing about it (like edge 231539).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78590 could not be constructed only from edges knowing about it (like edge 231882).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78591 could not be constructed only from edges knowing about it (like edge 231844).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78595 could not be constructed only from edges knowing about it (like edge 231872).
LOCATION:  pg_notice, lwgeom_pg.c:277
NOTICE:  00000: Corrupted topology: face 78596 could not be constructed only from edges knowing about it (like edge 231882).
LOCATION:  pg_notice, lwgeom_pg.c:277

If I try to return the command by hand I get the same error

SELECT ARRAY(SELECT topology.TopoGeo_addLinestring('test_topo_ar50_t7','0102000020E9640000200000000000000040D70F4100000000461E594100000000F0D60F4100000000461E59410000000050D60F4100000000461E59410000000050D60F4100000080481E59410000000020D40F4100000080481E59410000000020D40F4100000080431E594100000000D0D30F4100000080431E594100000000D0D30F4100000080391E59410000000080D30F4100000080391E59410000000080D30F4100000000371E59410000000030D30F4100000000371E59410000000030D30F4100000080341E594100000000E0D20F4100000080341E59410000000090D20F4100000080341E59410000000090D20F4100000000321E59410000000040D20F4100000000321E594100000000F0D10F4100000000321E594100000000F0D10F41000000802F1E594100000000A0D10F41000000802F1E594100000000A0D10F4100000000231E59410000000050D10F4100000000231E59410000000050D10F4100000000191E59410000000000D10F4100000000191E59410000000000D10F4100000080111E594100000000B0D00F4100000080111E594100000000B0D00F41000000000F1E59410000000060D00F41000000000F1E59410000000060D00F41000000000A1E59410000000010D00F41000000000A1E59410000000010D00F4100000080071E594100000000C0CF0F4100000080071E594100000000C0CF0F4100000000051E5941',1));
ERROR:  XX000: Side-location conflict: new edge starts in face 0 and ends in face 79100

@larsop
Copy link
Owner

larsop commented Jun 25, 2020

One more comment here, where the error happens and how many seems to be kind of random or maybe depend on number of parallel threads. I did a test now with 28 parallel threads instead of 25 and then I got this list of errors. I will also run a test 10 threads to if this reduce the number of errors.

select log_time, ST_AsText(ST_Centroid(geo)), ST_length(geo)  ,d_msg, error_info log_time from test_topo_ar50_t7.ar50_utvikling_flate_no_cut_line_failed l  order by l.log_time;
          log_time          |                st_astext                 | st_length |                                            d_msg                                            |      log_time      
----------------------------+------------------------------------------+-----------+---------------------------------------------------------------------------------------------+--------------------
 2020-06-25 13:53:02.438219 | POINT(301988.566176471 6574225.625)      |      1360 | deadlock detected                                                                           | Will not do retry 
 2020-06-25 13:53:02.438219 | POINT(302854.504504504 6574110.67567568) |      1110 | deadlock detected                                                                           | Will not do retry 
 2020-06-25 13:53:02.438219 | POINT(302472.598425197 6574400.59055118) |      1270 | deadlock detected                                                                           | Will not do retry 
 2020-06-25 13:53:02.438219 | POINT(300856.607142857 6573804.64285714) |      1680 | deadlock detected                                                                           | Will not do retry 
 2020-06-25 13:53:02.438219 | POINT(301510.027027027 6574263.08108108) |      1850 | deadlock detected                                                                           | Will not do retry 
 2020-06-25 13:53:05.094632 | POINT(175709.375 7015499.375)            |       160 | deadlock detected                                                                           | Will not do retry 
 2020-06-25 13:53:43.855212 | POINT(149693.947368421 7015518.94736842) |       190 | deadlock detected                                                                           | Will not do retry 
 2020-06-25 13:53:55.961733 | POINT(288223.660714286 6590878.48214286) |       560 | Side-location conflict: new edge starts in face 77360 and ends in face 0                    | Will not do retry 
 2020-06-25 13:53:55.961733 | POINT(288053.631578947 6586362.42105263) |       950 | Side-location conflict: new edge starts in face 77362 and ends in face 0                    | Will not do retry 
 2020-06-25 13:53:55.961733 | POINT(288381.711711712 6589206.53153153) |      1110 | Side-location conflict: new edge starts in face 77362 and ends in face 0                    | Will not do retry 
 2020-06-25 13:53:55.961733 | POINT(288388.991935484 6584075.36290323) |      1240 | Side-location conflict: new edge starts in face 0 and ends in face 77364                    | Will not do retry 
 2020-06-25 13:53:55.961733 | POINT(288265.995850622 6585125.26970954) |      2410 | Corrupted topology: adjacent edges 232446 and 188780 bind different face (0 and 77364)      | Will not do retry 
 2020-06-25 13:53:55.961733 | POINT(288080.231788079 6591781.43487859) |      4530 | Side-location conflict: new edge starts in face 0 and ends in face 77372                    | Will not do retry 
 2020-06-25 13:53:55.961733 | POINT(288286.666666667 6589145)          |       150 | Side-location conflict: new edge starts in face 0 and ends in face 76074                    | Will not do retry 
 2020-06-25 13:53:55.961733 | POINT(288231.818181818 6589010.45454545) |       220 | Side-location conflict: new edge starts in face 76074 and ends in face 0                    | Will not do retry 
 2020-06-25 13:53:55.961733 | POINT(288184.318181818 6584434.77272727) |       220 | Side-location conflict: new edge starts in face 0 and ends in face 76074                    | Will not do retry 
 2020-06-25 13:53:55.961733 | POINT(288176.666666667 6586386.66666667) |       300 | Side-location conflict: new edge starts in face 76074 and ends in face 0                    | Will not do retry 
 2020-06-25 13:53:55.961733 | POINT(288252.843137255 6586481.56862745) |       510 | Side-location conflict: new edge starts in face 0 and ends in face 77360                    | Will not do retry 
 2020-06-25 13:53:55.961733 | POINT(288219.903846154 6586161.05769231) |       520 | Side-location conflict: new edge starts in face 0 and ends in face 77360                    | Will not do retry 
 2020-06-25 13:55:03.240885 | POINT(307856 6583881)                    |        50 | Side-location conflict: new edge starts in face 0 and ends in face 76886                    | Will not do retry 
 2020-06-25 13:55:03.240885 | POINT(307845 6591096)                    |        50 | Side-location conflict: new edge starts in face 0 and ends in face 76886                    | Will not do retry 
 2020-06-25 13:55:03.240885 | POINT(307890.227272727 6591031.59090909) |       220 | Side-location conflict: new edge starts in face 77539 and ends in face 0                    | Will not do retry 
 2020-06-25 13:55:03.240885 | POINT(307831.956521739 6583796.52173913) |       230 | Side-location conflict: new edge starts in face 77539 and ends in face 0                    | Will not do retry 
 2020-06-25 13:55:03.240885 | POINT(307788.617021277 6589414.57446809) |       940 | Side-location conflict: new edge starts in face 0 and ends in face 77542                    | Will not do retry 
 2020-06-25 13:55:03.240885 | POINT(307910.136054422 6589087.92517007) |      1470 | Corrupted topology: adjacent edges 232681 and -104459 bind different face (0 and 77550)     | Will not do retry 
 2020-06-25 13:56:07.701246 | POINT(302105 6592905)                    |        40 | Corrupted topology: adjacent edges -84980 and 232935 bind different face (77556 and 0)      | Will not do retry 
 2020-06-25 13:56:07.701246 | POINT(302060.625 6592899.375)            |        80 | Side-location conflict: new edge starts in face 0 and ends in face 77556                    | Will not do retry 
 2020-06-25 13:57:43.339146 | POINT(311936.793478261 6583469.61956522) |       920 | Side-location conflict: new edge starts in face 0 and ends in face 77802                    | Will not do retry 
 2020-06-25 13:57:43.339146 | POINT(312008.050847458 6583829.74576271) |      1180 | Side-location conflict: new edge starts in face 0 and ends in face 77802                    | Will not do retry 
 2020-06-25 13:57:43.339146 | POINT(314301.851851852 6583463.04232804) |      1890 | Side-location conflict: new edge starts in face 77807 and ends in face 0                    | Will not do retry 
 2020-06-25 13:57:43.339146 | POINT(313896.914893617 6583486.08510638) |      2350 | Side-location conflict: new edge starts in face 0 and ends in face 77807                    | Will not do retry 
 2020-06-25 13:58:31.12307  | POINT(311207.272727273 6583507.72727273) |       110 | Side-location conflict: new edge starts in face 0 and ends in face 77802                    | Will not do retry 
 2020-06-25 13:58:31.12307  | POINT(311304.0625 6583523.4375)          |       160 | Corrupted topology: adjacent edges -195120 and 233065 bind different face (77802 and 0)     | Will not do retry 
 2020-06-25 13:58:31.12307  | POINT(311160.344827586 6583606.89655172) |       580 | Side-location conflict: new edge starts in face 0 and ends in face 77855                    | Will not do retry 
 2020-06-25 14:06:44.566905 | POINT(303778.068181818 6574041.93181818) |       440 | Corrupted topology: adjacent edges -212760 and 231208 bind different face (78272 and 76294) | Will not do retry 
 2020-06-25 14:06:44.566905 | POINT(303921.990291262 6573913.30097087) |      1030 | Side-location conflict: new edge starts in face 78276 and ends in face 76294                | Will not do retry 
 2020-06-25 14:07:29.301706 | POINT(310123.03030303 6574292.12121212)  |       660 | Side-location conflict: new edge starts in face 78332 and ends in face 0                    | Will not do retry 
 2020-06-25 14:07:29.301706 | POINT(310411.647058824 6574375.35294118) |       850 | Side-location conflict: new edge starts in face 0 and ends in face 78332                    | Will not do retry 
 2020-06-25 14:08:09.729936 | POINT(282498.333333333 6583526.66666667) |        90 | Side-location conflict: new edge starts in face 0 and ends in face 77363                    | Will not do retry 

With 10 threads we got 17 errors not 39 errors as with 28 threads , but the the first run with 25 threads we got 11 errors so it seems to more random than connected to number threads. But 10 threads we did not see any deadlocks.

select log_time, ST_AsText(ST_Centroid(geo)), ST_length(geo)  ,d_msg, error_info log_time from test_topo_ar50_t7.ar50_utvikling_flate_no_cut_line_failed l  order by l.log_time;
          log_time          |                st_astext                 | st_length |                                          d_msg                                          |      log_time      
----------------------------+------------------------------------------+-----------+-----------------------------------------------------------------------------------------+--------------------
 2020-06-25 15:37:58.079098 | POINT(311936.793478261 6583469.61956522) |       920 | Side-location conflict: new edge starts in face 0 and ends in face 77547                | Will not do retry 
 2020-06-25 15:37:58.079098 | POINT(312008.050847458 6583829.74576271) |      1180 | Side-location conflict: new edge starts in face 0 and ends in face 77547                | Will not do retry 
 2020-06-25 15:37:58.079098 | POINT(314301.851851852 6583463.04232804) |      1890 | Side-location conflict: new edge starts in face 77552 and ends in face 0                | Will not do retry 
 2020-06-25 15:37:58.079098 | POINT(313896.914893617 6583486.08510638) |      2350 | Side-location conflict: new edge starts in face 0 and ends in face 77552                | Will not do retry 
 2020-06-25 15:40:12.995357 | POINT(311207.272727273 6583507.72727273) |       110 | Side-location conflict: new edge starts in face 0 and ends in face 77547                | Will not do retry 
 2020-06-25 15:40:12.995357 | POINT(311304.0625 6583523.4375)          |       160 | Corrupted topology: adjacent edges -108107 and 233064 bind different face (77547 and 0) | Will not do retry 
 2020-06-25 15:40:12.995357 | POINT(311160.344827586 6583606.89655172) |       580 | Side-location conflict: new edge starts in face 0 and ends in face 77878                | Will not do retry 
 2020-06-25 15:40:13.005192 | POINT(166661.25 7013388.75)              |        80 | Side-location conflict: new edge starts in face 0 and ends in face 77903                | Will not do retry 
 2020-06-25 15:40:13.005192 | POINT(166646.015625 7007314.296875)      |       640 | Corrupted topology: adjacent edges 233108 and 20482 bind different face (0 and 77903)   | Will not do retry 
 2020-06-25 15:40:13.005192 | POINT(166633.676470588 7013287.35294118) |       680 | Side-location conflict: new edge starts in face 77915 and ends in face 0                | Will not do retry 
 2020-06-25 15:40:13.005192 | POINT(166490.52154195 7006935.88435374)  |      4410 | Side-location conflict: new edge starts in face 0 and ends in face 77935                | Will not do retry 
 2020-06-25 15:45:04.651391 | POINT(164900.423728814 7006108.47457627) |       590 | Side-location conflict: new edge starts in face 0 and ends in face 77935                | Will not do retry 
 2020-06-25 15:45:04.651391 | POINT(164816.368421053 7006275.89473684) |       950 | Side-location conflict: new edge starts in face 0 and ends in face 77935                | Will not do retry 
 2020-06-25 15:48:47.785503 | POINT(310123.03030303 6574292.12121212)  |       660 | Side-location conflict: new edge starts in face 78437 and ends in face 0                | Will not do retry 
 2020-06-25 15:48:47.785503 | POINT(310411.647058824 6574375.35294118) |       850 | Side-location conflict: new edge starts in face 0 and ends in face 78437                | Will not do retry 
 2020-06-25 15:52:19.051995 | POINT(173677.53968254 7006084.36507936)  |      1260 | Side-location conflict: new edge starts in face 0 and ends in face 78339                | Will not do retry 
 2020-06-25 15:52:19.051995 | POINT(173196.823671498 7006038.36956522) |      4140 | Side-location conflict: new edge starts in face 78339 and ends in face 0                | Will not do retry 

@strk
Copy link
Collaborator Author

strk commented Jun 26, 2020 via email

@strk
Copy link
Collaborator Author

strk commented Jun 26, 2020

Both the "Side-location conflict" and the (obviously) "Corrupted topology" errors indicate an invalid topology. Is this happening against a topology that passes the ValidateTopology test ? If so, there are two possibilities:

  1. ValidateTopology is failing to catch invalidities (possible, we could take a lookg with QGIS TopologyViewer)
  2. The invalidity is introduced by the single call to topology.TopoGeo_addLinestring

To determine both occurrences it would help to receive a dump of the starting topology.
If possible, reducing it as much as possible while keeping this behaviour.
How big would a full dump of this starting database be ? Make sure to drop any unneeded table (we only need the test_topo_ar50_t7 schema and the topology.topology and topology.layer entries)

@strk
Copy link
Collaborator Author

strk commented Jun 26, 2020

Per point 1 above (ValidateTopology failing to catch invalidities) see https://trac.osgeo.org/postgis/ticket/3042

@larsop
Copy link
Owner

larsop commented Jun 27, 2020

Given we're using small geometries, can you ensure the second "thread" (process) is really started before the first one commits its transaction ? Can add a pg_sleep() be added as part as the "same statement" ?

Yes, you where right adding a sleep made the test fail and we git this result.
(Running in one thread no errors)

?column? | error | id1 | id2
------------+------------------+-----+-----
validation | face within face | 1 | 2
validation | face within face | 2 | 1
validation | face within face | 1 | 3
validation | face within face | 2 | 3


CREATE OR REPLACE FUNCTION add_border_line_sleep(
_topology_name character varying, 
_new_line_raw geometry,
_snap_tolerance float)
  RETURNS integer[]
  AS $$
DECLARE
edges_added integer[];
command_string varchar;

BEGIN
  command_string := Format('SELECT ARRAY(SELECT topology.TopoGeo_addLinestring(%L,%L,%s))', 
  _topology_name, _new_line_raw, _snap_tolerance);
  EXECUTE command_string into edges_added;
  perform pg_sleep(1);
  RETURN edges_added;
END;

$$
LANGUAGE plpgsql;

select topology.CreateTopology ('test_topo_x2',25833,1.0);

-- Add F1/E1
select topology.TopoGeo_addLinestring('test_topo_x2', ST_SetSRID(ST_LineMerge('0105000020E9640000020000000102000000070000009ED5973A3CFA1141623BBDC97A90594110CE42F567F31141313BA66F6E905941AD63DB8F45CF11419D1E9F337D9359419F0F59AE92E711412D28DB46F9965941F4253E3202241241514E1E96669759415790A59724481241E56A25D25794594165E42779D72F12415561E9BEDB90594101020000000200000065E42779D72F12415561E9BEDB905941EA7D34089A241241D1087F6AC7905941'),25833),0.75); 

-- Add E2
select topology.TopoGeo_addLinestring('test_topo_x2', ST_SetSRID(ST_LineMerge('0102000020E9640000070000006E371694CB1D1241484CE8047F9259412CE6E0108E0E12413423164D09925941BF251729900012412C951E9EA192594194B682C4CF0112413630F9A6AF935941D607B8470D1112414A59CB5E2594594143C8812F0B1F124152E7C20D8D9359416E371694CB1D1241484CE8047F925941'),25833),0.75); 

DO
$body$
DECLARE 
stmts text[];	
topology_name character varying = 'test_topo_x2';
_snap_tolerance float = 0.75;
command_string varchar;
e3 geometry;
e4 geometry;

BEGIN

e3 = ST_SetSRID(ST_LineMerge('0102000020E964000002000000EA7D34089A241241D1087F6AC79059419ED5973A3CFA1141623BBDC97A905941'::Geometry),25833);
e4 = ST_SetSRID(ST_LineMerge('0102000020E9640000040000009ED5973A3CFA1141623BBDC97A905941E505D8A6E40E1241A75E8DC5AF8D5941A90EC3C7BE29124166D7BEBEE28D5941EA7D34089A241241D1087F6AC7905941'::Geometry),25833);

-- Add E3
stmts[1] = Format('select add_border_line_sleep(%L,%L,%s)',topology_name,e3,_snap_tolerance);
-- Add E4
stmts[2] = Format('select add_border_line_sleep(%L,%L,%s)',topology_name,e4,_snap_tolerance);

perform execute_parallel(stmts::text[],2,true);

 
END
$body$;

SELECT 'validation', * FROM topology.ValidateTopology('test_topo_x2');

@larsop
Copy link
Owner

larsop commented Jun 28, 2020

Both the "Side-location conflict" and the (obviously) "Corrupted topology" errors indicate an invalid topology. Is this happening against a topology that passes the ValidateTopology test ? If so, there are two possibilities:

  1. ValidateTopology is failing to catch invalidities (possible, we could take a lookg with QGIS TopologyViewer)
  2. The invalidity is introduced by the single call to topology.TopoGeo_addLinestring

To determine both occurrences it would help to receive a dump of the starting topology.
If possible, reducing it as much as possible while keeping this behaviour.

I have now change code to stop after exception and if the number of Topology error are not zero in this commit if setting the parameter for this.

This code is running now. The validation is taking log time so I only do validation if get an exception. I will try to dump the database to you as soon as it stops

How big would a full dump of this starting database be ? Make sure to drop any unneeded table (we only need the test_topo_ar50_t7 schema and the topology.topology and topology.layer entries)

The input simple data has this size,

table_schema | table_name | total_size | data_size | external_size
-------------------+-------------------------------------------+------------+-----------+---------------
sl_esh | ar50_utvikling_flate | 97 MB | 82 MB | 16 MB

@larsop
Copy link
Owner

larsop commented Jun 28, 2020

No I got a exception and invalid topology when running the test below.
Dumps uploaded to ftp.

time psql  -h vroom2.ad.skogoglandskap.no -U postgres resolve_cha -c "CALL resolve_overlap_gap_run(
('sl_esh.ar50_utvikling_flate','sl_sdeid','geo',25833,true),
('test_topo_ar50_t7',1.0),
resolve_overlap_data_clean_type_func(300,9,500,3,140,120,240,35,325),
28,1000,false)"

NOTICE:  New stmt (CALL resolve_overlap_gap_single_cell(
  'sl_esh.ar50_utvikling_flate','geo','sl_sdeid','test_topo_ar50_t7.ar50_utvikling_flate',
  'test_topo_ar50_t7',1,25833,'true',
  '(300,9,500,3,140,120,240,3,35)',
  'test_topo_ar50_t7.ar50_utvikling_flate_job_list','test_topo_ar50_t7.ar50_utvikling_flate_grid','0103000020E96400000100000005000000000000004069F8400000000020F05941000000004069F84000000080E0825A4100000000C06AF84000000080E0825A4100000000C06AF8400000000020F05941000000004069F8400000000020F05941',3,42);) on connection conn14
NOTICE:  196 statements to execute in 28 threads, done with 154 , failed num 1
NOTICE:  Failed run execute_parallel cell_job_type: 3 , in loop_number 42, state  : P0001 message: Num ok stats 153 raise_error Failed get value for stmt: 'CALL resolve_overlap_gap_single_cell(
  ''sl_esh.ar50_utvikling_flate'',''geo'',''sl_sdeid'',''test_topo_ar50_t7.ar50_utvikling_flate'',
  ''test_topo_ar50_t7'',1,25833,''true'',
  ''(300,9,500,3,140,120,240,3,35)'',
  ''test_topo_ar50_t7.ar50_utvikling_flate_job_list'',''test_topo_ar50_t7.ar50_utvikling_flate_grid'',''0103000020E9640000010000000500000000000000AED204410000006863D55A4100000000AED204410000006869D55A4100000000FD4D05410000006869D55A4100000000FD4D05410000006863D55A4100000000AED204410000006863D55A41'',3,42);' , using conn 'conn14', state  : '40P01' message: 'deadlock detected' detail : 'Process 13391 waits for ShareLock on transaction 341392380; blocked by process 13291.
Process 13291 waits for ShareLock on transaction 341392843; blocked by process 13391.' hint : 'See server log for query details.' context: 'while locking tuple (317,33) in relation "face"
SQL statement "SELECT count(r.*) FROM 
                               (
                                 SELECT f.* 
                                   FROM test_topo_ar50_t7.face f 
                                 where ST_DWithin(''0106000020E9640000010000000103000000010000003B00000000000000D0800541000000C012D25A4100000000D0800541000000C0C6D25A4100000000A8670541000000C0C6D25A4100000000A8670541000000E07DD35A4100000000885B0541000000E07DD35A4100000000885B05410000004073D35A4100000000D85505410000004073D35A4100000000D8550541000000E07DD35A4100000000C44C0541000000E07DD35A4100000000C44C0541000000C05CD35A4100000000484B0541000000C05CD35A4100000000484B0541000000C057D35A4100000000E83B0541000000C057D35A4100000000E83B0541000000EE8BD35A410000000038410541000000EE8BD35A410000000038410541000000C096D35A4100000000B8480541000000C096D35A4100000000B8480541000000C0FCD35A410000008098480541000000C0FCD35A410000008098480541000000BCF6D35A4100000000AC3B0541000000BCF6D35A4100000000AC3B0541000000C04AD45A41000000801B420541000000C04AD45A41000000801B420541000000C06AD45A4100000000063C0541000000C06AD45A4100000000063C054100000040FCD45A4100000040A741054100000040FCD45A4100000040A7410541000000C0FED45A41000000801B420541000000C0FED45A41000000801B420541000000C076D55A410000000048450541000000C076D55A41000000004845054100000040B0D55A41000000009845054100000040B0D55A410000000098450541000000C0B2D55A410000000058590541000000C0B2D55A410000000058590541000000C07BD55A410000000088600541000000C07BD55A410000000088600541000000C076D55A410000000078610541000000C076D55A410000000078610541000000C067D55A4100000000186B0541000000C067D55A4100000000186B0541000000C06CD55A410000000008710541000000C06CD55A410000000008710541000000C0ADD55A4100000000587B0541000000C0ADD55A4100000000587B054100000050EBD55A4100000080937E054100000050EBD55A4100000080937E0541000000C0EDD65A41000000800DBA0541000000C0EDD65A41000000800DBA0541000000C00AD55A4100000000B8B20541000000C00AD55A4100000000B8B205410000004072D45A410000000048AF05410000004072D45A410000000048AF054100000040E9D35A4100000000D8B0054100000040E9D35A4100000000D8B00541000000C0D7D35A4100000000F8CC0541000000C0D7D35A4100000000F8CC0541000000C012D25A4100000000D0800541000000C012D25A41'',f.mbr,1)
                                 for update
                               ) as r;"
PL/pgSQL function resolve_overlap_gap_single_cell(character varying,character varying,character varying,character varying,character varying,double precision,integer,boolean,resolve_overlap_data_clean_type,character varying,character varying,geometry,integer,integer) line 577 at EXECUTE
while executing query on dblink connection named "conn14"
SQL statement "select val           from dblink_get_result(conntions_array[i]) as d(val text)"
PL/pgSQL function execute_parallel(text[],integer,boolean,text,boolean) line 118 at SQL statement
SQL statement "SELECT execute_parallel (stmts_final, _max_parallel_jobs,true,null,_contiune_after_stat_exception)"
PL/pgSQL function resolve_overlap_gap_run(resolve_overlap_data_input_type,resolve_overlap_data_topology_type,resolve_overlap_data_clean_type,integer,integer,boolean) line 148 at SQL statement' detail :  hint : An error happend in one the statemnet, please chem them  context: PL/pgSQL function execute_parallel(text[],integer,boolean,text,boolean) line 200 at RAISE
SQL statement "SELECT execute_parallel (stmts_final, _max_parallel_jobs,true,null,_contiune_after_stat_exception)"
PL/pgSQL function resolve_overlap_gap_run(resolve_overlap_data_input_type,resolve_overlap_data_topology_type,resolve_overlap_data_clean_type,integer,integer,boolean) line 148 at SQL statement
NOTICE:  Start to ValidateTopology for cell_job_type 3 at loop_number 42 running SELECT count(*) FROM topology.ValidateTopology('test_topo_ar50_t7') 
NOTICE:  Corrupted topology: face 93663 could not be constructed only from edges knowing about it (like edge 252484).
NOTICE:  Corrupted topology: face 93677 could not be constructed only from edges knowing about it (like edge 252497).
NOTICE:  Corrupted topology: face 93675 could not be constructed only from edges knowing about it (like edge 252497).
NOTICE:  Corrupted topology: face 93682 could not be constructed only from edges knowing about it (like edge 252462).
NOTICE:  Corrupted topology: face 93686 could not be constructed only from edges knowing about it (like edge 252491).
NOTICE:  Corrupted topology: face 93700 could not be constructed only from edges knowing about it (like edge 252505).
NOTICE:  Corrupted topology: face 93702 could not be constructed only from edges knowing about it (like edge 252529).
NOTICE:  Corrupted topology: face 93712 could not be constructed only from edges knowing about it (like edge 252540).
.
.
NOTICE:  Found 115 errors when ValidateTopology for cell_job_type 3 at loop_number 42 for topology test_topo_ar50_t7 in 943.350402 secs
ERROR:  Failed run execute_parallel cell_job_type and error found : 3 , in loop_number 42, state  : P0001 message: Num ok stats 153 raise_error Failed get value for stmt: 'CALL resolve_overlap_gap_single_cell(
  ''sl_esh.ar50_utvikling_flate'',''geo'',''sl_sdeid'',''test_topo_ar50_t7.ar50_utvikling_flate'',
  ''test_topo_ar50_t7'',1,25833,''true'',
  ''(300,9,500,3,140,120,240,3,35)'',
  ''test_topo_ar50_t7.ar50_utvikling_flate_job_list'',''test_topo_ar50_t7.ar50_utvikling_flate_grid'',''0103000020E9640000010000000500000000000000AED204410000006863D55A4100000000AED204410000006869D55A4100000000FD4D05410000006869D55A4100000000FD4D05410000006863D55A4100000000AED204410000006863D55A41'',3,42);' , using conn 'conn14', state  : '40P01' message: 'deadlock detected' detail : 'Process 13391 waits for ShareLock on transaction 341392380; blocked by process 13291.
Process 13291 waits for ShareLock on transaction 341392843; blocked by process 13391.' hint : 'See server log for query details.' context: 'while locking tuple (317,33) in relation "face"
SQL statement "SELECT count(r.*) FROM 
                               (
                                 SELECT f.* 
                                   FROM test_topo_ar50_t7.face f 
                                 where ST_DWithin(''0106000020E9640000010000000103000000010000003B00000000000000D0800541000000C012D25A4100000000D0800541000000C0C6D25A4100000000A8670541000000C0C6D25A4100000000A8670541000000E07DD35A4100000000885B0541000000E07DD35A4100000000885B05410000004073D35A4100000000D85505410000004073D35A4100000000D8550541000000E07DD35A4100000000C44C0541000000E07DD35A4100000000C44C0541000000C05CD35A4100000000484B0541000000C05CD35A4100000000484B0541000000C057D35A4100000000E83B0541000000C057D35A4100000000E83B0541000000EE8BD35A410000000038410541000000EE8BD35A410000000038410541000000C096D35A4100000000B8480541000000C096D35A4100000000B8480541000000C0FCD35A410000008098480541000000C0FCD35A410000008098480541000000BCF6D35A4100000000AC3B0541000000BCF6D35A4100000000AC3B0541000000C04AD45A41000000801B420541000000C04AD45A41000000801B420541000000C06AD45A4100000000063C0541000000C06AD45A4100000000063C054100000040FCD45A4100000040A741054100000040FCD45A4100000040A7410541000000C0FED45A41000000801B420541000000C0FED45A41000000801B420541000000C076D55A410000000048450541000000C076D55A41000000004845054100000040B0D55A41000000009845054100000040B0D55A410000000098450541000000C0B2D55A410000000058590541000000C0B2D55A410000000058590541000000C07BD55A410000000088600541000000C07BD55A410000000088600541000000C076D55A410000000078610541000000C076D55A410000000078610541000000C067D55A4100000000186B0541000000C067D55A4100000000186B0541000000C06CD55A410000000008710541000000C06CD55A410000000008710541000000C0ADD55A4100000000587B0541000000C0ADD55A4100000000587B054100000050EBD55A4100000080937E054100000050EBD55A4100000080937E0541000000C0EDD65A41000000800DBA0541000000C0EDD65A41000000800DBA0541000000C00AD55A4100000000B8B20541000000C00AD55A4100000000B8B205410000004072D45A410000000048AF05410000004072D45A410000000048AF054100000040E9D35A4100000000D8B0054100000040E9D35A4100000000D8B00541000000C0D7D35A4100000000F8CC0541000000C0D7D35A4100000000F8CC0541000000C012D25A4100000000D0800541000000C012D25A41'',f.mbr,1)
                                 for update
                               ) as r;"
PL/pgSQL function resolve_overlap_gap_single_cell(character varying,character varying,character varying,character varying,character varying,double precision,integer,boolean,resolve_overlap_data_clean_type,character varying,character varying,geometry,integer,integer) line 577 at EXECUTE
while executing query on dblink connection named "conn14"
SQL statement "select val           from dblink_get_result(conntions_array[i]) as d(val text)"
PL/pgSQL function execute_parallel(text[],integer,boolean,text,boolean) line 118 at SQL statement
SQL statement "SELECT execute_parallel (stmts_final, _max_parallel_jobs,true,null,_contiune_after_stat_exception)"
PL/pgSQL function resolve_overlap_gap_run(resolve_overlap_data_input_type,resolve_overlap_data_topology_type,resolve_overlap_data_clean_type,integer,integer,boolean) line 148 at SQL statement' detail :  hint : An error happend in one the statemnet, please chem them  context: PL/pgSQL function execute_parallel(text[],integer,boolean,text,boolean) line 200 at RAISE
SQL statement "SELECT execute_parallel (stmts_final, _max_parallel_jobs,true,null,_contiune_after_stat_exception)"
PL/pgSQL function resolve_overlap_gap_run(resolve_overlap_data_input_type,resolve_overlap_data_topology_type,resolve_overlap_data_clean_type,integer,integer,boolean) line 148 at SQL statement
CONTEXT:  PL/pgSQL function resolve_overlap_gap_run(resolve_overlap_data_input_type,resolve_overlap_data_topology_type,resolve_overlap_data_clean_type,integer,integer,boolean) line 168 at RAISE

real    160m29.104s
user    0m0.193s
sys     0m0.319s

@larsop
Copy link
Owner

larsop commented Jun 29, 2020

In the table test_topo_ar50_t7.ar50_utvikling_flate_no_cut_line_failed there is one entry with this error.

d_msg         | deadlock detected
d_detail      | Process 11884 waits for ShareLock on transaction 341384924; blocked by process 11881.                                                                                      

SQL statement "WITH newedges(edge_id,right_face) AS ( VALUES (178910,88178),(178602,88178),(179196,88178),(174575,88178),(176337,88178),(182525,88178),(175081,88178),(183079,88178),(181430,88178),(183192,88178),(178971,88178),(174243,88178),(183206,88178),(183246,88178),(176113,88178),(177920,88178),(179649,88178),(183476,88178),(176272,88178),(176459,88178)......................
 UPDATE "test_topo_ar50_t7".edge_data e SET right_face = o.right_face FROM newedges o WHERE e.edge_id = o.edge_id"+

@strk
Copy link
Collaborator Author

strk commented Jun 29, 2020 via email

@larsop
Copy link
Owner

larsop commented Jul 6, 2020

I did a test now on ar50 data_set

CALL resolve_overlap_gap_run(
('sl_esh.ar50_utvikling_flate','sl_sdeid','geo',25833,true),
('test_topo_ar50_t7',1.0),
resolve_overlap_data_clean_type_func(300,9,500,3,140,120,240,35,325),
30,1000,true,false)"

and the ' select * from topology.ValidateTopology('test_topo_ar50_t7')' returned no rows.

after extend the blocking area.

I will continue to do more checking here, just confirm that I still have parallel queries running at stage 3.

@strk
Copy link
Collaborator Author

strk commented Jul 6, 2020

Please remember that SOME invalidities are currently NOT caught by ValidateTopology: https://trac.osgeo.org/postgis/ticket/3042
I think that ticket should be worked on. It might take a full day to implement the edge-linking checks.

@larsop
Copy link
Owner

larsop commented Jul 7, 2020

Sad to say we did get invalid topologies when running job_type 3 in multithreaded mode after a couple of mere runs.
Running step 3 in a single thread did not produce any error after a couple of times.

@larsop
Copy link
Owner

larsop commented Jul 9, 2020

I have done some more testing on layer that have two areas that are not spatially connected. When adjusting the blocking area so it's always cover all off one of the spatial areas, we don't get any topo errors. This is tested 3 times now.

So the problem is find the correct blocking area , I will do some more testing on this.

@larsop
Copy link
Owner

larsop commented Jul 9, 2020

I am now trying to compute blocking area based the input data.
1)I find all input polygons that is less than the tolerance distance from the lines to add and make envelope for this results
2) Do same as in 1 , but with the result from 1 as input.

@larsop
Copy link
Owner

larsop commented Jul 10, 2020

I have been running 5 times with this and I have not seen any topo error yet.

I also did a test with no row locking and only spatial blocking then I got 5 topo errors.

So the combination of row level and spatial blocking seems to work now.

Sandro has had two comments and I will check out then now

  1. Do row level blocking in one transaction
  2. Extend row level locking include this

I will also try to a test with only row level locking to see how that works out.

@larsop
Copy link
Owner

larsop commented Jul 10, 2020

Sandro has had two comments and I will check out then now

  1. Do row level blocking in one transaction
  2. Extend row level locking include this

Here I have tried resolve case 1 og 2. And I have also reduced the spatial blocking area.

Row level locking in Postgres seems to be working very well if I lock many thousand rows og 30 threads in parallel.

It takes some hours to test this, because just validate it takes more than one hour.

@larsop
Copy link
Owner

larsop commented Jul 10, 2020

Given we're using small geometries, can you ensure the second "thread" (process) is really started before the first one commits its transaction ? Can add a pg_sleep() be added as part as the "same statement" ?

Yes, you where right adding a sleep made the test fail and we git this result.
(Running in one thread no errors)

?column? | error | id1 | id2
------------+------------------+-----+-----
validation | face within face | 1 | 2
validation | face within face | 2 | 1
validation | face within face | 1 | 3
validation | face within face | 2 | 3

In this test_add_line_do_row_level_lock.txt file I have done row level like here and the topo errors we had are gone.

@strk
Copy link
Collaborator Author

strk commented Jul 10, 2020 via email

@larsop
Copy link
Owner

larsop commented Jul 10, 2020

It would be great to attach the testcase to ticket https://trac.osgeo.org/postgis/ticket/4684 One day we might implement row-level locking in PostGIS Topology itself...

Sure, I will do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants