Segfault /paging issue for DBs with big buffers #742

Open
mybyte opened this issue May 17, 2018 · 17 comments

Comments

@mybyte

mybyte commented May 17, 2018

When setting the buffers to high numbers, virtuoso crashes with a segfault:

NumberOfBuffers          =85000000
MaxDirtyBuffers          = 65000000

It seems to crash around 670 GB of virtual memory.
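
For a rough sense of scale, the buffer pool implied by these settings can be estimated. This is a minimal sketch that assumes an 8 KB database page per buffer and ignores any per-buffer bookkeeping overhead; the function name is only illustrative.

```python
PAGE_SIZE_BYTES = 8 * 1024  # assumption: one 8 KB database page per buffer


def buffer_pool_bytes(number_of_buffers: int, page_size: int = PAGE_SIZE_BYTES) -> int:
    """Estimate the memory needed for the page buffers alone."""
    return number_of_buffers * page_size


pool = buffer_pool_bytes(85_000_000)  # NumberOfBuffers from the settings above
print(f"{pool / 1e9:.1f} GB ({pool / 2**30:.1f} GiB)")  # prints "696.3 GB (648.5 GiB)"
```

Under these assumptions the estimate falls in the same range as the roughly 670 GB of virtual memory at which the crash is observed.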

Thu May 17 2018
14:05:57 OpenLink Virtuoso Universal Server
14:05:57 Version 07.20.3217-pthreads for Linux as of Jul  3 2017
14:05:57 uses parts of OpenSSL, PCRE, Html Tidy
14:08:56 SQL Optimizer enabled (max 1000 layouts)
14:08:59 Row ref 256 is out of range 4
14:08:59 ./virtuoso-t() [0x8adc23]
14:08:59 ./virtuoso-t() [0x8adc88]
14:08:59 ./virtuoso-t() [0x4dad0e]
14:08:59 ./virtuoso-t() [0x4dcc59]
14:08:59 ./virtuoso-t() [0x4fa7e9]
14:08:59 ./virtuoso-t() [0x4fdfea]
14:08:59 ./virtuoso-t() [0x622255]
14:08:59 ./virtuoso-t() [0x5b8277]
14:08:59 ./virtuoso-t() [0x5bda58]
14:08:59 ./virtuoso-t() [0x5b6ee3]
14:08:59 ./virtuoso-t() [0x5b7160]
14:08:59 ./virtuoso-t() [0x5ec9bd]
14:08:59 ./virtuoso-t() [0x5b6ee3]
14:08:59 ./virtuoso-t() [0x5c1203]
14:08:59 ./virtuoso-t() [0x5c2692]
14:08:59 ./virtuoso-t() [0x4ca00c]
14:08:59 ./virtuoso-t() [0x4ca789]
14:08:59 ./virtuoso-t() [0x4683a4]
14:08:59 ./virtuoso-t() [0x46fb04]
14:08:59 ./virtuoso-t() [0x470008]
14:08:59 ./virtuoso-t() [0x5bb9be]
14:08:59 ./virtuoso-t() [0x5bc17c]
14:08:59 ./virtuoso-t() [0x5b6ee3]
14:08:59 ./virtuoso-t() [0x5c1203]
14:08:59 ./virtuoso-t() [0x5c24b9]
14:08:59 ./virtuoso-t() [0x46c0ef]
14:08:59 ./virtuoso-t() [0x47030a]
14:08:59 ./virtuoso-t() [0x5c6a57]
14:08:59 ./virtuoso-t() [0x40e4b0]
14:08:59 /lib64/libc.so.6(__libc_start_main+0xf1) [0x7f01d7424401]
14:08:59 ./virtuoso-t() [0x40ea2a]
14:08:59 GPF: page.c:175 prefix row ref out of pm range
GPF: page.c:175 prefix row ref out of pm range
Segmentation fault

Tested with different builds (commits from beginning of 2017 up until latest) of VOS. On different machines (RHEL, Fedora). RHEL builds seem to be less verbose about the cause, but segfaults (and memory consumption at which they occur) seem to be consistent. Completely reproducible with any fresh db.

Configuration:

[Database]
DatabaseFile			= /mnt/ssd/virttest/virtuoso.db
ErrorLogFile			= /mnt/ssd/virttest/virtuoso.log
LockFile			= /mnt/ssd/virttest/virtuoso.lck
TransactionFile			= /mnt/ssd/virttest/virtuoso.trx
xa_persistent_file		= /mnt/ssd/virttest/virtuoso.pxa
ErrorLogLevel			= 7
FileExtend			= 200
MaxCheckpointRemap		= 15000000
Striping			= 0
TempStorage			= TempDatabase
TransactionAfterImageLimit = 99999999

[TempDatabase]
DatabaseFile			= /mnt/ssd/virttest/virtuoso-temp.db
TransactionFile			= /mnt/ssd/virttest/virtuoso-temp.trx
MaxCheckpointRemap		= 2000
Striping			= 0

[Parameters]
TransactionAfterImageLimit = 99999999
ServerPort			= 1111
LiteMode			= 0
DisableUnixSocket		= 1
DisableTcpSocket		= 0
MaxClientConnections		= 150
CheckpointInterval		= -1
O_DIRECT			= 0
CaseMode			= 2
MaxStaticCursorRows		= 5000
CheckpointAuditTrail		= 0
AllowOSCalls			= 0
SchedulerInterval		= 10
DirsAllowed			= .., /usr/local/virtuoso-opensource/share/virtuoso/vad, /mnt/datengrab, /mnt/ssd/rdfData, /home/kati/workspace/WCA_Services/geodata, /home/kati/workspace/WCA_Services, /mnt/ssd/serviceData/triple, /home/kati/Kati/Services, /mnt/ssd/wostriple/, /home/kati/workspace/WCA_Services/dumps, /mnt/ssd/serviceData/dumps, /mnt/ssd/serviceData/geodata
ThreadCleanupInterval		= 0
ThreadThreshold			= 10
ResourcesCleanupInterval	= 0
FreeTextBatchSize		= 100000
SingleCPU			= 0
VADInstallDir			= /usr/local/virtuoso-opensource/share/virtuoso/vad/
PrefixResultNames               = 0
RdfFreeTextRulesSize		= 100
IndexTreeMaps			= 1024
MaxMemPoolSize                  = 200000000
PrefixResultNames               = 0
MacSpotlight                    = 0
IndexTreeMaps                   = 64
MaxQueryMem 		 	= 20G		; memory allocated to query processor
VectorSize 		 	= 15000		; initial parallel query vector (array of query operations) size
MaxVectorSize 		 	= 1000000	; query vector size threshold.
AdjustVectorSize 	 	= 1
ThreadsPerQuery 	 	= 70
AsyncQueueMaxThreads 	 	= 65

NumberOfBuffers          =85000000
MaxDirtyBuffers          = 65000000

[HTTPServer]
ServerPort			= 8890
ServerRoot			= /usr/local/virtuoso-opensource/var/lib/virtuoso/vsp
MaxClientConnections		= 150
DavRoot				= DAV
EnabledDavVSP			= 0
HTTPProxyEnabled		= 0
TempASPXDir			= 0
DefaultMailServer		= localhost:25
ServerThreads			= 64
MaxKeepAlives			= 10
KeepAliveTimeout		= 10
MaxCachedProxyConnections	= 10
ProxyConnectionCacheTimeout	= 15
HTTPThreadSize			= 280000
HttpPrintWarningsInOutput	= 0
Charset				= UTF-8
MaintenancePage             	= atomic.html
EnabledGzipContent          	= 1


[AutoRepair]
BadParentLinks			= 0

[Client]
SQL_PREFETCH_ROWS		= 100
SQL_PREFETCH_BYTES		= 16000
SQL_QUERY_TIMEOUT		= 0
SQL_TXN_TIMEOUT			= 0

[VDB]
ArrayOptimization		= 0
NumArrayParameters		= 10
VDBDisconnectTimeout		= 1000
KeepConnectionOnFixedThread	= 0

[Replication]
ServerName			= db-KS2
ServerEnable			= 1
QueueMax			= 50000

[URIQA]
DynamicLocal			= 0
DefaultHost			= localhost:8890


[SPARQL]
ResultSetMaxRows           	= 1000000000
MaxQueryCostEstimationTime 	= 400000	; in seconds
MaxQueryExecutionTime      	= 86400	; in seconds
DefaultQuery               	= select distinct ?pub where {?pub a fhg:Publication} LIMIT 100
DeferInferenceRulesInit    	= 0  ; controls inference rules loading

[Plugins]
LoadPath			= /usr/local/virtuoso-opensource/lib/virtuoso/hosting
@ffritsche

ffritsche commented May 17, 2018

The Red Hat machine was not as verbose.

                Thu May 17 2018
14:30:17 INFO: OpenLink Virtuoso Universal Server
14:30:17 INFO: Version 07.20.3217-pthreads for Linux as of May 17 2018
14:30:17 INFO: uses parts of OpenSSL, PCRE, Html Tidy
14:32:19 INFO: Database version 3126
14:32:20 INFO: SQL Optimizer enabled (max 1000 layouts)
Segmentation fault (core dumped)

/var/log/messages:
kernel: traps: virtuoso-t[38243] general protection ip:ad36ce sp:7ffefe99e3e0 error:0 in virtuoso-t[400000+e19000]

gdb dump:

GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/virtuoso-opensource/bin/virtuoso-t...done.
[New LWP 38243]
[New LWP 38244]
[New LWP 38245]
[New LWP 38246]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./virtuoso-t -dfc /mnt/ssd/virttest/virtuoso.ini'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000ad36ce in malloc_ex (size=160, mem_pool=0x7fc61d93c000) at tlsf.c:592
592     tlsf.c: No such file or directory.
(gdb) bt
#0  0x0000000000ad36ce in malloc_ex (size=160, mem_pool=0x7fc61d93c000) at tlsf.c:592
#1  0x00000000004a07d1 in pm_get (buf=0x7fc686b49528, sz=75) at disk.c:2099
#2  0x00000000004f15f4 in pg_make_map (buf=0x7fc686b49528) at insert.c:157
#3  0x00000000004a0bdf in buf_disk_read (buf=0x7fc686b49528) at disk.c:2198
#4  0x00000000004b8cfd in page_wait_access (itc=0xa274119838, dp=66, buf_from=0x0, buf_ret=0x7ffefe99e6b0, mode=0, max_change=5) at gate.c:233
#5  0x00000000004bd294 in itc_reset (it=0xa274119838) at gate.c:1409
#6  0x0000000000515241 in isp_read_schema (lt=0xa273f67f40) at meta.c:2362
#7  0x000000000048fc34 in ddl_init_schema () at ddlrun.c:1780
#8  0x00000000006a2876 in srv_global_init (mode=0xb75888 "") at sqlsrv.c:4048
#9  0x0000000000411100 in main (argc=3, argv=0x2e82790) at viunix.c:677

Same virtuoso.ini as above:

Package Dependencies:

Package        Required version  Installed version
-------        ----------------  -----------------
autoconf       2.57              2.69
automake       1.9               1.13.4-3
libtool        1.5.16            2.4.2-22
flex           2.5.33            2.5.37
bison          2.3               3.0.4
gperf          2.7.2             3.0.4-8
gawk           3.1.1             4.0.2
m4             1.4.1             1.4.16
make           3.79.1            3.82
OpenSSL        0.9.7i            1.0.2k
openssl-devel  -                 1:1.0.2k-12.el7 (was required for compile)

htop while core dumping:
https://imgur.com/WwqBih6

I hope someone can help us

@TallTed
Collaborator

TallTed commented May 17, 2018

How much memory (both in-total and available-to-Virtuoso) is actually installed in your test environments?

Have you tweaked "swappiness"?

@ffritsche

The host machine has 1.5 TB of memory.
The dedicated VMware guest has 1100 GB of memory.

Swappiness is still at the default. It's worth a try, but the segfault always happens at 666 GB virtual / 333 GB resident.

@mybyte
Author

mybyte commented May 17, 2018

Yep. Same here. VMware guest with about 800 GB of memory. Virtuoso crashes at about 674 GB of virtual memory allocation, well before the machine runs out of memory or feels any need to swap.
If I had to guess, "Row ref 256" sounds too much like a magic number to ignore. Could it be that Virtuoso runs out of space for page pointers when allocating that many pages?

@ffritsche

free -m:

        total      used     free   shared  buff/cache  available
Mem:  1108481     85712   846630        9      176138    1020923
Swap:    4095         0     4095

@ffritsche

Today I compiled the latest git version and now get the same error as @mybyte at around 666 GB.

Fri Jul 06 2018
14:57:23 INFO: OpenLink Virtuoso Universal Server
14:57:23 INFO: Version 07.20.3217-pthreads for Linux as of Jul  6 2018
14:57:23 INFO: uses parts of OpenSSL, PCRE, Html Tidy
14:59:41 INFO: SQL Optimizer enabled (max 1000 layouts)
14:59:43 ERROR: Row ref 256 is out of range 4
14:59:43 INFO: ./virtuoso-t() [0x8c500a]
14:59:43 INFO: ./virtuoso-t() [0x8c5068]
14:59:43 INFO: ./virtuoso-t() [0x4df86f]
14:59:43 INFO: ./virtuoso-t() [0x4e17f8]
14:59:43 INFO: ./virtuoso-t() [0x500025]
14:59:43 INFO: ./virtuoso-t() [0x503a0a]
14:59:43 INFO: ./virtuoso-t() [0x62d3a5]
14:59:43 INFO: ./virtuoso-t() [0x5c2109]
14:59:43 INFO: ./virtuoso-t(table_source_input+0x270) [0x5c7920]
14:59:43 INFO: ./virtuoso-t() [0x5c0d7f]
14:59:43 INFO: ./virtuoso-t() [0x5c0ff0]
14:59:43 INFO: ./virtuoso-t() [0x5f76fd]
14:59:43 INFO: ./virtuoso-t() [0x5c0d7f]
14:59:43 INFO: ./virtuoso-t() [0x5cb211]
14:59:43 INFO: ./virtuoso-t() [0x5cc909]
14:59:43 INFO: ./virtuoso-t() [0x4ce5c9]
14:59:43 INFO: ./virtuoso-t() [0x4ced79]
14:59:43 INFO: ./virtuoso-t() [0x46a7f9]
14:59:43 INFO: ./virtuoso-t() [0x472677]
14:59:43 INFO: ./virtuoso-t() [0x472c44]
14:59:43 INFO: ./virtuoso-t() [0x5c5866]
14:59:43 INFO: ./virtuoso-t() [0x5c6020]
14:59:43 INFO: ./virtuoso-t() [0x5c0d7f]
14:59:43 INFO: ./virtuoso-t() [0x5cb211]
14:59:43 INFO: ./virtuoso-t() [0x5cc709]
14:59:43 INFO: ./virtuoso-t() [0x46e86f]
14:59:43 INFO: ./virtuoso-t() [0x472eaa]
14:59:43 INFO: ./virtuoso-t() [0x5d0d97]
14:59:43 INFO: ./virtuoso-t() [0x40f7c0]
14:59:43 INFO: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7ffb0cb433d5]
14:59:43 INFO: ./virtuoso-t() [0x40fd34]
14:59:43 ERROR: GPF: page.c:175 prefix row ref out of pm range
GPF: page.c:175 prefix row ref out of pm range
Segmentation fault

Is there nobody here who could help us?

@pkleef
Collaborator

pkleef commented Jul 10, 2018

@ffritsche Development is currently looking into this crash. I will advise as soon as we have a solution.

@ffritsche

Thanks a lot!

@ffritsche

An update from my side.
I installed the new 7.2.5.1 precompiled version on a fresh Redhat 7.5 Server with 1GB Ram and have the same problem.
Segmentation fault at around 666GB virtual size and 333GB resident size.

@TallTed
Collaborator

TallTed commented Aug 27, 2018

@ffritsche Just confirming a probable typo -- you said "a fresh Redhat 7.5 Server with 1GB Ram", where I think you meant "with 1TB RAM"?

@ffritsche

yeah 1TB
So much RAM ^^

@ffritsche

ffritsche commented Sep 10, 2018

I have another update.
Our DB has now been running for 1 month with a configuration sized for 600 GB of free memory.

NumberOfBuffers  = 51000000
MaxDirtyBuffers    = 39000000

With this configuration, Virtuoso starts at around 400 GB virtual size and 200 GB resident size.
After 1 month our DB is now at 747 GB virtual size and 721 GB resident size, so it is above 666 GB without a segfault.
The segfault only occurs on startup with more than 666 GB.
The other question is: why does it consume so much more memory with a configuration for only 600 GB?
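
To put the numbers in this update side by side, here is a rough sketch comparing the configured buffer pool with the observed virtual size. It assumes 8 KB per buffer with no per-buffer overhead, and it reads the reported 747 as gigabytes; both are assumptions rather than facts from the Virtuoso source.

```python
PAGE_SIZE_BYTES = 8 * 1024  # assumption: 8 KB per buffer, overhead ignored


def pool_gb(number_of_buffers: int) -> float:
    """Estimated size of the page buffers in decimal gigabytes."""
    return number_of_buffers * PAGE_SIZE_BYTES / 1e9


configured = pool_gb(51_000_000)  # NumberOfBuffers from this update
observed_virtual_gb = 747         # virtual size after one month (assumed to be GB)
beyond_pool = observed_virtual_gb - configured

print(f"buffer pool: {configured:.0f} GB")
print(f"virtual size beyond the pool: {beyond_pool:.0f} GB")
```

Under these assumptions the pool roughly accounts for the 400 GB seen at startup, so the later growth would have to come from allocations outside the buffer pool.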

I hope this helps

@HughWilliams
Collaborator

HughWilliams commented Sep 15, 2018

@ffritsche: Note the NumberOfBuffers INI file param only controls the amount of memory available for hosting the database working set; there are other INI file params and activities for which Virtuoso will allocate memory. See this spreadsheet which details the memory consumption based on INI file settings, for live services we and others host.

I also note that you are using a 07.20.3217 version binary which is rather old, as we have recently made a 07.20.3229 (a/k/a Open Source v7.2.5) stable/7 release with many memory consumption enhancements and fixes.

I also note that you have the following set in your INI file:

ThreadCleanupInterval      = 0
ResourcesCleanupInterval   = 0

which means unused threads and resources are never cleaned up, on the assumption that they will soon be reused. This can look like excessive memory consumption (or a memory leak). We suggest setting both to 1 to force cleanup.
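
A quick way to check an INI file for these two settings is sketched below using Python's standard `configparser`. The sample text and the helper name `disabled_cleanup_settings` are hypothetical; in practice the real virtuoso.ini would be read instead.

```python
import configparser

# Hypothetical excerpt; in practice read the real virtuoso.ini instead.
SAMPLE_INI = """
[Parameters]
ThreadCleanupInterval    = 0
ResourcesCleanupInterval = 0
MaxQueryMem              = 20G   ; inline comments use ';'
"""

CLEANUP_KEYS = ("ThreadCleanupInterval", "ResourcesCleanupInterval")


def disabled_cleanup_settings(ini_text: str) -> list[str]:
    """Return the cleanup settings in [Parameters] that are set to 0."""
    # strict=False tolerates keys repeated within a section, which occurs
    # in the virtuoso.ini posted in this thread.
    parser = configparser.ConfigParser(
        strict=False,
        inline_comment_prefixes=(";",),
        interpolation=None,
    )
    parser.optionxform = str  # keep the original key capitalisation
    parser.read_string(ini_text)
    params = parser["Parameters"] if parser.has_section("Parameters") else {}
    return [key for key in CLEANUP_KEYS if params.get(key, "").strip() == "0"]


print(disabled_cleanup_settings(SAMPLE_INI))
```

Any key the function reports is a candidate for changing to 1, as suggested above.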

@ffritsche

Update:
Today I tried the newest version/commit. The segfault still exists, and Virtuoso crashes at around 666 GB on startup.

@HughWilliams
Collaborator

Development suggests commenting out #define PM_TLSF 1 on line 38 of libsrc/Wi/wi.h, so that it reads:

#if defined(linux)
//#define PM_TLSF 1
#endif

then perform a full rebuild (make clean, make) and test to see if this resolves the problem.
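
If the change needs to be applied by a script rather than by hand (for example in a repeatable build), the edit can be sketched as below. The helper name `comment_out_define` is hypothetical, and the snippet only transforms text; reading and writing libsrc/Wi/wi.h and running the rebuild are left to the caller.

```python
import re


def comment_out_define(source: str, macro: str) -> str:
    """Prefix every active '#define <macro> ...' line with '//'."""
    pattern = re.compile(
        rf"^([ \t]*#[ \t]*define[ \t]+{re.escape(macro)}\b.*)$",
        re.MULTILINE,
    )
    return pattern.sub(r"//\1", source)


# Snippet as quoted in this thread; the real file is libsrc/Wi/wi.h.
snippet = "#if defined(linux)\n#define PM_TLSF 1\n#endif\n"
print(comment_out_define(snippet, "PM_TLSF"))
```

Lines that are already commented out do not match the pattern, so running the helper twice leaves the text unchanged.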

If the problem persists, please provide a gdb stack trace from the core file, as done previously, for review.

@ffritsche

@HughWilliams Thanks, that does the trick. Does this have any impact on the system?

@HughWilliams
Collaborator

HughWilliams commented Sep 10, 2019

There may be extra memory usage due to more memory fragmentation, thus this should be monitored.

You should also run the status(); command from isql and see how many of the allocated buffers are "used" while the application is running under full load. If there are many unused buffers, you should consider reducing NumberOfBuffers to release more memory for use by the OS.
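
Once the total and used buffer counts have been read from the status() report, the potential saving can be estimated along these lines. This is only a sketch: it assumes 8 KB per buffer, the function name is illustrative, and the figures in the example call are hypothetical rather than taken from a real report.

```python
PAGE_SIZE_BYTES = 8 * 1024  # assumption: 8 KB per buffer


def unused_buffer_report(total_buffers: int, used_buffers: int) -> tuple[float, float]:
    """Return (fraction of buffers in use, GB held by unused buffers)."""
    if total_buffers <= 0 or not 0 <= used_buffers <= total_buffers:
        raise ValueError("expected total_buffers > 0 and 0 <= used_buffers <= total_buffers")
    unused = total_buffers - used_buffers
    return used_buffers / total_buffers, unused * PAGE_SIZE_BYTES / 1e9


# Hypothetical figures, not taken from a real status() report.
in_use, idle_gb = unused_buffer_report(total_buffers=51_000_000, used_buffers=30_000_000)
print(f"{in_use:.0%} in use, about {idle_gb:.0f} GB idle")
```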
