Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

数据库运行一段时间出现无法连接的情况 #215

Closed
ContyChen opened this issue Dec 28, 2023 · 15 comments
Closed

数据库运行一段时间出现无法连接的情况 #215

ContyChen opened this issue Dec 28, 2023 · 15 comments

Comments

@ContyChen
Copy link

版本:5.2.1

数据库运行一段时间后无法连接,内存占用巨大
ERROR [ScheduleService-15] [2023-12-28T00:53:28,003] Failed to create session, sessionId: 5 org.lealone.common.exceptions.JdbcSQLException: Database "wms_cloud" not found [90013-1] at org.lealone.common.exceptions.DbException.getJdbcSQLException(DbException.java:342) ~[lealone-5.2.1.jar:?] at org.lealone.common.exceptions.DbException.get(DbException.java:173) ~[lealone-5.2.1.jar:?] at org.lealone.common.exceptions.DbException.get(DbException.java:150) ~[lealone-5.2.1.jar:?] at org.lealone.db.LealoneDatabase.getDatabase(LealoneDatabase.java:144) ~[lealone-5.2.1.jar:?] at org.lealone.db.session.ServerSessionFactory.createServerSession(ServerSessionFactory.java:49) ~[lealone-5.2.1.jar:?] at org.lealone.db.session.ServerSessionFactory.createSession(ServerSessionFactory.java:40) ~[lealone-5.2.1.jar:?] at org.lealone.db.ConnectionInfo.createSession(ConnectionInfo.java:652) ~[lealone-5.2.1.jar:?] at org.lealone.server.TcpServerConnection.createSession(TcpServerConnection.java:104) [lealone-5.2.1.jar:?] at org.lealone.server.SessionInitTask.run(SessionInitTask.java:26) [lealone-5.2.1.jar:?] at org.lealone.server.TcpServerConnection.readInitPacket(TcpServerConnection.java:96) [lealone-5.2.1.jar:?] at org.lealone.server.TcpServerConnection.handleRequest(TcpServerConnection.java:69) [lealone-5.2.1.jar:?] at org.lealone.net.TransferConnection.handle(TransferConnection.java:127) [lealone-5.2.1.jar:?] at org.lealone.net.nio.NioEventLoop.read(NioEventLoop.java:230) [lealone-5.2.1.jar:?] at org.lealone.server.Scheduler.handleSelectedKeys(Scheduler.java:422) [lealone-5.2.1.jar:?] at org.lealone.server.Scheduler.runEventLoop(Scheduler.java:408) [lealone-5.2.1.jar:?] at org.lealone.server.Scheduler.run(Scheduler.java:111) [lealone-5.2.1.jar:?] ERROR [ScheduleService-16] [2023-12-28T00:53:31,396] Failed to create session, sessionId: 3 org.lealone.common.exceptions.JdbcSQLException: Database "wms_cloud" not found [90013-1] at org.lealone.common.exceptions.DbException.getJdbcSQLException(DbException.java:342) ~[lealone-5.2.1.jar:?] at org.lealone.common.exceptions.DbException.get(DbException.java:173) ~[lealone-5.2.1.jar:?] at org.lealone.common.exceptions.DbException.get(DbException.java:150) ~[lealone-5.2.1.jar:?] at org.lealone.db.LealoneDatabase.getDatabase(LealoneDatabase.java:144) ~[lealone-5.2.1.jar:?] at org.lealone.db.session.ServerSessionFactory.createServerSession(ServerSessionFactory.java:49) ~[lealone-5.2.1.jar:?] at org.lealone.db.session.ServerSessionFactory.createSession(ServerSessionFactory.java:40) ~[lealone-5.2.1.jar:?] at org.lealone.db.ConnectionInfo.createSession(ConnectionInfo.java:652) ~[lealone-5.2.1.jar:?] at org.lealone.server.TcpServerConnection.createSession(TcpServerConnection.java:104) [lealone-5.2.1.jar:?] at org.lealone.server.SessionInitTask.run(SessionInitTask.java:26) [lealone-5.2.1.jar:?] at org.lealone.server.TcpServerConnection.readInitPacket(TcpServerConnection.java:96) [lealone-5.2.1.jar:?] at org.lealone.server.TcpServerConnection.handleRequest(TcpServerConnection.java:69) [lealone-5.2.1.jar:?] at org.lealone.net.TransferConnection.handle(TransferConnection.java:127) [lealone-5.2.1.jar:?] at org.lealone.net.nio.NioEventLoop.read(NioEventLoop.java:230) [lealone-5.2.1.jar:?] at org.lealone.server.Scheduler.handleSelectedKeys(Scheduler.java:422) [lealone-5.2.1.jar:?] at org.lealone.server.Scheduler.runEventLoop(Scheduler.java:408) [lealone-5.2.1.jar:?] at org.lealone.server.Scheduler.run(Scheduler.java:111) [lealone-5.2.1.jar:?]

@ContyChen
Copy link
Author

ContyChen commented Dec 28, 2023

db_0.trace.db有错误信息

2023-12-27 21:18:04 database: CREATE DATABASE IF NOT EXISTS WMS_CLOUD RUN MODE CLIENT_SERVER PARAMETERS(DATABASE_TO_UPPER='FALSE')
org.lealone.common.exceptions.JdbcSQLException: Out of memory.; SQL statement:
CREATE CACHED TABLE PUBLIC."sys_oper_log" COMMENT STRINGDECODE('\u64cd\u4f5c\u65e5\u5fd7\u8bb0\u5f55')(
    "oper_id" BIGINT NOT NULL SELECTIVITY 100 COMMENT STRINGDECODE('\u65e5\u5fd7\u4e3b\u952e'),
    "title" VARCHAR(50) DEFAULT '' SELECTIVITY 1 COMMENT STRINGDECODE('\u6a21\u5757\u6807\u9898'),
    "business_type" INT DEFAULT 0 SELECTIVITY 1 COMMENT STRINGDECODE('\u4e1a\u52a1\u7c7b\u578b\uff080\u5176\u5b83 1\u65b0\u589e 2\u4fee\u6539 3\u5220\u9664\uff09'),
    "method" VARCHAR(100) DEFAULT '' SELECTIVITY 2 COMMENT STRINGDECODE('\u65b9\u6cd5\u540d\u79f0'),
    "request_method" VARCHAR(10) DEFAULT '' SELECTIVITY 1 COMMENT STRINGDECODE('\u8bf7\u6c42\u65b9\u5f0f'),
    "operator_type" INT DEFAULT 0 SELECTIVITY 1 COMMENT STRINGDECODE('\u64cd\u4f5c\u7c7b\u522b\uff080\u5176\u5b83 1\u540e\u53f0\u7528\u6237 2\u624b\u673a\u7aef\u7528\u6237\uff09'),
    "oper_name" VARCHAR(50) DEFAULT '' SELECTIVITY 5 COMMENT STRINGDECODE('\u64cd\u4f5c\u4eba\u5458'),
    "dept_name" VARCHAR(50) DEFAULT '' SELECTIVITY 1 COMMENT STRINGDECODE('\u90e8\u95e8\u540d\u79f0'),
    "oper_url" VARCHAR(255) DEFAULT '' SELECTIVITY 2 COMMENT STRINGDECODE('\u8bf7\u6c42URL'),
    "oper_ip" VARCHAR(128) DEFAULT '' SELECTIVITY 4 COMMENT STRINGDECODE('\u4e3b\u673a\u5730\u5740'),
    "oper_location" VARCHAR(255) DEFAULT '' SELECTIVITY 1 COMMENT STRINGDECODE('\u64cd\u4f5c\u5730\u70b9'),
    "oper_param" VARCHAR(2000) DEFAULT '' SELECTIVITY 95 COMMENT STRINGDECODE('\u8bf7\u6c42\u53c2\u6570'),
    "json_result" VARCHAR(2000) DEFAULT '' SELECTIVITY 65 COMMENT STRINGDECODE('\u8fd4\u56de\u53c2\u6570'),
    "status" INT DEFAULT 0 SELECTIVITY 1 COMMENT STRINGDECODE('\u64cd\u4f5c\u72b6\u6001\uff080\u6b63\u5e38 1\u5f02\u5e38\uff09'),
    "error_msg" VARCHAR(2000) DEFAULT '' SELECTIVITY 1 COMMENT STRINGDECODE('\u9519\u8bef\u6d88\u606f'),
    "oper_time" DATETIME SELECTIVITY 100 COMMENT STRINGDECODE('\u64cd\u4f5c\u65f6\u95f4')
) [90108-1]
	at org.lealone.common.exceptions.DbException.getJdbcSQLException(DbException.java:342)
	at org.lealone.common.exceptions.DbException.get(DbException.java:162)
	at org.lealone.common.exceptions.DbException.convert(DbException.java:287)
	at org.lealone.sql.executor.YieldableBase.handleException(YieldableBase.java:171)
	at org.lealone.sql.executor.YieldableBase.run(YieldableBase.java:124)
	at org.lealone.sql.StatementBase.syncExecute(StatementBase.java:507)
	at org.lealone.sql.StatementBase.executeUpdate(StatementBase.java:536)
	at org.lealone.db.MetaRecord.execute(MetaRecord.java:69)
	at org.lealone.db.MetaRecord.execute(MetaRecord.java:60)
	at org.lealone.db.Database.openMetaTable(Database.java:535)
	at org.lealone.db.Database.openDatabase(Database.java:475)
	at org.lealone.db.Database.init(Database.java:414)
	at org.lealone.sql.ddl.CreateDatabase.update(CreateDatabase.java:62)
	at org.lealone.sql.executor.YieldableLocalUpdate.executeInternal(YieldableLocalUpdate.java:23)
	at org.lealone.sql.executor.YieldableBase.run(YieldableBase.java:115)
	at org.lealone.sql.StatementBase.syncExecute(StatementBase.java:507)
	at org.lealone.sql.StatementBase.executeUpdate(StatementBase.java:536)
	at org.lealone.db.MetaRecord.execute(MetaRecord.java:69)
	at org.lealone.db.LealoneDatabase.getDatabase(LealoneDatabase.java:137)
	at org.lealone.db.session.ServerSessionFactory.createServerSession(ServerSessionFactory.java:49)
	at org.lealone.db.session.ServerSessionFactory.createSession(ServerSessionFactory.java:40)
	at org.lealone.db.ConnectionInfo.createSession(ConnectionInfo.java:652)
	at org.lealone.server.TcpServerConnection.createSession(TcpServerConnection.java:104)
	at org.lealone.server.SessionInitTask.run(SessionInitTask.java:26)
	at org.lealone.server.TcpServerConnection.readInitPacket(TcpServerConnection.java:96)
	at org.lealone.server.TcpServerConnection.handleRequest(TcpServerConnection.java:69)
	at org.lealone.net.TransferConnection.handle(TransferConnection.java:127)
	at org.lealone.net.nio.NioEventLoop.read(NioEventLoop.java:230)
	at org.lealone.server.Scheduler.handleSelectedKeys(Scheduler.java:422)
	at org.lealone.server.Scheduler.runEventLoop(Scheduler.java:408)
	at org.lealone.server.Scheduler.run(Scheduler.java:111)
Caused by: java.lang.OutOfMemoryError: Cannot reserve 307836 bytes of direct buffer memory (allocated: 1073450399, limit: 1073741824)
	at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
	at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:121)
	at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:332)
	at java.base/sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:243)
	at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:293)
	at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:273)
	at java.base/sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:839)
	at java.base/sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:821)
	at org.lealone.storage.fs.impl.nio.FileNio.read(FileNio.java:60)
	at org.lealone.common.util.DataUtils.readFully(DataUtils.java:375)
	at org.lealone.storage.fs.FileStorage.readFully(FileStorage.java:470)
	at org.lealone.storage.aose.btree.chunk.Chunk.readPagePositions(Chunk.java:147)
	at org.lealone.storage.aose.btree.chunk.Chunk.read(Chunk.java:196)
	at org.lealone.storage.aose.btree.chunk.ChunkManager.readChunk(ChunkManager.java:119)
	at org.lealone.storage.aose.btree.chunk.ChunkManager.readLastChunk(ChunkManager.java:64)
	at org.lealone.storage.aose.btree.chunk.ChunkManager.init(ChunkManager.java:58)
	at org.lealone.storage.aose.btree.BTreeStorage.<init>(BTreeStorage.java:95)
	at org.lealone.storage.aose.btree.BTreeMap.<init>(BTreeMap.java:111)
	at org.lealone.storage.aose.AOStorage.openBTreeMap(AOStorage.java:84)
	at org.lealone.storage.aose.AOStorage.openMap(AOStorage.java:63)
	at org.lealone.storage.aose.AOStorage.openMap(AOStorage.java:57)
	at org.lealone.transaction.aote.AOTransaction.openMap(AOTransaction.java:140)
	at org.lealone.transaction.aote.AOTransaction.openMap(AOTransaction.java:30)
	at org.lealone.db.index.standard.StandardPrimaryIndex.<init>(StandardPrimaryIndex.java:78)
	at org.lealone.db.table.StandardTable.<init>(StandardTable.java:93)
	at org.lealone.db.schema.Schema.createTable(Schema.java:751)
	at org.lealone.sql.ddl.CreateTable.update(CreateTable.java:185)
	at org.lealone.sql.executor.YieldableLocalUpdate.executeInternal(YieldableLocalUpdate.java:23)
	at org.lealone.sql.executor.YieldableBase.run(YieldableBase.java:115)
	... 26 more

@codefollower
Copy link
Member

wms_cloud 这个数据库找不到,执行过 drop database 语句?
内存回收的策略是:超过15分钟都没访问的 page 会被回收;如果已占用内存达到 jvm 最大内存的1/3会启动 gc。

如果一直降不下来,再试试执行 checkpoint 命令。jvm 如果可用内存很多的话,jvm 进程占用的内存看上去也会很大的,只要不出现 oom 就没事的。我做性能压测的时候,如果最大内存设置 4G,此时 jvm 进程占用的内存也会一下飙到差不多4G,过了很久 jvm 才会做 gc。设置256M内存做性能压测也一样。

@ContyChen
Copy link
Author

内存-Xmx1G ,没有执行drop。结束java进程再启动数据库可以正常访问

@codefollower
Copy link
Member

direct buffer memory 出现 oom 的话,看看是否设置了 -XX:MaxDirectMemorySize,把它去掉或调大。错误提示是 direct buffer memory 超过 1G 了,所以 oom 了。

@ContyChen
Copy link
Author

好的,我再调整试试

@ContyChen
Copy link
Author

怀疑与backup有关,执行backup后 内存就不降

@codefollower
Copy link
Member

那我试试 backup 看看能不能重现问题,执行 backup 时内部会先执行 checkpoint 刷脏页,然后如果一些 chunk 的数据很老了又会执行 chunk compact,这整个过程需要比较大的内存,如果当前可用内存不够了可能会 oom。

@codefollower
Copy link
Member

-Xmx512m 一边执行 insert 语句一边 backup 没有重现问题。

@ContyChen
Copy link
Author

backup 占用内存过大,内存释放不及时???
-Xmx512m ,连续备份两三次后出错,间隔一会儿备份就没有问题。
image

数据文件发邮件了,辛苦看一下

@codefollower
Copy link
Member

看到错误了,jvm 进程占用的内存才150M就出现 Error: Out of memory,确实有点奇怪,我排查一下原因。

@codefollower
Copy link
Member

如果在客户端想看到完整的异常堆栈,可以先执行 list 命令,之后执行的 sql 出错了就会打印完整的异常堆栈,再次执行 list 又会切换到简短模式。

@codefollower
Copy link
Member

我找到原因了,t_6_32\c_2_2.db 这个文件有200M,然后直接读到内存中就 oom 了,我得改进一下。

@codefollower
Copy link
Member

问题修复了,不过备份那个 200多M 的文件花了10几秒,ZipOutputStream 执行写操作时比较慢,跟 WinRAR 生成 zip 文件的速度差不多。

@codefollower
Copy link
Member

执行 backup 需要通过 ZipOutputStream 生成 zip 文件,如果嫌弃它慢不用 backup 也是可以的。
只需要执行 checkpoint 命令,把脏页刷到硬盘后直接拷贝 db_xxx 目录到别处即可。
backup 命令的好处是支持增量备份,在 LAST DATE 指定一个日期(或日期时间)就能实现增量备份,缺点就是慢。

@ContyChen
Copy link
Author

好的,我试一下

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants