
Unable to connect with database: JC_BAD_CONNECTION #235

Open
harshajayaweeraXHJ opened this issue Jan 26, 2020 · 11 comments

@harshajayaweeraXHJ

My application runs a sampling query periodically, and it was working fine without any issue for around 30 hours. Then it suddenly gave the following error:

Caused by: com.toshiba.mwcloud.gs.common.GSConnectionException: [145028:JC_BAD_CONNECTION] Failed to update by notification (address=/239.0.0.1:31999, reason=Receive timed out)
at com.toshiba.mwcloud.gs.subnet.NodeResolver.updateMasterInfo(NodeResolver.java:815)
at com.toshiba.mwcloud.gs.subnet.NodeResolver.prepareConnectionAndClusterInfo(NodeResolver.java:522)
at com.toshiba.mwcloud.gs.subnet.NodeResolver.getPartitionCount(NodeResolver.java:205)
at com.toshiba.mwcloud.gs.subnet.GridStoreChannel$5.execute(GridStoreChannel.java:2106)
at com.toshiba.mwcloud.gs.subnet.GridStoreChannel.executeStatement(GridStoreChannel.java:1675)
... 38 more
Caused by: java.net.SocketTimeoutException: Receive timed out

Why is this happening? What is the cause?

This is the output of gs_stat -u admin/admin:

{
"checkpoint": {
"archiveLog": 0,
"backupOperation": 0,
"duplicateLog": 0,
"endTime": 1580053987745,
"mode": "NORMAL_CHECKPOINT",
"normalCheckpointOperation": 1470,
"pendingPartition": 0,
"periodicCheckpoint": "ACTIVE",
"requestedCheckpointOperation": 0,
"startTime": 1580053987741
},
"cluster": {
"activeCount": 0,
"autoGoal": "ACTIVE",
"clusterName": "defaultCluster",
"clusterRevisionId": "4e9be62e-7911-48a4-8d93-19af09be7a15",
"clusterRevisionNo": 17651,
"clusterStatus": "SUB_CLUSTER",
"designatedCount": 1,
"loadBalancer": "ACTIVE",
"nodeList": [
{
"address": "10.128.0.2",
"port": 10040
}
],
"nodeStatus": "ABNORMAL",
"notificationMode": "MULTICAST",
"partitionStatus": "INITIAL",
"startupTime": "2020-01-25T15:20:31.377Z",
"syncCount": 0
},
"currentTime": "2020-01-26T17:20:39Z",
"performance": {
"backupCount": 0,
"batchFree": 0,
"bufferHashCollisionCount": 0,
"checkpointFileAllocateSize": 5443584,
"checkpointFileFlushCount": 0,
"checkpointFileFlushTime": 0,
"checkpointFileSize": 5439488,
"checkpointFileUsageRate": 0.927710843373494,
"checkpointMemory": 196608,
"checkpointMemoryLimit": 1073741824,
"checkpointWriteSize": 270139392,
"checkpointWriteTime": 214,
"currentCheckpointWriteBufferSize": 0,
"currentTime": 1580059239771,
"expirationDetail": {
"autoExpire": false,
"erasableExpiredTime": "1970-01-01T00:00:00.000Z",
"latestExpirationCheckTime": "1970-01-01T00:00:00.000Z"
},
"logFileFlushCount": 8832,
"logFileFlushTime": 38224,
"numBackground": 0,
"numConnection": 2,
"numNoExpireTxn": 0,
"numSession": 0,
"numTxn": 0,
"ownerCount": 128,
"peakProcessMemory": 86626304,
"processMemory": 86626304,
"recoveryReadSize": 262144,
"recoveryReadTime": 0,
"recoveryReadUncompressTime": 0,
"storeCompressionMode": "NO_BLOCK_COMPRESSION",
"storeDetail": {
"batchFreeMapData": {
"storeMemory": 0,
"storeUse": 0,
"swapRead": 0,
"swapWrite": 0
},
"batchFreeRowData": {
"storeMemory": 0,
"storeUse": 0,
"swapRead": 0,
"swapWrite": 0
},
"mapData": {
"storeMemory": 131072,
"storeUse": 131072,
"swapRead": 0,
"swapWrite": 0
},
"metaData": {
"storeMemory": 131072,
"storeUse": 131072,
"swapRead": 0,
"swapWrite": 0
},
"rowData": {
"storeMemory": 4784128,
"storeUse": 4784128,
"swapRead": 0,
"swapWrite": 0
}
},
"storeMemory": 5046272,
"storeMemoryLimit": 1073741824,
"storeTotalUse": 5046272,
"swapRead": 0,
"swapReadSize": 0,
"swapReadTime": 0,
"swapReadUncompressTime": 0,
"swapWrite": 0,
"swapWriteCompressTime": 0,
"swapWriteSize": 0,
"swapWriteTime": 0,
"syncReadSize": 0,
"syncReadTime": 0,
"syncReadUncompressTime": 0,
"totalBackupLsn": 0,
"totalLockConflictCount": 0,
"totalOtherLsn": 0,
"totalOwnerLsn": 110220,
"totalReadOperation": 4733,
"totalRowRead": 2325894,
"totalRowWrite": 55108,
"totalWriteOperation": 55108,
"txnDetail": {
"totalBackgroundOperation": 0
}
},
"recovery": {
"progressRate": 1
},
"version": "4.3.0-36424 CE"
}

@knonomura
Member

Thank you for using GridDB and for the report.

"cluster": {
"clusterStatus": "SUB_CLUSTER", <--
"nodeStatus": "ABNORMAL", <--

I guess an error has occurred on the server, e.g. an IO_ERROR.
Could you please check the gridstore-*.log files under $GS_LOG for lines containing "ERROR"?

@harshajayaweeraXHJ
Author

harshajayaweeraXHJ commented Jan 27, 2020

Sure, I will. Thanks. Can you give more specific examples, please? What kind of I/O errors could they be? A read/write permission error, or something like that?

@harshajayaweeraXHJ
Author

harshajayaweeraXHJ commented Jan 27, 2020

The log files in $GS_LOG say:

2020-01-26T15:53:07.742Z xxx 11745 ERROR EVENT_ENGINE [1042:CM_PLATFORM_ERROR] Listener error (engine=TRANSACTION_SERVICE, reason=Platform error (errno=24, description=Too many open files))

and also

2020-01-26T15:53:07.744Z xxx 11751 ERROR CLUSTER_OPERATION [1042:CM_PLATFORM_ERROR] [ERROR] SetError requested (reason=Prepare checkpoint failed. reason=(Failed to open (path=data/gs_log_1_1307.log, reason=Platform error (errno=24, description=Too many open files)))

What does this mean, "Too many open files"?

The above errors repeat for a while, and then it gives the following messages:

2020-01-26T15:53:07.746Z xxx 11751 WARNING CLUSTER_OPERATION [180060:CLM_STATUS_TO_SUBMASTER] Cluster status change to SUB_MASTER

2020-01-26T15:57:08.109Z xxx 11734 INFO SYSTEM_SERVICE [50912:SC_TRACE_STATS] clusterStatus=SUB_CLUSTER, nodeStatus=ABNORMAL,

It changes to SUB_MASTER, and then nodeStatus changes to ABNORMAL.

What is the root cause of this issue? When it says "too many open files", which files does it mean?

@knonomura
Member

Thank you for the error information.

My application runs a sampling query periodically and it was working fine without any issue for around 30 hours.

Let me confirm some points.

  • Does your application use many connections?
  • Are the application and the GridDB server on the same machine?
  • Could you please tell me the output of the following command?
ulimit -n

@harshajayaweeraXHJ
Author

  1. Yes, it creates a new connection on every call, like below:
this.store = GridStoreFactory.getInstance().getGridStore(createDBProps());
this.ts = store.putTimeSeries(COLLECTION_NAME, Payload.class);

and then, once the query is done, it leaves the connection open. (Is there any recommended approach to handling DB connections, or any pooling solution available for GridDB, like HikariCP?)

  2. Yes, my application and the GridDB server are on the same machine.

  3. The output of ulimit -n is 1024.

@knonomura
Member

knonomura commented Jan 28, 2020

If you call GridStoreFactory#getGridStore() but don't call GridStore#close(),
the connection (a socket file descriptor) remains open.

If so, please use the following method (A) or (B):
(A) Close GridStore object

GridStore store = GridStoreFactory.getInstance().getGridStore(props);
...
store.close();

(B) Reuse GridStore object

this.store = GridStoreFactory.getInstance().getGridStore(props);

ts = this.store.putTimeSeries(name, XXX.class); // or store.getTimeSeries(name, XXX.class)
...
ts.close();

ts = this.store.putTimeSeries(name, XXX.class); // or store.getTimeSeries(name, XXX.class)
...
ts.close();
...

The Java client library has a connection pool.
If necessary, please set maxConnectionPoolSize.
http://griddb.org/griddb_nosql/manual/GridDB_API_Reference.html#GridStoreFactory.html__setProperties(java.util.Properties)

And it might be better to increase the file descriptor limit.
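Raising the limit can be sketched as follows (a minimal example; the "gsadm" user name and the 65536 value in the limits.conf lines are assumptions to adapt to your environment):

```shell
# Show the current soft and hard limits on open file descriptors
ulimit -Sn
ulimit -Hn

# Raise the soft limit for the current shell session, up to the hard limit
ulimit -n "$(ulimit -Hn)"

# To make the change permanent for the GridDB service user (assumed here
# to be "gsadm"), add lines like these to /etc/security/limits.conf as root:
#   gsadm  soft  nofile  65536
#   gsadm  hard  nofile  65536
```

Note that a limits.conf change only applies to new login sessions, so the GridDB node would need to be restarted under the new limit.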

@harshajayaweeraXHJ
Author

Thank you very much for the directions.
But my question still remains: what is the root cause of the above errors?
What is "Too many open files"?

@harshajayaweeraXHJ
Author

Also, of the methods you suggested above, which one is recommended if I make frequent calls to the DB? (The front end usually issues a data-sampling query every 5 seconds.)

@knonomura
Member

What is the root cause for above errors?
What is "Too many open files"?

This is the error message for errno=24 among the Linux OS errors.
It is defined in errno.h.

#define EMFILE          24      /* Too many open files */

http://man7.org/linux/man-pages/man3/errno.3.html

Here, "files" means file descriptors, which include things like sockets.
I guess your process has reached the per-process limit on open file descriptors.
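A quick way to check this on Linux is to count the entries under /proc/<pid>/fd, which include sockets as well as regular files. A minimal sketch (the "gsserver" process name is an assumption; substitute your application's pid to watch the client side):

```shell
# Count the file descriptors currently open in a process via /proc.
# $$ (the current shell) is used as a stand-in here; substitute the pid
# of your Java application, or of the GridDB server (e.g. from
# "pgrep gsserver"), to watch the count grow as connections leak.
pid=$$
ls /proc/"$pid"/fd | wc -l

# Sockets count too: they show up as symlinks of the form "socket:[inode]"
ls -l /proc/"$pid"/fd | grep -c 'socket:' || true
```

If the count approaches the ulimit -n value (1024 in your case), new connections and log-file opens will start failing with EMFILE.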

@knonomura
Member

Also, of the methods you suggested above, which one is recommended if I make frequent calls to the DB? (The front end usually issues a data-sampling query every 5 seconds.)

I often use case (B) when queries are repeated in a thread or a function.

Note: a GridStore object isn't thread-safe. So, if you use multiple threads, please assign a GridStore object to each thread.
http://griddb.org/griddb_nosql/manual/GridDB_API_Reference.html#GridStore.html__
Thread safety of each method is not guaranteed.

@harshajayaweeraXHJ
Author

What is the root cause for above errors?
What is "Too many open files"?

This is the error message for errno=24 among the Linux OS errors.
It is defined in errno.h.

#define EMFILE 24 /* Too many open files */
http://man7.org/linux/man-pages/man3/errno.3.html

Here, "files" means file descriptors, which include things like sockets.
I guess your process has reached the per-process limit on open file descriptors.

But my server doesn't have any open files. Is this referring to database connections in this case?
