
chunk allocation with rack awareness #235

Closed
szabolcsf opened this issue Jun 3, 2019 · 4 comments

@szabolcsf

We have a ~25PB QFS 2.0.0 cluster with rackId configured on the chunkservers. Our physical servers have several disks, so we run multiple chunkservers per physical server. For this reason each physical server has a unique rackId.

We allocate one primary + one replica for every chunk. The goal is that every chunk should survive a complete failure of any physical server.

But somehow the primary and the replica sometimes end up on the same physical server, i.e. the same rackId.

This is our metaserver config:

metaServer.clientPort = 20000
metaServer.chunkServerIp = 10.10.1.1
metaServer.chunkServerPort = 20100
metaServer.clusterKey = prod
metaServer.cpDir = /home/qfs/meta/checkpoints
metaServer.logDir = /home/qfs/meta/logs
metaServer.createEmptyFs = 0
metaServer.recoveryInterval = 1
metaServer.msgLogWriter.logLevel = INFO
metaServer.msgLogWriter.maxLogFileSize = 100e09
metaServer.msgLogWriter.maxLogFiles = 3
metaServer.minChunkservers = 1
metaServer.clientThreadCount = 5
metaServer.maxClientCount = 131072
metaServer.clientSM.maxPendingOps = 51200
metaServer.chunkServer.chunkAllocTimeout = 1500
metaServer.chunkServer.chunkReallocTimeout = 2000
metaServer.clientSM.inactivityTimeout = 1500
metaServer.clientSM.auditLogging = 1
metaServer.rootDirUser = 1002
metaServer.rootDirGroup = 1003
metaServer.rootDirMode = 0777
metaServer.maxSpaceUtilizationThreshold = 0.95
metaServer.serverDownReplicationDelay = 7200
metaServer.maxConcurrentReadReplicationsPerNode = 8
chunkServer.storageTierPrefixes = /disk 10
chunkServer.bufferedIo = 0
metaServer.maxRebalanceSpaceUtilThreshold = 0.94
metaServer.minRebalanceSpaceUtilThreshold = 0.93
metaServer.chunkServer.heartbeatInterval = 15
metaServer.sortCandidatesBySpaceUtilization = 0
metaServer.rebalancingEnabled = 0
metaServer.MTimeUpdateResolution = 60
metaServer.msgLogWriter.logFilePrefixes = MetaServer.log
metaServer.maxWritesPerDriveRatio = 3
metaServer.sortCandidatesByLoadAvg = 1

and this is a chunkserver config:

chunkServer.metaServer.port = 20100
chunkServer.clientIp = 10.10.1.25
chunkServer.clientPort = 21011
chunkServer.clusterKey = prod
chunkServer.rackId = 681000
chunkServer.chunkDir = /mnt/n01/qfs /mnt/n02/qfs /mnt/n03/qfs /mnt/n04/qfs /mnt/n05/qfs /mnt/n06/qfs
chunkServer.diskIo.crashOnError = 0
chunkServer.abortOnChecksumMismatchFlag = 1
chunkServer.msgLogWriter.logLevel = DEBUG
chunkServer.msgLogWriter.maxLogFileSize = 1e9
chunkServer.msgLogWriter.maxLogFiles = 2
chunkServer.diskQueue.threadCount = 5
chunkServer.ioBufferPool.partitionBufferCount = 1572864
chunkServer.bufferedIo = 0
chunkServer.dirRecheckInterval = 864000
chunkServer.maxSpaceUtilizationThreshold = 0.05

For instance, the other chunkservers on this same physical server also have rackId 681000. It has happened several times that a physical server died and we lost chunks, because both copies were on that same physical server, just assigned to different chunkservers within it.

Could you please take a look at our configs and see if we are doing something wrong?

@szabolcsf (Author)

From qfsfsck output: Chunks reachable no rack assigned: 226257930 100%
Does that mean there's no rack assigned to any of the chunks?

@mikeov (Contributor) commented Jun 16, 2019

Rack IDs outside the range 0 to 65535 are considered invalid and are ignored by the chunk placement logic. Presently only rack IDs specified by the metaServer.rackPrefixes parameter are validated, with an error message emitted when a rack ID is outside the valid range.

If all chunk server rack IDs are outside the valid range, fsck will report all chunks as having no rack ID assigned.

The present design assumes that the number of racks (failure groups) is reasonably small, less than 100 or so.
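The placement-eligibility rule described above can be sketched as follows. This is a minimal illustration, not QFS source; the helper name `effective_rack_id` is hypothetical, only the 0..65535 range comes from the comment above.

```python
# Hypothetical sketch of the rule: rack IDs outside 0..65535 are
# treated as "no rack assigned" and ignored by rack-aware placement.
MIN_RACK_ID = 0
MAX_RACK_ID = 65535

def effective_rack_id(rack_id):
    """Return the rack ID if it is in the valid range, else None."""
    if MIN_RACK_ID <= rack_id <= MAX_RACK_ID:
        return rack_id
    return None

# The reporter's per-host IDs such as 681000 fall outside the range,
# so every chunkserver appears rack-less and a replica may land on
# the same physical host as its primary.
```

With rack-less servers, fsck's "no rack assigned: 100%" output is exactly what this rule would produce for the 681000-style IDs.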

I’d recommend using one chunk server per physical node / host, with an adequate number of network IO (“client”) and disk IO threads. By default the number of IO threads is 2 per chunk directory / IO device / “disk”.
The annotated chunk server configuration file https://github.com/quantcast/qfs/blob/master/conf/ChunkServer.prp describes the corresponding parameters chunkServer.clientThreadCount and chunkServer.diskQueue.threadCount, and offers some insight into how to set them.
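As an illustration of that recommendation, a single chunkserver covering all six disks of the host shown earlier might look like the fragment below. The thread-count values are assumptions for illustration, not tested settings; only the parameter names come from the annotated configuration file.

```
chunkServer.metaServer.port = 20100
chunkServer.clientIp = 10.10.1.25
chunkServer.clientPort = 21011
chunkServer.clusterKey = prod
# One rack ID per failure group (physical host or rack), within 0..65535:
chunkServer.rackId = 681
# One chunkserver process now owns all six disks:
chunkServer.chunkDir = /mnt/n01/qfs /mnt/n02/qfs /mnt/n03/qfs /mnt/n04/qfs /mnt/n05/qfs /mnt/n06/qfs
# Illustrative values; size to the host per ChunkServer.prp guidance.
chunkServer.clientThreadCount = 4
chunkServer.diskQueue.threadCount = 2
```

This keeps both copies of a chunk on different rack IDs, i.e. different physical hosts, which is the failure-isolation goal stated at the top of the issue.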

@szabolcsf (Author)

Thank you @mikeov, this is very useful! We are going to fix the rack IDs in the chunkserver configs and see how chunk placement goes.
For the record, we don't have 681,000 racks in the cluster; we just did some multiplication to ensure uniqueness. We will now use real rack IDs (one ID per rack), within the 0 to 65535 range.

@szabolcsf (Author)

Closing this as resolved.
