
chunk allocation with rack awareness #235

Closed
szabolcsf opened this issue Jun 3, 2019 · 4 comments

@szabolcsf

We have a ~25PB QFS 2.0.0 cluster with rackId configured on the chunkservers. Our physical servers have several disks, so we run multiple chunkservers per physical server. For this reason each physical server has a unique rackId.

We allocate one primary + one replica for every chunk. The goal is that every chunk should survive a complete failure of any physical server.

But somehow the primary and the replica sometimes end up on the same physical server, i.e. the same rackId.

This is our metaserver config:

metaServer.clientPort = 20000
metaServer.chunkServerIp = 10.10.1.1
metaServer.chunkServerPort = 20100
metaServer.clusterKey = prod
metaServer.cpDir = /home/qfs/meta/checkpoints
metaServer.logDir = /home/qfs/meta/logs
metaServer.createEmptyFs = 0
metaServer.recoveryInterval = 1
metaServer.msgLogWriter.logLevel = INFO
metaServer.msgLogWriter.maxLogFileSize = 100e09
metaServer.msgLogWriter.maxLogFiles = 3
metaServer.minChunkservers = 1
metaServer.clientThreadCount = 5
metaServer.maxClientCount = 131072
metaServer.clientSM.maxPendingOps = 51200
metaServer.chunkServer.chunkAllocTimeout = 1500
metaServer.chunkServer.chunkReallocTimeout = 2000
metaServer.clientSM.inactivityTimeout = 1500
metaServer.clientSM.auditLogging = 1
metaServer.rootDirUser = 1002
metaServer.rootDirGroup = 1003
metaServer.rootDirMode = 0777
metaServer.maxSpaceUtilizationThreshold = 0.95
metaServer.serverDownReplicationDelay = 7200
metaServer.maxConcurrentReadReplicationsPerNode = 8
chunkServer.storageTierPrefixes = /disk 10
chunkServer.bufferedIo = 0
metaServer.maxRebalanceSpaceUtilThreshold = 0.94
metaServer.minRebalanceSpaceUtilThreshold = 0.93
metaServer.chunkServer.heartbeatInterval = 15
metaServer.sortCandidatesBySpaceUtilization = 0
metaServer.rebalancingEnabled = 0
metaServer.MTimeUpdateResolution = 60
metaServer.msgLogWriter.logFilePrefixes = MetaServer.log
metaServer.maxWritesPerDriveRatio = 3
metaServer.sortCandidatesByLoadAvg = 1

and this is a chunkserver config:

chunkServer.metaServer.port = 20100
chunkServer.clientIp = 10.10.1.25
chunkServer.clientPort = 21011
chunkServer.clusterKey = prod
chunkServer.rackId = 681000
chunkServer.chunkDir = /mnt/n01/qfs /mnt/n02/qfs /mnt/n03/qfs /mnt/n04/qfs /mnt/n05/qfs /mnt/n06/qfs
chunkServer.diskIo.crashOnError = 0
chunkServer.abortOnChecksumMismatchFlag = 1
chunkServer.msgLogWriter.logLevel = DEBUG
chunkServer.msgLogWriter.maxLogFileSize = 1e9
chunkServer.msgLogWriter.maxLogFiles = 2
chunkServer.diskQueue.threadCount = 5
chunkServer.ioBufferPool.partitionBufferCount = 1572864
chunkServer.bufferedIo = 0
chunkServer.dirRecheckInterval = 864000
chunkServer.maxSpaceUtilizationThreshold = 0.05

For instance, the other chunkservers on this same physical server also have rackId 681000. It has happened several times that a physical server died and we lost chunks, because both copies were on that same physical server, just assigned to different chunkservers within it.

Could you please take a look at our configs and see if we are doing something wrong?

@szabolcsf (Author)

From qfsfsck output: Chunks reachable no rack assigned: 226257930 100%
Does that mean there's no rack assigned to any of the chunks?

@mikeov (Contributor) commented Jun 16, 2019

Rack IDs outside the range 0 to 65535 are considered invalid and are ignored by the chunk placement logic. Presently only rack IDs specified by the metaServer.rackPrefixes parameter are validated, with an error message emitted when a rack ID is outside the valid range.

If all chunk server rack IDs are outside the valid range, fsck will report all chunks as having no rack ID assigned.

The present design assumes that the number of racks (failure groups) is reasonably small, less than 100 or so.
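The placement-eligibility rule described above can be sketched as follows. This is a minimal illustration, not QFS source; the helper name `effective_rack_id` is hypothetical, only the 0..65535 range comes from the comment above.

```python
# Hypothetical sketch of the rule: rack IDs outside 0..65535 are
# treated as "no rack assigned" and ignored by rack-aware placement.
MIN_RACK_ID = 0
MAX_RACK_ID = 65535

def effective_rack_id(rack_id):
    """Return the rack ID if it is in the valid range, else None."""
    if MIN_RACK_ID <= rack_id <= MAX_RACK_ID:
        return rack_id
    return None

# The reporter's per-host IDs such as 681000 fall outside the range,
# so every chunkserver appears rack-less and a replica may land on
# the same physical host as its primary.
```

With rack-less servers, fsck's "no rack assigned: 100%" output is exactly what this rule would produce for the 681000-style IDs.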

I’d recommend using one chunk server per physical node / host, with an adequate number of network IO (“client”) and disk IO threads. By default the number of IO threads is 2 per chunk directory / IO device / “disk”.
The annotated chunk server configuration file https://github.com/quantcast/qfs/blob/master/conf/ChunkServer.prp describes the corresponding parameters chunkServer.clientThreadCount and chunkServer.diskQueue.threadCount, and offers some insight into how to set them.
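As an illustration of that recommendation, a single chunkserver covering all six disks of the host shown earlier might look like the fragment below. The thread-count values are assumptions for illustration, not tested settings; only the parameter names come from the annotated configuration file.

```
chunkServer.metaServer.port = 20100
chunkServer.clientIp = 10.10.1.25
chunkServer.clientPort = 21011
chunkServer.clusterKey = prod
# One rack ID per failure group (physical host or rack), within 0..65535:
chunkServer.rackId = 681
# One chunkserver process now owns all six disks:
chunkServer.chunkDir = /mnt/n01/qfs /mnt/n02/qfs /mnt/n03/qfs /mnt/n04/qfs /mnt/n05/qfs /mnt/n06/qfs
# Illustrative values; size to the host per ChunkServer.prp guidance.
chunkServer.clientThreadCount = 4
chunkServer.diskQueue.threadCount = 2
```

This keeps both copies of a chunk on different rack IDs, i.e. different physical hosts, which is the failure-isolation goal stated at the top of the issue.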

@szabolcsf (Author)

Thank you @mikeov, this is very useful! We are going to fix the rack IDs in the chunkserver configs and see how chunk placement goes.
For the record, we don't have 681,000 racks in the cluster; we just did some multiplication to ensure uniqueness. We will now use real rack IDs (one ID per rack), within the 0 to 65535 range.

@szabolcsf (Author)

Closing this as resolved.
