Improve performance syncing daemon with another daemon on the same local machine #7913
Comments
As far as I know, the daemon sync bottleneck is usually block verification, not networking.
Since block verification is a local operation, it should ideally use 100% of the CPU if it is the bottleneck. Currently all CPU cores are underutilized, so there should be room for optimization.
That assumes your disk storage has instant I/O. Verification requires looking up older transactions and blocks, which means a lot of non-sequential reads. That's why SSD/NVMe storage speeds up sync noticeably.
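Whether a given sync is CPU-bound or disk-bound can be checked on Linux by sampling the daemon's CPU time and major page faults from `/proc/<pid>/stat` (field layout in proc(5)). monerod stores the chain in LMDB, which is memory-mapped, so reads that miss the page cache should show up as major page faults rather than as explicit reads. This is a minimal POSIX `sh` sketch, Linux-only; `read_stat` and `sample_proc` are hypothetical helper names, not part of Monero.

```sh
# Hypothetical helpers (not part of Monero). Linux-only: they parse
# /proc/PID/stat, whose field layout is documented in proc(5).

# read_stat PID -> prints "utime stime majflt" (CPU times in clock ticks).
read_stat() {
  local line
  line=$(cat "/proc/$1/stat") || return 1
  # Field 2 (the command name) is in parentheses and may contain spaces,
  # so drop everything up to the last ") " before splitting on whitespace.
  line=${line##*") "}
  set -- $line
  # $1 is now field 3 (state), so field N is positional parameter N - 2:
  # utime = field 14, stime = field 15, majflt = field 12.
  echo "${12} ${13} ${10}"
}

# sample_proc PID [SECONDS] -> prints CPU use as a percentage of ONE core
# (a process keeping 4 cores busy reports about 400%) and the number of major
# page faults, i.e. pages that had to be read from disk, during the interval.
sample_proc() {
  local pid secs hz before after ticks
  pid=$1
  secs=${2:-5}
  hz=$(getconf CLK_TCK)
  before=$(read_stat "$pid") || return 1
  sleep "$secs"
  after=$(read_stat "$pid") || return 1
  set -- $before $after
  # $1..$3 = utime stime majflt before, $4..$6 = the same values after.
  ticks=$(( ($4 + $5) - ($1 + $2) ))
  echo "cpu=$(( ticks * 100 / (hz * secs) ))% majflt=$(( $6 - $3 ))"
}
```

For example, `sample_proc "$(pgrep -n monerod)" 10` while a node is syncing. Roughly, a CPU figure far below 100% times the number of cores together with a steadily climbing `majflt` points at disk I/O. Note that one busy thread only shows up as about 100%, so a low total can also mean the work is simply not parallelized.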
I'm sure things can be optimized, e.g. by using @vtnerd's ASM ECC library (https://github.com/monero-project/supercop/tree/monero), which is currently only used for wallet scanning and not for daemon sync.
I just recorded a run of two monerod instances (compiled out of current … using a mix of …). Then, focusing on the …, it appears that the library might help, but not substantially, as functions like … Is that right? I'm not familiar with this part of the codebase.
Do you have stats for the checkpoints zone? Speeding that part up would also be interesting, as it accounts for the majority of a fresh sync.
@selsta, I just gave it a run here, starting the sync from scratch this time. We can see that most of the samples are simply … To generate these:

```sh
# pick the monerod process to profile (-n = newest match; adjust if needed)
pid=$(pgrep -n monerod)
RECORD_DURATION=60   # seconds to record (example value)

# record the samples, with call graphs, from that process only
#
sudo perf record -F 500 -g -p "$pid" -- sleep "$RECORD_DURATION"
# do some terminal-based exploration
#
sudo perf report
# output in a way that flamegraph can consume and then generate svg
# see https://github.com/brendangregg/FlameGraph
#
sudo perf script > perf.script
stackcollapse-perf.pl perf.script > ./stacks-collapsed
flamegraph.pl ./stacks-collapsed > flamegraph.svg   # (or open stacks-collapsed on https://www.speedscope.app/)
```
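To turn the collapsed stacks into the list of top time consumers requested in this issue, they can be ranked by self time, i.e. the samples in which a function was the innermost frame. This is a sketch assuming the `stackcollapse-perf.pl` output format of one `frame1;frame2;...;leaf COUNT` line per unique stack; `top_self_time` is a hypothetical helper name.

```sh
# top_self_time FILE [N]
# Hypothetical helper: ranks functions by "self" samples, i.e. how often each
# function was the innermost (leaf) frame, using stackcollapse-perf.pl output
# where every line looks like "frame1;frame2;...;leaf COUNT".
top_self_time() {
  local file n tab total
  file=$1
  n=${2:-20}
  tab=$(printf '\t')
  # Total sample count, used to turn counts into percentages.
  total=$(awk '{ s += $NF } END { print s + 0 }' "$file")
  awk '{
      # Frame names can contain spaces (C++ signatures), so strip only the
      # trailing count instead of splitting the whole line on whitespace.
      stack = substr($0, 1, length($0) - length($NF) - 1)
      depth = split(stack, frames, ";")
      self[frames[depth]] += $NF
    }
    END { for (f in self) printf "%d\t%s\n", self[f], f }' "$file" |
    sort -t "$tab" -k1,1nr |
    head -n "$n" |
    awk -F '\t' -v total="$total" '{ printf "%6.2f%%  %s\n", 100 * $1 / total, $2 }'
}
```

For example, `top_self_time ./stacks-collapsed 25` prints the 25 hottest leaf functions with their share of all samples. Self time shows where cycles are actually burned (e.g. individual curve operations); to see which subsystem is responsible, the inclusive widths in the flamegraph are more useful.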
I've already been looking at this for a few months. I've done some recent PRs to reduce the data copying that occurs on both the sending and receiving sides of the p2p protocol. They help a bit, but as @cirocosta points out, copying does not account for the majority of the time spent syncing. I think there are some improvements to be made in the "block_span" code with copying as well, but there's still that wall of cryptography. @selsta, the ECC library was not used for synchronization to reduce the possibility of a chain fork caused by differing implementations. Perhaps it could be used only during synchronization but not after?
Yes, that was my intention: only use it for historical sync.
#7803 is the outstanding work on data copying, but if you look at the numbers, copying is not the dominant cost on a Ryzen3 desktop chip. My expectation is that CPUs with smaller caches will benefit more from that patch (less cache thrashing). @selsta, that's an intriguing thought that I will look into. The major issue is that the library is focused on wallet scanning and will need some more functions for Bulletproofs, etc.
Throwing in my support: switching the ECC library for historical sync could improve sync time and UX substantially.
Would 3D XPoint or a RAM drive be even faster, then?
Keeping the whole blockchain in RAM would speed up sync, yes. I'm not familiar with "3D XPoint".
Instead of the NAND flash used in most SSDs, it uses a kind of phase-change memory. See here: https://ark.intel.com/content/www/us/en/ark/products/123623/intel-optane-ssd-900p-series-280gb-2-5in-pcie-x4-20nm-3d-xpoint.html I have one of those fancy SSDs and wouldn't mind running benchmarks if there are some bash scripts to do so. How much faster would keeping the whole blockchain in RAM be? Would the bottleneck still be I/O, or would it then be the CPU?
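For the small local test chain, the RAM question can be answered empirically by copying a stopped node's data directory to a tmpfs and re-running the same sync from there. A minimal sketch, assuming Linux with `/dev/shm` mounted as tmpfs (overridable with the hypothetical `RAM_DIR` variable); `stage_in_ram` is a hypothetical helper name.

```sh
# stage_in_ram DATA_DIR
# Hypothetical helper: copies a STOPPED monerod data directory into a
# tmpfs-backed directory and prints the new path, so the same sync can be
# re-run with --data-dir pointing at RAM. Copying LMDB files while monerod is
# running may produce an inconsistent copy, so stop the daemon first.
# RAM_DIR (hypothetical) defaults to /dev/shm, a tmpfs on most Linux systems.
stage_in_ram() {
  local src ram dst need avail
  src=$1
  ram=${RAM_DIR:-/dev/shm}
  [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
  # Refuse to copy if the data would not fit in the target filesystem.
  need=$(du -sk "$src" | cut -f1)
  avail=$(df -Pk "$ram" | awk 'NR == 2 { print $4 }')
  if [ "$need" -ge "$avail" ]; then
    echo "need ${need} KiB but only ${avail} KiB free in $ram" >&2
    return 1
  fi
  dst=$(mktemp -d "$ram/monerod-XXXXXX") || return 1
  cp -a "$src/." "$dst/" || return 1
  echo "$dst"
}
```

For example, with node1 stopped after popping blocks, `dir=$(stage_in_ram ./node1)` and then start node1 with `--data-dir "$dir"`. Comparing the sync time against a run from a disk copy of the same directory, while watching CPU usage in both runs, would show whether the bottleneck moves from I/O to CPU. tmpfs contents are lost on reboot, and a full mainnet database is far larger than this test chain, so this is mainly useful for the local setup.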
Syncing a daemon with latest blocks from the network is currently one of the slowest parts of using Monero.
When syncing the daemon, we should expect most time to be spent waiting on network transmission or other external factors.
However, we find that syncing one daemon from another daemon on the same local machine exhibits the same slowness, with very low resource utilization on the host.
This suggests that daemon sync speed could be improved dramatically by making better use of the host machine's resources.
This issue requests investigating and improving the performance of syncing one daemon from another local daemon in order to improve daemon sync speed.
Specifically requested is a list breaking down the top time consumers when syncing locally, with references to the related code, in order to show where optimizations are needed and to focus developer effort.
To analyze performance, one can sync locally by starting a local blockchain with two daemons that only sync with each other:

```sh
# node1: p2p on 127.0.0.1:48080, RPC on 48081; only peers with node2 (127.0.0.1:38080)
./monerod --stagenet --no-igd --hide-my-port --data-dir node1 --p2p-bind-ip 127.0.0.1 --p2p-bind-port 48080 --rpc-bind-port 48081 --zmq-rpc-bind-port 48082 --add-exclusive-node 127.0.0.1:38080 --rpc-login superuser:abctesting123 --rpc-access-control-origins http://localhost:8080 --fixed-difficulty 10
# node2: default stagenet ports (p2p 38080, RPC 38081); only peers with node1 (127.0.0.1:48080)
./monerod --stagenet --no-igd --hide-my-port --data-dir node2 --p2p-bind-ip 127.0.0.1 --rpc-bind-ip 0.0.0.0 --confirm-external-bind --add-exclusive-node 127.0.0.1:48080 --rpc-login superuser:abctesting123 --rpc-access-control-origins http://localhost:8080 --fixed-difficulty 10
```

Mine some blocks by entering this in a daemon's console (the last argument is the number of mining threads):

```
start_mining 52aPELZwrwvVBNK4pvRZPNj4U5EEkZBsNTR2jozCLYyrhQySvYbWebTQEdt7RS9nFnRY9r88eFpt6UcsHKnVpCQDAFKu1Az 1
```

Then stop node1, pop the last 100 blocks from its database, and restart it so that it has to sync those blocks from node2:

```sh
./monero-blockchain-import --stagenet --data-dir ./node1 --pop-blocks 100
./monerod --stagenet --no-igd --hide-my-port --data-dir node1 --p2p-bind-ip 127.0.0.1 --p2p-bind-port 48080 --rpc-bind-port 48081 --zmq-rpc-bind-port 48082 --add-exclusive-node 127.0.0.1:38080 --rpc-login superuser:abctesting123 --rpc-access-control-origins http://localhost:8080 --fixed-difficulty 10
```
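To time the local sync, node1's height can be polled over RPC until it reaches node2's. A sketch, assuming the ports and `--rpc-login` credentials from the commands above (node1 RPC on 48081, node2 on the stagenet default 38081), `curl` with HTTP digest authentication (which monerod's `--rpc-login` uses), and monerod's `/get_height` endpoint. `time_local_sync`, `get_height`, `extract_height`, and `RPC_AUTH` are hypothetical names.

```sh
# Hypothetical helpers (not part of Monero) for timing the local sync.
# Credentials default to the --rpc-login value used in the example commands.
RPC_AUTH=${RPC_AUTH:-superuser:abctesting123}

# Reads a /get_height JSON response on stdin and prints the "height" value.
extract_height() {
  grep -o '"height": *[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}

# get_height RPC_PORT -> prints the daemon's current height (fails if unreachable).
get_height() {
  curl -s --digest -u "$RPC_AUTH" "http://127.0.0.1:$1/get_height" | extract_height
}

time_local_sync() {
  local start_h start_s h t elapsed blocks
  # Wait until node1's RPC answers (it may still be starting up).
  until start_h=$(get_height 48081); do sleep 1; done
  start_s=$(date +%s)
  # Re-read node2's height every round in case it is still mining.
  until h=$(get_height 48081) && t=$(get_height 38081) && [ "$h" -ge "$t" ]; do
    sleep 1
  done
  elapsed=$(( $(date +%s) - start_s ))
  [ "$elapsed" -gt 0 ] || elapsed=1   # avoid dividing by zero
  blocks=$(( h - start_h ))
  echo "synced $blocks blocks in ${elapsed}s (~$(( blocks / elapsed )) blocks/s)"
}
```

Run `time_local_sync` right after restarting node1. The clock starts when node1's RPC first answers, so the reported time slightly undercounts the real sync duration; it is meant for comparing runs, not as an absolute figure.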