This repository has been archived by the owner on Aug 23, 2020. It is now read-only.

Full Sync issues easy solutions!!! #375

Closed
ALEX778899 opened this issue Dec 1, 2017 · 41 comments

@ALEX778899

ALEX778899 commented Dec 1, 2017

Dear IOTA Foundation and developers, I'm a systems analyst and Java developer. I'm helping a lot of people set up full nodes, but I've noticed that many full nodes are still not fully synced after weeks, with only a few transactions to request (around 10 in one hour), and sometimes their database is too big. Obviously they can't use their full nodes with the wallet. Even https://iotatangle.slack.com is full of users having this problem.
I can guarantee there are no hardware or software issues; they have fully synced neighbors (TCP and UDP).
I analyzed a lot of powerful PCs, SERVERS, and VPSes: THERE ARE NO MEMORY PROBLEMS, NO BOTTLENECKS, NO CRASHES, NO ERROR MESSAGES, AND THE PROCESS IS RUNNING SMOOTHLY.

I believe IOTA will bring real freedom and I'm supporting it as much as I can.
I would like to understand why these nodes never become totally synced (Latest Milestone Index = Latest Solid Milestone Index) and why they have so few transactions, so could you please explain whether there is any specific requirement I'm not aware of?

Waiting for your explanation
Thanks in advance

***** UPDATE **************
I FOUND AND TESTED TWO DIFFERENT SOLUTIONS FOR THE PROBLEM (IF YOU FOLLOW THESE SUGGESTIONS AND STILL DO NOT BECOME FULLY SYNCED, FOLLOW THE OTHER INSTRUCTIONS IN #409).

FIRST SOLUTION
I hope the developers will fix the problem soon; in the meantime, here is how to solve it:

SECOND SOLUTION
Add LiQio's swarm nodes, udp://94.156.128.15:14600 and udp://185.181.8.149:14600. They will add you back automatically. Once you are fully synced, remember to remove the swarm nodes.
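
For anyone who prefers to script the second solution, here is a minimal sketch of the request bodies for IRI's addNeighbors and removeNeighbors API commands. It only builds the JSON; sending it is left out because the API host and port depend on your setup. As far as I know the node also expects an X-IOTA-API-Version header on the POST, but treat that as an assumption and check your IRI version. The helper name neighbor_request is made up for this example.

```python
import json

# Swarm node URIs quoted in the comment above.
SWARM_NODES = [
    "udp://94.156.128.15:14600",
    "udp://185.181.8.149:14600",
]


def neighbor_request(command, uris):
    """Build the JSON body for an IRI neighbor-management API call.

    Hypothetical helper: it only serialises the payload and sends nothing.
    """
    if command not in ("addNeighbors", "removeNeighbors"):
        raise ValueError("unsupported command: " + command)
    return json.dumps({"command": command, "uris": list(uris)})


# Body to POST while syncing, and the matching one to send once synced.
add_body = neighbor_request("addNeighbors", SWARM_NODES)
remove_body = neighbor_request("removeNeighbors", SWARM_NODES)
print(add_body)
print(remove_body)
```

Keeping both bodies next to each other makes it harder to forget the removal step once the node has caught up.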

@mindlapse

mindlapse commented Dec 2, 2017

I have 12 neighbours who are fully synced, however my node is not fully synced. When restarting it catches up to a subtangle milestone several hours before the current one and then sits there. The 'latestMilestone' is newer but it is also out of date by hours.

Traffic is flowing, but in the logs, I notice that I'm not seeing any new notifications of milestones from the coordinator.

I'm using Ubuntu in AWS with Oracle JDK 1.8.0_151. I have not been able to sync my node once since I began with IOTA about a week ago. What could be happening here? Does the full node need to have the ability to request newer milestones?

@paulhandy
Contributor

@ALEX778899 as you're a systems analyst and a Java developer, please provide some runtime analysis to help us see where the program is finding a bottleneck on your system. I'd be most interested to see how memory is behaving, as I would suspect this type of issue to be mainly related to memory.
Also, what memory parameters are you running with?

You must understand that it is difficult to help solve an issue with so little context (though, judging by the all-caps brand-new username and issue title, I wouldn't be surprised to find this to be another instance of a concern troll account).

@rnagler

rnagler commented Dec 3, 2017

Same here: running on Linux Mint, 4 processors, 16 GB, Oracle Java 8.152, fiber connection, 6 working TCP connections, 2 UDP, perfect communication. After 24 hours I always have the same solid milestone number. Here is my node info:
NODE INFO

appName: IRI
appVersion: 1.4.1.2
jreAvailableProcessors: 4
jreFreeMemory: 149596368
jreVersion: 1.8.0_152
jreMaxMemory: 3715629056
jreTotalMemory: 555220992
latestMilestone: WVDUPPMYDPRZVFPSBPCISWVJUJDVUVSSWCWYRWJJQXGZYDPOA9JVJ9BQXZ9HBOPYIDPTKVBKWMVFA9999
latestMilestoneIndex: 295252
latestSolidSubtangleMilestone: XTNQBAH9OPNJFELGCCZNSRCKKOLBTTBXSGLNRGIHYYNJRSXQODMBEDHGFSNYJFRGICWHKVMYYZEIZ9999
latestSolidSubtangleMilestoneIndex: 244014
neighbors: 9
packetsQueueSize: 0
time: 1512338547420
tips: 3356
transactionsToRequest: 12562
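
The sync criterion from the original post (Latest Milestone Index = Latest Solid Milestone Index) can be checked against a node info dump like this one. A small sketch, assuming the two index fields have already been parsed into a dict; the function name sync_status is made up for the example.

```python
def sync_status(node_info):
    """Compare the latest milestone index with the latest solid one."""
    latest = node_info["latestMilestoneIndex"]
    solid = node_info["latestSolidSubtangleMilestoneIndex"]
    lag = latest - solid
    return {"lag": lag, "synced": lag == 0}


# The two index values from the node info above.
node_info = {
    "latestMilestoneIndex": 295252,
    "latestSolidSubtangleMilestoneIndex": 244014,
}
print(sync_status(node_info))  # {'lag': 51238, 'synced': False}
```

Note that this only compares the node with itself. If latestMilestoneIndex is also behind the network, as reported elsewhere in this thread, the real gap is larger than the lag shown here.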

@rnagler

rnagler commented Dec 3, 2017

My impression after 3 days of trying to sync is that manually searching for neighbors may lead to islands of neighbors exchanging redundant transactions. All new nodes go to https://iotatangle.slack.com and sync with nodes that have also just arrived there. Mixing with fully synced nodes would be necessary, but if you have a fully synced node you don't want to change your neighbors. So a lot of useless work is done. Maybe it helps the tangle, but the effort to become a fully synced node seems too high at the moment.

@mindlapse

Ideally, all new nodes should be close to a public node, and we need more public nodes.

@GhostTyper

I suggest you just start makin' a public node.

@mindlapse

mindlapse commented Dec 3, 2017

I would, if I could get synced. I can confirm that I have neighbours who are in sync and who are sending traffic, although my node has not yet synced. I just rescanned today and the same symptom still appears (a "latestSolidSubtangleMilestoneIndex" that doesn't move, currently stuck at 295189. The latestMilestoneIndex is also behind, at 295222 of 295272).

I'm using Oracle JDK 8 (1.8.0_151-b12). I'm running on a 4 core server with 8 GB of RAM in the us-east-1 amazon data center using a c5.xlarge instance type.

@GhostTyper

I can give you my database, if you wish. Just extract it and then start your node with it.

@mindlapse

That would be awesome - would you be able to share via dropbox?

@GhostTyper

GhostTyper commented Dec 3, 2017

No, just use this link: https://iota.lukaseder.de/download.html (I'm sorry, but this service is discontinued. Use the download on iota.partners instead.)

@GhostTyper

Did it work?

@mindlapse

It worked! I'm in sync. Thank you so much!

@rnagler

rnagler commented Dec 4, 2017

is it possible to get a fully synced database to download and to speed up syncing?

@GhostTyper

I can export it again if you don't find somebody else who can help you (anyone could do this). In the next few days I will set up an automagical export every night, at around 04:00 UTC or something like that.

@rnagler

rnagler commented Dec 4, 2017

Please export it again

@GhostTyper

GhostTyper commented Dec 4, 2017

Download it here: https://iota.lukaseder.de/download.html (I'm sorry, but this service is discontinued. Use the download on iota.partners instead.)

@lunfardo314

lunfardo314 commented Dec 4, 2017

This is what I posted to Slack; I'm repeating it here.

Sync problem.
Always the same pattern:

  1. Latest milestone climbs in sync with botbox while the latest SOLID milestone doesn't move.
  2. When the solid milestone falls a few hundred points behind the latest one, I restart my IRI.
  3. After Ctrl-C or kill, IRI stops communicating but continues working on something for 10+ minutes. Only then does it stop.
  4. After the restart, the node is immediately synced, with the SOLID milestone 2-30 points behind the "latest".
  5. Then it repeats from step 1.
    E.g. after a restart about an hour ago the solid milestone was at 295518, and I know it won't move until the next restart.

[10:43]
What I think it is going on.

  • IRI no doubt receives from neighbors all the information needed to confirm a very up-to-date solid subtangle
  • it doesn't do that because it's too busy gossiping with other nodes. The backlog mounts up, occupies memory, etc.
  • when shutdown is requested, it stops being busy with gossiping and starts cleaning up its backlog of confirmations. That's why it is synced immediately after restart

[10:44]
My Ubuntu has 8GB RAM, 2 cores, iri flag is -Xmx6G, swapping disabled, CPU is busy 15-30%
I think it is a bug.

@rnagler

rnagler commented Dec 4, 2017

Thanks, my old DB seems to have been corrupted somehow; now I can connect with the light wallet.

@LiQio

LiQio commented Dec 4, 2017

I can support what @lunfardo314 summarized.
I would like to add that after moving to a much more powerful server (64 GB, 8 cores, NVMe SSD) the node reached its solid sync state, although at the cost of a very large DB directory (35 GB). During synchronization, RAM usage went up to 18 GB and more.

@lunfardo314

lunfardo314 commented Dec 4, 2017

That's quite a machine for an IoT node!

@onemoreitguy

Download the database.
But simply replacing the db directory content will generate java errors :(

@ALEX778899
Author

ALEX778899 commented Dec 4, 2017

@paulhandy I analyzed a lot of powerful PCs, SERVERS, and VPSes: THERE ARE NO MEMORY PROBLEMS, NO BOTTLENECKS, NO CRASHES, NO ERROR MESSAGES, AND THE PROCESS IS RUNNING SMOOTHLY.
Please test it: install a fresh node on Ubuntu/CentOS and you'll never become FULLY SYNCED!

**** HELP FOR ALL THE USERS HAVING THE PROBLEM ****
I FOUND AND TESTED A TEMPORARY SOLUTION:
I hope the developers will fix it soon; in the meantime, here is how to solve the problem:

@rnagler

rnagler commented Dec 4, 2017

I wonder if one can prove in theory that this manual sync process will ever converge to fully synced nodes. In my opinion, the danger of having islands of nodes syncing forever without ever being fully synced is immanent.

@nimearo

nimearo commented Dec 4, 2017

Has anyone already tried to analyze thread and heap dumps, or to debug a running IRI instance in this state?

@eelco2k

eelco2k commented Dec 4, 2017

I was also having problems with memory: my full node was crashing and not fully synced with 7 peers.
I noticed a lot of swapping (even when memory was not full), and after some investigation I saw that Debian 7 in my VM image had vm.swappiness = 60 as default. After I edited /etc/sysctl.conf to set it to 10, all my problems were gone. Memory usage still grows, but not to the extreme that caused the crash after some hours.

my steps to fix:

  1. sudoedit /etc/sysctl.conf

  2. Add this line: vm.swappiness = 10

  3. sudo shutdown -r now # restart system

Maybe that helps for some people...( IRI version 1.4.1.2 )

@GhostTyper

Ok guys. The sad truth is: This software is in beta.

I have been recording very detailed performance statistics since I started running my open wallet node. The collected data show:

  1. You can easily sync up even with only 1 core and 1 GB RAM, when the last snapshot is fresh.
  2. When the last snapshot was taken about a month ago, nodes with 4 cores and 4 GB RAM will have more and more problems syncing up once they have fallen out of sync.

I guess in the current state of IRI this should just be considered "normal" and will be optimized in the future. Any attempt to solve this by tweaking system settings won't solve the real problem. I for my part will "solve" these issues by throwing more and more hardware at my public node until the next snapshot happens.

A quick "solution" for the network would be to take a snapshot.

@ALEX778899 ALEX778899 changed the title BIG PROBLEM! TONS OF FULL NODE ARE NOT FULL SYNCHED AFTER WEEKS EVEN THEY HAVE FULL SYNCHED NEIGHBORS!!! FULL SYNC PROBLEM TEMPORARY SOLUTION!!! Dec 5, 2017
@nimearo

nimearo commented Dec 5, 2017

From my observation of a running node, I would also suggest running without the -Xmx and -Xms flags, because most of the memory used by IRI is native memory used by RocksDB, as far as I understand. So I would suggest adding some swap to avoid the well-known std::bad_alloc memory errors, while giving RocksDB as much physical memory as possible.

@nuriel77
Contributor

nuriel77 commented Dec 7, 2017

@eelco2k nice find!

Just to add, there's no need to reboot:
To change it at runtime, just run sudo sysctl vm.swappiness=10.
And of course you can add vm.swappiness = 10 to /etc/sysctl.conf so that it persists across reboots.
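
To double-check what a reboot would apply, here is a rough sketch that reads the persisted value out of sysctl.conf-style text. It is an illustration only: the function name configured_swappiness and the sample text are made up, and it ignores drop-in files such as those under /etc/sysctl.d.

```python
def configured_swappiness(conf_text):
    """Return the last vm.swappiness value set in sysctl.conf-style text, or None."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not line or line[0] in "#;":
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "vm.swappiness":
            # A later assignment replaces an earlier one.
            value = int(val.strip())
    return value


sample = """
# tuning for the IRI host (made-up sample)
vm.swappiness = 60
vm.swappiness=10
"""
print(configured_swappiness(sample))  # 10
```

On Linux the value currently in effect can be read from /proc/sys/vm/swappiness, so comparing the two shows whether the runtime change and the persisted setting agree.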

@Schweigi

My node has the same problem of just not syncing up that @lunfardo314 described. The vm.swappiness adjustment didn't help, and neither did any of the other tips in this thread. Is there any kind of log file that could help debug the issue, or is there another way to help?

@nuriel77
Contributor

@Schweigi did you try downloading a fully synced database?
Also try to find neighbors that are fully synced, that might help too.

@Schweigi

@nuriel77 I was using the Swarm nodes (according to Slack #nodesharing) to help sync my node but it made no difference.

I solved the problem now - long story short:
I noticed that the Docker version of IOTA (from this repo) uses by default -Xmx8g for Java. This was leading to a lot of crashes as my server only has 4GB of memory and Docker itself needs a little bit of memory too. Anyway, I ditched Docker completely and set the server up from scratch according to http://iota.partners and now the node is fully synced.
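
A quick sanity check for this kind of mismatch, as a sketch: convert the -Xmx value to bytes and compare it with the machine's RAM. The function names and the reserve_bytes parameter are made up for this example, and the reserve is only a placeholder for whatever the OS, Docker, and native (non-heap) memory need on your system.

```python
import re

# Binary multipliers for the size suffixes accepted after -Xmx.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def xmx_bytes(flag):
    """Convert a JVM heap flag such as '-Xmx8g' into a number of bytes."""
    match = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", flag)
    if match is None:
        raise ValueError("not a recognised -Xmx flag: " + flag)
    number, unit = match.groups()
    # No suffix means the value is already in bytes.
    return int(number) * UNITS.get(unit.lower(), 1)


def heap_fits(flag, physical_ram_bytes, reserve_bytes=0):
    """True if the maximum heap plus a reserve fits into physical RAM."""
    return xmx_bytes(flag) + reserve_bytes <= physical_ram_bytes


four_gib = 4 * 1024 ** 3
print(heap_fits("-Xmx8g", four_gib))  # False
print(heap_fits("-Xmx2g", four_gib, reserve_bytes=1024 ** 3))  # True
```

The first call mirrors the situation described above: an 8 GB maximum heap cannot fit on a 4 GB server.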

@LiQio

LiQio commented Dec 11, 2017

@Schweigi The -Xmx8g makes sense. But concerning the OP problem using iota.partners (and downloading an up-to-date DB) is just a workaround.

@mostaruk

Hello, where would I put this database? Many thanks

@nuriel77
Contributor

@mostaruk that depends on which guide/tutorial you've been following

@mostaruk

@nuriel77
Contributor

There's a section in the FAQ explaining this https://github.com/nuriel77/iri-playbook/wiki/IOTA-Full-Node-Tutorial---Linux#where-can-i-get-a-fully-synced-database-to-help-kick-start-my-node

If you need more help contact me on slack (nuriel77)

@mostaruk

Ahh yes sorry about that. Should have probably read the FAQ! Thanks, I'll look you up on slack.

@ysle

ysle commented Dec 17, 2017

Thanks for all the efforts here, but do we have any official statement from the devs? I mean a long-term solution, such as a hotfix / commit for 1.4.1.3, please?

@ALEX778899 ALEX778899 changed the title FULL SYNC PROBLEM TEMPORARY SOLUTION!!! FULL SYNC PROBLEM SOLUTION!!! Dec 17, 2017
@zenmetsu

zenmetsu commented Dec 25, 2017

I hardly consider the OP's workaround to be viable. Here's the typical experience.
Download DB snapshot (2min - 30min depending on your net speed)
Untar DB snapshot (usually around 20sec-1min)
Start IRI...
wait...
and wait...
and wait...
during this period, which is typically about half an hour, I assume that IRI is processing the previously loaded DB. Connecting to the node will show that it is stuck at milestone 243000. CPU never spikes to above 50% on a powerful system, RAM utilization is acceptable, some frequent GC takes place within jvm, but the time spent in GC isn't excessive... disk utilization (via iostat -x) never exceeds 20% on a decent SSD... network never comes close to full utilization... so WTF is IRI doing during this period, and why TF is it taking so long if nothing in the system appears to be the bottleneck??

After this DB work is finished, the node will finally begin synchronizing to the network. It usually takes a minimum of 40 minutes to get to this point. Allow another hour for synchronization to take place, and you are getting close to 2 hours from start of DB download to point of synchronization. Considering that I'm seeing nodes getting stuck after 4-5 hours of operation, you are looking at an effective 60-70% duty cycle. The OP's solution cannot be considered a workaround, let alone a solution. :(
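
The duty-cycle estimate can be reproduced with a little arithmetic. This sketch assumes one reading of the numbers above: each cycle consists of roughly 2 hours of recovery (download, DB processing, sync) followed by 4 to 5 hours of synced operation. Other readings give different figures.

```python
def duty_cycle(synced_hours, recovery_hours):
    """Fraction of each cycle during which the node is actually synced."""
    return synced_hours / (synced_hours + recovery_hours)


# Assumed reading: about 2 hours of recovery before every synced stretch.
for synced in (4, 5):
    print(f"{synced} h synced, 2 h recovery: {duty_cycle(synced, 2):.0%}")
# 4 h synced, 2 h recovery: 67%
# 5 h synced, 2 h recovery: 71%
```

Under that reading the result lands near the upper end of the 60-70% range quoted above.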

@ALEX778899
Author

ALEX778899 commented Dec 26, 2017

@zenmetsu, the workaround is tested and viable and has helped a lot of new users. Please read the suggestions again; there is an update.

@brunoamancio

Please read #428

@iotasyncbot iotasyncbot changed the title Full Sync issues easy solutions!!! IRI-249 ⁃ Full Sync issues easy solutions!!! Apr 17, 2018
@anyong anyong changed the title IRI-249 ⁃ Full Sync issues easy solutions!!! Full Sync issues easy solutions!!! Apr 22, 2018
@alon-e alon-e closed this as completed Apr 25, 2018