
Repo migration using IPFS #4247

Closed
Stebalien opened this issue Sep 18, 2017 · 12 comments
Labels
help wanted Seeking public contribution on this issue kind/enhancement A net-new feature or improvement to an existing feature status/deferred Conscious decision to pause or backlog

Comments

@Stebalien
Member

Stebalien commented Sep 18, 2017

Currently, we download the repo migration tool using HTTPS from the gateways as necessary. Really, we should dogfood our own tech and use IPFS. The wrinkle is that we currently need to do the repo migration in order to start IPFS (because we need a working repo).

However, there's actually a simple solution to this. We can:

  1. Start IPFS.
  2. Notice that the repo is out-of-date.
  3. Create a new "transition" repo and config (possibly in /tmp, use the same ports for firewall reasons but a new temporary identity as this won't really be the same node).
  4. Fetch the migration tool with IPFS_HOME=tmp_ipfs_repo ipfs get /... (without starting the daemon).
  5. Run the migration tool.
  6. Continue booting the main IPFS daemon.
  7. (optionally) Open the migration repo and copy data out of it and into the main datastore.
  8. Delete the temporary datastore.
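The steps above could be sketched roughly as a shell script. This is a hypothetical dry-run sketch, not the actual implementation: the CID, paths, and migration-tool invocation are placeholders, and `run` only prints commands instead of executing them. (The comment says `IPFS_HOME`; the environment variable go-ipfs actually reads is `IPFS_PATH`.)

```shell
# Dry-run sketch of the transition-repo flow; nothing here actually runs ipfs.
run() { echo "+ $*"; }                  # print instead of executing

TMP_REPO="$(mktemp -d)"                 # step 3: throwaway repo in /tmp
export IPFS_PATH="$TMP_REPO"            # point go-ipfs at the temporary repo
run ipfs init                           # new temporary identity
run ipfs get "/ipfs/<migration-cid>" -o "$TMP_REPO/fs-repo-migrations"  # step 4 (no daemon)
run "$TMP_REPO/fs-repo-migrations" -to latest -y                        # step 5
unset IPFS_PATH                         # step 6: hand control back to the real repo
run rm -rf "$TMP_REPO"                  # step 8: drop the temporary datastore
```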
@whyrusleeping
Member

whyrusleeping commented Sep 18, 2017

We could also create an in-memory repo and, with the (not yet implemented) ipfs get --filestore feature, stream it straight to disk.

@kevina
Contributor

kevina commented Sep 19, 2017

This can be an option, but we should still keep the gateway as a backup. We could ask the user whether they want to retrieve the file over the IPFS network and, when they say no, provide instructions for retrieving the upgrade via the gateway (likely behind a command-line flag).

@Stebalien
Member Author

Stebalien commented Sep 19, 2017

but we should still use the gateway as a backup option

What's the rationale for keeping this option? Providing instructions for manually downloading and running the migration tool in case of failure is reasonable, but otherwise IPFS should work (and if it doesn't, we need to make it work).

@kevina
Contributor

kevina commented Sep 19, 2017

IPFS should work (and if it doesn't, we need to make it work).

I always prefer the more conservative approach. Yes, it should work, except when it doesn't, and in my experience things often don't. I suppose the fully manual approach can be reasonable if it is a very simple process that does exactly what the automatic approach does.

@djdv
Contributor

djdv commented Mar 16, 2018

I agree that it would be nice to use IPFS as the first choice where possible.
However, since the functionality to download over HTTP already exists, I feel it should be kept as a backup and used conditionally: prioritize IPFS itself first, and fall back to HTTP when appropriate.
Maybe we could count peers, providers, or some other metric and fall back when we encounter IPFS problems. In addition, users could set an env var (IPFS_USE_GATEWAY, IPFS_USE_HTTP, or something to that effect) to always prioritize HTTP.

Since manual steps are documented here: https://github.com/ipfs/fs-repo-migrations/blob/master/run.md
we could probably just link to the repo if a message to the user is needed.
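The fallback policy described above could look something like the following sketch. The env var name and the peer threshold are assumptions taken from the comment, not an existing go-ipfs interface:

```shell
# Decide how to fetch the migration: IPFS first, HTTP gateway as fallback.
# IPFS_USE_HTTP and the peer threshold are hypothetical, per the comment above.
fetch_method() {
  local peers="$1"                   # e.g. from `ipfs swarm peers | wc -l`
  if [ "${IPFS_USE_HTTP:-0}" = "1" ]; then
    echo http                        # user explicitly prefers the gateway
  elif [ "$peers" -ge 1 ]; then
    echo ipfs                        # we have peers: try bitswap first
  else
    echo http                        # no peers reachable: fall back
  fi
}
```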

@Kubuxu
Member

Kubuxu commented Mar 16, 2018

What's the rationale for keeping this option?

Nodes on my networks don't really work unless configured for relay due to weird CG-NAT. I have to configure them properly. I might set up a notebook with remote access for someone to test it out if I don't find time to fix it in go-ipfs itself.

@Stebalien
Member Author

Stebalien commented Mar 17, 2018

Fair enough (although we should eventually try to make this work).

Nodes on my networks don't really work unless configured for relay due to weird CG-NAT.

Can they not even establish outbound connections? That's all that's needed in this case (connect to one of our nodes, fetch the migration over bitswap).

@Kubuxu
Member

Kubuxu commented Mar 17, 2018

Can they not even establish outbound connections?

Not always if reuseport is enabled.

@Stebalien
Member Author

Stebalien commented Mar 17, 2018

Not always if reuseport is enabled.

Ah. I forgot about that issue... (libp2p/go-libp2p#1434). We really should find a nice way to fix that.

@Stebalien Stebalien added the help wanted Seeking public contribution on this issue label Jun 27, 2018
@bonedaddy

bonedaddy commented Jun 27, 2018

Say you are upgrading Node A. Before the upgrade starts, a Docker container (Node B) is spun up and the repo contents from Node A are copied over to it, after which the migration on Node A is performed. Once the migration is complete, the repo contents are moved back to Node A, checks are done to ensure everything was moved back successfully, Node B is destroyed, and Node A is brought back online with the migrated repo.

Would a solution like this work? This would mean we avoid hitting the public gateway, consuming zero internet bandwidth and allowing all migrations and traffic to happen over the local network, which should not only be faster than hitting the public gateway but also save users money (albeit a small amount).

In theory this should allow Node A to continue serving requests, only having to be brought down to migrate the data, with Node B serving requests while Node A is down; that should allow for zero-downtime migrations.

@Stebalien
Member Author

Stebalien commented Jun 27, 2018

So, the issue there is that copying repos is slow and requires 2x the disk space. IMO, if you need high availability, you should be using ipfs-cluster. In that case, as long as you have a replication factor > 2, you can simply bring down and migrate each node one-by-one.
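The rolling migration suggested here might look like the following dry-run sketch, assuming an ipfs-cluster with replication factor > 2. Hostnames, the service name, and the migration command are placeholders, and `run` only prints what would be executed:

```shell
# Dry-run sketch: migrate cluster nodes one at a time so the remaining
# replicas keep serving pins. Nothing here executes ssh or ipfs.
run() { echo "+ $*"; }                       # print instead of executing

for node in node-a node-b node-c; do         # placeholder hostnames
  run ssh "$node" "systemctl stop ipfs"      # bring one node down
  run ssh "$node" "fs-repo-migrations -y"    # migrate its repo while down
  run ssh "$node" "systemctl start ipfs"     # bring it back before the next
done
```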

@Stebalien Stebalien added kind/enhancement A net-new feature or improvement to an existing feature status/deferred Conscious decision to pause or backlog labels Aug 22, 2018
@momack2 momack2 added this to Backlog in ipfs/go-ipfs May 9, 2019
@lidel
Member

lidel commented Aug 2, 2022

This work continues in:

@lidel lidel closed this as completed Aug 2, 2022

7 participants