Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: --datadir doesn't respect alternate chain dir organization #19172

Closed
whilei opened this issue Feb 26, 2019 · 8 comments
Closed

bug: --datadir doesn't respect alternate chain dir organization #19172

whilei opened this issue Feb 26, 2019 · 8 comments

Comments

@whilei
Copy link

whilei commented Feb 26, 2019

OS and program version info

  • OS: Linux 4.17.4-041704-generic GNU/Linux
  • Current working directory: /home/ia/go/src/github.com/ethereum/go-ethereum
  • Git commit: 6d652ce
  • ./build/bin/geth version:
Geth
Version: 1.9.0-unstable
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.11.5
Operating System: linux
GOPATH=/home/ia/go
GOROOT=/home/ia/go1.11.5.linux-amd64/go

Reproduce

Setup
+ [[ -f ./build/bin/geth ]]
+ mv ./build/bin/geth ./build/bin/geth.bak
+ make
build/env.sh go run build/ci.go install ./cmd/geth
>>> /home/ia/go1.11.5.linux-amd64/go/bin/go install -v ./cmd/geth
Done building.
Run "/home/ia/go/src/github.com/ethereum/go-ethereum/build/bin/geth" to launch geth.

Run
+ ./build/bin/geth --rinkeby --port 30333 --exec 'admin.datadir;' console
"/home/ia/.ethereum/rinkeby"
+ tree -L 2 /home/ia/.ethereum
/home/ia/.ethereum
└── rinkeby
    ├── geth
    ├── history
    └── keystore

3 directories, 1 file
+ ./build/bin/geth --rinkeby --datadir ./foo --port 30333 --exec 'admin.datadir;' console
"/home/ia/go/src/github.com/ethereum/go-ethereum/foo"
+ tree -L 2 ./foo
./foo
├── geth
│   ├── chaindata
│   ├── LOCK
│   ├── nodekey
│   ├── nodes
│   └── transactions.rlp
├── keystore
└── rinkeby
    └── history

5 directories, 4 files

Teardown
++ [[ -f ./build/bin/geth.bak ]]
++ mv ./build/bin/geth.bak ./build/bin/geth
++ rm -rf ./foo

Actual result

+ ./build/bin/geth --rinkeby --datadir ./foo --port 30333 --exec 'admin.datadir;' console
"/home/ia/go/src/github.com/ethereum/go-ethereum/foo"

Expected result

+ ./build/bin/geth --rinkeby --datadir ./foo --port 30333 --exec 'admin.datadir;' console
"/home/ia/go/src/github.com/ethereum/go-ethereum/foo/rinkeby"
@fjl
Copy link
Contributor

fjl commented Feb 27, 2019

The behavior you're seeing is intentional, but I can understand why it may seem confusing. Let me explain how we ended up with this:

If you are using --datadir, geth will place any files in <datadir>/geth. If you are using multiple chains, it is best to use a separate data directory for each chain.

If you are not using --datadir, a default location is chosen based on the chain flags and the operating system. At some point in git history, we chose to make the default directory for non-mainnet chains a subdirectory of the mainnet location. That was a silly idea, but we've been using this scheme for a long time and nobody ever complained about it.

Unsure what to do now. We could fix it, for OCD's sake, by choosing a default location outside of the mainnet datadir for non-mainnet chains. Some amount of backwards-compatibility code would be required because people who aren't using --datadir expect geth to find their keystore and block database if they previously synced it.

@whilei
Copy link
Author

whilei commented Feb 28, 2019

Gotcha, thanks for the background.

So if I'm understanding correctly, the "weird" part of the behavior is actually that ~/.ethereum/rinkeby is the default, instead of, say, ~/.rinkeby. Although with respect, I'm not sure I would call the issue/fix OCD-driven, since the meaning of 'datadir' can does change arbitrarily and without documentation, which for a user-facing flag that's second from the top of in the list of

ETHEREUM OPTIONS:                                                                                               
  --config value                  TOML configuration file                                              
  --datadir "/home/ia/.ethereum"  Data directory for the databases and keystore                 
  --keystore                      Directory for the keystore (default = inside the datadir)

seems like an important thing to have predictable and consistent. Agreed that backwards compatibility is important to keep, but as I've suggested, 'forwards compatibility' is important too; I found this bug because I nearly nuked 200gb of full archive sync thinking that it was somehow scrap development data because it wasn't where I expected it to be.

I agree that nested <datadir>/<chaindatadir> is awkward. It seems there are two paths toward a solve:

  1. alt chains get their own datadir by default, eg. ~/.rinkeby
  2. alt chains get consistent nesting in --datadir, whether that value is default or not

Both options will need some, probably comparable amount, of backwards-friendly migration logic.

Just for sake of a touchpoint w/r/t backwards datadir compat, here's a place where this kind of logic is handled in the code for earlier "datadir migrations". https://github.com/ethereum/go-ethereum/blob/master/node/config.go#L319-L331


There's a small further nitpick noticeable here with the console history files, too:

./foo
├── geth
│   ├── chaindata
│   ├── LOCK
│   ├── nodekey
│   ├── nodes
│   └── transactions.rlp
├── keystore
└── rinkeby
    └── history

to be consistent (even against the "unexpected" custom datadir/chaindir behavior) I think it should be:

./foo
├── geth
│   ├── chaindata
│   ├── LOCK
│   ├── nodekey
│   ├── nodes
│   └── transactions.rlp
│   └── history
├── keystore
    

Maybe this is worth spinning off into its own issue?

@fjl
Copy link
Contributor

fjl commented Mar 6, 2019

Yes, you're right, the geth history file belongs in the geth directory. I think the best way forward would be to define alternative locations outside the regular datadir for chain flags, e.g.

  • ~/.ethereum-rinkeby on linux and bsds
  • ~/Library/Ethereum-rinkeby on darwin
  • %LOCALAPPDATA%\Ethereum-rinkeby on windows

@whilei
Copy link
Author

whilei commented Mar 7, 2019

That's fine with me.

As referenced above, it appears that using "old resource paths" are handled like this:

c.warnOnce(&c.oldGethResourceWarning, "Using deprecated resource file %s, please move this file to the 'geth' subdirectory of datadir.", oldpath)

Will a log warning again be sufficient for this change?

@stale
Copy link

stale bot commented Mar 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@holiman
Copy link
Contributor

holiman commented Apr 16, 2020

The datadir is, nowadays, an implicit API surface, and various tools may depend on the organization of where data is stored. If we change that now, it might cause problems for a lot of users, so we decided to just leave it as is, since the added value of changing it is not deemed worth the problems it may cause.

@FriendlyLifeguard
Copy link

Hello. So in conclusion, we can simply ignore the warning and it will not have a negative impact during the syncing process? Newcomer using geth + prysm.

@fjl
Copy link
Contributor

fjl commented Jan 22, 2023

This warning does not impact your syncing process, correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants