Dump contains duplicated data #10365

Closed
2 of 7 tasks
PhilippHomann opened this issue Feb 19, 2020 · 4 comments
Labels
issue/needs-feedback For bugs, we need more details. For features, the feature must be described in more detail

Comments

@PhilippHomann
Contributor

  • Gitea version (or commit ref): v1.11.1
  • Git version: 2.24.1
  • Operating system: Docker Container
  • Database (use [x]):
    • PostgreSQL
    • MySQL
    • MSSQL
    • SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • Yes (provide example URL)
    • No
    • Not relevant
  • Log gist:

Description

It seems that much of the data inside a dump created using gitea dump is duplicated.

Steps to reproduce

  1. Run docker container: docker run --name=gitea -p 3000:3000 -ti --rm gitea/gitea:1
  2. Run initial setup without any configuration change
  3. Run dump command
    docker exec -ti gitea sh
    su git
    gitea dump
  4. Examine dump file
Archive:  gitea-dump-1582113818.zip
Zip file size: 66623 bytes, number of entries: 53
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 custom/
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 custom/conf/
-rw-r--r--  2.0 unx     2069 bX defN 20-Feb-19 13:03 custom/conf/app.ini
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 custom/indexers/
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 custom/indexers/issues.bleve/
-rw-r--r--  2.0 unx       13 bX defN 20-Feb-19 13:03 custom/indexers/issues.bleve/rupture_meta.json
-rw-------  2.0 unx    32768 bX defN 20-Feb-19 13:03 custom/indexers/issues.bleve/store
-rw-r--r--  2.0 unx       47 bX defN 20-Feb-19 13:03 custom/indexers/issues.bleve/index_meta.json
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 custom/log/
-rw-r-----  2.0 unx    51734 bX defN 20-Feb-19 13:03 custom/log/gitea.log
-rw-r--r--  2.0 unx  1110016 bX defN 20-Feb-19 13:03 custom/gitea.db
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 custom/queues/
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 custom/queues/issue_indexer/
-rw-r--r--  2.0 unx       54 bX defN 20-Feb-19 13:03 custom/queues/issue_indexer/MANIFEST-000000
-rw-r--r--  2.0 unx       67 bX defN 20-Feb-19 13:03 custom/queues/issue_indexer/000001.log
-rw-r--r--  2.0 unx        0 bX defN 20-Feb-19 13:03 custom/queues/issue_indexer/LOCK
-rw-r--r--  2.0 unx       16 bX defN 20-Feb-19 13:03 custom/queues/issue_indexer/CURRENT
-rw-r--r--  2.0 unx      360 bX defN 20-Feb-19 13:03 custom/queues/issue_indexer/LOG
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 custom/queues/task/
-rw-r--r--  2.0 unx       54 bX defN 20-Feb-19 13:03 custom/queues/task/MANIFEST-000000
-rw-r--r--  2.0 unx       67 bX defN 20-Feb-19 13:03 custom/queues/task/000001.log
-rw-r--r--  2.0 unx        0 bX defN 20-Feb-19 13:03 custom/queues/task/LOCK
-rw-r--r--  2.0 unx       16 bX defN 20-Feb-19 13:03 custom/queues/task/CURRENT
-rw-r--r--  2.0 unx      358 bX defN 20-Feb-19 13:03 custom/queues/task/LOG
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 data/
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 data/conf/
-rw-r--r--  2.0 unx     2069 bX defN 20-Feb-19 13:03 data/conf/app.ini
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 data/indexers/
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 data/indexers/issues.bleve/
-rw-r--r--  2.0 unx       13 bX defN 20-Feb-19 13:03 data/indexers/issues.bleve/rupture_meta.json
-rw-------  2.0 unx    32768 bX defN 20-Feb-19 13:03 data/indexers/issues.bleve/store
-rw-r--r--  2.0 unx       47 bX defN 20-Feb-19 13:03 data/indexers/issues.bleve/index_meta.json
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 data/log/
-rw-r-----  2.0 unx    51734 bX defN 20-Feb-19 13:03 data/log/gitea.log
-rw-r--r--  2.0 unx  1110016 bX defN 20-Feb-19 13:03 data/gitea.db
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 data/queues/
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 data/queues/issue_indexer/
-rw-r--r--  2.0 unx       54 bX defN 20-Feb-19 13:03 data/queues/issue_indexer/MANIFEST-000000
-rw-r--r--  2.0 unx       67 bX defN 20-Feb-19 13:03 data/queues/issue_indexer/000001.log
-rw-r--r--  2.0 unx        0 bX defN 20-Feb-19 13:03 data/queues/issue_indexer/LOCK
-rw-r--r--  2.0 unx       16 bX defN 20-Feb-19 13:03 data/queues/issue_indexer/CURRENT
-rw-r--r--  2.0 unx      360 bX defN 20-Feb-19 13:03 data/queues/issue_indexer/LOG
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 data/queues/task/
-rw-r--r--  2.0 unx       54 bX defN 20-Feb-19 13:03 data/queues/task/MANIFEST-000000
-rw-r--r--  2.0 unx       67 bX defN 20-Feb-19 13:03 data/queues/task/000001.log
-rw-r--r--  2.0 unx        0 bX defN 20-Feb-19 13:03 data/queues/task/LOCK
-rw-r--r--  2.0 unx       16 bX defN 20-Feb-19 13:03 data/queues/task/CURRENT
-rw-r--r--  2.0 unx      358 bX defN 20-Feb-19 13:03 data/queues/task/LOG
drwxr-xr-x  2.0 unx        0 bx stor 20-Feb-19 13:03 log/
-rw-r-----  2.0 unx    51734 bX defN 20-Feb-19 13:03 log/gitea.log
-rw-r--r--  2.0 unx    32618 bX defN 20-Feb-19 13:03 gitea-db.sql
-rw-r--r--  2.0 unx     2069 bX defN 20-Feb-19 13:03 app.ini
-rw-r--r--  2.0 unx      142 bX defN 20-Feb-19 13:03 gitea-repo.zip
53 files, 2481841 bytes uncompressed, 58531 bytes compressed:  97.7%

It looks to me like the data and the custom directories contain the same data.
The log files (which can be quite large) are also dumped a third time, and so is app.ini.
Is this expected behaviour?
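
To check an archive for this kind of duplication without comparing the listing by eye, the entries can be grouped by a hash of their content. The following is a minimal Python sketch, not part of Gitea; the function name find_duplicate_entries is hypothetical, and it reads each entry fully into memory, which is fine for small dumps but may be slow for large ones.

```python
import hashlib
import zipfile
from collections import defaultdict


def find_duplicate_entries(zip_path):
    """Group non-empty file entries of a zip archive by the SHA-256 of their content.

    Returns a sorted list of name lists, one list per group of entries
    that share identical content under different paths.
    """
    by_digest = defaultdict(list)
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Skip directories and empty files: every empty file (such as
            # the LOCK files) would otherwise land in the same group.
            if info.is_dir() or info.file_size == 0:
                continue
            digest = hashlib.sha256(archive.read(info)).hexdigest()
            by_digest[digest].append(info.filename)
    groups = [sorted(names) for names in by_digest.values() if len(names) > 1]
    return sorted(groups)
```

Calling find_duplicate_entries("gitea-dump-1582113818.zip") on a dump like the one listed above should report each file that appears under custom/ together with its copy under data/, plus any top-level copies with identical content.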

@lunny
Member

lunny commented Feb 19, 2020

Could you check whether those duplicated files actually exist inside your Docker container?

@lunny lunny added the issue/needs-feedback For bugs, we need more details. For features, the feature must be described in more detail label Feb 19, 2020
@PhilippHomann
Contributor Author

PhilippHomann commented Feb 19, 2020

The folder structure after initial setup:

bash-5.0# find /data/gitea
/data/gitea
/data/gitea/conf
/data/gitea/conf/app.ini
/data/gitea/indexers
/data/gitea/indexers/issues.bleve
/data/gitea/indexers/issues.bleve/rupture_meta.json
/data/gitea/indexers/issues.bleve/store
/data/gitea/indexers/issues.bleve/index_meta.json
/data/gitea/log
/data/gitea/log/gitea.log
/data/gitea/gitea.db
/data/gitea/queues
/data/gitea/queues/issue_indexer
/data/gitea/queues/issue_indexer/MANIFEST-000000
/data/gitea/queues/issue_indexer/000001.log
/data/gitea/queues/issue_indexer/LOCK
/data/gitea/queues/issue_indexer/CURRENT
/data/gitea/queues/issue_indexer/LOG
/data/gitea/queues/task
/data/gitea/queues/task/MANIFEST-000000
/data/gitea/queues/task/000001.log
/data/gitea/queues/task/LOCK
/data/gitea/queues/task/CURRENT
/data/gitea/queues/task/LOG

I took a look at cmd/dump.go, and it seems that the custom path and the data path are backed up separately.
In the Docker image, however, the default custom path (set by GITEA_CUSTOM) is /data/gitea, which is the same as the APP_DATA_PATH generated by the Docker image, so the same directory is archived twice.

The log path is also backed up explicitly, even when it is below APP_DATA_PATH.
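
One way to avoid the overlap described above would be to skip any path that is equal to, or located below, a directory that is already being archived. The following is a minimal Python sketch of that check under stated assumptions: it is not the Go code in cmd/dump.go, the function names is_same_or_below and paths_to_dump are hypothetical, and the comparison is purely lexical (symlinks and ".." segments are not resolved).

```python
from pathlib import PurePosixPath


def is_same_or_below(path, parent):
    """Return True if `path` equals `parent` or lies inside it (lexical check only)."""
    path, parent = PurePosixPath(path), PurePosixPath(parent)
    return path == parent or parent in path.parents


def paths_to_dump(app_data_path, custom_path, log_path):
    """Return the directories to archive, dropping any already covered by an earlier one."""
    selected = [app_data_path]
    for extra in (custom_path, log_path):
        # Only add a path if no directory already selected contains it.
        if not any(is_same_or_below(extra, kept) for kept in selected):
            selected.append(extra)
    return selected
```

With the Docker layout shown above, paths_to_dump("/data/gitea", "/data/gitea", "/data/gitea/log") keeps only /data/gitea, so each file would be archived once. Comparing whole path components rather than string prefixes also means that a sibling such as /data/gitea-other is not mistaken for a subdirectory of /data/gitea.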

@stale

stale bot commented Apr 19, 2020

This issue has been automatically marked as stale because it has not had recent activity. I am here to help clear issues left open even if solved or waiting for more insight. This issue will be closed if no further activity occurs during the next 2 weeks. If the issue is still valid just add a comment to keep it alive. Thank you for your contributions.

@stale stale bot added the issue/stale label Apr 19, 2020
@PhilippHomann
Contributor Author

@lunny Could you please approve #10376, which fixes this?

@stale stale bot removed the issue/stale label Apr 20, 2020
ydelafollye pushed a commit to ydelafollye/gitea that referenced this issue Jul 31, 2020
* Dump: Use mholt/archive/v3 to support tar including many compressions

Signed-off-by: Philipp Homann <homann.philipp@googlemail.com>

* Dump: Allow dump output to stdout

Signed-off-by: Philipp Homann <homann.philipp@googlemail.com>

* Dump: Fixed bug present since go-gitea#6677 where SessionConfig.Provider is never "file"

Signed-off-by: Philipp Homann <homann.philipp@googlemail.com>

* Dump: never pack RepoRootPath, LFS.ContentPath and LogRootPath when they are below AppDataPath

Signed-off-by: Philipp Homann <homann.philipp@googlemail.com>

* Dump: also dump LFS (fixes go-gitea#10058)

Signed-off-by: Philipp Homann <homann.philipp@googlemail.com>

* Dump: never dump CustomPath if CustomPath is a subdir of or equal to AppDataPath (fixes go-gitea#10365)

Signed-off-by: Philipp Homann <homann.philipp@googlemail.com>

* Use log.Info instead of fmt.Fprintf

Signed-off-by: Philipp Homann <homann.philipp@googlemail.com>

* import ordering

* make fmt

Co-authored-by: zeripath <art27@cantab.net>
Co-authored-by: techknowlogick <techknowlogick@gitea.io>
Co-authored-by: Matti R <matti@mdranta.net>
@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020