feat: Add simple data import and export #1630

@fredcarle (Collaborator) commented Jul 12, 2023

Relevant issue(s)

Resolves #1544

Description

This PR adds import and export functionality to the HTTP API and CLI. It can export to JSON to reduce the potential file size. At this stage, CSV output is not implemented, as it would require extensive type casting (everything in CSV is a string) both when writing to and when reading from the CSV file.
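
To illustrate the type-casting concern, here is a minimal Go sketch. It is not code from this PR: the helper names `jsonTypes` and `csvTypes` and the sample document are hypothetical. It round-trips the same fields through `encoding/json` and `encoding/csv` and reports the Go type each value comes back as.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"fmt"
)

// jsonTypes (hypothetical helper) round-trips doc through JSON and reports
// the Go type each field decodes to when no schema is supplied.
func jsonTypes(doc map[string]any) (map[string]string, error) {
	raw, err := json.Marshal(doc)
	if err != nil {
		return nil, err
	}
	var decoded map[string]any
	if err := json.Unmarshal(raw, &decoded); err != nil {
		return nil, err
	}
	types := make(map[string]string, len(decoded))
	for name, value := range decoded {
		types[name] = fmt.Sprintf("%T", value)
	}
	return types, nil
}

// csvTypes (hypothetical helper) round-trips the same fields through CSV.
// Every value must be stringified before writing, and every cell is read
// back as a string, so the original types have to be restored by hand.
func csvTypes(fields []string, doc map[string]any) (map[string]string, error) {
	row := make([]string, len(fields))
	for i, name := range fields {
		row[i] = fmt.Sprint(doc[name]) // cast to string on write
	}
	var buf bytes.Buffer
	if err := csv.NewWriter(&buf).WriteAll([][]string{fields, row}); err != nil {
		return nil, err
	}
	records, err := csv.NewReader(&buf).ReadAll()
	if err != nil {
		return nil, err
	}
	types := make(map[string]string, len(fields))
	for i, name := range records[0] {
		types[name] = fmt.Sprintf("%T", records[1][i])
	}
	return types, nil
}

func main() {
	// Sample document invented for illustration only.
	doc := map[string]any{"name": "Alice", "age": 31, "verified": true}
	fields := []string{"name", "age", "verified"}

	jt, err := jsonTypes(doc)
	if err != nil {
		panic(err)
	}
	ct, err := csvTypes(fields, doc)
	if err != nil {
		panic(err)
	}
	for _, name := range fields {
		fmt.Printf("%-8s json=%-7s csv=%s\n", name, jt[name], ct[name])
	}
}
```

JSON keeps booleans and strings distinct from numbers, although Go's `encoding/json` decodes untyped numbers as `float64`, so some schema-aware conversion may still be needed on import. With CSV, every field needs an explicit conversion in both directions.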

Tasks

  • I made sure the code is well commented, particularly hard-to-understand areas.
  • I made sure the repository-held documentation is changed accordingly.
  • I made sure the pull request title adheres to the conventional commit style (the subset used in the project can be found in tools/configs/chglog/config.yml).
  • I made sure to discuss its limitations such as threats to validity, vulnerability to mistake and misuse, robustness to invalidation of assumptions, resource requirements, ...

How has this been tested?

make test and manual testing

Specify the platform(s) on which this was tested:

  • (modify the list accordingly)
  • MacOS

@fredcarle added the feature (New feature or request), area/api (Related to the external API component), and area/cli (Related to the CLI binary) labels Jul 12, 2023
@fredcarle requested a review from a team July 12, 2023 19:19
@fredcarle self-assigned this Jul 12, 2023
@fredcarle force-pushed the fredcarle/feature/I1544-simple-import-export branch from 6a710f3 to cb93a0a on July 12, 2023 22:08

codecov bot commented Jul 12, 2023

Codecov Report

Patch coverage: 67.36% and project coverage change: -0.19 ⚠️

Comparison is base (6d896ba) 75.46% compared to head (661654e) 75.27%.

@@             Coverage Diff             @@
##           develop    #1630      +/-   ##
===========================================
- Coverage    75.46%   75.27%   -0.19%     
===========================================
  Files          203      208       +5     
  Lines        21092    21694     +602     
===========================================
+ Hits         15916    16330     +414     
- Misses        4082     4221     +139     
- Partials      1094     1143      +49     
Flag Coverage Δ
all-tests 75.27% <67.36%> (-0.19%) ⬇️

Impacted Files Coverage Δ
api/http/errors.go 100.00% <ø> (ø)
cli/dump.go 11.11% <0.00%> (ø)
cli/peerid.go 46.03% <0.00%> (ø)
cli/ping.go 38.64% <0.00%> (ø)
cli/request.go 40.37% <0.00%> (ø)
cli/schema_add.go 30.51% <0.00%> (ø)
cli/schema_migration_get.go 38.33% <0.00%> (ø)
cli/schema_migration_set.go 54.14% <0.00%> (ø)
db/errors.go 74.74% <17.14%> (-13.01%) ⬇️
cli/errors.go 25.64% <22.22%> (+0.64%) ⬆️
... and 13 more

... and 7 files with indirect coverage changes

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@fredcarle force-pushed the fredcarle/feature/I1544-simple-import-export branch from cb93a0a to 0e1ee1b on July 13, 2023 13:38

@AndrewSisley (Contributor) left a comment

I think the documentation at least needs to change, documenting the memory cost of this operation, if the team is happy with the current implementation in the short-term.

@shahzadlone (Member) left a comment

This is a nice first step, and I think it provides quality-of-life improvements for users.

I do have a slight nitpick: I prefer the terminology backup/recover over import/export, but that might be just me :P

suggestion: It would be nice to include a simple CLI usage example in README.md.

@fredcarle added this to the DefraDB v0.6 milestone Jul 17, 2023
@fredcarle force-pushed the fredcarle/feature/I1544-simple-import-export branch from 0e1ee1b to 5577321 on July 19, 2023 04:36

@fredcarle (Collaborator, Author) commented

Note: The tests will fail as I didn't update them for the most recent changes. I want to ensure that we agree with the approach before I put time into the tests which I'll do as soon as consensus is reached.

The change moves the handling of the import and export to the db package, as it is a db-specific action. It can be done programmatically or via the CLI. The new approach builds the export file document by document and reads the import file document by document, so memory usage should remain quite low. The obvious downside is that we will only support JSON import/export for now, but I feel that is less important than the reduction in memory usage.

Let me know what you guys think :)
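
The document-by-document approach described above can be sketched in Go roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the `Doc` type, the function names, and the flat JSON-array file layout are all hypothetical. The export side writes one marshalled document at a time, and the import side uses `json.Decoder` with `Token` and `More` so that only one document is held in memory at once.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Doc is a hypothetical stand-in for a stored document.
type Doc map[string]any

// exportDocs writes documents as a JSON array, one document at a time.
// next returns false once there are no more documents.
func exportDocs(w io.Writer, next func() (Doc, bool)) error {
	if _, err := io.WriteString(w, "["); err != nil {
		return err
	}
	first := true
	for {
		doc, ok := next()
		if !ok {
			break
		}
		if !first {
			if _, err := io.WriteString(w, ","); err != nil {
				return err
			}
		}
		first = false
		raw, err := json.Marshal(doc)
		if err != nil {
			return err
		}
		if _, err := w.Write(raw); err != nil {
			return err
		}
	}
	_, err := io.WriteString(w, "]")
	return err
}

// importDocs reads a JSON array and hands each document to handle as soon
// as it is decoded, without loading the whole file into memory.
func importDocs(r io.Reader, handle func(Doc) error) error {
	dec := json.NewDecoder(r)
	tok, err := dec.Token() // consume the opening '['
	if err != nil {
		return err
	}
	if d, ok := tok.(json.Delim); !ok || d != '[' {
		return fmt.Errorf("expected a JSON array, got %v", tok)
	}
	for dec.More() {
		var doc Doc
		if err := dec.Decode(&doc); err != nil {
			return err
		}
		if err := handle(doc); err != nil {
			return err
		}
	}
	_, err = dec.Token() // consume the closing ']'
	return err
}

// roundTrip exports docs to a string, imports them again, and returns the
// serialized form together with the number of documents read back.
func roundTrip(docs []Doc) (string, int, error) {
	var sb strings.Builder
	i := 0
	err := exportDocs(&sb, func() (Doc, bool) {
		if i >= len(docs) {
			return nil, false
		}
		d := docs[i]
		i++
		return d, true
	})
	if err != nil {
		return "", 0, err
	}
	n := 0
	err = importDocs(strings.NewReader(sb.String()), func(Doc) error {
		n++
		return nil
	})
	return sb.String(), n, err
}

func main() {
	out, n, err := roundTrip([]Doc{{"name": "Alice"}, {"name": "Bob"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out, n)
}
```

In a real export the `next` callback would be backed by a collection iterator and `w` by a file, so peak memory is bounded by the largest single document rather than by the size of the database.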

@fredcarle force-pushed the fredcarle/feature/I1544-simple-import-export branch from 5577321 to 97f4a2c on July 20, 2023 04:09
@fredcarle requested a review from a team July 20, 2023 04:24
@fredcarle force-pushed the fredcarle/feature/I1544-simple-import-export branch from e6a0c1a to cc38c53 on July 20, 2023 04:27

@AndrewSisley (Contributor) left a comment

Looks good, but I'd like a few more changes before merge please. I'm only part way through review, but thought I'd submit what I have now and let you get cracking.

@fredcarle force-pushed the fredcarle/feature/I1544-simple-import-export branch 2 times, most recently from 9d06362 to 2307340 on July 21, 2023 03:47
@fredcarle force-pushed the fredcarle/feature/I1544-simple-import-export branch from 513d731 to 61ef8ce on July 22, 2023 03:23
@fredcarle force-pushed the fredcarle/feature/I1544-simple-import-export branch from dcc6b04 to 27e1e6b on July 22, 2023 04:55

README.md
@@ -387,6 +389,25 @@ defradb start --allowed-origins=http://localhost:3000

The catch-all `*` is also a valid origin.

## Backing up and restoring

Member

thought: I like this backing up and restoring analogy better :), but I wonder whether we should be consistent, as we currently have files with the following names:

  • backup_export.go
  • backup_import.go

Maybe instead can be:

  • backup.go
  • restore.go

Collaborator Author

The way I went about it is: it's the backup feature and it can export and import data. I wanted a category of actions.

Member

Yep, I get that; I feel like backup is being used as an umbrella term. IMO the commands will be simpler too if we can pick one concise word for each action, as pointed out below.

## Backing up and restoring

It is currently not possible to do a full backup of DefraDB that includes the history of changes through the Merkle DAG. However, DefraDB currently supports a simple backup of the current data state in JSON format that can be used to seed a database or help with transitioning from one DefraDB version to another.

Member

thought: If the above thought is expanded then we can have the cli be:

defradb client backup path/to/backup.json
defradb client restore path/to/backup.json

Instead of:

defradb client backup export path/to/backup.json
defradb client backup import path/to/backup.json

Collaborator Author

I had

defradb client export path/to/backup.json
defradb client import path/to/backup.json

before and it was suggested that grouping them under a command / api path would be clearer.

Member

Hmm, I do like the one with fewer words to type. But if a consensus has already been reached, then I will just follow that (this is also a non-blocking thought).

@fredcarle merged commit bc8ada9 into sourcenetwork:develop Jul 22, 2023
11 checks passed
@fredcarle deleted the fredcarle/feature/I1544-simple-import-export branch July 22, 2023 05:32
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
Development

Successfully merging this pull request may close these issues.

Simple version migration import/export
4 participants