Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new storage backend: Dropbox (#103) #251

Merged
merged 49 commits into from
Aug 24, 2023
Merged

Add new storage backend: Dropbox (#103) #251

merged 49 commits into from
Aug 24, 2023

Conversation

MaxJa4
Copy link
Contributor

@MaxJa4 MaxJa4 commented Aug 20, 2023

Implementation of the official unofficial Dropbox API (Repo).
Partially addresses #103.

Implementation notes

  • The Dropbox API only allows file uploads <150 MB
  • Therefore, by default, a upload session is used
  • Files get uploaded in 148 MB chunks (must be multiple of 4 MB)

This is fully transparent for the user and the rest of DVB. Files of up to 350GB are supported by the API this way.

Usage notes

  • It is required to create an app in Dropbox. It is also important to set the DROPBOX_REMOTE_PATH to the app's subfolder, where the app has access to (scope)... unless when creating the app, one chooses to give the app full access to everything.
  • After adding the files modification permissions to the app scope, a new token (for DROPBOX_TOKEN) must be generated manually in the Dropbox app console (website).

To be discussed

Testing this automatically is kind of an issue... for webdav/s3 etc one can simply spin up a mock service, but afaik not for Dropbox. This would require a live Dropbox account, which is kind of messy and potentially insecure.
During manual testing I verified the functionality of the file upload as well as pruning.
--> PR as draft, until this issue is addressed/discussed.

Todo

  • Add example and explanation for Dropbox usage in README.md
  • Add mock server for testing

@m90
Copy link
Member

m90 commented Aug 20, 2023

Thanks for working on this. As you already said I also think working against live infrastructure in tests should be avoided.

The nicest thing to have would be a mock server to work against in the tests (like it's done for Azure Blob Storage). If this does not exist, I'm not entirely sure what to do. There seems to be this: https://github.com/jgonera/fake_dropbox but it looks pretty outdated. I wonder if it would be worth searching GitHub for how others do it, as we can't be the only ones who want to test Dropbox integrations. E.g. what does rclone do?

@MaxJa4
Copy link
Contributor Author

MaxJa4 commented Aug 20, 2023

rclone seems to use a virtual file system (Code) for many of their storage backends, which seems quite massive and complex for this.
fake_dropbox seems indeed very outdated and doesn't implement the v2 API version (v1 is partially deprecated).
There is also no kind of dummy account or offline testing for Dropbox (Source 1, Source 2).

One solution might be to write a small simple mock rest server which replies with the expected responses and errors (restricted to MkDir, DelFile, ListDir, UploadFile).

@m90
Copy link
Member

m90 commented Aug 20, 2023

I wonder if we could use something like this https://github.com/stomakun/DropboxSwagger to automatically generate a mock server?

@m90
Copy link
Member

m90 commented Aug 21, 2023

There also seems to be a Dropbox published spec for their API https://github.com/dropbox/dropbox-api-spec but it's using some esoteric homegrown format for defining it. Not sure how much tooling there is around it and what it would take to generate a mock server from the spec.

cmd/backup/script.go Outdated Show resolved Hide resolved
internal/storage/dropbox/dropbox.go Outdated Show resolved Hide resolved
internal/storage/dropbox/dropbox.go Show resolved Hide resolved
@MaxJa4
Copy link
Contributor Author

MaxJa4 commented Aug 21, 2023

I wonder if we could use something like this https://github.com/stomakun/DropboxSwagger to automatically generate a mock server?

Using this with go-swagger could get us a mock server rather quickly. I will give it a try, since the stone format is rather arbitrairy and doesn't have a direct link to Go... only via JSON, which is already done for us by the DropboxSwagger repo you found.

@m90
Copy link
Member

m90 commented Aug 21, 2023

Using this with go-swagger could get us a mock server rather quickly. I will give it a try, since the stone format is rather arbitrairy and doesn't have a direct link to Go... only via JSON, which is already done for us by the DropboxSwagger repo you found.

If you wanna give this a try and see if we can get a test server running, it'd be much appreciated.

@MaxJa4
Copy link
Contributor Author

MaxJa4 commented Aug 21, 2023

Stumbled upon an auth issue: The used auth technique via token is kinda bad. The token periodically (24h?) expires, so that's no good for an automatic backup solution. OAuth2 would open (Docs) a browser window to allow the connection... every time? That's no good for a docker setup either.

Update: The user needs to get an auth-token once manually via a specific URL in the browser. After that, they need to send this to an API endpoint to retrieve a refresh-token. With that, we can request new short-lived tokens to use for authorization every time we need one. (Source)
Already tested it manually, works.
Quite the tedious setup... but seems like the only option.

@m90
Copy link
Member

m90 commented Aug 21, 2023

Is there a way this can be done outside of this tool, e.g. by using curl or similar and we could just describe how it works in the README? After that people could provide the auth token to the container as before.

@MaxJa4
Copy link
Contributor Author

MaxJa4 commented Aug 21, 2023

The access token (short-lived, only a couple hours) could be retrieved via curl, but it is almost the same hassle and workflow as when getting the refresh token (long-lived, no expiry) - which can be used to get infinite amounts of fresh access tokens.
I added the OAuth2 mechanism for that and created a setup manual which describes everything. Feel free to give it a quick read, whether it is easy to understand.

@MaxJa4
Copy link
Contributor Author

MaxJa4 commented Aug 21, 2023

I had a look at using the Swagger spec plus Swagger to generate a go-server that mocks Dropbox.
While doing so, I had to fix some issues with provided Dropbox Swagger spec, otherwise it wouldn't compile.
It created almost 400 files (it's a big API)... it feels like that would make this repo quite messy. I could remove all unused routes, so it's less new files, but due to the various Arg and Error types, most of them are all somewhat connected.

Seems to me like implementing this for testing is more effort than the feature itself. I will see if I can find an alternative, unless you have an idea in the meanwhile.

@m90
Copy link
Member

m90 commented Aug 21, 2023

Another approach to use here would be to create a server that plays back a session you record from the live API once, but I'm not entirely sure how well thought out such an approach is when dealing with integration tests, instead of unit tests.

A tool that does this for unit tests would be https://github.com/dnaeon/go-vcr for example

@MaxJa4
Copy link
Contributor Author

MaxJa4 commented Aug 21, 2023

Stumbled upon that too. For testing purposes, the Dropbox SDK exposes the http client to be overwritten. So it should be technically possible to implement this. I will have a further look soon.

@m90
Copy link
Member

m90 commented Aug 22, 2023

I was actually thinking about finding something that works on integration test level (as everything else would require setting up another test mechanism in here):

  • the mock server is running in a docker container and is able to either:
    • record a session against the live dropbox endpoint
    • playback a session that has previously been recorded

This way, the tests for this PR could follow the established setup.

@MaxJa4
Copy link
Contributor Author

MaxJa4 commented Aug 22, 2023

Found something that looks promising:
https://github.com/hiredscorelabs/cornell

Will test that later.

@MaxJa4
Copy link
Contributor Author

MaxJa4 commented Aug 22, 2023

I had a look at probably a dozen mock/replay/record server tools. Many were net suitable in regards to their deployment or features. Cornell was dropping headers (-> errors) and others were passing mock data which were not suitable for the supplied Swagger spec (-> errors).

The solution I came up with is now using openapi-mock (Repo) which has a docker container. It can utilize provided example responses which are provided in the Swagger spec (added these manually to the user_v2.yaml, which I had to conver from Swagger 2.0 JSON to Swagger 3.0 YAML).

Downside of all this is, that we can't really upload anything to the mock server and therefore cannot check, if the file was correctly uploaded. But that would mostly test the API and the Dropbox library anyway, which is not our concern.

Let's see if I can get it running not only on my machine (where it worked fine) but also in the CI.

@MaxJa4
Copy link
Contributor Author

MaxJa4 commented Aug 22, 2023

minio Pulling 
 minio Error 
Error response from daemon: received unexpected HTTP status: 503 Service Unavailable

Well, Docker Hub seems to be unstable today?

@MaxJa4
Copy link
Contributor Author

MaxJa4 commented Aug 23, 2023

Well, after 3h of trying the oauth2 mock server seems to be still allergic to Github CI... locally everything worked from the beginning like a charm, but in the CI the error 'server misbehaving' coming from the internal Docker DNS server is really weird.
Feel free the experiment yourself, I tried everything that I could come up with including Google.
Didn't find a promising alternative to that specific OAuth2 mock server either.

@m90
Copy link
Member

m90 commented Aug 24, 2023

Thanks again. Could you maybe try to describe what the CI only problem with the OAuth Server is, so I can start investigating?

@MaxJa4
Copy link
Contributor Author

MaxJa4 commented Aug 24, 2023

The backup container fails to communicate with the OAuth2 mock container.
The error message is dial tcp: lookup oauth2_mock on 127.0.0.11:53: server misbehaving where 127.0.0.11:53 is Docker's DNS service.
I tried fixed IPs to eliminate the DNS issue, but that didn't work. Tried the workarounds stated in the OAuth2 mock server repo located here.
The Swagger Mock server is referenced exactly the same way but never had issues... so I suspect that the OAuth2 mock server does something different internally.
url.JoinPath(opts.OAuth2Endpoint, "oauth2/token") doesn't seem to do anything bad either, as I temporarily printed the result of it out in the CI.
Locally, everything works (frustratingly).

@m90
Copy link
Member

m90 commented Aug 24, 2023

dial tcp: lookup oauth2_mock on 127.0.0.11:53: server misbehaving

I've recently seen this in another context where it was caused by the target service not being up and therefore unreachable, even though depends_on was configured. I'll see if I can make any sense of it around here.

@m90
Copy link
Member

m90 commented Aug 24, 2023

So I could not get the compose setup from the dropbox test case running locally and tried to figure out why: it occured to me the service would never start, thus its hostname would also never resolve.

I fixed the configuration according to the documentation here: https://github.com/navikt/mock-oauth2-server and got the service starting / the tests passing.

You can see what I have done over in this PR: #257

It seems you were passing the expected response as the config file instead of the configuration values. Do you still want to adjust this in here so we can merge the PR and cut a new release?

@MaxJa4
Copy link
Contributor Author

MaxJa4 commented Aug 24, 2023

Oh... well then I wonder how it could work on my machine. But I also saw in your commit/PR, that the file extension was wrong (I used yaml, but the contents were JSON). Maybe that led to that weird behavior.
Anyway, glad it works now! Thanks for helping here. I applied your changes to this PR.

Copy link
Member

@m90 m90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great now, thank you very much, especially for going all these extra miles on the testing topic. I'll try to release this very soon, but I'll leave it unmerged for now as for not to promise anything in the docs that isn't there yet.

@m90 m90 merged commit e08a330 into offen:main Aug 24, 2023
1 check passed
@m90
Copy link
Member

m90 commented Aug 24, 2023

This is now released in v2.30.0 🎉

@MaxJa4 MaxJa4 deleted the dropbox-storage-backend branch August 24, 2023 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants