Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions dvc/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -184,6 +184,8 @@ class RelPath(str):
},
"http": {**HTTP_COMMON, **REMOTE_COMMON},
"https": {**HTTP_COMMON, **REMOTE_COMMON},
"webdav": {**HTTP_COMMON, **REMOTE_COMMON},
"webdavs": {**HTTP_COMMON, **REMOTE_COMMON},
"remote": {str: object}, # Any of the above options are valid
}
)
Expand Down
25 changes: 24 additions & 1 deletion dvc/path_info.py
Original file line number Diff line number Diff line change
Expand Up @@ -104,7 +104,14 @@ def __repr__(self):


class URLInfo(_BasePath):
DEFAULT_PORTS = {"http": 80, "https": 443, "ssh": 22, "hdfs": 0}
DEFAULT_PORTS = {
"http": 80,
"https": 443,
"ssh": 22,
"hdfs": 0,
"webdav": 80,
"webdavs": 443,
}

def __init__(self, url):
p = urlparse(url)
Expand Down Expand Up @@ -312,3 +319,19 @@ def __eq__(self, other):
and self._path == other._path
and self._extra_parts == other._extra_parts
)


class WebdavURLInfo(HTTPURLInfo):
def __init__(self, url):
super().__init__(url)

@cached_property
def url(self):
return "{}://{}{}{}{}{}".format(
self.scheme.replace("webdav", "http"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about https? webdavs?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If replace in "webdavS" webdav on http will be "httpS"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, 🤦 sorry :)

Ok, a few concerns still:

  1. Do we need to adjust DEFAULT_PORTS in the URLInfo base class?
  2. Redefining url like this means loosing information. E.g. when we write in logs we will be getting http instead. In general feels like not a good idea creating a discrepancy between what we have inside class and what we return here.

Feels like we should be doing it on the Remote class level, or there should be some separate method - access url or something?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Fixed

It seems to me to do a separate method is over head. It will be necessary to rewrite the entire code from HTTP with one fix, and at the same time, there will still be the possibility use in logs the wrong URL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was merely a random suggestion. The whole approach with overriding url feels like a hack and can hit us somewhere down the road. There should be a better way of doing this - e.g. RemoteWebDav class should understand how to deal with those URLs (translate them if needed and pass to HTTPRemote), etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, makes me think if we should tackle it the other way around by simply adding some flag that will tell HTTP(s) remote to use webdav to access this remote that would otherwise have a normal HTTP(s) URL. Something like

dvc remote add myremote https://example.com/path

and then

dvc remote modify myremote webdav true

or

dvc remote modify myremote type webdav

or something else. Not sure if this makes sense. It kinda goes against current conventions, but also I'm not sure if webdav is unique enough for a separate remote, maybe it is better to handle it as HTTP with some special tweaks. 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hack - not sure. The same DEFAULT_PORTS also change the state of the object without notice. The same S3 is HTTP.
Leaving url unchanged - then the purpose of the _BasePath class is not very clear. It seemed to me that he should return the ready-made address format for working with him.

As for - as an extension of http. The more I work with him, the more he is everyday. The same gc. I don’t know how it works, but if only the remote method is needed for it, it’s easy to add and now a big difference from HTTP is already obtained.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same DEFAULT_PORTS also change the state of the object without notice

Default ports logic is part of the protocols ... e.g. we want to detect that example.com:80 is the same as example.com. In your case you create an object that provides an inconsistent interface - some methods (and some logic internally) might still rely on webdav, while url has http. It can be very confusing - you ask and object to print its name - it prints Duck but then behave like Dog.

The same S3 is HTTP

not sure I got this point, could you clarify, please?

As for - as an extension of http. The more I work with him, the more he is everyday. The same gc. I don’t know how it works, but if only the remote method is needed for it, it’s easy to add and now a big difference from HTTP is already obtained.

Again, not sure I understood this. Btw, we can jump on Discord - dvc.org/chat and discuss this with me and Ruslan (ivan and ruslan there). Please, feel free to ping me/us.

self.netloc,
self._spath,
(";" + self.params) if self.params else "",
("?" + self.query) if self.query else "",
("#" + self.fragment) if self.fragment else "",
)
4 changes: 4 additions & 0 deletions dvc/remote/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,8 @@
from dvc.remote.oss import RemoteOSS
from dvc.remote.s3 import RemoteS3
from dvc.remote.ssh import RemoteSSH
from dvc.remote.webdav import RemoteWEBDAV
from dvc.remote.webdavs import RemoteWEBDAVS


REMOTES = [
Expand All @@ -23,6 +25,8 @@
RemoteS3,
RemoteSSH,
RemoteOSS,
RemoteWEBDAV,
RemoteWEBDAVS,
# NOTE: RemoteLOCAL is the default
]

Expand Down
65 changes: 65 additions & 0 deletions dvc/remote/webdav.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
import os.path

from .http import RemoteHTTP
from dvc.scheme import Schemes
from dvc.progress import Tqdm
from dvc.exceptions import HTTPError
from dvc.path_info import WebdavURLInfo


class RemoteWEBDAV(RemoteHTTP):
scheme = Schemes.WEBDAV
path_cls = WebdavURLInfo
REQUEST_TIMEOUT = 20

def _upload(self, from_file, to_info, name=None, no_progress_bar=False):
def chunks():
with open(from_file, "rb") as fd:
with Tqdm.wrapattr(
fd,
"read",
total=None
if no_progress_bar
else os.path.getsize(from_file),
leave=False,
desc=to_info.url if name is None else name,
disable=no_progress_bar,
) as fd_wrapped:
while True:
chunk = fd_wrapped.read(self.CHUNK_SIZE)
if not chunk:
break
yield chunk

self._create_collections(to_info)
response = self._request("PUT", to_info.url, data=chunks())
if response.status_code not in (200, 201):
raise HTTPError(response.status_code, response.reason)

def _create_collections(self, to_info):
url_cols = [x.url + "/" for x in to_info.parents][:-1]
from_idx = 0
for idx, url in enumerate(url_cols):
from_idx = idx
if bool(self._request("HEAD", url)):
break
for url in reversed(url_cols[:from_idx]):
response = self._request("MKCOL", url)
if response.status_code not in (200, 201):
if bool(self._request("HEAD", url)):
continue
raise HTTPError(response.status_code, response.reason)

def remove(self, path_info):
response = self._request("DELETE", path_info.url)
if response.status_code not in (200, 201, 204):
raise HTTPError(response.status_code, response.reason)

def gc(self):
return super(RemoteHTTP, self).gc()

def list_cache_paths(self, prefix=None, progress_callback=None):
raise NotImplementedError

def walk_files(self, path_info):
raise NotImplementedError
15 changes: 15 additions & 0 deletions dvc/remote/webdavs.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
from .webdav import RemoteWEBDAV
from dvc.scheme import Schemes


class RemoteWEBDAVS(RemoteWEBDAV):
scheme = Schemes.WEBDAVS

def gc(self):
raise NotImplementedError

def list_cache_paths(self, prefix=None, progress_callback=None):
raise NotImplementedError

def walk_files(self, path_info):
raise NotImplementedError
2 changes: 2 additions & 0 deletions dvc/scheme.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,5 @@ class Schemes:
GDRIVE = "gdrive"
LOCAL = "local"
OSS = "oss"
WEBDAV = "webdav"
WEBDAVS = "webdavs"
20 changes: 20 additions & 0 deletions tests/unit/remote/test_webdav.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
import pytest

from dvc.exceptions import HTTPError
from dvc.path_info import WebdavURLInfo
from dvc.remote.webdav import RemoteWEBDAV
from tests.utils.httpd import StaticFileServer, WebDavSimpleHandler


def test_create_collections(dvc):
with StaticFileServer(handler_class=WebDavSimpleHandler) as httpd:
url0 = "webdav://localhost:{}/a/b/file.txt".format(httpd.server_port)
url1 = "webdav://localhost:{}/a/c/file.txt".format(httpd.server_port)
config = {"url": url0}

remote = RemoteWEBDAV(dvc, config)

remote._create_collections(WebdavURLInfo(url0))

with pytest.raises(HTTPError):
remote._create_collections(WebdavURLInfo(url1))
8 changes: 8 additions & 0 deletions tests/unit/test_path_info.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
from dvc.path_info import HTTPURLInfo
from dvc.path_info import PathInfo
from dvc.path_info import URLInfo
from dvc.path_info import WebdavURLInfo


TEST_DEPTH = len(pathlib.Path(__file__).parents) + 1
Expand Down Expand Up @@ -89,3 +90,10 @@ def test_https_url_info_str():
def test_path_info_as_posix(mocker, path, as_posix, osname):
mocker.patch("os.name", osname)
assert PathInfo(path).as_posix() == as_posix


def test_webdav_url_info_str():
u1 = WebdavURLInfo("webdav://test.com/t1")
u2 = WebdavURLInfo("webdavs://test.com/t1")
assert u1.url == "http://test.com/t1"
assert u2.url == "https://test.com/t1"
22 changes: 22 additions & 0 deletions tests/utils/httpd.py
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,28 @@ def do_POST(self):
self.end_headers()


class WebDavSimpleHandler(SimpleHTTPRequestHandler):
def do_HEAD(self):
if self.path == "/a/":
self.send_response(HTTPStatus.OK)
elif self.path == "/a/b/":
self.send_response(HTTPStatus.OK)
elif self.path == "/a/c/":
self.send_response(HTTPStatus.BAD_REQUEST)
else:
self.send_response(HTTPStatus.BAD_REQUEST)
self.end_headers()

def do_MKCOL(self):
if self.path == "/a/b/":
self.send_response(HTTPStatus.CREATED)
elif self.path == "/a/c/":
self.send_response(HTTPStatus.BAD_REQUEST)
else:
self.send_response(HTTPStatus.BAD_REQUEST)
self.end_headers()


class StaticFileServer:
_lock = threading.Lock()

Expand Down