Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do cleanup if there is an error mid-download #68

Merged
merged 6 commits into from
Sep 12, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -3,11 +3,14 @@ name = "DataDeps"
uuid = "9ca5147a-7bb5-11e8-2d1a-6b423858166c"
version = "0.5.0"


[deps]
HTTP = "cd3eb016-35fb-5094-929b-558a96fad6f3"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
SHA = "ea8e919c-243c-51af-8825-aaa63cd721ce"

[compat]
ExpectationStubs = "0.2.0"

[extras]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
Expand Down
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -391,7 +391,9 @@ DataDeps.jl tries to have very sensible defaults.
- `DATADEPS_DISABLE_DOWNLOAD` -- causes any action that would result in the download being triggered to throw an exception.
- useful e.g. if you are in an environment with metered data, where your datasets should have already been downloaded earlier, and if there were not you want to respond to the situation rather than let DataDeps download them for you.
- default `false`
- `DATADEPS_DISABLE_ERROR_CLEANUP` -- By default DataDeps.jl will cleanup the directory the datadep was being downloaded to if there is an error during the resolution (In any of the `fetch`, `checksum`, or `post_fetch`). For debugging purposes you may wish to disable this cleanup step so you can interrogate the files by hand.


## Extending DataDeps.jl for Contributors
Feel free (encouraged even) to open issues and make PRs.

Expand Down
4 changes: 2 additions & 2 deletions src/fetch_helpers.jl
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ This is using the HTTP protocol's method of defining filenames in headers,
if that information is present.
"""
function fetch_http(remotepath, localdir)
@assert(localdir |> isdir)
@assert(isdir(localdir))
filename = get_filename(remotepath)
localpath = safer_joinpath(localdir, filename)
Base.download(remotepath, localpath)
Expand All @@ -27,7 +27,7 @@ are not allowed to contain `..`, or begin with a `/`.
If they do then this throws an `DomainError`.
"""
function safer_joinpath(basepart, parts...)
explain = "Possible Directory Traversal Attack detected."
explain = "Possible directory traversal attack detected."
for part in parts
occursin(part, "..") && throw(DomainError(part, "contains illegal string \"..\". $explain"))
startswith(part, '/') && throw(DomainError(part, "begins with \"/\". $explain"))
Expand Down
23 changes: 14 additions & 9 deletions src/resolution_automatic.jl
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ As such it include a number of parameters that most people will not want to use.
- `skip_checksum`: setting this to true causes the checksum to not be checked. Use this if the data has changed since the checksum was set in the registry, or for some reason you want to download different data.
- `i_accept_the_terms_of_use`: use this to bypass the I agree to terms screen. Useful if you are scripting the whole process, or using annother system to get confirmation of acceptance.
- For automation perposes you can set the enviroment variable `DATADEPS_ALWAYS_ACCEPT`
- If not set, and if `DATADEPS_ALWAYS_ACCEPT` is not set, then the user will be prompted
- If not set, and if `DATADEPS_ALWAYS_ACCEPT` is not set, then the user will be prompted.
- Strictly speaking these are not always terms of use, it just refers to the message and permission to download.

If you need more control than this, then your best bet is to construct a new DataDep object, based on the original,
Expand All @@ -45,15 +45,21 @@ function Base.download(

accept_terms(datadep, localdir, remotepath, i_accept_the_terms_of_use)

local fetched_path
while true
fetched_path = run_fetch(datadep.fetch_method, remotepath, localdir)
if skip_checksum || checksum_pass(datadep.hash, fetched_path)
break
mkpath(localdir)
try
local fetched_path
while true # this is a Do-While loop
fetched_path = run_fetch(datadep.fetch_method, remotepath, localdir)
if skip_checksum || checksum_pass(datadep.hash, fetched_path)
break
end
end
end

run_post_fetch(datadep.post_fetch_method, fetched_path)
run_post_fetch(datadep.post_fetch_method, fetched_path)
catch err
env_bool("DATADEPS_DISABLE_ERROR_CLEANUP") || rm(localdir, force=true, recursive=true)
rethrow(err)
end
end

"""
Expand All @@ -64,7 +70,6 @@ into the local directory and local paths.
Performs in (async) parallel if multiple paths are given
"""
function run_fetch(fetch_method, remotepath, localdir)
mkpath(localdir)
localpath = fetch_method(remotepath, localdir)
localpath
end
Expand Down
56 changes: 55 additions & 1 deletion test/main.jl
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ withenv("DATADEPS_ALWAYS_ACCEPT"=>"true") do
))

@test endswith(datadep"Test1", "Test1") || endswith(datadep"Test1", "Test1/") || endswith(datadep"Test1", "Test1\\")

@test all_expectations_used(dummyhash)
@test all_expectations_used(dummydown)

Expand All @@ -36,4 +36,58 @@ withenv("DATADEPS_ALWAYS_ACCEPT"=>"true") do
@test true
end


@testset "Ensure when errors occur the datadep will still retrydownloading" begin
@testset "error in checksum" begin
@stub dummydown
@expect dummydown(::Any, ::Any) = @__FILE__ # give path to an actual file so `open` works


register(DataDep("TestErrorChecksum", "dummy message", "http://example.void",
(error, "1234"); # this will throw an error
fetch_method=dummydown))
@test_throws ErrorException datadep"TestErrorChecksum"
#datadep"TestErrorChecksum"
@test @usecount(dummydown(::Any, ::Any)) == 1

@test_throws ErrorException datadep"TestErrorChecksum"
@test @usecount(dummydown(::Any, ::Any)) == 2 # it should have tried to download again
end

@testset "error in post fetch" begin
@stub dummydown
@expect dummydown(::Any, ::Any) = joinpath(@__DIR__, "eg.zip")

register(DataDep("TestErrorPostFetch", "dummy message", "http://example.void", Any,
fetch_method=dummydown,
post_fetch_method = error
))
@test_throws ErrorException datadep"TestErrorPostFetch"
@test @usecount(dummydown(::Any, ::Any)) == 1

@test_throws ErrorException datadep"TestErrorPostFetch"
@test @usecount(dummydown(::Any, ::Any)) == 2 # it should have tried to download again
end


@testset "error in fetch" begin
use_count = 0
function error_down(rp,lp)
use_count += 1
error("no download for you")
end

register(DataDep("TestErrorFetch", "dummy message", "http://example.void", Any,
fetch_method = error_down
))
@test_throws ErrorException datadep"TestErrorFetch"
@test use_count == 1

@test_throws ErrorException datadep"TestErrorFetch"
@test use_count == 2 # it should have tried to download again
end
end
end