Travis CI: download spark from apache mirror #259
Codecov Report
@@           Coverage Diff           @@
##           master    #259   +/-   ##
=======================================
  Coverage   88.49%   88.49%
=======================================
  Files          20       20
  Lines         739      739
  Branches       57       57
=======================================
  Hits          654      654
  Misses         85       85
bin/download_travis_dependencies.sh
@@ -1,18 +1,34 @@
#!/usr/bin/env bash

SPARK_BUILD_URL="http://d3kbcqa49mib13.cloudfront.net/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz"
set -euo pipefail
consider: -x?
bin/download_travis_dependencies.sh

echo "Content of directory:"
ls -la
tar xvf "${spark_tarball}" > /dev/null
Might want to retry if/when curl or tar fails; it actually happens quite a bit.
That sounds great. I added retry and checksum, etc.
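For readers following the thread, the retry idea can be sketched as a small bash helper. This is only an illustration under stated assumptions, not the code that was added in this PR: the function name `retry` and its argument order (maximum attempts, delay in seconds, then the command) are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not taken from the PR): re-runs COMMAND until it
# succeeds or MAX_ATTEMPTS is reached, sleeping DELAY_SECONDS in between.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  # The condition of `until` is exempt from `set -e`, so a failing
  # command does not abort the script here.
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s: $*" >&2
    sleep "${delay}"
    attempt=$(( attempt + 1 ))
  done
}
```

A call such as `retry 3 5 curl --fail --location --output "${spark_tarball}" "${url}"` would then cover the flaky download, where `url` is a placeholder for whichever download URL the script resolves.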
bin/download_travis_dependencies.sh
local spark_tarball="${SPARK_BUILD}.tgz"
local apache_mirror_cgi="https://www.apache.org/dyn/closer.lua"
local apache_archive_prefix="https://archive.apache.org/dist"
It seems easier to download directly from the Apache archive than from the mirror.
bin/download_travis_dependencies.sh

echo "Content of directory:"
ls -la
gpg --print-md MD5 "${spark_tarball}" | tee "${spark_tarball}.gen.md5"
Spark distribution's md5 is not generated from md5sum.
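To illustrate the format difference: `gpg --print-md MD5` prints the file name followed by uppercase hex bytes separated by spaces, while `md5sum` prints lowercase hex followed by the file name. A minimal sketch of normalizing the gpg-style text to bare lowercase hex, assuming that layout (the function name `normalize_gpg_digest` is hypothetical and not part of the PR):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads gpg --print-md style text on stdin, e.g.
#   spark.tgz: D4 1D 8C D9 8F 00 B2 04  E9 80 09 98 EC F8 42 7E
# and prints the digest as bare lowercase hex with no separators.
normalize_gpg_digest() {
  # 1. drop everything up to and including the "name:" prefix
  # 2. remove all whitespace, including line breaks in wrapped output
  # 3. lowercase the hex digits
  sed -e 's/^.*: *//' | tr -d '[:space:]' | tr 'A-F' 'a-f'
}
```

The result can be compared with the first field of `md5sum` output. The script in this PR takes the other route and generates its own digest with `gpg --print-md`, so that both sides are already in the same layout.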
That may be, but there is a policy against downloading directly from apache.org...
Ah, sorry, I didn't realize there was a policy against direct downloads.
Strangely, the Spark version
hmm, it should be, unless the mirror is out of sync or corrupted. Does the tgz unpack?
btw, I think it's a problem with 2.1.1: it's gone from the mirrors because the latest is 2.1.2.
btw, we shouldn't need to keep our own hashes. This is the way ASF releases work,
for example https://github.com/apache/spark/blob/master/R/pkg/R/install.R#L214
Indeed. The file downloaded from the mirror wasn't
Thanks for pointing out the Apache release policies.
}

mkdir -p "${HOME}/.cache/spark-versions" && pushd $_
try_download_from_apache || try_download_from_apache || try_download_from_apache
haha, neat with pipefail
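A short sketch of why the `||` chain combines well with `pipefail`: commands on the left of `||` are exempt from `set -e`, and `pipefail` makes a pipeline report a failure from any of its stages, so a failed attempt falls through to the next one instead of aborting the script or passing silently. The function `flaky_step` below is a hypothetical stand-in for `try_download_from_apache`, whose body is not shown in this thread.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Counter file so the stand-in can fail on its first two calls.
attempts_file="$(mktemp)"
echo 0 > "${attempts_file}"

# Hypothetical stand-in for a download-and-unpack step.
flaky_step() {
  local n
  n=$(( $(cat "${attempts_file}") + 1 ))
  echo "${n}" > "${attempts_file}"
  # The left side of the pipe fails on attempts 1 and 2. Without pipefail
  # the pipeline would report the status of `cat` (success) and the
  # failure would be hidden; with pipefail the pipeline fails too.
  { [ "${n}" -ge 3 ] && echo "payload"; } | cat > /dev/null
}

# Each failed attempt falls through to the next one.
flaky_step || flaky_step || flaky_step
echo "succeeded after $(cat "${attempts_file}") attempts"
```

If all three attempts fail, the chain as a whole fails and `set -e` stops the script at that point.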
Changed Travis dependency download script to use apache mirror CGI for Spark.
Also updated Spark versions used in the tests to match the latest patch release.