Travis CI: download spark from apache mirror #259

Merged
merged 5 commits into from Jan 28, 2018

Contributor

@darthsuogles darthsuogles commented Jan 22, 2018

Changed the Travis dependency download script to use the Apache mirror CGI for Spark.
Also updated the Spark versions used in the tests to match the latest patch releases.

@codecov-io

codecov-io commented Jan 22, 2018

Codecov Report

Merging #259 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #259   +/-   ##
=======================================
  Coverage   88.49%   88.49%           
=======================================
  Files          20       20           
  Lines         739      739           
  Branches       57       57           
=======================================
  Hits          654      654           
  Misses         85       85

@@ -1,18 +1,34 @@
#!/usr/bin/env bash

SPARK_BUILD_URL="http://d3kbcqa49mib13.cloudfront.net/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz"
set -euo pipefail
Member

consider: -x?


echo "Content of directory:"
ls -la
tar xvf "${spark_tarball}" > /dev/null
Member

might want to retry if/when curl or tar fails
it actually happens quite a bit

Contributor Author

That sounds great. I added retry and checksum, etc.
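
A minimal sketch of the retry idea from this review thread, assuming a generic helper. The function name `retry` and its arguments are hypothetical and are not taken from the script in this PR; the sketch only illustrates re-running a flaky command a bounded number of times under `set -euo pipefail`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of this PR): runs COMMAND until it succeeds
# or ATTEMPTS runs have failed. Returns 0 on success, 1 after giving up.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local n=1
  # A command used as the `until` condition does not trigger errexit,
  # so a failing attempt does not abort the whole script.
  until "$@"; do
    if (( n >= attempts )); then
      echo "giving up after ${n} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${n} failed, retrying in ${delay}s: $*" >&2
    sleep "${delay}"
    n=$(( n + 1 ))
  done
}

# Example usage (the URL variable is a placeholder, not a value from this PR):
#   retry 3 5 curl -fsSL -o "${spark_tarball}" "${download_url}"
#   retry 3 5 tar xzf "${spark_tarball}"
```

Passing `-f` to curl matters in a setup like this, because without it an HTTP error page is saved as if it were the tarball and the command still exits with status 0.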

local spark_tarball="${SPARK_BUILD}.tgz"
local apache_mirror_cgi="https://www.apache.org/dyn/closer.lua"
local apache_archive_prefix="https://archive.apache.org/dist"
Contributor Author

Seems that it's easier to directly download from apache archive than from the mirror.


echo "Content of directory:"
ls -la
gpg --print-md MD5 "${spark_tarball}" | tee "${spark_tarball}.gen.md5"
Contributor Author

The Spark distribution's MD5 file is not generated with md5sum.
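
To illustrate the format difference: `gpg --print-md MD5` prints the file name followed by space-separated uppercase hex pairs, while `md5sum` prints a bare lowercase hex digest followed by the file name. The sketch below is one possible way to bring the gpg-style text into the bare form so the two can be compared. The function name `normalize_gpg_md` is hypothetical and is not part of this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# normalize_gpg_md: read `gpg --print-md` style text on stdin and print
# the digest as one lowercase hex string.
# Hypothetical helper for illustration; assumes the file name contains no ':'.
normalize_gpg_md() {
  # 1. drop the leading "filename: " prefix
  # 2. remove spaces and newlines (long digests wrap over several lines)
  # 3. lowercase the hex digits
  sed 's/^[^:]*: *//' | tr -d ' \n' | tr 'A-F' 'a-f'
  echo
}

# Example usage (file names are placeholders):
#   expected="$(normalize_gpg_md < "${spark_tarball}.md5")"
#   actual="$(md5sum "${spark_tarball}" | cut -d' ' -f1)"
#   [ "${expected}" = "${actual}" ] || echo "checksum mismatch" >&2
```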

@felixcheung
Member

felixcheung commented Jan 23, 2018 via email

@darthsuogles
Contributor Author

Ah, sorry, didn't realize the policy against direct download.
Switching back to using the mirror and added MD5 files manually.

@darthsuogles
Contributor Author

Strangely, the Spark version 2.1.1 I am downloading from the mirror gives a different checksum than the one from the archive.

< spark-2.1.1-bin-hadoop2.7.tgz: 52 A0 05 CE CB E0 CE 49  39 F5 5B 0C 78 65 3A 2C
---
> spark-2.1.1-bin-hadoop2.7.tgz: C0 AC C4 44 7D 5C D7 CB  43 5E 51 64 80 98 86 E1

@felixcheung
Member

hmm, it should be the same, unless the mirror is out of sync or corrupted. does the tgz unpack?

@felixcheung
Member

felixcheung commented Jan 24, 2018

btw, I think it's a problem with 2.1.1 - it's gone from the mirrors because the latest is 2.1.2,
so your hash is probably of an HTML page or something.

btw, we shouldn't need to keep our hashes. This is the way ASF releases work

http://www.apache.org/legal/release-policy.html#host-GA
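
One way to catch the "downloaded an HTML page instead of a tarball" case early is to check that the file is a readable gzip archive before unpacking or hashing it. This is a minimal sketch under that assumption; the function name `is_gzip_tarball` is hypothetical and does not appear in this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# is_gzip_tarball FILE
# Hypothetical helper for illustration: succeeds only if FILE is a valid
# gzip stream whose contents can be listed as a tar archive.
is_gzip_tarball() {
  local f="$1"
  # gzip -t tests the compressed stream; tar -tzf lists entries without extracting.
  gzip -t "${f}" 2>/dev/null && tar -tzf "${f}" > /dev/null 2>&1
}

# Example usage (variable name as in the script under review):
#   is_gzip_tarball "${spark_tarball}" || { echo "not a tgz: ${spark_tarball}" >&2; exit 1; }
```

A check like this does not replace verifying the release checksum or signature; it only rejects files that are clearly not archives.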

@felixcheung
Member

for example https://github.com/apache/spark/blob/master/R/pkg/R/install.R#L214
but we don't check the hash there :) (maybe we should...)

@darthsuogles
Contributor Author

Indeed. The file downloaded from the mirror wasn't tgz.
Let me remove the signature check and perhaps put it in a later PR.

Thanks for pointing out the Apache release policies.
If we were to check the

}

mkdir -p "${HOME}/.cache/spark-versions" && pushd $_
try_download_from_apache || try_download_from_apache || try_download_from_apache
Member

haha, neat with pipefail

@felixcheung felixcheung merged commit a142721 into graphframes:master Jan 28, 2018