This repository has been archived by the owner on Apr 8, 2021. It is now read-only.

Quick Start example in README. #95

Open
wants to merge 4 commits into
base: master


eightysteele

This PR adds some love to the Quick Start example in README. :)

  • Fixed an issue where the SNAPSHOT version in example/iris didn't match what was in version.sbt.
  • Fixed an issue where org.apache.hadoop was excluded from the runtime classpath by sbt-assembly, causing example/iris to throw a ClassNotFoundException. The hadoopClient dependency in Deps.scala was scoped as "provided", so it was left out of target/scala-2.11/brushfire-scalding-0.7.5-SNAPSHOT-jar-with-dependencies.jar during builds.

@avibryant
Contributor

This is great, thanks! I think the reason that hadoopClient was listed as provided is that when you are submitting the assembly jar to a hadoop cluster, the hadoop jars are indeed provided in that execution environment, and it can be problematic to duplicate them. But obviously that's not the case when running locally. I'm not sure what the best way to resolve this is.
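
For readers unfamiliar with the "provided" scope being discussed, a minimal sketch of how such a dependency is typically declared in an sbt project follows. The object name and the version number are placeholders chosen for illustration; they are not copied from the project's Deps.scala.

```scala
// Hypothetical fragment in the style of project/Deps.scala.
// The object name and the version below are placeholders, not the project's values.
import sbt._

object DepsSketch {
  // "provided" keeps the jar on the compile classpath but off the runtime
  // classpath, so sbt-assembly leaves it out of the fat jar and a plain
  // `sbt run` cannot load its classes. On a Hadoop cluster the execution
  // environment is expected to supply these classes instead.
  val hadoopClient =
    "org.apache.hadoop" % "hadoop-client" % "2.5.2" % "provided"
}
```

Dropping the trailing `% "provided"` would bundle the Hadoop classes into the assembly jar, which is the duplication concern raised above.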

@eightysteele
Author

Good catch!

The sbt-assembly docs offer a clue on how to resolve this.

If we add this to brushfire-scalding/build.sbt:

run in Compile <<= Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run))

runMain in Compile <<= Defaults.runMainTask(fullClasspath in Compile, runner in (Compile, run))

Then we can run it locally (with all the dependencies) like this:

$ sbt "brushfireScalding/runMain com.twitter.scalding.Tool com.stripe.brushfire.scalding.IrisJob --local --input example/iris.data --output example/iris.output"

Boom!
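
A note for anyone trying this on a newer sbt: the `<<=` operator was removed in sbt 1.0, so the two settings above would presumably need to be written with `:=` and `.evaluated` instead. The following is a sketch under that assumption, using sbt 1.x slash syntax; it has not been run against this build.

```scala
// Sketch for brushfire-scalding/build.sbt, assuming sbt 1.x (untested here).
// Both overrides make run/runMain use the Compile classpath, which still
// contains "provided" dependencies such as hadoopClient.
Compile / run := Defaults.runTask(
  Compile / fullClasspath,   // includes provided-scope jars
  Compile / run / mainClass,
  Compile / run / runner
).evaluated

Compile / runMain := Defaults.runMainTask(
  Compile / fullClasspath,
  Compile / run / runner
).evaluated
```

The idea is the same in either syntax: hadoopClient stays "provided" for the assembly jar, while local runs pick it up from the compile classpath.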

I wrapped the above command in a new script called quick-start, moved hadoopClient back to provided, and updated the README with examples of running locally and on the cluster.

There might be an even BETTER way to resolve this. Happy to pivot. Tell me what you think. :)

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


eightysteele seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.


3 participants