Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avro Loader #101

Merged
merged 6 commits into from Jan 14, 2015
Merged

Avro Loader #101

merged 6 commits into from Jan 14, 2015

Conversation

theJohnnyBrown
Copy link
Contributor

TODO

  • Add tests. I've run this code against fairly complicated avro data, but it's private so I'll need to write some example data.
  • Add license to files and tidy up the code.
  • parameterize the jar locations in load-avro.
  • (Stretch goal) add store-avro and local/store

(let [parsed-schema (parse-schema (:schema opts))
storage (raw/storage$ ;; this creates a new storage definition
;; add these jars to $PIG_CLASSPATH (most likely /home/hadoop/pig/lib)
["json-simple-1.1.1.jar"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need these on the classpath, would it be easier to just add them as dependencies that get pulled into the uberjar? The purpose of this was more for the annoying jars that aren't in maven.

@mbossenbroek
Copy link
Contributor

Looks good! I'm looking forward to seeing the sample data & tests. I think that'll make it a lot more clear how the schema is being used.

@mbossenbroek mbossenbroek changed the title WIP basic load works on local and cluster WIP Avro Loader Jan 10, 2015
@theJohnnyBrown
Copy link
Contributor Author

It would be great to hear about the gradle/clojure workflow people are using on this project - does anyone know how to get a repl with the dependencies loaded? I've resorted to creating another leningen project, with a dependency on pigpen-avro, opening core.clj in a cider buffer and eval-ing it in that repl.

EDIT: there's ../gradlew clojureRepl but that seems to have issues for me.

@theJohnnyBrown
Copy link
Contributor Author

@mbossenbroek, I think the issues you mentioned have been addressed. I'll save the storage side for next time (Or for someone else!). Anything else you need to see?


(set! *warn-on-reflection* true)

(declare field-names)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything except load-avro should be private

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dotted-args->nested-map needs to be public so we can require it from pig

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. In other places where that's the case, I'll put the logic in a different namespace and just expose the public API by copying the symbols there. Take a look at pigpen.core for examples. It also makes testing easier because you can access all of the internals easily.

@mbossenbroek
Copy link
Contributor

Looks great! Just a couple of polish items. Thanks for the work!

@mbossenbroek
Copy link
Contributor

Almost forgot - what issues are you seeing with ./gradlew clojureRepl? IIRC, I'm getting that from clojuresque and it should work. Personally, I use eclipse and CCW - one keystroke and I have a repl in my IDE. IntelliJ+Cursive should work similarly.

@theJohnnyBrown
Copy link
Contributor Author

Here's the output from running clojureRepl: https://gist.github.com/theJohnnyBrown/faa5782176ebe327dade

@theJohnnyBrown
Copy link
Contributor Author

updated

@mbossenbroek mbossenbroek changed the title WIP Avro Loader Avro Loader Jan 14, 2015
mbossenbroek added a commit that referenced this pull request Jan 14, 2015
@mbossenbroek mbossenbroek merged commit b030ccd into Netflix:master Jan 14, 2015
@mbossenbroek mbossenbroek mentioned this pull request Jan 19, 2015
@mbossenbroek
Copy link
Contributor

Released in 0.2.13

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants