Custom Hive UDFs in Clojure
If you want to compile yourself you can:
# wherever you have your code
lein compile
lein uberjar
scp build/smoker-1.0.0-SNAPSHOT-standalone.jar myserver:~/hive-jars/smoker-standalone.jar
Then use it within Hive:
# on your server, start hive with auxpath
hive --auxpath /home/nmurray/hive-jars
# tell hive about your jars (possibly optional)
add jar /home/nmurray/hive-jars/smoker-standalone.jar;
list jars;
# create your operations
create temporary function my_lower as 'smoker.udf.MyLowerCase';
select my_lower(my_column) from my_table where ds=20110101 limit 10;
Lower-case. The "hello-world" of UDFs
create temporary function my_lower as 'smoker.udf.MyLowerCase';
select my_lower(my_column) from my_table where ds=20110101 limit 10;
Tokenize. The "hello-cruel-world" of UDTFs. UDFs emit a single record, UDTFs can emit multiple records for a single input record.
create temporary function tokenize as 'smoker.udf.MyTokenize';
select tokenize(my_column) AS (word, count) from my_table where ds=20110101 limit 10;
- Nate Murray nate@natemurray.com
- http://dev.bizo.com/2009/06/custom-udfs-and-hive.html
- http://dev.bizo.com/2010/07/extending-hive-with-custom-udtfs.html
- http://stackoverflow.com/questions/2181774/calling-clojure-from-java
- http://kotka.de/blog/2010/03/proxy_gen-class_little_brother.html
- http://clojure.org/compilation
- http://asymmetrical-view.com/2009/07/02/clojure-primitive-arrays.html
- http://groups.google.com/group/clojure/browse_thread/thread/1f48e0d4f42e95fd/b578a1834f307b06
- https://github.com/stuartsierra/clojure-hadoop