This program uses Oracle JDBC to export Oracle table data to CSV files, uploads the CSV files to HDFS, and finally uses the Impala JDBC driver to create Kudu tables and import the data.
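The Oracle-to-CSV step has to quote field values safely before writing each row. A minimal sketch of such an escaping helper, following RFC 4180 quoting rules (the class and method names are illustrative, not taken from the project source):

```java
// Hypothetical helper: quote one field value for CSV output.
// Names are illustrative and do not come from the project source.
public final class CsvUtil {
    // Quote the value if it contains a comma, a double quote, or a newline;
    // embedded double quotes are doubled, per RFC 4180.
    static String escape(String value) {
        if (value == null) {
            return "";  // NULL columns become empty CSV fields
        }
        if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(escape("plain"));      // plain
        System.out.println(escape("a,b"));        // "a,b"
        System.out.println(escape("say \"hi\"")); // "say ""hi"""
    }
}
```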
git clone https://github.com/fengbaoli/oracle-to-kudu.git
<1>This example shows how to build and run a Maven-based project that executes SQL queries on Impala over JDBC.
This example was tested using Impala 2.3 included with CDH 5.12.0 and the Impala JDBC Driver v2.5.30.
When you download the Impala JDBC Driver, it is packaged as a zip file with separate distributions for JDBC3, JDBC4, and JDBC4.1. This example uses the distribution for JDBC4.1 on RHEL6 x86_64. The downloaded zip file contains the following eleven jar files:
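The last step of the pipeline, creating Kudu tables through Impala, comes down to issuing DDL over this JDBC connection. A hedged sketch of the kind of statement involved (the column list, hash-partition count, and all names here are assumptions, not the project's exact DDL):

```java
// Illustrative only: build the Impala DDL that creates a Kudu-backed table.
// Column definitions, partitioning, and naming are assumptions.
public final class KuduDdl {
    static String createTable(String db, String table, String pk, String columns) {
        return "CREATE TABLE " + db + "." + table
             + " (" + columns + ", PRIMARY KEY (" + pk + "))"
             + " PARTITION BY HASH (" + pk + ") PARTITIONS 4"
             + " STORED AS KUDU";
    }

    public static void main(String[] args) {
        // e.g. a table in the "oracle" database configured below
        System.out.println(createTable("oracle", "test1", "id", "id BIGINT, name STRING"));
    }
}
```

In a real run this string would be executed through a java.sql.Statement obtained from the Impala JDBC connection.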
Manually install project dependency jars into the local Maven repository with the mvn install:install-file command. For example, to install zookeeper-3.4.6.jar:
mvn install:install-file -Dfile=zookeeper-3.4.6.jar -DgroupId=org.apache.zookeeper -DartifactId=zookeeper -Dversion=3.4.6 -Dpackaging=jar
<2>Download the Oracle JDBC driver from Oracle's official website and install it manually:
mvn install:install-file -Dfile=ojdbc6.jar -DgroupId=com.oracle -DartifactId=ojdbc6 -Dversion=6.0 -Dpackaging=jar
mvn package
ora_url=jdbc:oracle:thin:@10.205.44.53:1521:ora11g
##Oracle JDBC URL
ora_username = test
##Oracle export username
ora_password = test
##Oracle export password
impala_url=jdbc:impala://hadoop4:21050
##Impala JDBC URL
impala_database = oracle
##Impala database name
batch_size=4
##number of tables exported per batch
hdfssuperuser = hdfs
##HDFS superuser name
fs.defaultFS=hdfs://ns1
##HDFS HA default filesystem
dfs.nameservices=ns1
##HDFS HA nameservices
local_path=data
##local export path (defaults to the data directory)
hdfs_path=/opt/ogg
##HDFS upload path
timezone = PRC
##timezone used for date conversion in the CSV files
skip_tables=
##comma-separated list of tables to skip, e.g. test1,test2
The primary key of each table imported into Kudu is defined in the format:
tablename1 = pk1
tablename2 = pk2
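The configuration above is a standard Java properties file, so it can be read with java.util.Properties; a sketch under that assumption (key names match the sample above, class and method names are illustrative):

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative sketch: parse the tool's configuration with java.util.Properties.
// Key names follow the sample config; everything else is an assumption.
public final class ConfigDemo {
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text)); // "key = value" and "key=value" both work
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // cannot happen with a StringReader
        }
        return props;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "impala_database = oracle",
                "skip_tables=test1,test2",
                "tablename1 = pk1");            // per-table Kudu primary key
        Properties props = parse(sample);
        System.out.println(props.getProperty("impala_database")); // oracle
        System.out.println(props.getProperty("skip_tables"));     // test1,test2
        System.out.println(props.getProperty("tablename1"));      // pk1
    }
}
```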
Configure logging as needed, then run:
java -jar oracle-kudu-1.0-SNAPSHOT.jar