- Make the root of the project your working directory.
- Run the Docker Compose file:
    docker-compose up
- Open a shell in the Hadoop container (docker ps lists the running container IDs):
    docker exec -it <container-id> sh
- Run the import script, where <table_name> is the table to import (in our case, 'product'):
    sh /import_data.sh <table_name>
- Insert new records into the 'product' table.
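If you don't know the container ID for the docker exec step, it can be picked out of docker ps output. This is a minimal sketch that assumes the Hadoop container's name contains 'hadoop' (a guess; check the service name in docker-compose.yml). find_container_id and docker_ps_output are hypothetical helpers, not part of this project.

```python
import subprocess
from typing import Optional


def find_container_id(ps_output: str, name_fragment: str) -> Optional[str]:
    """Return the ID of the first container whose name contains name_fragment.

    ps_output is the text printed by: docker ps --format '{{.ID}} {{.Names}}'
    (one "<id> <name>" pair per line).
    """
    for line in ps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and name_fragment in parts[1]:
            return parts[0]
    return None


def docker_ps_output() -> str:
    # Calls the real Docker CLI, so it only works where Docker is installed and running.
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.ID}} {{.Names}}"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout
```

For example, find_container_id(docker_ps_output(), 'hadoop') returns the ID to pass to docker exec, or None when no running container matches.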
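To sanity-check what the import step wrote, you can print the imported files inside the Hadoop container (for example with hdfs dfs -cat '/product/*', assuming the hdfs CLI is on the PATH there) and parse the text. This sketch assumes the files are comma-delimited id,price,name lines, which is what the product_hdfs Hive table defined later expects. ProductRow and parse_product_lines are illustrative names.

```python
from typing import List, NamedTuple


class ProductRow(NamedTuple):
    # Same column order as the product_hdfs Hive table: id, price, name.
    id: int
    price: int
    name: str


def parse_product_lines(text: str) -> List[ProductRow]:
    """Parse comma-delimited 'id,price,name' lines, skipping blank lines.

    Raises ValueError when a line does not have exactly three fields or the
    numeric fields are not integers.
    """
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split(",")
        if len(fields) != 3:
            raise ValueError(
                "line {}: expected 3 fields, got {}".format(lineno, len(fields))
            )
        rows.append(ProductRow(int(fields[0]), int(fields[1]), fields[2]))
    return rows
```

A ValueError here usually means the delimiter or column count differs from what the Hive table declares; a name containing a comma, for instance, would also be split into an extra field by Hive.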
Using Airflow
- Make Python-3.6.3 your working directory.
- Activate the Python virtual environment:
    source venv/bin/activate
- Initialize the Airflow metadata database, where Airflow stores its state:
    airflow initdb
- Start the web server, through which we can track the DAGs:
    airflow webserver -p 8080
- Create the Sqoop job:
    sh /create_sqoop_job.sh <table_name> <column_name> <last_id_value>
  In our case, the table name is 'product', the column name is 'id', and the last ID value is 1003.
- Run the scheduler:
    airflow scheduler
(To test the Airflow setup, run: airflow test pricing import_pricing_data <yyyy-mm-dd>)
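For the create_sqoop_job.sh step: the script's contents aren't shown here, but a saved Sqoop job that imports only new rows is typically created with sqoop job --create <name> -- import plus --incremental append, --check-column and --last-value, as the argv builder below does. The job name, JDBC URL and target directory are placeholders, and connection credentials are omitted. incremental_append is a hypothetical helper that mimics what each run selects: rows whose check column is greater than the last value, after which the largest value seen becomes the last value for the next run (a saved job records this itself).

```python
from typing import Any, Dict, Iterable, List, Tuple


def sqoop_job_create_args(job_name: str, jdbc_url: str, table: str,
                          check_column: str, last_value: int,
                          target_dir: str) -> List[str]:
    """Build argv for a saved Sqoop job doing an incremental-append import.

    job_name, jdbc_url and target_dir are placeholders: the real values live
    in create_sqoop_job.sh, which is not reproduced here.
    """
    return [
        "sqoop", "job", "--create", job_name,
        "--", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]


def incremental_append(rows: Iterable[Dict[str, Any]], check_column: str,
                       last_value: int) -> Tuple[List[Dict[str, Any]], int]:
    """Mimic which rows one append-mode run would import.

    Rows whose check_column is strictly greater than last_value are new; the
    largest value among them becomes the last value for the next run.
    """
    new_rows = [row for row in rows if row[check_column] > last_value]
    next_value = max((row[check_column] for row in new_rows), default=last_value)
    return new_rows, next_value
```

With last value 1003, only records inserted with an id above 1003 are picked up by the next run.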
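The test command takes an execution date in yyyy-mm-dd form. A small sketch that builds the command for a given day (today by default), using the DAG ID 'pricing' and task ID 'import_pricing_data' from the command above; airflow_test_args is a hypothetical helper.

```python
import datetime
from typing import List, Optional


def airflow_test_args(dag_id: str = "pricing",
                      task_id: str = "import_pricing_data",
                      day: Optional[datetime.date] = None) -> List[str]:
    """Build the Airflow 1.x 'airflow test' command for one execution date."""
    if day is None:
        day = datetime.date.today()
    # date.isoformat() yields the yyyy-mm-dd form the command expects.
    return ["airflow", "test", dag_id, task_id, day.isoformat()]
```

The resulting list can be passed to subprocess.run(..., check=True) from the activated virtual environment. These are Airflow 1.x commands; Airflow 2 renamed them to airflow tasks test and airflow db init.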
Hive and ELK
- cd into the Hive home directory and initialise the Hive metastore:
    schematool -initSchema -dbType derby
- Add the following properties inside the <configuration> element of hive-site.xml (usrname is a placeholder for your user name):
    <property>
      <name>system:user.name</name>
      <value>usrname</value>
    </property>
    <property>
      <name>system:java.io.tmpdir</name>
      <value>/tmp/</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:derby:/usr/local/hive/metastore_db;databaseName=metastore_db;create=true</value>
    </property>
    <property>
      <name>hive.aux.jars.path</name>
      <value>/usr/local/hive/lib/elasticsearch-hadoop-7.0.0.jar</value>
    </property>
- Run 'hive' and, in the Hive shell, execute the following statements in order:
    CREATE EXTERNAL TABLE product_hdfs (id int, price int, name string)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE LOCATION '/product';
    CREATE EXTERNAL TABLE product_es (id bigint, price bigint, name string)
      STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
      TBLPROPERTIES('es.resource' = 'pricing/product', 'es.nodes' = 'elasticsearch');
    INSERT INTO TABLE product_es SELECT * FROM product_hdfs;
- Check the imported data in Elasticsearch: http://localhost:9200/pricing/_search
- Finally, create a visualisation in Kibana: http://localhost:5601/
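Instead of editing hive-site.xml by hand, the properties from the hive-site.xml step can be merged in with Python's xml.etree.ElementTree. This is a sketch: set_properties is a hypothetical helper, and usrname is still a placeholder for your user name. ElementTree drops comments and the XML declaration when it serialises the tree, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Values from the hive-site.xml step; "usrname" is a placeholder for your user name.
HIVE_PROPERTIES = {
    "system:user.name": "usrname",
    "system:java.io.tmpdir": "/tmp/",
    "javax.jdo.option.ConnectionURL":
        "jdbc:derby:/usr/local/hive/metastore_db;databaseName=metastore_db;create=true",
    "hive.aux.jars.path": "/usr/local/hive/lib/elasticsearch-hadoop-7.0.0.jar",
}


def set_properties(config_xml: str, props: Dict[str, str]) -> str:
    """Add or overwrite <property> entries under a Hadoop-style <configuration> root."""
    root = ET.fromstring(config_xml)
    existing = {p.findtext("name"): p for p in root.findall("property")}
    for name, value in props.items():
        prop = existing.get(name)
        if prop is None:
            # No entry with this name yet, so append a new <property> with a <name>.
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = name
        value_el = prop.find("value")
        if value_el is None:
            value_el = ET.SubElement(prop, "value")
        value_el.text = value
    return ET.tostring(root, encoding="unicode")
```

For example, read hive-site.xml, pass its text and HIVE_PROPERTIES to set_properties, and write the returned string back to the file.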
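Because the import scripts take a table name, the same three Hive statements can be generated for another table. This sketch uses a hypothetical helper, hive_es_statements; the column types, HDFS location and es.resource value are inputs you choose, and the test reproduces the 'product' case. The output can be written to a file and run with hive -f <file>.

```python
from typing import List, Tuple


def hive_es_statements(table: str,
                       columns: List[Tuple[str, str, str]],
                       hdfs_location: str,
                       es_resource: str,
                       es_nodes: str = "elasticsearch") -> List[str]:
    """Return the HDFS table, Elasticsearch table and copy statements for one table.

    columns holds (name, hdfs_type, es_type) triples, e.g. ("id", "int", "bigint").
    """
    hdfs_cols = ", ".join("{} {}".format(name, hdfs_type) for name, hdfs_type, _ in columns)
    es_cols = ", ".join("{} {}".format(name, es_type) for name, _, es_type in columns)
    return [
        "CREATE EXTERNAL TABLE {t}_hdfs ({c}) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        "STORED AS TEXTFILE LOCATION '{loc}';".format(t=table, c=hdfs_cols, loc=hdfs_location),
        "CREATE EXTERNAL TABLE {t}_es ({c}) "
        "STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' "
        "TBLPROPERTIES('es.resource'='{r}', 'es.nodes'='{n}');".format(
            t=table, c=es_cols, r=es_resource, n=es_nodes),
        # SELECT * relies on both tables declaring the columns in the same order.
        "INSERT INTO TABLE {t}_es SELECT * FROM {t}_hdfs;".format(t=table),
    ]
```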
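To check the Elasticsearch import from a script rather than the browser, the sketch below fetches http://localhost:9200/pricing/_search with urllib and summarises the response. It accepts both the Elasticsearch 7 form of hits.total (an object with a value field) and the older plain integer. fetch_search and summarize_hits are hypothetical helpers. Note that _search returns only the first 10 hits by default; http://localhost:9200/pricing/_count returns the document count directly.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple


def fetch_search(url: str = "http://localhost:9200/pricing/_search") -> Dict[str, Any]:
    """GET the search endpoint and decode the JSON body (needs a running cluster)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_hits(response: Dict[str, Any]) -> Tuple[int, List[Dict[str, Any]]]:
    """Return (total hit count, list of _source documents) from a search response.

    Elasticsearch 7 reports hits.total as {"value": n, "relation": ...};
    older versions report a plain integer, so both forms are accepted.
    """
    hits = response.get("hits", {})
    total = hits.get("total", 0)
    if isinstance(total, dict):
        total = total.get("value", 0)
    return total, [hit.get("_source", {}) for hit in hits.get("hits", [])]
```

For example, summarize_hits(fetch_search()) returns the hit count and the imported product documents once the INSERT has finished.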