#### Sqoop Interview Questions
 * What sources and targets does Sqoop support?
 * How do you automate Sqoop scripts?
 * Where are Sqoop scripts stored?
 * What does the password file contain?
 * What error is thrown when the imported table has no primary key but --num-mappers (or --m) is greater than 1?
     * No primary key could be found for the table; specify one with --split-by or run a sequential import with --m 1.
 * Do you use Hive/HBase for import/export?
     * For import, what is the approach?
     * For export, what is the target (RDBMS or flat files)?
 * Does Sqoop support incremental updates into Hive?
 * How does the Sqoop export process work? What is --clear-staging-table?
 * How are column data types converted during Sqoop export?
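
For the password-file question above: Sqoop uses the entire content of the file as the password, so a trailing newline would become part of it. The following is a minimal sketch that reuses the `itversity` password and the `database.password` file name from the commands in these notes. The `chmod` step is a suggested hardening, not something the original notes require.

```shell
# Write the password without a trailing newline.
# printf behaves the same across shells, unlike `echo -n`.
printf '%s' "itversity" > database.password

# Restrict the file to its owner, read-only.
chmod 400 database.password

# Sanity check: the byte count should equal the password length.
wc -c < database.password
```

If the count is one larger than the password length, the file most likely ends with a newline and the login will fail.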
 


echo -n "itversity" > database.password

# Password file stored on HDFS
sqoop list-tables \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password-file /user/rposam2020/database.password

# Password file stored on the local file system
sqoop list-tables \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password-file file:///home/rposam2020/database.password


sqoop eval \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password-file file:///home/rposam2020/database.password \
--query "select * from emp"

sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password-file file:///home/rposam2020/database.password \
--table "emp" \
--split-by "deptno" \
--target-dir sqoop_import/emp

sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password-file file:///home/rposam2020/database.password \
--table "emp" \
--m 1 \
--target-dir sqoop_import/emp_target_mappers

sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password-file file:///home/rposam2020/database.password \
--table emp \
--m 1 \
--hbase-row-key "empno" \
--hbase-table "rposam:emp_import" \
--column-family emp_col_fam \
--hbase-create-table

sqoop import \
-D sqoop.hbase.add.row.key=true  \
--verbose \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password-file file:///home/rposam2020/database.password \
--table emp \
--m 1 \
--hbase-row-key empno,deptno \
--hbase-table "rposam:emp_import" \
--column-family emp_col_fam \
--hbase-create-table


sqoop export \
--verbose \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password-file file:///home/rposam2020/database.password \
--table "emp_export" \
--export-dir data/emp.csv \
--input-fields-terminated-by "," \
--input-lines-terminated-by "\n" \
--input-optionally-enclosed-by '"'

sqoop export \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password-file file:///home/rposam2020/database.password \
--table "emp_export" \
--update-key empno \
--export-dir data/emp_updated.csv \
--input-fields-terminated-by "," \
--input-lines-terminated-by "\n" \
--input-optionally-enclosed-by '"'


sqoop job --create emp_incr_job \
-- import \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password-file file:///home/rposam2020/database.password \
--table emp_copy \
--target-dir sqoop_import/emp_incremental \
--append \
--split-by "deptno" \
--fields-terminated-by ',' \
--lines-terminated-by '\n' \
--check-column "empno" \
--incremental append \
--last-value 0

sqoop job --exec emp_incr_job
sqoop job --delete emp_incr_job
sqoop job --list 
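
One possible answer to the automation question is to wrap `sqoop job --exec` in a small function that a scheduler such as cron calls. The sketch below is an assumption about how that might look, not part of the original notes: `run_sqoop_job`, `SQOOP_BIN`, and `LOG_DIR` are hypothetical names. It relies on the saved job having been created with `--password-file`, as above, so that it can run without a prompt.

```shell
# Hypothetical wrapper around a saved Sqoop job, meant to be called from a scheduler.
# SQOOP_BIN and LOG_DIR are illustrative variables, not Sqoop settings.
run_sqoop_job() {
  job_name="$1"
  sqoop_bin="${SQOOP_BIN:-sqoop}"
  log_dir="${LOG_DIR:-./sqoop_logs}"
  mkdir -p "$log_dir"
  log_file="$log_dir/${job_name}_$(date +%Y%m%d_%H%M%S).log"

  # Capture stdout and stderr in the log; report success or failure.
  if "$sqoop_bin" job --exec "$job_name" > "$log_file" 2>&1; then
    echo "OK $job_name"
  else
    echo "FAILED $job_name (see $log_file)" >&2
    return 1
  fi
}

# Example call (needs a Sqoop installation and an existing saved job):
# run_sqoop_job emp_incr_job
```

For an incremental job, Sqoop updates the stored last value after each successful run, so the wrapper does not need to track it.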

sqoop import \
 --num-mappers 1 \
 --connect jdbc:mysql://ms.itversity.com/retail_export \
 --username retail_user \
 --password-file file:///home/rposam2020/database.password \
 --table emp \
 --hive-import \
 --hive-overwrite \
 --hive-table rposam_db.emp \
 --compress \
 --compression-codec org.apache.hadoop.io.compress.SnappyCodec

##### Let us understand why staging tables matter when running sqoop export from HDFS to tables in relational databases.

* By default, if a map task of the Sqoop export MapReduce job throws an exception, the task is attempted up to four times.
* Rows committed by a failed attempt stay in the target table, so the table can end up partially loaded. Cleaning up the partial load and reloading can be tedious.
* We can use --staging-table to overcome this issue. Data is first loaded into the staging table; only if there are no exceptions is it moved from the staging table into the target table.
* If rows from an earlier run are still in the staging table for any reason, add the control argument --clear-staging-table.
* --clear-staging-table ensures that any data in the staging table is deleted before the export starts.

        USE retail_export;

        CREATE TABLE training_daily_revenue_stage
        AS SELECT * FROM training_daily_revenue WHERE 1=2;

        DELETE FROM training_daily_revenue
          WHERE order_date != '2013-08-03 00:00:00.0';
        COMMIT;

        sqoop export \
          --connect jdbc:mysql://ms.itversity.com:3306/retail_export \
          --username retail_user \
          --password itversity \
          --export-dir /apps/hive/warehouse/training_sqoop_retail.db/daily_revenue \
          --table training_daily_revenue \
          --staging-table training_daily_revenue_stage \
          --clear-staging-table


sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_db \
--username retail_user \
--password itversity \
--table departments \
--warehouse-dir /user/rposam2020/sqoop_import/retail_db \
--delete-target-dir \
--compress 


sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_db \
--username retail_user \
--P \
--table departments \
--columns department_name \
--warehouse-dir /user/rposam2020/sqoop_import/retail_db \
--delete-target-dir

sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--P \
--query "select d.deptno,d.dname,count(1) no_of_employees from dept d join emp e on (d.deptno = e.deptno)
where \$CONDITIONS
group by d.deptno,d.dname" \
--target-dir /user/rposam2020/sqoop_import/retail_db/dept_details \
--delete-target-dir \
--split-by d.deptno 

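Notes on --query imports:
--table and --columns are mutually exclusive with --query.
With --query, --split-by is mandatory when the number of mappers is greater than 1.
The query must contain the placeholder $CONDITIONS (written as \$CONDITIONS inside double quotes so the shell does not expand it).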
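
To make the $CONDITIONS note concrete: Sqoop replaces the placeholder with a different range predicate on the --split-by column for each mapper. The sketch below only imitates that substitution with plain shell arithmetic. The helper name `print_split_queries`, the boundary values (10 to 50), and the even split are assumptions for illustration; Sqoop's real splitter may choose different boundaries.

```shell
# Illustrative only: show what "select * from dept where $CONDITIONS"
# could look like once each mapper gets its own range predicate.
print_split_queries() {
  col="$1"; min="$2"; max="$3"; mappers="$4"
  # Width of each range, rounded up so the ranges cover min..max.
  size=$(( (max - min + mappers - 1) / mappers ))
  lo="$min"
  i=1
  while [ "$i" -le "$mappers" ]; do
    hi=$(( lo + size ))
    if [ "$i" -eq "$mappers" ]; then
      cond="$col >= $lo AND $col <= $max"   # last range includes the maximum
    else
      cond="$col >= $lo AND $col < $hi"
    fi
    echo "select * from dept where $cond"
    lo="$hi"
    i=$(( i + 1 ))
  done
}

print_split_queries deptno 10 50 2
# select * from dept where deptno >= 10 AND deptno < 30
# select * from dept where deptno >= 30 AND deptno <= 50
```

This is why a query import with more than one mapper needs --split-by: without a column to range over, there is nothing to put in place of the placeholder for each mapper.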

sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password itversity \
--table emp \
--warehouse-dir /user/rposam2020/sqoop_import/retail_db \
--delete-target-dir \
--m 1

sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password itversity \
--table emp \
--warehouse-dir /user/rposam2020/sqoop_import/retail_db \
--delete-target-dir \
--m 1 \
--null-string '""' \
--null-non-string "0.00" \
--enclosed-by '"' \
--fields-terminated-by '\t' \
--lines-terminated-by '\n'

sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password itversity \
--table dept \
--where "deptno = 10" \
--warehouse-dir /user/rposam2020/sqoop_import/retail_db \
--delete-target-dir \
--m 1

sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password itversity \
--query "select * from dept where \$CONDITIONS and deptno = 20 " \
--target-dir /user/rposam2020/sqoop_import/retail_db/dept \
--append \
--m 1

sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password itversity \
--table dept \
--where "deptno = 30" \
--target-dir /user/rposam2020/sqoop_import/retail_db/dept \
--append \
--m 1 

sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--P \
--table dept \
--check-column deptno \
--target-dir /user/rposam2020/sqoop_import/retail_db/dept \
--incremental append \
--last-value 30 \
--m 1 

sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password itversity \
--table emp \
--hive-import \
--hive-database abc_rposam_db \
--hive-table emp \
--num-mappers 1


sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password itversity \
--table dept \
--hive-import \
--hive-database abc_rposam_db \
--hive-table dept \
--num-mappers 1

describe formatted emp;

sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password itversity \
--table emp \
--hive-import \
--hive-database abc_rposam_db \
--hive-table emp \
--hive-overwrite \
--num-mappers 1

sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password itversity \
--table dept \
--hive-import \
--hive-database abc_rposam_db \
--hive-table dept \
--hive-overwrite \
--num-mappers 1

Table location in the Hive warehouse: hdfs://nn01.itversity.com:8020/apps/hive/warehouse/abc_rposam_db.db/emp

sqoop import \
--connect jdbc:mysql://ms.itversity.com/retail_export \
--username retail_user \
--password itversity \
--table dept \
--hive-import \
--hive-database abc_rposam_db \
--hive-table dept \
--create-hive-table \
--num-mappers 1

If no staging path (--target-dir or --warehouse-dir) is specified, the Hive import stages the data under the user's HDFS home directory, /user/rposam2020, before loading it into Hive.
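
The staging note above matters when an earlier Hive import failed midway: a leftover directory such as /user/rposam2020/emp can make the next run fail because the output directory already exists. This is a small sketch, assuming the default layout of /user/<user>/<table>. `default_staging_dir` is a hypothetical helper, and the `hdfs dfs` lines are left commented because they need a cluster.

```shell
# Hypothetical helper: print where a Hive import of <table> would stage data
# when neither --target-dir nor --warehouse-dir is given (assumed layout).
default_staging_dir() {
  user="$1"; table="$2"
  echo "/user/$user/$table"
}

dir="$(default_staging_dir rposam2020 emp)"
echo "$dir"

# On a cluster, a leftover directory could be inspected and removed like this:
# hdfs dfs -ls "$dir"
# hdfs dfs -rm -r "$dir"
```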
