## Copying files from HDFS to Local

We can copy files from HDFS to the local file system using either the `copyToLocal` or the `get` command.

* `hdfs dfs -copyToLocal` or `hdfs dfs -get`: copies files or directories from HDFS to the local file system. The two commands are identical.
* The client reads the blocks of each file in sequence and reconstructs the file on the local file system.
* If a target file already exists in the local file system, `get` fails for that file with a **File exists** error unless `-f` is passed to overwrite it.
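
The existing-target behaviour can be guarded against before `get` is called. This is a minimal sketch, not part of the original notebook: `fetch_if_absent` is a hypothetical helper name, and it assumes the `hdfs` client is on the `PATH`.

```shell
# Hypothetical helper (not an HDFS command): copy <src> from HDFS into the
# local directory <dst_dir> only when nothing with the same name is there yet.
fetch_if_absent() {
  fia_src=$1
  fia_dst_dir=$2
  fia_target="$fia_dst_dir/$(basename "$fia_src")"

  if [ -e "$fia_target" ]; then
    # Without -f, `hdfs dfs -get` would report "File exists" here.
    echo "skip: $fia_target already exists" >&2
    return 1
  fi
  hdfs dfs -get "$fia_src" "$fia_dst_dir"
}

# Run only where the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  fetch_if_absent "/user/$(whoami)/retail_db" "/home/$(whoami)" || echo "nothing copied" >&2
fi
```

Checking first keeps a partially populated local folder from producing one error per file.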

In [1]:
%%sh

hdfs dfs -help get

-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst> :
  Copy files that match the file pattern <src> to the local name.  <src> is kept. 
  When copying multiple files, the destination must be a directory. Passing -f
  overwrites the destination if it already exists and -p preserves access and
  modification times, ownership and the mode.


In [2]:
%%sh

hdfs dfs -help copyToLocal

-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst> :
  Identical to the -get command.


```{warning}
The following commands copy the contents of `/user/${USER}/retail_db` into the local home directory, so you will end up with `/home/${USER}/retail_db`.
```

In [3]:
%%sh

hdfs dfs -ls /user/`whoami`/retail_db

Found 9 items
drwxr-xr-x   - itversity supergroup          0 2022-05-29 17:24 /user/itversity/retail_db/categories
-rw-r--r--   1 itversity supergroup   10303297 2022-05-29 17:24 /user/itversity/retail_db/create_db.sql
-rw-r--r--   1 itversity supergroup       1748 2022-05-29 17:24 /user/itversity/retail_db/create_db_tables_pg.sql
drwxr-xr-x   - itversity supergroup          0 2022-05-29 17:24 /user/itversity/retail_db/customers
drwxr-xr-x   - itversity supergroup          0 2022-05-29 17:24 /user/itversity/retail_db/departments
-rw-r--r--   1 itversity supergroup   10297372 2022-05-29 17:24 /user/itversity/retail_db/load_db_tables_pg.sql
drwxr-xr-x   - itversity supergroup          0 2022-05-29 17:24 /user/itversity/retail_db/order_items
drwxr-xr-x   - itversity supergroup          0 2022-05-29 17:24 /user/itversity/retail_db/orders
drwxr-xr-x   - itversity supergroup          0 2022-05-29 17:24 /user/itversity/retail_db/products


In [4]:
%%sh

ls -ltr /home/`whoami`/

total 0
drwxr-xr-x 19 itversity itversity 608 May 29 17:03 itversity-material


In [5]:
%%sh

mkdir /home/`whoami`/retail_db

In [6]:
%%sh

hdfs dfs -get /user/`whoami`/retail_db/* /home/`whoami`/retail_db

In [7]:
%%sh

ls -ltr /home/`whoami`/retail_db

total 20152
drwxr-xr-x 2 itversity itversity     4096 May 29 17:24 categories
-rw-r--r-- 1 itversity itversity 10303297 May 29 17:24 create_db.sql
-rw-r--r-- 1 itversity itversity     1748 May 29 17:24 create_db_tables_pg.sql
drwxr-xr-x 2 itversity itversity     4096 May 29 17:24 customers
drwxr-xr-x 2 itversity itversity     4096 May 29 17:24 departments
-rw-r--r-- 1 itversity itversity 10297372 May 29 17:24 load_db_tables_pg.sql
drwxr-xr-x 2 itversity itversity     4096 May 29 17:24 order_items
drwxr-xr-x 2 itversity itversity     4096 May 29 17:24 orders
drwxr-xr-x 2 itversity itversity     4096 May 29 17:24 products


```{note}
The next command fails because the `retail_db` folder and its files already exist in the local home directory.
```

In [8]:
%%sh

hdfs dfs -get /user/`whoami`/retail_db /home/`whoami`

get: `/home/itversity/retail_db/categories/part-00000': File exists
get: `/home/itversity/retail_db/create_db.sql': File exists
get: `/home/itversity/retail_db/create_db_tables_pg.sql': File exists
get: `/home/itversity/retail_db/customers/part-00000': File exists
get: `/home/itversity/retail_db/departments/part-00000': File exists
get: `/home/itversity/retail_db/load_db_tables_pg.sql': File exists
get: `/home/itversity/retail_db/order_items/part-00000': File exists
get: `/home/itversity/retail_db/orders/part-00000': File exists
get: `/home/itversity/retail_db/products/part-00000': File exists


CalledProcessError: Command 'b'\nhdfs dfs -get /user/`whoami`/retail_db /home/`whoami`\n'' returned non-zero exit status 1.
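
When the intent is to refresh the local copy rather than keep it, the `-f` flag listed in the `hdfs dfs -help get` output can be used instead of deleting the folder first. This is a sketch under that assumption: `refresh_from_hdfs` is a hypothetical wrapper name, and the behaviour should be confirmed on your Hadoop version before relying on it.

```shell
# Hypothetical wrapper: re-run a copy and overwrite local files that exist.
# -f is the overwrite flag documented by `hdfs dfs -help get`.
refresh_from_hdfs() {
  hdfs dfs -get -f "$1" "$2"
}

# Run only where the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  refresh_from_hdfs "/user/$(whoami)/retail_db" "/home/$(whoami)"
fi
```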

```{note}
Alternative approach: copy the folder together with its contents in a single `get`, after removing the existing local copy.
```

In [9]:
%%sh

rm -rf /home/`whoami`/retail_db

In [10]:
%%sh

ls -ltr /home/`whoami`

total 0
drwxr-xr-x 19 itversity itversity 608 May 29 17:03 itversity-material


In [11]:
%%sh

hdfs dfs -get /user/`whoami`/retail_db /home/`whoami`

In [12]:
%%sh

ls -ltr /home/`whoami`/retail_db/*

-rw-r--r-- 1 itversity itversity 10303297 May 29 17:25 /home/itversity/retail_db/create_db.sql
-rw-r--r-- 1 itversity itversity     1748 May 29 17:25 /home/itversity/retail_db/create_db_tables_pg.sql
-rw-r--r-- 1 itversity itversity 10297372 May 29 17:25 /home/itversity/retail_db/load_db_tables_pg.sql

/home/itversity/retail_db/categories:
total 4
-rw-r--r-- 1 itversity itversity 1029 May 29 17:25 part-00000

/home/itversity/retail_db/customers:
total 932
-rw-r--r-- 1 itversity itversity 953719 May 29 17:25 part-00000

/home/itversity/retail_db/departments:
total 4
-rw-r--r-- 1 itversity itversity 60 May 29 17:25 part-00000

/home/itversity/retail_db/order_items:
total 5284
-rw-r--r-- 1 itversity itversity 5408880 May 29 17:25 part-00000

/home/itversity/retail_db/orders:
total 2932
-rw-r--r-- 1 itversity itversity 2999944 May 29 17:25 part-00000

/home/itversity/retail_db/products:
total 172
-rw-r--r-- 1 itversity itversity 174155 May 29 17:25 part-00000
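
After a copy like the one above, a quick sanity check is to compare file counts on both sides. This is a minimal sketch, not part of the original notebook: `same_file_count` is a hypothetical helper, and it relies on `hdfs dfs -count` printing `DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME`.

```shell
# Hypothetical helper: compare the number of files under an HDFS directory
# with the number of regular files under its local copy.
same_file_count() {
  # Second column of `hdfs dfs -count` is the file count.
  sfc_hdfs=$(hdfs dfs -count "$1" | awk '{print $2}')
  # Count regular files locally; tr strips padding some wc versions add.
  sfc_local=$(find "$2" -type f | wc -l | tr -d ' ')
  echo "hdfs=$sfc_hdfs local=$sfc_local"
  [ "$sfc_hdfs" = "$sfc_local" ]
}

# Run only where the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  same_file_count "/user/$(whoami)/retail_db" "/home/$(whoami)/retail_db" \
    || echo "file counts differ" >&2
fi
```

If the copy was made with `-crc`, the local side also holds checksum files, so the counts would not be expected to match.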


* We can also use glob patterns with the `get` command to copy matching files from HDFS to the local file system. In addition, we can pass multiple HDFS files or folders to a single `get` command.

In [13]:
%%sh

rm -rf /home/`whoami`/retail_db

In [14]:
%%sh

ls -ltr /home/`whoami`

total 0
drwxr-xr-x 19 itversity itversity 608 May 29 17:03 itversity-material


In [15]:
%%sh

mkdir /home/`whoami`/retail_db

In [17]:
%%sh

hdfs dfs -get /user/`whoami`/retail_db/order* /home/`whoami`/retail_db

In [18]:
%%sh

ls -ltr /home/`whoami`/retail_db

total 8
drwxr-xr-x 2 itversity itversity 4096 May 29 17:26 order_items
drwxr-xr-x 2 itversity itversity 4096 May 29 17:26 orders


In [19]:
%%sh

hdfs dfs -get /user/`whoami`/retail_db/departments /user/`whoami`/retail_db/products /home/`whoami`/retail_db

In [20]:
%%sh

ls -ltr /home/`whoami`/retail_db

total 16
drwxr-xr-x 2 itversity itversity 4096 May 29 17:26 order_items
drwxr-xr-x 2 itversity itversity 4096 May 29 17:26 orders
drwxr-xr-x 2 itversity itversity 4096 May 29 17:26 departments
drwxr-xr-x 2 itversity itversity 4096 May 29 17:26 products


In [21]:
%%sh

hdfs dfs -get /user/`whoami`/retail_db/categories /user/`whoami`/retail_db/customers /home/`whoami`/retail_db

In [22]:
%%sh

ls -ltr /home/`whoami`/retail_db

total 24
drwxr-xr-x 2 itversity itversity 4096 May 29 17:26 order_items
drwxr-xr-x 2 itversity itversity 4096 May 29 17:26 orders
drwxr-xr-x 2 itversity itversity 4096 May 29 17:26 departments
drwxr-xr-x 2 itversity itversity 4096 May 29 17:26 products
drwxr-xr-x 2 itversity itversity 4096 May 29 17:26 categories
drwxr-xr-x 2 itversity itversity 4096 May 29 17:26 customers
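
The pattern examples above work because the local shell finds no local path matching `/user/.../order*` and passes the pattern through unchanged. Quoting the pattern makes that explicit. This is a sketch, with `get_matching` as a hypothetical helper name and the `hdfs` client assumed to be on the `PATH`.

```shell
# Hypothetical helper: hand a glob pattern to HDFS without letting the
# local shell expand it first.
get_matching() {
  # "$1" is quoted, so the pattern reaches `hdfs dfs -get` verbatim and is
  # matched against HDFS paths rather than local ones.
  hdfs dfs -get "$1" "$2"
}

# Run only where the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  get_matching "/user/$(whoami)/retail_db/order*" "/home/$(whoami)/retail_db"
fi
```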
