## Copying files from local to HDFS

We can copy files from the local file system to HDFS using either the `copyFromLocal` or the `put` command.
* `hdfs dfs -copyFromLocal` or `hdfs dfs -put` copies files or directories from the local file system into HDFS. We can also use `hadoop fs` in place of `hdfs dfs`.
* However, we cannot edit data in place once files are in HDFS. To fix any data, we have to copy the file to the local file system, fix the data, and then copy it back to HDFS.
* Files are divided into blocks based on the block size, and the blocks are stored on DataNodes in a distributed fashion according to the replication factor. We will get into the details later.
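
The fix-locally-and-copy-back workflow described above could be sketched as follows. This is only a minimal sketch: `fix_in_hdfs` is a hypothetical helper name, and it assumes the fix can be expressed as a single `sed` expression and that the file is small enough to handle locally.

```shell
# fix_in_hdfs is a hypothetical helper, not part of the Hadoop CLI.
# Usage: fix_in_hdfs <hdfs file path> <sed expression>
fix_in_hdfs() {
  hdfs_path=$1
  sed_expr=$2
  work_dir=$(mktemp -d)
  local_copy="$work_dir/$(basename "$hdfs_path")"

  # 1. Copy the file from HDFS to the local file system.
  # 2. Apply the fix locally, writing the result to a new file.
  # 3. Copy the fixed file back, overwriting the HDFS file with -f.
  hdfs dfs -get "$hdfs_path" "$local_copy" &&
    sed "$sed_expr" "$local_copy" > "$local_copy.fixed" &&
    hdfs dfs -put -f "$local_copy.fixed" "$hdfs_path"
  status=$?

  # Clean up the temporary local copies and report the outcome.
  rm -rf "$work_dir"
  return "$status"
}
```

On a cluster, a call might look like `fix_in_hdfs /user/${USER}/retail_db/departments/part-00000 's/Fitness/Fitness Gear/'`, where the `sed` expression is purely illustrative.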

![Anatomy of an HDFS file write](https://s3.amazonaws.com/kaizen.itversity.com/hadoop-overview/04HDFSAnatomyOfFileWrite.png)


In [1]:
%%sh

hdfs dfs -ls /user/${USER}

Found 3 items
drwx------   - itversity students          0 2021-01-17 19:11 /user/itversity/.Trash
drwxr-xr-x   - itversity students          0 2021-01-14 07:46 /user/itversity/.sparkStaging
drwxr-xr-x   - itversity students          0 2021-01-14 07:41 /user/itversity/warehouse


In [2]:
%%sh

hdfs dfs -mkdir /user/${USER}/retail_db

In [4]:
%%sh

hdfs dfs -ls /user/${USER}

Found 4 items
drwx------   - itversity students          0 2021-01-17 19:11 /user/itversity/.Trash
drwxr-xr-x   - itversity students          0 2021-01-14 07:46 /user/itversity/.sparkStaging
drwxr-xr-x   - itversity students          0 2021-01-17 19:16 /user/itversity/retail_db
drwxr-xr-x   - itversity students          0 2021-01-14 07:41 /user/itversity/warehouse


In [5]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db

In [6]:
%%sh

hdfs dfs -help put

-put [-f] [-p] [-l] <localsrc> ... <dst> :
  Copy files from the local file system into fs. Copying fails if the file already
  exists, unless the -f flag is given.
  Flags:
                                                                       
  -p  Preserves access and modification times, ownership and the mode. 
  -f  Overwrites the destination if it already exists.                 
  -l  Allow DataNode to lazily persist the file to disk. Forces        
         replication factor of 1. This flag will result in reduced
         durability. Use with care.


In [7]:
%%sh

hdfs dfs -help copyFromLocal

-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst> :
  Identical to the -put command.


```{warning}
Running `hdfs dfs -put /data/retail_db /user/${USER}/retail_db` when the target folder already exists copies the entire folder into it, so you will see `/user/${USER}/retail_db/retail_db`. The commands after that show how to get the files where we expect them.
```

In [8]:
%%sh

ls -ltr /data/retail_db

total 24
drwxr-xr-x 2 root root 4096 Feb 20  2017 products
drwxr-xr-x 2 root root 4096 Feb 20  2017 orders
drwxr-xr-x 2 root root 4096 Feb 20  2017 order_items
drwxr-xr-x 2 root root 4096 Feb 20  2017 departments
drwxr-xr-x 2 root root 4096 Feb 20  2017 customers
drwxr-xr-x 2 root root 4096 Feb 20  2017 categories


In [9]:
%%sh

hdfs dfs -put /data/retail_db /user/${USER}/retail_db

In [10]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db

Found 1 items
drwxr-xr-x   - itversity students          0 2021-01-17 19:19 /user/itversity/retail_db/retail_db


In [11]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db/retail_db

Found 6 items
drwxr-xr-x   - itversity students          0 2021-01-17 19:19 /user/itversity/retail_db/retail_db/categories
drwxr-xr-x   - itversity students          0 2021-01-17 19:19 /user/itversity/retail_db/retail_db/customers
drwxr-xr-x   - itversity students          0 2021-01-17 19:19 /user/itversity/retail_db/retail_db/departments
drwxr-xr-x   - itversity students          0 2021-01-17 19:19 /user/itversity/retail_db/retail_db/order_items
drwxr-xr-x   - itversity students          0 2021-01-17 19:19 /user/itversity/retail_db/retail_db/orders
drwxr-xr-x   - itversity students          0 2021-01-17 19:19 /user/itversity/retail_db/retail_db/products
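
The nesting seen above matches how a local `cp -r` treats an existing target directory. The sketch below is only a local analogy under that assumption: it uses `cp -r` in a temporary directory as a stand-in for `hdfs dfs -put`, and the directory names are illustrative.

```shell
# Local analogy only: cp -r stands in for `hdfs dfs -put` (an assumption for illustration).
work=$(mktemp -d)
mkdir -p "$work/data/retail_db/orders" "$work/user/retail_db"

# Target folder already exists: the source folder is nested inside it.
cp -r "$work/data/retail_db" "$work/user/retail_db"
nested=$(ls "$work/user/retail_db")
echo "copy into existing retail_db gives: retail_db/$nested"

# Remove the target and copy into its parent instead: no extra level.
rm -r "$work/user/retail_db"
cp -r "$work/data/retail_db" "$work/user"
direct=$(ls "$work/user/retail_db")
echo "copy into parent gives: retail_db/$direct"

rm -r "$work"
```

The first copy produces `retail_db/retail_db`, while copying into the parent folder places the sub folders directly under `retail_db`.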


```{note}
Let's drop this nested folder and copy the files again so that they land where we expect. As the target folder is already created, we can use patterns to copy the sub folders.
```

In [12]:
%%sh

hdfs dfs -help rm

-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ... :
  Delete all files that match the specified file pattern. Equivalent to the Unix
  command "rm <src>"
                                                                                 
  -f          If the file does not exist, do not display a diagnostic message or 
              modify the exit status to reflect an error.                        
  -[rR]       Recursively deletes directories.                                   
  -skipTrash  option bypasses trash, if enabled, and immediately deletes <src>.  
  -safely     option requires safety confirmation, if enabled, requires          
              confirmation before deleting large directory with more than        
              <hadoop.shell.delete.limit.num.files> files. Delay is expected when
              walking over large directory recursively to count the number of    
              files to be deleted before the confirmation.                       


In [13]:
%%sh

hdfs dfs -rm -R -skipTrash /user/${USER}/retail_db/retail_db

Deleted /user/itversity/retail_db/retail_db


In [15]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db/

In [16]:
%%sh

hdfs dfs -put /data/retail_db/order* /user/${USER}/retail_db

In [17]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db/

Found 2 items
drwxr-xr-x   - itversity students          0 2021-01-17 19:21 /user/itversity/retail_db/order_items
drwxr-xr-x   - itversity students          0 2021-01-17 19:21 /user/itversity/retail_db/orders
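
An unquoted pattern such as `order*` is expanded by the local shell before the command runs, so the command receives each matching folder as a separate source argument. A minimal sketch of that expansion, using a throwaway local directory in place of `/data/retail_db` and a hypothetical target path `/user/demo/retail_db`:

```shell
# Throwaway local directory standing in for /data/retail_db.
demo_dir=$(mktemp -d)
for d in categories customers departments order_items orders products; do
  mkdir "$demo_dir/$d"
done

# The local shell expands order* into the matching names.
matched=$(cd "$demo_dir" && echo order*)
echo "order* expands to: $matched"

# Prefixing a command with echo previews the arguments it would receive.
# /user/demo/retail_db is a hypothetical target path.
preview=$(cd "$demo_dir" && echo hdfs dfs -put order* /user/demo/retail_db)
echo "$preview"

rm -r "$demo_dir"
```

Previewing with `echo` is a cheap way to check what a pattern matches before copying anything.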


In [20]:
%%sh

hdfs dfs -put -f /data/retail_db/* /user/${USER}/retail_db

In [21]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db/

Found 6 items
drwxr-xr-x   - itversity students          0 2021-01-17 19:22 /user/itversity/retail_db/categories
drwxr-xr-x   - itversity students          0 2021-01-17 19:22 /user/itversity/retail_db/customers
drwxr-xr-x   - itversity students          0 2021-01-17 19:22 /user/itversity/retail_db/departments
drwxr-xr-x   - itversity students          0 2021-01-17 19:22 /user/itversity/retail_db/order_items
drwxr-xr-x   - itversity students          0 2021-01-17 19:22 /user/itversity/retail_db/orders
drwxr-xr-x   - itversity students          0 2021-01-17 19:22 /user/itversity/retail_db/products


In [23]:
%%sh

hdfs dfs -ls -R /user/${USER}/retail_db/

drwxr-xr-x   - itversity students          0 2021-01-17 19:22 /user/itversity/retail_db/categories
-rw-r--r--   2 itversity students       1029 2021-01-17 19:22 /user/itversity/retail_db/categories/part-00000
drwxr-xr-x   - itversity students          0 2021-01-17 19:22 /user/itversity/retail_db/customers
-rw-r--r--   2 itversity students     953719 2021-01-17 19:22 /user/itversity/retail_db/customers/part-00000
drwxr-xr-x   - itversity students          0 2021-01-17 19:22 /user/itversity/retail_db/departments
-rw-r--r--   2 itversity students         60 2021-01-17 19:22 /user/itversity/retail_db/departments/part-00000
drwxr-xr-x   - itversity students          0 2021-01-17 19:22 /user/itversity/retail_db/order_items
-rw-r--r--   2 itversity students    5408880 2021-01-17 19:22 /user/itversity/retail_db/order_items/part-00000
drwxr-xr-x   - itversity students          0 2021-01-17 19:22 /user/itversity/retail_db/orders
-rw-r--r--   2 itversity students    2999944 2021-01-17 19:22 /user

```{note}
Alternatively, you can use `copyFromLocal`.
```

In [None]:
%%sh

hdfs dfs -rm -R -skipTrash /user/${USER}/retail_db

In [27]:
%%sh

hdfs dfs -mkdir /user/${USER}/retail_db

In [28]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db/

In [29]:
%%sh

hdfs dfs -copyFromLocal /data/retail_db/* /user/${USER}/retail_db

In [30]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db

Found 6 items
drwxr-xr-x   - itversity students          0 2021-01-17 19:24 /user/itversity/retail_db/categories
drwxr-xr-x   - itversity students          0 2021-01-17 19:24 /user/itversity/retail_db/customers
drwxr-xr-x   - itversity students          0 2021-01-17 19:24 /user/itversity/retail_db/departments
drwxr-xr-x   - itversity students          0 2021-01-17 19:24 /user/itversity/retail_db/order_items
drwxr-xr-x   - itversity students          0 2021-01-17 19:24 /user/itversity/retail_db/orders
drwxr-xr-x   - itversity students          0 2021-01-17 19:24 /user/itversity/retail_db/products


```{note}
We can also copy the folder `/data/retail_db` directly to `/user/${USER}/retail_db`. Let us first delete `/user/${USER}/retail_db` using `-skipTrash`.
```

In [None]:
%%sh

hdfs dfs -rm -R -skipTrash /user/${USER}/retail_db

```{note}
We can specify the target location as `/user/${USER}`. It will create the `retail_db` folder along with its contents.
```

In [33]:
%%sh

hdfs dfs -put /data/retail_db /user/${USER}

In [34]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db

Found 6 items
drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/categories
drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/customers
drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/departments
drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/order_items
drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/orders
drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/products


* If we run `hdfs dfs -put /data/retail_db /user/${USER}` again, it will fail because the files already exist in the target folder.

In [35]:
%%sh

hdfs dfs -put /data/retail_db /user/${USER}

put: `/user/itversity/retail_db/categories/part-00000': File exists
put: `/user/itversity/retail_db/customers/part-00000': File exists
put: `/user/itversity/retail_db/departments/part-00000': File exists
put: `/user/itversity/retail_db/order_items/part-00000': File exists
put: `/user/itversity/retail_db/orders/part-00000': File exists
put: `/user/itversity/retail_db/products/part-00000': File exists


CalledProcessError: Command 'b'\nhdfs dfs -put /data/retail_db /user/${USER}\n'' returned non-zero exit status 1.
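
Because `put` exits with a non-zero status when files already exist, a script may want to check the target first. The sketch below relies on `hdfs dfs -test -e`, which exits with status 0 when a path exists. `safe_put` is a hypothetical helper name, and the sketch assumes the target folder passed in already exists in HDFS.

```shell
# safe_put is a hypothetical helper, not part of the Hadoop CLI.
# Usage: safe_put <local source> <existing HDFS target folder>
safe_put() {
  src=$1
  dst_dir=$2
  target="$dst_dir/$(basename "$src")"

  # `hdfs dfs -test -e` returns 0 when the path exists in HDFS.
  if hdfs dfs -test -e "$target"; then
    echo "skip: $target already exists"
    return 1
  fi

  # Nothing in the way, so the copy can proceed without -f.
  hdfs dfs -put "$src" "$dst_dir"
}
```

For example, `safe_put /data/retail_db /user/${USER}` would skip the copy once `/user/${USER}/retail_db` is in place, instead of failing file by file.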

* We can use `-f` with `put` or `copyFromLocal` to overwrite the existing files.

In [36]:
%%sh

hdfs dfs -put -f /data/retail_db /user/${USER}

In [37]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db

Found 6 items
drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/categories
drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/customers
drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/departments
drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/order_items
drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/orders
drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/products


In [38]:
%%sh

hdfs dfs -ls -R /user/${USER}/retail_db

drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/categories
-rw-r--r--   2 itversity students       1029 2021-01-17 19:25 /user/itversity/retail_db/categories/part-00000
drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/customers
-rw-r--r--   2 itversity students     953719 2021-01-17 19:25 /user/itversity/retail_db/customers/part-00000
drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/departments
-rw-r--r--   2 itversity students         60 2021-01-17 19:25 /user/itversity/retail_db/departments/part-00000
drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/order_items
-rw-r--r--   2 itversity students    5408880 2021-01-17 19:25 /user/itversity/retail_db/order_items/part-00000
drwxr-xr-x   - itversity students          0 2021-01-17 19:25 /user/itversity/retail_db/orders
-rw-r--r--   2 itversity students    2999944 2021-01-17 19:25 /user
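
In the recursive listing, the second column of each file line is the replication factor (2 here) and the fifth column is the file size in bytes. As a rough sketch of how files map to blocks, the arithmetic below assumes a block size of 128 MB. The actual value for a cluster can be looked up with `hdfs getconf -confKey dfs.blocksize`. `block_count` is a hypothetical helper name.

```shell
# block_count is a hypothetical helper: ceiling division of file size by block size.
block_count() {
  size_bytes=$1
  block_bytes=$2
  echo $(( (size_bytes + block_bytes - 1) / block_bytes ))
}

block_bytes=$((128 * 1024 * 1024))   # assumed block size of 128 MB
replication=2                         # second column of the listing above

# order_items/part-00000 is 5408880 bytes in the listing above.
blocks=$(block_count 5408880 "$block_bytes")
replicas=$((blocks * replication))
echo "order_items/part-00000: $blocks block(s), $replicas stored replica(s)"
```

Under that assumption, every file in `retail_db` fits in a single block, and each block is stored twice across the DataNodes.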