## Copying files from local to HDFS

We can copy files from the local file system to HDFS using either the `copyFromLocal` or the `put` command.

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/yLuuQRThYB4?rel=0&amp;controls=1&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* `hdfs dfs -copyFromLocal` or `hdfs dfs -put` copies files or directories from the local file system into HDFS. We can also use `hadoop fs` in place of `hdfs dfs`.
* However, we cannot modify the contents of files in place once they are in HDFS. To fix any data, we have to copy the file to the local file system, fix the data, and then copy it back to HDFS.
* Files are divided into blocks based on the block size, and the blocks are stored on DataNodes in a distributed fashion according to the replication factor. We will get into the details later.
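
The round trip described in the second bullet can be sketched as a small shell function. This is only an illustration under stated assumptions: `fix_hdfs_file` is a hypothetical helper name, and the fix is expressed as a `sed` expression. The `hdfs dfs -get` and `hdfs dfs -put -f` commands are the standard ones.

```shell
# Sketch: "edit" an HDFS file by round-tripping through the local file system.
# fix_hdfs_file is a hypothetical helper name, not part of Hadoop.
# Usage: fix_hdfs_file <hdfs_path> <sed_expression>
fix_hdfs_file() (
  hdfs_path=$1
  sed_expr=$2
  work_dir=$(mktemp -d) || exit 1
  local_copy="$work_dir/$(basename "$hdfs_path")"

  # 1. Copy the file from HDFS to the local file system.
  hdfs dfs -get "$hdfs_path" "$local_copy" || exit 1
  # 2. Fix the data locally (here: a sed substitution written to a new file).
  sed "$sed_expr" "$local_copy" > "$local_copy.fixed" || exit 1
  # 3. Copy the fixed file back, overwriting the HDFS copy with -f.
  hdfs dfs -put -f "$local_copy.fixed" "$hdfs_path" || exit 1

  # Clean up the local working copy.
  rm -rf "$work_dir"
)
```

A call might look like `fix_hdfs_file "/user/${USER}/some_file.csv" 's/old/new/'`, where the file name and the substitution are placeholders.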

![HDFS anatomy of a file write](https://s3.amazonaws.com/kaizen.itversity.com/hadoop-overview/04HDFSAnatomyOfFileWrite.png)
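
As a rough illustration of how block size and replication factor determine what gets stored, the arithmetic can be sketched in shell. The values here are assumptions for the example (128 MB blocks, replication factor 3, a 300 MB file); the settings on your cluster may differ, and `block_count` is a hypothetical helper.

```shell
# Sketch: estimate how many HDFS blocks a file occupies.
# block_count is a hypothetical helper; both arguments are sizes in bytes.
block_count() {
  size=$1
  bsize=$2
  # Ceiling division: a partial final block still counts as one block.
  echo $(( (size + bsize - 1) / bsize ))
}

# Assumed values for illustration only.
block_size=$(( 128 * 1024 * 1024 ))
replication=3
file_size=$(( 300 * 1024 * 1024 ))

blocks=$(block_count "$file_size" "$block_size")
echo "blocks: $blocks"
echo "block replicas stored across DataNodes: $(( blocks * replication ))"
```

With these assumed numbers the file needs 3 blocks (two full ones and a partial one), and 9 block replicas are stored in total.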

In [None]:
%%sh

hdfs dfs -ls /user/${USER}

In [None]:
%%sh

hdfs dfs -mkdir /user/${USER}/retail_db

In [None]:
%%sh

hdfs dfs -ls /user/${USER}

In [None]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db

In [None]:
%%sh

hdfs dfs -help put

In [None]:
%%sh

hdfs dfs -help copyFromLocal

```{warning}
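The `put` command below copies the entire folder into `/user/${USER}/retail_db`, because that folder already exists, so you will see `/user/${USER}/retail_db/retail_db`. The commands after it show how to get the files directly under `/user/${USER}/retail_db` as expected.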
```
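
The nesting behaviour can be modelled with a few lines of shell. This is a sketch that assumes `put` follows `cp -r`-style rules for directories; `put_landing_path` is a hypothetical helper that only builds a path string and never contacts HDFS, and `demo` is a placeholder user name.

```shell
# Sketch: model where `hdfs dfs -put <src_dir> <dest>` places a directory.
# put_landing_path is a hypothetical helper; it does not talk to HDFS.
# Usage: put_landing_path <src_dir> <dest> <dest_exists: yes|no>
put_landing_path() {
  src=${1%/}
  dest=${2%/}
  if [ "$3" = "yes" ]; then
    # Destination already exists: the source folder is nested inside it.
    echo "$dest/$(basename "$src")"
  else
    # Destination does not exist: the source folder is copied as <dest>.
    echo "$dest"
  fi
}

# Target folder pre-created: files end up one level deeper than intended.
put_landing_path /data/retail_db /user/demo/retail_db yes
# Parent folder as target: files end up where we want them.
put_landing_path /data/retail_db /user/demo yes
```

The first call prints `/user/demo/retail_db/retail_db` and the second prints `/user/demo/retail_db`.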

In [None]:
%%sh

ls -ltr /data/retail_db

In [None]:
%%sh

hdfs dfs -put /data/retail_db /user/${USER}/retail_db

In [None]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db

In [None]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db/retail_db

```{note}
Let's drop this nested folder and copy the files again so that they land as expected. As the target folder is already created, we can use patterns to copy the sub folders.
```
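
It helps to remember that a pattern such as `/data/retail_db/order*` is expanded by the local shell before `hdfs` runs, so `put` receives a list of local paths. The sketch below shows this with a throwaway local directory; the folder names and the `demo` user are made up for illustration, and `echo` is used so that nothing is actually copied.

```shell
# Sketch: the local shell expands the pattern before hdfs is invoked.
# Directory names below are made up for illustration.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/orders" "$demo_dir/order_items" "$demo_dir/customers"

# Prefixing the command with echo prints what would be executed.
cmd=$(echo hdfs dfs -put "$demo_dir"/order* /user/demo/retail_db)
echo "$cmd"

# Remove the throwaway directory.
rm -rf "$demo_dir"
```

The printed command lists the two folders whose names start with `order` as separate source arguments, while `customers` is left out.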

In [None]:
%%sh

hdfs dfs -help rm

In [None]:
%%sh

hdfs dfs -rm -R -skipTrash /user/${USER}/retail_db/retail_db

In [None]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db/

In [None]:
%%sh

hdfs dfs -put /data/retail_db/order* /user/${USER}/retail_db

In [None]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db/

In [None]:
%%sh

hdfs dfs -put -f /data/retail_db/* /user/${USER}/retail_db

In [None]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db/

In [None]:
%%sh

hdfs dfs -ls -R /user/${USER}/retail_db/

```{note}
Alternatively, you can use `copyFromLocal` in the same way.
```

In [None]:
%%sh

hdfs dfs -rm -R -skipTrash /user/${USER}/retail_db

In [None]:
%%sh

hdfs dfs -mkdir /user/${USER}/retail_db

In [None]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db/

In [None]:
%%sh

hdfs dfs -copyFromLocal /data/retail_db/* /user/${USER}/retail_db

In [None]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db

```{note}
Alternatively, we can copy the folder `/data/retail_db` directly to `/user/${USER}/retail_db`. Let us first delete `/user/${USER}/retail_db` using `-skipTrash`.
```

In [None]:
%%sh

hdfs dfs -rm -R -skipTrash /user/${USER}/retail_db

```{note}
We can specify the target location as `/user/${USER}`. The command will create the `retail_db` folder and copy its contents.
```

In [None]:
%%sh

hdfs dfs -put /data/retail_db /user/${USER}

In [None]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db

* If we run `hdfs dfs -put /data/retail_db /user/${USER}` again, it will fail because the target folder already exists.

In [None]:
%%sh

hdfs dfs -put /data/retail_db /user/${USER}
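
Because a repeated `put` fails when the target exists, scripts often check for the target first. The following is a minimal sketch of that idea, assuming we prefer to skip the copy rather than fail; `put_if_absent` is a hypothetical helper name, while `hdfs dfs -test -e` is the standard existence check.

```shell
# Sketch: skip the upload when the target already exists instead of failing.
# put_if_absent is a hypothetical helper name, not part of Hadoop.
# Usage: put_if_absent <local_dir> <hdfs_parent_dir>
put_if_absent() {
  target="${2%/}/$(basename "$1")"
  # `hdfs dfs -test -e` exits with 0 when the path exists.
  if hdfs dfs -test -e "$target"; then
    echo "skipped: $target already exists"
  else
    hdfs dfs -put "$1" "$2" && echo "copied: $target"
  fi
}
```

For example, `put_if_absent /data/retail_db "/user/${USER}"` copies the folder on the first run and reports that it was skipped on later runs.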

* We can use `-f` with `put` or `copyFromLocal` to overwrite the files in the existing folder.

In [None]:
%%sh

hdfs dfs -put -f /data/retail_db /user/${USER}

In [None]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db

In [None]:
%%sh

hdfs dfs -ls -R /user/${USER}/retail_db