## Managing Files in HDFS

Let us understand how to manage files in HDFS. We will see how to create directories, copy files, and delete files and directories.

* As developers, it is very important to understand how to manage files in HDFS.
* First, let us understand the user space that is provided to each user in our labs. You can also replace `${USER}` with the OS user you are logged in as.
* You also need to understand the difference between the Linux local file system and HDFS.

In [None]:
%%sh

hdfs dfs -ls /user/${USER}

* Here is how you can get the list of supported HDFS commands. Of all the commands, we typically use the following.
  * `help` to get the syntax and semantics of a subcommand.
  * `mkdir` to create a directory.
  * `copyFromLocal` or `put` to copy files from the local file system to HDFS.
  * `copyToLocal` or `get` to copy files from HDFS to the local file system.
  * `rm` to delete files or directories.
  * `cp` to copy files from one HDFS location to another HDFS location.
  * `mv` to move files from one HDFS location to another HDFS location. We can also use this to rename files.

In [None]:
%%sh

hdfs dfs -help

* Let us create a directory in HDFS under the user space to store the folders and files used in this section.

In [None]:
%%sh

hdfs dfs -mkdir /user/${USER}/retail_db

In [None]:
%%sh

hdfs dfs -ls /user/${USER}
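
Running `mkdir` a second time fails because the directory already exists. As a minimal sketch, assuming the `-p` option of `hdfs dfs -mkdir` and the `-test -d` check, the following creates the directory only if needed and then confirms it is there. The `command -v hdfs` guard and the `id -un` fallback are additions for illustration, so the snippet does nothing on a machine without the HDFS client.

```shell
# Target directory in the user space; falls back to `id -un` if USER is unset.
target="/user/${USER:-$(id -un)}/retail_db"

if command -v hdfs >/dev/null 2>&1; then
  # -p creates missing parents and does not fail when the directory already exists.
  hdfs dfs -mkdir -p "$target"
  # -test -d returns exit status 0 when the path is a directory.
  if hdfs dfs -test -d "$target"; then
    echo "$target exists"
  fi
else
  echo "hdfs client not found; skipping"
fi
```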

* Let us copy files into the newly created folder - `/user/${USER}/retail_db`.
  * We can use either `put` or `copyFromLocal` for this.
  * Our source data is in the local file system under `/data/retail_db`.
  * As `/user/${USER}/retail_db` already exists, specify `/data/retail_db/*` as the source so that the files are copied directly into it. Specifying `/data/retail_db` instead would create a nested `retail_db` folder under `/user/${USER}/retail_db`.
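
The difference between copying a folder and copying its contents can be tried out safely on the local file system, because `cp -r` treats an existing target directory the same way `put` does. This is only a local analogy, not an HDFS command: the temporary directory and the `some_table/data.txt` file are made-up names for illustration.

```shell
# Local analogy only: nothing here touches HDFS.
work=$(mktemp -d)
mkdir -p "$work/src/retail_db/some_table" "$work/nested" "$work/flat"
echo "placeholder" > "$work/src/retail_db/some_table/data.txt"

# Copying the folder itself into an existing target nests it one level down.
cp -r "$work/src/retail_db" "$work/nested"

# Copying the folder's contents places them directly in the target.
cp -r "$work/src/retail_db"/* "$work/flat"

nested_listing=$(ls "$work/nested")
flat_listing=$(ls "$work/flat")
echo "nested: $nested_listing"   # nested: retail_db
echo "flat:   $flat_listing"     # flat:   some_table

# Clean up the temporary directory.
rm -rf "$work"
```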

In [None]:
%%sh

hdfs dfs -help put

In [None]:
%%sh

hdfs dfs -help copyFromLocal

```{warning}
The following command copies the entire folder into `/user/${USER}/retail_db`, so you will end up with `/user/${USER}/retail_db/retail_db`. A later command in this section copies the files as intended.
```

In [None]:
%%sh

hdfs dfs -put /data/retail_db /user/${USER}/retail_db

```{note}
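Alternatively, you can use `copyFromLocal`. If you run it right after the `put` above, it fails because the nested folder and its files already exist.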
```

In [None]:
%%sh

hdfs dfs -copyFromLocal /data/retail_db /user/${USER}/retail_db

In [None]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db

```{note}
Let us drop the nested folder and then copy the files again so that they land directly under `/user/${USER}/retail_db`.
```

In [None]:
%%sh

hdfs dfs -help rm

In [None]:
%%sh

hdfs dfs -rm -R /user/${USER}/retail_db/retail_db

In [None]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db/

In [None]:
%%sh

hdfs dfs -put /data/retail_db/* /user/${USER}/retail_db

In [None]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db/
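
Beyond eyeballing the listing, one way to check that everything was copied is to compare file counts on both sides. This is a rough sketch, assuming the files now sit directly under `/user/${USER}/retail_db` and that `hdfs dfs -count` prints the file count in its second column. The helper name `count_local_files` is made up for this example, and the guard skips the comparison when the HDFS client or the local data is missing.

```shell
# Count regular files under a local directory (hypothetical helper).
count_local_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

hdfs_dir="/user/${USER:-$(id -un)}/retail_db"

if command -v hdfs >/dev/null 2>&1 && [ -d /data/retail_db ]; then
  local_count=$(count_local_files /data/retail_db)
  # -count prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
  hdfs_count=$(hdfs dfs -count "$hdfs_dir" | awk '{print $2}')
  echo "local files: $local_count, HDFS files: $hdfs_count"
else
  echo "hdfs client or /data/retail_db not found; skipping"
fi
```

Matching counts do not prove the contents are identical, but a mismatch is a quick signal that the copy was incomplete or nested in the wrong place.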

```{note}
We can also copy the folder `/data/retail_db` directly into `/user/${USER}`, which creates `/user/${USER}/retail_db`. Let us first delete `/user/${USER}/retail_db` using `-skipTrash`, which deletes the folder permanently instead of moving it to the trash.
```

In [None]:
%%sh

hdfs dfs -rm -R -skipTrash /user/${USER}/retail_db
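
When `rm` is used without `-skipTrash` and trash is enabled on the cluster, deleted paths are moved to a trash directory rather than removed immediately. The sketch below assumes the default trash location of `.Trash` under the user's HDFS home directory; whether trash is enabled in your lab is a cluster setting, so the listing may simply report that nothing is there.

```shell
# Default trash location (assumption: the cluster uses the standard layout).
trash_dir="/user/${USER:-$(id -un)}/.Trash"

if command -v hdfs >/dev/null 2>&1; then
  # Lists what was moved to trash by earlier deletes, if anything.
  hdfs dfs -ls "$trash_dir" || echo "no trash directory found (trash may be disabled or empty)"
else
  echo "hdfs client not found; skipping"
fi
```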

```{note}
We can specify the target location as `/user/${USER}`. It will create the `retail_db` folder along with its contents.
```

In [None]:
%%sh

hdfs dfs -put /data/retail_db /user/${USER}

* If we run `hdfs dfs -put /data/retail_db /user/${USER}` again, it will fail because the target folder and its files already exist.

In [None]:
%%sh

hdfs dfs -put /data/retail_db /user/${USER}

* We can use `-f` with `put` or `copyFromLocal` to overwrite files that already exist at the target.

In [None]:
%%sh

hdfs dfs -put -f /data/retail_db /user/${USER}

In [None]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db
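
The commands listed at the start but not exercised above (`cp`, `mv`, and `get`) follow the same pattern. The sketch below is illustrative only: the `hdfs_do` wrapper, the `DRY_RUN` switch, and the `retail_db_copy`, `retail_db_backup`, and `/tmp/retail_db_backup` paths are all made up for this example. By default it only prints the commands it would run; set `DRY_RUN=0` to execute them against HDFS.

```shell
# DRY_RUN defaults to 1 so the script only prints what it would do.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: prints the command in dry-run mode, runs it otherwise.
hdfs_do() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "hdfs dfs $*"
  else
    hdfs dfs "$@"
  fi
}

base="/user/${USER:-$(id -un)}"

# Copy the folder to another HDFS location.
hdfs_do -cp "$base/retail_db" "$base/retail_db_copy"
# Rename the copy by moving it within HDFS.
hdfs_do -mv "$base/retail_db_copy" "$base/retail_db_backup"
# Copy the renamed folder from HDFS to the local file system.
hdfs_do -get "$base/retail_db_backup" /tmp/retail_db_backup
```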