## Managing HDFS Directories

Let us look at how to create directories in HDFS and manage their ownership.

* By default, the `hdfs` user is the superuser of HDFS.
* `hadoop fs -mkdir` or `hdfs dfs -mkdir` – to create directories
* `hadoop fs -chown` or `hdfs dfs -chown` – to change the ownership of files and directories
* `-chown` can also change the group when the owner is given as `owner:group`. The group alone can be changed with `-chgrp`. Run `hdfs dfs -help chgrp` and check the details.
* Here are the steps to create user space. Only the HDFS superuser (`hdfs`, or a member of the HDFS superuser group) can perform them.
  * Create a directory named after the user id (`spark` here) under `/user`.
  * Change the ownership of the directory created in the previous step (`/user/spark`) to the user of the same name.
  * Validate the ownership and permissions by running `hadoop fs -ls` or `hdfs dfs -ls` on `/user`, and grep for the user name you are looking for.
* Let us create the user space in HDFS for `spark`. The commands below have to be run by a user with sudo privileges.

```shell
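# Create the home directory as the hdfs superuser
sudo -u hdfs hdfs dfs -mkdir /user/spark
# Make spark the owner and students the group, recursively
sudo -u hdfs hdfs dfs -chown -R spark:students /user/spark
# Confirm the owner and group of the new directory
hdfs dfs -ls /user | grep spark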
```
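
The three steps above (create, change ownership, validate) can be sketched as one reusable shell function. This is only a minimal sketch: the function name `create_user_space` and its two arguments are hypothetical, and it assumes the caller is allowed to run commands as `hdfs` through `sudo`.

```shell
# Hypothetical helper: create an HDFS home directory for a user and hand it over.
# Assumes the caller can run commands as the hdfs superuser through sudo.
create_user_space() {
    user="$1"
    group="$2"
    home="/user/${user}"

    # -p keeps the call from failing when the directory already exists
    sudo -u hdfs hdfs dfs -mkdir -p "${home}" || return 1

    # owner:group sets both in one call;
    # `hdfs dfs -chgrp -R <group> <path>` would change only the group
    sudo -u hdfs hdfs dfs -chown -R "${user}:${group}" "${home}" || return 1

    # List /user and keep only the entry for the new home directory
    hdfs dfs -ls /user | grep "${home}\$"
}
```

Calling `create_user_space spark students` would run the same commands as the block above, with `-p` added so that a second run does not fail on the existing directory.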

* Once the user space exists, you should be able to create directories under your own home directory in HDFS.

In [13]:
%%sh

hdfs dfs -ls /user/`whoami`

Found 1 items
drwxr-xr-x   - spark supergroup          0 2022-05-29 17:08 /user/spark/.sparkStaging


In [14]:
%%sh

hdfs dfs -mkdir /user/`whoami`/retail_db

In [15]:
%%sh

hdfs dfs -ls /user/`whoami`

Found 2 items
drwxr-xr-x   - spark supergroup          0 2022-05-29 17:08 /user/spark/.sparkStaging
drwxr-xr-x   - spark supergroup          0 2022-05-29 17:20 /user/spark/retail_db


* You can create a whole directory structure using `hdfs dfs -mkdir -p`. Directories that already exist are left as they are and the missing ones are created.
  * Let us run `hdfs dfs -mkdir -p /user/${USER}/retail_db/orders/year=2020`.
  * As `/user/${USER}/retail_db` already exists, it is left as is.
  * Both `/user/${USER}/retail_db/orders` and `/user/${USER}/retail_db/orders/year=2020` will be created.

In [16]:
%%sh

hdfs dfs -help mkdir

-mkdir [-p] <path> ... :
  Create a directory in specified location.
                                                  
  -p  Do not fail if the directory already exists 


In [19]:
%%sh

hdfs dfs -ls -R /user/`whoami`/retail_db

In [20]:
%%sh

hdfs dfs -mkdir -p /user/`whoami`/retail_db/orders/year=2020

In [21]:
%%sh

hdfs dfs -ls -R /user/`whoami`/retail_db

drwxr-xr-x   - spark supergroup          0 2022-05-29 17:20 /user/spark/retail_db/orders
drwxr-xr-x   - spark supergroup          0 2022-05-29 17:20 /user/spark/retail_db/orders/year=2020
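
Because `-mkdir -p` neither fails on existing directories nor requires the parents to exist, it is safe to call repeatedly. As a rough sketch of how that could be used, the function below creates one `year=` directory per argument under a base path. The name `make_year_partitions` and its arguments are hypothetical and only serve as an illustration.

```shell
# Hypothetical helper: create one partition directory per year under a base path.
# Relies on -mkdir -p: existing directories are left alone, missing parents are created.
make_year_partitions() {
    base="$1"
    shift
    for year in "$@"; do
        hdfs dfs -mkdir -p "${base}/year=${year}" || return 1
    done
}
```

For example, `make_year_partitions /user/${USER}/retail_db/orders 2020 2021` would create `year=2020` and `year=2021` under `orders`, and running it a second time would not report an error.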


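* We can delete a non-empty directory using `hdfs dfs -rm -R` and an empty directory using `hdfs dfs -rmdir`. Without `-R`, `hdfs dfs -rm` refuses to delete a directory, and `-rmdir` fails on a directory that is not empty. We will explore `hdfs dfs -rm` in detail later.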

In [22]:
%%sh

hdfs dfs -help rmdir

-rmdir [--ignore-fail-on-non-empty] <dir> ... :
  Removes the directory entry specified by each directory argument, provided it is
  empty.


In [23]:
%%sh

hdfs dfs -rmdir /user/`whoami`/retail_db/orders/year=2020

In [24]:
%%sh

hdfs dfs -rm /user/`whoami`/retail_db

rm: `/user/spark/retail_db': Is a directory


CalledProcessError: Command 'b'\nhdfs dfs -rm /user/`whoami`/retail_db\n'' returned non-zero exit status 1.

In [25]:
%%sh

hdfs dfs -rmdir /user/`whoami`/retail_db

rmdir: `/user/spark/retail_db': Directory is not empty


CalledProcessError: Command 'b'\nhdfs dfs -rmdir /user/`whoami`/retail_db\n'' returned non-zero exit status 1.

In [26]:
%%sh

hdfs dfs -rm -R /user/`whoami`/retail_db

Deleted /user/spark/retail_db


In [27]:
%%sh

hdfs dfs -ls /user/`whoami`

Found 1 items
drwxr-xr-x   - spark supergroup          0 2022-05-29 17:08 /user/spark/.sparkStaging
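
The difference between `-rmdir` and `-rm -R` can be summed up in a small wrapper that deletes recursively only when asked to. This is a sketch under assumptions: the function name `remove_hdfs_dir` and its `--recursive` argument are hypothetical and are not part of the HDFS CLI.

```shell
# Hypothetical helper: remove an HDFS directory, recursing only when asked to.
remove_hdfs_dir() {
    target="$1"
    mode="${2:-}"

    if [ "${mode}" = "--recursive" ]; then
        # Deletes the directory together with everything below it
        hdfs dfs -rm -R "${target}"
    else
        # Succeeds only when the directory is empty
        hdfs dfs -rmdir "${target}"
    fi
}
```

Defaulting to `-rmdir` means an accidental call on a directory that still holds data fails with `Directory is not empty` instead of deleting it.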
