# Demo 2

# HDFS Commands

You can interact with the Hadoop Distributed File System (HDFS) by invoking:
```bash
hdfs <command> <command-options>
```
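
Before running the cells below, it can help to confirm that the `hdfs` launcher is reachable. A minimal sketch, assuming a POSIX shell; `have_cmd` is a hypothetical helper name introduced here, not part of Hadoop:

```bash
# Hypothetical helper: succeeds if the named program is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd hdfs; then
    # <command> = version, with no command options
    hdfs version
else
    echo "hdfs is not on PATH; check your Hadoop installation" >&2
fi
```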

In [None]:
!hdfs version

In [None]:
!hdfs --help

## getconf

In [None]:
!hdfs getconf 

### Listing NameNodes

In [None]:
!hdfs getconf -namenodes

### Listing Secondary NameNodes

In [None]:
!hdfs getconf -secondaryNameNodes

# Example dataset #1

In this demo we will use a dataset that contains social networking, tagging, and music artist listening information
for a set of roughly 2K users of the **[Last.fm](http://www.last.fm)** online music system.

**[Last.fm Dataset](https://grouplens.org/datasets/hetrec-2011/):** 

* 1892 users; 
* 17632 artists; 
* 12717 bi-directional user friend relations, i.e. 25434 (user_i, user_j) pairs; 
* 92834 user-listened artist relations, i.e. tuples [user, artist, listeningCount]; 
* 11946 tags;
* 186479 tag assignments (tas), i.e. tuples [user, tag, artist];
* Last update: May 2011.
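
The pair count in the list above follows from counting each bi-directional friendship once per direction. A quick check with shell arithmetic (the variable names are ours, chosen for illustration):

```bash
# Each bi-directional friend relation yields two ordered (user_i, user_j) pairs.
relations=12717
pairs=$((relations * 2))
echo "$pairs"   # 25434
```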
         

## Understanding the dataset

See the dataset README at http://files.grouplens.org/datasets/hetrec2011/hetrec2011-lastfm-readme.txt for a description of each file and its fields.

## Downloading the dataset

In [None]:
!wget http://files.grouplens.org/datasets/hetrec2011/hetrec2011-lastfm-2k.zip -q --show-progress
!mkdir -p dataset-lastfm && unzip -o hetrec2011-lastfm-2k.zip -d dataset-lastfm
!rm hetrec2011-lastfm-2k.zip

In [None]:
!ls dataset-lastfm

In [None]:
!tail dataset-lastfm/artists.dat

## dfs

### Listing the content of a directory

In [None]:
!hdfs dfs -ls /

In [None]:
!hdfs dfs -ls /user

### Creating a directory

In [None]:
!hdfs dfs -mkdir /user/theo

In [None]:
!hdfs dfs -ls /user

### Copying data to HDFS

In [None]:
!hdfs dfs -ls /user/theo

#### put

In [None]:
!hdfs dfs -put dataset-lastfm/artists.dat /user/theo

In [None]:
!hdfs dfs -ls /user/theo

<img src= "resources/images/fileinfo.png" width="55%">

#### copyFromLocal

In [None]:
!hdfs dfs -copyFromLocal dataset-lastfm/user_artists.dat /user/theo

In [None]:
!hdfs dfs -ls /user/theo

#### appendToFile

Creating two files (`list1.txt` and `list2.txt`):

In [None]:
!touch list1.txt
!echo "item1" >  list1.txt
!echo "item2" >> list1.txt
!echo "item3" >> list1.txt
!cat list1.txt

In [None]:
!touch list2.txt
!echo "item4" >  list2.txt
!echo "item5" >> list2.txt
!echo "item6" >> list2.txt
!cat list2.txt

In [None]:
!hdfs dfs -appendToFile list1.txt list2.txt  /user/theo/full-list.txt
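
As a local analogy (plain shell, not HDFS), appending several sources to one destination behaves like `cat` with `>>`: the sources are written in the order given. The temporary directory below is an assumption made so the sketch does not touch the notebook's own files:

```bash
# Local analogy only: mimics the effect of -appendToFile with two sources.
workdir=$(mktemp -d)
printf 'item1\nitem2\nitem3\n' > "$workdir/list1.txt"
printf 'item4\nitem5\nitem6\n' > "$workdir/list2.txt"

# Sources are appended in the order given; '>>' creates the destination if absent.
cat "$workdir/list1.txt" "$workdir/list2.txt" >> "$workdir/full-list.txt"
cat "$workdir/full-list.txt"
```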

In [None]:
!hdfs dfs -ls /user/theo

### Getting the content of a file

#### cat

In [None]:
!hdfs dfs -cat /user/theo/full-list.txt

#### tail

In [None]:
!hdfs dfs -tail /user/theo/artists.dat

### Creating an empty file

`touchz` creates a file of zero length. An error is returned if the file exists with non-zero length.
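
The rule described above can be sketched on the local file system. `touchz_local` is a hypothetical function written for this illustration; it is not an HDFS command and only imitates the documented behaviour:

```bash
# Hypothetical local imitation of the touchz rule: refuse non-empty files.
touchz_local() {
    if [ -s "$1" ]; then
        echo "touchz_local: '$1' exists and is not empty" >&2
        return 1
    fi
    : > "$1"   # create (or keep) a zero-length file
}

workdir=$(mktemp -d)
touchz_local "$workdir/newfile.txt"            # creates an empty file
echo "data" > "$workdir/full.txt"
touchz_local "$workdir/full.txt" || echo "refused: file has content"
```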


In [None]:
!hdfs dfs -touchz /user/theo/newfile.txt

In [None]:
!hdfs dfs -ls /user/theo/

### Copying data from HDFS to the local file system

In [None]:
!hdfs dfs -get /user/theo/newfile.txt newfile.txt

In [None]:
!ls

### Merging files

In [None]:
!hdfs dfs -put dataset-lastfm/tags.dat /user/theo

`getmerge` takes a source directory (or one or more source files) and a local destination file, and concatenates the sources into that local file.
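
When the source files do not end with a newline, plain concatenation joins the last line of one file to the first line of the next. `getmerge` has an `-nl` option that adds a newline after each file. The sketch below imitates both behaviours locally with `cat`; the file names are illustrative and nothing here talks to HDFS:

```bash
# Local imitation of merging with and without a newline after each source.
workdir=$(mktemp -d)
printf 'a' > "$workdir/part1"    # note: no trailing newline
printf 'b' > "$workdir/part2"

# Plain concatenation: contents run together.
cat "$workdir/part1" "$workdir/part2" > "$workdir/merged.txt"

# Newline after each source, similar in spirit to -nl.
for f in "$workdir/part1" "$workdir/part2"; do
    cat "$f"
    echo
done > "$workdir/merged-nl.txt"
```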


In [None]:
!hdfs dfs -getmerge /user/theo/tags.dat /user/theo/artists.dat artist-tags.txt

In [None]:
!tail artist-tags.txt

### Verifying the replication factor

In [None]:
!hdfs dfs -stat %r /user/theo/artists.dat

### Changing the replication factor

In [None]:
!hdfs dfs -setrep 3 /user/theo/artists.dat

In [None]:
!hdfs dfs -ls /user/theo/artists.dat
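
In `-ls` output for a file, the second column is the replication factor, so listing the file is a quick way to confirm the change. A minimal sketch of extracting that column with `awk`; the sample line is made up for illustration and is not output from this cluster:

```bash
# Illustrative line in the -ls layout; owner, size, and date are invented.
sample='-rw-r--r--   3 theo supergroup     123456 2024-01-01 12:00 /user/theo/artists.dat'

# Column 2 holds the replication factor for files.
replication=$(printf '%s\n' "$sample" | awk '{print $2}')
echo "$replication"
```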

In [None]:
!hdfs dfs -stat %r /user/theo/artists.dat

### Deleting a file

In [None]:
!hdfs dfs -ls /user/theo

In [None]:
!hdfs dfs -rm /user/theo/tags.dat

In [None]:
!hdfs dfs -ls /user/theo

In [None]:
!hdfs dfs -rm '/user/theo/*'

In [None]:
!hdfs dfs -ls /user/theo

### Deleting a directory

In [None]:
!hdfs dfs -ls /user

In [None]:
!hdfs dfs -rmdir /user/theo/
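
`-rmdir` only removes an empty directory, which is why the files were deleted first. The local `rmdir` behaves the same way, so the order of operations can be sketched without HDFS (the paths are illustrative):

```bash
# Local analogy: a directory must be emptied before rmdir will remove it.
workdir=$(mktemp -d)
mkdir "$workdir/theo"
touch "$workdir/theo/artists.dat"

if ! rmdir "$workdir/theo" 2>/dev/null; then
    echo "rmdir refused: directory is not empty"
fi

rm "$workdir/theo"/*      # delete the contents first
rmdir "$workdir/theo"     # now the directory can be removed
```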

In [None]:
!hdfs dfs -ls /user

### Getting help

In [None]:
# -usage prints the usage summary for an individual command
!hdfs dfs -usage chmod

In [None]:
!hdfs dfs -help

## More commands

The full list of `hdfs dfs` commands is documented in the Hadoop FileSystem Shell guide: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html