# Linux

<a href="#Overview">Overview of Linux and Virtual Machine</a>  
<a href="#Navigating">Navigating with Linux</a>  
<a href="#Pipes-and-Working-With-Files">Pipes and Working With Files</a>  



### Overview 

Hadoop is native to Unix / Linux and uses commands similar to Unix commands  

Linux uses a hierarchical directory structure  
`/` Top level, root  
`/home/cloudera` where the files will be kept that we use

Terminal is where you type your commands  
Shell is the interactive environment that takes your commands and runs them (eg Bash)  

Absolute paths start with `/`  
Relative paths start with something other than `/`, such as `cd ../cloudera`
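
A minimal sketch of the two path styles. The sandbox directory comes from `mktemp -d`, and the `a/b` folders are made up for illustration:

```shell
demo=$(mktemp -d)        # hypothetical sandbox directory; mktemp prints an absolute path
mkdir -p "$demo/a/b"

cd "$demo/a/b"           # absolute path: starts with /
abs=$(pwd)

cd ..                    # relative path: up one level, into a
cd b                     # relative path: back down into b
rel=$(pwd)

echo "$abs"
echo "$rel"              # both lines show the same directory
```

A relative path is resolved from wherever you currently are, so the same `cd b` fails if you run it from a folder that has no `b` inside it.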

**Tools**  
gedit = pretty user friendly  
vi = more powerful, steeper learning curve  

**Job Control** (multitask)  
`&` at the end of a command runs it in the background (eg `ping google.com &`)  
`ps` lists the processes running in the current terminal session, including background ones  
`sudo` runs a command with admin (root) privileges
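
A small sketch of starting, inspecting, and stopping a background job. It uses `sleep 30` as a harmless stand-in for a long-running command; the variable name `bg_pid` is just for illustration:

```shell
sleep 30 &                           # & starts the command in the background
bg_pid=$!                            # $! holds the process ID of the last background command

jobs                                 # background jobs started from this shell
ps -p "$bg_pid"                      # show only the process with that ID

kill "$bg_pid"                       # stop the background job
wait "$bg_pid" 2>/dev/null || true   # wait for it to exit; its non-zero status is expected
```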

**3 components of Linux commands**  
1) The command itself  
2) The options (short form, long form, etc)    
3) The arguments (file name, text, etc)
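
To see the three parts side by side, here is a sketch on a made-up sample file. The long-form `--lines` option is assumed to be available, which is true for GNU coreutils but not for every system:

```shell
f=$(mktemp)                          # hypothetical sample file
printf 'one\ntwo\nthree\n' > "$f"

# command  options   arguments
head       -n 2      "$f"            # short-form option that takes a value
wc         -l        "$f"            # short-form option
wc         --lines   "$f"            # long-form spelling of -l (GNU coreutils)
```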

---

## Navigating

**Useful commands:**  


#### Definitions
`.` means current folder  
`pwd` shows working directory  
`ls` lists contents of the current directory  
`cd` change directory  
`mv` move  
`cp` copy  
`rm` remove

#### Listing
`ls /` list files in the root directory  
`ls -l` list files in the long form (type, permissions, owner, size, date)  
`ls /vagrant` list files in the vagrant folder    
`ls -lR` list files recursively in the long form  

#### Keyboard Shortcuts
ctrl+u  = delete from the cursor back to the beginning of the line  
ctrl+a  = move to beginning of line  
ctrl+e  = move to end of line  
ctrl+y  = paste the text last deleted with ctrl+u  
`clear` clear the screen

#### Changing directory
`cd /` change to the root directory  
`cd ~` change dir to home  
`mkdir dir1`   create a new directory  
`cd ../../etc/X11` relative path


---

#### Copy, move, rename, remove

**Copy**  
`cp file1 file2` save as  
`cp file1 dir1` copy and move  
`cp README dir1/readme2` copy and move and rename

`cp loyalty_data.txt /vagrant` copy file to folder  
`cp -i file1 file2` copy interactively  
`cp ADIR/data/*.txt data.dir` copy all text files from a folder

**Move**  
`mv file1 file2` rename or replace  
`mv file1 dir1` move
 
**Remove**  
`rm file1` remove a file  
`rm -r dir1` remove a directory and everything in it  
`rm /vagrant/loyalty_data.txt` delete a file from a folder
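
The copy, move, and remove commands can be tried safely in a throwaway folder. This sketch assumes a sandbox made with `mktemp -d`; the file names mirror the ones used above:

```shell
demo=$(mktemp -d)          # hypothetical sandbox directory
cd "$demo"
echo "hello" > file1
mkdir dir1

cp file1 file2             # save as: file2 is a copy of file1
cp file1 dir1              # copy into a directory, giving dir1/file1
cp file1 dir1/readme2      # copy into a directory under a new name
mv file2 file3             # rename file2 to file3
mv file3 dir1              # move file3 into dir1
rm file1                   # remove the original file

ls dir1                    # lists file1, file3 and readme2
```

`rm` does not use a trash folder, so `cp -i` and `rm -i` (ask before overwriting or deleting) are worth considering while learning.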

**Find files**  
`find . -name "test*"` find files whose names start with 'test' in the current folder and its subfolders  
`find ~/dir1 -name "*test*"` find files whose names contain 'test' under the given folder
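
A sketch of how `find` walks subfolders, using made-up file names in a sandbox directory. `-type d` limits the matches to directories:

```shell
demo=$(mktemp -d)                # hypothetical sandbox directory
mkdir -p "$demo/dir1/sub"
touch "$demo/test_a.txt" "$demo/dir1/my_test.log" "$demo/dir1/sub/test_b.txt"
cd "$demo"

find . -name "test*"             # matches ./test_a.txt and ./dir1/sub/test_b.txt
find dir1 -name "*test*"         # matches dir1/my_test.log and dir1/sub/test_b.txt
find . -type d -name "dir*"      # directories only: ./dir1
```

Quoting the pattern matters: without the quotes the shell may expand `test*` itself before `find` ever sees it.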

**Wildcards**  
`ls g*.txt`        files that start with g  
`ls g??.txt`       files that start with g and have a 3-character name before .txt (each `?` matches exactly one character)  
`rm data[1-9].txt` remove data1.txt through data9.txt
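
A sketch of what each wildcard matches, with made-up file names in a sandbox directory. Running `echo` with the pattern first is a cheap way to preview what `rm` would delete:

```shell
demo=$(mktemp -d)                # hypothetical sandbox directory
cd "$demo"
touch g.txt gab.txt gabc.txt data1.txt data5.txt data10.txt

ls g*.txt                        # matches g.txt, gab.txt and gabc.txt
ls g??.txt                       # matches gab.txt only: g plus exactly two more characters
echo data[1-9].txt               # preview: data1.txt data5.txt
rm data[1-9].txt                 # data10.txt is left alone, since [1-9] matches one character
```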


---

## Working with Files

**Large text files**  
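`less`        lets you scroll and search through a file interactively  
`head/tail`   first or last 10 lines  
`cat`         prints the whole file at once (pipe it to `more` or `less` for page by page display)  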
`grep`        print a line based on conditions  
$\quad$ `grep "word" filename`  
$\quad$ `cat filename | grep "word"`

`cat ratings_2013.txt` display the content of ratings_2013.txt on screen  

`tail -n 20 ratings_2013.txt` display the last 20 rows of the file

`grep -vi "the" ratings_2013.txt` find rows that do not contain the word "the" (case insensitive)  

`grep "the" ratings_2013.txt` find rows that contain the word "the" 
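
The effect of `-i` and `-v` is easiest to see on a tiny file. This sketch uses four made-up lines in place of the ratings data:

```shell
f=$(mktemp)                      # hypothetical sample file
printf 'The Matrix\nFinding Nemo\nInto the Wild\nTHE END\n' > "$f"

head -n 2 "$f"                   # first two lines: The Matrix, Finding Nemo
tail -n 1 "$f"                   # last line: THE END

grep "the" "$f"                  # case sensitive: Into the Wild
grep -i "the" "$f"               # ignore case: The Matrix, Into the Wild, THE END
grep -vi "the" "$f"              # invert the match: Finding Nemo
grep -ci "the" "$f"              # count matching lines instead of printing them: 3
```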

---

### Pipes and Working With Files

**I/O Redirection and Pipes**  
Output of one command can be input for another command  
`ls -1 > file.txt` takes the output of `ls` and stores it in file.txt  
`sort < file.txt` takes the existing file as input and sorts it  
`cat file | more` lets you see screen by screen, instead of all at once  
`grep -i "the" filename | less` easier to interact with  
`|` connects commands
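
A sketch that puts `>`, `<`, and `|` together in a sandbox directory. The fruit file names are made up, and `sort -r` (reverse order) is used only so the effect of sorting is visible:

```shell
demo=$(mktemp -d)                    # hypothetical sandbox directory
cd "$demo"
touch pear.txt apple.txt mango.txt

ls -1 > file.txt                     # > writes the listing to file.txt, replacing any old content
sort -r < file.txt                   # < feeds file.txt to sort as its input
ls -1 | grep "an" | wc -l            # | chains commands: list, filter, count (prints 1, for mango.txt)
sort -r < file.txt > sorted.txt      # input and output redirection in one command
```

The shell creates `file.txt` before `ls` runs, so the listing stored in it includes `file.txt` itself.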

---

**Manipulate text files**  
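`wc` = prints the number of lines, words, and bytes  
`wc -l` = prints the number of lines  
`wc -l ad_data1.txt` number of lines in file  
`sort` = sorts lines (alphabetically by default)  
`sed` = for each line, perform some action (search and replace)  
`awk` = for each line, split it into fields and act on them (eg print selected columns) 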
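
A sketch of the four tools on a tiny tab-separated file. The names and columns are made up and only loosely modeled on the loyalty data:

```shell
f=$(mktemp)                                  # hypothetical sample file
printf 'bob\tgold\t30\nalice\tsilver\t25\ncarol\tgold\t41\n' > "$f"

wc -l < "$f"                                 # number of lines: 3
sort "$f"                                    # alphabetical by line: alice, bob, carol
sed 's/gold/GOLD/' "$f"                      # replace the first "gold" on each line
awk -F'\t' '{ print $1, $3 }' "$f"           # columns 1 and 3, eg "bob 30"
awk -F'\t' '$2 == "gold" { n++ } END { print n }' "$f"   # rows whose 2nd column is gold: 2
```

None of these commands change the file; they print to the screen unless the output is redirected with `>`.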
  
  
**Good examples**  
`grep -i 'GOLD' loyalty_data.txt | wc -l` count the number of records with the word gold (case insensitive)  

 
`head -n 100 latlon.tsv > samples/latlon100.tsv` take the first 100 records as a sample 


`sed "s/\t/|/g" latlon100.tsv > latlon100.txt` search for \t (tab) and replace it with |; the g replaces every tab in the row, not just the first.  
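
A sketch of the tab-to-pipe conversion on a two-line made-up file. Treating `\t` as a tab inside the `sed` pattern is assumed to work, which holds for GNU sed; `tr` is shown as an alternative that does not depend on that:

```shell
f=$(mktemp)                      # hypothetical sample file
printf 'a\tb\tc\n1\t2\t3\n' > "$f"

sed "s/\t/|/g" "$f"              # GNU sed: every tab becomes |
tr '\t' '|' < "$f"               # same conversion with tr: a|b|c and 1|2|3
```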

find a folder whose name starts with data_mgmt  
`cd training_materials`  
`find . -name "data_mgmt*"`

---




## 2. View/handle text files

Display the file content interactively on screen  
`less ad_data1.txt`

Take the first 100 rows from `ad_data1.txt` and save them as `ad500.txt`  
`head -n 100 ad_data1.txt > ad500.txt`

From the file `ad_data1.txt`, find lines with "REVIEW" in the text, viewing results screen by screen  
`grep "REVIEW" ad_data1.txt | less`

Count the lines containing REVIEW in `ad_data1.txt`  
`grep "REVIEW" ad_data1.txt | wc -l`