Overview
--------

_Readings: [Appendix A of Learn Python the Hard Way](http://learnpythonthehardway.org/book/appendixa.html) also discusses the material below._

Modern data science is impossible without some understanding of the Unix command line. Unix is a family of computer operating systems that includes the Mac’s OS X and Linux (technically, Linux is a Unix clone); Windows also has Unix emulators, which allow running Unix commands. In our class, we use Linux (specifically, the Ubuntu distribution), running on the Amazon EC2 cloud infrastructure.

This document is a tutorial on some of the basic Unix command-line utilities used for data gathering, searching, cleaning, and summarizing. Unix commands are generally very efficient: many of them process their input as a stream, so they can handle data far larger than your computer’s main memory, well beyond the capabilities of tools like Excel. We start with individual Unix tools and then show how to combine them using pipes, filters, and redirection.

Command-line Utilities
----------------------

This section introduces some essential Unix utilities. The list is by no means exhaustive, and the ordering is not perfect; different tasks have different demands. Fortunately, Unix has been around for a long time and has an extremely active user base, which has developed a wide range of utilities for common data processing, networking, system management, and automation tasks.

Once you are familiar with programming, you will be able to write your own Python scripts for tasks that the existing Unix utilities cannot handle. The tradeoff is that hand-coded scripts give you more flexibility than existing utilities, at the cost of more development time and therefore slower iteration.

Once you have access to the terminal on your machine, try it out! Let's start:

(_**Note**: In IPython, to run a shell command, you add an exclamation mark before the command. That's why all the commands in this notebook are preceded by a `!` character._)

### `pwd`

Prints the current (working) directory. Type `pwd` at the shell prompt to see which directory you are in.

In [None]:
!pwd

### `ls`

Lists the contents of a directory or provides information about the specified file. Typical usage:

`ls [options] [files or directories]`

To see the contents of the current directory, type `ls` (or `ls -A` to include hidden files).

In [None]:
!ls

Now let's run `ls` with a different option, `-R`, to print the contents of all the folders under the current one:

In [None]:
!ls -R



By default, `ls` simply lists the contents of the current directory. Several options make `ls` give more detailed information about the files or directories being queried. Here is a sample:

+ `-A`: list all of the contents of the queried directory, including hidden files (those whose names start with a dot).
+ `-l`: long format; display permissions, owner, size, and modification time for each file and directory.
+ `-R`: recursively list the contents of any subdirectories.
+ `-t`: sort files by the time of the last modification.
+ `-S`: sort files by size.
+ `-r`: reverse any sort order.
+ `-h`: when used in conjunction with `-l`, prints file sizes in human-readable units (for example, `4.0K` or `12M`) instead of raw byte counts.
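
The options above can be combined into a single group of flags. Here is a minimal sketch, assuming a scratch folder created with `mktemp -d` and made-up file names (`small.txt`, `big.txt`, `.hidden`). In the notebook, run it in a terminal or in a single `%%bash` cell:

```shell
# Create a scratch folder with a few files (names are illustrative only)
demo=$(mktemp -d)
printf 'a\n' > "$demo/small.txt"
printf 'a much longer line of text\n' > "$demo/big.txt"
touch "$demo/.hidden"

# Long format, human-readable sizes, largest file first
ls -lhS "$demo"

# Long format, hidden files included, sorted by modification time, oldest first
ls -lAtr "$demo"
```

The order of the letters inside a group does not matter: `ls -lhS` and `ls -Shl` do the same thing.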

### `cd`

Change the current directory. Usage: 

`cd [directory to move to]`

For example, to change to the `/home/ubuntu` directory:

In [None]:
!cd /home/ubuntu

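Note that in IPython each `!` command runs in its own temporary shell, so the `cd` above does not affect later commands (IPython offers the `%cd` magic for a change that persists). To run two commands in a row in the same shell, we separate them using the `;` character. For example, to change to a directory and show its contents: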

In [None]:
!cd /home/ubuntu; ls -l
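
The path given to `cd` can be absolute (starting with `/`) or relative to the current directory. The sketch below assumes a scratch folder from `mktemp -d` with hypothetical subfolders `project` and `project/data`. It is meant to run in one shell session (a terminal or a single `%%bash` cell), so that each `cd` carries over to the next line:

```shell
demo=$(mktemp -d)
mkdir "$demo/project" "$demo/project/data"   # hypothetical folders for the demo

cd "$demo/project/data"   # absolute path
pwd
cd ..                     # relative: one level up, into project
pwd
cd data                   # relative: back down into data
cd -                      # return to the previous directory (project) and print it
pwd
```

Running `cd` with no argument takes you to your home directory.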

### `mkdir`

Creates a new folder. For example, to create a new folder named `DealingWithData` under the current folder, we type:


In [None]:
!mkdir DealingWithData
!ls -lA
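
Plain `mkdir` refuses to create a folder whose parent does not exist yet. Below is a short sketch of the `-p` option, which creates any missing parent folders along the way and does not complain if the folder is already there. The nested path `DealingWithData/raw/2015` is a made-up example, and the commands are meant for a terminal or a single `%%bash` cell, working inside a scratch folder from `mktemp -d`:

```shell
cd "$(mktemp -d)"

# Without -p this would fail, because the parent folders do not exist yet
mkdir -p DealingWithData/raw/2015

# Running it again is harmless: -p ignores folders that already exist
mkdir -p DealingWithData/raw/2015

ls -R DealingWithData
```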

### `rmdir` 

Removes a folder. (The folder must be empty for the command to succeed.)

In [None]:
!rmdir DealingWithData
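
To see the empty-folder rule in action, the sketch below tries `rmdir` on a folder that still holds a file, then again after the file is gone. The file name `notes.txt` is hypothetical, and the commands are meant for a terminal or a single `%%bash` cell, working inside a scratch folder from `mktemp -d`:

```shell
cd "$(mktemp -d)"
mkdir DealingWithData
touch DealingWithData/notes.txt

# Fails with an error message, because the folder is not empty
rmdir DealingWithData || echo "rmdir refused: folder is not empty"

# Empty the folder first, then rmdir succeeds
rm DealingWithData/notes.txt
rmdir DealingWithData
```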

### `cp` 

Copies a file. Usage:

`cp [source file] [destination file]`

It can also be used to copy multiple files into a directory.

`cp [source file1] [source file2] ... [destination directory]`

For example, to copy the file `A-Basic_Unix_Shell_Commands.ipynb` and name the copy `NotebookA.ipynb`:

In [None]:
!cp A-Basic_Unix_Shell_Commands.ipynb NotebookA.ipynb
!ls -l 

Or we can copy the file to another folder. For example, the following commands copy the file `A-Basic_Unix_Shell_Commands.ipynb` to the folder `DealingWithData` and name the new file `NotebookA.ipynb`:

In [None]:
!mkdir DealingWithData
!cp A-Basic_Unix_Shell_Commands.ipynb DealingWithData/NotebookA.ipynb
!ls -lA DealingWithData
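
The multi-file form of `cp` described earlier in this section can be sketched as follows. The file names `a.csv` and `b.csv` and the folder `backup` are hypothetical, and the commands are meant for a terminal or a single `%%bash` cell, working inside a scratch folder from `mktemp -d`. The last command uses the `-r` option, which copies a whole folder together with its contents:

```shell
cd "$(mktemp -d)"
touch a.csv b.csv
mkdir backup

# Several sources, one destination folder: the copies keep their names
cp a.csv b.csv backup
ls backup

# -r copies a folder and everything inside it
cp -r backup backup_copy
ls backup_copy
```

Note that `cp` silently overwrites a destination file that already exists; the `-i` option asks for confirmation first.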

### `rm` 

The `rm` command deletes a file. With the `-r` option (`rm -r`), it deletes a folder recursively, together with everything inside it.

In [None]:
!rm DealingWithData/NotebookA.ipynb
!rm NotebookA.ipynb

In [None]:
# clean up
!rmdir DealingWithData
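
Unlike `rmdir`, `rm -r` deletes a folder even when it is not empty, and there is no trash can to recover files from. Here is a cautious sketch with hypothetical names, meant for a terminal or a single `%%bash` cell and working inside a scratch folder from `mktemp -d`:

```shell
cd "$(mktemp -d)"
mkdir -p DealingWithData/old
touch DealingWithData/old/notes.txt

# -i asks for confirmation before each deletion; handy while learning
# (commented out here because it waits for keyboard input)
# rm -ri DealingWithData

# Removes the folder, its subfolder, and the file inside, without asking
rm -r DealingWithData
ls -A
```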




### `mv`

The `mv` command is similar to `cp`, but it moves the file instead of copying it. Effectively, it performs a `cp` followed by an `rm` of the original file. It is also the way to rename a file. Usage:

`mv [source file] [destination file or directory]`
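
Here is a minimal sketch of the two common uses of `mv`: renaming a file in place and moving it into another folder. The names `draft.txt`, `final.txt`, and `archive` are hypothetical, and the commands are meant for a terminal or a single `%%bash` cell, working inside a scratch folder from `mktemp -d`:

```shell
cd "$(mktemp -d)"
touch draft.txt
mkdir archive

mv draft.txt final.txt    # rename: draft.txt no longer exists
mv final.txt archive      # move into a folder, keeping the name
ls -R
```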

### `man` 

The `man` command shows the manual page for a given command. For example, if we want to see all the options for the command `ls`, we type:

In [None]:
!man ls

### `date`

The `date` command prints the current date and time.

In [None]:
!date
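
`date` also accepts a format string that starts with `+`, which is handy for putting timestamps in file names. Here is a small sketch; the variable name `today` and the file name pattern are only illustrations:

```shell
# Year-month-day, in the form YYYY-MM-DD
today=$(date +%Y-%m-%d)
echo "$today"

# Hours and minutes in 24-hour time
date +%H:%M

# Use the date inside a file name
echo "backup_$today.csv"
```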

### `logout` 

The `logout` command logs you out of the shell.

## Exercise

* Create two new directories, `dir1` and `dir2`, with the `mkdir` command.
* Use `ls` to confirm that they exist.
* Copy the file `/home/ubuntu/data/titanic.xls` to `dir1` and name it `file1.xls`.
* Copy the file `/home/ubuntu/data/imdb.sql` to `dir2` and name it `file2.sql`.
* Move each file to the other directory (`file1.xls` to `dir2` and `file2.sql` to `dir1`) with the `mv` command.
* Delete both directories with the `rm -r` command.


In [None]:
# your code here