# Unix Tools For Data Science
-------------------


## Table of contents
__[1. Introduction](#Introduction)__

__[2. Basic Commands](#Basics)__

__[3. Files (And More Commands)](#Files)__

__[4. Vim (And Why You Sometimes Need It)](#Vim)__

__[5. Working Over A Network](#Network)__

__[6. Working With Git](#Git)__

__[7. More](#More)__


## Introduction <a class="anchor" id="Introduction"></a>
-------------------

Knowing Unix tools and Bash commands is not the sexiest part of data science, and it is often the most overlooked skill set. During my time as a Ph.D. student in Computational Science/Applied Mathematics I picked up a bunch of Unix commands that were lifesavers, and I'm going to go over a few here. Learning these skills can definitely seem a little boring, but I cannot overemphasize how useful they are. Setting up my Unix environment and linking various libraries was one of the most frustrating parts of graduate school, but I believe I am much more productive as a data scientist for having learned these valuable lessons.



Most of the commands and concepts I will be going over don't require any special libraries, and when they do I'll provide links to them. In fact, most of the Unix commands can be run from <a href="http://jupyter.org/">Jupyter Notebook</a>. When they can't, I will run them from the <a href="https://en.wikipedia.org/wiki/Terminal_(macOS)">Terminal</a>, which is the macOS version of the <a href="https://en.wikipedia.org/wiki/Unix_shell">Unix/Linux shell</a>.


## Basic Commands <a class="anchor" id="Basics"></a>
-------------------

man

history


- **<code> ls </code>**

The first command to learn is <code>ls</code>, which lists all the files in the current *directory* (this is just a fancier way of saying "folder"):

In [45]:
ls

Direc1/           Unix_Tools.ipynb  file1


We see that there is a directory called <code>Direc1/</code> (the forward slash after the name gives away that it is a directory) and two files: this notebook (<code>Unix_Tools.ipynb</code>) and a file called <code>file1</code>. We can view the "hidden files" (this will make more sense later) by adding the **<code>-a</code>** flag after <code>ls</code>:

In [46]:
ls -a

./                  .ipynb_checkpoints/ Unix_Tools.ipynb
../                 Direc1/             file1


The <code>.ipynb_checkpoints/</code> is a "hidden directory": its name starts with a dot, so a plain <code>ls</code> skips it. The <code>./</code> stands for the current directory (we could also use <code>ls .</code> instead of **<code>ls</code>**), and the <code>../</code> stands for the parent directory (the directory containing this directory).
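As a minimal sketch of how hidden files behave, the snippet below builds a throwaway directory and compares <code>ls</code>, <code>ls -a</code>, and <code>ls .</code>. The names <code>scratch</code>, <code>visible_file</code>, and <code>.hidden_file</code> are made up for illustration, and it assumes a shell where <code>mktemp</code> is available.

```shell
# Work in a throwaway directory so no real files are touched.
scratch=$(mktemp -d)
cd "$scratch"

# Hypothetical file names: one regular file, one hidden (leading dot).
touch visible_file .hidden_file

# Plain ls skips names that start with a dot.
plain_listing=$(ls)
echo "ls:"
echo "$plain_listing"

# ls -a also shows .hidden_file, plus . (this directory) and .. (its parent).
all_listing=$(ls -a)
echo "ls -a:"
echo "$all_listing"

# ls . lists the current directory, just like ls with no argument.
dot_listing=$(ls .)
```

Because the example lives in a temporary directory, it can be rerun as often as you like without cluttering a project folder.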

We can get more information on the files and directories using the <code>ls -al</code> command (this will give us information on all the files; if we wanted just the non-hidden ones we would use <code>ls -l</code>):

In [47]:
ls -al

total 24
drwxr-xr-x   6 Mike  staff   204 Jul 16 20:54 ./
drwxr-xr-x  14 Mike  staff   476 Jul 16 14:30 ../
drwxr-xr-x   3 Mike  staff   102 Jul 16 14:03 .ipynb_checkpoints/
drwxr-xr-x   4 Mike  staff   136 Jul 16 20:45 Direc1/
-rw-r--r--   1 Mike  staff  9944 Jul 16 20:52 Unix_Tools.ipynb
-rw-r--r--   1 Mike  staff     0 Jul 16 20:54 file1


Each row now corresponds to a file or directory. Reading the columns from left to right, we have the permissions for the file/directory, the number of links, the owner name, the group name, the size in bytes, the month, day, and time (hour and minute) the file was last modified, and the file name.
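To make the column layout concrete, here is a small sketch that creates a file of known size and picks fields out of the long listing with <code>awk</code>. The directory and file names are invented for the example. Splitting <code>ls</code> output on whitespace like this is fine for exploring, but it is too fragile for real scripts (for instance, it assumes owner and group names contain no spaces).

```shell
# Throwaway directory with one file whose size we know in advance.
scratch=$(mktemp -d)
cd "$scratch"
printf 'hello\n' > sample_file   # 6 bytes: five letters plus a newline

# Show the long listing for that file.
ls -l sample_file

# Field 1 = permissions, 2 = links, 3 = owner, 4 = group, 5 = size in bytes;
# the last field is the file name.
perms=$(ls -l sample_file | awk '{print $1}')
links=$(ls -l sample_file | awk '{print $2}')
size=$(ls -l sample_file | awk '{print $5}')
name=$(ls -l sample_file | awk '{print $NF}')
echo "permissions=$perms links=$links size=$size name=$name"
```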

We can also view the path to the current directory using,

- **<code>pwd</code>**

In [22]:
pwd

u'/Users/Mike/Documents/DS_Projects/Unix_Tools'

You can see that in Jupyter notebooks this is returned as a unicode string. We can also use <code>ls</code> to view the contents of directories other than our current one. We can see the contents of <code>Direc1/</code> by typing:

In [23]:
ls Direc1

Nothing was printed because that directory is empty. We can move the file <code>file1</code> into <code>Direc1/</code> by using the command

- **<code>mv</code>**

In [24]:
mv file1 Direc1/

We can now list the contents of <code>Direc1</code> again to see that the file has moved there:

In [31]:
ls Direc1/

file1


We can then go into <code>Direc1</code> by using,

- **<code>cd</code>** 

which stands for "change directory,"

In [32]:
cd Direc1/

/Users/Mike/Documents/DS_Projects/Unix_Tools/Direc1


We can also use <code>mv</code> to *change the name of a file or directory*:

In [34]:
mv file1 file2

In [37]:
ls 

file2


We can copy the contents of <code>file2</code> into a new file, <code>file1</code> using the command,

- **<code>cp</code>**

In [38]:
cp file2 file1
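The move, rename, and copy steps above can be replayed safely in a scratch directory. This is only a sketch: it recreates the <code>Direc1</code>/<code>file1</code> layout inside a temporary directory (the <code>scratch</code> variable is an assumption of the example) rather than touching the notebook's real folder.

```shell
# Recreate the layout from the walkthrough inside a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch"
mkdir Direc1
touch file1

# Move file1 into Direc1/, then confirm it is there.
mv file1 Direc1/
ls Direc1

# Rename it in place: mv with two file names changes the name.
cd Direc1
mv file1 file2

# Copy file2 to a new file1; both files now exist side by side.
cp file2 file1
final_listing=$(ls)
echo "$final_listing"
```

Note that <code>mv</code> leaves nothing behind under the old name, while <code>cp</code> keeps the original and creates a second, independent file.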

We can then go back to the parent (original) directory using,

In [48]:
cd ..

/Users/Mike/Documents/DS_Projects/Unix_Tools
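A quick way to convince yourself how <code>cd</code>, <code>..</code>, and <code>pwd</code> fit together is to record the path before and after a round trip. The sketch below does this in a temporary directory; the variable names are hypothetical and chosen only for the example.

```shell
# Throwaway directory tree: scratch/Direc1
scratch=$(mktemp -d)
cd "$scratch"
mkdir Direc1
start_dir=$(pwd)

# Go down one level; pwd now ends in /Direc1.
cd Direc1
child_dir=$(pwd)
echo "$child_dir"

# .. names the parent directory, so this returns to where we started.
cd ..
back_dir=$(pwd)
echo "$back_dir"
```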


We can see the processes that are running in the current terminal session using,

- **<code>ps</code>**

This one we will have to use in the terminal,

![](images/ps.png)


The <code>PID</code> is the process ID and is important because we can use it to <a href="https://en.wikipedia.org/wiki/Kill_(command)">kill</a> the *process* or *command* if we need to. The <code>TIME</code> is the amount of CPU time the process has used and <code>CMD</code> is the name of the *command* or process that is running. The <code>TTY</code> is the terminal the process is attached to, which isn't something that I have ever had to use.
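As a hedged illustration of using a PID to stop a process, the sketch below starts a harmless <code>sleep</code> in the background, looks it up with <code>ps</code>, and then kills it. The 300-second duration and the variable names are arbitrary choices for the example.

```shell
# Start a harmless long-running process in the background.
sleep 300 &
pid=$!   # $! holds the PID of the most recent background job
echo "started sleep with PID $pid"

# Show just that process: PID, terminal, CPU time, and command name.
ps -p "$pid" -o pid,tty,time,comm

# Ask the process to terminate (kill sends SIGTERM by default).
kill "$pid"

# Reap the job; wait reports the signal as a non-zero status, which we ignore.
wait "$pid" 2>/dev/null || true

# kill -0 sends no signal; it only checks whether the PID still exists.
if kill -0 "$pid" 2>/dev/null; then
    still_running=yes
else
    still_running=no
fi
echo "still running: $still_running"
```

In day-to-day use you would normally read the PID off the <code>ps</code> output and type <code>kill</code> followed by that number by hand.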

We can also use the command,

- **<code>top</code>**

to see all the processes running on our computer. The results from my terminal are below,

![](images/top.png)

As you can see, there's a lot more information provided by <code>top</code>, including the amount of memory each process is using. One tool I liked using in graduate school is called,

- **<a href="http://hisham.hm/htop/">htop</a>**

which provides an interactive version of <code>top</code>. I liked it because when writing multi-threaded applications you can see directly how much work each core is doing (you can get a similar effect using <code>top</code> by pressing <code>1</code> while <code>top</code> is running). An example of the output of <code>htop</code> on my computer is shown below,

![](images/htop.png)


The last two basic commands I'll mention are

- **<code>history</code>**

which shows a list of all the commands you have used recently, as well as,


- **<a href="https://en.wikipedia.org/wiki/Man_page">man</a>**

which can be used to show the manual page of a specific Unix command.

Now that we have covered the basics of Unix commands, we can move on to dealing with directories and files more concretely.

## Files (And More Commands) <a class="anchor" id="Files"></a>
-------------------

mkdir
rm -rf
touch
rm
less and (>)
chmod


## Vim (And Why You Sometimes Need It) <a class="anchor" id="Vim"></a>
-------------------

vi
i
?
control-n
w
wq
q!
vimrc


## Working Over A Network <a class="anchor" id="Network"></a>
-------------------

ssh
tar
sftp
scp




## Working With Git <a class="anchor" id="Git"></a>
-------------------

## More <a class="anchor" id="More"></a>
Brew, Anaconda
Bash Scripts
Environment Variables
Paths
bashrc (.profile)
CMake
man
help