## **DSTEP20 // UNIX commands, loops, and functions introduction**
<small> February 5, 2020 </small>

This notebook goes through the basics of unix commands, along with examples of loops and user-defined functions in python.


---

In [0]:
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


## **The unix terminal**

Unix is one of the oldest and most bare-bones operating systems (OS), but it serves as the foundation for more modern OSs such as Mac OS, and unix-like environments are also available on Windows.

Unix and unix-based OSs like Mac OS have a powerful utility called the "Terminal", which provides a "command line interface" for interacting with your computer (Windows has something similar, but the syntax is different).

Jupyter allows us to run some of these commands **as if we were in a terminal**.  Let's start by exploring the `ls` command:

In [0]:
ls

drive/  sample_data/


Some of the most common and useful commands are:
* ls - list the files in the current directory
* cp - copy one file to another
* cd - change the directory
* mv - rename (move) a file
* mkdir - make a directory
* rm - delete a file
* rmdir - delete an empty directory

Bear in mind **these are <u>not</u> python commands**, but Jupyter understands them.  Here are some examples of creating directories, copying files, and changing directories:
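As a minimal sketch of those commands, the cell below drives them from python with `os.system` (introduced in the next section) so that it is self-contained. It assumes a unix-like system and works in a throwaway directory; the names `demo_dir`, `a.txt`, `b.txt`, and `c.txt` are hypothetical.

```python
import os
import tempfile

# -- remember where we started, then move into a throwaway directory
start = os.getcwd()
workdir = tempfile.mkdtemp()
os.chdir(workdir)  # python's version of "cd"

# -- mkdir: make a directory (demo_dir is a hypothetical name)
os.system("mkdir demo_dir")

# -- create a small file to work with
with open("demo_dir/a.txt", "w") as fopen:
  fopen.write("hello\n")

# -- cp: copy one file to another
os.system("cp demo_dir/a.txt demo_dir/b.txt")
after_cp = sorted(os.listdir("demo_dir"))

# -- mv: rename (move) a file
os.system("mv demo_dir/b.txt demo_dir/c.txt")
after_mv = sorted(os.listdir("demo_dir"))

# -- rm and rmdir: delete the files, then the (now empty) directory
os.system("rm demo_dir/a.txt demo_dir/c.txt")
os.system("rmdir demo_dir")
after_rm = os.listdir(".")

# -- go back to where we started and remove the throwaway directory
os.chdir(start)
os.rmdir(workdir)

print(after_cp)
print(after_mv)
print(after_rm)
```

Note that `os.system("cd demo_dir")` would not change python's working directory, because each `os.system` call runs in its own shell; that is why `os.chdir` is used here instead.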


These unix commands can also be combined with some python syntax using the os module:

In [0]:
# -- import the os module for operating system interaction
import os

In [0]:
# -- create a string that contains the command you want (make a directory)
cmd = "mkdir temp"
# -- execute the command with os.system
os.system(cmd)

0
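The `0` printed above is the exit status returned by `os.system`: zero means the command succeeded, and any non-zero value means it failed. A minimal sketch, assuming a unix-like system and using a hypothetical directory name `demo_status` inside a temporary location:

```python
import os
import tempfile

# -- build a path inside a throwaway location (demo_status is hypothetical)
base = tempfile.mkdtemp()
path = os.path.join(base, "demo_status")

# -- the first mkdir succeeds, so the exit status is 0
first = os.system("mkdir '{0}'".format(path))

# -- the second mkdir fails because the directory already exists;
# -- "2> /dev/null" hides the error message that mkdir would print
second = os.system("mkdir '{0}' 2> /dev/null".format(path))

print(first == 0)
print(second != 0)

# -- clean up
os.rmdir(path)
os.rmdir(base)
```

Checking the returned value is what lets a script react when a command fails instead of silently carrying on.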

We'll see how this can be put to good use below.  But bear in mind that a cell can be treated either as a command line **OR** a Jupyter cell, but not both:

In [0]:
ls

drive/  sample_data/  temp/


## **The `wget` command**

When dealing with data on the web, the `wget` command can be very useful for quickly grabbing the data that you want.  Let's get the ZIP Code Business Patterns (ZBP) data for 2016 (note the exclamation point before the `wget` command):

In [0]:
!wget https://www2.census.gov/programs-surveys/cbp/datasets/2016/zbp16detail.zip

--2020-02-05 20:17:51--  https://www2.census.gov/programs-surveys/cbp/datasets/2016/zbp16detail.zip
Resolving www2.census.gov (www2.census.gov)... 104.100.67.107, 2600:1409:d000:5a5::208c, 2600:1409:d000:5a9::208c
Connecting to www2.census.gov (www2.census.gov)|104.100.67.107|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘zbp16detail.zip’

zbp16detail.zip         [          <=>       ]  27.51M   792KB/s    in 36s     

2020-02-05 20:18:27 (793 KB/s) - ‘zbp16detail.zip’ saved [28843251]



Now, what if I wanted them for 2008?  I could guess the following:

In [0]:
!wget https://www2.census.gov/programs-surveys/cbp/datasets/2008/zbp08detail.zip

--2020-02-05 20:19:19--  https://www2.census.gov/programs-surveys/cbp/datasets/2008/zbp08detail.zip
Resolving www2.census.gov (www2.census.gov)... 104.91.176.162, 2600:1409:5000:388::208c, 2600:1409:5000:399::208c
Connecting to www2.census.gov (www2.census.gov)|104.91.176.162|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘zbp08detail.zip’

zbp08detail.zip         [        <=>         ]  15.77M   734KB/s    in 23s     

2020-02-05 20:19:43 (699 KB/s) - ‘zbp08detail.zip’ saved [16540860]
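
That guess works because the file names follow a pattern: the four-digit year appears in the folder name, and its last two digits appear in the file name. A minimal sketch of building the URL from a year, assuming the pattern seen in the two URLs above holds for other years as well:

```python
# -- the year we want
year = 2008

# -- the last two digits of the year, zero padded (e.g., 2008 -> "08")
snum = "{0:02d}".format(year % 100)

# -- fill the year into the pattern seen in the wget commands above
base = "https://www2.census.gov/programs-surveys/cbp/datasets"
url = "{0}/{1}/zbp{2}detail.zip".format(base, year, snum)

print(url)
```

The `02d` format pads single-digit values with a leading zero, which a plain `str(8)` would not do.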



And now, as above, let's put this into some python syntax: 

In [0]:
# -- define the command (note, *no* ! is needed now)
cmd = "wget https://www2.census.gov/programs-surveys/cbp/datasets/2002/zbp02detail.zip"

# -- execute the command
res = os.system(cmd)

In [0]:
ls

drive/  sample_data/  temp/  zbp02detail.zip  zbp08detail.zip  zbp16detail.zip


Let's delete those files:

In [0]:
rm zbp*.zip

In [0]:
ls

collab/  collab_e/  collab_py/  drive/  sample_data/  temp/


## **`for` loops in python**

In [0]:
mkdir collab

In [0]:
!mkdir collab_e

In [0]:
os.system('mkdir collab_py')

0

In [0]:
!ls -la

total 36
drwxr-xr-x 1 root root 4096 Feb  5 20:49 .
drwxr-xr-x 1 root root 4096 Feb  5 20:08 ..
drwxr-xr-x 2 root root 4096 Feb  5 20:48 collab
drwxr-xr-x 2 root root 4096 Feb  5 20:48 collab_e
drwxr-xr-x 2 root root 4096 Feb  5 20:49 collab_py
drwxr-xr-x 1 root root 4096 Jan 31 17:11 .config
drwx------ 5 root root 4096 Feb  5 20:14 drive
drwxr-xr-x 1 root root 4096 Jan 30 17:25 sample_data
drwxr-xr-x 2 root root 4096 Feb  5 20:16 temp


In python, `for` loops allow us to repeat a task many times, which saves us some typing:

In [0]:
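import numpy as np

# -- loop over the integers 0 through 9
for ii in np.arange(10):
  # -- ii // 2 is floor division, so it equals 0 only for ii = 0 and ii = 1
  if (ii // 2) == 0:
    print(ii)
  else:
    print('Nope')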

0
1
Nope
Nope
Nope
Nope
Nope
Nope
Nope
Nope


`for` loops can be combined with `if/else` statements so that each pass through the loop can do something different depending on a condition:
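A minimal sketch of that combination, using the remainder operator `%` to label each number as even or odd (the variable name `labels` is hypothetical):

```python
labels = []

for ii in range(6):
  # -- ii % 2 is the remainder after dividing by 2, which is 0 for even numbers
  if ii % 2 == 0:
    labels.append("even")
  else:
    labels.append("odd")

  # -- print the number next to its label
  print(ii, labels[-1])
```

Note that `%` (remainder) is different from `//` (floor division): `4 // 2` is `2`, while `4 % 2` is `0`.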


Now we can see how combining unix commands with python makes our lives much easier.  First, create a directory and check whether the command succeeded:

In [0]:
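# -- define a command to make a directory in your gdrive called zbp
# -- (assumes the drive is mounted at /content/drive with top folder "My Drive")
cmd_mkdir = "mkdir '/content/drive/My Drive/zbp'"

# -- run that command
res = os.system(cmd_mkdir)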

# -- check if the command worked
if res == 0:
  print("directory created")
else:
  print("failed to create directory!")

Now get all ZBP data from 2010 to 2015:

In [0]:
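# -- the zbp directory in gdrive is created in the cell above

# -- loop through the years for which we want the zbp data
# -- (range excludes its end point, so range(10, 16) covers 10 through 15)
for ii in range(10, 16):

  # -- turn ii into a string
  snum = str(ii)
  
  # -- make the command strings (the destination assumes the gdrive
  # -- folder is /content/drive/My Drive/zbp)
  cmd_wget = "wget https://www2.census.gov/programs-surveys/cbp/datasets/20{0}/zbp{0}detail.zip".format(snum)
  cmd_mv = "mv zbp{0}detail.zip '/content/drive/My Drive/zbp/'".format(snum)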

  # -- let me know what is happening  
  print("downloading year {0}".format(snum))
  
  res = os.system(cmd_wget)
  if res != 0:
    print("failed to download {0}!!!".format(snum))
  
  print("moving file...")
  
  res = os.system(cmd_mv)
  if res != 0:
    print("failed to move file!!!")
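
Before running a loop that downloads and moves files, it can help to print the commands it would run. A minimal sketch of such a dry run follows; the destination folder `/content/drive/My Drive/zbp` is an assumption about where the drive is mounted, and nothing is actually executed.

```python
# -- assumed destination folder in the mounted drive
dest = "/content/drive/My Drive/zbp"

cmds = []

# -- range excludes its end point, so this covers 10 through 15
for ii in range(10, 16):
  snum = str(ii)

  # -- build (but do not run) the download and move commands
  cmd_wget = "wget https://www2.census.gov/programs-surveys/cbp/datasets/20{0}/zbp{0}detail.zip".format(snum)
  cmd_mv = "mv zbp{0}detail.zip '{1}'".format(snum, dest)

  cmds.append(cmd_wget)
  cmds.append(cmd_mv)

# -- inspect the commands before handing them to os.system
for cmd in cmds:
  print(cmd)
```

Once the printed commands look right, each one can be passed to `os.system` and its exit status checked.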


## **Functions in python**

So far, we've used many, many functions in python: `.mean()`, `.min()`, `.max()`, `print()`, `np.unique()`, `pd.DataFrame`, etc.  But what if you want to do something many times that is **not** a function that exists?  How about adding strings of numbers?
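As a minimal sketch of a user-defined function, here is one reading of "adding strings of numbers": converting two numeric strings to floats and summing them. Both the function name `add_strings` and this interpretation are assumptions.

```python
def add_strings(str1, str2):
  # -- convert each string to a float, then add the two numbers
  return float(str1) + float(str2)

total = add_strings("1.5", "2")
print(total)
```

Without the conversion, `"1.5" + "2"` would simply join the two strings into `"1.52"`.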

Now let's go all the way back to our Delaware Natural Areas example:

In [0]:
import pandas as pd

# -- set the file name
fname = "https://data.delaware.gov/api/views/9be9-z9z2/rows.csv?accessType=DOWNLOAD"

# -- load the data
natarea = pd.read_csv(fname)

Recall how we split the "SHAPE" column into latitude and longitude:
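A minimal sketch of that kind of split on a single entry. The exact format of the "SHAPE" entries is not shown here, so the sample string below is a hypothetical stand-in with the latitude first:

```python
# -- a hypothetical entry; the real "SHAPE" format may differ
entry = "(39.7, -75.6)"

# -- strip the parentheses, then split on the comma
parts = entry.strip("()").split(",")

# -- remove any leftover spaces around each piece
lat = parts[0].strip()
lon = parts[1].strip()

# -- note that these are still strings, not numbers
print(lat, lon)
```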

Now let's make a function to do that (**and** convert to float) for a given entry (including a "docstring"):
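A minimal sketch of such a function, assuming each entry looks like `"(lat, lon)"`; both that layout and the name `split_shape` are hypothetical:

```python
def split_shape(entry):
  """
  Split a "(lat, lon)" string into two floats.

  The "(lat, lon)" layout is an assumed format for illustration.
  """

  # -- strip the parentheses and split on the comma
  parts = entry.strip("()").split(",")

  # -- convert each piece to a float (float ignores surrounding spaces)
  return float(parts[0]), float(parts[1])

lat, lon = split_shape("(39.7, -75.6)")
print(lat, lon)
```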

Notice also the "docstring":
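A docstring is a string literal placed as the first statement of a function body; python stores it on the function as `__doc__`, and `help()` displays it. A minimal sketch with a hypothetical function `to_float`:

```python
def to_float(text):
  """Convert a numeric string such as "39.7" to a float."""
  return float(text)

# -- the docstring is stored on the function itself
doc = to_float.__doc__
print(doc)
```

Calling `help(to_float)` in a notebook cell shows the same text together with the function signature.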

Pandas actually allows us to use this syntax in very powerful ways via the `.apply` method:
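A minimal sketch of `.apply` on a small hypothetical table standing in for the real data. The `"(lat, lon)"` layout of the "SHAPE" entries and the helper names `get_lat` and `get_lon` are assumptions.

```python
import pandas as pd

def get_lat(entry):
  """Return the latitude from an assumed "(lat, lon)" string as a float."""
  return float(entry.strip("()").split(",")[0])

def get_lon(entry):
  """Return the longitude from an assumed "(lat, lon)" string as a float."""
  return float(entry.strip("()").split(",")[1])

# -- a small hypothetical table standing in for the real data
demo = pd.DataFrame({"SHAPE": ["(39.7, -75.6)", "(38.9, -75.4)"]})

# -- .apply calls the function once for each entry in the column
demo["lat"] = demo["SHAPE"].apply(get_lat)
demo["lon"] = demo["SHAPE"].apply(get_lon)

print(demo)
```

The function is passed by name, without parentheses, so that pandas can call it on each entry in turn.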