## **DSTEP20 // UNIX commands, loops, and functions introduction**
<small> February 3, 2020 </small>

This notebook goes through the basics of unix commands, as well as examples of loops and user-defined functions in python.


---

In [0]:
from google.colab import drive
drive.mount("/content/drive")

## **The unix terminal**

Unix is one of the oldest and most bare-bones operating systems (OSs), and it is the model for more modern OSs like Mac OS and Linux. Windows can also run a Unix-like environment through the Windows Subsystem for Linux (WSL).

Unix and Unix-based OSs like Mac OS have a powerful utility called the "Terminal" that provides a "command line interface" for interacting with your computer (Windows has something similar, but the syntax is different).

Jupyter allows us to run some of these commands **as if we were in a terminal**. Some of the most common and useful ones, starting with `ls`, are:
* ls - list the files in the current directory
* cp - copy one file to another
* cd - change the directory
* mv - rename (move) a file
* mkdir - make a directory
* rm - delete a file
* rmdir - delete an empty directory
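For comparison, each command in the list above has a rough counterpart in python's standard library (`os` and `shutil`). The sketch below runs them inside a throwaway temporary directory so nothing else on your machine is touched. The file and folder names (`notes.txt`, `copy.txt`, `backup`) are made up for illustration.

```python
import os
import shutil
import tempfile

# -- remember where we started so we can come back
start = os.getcwd()

# -- work inside a temporary directory that is deleted afterwards
with tempfile.TemporaryDirectory() as tmp:
  os.chdir(tmp)                                # cd
  os.mkdir("backup")                           # mkdir backup

  # -- create a small file to play with
  with open("notes.txt", "w") as f:
    f.write("hello\n")

  shutil.copy("notes.txt", "copy.txt")         # cp notes.txt copy.txt
  shutil.move("copy.txt", "backup/copy.txt")   # mv copy.txt backup/copy.txt
  print(sorted(os.listdir(".")))               # ls
  print(sorted(os.listdir("backup")))          # ls backup

  os.remove("backup/copy.txt")                 # rm backup/copy.txt
  os.rmdir("backup")                           # rmdir backup (only works if empty)
  listing = sorted(os.listdir("."))
  print(listing)

  os.chdir(start)                              # cd back before the folder is deleted
```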

Bear in mind **these are <u>not</u> python commands**, but Jupyter understands them.  Here are some examples of creating directories, copying files, and changing directories:


These unix commands can also be combined with some python syntax using the os module:

In [0]:
# -- import the os module for operating system interaction
import os

In [0]:
# -- create a string that contains the command you want (make a directory)
cmd = "mkdir data"

# -- execute the command with os.system (returns 0 on success)
os.system(cmd)

0

We'll see how this can be put to good use below.  But bear in mind that a cell can be treated either as a command line **OR** as a python cell, but not both:

## **The `wget` command**

When dealing with data on the web, the `wget` command is very useful for quickly grabbing the data you want.  Let's get the ZIP Code Business Patterns (ZBP) for 2016 (note the exclamation point before the `wget` command):

In [0]:
!wget https://www2.census.gov/programs-surveys/cbp/datasets/2016/zbp16detail.zip
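If you would rather stay in python, the standard library's `urllib.request.urlretrieve` can do a basic `wget`-style download. This is a minimal sketch, not a full replacement for `wget` (no retries, resuming, or progress bar), and the helper name `download` is our own.

```python
import os
import urllib.request

def download(url, dest_dir="."):
  """Save the file at `url` into `dest_dir` under its original name; return the local path."""
  # -- the file name is everything after the last "/" (assumes no query string)
  fname = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
  urllib.request.urlretrieve(url, fname)
  return fname
```

For example, `download("https://www2.census.gov/programs-surveys/cbp/datasets/2016/zbp16detail.zip")` would save `zbp16detail.zip` in the current directory.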

Now, what if I wanted them for 2008?  I could guess the following:

In [0]:
!wget https://www2.census.gov/programs-surveys/cbp/datasets/2008/zbp08detail.zip
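The guess works because the two URLs follow a pattern: the four-digit year names the folder, and the last two digits appear in the file name. A small function can build the URL for any year. This sketch only captures the pattern seen in these URLs; the Census Bureau may name files differently for other years, and `zbp_url` is a hypothetical helper name.

```python
def zbp_url(year):
  """Build the (guessed) Census ZBP detail-file URL for a four-digit year."""
  yy = str(year)[-2:]  # -- file names use the last two digits, e.g. 2008 -> "08"
  return ("https://www2.census.gov/programs-surveys/cbp/datasets/"
          "{0}/zbp{1}detail.zip".format(year, yy))

print(zbp_url(2008))
```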

And now, as above, let's put this into some python syntax: 

In [0]:
# -- define the command (note, *no* ! is needed now)
cmd = "wget https://www2.census.gov/programs-surveys/cbp/datasets/2002/zbp02detail.zip"

# -- execute the command
res = os.system(cmd)
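`os.system` returns the command's exit status, which is `0` when the command succeeded. On Unix the value is an encoded wait status, so comparing against `0` is the reliable check. The standard `subprocess` module offers a more explicit alternative. The sketch below runs harmless python one-liners instead of `wget` so it can be tried offline.

```python
import subprocess
import sys

# -- run a command given as a list of arguments (no shell parsing needed)
ok = subprocess.run([sys.executable, "-c", "print('hello')"],
                    capture_output=True, text=True)
print(ok.returncode)        # -- 0 means success, just like os.system
print(ok.stdout.strip())    # -- the command's printed output

# -- a command that deliberately exits with status 3
bad = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])
print(bad.returncode)
```

With a real download, the list would be something like `["wget", url]`, and `returncode` would be checked the same way as `res` above.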

In [0]:
ls

[0m[01;34mdata[0m/   [01;34msample_data[0m/     zbp08detail.zip  zbp16detail.zip
[01;34mdrive[0m/  zbp02detail.zip  zbp16detail.txt


Let's delete those zip files:

In [0]:
rm zbp*.zip
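The `*` in `zbp*.zip` is a wildcard that the shell expands into every matching file name. In python, `glob.glob` does the same expansion, and `os.remove` deletes each file. A minimal sketch follows; `remove_matching` is a hypothetical helper name.

```python
import glob
import os

def remove_matching(pattern):
  """Delete every file matching a shell-style pattern; return the names removed."""
  removed = sorted(glob.glob(pattern))  # -- expands "zbp*.zip" like the shell does
  for fname in removed:
    os.remove(fname)
  return removed

# -- e.g. remove_matching("zbp*.zip") leaves zbp16detail.txt untouched
```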

## **`for` loops in python**

In python, `for` loops let us repeat a task many times, which saves us a lot of typing:
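As a minimal sketch, the loop below runs its indented body once for each value that `range(5)` produces (0 through 4), instead of writing the same lines five times.

```python
# -- range(5) produces 0, 1, 2, 3, 4; the indented body runs once per value
squares = []
for ii in range(5):
  print("ii is", ii)
  squares.append(ii ** 2)

print(squares)  # -- [0, 1, 4, 9, 16]
```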

`for` loops can be combined with `if`/`else` statements so that the loop does different things depending on a condition:
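A small sketch of the idea: for each year in a list, the `if`/`else` picks which label to attach. The years are simply the ones used elsewhere in this notebook.

```python
# -- label each year as before or after 2010
labels = []
for year in [2002, 2008, 2016]:
  if year < 2010:
    labels.append("{0}: before 2010".format(year))
  else:
    labels.append("{0}: 2010 or later".format(year))

for label in labels:
  print(label)
```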

Now we can see the utility of combining unix commands with python to make our lives much easier.  First notice this:

In [0]:
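# -- define a command to make a directory in your gdrive called zbp
# -- (assumes Drive is mounted at "/content/drive/My Drive"; adjust the path if yours differs)
cmd_mkdir = 'mkdir "/content/drive/My Drive/zbp"'

# -- run that command
res = os.system(cmd_mkdir)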

# -- check if the command worked
if res == 0:
  print("directory created")
else:
  print("failed to create directory!")

Now get all ZBP data from 2010 to 2015:

In [0]:
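# -- create a directory in gdrive (-p: no error if it already exists)
# -- (assumes the same "/content/drive/My Drive/zbp" path as above)
os.system('mkdir -p "/content/drive/My Drive/zbp"')

# -- loop through the years for which we want the zbp data
# -- (range stops *before* its end value, so 16 is needed to include 2015)
for ii in range(10, 16):

  # -- turn ii into a string
  snum = str(ii)
  
  # -- make the command strings
  cmd_wget = "wget https://www2.census.gov/programs-surveys/cbp/datasets/20{0}/zbp{0}detail.zip".format(snum)
  cmd_mv = 'mv zbp{0}detail.zip "/content/drive/My Drive/zbp/"'.format(snum)

  # -- let me know what is happening  
  print("downloading year {0}".format(snum))
  
  res = os.system(cmd_wget)
  if res != 0:
    print("failed to download {0}!!!".format(snum))
    continue  # -- nothing to move, so skip to the next year
  
  print("moving file...")
  
  res = os.system(cmd_mv)
  if res != 0:
    print("failed to move file!!!")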
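Before launching many downloads, it can be worth doing a "dry run": build every command string first and inspect it before executing anything. This sketch assumes the Google Drive folder `/content/drive/My Drive/zbp` (adjust to your own mount point). `shlex.quote` wraps the path in quotes so the space in `My Drive` does not split it into two arguments.

```python
import shlex

# -- assumed destination folder; change to match your own Drive mount
dest = "/content/drive/My Drive/zbp"
base = "https://www2.census.gov/programs-surveys/cbp/datasets"

commands = []
for ii in range(10, 16):  # -- 10, 11, ..., 15, i.e. 2010 through 2015
  snum = str(ii)
  fname = "zbp{0}detail.zip".format(snum)
  commands.append("wget {0}/20{1}/{2}".format(base, snum, fname))
  commands.append("mv {0} {1}".format(fname, shlex.quote(dest)))

# -- print instead of running, so nothing is downloaded yet
for cmd in commands:
  print(cmd)
```

Once the printed commands look right, each one can be passed to `os.system` as in the loop above.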


## **Functions in python**

So far, we've used many, many functions in python: `.mean()`, `.min()`, `.max()`, `print()`, `np.unique()`, `pd.DataFrame()`, etc.  But what if you want to do something many times and no existing function does it?  How about adding strings of numbers?
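One reading of "adding strings of numbers" is this: given numbers stored as text (like `"3"` and `"4.5"`), return their numeric sum. The minimal sketch below uses `def` to define the function; the name `add_strings` is our own.

```python
def add_strings(a, b):
  """Convert two strings of numbers to floats and return their sum."""
  return float(a) + float(b)

print(add_strings("3", "4.5"))  # -- 7.5
print("3" + "4.5")              # -- without converting, + just glues the strings: 34.5
```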

Now let's go all the way back to our Delaware Natural Areas example:

In [0]:
import pandas as pd

# -- set the file name
fname = "https://data.delaware.gov/api/views/9be9-z9z2/rows.csv?accessType=DOWNLOAD"

# -- load the data
natarea = pd.read_csv(fname)

Recall how we split the "SHAPE" column into latitude and longitude:
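As a reminder, a split of a single entry might look like the sketch below. It assumes each SHAPE value is text of the form `"(latitude, longitude)"`. That format and the coordinates are made up for illustration, so compare against `natarea["SHAPE"].head()` before relying on it.

```python
# -- a made-up entry in the assumed "(latitude, longitude)" format
shape = "(39.1582, -75.5244)"

# -- drop the parentheses, then split on the comma
parts = shape.strip("()").split(",")
lat = parts[0].strip()
lon = parts[1].strip()
print(lat, lon)   # -- still strings at this point, not numbers
```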

Now let's make a function to do that (**and** convert to float) for a given entry:
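A sketch of such a function, under the same assumed `"(lat, lon)"` text format (the name `shape_to_latlon` and the sample entry are hypothetical). The triple-quoted string right under the `def` line is the function's docstring, which python keeps in `__doc__` and shows via `help()`.

```python
def shape_to_latlon(shape):
  """Split a "(lat, lon)" string and return (latitude, longitude) as floats.

  Assumes the entry looks like "(39.1582, -75.5244)"; other formats
  will usually raise an error.
  """
  lat, lon = shape.strip("()").split(",")
  return float(lat), float(lon)  # -- float() ignores the leading space

print(shape_to_latlon("(39.1582, -75.5244)"))
print(shape_to_latlon.__doc__)  # -- the docstring
```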

Notice also the "docstring":

Pandas actually allows us to use this syntax in very powerful ways via the `.apply` method:
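`Series.apply` calls a function once per entry and collects the results. The sketch below uses a tiny stand-in DataFrame with made-up entries in the assumed `"(lat, lon)"` format; `get_lat` and `get_lon` are hypothetical helper names.

```python
import pandas as pd

# -- a tiny stand-in DataFrame with made-up entries in the assumed "(lat, lon)" format
df = pd.DataFrame({"SHAPE": ["(39.1582, -75.5244)", "(38.7745, -75.1393)"]})

def get_lat(shape):
  """Return the latitude (first number) of a "(lat, lon)" string as a float."""
  return float(shape.strip("()").split(",")[0])

def get_lon(shape):
  """Return the longitude (second number) of a "(lat, lon)" string as a float."""
  return float(shape.strip("()").split(",")[1])

# -- .apply calls the function on every entry and returns a new Series
df["lat"] = df["SHAPE"].apply(get_lat)
df["lon"] = df["SHAPE"].apply(get_lon)
print(df)
```

If the real SHAPE column matches this format, the same pattern would be `natarea["lat"] = natarea["SHAPE"].apply(get_lat)`.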