# Demo 1: Basics of Notebooks and Python Review!

In this demo we will get started with: 
1. Jupyter Notebooks
2. How Local Jupyter Handles the Command Line
3. How Google Colab Works (Skippable if Working Locally)
4. Git
5. Python 

These are all the basic ingredients for our data science course!



Let's start today by going over what notebooks are and the final project we'll be doing for this course.

# Welcome to Notebooks!

There are a lot of useful keyboard shortcuts you can use -- check out the Help >> Keyboard Shortcuts menu.

We can designate cells as Markdown, which lets us do some cool stuff. Here are a few quick and useful things.

[Note: Examples Taken from Adam-p Markdown Cheat Sheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)

First, let's look at some headings.

# H1
## H2
### H3
#### H4
##### H5
###### H6

---

We can also break up sections with a horizontal rule, like the one above.


## Next is how to add emphasis to some text

Emphasis, aka italics, with *asterisks* or _underscores_.

Strong emphasis, aka bold, with **asterisks** or __underscores__.

Combined emphasis with **asterisks and _underscores_**.

Strikethrough uses two tildes. ~~Scratch this.~~

## Let's also see how to do lists and alignment...

1. First ordered list item
2. Another item
 * Unordered sub-list.
1. Actual numbers don't matter, just that it's a number
 1. Ordered sub-list
4. And another item.

   You can have properly indented paragraphs within list items. Notice the blank line above, and the leading spaces (you need three spaces).


* Unordered list can use asterisks
- Or minuses
+ Or pluses
* add another element
* add another **fancy element**

## Links are really important, and so are code blocks

[I'm an inline-style link](https://www.google.com)

URLs and URLs in angle brackets will automatically get turned into links.
http://www.example.com or <http://www.example.com> and sometimes
example.com (but not on GitHub, for example).

Inline `code has backticks` around it.

```
You can also do blocks of code.
```


You can also tell Markdown which language a code block is written in, so it gets syntax highlighting.

```javascript
var s = "JavaScript syntax highlighting";
alert(s);
```

```python
s = "Python syntax highlighting"
print(s)
```

```
No language indicated, so no syntax highlighting.
But let's throw in a <b>tag</b>.
```


## Finally, tables are a bit cumbersome...

Markdown | Less | Pretty
--- | --- | ---
*Still* | `renders` | **nicely**
1 | 2 | 3

---

Let's go back to the slides and learn a little bit more about Git!

## A Little About Directories, Git, Commands, and CoLab

**Note: If you are not using Colab, you can skip this section! For 6720 we'll be using our local machine!**


What Colab does is create a [virtual machine](https://en.wikipedia.org/wiki/Virtual_machine), which is like a fresh install of an operating system made just for you.

This means that in addition to running Python code in this notebook, you can also interact with the command-line as if you were using a terminal to navigate a computer. If you have never used the command-line before, it is worth reading through [this tutorial](https://computers.tutsplus.com/tutorials/navigating-the-terminal-a-gentle-introduction--mac-3855).

Notebooks also allow you to run shell commands by preceding the command with a `!`.

In [None]:
!pwd

In [None]:
!ls

### Cloning a Git Repository

For many labs and demos, you will need to access data that is stored in the class GitHub repository, which is here:

https://github.com/nmattei/cmps3160/

If you have never used it before, git is one of the most widely used version control systems today, and it is invaluable when working in a team. GitHub is a web-based hosting service built around git that supports hosting git repositories, user management, etc. There are other similar services, e.g., BitBucket and GitLab.

Our use of git/github for the class will be minimal; however, we encourage you to use it for collaboration for your class project, or for other classes, or for anything because it's great. To learn more about GitHub, see [this tutorial](https://docs.github.com/en/get-started/quickstart/hello-world). Note -- you don't need to do that tutorial to complete this notebook.

The main thing we want to do is clone the course files into this Colab virtual machine. To do so, we will issue a `git clone` command. This will copy all the files from the course GitHub repository to our virtual machine:

In [None]:
# clone the course repository
!git clone https://github.com/nmattei/cmps6790.git

We can now see the files in our virtual machine:

In [None]:
!ls

In [None]:
!ls cmps6790

You'll see a `git clone` command at the top of each assignment. This ensures that you have the latest version of the data before you start your work.

To change the current working directory, we will use the `cd` command. Note that we need to prefix this with `%` rather than `!` so that the directory change persists into later cells.

In [None]:
%cd cmps6790/_labs

In [None]:
!pwd

In [None]:
!ls
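Why `%cd` rather than `!cd`? A command run with `!` executes in a separate shell process, so any directory change is lost when that process exits, while `%cd` changes the directory of the notebook's own Python process. Here is a minimal sketch of the same difference in plain Python. The temporary directory `target` is a throwaway created only for this illustration:

```python
import os
import subprocess
import tempfile

start = os.getcwd()
target = tempfile.mkdtemp()  # hypothetical scratch directory for the demo

# Like `!cd`: the change happens in a child shell and is lost when it exits.
subprocess.run(f'cd "{target}"', shell=True, check=True)
unchanged = os.getcwd() == start

# Like `%cd`: the change applies to this Python process itself.
os.chdir(target)
changed = os.path.realpath(os.getcwd()) == os.path.realpath(target)

# Go back and clean up so later cells are unaffected.
os.chdir(start)
os.rmdir(target)

print(unchanged, changed)
```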

Since we cloned the course repository, we now have access to the data in the `cmps6790/_labs/data` folder:

In [None]:
!ls data

In [None]:
# look at the top of the titanic.csv file (head prints the first 10 lines)
!head data/titanic.csv
# the full file can also be viewed on GitHub:
# https://github.com/nmattei/cmps3160/blob/master/_labs/data/titanic.csv

## Working on assignments

To ensure you have everything configured properly for each assignment, you should do the following:

- Click the corresponding `Open in Colab` link in the `_labs` or `_demos` folder.
- Immediately click `File->Save` (Command/Ctrl S), then "Save a Copy in Drive"
- This will create a new copy in the `Colab Notebooks` folder of your personal Google Drive for whichever Google account you are signed into at the time.
- Save the file regularly as you complete the assignment.
- When you're ready to submit your work:
  + Go to File->Download .ipynb
  + Upload the .ipynb file to the appropriate assignment in [Canvas](https://tulane.instructure.com/)

## Non-persistence of Colab Virtual Machines

An important thing to note about Colab is that files you create during the session will not persist once the runtime shuts down. Google creates these temporary virtual environments to host your notebook, but it shuts them down so the resources can be reallocated to other notebooks. The runtime will shut down automatically if not used for a few hours, so be careful about files that are created during the session.

This means that the `cmps6790` folder that we just created will disappear if we restart the session. You can test this by clicking on `Runtime->Disconnect and delete Runtime`. If you do so, you'll notice `cmps6790` is gone.

## Mounting Your Google Drive

On some occasions, you may want to create data that will persist. For example, when working on your course project, you don't want to re-collect any data you need for your analysis.

One way to make this data persist is to write directly to your Google Drive, rather than to this virtual machine. To do so, we can use a Python command to "mount" the Google Drive. This will pop up a screen asking you to give this Colab notebook access to your Google Drive. **If you have multiple Google accounts, please be sure to use the same one consistently throughout the course.**

In [None]:
# Mount our personal google drive. This will pop up a
# confirmation screen giving this notebook access to your Google drive.
# You will first need a Google account for this to work.
from google.colab import drive
drive.mount('/content/drive')

You should now see the contents of your Google Drive by navigating to the folder icon in the left panel. It is viewable at `/content/drive/MyDrive`.

To list the contents of a folder, you can use the `ls` command. This should list the contents of the root folder of your Google Drive.

In [None]:
!ls /content/drive/MyDrive

We'll look more at this in the lab this week.

Let's go back to the slides and see a few more things about Python and why it's the best!

# Working With Local Files

For CMPS 6720 we'll be working in our local directories and using the command line. You should have started Docker as we did in class and used the command line. You can still check where you are from inside the notebook!

In [None]:
!pwd

In [None]:
!ls

In [None]:
%cd ./data

In [None]:
!ls

In [None]:
!head adult.csv

# Let's do some Code!

The cell below loads up a few libraries and does some initialization.

In [None]:
### Import commonly used libraries.

# Load Numpy
import numpy as np
# Load MatPlotLib
import matplotlib.pyplot as plt
# Load Pandas
import pandas as pd

# A style for our plots
plt.style.use('fivethirtyeight')
# Seaborn is a plotting package for Pandas that we'll try out...
import seaborn as sns

### First let's go over the examples from the slides here and make sure we understand them.

In [None]:
# Define a simple function.

def my_func(x, y):
    if x > y:
        return x
    else:
        return y

In [None]:
my_func(1,2)

In [None]:
def my_func(x, y):
    return (x-1, y+2)


Be careful with notebooks... we can have scope problems if we run cells out of order!

In [None]:
# What is in scope here?
(a, b) = my_func(1, 2)

In [None]:
print(a)

In [None]:
print(b)
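The out-of-order warning above comes down to this: a notebook keeps one shared namespace, and whichever definition ran most recently wins. A small sketch of that behavior in a single script, using the hypothetical function name `pick` so it does not clash with `my_func`:

```python
# First definition: return the larger of the two arguments.
def pick(x, y):
    return x if x > y else y

first = pick(1, 2)

# Second definition with the same name silently replaces the first.
def pick(x, y):
    return (x - 1, y + 2)

second = pick(1, 2)

print(first)   # 2
print(second)  # (0, 4)
```

If the cell holding the first definition is re-run later, calls to the function go back to returning a single number, and unpacking the result into two names fails.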

Let's look at some simple lists and data structures.

In [None]:
a = [1, 2, 4, 'a']
a

In [None]:
# len: returns the number of items in a sequence or collection
len(['c', 'm', 's', 'c', 3, 2, 0])


In [None]:
# range: returns an iterable object
list(range(10))


In [None]:
# enumerate: returns an iterator of (index, element) tuples over a list
a = enumerate( ['311', '320', '330'] )
print(a)

In [None]:
# Recall that Python 3 evaluates these iterators lazily, so we have to expand them manually.
list(a)
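One consequence of lazy evaluation is worth seeing directly: an iterator can only be consumed once. A minimal sketch, with the illustrative names `courses` and `pairs`:

```python
courses = ['311', '320', '330']
pairs = enumerate(courses)

first_pass = list(pairs)   # consumes the iterator
second_pass = list(pairs)  # nothing left to yield

print(first_pass)   # [(0, '311'), (1, '320'), (2, '330')]
print(second_pass)  # []
```

If you need the values more than once, store the expanded list in a variable or call `enumerate` again.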

In [None]:
a = ['311', '320', '330']
for i,j in enumerate(a):
    print(i,j)

In [None]:
def squared(x):
    return x**2

In [None]:
## Map and Filter

# map: apply a function to a sequence or iterable

arr = [1, 2, 3, 4, 5]
out = map(squared, arr)
print(list(out))
print(arr)

In [None]:
new_arr = [x**3 for x in arr if x <= 3]

In [None]:
new_arr
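The comprehension above filters and transforms in a single expression. As a rough sketch, the same result can be built by chaining `filter` and `map`. The names `cubes_comp` and `cubes_func` are made up for this comparison:

```python
arr = [1, 2, 3, 4, 5]

# List comprehension: keep x <= 3, then cube it.
cubes_comp = [x**3 for x in arr if x <= 3]

# Same steps with filter (keep x <= 3) feeding map (cube each value).
cubes_func = list(map(lambda x: x**3, filter(lambda x: x <= 3, arr)))

print(cubes_comp)  # [1, 8, 27]
print(cubes_func)  # [1, 8, 27]
```

Both give the same list. The comprehension is usually easier to read when the filter and the transformation are short.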

In [None]:
# What happened here??
arr = [1, 2, 3, 4, 5]
out = map(lambda x: x**2, arr)
print(out)


# Remember again lazy evaluation!
print(list(out))

In [None]:
# filter: returns an iterator over the elements for which a predicate is true

arr = [1, 2, 3, 4, 5, 6, 7]
out = filter(lambda x: x % 2 == 0, arr)
print(out)

# Remember again that we have to explicitly evaluate the iterator.
print(list(out))

In [None]:
# A more pythonic way: list comprehension...
x = [i for i in arr if i % 2 == 0]
x

In [None]:
# List comprehensions are the best!
P = [ 2**x for x in range(17) ]
P

In [None]:
# Can also do dictionaries...
D = {x:['no'] for x in range(10)}

In [None]:
D
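One detail of the dictionary comprehension above is easy to miss: the expression `['no']` is evaluated once per key, so every key gets its own list. A short sketch comparing it with `dict.fromkeys`, which reuses a single value object for all keys (the names `fresh` and `shared` are just for illustration):

```python
# Comprehension: a new list is created for each key.
fresh = {x: ['no'] for x in range(3)}

# dict.fromkeys: every key points at the same list object.
shared = dict.fromkeys(range(3), ['no'])

fresh[0].append('yes')
shared[0].append('yes')

print(fresh)   # {0: ['no', 'yes'], 1: ['no'], 2: ['no']}
print(shared)  # {0: ['no', 'yes'], 1: ['no', 'yes'], 2: ['no', 'yes']}
```

When the values are mutable, the comprehension is usually the safer choice.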