## Lighthouse Labs
### W07D2 Deployment of ML Models

Instructor: Jeremy Eng

Credit: [Socorro Dominguez](https://github.com/sedv8808/LighthouseLabs/tree/main/W07D2)

**Agenda:**

* Review: REST APIs  
* Intro to Flask
    * Flask for API creation
* Tmux
    

## How is Data Science related to the Web?

Web pages are intended for humans. However, there is a lot of valuable data embedded in web pages:
* course listings
* bank records
* blogs

### What if we wanted to collect this data for analysis?

We would need a program that acts like a web browser but collects web document data rather than displaying it.

This is called `web scraping`. Popular tools include Scrapy, a free and open-source web-crawling framework written in Python.

A web scraper...
* acts like a web browser (i.e., sends HTTP GET requests to a web server)
* at the same time, allows you to process the data that comes back.

Another library that is useful when scraping, if you are interested:

Beautiful Soup
* Python library that can parse HTML (super useful)
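To make the parsing step concrete, here is a minimal sketch that runs Beautiful Soup on a small inline HTML string, so no network access is needed. The table contents are made up for illustration, and the built-in `html.parser` backend is assumed so that `lxml` does not have to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a downloaded web page.
html = """
<table class="wikitable">
  <tr><th>Country</th><th>Capital</th></tr>
  <tr><td>Canada</td><td>Ottawa</td></tr>
  <tr><td>Japan</td><td>Tokyo</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
# Skip the header row, then collect the text of every cell in each row.
for tr in soup.find("table", class_="wikitable").find_all("tr")[1:]:
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows.append(tuple(cells))

print(rows)
```

The same `find` / `find_all` calls work on a page fetched with `requests`; only the source of the HTML string changes.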

### Disadvantages of Web Scraping

- Scraping processes are hard to understand.

- Extracted data needs extensive cleaning (this is where we use `Beautiful Soup`).

- In certain cases, scraping might take a long time and a lot of energy to complete.

- New data extraction applications take a lot of time to set up in the beginning.

- Web scraping services are slower than API calls.

- If the developer of a website decides to introduce changes in the code, the scraping service might stop working.

# Super Easy Example of Web Scraping

In [1]:
import requests
from bs4 import BeautifulSoup

https://en.wikipedia.org/wiki/List_of_current_heads_of_state_and_government

In [2]:
URL = "https://en.wikipedia.org/wiki/List_of_current_heads_of_state_and_government"

In [3]:
res = requests.get(URL).text
res

'<!DOCTYPE html>\n<html class="client-nojs" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title>List of current heads of state and government - Wikipedia</title>\n<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":false,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"6c62e162-9157-40fe-aea0-76375a6e9ac9","wgCSPNonce":false,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"List_of_current_heads_of_state_and_government","wgTitle":"List of current heads of state and government","wgCurRevisionId":1097698513,"wgRevisionId":1097698513,"wgArticleId":380398,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Short description 

In [4]:
soup = BeautifulSoup(res,'lxml')

In [5]:
type(soup)

bs4.BeautifulSoup

In [6]:
res = requests.get(URL).text
soup = BeautifulSoup(res,'lxml')
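# Defaults so the print below works even if the first row is malformed.
country = title = name = ""
# Skip the header row, then walk every remaining row of the first wikitable.
for items in soup.find('table', class_='wikitable').find_all('tr')[1:]:
    data = items.find_all(['th','td'])
    try:
        country = data[0].a.text
        title = data[1].a.text
        try:
            # The name is the element that follows the title link.
            name = data[1].a.find_next_sibling().text
        except AttributeError:
            # No sibling element: `name` keeps its previous value.
            pass
    except IndexError:
        # Row has fewer cells than expected: previous values are kept.
        pass

    print("{}|{}|{}".format(country,title,name))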

Afghanistan|Leader|Hibatullah Akhundzada
Albania|President|Ilir Meta
Algeria|President|Abdelmadjid Tebboune
Andorra|Episcopal Co-Prince|Joan Enric Vives i Sicília
Angola|President|João Lourenço
Antigua and Barbuda|Queen|Elizabeth II
Argentina|President|Alberto Fernández
Armenia|President|Vahagn Khachaturyan
Australia|Queen|Elizabeth II
Austria|President|Alexander Van der Bellen
Azerbaijan|President|Ilham Aliyev
Bahamas, The|Queen|Elizabeth II
Bahrain|King|Hamad bin Isa Al Khalifa
Bangladesh|President|Abdul Hamid
Barbados|President|Sandra Mason
Belarus|President|Alexander Lukashenko
Belgium|King|Philippe
Belize|Queen|Elizabeth II
Benin|President|Patrice Talon
Bhutan|King|Jigme Khesar Namgyel Wangchuck
Bolivia|President|Luis Arce
Bosnia and Herzegovina|High Representative|Christian Schmidt
Presidency|Prime Minister|Zoran Tegeltija
Šefik Džaferović|Prime Minister|Zoran Tegeltija
Milorad Dodik|Prime Minister|Zoran Tegeltija
Botswana|President|Mokgweetsi Masisi
Brazil|President|Jair Bolsona

## What is an API?

**A**pplication  
**P**rogramming  
**I**nterface  
  
**RE**presentational  
**S**tate  
**T**ransfer  

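> Analogy: ordering at a restaurant. You (the client) give your order (the request) to the waiter (the API), who takes it to the kitchen (the server) and brings back your meal (the response).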

### Characteristics?

Client-server, typically HTTP-based, stateless server


### What representation is the data found in?

**J**ava**S**cript **O**bject **N**otation (JSON)


Textual format for structured data  
* [a,b,c] for arrays  
* {"x": m, "y": n, "z": o} for objects (keys are double-quoted strings)

JSON
* textual description of JavaScript objects, which map naturally onto Python objects
* arrays correspond to Python lists and objects correspond to Python dictionaries

```
{
"library": [
           {"title": "For Whom the Bell Tolls", "author": "Ernest Hemingway"},
           {"title": "Trump: The Art of the Deal", "author": "Good Question"}
           ]
}
```
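As a small illustration of how JSON text relates to Python objects, the sketch below uses the standard-library `json` module to serialize a dictionary to a JSON string and parse it back. The variable names are chosen for illustration only. It also shows that a string with single-quoted keys is rejected, because JSON requires double quotes.

```python
import json

library = {
    "library": [
        {"title": "For Whom the Bell Tolls", "author": "Ernest Hemingway"},
        {"title": "Trump: The Art of the Deal", "author": "Good Question"},
    ]
}

# Python dict -> JSON text (a plain string that can be sent over HTTP).
text = json.dumps(library, indent=2)
print(text)

# JSON text -> Python dict; the round trip gives back an equal object.
restored = json.loads(text)
print(restored["library"][0]["author"])

# Single-quoted keys are valid Python but not valid JSON.
try:
    json.loads("{'x': 1}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False
print("single quotes accepted:", single_quotes_ok)
```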

### The Anatomy Of A Request

It’s important to know that a request is made up of four things:

1. The endpoint (the URL)

2. The method (verb: GET, PUT, POST, etc.)

3. The headers (metadata about the request, such as content type and authentication)

4. The data (or body)

1. The endpoint (or route) is the URL you request. It follows the pattern:

`root-endpoint/?`

For example, the root endpoint of GitHub's API is:

https://api.github.com

2. The method is the type of request you send to the server. You can choose from the types below:

a. GET - used to get a resource from the server

b. POST - used to create a new resource on the server

c. PUT/PATCH - used to update a resource on the server

d. DELETE - used to delete a resource on the server
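The four parts of a request can be sketched in Python. This example uses the standard-library `urllib.request.Request` to assemble a request object without sending it, so it runs offline. The path, header values and body are illustrative assumptions, not something this lesson prescribes.

```python
import json
from urllib.request import Request

# 1. The endpoint: GitHub's root endpoint plus an illustrative path.
endpoint = "https://api.github.com/user/repos"

# 2. The method.
method = "POST"

# 3. The headers: metadata describing the request.
headers = {"Content-Type": "application/json"}

# 4. The data (body): JSON text encoded as bytes. The repo name is made up.
body = json.dumps({"name": "demo-repo"}).encode("utf-8")

# Build the request object. Nothing is sent over the network here;
# urllib.request.urlopen(req) would be needed to actually send it.
req = Request(endpoint, data=body, headers=headers, method=method)

print(req.get_method())
print(req.full_url)
print(req.header_items())
print(req.data)
```

With the `requests` library used earlier, the same four parts would be passed as `requests.post(endpoint, headers=headers, data=body)`.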

## FLASK

Flask is a micro web framework written in Python. It can create a REST API that allows you to send data and receive a prediction as a response.

Now that you are going to be a Data Scientist, you cannot always rely on having your models in a Jupyter Notebook.

Jupyter Notebooks are awesome for EDA. However, when you need an application that has a predictive model, you will need to deploy your model elsewhere.

You can try to get the best model possible in a notebook or a script. Once you have decided that you have the best model, you must hand it over in a way that lets the client run it easily in their infrastructure.

For this purpose you need a tool that can fit in their infrastructure, preferably in a language that you're familiar with. This is where you can use Flask.
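A minimal sketch of such an API is shown below. It is not the model from this lesson: the "model" is a hypothetical fixed linear rule (`WEIGHTS` and `BIAS` are made-up values) standing in for a trained estimator that you would normally load from disk. The route name `/predict` and the `features` key are also assumptions chosen for illustration.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for a trained model: bias + weighted sum of features.
# In a real deployment you would load a fitted estimator here instead.
WEIGHTS = [0.5, 2.0]
BIAS = 1.0


def predict(features):
    """Return the toy prediction for a list of numeric features."""
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # Parse the JSON body; silent=True returns None instead of raising.
    payload = request.get_json(silent=True)
    if not payload or "features" not in payload:
        return jsonify({"error": "expected a JSON body with a 'features' list"}), 400

    features = payload["features"]
    if len(features) != len(WEIGHTS):
        return jsonify({"error": "expected {} features".format(len(WEIGHTS))}), 400

    return jsonify({"prediction": predict(features)})


# To serve it locally, add the following two lines and run `python app.py`:
# if __name__ == "__main__":
#     app.run(port=5000)
```

A client would then POST JSON such as `{"features": [2.0, 3.0]}` to `/predict` and get a JSON prediction back. Returning a 400 status for malformed input keeps client mistakes distinguishable from server errors.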

## Pros of Flask
- Easy-to-understand development: beginner friendly.
- Very flexible and easy to use: it comes with a template engine too!
- Testing: unit testing is possible.
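As a sketch of what unit testing a Flask app can look like, the example below uses Flask's built-in test client, which calls routes in-process without starting a server. The `/health` route and the test function name are hypothetical, made up only so there is something to test.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/health")
def health():
    # Hypothetical endpoint that exists only to be tested.
    return jsonify({"status": "ok"})


def test_health_returns_ok():
    # The test client sends requests directly to the app object.
    client = app.test_client()
    response = client.get("/health")
    assert response.status_code == 200
    assert response.get_json() == {"status": "ok"}


# A test runner such as pytest would discover and call this function;
# here it is called directly to keep the sketch self-contained.
test_health_returns_ok()
print("health check test passed")
```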

## Cons of Flask
- Because it is so easy to get started, it is also easy to write low-quality code and end up with a "bad web application".
- Scalability: Flask's built-in development server is not meant for production traffic. Handling many concurrent requests requires running the app behind a production WSGI server such as Gunicorn.
- Modules: Flask relies on third-party extensions for many features, and each added dependency can increase security risk and maintenance cost.
- Community support is more limited than for frameworks such as Django (Streamlit is another alternative).

Let's work through an example of building an API around one of our DS models: [Boston_Model.ipynb](Boston_Model.ipynb)

Flask is not the only option.

Some people prefer using [Streamlit](https://streamlit.io/), and if you have to build a dashboard, Plotly [Dash](https://plotly.com/dash/) is another choice.

In practice, you will deploy a trained model to the cloud (AWS, Azure, GCP).

Cloud computing benefits:
- Cost reduction
- Quick Deployment
- Flexibility
- Scalability
- Security
- Backups

## What is Tmux

tmux is a terminal multiplexer.

- Within one terminal window you can open multiple windows and split-views (called “panes” in tmux lingo). 

- Each pane contains its own, independently running terminal instance.

- You won't need to open multiple terminal emulator windows.

- You can also create multiple terminal "sessions". This allows you to attach to and detach from terminal instances, which is useful when you ssh into a machine and don't want to maintain the connection (e.g. while running `python app.py`).

- Many shortcuts to remember: [https://tmuxcheatsheet.com/](https://tmuxcheatsheet.com/)

- You can learn more about it [here](https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/).


### Tmux?

Show Tmux and its interactivity for multiple session handling.



Installing tmux:  
`sudo apt-get install tmux` (Ubuntu and derivatives)   
`brew install tmux` (Mac)

Useful commands in Tmux


- Splitting panes (side by side):
    - `(Ctrl + b) %` (press Ctrl+b together, release, then type %)

- Navigating panes:
    - `(Ctrl + b)` + arrow keys

- New pane below:
    - `(Ctrl + b) "`

- Exiting a pane:
    - `exit`

- New window:
    - `(Ctrl + b) c`

- Switch between windows:
    - `(Ctrl + b) 0`
    - `(Ctrl + b) 1`

- Rename window:
    - `(Ctrl + b) ,`

- Sessions
    - Sessions allow you to ssh in via a terminal and keep your work running in the background, so you don't have to worry about disconnects.
    - ssh into Plato
    - Create a new session:
        - `tmux new -s name_of_session`
        - (e.g. run `htop`)
    - Detach from session:
        - `(Ctrl + b) d`
    - View sessions:
        - `tmux ls`
    - Attach to session:
        - `tmux attach -t name_of_session`
    - Rename session:
        - `tmux rename-session new_name_of_session`