## Lighthouse Labs
### W07D2 Deployment of ML Models
Instructor: Socorro Dominguez  
June 08, 2021

**Agenda:**

* REST APIs
    * What is it?
    * Applications
   
* Intro to Flask
    * Flask for API creation
    

## How is Data Science related to the Web?

Web pages are intended for humans. However, there is a lot of valuable data embedded in web pages:
* course listings
* bank records
* blogs

### What if we wanted to collect this data for analysis?

We would need a program that acts like a web browser but collects web document data rather than displaying it.

This is called `web scraping`. A popular tool is Scrapy, a free and open-source web-crawling framework written in Python.

A Web Scraper...
* acts like a web browser (i.e., sends HTTP GET requests to a web server)
* at the same time, allows you to process the data that comes back.

Another library that is useful when scraping, if you are interested:

Beautiful Soup
* Python library that can parse HTML (super useful)
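
To make the "process the data that comes back" step concrete, here is a minimal sketch. It assumes the HTML has already been downloaded: the `SAMPLE_HTML` string and the `CourseParser` class are made up for illustration. To stay dependency-free it uses the standard library's `html.parser` rather than Beautiful Soup or Scrapy.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this with an HTTP GET.
SAMPLE_HTML = """
<html><body>
  <h1>Course listings</h1>
  <ul>
    <li class="course">Intro to Python</li>
    <li class="course">Deployment of ML Models</li>
    <li class="note">Registration closes soon</li>
  </ul>
</body></html>
"""


class CourseParser(HTMLParser):
    """Collects the text of every <li class="course"> element."""

    def __init__(self):
        super().__init__()
        self.in_course = False
        self.courses = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "li" and dict(attrs).get("class") == "course":
            self.in_course = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_course = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a course item
        if self.in_course and data.strip():
            self.courses.append(data.strip())


parser = CourseParser()
parser.feed(SAMPLE_HTML)
print(parser.courses)
```

Note how much of the code depends on the page layout (the `li` tag and the `course` class). That dependence is one reason scrapers tend to be fragile.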

### Disadvantages of Web Scraping

- Scraping processes are hard to understand.

- Extracted data needs extensive cleaning (this is where we use `Beautiful Soup`).

- In certain cases, this might take a long time and a lot of energy to complete.

- New data extraction applications take a lot of time to set up in the beginning.

- Web scraping services are slower than API calls.

- If the developer of a website decides to introduce changes in the code, the scraping service might stop working.

## What is an API?

**A**pplication  
**P**rogramming  
**I**nterface  
  
  
**RE**presentational  
**S**tate  
**T**ransfer  

### Characteristics?

Client-server, typically HTTP-based, stateless server


### Furthermore....

Some websites provide direct access to their data through an API. For example: Twitter, Translink, Car2Go, Google Maps, Yahoo

* Why would they do this?

* Why would some websites not do this?

### What representation is DATA found in?

**J**ava**S**cript **O**bject **N**otation (JSON)


Textual format for structured data  
* `[a, b, c]` for arrays  
* `{"x": m, "y": n, "z": o}` for objects

JSON
* textual description of JavaScript objects, which map closely onto Python objects
* arrays and objects become Python lists and dictionaries

```
{
"library": [
           {"title": "For Whom the Bell Tolls", "author": "Ernest Hemingway"},
           {"title": "Trump: The Art of the Deal", "author": "Good Question"}
           ]
}
```
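
As a small sketch of how this maps onto Python, the built-in `json` module can turn JSON text into dictionaries and lists and back again. The `raw` string below is the library example written with double quotes, since strict JSON requires them.

```python
import json

# The library example written as strict JSON (double-quoted strings)
raw = """
{
  "library": [
    {"title": "For Whom the Bell Tolls", "author": "Ernest Hemingway"},
    {"title": "Trump: The Art of the Deal", "author": "Good Question"}
  ]
}
"""

data = json.loads(raw)  # JSON object -> dict, JSON array -> list
first_book = data["library"][0]
print(type(data).__name__, type(data["library"]).__name__)
print(first_book["author"])

# Going the other way: Python objects -> JSON text
round_trip = json.dumps(data)
```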

### Using a Web API

Provider defines:
* message format for requests and responses
    * usually XML, JSON, or both
* registration and authentication
    * usually using OAuth (a delegated authorization framework for REST APIs that lets apps obtain limited access to a user's data without the user giving away their password)


Language integration
* might be provided or you might have to do it yourself
* if provided, usually by someone other than the data source
* takes the form of a library API for various languages like Python
* you write a Python program that calls library procedures
* the library formats messages, sends them to the web provider, and translates responses into return values
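
The division of labour described in the last bullets can be sketched as a tiny wrapper library. Everything here is hypothetical: `StopClient`, its `get_stop` method, the `api.example.com` base URL, and the `fake_send` transport are made up for illustration and do not correspond to any real provider's library.

```python
import json
from urllib.parse import urlencode


class StopClient:
    """Hypothetical wrapper around a made-up transit web API."""

    def __init__(self, api_key, send, base_url="https://api.example.com/v1"):
        self.api_key = api_key
        self.base_url = base_url
        # `send` is any function taking (url, headers) and returning response text.
        # Injecting it keeps the sketch runnable without a network connection.
        self.send = send

    def get_stop(self, stop_id):
        # 1. Format the message
        query = urlencode({"apikey": self.api_key})
        url = f"{self.base_url}/stops/{stop_id}?{query}"
        headers = {"Accept": "application/json"}
        # 2. Send it to the web provider
        body = self.send(url, headers)
        # 3. Translate the response into a Python return value
        return json.loads(body)


def fake_send(url, headers):
    """Stand-in for a real HTTP call; echoes the request back as JSON."""
    return json.dumps({"requested_url": url, "accept": headers["Accept"]})


client = StopClient(api_key="abcdefghi", send=fake_send)
result = client.get_stop(61935)
print(result["requested_url"])
```

The caller only sees `client.get_stop(61935)` and a Python dictionary. The URL building, headers, and JSON parsing stay inside the library.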

### Getting JSON Data

We need to select the output format when calling the API:
* e.g., HTTP header: `Accept: application/json`


View in browser or Postman
* good for exploration / debugging

Use `requests.get(...).json()`
* this returns a Python list or dictionary

Get a string and parse
* `import json`
* `x = json.loads(aJSONString)`

Example using the TransLink API  


```
import config as cfg

# Get your own API token from developer.translink.ca
apikey = cfg.translink['key']
```

If you do not want to save your API keys in environment variables, you can store your credentials in a `config.py` file like this:

`translink = {'key':'abcdefghi'}`

Then add this file to your `.gitignore` file. That way, your credentials stay out of version control.

```
import requests

# Ask for JSON responses and reuse the same base URL for every call
headers = {'accept': 'application/JSON'}
base_url = 'http://api.translink.ca/rttiapi/v1'

# x: stop 61935, y: arrival estimates for that stop, z: buses on route 099
x = requests.get('{}/stops/61935?apikey={}'.format(base_url, apikey), headers=headers).json()
y = requests.get('{}/stops/61935/estimates?apikey={}'.format(base_url, apikey), headers=headers).json()
z = requests.get('{}/buses?apikey={}&routeNo=099'.format(base_url, apikey), headers=headers).json()
```

```
y[0]['RouteNo']
# '099'
```

### The Anatomy Of A Request

It’s important to know that a request is made up of four things:

1. The endpoint

2. The method

3. The headers

4. The data (or body)

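1. The endpoint (or route) is the URL you request. It follows the structure `root-endpoint/?`, where the root endpoint is the starting point of the API you are requesting from, for example:

https://api.github.com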

2. The method is the type of request you send to the server. You can choose from the types below:

a. GET - used to get a resource from the server

b. POST - used to create a new resource on the server

c. PUT/PATCH - used to update a resource on the server

d. DELETE - used to delete a resource from the server
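
To see the four parts side by side, here is a minimal sketch that builds a request without sending it. It uses the standard library's `urllib` so it runs without extra packages. The path `/user/repos` and the payload are illustrative assumptions, not part of the original example.

```python
import json
from urllib.parse import urlsplit
from urllib.request import Request

# 1. The endpoint: root endpoint plus a path (the path is illustrative)
endpoint = "https://api.github.com/user/repos"

# 4. The data (or body): what we send to the server, encoded as JSON bytes
payload = json.dumps({"name": "my-new-repo"}).encode("utf-8")

req = Request(
    endpoint,
    data=payload,
    method="POST",                                 # 2. The method
    headers={"Content-Type": "application/json"},  # 3. The headers
)

parts = urlsplit(req.full_url)
print(f"{parts.scheme}://{parts.netloc}")  # root endpoint
print(parts.path)                          # path below the root endpoint
print(req.get_method())
# urllib stores header names capitalized as "Content-type"
print(req.get_header("Content-type"))
print(req.data)
```

Nothing is transmitted until the request is passed to `urllib.request.urlopen`, so this is safe to run while exploring how a request is put together.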

## FLASK

Flask is a micro web framework written in Python. It can create a REST API that allows you to send data and receive a prediction as a response.

Now that you are going to be a data scientist, you cannot always rely on keeping your models in a Jupyter Notebook.

Jupyter Notebooks are awesome for EDA. However, when you need an application that has a predictive model, you will need to deploy your model elsewhere.

You can try to get the best model possible in a notebook or a script. Once you have decided that you have the best model, you must hand it over in a way that the client can run easily in their infrastructure.

For this purpose you need a tool that fits into their infrastructure, preferably in a language that you’re familiar with. This is where you can use Flask.
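
As a rough sketch of what "send data, receive a prediction" can look like, the app below exposes a single endpoint. The route name `/predict`, the JSON field `features`, and the `predict` function are assumptions made for illustration. `predict` is a stand-in for a trained model, not a real one.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(features):
    """Hypothetical stand-in for a trained model.

    A real app would load a fitted model once at start-up
    and call its predict method here.
    """
    return sum(features) / len(features)


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # silent=True returns None instead of raising on a missing or malformed body
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict) or not payload.get("features"):
        return jsonify({"error": "expected a JSON object with a 'features' list"}), 400
    return jsonify({"prediction": predict(payload["features"])})
```

Saved as `app.py`, this could be started with `flask run`, after which a client can POST JSON such as `{"features": [1, 2, 3]}` to `/predict`. Flask's `app.test_client()` allows the same request to be made from a unit test without starting a server.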

## Pros of Flask
- Easy to understand development: Beginner friendly.
- Very flexible: it comes with a template engine too!
- Testing: Unit testing is possible.

## Cons of Flask
- Since it is so easy to get started, it is also easy to write low-quality code and end up with a "bad web application".
- Scalability: The built-in development server is not designed for heavy traffic, so many simultaneous requests will be slow unless the app runs behind a production WSGI server.
- Modules: Many features depend on third-party extensions, and each additional dependency is a potential security risk.

Let's work through an example of how to build an API around our DS models.

Flask is not the only option.

Some people prefer using [Streamlit](https://streamlit.io/) and, if you have to build a dashboard, Plotly [Dash](https://plotly.com/dash/).

## What is Tmux

tmux is a terminal multiplexer.

- Within one terminal window you can open multiple windows and split-views (called “panes” in tmux lingo). 

- Each pane contains its own, independently running terminal instance.

- You won't need to open multiple terminal emulator windows.

### Tmux

Show Tmux and its interactivity for multiple session handling.

You can learn more about it [here](https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/)


Installing tmux:  
`sudo apt-get install tmux` (Ubuntu and derivatives)   
`brew install tmux` (Mac)  

Useful commands in Tmux


- Splitting panes:
`Ctrl + b`, then `%` (press Ctrl+b together, release, then type %)

- Navigating panes:
`Ctrl + b`, then an arrow key

- Exiting a pane:
type `exit`