<hr style="border: 5px solid#0B0B0B;" />
<br>


<br>

<div align="center">
    <img src= "/content/innovation_engineering.png" align="center" width="30%">
</div>


<br>

## FLASK: WEB DEVELOPMENT FOR RAPID AND SCALABLE DEPLOYMENT (OPTIONAL PART 1)



<br>


**Author List:** Elias Castro Hernandez

**About (TL;DR):** The following collection of notebooks introduces developers and data scientists to web development using Flask. Flask is one of many available web frameworks built on the Web Server Gateway Interface (WSGI) that enable rapid development of scalable websites and apps with a relatively accessible learning curve. The barebones nature of Flask is particularly valuable when prototyping and iterating upon products, services, and machine learning applications.

**Learning Goal(s):** Gain an understanding of how to use available libraries and packages to quickly build products and services in real-life settings, using a web-first methodology that is data-driven and end-to-end. In particular, learn how to build a bare-bones Flask environment for handling large-scale email automation tasks.

**Target User:** Data scientists, applied machine learning engineers, and developers

**Prerequisite Knowledge:** (1) Python, (2) HTML, and (3) CSS

**Copyright:** Content curation has been used to expedite the creation of the following learning materials. Credit and copyright belong to the content creators used in facilitating this content. Please support the creators of the resources used by frequenting their sites and social media.

<hr style="border: 2px solid#0B0B0B;" />

#### CONTENTS

> #### PART 1: PREREQUISITE KNOWLEDGE AND REQUIREMENTS

> #### PART 2: SETTING UP AWS SERVER

> #### PART 3: WRAP UP AND NEXT STEPS

#### APPENDIX

> #### REQUIRED PACKAGES: INSTALLATION

<br>


<hr style="border: 2px solid#0B0B0B;" />


#### PART 1
<br>

## PREREQUISITE **KNOWLEDGE** AND **REQUIREMENTS**


In order to create a barebones website that communicates with existing APIs to generate emails, we will need to connect several components. Specifically, we will set up an [**AWS**](https://aws.amazon.com/) account, create an [**Elastic Compute Cloud (EC2)**](https://aws.amazon.com/ec2/) instance, link our [**Flask**](https://flask.palletsprojects.com/en/1.1.x/) environment to the [**Google Sheets API**](https://developers.google.com/sheets/api), and then send emails using [**smtplib**](https://docs.python.org/3/library/smtplib.html) via either a local or remote server. Many theoretical and applied topics underlie the components of the technology stack just mentioned. For the sake of focusing on the implementation, none of the notebooks in this series will dive deeply into any of the required components. Instead, the content is purposefully accessible and designed for self-learning, with the assumption that the user is familiar with the above concepts. In case you are not yet ready but want to follow along, the following links can help you with some of the prerequisites.<br>
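
As a point of reference, a bare-bones Flask application can be as small as the following sketch. The route names (`/` and `/preview`) and the JSON fields are hypothetical and chosen only for illustration; they are not taken from the notebooks in this series.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/")
def index():
    # Simple health-check style landing page.
    return "Flask is running."


@app.route("/preview", methods=["POST"])
def preview():
    # Hypothetical endpoint: build an email draft from posted JSON
    # without sending anything.
    payload = request.get_json(silent=True) or {}
    name = payload.get("name", "there")
    return jsonify(
        subject="Welcome",
        body=f"Hello {name}, thanks for signing up.",
    )
```

Saved as `app.py`, an app like this can typically be started with the `flask run` command. A `POST` to `/preview` with a JSON body such as `{"name": "Ada"}` then returns the drafted email as JSON.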


##### **PYTHON**
This notebook and the executable files are built using [Python](https://www.python.org/) and rely on common Python packages (e.g. [NumPy](https://numpy.org/)) for operation. If you have a lot of programming experience in a different language (e.g. C/C++/Matlab/Java/JavaScript), you will likely be fine. Otherwise, see:

> [**Python (EDX free)**](https://www.edx.org/course?search_query=python)<br>
> [**Python (Coursera free)**](https://www.coursera.org/search?query=python)


##### **AMAZON ELASTIC COMPUTE CLOUD (EC2)**
Amazon Web Services (AWS) is the leading cloud service provider in both revenue and market share. As such, this notebook and associated scripts are written to function with AWS's Elastic Compute Cloud (EC2) service. However, several competitors such as Microsoft's [**Azure**](https://azure.microsoft.com/en-us/) and Google's [**GCP**](https://cloud.google.com/) provide entry pricing and free credits to test out their services. If you elect to use AWS and are not fully familiar with the process of setting up a web server, the following is an approachable resource for getting started:

> [**How to Get Started with Amazon EC2**](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html#how-to-get-started) <br>


##### **AMAZON SIMPLE EMAIL SERVICE (SES)**
Amazon Simple Email Service (SES) is a cost-effective way of sending and receiving emails. In case you want to learn more, see the following accessible resource:

> [**Amazon SES Documentation**](https://docs.aws.amazon.com/ses/) <br>


##### **SMTPLIB**
Python's **smtplib** module, part of the standard library, facilitates the sending of emails through either local or remote SMTP servers. Because it ships with Python at no cost, it is a great alternative to services such as AWS's SES.

> [**Real Python: Sending Emails Using SMTPLIB**](https://realpython.com/python-send-email/)
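
The following is a minimal sketch of how smtplib might be used for simple email automation. The helper names (`build_messages`, `send_all`), the row format, and the example addresses are hypothetical; the rows stand in for data that would come from a spreadsheet. Nothing is sent unless `send_all` is called with real server details.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_messages(sender, rows, subject="Hello"):
    """Create one EmailMessage per row; each row is a dict with 'name' and 'email'."""
    messages = []
    for row in rows:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = row["email"]
        msg["Subject"] = subject
        msg.set_content(f"Hi {row['name']},\n\nThis is an automated message.")
        messages.append(msg)
    return messages


def send_all(messages, host, port, username, password):
    """Send all messages over a single STARTTLS-secured SMTP connection."""
    context = ssl.create_default_context()
    with smtplib.SMTP(host, port) as server:
        server.starttls(context=context)  # upgrade the connection to TLS
        server.login(username, password)
        for msg in messages:
            server.send_message(msg)


# Hypothetical rows standing in for data read from a spreadsheet.
rows = [{"name": "Ada", "email": "ada@example.com"}]
drafts = build_messages("sender@example.com", rows)
print(drafts[0]["To"])
```

Building the messages separately from sending them makes it possible to inspect or log each draft before opening a connection. The host, port, and credentials passed to `send_all` depend on the mail provider; port 587 with STARTTLS is a common choice, but that is an assumption to verify against the provider's documentation.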

<hr style="border: 2px solid#0B0B0B;" />

#### PART 2
<br>

## SETTING UP **AWS SERVER**


<div align="center" style="font-size:12px; font-family:FreeMono; font-weight: 100; font-stretch:ultra-condensed; line-height: 1.0; color:#2A2C2B">
    <img src="/images/replace_with_actual.png" align="center" width="40%" padding="20"><br>
    <br>
    System Architecture and Data Flows
</div>

<br>

<br>

Amazon Web Services, like its competitors, offers service credits to students in an effort to facilitate learning and promote adoption. If you are interested in requesting AWS student credits, click [here](https://aws.amazon.com/blogs/aws/aws-educate-credits-training-content-and-collaboration-for-students-educators/).


___

#### PART 3


<br>

## **BASELINE**, MODEL COMPARISON, AND **RESULTS**


<br>


###### **LOAD PACKAGES**

```python
import ludwig                                               # main library
from ludwig.api import LudwigModel                          # machine learning model API
import torch                                                # tensors
import torch.utils.data as data                             # dataset and data loader utilities
from torchtext import data as torchtext_data                # data preprocessing utilities
from torchtext import datasets                              # get data --> https://pytorch.org/text/datasets.html
from nltk.tokenize.treebank import TreebankWordDetokenizer  # rebuild sentences from token lists
import pandas as pd
import pandas.testing as tm                                 # replaces the deprecated pandas.util.testing
import yaml                                                 # issuing Ludwig commands
import logging                                              # error and operation logs
from pprint import pprint                                   # human readable print
```

**Note:** If this cell raises `ModuleNotFoundError: No module named 'torchtext'`, install the missing package as described in the appendix.

###### **LOAD DATA**
To perform sentiment analysis on the SST dataset, we will be using [**Torchtext**](https://pytorch.org/text/) to gather and split the data, while [**PyTorch**](https://pytorch.org/docs/stable/index.html) is used for preprocessing. These tools are being used for simplicity. If you elect to build from the ground up, you will find the SST data in **stanfordSentimentTreebank** or **stanfordSentimentTreebankRaw** in the data folder. The following provides a gentle introduction to DIY [**data preprocessing**](https://towardsdatascience.com/nlp-text-preprocessing-a-practical-guide-and-template-d80874676e79), while [**mlexplained**](https://mlexplained.com/2018/02/08/a-comprehensive-tutorial-to-torchtext/) has a great intro to Torchtext.

```python
from tqdm import trange  # progress bar used while encoding each split

# NOTE: the following names are not defined in this notebook and must be
# provided beforehand: discriminator, device, class2idx, idx2class,
# add_eos_token, pretrained_model, and a Dataset class wrapping (x, y).

# initialize Field objects describing how text and labels are processed
text = torchtext_data.Field()
label = torchtext_data.Field(sequential=False)  # labels are not tokenized

# get data and split into testing and training --> https://pytorch.org/text/datasets.html#sst
train_data, val_data, test_data = datasets.SST.splits(
    text,
    label,
    fine_grained=True,
    train_subtrees=True,  # use all subtrees in the training set
)

detokenizer = TreebankWordDetokenizer()


def encode_split(split):
    """Detokenize, encode, and label every example in a torchtext split."""
    xs, ys = [], []
    for i in trange(len(split), ascii=True):
        example = vars(split[i])
        seq = detokenizer.detokenize(example["text"])
        seq = discriminator.tokenizer.encode(seq)
        if add_eos_token:
            # 50256 is the end-of-text token id in GPT-2's vocabulary
            # (assumes a GPT-2 style tokenizer)
            seq = [50256] + seq
        xs.append(torch.tensor(seq, device=device, dtype=torch.long))
        ys.append(class2idx[example["label"]])
    return xs, ys


x, y = encode_split(train_data)
train_dataset = Dataset(x, y)

test_x, test_y = encode_split(test_data)
test_dataset = Dataset(test_x, test_y)

discriminator_meta = {
    "class_size": len(idx2class),
    "embed_size": discriminator.embed_size,
    "pretrained_model": pretrained_model,
    "class_vocab": class2idx,
    "default_class": 2,
}
```


___

#### APPENDIX

<br>

## **INSTALLATION:** LUDWIG, PYTORCH, TORCHTEXT, ALTAIR

<br>

**About Ludwig:** Ludwig is an open source deep learning framework, originally built atop TensorFlow, that allows users to rapidly train and iterate on state-of-the-art deep learning models with only a few lines of code.

**About PyTorch:** PyTorch is an open source machine learning library commonly used in language and vision applications. PyTorch is fast through its use of tensors and optimized algorithms, while also being accessible.

**About Torchtext:** Torchtext makes common natural language preprocessing easier and more convenient -- particularly because of built-in functionality that loads, pads, and batches data into whatever your preferred deep learning framework requires.

**About Altair:** Altair is a declarative statistical visualization library for Python built on Vega-Lite.

<br>

### **General Information**

> [**INSTALL LUDWIG**](https://uber.github.io/ludwig/getting_started/)

> [**INSTALL PYTORCH**](https://pytorch.org/get-started/locally/)

> [**INSTALL TORCHTEXT**](https://pytorch.org/text/index.html)

> [**INSTALL ALTAIR**](https://altair-viz.github.io/getting_started/installation.html)

<br>


___



### **Install LUDWIG**


**To Install Ludwig:**

```bash
# recommended approach
pip install ludwig 
```
OR
```bash
# install it by building the source code from the repository:
git clone git@github.com:uber/ludwig.git
cd ludwig
virtualenv -p python3 venv
source venv/bin/activate
pip install -r requirements.txt
python setup.py install
```


**Note:** Ludwig is developed and tested with Python 3 in mind. To install Python 3:

```bash
sudo apt install python3  # on ubuntu
brew install python3      # on mac
```


___

### **Install PYTORCH**

```bash
# preferred method
conda install -c pytorch pytorch
```


___

### **Install TORCHTEXT**

```bash
# preferred method
conda install -c pytorch torchtext
```


___

### **Install ALTAIR**

```bash
# in a new terminal
pip install altair vega_datasets 
```

OR

```bash
conda install -c conda-forge altair vega_datasets
```
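
After installing, a quick check such as the following sketch can confirm which packages are importable before running the notebook. The helper name `missing_packages` is hypothetical; it relies only on the standard library's `importlib.util.find_spec`, which locates a module without importing it.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level import names of the packages covered in this appendix.
required = ["ludwig", "torch", "torchtext", "altair"]
missing = missing_packages(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

Running this in the same environment as the notebook helps catch errors such as `ModuleNotFoundError` before any cells are executed.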


___
___
