# Deep Learning for Developers with Ludwig

<a href="https://imgur.com/exy6IQj"><img src="https://i.imgur.com/exy6IQj.jpg" title="source: imgur.com" /></a>

Ludwig is a toolbox built on top of TensorFlow that allows users to train and test deep learning models without the need to write code.

All you need to provide is a dataset file containing your data, a list of columns to use as inputs, and a list of columns to use as outputs; Ludwig will do the rest. Simple commands can be used to train models both locally and in a distributed way, and to use them to predict on new data.

A programmatic API is also available so that you can use Ludwig from your Python code. A suite of visualization tools allows you to analyze models' training and test performance and to compare them.

### Why use W&B

Think of W&B like GitHub for machine learning models: you save machine learning experiments to your private, hosted dashboard. Experiment quickly with the confidence that all the versions of your models are saved for you, no matter where you're running your scripts.

W&B's lightweight integrations work with any Python script, and all you need to do is sign up for a free W&B account to start tracking and visualizing your models.

We've instrumented the Ludwig repo to automatically log training and evaluation metrics to W&B at each logging step.

### Using W&B with Ludwig
To use Ludwig’s new Weights and Biases integration, just add the `--wandb` parameter to your `ludwig` commands. This allows training runs and experiments to be tracked and explored on the corresponding Weights and Biases page.

Here is an example:

`ludwig train --dataset <DATASET_PATH> --config_file <CONFIG_FILE_PATH> --wandb`

# Installation and demo

In [None]:
! pip install git+http://github.com/uber/ludwig.git -qq
! pip install ludwig[serve] -qq
! pip install wandb -qq 

In [None]:
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

### Text Classification

Text classification, also known as text tagging or text categorization, is the process of categorizing text into organized groups. By using Natural Language Processing (NLP), text classifiers can automatically analyze text and then assign a set of pre-defined tags or categories based on its content.

Unstructured text is everywhere: in emails, chat conversations, websites, and social media. But it’s hard to extract value from this data unless it’s organized in a certain way. Doing so used to be a difficult and expensive process, since it required spending time and resources to manually sort the data or to create handcrafted rules that are difficult to maintain.

Let's build a text classifier using Ludwig.

### Kaggle's AGNews Dataset
AG is a collection of more than 1 million news articles. The articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc.), information retrieval (ranking, search, etc.), XML, data compression, data streaming, and any other non-commercial activity. For more information, please refer to the link http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html .

The articles are divided into 4 classes:
```
World
Sports
Business
Sci/Tech
```
Let's download the dataset. The dataset from Kaggle has been pre-processed and uploaded to W&B as a dataset artifact. It can be downloaded using the API that is generated for each artifact.

In [None]:
import wandb
run = wandb.init()

artifact = run.use_artifact('authors/Classification/AGNews:v0', type='dataset')
artifact_dir = artifact.download()


In [None]:
# Move the files to /content
!mv /content/artifacts/AGNews:v0/final_train.csv  /content/
!mv /content/artifacts/AGNews:v0/final_test.csv  /content/
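
The `mv` commands above hard-code the Colab path `/content/artifacts/AGNews:v0`. As a minimal sketch of a more portable alternative, the directory returned by `artifact.download()` can be used directly. The helper name `collect_csvs` is hypothetical, the sketch copies rather than moves the files, and a temporary directory stands in for the real artifact directory so the example runs offline.

```python
import shutil
import tempfile
from pathlib import Path


def collect_csvs(artifact_dir, destination,
                 names=("final_train.csv", "final_test.csv")):
    """Hypothetical helper: copy the named files from artifact_dir to destination."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        source = Path(artifact_dir) / name
        # shutil.copy returns the path of the newly created file.
        copied.append(Path(shutil.copy(source, destination / name)))
    return copied


# Stand-in for the directory returned by artifact.download().
with tempfile.TemporaryDirectory() as tmp:
    fake_artifact_dir = Path(tmp) / "artifacts"
    fake_artifact_dir.mkdir()
    for name in ("final_train.csv", "final_test.csv"):
        (fake_artifact_dir / name).write_text("ClassIndex,Title,Description\n")

    copied = collect_csvs(fake_artifact_dir, Path(tmp) / "content")
    print([path.name for path in copied])
```

In the notebook itself, the call would presumably look like `collect_csvs(artifact_dir, "/content")`, using the `artifact_dir` variable from the download cell.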

In [None]:
id_to_label = {
   1: 'World', 2: 'Sports', 3: 'Business', 4: 'Sci/Tech'
    }

In [None]:
import pandas as pd

train_csv = pd.read_csv("final_train.csv")
train_csv.head()

## Experiment Tracking
Ludwig comes with native support for Weights and Biases, which lets you log metrics while experimenting with model training so that you don't lose your progress. These metrics can be viewed in your W&B dashboard. To enable W&B logging, pass the additional `--wandb` argument when training.
By default, you are given three choices at runtime:

* (1) Create a W&B account
* (2) Use an existing W&B account
* (3) Don't visualize my results

After you select option 2, you'll be redirected to the authentication page, after which training and logging start.
[Here's the dashboard](https://app.wandb.ai/authors/experiment/runs/yjra94mp?workspace=user-cayush) for the following run.

## Train 
This command lets you train a model from your data. You can call it with:


In [None]:
!ludwig train --dataset final_train.csv --config \
"{ input_features: [{name: Title, type: text}, {name: Description, type: text}], \
   output_features: [{name: ClassIndex, type: category}] }" \
 -g 0 --wandb --experiment_name "Classification"
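
The inline config string can become unwieldy as features are added. As a minimal sketch, the same configuration can be kept as a Python dictionary, written to a file for the `--config_file` flag, or handed to the programmatic API mentioned in the introduction. The file name `config.yaml` and the helper `train_with_api` are illustrative choices, and the API call assumes a Ludwig version whose `LudwigModel` takes `config` and `dataset` arguments that mirror the CLI flags used here.

```python
import json

# Same features as the inline config string in the training command.
config = {
    "input_features": [
        {"name": "Title", "type": "text"},
        {"name": "Description", "type": "text"},
    ],
    "output_features": [
        {"name": "ClassIndex", "type": "category"},
    ],
}

# JSON is also valid YAML, so json.dump avoids a PyYAML dependency.
config_path = "config.yaml"  # illustrative file name
with open(config_path, "w") as f:
    json.dump(config, f, indent=2)


def train_with_api(dataset_path, experiment_name="Classification"):
    """Hypothetical helper: train through the Python API instead of the CLI."""
    # Imported lazily so the rest of this cell runs without Ludwig installed.
    from ludwig.api import LudwigModel

    model = LudwigModel(config)
    return model.train(dataset=dataset_path, experiment_name=experiment_name)
```

Calling `train_with_api("final_train.csv")` would start a training run. W&B logging is not wired up in this sketch; the `--wandb` flag on the CLI remains the route shown in this notebook.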


You get all of these detailed insights about the training process in the W&B dashboard:



*  ## Training and system usage Metrics

<a href="https://imgur.com/PT1FE4V"><img src="https://i.imgur.com/PT1FE4V.gif" title="source: imgur.com" /></a>

* ## Tensorboard Metrics
<a href="https://imgur.com/uZzSe9J"><img src="https://i.imgur.com/uZzSe9J.gif" title="source: imgur.com" /></a>


## Predict
This command lets you use a previously trained model to predict on new data. You can call it with:

In [None]:
!ludwig predict --dataset final_test.csv \
--model_path results/Classification_run/model

Running prediction produces the following files in both CSV and npy format:
* Class Index predictions
* Class Index probabilities for each class
* Highest Class Index probability
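
To make the relationship between these three outputs concrete, here is a small sketch for a single row. The probability values are made up for illustration and are not taken from a real run; the point is only that the predicted class is the one with the highest probability, and that probability is what the third file records.

```python
id_to_label = {1: 'World', 2: 'Sports', 3: 'Business', 4: 'Sci/Tech'}

# Hypothetical per-class probabilities for one test row (made-up numbers).
probabilities = {1: 0.05, 2: 0.80, 3: 0.10, 4: 0.05}

# The prediction is the class index with the largest probability.
predicted_index = max(probabilities, key=probabilities.get)
highest_probability = probabilities[predicted_index]

print(id_to_label[predicted_index], highest_probability)
```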

In [None]:
prediction = pd.read_csv('results/ClassIndex_predictions.csv')

In [None]:
# Check on some random examples
test_dataset = pd.read_csv('final_test.csv')
index = [100,900,575,1100,1500]
for i in index:
  print(test_dataset.iloc[i], '\n Prediction -> ', id_to_label[prediction.iloc[i, 0]], '\n')


## Evaluate
This command lets you use a previously trained model to predict on new data and evaluate the performance of the prediction compared to ground truth. You can call it with:




In [None]:
!ludwig evaluate --dataset final_test.csv \
--model_path results/Classification_run/model

Running evaluation saves the evaluation metrics in the results folder in JSON format.
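
As a sanity check on the reported numbers, a metric such as accuracy can be recomputed by hand from predictions and ground truth. The sketch below uses plain Python and short, made-up label lists; the function names are hypothetical, and this is not a description of how Ludwig computes its metrics internally.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the ground truth."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count how often each (true class, predicted class) pair occurs."""
    return Counter(zip(y_true, y_pred))


# Made-up class indices for illustration only.
y_true = [1, 2, 3, 4, 2, 3]
y_pred = [1, 2, 3, 3, 2, 1]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
for (true_id, pred_id), count in sorted(confusion_counts(y_true, y_pred).items()):
    print(f"true={true_id} predicted={pred_id} count={count}")
```

With the real data, `y_true` would come from the `ClassIndex` column of `final_test.csv` and `y_pred` from the predictions file.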

# Visualize
Ludwig comes with many visualization options. If you want to look at the learning curves of your model for instance, run:

In [None]:
!ludwig visualize \
--visualization learning_curves \
--training_statistics results/Classification_run/training_statistics.json \
--output_directory results/

Using the `--output_directory` argument saves the outputs in the desired directory instead of displaying them directly.

## Serving
This command lets you load a pre-trained model and serve it on an HTTP server. It uses port 8000 by default.
Once the server is up and running, you can pass the parameters defined in the model configuration as inputs.
Example:
```
curl http://0.0.0.0:8000/predict -X POST -F 'Title=Science' -F 'Description=Technology'
```
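
The same request can be issued from Python. The sketch below builds a multipart form request equivalent to the `curl -F` call using only the standard library; the helper name `build_predict_request` is hypothetical, and the request is only constructed here, not sent, so the cell runs without a server.

```python
import urllib.request
import uuid


def build_predict_request(url, fields):
    """Hypothetical helper: encode fields as multipart/form-data, like curl -F."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'
        )
    parts.append(f'--{boundary}--\r\n')
    body = ''.join(parts).encode('utf-8')
    headers = {'Content-Type': f'multipart/form-data; boundary={boundary}'}
    return urllib.request.Request(url, data=body, headers=headers, method='POST')


request = build_predict_request(
    'http://0.0.0.0:8000/predict',
    {'Title': 'Science', 'Description': 'Technology'},
)
print(request.get_method(), request.full_url)
```

Once `ludwig serve` is running, `urllib.request.urlopen(request).read()` would send the request and return the raw response body.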

In [None]:
!ludwig serve --model_path results/Classification_run/model

# Save model artifact
W&B allows you to save your datasets, models, and entire directories along with a graph that represents the relation between the runs and the logged files. These logged files and directories are called Artifacts. As Ludwig's trained models consist of multiple files, we'll log the entire directory using artifacts.

Here's an example artifacts graph:


<a href="https://imgur.com/Y6H4QXZ"><img src="https://i.imgur.com/Y6H4QXZ.png" title="source: imgur.com" /></a>

In [None]:
import wandb
run = wandb.init(project='Classification',name='Artifact_model')
artifact = wandb.Artifact('News_classifier', type='model')

In [None]:
artifact.add_dir('results/Classification_run/model')

In [None]:
run.log_artifact(artifact)

### Load the saved artifact
The saved artifacts can be accessed using the API that automatically gets generated once you upload the artifact. You can then perform transfer learning, prediction or even deploy the models downloaded using artifacts.
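
A minimal sketch of loading the model artifact logged above follows. It assumes the artifact name `News_classifier` and project `Classification` from the earlier cells, and relies on the `latest` alias that W&B assigns to the newest version. The helper names are hypothetical, and `wandb` is imported lazily so the cell can be read and run without a network connection.

```python
def artifact_reference(name, alias="latest"):
    """Build the 'name:alias' string that use_artifact expects."""
    return f"{name}:{alias}"


def download_model(reference, project="Classification"):
    """Hypothetical helper: download a logged model directory and return its path."""
    # Imported lazily; calling this function needs wandb and a logged-in account.
    import wandb

    run = wandb.init(project=project, job_type="inference")
    artifact = run.use_artifact(reference, type="model")
    model_dir = artifact.download()
    run.finish()
    return model_dir


reference = artifact_reference("News_classifier")
print(reference)
```

The directory returned by `download_model(reference)` could then be passed as `--model_path` to `ludwig predict` or `ludwig serve`.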

# Other Features
The Ludwig CLI supports many other useful commands, such as:
```
hyperopt: Perform hyperparameter optimization
collect_summary: Prints names of weights and layer activations to use with other collect commands
collect_weights: Collects tensors containing a pretrained model's weights
collect_activations: Collects tensors for each datapoint using a pretrained model
export_savedmodel: Exports Ludwig models to SavedModel
export_neuropod: Exports Ludwig models to Neuropod
preprocess: Preprocesses data and saves it into HDF5 and JSON format
synthesize_dataset: Creates synthetic data for testing purposes
```
To learn more about these features, please visit the [official Ludwig docs](https://ludwig-ai.github.io/ludwig-docs/user_guide/).