In [2]:
!pip install mlflow

Collecting mlflow
  Downloading mlflow-1.30.0-py3-none-any.whl (17.0 MB)
Collecting querystring-parser<2
  Using cached querystring_parser-1.2.4-py2.py3-none-any.whl (7.9 kB)
Collecting gitpython<4,>=2.1.0
  Using cached GitPython-3.1.31-py3-none-any.whl (184 kB)
Collecting databricks-cli<1,>=0.8.7
  Using cached databricks-cli-0.17.5.tar.gz (82 kB)
  Preparing metadata (setup.py) ... done
Collecting prometheus-flask-exporter<1
  Downloading prometheus_flask_exporter-0.22.3-py3-none-any.whl (18 kB)
Collecting gitdb<5,>=4.0.1
  Using cached gitdb-4.0.10-py3-none-any.whl (62 kB)
Collecting smmap<6,>=3.0.1
  Using cached smmap-5.0.0-py3-none-any.whl (24 kB)
Building wheels for collected packages: databricks-cli
  Building wheel for databricks-cli (setup.py) ... done
  Created wheel for databricks-cli: filename=databricks_cli-0.17.5-py

To package up a TensorFlow model into an MLflow package with accompanying code examples, you can follow these general steps:

Train and save your TensorFlow model: Train your TensorFlow model and save it in a format that can be loaded later, for example with `tf.saved_model.save` or `model.save`.

Set up MLflow tracking: Initialize an MLflow tracking server or set up a local tracking directory to store your experiment runs and artifacts.

Create an MLflow project: Create an MLflow project with a directory structure that includes your TensorFlow model and any necessary dependencies or code examples. The directory structure should look something like this:

```
my_project/
├── MLproject
├── conda.yaml
├── code/
│   ├── train.py
│   ├── predict.py
│   └── ...
└── model/
    └── my_model/
        ├── saved_model.pb
        └── variables/
            ├── variables.data-00000-of-00001
            └── variables.index
```
Define the MLproject file: In the MLproject file, define the entry points for training and predicting with your TensorFlow model. You can also specify the required parameters and dependencies.

Create a conda environment file: In the conda.yaml file, define the required dependencies for your project.

Write code examples: In the code directory, write code examples that demonstrate how to use your TensorFlow model for training and prediction. These examples should use the entry points defined in the MLproject file.

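Run and share the project: An MLflow project is just a directory (or Git repository) that contains an MLproject file, so no separate packaging command is needed. Run an entry point with `mlflow run my_project -e train`, overriding parameters with `-P name=value`, and share the project by pushing the directory to a Git repository that others can pass to `mlflow run`.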

Publish the model: Log the saved TensorFlow model to an MLflow tracking server as a run artifact, and optionally register it in the MLflow Model Registry, so that others can download and use it.
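The tracking and publishing steps above can be sketched in Python. This is a minimal sketch, not the notebook's own method: it assumes a local `./mlruns` tracking directory and stores the SavedModel folder as a plain run artifact. The function name `log_saved_model` and its arguments are hypothetical.

```python
def log_saved_model(model_dir, params, metrics, tracking_uri="file:./mlruns"):
    """Log a saved model directory, plus params and metrics, to one MLflow run.

    Hypothetical helper: the name, arguments, and default tracking URI are
    assumptions made for illustration.
    """
    import mlflow  # imported lazily so defining the helper does not require MLflow

    mlflow.set_tracking_uri(tracking_uri)
    with mlflow.start_run() as run:
        mlflow.log_params(params)      # e.g. {"batch_size": 32}
        mlflow.log_metrics(metrics)    # e.g. {"test_auc": 0.9898}
        # Copy every file under model_dir into the run's "model" artifact folder
        mlflow.log_artifacts(model_dir, artifact_path="model")
    return run.info.run_id
```

A call such as `log_saved_model("FancySpamModel", {"batch_size": 32}, {"test_auc": 0.9898})` would return the run ID under which the files were stored.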

Here's an example of what the MLproject file might look like:
```
name: my_project
conda_env: conda.yaml

entry_points:
  train:
    command: "python code/train.py --data-path {data_path} --model-path {model_path}"
    parameters:
      data_path: {type: string, default: data/}
      model_path: {type: string, default: model/}
  predict:
    command: "python code/predict.py --model-path {model_path} --input-path {input_path} --output-path {output_path}"
    parameters:
      model_path: {type: string, default: model/}
      input_path: {type: string, default: data/test.tfrecord}
      output_path: {type: string, default: predictions.csv}
```
In this example, there are two entry points: train and predict. The train entry point runs the train.py script and takes two parameters: data_path and model_path. The predict entry point runs the predict.py script and takes three parameters: model_path, input_path, and output_path. These parameters are specified as command line arguments and can be set when running the MLflow project.
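For the `train` entry point to work, `code/train.py` has to accept the same command line flags that the MLproject command passes. A minimal sketch of that argument parsing follows. The function name `parse_train_args` and the help strings are assumptions, and the training logic itself is omitted.

```python
import argparse


def parse_train_args(argv=None):
    """Parse the flags that the `train` entry point passes to code/train.py."""
    parser = argparse.ArgumentParser(description="Train the TensorFlow model.")
    # Flag names mirror the MLproject command; defaults mirror its parameters
    parser.add_argument("--data-path", default="data/", help="Directory with training data")
    parser.add_argument("--model-path", default="model/", help="Where to save the trained model")
    return parser.parse_args(argv)


# Simulate the command line that MLflow would build from the entry point
args = parse_train_args(["--data-path", "data/", "--model-path", "model/"])
print(args.data_path, args.model_path)
```

Passing an explicit list to `parse_train_args` keeps the sketch runnable inside a notebook. In a real script, calling it with no arguments reads `sys.argv`.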

In [3]:
import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import mlflow

In [12]:
df = pd.read_csv('spam.csv', encoding_errors='ignore')

In [13]:
df = df.iloc[:,0:2]

In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5572 entries, 0 to 5571
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   v1      5572 non-null   object
 1   v2      5572 non-null   object
dtypes: object(2)
memory usage: 87.2+ KB


In [16]:
df['v1'].value_counts()

ham     4825
spam     747
Name: v1, dtype: int64
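The classes are imbalanced, so it helps to know what accuracy a trivial "always ham" classifier would reach before judging the model. The short calculation below uses only the counts printed above. The variable names are illustrative.

```python
ham_count = 4825
spam_count = 747
total = ham_count + spam_count

# Accuracy of a classifier that labels every message as ham
baseline_accuracy = ham_count / total
spam_fraction = spam_count / total

print(f"Total messages: {total}")
print(f"Always-ham baseline accuracy: {baseline_accuracy:.2%}")
print(f"Spam fraction: {spam_fraction:.2%}")
```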

In [17]:
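def get_sequences(texts, tokenizer, train=True, max_seq_length=None):
    """Convert texts to integer sequences and zero-pad them at the end to a common length."""
    sequences = tokenizer.texts_to_sequences(texts)

    if train:
        # For training data, pad to the longest sequence in the set
        max_seq_length = max(len(seq) for seq in sequences)

    # For test data, the caller passes in the training length so shapes match
    sequences = tf.keras.preprocessing.sequence.pad_sequences(
        sequences, maxlen=max_seq_length, padding='post'
    )

    return sequences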
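To make the padding step concrete, here is a small NumPy sketch of what `padding='post'` does. It is an illustration, not the Keras implementation: the helper name `pad_post` is made up, it assumes `maxlen >= 1`, and it drops tokens from the front of sequences that are too long, which is the Keras default truncation.

```python
import numpy as np


def pad_post(sequences, maxlen):
    """Zero-pad integer sequences at the end so every row has length maxlen."""
    padded = np.zeros((len(sequences), maxlen), dtype=np.int32)
    for row, seq in enumerate(sequences):
        kept = seq[-maxlen:]  # keep the last maxlen tokens of over-long sequences
        padded[row, :len(kept)] = kept
    return padded


example = pad_post([[5, 2], [7], [1, 2, 3, 4]], maxlen=3)
print(example)
# [[5 2 0]
#  [7 0 0]
#  [2 3 4]]
```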

In [50]:
def preprocess_inputs(df):
    df = df.copy()
    
    df['v1'] = df['v1'].apply(lambda x: 1 if x =='spam' else 0)
     
    # split into X and y
    y = df.iloc[:,0] 
    X = df.iloc[:,1]
    
    # create train test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size= 0.7, shuffle=True, random_state=1234) #shuffle again
    
    # create and fit tokenizers
    tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=8000)
    tokenizer.fit_on_texts(X_train)

    # Convert text to sequences
    X_train = get_sequences(X_train, tokenizer, train=True)
    X_test = get_sequences(X_test, tokenizer, train=False, max_seq_length=X_train.shape[1])
    
    return X_train, X_test, y_train, y_test, tokenizer

In [51]:
X_train, X_test, y_train, y_test, tokenizer = preprocess_inputs(df)

In [52]:
inputs = tf.keras.Input(shape=(X_train.shape[1],))

In [53]:
embedding = tf.keras.layers.Embedding(
    input_dim = tokenizer.num_words, # matches the tokenizer's num_words cap (8000 token ids)
    output_dim = 64
)(inputs)

# each of the 8000 possible token ids is mapped to a learned 64-dimensional vector

In [54]:
flatten = tf.keras.layers.Flatten()(embedding) # unstack the (189, 64) matrix into one long vector of 12,096 values

In [55]:
outputs = tf.keras.layers.Dense(1, activation = 'sigmoid')(flatten) # one sigmoid node for binary output; for multiclass use N nodes with softmax

In [56]:
model = tf.keras.Model(inputs=inputs, outputs=outputs)

In [57]:
model.compile(
    optimizer = 'adam',
    loss = 'binary_crossentropy', # 'sparse_categorical_crossentropy' for multiclass
    metrics = ['accuracy', tf.keras.metrics.AUC(name='auc')]
)

In [58]:
model.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_4 (InputLayer)        [(None, 189)]             0         
                                                                 
 embedding_1 (Embedding)     (None, 189, 64)           512000    
                                                                 
 flatten_1 (Flatten)         (None, 12096)             0         
                                                                 
 dense_1 (Dense)             (None, 1)                 12097     
                                                                 
Total params: 524,097
Trainable params: 524,097
Non-trainable params: 0
_________________________________________________________________
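The parameter counts in the summary can be checked by hand. The arithmetic below uses the vocabulary size, embedding width, and sequence length shown above. The variable names are illustrative.

```python
vocab_size = 8000      # tokenizer.num_words
embedding_dim = 64     # output_dim of the Embedding layer
seq_length = 189       # padded sequence length from the summary

# One 64-dimensional vector per token id
embedding_params = vocab_size * embedding_dim

# Flatten turns the (189, 64) matrix into a single vector
flatten_units = seq_length * embedding_dim

# One weight per flattened unit plus one bias for the single output node
dense_params = flatten_units * 1 + 1

total_params = embedding_params + dense_params
print(embedding_params, flatten_units, dense_params, total_params)
# 512000 12096 12097 524097
```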


In [59]:
history = model.fit(
    X_train,
    y_train,
    validation_split= 0.2,
    batch_size = 32,
    epochs = 100,
    callbacks= [
        tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience= 3,
            restore_best_weights= True
        )
    ]
)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
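Training stopped after 8 of the 100 allowed epochs because of the early stopping callback. The sketch below shows the patience logic in plain Python. It is a simplified illustration rather than the Keras implementation, and the validation losses are made-up values, since the notebook output does not show the real ones.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stopped_epoch, best_epoch), both 1-based, for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Hypothetical losses: improvement through epoch 5, then three worse epochs
hypothetical_losses = [0.30, 0.15, 0.10, 0.08, 0.07, 0.075, 0.08, 0.09, 0.10]
stopped, best = early_stopping_epoch(hypothetical_losses, patience=3)
print(stopped, best)  # 8 5
```

With `restore_best_weights=True`, the model keeps the weights from the best epoch rather than from the epoch at which training stopped.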


In [60]:
results = model.evaluate(X_test, y_test, verbose=0)

In [61]:
print("    Test Loss: {:.4f}".format(results[0]))
print("Test Accuracy: {:.2f}%".format(results[1] * 100))
print("     Test AUC: {:.4f}".format(results[2]))

    Test Loss: 0.0628
Test Accuracy: 98.33%
     Test AUC: 0.9898


In [63]:
model.save(filepath='FancySpamModel')



INFO:tensorflow:Assets written to: FancySpamModel/assets




In [64]:
new_model = tf.keras.models.load_model('FancySpamModel')

In [65]:
# count the test rows the reloaded model misclassifies at a 0.5 threshold
(np.squeeze(np.array(new_model.predict(X_test) >= 0.5, dtype=int)) != y_test).sum()

28

In [66]:
# same count for the original in-memory model; it should match the reloaded one
(np.squeeze(np.array(model.predict(X_test) >= 0.5, dtype=int)) != y_test).sum()

28
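The one-liners above threshold the sigmoid outputs at 0.5 and count disagreements with the labels. The sketch below spells that out on a toy example. The helper name `count_misclassified` and the sample values are made up. The final check assumes the test set has 1672 rows (5572 minus a 3900-row training split), which is inferred from the 70/30 split and not shown in the notebook.

```python
import numpy as np


def count_misclassified(probabilities, labels, threshold=0.5):
    """Count rows where the thresholded prediction differs from the label."""
    predicted = (np.asarray(probabilities).reshape(-1) >= threshold).astype(int)
    return int((predicted != np.asarray(labels)).sum())


# Toy example: model output has shape (n, 1), labels have shape (n,)
toy_probabilities = np.array([[0.91], [0.08], [0.55], [0.40]])
toy_labels = np.array([1, 0, 0, 1])
toy_errors = count_misclassified(toy_probabilities, toy_labels)
print(toy_errors)  # 2

# Consistency check against the reported test accuracy
assumed_test_rows = 1672  # assumption, see the note above
implied_accuracy = 1 - 28 / assumed_test_rows
print(f"{implied_accuracy:.2%}")  # 98.33%
```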

In [None]:
# Looking good. If I can get this up to garden, then pull it down and get matching results, we have a success!
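A stricter round-trip check than comparing error counts is to compare the raw predictions of the original and the downloaded model. The helper below is a hypothetical sketch of that comparison. The name `predictions_match` and the tolerance are assumptions, and the arrays stand in for the outputs of `model.predict(X_test)`.

```python
import numpy as np


def predictions_match(original, reloaded, atol=1e-6):
    """Return True if two prediction arrays have the same shape and nearly equal values."""
    original = np.asarray(original)
    reloaded = np.asarray(reloaded)
    return original.shape == reloaded.shape and bool(np.allclose(original, reloaded, atol=atol))


# Stand-in arrays for illustration only
before = np.array([[0.91], [0.08], [0.55]])
after_same = before.copy()
after_changed = before + 0.01

print(predictions_match(before, after_same))     # True
print(predictions_match(before, after_changed))  # False
```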