The data for this Jupyter Notebook is sourced from the following web page: [GitHub - Lora for sequence classification with Roberta-Llama-Mistral](https://github.com/mehdiir/Roberta-Llama-Mistral/blob/main/Lora-for-sequence-classification-with-Roberta-Llama-Mistral.md)


In [4]:
import torch
from transformers import BertModel

# Set the quantization engine: "fbgemm" on x86, "qnnpack" on ARM.
torch.backends.quantized.engine = "qnnpack"  # running on ARM

# Load the default BERT model
model = BertModel.from_pretrained('bert-base-uncased')

# Apply dynamic quantization to all linear layers, converting them to int8
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Save the quantized model state dictionary
torch.save(quantized_model.state_dict(), "bert_int8_quantized.pth")
print("Quantized model saved as bert_int8_quantized.pth")

Quantized model saved as bert_int8_quantized.pth


In [6]:
MAX_LEN = 512 
roberta_checkpoint = "roberta-large"
mistral_checkpoint = "mistralai/Mistral-7B-v0.1"
llama_checkpoint = "meta-llama/Llama-2-7b-hf"

## Data preparation

### Read in the training data from csv

In [7]:
import pandas as pd
import os
DATA_PATH = "../data/"
train_df=pd.read_csv(os.path.join(DATA_PATH, 'enron_labeled_curated_train.csv'))
test_df=pd.read_csv(os.path.join(DATA_PATH, 'enron_labeled_curated_test.csv'))
# Dummy target column so that test and train can be merged into one Hugging Face dataset
test_df['target'] = 0
train_df.info()
test_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2236 entries, 0 to 2235
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   search_phrase    2236 non-null   object 
 1   label            2236 non-null   int64  
 2   email            2236 non-null   object 
 3   mistral_pred     2236 non-null   float64
 4   openhermes_pred  2236 non-null   float64
 5   vicuna_pred      2236 non-null   float64
 6   gemma_pred       2236 non-null   float64
dtypes: float64(4), int64(1), object(2)
memory usage: 122.4+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 559 entries, 0 to 558
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   search_phrase    559 non-null    object 
 1   label            559 non-null    int64  
 2   email            559 non-null    object 
 3   mistral_pred     559 non-null    float64
 4   openhermes_pred  559 non-null  

As the classes are not balanced, we will compute the positive and negative weights and use them for loss calculation later:

In [8]:
print(train_df.label.value_counts())
pos_weights = len(train_df) / (2 * train_df.label.value_counts()[1])
neg_weights = len(train_df) / (2 * train_df.label.value_counts()[0])
POS_WEIGHT, NEG_WEIGHT = (pos_weights, neg_weights)
print(POS_WEIGHT, NEG_WEIGHT)

label
0    1918
1     318
Name: count, dtype: int64
3.5157232704402515 0.5828988529718456
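
The weights above follow the common "balanced" heuristic, n_samples / (n_classes * class_count). This section does not show the loss itself, so the following is only a minimal sketch of how such weights could enter a binary cross-entropy; `weighted_log_loss` is a hypothetical helper, not the notebook's implementation.

```python
import math

# Class counts taken from the value_counts() output above
n_neg, n_pos = 1918, 318
n_total = n_neg + n_pos

# Balanced weights: n_samples / (n_classes * class_count)
pos_weight = n_total / (2 * n_pos)
neg_weight = n_total / (2 * n_neg)

def weighted_log_loss(y_true, p_pos, pos_w, neg_w, eps=1e-12):
    """Hypothetical helper: class-weighted binary cross-entropy, averaged over samples."""
    total = 0.0
    for y, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -pos_w * math.log(p)
        else:
            total += -neg_w * math.log(1.0 - p)
    return total / len(y_true)

# After reweighting, both classes carry the same total weight
print(pos_weight * n_pos, neg_weight * n_neg)

# An equally confident mistake costs more on a positive than on a negative
loss_pos = weighted_log_loss([1], [0.1], pos_weight, neg_weight)
loss_neg = weighted_log_loss([0], [0.9], pos_weight, neg_weight)
print(loss_pos / loss_neg)
```

The ratio of the two losses equals the class imbalance (1918 / 318, roughly 6), which is the effect the weights are meant to have.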


In [9]:
# Then, we compute the maximum length of the 'email' column:
# Number of characters
max_char = train_df['email'].str.len().max()
# Number of words
max_words = train_df['email'].str.split().str.len().max()
print(f"The maximum number of characters is {max_char}.")
print(f"The maximum number of words is {max_words}.")

The maximum number of characters is 2983.
The maximum number of words is 650.


In [10]:
import nltk

nltk.download('punkt')

def split_into_sentences(text, max_words=200):
    sentences = nltk.sent_tokenize(text)
    chunks = []
    current_chunk = []
    current_word_count = 0

    for sentence in sentences:
        word_count = len(sentence.split())
        if current_word_count + word_count <= max_words:
            current_chunk.append(sentence)
            current_word_count += word_count
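        else:
            # Close the current chunk; skip it if empty (first sentence alone exceeds max_words)
            if current_chunk:
                chunks.append(" ".join(current_chunk))
            current_chunk = [sentence]
            current_word_count = word_count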

    if current_chunk:
        chunks.append(" ".join(current_chunk))

    return chunks

train_df['email_sentences'] = train_df['email'].apply(lambda x: split_into_sentences(x, max_words=200))
train_df.head()

[nltk_data] Downloading package punkt to /Users/kariato/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Unnamed: 0,search_phrase,label,email,mistral_pred,openhermes_pred,vicuna_pred,gemma_pred,email_sentences
0,mark to market accounting practices,0,Subject: RE: Inconsistant Credit AmountsIssue ...,0.0,0.2,0.7,0.2,[Subject: RE: Inconsistant Credit AmountsIssue...
1,RANDOM,0,Subject: Copy Playstaion FREE ...,0.0,0.0,0.7,0.0,[Subject: Copy Playstaion FREE ...
2,RANDOM,0,Subject: Confidentiality Agreement - PETRONAS ...,0.0,0.0,0.0,0.0,[Subject: Confidentiality Agreement - PETRONAS...
3,it appears that some Enron employees used dumm...,0,Subject: Demand Ken Lay Donate Proceeds from E...,0.0,0.0,0.9,0.0,[Subject: Demand Ken Lay Donate Proceeds from ...
4,mark to market accounting practices,0,Subject: New EBS Procedures for Risk Managemen...,0.0,0.1,0.7,0.0,[Subject: New EBS Procedures for Risk Manageme...
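
To see the greedy packing in isolation, here is a minimal sketch that takes already-split sentences, so it runs without NLTK. `pack_sentences` is a hypothetical stand-in for the packing loop inside `split_into_sentences`; it guards against emitting an empty chunk when the very first sentence is longer than `max_words`.

```python
def pack_sentences(sentences, max_words=200):
    """Greedily pack consecutive sentences into chunks of at most max_words words.

    A single sentence longer than max_words still becomes its own chunk.
    """
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if count + n <= max_words:
            current.append(sentence)
            count += n
        else:
            if current:  # avoid an empty chunk when the first sentence is too long
                chunks.append(" ".join(current))
            current, count = [sentence], n
    if current:
        chunks.append(" ".join(current))
    return chunks

sentences = ["One two three.", "Four five.", "Six seven eight nine.", "Ten."]
chunks = pack_sentences(sentences, max_words=5)
print(chunks)
```

Sentences are never split, so a chunk can exceed `max_words` only when one sentence alone is longer than the limit.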


In [12]:
# Create a new dataframe where each element in the list from train_df['email_sentences'] becomes its own row.
# We keep the original 'email' and 'search_phrase' columns.
new_df = train_df[['email', 'search_phrase', 'email_sentences','label']].explode('email_sentences')
# Optionally, rename 'email_sentences' to 'sentence'
new_df.rename(columns={'email_sentences': 'sentence'}, inplace=True)

# Display the first few rows to verify the structure
new_df.head()

Unnamed: 0,email,search_phrase,sentence,label
0,Subject: RE: Inconsistant Credit AmountsIssue ...,mark to market accounting practices,Subject: RE: Inconsistant Credit AmountsIssue ...,0
1,Subject: Copy Playstaion FREE ...,RANDOM,Subject: Copy Playstaion FREE ...,0
1,Subject: Copy Playstaion FREE ...,RANDOM,The PlayStation ?FFFFAE Wizard costs less than...,0
2,Subject: Confidentiality Agreement - PETRONAS ...,RANDOM,Subject: Confidentiality Agreement - PETRONAS ...,0
2,Subject: Confidentiality Agreement - PETRONAS ...,RANDOM,Please note that I have added Clay Harris' \nn...,0
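
`DataFrame.explode` repeats the original row index for every element of the list, which is why the index values 1 and 2 each appear twice above. A small sketch on toy data (the frame below is made up for illustration) shows the behavior and one way to get a unique index back with `reset_index(drop=True)`:

```python
import pandas as pd

# Toy data: one row per email, with a list of chunks per row
toy = pd.DataFrame({
    "email": ["first email", "second email"],
    "label": [0, 1],
    "email_sentences": [["chunk a"], ["chunk b", "chunk c"]],
})

# One row per chunk; the other columns are repeated
exploded = toy.explode("email_sentences").rename(columns={"email_sentences": "sentence"})
repeated_index = exploded.index.tolist()
print(repeated_index)  # the source row index is repeated for each chunk

# A fresh 0..n-1 index avoids ambiguous label-based lookups later on
exploded = exploded.reset_index(drop=True)
print(exploded)
```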


In [None]:
import torch
from transformers import BertModel, BertTokenizer

# Set the quantization engine (use "fbgemm" for most x86 CPUs)
torch.backends.quantized.engine = "qnnpack"

# Load the base BERT model architecture
model = BertModel.from_pretrained('bert-base-uncased')

# Apply dynamic quantization to the model's linear layers (to match the saved quantized model)
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Load the saved quantized model's state dictionary
state_dict = torch.load("bert_int8_quantized.pth", map_location=torch.device("cpu"))
quantized_model.load_state_dict(state_dict)
quantized_model.eval()  # Set the model to evaluation mode

# Set up the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Prepare an input sentence
text = "This is a test sentence."
inputs = tokenizer(text, return_tensors="pt")

# Run inference using the quantized model
with torch.no_grad():
    outputs = quantized_model(**inputs)

# For example, retrieve the pooled output (the [CLS] hidden state passed through BERT's pooler layer)
print("Pooled output:")
print(outputs.pooler_output)



Pooled output:
tensor([[-0.9373, -0.5128, -0.9141,  0.8618,  0.7192, -0.2890,  0.9293,  0.3266,
         -0.7994, -1.0000, -0.2393,  0.9231,  0.9807,  0.5595,  0.9350, -0.8071,
         -0.3153, -0.6337,  0.3625, -0.7238,  0.6890,  1.0000,  0.3534,  0.4075,
          0.4939,  0.9765, -0.7327,  0.9264,  0.9589,  0.7182, -0.8307,  0.2615,
         -0.9869, -0.2316, -0.9503, -0.9926,  0.5235, -0.6903, -0.0414, -0.1181,
         -0.9018,  0.3827,  1.0000, -0.5312,  0.4407, -0.4065, -1.0000,  0.3352,
         -0.8971,  0.8829,  0.8943,  0.8341,  0.2623,  0.6010,  0.6033, -0.4702,
         -0.0235,  0.2640, -0.3357, -0.6561, -0.6743,  0.4375, -0.7627, -0.9282,
          0.8246,  0.8268, -0.2709, -0.4294, -0.2558,  0.0183,  0.9287,  0.3908,
         -0.3416, -0.8182,  0.7300,  0.3304, -0.6114,  1.0000, -0.6179, -0.9747,
          0.8459,  0.7759,  0.5971, -0.2761,  0.6602, -1.0000,  0.6470, -0.1280,
         -0.9849,  0.1626,  0.6038, -0.3695,  0.8452,  0.6365, -0.6809, -0.5727,
         -0.4

In [14]:
from transformers import BertModel, BertTokenizer
import torch

# load the BERT tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# BertModel.from_pretrained expects a model directory or a Hub identifier, not a raw
# state-dict file such as bert_int8_quantized.pth, so reuse the quantized model that
# was rebuilt and loaded with load_state_dict in the previous cell
model = quantized_model

def sentence_to_embedding(sentence):
    # tokenize the sentence, padding or truncating to MAX_LEN (defined above)
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True, padding="max_length", max_length=MAX_LEN)
    with torch.no_grad():
        outputs = model(**inputs)
    # extract the [CLS] token embedding (first token)
    cls_embedding = outputs.last_hidden_state[0, 0].numpy()
    return cls_embedding

# Convert the 'sentence' column of new_df to 768-dimensional embeddings
new_df['embedding'] = new_df['sentence'].apply(sentence_to_embedding)
print(new_df['embedding'].iloc[0].shape)  # Expected output: (768,)

In [36]:
for i in range(768):
    new_df[f'embedding_{i}'] = new_df['embedding'].apply(lambda x: x[i])

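
Inserting hundreds of columns one at a time tends to trigger pandas' fragmentation `PerformanceWarning`. As a sketch of an alternative, the vectors can be stacked into one block and attached with a single `pd.concat`. The toy frame and the 4-dimensional vectors below are made up for illustration; the notebook's embeddings have 768 dimensions.

```python
import numpy as np
import pandas as pd

# Toy frame: three rows, each holding a 4-dimensional vector (stand-in for 768)
df = pd.DataFrame({
    "sentence": ["a", "b", "c"],
    "embedding": [np.arange(4, dtype=np.float32) + i for i in range(3)],
})

dim = len(df["embedding"].iloc[0])
columns = [f"embedding_{i}" for i in range(dim)]

# Stack the per-row vectors into one (n_rows, dim) block, reusing the frame's index
wide = pd.DataFrame(np.vstack(df["embedding"].tolist()), index=df.index, columns=columns)

# Attach all new columns in one operation
df = pd.concat([df, wide], axis=1)
print(df.filter(like="embedding_").shape)
```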

In [29]:
new_df.to_csv(os.path.join(DATA_PATH, 'enron_labeled_curated_train_embeddings.csv'), index=False)

In [16]:
!pip install torch
!pip install transformers
!pip install scikit-learn

Collecting torch
  Downloading torch-2.6.0-cp311-none-macosx_11_0_arm64.whl.metadata (28 kB)
Collecting sympy==1.13.1 (from torch)
  Using cached sympy-1.13.1-py3-none-any.whl.metadata (12 kB)
Downloading torch-2.6.0-cp311-none-macosx_11_0_arm64.whl (66.5 MB)
Using cached sympy-1.13.1-py3-none-any.whl (6.2 MB)
Installing collected packages: sympy, torch
  Attempting uninstall: sympy
    Found existing installation: sympy 1.13.3
    Uninstalling sympy-1.13.3:
      Successfully uninstalled sympy-1.13.3
Successfully installed sympy-1.13.1 torch-2.6.0
Collecting transformers
  Downloading transformers-4.50.3-py3-none-any.whl.metadata (39 kB)
Collecting huggingface-hub<1.0,>=0.26.0 (from transformers)
  Downloading huggingface_hub-0.30.1-py3-none-any.whl.metadata (13 kB)
Collecting tokenizers<0.22,>=0.21 (from transformers)
  Downloading tokenizers-

In [61]:
from sklearn.model_selection import train_test_split

# Split new_df into train and temp (dev + test)
train_df, temp_df = train_test_split(new_df, test_size=0.2, random_state=42)

# Split temp into dev and test
dev_df, test_df = train_test_split(temp_df, test_size=0.5, random_state=42)

# Print the sizes of the splits
print(f"Train size: {len(train_df)}")
print(f"Dev size: {len(dev_df)}")
print(f"Test size: {len(test_df)}")

Train size: 3120
Dev size: 390
Test size: 391
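
The two-stage split gives roughly 80/10/10. The sketch below reproduces the arithmetic, assuming the 3901 rows implied by the sizes above and scikit-learn's rule of rounding the test share up when `test_size` is a fraction; `split_sizes` is a hypothetical helper written for this illustration.

```python
import math

def split_sizes(n_samples, test_size):
    """Sizes (train, test) for a fractional test_size: test is rounded up, train gets the rest."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n_rows = 3120 + 390 + 391  # total rows, from the printed sizes above
n_train, n_temp = split_sizes(n_rows, 0.2)   # train vs. temp (dev + test)
n_dev, n_test = split_sizes(n_temp, 0.5)     # temp split into dev and test
print(n_train, n_dev, n_test)
```

Because the split is made per row, chunks of the same email can land in different splits.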


In [62]:
import pandas as pd
import numpy as np
from autogluon.tabular import TabularPredictor


In [None]:

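# Drop the columns that contain 'embeeding_' in their names.
# Note: 'embeeding_' is misspelled, so this filter matches nothing and the
# 'embedding_*' columns are all kept, as the output below shows.
new_df = new_df[[col for col in new_df.columns if 'embeeding_' not in col]]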

# Display the first few rows to verify the structure
new_df.head()

# Save the new dataframe to a CSV file
new_df.to_csv('train_sentences.csv', index=False)

# Load the new dataframe from the CSV file
train_df = pd.read_csv('train_sentences.csv')

# Display the first few rows to verify the structure
train_df.head()

Unnamed: 0,email,search_phrase,sentence,label,embedding,embedding_0,embedding_1,embedding_2,embedding_3,embedding_4,...,embedding_758,embedding_759,embedding_760,embedding_761,embedding_762,embedding_763,embedding_764,embedding_765,embedding_766,embedding_767
0,Subject: RE: Inconsistant Credit AmountsIssue ...,mark to market accounting practices,Subject: RE: Inconsistant Credit AmountsIssue ...,0,"[-0.4401801526546478, 0.01315762847661972, 0.1...",-0.44018,0.013158,0.141449,-0.02577,-0.153023,...,0.488915,-0.291685,-0.269912,-0.252105,0.161379,0.168627,-0.294079,-0.273376,0.337591,0.693351
1,Subject: Copy Playstaion FREE ...,RANDOM,Subject: Copy Playstaion FREE ...,0,"[-0.48891207575798035, -0.24375692009925842, 0...",-0.488912,-0.243757,0.316875,0.017723,-0.500928,...,-0.064557,-0.480082,0.077916,-0.335728,-0.415191,0.783664,-0.066681,-0.918032,0.376888,0.448518
2,Subject: Copy Playstaion FREE ...,RANDOM,The PlayStation ?FFFFAE Wizard costs less than...,0,"[-0.009964438155293465, -0.12823735177516937, ...",-0.009964,-0.128237,0.33153,-0.016355,-0.314405,...,-0.036721,-0.231209,0.00016,-0.048076,0.275122,0.555565,-0.212877,-0.370199,0.168414,0.515565
3,Subject: Confidentiality Agreement - PETRONAS ...,RANDOM,Subject: Confidentiality Agreement - PETRONAS ...,0,"[-0.5974640846252441, 0.03644591197371483, 0.3...",-0.597464,0.036446,0.323129,-0.37208,-0.392751,...,0.208595,-0.508679,-0.152902,-0.874383,-0.020434,0.205682,-0.403199,-0.316603,0.173883,0.727822
4,Subject: Confidentiality Agreement - PETRONAS ...,RANDOM,Please note that I have added Clay Harris' \nn...,0,"[-0.4135953187942505, -0.22871854901313782, 0....",-0.413595,-0.228719,0.056801,-0.283231,-0.164325,...,0.392242,-0.655821,-0.122332,-0.820025,-0.274874,0.369236,-0.270573,-0.291641,0.496671,0.822694


['embedding_0',
 'embedding_1',
 'embedding_2',
 'embedding_3',
 'embedding_4',
 'embedding_5',
 'embedding_6',
 'embedding_7',
 'embedding_8',
 'embedding_9',
 'embedding_10',
 'embedding_11',
 'embedding_12',
 'embedding_13',
 'embedding_14',
 'embedding_15',
 'embedding_16',
 'embedding_17',
 'embedding_18',
 'embedding_19',
 'embedding_20',
 'embedding_21',
 'embedding_22',
 'embedding_23',
 'embedding_24',
 'embedding_25',
 'embedding_26',
 'embedding_27',
 'embedding_28',
 'embedding_29',
 'embedding_30',
 'embedding_31',
 'embedding_32',
 'embedding_33',
 'embedding_34',
 'embedding_35',
 'embedding_36',
 'embedding_37',
 'embedding_38',
 'embedding_39',
 'embedding_40',
 'embedding_41',
 'embedding_42',
 'embedding_43',
 'embedding_44',
 'embedding_45',
 'embedding_46',
 'embedding_47',
 'embedding_48',
 'embedding_49',
 'embedding_50',
 'embedding_51',
 'embedding_52',
 'embedding_53',
 'embedding_54',
 'embedding_55',
 'embedding_56',
 'embedding_57',
 'embedding_58',
 'embed

In [63]:
subsample_size = 2000  # for quick demo, try setting to larger values
feature_columns = [i for i in new_df.columns if 'embedding_' in i]
label = 'label'


train_df = train_df[feature_columns + [label]]
dev_df = dev_df[feature_columns + [label]]
test_df = test_df[feature_columns + [label]]
print('Number of training samples:', len(train_df))
print('Number of dev samples:', len(dev_df))
print('Number of test samples:', len(test_df))

Number of training samples: 3120
Number of dev samples: 390
Number of test samples: 391


In [None]:
import pandas as pd
import numpy as np

# Assume np_array is your numpy array containing embeddings (shape: [n_samples, embedding_dim])
np_array = np.random.rand(390, 768)  # example numpy array

# Convert the numpy array to a pandas DataFrame with each embedding as an element in the 'embedding' column
df_embeddings = pd.DataFrame({'embedding': list(np_array)})

# Now df_embeddings can be used by autogluon
print(df_embeddings.head())

In [54]:
from autogluon.tabular import TabularPredictor
predictor = TabularPredictor(label='label', path='ag_test_dir2', eval_metric='f1_weighted')
predictor.fit(train_df)

Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.2
Python Version:     3.11.11
Operating System:   Darwin
Platform Machine:   arm64
Platform Version:   Darwin Kernel Version 24.3.0: Thu Jan  2 20:23:36 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T8112
CPU Count:          8
Memory Avail:       6.06 GB / 16.00 GB (37.9%)
Disk Space Avail:   56.05 GB / 1862.82 GB (3.0%)
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.

<autogluon.tabular.predictor.predictor.TabularPredictor at 0x31409c810>

In [59]:
test_df.columns

Index(['embedding_0', 'embedding_1', 'embedding_2', 'embedding_3',
       'embedding_4', 'embedding_5', 'embedding_6', 'embedding_7',
       'embedding_8', 'embedding_9',
       ...
       'embedding_758', 'embedding_759', 'embedding_760', 'embedding_761',
       'embedding_762', 'embedding_763', 'embedding_764', 'embedding_765',
       'embedding_766', 'embedding_767'],
      dtype='object', length=768)

In [53]:
predictor.delete_models(models_to_keep='NeuralNetTorch', dry_run=False)

Deleting model KNeighborsUnif. All files under /Volumes/External/source/GTPracticum/EDA/ag_test_dir/models/KNeighborsUnif will be removed.
Deleting model KNeighborsDist. All files under /Volumes/External/source/GTPracticum/EDA/ag_test_dir/models/KNeighborsDist will be removed.
Deleting model RandomForestGini. All files under /Volumes/External/source/GTPracticum/EDA/ag_test_dir/models/RandomForestGini will be removed.
Deleting model RandomForestEntr. All files under /Volumes/External/source/GTPracticum/EDA/ag_test_dir/models/RandomForestEntr will be removed.
Deleting model ExtraTreesGini. All files under /Volumes/External/source/GTPracticum/EDA/ag_test_dir/models/ExtraTreesGini will be removed.
Deleting model ExtraTreesEntr. All files under /Volumes/External/source/GTPracticum/EDA/ag_test_dir/models/ExtraTreesEntr will be removed.
Deleting model WeightedEnsemble_L2. All files under /Volumes/External/source/GTPracticum/EDA/ag_test_dir/models/WeightedEnsemble_L2 will be removed.


In [64]:
predictor.leaderboard(test_df, silent=True)

Unnamed: 0,model,score_test,score_val,eval_metric,pred_time_test,pred_time_val,fit_time,pred_time_test_marginal,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,NeuralNetTorch,0.875258,0.876538,f1_weighted,0.011753,0.007887,1.97094,0.011753,0.007887,1.97094,1,True,7
1,WeightedEnsemble_L2,0.871561,0.87779,f1_weighted,0.046018,0.035643,3.797954,0.001299,0.000574,0.043385,2,True,8
2,KNeighborsUnif,0.867672,0.819705,f1_weighted,0.026614,0.024629,0.109601,0.026614,0.024629,0.109601,1,True,1
3,KNeighborsDist,0.862582,0.828039,f1_weighted,0.028019,0.024534,0.068269,0.028019,0.024534,0.068269,1,True,2
4,RandomForestEntr,0.862412,0.858205,f1_weighted,0.032966,0.027182,1.783629,0.032966,0.027182,1.783629,1,True,4
5,RandomForestGini,0.851818,0.844939,f1_weighted,0.034254,0.027506,2.152438,0.034254,0.027506,2.152438,1,True,3
6,ExtraTreesGini,0.848218,0.836219,f1_weighted,0.032816,0.027441,0.40773,0.032816,0.027441,0.40773,1,True,5
7,ExtraTreesEntr,0.835,0.835639,f1_weighted,0.035742,0.027782,0.41673,0.035742,0.027782,0.41673,1,True,6
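
The `f1_weighted` metric in the leaderboard is the per-class F1 score averaged with each class weighted by its number of true samples. A minimal pure-Python sketch of that definition follows; the labels and predictions are made up for illustration, and `f1_weighted` here is a hypothetical helper rather than AutoGluon's own code.

```python
from collections import Counter

def f1_weighted(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1  # weight each class by its support
    return total / len(y_true)

y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
score = f1_weighted(y_true, y_pred)
print(score)
```

With an imbalanced label such as this one, the majority class dominates the weighted average, so it is worth reading the score alongside the minority-class F1.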


In [85]:
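import pickle

# Load the pickled NeuralNetTorch model saved by AutoGluon and print its public attributes
model_path = "ag_test_dir/models/NeuralNetTorch/model.pkl"
with open(model_path, 'rb') as f:
    model = pickle.load(f)
for name in dir(model):
    if not name.startswith("_"):
        print(name, getattr(model, name))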

can_compile <bound method AbstractModel.can_compile of <autogluon.tabular.models.tabular_nn.torch.tabular_nn_torch.TabularNeuralNetTorchModel object at 0x306cf9dd0>>
can_estimate_memory_usage_static <bound method AbstractModel.can_estimate_memory_usage_static of <autogluon.tabular.models.tabular_nn.torch.tabular_nn_torch.TabularNeuralNetTorchModel object at 0x306cf9dd0>>
can_estimate_memory_usage_static_child <bound method AbstractModel.can_estimate_memory_usage_static_child of <autogluon.tabular.models.tabular_nn.torch.tabular_nn_torch.TabularNeuralNetTorchModel object at 0x306cf9dd0>>
can_fit <bound method AbstractModel.can_fit of <autogluon.tabular.models.tabular_nn.torch.tabular_nn_torch.TabularNeuralNetTorchModel object at 0x306cf9dd0>>
can_infer <bound method AbstractModel.can_infer of <autogluon.tabular.models.tabular_nn.torch.tabular_nn_torch.TabularNeuralNetTorchModel object at 0x306cf9dd0>>
can_predict_proba <bound method AbstractModel.can_predict_proba of <autogluon.tabular.

In [50]:
x=predictor.info()
import json
json_string = json.dumps(x, indent=2)
print(json_string)

TypeError: Object of type FeatureMetadata is not JSON serializable

In [51]:
# Retrieve the predictor info dictionary
info = predictor.info()

# Convert the FeatureMetadata entry to a string to make it JSON serializable
if 'feature_metadata_in' in info:
    info['feature_metadata_in'] = str(info['feature_metadata_in'])

import json
json_string = json.dumps(info, indent=2)
print(json_string)

TypeError: Object of type FeatureMetadata is not JSON serializable
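
Converting only the top-level `feature_metadata_in` entry still fails, presumably because the same kind of object also appears deeper in the dictionary. One catch-all is the `default` argument of `json.dumps`, which is called for every object the encoder cannot handle. The sketch below uses a made-up stand-in class and a made-up dictionary layout, not the real structure returned by `predictor.info()`.

```python
import json

class FakeFeatureMetadata:
    """Hypothetical stand-in for an object that json cannot serialize."""
    def __repr__(self):
        return "FakeFeatureMetadata(float=768)"

# Nested occurrence: patching only the top-level key would miss this one
info = {
    "path": "ag_test_dir2",
    "feature_metadata_in": FakeFeatureMetadata(),
    "model_info": {"NeuralNetTorch": {"feature_metadata": FakeFeatureMetadata()}},
}

# default=str converts every non-serializable object to its string form
json_string = json.dumps(info, indent=2, default=str)
print(json_string)
```

The result is meant for inspection only: the stringified objects cannot be turned back into their original types.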