When I started thinking about the Streamlit app, I pictured a person entering the values. Nobody expects to enter 'scaled' values; they expect to enter ordinary, human-readable ones. Because of how the model was trained, those values have to be scaled before they are passed to it. So this is a sort of Data Engineering process. Not an optimized one, obviously, but it will serve the purpose. ;)

This is just a skeleton of how data from the Streamlit app should be processed before it is sent to the FastAPI service. First we create a dictionary, then follow the pattern used here to prepare the data for the request.
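
The steps in the cells that follow can be collected into one helper. This is a minimal sketch, not the code used in the app: `prepare_payload` and `DUMMY_STATUS` are hypothetical names, and the transformer is only assumed to offer `transform` and `get_feature_names_out`, as the fitted ColumnTransformer in this notebook does.

```python
import pandas as pd

# Hypothetical constant: the placeholder value for the 'status' column
# that the fitted transformer expects to see in its input.
DUMMY_STATUS = 1


def prepare_payload(raw: dict, transformer, customer_id: str) -> dict:
    """Scale one customer's values and return a dict ready to POST."""
    # Copy the input and add the placeholder column
    row = {**raw, "status": DUMMY_STATUS}
    df = pd.DataFrame([row])

    # Scale, then strip the '<step>__' prefix from each output name
    scaled = transformer.transform(df)
    names = [n.split("__", 1)[-1] for n in transformer.get_feature_names_out()]

    # Drop the placeholder again and turn the single row into a dict
    out = pd.DataFrame(scaled, columns=names).drop(columns="status")
    payload = out.iloc[0].to_dict()

    # The API also expects a customer id
    payload["id"] = customer_id
    return payload
```

With the objects from this notebook, the call would look like `prepare_payload(raw_values, loaded_column_transformer, '2033')`, where `raw_values` is a hypothetical dict holding the nine input fields.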

In [1]:
import joblib
import pandas as pd

In [3]:
# Load the saved ColumnTransformer
loaded_column_transformer = joblib.load('../model/column_transformer.pkl')

In [4]:
expected_dict = {'seniority': 0.7357104035909454,
                 'income': -0.3407031698861751,
                 'assets': -0.4639393146108877,
                 'time': 0.1065451583089126,
                 'amount': -0.5035235815768409,
                 'monthly_payment': -0.443498375820261,
                 'job': 'fixed',
                 'home': 'parents',
                 'records': 'yes',
                 'status': 1}  # Dummy value: always keep it at 1; it is never actually used.
                               # The ColumnTransformer expects this column, and a placeholder is
                               # simpler than going back to change that code for one column.



In [5]:
df = pd.DataFrame([expected_dict])
df

Unnamed: 0,seniority,income,assets,time,amount,monthly_payment,job,home,records,status
0,0.73571,-0.340703,-0.463939,0.106545,-0.503524,-0.443498,fixed,parents,yes,1


In [6]:
cust_scaled = loaded_column_transformer.transform(df)

In [7]:
cust_scaled

array([[-0.8871526903955013, -1.4969153448848151, -0.46397956924044664,
        -3.161781165075927, -2.1905958662964986, -1.4404559852246759,
        'fixed', 'parents', 'yes', 1]], dtype=object)

In [8]:
feature_names = loaded_column_transformer.get_feature_names_out()

In [9]:
df_before_dict = pd.DataFrame(cust_scaled, columns = feature_names)

df_before_dict

Unnamed: 0,scaler__seniority,scaler__income,scaler__assets,scaler__time,scaler__amount,scaler__monthly_payment,passthrough__job,passthrough__home,passthrough__records,passthrough__status
0,-0.887153,-1.496915,-0.46398,-3.161781,-2.190596,-1.440456,fixed,parents,yes,1


In [10]:
df_before_dict.columns

Index(['scaler__seniority', 'scaler__income', 'scaler__assets', 'scaler__time',
       'scaler__amount', 'scaler__monthly_payment', 'passthrough__job',
       'passthrough__home', 'passthrough__records', 'passthrough__status'],
      dtype='object')

In [11]:
df_before_dict.columns = df_before_dict.columns.str.replace('scaler__', '').str.replace('passthrough__', '')
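
The chained `str.replace` calls work for these names, but they would also remove `scaler__` or `passthrough__` if either string appeared inside a column name. A slightly stricter variant, sketched below with the hypothetical helper `strip_step_prefix`, removes only what precedes the first `__`, which matches the `<step>__<column>` pattern seen in the output above.

```python
def strip_step_prefix(name: str) -> str:
    """Drop everything up to and including the first '__' in a feature name."""
    # Names without '__' are returned unchanged
    return name.split("__", 1)[-1]


# A few of the names printed above, used here as sample input
feature_names = [
    "scaler__seniority",
    "scaler__income",
    "passthrough__job",
    "passthrough__status",
]
clean_names = [strip_step_prefix(n) for n in feature_names]
print(clean_names)
```

If the transformer is ever refit, scikit-learn's `verbose_feature_names_out=False` option on `ColumnTransformer` avoids the prefixes at the source.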

In [12]:
df_before_dict

Unnamed: 0,seniority,income,assets,time,amount,monthly_payment,job,home,records,status
0,-0.887153,-1.496915,-0.46398,-3.161781,-2.190596,-1.440456,fixed,parents,yes,1


In [13]:
# Let's drop the unnecessary status column (the FastAPI pydantic model doesn't expect it)

df_before_dict = df_before_dict.drop('status', axis=1)

In [14]:
df_before_dict

Unnamed: 0,seniority,income,assets,time,amount,monthly_payment,job,home,records
0,-0.887153,-1.496915,-0.46398,-3.161781,-2.190596,-1.440456,fixed,parents,yes


In [15]:
#Finally, let's convert it to a dict and test if it works with FastAPI
customer_dict = df_before_dict.iloc[0].to_dict()
customer_dict

{'seniority': -0.8871526903955013,
 'income': -1.4969153448848151,
 'assets': -0.46397956924044664,
 'time': -3.161781165075927,
 'amount': -2.1905958662964986,
 'monthly_payment': -1.4404559852246759,
 'job': 'fixed',
 'home': 'parents',
 'records': 'yes'}

In [16]:
import httpx

In [17]:
customer_dict

{'seniority': -0.8871526903955013,
 'income': -1.4969153448848151,
 'assets': -0.46397956924044664,
 'time': -3.161781165075927,
 'amount': -2.1905958662964986,
 'monthly_payment': -1.4404559852246759,
 'job': 'fixed',
 'home': 'parents',
 'records': 'yes'}

In [18]:
httpx.post(url = "http://localhost:8000/", json = customer_dict)

<Response [422 Unprocessable Entity]>
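
A 422 from FastAPI normally comes with a JSON body whose `detail` list names the fields that failed validation, so calling `.json()` on the response is a quick way to see what is wrong. The sketch below uses a hypothetical helper, `invalid_fields`, and an illustrative body written by hand; it was not captured from this service, and the exact `msg` and `type` strings vary between pydantic versions.

```python
from typing import List


def invalid_fields(error_body: dict) -> List[str]:
    """Return the field names listed in a FastAPI-style validation error body."""
    fields = []
    for item in error_body.get("detail", []):
        loc = item.get("loc", [])
        if loc:
            # 'loc' looks like ["body", "<field>"]; the last entry is the field
            fields.append(str(loc[-1]))
    return fields


# Illustrative body for a request that lacks the "id" field
sample_body = {
    "detail": [
        {"loc": ["body", "id"], "msg": "Field required", "type": "missing"}
    ]
}
print(invalid_fields(sample_body))
```

Against the running service, the same helper could be applied to `httpx.post(...).json()` whenever the status code is 422.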

In [19]:
# I forgot that the API also expects a customer id

customer_dict['id'] = '2033'

In [20]:
customer_dict

{'seniority': -0.8871526903955013,
 'income': -1.4969153448848151,
 'assets': -0.46397956924044664,
 'time': -3.161781165075927,
 'amount': -2.1905958662964986,
 'monthly_payment': -1.4404559852246759,
 'job': 'fixed',
 'home': 'parents',
 'records': 'yes',
 'id': '2033'}

In [21]:
httpx.post(url = "http://localhost:8000/", json = customer_dict)

<Response [200 OK]>

In [22]:
httpx.post(url = "http://localhost:8000/", json = customer_dict).json()

{'probability_of_defaulting': 0.7338042259216309, 'is_defaulting': True}

Luckily, I remembered that I had created a fake id column in the dataframe (check `predict-script.py`), fitted the DictVectorizer on top of it, and made predictions with it. I think I will drop it from the dictionary so that it does not compromise the model's prediction. I've fixed my `main.py` script.
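
One way to keep the id out of the model input is to split it off before the features reach the vectorizer. This is only a sketch of the idea, written without reference to the actual `main.py`; `split_id` is a hypothetical helper.

```python
def split_id(customer: dict):
    """Return (customer_id, features) without modifying the input dict."""
    features = dict(customer)  # shallow copy so the caller's dict is untouched
    customer_id = features.pop("id", None)
    return customer_id, features


# Shortened sample request for illustration
request_body = {"seniority": -0.887, "job": "fixed", "id": "2033"}
customer_id, features = split_id(request_body)
print(customer_id, features)
```

The id can then be used for logging or for the response, while only `features` is passed on to the model.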