
- **Algorithmic bias**: Bias that arises from the design, implementation, or use of algorithms. This can be due to skewed training data or poor algorithm design and can lead to unfair outcomes in areas like recruiting or criminal justice.
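- **Omitted variable bias**: Occurs when a variable that influences the outcome is left out of a model or analysis while it is correlated with variables that remain. The remaining variables then absorb part of its effect, so their estimated impact is distorted. For example, analyzing car data without considering mileage or age can lead to inaccurate conclusions about vehicle value.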

- **Historical bias**: Occurs when past socio-cultural prejudices and beliefs are reflected in the data and in any analysis built on it. This is particularly challenging when historical data is used to train machine learning models, because the models can perpetuate and even amplify these biases. An example is an AI hiring tool that learned from historically biased hiring data and ended up favoring male candidates.
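The omitted variable bias described in the list above can be reproduced with a small simulation. This is a sketch on **synthetic data**, not the notebook's insurance dataset. The car ages, mileages, values, and the "true" effects (-500 per year of age, -0.1 per unit of mileage) are assumptions chosen for illustration. Because mileage rises with age, dropping mileage from the regression lets age absorb its effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Synthetic (assumed) car data: mileage grows with age, and both lower the value.
age = rng.uniform(0, 15, n)                                  # years
mileage = 12_000 * age + rng.normal(0, 20_000, n)            # distance driven
value = 30_000 - 500 * age - 0.1 * mileage + rng.normal(0, 1_000, n)

def ols(y, *features):
    """Ordinary least squares with an intercept; returns [intercept, coef_1, ...]."""
    X = np.column_stack([np.ones_like(y), *features])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

full = ols(value, age, mileage)   # both drivers of value included
short = ols(value, age)           # mileage omitted

print(f"age effect, full model:      {full[1]:8.1f}")   # near the assumed -500
print(f"age effect, mileage omitted: {short[1]:8.1f}")  # near -500 + (-0.1 * 12000) = -1700
```

With mileage left out, the age coefficient lands near -1700 instead of -500. The short model blames age for value that was actually lost to mileage.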

In [1]:
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/lcbjrrr/algojust/refs/heads/main/car_insurance%20-%20bias.csv")
df = pd.get_dummies(df)  # one-hot encode categorical columns (here the vehicle type `v`)
df.head(3)

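|   | age | female | premium | vehicle_age | claim_last2y | v_Hatchback | v_SUV | v_Sedan | v_Truck |
|---|-----|--------|---------|-------------|--------------|-------------|-------|---------|---------|
| 0 | 23  | 1      | 580     | 3           | 0            | False       | False | True    | False   |
| 1 | 24  | 0      | 520     | 5           | 1            | False       | True  | False   | False   |
| 2 | 31  | 1      | 510     | 2           | 0            | True        | False | False   | False   |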
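A toy frame shows what `pd.get_dummies` does to the vehicle type. The rows below are made up. Only the column name `v` comes from the notebook, inferred from the `v_` prefixes in the output above. Each row has exactly one vehicle dummy set, so the four dummies always sum to 1. That duplicates the intercept `LinearRegression` fits by default (the "dummy-variable trap"), which leaves the individual vehicle coefficients not uniquely determined. Passing `drop_first=True` keeps one category (Hatchback) as the baseline.

```python
import pandas as pd

# Hypothetical rows; only the column name `v` mirrors the notebook's data.
toy = pd.DataFrame({'age': [23, 24, 31, 45],
                    'v': ['Sedan', 'SUV', 'Hatchback', 'Truck']})

full = pd.get_dummies(toy)                       # one column per category, sorted by name
baseline = pd.get_dummies(toy, drop_first=True)  # drops the first category (Hatchback)

print(list(full.columns))
print(list(baseline.columns))
print(full.filter(like='v_').sum(axis=1).tolist())  # one dummy set per row, so each sum is 1
```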


In [None]:
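import joblib
from sklearn.linear_model import LinearRegression
X = df.drop('premium', axis=1)  # features: every column except the target
y = df['premium']               # target: the premium charged
linreg = LinearRegression()
linreg.fit(X, y)
print(linreg.coef_, linreg.intercept_)  # one coefficient per column of X (in column order), then the intercept
joblib.dump(linreg, 'lingreg.pkl')      # save the fitted model to disk
linreg.score(X, y)                      # R^2 on the training data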

[-7.20418133 39.28951506 -0.31423604 -0.88926581 -0.21726562 -0.81547586
  1.4617983  -0.42905681] 695.0616893956837


0.9824782207152526
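`LinearRegression.score` returns the coefficient of determination, R² = 1 - SS_res / SS_tot. The 0.98 above therefore says the model accounts for about 98% of the variance in `premium` on the data it was trained on. That measures fit, not fairness: a model can reproduce a biased pattern very accurately. Below is a minimal NumPy sketch of the formula, using made-up numbers.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # variation the model leaves unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Made-up premiums and predictions, for illustration only.
print(r_squared([580, 520, 510, 600], [575, 525, 512, 596]))
```

Predicting the mean for every row gives R² = 0, and perfect predictions give R² = 1.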

In [None]:
loaded_model = joblib.load('lingreg.pkl')  # reload the model saved above
test = pd.DataFrame({'age':[33,33,33,33,66,66],
                     'female':[1,0,1,0,1,0],  # each pair of rows differs only in this column
                     'vehicle_age':[2,2,2,2,4,4],
                     'claim_last2y':[1,1,0,0,0,0],
                     'v_Hatchback':[1,1,1,1,0,0],
                     'v_SUV':[0,0,0,0,1,1],
                     'v_Sedan':[0,0,0,0,0,0],
                     'v_Truck':[0,0,0,0,0,0] })
test

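|   | age | female | vehicle_age | claim_last2y | v_Hatchback | v_SUV | v_Sedan | v_Truck |
|---|-----|--------|-------------|--------------|-------------|-------|---------|---------|
| 0 | 33  | 1      | 2           | 1            | 1           | 0     | 0       | 0       |
| 1 | 33  | 0      | 2           | 1            | 1           | 0     | 0       | 0       |
| 2 | 33  | 1      | 2           | 0            | 1           | 0     | 0       | 0       |
| 3 | 33  | 0      | 2           | 0            | 1           | 0     | 0       | 0       |
| 4 | 66  | 1      | 4           | 0            | 0           | 1     | 0       | 0       |
| 5 | 66  | 0      | 4           | 0            | 0           | 1     | 0       | 0       |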
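The `test` frame above is a matched-pair (counterfactual) design: within each pair of rows, only `female` changes. The sketch below builds such pairs from a set of base profiles. The helper `counterfactual_pairs` is hypothetical, not part of pandas or the notebook, and it assumes `base` has a unique index.

```python
import pandas as pd

def counterfactual_pairs(base, col, values=(1, 0)):
    """Hypothetical helper: repeat each row of `base` once per entry in `values`,
    set `col` to that entry, and leave every other column unchanged.
    Assumes `base` has a unique index."""
    out = base.loc[base.index.repeat(len(values))].reset_index(drop=True)
    out[col] = list(values) * len(base)   # 1, 0, 1, 0, ... so each pair sits together
    return out

# The three driver profiles from the notebook's `test` frame, without `female`.
base = pd.DataFrame({'age': [33, 33, 66],
                     'vehicle_age': [2, 2, 4],
                     'claim_last2y': [1, 0, 0],
                     'v_Hatchback': [1, 1, 0],
                     'v_SUV': [0, 0, 1],
                     'v_Sedan': [0, 0, 0],
                     'v_Truck': [0, 0, 0]})

pairs = counterfactual_pairs(base, 'female')
print(pairs)
```

Because `female` is appended as the last column here, the result should be passed to `predict` with the columns selected explicitly, as the notebook's prediction cell does.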


In [None]:
loaded_model.predict(test[['age','female','vehicle_age','claim_last2y','v_Hatchback','v_SUV','v_Sedan','v_Truck']])  # same feature columns, in the same order, as X

array([494.87821716, 455.5887021 , 495.76748297, 456.4779679 ,
       256.80281685, 217.51330179])
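Because the model is linear with no interaction terms, each prediction is `intercept + x · coef`. The gap within every matched pair is therefore exactly the `female` coefficient (about 39.29), whatever the driver's age, vehicle, or claims history. The sketch below re-derives the six predictions with NumPy from the coefficients printed earlier. They are copied as printed, so they carry 8-decimal rounding, and the saved model is not reloaded. If the historical premiums contained a gender-based surcharge, this constant gap is how the model passes it on to every new driver.

```python
import numpy as np

features = ['age', 'female', 'vehicle_age', 'claim_last2y',
            'v_Hatchback', 'v_SUV', 'v_Sedan', 'v_Truck']
coef = np.array([-7.20418133, 39.28951506, -0.31423604, -0.88926581,
                 -0.21726562, -0.81547586, 1.4617983, -0.42905681])  # as printed above
intercept = 695.0616893956837

# The six rows of `test`, with columns in the order of `features`.
test = np.array([[33, 1, 2, 1, 1, 0, 0, 0],
                 [33, 0, 2, 1, 1, 0, 0, 0],
                 [33, 1, 2, 0, 1, 0, 0, 0],
                 [33, 0, 2, 0, 1, 0, 0, 0],
                 [66, 1, 4, 0, 0, 1, 0, 0],
                 [66, 0, 4, 0, 0, 1, 0, 0]])

pred = intercept + test @ coef                 # the linear formula the model applies
gap = pred[0::2] - pred[1::2]                  # female row minus the matching male row
female_coef = coef[features.index('female')]

print(pred.round(2))
print(gap.round(2), female_coef)               # every gap equals the female coefficient
```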