Predicting insurance costs with linear regression. (sequential feature selection, k-means cluster analysis)

st-olz/linreg_insurance

Predicting Insurance Costs with Linear Regression

This project uses an insurance cost data set from Kaggle (https://www.kaggle.com/datasets/mirichoi0218/insurance). It contains information on individual medical insurance bills. Each bill is associated with some characteristics of the person who received it:

  • age: age of primary beneficiary
  • sex: gender of the insurance contractor (female or male)
  • bmi: body mass index, an objective measure of body weight relative to height, computed as weight divided by height squared (kg / m ^ 2); the ideal range is 18.5 to 24.9
  • children: number of children covered by health insurance (number of dependents)
  • smoker: whether the person smokes (yes/no)
  • region: the beneficiary's residential area in the US (northeast, southeast, southwest, northwest)
  • charges: individual medical costs billed by health insurance
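
A linear regression needs numeric inputs, so the categorical columns (sex, smoker, region) have to be encoded before fitting. The following is a minimal sketch of one possible encoding in plain Python. The function name `encode_row`, the choice of the first region as the dropped baseline, and the sample record are illustrative assumptions and are not taken from the notebook.

```python
# Region categories as listed in the column description above.
REGIONS = ["northeast", "southeast", "southwest", "northwest"]


def encode_row(row):
    """Turn one record (a dict keyed by column name) into a list of numbers.

    Binary columns become 0/1. Region is one-hot encoded with the first
    region dropped as the baseline, which avoids perfectly collinear columns.
    """
    features = [
        float(row["age"]),
        1.0 if row["sex"] == "male" else 0.0,
        float(row["bmi"]),
        float(row["children"]),
        1.0 if row["smoker"] == "yes" else 0.0,
    ]
    # One indicator per non-baseline region.
    features += [1.0 if row["region"] == r else 0.0 for r in REGIONS[1:]]
    return features


# Hypothetical record, made up for illustration (not a row from the data set).
example = {"age": 40, "sex": "female", "bmi": 25.0, "children": 2,
           "smoker": "no", "region": "southwest"}
print(encode_row(example))
```

Each record becomes eight numbers: age, sex, bmi, children, smoker, and three region indicators. The target column, charges, is kept separate.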

We are interested in how these characteristics relate to the total medical cost. Since the cost is a continuous, positive number, linear regression is a promising first approach.
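
As a rough illustration of what fitting a linear regression means, here is a minimal single-feature sketch of ordinary least squares in plain Python. It is a simplification and not the notebook's code: the helper name `fit_simple_ols` is hypothetical, and the numbers are made up for illustration rather than taken from the data set.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing squared error for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up numbers for illustration only (not values from the data set).
ages = [20, 30, 40, 50]
charges = [2500.0, 3500.0, 4500.0, 5500.0]

intercept, slope = fit_simple_ols(ages, charges)
predicted = intercept + slope * 35  # predicted charge for a 35-year-old
print(intercept, slope, predicted)
```

With several features, as in this data set, the same least-squares idea applies to a vector of coefficients, which is usually fitted with a library rather than by hand.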

The procedure is described in the attached notebook linear_regression_insurance_costs.ipynb.
