**Content**

Columns

age: Age of the primary beneficiary.

sex: Gender of the insurance contractor (female or male).

bmi: Body mass index, an objective measure of body weight relative to height, computed as weight divided by height squared (kg / m^2). Values of 18.5 to 24.9 are considered ideal.

children: Number of children covered by health insurance (number of dependents).

smoker: Whether the beneficiary smokes (yes or no).

region: The beneficiary's residential area in the US (northeast, southeast, southwest, or northwest).

charges: Individual medical costs billed by health insurance.
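
To make the bmi column concrete, here is a minimal sketch of how the index is computed from weight and height. The helper names `bmi` and `in_ideal_range` are hypothetical and not part of the dataset or the notebook; the 18.5 to 24.9 range is the one quoted in the column description.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight divided by height squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


def in_ideal_range(value: float, low: float = 18.5, high: float = 24.9) -> bool:
    """True when the BMI falls inside the range quoted in the column list."""
    return low <= value <= high


# Illustrative person: 70 kg, 1.75 m tall.
example = bmi(70.0, 1.75)
print(round(example, 2), in_ideal_range(example))
```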

## Importing the libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


## Importing the dataset

In [3]:
dataset = pd.read_csv('/content/gdrive/MyDrive/Colab Notebooks/Regressions/Insurance-Forecast-by-using-Regression/insurance.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

In [4]:
dataset.isnull().values.any()

False

In [5]:
dataset.isnull().sum()

age         0
sex         0
bmi         0
children    0
smoker      0
region      0
charges     0
dtype: int64
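
The insurance data has no missing values, so both checks above come back empty. As a rough illustration of what they would report otherwise, the sketch below runs the same two calls on a small made-up frame (`toy` and its values are assumptions, not rows from insurance.csv).

```python
import numpy as np
import pandas as pd

# Made-up frame with two deliberately missing entries.
toy = pd.DataFrame({
    "age": [19, 18, np.nan],
    "bmi": [27.9, np.nan, 33.0],
    "smoker": ["yes", "no", "no"],
})

# Single flag: is anything missing anywhere in the frame?
has_missing = bool(toy.isnull().values.any())

# Per-column count of missing entries.
missing_per_column = toy.isnull().sum()

print(has_missing)
print(missing_per_column)
```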

In [7]:
# Label encoding: map the binary columns sex (index 1) and smoker (index 4) to 0/1
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 1] = le.fit_transform(X[:, 1])
X[:, 4] = le.fit_transform(X[:, 4])
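
`LabelEncoder` assigns integer codes in the sorted order of the labels it sees. The sketch below, on made-up values, shows the resulting mapping for the two binary columns. It uses one encoder per column (the names `sex_encoder` and `smoker_encoder` are assumptions for illustration); the cell above reuses a single `le`, which gives the same codes but only remembers the last column's classes.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up values standing in for the sex and smoker columns.
sex_values = np.array(["female", "male", "male", "female"])
smoker_values = np.array(["yes", "no", "no", "yes"])

# One encoder per column keeps each mapping available for inverse_transform.
sex_encoder = LabelEncoder()
smoker_encoder = LabelEncoder()
sex_codes = sex_encoder.fit_transform(sex_values)
smoker_codes = smoker_encoder.fit_transform(smoker_values)

# classes_ holds the labels in sorted order; the position is the code.
print(list(sex_encoder.classes_), sex_codes)
print(list(smoker_encoder.classes_), smoker_codes)
```

With these inputs, female maps to 0 and male to 1, and no maps to 0 and yes to 1.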

In [8]:
# One-hot encoding: expand the region column (index 5) into one indicator column per region
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [5])], remainder='passthrough')
X = np.array(ct.fit_transform(X))

In [9]:
print(X)

[[0.0 0.0 0.0 ... 27.9 0 1]
 [0.0 0.0 1.0 ... 33.77 1 0]
 [0.0 0.0 1.0 ... 33.0 3 0]
 ...
 [0.0 0.0 1.0 ... 36.85 0 0]
 [0.0 0.0 0.0 ... 25.8 0 0]
 [0.0 1.0 0.0 ... 29.07 0 1]]
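
The printed matrix can be hard to read because `ColumnTransformer` places the one-hot columns first and appends the passthrough columns after them. The following sketch shows that layout on a made-up two-column frame. The names `toy` and `ct_demo` are assumptions, and `sparse_threshold=0` is set here only to force a dense result for printing.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Made-up frame: one numeric column and the categorical region column.
toy = pd.DataFrame({
    "bmi": [27.9, 33.77, 33.0, 22.705],
    "region": ["southwest", "southeast", "northwest", "northeast"],
})

ct_demo = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), ["region"])],
    remainder="passthrough",  # keep bmi unchanged, appended after the one-hot block
    sparse_threshold=0,       # always return a dense array
)
encoded = np.asarray(ct_demo.fit_transform(toy))

# Categories are sorted, which fixes the order of the indicator columns.
categories = list(ct_demo.named_transformers_["encoder"].categories_[0])
print(categories)
print(encoded)
```

Each row has a single 1 among the first four columns (northeast, northwest, southeast, southwest, in that order), followed by the untouched bmi value.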


## Splitting the dataset into the Training set and Test set

In [10]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
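
As a quick sanity check on what `test_size = 0.2` and `random_state = 0` do, the sketch below splits a made-up array of 10 rows (`X_demo` and `y_demo` are assumptions, not the insurance data) and confirms that the split sizes are 80/20 and that the same seed reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples with 2 features each.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
print(X_tr.shape, X_te.shape)

# Re-running with the same random_state yields the identical test rows.
_, X_te_again, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
print(np.array_equal(X_te, X_te_again))
```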

In [11]:
print(X_train)

[[0.0 0.0 0.0 ... 34.1 4 1]
 [0.0 0.0 1.0 ... 34.43 0 0]
 [1.0 0.0 0.0 ... 36.67 2 1]
 ...
 [0.0 0.0 1.0 ... 25.08 0 0]
 [0.0 1.0 0.0 ... 35.53 0 0]
 [0.0 0.0 0.0 ... 18.5 1 0]]


In [12]:
print(X_test)

[[0.0 0.0 0.0 ... 30.2 1 0]
 [0.0 0.0 1.0 ... 29.37 1 0]
 [0.0 1.0 0.0 ... 40.565 2 1]
 ...
 [1.0 0.0 0.0 ... 40.28 0 0]
 [0.0 0.0 1.0 ... 39.05 3 1]
 [1.0 0.0 0.0 ... 24.795 3 0]]


In [13]:
print(y_train)

[40182.246   1137.4697 38511.6283 ...  5415.6612  1646.4297  4766.022 ]


In [14]:
print(y_test)

[ 9724.53      8547.6913   45702.02235  12950.0712    9644.2525
  4500.33925   2198.18985  11436.73815   7537.1639    5425.02335
  6753.038    10493.9458    7337.748     4185.0979   18310.742
 10702.6424   12523.6048    3490.5491    6457.8434   33475.81715
 23967.38305  12643.3778   23045.56616  23065.4207    1674.6323
  4667.60765   3732.6251    7682.67      3756.6216    8413.46305
  8059.6791   48970.2476   12979.358    20630.28351  14571.8908
  4137.5227    8347.1643   51194.55914  40003.33225   1880.487
  5458.04645   2867.1196   20149.3229   47496.49445  36149.4835
 26018.95052  19749.38338   6940.90985   4718.20355  22192.43711
  2899.48935  18838.70366  23568.272    46255.1125   24227.33724
  3268.84665   2322.6218    8827.2099   14478.33015  13112.6048
  1253.936    46718.16325  13919.8229    9630.397    10736.87075
  9880.068    32548.3405   38746.3551    3180.5101    8162.71625
 13041.921    11554.2236   16232.847    13887.9685   13012.20865
 13217.0945    7147.105     7731.4

## Training the Multiple Linear Regression model on the Training set

In [15]:
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

LinearRegression()
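
A fitted `LinearRegression` exposes its learned weights through `coef_` and `intercept_`. The sketch below is a minimal illustration on synthetic, noise-free data (the arrays and the true weights 3, 2 and 5 are assumptions chosen for the example); inspecting `regressor.coef_` after the cell above would work the same way.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from a known linear rule: y = 3*x1 + 2*x2 + 5.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(50, 2))
y_demo = 3.0 * X_demo[:, 0] + 2.0 * X_demo[:, 1] + 5.0

model = LinearRegression().fit(X_demo, y_demo)

# With no noise, ordinary least squares recovers the generating weights.
print(model.coef_, model.intercept_)
```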

## Predicting the Test set results

In [16]:
y_pred = regressor.predict(X_test)
np.set_printoptions(precision=2)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

[[1.12e+04 9.72e+03]
 [9.49e+03 8.55e+03]
 [3.82e+04 4.57e+04]
 [1.63e+04 1.30e+04]
 [6.91e+03 9.64e+03]
 [3.96e+03 4.50e+03]
 [1.58e+03 2.20e+03]
 [1.44e+04 1.14e+04]
 [9.01e+03 7.54e+03]
 [7.51e+03 5.43e+03]
 [4.49e+03 6.75e+03]
 [1.03e+04 1.05e+04]
 [8.80e+03 7.34e+03]
 [3.80e+03 4.19e+03]
 [2.79e+04 1.83e+04]
 [1.07e+04 1.07e+04]
 [1.13e+04 1.25e+04]
 [6.11e+03 3.49e+03]
 [8.24e+03 6.46e+03]
 [2.71e+04 3.35e+04]
 [3.36e+04 2.40e+04]
 [1.44e+04 1.26e+04]
 [1.17e+04 2.30e+04]
 [3.21e+04 2.31e+04]
 [4.17e+03 1.67e+03]
 [9.25e+03 4.67e+03]
 [1.08e+03 3.73e+03]
 [9.80e+03 7.68e+03]
 [3.77e+03 3.76e+03]
 [1.04e+04 8.41e+03]
 [9.01e+03 8.06e+03]
 [4.01e+04 4.90e+04]
 [1.57e+04 1.30e+04]
 [1.39e+04 2.06e+04]
 [2.48e+04 1.46e+04]
 [5.17e+03 4.14e+03]
 [1.26e+04 8.35e+03]
 [3.08e+04 5.12e+04]
 [3.35e+04 4.00e+04]
 [3.67e+03 1.88e+03]
 [3.98e+03 5.46e+03]
 [3.99e+03 2.87e+03]
 [3.05e+04 2.01e+04]
 [3.95e+04 4.75e+04]
 [2.78e+04 3.61e+04]
 [5.09e+03 2.60e+04]
 [1.06e+04 1.97e+04]
 [7.83e+03 6.

## Evaluating the Model Performance

In [17]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

0.7999876970680434
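
The score above means the model explains roughly 80% of the variance in test-set charges. For reference, R² is defined as 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. The sketch below computes it by hand on made-up numbers (`y_true` and `y_hat` are assumptions, not notebook results) and compares it with `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up targets and predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 10.0])

# Residual sum of squares and total sum of squares.
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1.0 - ss_res / ss_tot
print(r2_manual, r2_score(y_true, y_hat))
```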