# Training and Testing Data

We are working with a dataset of used BMW car prices to build a prediction model. The model will estimate a car's selling price from its mileage and age. To train and test it, we'll split the data into training and testing sets using sklearn's `train_test_split`.

In [45]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [46]:
df = pd.read_csv('datasets/6_carprices.csv')  # forward slash works on Windows, macOS, and Linux
df.head()

|   | Mileage | Age(yrs) | Sell Price($) |
|---|---------|----------|---------------|
| 0 | 69000   | 6        | 18000         |
| 1 | 35000   | 3        | 34000         |
| 2 | 57000   | 5        | 26100         |
| 3 | 22500   | 2        | 40000         |
| 4 | 46000   | 4        | 31500         |


### Car Mileage vs Selling price

In [47]:
sns.scatterplot(data=df, x='Mileage', y='Sell Price($)')

<Axes: xlabel='Mileage', ylabel='Sell Price($)'>

### Car Age vs Selling price

In [48]:
sns.scatterplot(data=df, x='Age(yrs)', y='Sell Price($)')

<Axes: xlabel='Age(yrs)', ylabel='Sell Price($)'>

### Features and Target variable

In [49]:
X = df[['Mileage', 'Age(yrs)']] # Features
y = df['Sell Price($)'] # target variable

In [50]:
X.head()

|   | Mileage | Age(yrs) |
|---|---------|----------|
| 0 | 69000   | 6        |
| 1 | 35000   | 3        |
| 2 | 57000   | 5        |
| 3 | 22500   | 2        |
| 4 | 46000   | 4        |


In [51]:
y.head()

0    18000
1    34000
2    26100
3    40000
4    31500
Name: Sell Price($), dtype: int64

### Splitting data into training and testing sets

In [52]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [53]:
len(X_train), len(y_train), len(X_test), len(y_test)

(16, 16, 4, 4)
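The 16/4 split follows from `test_size=0.2` applied to 20 rows. As a rough illustration of the idea, the sketch below shuffles row indices with a fixed seed and holds out a fraction of them. `simple_train_test_split` is a hypothetical helper written for this example; it is not sklearn's implementation and will not select the same rows as `train_test_split`, even with the same seed.

```python
import math
import random


def simple_train_test_split(rows, test_size=0.2, seed=42):
    """Shuffle row indices and hold out a fraction of them for testing.

    Hypothetical helper for illustration only; not sklearn's algorithm.
    """
    n_test = math.ceil(len(rows) * test_size)  # number of held-out rows
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)       # fixed seed makes the split repeatable
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [rows[i] for i in train_idx]
    test = [rows[i] for i in test_idx]
    return train, test


rows = list(range(20))  # stand-in for the 20 rows of the car dataset
train_rows, test_rows = simple_train_test_split(rows)
print(len(train_rows), len(test_rows))
```

Fixing the seed plays the same role as `random_state=42` in the cell above: rerunning the code reproduces the same split, which keeps later results comparable.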

### Training a linear model using the training set

In [54]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
print("Model trained successfully")

Model trained successfully
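For reference, with the two features used here, `LinearRegression` fits by ordinary least squares a model of the form below. The symbols $\beta_0, \beta_1, \beta_2$ are notation introduced for this note and do not appear in the notebook; after fitting, sklearn exposes the estimated values as `model.intercept_` and `model.coef_`.

$$
\widehat{\text{Sell Price}} = \beta_0 + \beta_1 \cdot \text{Mileage} + \beta_2 \cdot \text{Age(yrs)}
$$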


### Using the test features to predict outcomes

In [55]:
model.predict(X_test)

array([22262.48189206, 22571.64380185, 38560.99055662, 35176.5451397 ])

In [56]:
y_test.values

array([18000, 19700, 35000, 34000], dtype=int64)
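To make the comparison between the two arrays easier to read, the sketch below lines up each prediction with its actual price and summarises the gap as a mean absolute error. The values are copied from the outputs above; the variable names are chosen for this example only.

```python
# Values copied from the two cell outputs above.
predicted = [22262.48189206, 22571.64380185, 38560.99055662, 35176.5451397]
actual = [18000, 19700, 35000, 34000]

# Signed error per test car: positive means the model predicted too high.
errors = [p - a for p, a in zip(predicted, actual)]
for p, a, e in zip(predicted, actual, errors):
    print(f"predicted {p:>10.2f}   actual {a:>6}   error {e:>+9.2f}")

# Mean absolute error: average size of the miss, in dollars.
mae = sum(abs(e) for e in errors) / len(errors)
print(f"mean absolute error: {mae:.2f}")
```

On these four test cars the model overestimates every price, by roughly $2,968 on average.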

### Model Accuracy

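We evaluate the model on data it was not trained on, so the result reflects its ___testing___ (or ___predictive___) performance. For a regression model such as `LinearRegression`, the ___model.score___ method returns the coefficient of determination, R², rather than a classification-style accuracy. A value of 1.0 means perfect predictions, and a value of 0 means the model does no better than always predicting the mean of the test targets.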

In [58]:
model.score(X_test, y_test)

0.8360253892678232
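As a cross-check, the score can be reproduced by hand from the predictions and actual prices shown earlier, using the definition R² = 1 − SS_res / SS_tot. This is a minimal sketch with values copied from the outputs above; the variable names are chosen for this example.

```python
# Values copied from the model.predict(X_test) and y_test.values outputs.
predicted = [22262.48189206, 22571.64380185, 38560.99055662, 35176.5451397]
actual = [18000, 19700, 35000, 34000]

mean_actual = sum(actual) / len(actual)

# Residual sum of squares: squared gaps between actual and predicted prices.
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))

# Total sum of squares: squared gaps between actual prices and their mean.
ss_tot = sum((a - mean_actual) ** 2 for a in actual)

r2 = 1 - ss_res / ss_tot
print(round(r2, 4))  # 0.836
```

With only four test rows, this score is sensitive to which rows land in the test set, so a different `random_state` could give a noticeably different value.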