# Test a trained model
Once you have trained a model, you can test it with the test data you set aside.

We will start by rerunning the code from the previous notebook to create a trained model.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

In [2]:
# Load our data from the csv file
delays_df = pd.read_csv('Data/Lots_of_flight_data.csv') 

# Remove rows with null values since those will crash our linear regression model training
delays_df.dropna(inplace=True)

# Move our features into the X DataFrame
X = delays_df.loc[:,['DISTANCE', 'CRS_ELAPSED_TIME']]

# Move our labels into the y DataFrame
y = delays_df.loc[:,['ARR_DELAY']] 

# Split our data into test and training DataFrames
X_train, X_test, y_train, y_test = train_test_split(
    X, 
    y, 
    test_size=0.3, 
    random_state=42
)

regressor = LinearRegression()     # Create a scikit-learn LinearRegression object
regressor.fit(X_train, y_train)    # Use the fit method to train the model on the training data

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
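
As a quick sanity check on what `test_size=0.3` does, here is a minimal sketch on a small made-up DataFrame. The `toy_` names and all values are assumptions for illustration only, not the flight data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 10-row stand-in for the flight data (values are made up)
toy_X = pd.DataFrame({
    'DISTANCE': [200, 350, 500, 650, 800, 950, 1100, 1250, 1400, 1550],
    'CRS_ELAPSED_TIME': [55, 70, 90, 105, 125, 140, 160, 175, 195, 210],
})
toy_y = pd.DataFrame({'ARR_DELAY': [-5, 3, -12, 0, 8, -9, 14, -20, 2, -1]})

toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_X, toy_y, test_size=0.3, random_state=42
)

# 30% of the rows are held out for testing, the rest are used for training
print(len(toy_X_train), len(toy_X_test))

# Features and labels stay aligned: both test frames share the same row index
print(toy_X_test.index.equals(toy_y_test.index))
```

Because the row index is preserved on both sides of the split, each row of test features can later be matched to its actual label.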

## Test the model
Use **scikit-learn LinearRegression predict** to have our trained model predict values for our test data.

We stored our test data in X_test.

We will store the predicted results in y_pred.

In [3]:
y_pred = regressor.predict(X_test)

In [4]:
y_pred

array([[3.47739078],
       [5.89055919],
       [4.33288464],
       ...,
       [5.84678979],
       [6.05195889],
       [5.66255414]])
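
The nested brackets in the output show that `predict` returned a two-dimensional array with one column, which happens because the labels were passed to `fit` as a one-column DataFrame. The sketch below reproduces that shape on randomly generated stand-in data (the `toy_` names and values are assumptions, not the flight data) and shows one way to flatten the result.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data; the notebook itself uses Lots_of_flight_data.csv
rng = np.random.default_rng(0)
toy_X = pd.DataFrame({
    'DISTANCE': rng.uniform(100, 2500, size=50),
    'CRS_ELAPSED_TIME': rng.uniform(40, 360, size=50),
})
toy_y = pd.DataFrame({'ARR_DELAY': rng.normal(5, 20, size=50)})

toy_regressor = LinearRegression().fit(toy_X, toy_y)
toy_pred = toy_regressor.predict(toy_X)

# One column per label column: toy_y is a one-column DataFrame, so the shape is (50, 1)
print(toy_pred.shape)

# ravel() flattens the predictions to a one-dimensional array of shape (50,)
flat_pred = toy_pred.ravel()
print(flat_pred.shape)
```

A flat array is often more convenient when adding predictions as a column to a DataFrame.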

When we split our data into training and test data, we stored the actual values for each row of test data in the DataFrame y_test.

We can compare the values in y_pred to the values in y_test to get a sense of how accurately our model predicted arrival delays.

In [5]:
y_test

|        | ARR_DELAY |
|--------|-----------|
| 291483 | -5.0      |
| 98997  | -12.0     |
| 23454  | -9.0      |
| 110802 | -14.0     |
| 49449  | -20.0     |
| 94944  | 14.0      |
| 160885 | -17.0     |
| 47572  | -20.0     |
| 164800 | 20.0      |
| 62578  | -9.0      |
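
Reading two separate outputs side by side is awkward, so one option is to put actual and predicted values in the same DataFrame. The sketch below is only an illustration: it uses the first three rows of the `y_test` and `y_pred` outputs shown in this notebook, and the `toy_` names and the `PREDICTED` and `ERROR` columns are hypothetical.

```python
import numpy as np
import pandas as pd

# First three rows copied from the y_test and y_pred outputs in this notebook
toy_y_test = pd.DataFrame(
    {'ARR_DELAY': [-5.0, -12.0, -9.0]},
    index=[291483, 98997, 23454],
)
toy_y_pred = np.array([[3.47739078], [5.89055919], [4.33288464]])

# Copy the actual values, then add the flattened predictions as a new column
comparison = toy_y_test.copy()
comparison['PREDICTED'] = toy_y_pred.ravel()

# Error in minutes: positive means the model predicted a later arrival than the actual one
comparison['ERROR'] = comparison['PREDICTED'] - comparison['ARR_DELAY']
print(comparison)
```

With the full data, the same pattern would use `y_test` and `y_pred` directly in place of the stand-ins.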
