Input and output features
In scikit-learn, most estimators require the input and output features to be passed as separate dataframes or arrays. By convention, X is a dataframe of input features and y is a dataframe containing only the output feature. Double brackets select specific features from a dataframe and return the result as a dataframe. Ex: X = df[['x1', 'x2']] creates a new dataframe X containing the features x1 and x2 from the original dataframe df.
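
As a minimal sketch of the bracket notation, the following compares double and single brackets on a small made-up dataframe. The name demo and its values are illustrative assumptions, not part of the cab rides dataset.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
demo = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0],
    "x2": [4.0, 5.0, 6.0],
    "y": [7.0, 8.0, 9.0],
})

# Double brackets: a list of column names returns a two-dimensional DataFrame
X_demo = demo[["x1", "x2"]]

# Single brackets: one column name returns a one-dimensional Series
s_demo = demo["y"]

print(type(X_demo).__name__, X_demo.shape)
print(type(s_demo).__name__, s_demo.shape)
```

Double brackets keep the result two-dimensional even when only one feature is selected, which is why X = rides[['distance']] is still a dataframe.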

Occasionally, the output feature should be stored as a one-dimensional array instead of a dataframe. When an array is needed, the function np.ravel() from NumPy flattens a dataframe into a one-dimensional array. The method df.head() returns the first five rows of a dataframe df by default, and df.head(n) returns the first n rows, which can be used to verify that the input and output dataframes are as expected.
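
The difference between a single-column dataframe and a flattened array can be sketched with a small made-up dataframe. The name demo and its values are illustrative assumptions, not rows from the cab rides dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, for illustration only
demo = pd.DataFrame({
    "distance": [0.5, 1.2, 3.0],
    "price": [5.0, 9.5, 16.0],
})

# Single-column dataframe: two-dimensional, shape (3, 1)
y_df = demo[["price"]]

# Flattened NumPy array: one-dimensional, shape (3,)
y_arr = np.ravel(y_df)

print(y_df.shape, y_arr.shape)

# head(2) returns only the first two rows, as a quick check of the contents
print(y_df.head(2))
```

The values are unchanged by np.ravel(); only the shape differs, going from one column of three rows to a flat array of three values.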

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LinearRegression

In [3]:
# Load the dataset and drop instances with missing values
rides = pd.read_csv("cab_rides.csv").dropna()
rides.head(2)

Unnamed: 0,distance,cab_type,time_stamp,destination,source,price,surge_multiplier,id,product_id,name
0,0.44,Lyft,1544950000000.0,North Station,Haymarket Square,5.0,1.0,424553bb-7174-41ea-aeb4-fe06d4f4b9d7,lyft_line,Shared
1,0.44,Lyft,1543280000000.0,North Station,Haymarket Square,11.0,1.0,4bd23055-6827-41c6-b23b-3c491f24e74d,lyft_premier,Lux


In [4]:
# X = dataframe of input features
X = rides[['distance']]
X.head(2)

Unnamed: 0,distance
0,0.44
1,0.44


In [5]:
# y = dataframe of the output feature
y = rides[['price']]
y.head(2)

Unnamed: 0,price
0,5.0
1,11.0


In [6]:
# Array of the output feature
np.ravel(y)

array([ 5. , 11. ,  7. , 26. ,  9. , 16.5, 10.5, 16.5,  3. , 27.5, 13.5,
        7. , 12. , 16. ,  7.5,  7.5, 26. ,  5.5, 11. , 16.5,  7. ,  3.5,
       26. , 13.5,  8.5, 15. , 20.5,  8.5,  7. , 27.5,  3.5, 11. , 19.5,
       26. , 16.5, 29.5,  9.5, 15. ,  9.5, 22. ,  9. ,  5. ,  9. , 16.5,
       26. , 13.5, 19.5,  7. ,  9. , 10.5,  3. ,  5. , 13.5, 11. , 26. ,
       16.5,  7. ,  9.5, 16.5,  7. ,  9.5, 27.5, 13. ,  9.5,  9.5,  7. ,
       13.5, 26. , 13.5,  5. , 16.5,  9.5,  9.5, 17. , 10. , 34. , 26. ,
       18.5, 11. , 11. , 11. , 36. , 27.5, 22.5, 16.5, 10.5, 32.5, 19.5,
        7. , 25. , 12. , 12. , 18.5, 32.5, 11. , 27. , 12. , 18.5, 12. ,
        8.5, 35. ,  8. , 13. ,  8. , 27. , 17. ,  8. , 13.5, 23.5, 14. ,
       34. , 26.5, 14. , 26.5, 12. , 11. , 19.5, 12. , 38.5, 23. , 30.5,
       10.5, 10.5, 17.5, 10. , 22.5,  7. , 13.5, 16.5, 27.5,  9. ,  9. ,
       30. , 16.5, 13.5, 22.5,  7. , 14. , 20.5,  9.5,  9.5,  8.5, 13.5,
       26. , 16.5, 11. , 16.5, 13.5,  5. , 27.5,  7