First, let's load the data into a pandas DataFrame.

In [None]:
import pandas as pd

spain = pd.read_csv(
    "https://saturn-public-data.s3.us-east-2.amazonaws.com/examples/dashboard/thegurus_opendata_renfe_trips_filtered.csv"
)

### Feature Engineering
Now we will do some feature engineering. First, we keep only the rows where `price` is not missing and use the `astype` method to convert the `departure`, `arrival`, and `insert_date` columns from strings to datetimes. Next, we use the `set_index` method to make `insert_date` the DataFrame index and derive the new columns `Year`, `Hour`, and `Month` from it. Then we select the predictors `X` and the target variable `y`. Finally, we use the `get_dummies` function to one-hot encode the categorical columns of `X` (`origin` and `destination`).

In [None]:
# Keep only the rows where column price is not null. (Objective 1)
# .copy() avoids a SettingWithCopyWarning when columns are assigned below.
spain = spain[spain["price"].notnull()].copy()

# Convert columns 'departure', 'arrival' and 'insert_date' to datetime. (Objective 2)
spain["departure"] = spain["departure"].astype("datetime64[ns]")
spain["arrival"] = spain["arrival"].astype("datetime64[ns]")
spain["insert_date"] = spain["insert_date"].astype("datetime64[ns]")

# Set column insert_date as the index. (Objective 3)
spain.set_index("insert_date", inplace=True)

# Create new columns 'Year', 'Hour', 'Month' from the index.
spain["Year"] = spain.index.year
spain["Hour"] = spain.index.hour
spain["Month"] = spain.index.month

# Split the data into independent variables (X) and the dependent variable (y)
X = spain[["origin", "destination", "duration", "Hour", "Month", "Year"]]
y = spain["price"]

# One-hot encode 'origin' and 'destination'. (Objective 4)
X = pd.get_dummies(X, columns=["origin", "destination"])
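
To make the steps above easier to follow, here is a minimal sketch that runs the same pipeline on a tiny hand-made table. The `toy` DataFrame and its values are made up for illustration and are not taken from the real dataset; the sketch also uses `pd.to_datetime` instead of `astype` for the datetime conversion, which should give the same result here.

```python
import pandas as pd

# Hypothetical toy data standing in for the real trips table.
toy = pd.DataFrame(
    {
        "insert_date": ["2019-04-11 21:49:46", "2019-05-02 08:15:00", "2019-05-03 10:00:00"],
        "origin": ["MADRID", "SEVILLA", "MADRID"],
        "destination": ["BARCELONA", "MADRID", "VALENCIA"],
        "price": [68.95, None, 33.5],
    }
)

# Drop the row with a missing price, then parse the timestamp strings.
toy = toy[toy["price"].notnull()].copy()
toy["insert_date"] = pd.to_datetime(toy["insert_date"])

# Use the timestamp as the index and derive calendar features from it.
toy = toy.set_index("insert_date")
toy["Hour"] = toy.index.hour
toy["Month"] = toy.index.month

# One-hot encode the categorical columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(
    toy[["origin", "destination", "Hour", "Month"]],
    columns=["origin", "destination"],
)
print(encoded.columns.tolist())
```

Each category that is still present after filtering becomes its own indicator column, named `<column>_<value>`, while `Hour` and `Month` are left as they are.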

### Model Creation with Linear Regression
To perform linear regression, we first import the `train_test_split` function and the `LinearRegression` class from scikit-learn. Then we split the data into training and test sets, create the model, fit it on the training set, and predict prices for the test set.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

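# Hold out 20% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Fit ordinary least squares on the training set
lr = LinearRegression(fit_intercept=True)
lr.fit(x_train, y_train)
# Predict prices for the held-out test set
preds = lr.predict(x_test)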
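
The cell above stops at the predictions. One possible way to check how well they match the held-out targets is sketched below, using `mean_squared_error` and `r2_score` from `sklearn.metrics`. It runs on synthetic stand-in data (`X_demo`, `y_demo`, made up for this example) so that it is self-contained, and it passes `random_state` to `train_test_split`, which the original cell does not, so that the split is reproducible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): a noisy linear relationship.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)

# random_state fixes the shuffle so the same split is produced on every run.
x_tr, x_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

model = LinearRegression(fit_intercept=True).fit(x_tr, y_tr)
demo_preds = model.predict(x_te)

# Root mean squared error is in the units of the target; R^2 is unitless.
rmse = np.sqrt(mean_squared_error(y_te, demo_preds))
r2 = r2_score(y_te, demo_preds)
print(f"RMSE: {rmse:.3f}, R^2: {r2:.3f}")
```

With the real data, the same two metric calls would take `y_test` and `preds` as arguments.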

### Model Creation with ElasticNet Regression
Now let us regularize the linear regression model by constraining its weights. We use ElasticNet, which combines the L1 (lasso) and L2 (ridge) penalties.
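
For reference, the objective that scikit-learn's `ElasticNet` minimizes can be written as follows, where $n$ is the number of training samples, $w$ the weight vector, $\alpha$ the `alpha` parameter, and $\rho$ the `l1_ratio` parameter (the symbols $n$, $w$, and $\rho$ are notation chosen here, not names from the original text):

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \rho \lVert w \rVert_1 + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
$$

With `alpha=0.1` and `l1_ratio=0.5`, the L1 term is weighted by $0.1 \times 0.5 = 0.05$ and the squared L2 term by $0.1 \times 0.5 / 2 = 0.025$.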

In [None]:
from sklearn.linear_model import ElasticNet

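# alpha sets the overall penalty strength; l1_ratio sets the mix between L1 and L2
regr = ElasticNet(alpha=0.1, l1_ratio=0.5)
result_enet = regr.fit(x_train, y_train)
# Note: this overwrites the linear regression predictions stored in preds
preds = result_enet.predict(x_test)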
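
To illustrate what "constraining weights" means in practice, the sketch below fits both models on synthetic data and compares the size of their coefficients. The data (`X_demo`, `y_demo`) is made up for this example, with only the first two of five features actually influencing the target; the hyperparameters match the cell above.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression

# Synthetic stand-in data (hypothetical): only features 0 and 1 matter.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X_demo, y_demo)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X_demo, y_demo)

# The penalty pulls ElasticNet's coefficients toward zero relative to plain least squares.
ols_l1 = np.abs(ols.coef_).sum()
enet_l1 = np.abs(enet.coef_).sum()
n_zero = int(np.sum(enet.coef_ == 0.0))

print("OLS coefficients:       ", np.round(ols.coef_, 3))
print("ElasticNet coefficients:", np.round(enet.coef_, 3))
print("Coefficients set exactly to zero by ElasticNet:", n_zero)
```

The L1 part of the penalty can drive small coefficients exactly to zero, while the L2 part shrinks the remaining ones; how strongly depends on `alpha` and `l1_ratio`.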