<h3>Making necessary imports</h3>

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn import metrics
from tensorflow.keras.layers import Dense, InputLayer
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
import plotly.express as px
import plotly.figure_factory as ff

<h3> Reading and briefly exploring the data </h3>

In [None]:
df=pd.read_csv('/kaggle/input/car-price-prediction/CarPrice_Assignment.csv', index_col='car_ID')
df.head(10)

In [None]:
df.info()

In [None]:
df.describe()

<h3> Data processing </h3>

In [None]:
df.isnull().sum()

As we can see, there is no missing data.

In [None]:
df.duplicated(subset=df.columns).sum()

There are no duplicate rows either. The next steps are:

1) Drop unnecessary columns <br>
2) Encode columns with string data as integers <br>
3) Perform feature scaling <br>
4) Split the data into training and test sets <br>

In [None]:
# Columns that hold string categories and need integer codes.
categorical_cols = ['fueltype', 'aspiration', 'doornumber', 'carbody', 'drivewheel',
                    'enginelocation', 'enginetype', 'cylindernumber', 'fuelsystem']

for col in categorical_cols:
    # A fresh encoder per column; codes follow the sorted order of the labels.
    df[col] = LabelEncoder().fit_transform(df[col])
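
A side note on this step: `LabelEncoder` assigns integer codes in sorted label order, which implies an ordering between categories that a linear model or a neural network may treat as meaningful. One-hot encoding is a common alternative for unordered categories. Below is a minimal sketch of the difference on a small made-up frame; the rows in `toy` are illustrative and not taken from the dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows; column names mirror the dataset, values are made up.
toy = pd.DataFrame({
    "fueltype": ["gas", "diesel", "gas", "gas"],
    "carbody": ["sedan", "wagon", "hatchback", "sedan"],
})

# Label encoding: one integer per category, assigned in alphabetical order.
label_encoded = toy.apply(lambda col: LabelEncoder().fit_transform(col))
print(label_encoded)

# One-hot encoding: one 0/1 column per category, no implied order.
one_hot = pd.get_dummies(toy, columns=["fueltype", "carbody"], dtype=int)
print(one_hot.columns.tolist())
```

Tree-based models are usually less sensitive to this choice than linear models, so whether one-hot encoding helps here is something to check rather than assume.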

In [None]:
del df['CarName']

In [None]:
X = df[df.columns[:-1]]
y = np.array(df['price'])

In [None]:
scaler = MinMaxScaler(copy=True, feature_range=(0, 1))

X = scaler.fit_transform(X)

In [None]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=123)
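
Note that the scaler above is fitted on all rows before the split, so the test rows influence the scaling parameters. A common way to avoid this is to split first and fit the scaler on the training rows only. The sketch below shows the pattern on synthetic data; `X_demo`, `y_demo` and the other names are placeholders, not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the feature matrix and prices (made-up numbers).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = rng.normal(size=200)

# Split first, so the test rows stay unseen during preprocessing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=123
)

# Fit the scaler on the training rows only, then reuse it for the test rows.
demo_scaler = MinMaxScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)

# Training features span [0, 1]; test features may fall slightly outside.
print(X_tr_scaled.min(), X_tr_scaled.max())
print(X_te_scaled.min(), X_te_scaled.max())
```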

<h3> Trying different regression techniques</h3>

Simple Linear Regression

In [None]:
model1 = LinearRegression()
model1.fit(X_train,y_train)

print("Score: ", round(model1.score(X_test,y_test)*100,3), "%")
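
For reference, the value printed as "Score" is the coefficient of determination R² returned by `score`, shown as a percentage. It is defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. A small sketch on made-up prices (the numbers are illustrative only) confirms that the manual formula agrees with `r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up actual and predicted prices, for illustration only.
y_true = np.array([10000.0, 15000.0, 20000.0, 25000.0])
y_pred = np.array([11000.0, 14000.0, 21000.0, 24000.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))
```

A score of 100% would mean perfect predictions, while a model that always predicts the mean price scores 0%.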

Decision Tree Regressor

In [None]:
model2 = DecisionTreeRegressor()
model2.fit(X_train, y_train)

print("Score: ", round(model2.score(X_test, y_test)*100,3),"%") #better than previous one

Random Forest Regressor

In [None]:
model3 = RandomForestRegressor(max_depth=7)
model3.fit(X_train, y_train)

print("Score: ", round(model3.score(X_test, y_test)*100,3),"%") #even better
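
With a dataset this small, a single train/test split can make the ranking of models depend on which rows end up in the test set. One possible check is k-fold cross-validation. The sketch below runs on synthetic data; `X_demo`, `y_demo`, the fold count and `n_estimators=50` are assumptions made for illustration, not settings from this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the scaled features and prices.
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(200, 5))
y_demo = (3.0 * X_demo[:, 0] + np.sin(6.0 * X_demo[:, 1])
          + rng.normal(scale=0.1, size=200))

candidates = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(n_estimators=50, max_depth=7, random_state=0),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = {}
for name, model in candidates.items():
    # The default scoring for regressors is R^2, the same metric as `score`.
    cv_scores[name] = cross_val_score(model, X_demo, y_demo, cv=cv)
    print(f"{name}: mean R^2 = {cv_scores[name].mean():.3f} "
          f"(std {cv_scores[name].std():.3f})")
```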

<h4> Neural Networks </h4>

In [None]:
model4 = Sequential()
# Input width is taken from the data (23 features after preprocessing).
model4.add(InputLayer(input_shape=(X_train.shape[1],)))

# Hidden layers
model4.add(Dense(128,activation="relu",kernel_initializer="normal"))
model4.add(Dense(128,activation="relu",kernel_initializer="normal"))
model4.add(Dense(64,activation="relu",kernel_initializer="normal"))
model4.add(Dense(32,activation="relu",kernel_initializer="normal"))

# Output layer: a single linear unit for the predicted price
model4.add(Dense(1,activation="linear",kernel_initializer="normal"))

optim = Adam(
    learning_rate=0.00055,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
    amsgrad=False,
    name="Adam",
)

# `metrics` is passed as a list, which works across Keras versions.
model4.compile(loss="mse",optimizer=optim,metrics=["mae"])

In [None]:
model4.fit(X_train,y_train,batch_size=16,epochs=1000,validation_data=(X_test,y_test))

In [None]:
predictions = model4.predict(X_test)

print("Score: ", round(metrics.r2_score(y_test, predictions)*100, 3), "%")

As we can see, the mean absolute error oscillates (mainly) between 1200 and 1300; most probably there are many local minima in which the network gets stuck. I am pretty sure the model can be improved with some tuning, but for now let me leave the current results as they are.
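
One tuning option that might help with the oscillation is early stopping, which halts training once the validation loss stops improving and restores the best weights seen so far. The sketch below uses Keras's `EarlyStopping` callback on synthetic data with a much smaller network; the data, layer sizes and `patience=5` are assumptions chosen for illustration, not tuned values.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Synthetic data standing in for the scaled features and prices.
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(200, 5)).astype("float32")
y_demo = (X_demo @ np.arange(1, 6)).astype("float32")

demo_model = Sequential([
    Input(shape=(5,)),
    Dense(16, activation="relu"),
    Dense(1, activation="linear"),
])
demo_model.compile(loss="mse", optimizer="adam", metrics=["mae"])

# Stop once validation loss has not improved for `patience` epochs,
# and roll back to the weights from the best epoch.
early_stop = EarlyStopping(monitor="val_loss", patience=5,
                           restore_best_weights=True)

history = demo_model.fit(
    X_demo, y_demo,
    validation_split=0.25,
    batch_size=16,
    epochs=50,
    callbacks=[early_stop],
    verbose=0,
)
print("epochs run:", len(history.history["loss"]))
```

The same callback could be passed to `model4.fit` through its `callbacks` argument, ideally with a validation set that is separate from the final test set.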

<h4> Sample Demonstration </h4>

As an example, let's take the Random Forest Regressor and Neural Network models and compare their predictions for a random test sample.

In [None]:
random_index = np.random.randint(0,X_test.shape[0])


print("Prediction made by Random Forest Regressor:")
print("\t\tPredicted price:  ", np.round(model3.predict(X_test[random_index:random_index+1])[0],3))
print("\t\tActual price:  ", np.round(y_test[random_index],3))

print("Prediction made by Neural Networks:")
print("\t\tPredicted price:  ", np.round(model4.predict(X_test[random_index:random_index+1])[0][0],3))
print("\t\tActual price:  ", np.round(y_test[random_index],3))

<p style="font-family:verdana;font-size: 120%"> In conclusion, the results show that for this particular problem the Random Forest Regressor predicts car prices best; however, I still want to believe that we can achieve the same results (maybe even better ones) with neural networks. I used a simple network here, but it can be extended and tuned. In any case, what makes the Random Forest algorithm so powerful here is that it does a great job at price prediction and, unlike neural networks, it is lightweight, so on a dataset of this size it trains faster. </p>