Predict automobile prices using 1985 Ward's Automotive Yearbook data. Analyze factors influencing pricing, including brand, performance, and features. Explore correlations, visualize insights, and employ machine learning for accurate predictions.

marwahfaraj/Automobile_Price_Prediction

Automobile Price Prediction

Car


Marwah Faraj

LinkedIn | GitHub | Email: marwah.faraj777@gmail.com

Automobile Dataset | Project Presentation


Table of Contents

  1. Background and Motivation

  2. Data

  3. Exploration

  4. Visualization

  5. Machine Learning

  6. Conclusions

  7. Further Study

Background and Motivation

This dataset contains information from the 1985 Ward's Automotive Yearbook, focusing on used cars. The aim is to analyze the factors that affect car prices and to use them to estimate price.


Data

The dataset is drawn from the 1985 Ward's Automotive Yearbook and comprises three types of entities:

  1. Specifications of autos in terms of various characteristics.
  2. Assigned insurance risk rating.
  3. Normalized losses in use compared to other cars.

Pipeline

The uncompressed CSV data is 41.8+ KB with 5530 records. Analysis is performed using pandas, numpy, scipy, matplotlib, and seaborn libraries.


Exploration

The dataset contains no explicit null values, but summary statistics are missing for some columns that should be numeric. Further exploration revealed '?' symbols used in place of missing values, columns stored with unexpected data types, and numerical values written out as words.
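The cleanup these findings call for can be sketched with pandas. This is a minimal illustration, not the project's actual code: the sample rows, the column names (`horsepower`, `num_of_cylinders`, `price`), and the word-to-number mapping are all assumptions made for the example, and it assumes that values "presented alphabetically" means numbers spelled out as words.

```python
import numpy as np
import pandas as pd

# Tiny hand-made sample that mimics the issues described above.
# Column names and values are illustrative, not taken from the real file.
raw = pd.DataFrame(
    {
        "make": ["audi", "bmw", "mazda", "toyota"],
        "horsepower": ["102", "?", "68", "62"],
        "num_of_cylinders": ["four", "six", "four", "four"],
        "price": ["13950", "16430", "?", "5348"],
    }
)

# 1. Turn the '?' placeholder into a real missing value.
clean = raw.replace("?", np.nan)

# 2. Convert columns that were read as text into numbers.
for col in ["horsepower", "price"]:
    clean[col] = pd.to_numeric(clean[col])

# 3. Map numbers written as words to integers (assumed mapping).
word_to_number = {
    "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "eight": 8, "twelve": 12,
}
clean["num_of_cylinders"] = clean["num_of_cylinders"].map(word_to_number)

# Missing values are now visible to pandas.
missing_per_column = clean.isna().sum()
print(clean.dtypes)
print(missing_per_column)
```

Once the '?' placeholders are converted, `describe()` reports statistics for the repaired numeric columns, and the missing entries can be dropped or imputed as a separate decision.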

Visualization

Price Estimation

Factors affecting car price include brand, performance, and features.

  • Distribution of car prices: Price Distribution

  • Car count by manufacturer: Car Count

  • Distribution of Japanese cars: Japanese Cars

The Make Factor

  • Median prices by manufacturer: Median Price

The Performance Factor

  • Correlation between engine size, curb weight, cylinders, and horsepower: Correlation Map

  • Price correlation with horsepower and engine size: Price vs. Horsepower and Engine Size
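A correlation map like the one referenced above can be computed with `DataFrame.corr`. The sketch below uses a handful of made-up rows with assumed column names; it shows the mechanics only and does not reproduce the project's figures.

```python
import pandas as pd

# Hypothetical rows; the values are illustrative only.
cars = pd.DataFrame(
    {
        "engine_size": [90, 97, 109, 130, 152, 209],
        "horsepower": [68, 69, 102, 111, 154, 182],
        "curb_weight": [1890, 1950, 2337, 2548, 2823, 3230],
        "price": [5200, 6300, 13950, 13495, 16500, 30760],
    }
)

# Pairwise Pearson correlation between all numeric columns.
corr = cars.corr(method="pearson")

# Correlation of each performance feature with price, strongest first.
price_corr = corr["price"].drop("price").sort_values(ascending=False)
print(price_corr)
```

Passing `corr` to `seaborn.heatmap(corr, annot=True)` would render the matrix as an annotated heatmap.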

The Feature Factor

  • Price correlation with drive wheel type: Drive Wheel Correlation

  • Price correlation with fuel type: Fuel Type

Q: Are expensive cars safer?

Hypothesis testing:

  • Null Hypothesis (H0): The safety rate of expensive cars equals that of cheap cars.
  • Alternative Hypothesis (Ha): The safety rate of expensive cars differs from that of cheap cars.

Using the Mann-Whitney U Test, with a low car price range of $6,298, the result was:

  • p-value = 0.037

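Because the p-value is below the conventional 0.05 significance level, the null hypothesis is rejected: the safety rates of expensive and cheap cars differ, and the analysis concludes that expensive cars are safer.

High vs. Low Price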
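The test itself can be run with `scipy.stats.mannwhitneyu`. The following is a sketch on synthetic data, not the project's analysis: the `risk` score, the median price split, and the 0.05 significance level are assumptions chosen for the example, and the resulting p-value has no relation to the 0.037 reported above.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Synthetic data: prices plus a hypothetical risk score in which
# pricier cars tend to score lower (safer), with random noise.
prices = rng.uniform(5000, 40000, size=200)
risk = 3 - 4 * (prices - 5000) / 35000 + rng.normal(0, 1.0, size=200)

# Split into cheap and expensive groups at an assumed cutoff (the median).
threshold = np.median(prices)
cheap = risk[prices <= threshold]
expensive = risk[prices > threshold]

# Two-sided test, matching H0 "equal" versus Ha "differs".
stat, p_value = mannwhitneyu(expensive, cheap, alternative="two-sided")

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
print(f"U = {stat:.1f}, p = {p_value:.4g}, reject H0: {reject_null}")
```

A two-sided test only establishes that the groups differ. The direction of the difference comes from comparing the groups directly, for example by their medians.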

Machine Learning

Among the machine learning algorithms evaluated, Random Forest yielded the best R^2 score.

Score Table
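As a rough illustration of how a Random Forest regressor is fitted and scored with R^2, the sketch below uses scikit-learn on synthetic data. The features, the price formula, and the hyperparameters are all assumptions for the example; the score it prints is unrelated to the project's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 300

# Synthetic features loosely shaped like car specifications.
engine_size = rng.uniform(60, 330, n)
horsepower = engine_size * 0.8 + rng.normal(0, 10, n)
curb_weight = 1500 + engine_size * 8 + rng.normal(0, 150, n)

# Made-up price formula with noise, used only to have a target.
price = (
    40 * engine_size + 60 * horsepower + 3 * curb_weight
    + rng.normal(0, 1500, n)
)

X = np.column_stack([engine_size, horsepower, curb_weight])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.25, random_state=42
)

# Fit the forest on the training split and score on held-out data.
model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
print(f"R^2 on the test split: {r2:.2f}")
```

Scoring on a held-out split, as here, keeps the R^2 value from being inflated by the model memorizing its training rows.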

Conclusions

  • Expensive cars tend to be safer.
  • Price is strongly related to brand, performance, and specifications.
    • Brand: Jaguar is the most expensive.
    • Performance: Increasing engine size and horsepower correlates with higher prices.
    • Specifications: Rear-wheel drive cars are more expensive, and curb weight influences price, especially for diesel cars.
  • Random Forest achieved the best fit among the algorithms tested, with an R^2 score of 0.87.

Further Study

  • Apply deep learning algorithms for price prediction.
  • Explore price differences between Japanese and non-Japanese cars.
