LinkedIn | GitHub | Email: marwah.faraj777@gmail.com
Automobile Dataset | Project Presentation
This project analyzes a dataset drawn from the 1985 Ward's Automotive Yearbook, focusing on used cars. The aim is to identify the factors that affect price estimation.
The dataset comprises three types of entities:
- Specifications of autos in terms of various characteristics.
- Assigned insurance risk rating.
- Normalized losses in use compared to other cars.
The uncompressed CSV data is 41.8+ KB with 5530 records. Analysis is performed using pandas, numpy, scipy, matplotlib, and seaborn libraries.
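The dataset contains no explicit null values, yet summary statistics are missing for some columns. Further exploration revealed the likely cause: missing entries appear as '?' symbols, several columns have unexpected data types, and some numerical values are written out as words rather than digits.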
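The cleanup these findings call for can be sketched with pandas. This is a minimal illustration under stated assumptions, not the project's actual code: the sample rows, the column names (`num_doors`, `horsepower`, `price`), and the `WORD_TO_NUMBER` mapping are all made up for the example.

```python
import io

import numpy as np
import pandas as pd

# Tiny made-up sample that mimics the issues described; NOT the real data.
RAW_CSV = """make,num_doors,horsepower,price
alfa,two,111,13495
audi,four,?,17450
bmw,?,101,?
"""

# Hypothetical mapping for numbers that are spelled out as words.
WORD_TO_NUMBER = {"two": 2, "four": 4}


def clean(df):
    """Return a copy with '?' treated as missing and numeric columns typed."""
    df = df.replace("?", np.nan)  # '?' placeholders become real missing values
    df["num_doors"] = df["num_doors"].map(WORD_TO_NUMBER)  # words -> numbers
    for col in ["horsepower", "price"]:
        # Text columns become numeric; anything unparseable turns into NaN.
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


raw = pd.read_csv(io.StringIO(RAW_CSV))
cleaned = clean(raw)

print("nulls before cleaning:", int(raw.isna().sum().sum()))
print("nulls after cleaning:")
print(cleaned.isna().sum())
```

If the placeholder is known up front, passing `na_values="?"` to `pd.read_csv` achieves the first step at load time.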
Factors affecting car price include brand, performance, and features.
Hypothesis testing:
- Null Hypothesis (H0): The safety rate of expensive cars equals that of cheap cars.
- Alternative Hypothesis (Ha): The safety rate of expensive cars differs from that of cheap cars.
Applying the Mann-Whitney U test, with the low price range set at $6,298, gave:
- p-value = 0.037
Because the p-value is below the conventional 0.05 significance level, the null hypothesis is rejected, indicating that expensive cars are safer.
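A test of this kind can be sketched with `scipy.stats.mannwhitneyu`. The data below are synthetic, and the helper `compare_groups`, its `threshold` parameter, and the 0.05 significance level are assumptions made for illustration; how the project actually split cars into cheap and expensive groups is not shown in this summary.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Synthetic stand-ins (made up): car prices and an integer risk rating
# in the range -3..3, where lower is assumed to mean safer.
prices = rng.uniform(5000, 45000, size=200)
risk = rng.integers(-3, 4, size=200)


def compare_groups(prices, ratings, threshold, alpha=0.05):
    """Two-sided Mann-Whitney U test between cheap and expensive cars.

    `threshold` is a hypothetical price cutoff: cars at or below it are
    'cheap', the rest are 'expensive'.
    """
    cheap = ratings[prices <= threshold]
    expensive = ratings[prices > threshold]
    result = mannwhitneyu(expensive, cheap, alternative="two-sided")
    return result.statistic, result.pvalue, result.pvalue < alpha


# The median is used as the cutoff only so both synthetic groups are equal in size.
stat, p_value, reject = compare_groups(prices, risk, threshold=np.median(prices))
print(f"U = {stat:.1f}, p-value = {p_value:.3f}, reject H0: {reject}")
```

A two-sided test only establishes that the two groups differ. Which group is safer has to be read from the data, for example by comparing the median rating of each group.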
Among the machine learning algorithms tested, Random Forest yielded the best R^2 score.
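A comparison of regressors by R^2 might look like the sketch below. It assumes scikit-learn, which this summary does not name. The features, the synthetic price formula, and the linear regression baseline are all invented for the example, so the ranking it prints says nothing about the project's real results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 300

# Synthetic features (made up): engine size, horsepower, curb weight.
engine_size = rng.uniform(60, 330, n)
horsepower = rng.uniform(50, 290, n)
curb_weight = rng.uniform(1500, 4100, n)
X = np.column_stack([engine_size, horsepower, curb_weight])

# Synthetic price with noise; the coefficients are arbitrary.
y = 40 * engine_size + 60 * horsepower + 3 * curb_weight + rng.normal(0, 1500, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Candidate models; the baseline choice here is an assumption.
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
print("best model on this synthetic data:", best)
```

Scoring on a held-out split, rather than on the training data, keeps the R^2 values comparable across models that overfit to different degrees.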
- Expensive cars tend to be safer.
- Price is strongly related to brand, performance, and specifications.
- Brand: Jaguar is the most expensive.
- Performance: Increasing engine size and horsepower correlates with higher prices.
- Specifications: Rear-wheel drive cars are more expensive, and curb weight influences price, especially with diesel fuel types.
- Random Forest achieved the best fit among the algorithms tested, with an R^2 score of 0.87.
- Apply deep learning algorithms for price prediction.
- Explore price differences between Japanese and non-Japanese cars.