# **Title**: **1985 Auto Imports Database**

# **Problem Statement**

* Prepare a complete data analysis report on the given data.

* Build a model that predicts the price of a car from the available independent variables. The model should help management understand how prices vary with the independent variables, so that they can adjust the design of the cars, the business strategy, etc. to meet certain price levels.




# **Source Information:**

   * Creator/Donor: Jeffrey C. Schlimmer (Jeffrey.Schlimmer@a.gp.cs.cmu.edu)
   * Date: 19 May 1987
   * Sources:
     1) 1985 Model Import Car and Truck Specifications, 1985 Ward's
        Automotive Yearbook.
     2) Personal Auto Manuals, Insurance Services Office, 160 Water
        Street, New York, NY 10038
     3) Insurance Collision Report, Insurance Institute for Highway
        Safety, Watergate 600, Washington, DC 20037


# **Description**

This data set consists of three types of entities:

(a) the specification of an auto in terms of various characteristics,

(b) its assigned insurance risk rating,

(c) its normalized losses in use as compared to other cars.

The second rating corresponds to the degree to which the auto is more risky than its price indicates. Cars are initially assigned a risk factor symbol associated with their price. Then, if a car is more risky (or less), this symbol is adjusted by moving it up (or down) the scale. Actuaries call this process "symboling". A value of +3 indicates that the auto is risky, -3 that it is probably pretty safe.

The third factor is the relative average loss payment per insured vehicle year. This value is normalized for all autos within a particular size classification (two-door small, station wagons, sports/speciality, etc.), and represents the average loss per car per year.

-- Note: Several of the attributes in the database could be used as a
            "class" attribute.

* Number of Instances: 205

* Number of Attributes: 26 total
   -- 15 continuous
   -- 1 integer
   -- 10 nominal

* Attribute Information:     
    
  1. symboling:                -3, -2, -1, 0, 1, 2, 3.
  2. normalized-losses:        continuous from 65 to 256.
  3. make:                     alfa-romero, audi, bmw, chevrolet, dodge, honda, isuzu, jaguar, mazda, mercedes-benz, mercury,
  mitsubishi, nissan, peugot, plymouth, porsche, renault, saab, subaru, toyota, volkswagen, volvo
  4. fuel-type:                diesel, gas.
  5. aspiration:               std, turbo.
  6. num-of-doors:             four, two.
  7. body-style:               hardtop, wagon, sedan, hatchback, convertible.
  8. drive-wheels:             4wd, fwd, rwd.
  9. engine-location:          front, rear.
 10. wheel-base:               continuous from 86.6 to 120.9.
 11. length:                   continuous from 141.1 to 208.1.
 12. width:                    continuous from 60.3 to 72.3.
 13. height:                   continuous from 47.8 to 59.8.
 14. curb-weight:              continuous from 1488 to 4066.
 15. engine-type:              dohc, dohcv, l, ohc, ohcf, ohcv, rotor.
 16. num-of-cylinders:         eight, five, four, six, three, twelve, two.
 17. engine-size:              continuous from 61 to 326.
 18. fuel-system:              1bbl, 2bbl, 4bbl, idi, mfi, mpfi, spdi, spfi.
 19. bore:                     continuous from 2.54 to 3.94.
 20. stroke:                   continuous from 2.07 to 4.17.
 21. compression-ratio:        continuous from 7 to 23.
 22. horsepower:               continuous from 48 to 288.
 23. peak-rpm:                 continuous from 4150 to 6600.
 24. city-mpg:                 continuous from 13 to 49.
 25. highway-mpg:              continuous from 16 to 54.
 26. price:                    continuous from 5118 to 45400.
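The documented ranges above can also serve as a sanity check once the data is loaded. A minimal sketch, assuming the column names match the attribute list; `DOCUMENTED_RANGES` and `check_ranges` are hypothetical names, only a subset of the continuous columns is typed in, and the sample rows are made up:

```python
import pandas as pd

# A subset of the continuous attributes with their documented (min, max) ranges
DOCUMENTED_RANGES = {
    "wheel-base": (86.6, 120.9),
    "length": (141.1, 208.1),
    "engine-size": (61, 326),
    "horsepower": (48, 288),
    "price": (5118, 45400),
}

def check_ranges(df: pd.DataFrame, ranges: dict) -> dict:
    """Count, per column, the non-missing values that fall outside the documented range."""
    out_of_range = {}
    for column, (low, high) in ranges.items():
        if column not in df.columns:
            continue  # skip columns that are not present in this frame
        values = pd.to_numeric(df[column], errors="coerce")  # text such as '?' becomes NaN
        mask = values.notna() & ~values.between(low, high)
        out_of_range[column] = int(mask.sum())
    return out_of_range

# Made-up sample rows, only for illustration
sample = pd.DataFrame({"horsepower": ["111", "?", "300"], "price": [13495, 16500, 4000]})
print(check_ranges(sample, DOCUMENTED_RANGES))  # {'horsepower': 1, 'price': 1}
```

Missing values are ignored here on purpose, so the check only flags values that are present but implausible.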

## **Task 1** : Prepare a complete data analysis report on the given data.

# **Import Basic libraries**

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt


# **Import Dataset as df**

In [2]:
# The file has no header row, so read every line as data
df = pd.read_csv("/content/auto imports1.csv", header=None)

In [None]:
# The raw file has no header row, so name the 26 columns after the attribute list above
df.columns = ["symboling", "normalized-losses", "make", "fuel-type","aspiration","num-of-doors","body-style","drive-wheels","engine-location","wheel-base","length","width","height","curb-weight","engine-type","num-of-cylinders","engine-size","fuel-system","bore","stroke","compression-ratio","horsepower","peak-rpm","city-mpg","highway-mpg","price"]
# Save a copy of the data that includes the header row
df.to_csv('auto_imports.csv', index=False)

# **Domain Analysis**



*   Price is the target.
*   The other attributes (make, fuel type, aspiration, body style, drive wheels, dimensions, engine characteristics, fuel economy, etc.) are the independent variables.

*   Attributes such as make and body style can influence price; luxury brands tend to command higher prices. The dataset has no model or year column, since all cars are 1985 imports.
*   The data shows how price relates to a car's specifications, so management can adjust the design of the cars, the business strategy, etc. to meet certain price levels.

**Symboling**: It corresponds to a car's insurance risk rating, from -3 (probably pretty safe) to +3 (risky).

**Normalized Losses**: It is the relative average loss payment per insured vehicle year.

**Make:** It refers to the brand of the vehicle.

**Aspiration:** Whether the engine is naturally aspirated (`std`), meaning its air intake depends solely on atmospheric pressure, or turbocharged (`turbo`), meaning a turbocharger forces extra air into the cylinders.

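**FuelSystem:** The fuel system is the combination of parts that delivers fuel to the engine. The values in this dataset include carburettor types (1bbl, 2bbl, 4bbl) and fuel-injection types such as mpfi (multi-point fuel injection).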

**WheelBase:** A car's wheelbase is the distance between the centres of the front and rear wheels.

**BodyStyle:** A car's body style refers to the shape and layout of its body. This dataset has five: hardtop, wagon, sedan, hatchback and convertible.

**HorsePower:** Horsepower measures the engine's power output, i.e. the rate at which the engine can do work.

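**PeakRpm:** The engine speed, in revolutions per minute, at which the engine delivers its peak power. In this dataset it ranges from 4150 to 6600 rpm, well above the roughly 1500 to 2000 rpm at which cars typically run on the highway.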

**Compression Ratio:** The compression ratio (CR) is the ratio of the cylinder's total volume when the piston is at the bottom of its stroke to the volume that remains (the clearance volume) when the piston is at the top.
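Written as a formula, with $V_d$ the volume swept by the piston in one cylinder and $V_c$ the clearance volume left above the piston at the top of its stroke:

$$
\mathrm{CR} = \frac{V_d + V_c}{V_c}
$$

For example, a cylinder with $V_d = 450\ \text{cm}^3$ and $V_c = 50\ \text{cm}^3$ has $\mathrm{CR} = 500 / 50 = 10$, which falls inside the 7 to 23 range listed for `compression-ratio`. The volumes here are illustrative, not taken from the dataset.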

**Bore and Stroke:** An engine's bore is the diameter of each cylinder, while the stroke is the distance the piston travels within the cylinder.
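Bore and stroke together determine engine displacement: each cylinder sweeps a cylinder-shaped volume whose diameter is the bore and whose height is the stroke. A minimal sketch, assuming bore and stroke are in inches and `engine-size` is in cubic inches (the source does not state units; the value ranges suggest it). `displacement` is a hypothetical helper:

```python
import math

def displacement(bore: float, stroke: float, cylinders: int) -> float:
    """Total swept volume = (pi / 4) * bore^2 * stroke * number of cylinders."""
    per_cylinder = math.pi / 4 * bore ** 2 * stroke
    return per_cylinder * cylinders

# Bore 3.19, stroke 3.40, five cylinders (illustrative values within the documented ranges)
print(round(displacement(3.19, 3.40, 5), 1))  # 135.9
```

Comparing this value with the `engine-size` column is one way to check the unit assumption, and it also suggests that bore, stroke, number of cylinders and engine size are strongly related predictors.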

**City mpg and Highway mpg:** Fuel economy in miles per gallon. City mpg is the average a car gets in city conditions, with frequent stopping and starting at lower speeds; highway mpg is the average it gets on an open stretch of road without stopping or starting, typically at a higher speed.







# **Basic Checks**

In [None]:
df.head() # first 5 rows

## Insights
* In this dataset, price is the outcome (target) column.
* Price is treated as dependent on the other features available in the dataset.
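Treating price as the outcome usually means separating it from the independent variables before modelling. A short sketch on made-up sample rows standing in for `df`; the names `X` and `y` are only a common convention:

```python
import pandas as pd

# Made-up sample rows standing in for df
sample = pd.DataFrame({
    "make": ["audi", "bmw"],
    "horsepower": [102, 101],
    "price": [13950, 16430],
})

y = sample["price"]                  # target (outcome)
X = sample.drop(columns="price")     # independent variables
print(X.columns.tolist(), y.name)
```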

In [None]:
# shape
df.shape

In [None]:
# Columns
df.columns

In [None]:
df.info()

In [None]:
# data types
df.dtypes

In [None]:
df.describe().T
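By default `describe()` summarizes only the numeric columns, so nominal attributes such as make, fuel-type or body-style do not appear in that table. A short sketch that summarizes the non-numeric columns instead by excluding numeric dtypes; the tiny DataFrame is made-up sample data standing in for `df`:

```python
import numpy as np
import pandas as pd

# Made-up sample rows standing in for df
sample = pd.DataFrame({
    "make": ["toyota", "toyota", "bmw"],
    "fuel-type": ["gas", "diesel", "gas"],
    "price": [7898, 10698, 16430],
})

# count, unique, top (most frequent value) and freq for each non-numeric column
summary = sample.describe(exclude=[np.number]).T
print(summary)
```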

## Insights
* The maximum vehicle price is 45400.
* The minimum vehicle price is 5118.


In [None]:
# check for missing values
df.isnull().sum()
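In the original UCI file, missing values are written as `?` rather than left blank, so `isnull()` can report zero missing values even though data is missing, and columns such as `normalized-losses` or `horsepower` get read as text. A minimal sketch, assuming this CSV keeps that `?` convention; the sample frame is made up:

```python
import numpy as np
import pandas as pd

# Made-up sample in which '?' marks missing values, as in the UCI file
sample = pd.DataFrame({
    "normalized-losses": ["?", "164", "158"],
    "horsepower": ["111", "102", "?"],
    "make": ["alfa-romero", "audi", "audi"],
})

print(sample.isnull().sum().sum())  # 0: '?' is not counted as missing

# Turn '?' into NaN, then convert the affected columns to numbers
cleaned = sample.replace("?", np.nan)
for column in ["normalized-losses", "horsepower"]:
    cleaned[column] = pd.to_numeric(cleaned[column])

print(cleaned.isnull().sum())
```

The same effect can be had at load time by passing `na_values="?"` to `pd.read_csv`.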

# Exploratory Data Analysis
* Univariate Analysis
* Bivariate Analysis
* Multivariate Analysis

# Univariate

In [None]:
# Check the distribution of every column (the 6x5 grid has room for all 26 columns, including price)
plt.figure(figsize=(20, 20), facecolor="white")
for plotnumber, column in enumerate(df.columns, start=1):
    ax = plt.subplot(6, 5, plotnumber)
    sns.histplot(x=df[column], ax=ax)
    ax.set_xlabel(column, fontsize=8)
    ax.set_ylabel("count", fontsize=8)  # histograms show counts, not price
plt.tight_layout()
plt.show()

## Insights



*   Most cars have an insurance risk rating (symboling) between -2 and +2.

*   About 82% of the cars have an average level of normalized losses (loss payment per insured vehicle year).

*   Toyota is the most common make (more than 15% of the cars), followed by Mazda and Nissan; Mitsubishi, Honda, Volkswagen, Volvo, Peugot and Subaru are each close to 5%, while Audi, Mercedes-Benz, BMW and Jaguar are each below 4%.

*   Four-door cars are the most common; two-door cars are slightly fewer.
*   Most cars run on gas; diesel cars are far fewer.

*   Standard (naturally aspirated) engines are much more common than turbocharged ones.

*   Front-wheel drive (fwd) cars outnumber rear-wheel drive (rwd) cars.

*   Almost all cars have the engine at the front; rear-engine cars are rare.

*   Most wheel-base values (distance from the front-wheel centre to the rear-wheel centre) lie between 95 and 102.

*   Most car lengths lie between 160 and 180, and most heights between 50 and 55.

*   OHC is by far the most common engine type; rotor, ohcf, dohc and the others are much less frequent.

*   Four-cylinder engines are the most common.
*   Most engine sizes lie between 100 and 150.

*   MPFI is the most common fuel system.

*   Peak rpm values are concentrated between 4800 and 5200.
*   Most compression-ratio values (more than 60% of the cars) sit in one narrow band, with a small separate group at much higher ratios.

*   Bore, stroke and horsepower have irregular, multi-peaked distributions.
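Percentage statements such as the share of Toyota cars can be read directly from the data instead of estimated from bar heights. A minimal sketch with `value_counts(normalize=True)`; the sample frame is made up, so its percentages are not the dataset's:

```python
import pandas as pd

# Made-up sample standing in for df
sample = pd.DataFrame({"make": ["toyota", "toyota", "nissan", "bmw",
                                "toyota", "mazda", "nissan", "audi"]})

# Share of each make as a percentage of all rows, largest first
shares = sample["make"].value_counts(normalize=True).mul(100).round(1)
print(shares)
```

On the full data, `df["make"].value_counts(normalize=True)` gives the shares behind the make insight, and the same call works for fuel-type, aspiration, num-of-doors and the other nominal columns.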









In [None]:
df.head()