{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "2e84d979-6391-0006-aa01-8a741d5f7f2b"
      },
      "source": [
        "<h1><center>Exploratory Data Analysis using Python</center></h1>\n",
        "<h1><center>for</center></h1>\n",
        "![](http://www.indiareviewchannel.com/attachments/categories/productcategory/1/3-Automobile-Reviews-in-India.jpg)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "5c4a6aba-fdcc-9ef3-d7b2-46512e54f273"
      },
      "outputs": [],
      "source": [
        "# import libraries\n",
        "import pandas as pd\n",
        "import numpy as np"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "6f081a5b-f0c8-dff0-a08b-47d1f7fe52e8"
      },
      "source": [
        "### Reading the data"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "13f82a97-67be-4358-cc15-1be56e2205bd"
      },
      "outputs": [],
      "source": [
        "automobile = pd.read_csv('../input/Automobile_data.txt')\n",
        "automobile.head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "066f2a29-adb3-020c-906f-cb2219838406"
      },
      "source": [
        "### Getting the data types of the data set"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "04b575b3-e2bf-09cc-0a1c-f027018a75aa"
      },
      "outputs": [],
      "source": [
        "automobile.dtypes"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "df57a8ad-58ff-d091-1070-aa6e488bb07b"
      },
      "source": [
        "### Statistics of the data set"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "fd9a6f7b-62c0-fc33-336b-1ecc70cc3f97"
      },
      "outputs": [],
      "source": [
        "automobile.describe()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "8b7f2a3e-6fad-3979-06a8-fa04d8276890"
      },
      "source": [
        "## Cleaning of the data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "9377bfb6-6a97-7529-15ab-bf6000c3be85"
      },
      "source": [
        "### Find out if there are null fields"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "5be9f7ab-5f41-e775-3eca-0a3b56ddad71"
      },
      "outputs": [],
      "source": [
        "automobile.isnull().sum()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "c01ae765-9b5a-ab16-9318-c9404b0d1521"
      },
      "source": [
        "### Cleaning the normalized losses field"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "333b2232-3b93-8420-01f7-57810c6fa760"
      },
      "outputs": [],
      "source": [
        "# Find out number of records having '?' value for normalized losses\n",
        "automobile['normalized-losses'].loc[automobile['normalized-losses'] == '?'].count()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "ec0c7239-223b-c291-eedf-3350376ef616"
      },
      "outputs": [],
      "source": [
        "# Set the missing values to the mean of normalized losses and convert the datatype to integer\n",
        "nl = automobile['normalized-losses'].loc[automobile['normalized-losses'] != '?']\n",
        "nlmean = nl.astype(str).astype(int).mean()\n",
        "automobile['normalized-losses'] = automobile['normalized-losses'].replace('?',nlmean).astype(int)\n",
        "automobile['normalized-losses'].head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "18110d30-f2a2-5fa1-0235-adef7d0e2e72"
      },
      "source": [
        "### Cleaning the price data"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "a663e17b-2a20-2958-cbef-bfd28746b4d2"
      },
      "outputs": [],
      "source": [
        "# Find out the number of values which are not numeric\n",
        "automobile['price'].str.isnumeric().value_counts()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "7b918e8d-241f-ff05-ea27-ddf12267c3c4"
      },
      "outputs": [],
      "source": [
        "# List out the values which are not numeric\n",
        "automobile['price'].loc[automobile['price'].str.isnumeric() == False]"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "a521bbac-bcc7-fb81-baf5-21024c1e1e99"
      },
      "outputs": [],
      "source": [
        "#Setting the missing value to mean of price and convert the datatype to integer\n",
        "price = automobile['price'].loc[automobile['price'] != '?']\n",
        "pmean = price.astype(str).astype(int).mean()\n",
        "automobile['price'] = automobile['price'].replace('?',pmean).astype(int)\n",
        "automobile['price'].head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "c548c14d-843f-f599-3dd5-b724212d0631"
      },
      "source": [
        "### Cleaning the horsepower"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "c634e83a-19c3-fdb3-aa1a-095fa1dd8b45"
      },
      "outputs": [],
      "source": [
        "# Count the numeric values, then replace '?' with the mean horsepower and convert the datatype to integer\n",
        "automobile['horsepower'].str.isnumeric().value_counts()\n",
        "horsepower = automobile['horsepower'].loc[automobile['horsepower'] != '?']\n",
        "hpmean = horsepower.astype(str).astype(int).mean()\n",
        "automobile['horsepower'] = automobile['horsepower'].replace('?',hpmean).astype(int)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "f7186fed-73d0-af39-65e6-9003186acb34"
      },
      "outputs": [],
      "source": [
        "#Checking the outlier of horsepower\n",
        "automobile.loc[automobile['horsepower'] > 10000]"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "a8a06596-1720-af9d-f75b-00dee2d9b6b9"
      },
      "outputs": [],
      "source": [
        "#Showing the rows whose horsepower lies within 3 standard deviations of the mean (the filtered result is displayed, not assigned)\n",
        "automobile[np.abs(automobile.horsepower-automobile.horsepower.mean())<=(3*automobile.horsepower.std())]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "ea2ae019-515f-82da-5037-0f3a0b50c839"
      },
      "source": [
        "### Cleaning bore"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "f6a806e4-69b9-a283-6f0d-03ef9cb2ff6b"
      },
      "outputs": [],
      "source": [
        "# Find the records with an invalid ('?') value\n",
        "automobile['bore'].loc[automobile['bore'] == '?']"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "6befddfc-1bc3-34ba-6840-9f9541e99fe0"
      },
      "outputs": [],
      "source": [
        "# Replace the non-numeric values with null and convert the datatype\n",
        "automobile['bore'] = pd.to_numeric(automobile['bore'],errors='coerce')\n",
        "automobile.dtypes"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "91590927-82b5-e9d1-5c0e-2e207987fe83"
      },
      "source": [
        "### Cleaning the stroke"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "a5d7d659-f854-d94b-7a82-c5c66568c19e"
      },
      "outputs": [],
      "source": [
        "# Replace the non-numeric values with null and convert the datatype\n",
        "automobile['stroke'] = pd.to_numeric(automobile['stroke'],errors='coerce')\n",
        "automobile.dtypes"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "d745eb3c-f00b-7ae6-76b0-802c4443c5ba"
      },
      "source": [
        "### Cleaning the peak rpm data"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "03126a55-d05b-3dc0-5fd9-ba8e6ab081e1"
      },
      "outputs": [],
      "source": [
        "# Convert the non-numeric data to null and convert the datatype\n",
        "automobile['peak-rpm'] = pd.to_numeric(automobile['peak-rpm'],errors='coerce')\n",
        "automobile.dtypes"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "2b639c3f-9840-0a5e-9172-7ebe11700e74"
      },
      "source": [
        "### Cleaning the num-of-doors data"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "c69948d1-d5b3-db65-1e2a-3f77ea1334a3"
      },
      "outputs": [],
      "source": [
        "# Remove the records that have the value '?'\n",
        "automobile['num-of-doors'].loc[automobile['num-of-doors'] == '?']\n",
        "automobile = automobile[automobile['num-of-doors'] != '?']\n",
        "automobile['num-of-doors'].loc[automobile['num-of-doors'] == '?']"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "2facc5d9-2e64-031c-cc73-cc64e26a821e"
      },
      "source": [
        "## Univariate Analysis"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "2aa3c4d4-dbb7-3301-f890-2befe2494621"
      },
      "source": [
        "### Import libraries"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "9ac95c05-495d-09f7-c34b-da7cec42f617",
        "collapsed": true
      },
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\n",
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "e9b5a050-7702-263f-cce4-962af6f1c1a9"
      },
      "source": [
        "### Vehicle make frequency diagram"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "88c771e9-3993-73cf-3913-08aba66cd6f7"
      },
      "outputs": [],
      "source": [
        "automobile.make.value_counts().nlargest(10).plot(kind='bar', figsize=(15,5))\n",
        "plt.title(\"Number of vehicles by make\")\n",
        "plt.ylabel('Number of vehicles')\n",
        "plt.xlabel('Make');"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "7ab38278-473b-8631-b490-9f4fb09bd7da"
      },
      "source": [
        "### Insurance risk ratings Histogram"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "ee741270-2b0b-e34c-914f-5b8c6ddcc3c8"
      },
      "outputs": [],
      "source": [
        "automobile.symboling.hist(bins=6,color='green');\n",
        "plt.title(\"Insurance risk ratings of vehicles\")\n",
        "plt.ylabel('Number of vehicles')\n",
        "plt.xlabel('Risk rating');"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "f2110ef4-04d2-61bd-7339-29e78836cc75"
      },
      "source": [
        "### Normalized losses histogram"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "9f4ae3d7-3987-cddc-6819-1f8a17bdba6f"
      },
      "outputs": [],
      "source": [
        "automobile['normalized-losses'].hist(bins=5,color='orange');\n",
        "plt.title(\"Normalized losses of vehicles\")\n",
        "plt.ylabel('Number of vehicles')\n",
        "plt.xlabel('Normalized losses');"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "0392c750-f18d-e1cd-3048-45941dc2a5c9"
      },
      "source": [
        "### Fuel type bar chart"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "3df43e85-0b46-d459-2442-e0888ebb97a7"
      },
      "outputs": [],
      "source": [
        "automobile['fuel-type'].value_counts().plot(kind='bar',color='purple')\n",
        "plt.title(\"Fuel type frequency diagram\")\n",
        "plt.ylabel('Number of vehicles')\n",
        "plt.xlabel('Fuel type');"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "6a012f52-60a5-7307-7dec-836843320e1c"
      },
      "source": [
        "### Aspiration pie diagram"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "5966c19b-9520-75c3-f5e2-fa8252895182"
      },
      "outputs": [],
      "source": [
        "automobile['aspiration'].value_counts().plot.pie(figsize=(6, 6), autopct='%.2f')\n",
        "plt.title(\"Aspiration pie diagram\")\n",
        "plt.ylabel('Number of vehicles')\n",
        "plt.xlabel('Aspiration');"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "144bcaa8-a450-cc0e-e150-dfb64af11837"
      },
      "source": [
        "### Horse power histogram"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "40f29aa6-4882-278d-2236-7c585529c186"
      },
      "outputs": [],
      "source": [
        "automobile.horsepower[np.abs(automobile.horsepower-automobile.horsepower.mean())<=(3*automobile.horsepower.std())].hist(bins=5,color='red');\n",
        "plt.title(\"Horse power histogram\")\n",
        "plt.ylabel('Number of vehicles')\n",
        "plt.xlabel('Horse power');"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "9df372aa-c65c-baa7-7c08-56c416625874"
      },
      "source": [
        "### Curb weight histogram"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "7387c404-0a4a-914b-2ab6-84fa57efed76"
      },
      "outputs": [],
      "source": [
        "automobile['curb-weight'].hist(bins=5,color='brown');\n",
        "plt.title(\"Curb weight histogram\")\n",
        "plt.ylabel('Number of vehicles')\n",
        "plt.xlabel('Curb weight');"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "fde40b33-fdbc-ceb0-55b3-ffe26cbf945b"
      },
      "source": [
        "### Drive wheels bar chart"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "78916ae3-5d4e-89f8-888d-1f14f536f34f"
      },
      "outputs": [],
      "source": [
        "automobile['drive-wheels'].value_counts().plot(kind='bar',color='grey')\n",
        "plt.title(\"Drive wheels diagram\")\n",
        "plt.ylabel('Number of vehicles')\n",
        "plt.xlabel('Drive wheels');"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "b219d2cb-9242-3c14-513d-f7f0e73e9edb"
      },
      "source": [
        "### Number of doors bar chart"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "b7f2134d-9272-c793-f0d9-416bcdbeb0ea"
      },
      "outputs": [],
      "source": [
        "automobile['num-of-doors'].value_counts().plot(kind='bar',color='purple')\n",
        "plt.title(\"Number of doors frequency diagram\")\n",
        "plt.ylabel('Number of vehicles')\n",
        "plt.xlabel('Number of doors');"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "ad7d35fe-f132-35e3-3264-08bfda9da170"
      },
      "source": [
        "### Findings\n",
        "We took some key features of the automobile dataset for this analysis; our findings are below.\n",
        "1. Toyota is the make with the most vehicles, more than 40% more than the second highest, Nissan\n",
        "2. The most common aspiration type is standard rather than turbo, accounting for more than 80% of the cars\n",
        "3. For drive wheels, front wheel drive has the most cars, followed by rear wheel and four wheel. There are very few four wheel drive cars.\n",
        "4. Curb weights of the cars are distributed between approximately 1500 and 4000\n",
        "5. Symboling, the insurance risk rating, can range from -3 to 3, but in our dataset it starts from -2. Most cars are rated 0 or 1.\n",
        "6. Normalized losses, the average loss payment per insured vehicle year, fall between 65 and 150 for most cars."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "4c9689c2-9705-debf-ba98-05b6c9bcc368"
      },
      "source": [
        "## Correlation Analysis"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "2e7f0639-69cb-910a-d65b-969c0f3e97b1"
      },
      "source": [
        "Findings: The correlation heat map supports several inferences.\n",
        "1. Price is most strongly correlated with engine size and curb weight of the car\n",
        "2. Curb weight is mostly correlated with engine size, length, width and wheel base, which is expected as these add to the weight of the car\n",
        "3. Wheel base is highly correlated with length and width of the car\n",
        "4. Symboling and normalized losses are more correlated with each other than with the other fields"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "6da7c560-6114-8218-4e8b-25d535b111a9"
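      },
      "outputs": [],
      "source": [
        "import seaborn as sns\n",
        "# Correlate the numeric columns only; text columns such as make cannot be correlated\n",
        "corr = automobile.select_dtypes(include=[np.number]).corr()\n",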
        "sns.set_context(\"notebook\", font_scale=1.0, rc={\"lines.linewidth\": 2.5})\n",
        "plt.figure(figsize=(13,7))\n",
        "a = sns.heatmap(corr, annot=True, fmt='.2f')\n",
        "rotx = a.set_xticklabels(a.get_xticklabels(), rotation=90)\n",
        "roty = a.set_yticklabels(a.get_yticklabels(), rotation=30)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "8fb85539-4485-75cc-b693-18d5089e3f86"
      },
      "source": [
        "## Bivariate Analysis"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "0589b913-e8d2-e000-cabb-8f5c11cd0f10"
      },
      "source": [
        "### Boxplot of Price and make\n",
        "Findings: Below are our findings on the make and price of the car\n",
        "\n",
        "- The most expensive car is manufactured by Mercedes-Benz and the least expensive by Chevrolet\n",
        "- The premium cars, costing more than 20000, are BMW, Jaguar, Mercedes-Benz and Porsche\n",
        "- The less expensive cars, costing less than 10000, are Chevrolet, Dodge, Honda, Mitsubishi, Plymouth and Subaru\n",
        "- The rest of the makes fall in the midrange between 10000 and 20000, which has the highest number of cars"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "5d6245ac-a641-6b4a-6f7d-8da51281f1c9"
      },
      "outputs": [],
      "source": [
        "plt.rcParams['figure.figsize']=(23,10)\n",
        "ax = sns.boxplot(x=\"make\", y=\"price\", data=automobile)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "583309dc-b66c-9e42-f431-9733cd6da82a"
      },
      "source": [
        "### Scatter plot of price and engine size\n",
        "\n",
        "Findings: The larger the engine, the higher the price."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "b0f628e2-3ae3-dda3-4ade-cdffce49bbd9"
      },
      "outputs": [],
      "source": [
        "g = sns.lmplot(x='price', y='engine-size', data=automobile);"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "d67b9339-9587-b97b-5a99-4a6f5acdb7b8"
      },
      "source": [
        "### Scatter plot of normalized losses and symboling\n",
        "Findings: The scatter plot shows that lower risk ratings go with lower normalized losses. Negative ratings appear to mark the cars with the smallest losses."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "df237676-1fab-5286-bbe6-f855b7e01856"
      },
      "outputs": [],
      "source": [
        "g = sns.lmplot(x='normalized-losses', y='symboling', data=automobile);"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "2684e5ef-e6b0-365c-dd08-154100654290"
      },
      "source": [
        "### Scatter plot of Engine size and Peak RPM"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "bfbb5eb6-1f05-f384-d537-59a5fcbe944d"
      },
      "outputs": [],
      "source": [
        "plt.scatter(automobile['engine-size'],automobile['peak-rpm'])\n",
        "plt.xlabel('Engine size')\n",
        "plt.ylabel('Peak RPM');"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "f33e18dc-d5a9-c990-1c7a-4c3aad1f0631"
      },
      "source": [
        "### Scatter plot of City and Highway MPG, Curb weight based on Make of the car\n",
        "\n",
        "Based on the scatter plots of the provided data set, both city and highway mileage decrease as curb weight increases.\n",
        "\n",
        "The heavier the automobile, the lower the mileage for both city and highway driving."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "5443bf09-eb03-9e48-9300-ec5ba6b9c416"
      },
      "outputs": [],
      "source": [
        "g = sns.lmplot(x='city-mpg', y='curb-weight', data=automobile, hue='make', fit_reg=False);"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "92070dac-04db-bd93-67e7-f2b412f0be51"
      },
      "outputs": [],
      "source": [
        "g = sns.lmplot(x='highway-mpg', y='curb-weight', data=automobile, hue='make', fit_reg=False);"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "6db38b52-1d4e-3861-7979-200d35626ee0"
      },
      "source": [
        "### Drive wheels and City MPG bar chart"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "0c348025-8f3b-87de-d67b-c98c8b90befa"
      },
      "outputs": [],
      "source": [
        "automobile.groupby('drive-wheels')['city-mpg'].mean().plot(kind='bar', color = 'peru');\n",
        "plt.title(\"Drive wheels City MPG\")\n",
        "plt.ylabel('City MPG')\n",
        "plt.xlabel('Drive wheels');"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "bd9bc4d3-1fe6-e5f8-21d9-86765a3af274"
      },
      "source": [
        "### Drive wheels and Highway MPG bar chart"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "58324033-5bd9-8e22-601b-3546ca92a788"
      },
      "outputs": [],
      "source": [
        "automobile.groupby('drive-wheels')['highway-mpg'].mean().plot(kind='bar', color = 'peru');\n",
        "plt.title(\"Drive wheels Highway MPG\")\n",
        "plt.ylabel('Highway MPG')\n",
        "plt.xlabel('Drive wheels');"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "6a0de673-c997-51fa-d3a3-ad875dee9f3d"
      },
      "source": [
        "### Boxplot of Drive wheels and Price\n",
        "\n",
        "Findings: Rear wheel drive cars are the most expensive and front wheel drive cars the least expensive. Four wheel drive cars are priced a little higher than front wheel drive cars. There are very few four wheel drive cars in our dataset, so this picture might not be very accurate."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "_cell_guid": "debd220e-2373-005b-58a4-1be3ed77bb3c"
      },
      "outputs": [],
      "source": [
        "plt.rcParams['figure.figsize']=(10,5)\n",
        "ax = sns.boxplot(x=\"drive-wheels\", y=\"price\", data=automobile)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "0dc71cea-09e8-2cf7-ea08-82079c4a052f"
      },
      "source": [
        "### Normalized losses based on body style and no. of doors\n",
        "\n",
        "Findings: Normalized loss, the average loss payment per insured vehicle, is calculated from many features of the car, including body style and number of doors. Normalized losses are distributed across the different body styles, but two-door cars have higher normalized losses than four-door cars."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "_cell_guid": "3e4aa86f-411b-069c-9f4b-d4464f41320c"
      },
      "outputs": [],
      "source": [
        "pd.pivot_table(automobile,index=['body-style','num-of-doors'], values='normalized-losses').plot(kind='bar',color='purple')\n",
        "plt.title(\"Normalized losses based on body style and no. of doors\")\n",
        "plt.ylabel('Normalized losses')\n",
        "plt.xlabel('Body style and No. of doors');"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "_cell_guid": "ceb99559-a9b7-4c2d-c012-a98e23a62474"
      },
      "source": [
        "## Conclusion\n",
        "\n",
        "Analysis of the data set shows:\n",
        "\n",
        "- How the data are distributed\n",
        "- Correlation between different fields and how they are related\n",
        "- Normalized loss of the manufacturer\n",
        "- Symboling: Cars are initially assigned a risk factor symbol associated with their price\n",
        "- Mileage: Mileage based on city and highway driving for various makes and attributes\n",
        "- Price: Factors affecting the price of the automobile\n",
        "- Importance of drive wheels and curb weight"
      ]
    }
  ],
  "metadata": {
    "_change_revision": 0,
    "_is_fork": false,
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.0"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
