#### Inspiration
Some of the questions this dataset can help answer:

Which car makes and models are popular, and in which cities?

What is the typical car rental fare in various major cities?

Users can also explore whether the ratings on the site show any correlation or look suspicious, given that most are close to 5.
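As a rough illustration of how these questions translate into pandas, here is a minimal sketch on a handful of made-up rows. The column names mirror the ones used later in this notebook, but every value below is hypothetical, and the 4.9 threshold for "close to 5" is an assumption chosen only for the example.

```python
import pandas as pd

# Hypothetical sample rows: column names follow the dataset, values are invented
sample = pd.DataFrame({
    "location.city": ["Austin", "Austin", "Austin", "Miami", "Miami"],
    "vehicle.make": ["Tesla", "Tesla", "Toyota", "Ford", "Ford"],
    "vehicle.model": ["Model 3", "Model 3", "Corolla", "Mustang", "Mustang"],
    "rate.daily": [120, 110, 45, 80, 90],
    "rating": [5.0, 4.9, 4.8, 5.0, 5.0],
})

# Most frequently listed make/model in each city
popular = (
    sample.groupby(["location.city", "vehicle.make", "vehicle.model"])
    .size()
    .rename("listings")
    .reset_index()
    .sort_values("listings", ascending=False)
    .drop_duplicates("location.city")   # keep the top row per city
    .sort_values("location.city")
    .reset_index(drop=True)
)

# Typical (median) daily rate per city
typical_fare = sample.groupby("location.city")["rate.daily"].median()

# Share of ratings at or above 4.9 (assumed threshold for "close to 5")
share_near_five = (sample["rating"] >= 4.9).mean()

print(popular)
print(typical_fare)
print(share_near_five)
```

The median is used for the "typical" fare because a few very expensive listings would pull the mean upward.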

#### Take a Quick Look at the Data Structure

Let’s take a look at the top five rows using the DataFrame’s `head()` method.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()

data=pd.read_csv('../input/cornell-car-rental-dataset/CarRentalData.csv')
data.head()

Each row represents one car. There are 15 attributes: fuel type, rating, renter trips taken, review count, city, country, longitude, latitude, state, owner id, daily rate, vehicle make, vehicle model, vehicle type, and vehicle year.

The `info()` method is useful to get a quick description of the data, in particular the total number of rows, and each attribute’s type and number of non-null values.

In [None]:
data.info()

There are 5851 instances in the dataset, which means that it is fairly small by
Machine Learning standards, but it’s perfect to get started.

Notice that the `fuelType` attribute has only 5776 non-null values, meaning that 75 cars are missing
this feature. The `rating` attribute has only 5350 non-null values, meaning that 501 cars have no
rating from customers.
We will need to take care of this later.

In [None]:
print(data.isnull().sum())


When you looked at the top five rows, you
probably noticed that the values in the `fuelType` column were repetitive,
which means that it is probably a categorical attribute.
You can find out what categories
exist and how many cars belong to each category by using the
`value_counts()` method:

In [None]:
data["fuelType"].value_counts()

In [None]:
data["location.country"].value_counts()

Since the data was collected in the USA, this column adds no information, so it is better to drop it.

In [None]:
data.drop("location.country", axis=1, inplace=True)

Let’s look at the other fields. The `describe()` method shows a summary of the
numerical attributes.

In [None]:
data.describe()

The **count**, **mean**, **min**, and **max** rows are self-explanatory. Note that the null values are ignored (so, for example, the count of `rating` is `5350`, not `5851`).
The **std** row shows the standard deviation, which measures how dispersed the values are. The 25%, 50%, and 75% rows show the corresponding percentiles: a percentile indicates the value below which a given percentage of observations in a group of observations falls.
For example, 25% of the cars have a `rate.daily` lower than 45, while 50% are lower than 69 and 75% are lower than 110. These are often called the 25th percentile (or 1st quartile), the median, and the 75th percentile (or 3rd quartile).
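To make the percentile rows concrete, here is a small sketch that computes the same three quartiles by hand with `Series.quantile`. The rates below are hypothetical round numbers, not values from the dataset.

```python
import pandas as pd

# Hypothetical daily rates (not taken from the dataset)
rates = pd.Series([30, 40, 50, 60, 70, 80, 90, 100, 110])

# The same three percentiles that describe() reports as 25%, 50%, and 75%
q1, median, q3 = rates.quantile([0.25, 0.50, 0.75])

# Interquartile range: the spread of the middle half of the values
iqr = q3 - q1

print(q1, median, q3, iqr)
```

With nine evenly spaced values, the quartiles land exactly on the 3rd, 5th, and 7th values (50, 70, and 90), so the interquartile range is 40.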

**Discrete/Categorical Data:** discrete data is quantitative data that can be counted and takes a finite number of possible values; categorical data is data that can be divided into groups. Examples include days in a week, months in a year, sex (Male/Female/Others), and grades (High/Medium/Low).
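The sketch below shows two ways of counting categories in pandas and how they differ when values are missing. The rows are hypothetical and only the column names follow the dataset.

```python
import pandas as pd

# Hypothetical rows with one missing fuel type
sample = pd.DataFrame({
    "fuelType": ["GASOLINE", "ELECTRIC", "GASOLINE", "HYBRID", None],
    "vehicle.year": [2015, 2018, 2015, 2020, 2018],
})

# nunique() ignores missing values by default
n_categories = sample.nunique()

# unique() keeps the missing value as its own entry, so the count is one higher
n_with_missing = len(sample["fuelType"].unique())

# Storing the column as a pandas categorical makes the set of categories explicit
fuel = sample["fuelType"].astype("category")

print(n_categories)
print(n_with_missing)
print(list(fuel.cat.categories))
```

Which count is the right one depends on whether a missing value should be treated as a category of its own.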

In [None]:
# get number of categories value 
print("Number of Categories in: ")
for ColName in data[['fuelType','location.city','location.state','vehicle.make','vehicle.model','vehicle.year']]:
    print("{} = {}".format(ColName,len(data[ColName].unique())))
    

In [None]:
median=data["rating"].median()
data["rating"] = data["rating"].fillna(median)  # assign back; inplace fillna on a column selection is unreliable in recent pandas

In [None]:
data.info()

It's clear that the majority of cars run on gasoline. Let's go ahead and fill in the missing `fuelType` values with GASOLINE.

In [None]:
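# Fill missing values in fuelType only, so NaNs in any other column are not overwritten with a string
data1 = data.assign(fuelType=data['fuelType'].fillna('GASOLINE'))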

In [None]:
data1 = data1.rename(columns={'location.latitude': 'latitude', 'location.longitude': 'longitude',
                             'rate.daily': 'rate_daily','vehicle.year': 'vehicle_year'})
data1.head()


### Plotting the data on a map with Folium
The data can be presented on a map showing where each car is parked, with the radius of each point based on the number of trips taken.

**Folium** is a Python library for visualizing geospatial data. It is easy to use yet powerful. Folium is a Python wrapper for Leaflet.js, a leading open-source JavaScript library for plotting interactive maps.
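When point radius is tied directly to trip counts, cars with zero trips become invisible and a few heavily rented cars can cover the map. One possible way to soften this is sketched below. The helper `marker_radius` and its bounds are hypothetical and are not part of Folium or of this notebook's original code.

```python
import numpy as np
import pandas as pd

def marker_radius(trips, min_radius=2.0, max_radius=20.0):
    """Map trip counts to marker radii using square-root scaling with clipping.

    Hypothetical helper: the bounds are arbitrary defaults chosen for illustration.
    """
    # Treat missing or negative trip counts as zero
    trips = pd.Series(trips, dtype="float64").fillna(0).clip(lower=0)
    # Square-root scaling makes marker area roughly proportional to trips
    radius = np.sqrt(trips)
    # Keep every marker visible and stop outliers from dominating
    return radius.clip(lower=min_radius, upper=max_radius)

radii = marker_radius([0, 4, 100, 2500, None])
print(radii.tolist())
```

The resulting values could be passed as the `radius` argument of each circle marker in place of the raw trip count.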

In [None]:
import folium
from folium.plugins import HeatMap

# Center the map on the mean coordinates of the listings
m = folium.Map(location=[data1.latitude.mean(), data1.longitude.mean()], zoom_start=4)

# One circle per car, with radius proportional to the number of trips taken
for index, row in data1.iterrows():
    folium.CircleMarker(location=[row['latitude'], row['longitude']],
                        radius=row['renterTripsTaken']/10,
                        fill=True,  # fill_color only takes effect when fill is enabled
                        fill_color="#3db7e4",
                       ).add_to(m)

points = data1[['latitude', 'longitude']].values
m.add_child(HeatMap(points, radius=15))  # plot heatmap (add_children is deprecated in favor of add_child)
m.save('map.html')
m

#### Histogram of Rental Car Rating


In [None]:
f, ax = plt.subplots(figsize=(18, 7))
sns.histplot(data=data1, x="rating", binwidth=.01)
ax.set_ylim(0,300)
ax.set_xlim(2,5)
plt.title('Rental Car Rating')
plt.savefig('Rental Car Rating.png', format='png')  # save before show(), otherwise the saved file is blank
plt.show()

#### Histogram of vehicle_year

In [None]:
f, ax = plt.subplots(figsize=(18, 7))
sns.histplot(data=data1, x="vehicle_year")
ax.set_xlim(1990,2021)
plt.title('vehicle year')
plt.savefig('vehicle year.png', format='png')  # save before show(), otherwise the saved file is blank
plt.show()

In [None]:
labels=data1['fuelType'].value_counts().index
values=data1['fuelType'].value_counts().values

#visualization
plt.figure(figsize=(7,7))
plt.pie(values ,labels = labels ,autopct='%1.1f%%')
plt.title('fuelType')
plt.savefig('Fuel Type.png', format='png')  # save before show(), otherwise the saved file is blank
plt.show()

In [None]:
labels=data1['vehicle.type'].value_counts().index
values=data1['vehicle.type'].value_counts().values

#visualization
plt.figure(figsize=(7,7))
plt.pie(values ,labels = labels ,autopct='%1.1f%%')
plt.title('Vehicle Type')
plt.savefig('Vehicle Type.png', format='png')  # save before show(), otherwise the saved file is blank
plt.show()

In [None]:
labels=data1['vehicle.make'].value_counts().index
f, ax = plt.subplots(figsize=(18, 7))
sns.countplot(x='vehicle.make', data=data1,
              order = labels,
              #hue='vehicle.year'
              palette="BuGn_r"
           )
plt.xticks(rotation= 45,fontsize=7 )
ax.set_ylabel('count', fontsize=15, color='b')
ax.set_xlabel('make of the vehicle', fontsize=14, color='b')
#plt.savefig('make of the vehicle.png', format='png')
plt.savefig('myimage.svg', format='svg', dpi=1200)

In [None]:
labels=data1['location.state'].value_counts().index
f, ax = plt.subplots(figsize=(18, 7))
sns.countplot(x='location.state', data=data1,
              order = labels,
              #hue='vehicle.year'
              palette="Set2"
           )
plt.xticks(rotation= 45,fontsize=12 )
ax.set_ylabel('Car count', fontsize=15, color='r')
ax.set_xlabel('location.state', fontsize=14, color='r')
#plt.savefig('make of the vehicle.png', format='png')
plt.savefig('Car count per state.svg', format='svg', dpi=1200)

California, Florida, and Texas are the top three states, by a clear margin over the rest.
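To put a number on that margin, the share of listings per state can be computed with `value_counts(normalize=True)`. The sketch below uses a hypothetical list of state codes, so the resulting shares are illustrative only.

```python
import pandas as pd

# Hypothetical state codes standing in for the location.state column
states = pd.Series(
    ["CA"] * 5 + ["FL"] * 3 + ["TX"] * 2 + ["NY"] + ["WA"],
    name="location.state",
)

# Fraction of listings in each state, sorted from largest to smallest
share = states.value_counts(normalize=True)

# Combined share of the three largest states
top3_share = share.head(3).sum()

print(share)
print(round(top3_share, 3))
```

On the real data, the same two lines applied to `data1['location.state']` would give the combined share of the top three states.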

In [None]:
import plotly.express as px  # plotly_express is now bundled with plotly as plotly.express
data_make_model = data1.groupby(['vehicle.make', 'vehicle.model']).size().reset_index()
data_make_model.rename(columns = {0:'model_count'}, inplace=True)
data_make_model['make_count'] = data_make_model['vehicle.make'].apply(
    lambda x : data_make_model[data_make_model['vehicle.make'] == x]['model_count'].sum())
data_make_model.sort_values(by = 'make_count', ascending=False, inplace=True)
fig =px.scatter(data_make_model[data_make_model['make_count'] >45],
             x = 'vehicle.make', y='model_count', color = 'vehicle.model',width=1100, height=700,
                title='Make and Model of Top Most Rented Cars')
fig.show()


In [None]:
fig =px.scatter(data1,
             x = 'vehicle.make', y='rate_daily', color = 'vehicle.model',width=1100, height=700,
                title='Daily rate of cars')
fig.show()


> ## Dashboard

#### Objective

Based on the shared data, I have built a dashboard. Its purpose is to show how cars are distributed across states and cities by car model, year of manufacture, daily rate, and so on.

#### Usage

The top part of the dashboard controls which car models are displayed, the year of manufacture, and the range of trips taken, in addition to the fuel type and vehicle type filters. The bottom part of the dashboard focuses on the relationship between the different car models and the daily rental rate, along with the number of trips.

#### How I made this?

The challenging part is not making the dashboard. It's preparing the data to be fed into the dashboard! Once you have prepared the data, building the dashboard is just like working with any other BI tool.

Fortunately, the only data preparation needed here is **concatenating** the latitude and longitude columns into one column so the cars can be displayed on a map.
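A minimal sketch of that preparation step in pandas is shown below. The coordinates are hypothetical, the column name `lat_long` is made up for the example, and the comma-separated "latitude,longitude" layout is an assumption about what the dashboard's geo field expects.

```python
import pandas as pd

# Hypothetical coordinates using the renamed latitude/longitude columns
coords = pd.DataFrame({
    "latitude": [30.2672, 25.7617],
    "longitude": [-97.7431, -80.1918],
})

# Join the two numeric columns into a single "latitude,longitude" string column
coords["lat_long"] = (
    coords["latitude"].astype(str) + "," + coords["longitude"].astype(str)
)

print(coords["lat_long"].tolist())
```

The combined column can then be exported with `to_csv` and marked as a geographic field in the dashboard tool.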

**From the "Limits of Google Maps" section of the Google Data Studio Google Maps reference page: Google Maps won't appear in embedded reports.**

But you can explore all the dashboard details here:

Stand-alone dashboard URL: https://datastudio.google.com/reporting/6ddc4bc8-e881-4983-a37f-51e057485122

In [None]:
from IPython.display import IFrame
IFrame('https://datastudio.google.com/embed/reporting/6ddc4bc8-e881-4983-a37f-51e057485122/page/OFP8B', width='100%', height=900)

In [None]:
from IPython.display import Image
Image("../input/car-rental-dashboard/Car dashboard.JPG")