# DougScore quickstart and EDA

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Dataset

In [None]:
df = pd.read_csv('../input/doug-demuro-dougscore/DougScore.csv')
print(df.shape)
df.head()

In [None]:
df.tail()

In [None]:
df.describe()

In [None]:
df.describe(include='object')

A few things to note already:
* The CSV is sorted by `DougScore`
* The data is clean except for missing values in `Year` and `Video Length`
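
Both observations can also be checked in code rather than by eye. The sketch below runs on a tiny made-up frame (the `toy` values are hypothetical, only the column names come from the dataset) and assumes the ranking is highest score first. On the real data, replace `toy` with `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV: real column names, made-up values.
toy = pd.DataFrame({
    'Year': [2018.0, np.nan, 2005.0, 1999.0],
    'Video Length': ['21:30', '18:05', None, '25:10'],
    'DougScore': [74, 70, 65, 52],
})

# Number of missing values in each column.
missing = toy.isna().sum()
print(missing)

# True when DougScore never increases from one row to the next,
# i.e. the rows are already sorted from highest to lowest score.
sorted_desc = toy['DougScore'].is_monotonic_decreasing
print(sorted_desc)
```

`isna().sum()` gives one count per column, so any column with a non-zero count needs a closer look.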

In [None]:
df[df['Year'].isna()]

In [None]:
df[df['Video Length'].isna()]

## Groupby

Grouping the data by a column such as `Year` or `Brand` and aggregating a score is a simple, powerful way to rank the groups.
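
One thing to keep in mind when ranking by a group mean: a group with a single car can end up at the top. A minimal sketch, on made-up data, that computes the mean and the group size together with `agg` and drops small groups. The `toy` frame, the brand names and the `min_cars` threshold are all hypothetical.

```python
import pandas as pd

# Hypothetical data: brand 'B' has a single, very high-scoring car.
toy = pd.DataFrame({
    'Brand': ['A', 'A', 'A', 'B', 'C', 'C'],
    'DougScore': [60, 62, 64, 75, 70, 68],
})

min_cars = 2  # hypothetical threshold: ignore brands with fewer cars than this

# Mean score and number of cars per brand, computed in one pass.
stats = toy.groupby('Brand')['DougScore'].agg(['mean', 'count'])

# Keep only brands with enough cars, then rank by mean score.
ranked = stats[stats['count'] >= min_cars].sort_values('mean', ascending=False)
print(ranked)
```

Whether to filter, and at what threshold, is a judgment call. The cells below rank on the plain mean.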

In [None]:
# Ranking Year by average DougScore
group = df.groupby('Year')['DougScore'].mean() # Get mean
group.sort_values(ascending=False).head() # sort and print first 5 values

In [None]:
# Ranking Car brands by max styling
group = df.groupby('Brand')['Styling'].max() # Get max
group.sort_values(ascending=False).head() # sort and print first 5 values

In [None]:
# Ranking vehicle countries by average Daily Total
group = df.groupby('Vehicle Country')['Daily Total'].mean() # Get mean
group.sort_values(ascending=False).head() # sort and print first 5 values

## Plots

In [None]:
# Let's compare Vehicle Country based on daily total
group = df.groupby('Vehicle Country')['Daily Total'].mean()
group.plot(kind='bar', figsize=(14,5))

In [None]:
# Let's see how the average DougScore has changed over the years
group = df.groupby('Year')['DougScore'].mean()
group.plot(figsize=(14,6))

## Looking for a dream car?

In [None]:
df.columns

In [None]:
# Creating my own score: each feature is multiplied by an importance factor
df['MyScore'] = (
                  1 * df['Styling'] +
                  2 * df['Acceleration'] +
                  2 * df['Handling'] +
                  4 * df['Comfort'] +
                  5 * df['Quality'] +
                  3 * df['Practicality']
                 )
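
The same weighted sum can also be written with a dictionary of weights, which makes the factors easier to tweak. This is only a sketch of an alternative: the weights are copied from the cell above, while the `toy` rows are made up for illustration.

```python
import pandas as pd

# Importance factors, copied from the MyScore cell above.
weights = {
    'Styling': 1, 'Acceleration': 2, 'Handling': 2,
    'Comfort': 4, 'Quality': 5, 'Practicality': 3,
}

# Hypothetical cars: a sporty one and a comfortable one.
toy = pd.DataFrame({
    'Styling': [8, 5],
    'Acceleration': [9, 3],
    'Handling': [7, 4],
    'Comfort': [3, 8],
    'Quality': [5, 9],
    'Practicality': [2, 7],
})

# Multiply each column by its weight (aligned on column name), then sum per row.
cols = list(weights)
toy['MyScore'] = toy[cols].mul(pd.Series(weights)).sum(axis=1)
print(toy['MyScore'])
```

With these weights the comfortable car scores higher, since `Comfort` and `Quality` dominate the sum.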

In [None]:
df2 = df.sort_values('MyScore', ascending=False)
df2.head()

Damn, those are luxurious and expensive. Let me see the top 10 brands I should focus on.

In [None]:
group = df.groupby('Brand')['MyScore'].mean()
choice = group.sort_values(ascending=False).head(10)
choice.plot(kind='bar', figsize=(14,5))
plt.ylabel('MyScore')
plt.show()

**Thanks for reading to the end. Upvote if you enjoyed it or found it helpful.**