# Part 1) b) Review listing count by manufacturer

Here we sort the dataframe of results from Part 1 to see how listings are spread across manufacturers.

In [23]:
import pandas as pd
import numpy as np

### Import .csv file

In [24]:
df = pd.read_csv('csv_files/brand_results.csv')

### Check format

In [3]:
df.head()

,Unnamed: 0,Results
0,Alfa-Romeo,Showing 136 results\n
1,Aston-Martin,Showing 42 results\n
2,Audi,"Showing 2,837 results\n"
3,Austin,Error
4,Bentley,Showing 57 results\n


### Rename the unnamed column to 'Brand' and set it as the index

In [4]:
df.rename(columns = {'Unnamed: 0':'Brand'}, inplace = True)

In [5]:
df.set_index('Brand', inplace=True)

### Create new column for listing count as a numerical field

In [6]:
df['Count'] = df['Results'].str[9:-9]

In [7]:
df['Count'] = df['Count'].str.replace(',','')

In [8]:
df['Count'] = df['Count'].replace('', np.nan)

In [9]:
df['Count'] = df['Count'].astype(float)
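
The fixed-position slice above depends on every `Results` string having exactly the same layout. As an alternative, the number can be pulled out with a regular expression. This is a minimal sketch rather than the method used in this notebook; the sample strings are copied from the `df.head()` output above, and the names `results`, `digits` and `count` are illustrative.

```python
import pandas as pd

# Sample values in the same shape as the 'Results' column shown above.
results = pd.Series(
    ["Showing 136 results\n", "Showing 2,837 results\n", "Error"],
    index=["Alfa-Romeo", "Audi", "Austin"],
)

# Capture the digits (with optional thousands separators) that follow "Showing".
digits = results.str.extract(r"Showing\s+([\d,]+)\s+results", expand=False)

# Strip the separators and convert; rows that do not match (e.g. "Error") stay NaN.
count = pd.to_numeric(digits.str.replace(",", "", regex=False))
print(count)
```

Because non-matching rows come back as missing values, this version does not need a separate step to turn empty strings into `NaN`.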

### Sort dataframe by listing count

In [18]:
df.sort_values(by=['Count'], ascending=False).head()

Brand,Results,Count
Toyota,"Showing 15,228 results\n",15228.0
Nissan,"Showing 7,320 results\n",7320.0
Mazda,"Showing 6,755 results\n",6755.0
Ford,"Showing 4,603 results\n",4603.0
Mitsubishi,"Showing 4,244 results\n",4244.0


### Count total number of listings, and calculate percentage of total

In [19]:
df['Count'].sum()

78549.0

In [20]:
df['Percentage'] = df['Count'] / df['Count'].sum() * 100

In [22]:
df.sort_values(by=['Count'], ascending=False).head()

Brand,Results,Count,Percentage
Toyota,"Showing 15,228 results\n",15228.0,19.386625
Nissan,"Showing 7,320 results\n",7320.0,9.319024
Mazda,"Showing 6,755 results\n",6755.0,8.599728
Ford,"Showing 4,603 results\n",4603.0,5.860036
Mitsubishi,"Showing 4,244 results\n",4244.0,5.402997
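
As a rough check on how concentrated the listings are, the share held by the five largest brands can be summed directly. The sketch below uses only the five counts and the total shown above; the names `top5`, `total`, `share` and `cumulative` are illustrative and are not part of the original notebook.

```python
import pandas as pd

# Counts for the five largest brands and the overall total, copied from the output above.
top5 = pd.Series(
    {"Toyota": 15228, "Nissan": 7320, "Mazda": 6755, "Ford": 4603, "Mitsubishi": 4244},
    name="Count",
)
total = 78549

share = top5 / total * 100       # percentage of all listings per brand
cumulative = share.cumsum()      # running share as each brand is added
top5_share = round(cumulative.iloc[-1], 1)

print(cumulative.round(1))
print(top5_share)
```

In the full dataframe, the same running total could be obtained by sorting on `Count` and calling `cumsum()` on the `Percentage` column.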


# Conclusion:

To produce statistically significant results for this project, we need many homogeneous data points (i.e. make/model combinations) that repeat across the country.

Whilst a smaller manufacturer could still sell a very high volume of a small number of models, the top 5 manufacturers make up nearly 50% of listings (48.6%), so we can safely exclude the remaining brands from future analysis.

The best-selling models would be expected to fall within this set, which should be representative of any national trends.