## Just playing around with the data

Lessons learned:
* sqlite3 is quite limited. If I were to redo this, I'd store everything in a MySQL or PostgreSQL database to have access to more core functionality.
* plotly is nice for graphs, but notebooks that use it are hard to display on GitHub. One must use nbviewer.

In [33]:
import numpy as np
import pandas as pd
import plotly.plotly as py
import plotly.graph_objs as go
from plotly.tools import FigureFactory as FF 
from IPython.display import display, display_pretty, Javascript, HTML
import qgrid 
qgrid.nbinstall(overwrite=True)

In [8]:
%load_ext sql
%config SqlMagic.autopandas=True
%matplotlib inline

# %qtconsole

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [6]:
%sql sqlite:///data/sales.db

u'Connected: None@data/sales.db'

### Some useful SQL queries

_Note that sqlite3 is quite limited as a SQL database: e.g. it doesn't support variables and lacks many statistical functions (standard deviation, etc.) offered by other databases (MySQL, PostgreSQL, etc.). So the queries below are simple, or I use Python to get at more complex information._
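
As one way to get statistics that sqlite lacks, the rows can be pulled into pandas and aggregated there. This is only a sketch: it uses a small in-memory table with made-up rows standing in for `salesVR`, and only the `make` and `price` column names come from the queries in this notebook.

```python
import sqlite3
import pandas as pd

# Hypothetical toy table standing in for salesVR (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("create table salesVR (make text, price real)")
conn.executemany(
    "insert into salesVR values (?, ?)",
    [("Audi", 30000.0), ("Audi", 34000.0), ("Fiat", 9000.0), ("Fiat", 11000.0)],
)

# Keep the SQL simple and let pandas compute the statistics.
df = pd.read_sql_query("select make, price from salesVR", conn)
stats = df.groupby("make")["price"].agg(["mean", "std"])
print(stats)
```

Note that pandas' `std` is the sample standard deviation (it divides by n - 1), which may differ from what another database's built-in function returns.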

#### Get unique entries for all dates (either new, or sold cars)
```SQL
select count(line_id) ss, min(line_id), * 
from salesVR 
group by year, make, model, trim, mileage 
having ss = 1 
order by make, model, trim, year;
```

#### Count unique entries per date
```SQL
select date, count(ss) from
(select count(line_id) ss, min(line_id), * 
from salesVR 
group by year, make, model, trim, mileage 
having ss = 1) A
group by date
;
```
Result:
* 2016-09-12: 260 (cars sold in 3 days?)
* 2016-09-15: 250 (new cars added to the catalog in 3 days?)
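
To make the reading of these counts concrete, here is the same query run on a tiny in-memory table. The rows are made up and the column types are assumptions; only the column names come from the query. An `order by date` is added so the output order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: column names follow the query, types are a guess.
conn.execute(
    "create table salesVR (line_id integer, date text, year integer, "
    "make text, model text, trim text, mileage integer, price real)"
)
conn.executemany(
    "insert into salesVR values (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "2016-09-12", 2010, "Audi", "A4", "Base", 50000, 20000.0),  # first day only
        (2, "2016-09-12", 2012, "Fiat", "500", "Pop", 30000, 8000.0),   # both days
        (3, "2016-09-15", 2012, "Fiat", "500", "Pop", 30000, 8000.0),
        (4, "2016-09-15", 2014, "Ford", "Focus", "SE", 20000, 15000.0), # last day only
    ],
)

query = """
select date, count(ss) from
(select count(line_id) ss, min(line_id), *
 from salesVR
 group by year, make, model, trim, mileage
 having ss = 1) A
group by date
order by date;
"""
result = conn.execute(query).fetchall()
print(result)
```

The car listed on both dates drops out, leaving one entry seen only on the first date and one seen only on the last. With more than two snapshot dates, a car listed on a single middle date would be counted too, so the sold/added reading is clearest for the first and last dates.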
   

## See which cars have been sold and added each day

I'm assuming, perhaps simplistically, that cars taken off the catalog have been sold.
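
Under that assumption, a rough way to count sold and added cars is to compare the set of cars listed on each date with the set listed on the previous date. The rows below are made up for illustration, and identifying a car by (year, make, model, trim, mileage) is the same assumption used in the SQL queries.

```python
import pandas as pd

# Hypothetical snapshot rows; the real data would come from salesVR.
cols = ["date", "year", "make", "model", "trim", "mileage"]
rows = [
    ("2016-09-12", 2010, "Audi", "A4", "Base", 50000),
    ("2016-09-12", 2012, "Fiat", "500", "Pop", 30000),
    ("2016-09-15", 2012, "Fiat", "500", "Pop", 30000),
    ("2016-09-15", 2014, "Ford", "Focus", "SE", 20000),
]
df = pd.DataFrame(rows, columns=cols)
car_key = ["year", "make", "model", "trim", "mileage"]

# Set of car keys listed on each snapshot date.
by_date = {
    date: set(group[car_key].itertuples(index=False, name=None))
    for date, group in df.groupby("date")
}
dates = sorted(by_date)

# Compare each date with the previous one.
changes = []
for prev, curr in zip(dates, dates[1:]):
    removed = by_date[prev] - by_date[curr]  # assumed sold
    added = by_date[curr] - by_date[prev]    # new in the catalog
    changes.append((curr, len(removed), len(added)))

print(changes)
```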

## Plot Brand average cost 
First get the list of individual cars, then average per make. Most cars show up every day because they haven't been sold. The price of some of these cars changes, so I first take each car's average over the 4 days.

To find unique cars I group by year, make, model, trim, mileage. This assumes that mileage doesn't change but that price can. 
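
The two-stage averaging can be sketched in pandas as follows. The rows are made up; the column names mirror the SQL queries, and the output column names (`num_cars`, `ave_price`, ...) are borrowed from the VR query.

```python
import pandas as pd

# Hypothetical rows: the Audi A4 is listed on two days with a price change.
cols = ["date", "year", "make", "model", "trim", "mileage", "price"]
rows = [
    ("2016-09-12", 2010, "Audi", "A4", "Base", 50000, 20000.0),
    ("2016-09-15", 2010, "Audi", "A4", "Base", 50000, 18000.0),
    ("2016-09-12", 2012, "Audi", "A6", "Premium", 40000, 25000.0),
    ("2016-09-12", 2012, "Fiat", "500", "Pop", 30000, 8000.0),
    ("2016-09-15", 2012, "Fiat", "500", "Pop", 30000, 8000.0),
]
df = pd.DataFrame(rows, columns=cols)
car_key = ["year", "make", "model", "trim", "mileage"]

# Stage 1: one row per car, with its price averaged over the days it was listed.
per_car = df.groupby(car_key, as_index=False)["price"].mean()

# Stage 2: aggregate the per-car prices by make.
per_make = (
    per_car.groupby("make")["price"]
    .agg(num_cars="count", ave_price="mean", max_price="max", min_price="min")
    .sort_values("ave_price")
)
print(per_make)
```

Averaging per car first keeps a car that stays listed for several days from counting several times in its make's average.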

_Values for Trader 1 are heavily biased towards more expensive cars, as the search only returns the top 1000 cars, sorted from most to least expensive, out of ~14000 cars._ **So this is just a quick demonstration, not an in-depth analysis.**

### VR
```SQL
select make, count(make) num_cars, sum(price)/count(price) ave_price, 
        max(price) max_price, min(price) min_price
from 
(select make, sum(price)/count(price) price from salesVR
group by year, make, model, trim, mileage) DAY
group by make
order by ave_price;
```

### Trader 1
```SQL
select make, sum(AVE)/count(AVE) ave_price from
(select count(make) CC, make, type, mileage, sum(price)/count(price) AVE
from sales
group by
make, type, mileage having mileage not like 'Null') IND
group by make
order by ave_price;
```

In [27]:
make_ave_price_VR = %sql select make, count(make) num_cars, sum(price)/count(price) ave_price from (select make, sum(price)/count(price) price from salesVR group by year, make, model, trim, mileage) DAY group by make order by ave_price;
make_ave_price_T1 = %sql select make, sum(AVE)/count(AVE) ave_price from (select count(make) CC, make, type, mileage, sum(price)/count(price) AVE from sales group by make, type, mileage having mileage not like 'Null') IND group by make order by ave_price;

#Plot VR
data = [go.Bar(
            x=make_ave_price_VR["make"],
            y=make_ave_price_VR["ave_price"]
    )]

layout = go.Layout(
    title = 'VR - Average Car Price per Make (US$)')

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='VR_average_make')

Done.
Done.


In [25]:
#Plot T1
data = [go.Bar(
            x=make_ave_price_T1["make"],
            y=make_ave_price_T1["ave_price"]
    )]

layout = go.Layout(
    title = 'Trader 1 - Average Car Price per Make (US$)')

fig2 = go.Figure(data=data, layout=layout)
py.iplot(fig2, filename='T1_average_make')

### Compare average prices for all brands
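Reuse the per-make queries from above as subqueries. Also, sqlite (at least the version used here) doesn't do full outer joins, so we have to get _**creative**_: a left join in each direction, glued together with UNION ALL.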

```SQL
select coalesce(makeVR,makeT1) make , price_VR, price_T1 from
(select VR.make makeVR, T1.make makeT1, VR.ave_price_VR price_VR, T1.ave_price_T1 price_T1
from
(select make, sum(price)/count(price) ave_price_VR
from 
(select make, sum(price)/count(price) price from salesVR
group by year, make, model, trim, mileage) DAY
group by make) VR
left join
(select make, sum(AVE)/count(AVE) ave_price_T1 from
(select count(make) CC, make, type, mileage, sum(price)/count(price) AVE
from sales
group by
make, type, mileage having mileage not like 'Null') IND
group by make) T1
on VR.make = T1.make
UNION ALL
select VRb.make, T1b.make, VRb.ave_price_VR, T1b.ave_price_T1
from
(select make, sum(AVE)/count(AVE) ave_price_T1 from
(select count(make) CC, make, type, mileage, sum(price)/count(price) AVE
from sales
group by
make, type, mileage having mileage not like 'Null') IND
group by make) T1b
left join
(select make, sum(price)/count(price) ave_price_VR
from 
(select make, sum(price)/count(price) price from salesVR
group by year, make, model, trim, mileage) DAY
group by make) VRb
on VRb.make = T1b.make
where VRb.make IS NULL)
order by make
;
```
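
For comparison, once the two per-make tables are in pandas, the same full outer join is a single `merge` call. This is a sketch with made-up values; the frames `vr` and `t1` stand in for the per-make averages computed earlier, with column names chosen to match the SQL above.

```python
import pandas as pd

# Hypothetical per-make averages for each source.
vr = pd.DataFrame({"make": ["Audi", "Fiat"], "price_VR": [22000.0, 8000.0]})
t1 = pd.DataFrame({"make": ["Audi", "Ferrari"], "price_T1": [30000.0, 250000.0]})

# how="outer" keeps makes that appear in either table; missing prices become NaN.
compare = (
    vr.merge(t1, on="make", how="outer")
    .sort_values("make")
    .reset_index(drop=True)
)
print(compare)
```

Makes present in only one source show up with `NaN` in the other price column, which corresponds to the NULLs produced by the two left joins.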

In [37]:
compare_prices = %sql select coalesce(makeVR,makeT1) make , price_VR, price_T1 from (select VR.make makeVR, T1.make makeT1, VR.ave_price_VR price_VR, T1.ave_price_T1 price_T1 from (select make, sum(price)/count(price) ave_price_VR from  (select make, sum(price)/count(price) price from salesVR group by year, make, model, trim, mileage) DAY group by make) VR left join (select make, sum(AVE)/count(AVE) ave_price_T1 from (select count(make) CC, make, type, mileage, sum(price)/count(price) AVE from sales group by make, type, mileage having mileage not like 'Null') IND group by make) T1 on Vr.make = T1.make UNION ALL select VRb.make, T1b.make, VRb.ave_price_VR, T1b.ave_price_T1 from (select make, sum(AVE)/count(AVE) ave_price_T1 from (select count(make) CC, make, type, mileage, sum(price)/count(price) AVE from sales group by make, type, mileage having mileage not like 'Null') IND group by make) T1b left join (select make, sum(price)/count(price) ave_price_VR from  (select make, sum(price)/count(price) price from salesVR group by year, make, model, trim, mileage) DAY group by make) VRb on VRb.make = T1b.make where VRb.make IS NULL) order by make;
# qgrid.show_grid(compare_prices)
compare_prices.set_index("make", drop=True,inplace = True)
table = FF.create_table(compare_prices, index=True, index_title = 'Make')
py.iplot(table, filename='linked_table')

Done.
