# Demo of Cars data

Flashback to the 1970s, when cars were big, heavy and used lots of gas. The Auto MPG sample data set is a collection of 398 automobile records from 1970 to 1982. It contains attributes like car name, MPG, number of cylinders, horsepower and weight.

We will use this data set to show how to load a CSV file from Object Storage and analyze it with a few charts.

You can use this data to practice some useful analysis techniques and visualizations that you can then apply to your own data sets.

### This is a local preview of the CSV

![cars CSV](https://github.com/nachoad/WDP-Quick-Start/blob/master/imgs/cars-data-csv.png?raw=true)

## 1. Credentials for Object Storage

In [1]:
credentials = {
 # insert your credentials from the right DSX panel > "Find and add Data" > cars.csv > "Insert Credentials"
}
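
The dictionary above is intentionally left empty. As a minimal sketch, the keys that the loading function in the next section reads from it can be checked up front. The helper name `missing_credential_keys` and the placeholder values are hypothetical and only serve as illustration; they are not part of DSX.

```python
# Keys that get_file_content() in the next section reads from the dict
REQUIRED_KEYS = ['auth_url', 'username', 'domain_id', 'password',
                 'region', 'container', 'filename']

def missing_credential_keys(credentials):
    """Return the required keys that are absent from the credentials dict."""
    return [key for key in REQUIRED_KEYS if key not in credentials]

# Placeholder values only; real ones come from "Insert Credentials"
example = {'auth_url': 'https://identity.example.com', 'username': 'user',
           'password': 'secret', 'filename': 'cars.csv'}
print(missing_credential_keys(example))  # ['domain_id', 'region', 'container']
```

An empty result means every key the loader needs is present; it says nothing about whether the values are valid.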

## 2. Function for loading CSV file from Object Storage

In [2]:
import json
from io import BytesIO

import requests

def get_file_content(credentials):
    """For the given credentials, return a file-like object (BytesIO) containing the file content
    from the associated Bluemix Object Storage V3."""

    # Request an auth token from the identity service
    url1 = ''.join([credentials['auth_url'], '/v3/auth/tokens'])
    data = {'auth': {'identity': {'methods': ['password'],
            'password': {'user': {'name': credentials['username'], 'domain': {'id': credentials['domain_id']},
            'password': credentials['password']}}}}}
    headers1 = {'Content-Type': 'application/json'}
    resp1 = requests.post(url=url1, data=json.dumps(data), headers=headers1)
    resp1_body = resp1.json()
    # Find the public object-store endpoint for the configured region and build the file URL
    for e1 in resp1_body['token']['catalog']:
        if e1['type'] == 'object-store':
            for e2 in e1['endpoints']:
                if e2['interface'] == 'public' and e2['region'] == credentials['region']:
                    url2 = ''.join([e2['url'], '/', credentials['container'], '/', credentials['filename']])
    # The token comes back in a response header and is sent along with the download request
    s_subject_token = resp1.headers['x-subject-token']
    headers2 = {'X-Auth-Token': s_subject_token, 'accept': 'application/json'}
    resp2 = requests.get(url=url2, headers=headers2)
    # BytesIO works on both Python 2 and 3 and can be passed directly to pd.read_csv
    return BytesIO(resp2.content)
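
Note that the nested loop in the function above only assigns `url2` when an endpoint matches, so a wrong region would surface as an error about `url2` not being assigned. A hedged sketch of the same endpoint lookup as a separate function that returns `None` instead is shown here. The name `find_object_store_url` and the sample catalog (with made-up URLs and regions) are hypothetical; only the catalog shape is taken from the code above.

```python
def find_object_store_url(catalog, region):
    """Return the public object-store endpoint URL for a region, or None."""
    for service in catalog:
        if service.get('type') != 'object-store':
            continue
        for endpoint in service.get('endpoints', []):
            if endpoint.get('interface') == 'public' and endpoint.get('region') == region:
                return endpoint['url']
    return None

# Hypothetical catalog, shaped like resp1_body['token']['catalog']
sample_catalog = [
    {'type': 'identity', 'endpoints': []},
    {'type': 'object-store', 'endpoints': [
        {'interface': 'internal', 'region': 'dallas',
         'url': 'https://internal.example.com/v1/acct'},
        {'interface': 'public', 'region': 'dallas',
         'url': 'https://public.example.com/v1/acct'},
    ]},
]
print(find_object_store_url(sample_catalog, 'dallas'))
print(find_object_store_url(sample_catalog, 'london'))
```

The caller can then raise a clear error message when the result is `None` before attempting the download.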

## 3. The Data

We read the data into a pandas data frame. In this case we are grabbing some data that represents cars.
We also import the `brunel` module so that the `%brunel` magic used in the following sections is available.

In [3]:
import pandas as pd
import brunel

cars = pd.read_csv(get_file_content(credentials))

cars.head(6)

| Unnamed: 0 | mpg | cylinders | engine | horsepower | weight | acceleration | year | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18 | 8 | 307 | 130 | 3504 | 12.0 | 70 | American | chevrolet chevelle malibu |
| 1 | 15 | 8 | 350 | 165 | 3693 | 11.5 | 70 | American | buick skylark 320 |
| 2 | 18 | 8 | 318 | 150 | 3436 | 11.0 | 70 | American | plymouth satellite |
| 3 | 16 | 8 | 304 | 150 | 3433 | 12.0 | 70 | American | amc rebel sst |
| 4 | 17 | 8 | 302 | 140 | 3449 | 10.5 | 70 | American | ford torino |
| 5 | 15 | 8 | 429 | 198 | 4341 | 10.0 | 70 | American | ford galaxie 500 |
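
The `Unnamed: 0` column in the preview suggests that the CSV stores a row index in a first column without a header. A minimal sketch, assuming that is the case, of using the `index_col` parameter of `pd.read_csv` so that this column becomes the data frame index rather than a data column. The two rows are copied from the preview; the leading comma in the header line is an assumption about the raw file.

```python
import io
import pandas as pd

# Two rows copied from the preview above; the empty first header field
# is an assumption about how the raw CSV stores the row index
raw = io.StringIO(
    u",mpg,cylinders,engine,horsepower,weight,acceleration,year,origin,name\n"
    u"0,18,8,307,130,3504,12.0,70,American,chevrolet chevelle malibu\n"
    u"1,15,8,350,165,3693,11.5,70,American,buick skylark 320\n"
)

# index_col=0 tells pandas to use the first column as the row index
sample = pd.read_csv(raw, index_col=0)
print(list(sample.columns))
```

With the real file, the same idea would be `pd.read_csv(get_file_content(credentials), index_col=0)`.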


## 4. Basics
With the Brunel module imported, we create a couple of simple scatterplots using the `%brunel` magic.

The basic format of each call to Brunel is simple: whether it is a single line (a line magic) or a set of lines (a cell magic),
the lines are concatenated together and the result is interpreted as one command.

This command must start with an `ACTION`, but may have a set of options at the end specified as `ACTION :: OPTIONS`.

`ACTION` is the Brunel action string; `OPTIONS` are `key=value` pairs:
 * `data` defines the pandas dataframe to use. If not specified, the pandas data that best fits the action command will be used
 * `width` and `height` may be supplied to set the resulting size

For details on the Brunel action language, see the [online docs on Bluemix](http://kubriktrainer.stage1.mybluemix.net/KubrikTrainer/ActionDocs/).
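
To make the `ACTION :: OPTIONS` layout concrete, here is a small illustrative sketch that splits such a line into its two parts. The function `split_brunel_line` is hypothetical and is not how Brunel itself parses its input; it only mirrors the format described above.

```python
def split_brunel_line(line):
    """Split 'ACTION :: key=value, key=value' into (action, options dict)."""
    action, _, option_text = line.partition('::')
    options = {}
    for pair in option_text.split(','):
        if '=' in pair:
            key, value = pair.split('=', 1)
            options[key.strip()] = value.strip()
    return action.strip(), options

action, options = split_brunel_line(
    "data('cars') x(mpg) y(horsepower) color(origin) :: width=800, height=300")
print(action)   # data('cars') x(mpg) y(horsepower) color(origin)
print(options)
```

Everything before `::` is the action string; everything after it is read as comma-separated `key=value` pairs such as `width` and `height`.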

In [4]:
%brunel data('cars') x(mpg) y(horsepower) color(origin) filter(horsepower)  :: width=800, height=300


In [5]:
%brunel data('cars') x(horsepower) y(weight) color(origin) tooltip(name) filter(year)   :: width=800, height=300


In [8]:
%brunel data('cars') edge yrange(origin, year) chord size(#count) color(origin) :: width=500, height=400


In [9]:
%brunel data('cars') treemap x(origin, year, cylinders) color(mpg) mean(mpg) size(#count) label(cylinders) tooltip(#all):: width=900, height=600


## 5. Using the Dataframe
Since Brunel uses the data frame, we can modify or add to that object to show data in different ways. In the following example we apply a function that takes a name and sees if it matches one of a set of sub-strings. We map this function to the car names to create a new column consisting of the names that match either "Ford" or "Buick", and use that in our Brunel action.

Because the Brunel action is long (we are adding some CSS styling), we split it into two parts for convenience.

In [10]:
def identify(x, search): 
    for y in search: 
        if y.lower() in x.lower(): return y
    return None

cars['Type'] = cars.name.map(lambda x: identify(x, ["Ford", "Buick"]))
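
As a quick check of what the mapping produces, the sketch below applies the same `identify` function to three names taken from the data preview. The list `names` is only a stand-in for the `name` column.

```python
def identify(x, search):
    # Return the first search term found in x, ignoring case
    for y in search:
        if y.lower() in x.lower():
            return y
    return None

# Three car names from the data preview
names = ["ford torino", "buick skylark 320", "plymouth satellite"]
labels = [identify(name, ["Ford", "Buick"]) for name in names]
print(labels)  # ['Ford', 'Buick', None]
```

Names that match neither search term map to `None`, so only the Ford and Buick rows receive a `Type` value.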

In [11]:
%%brunel data('cars') x(engine) y(mpg) color(Type)  style('size:50%; fill:#eee') +
     x(engine) y(mpg) color(Type) text style('text {font-size:14; font-weight:bold; fill:darker}')
    :: width=800, height=800


### Data set reference
Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.