# Machine Learning with Spark ML

### In this notebook, we will explore machine learning using Spark ML. We will exploit Spark ML's high-level APIs built on top of DataFrames to create and tune machine learning pipelines. Spark ML Pipelines enable combining multiple algorithms into a single pipeline or workflow. We will utilize Spark ML's feature transformers to convert, modify and scale the features that will be used to develop the machine learning model. Finally, we will evaluate and cross validate our model to demonstrate the process of determining a best fit model and load the results in the database.

### We are using machine learning to try to predict records that a human has not seen or vetted before. We will use these predictions to sort the highest priority records for a human to look at. We will use as a training set for the algorithm fake data that has been vetted by an analyst as high, medium or low.¶

### We will use generated travel data that has been examined for patterns of Human Trafficking from DB2 Warehouse to do the machine learning.  We loaded this data in Lab 1.



## Table of contents

1. [Create Version](#version)
1. [Install Packages](#install)
1. [Connect to Database](#database)
1. [Transform the data](#transform)
1. [Feature Engineering](#engineering)
1. [Model the data](#model)
1. [Setup the Pipeline](#pipeline)
1. [Train the model](#train)
1. [Evaluate results](#evaluate)
1. [Hyperparameter Tuning](#tuning)
1. [Score the records](#score)
1. [Insert Credentials](#credentials)
1. [Write Results](#write)
1. [Create New Version](#version2)
1. [Schedule Job](#schedule)
1. [Revert to Version](#revert)
1. [Even More Help](#help)


<a id="version"></a>
## Create Version 

Save a version of the notebook by selecting <b>File</b> > <b>Save Version</b> 
<img alt="IBM Bluemix.Get started now" src="https://raw.githubusercontent.com/jpatter/LMCO/master/Lab-1/images/FileOptions.PNG" > or by selecting the <b>Versions</b> icon and selecting <b>Save Version</b>. <img alt="IBM Bluemix.Get started now" src="https://raw.githubusercontent.com/jpatter/LMCO/master/Lab-1/images/versions-button.png" ><br>
You can have up to ten (10) versions of a notebook.   Notebook versions are saved in a FIFO manner.

## Verify Spark version and existence of Spark

In [1]:
print('The spark version is {}.'.format(spark.version))

The spark version is 2.1.2.


<a id="install"></a>
## Install pixiedust.  With this package we can do some nice visualizations.

In [2]:
!pip install --trusted-host pypi.python.org JayDeBeApi==0.2.0 --user
!pip install --trusted-host pypi.python.org --user --upgrade ibmdbpy
!pip install --trusted-host pypi.python.org --user --upgrade pixiedust

Requirement already up-to-date: ibmdbpy in /usr/local/src/conda3_runtime.v32/home/envs/DSX-Python35-Spark/lib/python3.5/site-packages
Requirement already up-to-date: future in /usr/local/src/conda3_runtime.v32/home/envs/DSX-Python35-Spark/lib/python3.5/site-packages (from ibmdbpy)
Requirement already up-to-date: numpy in /gpfs/global_fs01/sym_shared/YPProdSpark/user/sf38-b8a2c1a38b5911-cd6445c4de1b/.local/lib/python3.5/site-packages (from ibmdbpy)
Requirement already up-to-date: lazy in /usr/local/src/conda3_runtime.v32/home/envs/DSX-Python35-Spark/lib/python3.5/site-packages (from ibmdbpy)
Requirement already up-to-date: pandas in /gpfs/global_fs01/sym_shared/YPProdSpark/user/sf38-b8a2c1a38b5911-cd6445c4de1b/.local/lib/python3.5/site-packages (from ibmdbpy)
Collecting pypyodbc (from ibmdbpy)
Requirement already up-to-date: six in /usr/local/src/conda3_runtime.v32/home/envs/DSX-Python35-Spark/lib/python3.5/site-packages (from ibmdbpy)
Requirement already up-to-date: python-dateutil>=2 

## Import the required libraries

In [3]:
# Imports for DB2 Warehouse
import jaydebeapi
from ibmdbpy import IdaDataBase
from ibmdbpy import IdaDataFrame

#Imports for Spark
from pyspark.ml.feature import StringIndexer, IndexToString
from pyspark.ml.feature import Bucketizer
from pyspark.mllib.linalg import Vectors
from pyspark.ml.feature import VectorAssembler
from pyspark.ml import Pipeline
from pyspark.ml.feature import Normalizer
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator
from pyspark.ml.classification import NaiveBayes, DecisionTreeClassifier
from pyspark.sql.functions import year
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

# Imports for pixiedust
from pixiedust.display import *

Pixiedust database opened successfully


<a id="database"></a>
## Connect to the database and read in our data

Select the <b>Find and Add Data</b> icon <br>
<img alt="IBM Bluemix.Get started now" src="https://raw.githubusercontent.com/jpatter/LMCO/master/Lab-1/images/connections-button.png" >

Select the <b>Connections</b> view and then <b>Insert to code</b>.

<img alt="IBM Bluemix.Get started now" src="https://raw.githubusercontent.com/jpatter/LMCO/master/Lab-1/images/InsertToCode.PNG" >

Select <b>Insert SparkSession DataFrame</b> and select the schema (will start with DASH but will likely NOT be the same value you see in the image) and table (should only be one). Then select <b>Insert Code</b>.

<img alt="IBM Bluemix.Get started now" src="https://raw.githubusercontent.com/jpatter/LMCO/master/Lab-1/images/InsertCode.PNG" ><br>
Rename the result to <b>trafficking_df</b> to ensure compliance with the following cells.



In [4]:
# enter this value 
table_postfix = ""

In [5]:
# Insert SparkSession DataFrame here
# make CERTAIN to rename to trafficking_df

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

# @hidden_cell
# The following code is used to access your data and contains your credentials.
# You might want to remove those credentials before you share your notebook.

db2_properties = {
    'driver': 'com.ibm.db2.jcc.DB2Driver',
    'jdbcurl': 'jdbc:db2://awh-yp-small03.services.dal.bluemix.net:50000/BLUDB',
    'user': 'dash100756',
    'password': '4uc8Ud_BA_dJ'
}

if not table_postfix:
    print("You MUST enter a value for table_postfix to proceed.   It should be the same value you entered in Lab-1")
else:
    table_name = db2_properties['user'] + "." + "FEMALE_HUMAN_TRAFFICKING_" + table_postfix
    trafficking_df = spark.read.jdbc(db2_properties['jdbcurl'], table_name, properties=db2_properties)
    trafficking_df.head()

<a id="transform"></a>
## Identify our labels and transform 

We will use the 'VETTING_LEVEL' column as a label for training the machine learning model.  This is where our analyst has marked the data as vetted.  

Spark ML requires that that the labels are data type Double, so we will cast the  column as Double (it was inferred as Integer when read into Spark).

withColumn() is a Spark SQL way to manipulate a dataframe.  Since an RDD is immutable, we create a new RDD each time we transform.  This code creates a new column VettingTemp and sets it to the values in "VETTING_LEVEL" cast to a Double.    It then drops column VETTING_LEVEL and renames column VettingTemp to VETTING_LEVEL.

In [6]:
DataWithLabels = (trafficking_df.withColumn("VettingTemp", trafficking_df["VETTING_LEVEL"]
    .cast("Double")).drop("VETTING_LEVEL").withColumnRenamed("VettingTemp", "VETTING_LEVEL"))

We want to use year of birth intead of date of birth in our learning.  

Another way to transform an rdd in Spark is using SQL Syntax.  Here, we will be adding a new field, BIRTH_YEAR to our vetting set.  We will also just select the fields we need.

In [7]:
DataWithLabels.createOrReplaceTempView("VettingData")
AllVettingData = sqlContext.sql ("SELECT UUID, VETTING_LEVEL, NAME, OCCUPATION, COUNTRIES_VISITED_COUNT, PASSPORT_COUNTRY_CODE, GENDER, year(BIRTH_DATE) as BIRTH_YEAR, 1 as Counter FROM VettingData")
FilteredVettingData = AllVettingData.filter("VETTING_LEVEL==100")

FilteredVettingData.count()

907

Use pixiedust to visually explore the data.

In [8]:
display(AllVettingData)

Now, let's look at the data we have:

VETTING_LEVEL is in four different statuses:

    10 - HIGH
    
    20 - MEDIUM
    
    30 - LOW
    
    100 - Unlabeled


Print the total number of vetting statuses 

In [9]:
print('The number of rows labeled high is {}.'.format(AllVettingData.filter(AllVettingData['VETTING_LEVEL'] == 10).count()))
print('The number of rows labeled medium is {}.'.format(AllVettingData.filter(AllVettingData['VETTING_LEVEL'] == 20).count()))
print('The number of rows labeled low is {}.'.format(AllVettingData.filter(AllVettingData['VETTING_LEVEL'] == 30).count()))
print('The number of unlabeled rows is {}.'.format(AllVettingData.filter(AllVettingData['VETTING_LEVEL'] == 100).count()))

The number of rows labeled high is 42.
The number of rows labeled medium is 40.
The number of rows labeled low is 96.
The number of unlabeled rows is 907.


The majority of the data has not been labeled (VETTING_LABEL=100 means unvetted).  We can not use it for our training data, so filter it out.
Print the total number of rows.

In [10]:
LabeledVettingData=AllVettingData.filter("VETTING_LEVEL != 100")
LabeledVettingData.count()

178

<a id="engineering"></a>
## Feature Engineering.
### A feature is the elements of the data that we are using in our learning.  We need to transform each one of our features into a format that SparkML can use it.
More about the choices for feature engineering can be found here:
http://spark.apache.org/docs/2.0.0/ml-features.html#stringindexer


The first thing we will do is transform our labels (VETTING_LEVEL) into a format that we can use in the algorithm, and then get back to 'human readable' from in the end. The ML models require that the labels are in a column called 'label'.    The converter helps us transform these back in the end.



In [11]:
labelIndexer = StringIndexer(inputCol="VETTING_LEVEL", outputCol="label", handleInvalid="error")
labelModel = labelIndexer.fit(LabeledVettingData)
converter = IndexToString(inputCol="prediction", outputCol="predCategory", labels=labelModel.labels)

Next, we will process all of the features we will use. While there are a variety of choices for transforming elements, we will treat each as a String using the StringIndexer.

StringIndexer is a transformer that encodes a string column to a column of indices. The indices are ordered by value frequencies, so the most frequent value gets index 0. If the input column is numeric, it is cast to string first.

For our vetting dataset, we are interested in all string based features so we will use the StringIndexer for them.  We need to use 'handleInvalid="skip"' because not all values have been validated in our vetting set.  That means the algorithms will skip these records.

In [12]:
occupationIndexer = StringIndexer(inputCol="OCCUPATION", outputCol="occupationIndex", handleInvalid="skip")
countryIndexer = StringIndexer(inputCol="PASSPORT_COUNTRY_CODE", outputCol="countryIndex", handleInvalid="skip")
genderIndexer = StringIndexer(inputCol="GENDER", outputCol="genderIndex", handleInvalid="skip")
yearOfBirthIndexer = StringIndexer(inputCol="BIRTH_YEAR", outputCol="birthYearIndex", handleInvalid="skip")

Now, put all of our features into a simple array using a VectorAssembler.

Note that COUNTRIIES_VISITED_COUNT is already a numeric, so we can just put that in the array as is.


In [13]:
vecAssembler = VectorAssembler(inputCols=["occupationIndex","countryIndex","genderIndex", "birthYearIndex", "COUNTRIES_VISITED_COUNT"], outputCol="features")

Normalizer will help us normalize the features into a standard frmat.  It can help us improve the behavior of the learning algorithms.


In [14]:
normalizer = Normalizer(inputCol="features", outputCol="normFeatures", p=1.0)

<a id="model"></a>
## Declare the model that we want to use

The model here is Naive Bayes.  It will output each prediction into a 'prediction' column.  Naive Bayes  is a probabistic model that learns based on previous decisions.  We will take a best guess at the paramater 'smoothing'- SparkML will help us tune it later!



In [15]:
nb = NaiveBayes(smoothing=1.0, modelType="multinomial", labelCol="label", predictionCol="prediction")

<a id="pipeline"></a>
## Setup the Pipeline

The pipeline is the guts of the algorithm that strings all the work we've done together.

The stages are run in order and the input DataFrame is transformed as it passes through each stage.   First, comes the feature transformations, then the assembler to put them togather into one DF.  We pass that into the model. 

In machine learning, it is common to run a sequence of algorithms to process and learn from data, so this can get as complex as we want to make it!

In [16]:
pipeline = Pipeline(stages=[labelIndexer,occupationIndexer,countryIndexer, genderIndexer, yearOfBirthIndexer, vecAssembler, normalizer, nb, converter])

<a id="train"></a>
## Train the model

We will split it into training data which is marked and test data which will be used to test the efficiency of the algorithms.

It is common to split the split up the data randomly into 70% for training and 30% for testing.  If we were to use a bigger training set, we might use an 80% / 20% split.

In [17]:
train, test = LabeledVettingData.randomSplit([70.0,30.0], seed=1)
train.cache()
test.cache()
print('The number of records in the training data set is {}.'.format(train.count()))
print('The number of rows labeled high is {}.'.format(train.filter(train['VETTING_LEVEL'] == 10).count()))
print('The number of rows labeled medium is {}.'.format(train.filter(train['VETTING_LEVEL'] == 20).count()))
print('The number of rows labeled low is {}.'.format(train.filter(train['VETTING_LEVEL'] == 30).count()))
print('')

print('The number of records in the test data set is {}.'.format(test.count()))
print('The number of rows labeled high is {}.'.format(test.filter(test['VETTING_LEVEL'] == 10).count()))
print('The number of rows labeled medium is {}.'.format(test.filter(test['VETTING_LEVEL'] == 20).count()))
print('The number of rows labeled low is {}.'.format(test.filter(test['VETTING_LEVEL'] == 30).count()))

The number of records in the training data set is 129.
The number of rows labeled high is 31.
The number of rows labeled medium is 34.
The number of rows labeled low is 64.

The number of records in the test data set is 49.
The number of rows labeled high is 11.
The number of rows labeled medium is 6.
The number of rows labeled low is 32.


 Fit the pipeline to the training data.  This will run the data through the algorithm to train it based on our labled data.
 
<div class="panel-group" id="accordion-3">
  <div class="panel panel-default">
    <div class="panel-heading">
      <h4 class="panel-title">
        <a data-toggle="collapse" data-parent="#accordion-3" href="#collapse-3">
        Solution</a>
      </h4>
    </div>
    <div id="collapse-3" class="panel-collapse collapse">
      <div class="panel-body">Type (or copy) the following in the cell below: <br>
          model = pipeline.fit(train)<br>
      </div>
    </div>
  </div>

In [18]:
# Fit the pipeline to the training data assigning the result to a variable called 'model'.
model = pipeline.

Make predictions on documents in the Test data set.  This will test the model based on the 30% data we have left in reserve.  Keep in mind that the model has not seen the data in the test data set.

<div class="panel-group" id="accordion-4">
  <div class="panel panel-default">
    <div class="panel-heading">
      <h4 class="panel-title">
        <a data-toggle="collapse" data-parent="#accordion-4" href="#collapse-4">
        Solution</a>
      </h4>
    </div>
    <div id="collapse-4" class="panel-collapse collapse">
      <div class="panel-body">Type (or copy) the following in the cell below: <br>
          predictions = model.transform(test)<br>
      </div>
    </div>
  </div>

In [19]:
# Make predictions on the test data assigning the result to a variable called 'predictions'.
predictions = model.

<a id="evaluate"></a>
## Show and Evaluate Results

Note that we only got a small sample of the results back because we have a very small amount of training data. 

In [20]:
predictions.count()

8

SparkML has automated ways to look at result quality called Evaluators.  More information can be found here:
http://spark.apache.org/docs/latest/mllib-evaluation-metrics.html

For simplicity here, we will use a a common evaluation method called Reciever Operator Characteristic.  This genenerally is used for binary classifiers, but we will use it because we only have 3 levels of prediction.

The curve is created by plotting the true positive rate against the false positive rate at various threshold settings. The ROC curve is thus the sensitivity as a function of fall-out. The area under the ROC curve is useful for comparing and selecting the best machine learning model for a given data set. A model with an area under the ROC curve score near 1 has very good performance. A model with a score near 0.5 is about as good as flipping a coin.

In [21]:
evaluator = BinaryClassificationEvaluator().setLabelCol("label").setMetricName("areaUnderROC")
print('Area under the ROC curve = {}.'.format(evaluator.evaluate(predictions)))

Area under the ROC curve = 0.6875.


<a id="tuning"></a>
## Automatic Algorithm Tuning - Also Called  Hyperparameter Tuning


Spark ML algorithms provide many hyperparameters for tuning models. These hyperparameters are distinct from the model parameters being optimized by Spark ML itself.  Hyperparameter tuning is accomplished by choosing the best set of parameters based on model performance on test data that the model was not trained with. All combinations of hyperparameters specified will be tried in order to find the one that leads to the model with the best evaluation result.


First we will build a paramater grid to tell SparkML what to change in its testing.  Note that we are changing all the paramaters we setup in our pipeline before - the 'smoothing' in our model, and the normalizer parameter.

In [22]:
paramGrid = (ParamGridBuilder().addGrid(nb.smoothing, [0.25, 0.5, 0.75])
                 .addGrid(normalizer.p, [1.0, 2.0]).build())

Now, create a cross validator to tune the pipeline with the generated parameter grid.  Cross-validation attempts to fit the underlying estimator with user-specified combinations of parameters, cross-evaluate the fitted models, and output the best one.  

In [23]:
cv = CrossValidator().setEstimator(pipeline).setEvaluator(evaluator).setEstimatorParamMaps(paramGrid).setNumFolds(10)

Next, we will run the models through the grid we set above.  It runs Cross-evaluate the ML Pipeline to find the best model.  Note that since runs the model several times, it takes a few minutes to run.

In [24]:
cvModel = cv.fit(train)
print('Area under the ROC curve for best fitted model = {}.'.format(evaluator.evaluate(cvModel.transform(test))))

Area under the ROC curve for best fitted model = 0.75.


Let's see what improvement we achieve by tuning the hyperparameters using cross-evaluation 

In [25]:
print('Area under the ROC curve for non-tuned model = {}.'.format(evaluator.evaluate(predictions)))
print('Area under the ROC curve for best fitted model = {}.'.format(evaluator.evaluate(cvModel.transform(test))))
print('Improvement = {0:0.2f}%'.format((evaluator.evaluate(cvModel.transform(test)) - evaluator.evaluate(predictions)) *100 / evaluator.evaluate(predictions)))

Area under the ROC curve for non-tuned model = 0.6875.
Area under the ROC curve for best fitted model = 0.75.
Improvement = 9.09%


We did a bit better with the new params!  Let's use "cvModel" instead of "model" below, because SparkML told us it was the best result.

<a id="score"></a>
## Score the remaining records that were unscored, and load them into a new table in the database.

First, we want to only get the unvetted records.

In [26]:
NewVettingData=AllVettingData.filter("VETTING_LEVEL == 100")

Next, transform the new model with the new vetting records

In [27]:
newPreds = cvModel.transform(NewVettingData)

 Show the data we have predicted and some of the fields in the data.  

In [28]:
newPreds.select("UUID", "prediction", "predCategory", "probability", "NAME", "GENDER", "COUNTRIES_VISITED_COUNT", "PASSPORT_COUNTRY_CODE" ).show()

+--------------------+----------+------------+--------------------+--------------------+------+-----------------------+---------------------+
|                UUID|prediction|predCategory|         probability|                NAME|GENDER|COUNTRIES_VISITED_COUNT|PASSPORT_COUNTRY_CODE|
+--------------------+----------+------------+--------------------+--------------------+------+-----------------------+---------------------+
|5fcbbf15-8268-430...|       0.0|        30.0|[0.63905924852249...|Stacey Courtney G...|     F|                      1|                   GH|
|db60dea1-442a-414...|       1.0|        10.0|[0.04395975536112...|     Sandry Santiago|     F|                      6|                   GH|
|dd39cdbe-84f4-4a6...|       2.0|        20.0|[0.00195414590268...|          Debbie Kim|     F|                      4|                   GH|
|652f9d66-c58f-49e...|       0.0|        30.0|[0.97219543757291...|  Sarah Kimme Miller|     F|                      5|                   GH|
|52da6

Remember that VETTING_LEVEL is in three different statuses:


10- HIGH

20- MEDIUM

30 - LOW


Let's print the total number of vetting statuses that we predicted.  The actual predicted data is low because we only have a few vetted records.  Remember that we had to 'skip' and features that were not in our trained data, so if we didn't have someone who was born in a certain year in our training data, we won't be able to predict a result.

In [29]:
print('The number of records in the unvetted data set is {}.'.format(newPreds.count()))
print('The number of rows labeled high is {}.'.format(newPreds.filter(newPreds['predCategory'] == 10).count()))
print('The number of rows labeled medium is {}.'.format(newPreds.filter(newPreds['predCategory'] == 20).count()))
print('The number of rows labeled low is {}.'.format(newPreds.filter(newPreds['predCategory'] == 30).count()))

The number of records in the unvetted data set is 117.
The number of rows labeled high is 47.
The number of rows labeled medium is 33.
The number of rows labeled low is 37.


<a id="write"></a>
## Write Results
Now, downselect all the values we need to join in our next lab to display the results, and write to the database.  We will only load the unique ID and the prediction into our new table in DB2 Warehouse.  We'll call the table "FEMALE_HUMAN_TRAFFICKING_< YOUR POSTFIX VALUE >_ML_RESULTS"

In [30]:
valuesToWrite= newPreds.select("UUID",  "predCategory")
valuesToWrite.write.jdbc(db2_properties['jdbcurl'], table_name + "_ML_RESULTS",
                         properties = {"user" : db2_properties["user"], "password" : db2_properties["password"]},
                         mode="overwrite")

<a id="version2"></a>
## Create Version 

Save a new version of the notebook by selecting <b>File</b> > <b>Save Version</b> 
<img alt="IBM Bluemix.Get started now" src="https://raw.githubusercontent.com/jpatter/LMCO/master/Lab-1/images/FileOptions.PNG" > or by selecting the <b>Versions</b> icon and selecting <b>Save Version</b>. <img alt="IBM Bluemix.Get started now" src="https://raw.githubusercontent.com/jpatter/LMCO/master/Lab-1/images/versions-button.png" ><br>
You can have up to ten (10) versions of a notebook.   Notebook versions are saved in a FIFO manner.

<a id="schedule"></a>
## Schedule Job
You can schedule a notebook version to run at specified intervals.   If a notebook version does not yet exist, one will be created for you.  If the notebook kernel was stopped when scheduled to run, it will be started.

To schedule a notebook, select the <b>Schedule</b> icon.

<img alt="IBM Bluemix.Get started now" src="https://raw.githubusercontent.com/jpatter/LMCO/master/Lab-1/images/schedule-button.png" >

Give a name to the job and pick the time period to run it.   All time periods are for the timezone of the <b>browser</b> NOT the timezone of the server where the notebook is running.

<img alt="IBM Bluemix.Get started now" src="https://raw.githubusercontent.com/jpatter/LMCO/master/Lab-1/images/Schedule-Window.PNG" >

## Download notebook

Notebooks can be downloaded in notebook (.ipynb), Python (.py), HTML (.html), markdown (.md) or reST (.rst) format.  Use <b>File</b> > <b>Download as</b> to download the notebook in any of the formats.

<img alt="IBM Bluemix.Get started now" src="https://raw.githubusercontent.com/jpatter/LMCO/master/Lab-1/images/FileOptions.PNG" >

<a id="revert"></a>
## Revert to version 
Revert to the version you saved at the beginning of this lab.   There are two ways to do this.   First, select <b>File</b> > <b>Revert to Version</b> and choose the version you created at the beginning of the lab (versions are timestamped).
<img alt="IBM Bluemix.Get started now" src="https://raw.githubusercontent.com/jpatter/LMCO/master/Lab-1/images/FileOptions.PNG" >

The second way is to select the <b>Versions</b> icon 
<img alt="IBM Bluemix.Get started now" src="https://raw.githubusercontent.com/jpatter/LMCO/master/Lab-1/images/versions-button.png" ><br>
and then select the version you wish to revert to.   You can also delete versions from here.
<img alt="IBM Bluemix.Get started now" src="https://raw.githubusercontent.com/jpatter/LMCO/master/Lab-1/images/Versions.PNG" >

<a id="help"></a>
## Even more help

Select the <b>Find Resources in the Community</b> link to display a search bar, documentation hotlinks, and a link to Stack Overflow's Data Science Experience section.

<img alt="IBM Bluemix.Get started now" src="https://raw.githubusercontent.com/jpatter/LMCO/master/Lab-1/images/community-button.png" >

<img alt="IBM Bluemix.Get started now" src="https://raw.githubusercontent.com/jpatter/LMCO/master/Lab-1/images/Community-Resources.PNG" >