# Logistic Regression

Let's see an example of how to run a logistic regression with Python and Spark! This is the documentation example; we will run through it quickly and then show a more realistic example. Afterwards, you will have another consulting project!

In [2]:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('logregdoc').getOrCreate()

In [3]:
from pyspark.ml.classification import LogisticRegression

In [4]:
# Load data
data = spark.read.format("libsvm").load("/FileStore/tables/sample_libsvm_data.txt")

lr = LogisticRegression()

# Holdout sample
train_data, test_data = data.randomSplit([0.7,0.3])

# Fit the model
lrModel = lr.fit(train_data)

dataSummary = lrModel.summary
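
As a rough illustration of what a fitted logistic regression computes for each row, the sketch below applies the logistic (sigmoid) function to a linear score and thresholds the result at 0.5. It is plain Python, not Spark code, and the `weights`, `intercept`, and `features` values are made up for illustration; they do not come from `lrModel`.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, intercept, features):
    """Probability of label 1 for one row under a logistic regression model."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict(weights, intercept, features, threshold=0.5):
    """Hard 0/1 prediction: 1.0 when the probability exceeds the threshold."""
    return 1.0 if predict_proba(weights, intercept, features) > threshold else 0.0


# Hypothetical coefficients and a single hypothetical feature vector
weights = [0.8, -1.2]
intercept = 0.1
features = [2.0, 0.5]

p = predict_proba(weights, intercept, features)
print(p, predict(weights, intercept, features))
```

The `probability` and `prediction` columns shown by `dataSummary.predictions` play the same roles as `predict_proba` and `predict` here.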

In [5]:
data.groupBy('label').count().show()

In [6]:
data.show(5)

In [7]:
dataSummary.predictions.show(5)

In [8]:
# Evaluate the fitted model on the held-out test set
predictionAndLabels = lrModel.evaluate(test_data)

In [9]:
predictionAndLabels.predictions.show(5)

In [10]:
predictionAndLabels = predictionAndLabels.predictions.select('label','prediction')

In [11]:
predictionAndLabels.show(5)

## Evaluators

Evaluators will be a very important part of our pipeline when working with machine learning. Let's see some basics for logistic regression. Useful links:

https://spark.apache.org/docs/latest/api/python/pyspark.ml.html#pyspark.ml.evaluation.BinaryClassificationEvaluator

https://spark.apache.org/docs/latest/api/python/pyspark.ml.html#pyspark.ml.evaluation.MulticlassClassificationEvaluator

In [13]:
from pyspark.ml.evaluation import BinaryClassificationEvaluator,MulticlassClassificationEvaluator

In [14]:
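# Binary evaluator (default metric: areaUnderROC). 'prediction' is passed as the
# raw prediction column because only 'label' and 'prediction' were kept above.
evaluator = BinaryClassificationEvaluator(rawPredictionCol='prediction', labelCol='label')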
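
Passing the hard 0/1 `prediction` column as `rawPredictionCol` means the area under the ROC curve is computed from thresholded predictions rather than from continuous scores. The sketch below, in plain Python with made-up labels and scores, shows how that can change the number. The `auc` helper is an illustrative pairwise definition of the metric, not Spark's implementation.

```python
def auc(labels, scores):
    """Area under ROC as the fraction of (positive, negative) pairs
    in which the positive is scored higher; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.55, 0.3, 0.1]

# Thresholding at 0.5 discards the ordering within each predicted class
hard = [1.0 if s > 0.5 else 0.0 for s in scores]

print(auc(labels, scores))  # 8/9, about 0.889
print(auc(labels, hard))    # 6/9, about 0.667
```

To evaluate on continuous scores in Spark, one option is to pass the full predictions DataFrame (which still has the `rawPrediction` column) to the evaluator and leave `rawPredictionCol` at its default.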

In [15]:
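# The multiclass evaluator also works with binary labels; here it reports accuracy.
# Note that this replaces the binary evaluator defined in the previous cell.
evaluator = MulticlassClassificationEvaluator(predictionCol='prediction', labelCol='label',
                                              metricName='accuracy')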

In [16]:
acc = evaluator.evaluate(predictionAndLabels)

In [17]:
acc
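
For intuition, the accuracy metric is just the fraction of rows where the prediction matches the label. Here is a minimal plain-Python sketch of that calculation; the `rows` list is made-up data standing in for the `label` and `prediction` columns, and `accuracy` is an illustrative helper, not part of Spark.

```python
def accuracy(rows):
    """Fraction of (label, prediction) pairs where the two values agree."""
    rows = list(rows)
    if not rows:
        raise ValueError("accuracy is undefined for an empty input")
    correct = sum(1 for label, prediction in rows if label == prediction)
    return correct / len(rows)


# Hypothetical (label, prediction) pairs: three of four are correct
rows = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)]
print(accuracy(rows))  # 0.75
```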

Okay, let's move on and see some more examples!