# HTML display of Spark `DataFrame`

Load a `DataFrame` from a local CSV file and display it as a readable HTML table in a Jupyter notebook running the almond Scala kernel.


## Imports

Add the Spark dependency and turn off Spark's overly verbose logging:

In [None]:
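// Fetch Spark first; it also puts log4j on the classpath
import $ivy.`org.apache.spark::spark-sql:2.4.0` // Or use any other 2.x version here

// Turn off all output from loggers under "org" (this includes Spark's own loggers)
import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.OFF)

import org.apache.spark.sql._
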
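Create a `SparkSession` with almond's `NotebookSparkSession` builder, running locally on all cores (`local[*]`), and define `sc` as a shortcut to its `SparkContext`: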

In [None]:
val spark = {
  NotebookSparkSession.builder()
    .master("local[*]")
    .getOrCreate()
}

def sc = spark.sparkContext

## Define HTML display function

This solution is taken directly from a comment on almond issue #180:

https://github.com/almond-sh/almond/issues/180

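Define an implicit class, `RichDF`, that adds a `showHTML(limit, truncate)` method to every `DataFrame`. It takes the first `limit` rows, converts and truncates each cell, and publishes the result as an HTML table: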

In [None]:
implicit class RichDF(val ds: DataFrame) {
  // Render the first `limit` rows as an HTML table. Cells longer than `truncate`
  // characters are shortened; truncate <= 0 disables truncation.
  def showHTML(limit: Int = 20, truncate: Int = 20): Unit = {
    import scala.xml.Utility.escape

    val data = ds.take(limit)
    val header = ds.schema.fieldNames.toSeq
    val rows: Seq[Seq[String]] = data.toSeq.map { row =>
      row.toSeq.map { cell =>
        // Convert the cell value to a display string
        val str = cell match {
          case null => "null"
          case binary: Array[Byte] => binary.map("%02X".format(_)).mkString("[", " ", "]")
          case array: Array[_] => array.mkString("[", ", ", "]")
          case seq: Seq[_] => seq.mkString("[", ", ", "]")
          case _ => cell.toString
        }
        if (truncate > 0 && str.length > truncate) {
          // Do not show ellipses when truncating to fewer than 4 characters.
          if (truncate < 4) str.substring(0, truncate)
          else str.substring(0, truncate - 3) + "..."
        } else {
          str
        }
      }
    }

    // Escape header and cell text so values containing <, > or & render literally
    publish.html(s"""<table>
      <tr>${header.map(h => s"<th>${escape(h)}</th>").mkString}</tr>
      ${rows.map(row => s"<tr>${row.map(c => s"<td>${escape(c)}</td>").mkString}</tr>").mkString}
    </table>""")
  }
}
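To check the formatting logic without a Spark session, the cell-to-string, truncation, and table-building steps of `showHTML` can be pulled out into plain functions. This is a minimal sketch: `formatCell`, `truncateCell`, and `renderTable` are hypothetical names, and the hand-written `escape` stands in for `scala.xml.Utility.escape` so the snippet needs only the Scala standard library.

```scala
object HtmlTableSketch {
  // Hypothetical stand-in for scala.xml.Utility.escape: escapes &, <, > and "
  def escape(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '&' => sb.append("&amp;")
      case '<' => sb.append("&lt;")
      case '>' => sb.append("&gt;")
      case '"' => sb.append("&quot;")
      case c   => sb.append(c)
    }
    sb.toString
  }

  // Same cell conversion as showHTML: nulls, byte arrays, arrays and Seqs
  def formatCell(cell: Any): String = cell match {
    case null                => "null"
    case binary: Array[Byte] => binary.map("%02X".format(_)).mkString("[", " ", "]")
    case array: Array[_]     => array.mkString("[", ", ", "]")
    case seq: Seq[_]         => seq.mkString("[", ", ", "]")
    case other               => other.toString
  }

  // Same rule as showHTML: keep `truncate` characters, ending in "..." unless truncate < 4
  def truncateCell(str: String, truncate: Int): String =
    if (truncate > 0 && str.length > truncate) {
      if (truncate < 4) str.substring(0, truncate)
      else str.substring(0, truncate - 3) + "..."
    } else str

  // Builds the <table> markup that showHTML would hand to publish.html
  def renderTable(header: Seq[String], rows: Seq[Seq[String]]): String = {
    val head = header.map(h => s"<th>${escape(h)}</th>").mkString
    val body = rows.map(r => r.map(c => s"<td>${escape(c)}</td>").mkString("<tr>", "", "</tr>")).mkString
    s"<table><tr>$head</tr>$body</table>"
  }

  def main(args: Array[String]): Unit = {
    val cells: Seq[Any] = Seq(null, Array[Byte](1, -1), "a<b", "a fairly long string value")
    val row = cells.map(c => truncateCell(formatCell(c), 10))
    // <table><tr><th>c1</th>...</tr><tr><td>null</td><td>[01 FF]</td><td>a&lt;b</td><td>a fairl...</td></tr></table>
    println(renderTable(Seq("c1", "c2", "c3", "c4"), Seq(row)))
  }
}
```

In the notebook itself, a call such as `ds.showHTML(limit = 5, truncate = 0)` renders at most five rows and, because `truncate <= 0` skips the cut, shows every cell in full.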

## Load and display a `DataFrame`

Load data from a CSV file into a Spark `DataFrame`:

In [None]:
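val localCsvFile = "train.csv" // path relative to the notebook's working directory

val trainingSet = spark.read
  .format("csv")
  .option("inferSchema", "true") // infer column types (costs an extra pass over the data)
  .option("header", "true")      // use the first line as column names
  .load(localCsvFile)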
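The load cell assumes `train.csv` exists in the notebook's working directory. To try the notebook without it, a small file can be generated first. This sketch writes a hypothetical `sample.csv`; its column names and values are invented purely for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

object SampleCsv {
  // Writes a tiny CSV with a header row; columns and values are made up
  def write(path: Path): Path = {
    val lines = Seq(
      "id,name,score",
      "1,alice,0.9",
      "2,bob,0.75"
    )
    Files.write(path, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
  }

  def main(args: Array[String]): Unit = {
    val p = write(Paths.get("sample.csv"))
    println(s"wrote ${Files.size(p)} bytes to $p")
  }
}
```

After setting `localCsvFile` to `"sample.csv"` and rerunning the load cell, the `DataFrame` should have three columns; with `inferSchema` enabled, Spark is expected to type `id` as an integer and `score` as a double.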



In [None]:
trainingSet.showHTML()