## SparkContext

Spark is heavyweight software. It takes time to start up, and simple computations may run slower than expected.
That's because the optimizations Spark has under the hood are designed for complicated operations on big data sets.
As a result, for simple or small problems Spark may actually perform worse than some other solutions!

In [2]:
# Verify SparkContext (sc is assumed to be predefined by the environment)
print(sc)

# Print Spark version
print(sc.version)

## SparkSessions
Creating multiple SparkSessions and SparkContexts can cause issues, so it's best practice to use the SparkSession.builder.getOrCreate() method. 
This returns an existing SparkSession if there's already one in the environment, or creates a new one if necessary!

In [4]:
# Import SparkSession from pyspark.sql
from pyspark.sql import SparkSession

# Create my_spark
my_spark = SparkSession.builder.getOrCreate()

# Print my_spark
print(my_spark)

## Viewing tables

After creating a SparkSession, it is possible to see what data exists in the cluster!
The SparkSession has an attribute called catalog which keeps track of the tables and views registered in the cluster. This attribute has a few methods for extracting different pieces of information.

One of the most useful is the .listTables() method, which returns a list describing every table in the current database. Each entry carries the table's name along with details such as its database and whether it is temporary.

In [6]:
# Print the tables in the catalog
print(my_spark.catalog.listTables())