## Installing and Importing the PySpark Library

To install the PySpark library:

In [None]:
!pip install pyspark

To import the PySpark library:

In [1]:
import pyspark

## Starting a Spark Session

Importing SparkSession

In [2]:
from pyspark.sql import SparkSession

Creating a SparkSession instance

In [3]:
spark_instance = SparkSession.builder.appName('Practice').getOrCreate()

Checking the Spark instance

In [3]:
spark_instance

## Reading a Dataset with PySpark

By default, `header=False`. If we read the file with `df_practice = spark_instance.read.csv("FilmLocations.csv")`, the header line is treated as a data row and the columns are named `_c0`, `_c1`, etc.

By default, `inferSchema=False`, which means that all columns are read as strings. When `inferSchema=True`, the PySpark reader infers each column's type automatically, at the cost of one extra pass over the data.

In [4]:
df_practice = spark_instance.read.csv("../datasets/FilmLocations.csv", header=True, inferSchema=True)
# Equivalent code:
# df_practice = spark_instance.read.option('header', 'true').csv("../datasets/FilmLocations.csv", inferSchema=True)

We can use the `show(n)` method to print the first `n` rows of the dataframe. `n` is `20` by default.

In [23]:
df_practice.show()

+--------------------+-----------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------+--------------------+--------------------+
|               Title|ReleaseYear|           Locations|            FunFacts|   ProductionCompany|         Distributor|            Director|              Writer|        Actor1|              Actor2|              Actor3|
+--------------------+-----------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------+--------------------+--------------------+
|                 180|       2011|Epic Roasthouse (...|                NULL|         SPI Cinemas|                NULL|            Jayendra|Umarji Anuradha, ...|      Siddarth|        Nithya Menon|         Priya Anand|
|                 180|       2011|Mason & Californi...|                NULL|         SPI Cinemas|                NULL|          

Alternatively, the `head(n)` method returns the first `n` rows of the dataframe as a list of `Row` objects.

In [24]:
df_practice.head(5)

[Row(Title='180', ReleaseYear='2011', Locations='Epic Roasthouse (399 Embarcadero)', FunFacts=None, ProductionCompany='SPI Cinemas', Distributor=None, Director='Jayendra', Writer='Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba ', Actor1='Siddarth', Actor2='Nithya Menon', Actor3='Priya Anand'),
 Row(Title='180', ReleaseYear='2011', Locations='Mason & California Streets (Nob Hill)', FunFacts=None, ProductionCompany='SPI Cinemas', Distributor=None, Director='Jayendra', Writer='Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba ', Actor1='Siddarth', Actor2='Nithya Menon', Actor3='Priya Anand'),
 Row(Title='180', ReleaseYear='2011', Locations='Justin Herman Plaza', FunFacts=None, ProductionCompany='SPI Cinemas', Distributor=None, Director='Jayendra', Writer='Umarji Anuradha, Jayendra, Aarthi Sriram, & Suba ', Actor1='Siddarth', Actor2='Nithya Menon', Actor3='Priya Anand'),
 Row(Title='180', ReleaseYear='2011', Locations='200 block Market Street', FunFacts=None, ProductionCompany='SPI Cinema

To get the column names,

In [27]:
df_practice.columns

['Title',
 'ReleaseYear',
 'Locations',
 'FunFacts',
 'ProductionCompany',
 'Distributor',
 'Director',
 'Writer',
 'Actor1',
 'Actor2',
 'Actor3']

To see the dataframe schema, we can use:

In [26]:
df_practice.printSchema()

root
 |-- Title: string (nullable = true)
 |-- ReleaseYear: string (nullable = true)
 |-- Locations: string (nullable = true)
 |-- FunFacts: string (nullable = true)
 |-- ProductionCompany: string (nullable = true)
 |-- Distributor: string (nullable = true)
 |-- Director: string (nullable = true)
 |-- Writer: string (nullable = true)
 |-- Actor1: string (nullable = true)
 |-- Actor2: string (nullable = true)
 |-- Actor3: string (nullable = true)



## Selecting Columns and Indexing

To select a single column using its name,

In [30]:
df_practice.select('ReleaseYear').show(10)

+-----------+
|ReleaseYear|
+-----------+
|       2011|
|       2011|
|       2011|
|       2011|
|       2011|
|       2011|
|       2011|
|       2011|
|       2005|
|       2015|
+-----------+
only showing top 10 rows



To select multiple columns,

In [32]:
df_practice.select(['Title', 'ReleaseYear']).show(10)

+--------------------+-----------+
|               Title|ReleaseYear|
+--------------------+-----------+
|                 180|       2011|
|                 180|       2011|
|                 180|       2011|
|                 180|       2011|
|                 180|       2011|
|                 180|       2011|
|                 180|       2011|
|                 180|       2011|
|24 Hours on Craig...|       2005|
|          Summertime|       2015|
+--------------------+-----------+
only showing top 10 rows



To check data types,

In [33]:
df_practice.dtypes

[('Title', 'string'),
 ('ReleaseYear', 'string'),
 ('Locations', 'string'),
 ('FunFacts', 'string'),
 ('ProductionCompany', 'string'),
 ('Distributor', 'string'),
 ('Director', 'string'),
 ('Writer', 'string'),
 ('Actor1', 'string'),
 ('Actor2', 'string'),
 ('Actor3', 'string')]

We see that 'ReleaseYear' is still a string. We can cast the column to an integer type by calling `cast()` on it inside the `withColumn()` method.

**N.B.** You can find various ways to cast a column type at [this link](https://sparkbyexamples.com/pyspark/pyspark-cast-column-type/).

In [7]:
df_practice.withColumn('ReleaseYear', df_practice.ReleaseYear.cast('int')).dtypes

[('Title', 'string'),
 ('ReleaseYear', 'int'),
 ('Locations', 'string'),
 ('FunFacts', 'string'),
 ('ProductionCompany', 'string'),
 ('Distributor', 'string'),
 ('Director', 'string'),
 ('Writer', 'string'),
 ('Actor1', 'string'),
 ('Actor2', 'string'),
 ('Actor3', 'string')]

The previous operation is not in place: `withColumn()` returns a new dataframe and leaves `df_practice` unchanged.

In [6]:
df_practice.dtypes

[('Title', 'string'),
 ('ReleaseYear', 'string'),
 ('Locations', 'string'),
 ('FunFacts', 'string'),
 ('ProductionCompany', 'string'),
 ('Distributor', 'string'),
 ('Director', 'string'),
 ('Writer', 'string'),
 ('Actor1', 'string'),
 ('Actor2', 'string'),
 ('Actor3', 'string')]

To get a descriptive summary table of the dataframe,

In [36]:
df_practice.describe().show()

+-------+-----------------+------------------+--------------------+--------------------+--------------------+--------------------+--------------+--------------------+-------------+-------------+--------------+
|summary|            Title|       ReleaseYear|           Locations|            FunFacts|   ProductionCompany|         Distributor|      Director|              Writer|       Actor1|       Actor2|        Actor3|
+-------+-----------------+------------------+--------------------+--------------------+--------------------+--------------------+--------------+--------------------+-------------+-------------+--------------+
|  count|             3463|              3462|                3354|                 946|                3460|                3218|          3414|                3366|         3358|         3180|          2426|
|   mean|            288.0| 1999.609841827768|                NULL|                NULL|                NULL|                NULL|          NULL|               

To add a column,

In [39]:
new_df = df_practice.withColumn('ReleaseYear plus 2', df_practice['ReleaseYear']+2)

The `withColumn` method is not an in-place operation (i.e., it does not change the original dataframe), so we assign its result to the `new_df` dataframe. Because 'ReleaseYear' is a string, Spark implicitly casts it to `double` before adding 2.

In [40]:
new_df.show(10)

+--------------------+-----------+--------------------+--------+--------------------+---------------+--------------------+--------------------+-------------+------------+--------------------+------------------+
|               Title|ReleaseYear|           Locations|FunFacts|   ProductionCompany|    Distributor|            Director|              Writer|       Actor1|      Actor2|              Actor3|ReleaseYear plus 2|
+--------------------+-----------+--------------------+--------+--------------------+---------------+--------------------+--------------------+-------------+------------+--------------------+------------------+
|                 180|       2011|Epic Roasthouse (...|    NULL|         SPI Cinemas|           NULL|            Jayendra|Umarji Anuradha, ...|     Siddarth|Nithya Menon|         Priya Anand|            2013.0|
|                 180|       2011|Mason & Californi...|    NULL|         SPI Cinemas|           NULL|            Jayendra|Umarji Anuradha, ...|     Siddarth

We see that in the previous code, PySpark cast 'ReleaseYear' to `double`. It makes more sense to cast it to `int`, so we cast 'ReleaseYear' to `int` explicitly before doing the arithmetic operation.

In [8]:
df_practice.withColumn('ReleaseYear plus 2', df_practice.ReleaseYear.cast('int')+2).show(10)

+--------------------+-----------+--------------------+--------+--------------------+---------------+--------------------+--------------------+-------------+------------+--------------------+------------------+
|               Title|ReleaseYear|           Locations|FunFacts|   ProductionCompany|    Distributor|            Director|              Writer|       Actor1|      Actor2|              Actor3|ReleaseYear plus 2|
+--------------------+-----------+--------------------+--------+--------------------+---------------+--------------------+--------------------+-------------+------------+--------------------+------------------+
|                 180|       2011|Epic Roasthouse (...|    NULL|         SPI Cinemas|           NULL|            Jayendra|Umarji Anuradha, ...|     Siddarth|Nithya Menon|         Priya Anand|              2013|
|                 180|       2011|Mason & Californi...|    NULL|         SPI Cinemas|           NULL|            Jayendra|Umarji Anuradha, ...|     Siddarth

To drop a column,

In [43]:
new_df = new_df.drop('ReleaseYear plus 2')

Again, this is not an in-place operation, so we need to assign the result back to the `new_df` variable.

In [44]:
new_df.show(10)

+--------------------+-----------+--------------------+--------+--------------------+---------------+--------------------+--------------------+-------------+------------+--------------------+
|               Title|ReleaseYear|           Locations|FunFacts|   ProductionCompany|    Distributor|            Director|              Writer|       Actor1|      Actor2|              Actor3|
+--------------------+-----------+--------------------+--------+--------------------+---------------+--------------------+--------------------+-------------+------------+--------------------+
|                 180|       2011|Epic Roasthouse (...|    NULL|         SPI Cinemas|           NULL|            Jayendra|Umarji Anuradha, ...|     Siddarth|Nithya Menon|         Priya Anand|
|                 180|       2011|Mason & Californi...|    NULL|         SPI Cinemas|           NULL|            Jayendra|Umarji Anuradha, ...|     Siddarth|Nithya Menon|         Priya Anand|
|                 180|       2011| Justi

To rename a column,

**N.B.** This is also not an in-place operation.

In [47]:
df_practice.withColumnRenamed('ReleaseYear', 'Year').show(10)

+--------------------+----+--------------------+--------+--------------------+---------------+--------------------+--------------------+-------------+------------+--------------------+
|               Title|Year|           Locations|FunFacts|   ProductionCompany|    Distributor|            Director|              Writer|       Actor1|      Actor2|              Actor3|
+--------------------+----+--------------------+--------+--------------------+---------------+--------------------+--------------------+-------------+------------+--------------------+
|                 180|2011|Epic Roasthouse (...|    NULL|         SPI Cinemas|           NULL|            Jayendra|Umarji Anuradha, ...|     Siddarth|Nithya Menon|         Priya Anand|
|                 180|2011|Mason & Californi...|    NULL|         SPI Cinemas|           NULL|            Jayendra|Umarji Anuradha, ...|     Siddarth|Nithya Menon|         Priya Anand|
|                 180|2011| Justin Herman Plaza|    NULL|         SPI Cinem

To drop rows with nulls,

By default, `na.drop()` takes `how='any'` and `thresh=None` as values. This means that any row containing at least one null value is removed.

In [54]:
df_practice.na.drop().show()

+--------------------+-----------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+-----------------+----------------+------------------+
|               Title|ReleaseYear|           Locations|            FunFacts|   ProductionCompany|         Distributor|            Director|              Writer|           Actor1|          Actor2|            Actor3|
+--------------------+-----------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+-----------------+----------------+------------------+
|  After the Thin Man|       1936|          Coit Tower|The Tower was fun...| Metro-Goldwyn Mayer| Metro-Goldwyn Mayer|       W.S. Van Dyke|    Frances Goodrich|   William Powell|       Myrna Loy|     James Stewart|
|      Basic Instinct|       1992|Transbay Terminal...|Built in 1939, th...|    Carolco Pictures|    TriStar Pictures|      Paul Verhoeven| 

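`how` can take two values: `any` (drop a row if it contains any null) or `all` (drop a row only if all its values are null).

As for `thresh`, it sets the minimum number of **non-null** values that a row must have in order to be kept. When `thresh` is given, it overrides `how`.
In the following code, any row having fewer than 2 **non-null** values is removed.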

In [50]:
df_practice.na.drop(how='any', thresh=2).show(10)

+--------------------+-----------+--------------------+--------+--------------------+---------------+--------------------+--------------------+-------------+------------+--------------------+
|               Title|ReleaseYear|           Locations|FunFacts|   ProductionCompany|    Distributor|            Director|              Writer|       Actor1|      Actor2|              Actor3|
+--------------------+-----------+--------------------+--------+--------------------+---------------+--------------------+--------------------+-------------+------------+--------------------+
|                 180|       2011|Epic Roasthouse (...|    NULL|         SPI Cinemas|           NULL|            Jayendra|Umarji Anuradha, ...|     Siddarth|Nithya Menon|         Priya Anand|
|                 180|       2011|Mason & Californi...|    NULL|         SPI Cinemas|           NULL|            Jayendra|Umarji Anuradha, ...|     Siddarth|Nithya Menon|         Priya Anand|
|                 180|       2011| Justi

`subset` is another parameter. It is used to specify in which columns to look for null values in order to drop the row.

In the following code, only the nulls in column 'Distributor' are considered.

In [55]:
df_practice.na.drop(how='any', subset=['Distributor']).show(10)

+--------------------+-----------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------+--------------------+--------------------+
|               Title|ReleaseYear|           Locations|            FunFacts|   ProductionCompany|         Distributor|            Director|              Writer|        Actor1|              Actor2|              Actor3|
+--------------------+-----------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------+--------------------+--------------------+
|24 Hours on Craig...|       2005|                NULL|                NULL|Yerba Buena Produ...|     Zealot Pictures|Michael Ferris Gi...|                 N/A| Craig Newmark|                NULL|                NULL|
|          Summertime|       2015|       Alamo Square |                NULL|Creative Monster ...|      7 Distribution|    Gabrie

Now, to fill missing values, we use `na.fill()`.

The following code fills the nulls in columns 'Locations' and 'Distributor' with 'N/A'.

In [56]:
df_practice.na.fill('N/A', ['Locations', 'Distributor']).show(10)

+--------------------+-----------+--------------------+--------+--------------------+---------------+--------------------+--------------------+-------------+------------+--------------------+
|               Title|ReleaseYear|           Locations|FunFacts|   ProductionCompany|    Distributor|            Director|              Writer|       Actor1|      Actor2|              Actor3|
+--------------------+-----------+--------------------+--------+--------------------+---------------+--------------------+--------------------+-------------+------------+--------------------+
|                 180|       2011|Epic Roasthouse (...|    NULL|         SPI Cinemas|            N/A|            Jayendra|Umarji Anuradha, ...|     Siddarth|Nithya Menon|         Priya Anand|
|                 180|       2011|Mason & Californi...|    NULL|         SPI Cinemas|            N/A|            Jayendra|Umarji Anuradha, ...|     Siddarth|Nithya Menon|         Priya Anand|
|                 180|       2011| Justi

Now, we will fill in the missing values with an imputer.

First, we import the imputer from the PySpark library.

In [10]:
from pyspark.ml.feature import Imputer

Then, we create an imputer instance,

**N.B.** `strategy` can be `mean`, `median`, or `mode`. You can read more at this [link](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.Imputer.html).

In [73]:
imputer = Imputer(
    inputCols=['Title', 'ReleaseYear', 'ProductionCompany', 'Director', 'Distributor', 'Writer'],
    outputCols=["{}_imputed".format(c) for c in ['Title', 'ReleaseYear', 'ProductionCompany', 'Director', 'Distributor', 'Writer']]
).setStrategy("mode")

Fitting the previous imputer would produce an error, since `Imputer` works only with numerical columns and most of the columns listed are strings. The previous code serves only as an example of how we can create an imputer instance.

In the following code, we introduce another, and maybe better, way.

We start by creating an instance.

In [37]:
model = Imputer()
# print(model.getInputCols()) would raise an error here since inputCols is not set yet.

Then, we can set the parameters of our model.

In [38]:
# Method 1
model.setInputCols(['ReleaseYear'])
model.setOutputCols(['ReleaseYear_Imputed'])
model.setStrategy('mean')

Imputer_f09c57f735af

In [68]:
# Method 2
model.setParams(inputCols=['ReleaseYear'], outputCols=['ReleaseYear_Imputed'], strategy='mean')

Imputer_f09c57f735af

In [70]:
# Check Params
model._paramMap

{Param(parent='Imputer_f09c57f735af', name='inputCols', doc='input column names.'): ['ReleaseYear'],
 Param(parent='Imputer_f09c57f735af', name='outputCols', doc='output column names.'): ['ReleaseYear_Imputed'],
 Param(parent='Imputer_f09c57f735af', name='strategy', doc='strategy for imputation. If mean, then replace missing values using the mean value of the feature. If median, then replace missing values using the median value of the feature. If mode, then replace missing using the most frequent value of the feature.'): 'mean'}

Now, we fit and transform the data.

In [71]:
# We first cast to int
df_imputer = df_practice.withColumn('ReleaseYear', df_practice.ReleaseYear.cast('int'))

In [78]:
# Then, fit
model.fit(df_imputer)

ImputerModel: uid=Imputer_f09c57f735af, strategy=mean, missingValue=NaN, numInputCols=1, numOutputCols=1

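In [None]:
# And, transform
# fit() returns an ImputerModel; surrogateDF and transform() belong to that
# fitted model, not to the Imputer estimator, so we keep the result of fit()
imputer_model = model.fit(df_imputer)
imputer_model.surrogateDF.show()
imputer_model.transform(df_imputer).show()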

## Filtering Operations

In the following example, we want to retrieve the movies released in 1930 or before.

In [87]:
df_practice.filter("ReleaseYear<=1930").select(['Title', 'ReleaseYear']).show(truncate=False)

+------------------+-----------+
|Title             |ReleaseYear|
+------------------+-----------+
|Greed             |1924       |
|Greed             |1924       |
|A Jitney Elopement|1915       |
|Greed             |1924       |
|A Jitney Elopement|1915       |
|The Jazz Singer   |1927       |
|A Jitney Elopement|1915       |
|Greed             |1924       |
|Greed             |1924       |
|Greed             |1924       |
|The Jazz Singer   |1927       |
|A Jitney Elopement|1915       |
+------------------+-----------+



Another way,

In [88]:
df_practice.filter(df_practice["ReleaseYear"]<=1930).select(['Title', 'ReleaseYear']).show(truncate=False)

+------------------+-----------+
|Title             |ReleaseYear|
+------------------+-----------+
|Greed             |1924       |
|Greed             |1924       |
|A Jitney Elopement|1915       |
|Greed             |1924       |
|A Jitney Elopement|1915       |
|The Jazz Singer   |1927       |
|A Jitney Elopement|1915       |
|Greed             |1924       |
|Greed             |1924       |
|Greed             |1924       |
|The Jazz Singer   |1927       |
|A Jitney Elopement|1915       |
+------------------+-----------+



If we want to apply several filters, for example to get films released after 1915 and before 1930, we combine the conditions with `&` and wrap each condition in parentheses:

In [95]:
df_practice.filter((df_practice["ReleaseYear"]<1930) & (df_practice["ReleaseYear"]>1915)).select(['Title', 'ReleaseYear']).show(truncate=False)

+---------------+-----------+
|Title          |ReleaseYear|
+---------------+-----------+
|Greed          |1924       |
|Greed          |1924       |
|Greed          |1924       |
|The Jazz Singer|1927       |
|Greed          |1924       |
|Greed          |1924       |
|Greed          |1924       |
|The Jazz Singer|1927       |
+---------------+-----------+



If we want movies released in 1999,

In [103]:
df_practice.filter(df_practice["ReleaseYear"]==1999).select(['Title', 'ReleaseYear']).show(truncate=False)

+-----------------+-----------+
|Title            |ReleaseYear|
+-----------------+-----------+
|Bicentennial Man |1999       |
|Bicentennial Man |1999       |
|Bicentennial Man |1999       |
|Bicentennial Man |1999       |
|Bicentennial Man |1999       |
|The Bachelor     |1999       |
|Edtv             |1999       |
|The Bachelor     |1999       |
|The Bachelor     |1999       |
|Seven Girlfriends|1999       |
|Edtv             |1999       |
|Stigmata         |1999       |
|The Bachelor     |1999       |
|Edtv             |1999       |
|The Bachelor     |1999       |
|The Bachelor     |1999       |
|The Bachelor     |1999       |
|The Bachelor     |1999       |
|The Bachelor     |1999       |
|The Bachelor     |1999       |
+-----------------+-----------+
only showing top 20 rows



If now we want movies that were **not** released in 1999,

In [104]:
# Method 1
df_practice.filter(df_practice["ReleaseYear"]!=1999).select(['Title', 'ReleaseYear']).show(truncate=False)

+----------------------+-----------+
|Title                 |ReleaseYear|
+----------------------+-----------+
|180                   |2011       |
|180                   |2011       |
|180                   |2011       |
|180                   |2011       |
|180                   |2011       |
|180                   |2011       |
|180                   |2011       |
|180                   |2011       |
|24 Hours on Craigslist|2005       |
|Summertime            |2015       |
|Ballers Season 3      |2017       |
|Chance Season 2       |2017       |
|Chance Season 2       |2017       |
|A Night Full of Rain  |1978       |
|A Night Full of Rain  |1978       |
|A Night Full of Rain  |1978       |
|A Night Full of Rain  |1978       |
|Chance Season 2       |2017       |
|Vegas in Space        |1992       |
|Vegas in Space        |1992       |
+----------------------+-----------+
only showing top 20 rows



In [105]:
# Method 2
df_practice.filter(~(df_practice["ReleaseYear"]==1999)).select(['Title', 'ReleaseYear']).show(truncate=False)

+----------------------+-----------+
|Title                 |ReleaseYear|
+----------------------+-----------+
|180                   |2011       |
|180                   |2011       |
|180                   |2011       |
|180                   |2011       |
|180                   |2011       |
|180                   |2011       |
|180                   |2011       |
|180                   |2011       |
|24 Hours on Craigslist|2005       |
|Summertime            |2015       |
|Ballers Season 3      |2017       |
|Chance Season 2       |2017       |
|Chance Season 2       |2017       |
|A Night Full of Rain  |1978       |
|A Night Full of Rain  |1978       |
|A Night Full of Rain  |1978       |
|A Night Full of Rain  |1978       |
|Chance Season 2       |2017       |
|Vegas in Space        |1992       |
|Vegas in Space        |1992       |
+----------------------+-----------+
only showing top 20 rows



## Groupby and Aggregate Operations

In the following, we want to count how many movies were filmed in each location. We use the `groupBy()` method to group the titles by location, then we use the `count()` method to count the number of movies.

In [120]:
df_practice.groupBy('Locations').count().show()

+--------------------+-----+
|           Locations|count|
+--------------------+-----+
|Roxie Theater (31...|    2|
|        Nobles Alley|    2|
|          Bay Bridge|   18|
|San Francisco Zoo...|    2|
|Filbert between H...|    2|
|  1160 Taylor Street|    2|
|Bank of America (...|    2|
|Francisco St from...|    2|
|Firestation #38 (...|    2|
|              Pier 5|    1|
|Way Faire Inn on ...|    2|
|Embarcadero & Was...|    2|
|Atlas Café, 3049 ...|    2|
|        AT&T Stadium|    4|
|       883 42nd Ave.|    2|
|5th Ave @ Fulton ...|    2|
|Grant Avenue betw...|    2|
|Pine between Kear...|    2|
|  Broadway & Sansome|    2|
|        Dolores Park|    6|
+--------------------+-----+
only showing top 20 rows



If we want to see the most recent year in which a movie was filmed at each location, we first need to cast 'ReleaseYear' to `int`, then use the `max()` method.

In [121]:
df_practice.withColumn('ReleaseYear', df_practice.ReleaseYear.cast('int')).groupBy('Locations').max().show()

+--------------------+----------------+
|           Locations|max(ReleaseYear)|
+--------------------+----------------+
|Roxie Theater (31...|            2015|
|        Nobles Alley|            2014|
|          Bay Bridge|            2017|
|San Francisco Zoo...|            1967|
|Filbert between H...|            2016|
|  1160 Taylor Street|            2010|
|Bank of America (...|            1969|
|Francisco St from...|            2012|
|Firestation #38 (...|            1974|
|              Pier 5|            2018|
|Way Faire Inn on ...|            2014|
|Embarcadero & Was...|            2014|
|Atlas Café, 3049 ...|            2015|
|        AT&T Stadium|            2015|
|       883 42nd Ave.|            2015|
|5th Ave @ Fulton ...|            2016|
|Grant Avenue betw...|            2016|
|Pine between Kear...|            2015|
|  Broadway & Sansome|            2014|
|        Dolores Park|            2019|
+--------------------+----------------+
only showing top 20 rows

