# San Francisco Businesses: Boom or Bust?

By Hannah Bassett and Hannah Freitag 

### DATA PREP AND VETTING

### The Project

The purpose of this project is to identify changes that occurred among San Francisco’s businesses during the COVID-19 pandemic. Our analysis of DataSF’s [Registered Business Locations](https://data.sfgov.org/Economy-and-Community/Registered-Business-Locations-San-Francisco/g8m3-pdis) dataset found that the overall number of business openings has decreased over the past five years, a trend that is also reflected in restaurant openings. The Financial District/South Beach neighborhood experienced the most business openings, but the annual number of restaurant openings in the neighborhood is still not back to pre-pandemic levels.

### The Premise

Cities are the epicenter of COVID-19’s impact in the United States, and San Francisco is no exception. In the three years since San Francisco issued its first [Stay Home](https://sf.gov/news/sf-expands-stay-home-order-response-coronavirus-outbreak) order, life in SF has largely returned to its pre-pandemic ways, with one notable exception: [business activity in San Francisco has stalled](https://downtownrecovery.com/death_of_downtown_policy_brief.pdf) at 30% of pre-pandemic levels. The New York Times [dubbed](https://www.nytimes.com/2023/02/09/podcasts/the-daily/san-francisco-downtown-pandemic.html) San Francisco “the most empty downtown in America.”

Our project used the SF business location data to answer four discrete quantifiable questions about the pandemic’s impact on SF businesses, zooming in on SF restaurants and neighborhoods from 2018 to 2022: 
1. Has the number of annual business openings increased?
2. Which neighborhoods have had the most openings?
3. How have restaurant openings changed throughout the course of the pandemic?
4. How have restaurant openings changed in the three neighborhoods with the most restaurants?

### The Data

We analyzed [Registered Business Locations - San Francisco](https://data.sfgov.org/Economy-and-Community/Registered-Business-Locations-San-Francisco/g8m3-pdis) data, collected by the Treasurer & Tax Collector’s Office and hosted on DataSF. The dataset includes 27 data fields, as defined by the [data dictionary](https://data.sfgov.org/Economy-and-Community/Registered-Business-Locations-San-Francisco/g8m3-pdis#:~:text=Columns%20in%20this%20Dataset).

The dataset has several limitations, as identified in our interviews with the Treasurer & Tax Collector’s Office. First and foremost, the data is self-reported for tax purposes and thus comes with the usual caveats about accuracy and thoroughness. For example, the `Closing date` field could be blank because the business location is still open, or simply because the business did not update its records with the Treasurer & Tax Collector’s Office when it shut down that location. Additionally, several fields describe the parent business, whose details can differ from those of the business location listed in the data. For example, the `NAICS` column includes codes that identify a business’s sector, but that categorization could reflect the parent company’s activities worldwide rather than the activities of the SF business location specifically.

Mindful of these limitations, we narrowed our analysis to key fields in the dataset with high reliability: `Location Start Date`, `LIC Code` (license codes), and `Neighborhoods - Analysis Boundaries` (neighborhood).


### Data Import

The dataset “Registered Business Locations - San Francisco” is available on DataSF’s portal. 

___________________________________________________________________________________________________________________

In [14]:
import pandas as pd
import altair as alt
from datetime import datetime

In [15]:
# Store csv data of Registered Business Locations as dataframe
sf_biz = pd.read_csv("https://data.sfgov.org/api/views/g8m3-pdis/rows.csv?accessType=DOWNLOAD")


In [16]:
# View the first few rows to assess import and column headers
sf_biz.head(2)

Unnamed: 0,Location Id,Business Account Number,Ownership Name,DBA Name,Street Address,City,State,Source Zipcode,Business Start Date,Business End Date,...,Supervisor District,Neighborhoods - Analysis Boundaries,Business Corridor,Business Location,UniqueID,SF Find Neighborhoods,Current Police Districts,Current Supervisor Districts,Analysis Neighborhoods,Neighborhoods
0,1248254-03-201,144332,Intl Training & Exchange Inc,Altruecare,600 California St 10,San Francisco,CA,94108,10/01/1983,,...,3.0,Chinatown,,POINT (-122.40453 37.792652),1248254-03-201-0144332-6100-6299-02-12-2020,104.0,6.0,3.0,6.0,104.0
1,1243044-01-201,144332,Intl Training & Exchange Inc,Campcareusa,600 California St Fl 10,San Francisco,CA,94108,10/01/1983,,...,3.0,Chinatown,,POINT (-122.40453 37.792652),1243044-01-201-0144332-6100-6299-01-15-2020,104.0,6.0,3.0,6.0,104.0


In [17]:
sf_biz.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 304850 entries, 0 to 304849
Data columns (total 32 columns):
 #   Column                               Non-Null Count   Dtype  
---  ------                               --------------   -----  
 0   Location Id                          304850 non-null  object 
 1   Business Account Number              304850 non-null  int64  
 2   Ownership Name                       304850 non-null  object 
 3   DBA Name                             304850 non-null  object 
 4   Street Address                       304845 non-null  object 
 5   City                                 304654 non-null  object 
 6   State                                304218 non-null  object 
 7   Source Zipcode                       304641 non-null  object 
 8   Business Start Date                  304850 non-null  object 
 9   Business End Date                    150746 non-null  object 
 10  Location Start Date                  304850 non-null  object 
 11  Location End 

In [18]:
# Create new variable for location start year
sf_biz["Start_Year"] = sf_biz["Location Start Date"].str[-4:].astype(int)
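
The string slice above relies on every `Location Start Date` value ending in a four-digit year. As a hedged alternative, the sketch below parses the full date on a small toy dataframe instead; the `MM/DD/YYYY` format and the `Start_Year_Parsed` column name are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

# Toy stand-in for sf_biz; the dates are illustrative only.
toy = pd.DataFrame({"Location Start Date": ["10/01/1983", "02/12/2020", "01/15/2020"]})

# Approach used in this notebook: take the last four characters as the year
toy["Start_Year"] = toy["Location Start Date"].str[-4:].astype(int)

# Alternative (assumes MM/DD/YYYY throughout): parse the whole date.
# errors="coerce" turns any value that does not match into NaT instead of raising.
parsed = pd.to_datetime(toy["Location Start Date"], format="%m/%d/%Y", errors="coerce")
toy["Start_Year_Parsed"] = parsed.dt.year

print(toy)
```

On clean data both columns agree. The parsed version may be preferable if malformed dates need to be found and counted, since they surface as missing values rather than as errors or wrong years.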

In [19]:
# Filter dataframe to limit business openings to the past five years (2018-2022) 
sf_biz_18_22 = sf_biz[(sf_biz.Start_Year >= 2018) & (sf_biz.Start_Year <= 2022)]
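
As a quick sanity check on the year filter, the sketch below uses a made-up `Start_Year` column to show that the two-sided comparison matches `Series.between`, and how openings per year could then be tallied. The toy values are invented and do not reflect the real dataset.

```python
import pandas as pd

# Toy stand-in with invented start years, including some outside 2018-2022
toy = pd.DataFrame({"Start_Year": [2016, 2018, 2019, 2019, 2022, 2023]})

# Filter as written in the cell above
by_comparison = toy[(toy.Start_Year >= 2018) & (toy.Start_Year <= 2022)]

# Equivalent filter: Series.between includes both endpoints by default
by_between = toy[toy["Start_Year"].between(2018, 2022)]

# Number of openings per year in the filtered toy data, in chronological order
openings_per_year = by_between["Start_Year"].value_counts().sort_index()
print(openings_per_year)
```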

In [20]:
# Rename "Neighborhoods - Analysis Boundaries" to "Hoods"
sf_biz_18_22 = sf_biz_18_22.rename(columns={"Neighborhoods - Analysis Boundaries": "Hoods"})

In [21]:
# Review all column headers
sf_biz_18_22.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 80771 entries, 0 to 304849
Data columns (total 33 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Location Id                   80771 non-null  object 
 1   Business Account Number       80771 non-null  int64  
 2   Ownership Name                80771 non-null  object 
 3   DBA Name                      80771 non-null  object 
 4   Street Address                80770 non-null  object 
 5   City                          80768 non-null  object 
 6   State                         80751 non-null  object 
 7   Source Zipcode                80654 non-null  object 
 8   Business Start Date           80771 non-null  object 
 9   Business End Date             22595 non-null  object 
 10  Location Start Date           80771 non-null  object 
 11  Location End Date             30459 non-null  object 
 12  Mail Address                  60491 non-null  object 
 13  

In [22]:
# Create a second dataframe that includes all business locations that have a license (LIC Code)
sf_biz_industry = sf_biz_18_22.dropna(subset=['LIC Code'])

The new dataframe, a subset of the complete 2018-2022 data, contains 4,659 rows with complete values for (1) Location Start Date, (2) LIC Code, and (3) Start Year, and 4,381 rows that also have a value for neighborhood ("Hoods").

We recognize that this is a small dataset. However, our interviews with experts indicate that this data is complete and highly reliable, and that it will support our analysis.
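
Row counts like the ones above can be checked with a few lines of pandas. The following is a minimal sketch on invented toy data: the license codes and neighborhood values are placeholders, not real entries from the dataset.

```python
import pandas as pd

# Toy stand-in for sf_biz_18_22; all values are placeholders.
toy = pd.DataFrame({
    "Location Start Date": ["01/15/2019", "06/01/2020", "03/20/2021", "11/05/2022"],
    "LIC Code": ["H24", None, "H25", "H26"],
    "Hoods": ["Mission", "Chinatown", None, "Marina"],
})

# Keep only rows that have a license code, mirroring the dropna step above
toy_industry = toy.dropna(subset=["LIC Code"])

# Rows in the licensed subset
n_rows = len(toy_industry)

# Rows in the licensed subset that also have a neighborhood value
n_with_hood = int(toy_industry["Hoods"].notna().sum())

print(n_rows, n_with_hood)
```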

In [23]:
# Storing dataframes as CSV
sf_biz_18_22.to_csv('sf_biz_18_22.csv')
sf_biz_industry.to_csv('sf_biz_industry.csv')
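
One thing to keep in mind when reloading these files: `to_csv` writes the dataframe index as an unnamed first column by default, so a plain `read_csv` brings it back as a column called `Unnamed: 0`. The sketch below demonstrates this on a toy dataframe, using an in-memory buffer in place of a file; the data is invented for illustration.

```python
import io

import pandas as pd

# Toy dataframe with a non-default index, like a filtered subset would have
toy = pd.DataFrame({"DBA Name": ["A", "B"], "Start_Year": [2019, 2021]}, index=[5, 9])

# Write with the default index=True, as in the cell above
buffer = io.StringIO()
toy.to_csv(buffer)

# Plain reload: the saved index comes back as an "Unnamed: 0" column
buffer.seek(0)
reloaded_plain = pd.read_csv(buffer)

# Reload with index_col=0: the first column is restored as the index
buffer.seek(0)
reloaded_indexed = pd.read_csv(buffer, index_col=0)

print(reloaded_plain.columns.tolist())
print(reloaded_indexed.columns.tolist())
```

Passing `index=False` to `to_csv` would avoid writing the index at all, at the cost of losing the original row labels from `sf_biz`.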