# San Francisco Businesses: Boom or Bust?

By Hannah Bassett and Hannah Freitag 

### DATA VETTING

### The Project

The purpose of this project is to identify changes that occurred among San Francisco’s businesses during the COVID-19 pandemic. Our analysis of DataSF’s [Registered Business Locations](https://data.sfgov.org/Economy-and-Community/Registered-Business-Locations-San-Francisco/g8m3-pdis) dataset found that the overall number of business openings has decreased over the past five years, a trend that is also reflected in restaurant openings. The Financial District/South Beach experienced the most business openings, but the annual number of restaurant openings in the neighborhood is still not back to pre-pandemic levels.

___________________________________________________________________________________________________________________

In [1]:
import pandas as pd
import altair as alt

In [2]:
# Store csv data of Registered Business Locations as dataframe
sf_biz = pd.read_csv("https://data.sfgov.org/api/views/g8m3-pdis/rows.csv?accessType=DOWNLOAD")


In [3]:
# View the first few rows to assess import and column headers
sf_biz.head()

| | Location Id | Business Account Number | Ownership Name | DBA Name | Street Address | City | State | Source Zipcode | Business Start Date | Business End Date | ... | Supervisor District | Neighborhoods - Analysis Boundaries | Business Corridor | Business Location | UniqueID | SF Find Neighborhoods | Current Police Districts | Current Supervisor Districts | Analysis Neighborhoods | Neighborhoods |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1248254-03-201 | 144332 | Intl Training & Exchange Inc | Altruecare | 600 California St 10 | San Francisco | CA | 94108 | 10/01/1983 | | ... | 3.0 | Chinatown | | POINT (-122.40453 37.792652) | 1248254-03-201-0144332-6100-6299-02-12-2020 | 104.0 | 6.0 | 3.0 | 6.0 | 104.0 |
| 1 | 1243044-01-201 | 144332 | Intl Training & Exchange Inc | Campcareusa | 600 California St Fl 10 | San Francisco | CA | 94108 | 10/01/1983 | | ... | 3.0 | Chinatown | | POINT (-122.40453 37.792652) | 1243044-01-201-0144332-6100-6299-01-15-2020 | 104.0 | 6.0 | 3.0 | 6.0 | 104.0 |
| 2 | 0002881-01-001 | 2881 | Benson & Neff-Cpas-A Prof Corp | Benson & Neff-Cpas-A Prof Corp | 1 Post St 2150 | San Francisco | CA | 94104 | 10/01/1968 | 08/31/2018 | ... | 3.0 | Financial District/South Beach | | POINT (-122.40211 37.789062) | 0002881-01-001-0002881--10-01-1996 | 19.0 | 6.0 | 3.0 | 8.0 | 19.0 |
| 3 | 0081230-01-001 | 81230 | Li Raymond K Y Md Inc | Li Raymond K Y Md Inc | 950 Stockton St Ste 205 | San Francisco | CA | 94108 | 01/01/1973 | 06/29/2022 | ... | 3.0 | Chinatown | | POINT (-122.40796 37.794514) | 0081230-01-001-0081230-6100-6299-01-01-1973 | 104.0 | 6.0 | 3.0 | 6.0 | 104.0 |
| 4 | 0030032-30-001 | 30032 | Walgreen Co | Walgreens #11385 | 1580 Valencia St Ste 101 | San Francisco | CA | 94110 | 10/01/1968 | | ... | 9.0 | Bernal Heights | | POINT (-122.420265 37.747643) | 0030032-30-001-0030032-4400-4599-05-07-2007 | 83.0 | 9.0 | 2.0 | 2.0 | 83.0 |


In [6]:
# Create new variable for location start year
sf_biz["Start_Year"] = sf_biz["Location Start Date"].str[-4:].astype(int)
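
Slicing the last four characters works as long as every value of "Location Start Date" is a well-formed MM/DD/YYYY string. As a more defensive alternative, the sketch below parses the column with `pd.to_datetime` and takes the year from the result. The three-row `sample` frame is made up for illustration, and the `%m/%d/%Y` format is an assumption based on the dates visible in the preview of the data.

```python
import pandas as pd

# Hypothetical sample standing in for sf_biz; the values are made up.
sample = pd.DataFrame(
    {"Location Start Date": ["10/01/1983", "03/15/2020", "not a date"]}
)

# Parse as MM/DD/YYYY (assumed format). Unparseable values become NaT
# instead of raising an error, as .str[-4:].astype(int) would.
parsed = pd.to_datetime(
    sample["Location Start Date"], format="%m/%d/%Y", errors="coerce"
)

# The nullable integer dtype keeps missing years as <NA> rather than floats.
sample["Start_Year"] = parsed.dt.year.astype("Int64")
print(sample)
```

Rows whose year comes back as `<NA>` can then be inspected before filtering, rather than stopping the notebook on a conversion error.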

In [7]:
# Filter dataframe to limit business openings to the past five years (2018-2022) 
sf_biz_18_22 = sf_biz[(sf_biz.Start_Year >= 2018) & (sf_biz.Start_Year <= 2022)]

In [8]:
# Rename "Neighborhoods - Analysis Boundaries" to the shorter "Hoods"
sf_biz_18_22 = sf_biz_18_22.rename(columns={"Neighborhoods - Analysis Boundaries": "Hoods"})
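
With the year filter and the shorter column name in place, openings can be tallied per year and per neighborhood. The following is a minimal sketch, assuming a small made-up frame `demo` with the same `Start_Year` and `Hoods` columns. The counts it prints are illustrative only and are not results from the dataset.

```python
import pandas as pd

# Hypothetical rows with the same two columns as sf_biz_18_22.
demo = pd.DataFrame(
    {
        "Start_Year": [2018, 2018, 2019, 2020, 2020, 2020],
        "Hoods": ["Mission", "Chinatown", "Mission", "Mission", None, "Chinatown"],
    }
)

# Openings per year, in chronological order.
per_year = demo.groupby("Start_Year").size()

# Openings per neighborhood and year; groupby drops rows where Hoods is missing.
per_hood_year = (
    demo.groupby(["Hoods", "Start_Year"]).size().reset_index(name="openings")
)

print(per_year)
print(per_hood_year)
```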

In [9]:
# Review all column headers
sf_biz_18_22.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 80771 entries, 0 to 304849
Data columns (total 33 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Location Id                   80771 non-null  object 
 1   Business Account Number       80771 non-null  int64  
 2   Ownership Name                80771 non-null  object 
 3   DBA Name                      80771 non-null  object 
 4   Street Address                80770 non-null  object 
 5   City                          80768 non-null  object 
 6   State                         80751 non-null  object 
 7   Source Zipcode                80654 non-null  object 
 8   Business Start Date           80771 non-null  object 
 9   Business End Date             22595 non-null  object 
 10  Location Start Date           80771 non-null  object 
 11  Location End Date             30459 non-null  object 
 12  Mail Address                  60491 non-null  object 
 13  

In [10]:
# Create a second dataframe with all entries that have a license code (LIC Code)
# Drop entries with missing values for license code
sf_biz_industry = sf_biz_18_22.dropna(subset=['LIC Code'])

In [11]:
# Review drop of entries with missing values for license code
sf_biz_industry.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4659 entries, 15 to 304830
Data columns (total 33 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Location Id                   4659 non-null   object 
 1   Business Account Number       4659 non-null   int64  
 2   Ownership Name                4659 non-null   object 
 3   DBA Name                      4659 non-null   object 
 4   Street Address                4659 non-null   object 
 5   City                          4659 non-null   object 
 6   State                         4656 non-null   object 
 7   Source Zipcode                4600 non-null   object 
 8   Business Start Date           4659 non-null   object 
 9   Business End Date             306 non-null    object 
 10  Location Start Date           4659 non-null   object 
 11  Location End Date             483 non-null    object 
 12  Mail Address                  4367 non-null   object 
 13  

The new dataframe, a subset of the complete data filtered to 2018-2022 and to entries with a license code, contains 4,659 rows with complete values for (1) Location Start Date, (2) LIC Code, and (3) Start_Year. Of those rows, 4,381 also have a value for neighborhood ("Hoods").
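
The row counts quoted above can be checked directly by counting non-null values in the columns of interest. Here is a minimal sketch on a made-up frame `demo_industry`; the column names match the notebook, while the values and license codes are placeholders.

```python
import pandas as pd

# Hypothetical stand-in for sf_biz_industry.
demo_industry = pd.DataFrame(
    {
        "Location Start Date": ["01/02/2018", "05/06/2019", "07/08/2021"],
        "LIC Code": ["A", "B", "C"],  # placeholder codes, not real license codes
        "Start_Year": [2018, 2019, 2021],
        "Hoods": ["Chinatown", None, "Bernal Heights"],
    }
)

# Count the non-null entries in each column used by the analysis.
cols = ["Location Start Date", "LIC Code", "Start_Year", "Hoods"]
complete_counts = demo_industry[cols].notna().sum()
print(complete_counts)
```

Run against `sf_biz_industry`, the same two lines would give the per-column totals without scanning the full `info()` output.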

In [12]:
# Storing dataframes as CSV
sf_biz_18_22.to_csv('sf_biz_18_22.csv')
sf_biz_industry.to_csv('sf_biz_industry.csv')
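
By default `to_csv` also writes the dataframe index as an unnamed first column, which appears as `Unnamed: 0` when the file is read back with `pd.read_csv`. If that extra column is not wanted, one option is to pass `index=False` when saving. The sketch below uses an in-memory buffer and a made-up two-row frame to show the difference.

```python
import io

import pandas as pd

# Hypothetical frame; the business names are placeholders.
demo = pd.DataFrame(
    {"DBA Name": ["Example A", "Example B"], "Start_Year": [2018, 2020]}
)

# Default behavior: the index is written as an unnamed first column.
with_index = io.StringIO()
demo.to_csv(with_index)
with_index.seek(0)
cols_default = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written.
without_index = io.StringIO()
demo.to_csv(without_index, index=False)
without_index.seek(0)
cols_clean = pd.read_csv(without_index).columns.tolist()

print(cols_default)
print(cols_clean)
```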