# Scratch pad to explore queries

This notebook demonstrates some of the queries you can run against the dataframes. I'm going to stick to the non-spatial ones for starters.<br/>

What you should be thinking about is how queries like these can pick out subsets of the data that can then be plotted, mapped, and so on.

In [1]:
import pandas as pd
import geopandas as gpd

In [2]:
ecb_gdf = gpd.read_file("../data/ecb.shp")

In [3]:
ecb_gdf.columns

Index(['BUSINESS A', 'DBA NAME', 'OWNERSHIP', 'ADDRESS', 'CITY', 'ZIP',
       'STATE', 'BUSINESS P', 'OWNER NAME', 'CREATION D', 'START DT', 'EXP DT',
       'NAICS', 'ACTIVITY D', 'today', 'years', 'naics_code', 'NAICS Desc',
       'sector', 'sector_des', 'zip_code', 'geometry'],
      dtype='object')

In [4]:
biz_df = pd.read_csv('../data/transformed.csv', sep='\t').reset_index()
#biz_df['sector'] = biz_df['sector'].astype(str)

In [5]:
biz_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53675 entries, 0 to 53674
Data columns (total 23 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   index              53675 non-null  int64 
 1   Unnamed: 0         53675 non-null  int64 
 2   BUSINESS ACCT#     53675 non-null  int64 
 3   DBA NAME           53675 non-null  object
 4   OWNERSHIP TYPE     53675 non-null  object
 5   ADDRESS            53675 non-null  object
 6   CITY               53647 non-null  object
 7   ZIP                53675 non-null  object
 8   STATE              53675 non-null  object
 9   BUSINESS PHONE     48269 non-null  object
 10  OWNER NAME         53675 non-null  object
 11  CREATION DT        53675 non-null  object
 12  START DT           53675 non-null  object
 13  EXP DT             53675 non-null  object
 14  NAICS              53675 non-null  int64 
 15  ACTIVITY DESC      53675 non-null  object
 16  today              53675 non-null  objec

## Note to self

I think of NAICS codes not as integers but as character strings (of digits).<br/>
When I created transformed.csv during wrangling I did that conversion, but writing the file and reading it back turned the column into int64 again.<br/>
That is a consequence of the CSV format rather than a pandas bug: a CSV file stores no type information, so `read_csv` infers a dtype for each column, and digit-only columns come back as integers.<br/>
It is unrelated to the SettingWithCopy warning. Passing `dtype` to `read_csv` would avoid the problem at load time.

In the meantime, convert the column after loading.

In [6]:
biz_df['sector'] = biz_df['sector'].astype(str)
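
Another option is to tell `read_csv` which columns are strings when loading, so no conversion is needed afterwards. Here is a minimal sketch on a tiny in-memory, tab-separated stand-in for transformed.csv. The rows are made up, and which columns to pin (`sector`, `zip_code`) is my assumption.

```python
import io
import pandas as pd

# Tiny tab-separated stand-in for transformed.csv (hypothetical rows).
raw = "DBA NAME\tsector\tzip_code\nACME LLC\t54\t92110\nBETA INC\t44\t92107\n"

# Default read: digit-only columns are inferred as integers.
inferred = pd.read_csv(io.StringIO(raw), sep="\t")

# Explicit dtypes: keep the code-like columns as strings.
typed = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    dtype={"sector": str, "zip_code": str},
)

print(inferred["sector"].dtype)   # an integer dtype
print(typed["sector"].tolist())   # a list of strings
```

With the dtypes pinned, string comparisons such as `sector == '54'` work right after loading.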

## Looking at some queries

These are meant to be examples.  Pandas provides several techniques to filter rows.  I like using query.<br/>

I am going to look at examples from my neighborhood (the peninsula).

In [7]:
# 92110 is the northern part of the peninsula, but it also covers Bay Park and Midway.
# That skews things a bit, but ...
zips = ['92106', '92107', '92110']

In [8]:
len(biz_df.query("zip_code in @zips"))

3971
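
For comparison, the same kind of filter can be written as a boolean mask with `isin`. The sketch below uses a made-up toy dataframe (not the real business data) to show that the two forms select the same rows.

```python
import pandas as pd

# Toy data standing in for biz_df (hypothetical values).
toy = pd.DataFrame({
    "zip_code": ["92106", "92107", "92110", "92101"],
    "years": [3, 12, 7, 20],
})
zips = ["92106", "92107", "92110"]

# query(): the expression is a string, @zips refers to the Python variable.
by_query = toy.query("zip_code in @zips and years >= 5")

# Boolean mask: same condition built from column comparisons.
by_mask = toy[toy["zip_code"].isin(zips) & (toy["years"] >= 5)]

print(by_query.equals(by_mask))
```

Which one to use is mostly taste. I find the query string easier to read once there are several conditions.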

In [9]:
biz_df.query("zip_code in @zips and years >= 5")

Unnamed: 0.1,index,Unnamed: 0,BUSINESS ACCT#,DBA NAME,OWNERSHIP TYPE,ADDRESS,CITY,ZIP,STATE,BUSINESS PHONE,...,EXP DT,NAICS,ACTIVITY DESC,today,years,naics_code,NAICS Description,sector,sector_desc,zip_code
28,28,29,1993003503,1127 OPAL INC,SCORP,3518 BARNETT AVE,SAN DIEGO,92110-3208,CA,(619) 985-0665,...,4/30/2021,421,"WHOLESALE TRADE, DURABLE GOODS",12/22/2020,28,421,"WHOLESALE TRADE, DURABLE GOODS",42,Wholesale Trade(42),92110
33,33,34,2007015912,1344 HOLLY AVENUE LLC,LLC,4817 SANTA MONICA AVE SUITE A,SAN DIEGO,92107-2850,CA,,...,6/30/2021,541,"PROFESSIONAL, SCIENTIFIC & TECHNICAL SERVICES",12/22/2020,13,541,"Professional, Scientific, and Technical Servic...",54,"Professional, Scientific, and Technical Servic...",92107
36,36,37,2007016590,1430 UNION STREET LLC,LLC,575 ALBION ST,SAN DIEGO,92106-3209,CA,,...,6/30/2021,541,"PROFESSIONAL, SCIENTIFIC & TECHNICAL SERVICES",12/22/2020,13,541,"Professional, Scientific, and Technical Servic...",54,"Professional, Scientific, and Technical Servic...",92106
88,88,89,2007000179,220 GROUP LLC,LLC,3405 KENYON ST SUITE 301,SAN DIEGO,92110-5007,CA,(619) 758-9696,...,7/31/2021,541512,COMPUTER SYSTEMS DESIGN SERVICES,12/22/2020,14,5415,Computer Systems Design and Related ServicesT,54,"Professional, Scientific, and Technical Servic...",92110
89,89,90,1997014273,2250 FOURTH AVENUE PARTNERSHIP,PARTNR,2251 SAN DIEGO AVE SUITE A120,SAN DIEGO,92110-2969,CA,(619) 685-4249,...,12/31/2020,5311,LESSORS OF REAL ESTATE,12/22/2020,23,5311,Lessors of Real EstateT,53,Real Estate and Rental and Leasing(53),92110
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53568,53568,53569,1987013586,ZEUGSCHMIDT MOTOR CARS,SOLE,3256 ROSECRANS ST,SAN DIEGO,92110-4837,CA,(619) 222-4342,...,10/31/2021,44112,USED CAR DEALERS,12/22/2020,33,4411,Automobile Dealers,44,Retail Trade(44),92110
53594,53594,53595,1989008612,ZIGMAN/SHIELDS GENERAL CONTRACTORS INC,CORP,3276 ROSECRANS ST SUITE 300,SAN DIEGO,92110-4838,CA,(619) 294-8754,...,7/31/2021,233,"BUILDING, DEVELOPING & GENERAL CONTRACTING",12/22/2020,31,233,"BUILDING, DEVELOPING & GENERAL CONTRACTING",23,Construction(23),92110
53604,53604,53605,1985008562,ZINO'S INTERNATIONAL,SOLE,2168 CHATSWORTH BLVD,SAN DIEGO,92107-2423,CA,(619) 574-7895,...,10/31/2021,812112,BEAUTY SALONS,12/22/2020,36,8121,Personal Care Services,81,Other Services (except Public Administration)(81),92107
53641,53641,53642,2004007859,ZOOK FAMILY INVESTMENTS LP,LP,4533 ADAIR ST,SAN DIEGO,92107-3803,CA,(619) 226-7610,...,5/31/2021,53111,LESSORS OF RESIDENTIAL BUILDINGS & DWELLINGS,12/22/2020,17,5311,Lessors of Real EstateT,53,Real Estate and Rental and Leasing(53),92107


In [10]:
print(f"{len(_)/__:.2%}")

67.74%


### Interesting - 68% of the businesses in these zip codes have been around 5 years or more.
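
The percentage cell relies on IPython's output history (`_` is the previous output, `__` the one before it), which only works if the cells are run in exactly that order. A sketch of the same calculation with named intermediate results, on made-up toy data:

```python
import pandas as pd

# Toy data standing in for biz_df (hypothetical values).
toy = pd.DataFrame({
    "zip_code": ["92106", "92107", "92110", "92110", "92101"],
    "years": [3, 12, 7, 1, 20],
})
zips = ["92106", "92107", "92110"]

# Keep each step in a named variable instead of relying on _ and __.
local = toy.query("zip_code in @zips")
established = local.query("years >= 5")

share = len(established) / len(local)
print(f"{share:.2%}")
```

Named variables also make it obvious what the denominator is, which the `_`/`__` version hides.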

### A few more hacky examples.

In [11]:
biz_df.query("zip_code == '92110' and sector == '54'")

Unnamed: 0.1,index,Unnamed: 0,BUSINESS ACCT#,DBA NAME,OWNERSHIP TYPE,ADDRESS,CITY,ZIP,STATE,BUSINESS PHONE,...,EXP DT,NAICS,ACTIVITY DESC,today,years,naics_code,NAICS Description,sector,sector_desc,zip_code
88,88,89,2007000179,220 GROUP LLC,LLC,3405 KENYON ST SUITE 301,SAN DIEGO,92110-5007,CA,(619) 758-9696,...,7/31/2021,541512,COMPUTER SYSTEMS DESIGN SERVICES,12/22/2020,14,5415,Computer Systems Design and Related ServicesT,54,"Professional, Scientific, and Technical Servic...",92110
147,147,148,2007007100,3515 LTD,LP,4895 SAVANNAH ST,SAN DIEGO,92110-3824,CA,(619) 276-2532,...,5/31/2021,541,"PROFESSIONAL, SCIENTIFIC & TECHNICAL SERVICES",12/22/2020,14,541,"Professional, Scientific, and Technical Servic...",54,"Professional, Scientific, and Technical Servic...",92110
184,184,185,2020015969,4 STAR MGMT CONSULTING INC,CORP,2019 CHICAGO ST,SAN DIEGO,92110-3421,CA,(708) 209-6456,...,5/31/2021,541615,CONSULTING SERVICES,12/22/2020,0,5416,"Management, Scientific, and Technical Consulti...",54,"Professional, Scientific, and Technical Servic...",92110
217,217,218,2007018866,4891 LTD,CORP,4895 SAVANNAH ST,SAN DIEGO,92110-3824,CA,,...,6/30/2021,541,"PROFESSIONAL, SCIENTIFIC & TECHNICAL SERVICES",12/22/2020,13,541,"Professional, Scientific, and Technical Servic...",54,"Professional, Scientific, and Technical Servic...",92110
360,360,361,2009007812,718 VENTURA PLACE LLC,LLC,5145 MORENA PL,SAN DIEGO,92110-3921,CA,,...,3/31/2021,541,"PROFESSIONAL, SCIENTIFIC & TECHNICAL SERVICES",12/22/2020,12,541,"Professional, Scientific, and Technical Servic...",54,"Professional, Scientific, and Technical Servic...",92110
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
52897,52897,52898,2019004283,WOOD RODGERS INC,SCORP,1775 HANCOCK ST SUITE 160,SAN DIEGO,92110-2039,CA,(619) 819-9240,...,12/31/2020,54133,ENGINEERING SERVICES,12/22/2020,2,5413,"Architectural, Engineering, and Related Servic...",54,"Professional, Scientific, and Technical Servic...",92110
53035,53035,53036,2004009789,WSA LUMBER INC,CORP,5328 METRO ST,SAN DIEGO,92110-2608,CA,,...,3/31/2021,54,"PROFESSIONAL, SCIENTIFIC & TECHNICAL SERVICES",12/22/2020,16,54,"Professional, Scientific, and Technical ServicesT",54,"Professional, Scientific, and Technical Servic...",92110
53475,53475,53476,2019009791,ZACHARY BARRON PHOTOGRAPHY,SCORP,5343 BANKS ST,SAN DIEGO,92110-4008,CA,(619) 543-9959,...,4/30/2021,541922,COMMERCIAL PHOTOGRAPHY,12/22/2020,2,5419,"Other Professional, Scientific, and Technical ...",54,"Professional, Scientific, and Technical Servic...",92110
53496,53496,53497,2002003117,ZAMORA FINANCIAL CONSULTING INC,CORP,3990 OLD TOWN AVE SUITE 109A,SAN DIEGO,92110-2974,CA,(619) 501-6778,...,2/28/2021,54161,MANAGEMENT CONSULTING SERVICES,12/22/2020,19,5416,"Management, Scientific, and Technical Consulti...",54,"Professional, Scientific, and Technical Servic...",92110


In [12]:
biz_df.query("zip_code == '92110' and years >= 5 and sector == '54'")

Unnamed: 0.1,index,Unnamed: 0,BUSINESS ACCT#,DBA NAME,OWNERSHIP TYPE,ADDRESS,CITY,ZIP,STATE,BUSINESS PHONE,...,EXP DT,NAICS,ACTIVITY DESC,today,years,naics_code,NAICS Description,sector,sector_desc,zip_code
88,88,89,2007000179,220 GROUP LLC,LLC,3405 KENYON ST SUITE 301,SAN DIEGO,92110-5007,CA,(619) 758-9696,...,7/31/2021,541512,COMPUTER SYSTEMS DESIGN SERVICES,12/22/2020,14,5415,Computer Systems Design and Related ServicesT,54,"Professional, Scientific, and Technical Servic...",92110
147,147,148,2007007100,3515 LTD,LP,4895 SAVANNAH ST,SAN DIEGO,92110-3824,CA,(619) 276-2532,...,5/31/2021,541,"PROFESSIONAL, SCIENTIFIC & TECHNICAL SERVICES",12/22/2020,14,541,"Professional, Scientific, and Technical Servic...",54,"Professional, Scientific, and Technical Servic...",92110
217,217,218,2007018866,4891 LTD,CORP,4895 SAVANNAH ST,SAN DIEGO,92110-3824,CA,,...,6/30/2021,541,"PROFESSIONAL, SCIENTIFIC & TECHNICAL SERVICES",12/22/2020,13,541,"Professional, Scientific, and Technical Servic...",54,"Professional, Scientific, and Technical Servic...",92110
360,360,361,2009007812,718 VENTURA PLACE LLC,LLC,5145 MORENA PL,SAN DIEGO,92110-3921,CA,,...,3/31/2021,541,"PROFESSIONAL, SCIENTIFIC & TECHNICAL SERVICES",12/22/2020,12,541,"Professional, Scientific, and Technical Servic...",54,"Professional, Scientific, and Technical Servic...",92110
768,768,769,1998011190,AAA COLEMAN MOVING SYSTEMS,CORP,3045 ROSECRANS ST SUITE 202,SAN DIEGO,92110-4818,CA,,...,10/31/2021,54199,"ALL OTH PROF, SCIENTIFIC & TECHNICAL SERVICES",12/22/2020,22,5419,"Other Professional, Scientific, and Technical ...",54,"Professional, Scientific, and Technical Servic...",92110
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
52683,52683,52684,2004015488,WILLIAM P CHUTE,SOLE,3930 CALIFORNIA ST,SAN DIEGO,92110-2117,CA,(619) 298-9614,...,12/31/2020,54161,MANAGEMENT CONSULTING SERVICES,12/22/2020,16,5416,"Management, Scientific, and Technical Consulti...",54,"Professional, Scientific, and Technical Servic...",92110
52862,52862,52863,1985004299,WOLF DESIGN BUILD INC,CORP,1459 LIETA ST,SAN DIEGO,92110-3632,CA,(619) 275-0074,...,2/28/2021,54133,ENGINEERING SERVICES,12/22/2020,36,5413,"Architectural, Engineering, and Related Servic...",54,"Professional, Scientific, and Technical Servic...",92110
52882,52882,52883,2015044333,WONDERIST AGENCY,PARTNR,3015 SAINT CHARLES ST SUITE B,SAN DIEGO,92110-4857,CA,(262) 844-1628,...,9/30/2021,5418,ADVERTISING & RELATED SERVICES,12/22/2020,5,5418,"Advertising, Public Relations, and Related Ser...",54,"Professional, Scientific, and Technical Servic...",92110
53035,53035,53036,2004009789,WSA LUMBER INC,CORP,5328 METRO ST,SAN DIEGO,92110-2608,CA,,...,3/31/2021,54,"PROFESSIONAL, SCIENTIFIC & TECHNICAL SERVICES",12/22/2020,16,54,"Professional, Scientific, and Technical ServicesT",54,"Professional, Scientific, and Technical Servic...",92110


In [13]:
print(f"{len(_)/len(__):.2%} consulting business longer than 5 years")

78.72% consulting business longer than 5 years


### Once again not bad!

In [14]:
biz_df.query("sector == '54'")['zip_code'].value_counts()

92101        1051
92121         620
92130         568
92108         561
92037         504
             ... 
90016           1
921302666       1
90630           1
77406           1
45231           1
Name: zip_code, Length: 373, dtype: int64

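Yo -- do you see the broken zip code? `921302666` looks like a ZIP+4 (92130-2666) that lost its hyphen and was never trimmed to 5 digits.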
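
One way to guard against entries like the nine-digit one is to keep only the leading five digits of every zip string. This is a sketch on made-up values, assuming the odd entry is a ZIP+4 without its hyphen. It is not what the wrangling step actually did.

```python
import pandas as pd

# Hypothetical mix of clean, ZIP+4, and hyphen-less ZIP+4 values.
raw_zips = pd.Series(["92110", "921302666", "92107-2850", "45231"])

# Keep the first five digits; values that do not start with
# five digits would become NaN rather than pass through silently.
zip5 = raw_zips.str.extract(r"^(\d{5})", expand=False)

print(zip5.tolist())
```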

## Businesses open exactly 25 years on the peninsula

In [15]:
biz_df.query("years == 25 and zip_code in @zips")['sector_desc'].value_counts()

Professional, Scientific, and Technical Services(54)                            23
Other Services (except Public Administration)(81)                               16
Retail Trade(45)                                                                 6
Construction(23)                                                                 5
Administrative and Support and Waste Management and Remediation Services(56)     3
Retail Trade(44)                                                                 3
Health Care and Social Assistance(62)                                            2
Arts, Entertainment, and Recreation(71)                                          2
Educational Services(61)                                                         2
Real Estate and Rental and Leasing(53)                                           2
Transportation and Warehousing(48)                                               1
Manufacturing(33)                                                                1
Fina

In [16]:
sum(_)

70

In [17]:
biz_df.query("years <= 5 and zip_code in @zips")['sector_desc'].value_counts()

Other Services (except Public Administration)(81)                               267
Professional, Scientific, and Technical Services(54)                            235
Retail Trade(45)                                                                166
Accommodation and Food Services(72)                                             140
Administrative and Support and Waste Management and Remediation Services(56)     89
Real Estate and Rental and Leasing(53)                                           87
Health Care and Social Assistance(62)                                            79
Educational Services(61)                                                         76
Arts, Entertainment, and Recreation(71)                                          75
Retail Trade(44)                                                                 73
Construction(23)                                                                 45
Transportation and Warehousing(48)                                          

In [18]:
sum(_)

1473

### So 1473 businesses are 5 years old or younger.
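
The total can also be taken straight from the `value_counts()` result, and `normalize=True` turns the counts into shares, which makes sector mixes easier to compare across age groups. A small sketch on made-up rows:

```python
import pandas as pd

# Toy data standing in for biz_df (hypothetical values).
toy = pd.DataFrame({
    "sector_desc": [
        "Retail Trade(44)", "Construction(23)", "Retail Trade(44)",
        "Construction(23)", "Retail Trade(44)",
    ],
    "years": [1, 4, 5, 9, 30],
})

young = toy.query("years <= 5")

# Counts per sector, and their total without using the _ shortcut.
counts = young["sector_desc"].value_counts()
total = counts.sum()

# Same breakdown as fractions of the subset.
shares = young["sector_desc"].value_counts(normalize=True)

print(total)
print(shares)
```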

## Going to stop here.  You should get the gist of how you can interact with the data.