# Downloading data

- The data for this project is the [Street Tree List](https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq) from the San Francisco Department of Public Works. I downloaded it on Nov. 6, 2022; at the time, the data set had last been updated that same day.
- I created a copy of the data set and named it `original_street_tree_list.csv`. I then put the data set in the `raw-data` folder.
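
Keeping the untouched copy in `raw-data` means every later step can start from the same file. A minimal sketch of loading it, assuming the notebook runs from the project root next to `raw-data/`; the helper name `load_raw_trees` is hypothetical, and the notebook's own read cell uses a different file name:

```python
from pathlib import Path

import pandas as pd

# Assumed layout: the notebook runs from the project root, next to `raw-data/`.
RAW_DIR = Path("raw-data")
RAW_FILE = "original_street_tree_list.csv"


def load_raw_trees(raw_dir=RAW_DIR, filename=RAW_FILE):
    """Read the untouched copy of the download; fail early if it is missing."""
    path = Path(raw_dir) / filename
    if not path.exists():
        raise FileNotFoundError(f"Expected the raw data at {path}")
    return pd.read_csv(path)


# Usage, once the file is in place:
# sf_trees_original = load_raw_trees()
```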


# Explore the data

In [1]:
# import packages
import pandas as pd
import altair as alt

In [3]:
# read csv file
sf_trees_original = pd.read_csv('street_tree_list.csv')
sf_trees_original.head()

Unnamed: 0,TreeID,qLegalStatus,qSpecies,qAddress,SiteOrder,qSiteInfo,PlantType,qCaretaker,qCareAssistant,PlantDate,...,XCoord,YCoord,Latitude,Longitude,Location,Fire Prevention Districts,Police Districts,Supervisor Districts,Zip Codes,Neighborhoods (old)
0,217365,Section 806 (d),Ceanothus 'Ray Hartman' :: California Lilac 'R...,707 Rockdale Dr,1.0,Sidewalk: Property side : Yard,Tree,Private,,10/14/2021 12:00:00 AM,...,5997488.0,2098235.0,37.741209,-122.451285,"(37.74120925101712, -122.45128526411095)",9.0,7.0,4.0,59.0,40.0
1,92771,DPW Maintained,Tristaniopsis laurina :: Swamp Myrtle,11X Blanken Ave,4.0,Sidewalk: Curb side : Cutout,Tree,Private,,10/14/2021 12:00:00 AM,...,6011718.0,2087394.0,37.712247,-122.40132,"(37.712246915438215, -122.40132023435935)",10.0,3.0,8.0,309.0,1.0
2,23904,DPW Maintained,Prunus subhirtella 'Pendula' :: Weeping Cherry,1600X Webster St,6.0,Median : Cutout,Tree,DPW,,,...,6003596.0,2114195.0,37.78538,-122.431304,"(37.78537959802679, -122.43130418097743)",13.0,9.0,11.0,29490.0,13.0
3,28646,DPW Maintained,Prunus subhirtella 'Pendula' :: Weeping Cherry,1600X Webster St,7.0,Median : Cutout,Tree,DPW,,,...,6003558.0,2114375.0,37.785872,-122.431449,"(37.78587163716589, -122.43144931782685)",13.0,9.0,11.0,29490.0,13.0
4,229807,DPW Maintained,Jacaranda mimosifolia :: Jacaranda,2560 Bryant St,1.0,Sidewalk: Curb side : Cutout,Tree,Private,,,...,6009700.0,2102427.0,37.753411,-122.409355,"(37.75341142310638, -122.40935530851043)",2.0,4.0,7.0,28859.0,19.0


In [4]:
sf_trees_original.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 196590 entries, 0 to 196589
Data columns (total 23 columns):
 #   Column                     Non-Null Count   Dtype  
---  ------                     --------------   -----  
 0   TreeID                     196590 non-null  int64  
 1   qLegalStatus               196533 non-null  object 
 2   qSpecies                   196590 non-null  object 
 3   qAddress                   195097 non-null  object 
 4   SiteOrder                  194796 non-null  float64
 5   qSiteInfo                  196590 non-null  object 
 6   PlantType                  196590 non-null  object 
 7   qCaretaker                 196590 non-null  object 
 8   qCareAssistant             24707 non-null   object 
 9   PlantDate                  70878 non-null   object 
 10  DBH                        153021 non-null  float64
 11  PlotSize                   146229 non-null  object 
 12  PermitNotes                53367 non-null   object 
 13  XCoord                     19

In [5]:
# Copy the original dataframe

sf_trees = sf_trees_original.copy()

## Check duplicates

In [6]:
# check if TreeID is unique
sf_trees['TreeID'].nunique()

196590

In [7]:
len(sf_trees)

196590

There appear to be no duplicated `TreeID` values: the number of unique IDs equals the number of rows. The data set lists 196,590 trees in SF as of Nov. 6, 2022.

In [8]:
# make sure the length of dataframe matches the number of unique IDs

assert len(sf_trees) == sf_trees['TreeID'].nunique()
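
If the assertion ever failed, it would help to see which rows collide. A small sketch on a made-up frame (the IDs and species below are invented for illustration, not taken from the data):

```python
import pandas as pd

# Toy frame standing in for `sf_trees`; the values here are made up.
toy = pd.DataFrame({
    "TreeID": [101, 102, 102, 103],
    "qSpecies": ["A", "B", "B", "C"],
})

# True only when every ID appears exactly once.
ids_unique = toy["TreeID"].is_unique

# keep=False marks every member of a duplicated group, not just the repeats.
dupes = toy[toy["TreeID"].duplicated(keep=False)]

print(ids_unique)
print(dupes)
```

On the real data, the same two expressions can be applied to `sf_trees['TreeID']`.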

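- There are 196,590 `TreeID` values, but only 70,878 rows (about 36%) have a plant date.
- `PlantDate` is stored as `object` (strings); it should be a datetime Dtype.
- The `Zip Codes` values look odd: some are only 2 or 3 digits long (for example, 59.0 and 309.0 in the first rows), so they are not ordinary 5-digit ZIP codes.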
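
These observations can be checked with a few lines. The sketch below uses a toy frame whose values are copied from the first rows shown earlier; the variable names are placeholders, and the same expressions should work on `sf_trees`:

```python
import pandas as pd

# Toy frame with values taken from the first rows of the data.
toy = pd.DataFrame({
    "PlantDate": ["10/14/2021 12:00:00 AM", None, None],
    "Zip Codes": [59.0, 309.0, 29490.0],
})

# Share of rows that have a plant date at all.
share_with_date = toy["PlantDate"].notna().mean()

# Number of digits in each `Zip Codes` value (missing values dropped first).
digits = toy["Zip Codes"].dropna().astype(int).astype(str).str.len()
digit_counts = digits.value_counts().sort_index()

print(round(share_with_date, 2))
print(digit_counts)
```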

## Convert data type

In [9]:
# convert the `PlantDate` Column

sf_trees['PlantDate'] = pd.to_datetime(sf_trees['PlantDate'])
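
Letting pandas infer the format works here, but spelling it out can make the conversion faster and more predictable. A hedged sketch: the format string is inferred from the sample values in the output above (for example `10/14/2021 12:00:00 AM`), and `errors="coerce"` is an optional choice that turns unparseable strings into `NaT` instead of raising:

```python
import pandas as pd

# Sample values: one real-looking date, one missing, one deliberately bad.
raw = pd.Series(["10/14/2021 12:00:00 AM", None, "not a date"])

# Format assumed from the sample rows: month/day/year, 12-hour clock with AM/PM.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %I:%M:%S %p", errors="coerce")

print(parsed)
```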


In [10]:
sf_trees.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 196590 entries, 0 to 196589
Data columns (total 23 columns):
 #   Column                     Non-Null Count   Dtype         
---  ------                     --------------   -----         
 0   TreeID                     196590 non-null  int64         
 1   qLegalStatus               196533 non-null  object        
 2   qSpecies                   196590 non-null  object        
 3   qAddress                   195097 non-null  object        
 4   SiteOrder                  194796 non-null  float64       
 5   qSiteInfo                  196590 non-null  object        
 6   PlantType                  196590 non-null  object        
 7   qCaretaker                 196590 non-null  object        
 8   qCareAssistant             24707 non-null   object        
 9   PlantDate                  70878 non-null   datetime64[ns]
 10  DBH                        153021 non-null  float64       
 11  PlotSize                   146229 non-null  object  

In [11]:
sf_trees.head()

Unnamed: 0,TreeID,qLegalStatus,qSpecies,qAddress,SiteOrder,qSiteInfo,PlantType,qCaretaker,qCareAssistant,PlantDate,...,XCoord,YCoord,Latitude,Longitude,Location,Fire Prevention Districts,Police Districts,Supervisor Districts,Zip Codes,Neighborhoods (old)
0,217365,Section 806 (d),Ceanothus 'Ray Hartman' :: California Lilac 'R...,707 Rockdale Dr,1.0,Sidewalk: Property side : Yard,Tree,Private,,2021-10-14,...,5997488.0,2098235.0,37.741209,-122.451285,"(37.74120925101712, -122.45128526411095)",9.0,7.0,4.0,59.0,40.0
1,92771,DPW Maintained,Tristaniopsis laurina :: Swamp Myrtle,11X Blanken Ave,4.0,Sidewalk: Curb side : Cutout,Tree,Private,,2021-10-14,...,6011718.0,2087394.0,37.712247,-122.40132,"(37.712246915438215, -122.40132023435935)",10.0,3.0,8.0,309.0,1.0
2,23904,DPW Maintained,Prunus subhirtella 'Pendula' :: Weeping Cherry,1600X Webster St,6.0,Median : Cutout,Tree,DPW,,NaT,...,6003596.0,2114195.0,37.78538,-122.431304,"(37.78537959802679, -122.43130418097743)",13.0,9.0,11.0,29490.0,13.0
3,28646,DPW Maintained,Prunus subhirtella 'Pendula' :: Weeping Cherry,1600X Webster St,7.0,Median : Cutout,Tree,DPW,,NaT,...,6003558.0,2114375.0,37.785872,-122.431449,"(37.78587163716589, -122.43144931782685)",13.0,9.0,11.0,29490.0,13.0
4,229807,DPW Maintained,Jacaranda mimosifolia :: Jacaranda,2560 Bryant St,1.0,Sidewalk: Curb side : Cutout,Tree,Private,,NaT,...,6009700.0,2102427.0,37.753411,-122.409355,"(37.75341142310638, -122.40935530851043)",2.0,4.0,7.0,28859.0,19.0


In [18]:
sf_trees['PlantDate'].max()

Timestamp('2022-11-12 00:00:00')

In [19]:
sf_trees['PlantDate'].min()

Timestamp('1955-09-19 00:00:00')

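So the earliest plant date is Sept. 19, 1955, which makes that tree 67 years old as of the download date. The latest plant date, Nov. 12, 2022, falls after the Nov. 6 download date, so some entries may be scheduled plantings or data-entry errors.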
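
The age arithmetic and the check for dates later than the download can be written out as follows. This is a sketch: the download date comes from the notes at the top, and the short `dates` series is a stand-in for `sf_trees['PlantDate']`:

```python
import pandas as pd

download_date = pd.Timestamp("2022-11-06")
earliest = pd.Timestamp("1955-09-19")

# Whole years elapsed: subtract one if the anniversary has not yet passed.
age_years = download_date.year - earliest.year - (
    (download_date.month, download_date.day) < (earliest.month, earliest.day)
)

# Stand-in for the PlantDate column, including a missing value.
dates = pd.Series(pd.to_datetime(["1955-09-19", "2021-10-14", "2022-11-12", None]))

# Comparisons with NaT are False, so missing dates are not flagged.
after_download = dates[dates > download_date]

print(age_years)
print(after_download)
```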

In [18]:
sf_trees.columns

Index(['TreeID', 'qLegalStatus', 'qSpecies', 'qAddress', 'SiteOrder',
       'qSiteInfo', 'PlantType', 'qCaretaker', 'qCareAssistant', 'PlantDate',
       'DBH', 'PlotSize', 'PermitNotes', 'XCoord', 'YCoord', 'Latitude',
       'Longitude', 'Location', 'Fire Prevention Districts',
       'Police Districts', 'Supervisor Districts', 'Zip Codes',
       'Neighborhoods (old)'],
      dtype='object')

In [12]:
# look at the unique species of trees

sf_trees['qSpecies'].unique()

array(["Ceanothus 'Ray Hartman' :: California Lilac 'Ray Hartman'",
       'Tristaniopsis laurina :: Swamp Myrtle',
       "Prunus subhirtella 'Pendula' :: Weeping Cherry",
       'Jacaranda mimosifolia :: Jacaranda',
       'Afrocarpus gracilior :: Fern Pine',
       'Magnolia grandiflora :: Southern Magnolia',
       'Photinia fraseri :: Photinia: Chinese photinia',
       'Laurus nobilis :: Sweet Bay: Grecian Laurel',
       'Platanus x hispanica :: Sycamore: London Plane',
       'Eucalyptus sideroxylon :: Red Ironbark',
       'Sequoia sempervirens :: Coast Redwood',
       'Pinus radiata :: Monterey Pine',
       "Arbutus 'Marina' :: Hybrid Strawberry Tree",
       'Pyrus calleryana :: Ornamental Pear',
       'Callistemon viminalis :: Weeping Bottlebrush',
       'Lophostemon confertus :: Brisbane Box',
       'Geijera parviflora :: Australian Willow',
       'Acacia melanoxylon :: Blackwood Acacia',
       'Ginkgo biloba :: Maidenhair Tree',
       "Platanus x hispanica 'Columb

In [13]:
# look at the caretaker of trees

sf_trees['qCaretaker'].unique()

array(['Private', 'DPW', 'Mission Verde', 'DPW for City Agency',
       'Fire Dept', 'Rec/Park', 'Mayor Office of Housing', 'Port',
       'SFUSD', 'CAN', 'Health Dept', 'MTA', 'Public Library', 'PUC',
       'Housing Authority', 'Dept of Real Estate', 'Police Dept',
       'Office of Mayor', 'Purchasing Dept', 'Arts Commission',
       'War Memorial', 'City College', 'Asian Arts Commission',
       'Cleary Bros. Landscape'], dtype=object)

In [17]:
sf_trees_species = sf_trees.groupby(['qSpecies']).count()
sf_trees_species_sorted = sf_trees_species.sort_values(by=['TreeID'], ascending=False)
sf_trees_species_sorted

qSpecies,TreeID,qLegalStatus,qAddress,SiteOrder,qSiteInfo,PlantType,qCaretaker,qCareAssistant,PlantDate,DBH,...,XCoord,YCoord,Latitude,Longitude,Location,Fire Prevention Districts,Police Districts,Supervisor Districts,Zip Codes,Neighborhoods (old)
Tree(s) ::,11806,11799,10727,11586,11806,11806,11806,353,10183,1620,...,10560,10560,10560,10560,10560,10555,10550,10560,10560,10560
Platanus x hispanica :: Sycamore: London Plane,11739,11739,11702,11724,11739,11739,11739,300,1890,10574,...,11680,11680,11680,11680,11680,11673,11679,11679,11680,11679
Lophostemon confertus :: Brisbane Box,8904,8899,8870,8879,8904,8904,8904,1060,4343,7017,...,8810,8810,8810,8810,8810,8801,8800,8806,8810,8806
Metrosideros excelsa :: New Zealand Xmas Tree,8850,8850,8829,8802,8850,8850,8850,1194,2351,7618,...,8760,8760,8760,8760,8760,8756,8760,8760,8760,8760
Tristaniopsis laurina :: Swamp Myrtle,7448,7448,7437,7345,7448,7448,7448,1041,3716,6431,...,7422,7422,7422,7422,7422,7422,7422,7422,7422,7422
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Acer tegmentosum :: Manchurian snakebark maple,1,1,1,1,1,1,1,0,1,1,...,1,1,1,1,1,1,1,1,1,1
Prunus sargentii :: Sargent Cherry,1,1,1,1,1,1,1,0,1,0,...,1,1,1,1,1,1,1,1,1,1
Prunus sargentii 'Columnaris' :: Sargent Cherry Tree 'Columnaris',1,1,1,0,1,1,1,0,1,1,...,1,1,1,1,1,1,1,1,1,1
Prunus persica nectarina :: Flowering Nectarine Tree,1,1,1,1,1,1,1,0,0,1,...,1,1,1,1,1,1,1,1,1,1
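
Each `qSpecies` value pairs a scientific name with a common name, separated by `::`, and the largest group, `Tree(s) ::`, has no common name at all, so it appears to be a placeholder rather than a species. A sketch of splitting the two parts, using sample values from the output above; the column names `scientific_name` and `common_name` are my own:

```python
import pandas as pd

species = pd.Series([
    "Tristaniopsis laurina :: Swamp Myrtle",
    "Platanus x hispanica :: Sycamore: London Plane",
    "Tree(s) ::",
])

# partition splits at the first "::" only, so "Sycamore: London Plane" stays whole.
parts = species.str.partition("::")
names = pd.DataFrame({
    "scientific_name": parts[0].str.strip(),
    "common_name": parts[2].str.strip(),
})

# Rows with an empty common name are candidates for the placeholder group.
has_species = names["common_name"] != ""

print(names)
```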


In [14]:
sf_trees.groupby(['qCaretaker']).count()

qCaretaker,TreeID,qLegalStatus,qSpecies,qAddress,SiteOrder,qSiteInfo,PlantType,qCareAssistant,PlantDate,DBH,...,XCoord,YCoord,Latitude,Longitude,Location,Fire Prevention Districts,Police Districts,Supervisor Districts,Zip Codes,Neighborhoods (old)
Arts Commission,31,31,31,31,30,31,31,0,11,31,...,31,31,31,31,31,31,31,31,31,31
Asian Arts Commission,6,6,6,6,6,6,6,1,0,6,...,6,6,6,6,6,6,6,6,6,6
CAN,9,9,9,9,9,9,9,8,9,9,...,9,9,9,9,9,9,9,9,9,9
City College,11,11,11,11,11,11,11,2,2,11,...,11,11,11,11,11,11,11,11,11,11
Cleary Bros. Landscape,1,1,1,1,1,1,1,0,0,1,...,1,1,1,1,1,1,1,1,1,1
DPW,28254,28212,28254,28251,27988,28254,28254,926,10913,26046,...,27265,27265,27265,27265,27265,27133,27129,27135,27135,27135
DPW for City Agency,213,213,213,213,212,213,213,103,66,109,...,200,200,200,200,200,200,200,200,200,200
Dept of Real Estate,94,94,94,94,94,94,94,2,43,89,...,92,92,92,92,92,92,92,92,92,92
Fire Dept,69,69,69,69,69,69,69,2,44,64,...,69,69,69,69,69,69,69,69,69,69
Health Dept,56,56,56,56,55,56,56,1,15,51,...,56,56,56,56,56,56,56,56,56,56
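
`groupby(...).count()` reports the non-null count of every column per group, which is more than is needed when the question is simply how many trees each caretaker has. A sketch of a shorter alternative with `value_counts`, shown on a made-up frame:

```python
import pandas as pd

# Toy frame; caretaker names are real categories, the rows are invented.
toy = pd.DataFrame({
    "TreeID": [1, 2, 3, 4, 5, 6],
    "qCaretaker": ["Private", "DPW", "Private", "Port", "DPW", "Private"],
})

# One number per caretaker, already sorted from most to fewest trees.
caretaker_counts = toy["qCaretaker"].value_counts()

# Equivalent result from groupby, restricted to the never-null TreeID column.
same = toy.groupby("qCaretaker")["TreeID"].count().sort_values(ascending=False)

print(caretaker_counts)
```

The two agree only because `TreeID` has no missing values; counting a column with gaps, such as `PlantDate`, would give smaller numbers.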
