# Fetch and process Los Angeles street trees
> This notebook fetches and processes the City of Los Angeles' [historical inventory](https://geohub.lacity.org/datasets/lahub::trees-bureau-of-street-services/about) of roughly 510,000 roadside trees. It downloads the data using a Bureau of Street Services [API endpoint](https://services5.arcgis.com/7nsPwEMP38bSkCjy/ArcGIS/rest/services/Trees_Data_Bureau_of_Street_Services/FeatureServer/0) and cleans it in preparation for later analysis and visualization.

---

#### Load Python tools and Jupyter config

In [1]:
import requests
import pandas as pd
import jupyter_black
import geopandas as gpd
from tqdm.notebook import tqdm

In [2]:
jupyter_black.load()
pd.options.display.max_columns = 100
pd.options.display.max_rows = 1000
pd.options.display.max_colwidth = None

In [3]:
today = pd.Timestamp.today().strftime("%Y-%m-%d")

---

## Fetch
> The data can be retrieved from an [open API endpoint](https://services5.arcgis.com/7nsPwEMP38bSkCjy/ArcGIS/rest/services/Trees_Data_Bureau_of_Street_Services/FeatureServer/0), powered by Esri. The feature service returns at most 1,000 records per request, so we first get the total record count and then paginate through the results to collect all the trees.

#### Base URL for the API endpoint

In [4]:
url = "https://services5.arcgis.com/7nsPwEMP38bSkCjy/arcgis/rest/services/Trees_Data_Bureau_of_Street_Services/FeatureServer/0/query"

#### Get count of records by querying for total

In [5]:
count_params = {"where": "1=1", "returnCountOnly": "true", "f": "json"}
response = requests.get(url, params=count_params, timeout=60)
response.raise_for_status()  # fail early on HTTP errors
total_count = response.json().get("count", 0)
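
One caveat with the count query: ArcGIS REST services can report a failure inside the JSON body (as an `error` object) rather than through the HTTP status. In that case `.get("count", 0)` would quietly return 0 and the download loop would fetch nothing. Below is a minimal sketch of a stricter check, assuming that error shape. `extract_count` is a hypothetical helper, not part of the original notebook, and both payloads are made up for illustration.

```python
def extract_count(payload):
    """Return the record count from a returnCountOnly response, or raise."""
    # Assumption: failures arrive as {"error": {"code": ..., "message": ...}}
    if "error" in payload:
        message = payload["error"].get("message", "unknown error")
        raise RuntimeError(f"Count query failed: {message}")
    if "count" not in payload:
        raise KeyError("Response has no 'count' field")
    return int(payload["count"])


# Illustrative payloads, not real responses from the service
ok_payload = {"count": 635558}
bad_payload = {"error": {"code": 400, "message": "Invalid query parameters"}}

total = extract_count(ok_payload)
```

In the notebook this could be called as `total_count = extract_count(response.json())`.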

#### Pagination setup

In [6]:
params = {
    "where": "1=1",
    "outFields": "*",
    "f": "geojson",
    "resultRecordCount": 1000,  # The API limit
}
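
To make the paging arithmetic concrete, here is a small sketch of the offsets that a 1,000-record page size produces, using the record total shown in the progress bar output. It also builds each request's parameters without mutating the shared `params` dict. `page_offsets` and `page_params` are hypothetical helpers. The `orderByFields` value assumes the layer's object ID field is named `OBJECTID`; a stable sort order is generally advisable when paging, so that records are not skipped or repeated between requests.

```python
def page_offsets(total_count, page_size):
    """Return the resultOffset value for each request needed to cover total_count."""
    return list(range(0, total_count, page_size))


def page_params(base_params, offset, order_by="OBJECTID"):
    """Build one request's parameters without modifying base_params."""
    # Assumption: "OBJECTID" is the layer's object ID field name
    return {**base_params, "resultOffset": offset, "orderByFields": order_by}


base = {"where": "1=1", "outFields": "*", "f": "geojson", "resultRecordCount": 1000}
offsets = page_offsets(635558, base["resultRecordCount"])

print(len(offsets), offsets[0], offsets[-1])  # 636 0 635000
first = page_params(base, offsets[0])
```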

#### Loop through the pages, gathering up to 1,000 records per request, and store them in a GeoDataFrame

In [7]:
pbar = tqdm(total=total_count)

tree_data = []

for offset in range(0, total_count, params["resultRecordCount"]):
    params["resultOffset"] = offset
    response = requests.get(url, params=params, timeout=60)
    response.raise_for_status()  # stop on HTTP errors instead of parsing a bad body
    geojson = response.json()

    tree_data.extend(geojson["features"])
    pbar.update(len(geojson["features"]))

pbar.close()

src = gpd.GeoDataFrame.from_features(tree_data)

  0%|          | 0/635558 [00:00<?, ?it/s]
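
The loop above trusts the count query to decide when to stop. A hedged alternative is to keep requesting pages until one comes back short or empty, which also guards against the count changing mid-download. It assumes the service returns full pages until the last one. The sketch takes the page fetcher as an argument so it can be run without network access. `fetch_all` and `fake_fetch` are hypothetical names; in the notebook the fetcher would wrap `requests.get(url, params=...)` and return `response.json()["features"]`.

```python
def fetch_all(fetch_page, page_size=1000):
    """Collect features page by page until a short or empty page is returned."""
    features = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)  # expected to return a list of features
        features.extend(page)
        if len(page) < page_size:  # last page reached
            break
        offset += page_size
    return features


# Stand-in data source with 2,350 fake features, for illustration only
fake_rows = [{"type": "Feature", "properties": {"id": i}} for i in range(2350)]


def fake_fetch(offset, limit):
    return fake_rows[offset : offset + limit]


all_features = fetch_all(fake_fetch)
print(len(all_features))  # 2350
```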

#### Dataframe with just trees, not stumps or tree wells

In [8]:
gdf_src = src.query("Type == 1").copy()

---

## Process

#### Lowercase the column names

In [9]:
gdf_src.columns = gdf_src.columns.str.lower()

#### Split the `tooltip` column into usable data

In [10]:
split_data = (
    gdf_src["tooltip"]
    # Replace literal backslash-n sequences with real newlines (not a regex)
    .str.replace("\\n", "\n", regex=False)
    .str.split("\n", expand=True)
)

In [11]:
tree_id = split_data[0].str.replace("Tree ID: ", "")
location = split_data[1].str.replace("Location: ", "").str.title()
species = split_data[2].str.replace("Species: ", "").str.title()
botanical_name = split_data[3].str.replace("Botanical Name: ", "").str.title()
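
The split above relies on the four fields always appearing in the same order. As a sketch of a label-based alternative, the hypothetical helper below parses one tooltip into a dictionary keyed by label. The sample string is made up to match the labels the notebook strips (`Tree ID`, `Location`, `Species`, `Botanical Name`), and its upper-case values are an assumption about the raw data.

```python
def parse_tooltip(raw):
    """Return a dict of label -> value parsed from one tooltip string."""
    fields = {}
    # Normalize literal backslash-n sequences to real newlines, as the notebook does
    for line in raw.replace("\\n", "\n").split("\n"):
        label, sep, value = line.partition(": ")
        if sep:  # skip lines that lack a "Label: value" shape
            fields[label.strip()] = value.strip()
    return fields


# Hypothetical tooltip; the "\\n" here is a literal backslash followed by "n"
sample = (
    "Tree ID: 940159\\nLocation: 2317 OAKWOOD AV - S2\\n"
    "Species: CHINESE EVERGREEN ELM\\nBotanical Name: ULMUS PARVIFOLIA SEMPERVIREN"
)

parsed = parse_tooltip(sample)
species = parsed["Species"].title()
```

It could be applied per row with `gdf_src["tooltip"].map(parse_tooltip)`, though that would be slower than the vectorized string methods used in the notebook.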

In [12]:
extracted_df = pd.DataFrame(
    {
        "tree_id": tree_id,
        "location": location,
        "species": species,
        "botanical_name": botanical_name,
    }
)

#### Add the extracted values back into our main dataframe, and remove vacant tree spots

In [13]:
gdf = (
    pd.concat(
        [
            gdf_src.drop(
                columns=[
                    "objectid",
                    "tooltip",
                    "nla_url",
                    "type_description",
                    "type",
                    "treeid",
                ]
            ),
            extracted_df,
        ],
        axis=1,
    )
    .set_crs("EPSG:4326")
    .query("species != 'Vacant - Ok To Plant'")
)

#### Define a function to extract longitude and latitude from a column of Point geometries

In [14]:
def extract_lon_lat(points):
    # GeoSeries.x and GeoSeries.y are vectorized, so no row-by-row apply is needed
    return pd.DataFrame({"longitude": points.x, "latitude": points.y})

#### Apply the function to the 'geometry' column and create new 'longitude' and 'latitude' columns

In [15]:
gdf[["longitude", "latitude"]] = extract_lon_lat(gdf["geometry"])

---

## Results

#### How many trees?

In [16]:
len(gdf)

517009

#### How many trees without a species specified? 

In [17]:
len(gdf.query("species == 'Not Specified'"))

157961

#### How many distinct species?

In [18]:
len(gdf.query("species != 'Not Specified'")["species"].str.strip().unique())

545

#### Five random trees

In [19]:
gdf.sample(5)

| | geometry | tree_id | location | species | botanical_name | longitude | latitude |
|---|---|---|---|---|---|---|---|
| 286975 | POINT (-118.45641 33.99122) | 940159 | 2317 Oakwood Av - S2 | Chinese Evergreen Elm | Ulmus Parvifolia Semperviren | -118.456406 | 33.991222 |
| 162931 | POINT (-118.49903 34.06910) | 647949 | 1883 Westridge Rd - F11 | Myoporum | Myoporum Laetum | -118.499033 | 34.069097 |
| 603075 | POINT (-118.25984 34.11131) | 1283241 | 2942 Gracia St - F2 | Carob | Ceratonia Siliqua | -118.259844 | 34.111308 |
| 189439 | POINT (-118.30533 33.83964) | 812845 | 20901 La Salle Av - F3 | Mexican Fan Palm | Washingtonia Robusta | -118.305326 | 33.839642 |
| 135981 | POINT (-118.41219 34.26875) | 621165 | Not Specified | Not Specified | Not Specified | -118.412195 | 34.268752 |


---

## Exports

#### JSON

In [20]:
gdf[
    ["tree_id", "location", "species", "botanical_name", "longitude", "latitude"]
].to_json("../data/processed/la_tree_locations.json", indent=4, orient="records")

#### CSV

In [21]:
gdf[
    ["tree_id", "location", "species", "botanical_name", "longitude", "latitude"]
].to_csv("../data/processed/la_tree_locations.csv", index=False)

#### GeoJSON

In [22]:
gdf.to_file(
    "../data/processed/la_tree_locations.geojson",
    driver="GeoJSON",
)