# LA Street Names
> This notebook scrapes street names and their histories from the [LA Street Names](https://lastreetnames.com/) project by [Mark Tapio Kines](https://cassavafilms.com/about) and stores them in a dataframe. The data is collected solely as a personal web scraping exercise and for local data analysis.

#### Load Python tools and Jupyter config

In [1]:
import requests
import pandas as pd
import jupyter_black
from bs4 import BeautifulSoup
from tqdm.notebook import tqdm

In [2]:
jupyter_black.load()
pd.options.display.max_columns = 100
pd.options.display.max_rows = 1000
pd.options.display.max_colwidth = None

---

## Fetch
> The data is collected in three steps: the [main directory](https://lastreetnames.com/street/) page links to [alphabetical pages](https://lastreetnames.com/alpha/a/), which in turn link to street [detail pages](https://lastreetnames.com/street/aaron-street/). The goal is to collect each street's name, URL, neighborhood, map image URL and history text.

#### Headers for requests

In [3]:
headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
}

#### Function to fetch the initial directory of pages

In [4]:
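def fetch_directory_urls():
    """Return the URL of each alphabetical page linked from the main directory."""
    r = requests.get("https://lastreetnames.com/street/", headers=headers, timeout=30)
    r.raise_for_status()  # fail loudly instead of parsing an error page
    s = BeautifulSoup(r.text, "html.parser")
    urls = []
    # Each alphabetical page is linked from an <li class="cat-item"> element
    for p in s.find_all("li", attrs={"class": "cat-item"}):
        a_tag = p.find("a")
        if a_tag and "href" in a_tag.attrs:
            urls.append(a_tag["href"])
    return urls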

#### Function to extract the street data from each directory page

In [5]:
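def extract_street_data(url):
    """Return (street_name, neighborhood_info, street_url) tuples from one alphabetical page."""
    r = requests.get(url, headers=headers, timeout=30)
    r.raise_for_status()  # fail loudly instead of parsing an error page
    soup = BeautifulSoup(r.text, "html.parser")
    streets = []
    for article in soup.find_all("article", class_="street"):
        title = article.find("h2", class_="entry-title")
        meta = article.find("span", class_="neighborhoods-meta")
        link = article.find("a", href=True)
        if not (title and link):
            continue  # skip entries that lack a title or a link
        street_name = title.get_text(strip=True)
        # The neighborhood label may be absent, so fall back to None
        neighborhood_info = meta.get_text(strip=True) if meta else None
        streets.append((street_name, neighborhood_info, link["href"]))
    return streets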

#### Function to extract detailed information from each street page

In [6]:
def extract_details(url):
    """Return the name, neighborhood, history text and map image URL for one street page."""
    r = requests.get(url, headers=headers, timeout=30)
    soup = BeautifulSoup(r.text, "html.parser")

    def text_or_none(tag):
        # Any element may be missing on a given page, so fall back to None
        return tag.get_text(strip=True) if tag else None

    neighborhood_tag = soup.find("span", class_="terms-neighborhoods")
    map_area = soup.find("div", class_="map-area")
    map_img = map_area.find("img") if map_area else None
    return {
        "street_url": url,
        "street_name": text_or_none(soup.find("h1", class_="entry-title")),
        # Only the first linked neighborhood is kept
        "neighborhood": text_or_none(
            neighborhood_tag.find("a") if neighborhood_tag else None
        ),
        "description": text_or_none(soup.find("div", class_="entry-content")),
        "map_url": map_img["src"] if map_img else None,
    }
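
A side note on the `description` field: `get_text(strip=True)` strips every text node before joining them, so words that sit next to inline links can fuse together (for example `N.La Breaand` in the sample row further down). Below is a minimal sketch of one alternative, assuming the history text lives in `div.entry-content` as in `extract_details`. The helper name `readable_text` and the sample HTML are made up for illustration.

```python
from bs4 import BeautifulSoup


def readable_text(tag):
    """Return a tag's text with its original spacing kept and whitespace runs collapsed."""
    # get_text() without strip=True keeps the spaces around inline elements;
    # split() + join() then collapses newlines and repeated spaces.
    return " ".join(tag.get_text().split())


# Hypothetical snippet that mimics a history paragraph with an inline link
sample_html = (
    '<div class="entry-content"><p>789 N. <a href="#">La Brea</a> '
    "and 789 S. La Brea.</p></div>"
)
content = BeautifulSoup(sample_html, "html.parser").find("div", class_="entry-content")

fused = content.get_text(strip=True)  # text nodes stripped, then joined with ""
spaced = readable_text(content)
print(fused)
print(spaced)
```

Swapping this in would change the stored descriptions, so any saved output would need to be regenerated.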

---

## Execute

#### Run the fetch functions

In [7]:
directory_urls = fetch_directory_urls()

all_streets = []
for directory_url in tqdm(directory_urls):
    all_streets.extend(extract_street_data(directory_url))

details_list = [extract_details(street[2]) for street in tqdm(all_streets)]

  0%|          | 0/27 [00:00<?, ?it/s]

  0%|          | 0/454 [00:00<?, ?it/s]
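
A network error in either loop above raises an exception and ends the cell, so nothing from the run is kept. Below is a minimal sketch of a more forgiving loop, assuming it is acceptable to skip failing pages and retry them later. `collect_details`, `pause` and the stand-in `fake_fetch` are hypothetical names; in this notebook `fetch` would be `extract_details`.

```python
import time


def collect_details(urls, fetch, pause=0.0):
    """Call fetch(url) for each URL and return (results, failures)."""
    results, failures = [], []
    for url in urls:
        try:
            results.append(fetch(url))
        except Exception as exc:  # broad on purpose: one bad page should not stop the run
            failures.append((url, repr(exc)))
        if pause:
            time.sleep(pause)  # optional delay between requests to go easy on the site
    return results, failures


# Stand-in for extract_details so the sketch runs without network access
def fake_fetch(url):
    if "broken" in url:
        raise ValueError("simulated failure")
    return {"street_url": url}


results, failures = collect_details(
    ["https://example.com/a/", "https://example.com/broken/"], fake_fetch
)
print(len(results), len(failures))
```

In the notebook this could be called as `collect_details([street[2] for street in all_streets], extract_details, pause=0.5)`, where the half-second pause is an arbitrary choice.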

---

## Store

#### Convert the list of dictionaries to a DataFrame

In [8]:
df = pd.DataFrame(details_list)

#### How many streets? 

In [9]:
len(df)

454
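
Since `extract_details` falls back to `None` when an element is missing, a quick count of empty fields can show whether any pages were only partly parsed. Below is a minimal sketch that works on a list of dictionaries shaped like `details_list`. `count_missing` and the sample records are illustrative, not output from the site.

```python
from collections import Counter


def count_missing(records):
    """Count, per field, how many records hold None for that field."""
    missing = Counter()
    for record in records:
        for field, value in record.items():
            if value is None:
                missing[field] += 1
    return dict(missing)


# Made-up records that follow some of the keys returned by extract_details
sample_records = [
    {"street_name": "A Street", "neighborhood": None, "map_url": None},
    {"street_name": "B Street", "neighborhood": "Glendale", "map_url": None},
]
missing_counts = count_missing(sample_records)
print(missing_counts)
```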

#### The resulting dataframe: 

In [10]:
df.head(1)

Unnamed: 0,street_url,street_name,neighborhood,description,map_url
0,https://lastreetnames.com/street/0001st-street/,1st Street,Los Angeles (Citywide),"First things first. The numbering system of L.A.’s streets was established by 1846. In 1883, addresses south of 1st Street – technically southwest, since DTLA’s grid is laid out at a 36° angle; more on that in a second – were given the postal designation “South”, while all addresses north of 1st were “North”. That’s why we have, for example, 789 N.La Breaand 789 S. La Brea. Likewise, addresses east ofMain Streetwere designated “East” and so on. Now about that 36° angle, which you can see on a map: this reveals L.A.’s Spanish roots. The Laws of the Indies, set forth in 1573 to develop Spanish colonies worldwide, decreed that streets be laid out at a 45° angle so that all structures may receive equal sunlight throughout the day. The geography of young Los Angeles didn’t quite allow for that, so early urban planners got as close as they could with 36°. Once L.A. expanded west of today’sHoover Street, the city – now an American one – adopted Thomas Jefferson’s rationalist north-south grid system.",https://lastreetnames.com/wp-content/uploads/2022/07/1st-street-map.png


In [11]:
df.tail(1)

Unnamed: 0,street_url,street_name,neighborhood,description,map_url
453,https://lastreetnames.com/street/zook-drive/,Zook Drive,Glendale,"Probably named for Omer Law “O.L.” Zook (1873-1942), a teacher-turned-lumber dealer-turned-real estate agent. Born in Illinois and raised in Iowa, Zook moved to Oklahoma and married Elsie Mae Woodmancy (1885-1965) in 1902. The couple relocated to Glendale in 1920 and had one daughter, Virginia Mae (1923-1992), who later married a fellow named Ed Wasil. Although the Zooks lived two miles away from Zook Drive, the street was named in 1937 while Zook was still active in real estate. He is thus a safe bet as its namesake. (Family footnote: O.L.’s brother A.J. Zook, a physician, came to Burbank in 1924.)",https://lastreetnames.com/wp-content/uploads/2022/04/zook-drive-map.png


---

## Export

#### JSON

In [12]:
df.to_json(
    "data/processed/la_street_names.json",
    indent=4,
    orient="records",
)

#### CSV

In [13]:
df.to_csv("data/processed/la_street_names.csv", index=False)
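
Both exports assume that `data/processed/` already exists; pandas does not create missing parent directories for `to_json` or `to_csv`. Below is a minimal sketch of a guard that could run before the export cells. It uses a temporary directory so the example leaves no files behind, and `ensure_parent` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def ensure_parent(path):
    """Create the parent directory of path if needed and return path as a Path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Demonstrate inside a throwaway directory instead of the real project tree
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_parent(Path(tmp) / "data" / "processed" / "la_street_names.csv")
    parent_exists = target.parent.is_dir()

print(parent_exists)
```

Because pandas accepts `Path` objects, the helper can wrap the path directly, for example `df.to_csv(ensure_parent("data/processed/la_street_names.csv"), index=False)`.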