# Los Angeles neighborhoods
> This notebook looks at the city's unique boundaries, [drawn by the Los Angeles Times](https://github.com/datadesk/boundaries.latimes.com). These areas don't directly correspond to U.S. Census Bureau geographies, but population estimates have been derived from block-level data by the Times and others. These estimates [have been archived and released](https://censusreporter.org/user_geo/12895e183b0c022d5a527c612ce72865/) by Census Reporter, an independent project seeking to make the bureau's products easier to use.

---

#### Load Python tools and Jupyter config

In [1]:
import json
import pandas as pd
import altair as alt
import jupyter_black
import geopandas as gpd

In [2]:
jupyter_black.load()
pd.options.display.max_columns = 100
pd.options.display.max_rows = 1000
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

---

## Read data

#### Import the GeoJSON file with boundaries and selected race counts, and clean up column names

In [3]:
gdf = gpd.read_file("../data/raw/la_neighborhoods_race.geojson")[
    [
        "id",
        "name",
        "original_id",
        "p0010001_2020",
        "p0010002_2020",
        "p0010003_2020",
        "p0010004_2020",
        "p0010006_2020",
        "p0010008_2020",
        "p0010005_2020",
        "p0010007_2020",
        "p0010009_2020",
        "geometry",
    ]
].rename(
    columns={
        "p0010001_2020": "pop",
        "p0010003_2020": "white",
        "p0010002_2020": "pop_one_race",
        "p0010004_2020": "black",
        "p0010006_2020": "asian",
        "p0010005_2020": "ai_an",
        "p0010007_2020": "nh_pi",
        "p0010008_2020": "other",
        "p0010009_2020": "multirace",
    }
)
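
A quick consistency check can confirm that the renamed columns line up as intended. The sketch below assumes the standard layout of the census P1 table, where the six single-race counts add up to `pop_one_race`, and `pop_one_race` plus `multirace` equals `pop`. The `check_race_totals` helper and the toy data are hypothetical, not part of the notebook; a nonzero `tolerance` may be needed if the derived estimates are not whole numbers.

```python
import pandas as pd

SINGLE_RACE_COLS = ["white", "black", "ai_an", "asian", "nh_pi", "other"]


def check_race_totals(df, tolerance=0):
    """Return the rows whose race columns do not add up to the totals."""
    # Gap between the sum of single-race counts and the one-race total
    one_race_gap = (df[SINGLE_RACE_COLS].sum(axis=1) - df["pop_one_race"]).abs()
    # Gap between (one race + multirace) and the total population
    total_gap = (df["pop_one_race"] + df["multirace"] - df["pop"]).abs()
    return df[(one_race_gap > tolerance) | (total_gap > tolerance)]


# Toy data for illustration only, not real neighborhood counts
toy = pd.DataFrame(
    {
        "name": ["A", "B"],
        "pop": [100, 200],
        "pop_one_race": [90, 180],
        "white": [40, 100],
        "black": [20, 30],
        "ai_an": [5, 10],
        "asian": [15, 20],
        "nh_pi": [5, 5],
        "other": [5, 10],
        "multirace": [10, 20],
    }
)

# Row "B" is deliberately off by 5 in its single-race counts
mismatches = check_race_totals(toy)
```

In the notebook, the equivalent call would be `check_race_totals(gdf)`; an empty result would suggest the column mapping is consistent.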

---

## Categorize

#### What percentage of each neighborhood's population belongs to each race group?

In [4]:
numeric_columns = [
    "pop_one_race",
    "white",
    "black",
    "asian",
    "other",
    "ai_an",
    "nh_pi",
    "multirace",
]

In [5]:
for col in numeric_columns:
    pct_col_name = col + "_pct"  # New column name
    gdf[pct_col_name] = ((gdf[col] / gdf["pop"]) * 100).round(2)  # Calculate percentage
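
If any area had a population of zero, the division above would produce `NaN` or `inf` rather than a percentage. Whether such areas exist in this dataset is not stated, so the following is only a defensive sketch. The `add_pct_columns` helper and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def add_pct_columns(df, columns, total_col="pop"):
    """Add a `<col>_pct` column for each column, as a share of `total_col`."""
    out = df.copy()
    # Replace zero totals with NaN so the division yields NaN instead of inf
    totals = out[total_col].replace(0, np.nan)
    for col in columns:
        out[f"{col}_pct"] = (out[col] / totals * 100).round(2)
    return out


# Toy data for illustration only; the second row has no residents
toy = pd.DataFrame({"pop": [200, 0], "white": [50, 0], "black": [150, 0]})
result = add_pct_columns(toy, ["white", "black"])
```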

#### Which race is most common in each neighborhood?

In [6]:
race_cols = ["white", "black", "asian", "other", "ai_an", "nh_pi", "multirace"]

#### Calculate the "majority" column (the largest group, which may fall short of 50%)

In [7]:
def calculate_majority(row):
    # Extract the race columns for this row and find the max value(s)
    race_values = row[race_cols]
    max_value = race_values.max()
    # Identify all races that have the max value (handling ties)
    majority_races = race_values[race_values == max_value].index.tolist()
    # Concatenate them into a single string if there's more than one
    return ", ".join(majority_races)


gdf["majority"] = gdf.apply(calculate_majority, axis=1)
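
The same tie-aware logic can be written without a row-wise `apply` by building a boolean mask of the columns that match each row's maximum. This is a sketch of an alternative, not the notebook's method. The `largest_groups` helper and the toy data are hypothetical.

```python
import pandas as pd

race_cols = ["white", "black", "asian", "other", "ai_an", "nh_pi", "multirace"]


def largest_groups(df, columns):
    """Return the name(s) of the largest column per row, ties joined by commas."""
    values = df[columns]
    # Boolean mask: True where a column equals its row's maximum
    is_max = values.eq(values.max(axis=1), axis=0)
    names = pd.Index(columns)
    # Join the matching column names for each row, preserving column order
    labels = [", ".join(names[mask]) for mask in is_max.to_numpy()]
    return pd.Series(labels, index=df.index)


# Toy data for illustration only; the first row has a two-way tie
toy = pd.DataFrame(
    {
        "white": [10, 5],
        "black": [10, 7],
        "asian": [3, 1],
        "other": [0, 0],
        "ai_an": [0, 0],
        "nh_pi": [0, 0],
        "multirace": [1, 2],
    }
)
labels = largest_groups(toy, race_cols)
```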

#### Calculate a "plurality_white" column (true where white residents outnumber all other groups combined)

In [8]:
def calculate_plurality_white(row):
    # Sum of all non-white populations
    non_white_sum = sum(row[col] for col in race_cols if col != "white")
    # Compare "white" population to the sum of non-white populations
    return row["white"] > non_white_sum


gdf["plurality_white"] = gdf.apply(calculate_plurality_white, axis=1)
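
Outnumbering all other groups combined is a stricter test than being the single largest group. The sketch below computes both flags side by side so the difference is visible. The `white_flags` helper, its output column names and the toy data are hypothetical and do not appear in the notebook.

```python
import pandas as pd

race_cols = ["white", "black", "asian", "other", "ai_an", "nh_pi", "multirace"]


def white_flags(df, columns):
    """Flag rows where white is a majority or a plurality of the race columns."""
    others = [col for col in columns if col != "white"]
    # Majority: white exceeds the sum of all other groups
    majority = df["white"] > df[others].sum(axis=1)
    # Plurality: white strictly exceeds every other single group
    plurality = df["white"] > df[others].max(axis=1)
    return pd.DataFrame({"white_majority": majority, "white_plurality": plurality})


# Toy data for illustration only
toy = pd.DataFrame(
    {
        "white": [60, 40, 20],
        "black": [30, 35, 50],
        "asian": [10, 25, 30],
        "other": [0, 0, 0],
        "ai_an": [0, 0, 0],
        "nh_pi": [0, 0, 0],
        "multirace": [0, 0, 0],
    }
)
flags = white_flags(toy, race_cols)
```

In the toy data, the second row is a plurality without being a majority: 40 is the largest single count but smaller than the other groups' combined 60.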

---

## Exports

#### Reorder columns so geometry is at the end

In [9]:
cols_except_geometry = [col for col in gdf.columns if col != "geometry"]
new_column_order = cols_except_geometry + ["geometry"]
gdf = gdf[new_column_order]

#### GeoJSON

In [10]:
gdf.to_file(
    "../data/processed/la_neighborhoods_pop_race.geojson",
    driver="GeoJSON",
)
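
After exporting, it can be useful to confirm that the file contains the expected number of features and property names. The sketch below uses only the standard `json` module and assumes the export is a GeoJSON `FeatureCollection`. The `summarize_geojson` helper and the sample feature are hypothetical.

```python
import json
import os
import tempfile


def summarize_geojson(path):
    """Return the feature count and sorted property names of a GeoJSON file."""
    with open(path, encoding="utf-8") as f:
        collection = json.load(f)
    features = collection["features"]
    # Collect every property name that appears on any feature
    property_names = sorted({key for feat in features for key in feat["properties"]})
    return len(features), property_names


# Demo with a tiny hand-built FeatureCollection, not the real export
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "Example", "pop": 100, "majority": "white"},
            "geometry": {"type": "Point", "coordinates": [-118.25, 34.05]},
        }
    ],
}
with tempfile.NamedTemporaryFile(
    "w", suffix=".geojson", delete=False, encoding="utf-8"
) as tmp:
    json.dump(sample, tmp)
    sample_path = tmp.name

feature_count, property_names = summarize_geojson(sample_path)
os.remove(sample_path)
```

Pointing `summarize_geojson` at the exported file's path would return one feature per neighborhood and the column names written above, minus `geometry`, which GeoJSON stores outside `properties`.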