## Predicting Graffiti Rates in San Diego 

#### K. Schlesinger,  Partnering with the City of San Diego Performance & Analytics Department

Whether large or small, graffiti has a significant effect on neighborhood quality. At the most basic level, graffiti is quite expensive to repair, costing San Diego upwards of $5 million in a given year<sup>[1](#fn1)</sup>. It also has a detrimental effect on property values and can potentially increase crime rates. According to the Broken Windows theory<sup>[2](#fn2)</sup>, small crimes, such as vandalism, lead to residents disengaging from their neighborhoods. Over time, this community withdrawal makes the area more susceptible to serious crimes.  

San Diego is committed to decreasing graffiti. In June, they announced a new <a href="https://www.sandiego.gov/get-it-done">"Get It Done"</a> reporting system, where citizens can submit reports on a wide range of problems, like graffiti, tree hazards, and potholes. With this system, city government would like to learn more about the range of problems in different municipal areas and improve their response time. 



Unfortunately, not every area is using the 311 system equally. The bulk of the ~20,000 reports stem from a small number of areas. Below, we have divided the San Diego municipal area into <a href="https://en.wikipedia.org/wiki/Census_block_group">Census block groups</a>. Overlaid is a heatmap showing the frequency of 311 reports in each area. 

<img src="output_figures/report_distribution.png">

As you can see, most of the reports are clustered around the Balboa Park area downtown. There are 864 blockgroups in the San Diego municipal area. 264 of these have filed fewer than 10 reports since the "Get It Done" app went live. These areas account for 16% of the land area and 26% of the population of San Diego. Most importantly, we don't know whether we're not getting reports from these areas because they don't have problems or because they're simply not using 311. The city could be unaware of serious problems in these areas and, as a result, could be underserving a large portion of its citizens. 
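The coverage figures quoted above (number of low-reporting blockgroups, plus their share of land area and population) can be computed along the following lines. This is a minimal sketch using only the Python standard library; the record layout, field names, and numbers are hypothetical stand-ins, not the project's actual data.

```python
# Hypothetical per-blockgroup records; field names and values are
# illustrative assumptions, not the project's actual data.
blockgroups = [
    {"id": "A", "reports": 120, "population": 1500, "land_area": 0.4},
    {"id": "B", "reports": 4, "population": 2000, "land_area": 1.1},
    {"id": "C", "reports": 35, "population": 900, "land_area": 0.3},
    {"id": "D", "reports": 0, "population": 600, "land_area": 0.2},
]

MIN_REPORTS = 10  # "fewer than 10 reports" threshold from the text


def low_reporting_summary(groups, min_reports=MIN_REPORTS):
    """Summarize blockgroups that filed fewer than `min_reports` reports."""
    low = [g for g in groups if g["reports"] < min_reports]
    total_pop = sum(g["population"] for g in groups)
    total_area = sum(g["land_area"] for g in groups)
    return {
        "count": len(low),
        "population_share": sum(g["population"] for g in low) / total_pop,
        "land_share": sum(g["land_area"] for g in low) / total_area,
    }


summary = low_reporting_summary(blockgroups)
print(summary)
```

Keeping the threshold as a parameter makes it easy to check how sensitive the coverage numbers are to the choice of 10 reports.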

Fortunately, we can use the information from blockgroups that **_are_** reporting to understand the needs of those that **_aren't_**. 

To constrain the graffiti patterns in reporting neighborhoods, I combined information from a wide range of data sources, including: 
- Demographics from the <a href="https://factfinder.census.gov/faces/nav/jsf/pages/programs.xhtml?program=acs">American Community Survey</a>. Among other information, I looked at: 
    - Age
    - Race and Ethnicity
    - Household Income
    - English Fluency 
    - Property Values 
    - Typical rent costs
- <a href="http://data.sandiego.gov/dataset/get-it-done-requests-311">311 Get It Done Requests</a>, of which ~3000 are graffiti-related
- <a href="http://data.sandiego.gov/dataset/code-enforcement-violations">Code Enforcement Violations</a>, including abandoned buildings
- <a href="http://www.sandag.org/index.asp?classid=14&subclassid=21&projectid=446&fuseaction=projects.detail">Public Crime Data</a>
- <a href="http://www.sandag.org/index.asp?fuseaction=home.home">San Diego Infrastructure Information</a> such as: 
    - City Street Lights
    - Publicly maintained Trees
    - Police beats
    
    

I could then map out San Diego in a range of categories, looking for any possible patterns in reported graffiti. 

### Interactive javascript code here?

As I explored the data, a few clear patterns arose. Graffiti tended to cluster in lower income areas. 

### OTHER CORRELATIONS BUT DON'T MAKE THIS OFFENSIVE. 



After looking at the various correlations between my graffiti rate and other neighborhood characteristics, I combined a series of features together in a <a href="http://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares">Linear Regression Model</a>. As this model can be severely affected by collinearity, I had to be very careful with the input features. In its favor, this model is clearly interpretable, as you can examine the effect each feature has on the overall graffiti rate. This is particularly valuable in this context, as it can help San Diego better understand their graffiti problem. 
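One simple way to screen input features for collinearity before fitting a linear model is to flag pairs with a high pairwise correlation. The sketch below is only an illustration of that idea, not necessarily the check used in this project; the feature names, the values, and the 0.8 cutoff are all assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def collinear_pairs(features, threshold=0.8):
    """Return feature pairs whose absolute correlation exceeds `threshold`."""
    flagged = []
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical feature columns, one value per blockgroup (made-up numbers).
features = {
    "median_income": [40, 55, 70, 90, 120],
    "median_home_value": [200, 260, 340, 450, 600],
    "street_lights": [30, 12, 25, 8, 20],
}

flagged = collinear_pairs(features)
print(flagged)
```

In this toy data, income and home value move together, so only one of the two would be kept as a model input. Pairwise correlation misses collinearity that involves three or more features at once, so it is a first pass rather than a complete test.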

## MORE DETAILS ABOUT MODEL CHOICE HERE. 

To train my model, I need to extract a sample of blockgroups that have a range of graffiti rates. 

<a name="fn1">1</a>: <a href="http://www.sandag.org/uploads/publicationid/publicationid_1667_14466.pdf">Graffiti Tracker: An Evaluation of the San Diego County Multi-Discipline Graffiti Abatement program</a> 

<a name="fn2">2</a>: <a href="https://en.wikipedia.org/wiki/Broken_windows_theory">Broken Windows Theory</a> 

#### The Problem: 

- San Diego wants to respond more efficiently and effectively to citizen complaints about neighborhood graffiti filed with their 311 system. 
- Two primary types of graffiti: tagging and gang-related
    - Tagging: to get publicity for your graffiti group, etc. Expect that tagging is discouraged if the tags are rapidly painted over. Not always from the actual area they're tagging. 
    - Gang-related: marking the turf of a particular gang, memorial for members, posturing by crossing out the names of others. Not necessarily discouraged by rapid repainting.
- From Graffiti Tracker report (data on 2011 graffiti): 
    - Expect 22% gang-related (vast majority of these were also done for publicity)
    - 70% tagging
    - Remainder unclear
- In 2011, graffiti caused 129,639 square feet of damage in San Diego. Repairing an individual graffiti area (<10 square feet) costs about $250 (2011 report; averages the costs associated with painting and power washing).   
- Huge costs in repairs/maintenance alone and complicated breakdown because repairs handled by a range of different departments depending on where the graffiti occurred. 

- Also, graffiti has a negative effect on neighborhoods. The "Broken Windows" crime theory suggests that small infractions, like graffiti or illegal dumping, that persist over time can increase more violent and destructive crime. When graffiti is not rapidly removed, it suggests that the citizens are not invested in the conditions and quality of life in their area, making it a prime location for increased crime. 

- San Diego wants to identify areas of high graffiti frequency to: 
    - establish preventative measures (increased patrolling, community murals, etc) to save maintenance costs 
    - improve response time to avoid an area appearing to be neglected. 
    
- Not all areas of San Diego make use of the 311 system. However, it is unclear if they're not filing reports because they don't have problems or they are simply not reporting neighborhood issues. 
- Using information from the areas that do file 311 reports, I will identify areas that have a higher fraction of graffiti reports. 
- I will look for demographic patterns that distinguish areas in San Diego with high and low rates of graffiti. 
- With this demographic information, I can then design a machine-learning model that will predict the expected graffiti rate for an area, even if they're not filing reports.
- When they know which areas to focus on, San Diego city government can establish programs to prevent graffiti in the first place, such as community murals and increased patrolling. This will save them time and money, as they will not have to send maintenance crews to the same areas repeatedly. 




*a brief caveat*

For this analysis, I will be dividing the San Diego municipal area into blockgroups. A blockgroup is a region defined by the U.S. Census that ranges in population from 600 to 3,000 people. This will allow us to look for graffiti problems with fine resolution (e.g., we can be more precise about exactly where in San Diego we expect problems), while also ensuring a large enough population for meaningful, and recent, demographic information. 

#### Why do we care?
- The average number of reports filed per blockgroup is 22. 
- Looking in more detail, we see that there are a handful of blockgroups that file a huge number of reports (16 with more than 100 reports!), but most aren't filing many at all.

<img src="output_figures/cumulative_fraction_reports.png">
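A cumulative curve like the one in the figure can be built from per-blockgroup report counts. The sketch below is one possible construction; the ordering (most active blockgroups first) and the sample counts are assumptions, and the actual figure may have been produced differently.

```python
def cumulative_report_fraction(report_counts):
    """Cumulative share of all reports, most active blockgroups first.

    Assumes at least one report was filed overall.
    """
    ordered = sorted(report_counts, reverse=True)
    total = sum(ordered)
    running, fractions = 0, []
    for count in ordered:
        running += count
        fractions.append(running / total)
    return fractions


# Hypothetical report counts per blockgroup (made-up numbers).
counts = [150, 3, 40, 0, 7]
curve = cumulative_report_fraction(counts)
print(curve)
```

Reading the curve from the left shows how few blockgroups are needed to account for most of the reports.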


- Let's consider areas that have filed fewer than 10 reports since the end of May. With so few reports, we know very little about the issues for: 
    - 264 block groups out of 863 in San Diego 
    - 16% of the land area in San Diego
    - 26% of the population of San Diego
    
Most importantly, we don't know if we're not getting reports from these areas because they don't have problems or they're simply just not using 311.     


The San Diego government wants to serve all areas of the city, even those that are not currently using the 311 system. If these areas are having unreported problems, they want to know about it!  

#### Mapping out the Graffiti

To help us understand the needs of areas that aren't using 311, we want to look at areas that are. What fraction of their 311 complaints are graffiti related? 
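The graffiti fraction for each blockgroup can be computed by counting its reports by category. This is a minimal sketch that assumes each 311 record has already been assigned to a blockgroup; the record layout and the `"Graffiti"` category label are hypothetical.

```python
from collections import Counter

# Hypothetical 311 records as (blockgroup_id, category) pairs.
reports = [
    ("bg1", "Graffiti"), ("bg1", "Pothole"), ("bg1", "Graffiti"),
    ("bg1", "Tree Hazard"), ("bg2", "Pothole"), ("bg2", "Graffiti"),
    ("bg3", "Street Light"),
]


def graffiti_fraction(records, graffiti_label="Graffiti"):
    """Map each blockgroup to the fraction of its reports that are graffiti."""
    totals, graffiti = Counter(), Counter()
    for blockgroup, category in records:
        totals[blockgroup] += 1
        if category == graffiti_label:
            graffiti[blockgroup] += 1
    # Only blockgroups with at least one report appear in `totals`,
    # so the division is always defined.
    return {bg: graffiti[bg] / totals[bg] for bg in totals}


fractions = graffiti_fraction(reports)
print(fractions)
```

Blockgroups with no reports at all do not appear in the result, which is the gap the rest of this analysis tries to fill.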

Below, we map out each of the census block groups in San Diego, color-coded by the fraction of reports that are graffiti-related. Darker colors indicate a higher fraction of graffiti-related reports. Lighter colors mean one of two things: either a) the blockgroup doesn't have many reports at all or b) it has plenty of reports but few are graffiti-related. On the map, I outline blockgroups with fewer than 10 reports.

<img src="output_figures/graffiti_per_map.png">

Some of the areas with very little reporting are in regions where the general fraction of graffiti reports is low overall. However, for some others, there are high graffiti rates nearby. Thus, we cannot just assume that a low number of reports means that there is not a graffiti problem.

#### Bringing in Demographic information 
- I want to know if there are demographic patterns that distinguish graffiti-rich areas from graffiti-poor ones. 
- For each block group, I pull information from the American Community Survey, part of the U.S. Census. 
    - Age
    - Income 
    - Home value, contract rent
    - Racial breakdown 
    - Fraction on Public Assistance 
    - Fraction of homes that are owner-occupied
    - Total population 
- I also combine this with additional information from the city of San Diego: 
    - Reported building code violations
    - Crime reports 
    - Street light locations 
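Combining the sources listed above amounts to joining several per-blockgroup tables on a shared blockgroup ID. The sketch below shows one way to do that with plain dictionaries; the table contents, column names, and IDs are hypothetical.

```python
def join_by_blockgroup(*tables):
    """Inner-join dict-of-dict tables keyed by blockgroup ID."""
    shared_ids = set(tables[0])
    for table in tables[1:]:
        shared_ids &= set(table)
    joined = {}
    for bg in sorted(shared_ids):
        row = {}
        for table in tables:
            row.update(table[bg])  # later tables win on duplicate columns
        joined[bg] = row
    return joined


# Hypothetical inputs: census demographics and city counts per blockgroup.
acs = {
    "bg1": {"median_age": 34, "median_income": 52000},
    "bg2": {"median_age": 41, "median_income": 87000},
}
city = {
    "bg1": {"code_violations": 3, "street_lights": 41},
    "bg3": {"code_violations": 0, "street_lights": 12},
}

merged = join_by_blockgroup(acs, city)
print(merged)
```

An inner join keeps only blockgroups present in every table. For count-style city data, a blockgroup missing from a table may simply have had zero events, so filling in zeros before joining could be the better choice in practice.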
    