# COGS 108 - Project Proposal

# Names

- Dean Nafarrete
- Emily Le
- Cedric Jeng
- Kevin Morales
- Richard Lao

# Research Question

Is there a statistically significant relationship between a region's crime rates and its economic output, environmental degradation, and homelessness? How can we use these variables to create a new standard measurement for economic health?
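
One way the first part of this question could be checked is with a permutation test on the correlation between a single predictor and crime rate. The sketch below is only illustrative: the function name `permutation_pvalue`, the synthetic arrays, and the choice of a permutation test itself are our assumptions, not a committed analysis plan.

```python
import numpy as np

def permutation_pvalue(x, y, n_permutations=2000, seed=0):
    """Return (Pearson r, two-sided permutation p-value) for x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.corrcoef(x, y)[0, 1]
    # Shuffle y to break any real link with x, then count how often the
    # shuffled correlation is at least as extreme as the observed one.
    extreme = 0
    for _ in range(n_permutations):
        shuffled_r = np.corrcoef(x, rng.permutation(y))[0, 1]
        if abs(shuffled_r) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (extreme + 1) / (n_permutations + 1)

# Synthetic placeholder values (NOT real data): 30 hypothetical regions.
rng = np.random.default_rng(1)
homeless_rate = np.arange(30, dtype=float)
crime_rate = 2.0 * homeless_rate + rng.normal(0.0, 5.0, size=30)

r, p = permutation_pvalue(homeless_rate, crime_rate)
print(f"r = {r:.3f}, p = {p:.4f}")
```

This covers only one predictor at a time; a multiple regression with all three predictors would likely be the more natural test of the full question.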


## Background and Prior Work


<span style = "font-family: 'Segoe UI'; font-size: 14px;">
    <p>Since the advent of the modern industrial city, crime rates in urban areas have been a consistent concern. By nature, cities are densely populated, with robust infrastructure and diverse economies, and tend to attract more people. With such a large population in a small area, it’s inevitable that crime will rise proportionally; however, underlying factors beyond industrialization are at play in these statistics. Many hold the sentiment that cities are less ideal places to settle due to factors like crime rate, population density, and homelessness, while praising the large economies that seemingly contribute to these issues. Ultimately, cities have become an integral part of the globalized economy, serving as centers for trade and culture despite the negative connotations. Consequently, efforts to remedy these issues by identifying and addressing the root causes tend to be overlooked or are rarely implemented. Economic health and overall quality of life are not mutually exclusive, contrary to what many seem to believe. An in-depth analysis of the effects of the physical and social environment may provide insight into issues that require more resource allocation to reduce crime rates in the future. By studying these factors, we hope to establish a new measure of economic health that includes the issues affecting those living within cities. Such a measure can shift focus away from simple measures like GDP, which capture only the gross economic output of a given region, and allow for more nuanced discussions on how to maximize economic returns while reducing the toll on average citizens.</p>
    <p>The disproportionate amount of crime in urban areas is a topic that has been heavily explored by sociologists and economists alike. An article from the Journal of Political Economy, “Why is There More Crime in Cities?”, compiled a number of different theories. The authors note that crime rates in cities outpace their rural counterparts even when accounting for the larger population. Some of the theories they posit suggest that dense city environments may cultivate less connected communities compared to smaller towns, decrease risk for criminals, or that those looking to profit from crime are likely to find it in economic centers. Ultimately, they found that crime reporting is underrepresented in cities, and the likelihood of arrest for a given crime is lower.<a name="cite_ref-1" href="#cite_note-1">[1]</a> Regardless, it’s difficult to pinpoint any one cause, and the theories are merely speculation about the social factors behind crime. As for the environment, much less research exists. The Council on Strategic Risks highlighted this gap: extreme weather events, higher temperatures, and climate-related social stresses have been linked to various forms of crime, especially violent crime. Some examples include gender-based violence against women increasing following adverse weather events and the likelihood of mass shooting events increasing in the summer months. The working theory is that changes in the environment may indirectly cause stresses that incentivize crime by reducing the perceived risk for potential criminals.<a name="cite_ref-2" href="#cite_note-2">[2]</a> Directly connecting a factor of climate change, like environmental degradation, to crime may lend more credence to treating this issue as a factor in addressing crime in the future.</p>
    <p>Along with changes in the climate, income inequality is a known factor in crime rates. Increasing fears over crime often go hand in hand with homelessness; as such, this phenomenon has been explored in the past. A study published by the Institute of Labor Economics examined this effect in California, finding that homelessness rates and crime rates are linked, but how they are linked is interesting: high rates of homelessness increase the number of violent crimes, but not property crimes.<a name="cite_ref-3" href="#cite_note-3">[3]</a> Given that this study was conducted at the state level, focusing on a single area may reaffirm or contradict this finding, depending on the effect of a smaller, less diversified economy. On the whole, while the broader factors behind crime rates appear to be overlooked, there have been some initiatives to address this disparity between economy, climate, inequality, and crime. Several states have implemented an alternate measure of economic health known as the genuine progress indicator, or GPI. This standard allows states to take into consideration the effect of non-economic factors, like the environment and human health standards, on the economy. According to the government of Maryland, this measure informs policymakers of economic progress without looking purely at economic output, which may increase at the expense of its citizens.<a name="cite_ref-4" href="#cite_note-4">[4]</a> We believe that this approach is more progressive on these issues and may benefit the economy and the well-being of individuals in unison. However, its adoption in only a few states as a policy-informing measurement is insufficient. Finding a general link between these factors may reveal their true cost, and potentially allow us to devise another measurement that can be computed using public data.</p>
  </span>



1. <a name="cite_note-1"></a> [^](#cite_ref-1) Glaeser, E. L., & Sacerdote, B. (1999). Why is There More Crime in Cities? Journal of Political Economy, 107(S6), S225–S258. https://doi.org/10.1086/250109
2. <a name="cite_note-2"></a> [^](#cite_ref-2) Facini, A. (2024, October 17). Climate Change & Crime: A big, bad, largely overlooked Nexus. The Council on Strategic Risks. http://councilonstrategicrisks.org/2024/10/17/climate-change-crime-a-big-bad-largely-overlooked-nexus/ 
3. <a name="cite_note-3"></a> [^](#cite_ref-3) Artz, B., & Welsch, D. M. (2024, June). Homelessness and crime: An examination of California. Institute of Labor Economics. https://docs.iza.org/dp17086.pdf
4. <a name="cite_note-4"></a> [^](#cite_ref-4) Campbell, E. (n.d.). Maryland Genuine Progress Indicator. Maryland Department of Natural Resources. https://dnr.maryland.gov/mdgpi/Pages/default.aspx


# Hypothesis


We hypothesize that environmental degradation, high economic output, and homelessness each have a significant relationship with crime rates, and that these factors can be combined into a new measure of a city’s well-being as an alternative measure of economic health. Higher economic output attracts more crime because of heavy foot traffic in retail centers, where businesses sell products and people carry valuables or money. Additionally, the stress of falling into or experiencing homelessness may lead individuals to resort to crime as one of their survival options. As the environment continues to decline, we also expect higher crime rates, particularly violent crime, due to higher temperatures inflicting environmental stress on the population.
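
To make the idea of a combined measure more concrete, here is a minimal sketch of one possible composite index. Everything in it is an assumption for illustration: the column names, the placeholder values, the equal weighting, and the use of z-scores are our own choices and are not the GPI methodology or a finalized design.

```python
import pandas as pd

# Placeholder values for three hypothetical regions (NOT real data).
regions = pd.DataFrame(
    {
        "region": ["A", "B", "C"],
        "income_per_capita": [40000.0, 55000.0, 70000.0],
        "env_burden": [30.0, 20.0, 10.0],
        "homeless_rate": [5.0, 3.0, 1.0],
    }
).set_index("region")

# +1 means a higher value is better for well-being, -1 means it is worse.
DIRECTION = {"income_per_capita": 1, "env_burden": -1, "homeless_rate": -1}

def composite_index(df, direction):
    """Equal-weight average of sign-adjusted z-scores, one value per region."""
    cols = list(direction)
    # Standardize each column so variables on different scales are comparable.
    z = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)
    # Flip the sign of "higher is worse" variables, then average across columns.
    signed = z * pd.Series(direction)
    return signed.mean(axis=1)

index = composite_index(regions, DIRECTION)
print(index.sort_values(ascending=False))
```

With equal weights, each factor contributes the same amount to the index; choosing and justifying weights would be part of the actual analysis.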

# Data

1. Explain what the **ideal** dataset you would want to answer this question. (This should include: What variables? How many observations? Who/what/how would these data be collected? How would these data be stored/organized?)<br>
* Our ideal dataset would include measures of a region's GDP, environmental degradation, homelessness, and crime rates. While measures for GDP, homelessness, and crime rates are straightforward, environmental degradation is less clear-cut. Thus, we can use state-provided tools such as CalEnviroScreen 4.0 as a measure of a region's environmental degradation. Unemployment data can be found from the California Employment Development Department, crime data from OpenJustice, led by the California Department of Justice, and some population data from the U.S. Census. Alternatively, data for specific regions within San Diego County can be obtained directly from the San Diego County database. All of this data is available in usable forms (CSV file, Microsoft Excel spreadsheet, etc.) through government websites. We would convert our files into pandas dataframes, which we would then merge into one dataframe that includes the data that we want. That is, the ideal dataset would have one observation per main region in San Diego County, with variables for the region's income per capita, environmental degradation, homelessness, and crime rates.
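
The merge step described above could look roughly like the following sketch. The table contents, the column names, and the helper `merge_on_region` are hypothetical placeholders; the real files will need tidying before they share a common region key.

```python
import pandas as pd

# Hypothetical region-level tables with placeholder values (NOT real data).
income = pd.DataFrame({"region": ["A", "B", "C"], "income_per_capita": [40000, 55000, 62000]})
env = pd.DataFrame({"region": ["A", "B", "C"], "env_score": [35.0, 20.5, 12.0]})
homeless = pd.DataFrame({"region": ["A", "B", "C"], "homeless_count": [120, 80, 30]})
crime = pd.DataFrame({"region": ["A", "B", "C"], "crime_rate": [410.0, 300.0, 150.0]})

def merge_on_region(tables, key="region"):
    """Merge a list of dataframes into one row per region."""
    merged = tables[0]
    for table in tables[1:]:
        # validate="one_to_one" raises an error if a region appears twice,
        # which would otherwise silently duplicate rows.
        merged = merged.merge(table, on=key, how="inner", validate="one_to_one")
    return merged

combined = merge_on_region([income, env, homeless, crime])
print(combined)
```

An inner join keeps only regions present in every table, so any region missing from one source drops out of the combined dataframe.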

1. Search for potential **real** datasets that could provide you with something useful for this project.  You do not have to find every piece of data you will use, but you do need to have demonstrated some idea that (a) this data is gettable and (b) that this data may be different from what your ideal is.<br>

* <u>**Environmental degradation**</u> data: https://oehha.ca.gov/calenviroscreen/report/calenviroscreen-40<br>

* **San Diego Median Income/Capita, Crime Rates, Unemployment info, based on specific regions in San Diego** <br>
https://data.sandiegocounty.gov/Live-Well-San-Diego/Live-Well-San-Diego-Database/wsyp-5xpf/about_data <br> 

* <u>**Unemployment.. Less than Ideal**</u> <br>
Unemployment rate for California (as a whole):  https://labormarketinfo.edd.ca.gov/geography/california-statewide.html <br>
Unemployment rate for California (as counties): https://labormarketinfo.edd.ca.gov/geography/lmi-by-county.html <br>
* <u>**Crime.. Less than Ideal**</u> <br>
Number of each crime committed, broken down by type of crime (essentially all felonies), for California and by county: https://openjustice.doj.ca.gov/exploration/crime-statistics/crimes-clearances <br>
For more general counts of crimes committed (including misdemeanors), we may look at: https://openjustice.doj.ca.gov/exploration/crime-statistics/arrests <br>
For some insight on probation, including termination versus revocation, we may want to use: https://openjustice.doj.ca.gov/exploration/crime-statistics/adult-probation-caseload-actions <br>
For calculating proportions based on county, U.S. Census 2020-2024 provides population data: https://www.census.gov/data/tables/time-series/demo/popest/2020s-counties-total.html <br>


* All these sites have data that is not only obtainable but also easy to process, because it is kept in Excel or CSV files, both of which can be read into pandas dataframes. Of course, these sites contain more data than we need for our project, so tidying will be necessary. Moreover, we may choose to focus on specific types of crimes, such as violent crimes versus misdemeanors, or we may choose to look at all crime as a whole.
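
As a rough sketch of the tidying and proportion calculation mentioned above, the example below reshapes county crime counts and converts them to rates per 100,000 residents. The column names, the counts, and the populations are made-up placeholders, and `rates_per_100k` is a hypothetical helper; the real OpenJustice and Census files are laid out differently and would be loaded with `pd.read_csv` or `pd.read_excel` first.

```python
import pandas as pd

# Placeholder long-format crime counts and populations (NOT real data).
crimes = pd.DataFrame(
    {
        "county": ["San Diego", "San Diego", "Orange", "Orange"],
        "crime_type": ["violent", "property", "violent", "property"],
        "count": [1000, 5000, 600, 4000],
    }
)
population = pd.DataFrame(
    {"county": ["San Diego", "Orange"], "population": [2_000_000, 1_000_000]}
)

def rates_per_100k(crimes, population):
    """Return one row per county with crime counts and rates per 100,000."""
    # One column per crime type, summing counts within each county.
    wide = crimes.pivot_table(
        index="county", columns="crime_type", values="count", aggfunc="sum"
    )
    wide.columns.name = None
    crime_types = list(wide.columns)
    # Attach population so counts can be scaled by county size.
    wide = wide.join(population.set_index("county"))
    for crime_type in crime_types:
        wide[f"{crime_type}_per_100k"] = wide[crime_type] * 100_000 / wide["population"]
    return wide.reset_index()

rates = rates_per_100k(crimes, population)
print(rates)
```

Scaling by population lets us compare counties or regions of very different sizes on the same footing.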


# Ethics & Privacy

- Thoughtful discussion of ethical concerns included
- Ethical concerns consider the whole data science process (question asked, data collected, data being used, the bias in data, analysis, post-analysis, etc.)
- How your group handled bias/ethical concerns clearly described

Acknowledge and address any ethics & privacy related issues of your question(s), proposed dataset(s), and/or analyses. Use the information provided in lecture to guide your group discussion and thinking. If you need further guidance, check out [Deon's Ethics Checklist](http://deon.drivendata.org/#data-science-ethics-checklist). In particular:

- Are there any biases/privacy/terms of use issues with the data you proposed?
- Are there potential biases in your dataset(s), in terms of who it composes, and how it was collected, that may be problematic in terms of it allowing for equitable analysis? (For example, does your data exclude particular populations, or is it likely to reflect particular human biases in a way that could be a problem?)
- How will you set out to detect these specific biases before, during, and after/when communicating your analysis?
- Are there any other issues related to your topic area, data, and/or analyses that are potentially problematic in terms of data privacy and equitable impact?
- How will you handle issues you identified?

When it comes to ethics for our project, there may be regional bias in the available data, since some regions may be missing or underrepresented in the government datasets listed above. Additionally, because we are drawing mostly on government datasets and not on non-governmental sources, underreporting may act as a confounding factor: crime, homelessness, and unemployment that are not reported to the government will not be accounted for. Regarding terms of use and privacy, openjustice.doj.ca.gov permits public use of its data and notes that its public data excludes personal information of minors and copyrighted material. Additionally, the data that we plan on scraping from census.gov already has personally identifying information removed and is publicly available for use as well, protecting personal privacy. Furthermore, there may be bias in our statistical analyses when looking at homelessness and crime rates for specific high-income cities, which can bias our interpretations of the data. We are also focusing our analyses on San Diego County, which may be a more expensive county to live in than other counties in California. Counties differ in their economies, so San Diego may not offer enough economic diversity for our findings to generalize to all counties in California.
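
To detect the regional coverage gaps described above before analysis, we could compare which regions appear in each dataset. This is a minimal sketch under assumed names: the `region` column, the placeholder tables, and the helper `coverage_report` are hypothetical.

```python
import pandas as pd

# Placeholder tables where each source is missing one region (NOT real data).
crime = pd.DataFrame({"region": ["A", "B", "C"], "crime_rate": [410.0, 300.0, 150.0]})
homeless = pd.DataFrame({"region": ["A", "C", "D"], "homeless_count": [120, 30, 45]})

def coverage_report(left, right, key="region"):
    """List regions that appear in only one of the two dataframes."""
    # indicator=True adds a "_merge" column marking where each row came from.
    merged = left[[key]].merge(right[[key]], on=key, how="outer", indicator=True)
    return merged[merged["_merge"] != "both"].reset_index(drop=True)

report = coverage_report(crime, homeless)
print(report)
```

Regions flagged as `left_only` or `right_only` would be dropped by an inner merge, so reporting them lets us state explicitly which areas our conclusions do not cover.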



# Team Expectations 


Read over the [COGS108 Team Policies](https://github.com/COGS108/Projects/blob/master/COGS108_TeamPolicies.md) individually. Then, include your group’s expectations of one another for successful completion of your COGS108 project below. Discuss and agree on what all of your expectations are. Discuss how your team will communicate throughout the quarter and consider how you will communicate respectfully should conflicts arise. By including each member’s name above and by adding their name to the submission, you are indicating that you have read the COGS108 Team Policies, accept your team’s expectations below, and have every intention to fulfill them. These expectations are for your team’s use and benefit — they won’t be graded for their details.

* Use Discord to communicate. Responses are expected within 12 hours, or within 2 hours if close to a deadline.
* Meet at least **once** a week - Every Friday 12:30 PM. 
* Have an open space where all voices are heard. Everyone should be open to ideas, criticisms, and suggestions. Make decisions as a group. Majority vote makes final decisions on major portions of the project.
* Specializations: Dean (Project Leader), Kevin (Analysis), Emily (Editor), Cedric and Richard (Coders). The Project Leader will be in charge of handling merge requests, delegating group responsibilities, and organizing roles/meetings. The Analysis lead is in charge of overseeing the analysis portion and guiding the rest of the group members on statistics-related items. The Editor will be in charge of major writing responsibilities, including drafting reports and proofreading. The Coders will advise the rest of the group on coding and oversee the data wrangling portion. These roles are not restrictive - everyone will work on each part a little, but these are the main *specializations*.
* In the event someone is struggling to deliver something they promised, they are expected to let the group know as soon as possible. That way, we can delegate the tasks among the others to compensate. The Project Leader will make the final decision to contact the TA as needed.

# Project Timeline Proposal

Specify your team's specific project timeline. An example timeline has been provided. Change the dates, times, names, and details to fit your group's plan.

If you think you will need any special resources or training outside what we have covered in COGS 108 to solve your problem, then your proposal should state these clearly. For example, if you have selected a problem that involves implementing multiple neural networks, please state this so we can make sure you know what you’re doing and so we can point you to resources you will need to implement your project. Note that you are not required to use outside methods.



| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 4/25  |  4 PM | Brainstormed topics and questions | Finalized research question and delegated tasks for proposal. Rough research done on topic. | 
| 4/28  |  8 PM |  Finalize Proposal. Hypothesis, Looking for Data, Ethics, How to Data Wrangle | Looking forward. Delegate tasks for how to research, retrieve data, and analysis plan | 
| 5/2  | 12:30 PM  | Compile list of datasets & clarify roles/organization  | Focus on group organization. Will we use branches? How will we check in? Potential analysis techniques we will use   |
| 5/9  | 12:30 PM  | Wrangle *some* data and have an idea on analysis | Review wrangling for correctness. Review analysis and plan   |
| 5/16  | 12:30 PM  | Finalize wrangling and continue analysis | Dive fully into analysis and have a complete review of project as a whole |
| 5/23 | 12:30 PM  | Complete analysis fully and begin drafting conclusions | Discuss difficulties and edit final draft together |
| 5/30 | 12:30 PM  | Review and fix any small details | Discuss final turn in of project before 6/13 |
| 6/13 | Before 11:59 PM | NA | Turn in Final Project & Group Project Surveys | 