**If you lost points on the last checkpoint you can get them back by responding to TA/IA feedback**  

Update/change the relevant sections where you lost those points, and make sure you respond on GitHub Issues to your TA/IA to call their attention to the changes you made here.

Please update your Timeline... no battle plan survives contact with the enemy, so make sure we understand how your plans have changed.

# COGS 108 - Data Checkpoint

# Names

- Haoyu Fu
- Qianxia Hui
- Arianna Morris
- Michael Tang
- Bofu Zou

# Research Question

Does parking occupancy on the UCSD campus predict traffic incidents on major roads near the UCSD campus?

## Background and Prior Work

Traffic incidents are a significant concern in modern cities, especially in densely populated areas. As a large institution with a growing population, the University of California, San Diego (UCSD) faces the problems that come with heavy vehicular traffic and parking. As the campus community grows, parking occupancy rates may reflect the number of traffic incidents occurring around the campus. Our study seeks to determine whether there is a correlation between UCSD campus parking occupancy rates and traffic incidents in the surrounding area.

While our topic focuses on how parking occupancy may predict traffic collisions in the UCSD area, prior studies have analyzed the likelihood of car accidents occurring in a given geographic region. For example, Forbes Advisor compiled a list of the top 50 U.S. cities with the highest likelihood of getting into a car accident, with San Diego being one of them.<a name="cite_ref-1"></a>[<sup>1</sup>](#cite_note-1) The ranking was based on the fatality rate per 100,000 people from the National Highway Traffic Safety Administration, together with the average years between collisions and the relative collision likelihood from the Allstate Best Drivers Index. The analysis covers only large cities, so the dataset is relatively small: only the 50 largest cities by population, according to census data, were chosen. A particular city's collision danger is therefore approximated by comparison with the other major cities in the dataset. Forbes Advisor’s primary goal is to provide financial advice, and this analysis serves that goal by offering collision information relevant to insurance, legal, and risk-assessment situations.

Another study investigated the causal relationship between road density and parking occupancy.<a name="cite_ref-2"></a>[<sup>2</sup>](#cite_note-2) Tunisian researchers examined this relationship in the Tunis city center using Granger causality tests based on vector error correction modeling. The authors collected data with video cameras around a major street in Tunis, the capital of Tunisia, and found that a causal relationship does exist, with road density Granger-causing parking occupancy. This suggests that increasing road density may lead to an increase in parking occupancy, which in turn may lead to an increase in road congestion. The authors suggest that their findings can be used to develop more sustainable parking policies that reduce road congestion, and that they can be incorporated into parking models to improve accuracy and effectiveness. This study provides important insights into the relationship between road density and parking occupancy, and it gives us ideas for our own project, since road density and traffic incidents are both significant features of a city’s traffic conditions.

Similarly, we want to assess the general level of safety with respect to traffic collisions by exploring a more confined geographical area that is of interest to us, UCSD. Based on these prior works and on the UCSD campus parking occupancy and nearby traffic incident data, our project aims to identify patterns specific to the UCSD campus and its surrounding areas. We will conduct a descriptive and exploratory data analysis of these datasets, which will allow us to identify their important features and how those features may relate to one another. We aim to contribute to the existing literature on road safety and how it affects us on the UCSD campus.

**References:**

1. <a name="cite_note-1"></a> [^](#cite_ref-1) Bieber, C. (2023, October 25). The cities where you’re most likely to get in a car accident. Forbes Advisor. https://www.forbes.com/advisor/legal/auto-accident/cities-most-car-accidents/  
2. <a name="cite_note-2"></a> [^](#cite_ref-2) Hassine, S. B., Kooli, E., & Mraihi, R. (2022). The causal relationship between road density and parking occupancy. World Journal of Advanced Research and Reviews, 15(3), 125–134. https://wjarr.com/sites/default/files/WJARR-2022-0896.pdf


# Hypothesis


We hypothesize that there will be a positive correlation between the number of cars parked on campus and the number of traffic incidents occurring on and around the UCSD campus. We believe this because more congestion within the campus area could lead to more traffic incidents.
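
One way this hypothesis could be checked is with a Pearson correlation between quarterly occupancy and quarterly collision counts. The block below is only a minimal sketch under our own assumptions: the function name `quarterly_correlation`, the column names `occupancy_rate` and `collisions`, and every number in the toy tables are hypothetical placeholders, chosen so the result is easy to verify by eye. They are not measurements or results.

```python
import pandas as pd

def quarterly_correlation(parking, collisions):
    """Pearson r between occupancy rate and collision count, matched on Year and Quarter."""
    # Keep only quarters that appear in both tables.
    merged = parking.merge(collisions, on=['Year', 'Quarter'], how='inner')
    # Series.corr uses the Pearson method by default.
    return merged['occupancy_rate'].corr(merged['collisions'])

# PLACEHOLDER values only (not real data): a perfectly linear toy relationship.
toy_parking = pd.DataFrame({
    'Year': [2016, 2016, 2017, 2017],
    'Quarter': ['Summer', 'Fall', 'Winter', 'Spring'],
    'occupancy_rate': [0.60, 0.70, 0.80, 0.90],
})
toy_collisions = pd.DataFrame({
    'Year': [2016, 2016, 2017, 2017],
    'Quarter': ['Summer', 'Fall', 'Winter', 'Spring'],
    'collisions': [10, 12, 14, 16],
})

r = quarterly_correlation(toy_parking, toy_collisions)
```

On real data, a positive `r` would be consistent with the hypothesis, although with only a handful of quarters the estimate would be noisy.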

# Data

## Data overview

For each dataset include the following information
- Dataset #1
  - Dataset Name:
  - Link to the dataset:
  - Number of observations:
  - Number of variables:
- Dataset #2 (if you have more than one!)
  - Dataset Name:
  - Link to the dataset:
  - Number of observations:
  - Number of variables:
- etc

Now write 2 - 5 sentences describing each dataset here. Include a short description of the important variables in the dataset; what the metrics and datatypes are, what concepts they may be proxies for. Include information about how you would need to wrangle/clean/preprocess the dataset

If you plan to use multiple datasets, add a few sentences about how you plan to combine these datasets.

## Dataset #1 (use name instead of number here)

In [1]:
## Import pandas to read the CSV file
import pandas as pd

## Load only the columns needed for the analysis
parking_data = pd.read_csv(
    'QbyQ UCSD Parking Occupancy.csv',
    usecols=['Quarter', 'Year', 'Parking Spaces', 'Empty Spaces', 'Occupied Spaces', '% Occupied'],
)
parking_data.head(5)

|   | Quarter | Year | Parking Spaces | Empty Spaces | Occupied Spaces | % Occupied |
|---|---|---|---|---|---|---|
| 0 | Summer | 2016 | 19297 | 6567 | 12730 | 65.97% |
| 1 | Fall | 2016 | 19245 | 3578 | 15667 | 81.41% |
| 2 | Winter | 2017 | 18316 | 2691 | 15625 | 85.31% |
| 3 | Spring | 2017 | 18334 | 3096 | 15238 | 83.11% |
| 4 | Summer | 2017 | 18082 | 5050 | 13032 | 72.07% |
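
The `% Occupied` column is read in as text because of the trailing percent sign, so it likely needs converting before any numeric analysis. The block below is a minimal sketch of one way to do that. It re-enters the five rows shown above so it can run without the CSV file; the helper `add_occupancy_rate` and the new column name `occupancy_rate` are our own illustrative choices, not part of the dataset.

```python
import pandas as pd

# The five rows displayed above, re-entered so the sketch runs without the CSV file.
sample = pd.DataFrame({
    'Quarter': ['Summer', 'Fall', 'Winter', 'Spring', 'Summer'],
    'Year': [2016, 2016, 2017, 2017, 2017],
    'Parking Spaces': [19297, 19245, 18316, 18334, 18082],
    'Empty Spaces': [6567, 3578, 2691, 3096, 5050],
    'Occupied Spaces': [12730, 15667, 15625, 15238, 13032],
    '% Occupied': ['65.97%', '81.41%', '85.31%', '83.11%', '72.07%'],
})

def add_occupancy_rate(df):
    """Return a copy with a float 'occupancy_rate' column (hypothetical name) between 0 and 1."""
    out = df.copy()
    # Strip the percent sign, convert to float, and rescale from percent to a fraction.
    out['occupancy_rate'] = out['% Occupied'].str.rstrip('%').astype(float) / 100
    return out

cleaned = add_occupancy_rate(sample)

# Sanity check: the reported percentage should match occupied / total spaces
# up to the rounding used in the source file.
recomputed = cleaned['Occupied Spaces'] / cleaned['Parking Spaces']
max_gap = (cleaned['occupancy_rate'] - recomputed).abs().max()
```

The same helper could be applied to the full `parking_data` frame, assuming every value in `% Occupied` follows the same `NN.NN%` pattern.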


## Dataset #2 (if you have more than one, use name instead of number here)

In [2]:
## Import the CSV file with collision data, keeping only the columns we need
## (pandas was imported as pd in the previous cell)
collision_data = pd.read_csv(
    'https://seshat.datasd.org/traffic_collision_details/pd_collisions_details_datasd.csv',
    usecols=['date_time', 'address_road_primary', 'injured', 'killed'],
    parse_dates=['date_time'],
)
collision_data.head(5)

|   | date_time | address_road_primary | injured | killed |
|---|---|---|---|---|
| 0 | 2015-01-14 20:00:00 | JUNIPER | 0 | 0 |
| 1 | 2015-03-19 12:00:00 | LINDA VISTA | 0 | 0 |
| 2 | 2015-03-24 03:05:00 | WASHINGTON | 2 | 0 |
| 3 | 2015-03-27 23:56:00 | WORDEN | 1 | 0 |
| 4 | 2015-07-06 11:45:00 | EL CAJON | 0 | 0 |
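
Because the parking data is recorded per quarter and year while collisions are recorded per timestamp, the collisions would need to be rolled up to the same granularity before the two can be compared. The block below is a rough sketch of one way to do this, again using only the five rows shown above. The month-to-quarter mapping is our own simplifying assumption (academic quarters do not line up exactly with calendar months), and `count_by_quarter` and its `roads` parameter are hypothetical names. Which roads count as "near UCSD" is deliberately left as an input.

```python
import pandas as pd

# The five rows displayed above, re-entered so the sketch runs without downloading the file.
sample = pd.DataFrame({
    'date_time': pd.to_datetime([
        '2015-01-14 20:00:00', '2015-03-19 12:00:00', '2015-03-24 03:05:00',
        '2015-03-27 23:56:00', '2015-07-06 11:45:00',
    ]),
    'address_road_primary': ['JUNIPER', 'LINDA VISTA', 'WASHINGTON', 'WORDEN', 'EL CAJON'],
    'injured': [0, 0, 2, 1, 0],
    'killed': [0, 0, 0, 0, 0],
})

# ASSUMPTION: calendar months only approximate the academic quarters.
MONTH_TO_QUARTER = {
    1: 'Winter', 2: 'Winter', 3: 'Winter',
    4: 'Spring', 5: 'Spring', 6: 'Spring',
    7: 'Summer', 8: 'Summer', 9: 'Summer',
    10: 'Fall', 11: 'Fall', 12: 'Fall',
}

def count_by_quarter(df, roads=None):
    """Count collisions and injuries per (Year, Quarter), optionally keeping only the given roads."""
    kept = df if roads is None else df[df['address_road_primary'].isin(roads)]
    labeled = kept.assign(
        Year=kept['date_time'].dt.year,
        Quarter=kept['date_time'].dt.month.map(MONTH_TO_QUARTER),
    )
    return (labeled.groupby(['Year', 'Quarter'])
                   .agg(collisions=('date_time', 'size'), injured=('injured', 'sum'))
                   .reset_index())

quarterly = count_by_quarter(sample)
```

The resulting table shares the `Year` and `Quarter` columns with the parking data, so the two could then be joined with something like `parking_data.merge(quarterly, on=['Year', 'Quarter'])`.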


# Ethics & Privacy

- Thoughtful discussion of ethical concerns included
- Ethical concerns consider the whole data science process (question asked, data collected, data being used, the bias in data, analysis, post-analysis, etc.)
- How your group handled bias/ethical concerns clearly described

Acknowledge and address any ethics & privacy related issues of your question(s), proposed dataset(s), and/or analyses. Use the information provided in lecture to guide your group discussion and thinking. If you need further guidance, check out [Deon's Ethics Checklist](http://deon.drivendata.org/#data-science-ethics-checklist). In particular:

- Are there any biases/privacy/terms of use issues with the data you proposed?
- Are there potential biases in your dataset(s), in terms of who it composes, and how it was collected, that may be problematic in terms of it allowing for equitable analysis? (For example, does your data exclude particular populations, or is it likely to reflect particular human biases in a way that could be a problem?)
- How will you set out to detect these specific biases before, during, and after/when communicating your analysis?
- Are there any other issues related to your topic area, data, and/or analyses that are potentially problematic in terms of data privacy and equitable impact?
- How will you handle issues you identified?

# Team Expectations 

1. Communicate through group messages when we are making changes to the project (on GitHub documents).
2. Meet weekly at a time we all agree on.
3. Equitable contribution
    - Each team member works through their portion(s) of the project equally
    - If issues arise, communicate sooner rather than later
    - Ask another team member for help/advice if you run into any issues

# Project Timeline Proposal

| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
|  10/23  |  5 PM  | Read & think about previous projects given to review; brainstorm topics/questions  | Determine best form of communication; Complete project review; Discuss possible final project topics |
|  10/30  |  5 PM  |  Brainstorm final project topics | Decide final project topic and split up work for project proposal |
|  11/1  |  5 PM  | Work on project proposal | Finalize and submit final project proposal |
|  11/11  |  6 PM  | Have project proposal submitted; Search for datasets  | Discuss wrangling and possible analytical approaches; Assign group members to lead each specific part |
|  11/13  |  5 PM  | Import & wrangle data; EDA | Finalize data checkpoint; Review/edit wrangling/EDA; Discuss analysis plan |
|  11/20  |  5 PM  | Finalize wrangling/EDA; Begin analysis | Discuss/edit analysis; Complete project check-in |
|  11/27  |  5 PM  | Complete analysis; Draft results/conclusion/discussion | Finalize EDA checkpoint; Discuss/edit full project |
|  12/4  |  5 PM  | NA | Record the final project video |
