# Final Project

*Your name here*

### About this template

This file is a **template** for filling out and submitting your final project. As such, we've created sub-sections along the lines of what we'd like to see. Your job is to **fill out** these sections, using the dataset and research question of your choice.

### Things to be aware of

- Your project likely depends on **data**. Make sure any dataset you will analyze is *stored in your DataHub directory*, so you can submit it along with your project.  
- Each of these sections will be assigned a point score. Make sure you add code cells in the relevant section, as needed.
- The final project should be completed independently.

## Introduction (2 pts.)

Questions to answer:

1. What dataset are you looking at? 
2. Where/how was it created? 
3. What research question(s) will you be asking? 

These should be answered in Markdown.

### Dataset #1
- Dataset Name: Downtown San Diego Unsheltered Count
- Link to the dataset: [downtownsandiego.org](https://downtownsandiego.org/clean-and-safe/unhoused-care/) [(pdf)](https://downtownsandiego.org/wp-content/uploads/2024/03/February-2024-Unsheltered-Count-w-Maps.pdf)
- Number of observations: 146 (12 months * 12 years + 2 months)
- Number of variables: 2 (date, count)

This dataset contains monthly counts of unsheltered people, conducted by the Downtown San Diego Partnership using its Clean & Safe Program methodology. The `.pdf` was converted to `.csv` by hand.
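The observation count of 146 can be checked by enumerating the months covered. This is a minimal sketch that assumes the series starts in January 2012 (the start of the precipitation record) and ends in February 2024 (the month named in the PDF's title); both endpoints are assumptions, not stated in the source.

```python
import pandas as pd

# ASSUMED endpoints: January 2012 through February 2024, both inclusive.
months = pd.period_range(start="2012-01", end="2024-02", freq="M")

# 12 full years (2012-2023) plus January and February 2024
n_months = len(months)
print(n_months)  # 146
```

Running the same call with the precipitation dataset's endpoints is a quick way to confirm that both series cover the same months before they are merged.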

### Dataset #2
- Dataset Name: Downtown San Diego Precipitation
- Link to the dataset: generated from [scacis.rcc-acis.org](https://scacis.rcc-acis.org/)
- Number of observations: 146 (12 months * 12 years + 2 months)
- Number of variables: 2 (date, precipitation)

This dataset contains the total monthly precipitation (in inches) from January 2012 to March 2024, measured at the San Diego International Airport weather station. These readings serve as a proxy for precipitation across the rest of Downtown San Diego.

### Research Question
How closely has the monthly unsheltered count in Downtown San Diego correlated with monthly precipitation?
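A minimal sketch of how "how closely" could be quantified, assuming the two datasets have already been merged into one DataFrame with one row per month and hypothetical columns `count` and `precipitation` (the function name `correlate` is also hypothetical). Pearson's r measures linear association; Spearman's rho is shown alongside because a monthly count need not respond linearly to rainfall.

```python
import pandas as pd

def correlate(merged: pd.DataFrame, x: str = "precipitation", y: str = "count") -> dict:
    """Pearson and Spearman correlation between two columns of `merged`.

    The default column names are hypothetical. Months missing either value
    are dropped first, so both coefficients use the same rows.
    """
    pair = merged[[x, y]].dropna()
    return {
        "n": len(pair),
        # Pearson's r: strength of the linear relationship
        "pearson": pair[x].corr(pair[y]),
        # Spearman's rho: Pearson's r computed on the ranks (ties get average ranks)
        "spearman": pair[x].rank().corr(pair[y].rank()),
    }
```

Both coefficients range from -1 to 1; values near 0 would suggest little monthly association between rainfall and the count.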

## Data (3 pts.)

This section should contain **descriptive statistics** about your data. This includes (but is not limited to):

1. Overall `shape` of the data. 
2. Summary statistics, e.g., central tendency, variability, of key **features** (i.e., columns).
3. Histograms / count-plots of key features (i.e., columns). 
4. Information about missing values, if relevant.  
5. Information about **merging** datasets, if relevant.

These should be **answered** using Python code (but can be written in Markdown if you prefer).
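The descriptive statistics listed above could be gathered with a helper along these lines. This is a minimal sketch that assumes the two datasets have already been combined into one long-format DataFrame (for example, the output of the `fetchMeltMerge` helper in the next cell) whose measured columns are named `count` and `precipitation`; those column names and the function name `describe_merged` are hypothetical.

```python
import pandas as pd

def describe_merged(merged: pd.DataFrame, value_cols=("count", "precipitation")) -> dict:
    """Collect shape, summary statistics, and missing-value information.

    `value_cols` are hypothetical names for the two measured columns.
    """
    cols = list(value_cols)
    return {
        # (rows, columns) of the merged table
        "shape": merged.shape,
        # count, mean, std, min, quartiles, and max for each measured column
        "summary": merged[cols].describe(),
        # missing values per column; an outer merge leaves NaN wherever a
        # month appears in only one of the source datasets
        "missing": merged[cols].isna().sum(),
        # months that have both a count and a precipitation value
        "complete_rows": int(merged[cols].notna().all(axis=1).sum()),
    }
```

Printing `describe_merged(merged)["summary"]` and `["missing"]` covers items 1, 2, 4, and 5 of the list above. Histograms of the same columns can be drawn with pandas' `DataFrame.hist` (which requires matplotlib), e.g. `merged[["count", "precipitation"]].hist()`.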

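```python
# imports
import pandas as pd

# helpers
def fetchMeltMerge(csv, id, var, value, data=None):
    """Read a wide CSV, melt it to long format, and outer-merge it into `data`.

    csv   -- path to a wide-format CSV (one identifier column plus value columns)
    id    -- name of the identifier column kept as-is (passed to `id_vars`)
    var   -- name for the new column holding the former column headers
    value -- name for the new column holding the cell values
    data  -- long-format DataFrame to merge into; None or empty starts fresh
    """
    # skipinitialspace strips blanks after commas in the hand-made CSVs
    newData = pd.read_csv(csv, skipinitialspace=True)
    # wide -> long: one row per (id, var) pair
    newMelt = pd.melt(newData, id_vars=[id], var_name=var, value_name=value)

    if data is None or data.empty:
        return newMelt
    # outer join keeps rows present in only one dataset (the other value becomes NaN)
    return pd.merge(data, newMelt, on=[id, var], how='outer')
```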


## Visualizations (4 pts.)

This section should contain:

- **2-3 graphs** showing specific patterns or features you'd like to highlight. 
- Each visualization should be accompanied by a **short (1-2 sentences) description** of what you think it shows.

These should be **produced** using Python code (but the descriptions can be written in Markdown if you prefer).
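One way to produce two of the requested graphs is sketched below, assuming a merged DataFrame with hypothetical columns `date` (anything `pd.to_datetime` can parse), `count`, and `precipitation`; the function name `plot_count_and_rain` is also hypothetical. The left panel overlays both monthly series on twin y-axes, and the right panel plots one point per month.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_count_and_rain(merged, date_col="date", count_col="count", rain_col="precipitation"):
    """Draw both series over time (twin y-axes) and a count-vs-rain scatter.

    Column names are hypothetical; adjust them to match the merged DataFrame.
    """
    df = merged.assign(**{date_col: pd.to_datetime(merged[date_col])}).sort_values(date_col)

    fig, (ax_time, ax_scatter) = plt.subplots(1, 2, figsize=(12, 4))

    # Panel 1: unsheltered count (left axis) and precipitation (right axis) by month
    ax_time.plot(df[date_col], df[count_col], color="tab:blue")
    ax_time.set_ylabel("unsheltered count", color="tab:blue")
    ax_rain = ax_time.twinx()
    ax_rain.plot(df[date_col], df[rain_col], color="tab:gray", linestyle="--")
    ax_rain.set_ylabel("precipitation (in)", color="tab:gray")
    ax_time.set_title("Monthly count and precipitation")

    # Panel 2: one point per month
    ax_scatter.scatter(df[rain_col], df[count_col], s=12)
    ax_scatter.set_xlabel("precipitation (in)")
    ax_scatter.set_ylabel("unsheltered count")
    ax_scatter.set_title("Count vs. precipitation")

    fig.tight_layout()
    return fig
```

Twin axes keep each series on its own scale, since counts and inches of rain differ by orders of magnitude; the scatter panel shows the same relationship without the time dimension.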

```python
# YOUR CODE HERE
raise NotImplementedError()
```

## Analyses (4 pts.)

This section should contain:

- **2-3 analyses** using methods discussed in class (e.g., linear regression, logistic regression, etc.) to address your question.
- Each analysis should be accompanied by a short (1-3 sentences) **interpretation**. 
- Should also include **evaluation** of your model somehow, e.g., $R^2$, AIC, etc. 

These should be **produced** using Python code (but the interpretations can be written in Markdown if you prefer).
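A minimal sketch of one analysis with an evaluation metric: an ordinary least squares fit of the count on precipitation, with $R^2$ computed by hand. It assumes the same hypothetical `count` and `precipitation` columns, and `fit_linear` is a hypothetical name; only NumPy and pandas are used.

```python
import numpy as np
import pandas as pd

def fit_linear(merged: pd.DataFrame, x: str = "precipitation", y: str = "count") -> dict:
    """OLS fit of y = intercept + slope * x, returning coefficients and R^2.

    Column names are hypothetical. Rows missing either value are dropped.
    """
    pair = merged[[x, y]].dropna()
    xs = pair[x].to_numpy(dtype=float)
    ys = pair[y].to_numpy(dtype=float)

    # np.polyfit with deg=1 returns [slope, intercept]
    slope, intercept = np.polyfit(xs, ys, deg=1)
    fitted = intercept + slope * xs
    residuals = ys - fitted

    # R^2 = 1 - SS_res / SS_tot
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((ys - ys.mean()) ** 2))
    r_squared = 1.0 - ss_res / ss_tot

    return {"n": len(pair), "intercept": intercept, "slope": slope,
            "r_squared": r_squared, "fitted": fitted, "residuals": residuals}
```

For a simple linear regression like this one, $R^2$ equals the square of Pearson's r between the two columns. If statsmodels is available, `statsmodels.formula.api.ols("count ~ precipitation", data=merged).fit()` fits the same model and also reports `.rsquared` and `.aic`.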

```python
# YOUR CODE HERE
raise NotImplementedError()
```

## Limitations and Ethical Issues (3 pts.)

This section should contain a discussion of any **limitations** to your analysis, as well as any **ethical issues**, if relevant.

- Limitations could range from issues in the data (e.g., poor generalizability, biased sample) to the assumptions of the analysis (e.g., homoscedasticity vs. heteroscedasticity), and so on.
- Ethical issues should focus on concepts covered in class, e.g., relating to bias and/or privacy.  
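The written answer for this section stays in Markdown, but the homoscedasticity assumption mentioned above can be checked before writing it. The sketch below implements Koenker's studentized Breusch-Pagan test for a single regressor: regress the squared residuals on the regressor and compare $LM = nR^2$ to a $\chi^2_1$ distribution. The function name `breusch_pagan_1d` is hypothetical; it expects the regressor values and the residuals from any fitted line.

```python
import math
import numpy as np

def breusch_pagan_1d(x, residuals):
    """Koenker's studentized Breusch-Pagan test with one regressor.

    Returns (LM statistic, p-value). A small p-value suggests the residual
    variance changes with x, i.e. heteroscedasticity.
    """
    x = np.asarray(x, dtype=float)
    e2 = np.asarray(residuals, dtype=float) ** 2

    ss_tot = float(np.sum((e2 - e2.mean()) ** 2))
    if ss_tot == 0.0:
        # constant squared residuals: no variation left to explain
        return 0.0, 1.0

    # auxiliary regression of squared residuals on x
    slope, intercept = np.polyfit(x, e2, deg=1)
    ss_res = float(np.sum((e2 - (intercept + slope * x)) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    lm = max(len(x) * r2, 0.0)  # guard against tiny negative rounding error
    # with 1 degree of freedom, P(chi2 > lm) = erfc(sqrt(lm / 2))
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value
```

statsmodels provides the same test as `het_breuschpagan` in `statsmodels.stats.diagnostic`; the hand-rolled version above only avoids the extra dependency.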

These should be answered in Markdown.

YOUR ANSWER HERE

## Conclusion (1 pt.)

Draw a conclusion about the dataset and the questions you posed.

These should be answered in Markdown.

YOUR ANSWER HERE