# <a id='toc1_'></a>[Data Analysis](#toc0_)

Name
Topic
email
June 4th, 2023


**Table of contents**<a id='toc0_'></a>    
- [Statistics & Public Health 2: Data Analysis](#toc1_)    
- [1. Introduction](#toc2_)    
  - [1.1. Key Questions](#toc2_1_)    
- [2. Setup and Data Collection](#toc3_)    
- [3. Methods and Assumptions](#toc4_)    
- [4. Part 1 - Basic Analysis](#toc5_)    
  - [4.1. Data Preparation](#toc5_1_)    
  - [4.2. Exploratory Data Analysis](#toc5_2_)    
- [5. Part 2 - Statistical Analysis](#toc6_)    
  - [5.1. Hypothesis Testing](#toc6_1_)    
  - [5.2. Correlation Analysis](#toc6_2_)    
- [6. Part 3 - Advanced Statistical Analysis](#toc7_)    
  - [6.1. Linear Regression](#toc7_1_)    
  - [6.2. Logistic Regression](#toc7_2_)    
- [7. Key Findings](#toc8_1_)    
- [8. Recommendations](#toc8_2_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc2_'></a>[1. Introduction](#toc0_)

In this project, we analyze how different variables relate to the number of mosquitoes caught, and how they relate to the probability of finding West Nile Virus (WNV) at a particular time and location.


## <a id='toc2_1_'></a>[1.1. Key Questions](#toc0_)


# <a id='toc3_'></a>[2. Setup and Data Collection](#toc0_)

We use the cleaned mosquito tracking data from the city of Chicago, Illinois, covering 2008 to 2019, provided [here](link_to_dataset). This section imports the libraries needed for the analysis and loads the data.


```python
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
import re

# Initialize styling parameters
plt.rcParams['figure.figsize'] = (8.0, 6.0)  # default figure size in inches

# Load the data

# Display the first few rows of the dataframe
```

| Unnamed: 0 | Year | Week | Address Block | Trap | Trap type | Date | Mosquito number | WNV Present | Species | Lat | Lon | Month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019 | 39 | 100XX W OHARE AIRPORT | T910 | GRAVID | 2019-09-26 00:09:00 | 2 | negative | CULEX RESTUANS | 41.977738 | -87.880969 | 9 |
| 1 | 2019 | 39 | 52XX S KOLMAR AVE | T114 | GRAVID | 2019-09-26 00:09:00 | 1 | negative | CULEX RESTUANS | 41.798211 | -87.736925 | 9 |
| 2 | 2019 | 39 | 58XX N WESTERN AVE | T028 | GRAVID | 2019-09-26 00:09:00 | 2 | negative | CULEX RESTUANS | 41.987245 | -87.689417 | 9 |
| 3 | 2019 | 39 | 39XX N SPRINGFIELD AVE | T228 | GRAVID | 2019-09-26 00:09:00 | 1 | negative | CULEX RESTUANS | 41.953664 | -87.724987 | 9 |
| 4 | 2019 | 39 | 131XX S BRANDON AVE | T209 | GRAVID | 2019-09-26 00:09:00 | 9 | negative | CULEX RESTUANS | 41.657069 | -87.546049 | 9 |
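
The loading step might look like the following minimal sketch. The real file path is not given in this notebook, so the five preview rows are embedded as text to keep the example runnable on its own. The empty first header field is an assumption about why an `Unnamed: 0` column appears (a saved index); `index_col=0` and `parse_dates` are standard `pandas.read_csv` options.

```python
import io

import pandas as pd

# Assumption: the CSV was saved with its index, which leaves an empty first
# header field. The rows below are copied from the preview; swap the
# StringIO object for the real file path when it is known.
sample_csv = """,Year,Week,Address Block,Trap,Trap type,Date,Mosquito number,WNV Present,Species,Lat,Lon,Month
0,2019,39,100XX W OHARE AIRPORT,T910,GRAVID,2019-09-26 00:09:00,2,negative,CULEX RESTUANS,41.977738,-87.880969,9
1,2019,39,52XX S KOLMAR AVE,T114,GRAVID,2019-09-26 00:09:00,1,negative,CULEX RESTUANS,41.798211,-87.736925,9
2,2019,39,58XX N WESTERN AVE,T028,GRAVID,2019-09-26 00:09:00,2,negative,CULEX RESTUANS,41.987245,-87.689417,9
3,2019,39,39XX N SPRINGFIELD AVE,T228,GRAVID,2019-09-26 00:09:00,1,negative,CULEX RESTUANS,41.953664,-87.724987,9
4,2019,39,131XX S BRANDON AVE,T209,GRAVID,2019-09-26 00:09:00,9,negative,CULEX RESTUANS,41.657069,-87.546049,9
"""

# index_col=0 reuses the saved index instead of creating 'Unnamed: 0';
# parse_dates converts the 'Date' strings to datetime values.
df = pd.read_csv(io.StringIO(sample_csv), index_col=0, parse_dates=["Date"])

print(df.head())
print(df.dtypes)
```

Checking `df.dtypes` right after loading is a cheap way to confirm that `Date` was parsed and that the count columns are numeric.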



# <a id='toc4_'></a>[3. Methods and Assumptions](#toc0_)

This section will describe the methods and techniques used in the analysis.



# <a id='toc5_'></a>[4. Part 1 - Basic Analysis](#toc0_)

In this section, we prepare the data for modeling and explore monthly trends in mosquito counts.



## <a id='toc5_1_'></a>[4.1. Data Preparation](#toc0_)

- Convert the 'WNV Present' column into a binary column
- Create dummy variables from the 'Trap type' column
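
The two preparation steps above can be sketched as follows. This is an illustration on toy rows, not the real data: the `'positive'` label and the `'CDC'` trap type are assumptions, since the preview shows only `'negative'` and `'GRAVID'`.

```python
import pandas as pd

# Toy rows for illustration only. 'positive' and 'CDC' are assumed values.
df = pd.DataFrame({
    "Trap type": ["GRAVID", "CDC", "GRAVID", "CDC"],
    "WNV Present": ["negative", "positive", "negative", "negative"],
    "Mosquito number": [2, 15, 1, 4],
})

# Binary target: 1 where the virus was found, 0 otherwise.
df["WNV Present"] = (df["WNV Present"] == "positive").astype(int)

# One indicator column per trap type. drop_first=True keeps k-1 indicators
# so they are not perfectly collinear with a regression intercept.
df = pd.get_dummies(df, columns=["Trap type"], drop_first=True, dtype=int)

print(df)
```

Before mapping the real column, it is worth inspecting `df["WNV Present"].unique()` to confirm the actual labels.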



## <a id='toc5_2_'></a>[4.2. Exploratory Data Analysis](#toc0_)

- Calculate the average number of mosquitoes for each month
- Identify any noticeable trends
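
A minimal sketch of the monthly average, using made-up counts so that it runs without the dataset; only the column names `Month` and `Mosquito number` come from the preview.

```python
import pandas as pd

# Made-up counts for illustration only; they say nothing about the real trend.
df = pd.DataFrame({
    "Month": [6, 6, 7, 7, 8, 8, 9],
    "Mosquito number": [2, 4, 10, 14, 20, 30, 3],
})

# Average number of mosquitoes caught per trap reading, by month.
monthly_mean = df.groupby("Month")["Mosquito number"].mean()
print(monthly_mean)

# Month with the highest average in this toy sample.
peak_month = monthly_mean.idxmax()
print("Peak month:", peak_month)
```

On the real data, `monthly_mean.plot(kind="bar")` gives a quick visual check for seasonal patterns.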



# <a id='toc6_'></a>[5. Part 2 - Statistical Analysis](#toc0_)

In this section, we test whether WNV occurrence differs between mosquito species and examine which columns correlate with the number of mosquitoes caught.



## <a id='toc6_1_'></a>[5.1. Hypothesis Testing](#toc0_)

- Test for a statistically significant difference between the different mosquito species when looking at the occurrence of West Nile Virus
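
One common choice for comparing a categorical variable against a binary outcome is a chi-squared test of independence. The sketch below assumes that approach; the source does not prescribe a specific test. The counts are invented, and `CULEX PIPIENS` is an assumed second species label.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Invented counts for illustration only. 'WNV Present' is assumed to be
# already encoded as 0/1.
df = pd.DataFrame({
    "Species": ["CULEX RESTUANS"] * 40 + ["CULEX PIPIENS"] * 40,
    "WNV Present": [0] * 35 + [1] * 5 + [0] * 20 + [1] * 20,
})

# Contingency table: rows are species, columns are WNV absent/present.
table = pd.crosstab(df["Species"], df["WNV Present"])
print(table)

# H0: WNV occurrence is independent of species.
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

A small p-value would lead to rejecting independence. The `expected` array is worth checking too, since the chi-squared approximation is unreliable when expected counts are very small.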



## <a id='toc6_2_'></a>[5.2. Correlation Analysis](#toc0_)

- Identify columns that are positively correlated with the number of mosquitoes caught
- Identify columns that are negatively correlated
- Test if these correlations are statistically significant
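
The three steps above can be sketched with Pearson correlations and their p-values. The values below are invented so the example is self-contained, and Pearson's r is an assumed choice; a rank-based coefficient may suit skewed count data better.

```python
import pandas as pd
from scipy.stats import pearsonr

# Invented values for illustration only.
df = pd.DataFrame({
    "Mosquito number": [1, 3, 4, 8, 9, 12],
    "Week": [22, 25, 28, 30, 33, 35],
    "Lat": [41.99, 41.95, 41.90, 41.80, 41.75, 41.66],
})

target = "Mosquito number"
numeric = df.select_dtypes("number")

# Correlation of every numeric column with the target.
correlations = numeric.corr()[target].drop(target)

# pearsonr returns the coefficient and a two-sided p-value for H0: r = 0.
results = {}
for column in correlations.index:
    r, p_value = pearsonr(numeric[column], numeric[target])
    results[column] = (r, p_value)
    print(f"{column}: r = {r:.3f}, p = {p_value:.4f}")
```

Columns with `r > 0` are positively correlated with the target and those with `r < 0` negatively; the p-value indicates whether the correlation is statistically distinguishable from zero.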



# <a id='toc7_'></a>[6. Part 3 - Advanced Statistical Analysis](#toc0_)

In this section, we fit linear and logistic regression models to the data.



## <a id='toc7_1_'></a>[6.1. Linear Regression](#toc0_)

- Run a linear regression to determine how the independent variables affect the number of mosquitoes caught
- Explain the model construction process
- Analyze the model and the results
- Discuss the model’s limitations

Note: You will likely see a low R^2 value; that is to be expected. This dataset does not respond well to VIF analysis, so it is not required. 'WNV Present' must not be one of your independent variables.



## <a id='toc7_2_'></a>[6.2. Logistic Regression](#toc0_)

- Run a logistic regression to determine how the independent variables affect West Nile Virus presence
- Explain the model construction process
- Analyze the model and the results
- Discuss the model’s limitations

Note: 'Mosquito number' should be one of your independent variables.


# <a id='toc8_1_'></a>[7. Key Findings](#toc0_)

Summarize the findings from the analyses.



# <a id='toc8_2_'></a>[8. Recommendations](#toc0_)

