# COVID-19 and Temperature

As we approach the summer months, will the rate of COVID-19 infections in humans increase or decrease? I aim to find a correlation between the infection rate and temperature. To do this, I will gather the following monthly data for various US counties:

- Total number of infections
- Total number of deaths
- Average temperature
- Population density

I drilled the data down to the county and month to reduce sources of error, such as temperature differences from month to month and from county to county. This level of specificity should be sufficient, since temperature within a single county is generally uniform: latitude, longitude, and biome vary little inside one. I also restricted the data to the United States because it spans a wide range of temperatures and biomes, and because testing in the US was insufficient nationwide. Including other countries, where testing coverage differed, would skew the data.

I also included population density because it strongly affects the spread of COVID-19. Leaving it out could distort the results because county densities vary greatly (e.g., LA and Sonomish).
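
The county-and-month granularity described above can be sketched as a reshape from a wide table (one column per day) to one row per county and month. This is only an illustration: the `wide` frame, its county names, and its counts are made-up stand-ins, and the sketch assumes the daily columns hold cumulative counts, so the last reading in a month is taken as that month's running total.

```python
import pandas

# Hypothetical wide table: one row per county, one column per day (values are made up).
wide = pandas.DataFrame({
    "county": ["A County", "B County"],
    "2020-03-30": [10, 2],
    "2020-03-31": [12, 3],
    "2020-04-01": [15, 3],
    "2020-04-02": [20, 5],
})

# Reshape to one row per (county, day).
long_form = wide.melt(id_vars="county", var_name="date", value_name="cases")
long_form["date"] = pandas.to_datetime(long_form["date"])
long_form["month"] = long_form["date"].dt.to_period("M")

# Assumption: counts are cumulative, so keep the last reading of each month.
monthly = (
    long_form.sort_values("date")
    .groupby(["county", "month"])["cases"]
    .last()
    .reset_index()
)
print(monthly)
```

If the counts were daily new cases instead, `.sum()` would replace `.last()`.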

USAFacts Google Dataset: https://console.cloud.google.com/marketplace/details/bigquery-public-datasets/covid19-dataset-list?filter=solution-type:dataset&filter=category:covid19&id=4a850823-3f83-48f5-92d1-01ba6f8ed81e

```SELECT * FROM `bigquery-public-data.covid19_usafacts.confirmed_cases` LIMIT 1000```

EPA Historical Air Quality: 


In [117]:
import pandas

# covid_data = pandas.read_csv('./covid-confirmed-cases.csv')

# # Select only US counties
# us_covid = covid_data.iloc[211:401]

# # Drop unneeded data
# us_covid.drop(covid_data.columns[1:14], axis=1, inplace=True)

# us_covid.to_csv('covid-clean.csv')

covid_data = pandas.read_csv('./covid-confirmed-usafacts.csv')

# Remove FIPS Code & January
covid_data.drop(covid_data.columns[3:14], axis=1, inplace=True)

# Remove May
covid_data.drop(covid_data.columns[93:], axis=1, inplace=True)

# Remove unneeded cities
covid_data.drop(covid_data.index[108:], inplace=True)

# Remove counties with no temperature data (listed by row position)
drop = [13,18,21,26,32,33,35,36,37,38,39,42,43,44,45,46,47,50,58,59,60,61,62,66,69,70,71,73,74,78,82,84,85,86,87,88,89,93,94,97,98,106]

# Drop from the highest position down so the remaining positions stay valid
for position in reversed(drop):
    covid_data.drop(covid_data.index[position], inplace=True)

# Save
covid_data.to_csv('./covid-clean.csv')
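
The positional row removal in the cell above could also be written as a single `drop` call. The sketch below checks, on a small made-up frame, that dropping one position at a time in reverse order and dropping all positions at once leave the same rows. The `frame` and `positions` names and values are hypothetical.

```python
import pandas

# Made-up frame and row positions, standing in for covid_data and the drop list.
frame = pandas.DataFrame({"county": list("abcdef")})
positions = [1, 3, 4]

# One position at a time, highest first, so earlier positions are not shifted.
one_by_one = frame.copy()
for position in reversed(positions):
    one_by_one.drop(one_by_one.index[position], inplace=True)

# All positions in a single call: look up the labels first, then drop them together.
all_at_once = frame.drop(frame.index[positions])

print(all_at_once["county"].tolist())
```

The single call avoids the ordering concern entirely, because every label is resolved before any row is removed.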

In [118]:
infections = pandas.read_csv('./covid-clean.csv')
temperature = pandas.read_csv('./covid-temperature.csv')

temp_infections = {}

# Sum infections over every (row, column) cell that shares the same temperature reading
for i in range(0, 65):
    for j in range(4, 93):
        temp = float(temperature.values[i][j])
        cases = float(infections.values[i][j])
        temp_infections[temp] = temp_infections.get(temp, 0) + cases
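
The nested loop above pairs each temperature reading with the infection count in the same cell and totals the counts per temperature. A possible vectorized version of that idea is sketched below with `groupby`. The `temps` and `cases` arrays are small made-up stand-ins for the two tables, not the real data.

```python
import numpy
import pandas

# Made-up, aligned grids: temps[i][j] is the temperature for the count in cases[i][j].
temps = numpy.array([[60.4, 61.0], [60.4, numpy.nan]])
cases = numpy.array([[5.0, 7.0], [3.0, 2.0]])

# Flatten both grids into one row per cell, then total infections per temperature.
pairs = pandas.DataFrame({
    "temperature": temps.ravel(),
    "infections": cases.ravel(),
})
totals = pairs.groupby("temperature", as_index=False)["infections"].sum()
print(totals)
```

By default `groupby` leaves out rows whose key is NaN, so cells with a missing temperature are excluded from the totals here.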

In [119]:
print(temp_infections)


{65.4: 684.0, 60.4: 3765.0, 61.0: 3014.0, 53.9: 3597.0, 51.6: 1175.0, 55.0: 3730.0, 56.4: 4247.0, 59.2: 1579.0, 58.8: 3225.0, 56.3: 2170.0, 58.5: 1532.0, 57.3: 1914.0, 56.6: 381.0, 57.6: 3351.0, 57.5: 3739.0, 59.3: 2064.0, 61.4: 815.0, 61.3: 3085.0, 60.6: 1646.0, 61.8: 6718.0, 61.2: 1643.0, 59.9: 5395.0, 62.5: 1227.0, 65.8: 3210.0, 68.1: 1041.0, 67.3: 1219.0, nan: 2.0, 59.6: 3.0, 59.0: 2298.0, 61.5: 2904.0, 61.6: 1725.0, 62.1: 3547.0, 61.9: 689.0, 62.0: 941.0, 65.3: 9426.0, 64.4: 851.0, 61.7: 407.0, 58.4: 1702.0, 59.4: 2482.0, 60.5: 2960.0, 62.4: 9112.0, 62.6: 683.0, 56.8: 8953.0, 60.3: 1087.0, 62.3: 923.0, 62.2: 2249.0, 60.7: 2778.0, 58.7: 1744.0, 59.5: 3453.0, 63.4: 3423.0, 65.9: 2028.0, 63.9: 7862.0, 62.7: 8842.0, 63.1: 2520.0, 64.3: 2926.0, 64.7: 5043.0, 67.1: 6840.0, 70.4: 3842.0, 68.8: 4650.0, 68.7: 5874.0, 68.2: 3579.0, 68.5: 4336.0, 68.0: 4727.0, 48.2: 8533.0, 50.4: 1835.0, 55.5: 822.0, 66.2: 466.0, 67.2: 2006.0, 42.8: 5290.0, 55.3: 4959.0, 68.4: 1458.0, 69.8: 1202.0, 67.9: 688

In [124]:
import math

# Build a two-column table, skipping entries whose temperature key is NaN
df = pandas.DataFrame(
    [[temp, total] for temp, total in temp_infections.items() if not math.isnan(temp)],
    columns=['temperature', 'infections'],
)

df.to_csv('temp-vs-infections.csv', index=False)
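
With a `temperature` and an `infections` column in hand, one way to measure the relationship this notebook is after is a Pearson correlation coefficient. The sketch below is a minimal example on made-up numbers. The helper name `temperature_correlation` is hypothetical, and the toy values are chosen only to show the call, not to suggest a result.

```python
import pandas

def temperature_correlation(frame: pandas.DataFrame) -> float:
    """Pearson correlation between the temperature and infections columns."""
    return frame["temperature"].corr(frame["infections"])

# Made-up values that fall on a straight, downward-sloping line.
toy = pandas.DataFrame({
    "temperature": [50.0, 60.0, 70.0],
    "infections": [30.0, 20.0, 10.0],
})
r = temperature_correlation(toy)
print(r)

# With the file written above, the call would look like:
# temperature_correlation(pandas.read_csv('temp-vs-infections.csv'))
```

A value near -1 or 1 indicates a strong linear relationship and a value near 0 a weak one. It does not account for population density, which would need to be handled separately.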

In [121]:
# for i in range(0, 66):
#     print(f'{i} - {infections.values[i][2]} - {temperature.values[i][2]}')

In [122]:
# Scratch check: write a one-row DataFrame to CSV
u = {"o": 4, "k": '2'}
pandas.DataFrame(u, columns=u.keys(), index=[0]).to_csv('test.csv')