# Sun radiation and its impact on COVID-19 spread and fatality rate

 
**Executive Summary**
> 2X to 3X higher mortality rates are seen where and when Sun radiation is low. The analysis, done with data across 173 countries and more than 330 days of the COVID-19 pandemic, shows that when we eliminate other country-specific factors and normalize by country population, Sun radiation has a clear impact on the COVID-19 mortality rate. The same analysis on confirmed cases does not show this effect: COVID-19 spreads almost equally regardless of the level of Sun radiation, but the death rate is substantially different. The higher the Sun radiation, the lower the COVID-19 mortality rate. Sun radiation has a direct influence on the body's Vitamin D level, which is a well-known driver of the strength of our immune system and of its ability to fight any infection, including COVID-19. While the conclusion may seem logical a priori, this analysis provides a clear quantification of the impact that Sun exposure has on the development of this pandemic, and of how simple recommendations to the population to increase Sun exposure (at zero cost) would have a substantial impact on fatality rates (2X to 3X lower).
    
---

The hypothesis we would like to evaluate is whether Sun radiation has had any impact on COVID-19 spread and fatality rates. The rationale is that COVID-19 severity (which directly impacts the spread rate) is highly dependent on each individual's immune system. Sun exposure is a direct driver of Vitamin D production by the skin, and has a direct impact on the health of the immune system.
Our intuition is that in regions and periods where Sun radiation exposure has been stronger, the general population's immune system should have been stronger, resulting in lower rates of spread and fatality. If that is the case, we could also infer that by recommending people to increase their Sun exposure, we could influence the evolution of the pandemic.
For this analysis we have used publicly available datasets, including:
- COVID-19 absolute and cumulative confirmed cases, deaths and recoveries: https://covid19tracking.narrativa.com
- UV radiation dataset from Kaggle: https://www.kaggle.com/juanjodd/uv-biologically-effective-dose-from-cams
- Population by country: https://www.kaggle.com/tanuprabhu/population-by-country-2020

The study steps are as follows:
- Produce a view per country, per day, of:
    - Incremental deaths, confirmed cases and recovered cases, computed from the cumulative totals.
    - Weekly (7-day) rolling averages of these values, since COVID-19 data is very noisy.
    - Values normalized to the country population, so we obtain rates per million inhabitants.
    - Values normalized to each country's maximum, to eliminate country-specific variables (general health, diet, genetics, ...), since we want to evaluate changes related only to UV radiation.
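
The steps above can be illustrated on a tiny made-up dataset. This is a minimal sketch, assuming two hypothetical countries "A" and "B" with invented cumulative death counts; it follows the order used in `df_transform` (daily differences, clipping of negative corrections, 7-day average, per-million scaling, then division by each country's maximum). It also makes one property of that order visible: since the per-million step and the max step both divide a whole country row by a constant, the population scaling cancels out of the final normalized values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: cumulative deaths for two countries (rows) over 10 days (columns).
# Country "B" has exactly 10x the deaths and 10x the population of country "A".
dates = pd.date_range("2020-03-01", periods=10).strftime("%Y-%m-%d")
cumulative = pd.DataFrame(
    [[0, 1, 3, 6, 10, 15, 21, 28, 36, 45],
     [0, 10, 30, 60, 100, 150, 210, 280, 360, 450]],
    index=["A", "B"], columns=dates, dtype=float,
)
population = pd.Series({"A": 1_000_000, "B": 10_000_000})

# 1. Daily increments from cumulative totals; negative corrections are clipped to 0.
daily = cumulative.diff(axis=1).clip(lower=0)

# 2. 7-day rolling average along the date axis (transpose so that rows become dates).
smoothed = daily.T.rolling(7, min_periods=1).mean().T

# 3. Rates per million inhabitants.
per_million = smoothed.div(population / 1_000_000, axis=0)

# 4. Scale each country by its own maximum, so every series lies in [0, 1].
normalized = per_million.div(per_million.max(axis=1), axis=0)

# Skipping step 3 gives the same result: the per-row population factor cancels in step 4.
normalized_without_pop = smoothed.div(smoothed.max(axis=1), axis=0)
print(np.allclose(normalized, normalized_without_pop, equal_nan=True))
print(normalized.round(3))
```

The first column is NaN because a daily difference needs a previous day; `max` skips it, so each row still peaks at 1.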


In [None]:
import warnings
warnings.filterwarnings("ignore")

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn
from pandas.plotting import scatter_matrix



dataframe_deaths = pd.read_csv('../input/covidnarrativa/deaths.csv')
dataframe_deaths.info()


In [None]:
dataframe_deaths.describe()

In [None]:
dataframe_deaths.keys()

In [None]:
dataframe_deaths.head()

In [None]:
pop = pd.read_csv('../input/population-by-country-2020/population_by_country_2020.csv')

In [None]:
pop.info()

In [None]:
pop=pop.loc[:,['Country (or dependency)','Population (2020)']]

In [None]:
pop.columns=['Country_EN','Population']

In [None]:
pop.info()

In [None]:
# The values in the dataset are cumulative, so we will compute daily increments from them
plt.plot(dataframe_deaths.iloc[0,4:])

In [None]:
dataframe_confirmed = pd.read_csv('../input/covidnarrativa/confirmed.csv')
dataframe_confirmed.info()


In [None]:
dataframe_confirmed.head()

In [None]:
plt.plot(dataframe_confirmed.iloc[0,4:])

In [None]:
dataframe_recovered = pd.read_csv('../input/covidnarrativa/recovered.csv')
dataframe_recovered.info()

In [None]:
plt.plot(dataframe_recovered.iloc[0,4:])

In [None]:
dataframe_recovered.head()

In [None]:
# The UV dataset includes a lookup table with the country names we will use to match the COVID dataset
lu_table=pd.read_csv('../input/uv-biologically-effective-dose-from-cams/LookUp_Table.csv')
uvbed=pd.read_csv('../input/uv-biologically-effective-dose-from-cams/uvbed.csv')

In [None]:
lu_table.info()

In [None]:
lu_table
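
Because `df_transform` joins the COVID data to this lookup table with an inner merge on country names, any country whose spelling differs between the two sources is silently dropped. A minimal sketch of how such mismatches could be listed, using made-up rows (the country spellings and `UID` values below are illustrative assumptions, not taken from the datasets):

```python
import pandas as pd

# Hypothetical miniature versions of the COVID table and the UV lookup table
covid = pd.DataFrame({"Country_EN": ["Spain", "Italy", "Korea, South", "Czechia"]})
lookup = pd.DataFrame({
    "Combined_Key": ["Spain", "Italy", "South Korea", "Czechia"],
    "UID": [1, 2, 3, 4],  # placeholder identifiers
})

# A left merge with indicator=True flags the rows an inner merge would silently drop
check = pd.merge(covid, lookup, how="left",
                 left_on="Country_EN", right_on="Combined_Key", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "Country_EN"].tolist()
print(unmatched)  # ['Korea, South']
```

On the real data, the same check would use `dataframe_deaths` and `lu_table` in place of the toy frames.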

In [None]:
uvbed.info()

In [None]:
uvbed

In [None]:
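def df_transform(mydf, lu_table, datapoint):
    # Uses the global `pop` DataFrame (Country_EN, Population) loaded earlier.
    # Columns 4 onwards hold cumulative values: take daily differences, clip negative
    # corrections to 0, then smooth with a 7-day rolling average.
    mydf_d = mydf.iloc[:, 4:].diff(axis=1).clip(lower=0)
    # rolling(axis=1) is deprecated in recent pandas, so roll over the transposed frame
    mydf_d = mydf_d.T.rolling(7, min_periods=1).mean().T
    mydf_d = pd.concat([mydf_d, mydf.iloc[:, :4]], axis=1)
    # Attach the UV lookup-table keys (UID) and drop the columns we do not need
    test = pd.merge(mydf_d, lu_table, how='inner', left_on='Country_EN', right_on='Combined_Key')
    test.drop(columns=['Country_ES', 'Country_IT', 'Lat', 'Long_', 'Combined_Key'], inplace=True)
    # Attach population and express values per million inhabitants
    test = pd.merge(test, pop, how='inner', left_on='Country_EN', right_on='Country_EN')
    test.iloc[:, :-4] = test.iloc[:, :-4].div(test['Population'] / 1000000, axis=0)
    # Scale each country's series by its own maximum so that it ranges over [0, 1]
    mydf_max = test.iloc[:, :-4].max(axis=1).values
    test.iloc[:, :-4] = test.iloc[:, :-4] / mydf_max.reshape(-1, 1)
    plt.plot(test.iloc[0, :-4])
    test.drop(columns=['Population'], inplace=True)
    # Average rows sharing a UID; numeric_only is required in pandas >= 2.0 to skip text columns
    test2 = test.groupby('UID').mean(numeric_only=True)
    print('Num countries:', test2.shape[0])
    # Reshape from wide (one column per date) to long format: date, UID, value
    test5 = test2.unstack().reset_index()
    test5.columns = ['date', 'UID', datapoint]
    test5['date'] = pd.to_datetime(test5['date'])
    print(test5.describe())
    return test5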
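
The last steps of `df_transform` turn a wide table (one column per date) into a long one with `date`, `UID` and value columns via `unstack`. An equivalent sketch with `DataFrame.melt`, which some readers may find easier to follow, on a hypothetical two-country table (the values and `UID`s are invented):

```python
import pandas as pd

# Hypothetical wide table: one row per UID, one column per date (as after groupby('UID').mean())
wide = pd.DataFrame(
    {"2020-04-01": [0.2, 0.5], "2020-04-02": [0.4, 1.0]},
    index=pd.Index([1, 2], name="UID"),
)

# Long format with one row per (date, UID) pair, using the column names of the notebook
long = (
    wide.reset_index()
        .melt(id_vars="UID", var_name="date", value_name="deaths")
        .loc[:, ["date", "UID", "deaths"]]
)
long["date"] = pd.to_datetime(long["date"])

# The unstack-based version used in df_transform yields the same rows in the same order
via_unstack = wide.unstack().reset_index()
via_unstack.columns = ["date", "UID", "deaths"]
via_unstack["date"] = pd.to_datetime(via_unstack["date"])
print(long.equals(via_unstack))
```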

In [None]:
death_table=df_transform(dataframe_deaths, lu_table, 'deaths')

In [None]:
death_table

In [None]:
recovered_table=df_transform(dataframe_recovered, lu_table,'recovered')

In [None]:
recovered_table

In [None]:
confirmed_table=df_transform(dataframe_confirmed,lu_table,'confirmed')

In [None]:
confirmed_table

In [None]:
final_table = pd.merge(confirmed_table, death_table, how='inner',left_on=['UID','date'],right_on=['UID','date'])

In [None]:
final_table

In [None]:
final_table = pd.merge(final_table, recovered_table,how='inner', left_on=['UID','date'], right_on=['UID','date'])

In [None]:
final_table

In [None]:
uvbed.info()

In [None]:
uvbed['date_s']=uvbed['date'].astype(str)

In [None]:
uvbed['date_s']=uvbed['date_s'].str[0:4]+'-'+uvbed['date_s'].str[4:6]+'-'+uvbed['date_s'].str[6:]

In [None]:
uvbed

In [None]:
uvbed['date_t']=pd.to_datetime(uvbed['date_s'])

In [None]:
uvbed

In [None]:
uvbed.drop(columns=['date','date_s'],inplace=True)
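
The three cells above convert the integer `date` column to datetimes by slicing its string form. Assuming, as that slicing implies, that the values are stored as `YYYYMMDD`, the same conversion can be sketched in one step with an explicit format (the sample values are made up):

```python
import pandas as pd

# Hypothetical integer dates in YYYYMMDD form, like the 'date' column of uvbed.csv
raw = pd.Series([20200301, 20200302, 20201231])

# Parse in one step with an explicit format instead of slicing strings
parsed = pd.to_datetime(raw.astype(str), format="%Y%m%d")
print(parsed.dt.strftime("%Y-%m-%d").tolist())  # ['2020-03-01', '2020-03-02', '2020-12-31']
```

Applied to the notebook, this would read `uvbed['date_t'] = pd.to_datetime(uvbed['date'].astype(str), format='%Y%m%d')`; an explicit format also makes parsing fail loudly if a value does not match.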

In [None]:
uvbed.info()

In [None]:
final_table.info()

In [None]:
final_df = pd.merge(final_table,uvbed,how='inner',left_on=['UID','date'],right_on=['UID','date_t'])
final_df.drop(columns='date_t',inplace=True)

In [None]:
final_df.info()

In [None]:
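# Ratios between the normalized series; the +1 in the denominator avoids division by zero
final_df['r_d'] = final_df['recovered'] / (final_df['deaths'] + 1)
final_df['d_c'] = final_df['deaths'] / (final_df['confirmed'] + 1)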

In [None]:
final_df.plot.scatter(x='uvbed[W/m2]',y='deaths')
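
With tens of thousands of country-days, a scatter plot overplots heavily, which makes it hard to judge where points concentrate. A hexagonal-binning plot shows the number of points per region instead. A sketch on synthetic data (the distributions below are arbitrary assumptions; only the column names match the notebook):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for final_df (made-up values, for illustration only)
rng = np.random.default_rng(2)
df = pd.DataFrame({"uvbed[W/m2]": rng.exponential(0.05, size=5000)})
df["deaths"] = rng.beta(1, 8, size=5000)

# Hexagonal bins coloured by point count; mincnt=1 leaves empty hexagons blank
ax = df.plot.hexbin(x="uvbed[W/m2]", y="deaths", gridsize=30, mincnt=1)
ax.set_title("Point density: deaths vs UV dose (synthetic data)")
```

On the notebook's data the call would be `final_df.plot.hexbin(x='uvbed[W/m2]', y='deaths', gridsize=30, mincnt=1)`.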

In [None]:
final_df.info()

In [None]:
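# Pearson correlations between the numeric columns (UID is an identifier, so it is excluded)
corrMatrix = final_df.drop(columns=['UID']).corr(numeric_only=True)
sn.heatmap(corrMatrix)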

In [None]:
scatter_matrix(final_df.iloc[:,2:], alpha=0.5, diagonal='kde', figsize=(20,20))

There are no clear linear correlations between UV radiation and COVID-19 deaths or cases, but higher rates are clearly concentrated at low radiation levels. One could argue that this does not necessarily imply causality: radiation is lower on winter, cloudy or rainy days, which is when people tend to gather more, and therefore spread the virus more. For that reason we have created columns with the ratio between the death rate and the confirmed-case rate, so we can eliminate the influence of absolute values and focus on the degree to which infected people recover or not.
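
One way to quantify a monotonic but non-linear pattern, which Pearson correlation (the default of `DataFrame.corr`) can understate, is Spearman rank correlation. A sketch on synthetic data; the exponential relationship and the noise level are arbitrary assumptions, not fitted to the COVID data:

```python
import numpy as np
import pandas as pd

# Synthetic, made-up data: a decreasing but strongly non-linear relationship plus small noise
rng = np.random.default_rng(0)
uv = rng.uniform(0, 0.4, size=500)
deaths = np.exp(-20 * uv) + rng.normal(0, 0.001, size=500)
df = pd.DataFrame({"uvbed[W/m2]": uv, "deaths": deaths})

# Pearson measures linear association; Spearman correlates the ranks,
# so any monotonic trend scores close to -1 or 1
pearson = df.corr(method="pearson").loc["uvbed[W/m2]", "deaths"]
spearman = df.corr(method="spearman").loc["uvbed[W/m2]", "deaths"]
print(f"Pearson: {pearson:.2f}  Spearman: {spearman:.2f}")
```

On the notebook's data the equivalent call would be `final_df[['uvbed[W/m2]', 'confirmed', 'deaths', 'd_c']].corr(method='spearman')`.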

In [None]:
final_df.plot.scatter(x='uvbed[W/m2]',y='d_c')
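
The motivation for the ratio column can be checked with simple arithmetic: if crowding on low-UV days scaled confirmed cases and deaths by the same factor, the ratio of deaths to confirmed cases would not change. Because `d_c` adds 1 to the denominator while both columns are scaled to [0, 1] by `df_transform`, that invariance only holds approximately. A small sketch with made-up values (the helper `ratio` is hypothetical):

```python
# Made-up normalized values for one day (both in [0, 1], as produced by df_transform)
deaths, confirmed = 0.2, 0.4

def ratio(d, c, offset):
    """Deaths-to-confirmed ratio with an additive offset in the denominator."""
    return d / (c + offset)

for factor in (1, 2):  # the same day, with both series scaled up equally
    d, c = deaths * factor, confirmed * factor
    print(factor, round(ratio(d, c, 0), 3), round(ratio(d, c, 1), 3))
# 1 0.5 0.143
# 2 0.5 0.222
```

A variant that keeps the invariance is to divide only where `confirmed` is positive, e.g. `final_df['deaths'] / final_df['confirmed'].where(final_df['confirmed'] > 0)`, leaving the remaining rows as NaN.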

In [None]:
uv_bins = pd.cut(final_df['uvbed[W/m2]'], bins=20)

In [None]:
uv_bins

In [None]:
final_df['uv_range'] = uv_bins

In [None]:
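# observed=False keeps empty UV bins; numeric_only skips non-numeric columns such as 'date'
bin_view = final_df.groupby('uv_range', observed=False).mean(numeric_only=True)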
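
Each bar in the following plots is a mean over all country-days that fall in one UV range, and the 20 equal-width bins produced by `pd.cut` can hold very different numbers of observations. A sketch of reporting the count next to the mean, on synthetic values (the column names match the notebook, the data do not):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for final_df: UV values skewed towards low radiation
rng = np.random.default_rng(1)
df = pd.DataFrame({"uvbed[W/m2]": rng.exponential(0.05, size=1000)})
df["deaths"] = np.clip(0.5 - df["uvbed[W/m2]"] + rng.normal(0, 0.1, size=1000), 0, 1)

df["uv_range"] = pd.cut(df["uvbed[W/m2]"], bins=20)

# Mean and number of observations per UV bin; observed=False keeps empty bins visible
summary = df.groupby("uv_range", observed=False)["deaths"].agg(["mean", "count"])
print(summary)
```

Bins backed by only a handful of observations (typically at the high-UV end) deserve less weight when reading the bar charts.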

In [None]:
bin_view.plot(y='d_c', kind='bar')
plt.title('Daily ratio of deaths to confirmed cases per million inhabitants, normalized per country')

In [None]:
bin_view.plot(y='confirmed', kind='bar')
plt.title('Daily confirmed cases per million inhabitants, normalized per country')

In [None]:
bin_view.plot(y='deaths', kind='bar')
plt.title('Daily deaths per million inhabitants, normalized per country')

# Conclusions

Based on the above, we can clearly see that, on average, locations and times with higher UV exposure show lower fatality rates, both in absolute terms (daily deaths per inhabitant) and relative to the confirmed case rate.
Assuming causality, one possible conclusion is that recommending greater Sun exposure to the population could shift fatality towards the right-hand side of the graphs above and substantially reduce the impact of this pandemic.
