# TABLE OF CONTENTS


* [1. INTRODUCTION](#section-one)
* [2. SETUP](#section-two)
    - [2.1 Installing Packages](#subsection-two-one)
    - [2.2 Importing Packages](#subsection-two-two)
    - [2.3 Wrangle Data](#subsection-two-three)
* [3. Covid-19 Analysis](#section-three)
    - [3.1 Covid-19 cases by country](#subsection-three-one)
    - [3.2 Covid-19 cases by state](#subsection-three-two)
    - [3.3 Impact of state cases on stocks performance](#subsection-three-three)
    - [3.4 Predicting future Covid-19 patterns](#subsection-three-four)
* [4. Stock analysis](#section-four)
    - [4.1 Stock selection](#subsection-four-one)
        - [4.1.1 Bear ETF and Treasury ETF](#subsection-four-one-one)
        - [4.1.2 Technology stocks](#subsection-four-one-two)
        - [4.1.3 Consumer Cyclical stocks](#subsection-four-one-three)
        - [4.1.4 Real estate stocks](#subsection-four-one-four)
        - [4.1.5 Healthcare stocks](#subsection-four-one-five)        
* [5. Portfolio optimization](#section-five)
    - [5.1 Stock correlation](#subsection-five-one)
    - [5.2 Simulation of portfolio performance](#subsection-five-two)
    - [5.3 Portfolio allocation](#subsection-five-three)
    - [5.4 Portfolio returns relationship to new cases](#subsection-five-four)
* [6. Conclusion](#section-six)

<a id="section-one"></a>
# 1. Introduction

<a id="section-two"></a>
# 2. Set-up

<a id="subsection-two-one"></a>
# 2.1 Installing Packages

In [None]:
!pip install --upgrade pip

In [None]:
!pip install yfinance

<a id="subsection-two-two"></a>
# 2.2 Importing Packages

In [None]:
# Date
from dateutil import relativedelta as rd
from datetime import datetime, date, timedelta

#Data Manipulation
import pandas as pd
import numpy as np
from pandas import DataFrame
from numpy import inf

# Visualization
import matplotlib.pyplot as plt
import plotly as py
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as pyo
pyo.init_notebook_mode()
import seaborn as sns
import matplotlib.cm as cm
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter
import geopandas as gpd
import matplotlib as mpl
from scipy.stats.mstats import winsorize
from matplotlib.patches import Ellipse
from matplotlib.text import OffsetFrom

import os

# Regression 
import statsmodels.api as sm
from statsmodels.formula.api import ols
import statsmodels.graphics.api as smg
from scipy.optimize import curve_fit
import yfinance as yf


#Prediction
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split



<a id="subsection-two-three"></a>
# 2.3 Wrangle Data

In [None]:
# Color Palettes
cnf, dth, rec, act = '#393e46', '#ff2e63', '#21bf73', '#fe9801' 

In [None]:
# World Data up till july
country_wise = pd.read_csv('../input/corona-virus-report/country_wise_latest.csv')
country_wise = country_wise.replace('', np.nan).fillna(0)
full_grouped = pd.read_csv('../input/corona-virus-report/full_grouped.csv')
full_grouped['Date'] = pd.to_datetime(full_grouped['Date'])
day_wise = pd.read_csv('../input/corona-virus-report/day_wise.csv')
day_wise['Date'] = pd.to_datetime(day_wise['Date'])

In [None]:
# Germany data up to 7 November 2020
world_confirmed = pd.read_csv('../input/covid19-report-2020117/time_series_covid19_confirmed_global.csv')
germany_confirmed = world_confirmed[world_confirmed['Country/Region'].isin(['Germany'])]
germany_confirmed = germany_confirmed.drop(['Province/State','Country/Region','Lat','Long'],axis=1)
germany_confirmed = germany_confirmed.transpose()
germany_confirmed.reset_index(inplace = True)
germany_confirmed.columns = ["Date",'Total Confirmed']
#print(germany_confirmed)
world_deaths = pd.read_csv('../input/covid19-report-2020117/time_series_covid19_deaths_global.csv')
germany_deaths = world_deaths[world_deaths['Country/Region'].isin(['Germany'])]
germany_deaths = germany_deaths.drop(['Province/State','Country/Region','Lat','Long'],axis=1)
germany_deaths = germany_deaths.transpose()
germany_deaths.reset_index(inplace = True)
germany_deaths.columns = ["Date",'Total Deaths']
#print(germany_deaths)
world_recovered = pd.read_csv('../input/covid19-report-2020117/time_series_covid19_recovered_global.csv')
germany_recovered = world_recovered[world_recovered['Country/Region'].isin(['Germany'])]
germany_recovered = germany_recovered.drop(['Province/State','Country/Region','Lat','Long'],axis=1)
germany_recovered = germany_recovered.transpose()
germany_recovered.reset_index(inplace = True)
germany_recovered.columns = ["Date",'Total Recovered']
#print(germany_recovered)
germanycountry= pd.merge(germany_confirmed,germany_deaths,on='Date',how='outer')
germanycountry= pd.merge(germanycountry,germany_recovered,on='Date',how='outer')
germanycountry['Total Active'] = germanycountry['Total Confirmed']-germanycountry['Total Deaths']-germanycountry['Total Recovered']
germanycountry['Date'] = pd.to_datetime(germanycountry['Date'])
#print(germanycountry)


In [None]:
# State-level Germany data: cases and deaths by state, county, date, gender and age group
germanystate = pd.read_csv('../input/covid19-tracking-germany/covid_de.csv')
germanystate['date'] = pd.to_datetime(germanystate['date'])

#print(germanystate['gender'].isnull().sum()) #missing data from gender
#print(germanystate['age_group'].isnull().sum()) #missing data from age_group
germanystate.dropna(subset=['gender','age_group'], how='all',inplace=True)

state = germanystate.sort_values(['state','date','gender','age_group']).reset_index()
state_cases_per_day=state.groupby(['state','date','gender','age_group']).agg({'cases':'sum','deaths':'sum'}).reset_index()
state_cases_per_day['Total cases']=state_cases_per_day.groupby('state')['cases'].cumsum()
state_cases_per_day['Total deaths']=state_cases_per_day.groupby('state')['deaths'].cumsum()

germanypop= pd.read_csv('../input/covid19-tracking-germany/demographics_de.csv')
#print(germanypop.info())
germanypop = germanypop.replace('female','F')
germanypop = germanypop.replace('male','M')
#print(germanypop.head(20))
germany_cases_pop = pd.merge(germanystate,germanypop,on=['state','gender','age_group'],how='inner')
germany_cases_pop['Total cases']=germany_cases_pop.groupby('state')['cases'].cumsum()
germany_cases_pop['Total deaths']=germany_cases_pop.groupby('state')['deaths'].cumsum()
germany_cases_pop.rename(columns={'cases':'new_cases','deaths':'new_deaths'},inplace=True)
germany_cases_pop.drop(columns=['county'],inplace=True)
#print(germany_cases_pop)



In [None]:
# Worldometer data
# ================

worldometer_data = pd.read_csv('../input/corona-virus-report/worldometer_data.csv')

# Replace missing values '' with NAN and then 0
# What are the alternatives? Drop or impute. Do they make sense in this context?
worldometer_data = worldometer_data.replace('', np.nan).fillna(0)
worldometer_data['Case Positivity'] = round(worldometer_data['TotalCases']/worldometer_data['TotalTests'],2)
worldometer_data['Case Fatality'] = round(worldometer_data['TotalDeaths']/worldometer_data['TotalCases'],2)

# Case Positivity is infinity when there is zero TotalTests due to division by zero
worldometer_data[worldometer_data["Case Positivity"] == inf] = 0

# Qcut is quantile cut. Here we specify three equally sized bins and label them low, medium, and high, respectively.
worldometer_data ['Case Positivity Bin']= pd.qcut(worldometer_data['Case Positivity'], q=3, labels=["low", "medium", "high"])

# Population Structure
worldometer_pop_struc = pd.read_csv('../input/covid19-worldometer-snapshots-since-april-18/population_structure_by_age_per_contry.csv')

# Replace missing values with zeros
worldometer_pop_struc = worldometer_pop_struc.fillna(0)
#worldometer_pop_struc.info()

# Merge worldometer_data with worldometer_pop_struc
# Inner means keep only common key values in both datasets
worldometer_data = worldometer_data.merge(worldometer_pop_struc,how='inner',left_on='Country/Region', right_on='Country')

# Keep observations where column "Country/Region" is not 0
worldometer_data = worldometer_data[worldometer_data["Country/Region"] != 0]

# Inspect worldometer_data's metadata
#worldometer_data.info()

# Inspect Data
# worldometer_data.info()
# worldometer_data.tail(20)
# worldometer_data["Case Positivity"].describe()


First, we wrangle the data: dates are converted to datetime format, and rows missing both `age_group` and `gender` are removed. Next, we compute the total cases and deaths per state. Finally, we merge in the population data so that cases and deaths can be expressed as a percentage of the population.
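
The steps above can be sketched on a toy table. The rows below are made up for illustration, and only the column names mirror `covid_de.csv` and `demographics_de.csv`; this is a minimal sketch under those assumptions, not the notebook's exact pipeline.

```python
import pandas as pd

# Made-up rows; only the column names follow the real files.
cases = pd.DataFrame({
    "state": ["Bayern", "Bayern", "Bayern", "Berlin"],
    "date": ["2020-03-01", "2020-03-02", "2020-03-02", "2020-03-01"],
    "gender": ["F", "M", None, "F"],
    "age_group": ["15-34", "15-34", None, "35-59"],
    "cases": [2, 3, 1, 4],
    "deaths": [0, 1, 0, 0],
})
population = pd.DataFrame({
    "state": ["Bayern", "Berlin"],
    "population": [1000, 500],
})

# 1. Parse dates so rows sort chronologically.
cases["date"] = pd.to_datetime(cases["date"])

# 2. Drop rows where gender AND age_group are both missing.
cases = cases.dropna(subset=["gender", "age_group"], how="all")

# 3. Daily totals per state, then running totals within each state.
daily = (cases.groupby(["state", "date"], as_index=False)[["cases", "deaths"]]
              .sum()
              .sort_values(["state", "date"]))
daily["total_cases"] = daily.groupby("state")["cases"].cumsum()

# 4. Attach population and express running totals as a share of it.
daily = daily.merge(population, on="state", how="inner")
daily["cases_per_pop"] = daily["total_cases"] / daily["population"]
print(daily)
```

Sorting by date before `cumsum()` matters here: a running total is only meaningful when the rows within each state are in chronological order.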

<a id="section-three"></a>
# 3. Covid-19 Analysis


<a id="subsection-three-one"></a>
# 3.1 **Covid-19 cases by country**

In [None]:
germanycountry['Recovered%'] = round(germanycountry['Total Recovered']/germanycountry['Total Confirmed']*100,2)
germanycountry["New Active"] = germanycountry["Total Active"].diff()
germanycountry["New Cases"] = germanycountry["Total Confirmed"].diff()
germanycountry["New Deaths"] = germanycountry["Total Deaths"].diff()
germanycountry = germanycountry.replace('', np.nan).fillna(0)
print(germanycountry)
temp = germanycountry.melt(id_vars="Date", value_vars=['New Cases', 'New Deaths'],
                 var_name='Case', value_name='Count')
temp.head()

fig = px.area(temp, x="Date", y="Count", color='Case', height=600, width=1200,
             title='Cases over time (Germany)', color_discrete_sequence = [rec, dth, act])
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()


> Covid-19 cases spiked between March and May; at the peak of this first wave there were over 7,000 new cases per day. The second wave arrived in October 2020 with even higher daily counts, peaking at about 30k cases per day on November 4.

In [None]:
temp = germanycountry[['Date','Total Deaths', 'Total Recovered', 'Total Active']].tail(1)
temp.head()
temp = temp.melt(id_vars="Date", value_vars=['Total Active', 'Total Deaths', 'Total Recovered'])
print(temp)
fig = px.treemap(temp, path=["variable"], values="value", height=225, 
                 color_discrete_sequence=[act, rec, dth])
fig.data[0].textinfo = 'label+text+value'
fig.show()

> By 7 November, active cases had already reached nearly half of the total confirmed cases. We expect long lockdowns to curb this growth.

In [None]:
comparables = world_confirmed[world_confirmed['Country/Region'].isin(['Turkey','France','Germany'])]
comparables = pd.DataFrame.transpose(comparables)
comparables.columns = ['116','117','118','119','120','121','122','123','124','125','126','Germany','Turkey']
comparables = pd.DataFrame(data = comparables)

comparables["France"] = comparables["116"] + comparables["117"] + comparables["118"] + comparables["119"] \
+ comparables["120"] + comparables["121"] + comparables["122"] + comparables["123"] + comparables["124"] + comparables["125"] + comparables["126"]
comparables = comparables.drop(['116','117','118','119','120','121','122','123','124','125','126'],axis = 1)
comparables = comparables.drop(['Province/State','Country/Region','Lat','Long'])
comparables = comparables.reset_index()
comparables = comparables.rename(columns = {'index':'Date'})
comparables = pd.melt(comparables, id_vars = ["Date"], value_vars = ['Germany','Turkey','France'],var_name = "Country",value_name='Confirmed')
comparables = comparables[comparables['Confirmed']>100000]
comparables['Date'] = pd.to_datetime(comparables['Date'])
#comparables['min_date'] = comparables.groupby('Country')["Date"].min()
min_date = comparables.groupby('Country')["Date"].min()
min_date = pd.DataFrame(data = min_date).reset_index()
min_date = min_date.rename(columns = {'Date':'min_date'})
comparables = pd.merge(comparables,min_date,on='Country',how='left')

comparables['N days']=(comparables['Date'] - comparables['min_date']).dt.days
print(comparables)

fig = px.line(comparables, x='N days', y='Confirmed', color='Country', 
               title='N days from '+ str(100000) +' case', height=600)
fig.show()


> France seems to have trouble keeping its total cases down; its spike is much higher than that of Germany and Turkey. Germany's curve is turning up as well, suggesting it might soon follow France's pattern.
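
The alignment used for this comparison, counting days from the date each country first passed a case threshold, can also be written compactly with `groupby(...).transform("min")`. The sketch below uses invented counts and a hypothetical helper name, `days_since_threshold`; it is an alternative to the merge-based version, not the original code.

```python
import pandas as pd

def days_since_threshold(df, threshold):
    """Keep rows above `threshold` and count days from each country's first such date."""
    above = df[df["Confirmed"] > threshold].copy()
    # transform("min") broadcasts each country's first date back onto its rows.
    first_date = above.groupby("Country")["Date"].transform("min")
    above["N days"] = (above["Date"] - first_date).dt.days
    return above

# Invented cumulative counts for two placeholder countries.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03"] * 2),
    "Country": ["A"] * 3 + ["B"] * 3,
    "Confirmed": [90, 120, 150, 50, 80, 110],
})
aligned = days_since_threshold(toy, threshold=100)
print(aligned)
```

Because every country starts at day 0 on its own threshold date, curves that began on different calendar dates become directly comparable.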

# Estimating real fatality rate
> These three countries are all from Europe and show similar patterns in the growth of Covid-19 cases. As all three have recorded more than 100k total cases, we would like to estimate their true fatality rates.

In [None]:
def plot_hbar_wm(col, n, min_pop=1000000):
    df = worldometer_data[worldometer_data['Population']>min_pop]
    df = df.sort_values(col, ascending=True).tail(n)
    df.info()
    fig = px.bar(df,
                 x=col, y="Country/Region", color='WHO Region',  
                 text=col, orientation='h', width=700, 
                 color_discrete_sequence = px.colors.qualitative.Dark2)
    fig.update_layout(title=col + ' (Only countries with Population > ' + str(min_pop) + ')',
                      xaxis_title="", yaxis_title="", 
                      yaxis_categoryorder = 'total ascending',
                      uniformtext_minsize=8, uniformtext_mode='hide')
    fig.show()
    
# Draw histogram with two arguments
# 1. variable of interest
# 2. the number of bins
def plot_histogram_wm(col, bins):
    fig = px.histogram(worldometer_data[col], x=col, nbins=bins)
    fig.show()

    
def compar_wm(col, n):
    df = worldometer_data[worldometer_data['Country/Region'].isin(['Turkey','France','Germany'])]
    df = df.sort_values(col, ascending=True).tail(n)
    
    fig = px.bar(df,
                 x=col, y="Country/Region", color='WHO Region',  
                 text=col, orientation='h', width=700, 
                 color_discrete_sequence = px.colors.qualitative.Dark2)
    fig.update_layout(title=col, 
                      xaxis_title="", yaxis_title="", 
                      yaxis_categoryorder = 'total ascending',
                      uniformtext_minsize=8, uniformtext_mode='hide')
    fig.show()
compar_wm('Case Fatality', 15)
compar_wm('Tests/1M pop', 15)
compar_wm('Tot Cases/1M pop', 15)

> While Germany seems to have the highest case fatality rate of the three countries, it also runs more tests than the others. Next, we use the pooled fatality rate of benchmark countries (those with a case positivity of at most 1%) to estimate their real fatality rate.

In [None]:
benchmark_countries = worldometer_data[worldometer_data["Case Positivity"]<=0.01]
# Assume that the number of confirmed cases are close to the true infections rates for countries with gold standard testing regimes 
# Thus, their case fatality rates are closer to the true infection fatality rates
infection_fatality_rate = benchmark_countries['TotalDeaths'].sum() / benchmark_countries['TotalCases'].sum()

# Calculate the fraction of total Covid19 deaths for the population aged 65+ among the benchmark countries
benchmark_death_65y_pct = sum(benchmark_countries['TotalDeaths'] * benchmark_countries['Fraction age 65+ years']) / sum(benchmark_countries['TotalDeaths'])

print(infection_fatality_rate)
print(benchmark_death_65y_pct)

print('Estimated Infection Fatality Rate for a benchmark country with %.1f%s of population older than 65 years old \
is %.2f%s' %(100 * benchmark_death_65y_pct,'%',100 * infection_fatality_rate,'%'))

> The case fatality rate is positively correlated with the fraction of the population aged 65+. Here we calibrate each country's infection fatality rate by the estimated fraction of total Covid-19 deaths for the population aged 65+ (i.e., we expect the infection fatality rate of a country with 20% of estimated total Covid-19 deaths among the 65+ population to be twice as high as that of a country with 10%).
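
As a worked example of this scaling, the sketch below applies the rule with invented numbers: a benchmark infection fatality rate of 0.5% at a benchmark 65+ share of 10%. The function name `calibrated_ifr` and all values are assumptions for illustration only.

```python
def calibrated_ifr(share_65_plus, benchmark_share_65_plus, benchmark_ifr):
    """Scale the benchmark IFR linearly with the country's 65+ share."""
    return share_65_plus / benchmark_share_65_plus * benchmark_ifr

benchmark_ifr = 0.005    # invented: 0.5% benchmark infection fatality rate
benchmark_share = 0.10   # invented: benchmark 65+ share of 10%

ifr_young = calibrated_ifr(0.10, benchmark_share, benchmark_ifr)
ifr_old = calibrated_ifr(0.20, benchmark_share, benchmark_ifr)
print(f"{ifr_young:.4f} {ifr_old:.4f}")
```

A country whose 65+ share matches the benchmark keeps the benchmark rate, and doubling the share doubles the estimate, which is the linear assumption stated above.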

In [None]:
# Estimate Infection Fatality Ratio using the estimated fraction of total Covid19 deaths for the population aged 65+
worldometer_data['Estimated Infection Fatality Ratio'] \
    = ((worldometer_data['TotalDeaths'] * worldometer_data['Fraction age 65+ years']
        /worldometer_data['TotalDeaths']) / benchmark_death_65y_pct) * infection_fatality_rate

# Show descriptive statistics of the columns Estimated Infection Fatality Ratio and Case Fatality
worldometer_data['Estimated Infection Fatality Ratio'].describe()
worldometer_data['Case Fatality'].describe()

# Plot histogram of Estimated Infection Fatality Ratio and Case Fatality
px.histogram(worldometer_data, x='Estimated Infection Fatality Ratio', barmode="overlay")
px.histogram(worldometer_data, x='Case Fatality', barmode="overlay")

# Overlay both histograms for comparison
fig = go.Figure()

fig.add_trace(go.Histogram(x=worldometer_data['Estimated Infection Fatality Ratio'], 
    name = 'Estimated Infection Fatality Rate'
))

fig.add_trace(go.Histogram(x=worldometer_data['Case Fatality'], 
    name = 'Case Fatality Rate'
))

fig.update_layout(barmode='overlay', 
    title = 'Estimated Infection Fatality Rate vs. Case Fatality Rate',
    xaxis_title_text='Value', # xaxis label
    yaxis_title_text='Count', # yaxis label
)
                  
fig.update_traces(opacity=0.75)

fig.show()

In [None]:
worldometer_data['Estimated Infection Fatality Ratio'] \
    = ((worldometer_data['TotalDeaths'] * worldometer_data['Fraction age 65+ years']
        /worldometer_data['TotalDeaths']) / benchmark_death_65y_pct) * infection_fatality_rate
def estim_wm(col):
    
    estimated_wm = worldometer_data[worldometer_data['Country/Region'].isin(['Turkey','France','Germany'])]

    
    fig = px.bar(estimated_wm,
                 x=col, y="Country/Region", color='WHO Region',  
                 text=col, orientation='h', width=700, 
                 barmode='overlay')     
    
    fig.show()

estim_wm('Estimated Infection Fatality Ratio')
estim_wm('Case Fatality')

<a id="subsection-three-two"></a>
# 3.2 Covid-19 cases by state
> Based on this estimate, Germany's fatality ratio is not as high as its recorded case fatality rate suggests. Next, we look at cases by state, gender and age to see how these affect deaths and cases in Germany.

In [None]:
states = ['Baden-Wuerttemberg','Bayern','Berlin','Brandenburg','Bremen',
          'Hamburg','Hessen','Mecklenburg-Vorpommern','Niedersachsen','Nordrhein-Westfalen',
          'Rheinland-Pfalz','Saarland','Sachsen','Sachsen-Anhalt','Schleswig-Holstein','Thueringen' ]

totalcases = state_cases_per_day.groupby('state')[['cases','deaths']].sum()
#print(totalcases)
germanypop = germanypop.replace('female','F')
germanypop = germanypop.replace('male','M')
totalpop = germanypop.groupby('state').population.sum()

covid_per_state = pd.merge(totalcases, totalpop, on ='state', how='outer')
covid_per_state['cases_per_pop'] = covid_per_state['cases']/covid_per_state['population']
covid_per_state['deaths_per_pop'] = covid_per_state['deaths']/covid_per_state['population']
covid_per_state = pd.DataFrame(data = covid_per_state, columns = ['cases','deaths','population','cases_per_pop','deaths_per_pop'])
covid_per_state.reset_index(inplace=True)


fig, ax = plt.subplots(2,2, figsize=(15, 15), facecolor='#f7f7f7')
fig.subplots_adjust(top=0.92)
fig.suptitle('Summary of cases by states', fontsize=18)
fig.tight_layout(pad=15.0)
covid_per_state.set_index('state').cases.plot(kind='bar', ax=ax[0][0], color='gold')
covid_per_state.set_index('state').cases_per_pop.plot(kind='bar', ax=ax[0][1], color='green')
covid_per_state.set_index('state').deaths.plot(kind='bar', ax=ax[1][0], color='gold')
covid_per_state.set_index('state').deaths_per_pop.plot(kind='bar', ax=ax[1][1], color='green')


ax[0][0].set_title('Total Cases per state', fontsize=14)
ax[0][1].set_title('Positivity Rate (%)', fontsize=14)
ax[1][0].set_title('Total deaths', fontsize=14)
ax[1][1].set_title('Death Rate (%)', fontsize=14)
#print(covid_per_state)


# Shared cosmetics for all four panels
for axes in ax.flat:
    axes.set_xlabel('')
    axes.tick_params(axis='x', labelrotation=90)
    axes.grid(axis='y')
# Only the right-hand column holds rates, so only it gets percent tick labels
# (the left-hand column shows raw counts)
for axes in ax[:, 1]:
    axes.yaxis.set_major_formatter(mpl.ticker.PercentFormatter(xmax=1))

plt.show()



> As you can see, the absolute numbers of cases and deaths for Bayern and Nordrhein-Westfalen are very high. After scaling by population, however, cases remain below a ratio of about 0.01 (roughly 1% of the population) and deaths below about 0.00025 (roughly 0.025%). Next, we examine cases and deaths by age.

In [None]:
germ_pop_sum=germanypop.groupby('age_group').population.sum()
germ_covid_sum=germany_cases_pop.groupby('age_group',as_index=False)[['new_cases','new_deaths']].sum()
germ_summary = pd.merge(germ_pop_sum,germ_covid_sum,on='age_group')
germ_summary['positivity_rate'] = germ_summary['new_cases'] / germ_summary['population']
germ_summary['death_rate'] = germ_summary['new_deaths'] / germ_summary['new_cases']
germ_summary['prop_positives'] = germ_summary['new_cases'] / germ_summary['new_cases'].sum()
germ_summary['prop_deaths'] = germ_summary['new_deaths'] / germ_summary['new_deaths'].sum()
print(germ_summary)


fig, ax = plt.subplots(3,2, figsize=(15, 15), facecolor='#f7f7f7')
fig.subplots_adjust(top=0.92)
fig.suptitle('Summary of the situation in Germany (age)', fontsize=18)

germ_summary.set_index('age_group').new_cases.plot(kind='bar', ax=ax[0][0], color='gold')
germ_summary.set_index('age_group').new_deaths.plot(kind='bar', ax=ax[1][0], color='red')
germ_summary.set_index('age_group').positivity_rate.plot(kind='bar', ax=ax[0][1], color='gold')
germ_summary.set_index('age_group').death_rate.plot(kind='bar', ax=ax[1][1], color='red')

ax[2][0].pie(germ_summary.prop_positives.values, labels=germ_summary.age_group, autopct='%.0f%%')
ax[2][1].pie(germ_summary.prop_deaths.values, labels=germ_summary.age_group, autopct='%.0f%%')

ax[0][0].set_title('Total Cases', fontsize=14)
ax[1][0].set_title('Total Deceased', fontsize=14)
ax[0][1].set_title('Positivity Rate (%)', fontsize=14)
ax[1][1].set_title('Death Rate (%)', fontsize=14)
ax[2][0].set_title('Proportion of Positives', fontsize=14)
ax[2][1].set_title('Proportion of Victims', fontsize=14)

# Shared cosmetics for the four bar charts (the bottom row holds the pie charts)
for axes in ax[:2].flat:
    axes.set_xlabel('')
    axes.tick_params(axis='x', labelrotation=0)
    axes.grid(axis='y')
# Percent tick labels only for the two rate panels in the right-hand column
for axes in ax[:2, 1]:
    axes.yaxis.set_major_formatter(mpl.ticker.PercentFormatter(xmax=1))

plt.show()

> Most cases are contracted by people aged 15-39, even after scaling by population. This is probably because most of the workforce falls within that age range. However, nearly 90% of deaths occur among people aged 60 and above.

In [None]:
germ_gender_sum=germanypop.groupby('gender').population.sum()
germ_covid_sum=germany_cases_pop.groupby('gender',as_index=False)[['new_cases','new_deaths']].sum()
germ_summary_2 = pd.merge(germ_gender_sum,germ_covid_sum,on='gender')
#print(germ_summary_2)
germ_summary_2['positivity_rate'] = germ_summary_2['new_cases'] / germ_summary_2['population']
germ_summary_2['death_rate'] = germ_summary_2['new_deaths'] / germ_summary_2['new_cases']
germ_summary_2['prop_positives'] = germ_summary_2['new_cases'] / germ_summary_2['new_cases'].sum()
germ_summary_2['prop_deaths'] = germ_summary_2['new_deaths'] / germ_summary_2['new_deaths'].sum()
print(germ_summary_2)


fig, ax = plt.subplots(3,2, figsize=(12, 12), facecolor='#f7f7f7')
fig.subplots_adjust(top=0.92)
fig.suptitle('Summary of the situation in Germany (gender)', fontsize=18)

germ_summary_2.set_index('gender').new_cases.plot(kind='bar', ax=ax[0][0], color='gold')
germ_summary_2.set_index('gender').new_deaths.plot(kind='bar', ax=ax[1][0], color='red')
germ_summary_2.set_index('gender').positivity_rate.plot(kind='bar', ax=ax[0][1], color='gold')
germ_summary_2.set_index('gender').death_rate.plot(kind='bar', ax=ax[1][1], color='red')

ax[2][0].pie(germ_summary_2.prop_positives.values, labels=germ_summary_2.gender, autopct='%.0f%%')
ax[2][1].pie(germ_summary_2.prop_deaths.values, labels=germ_summary_2.gender, autopct='%.0f%%')

ax[0][0].set_title('Total Cases', fontsize=14)
ax[1][0].set_title('Total Deceased', fontsize=14)
ax[0][1].set_title('Positivity Rate (%)', fontsize=14)
ax[1][1].set_title('Death Rate (%)', fontsize=14)
ax[2][0].set_title('Proportion of Positives', fontsize=14)
ax[2][1].set_title('Proportion of Victims', fontsize=14)

# Shared cosmetics for the four bar charts (the bottom row holds the pie charts)
for axes in ax[:2].flat:
    axes.set_xlabel('')
    axes.tick_params(axis='x', labelrotation=0)
    axes.grid(axis='y')
# Percent tick labels only for the two rate panels in the right-hand column
for axes in ax[:2, 1]:
    axes.yaxis.set_major_formatter(mpl.ticker.PercentFormatter(xmax=1))

plt.show()

> Based on the graphs, both genders contract the virus at about the same rate; however, males seem to have a higher death rate, about 2.5% versus 2% for females.

In [None]:

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        os.path.join(dirname, filename)

# Any results you write to the current directory are saved as output.
path_to_file_covid = '../input/covid19-tracking-germany/covid_de.csv'
path_to_file_demo  = '../input/covid19-tracking-germany/demographics_de.csv'
path_to_file_shape = '../input/covid19-tracking-germany/de_state.shp'

#getting the data
covid_de  = pd.read_csv(path_to_file_covid, index_col="date", parse_dates=True) #cases and deaths per state and age and sex
demo_de   = pd.read_csv(path_to_file_demo)    # demography file
shape_de2 = gpd.read_file(path_to_file_shape) # geography file

# replace Umlaute
shape_tmp = shape_de2.replace({'Baden-Württemberg' : 'Baden-Wuerttemberg', 'Thüringen' : 'Thueringen' }).copy()
shape_de = shape_tmp.rename(columns={'GEN': 'state'}).copy()

# conversion factor for later
m2tokm2 = 1/1000000

In [None]:
# Set the coordinate reference system (CRS) to EPSG:3035
# (ETRS89 / Lambert Azimuthal Equal Area), so polygon areas come out in square metres
shape_de = shape_de.set_crs(epsg=3035, allow_override=True)
# print(shape_de.geometry.crs)

# Population and population density

In [None]:
norm_axis1 = 15e6
norm_axis2 = 4.1e3


# map with population
def add_pop_state(state):
    popu = demo_de[demo_de.state == state].population.sum()
    #print(popu)
    shape_de[shape_de.state == state].plot(figsize=(10,10),color= cm.Greens(popu/norm_axis1), edgecolor='gainsboro', zorder=3, ax =  ax1)

#map with population density
# for this get the area from the polygon, i.e. geometry
def get_area(state):   
    return shape_de[shape_de.state == state].geometry.area

def add_dens_state(state):
    dens  = demo_de[demo_de.state == state].population.sum()/float(get_area(state))/(m2tokm2)
    #print(state , '----' , round((dens)/(m2tokm2),2), 'people/km**2')#properly normalised density people/km**2
    shape_de[shape_de.state == state].plot(figsize=(10,10),color= cm.Greens(dens/norm_axis2), edgecolor='gainsboro', zorder=3, ax =  ax2)
    
    
plt.figure() 

# Create a map
ax1 = plt.axes([0., 0., 1., 2.])
shape_de['geometry'].plot(color='whitesmoke', edgecolor='gainsboro', zorder=3, ax = ax1)
for i in shape_de.state:
    add_pop_state(i)
ax1.set_title('Population of the states', fontsize=20)

# add colorbar
fig = ax1.get_figure()
cax = fig.add_axes([1.1, 0.0, 0.1, 2.0])
norm = mpl.colors.Normalize(vmin=0,vmax=norm_axis1)
sm = plt.cm.ScalarMappable(norm = norm, cmap='Greens')
sm._A = []
cbar = fig.colorbar(sm, cax=cax , ax=ax1)
cbar.ax.tick_params(labelsize=15)
# Create a second map
ax2 = plt.axes([1.6, 0., 1., 2.])
shape_de['geometry'].plot(figsize=(10,10),color='whitesmoke', edgecolor='gainsboro', zorder=3, ax = ax2)
for i in shape_de.state:
    add_dens_state(i)
    
# add colorbar
fig2 = ax2.get_figure()
cax2 = fig.add_axes([2.8, 0.0, 0.1, 2.])
norm2 = mpl.colors.Normalize(vmin=0,vmax=norm_axis2)
sm2 = plt.cm.ScalarMappable(norm=norm2,cmap='Greens')
sm2._A = []
cbar = fig.colorbar(sm2, cax=cax2)
cbar.ax.tick_params(labelsize=15)
ax2.set_title('Population density of the states (ppl/sqkm)', fontsize=20)

plt.show()

Here one can see that the population is large in, e.g., NRW, but the density is of course much higher in the city-states (Berlin, Hamburg, Bremen).
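
The density on the right-hand map is population divided by area, with the polygon area converted from square metres to square kilometres. A quick arithmetic check with round, made-up numbers (not real state figures), assuming the area is already given in square metres:

```python
M2_TO_KM2 = 1 / 1_000_000  # same conversion factor as m2tokm2 in the notebook

def density_per_km2(population, area_m2):
    """People per square kilometre from a population and an area in square metres."""
    return population / (area_m2 * M2_TO_KM2)

# Made-up example: 2 million people on 500 km^2 (= 5e8 m^2),
# which should come out at roughly 4000 people per km^2.
print(density_per_km2(2_000_000, 5e8))
```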

# Cases and deaths per state

In [None]:
norm_axis1 = 90e3
norm_axis2 = 1800

def add_case_per_state(state):
    case = covid_de.loc[covid_de['state'] == state ].cases.sum()
    #print(case)
    shape_de[shape_de.state == state].plot(figsize=(10,10), color= cm.Blues(case/norm_axis1), edgecolor='gainsboro', zorder=3, ax =  ax1)

def add_death_per_state(state):
    death = covid_de.loc[covid_de['state'] == state ].deaths.sum()
    #print(death)
    shape_de[shape_de.state == state].plot(figsize=(10,10), color= cm.YlOrRd(death/norm_axis2), edgecolor='gainsboro', zorder=3, ax = ax2)

plt.figure() 

# Create a map
ax1 = plt.axes([0., 0., 1., 2.])
shape_de['geometry'].plot(color='whitesmoke', edgecolor='gainsboro', zorder=3, ax = ax1)
for i in shape_de.state:
    add_case_per_state(i)
ax1.set_title('Cases per state', fontsize=20)

# add colorbar
fig = ax1.get_figure()
cax = fig.add_axes([1.1, 0.0, 0.1, 2.0])
norm = mpl.colors.Normalize(vmin=0,vmax=norm_axis1)
sm = plt.cm.ScalarMappable(norm = norm, cmap='Blues')
sm._A = []
cbar = fig.colorbar(sm, cax=cax , ax=ax1)
cbar.ax.tick_params(labelsize=15)
# Create a second map
ax2 = plt.axes([1.6, 0., 1., 2.])
shape_de['geometry'].plot(figsize=(10,10),color='whitesmoke', edgecolor='gainsboro', zorder=3, ax = ax2)
for i in shape_de.state:
    add_death_per_state(i)
    
# add colorbar
fig2 = ax2.get_figure()
cax2 = fig.add_axes([2.8, 0.0, 0.1, 2.])
norm2 = mpl.colors.Normalize(vmin=0,vmax=norm_axis2)
sm2 = plt.cm.ScalarMappable(norm=norm2,cmap='YlOrRd')
sm2._A = []
cbar = fig.colorbar(sm2, cax=cax2)
cbar.ax.tick_params(labelsize=15)
ax2.set_title('Deaths per state', fontsize=20)

plt.show()




Cases and deaths mainly occur in highly populated states. In the following, one can check the normalisation by population and population density. It is also interesting that the eastern part of Germany is much less affected than the western part.

# Cases normalised to population

In [None]:
norm_axis1 = 0.012
norm_axis2 = 1e9


def add_case_per_pop_state(state):
    case_norm = covid_de.loc[covid_de['state'] == state ].cases.sum() / demo_de[demo_de.state == state].population.sum()
    #print(case_norm)
    shape_de[shape_de.state == state].plot(figsize=(10,10), color= cm.Blues(case_norm/norm_axis1), edgecolor='gainsboro', zorder=3, ax =  ax1)

def get_area(state):   
    return shape_de[shape_de.state == state].geometry.area
    
def add_case_per_dens_state(state):
    case_dens = covid_de.loc[covid_de['state'] == state ].cases.sum() / (demo_de[demo_de.state == state].population.sum()/float(get_area(state)))
    #print(case_dens)
    shape_de[shape_de.state == state].plot(figsize=(10,10), color= cm.Blues(case_dens/norm_axis2), edgecolor='gainsboro', zorder=3, ax =  ax2)

    

plt.figure() 

# Create a map
ax1 = plt.axes([0., 0., 1., 2.])
shape_de['geometry'].plot(color='whitesmoke', edgecolor='gainsboro', zorder=3, ax = ax1)
for i in shape_de.state:
    add_case_per_pop_state(i)
ax1.set_title('Cases per state per population', fontsize=20)

# add colorbar
fig = ax1.get_figure()
cax = fig.add_axes([1.1, 0.0, 0.1, 2.0])
norm = mpl.colors.Normalize(vmin=0,vmax=norm_axis1)
sm = plt.cm.ScalarMappable(norm = norm, cmap='Blues')
sm._A = []
cbar = fig.colorbar(sm, cax=cax , ax=ax1)
cbar.ax.tick_params(labelsize=15)
# Create a second map
ax2 = plt.axes([1.6, 0., 1., 2.])
shape_de['geometry'].plot(figsize=(10,10),color='whitesmoke', edgecolor='gainsboro', zorder=3, ax = ax2)
for i in shape_de.state:
    add_case_per_dens_state(i)
    
# add colorbar
fig2 = ax2.get_figure()
cax2 = fig.add_axes([2.8, 0.0, 0.1, 2.])
norm2 = mpl.colors.Normalize(vmin=0,vmax=norm_axis2)
sm2 = plt.cm.ScalarMappable(norm=norm2,cmap='Blues')
sm2._A = []
cbar = fig.colorbar(sm2, cax=cax2)
cbar.ax.tick_params(labelsize=15)
ax2.set_title('Cases per state per population density', fontsize=20)

plt.show()

When dividing by the population (left side), one can see that mainly the south has many cases. When normalising by the population density of the state (right side), Bavaria is leading. This is probably because dividing by density multiplies the per-capita figure by the state's area, and Bavaria's area is rather large compared to the other states.
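
The effect of the two normalisations can be sketched with made-up numbers. The two states, their case counts, populations and areas below are hypothetical and are not taken from the dataset; the sketch only shows that cases divided by density equals cases per capita times area.

```python
import pandas as pd

# Hypothetical figures for two made-up states (not taken from the dataset)
states = pd.DataFrame(
    {
        "state": ["Large", "Small"],
        "cases": [10_000, 10_000],
        "population": [1_000_000, 1_000_000],
        "area_km2": [70_000.0, 700.0],
    }
).set_index("state")

# Cases per inhabitant: identical for both states
states["cases_per_capita"] = states["cases"] / states["population"]

# Population density (inhabitants per square kilometre)
states["density"] = states["population"] / states["area_km2"]

# Cases divided by density equals cases_per_capita * area, so area dominates
states["cases_per_density"] = states["cases"] / states["density"]

print(states[["cases_per_capita", "cases_per_density"]])
```

Both states have the same per-capita figure, yet the larger one ranks far higher once the figure is divided by density, which is the pattern seen for Bavaria.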

<a id="subsection-three-three"></a>
# 3.3 Impact of state cases on stocks performance
> https://www.listenchampion.de/2019/03/23/die-10-groessten-unternehmen-in-bayern-unsere-liste-2019/

Many companies have set up physical production plants and retail stores in different German states. Because of the state-level lockdowns, these companies might suffer larger losses than others. We will take a closer look at some stocks of companies located in Bavaria (Bayern), a heavily populated state that is going into a second lockdown.

In [None]:

Siemens = yf.download("SIEGY",start = "2019-11-01", end = "2020-11-01")
Siemens = Siemens["Adj Close"]
Siemens = pd.DataFrame(Siemens)
Siemens.columns = ['Adj_Close']

Adidas = yf.download("ADDYY",start = "2019-11-01", end = "2020-11-01")
Adidas = Adidas["Adj Close"]
Adidas = pd.DataFrame(Adidas)
Adidas.columns = ['Adj_Close']

BMW = yf.download("BMW.DE",start = "2019-11-01", end = "2020-11-01")
BMW = BMW["Adj Close"]
BMW = pd.DataFrame(BMW)
BMW.columns = ['Adj_Close']

In [None]:
fig, ax = plt.subplots(3,1, figsize=(15, 15), facecolor='#f7f7f7')


fig.subplots_adjust(top=0.92)
fig.suptitle('Companies in Bayern', fontsize=18)
fig.tight_layout(pad=8.0)

Siemens.Adj_Close.plot(kind='line', ax=ax[0], color='red')
Adidas.Adj_Close.plot(kind='line', ax=ax[1], color='red')
BMW.Adj_Close.plot(kind='line',ax=ax[2], color ='red')



ax[0].set_title('Siemens', fontsize=14)
ax[1].set_title('Adidas', fontsize=14)
ax[2].set_title('BMW', fontsize=14)



plt.show()

> All three companies suffered a dip in prices during March, followed by a slow climb back. In November, during the second wave, prices continue to drop. We believe there is a negative correlation between growth in new confirmed cases and stock prices. Therefore, we will try to avoid stocks with physical production in German states in our portfolio selection.

<a id="subsection-three-four"></a>
# 3.4 Predicting future Covid-19 patterns

In [None]:
def plot_germany(df,date): 
    
    df['recent_wave'] = np.where(df['Date'] >= date,1,0)
    fig = px.line(df, x='Date', y='Total Confirmed', color='recent_wave',\
                  title = 'Infections', height=600)      
    fig.show()
    
    fig = px.line(df, x='Date', y='Total Recovered', color='recent_wave', \
              title = 'Recovered Patients ', height=600)      
    fig.show()
    

In [None]:
plot_germany(germanycountry,'2020-10-01')

We will use the month of October as our recent wave and fit an SIR model to it.
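
For reference, the standard SIR equations and the one-day discretisation that the estimation function below appears to rely on can be written as follows. Here $S$, $I$, $R$ and $N$ are the susceptible, infectious, removed and total populations, with $R$ assumed to include deaths as in the code; the regressions are run without an intercept.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta\,\frac{S I}{N}, &
\frac{dI}{dt} &= \beta\,\frac{S I}{N} - \gamma I, &
\frac{dR}{dt} &= \gamma I, \\
\Delta S_t &\approx -\beta\,\frac{S_t I_t}{N}, &
\Delta R_t &\approx \gamma\, I_t, &
R_0 &= \frac{\beta}{\gamma}.
\end{aligned}
```

Regressing $\Delta S_t$ on $S_t I_t / N$ therefore yields a slope of $-\beta$, which is why the function flips the sign of the first coefficient, and regressing $\Delta R_t$ on $I_t$ yields $\gamma$.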

In [None]:
import statsmodels.api as sm
from statsmodels.formula.api import ols
import statsmodels.graphics.api as smg
from datetime import datetime, date, timedelta

In [None]:
def estimate_sir_param_germany(df,date):
    
    # Assume everyone is at risk
    # Identify the maximum population and the latest date in the time series for the country
    population = pd.read_csv('../input/covid19-tracking-germany/demographics_de.csv')
    population['Total population']  = population['population'].cumsum()
    population = population['Total population'].max()
    
    #latest_date = datetime.strptime(df["Date"].max(), '%Y-%m-%d')
    latest_date = df["Date"].max()
    time_series_length = (latest_date - datetime.strptime(date,'%Y-%m-%d')).days + 1

    
    df['recent_wave'] = np.where(df['Date'] >= date,1,0)
    
    # Initialize Numpy arrays for total population (the maximum population), 
    # susceptible population (empty), and change in time (i.e., 1 day)
    N  = np.array([population] * time_series_length)
    S  = np.array([])
    dt = np.array([1] * (time_series_length-1))

    # Apply the condition N = S+I+(R+D)
    # Filter time-series to those of the recent wave
    I = np.array(df[df['recent_wave']==1]['Total Active'])
    R = np.array(df[df['recent_wave']==1]['Total Recovered'])
    D = np.array(df[df['recent_wave']==1]['Total Deaths'])

    # R includes both Recovered and Death for brevity
    S = N - I - (R + D)

    ## 1. Estimate beta
    
    x = (S * I) / N
    
    # Copy all elements except the last
    x = x[:-1].copy()
    
    # Take the first difference
    dS = np.diff(S)
    y = dS/dt

    # Fit into a linear regression
    results = sm.OLS(y, x, missing='drop').fit()
    beta = results.params
    print(results.summary())
    print('\n')
    print('*'*80)
    print(f"Transmission rate or Beta is: {beta}")
    print('*'*80)
    
    ## 2. Estimate gamma
    
    x = I[:-1].copy()
    dR = np.diff(R+D)
    y = dR/dt

    results = sm.OLS(endog=y, exog=x, missing='drop').fit()
    gamma = results.params
    print (results.summary())
    print('\n')
    print('*'*80)
    print(f"Recovery (and Mortality) rate or Gamma is: {gamma}")
    print('*'*80)
    
    #3. Calculate R

    print('\n')
    print('*'*80)
    print(f"Reproduction number or R is: {-beta/gamma}")
    print('*'*80)
    
    
    return -beta.astype('float'), gamma.astype('float'), datetime.strptime(date,'%Y-%m-%d').date()

In [None]:
estimate_sir_param_germany(germanycountry,'2020-10-01')

In [None]:
def sir_model_betalist(I0 = 0.01, betalist = [0.5,0.8], gammalist = [0.15,0.25,0.5], days = 365):
    """
    Function takes Initial Infected Population(I0), list of transmission rates (betalist)
    and list of recovery rates(gammalist) as arguments.
    Plots Infectious population and Infected Population vs time for input parameters
    """
    
    for gamma in gammalist:
        
        # A. Plot Infectious Population
        plt.figure(figsize=(10,6))
        sns.set(style="darkgrid")
        plt.title("SIR Model: Infectious Population", fontsize=18)
        
        # Initialize model parameters
        for beta in betalist:
            N=1
            I=I0
            S=N-I
            gamma=gamma
            R=beta/gamma
            
            # Initialize empty lists
            inf=[]
            day=[]
            
            # Project into the future
            for i in range(days):
                day.append(i)
                inf.append(I)
                new_inf= I*S*beta
                new_rec= I*gamma
                I=I+new_inf-new_rec
                S=S-new_inf
            
            # Create plot objects by gamma and beta
            inf_max=round(np.array(inf).max()*100,1)
            sns.lineplot(x=day, y=inf, label=f"Beta: {beta} Gamma: {gamma} R0: {round(R,2)} Peak: {inf_max}%")
            plt.legend()
            
        # Show all plots objects
        plt.show()
        
        # B. Plot Total Infected Population
        plt.figure(figsize=(10,6))
        plt.title("SIR Model: Total Confirmed Cases", fontsize=18)       
        
        # Initialize model parameters
        for beta in betalist:
            N=1
            I=I0
            S=N-I
            C=I
            gamma=gamma
            R=beta/gamma
            
            # Initialize empty lists
            day=[]
            conf=[]

            # Project into the future            
            for i in range(days):
                day.append(i)
                conf.append(C)

                new_inf= I*S*beta
                new_rec= I*gamma
                I=I+new_inf-new_rec
                S=S-new_inf
                C=C+new_inf

            # Create plot objects by gamma and beta
            conf_max=round(np.array(conf).max()*100,1)
            sns.lineplot(x=day, y=conf, label=f"Beta: {beta} Gamma: {gamma} R0: {round(R,2)} Total :{conf_max}%")
            plt.legend()
            
        # Show all plots objects            
        plt.show()

In [None]:
sir_model_betalist(I0=0.034,betalist=[0.11294725], gammalist=[0.0447071])

In [None]:
def sir_model(I0=0.01, beta=0.6, gamma=0.1, days=365, date=date.today()):
    """
    Function will take in initial state for infected population,
    transmission rate (beta) and recovery rate (gamma) as input.
    
    The function prints the maximum percentage of infectious population,
    the number of days to reach the maximum (inflection point),
    the maximum percentage of population infected,
    and the number of days to reach 80% of the maximum percentage of population infected.
    It then plots the susceptible, infectious and recovered fractions over time.
    
    """
    ## Initialize model parameters
    N = 1          #Total population in percentage, i.e., 1 = 100%
    I = I0         #Initial state of I default value 1% of population, i.e., I0 = 0.01
    S = N - I      #Initial state of S
    R = 0          #Initial State of R
    C = I          #Initial State of Total Cases
    beta  = beta   #Transmission Rate
    gamma = gamma  #Recovery Rate

    ## Initialize empty lists
    inf  = []       # List of Infectious population for each day
    day  = []       # Time period in day
    suc  = []       # List of Susceptible population for each day
    rec  = []       # List of Recovered population for each day
    conf = []       # List of Total Cases population for each day
    
    ## Project into the future
    for i in range(days):
        day.append(i)
        inf.append(I)
        suc.append(S)
        rec.append(R)
        conf.append(C)

        new_inf= I*S*beta/N            #New infections equation (1)   
        new_rec= I*gamma               #New Recoveries equation (2)
        
        I=I+new_inf-new_rec            #Total infectious population for next day
        S=max(min(S - new_inf, N), 0)  #Total susceptible population for next day
        R=min(R + new_rec, N)          #Total recovered population for next day
        
        C=C+new_inf                    #Total confirmed cases for next day

    ## Pinpoint important milestones    
    max_inf = round(np.array(inf).max()*100,2)        #Peak infectious population in percentage
    inflection_day = inf.index(np.array(inf).max())   #Peak infectious population in days
    max_conf = round(np.array(conf).max()*100,2)      #Overall infected population in percentage
    plateau_day = np.array(np.where(np.array(conf) >= 0.8*np.array(conf).max())).min()   #Days to reach 80% of total confirmed cases
        
    print(f"Maximum Infectious population at a time :{max_inf}%")
    print(f"Number of Days to Reach Maximum Infectious Population (Inflection Point):{inflection_day} days or {date + timedelta(days=inflection_day)}")
    print(f"Total Infected population :{max_conf}%")
    print(f"Number of Days to Reach 80% of the Projected Confirmed Cases (Plateau Point):{plateau_day} days or {date + timedelta(days=plateau_day.item())}")

    ## Visualize the model outputs
    sns.set(style="darkgrid")
    plt.figure(figsize=(10,6))
    plt.title(f"SIR Model: R = {round(beta/gamma,2)}", fontsize=18)
    sns.lineplot(x=day, y=inf, label="Infectious")
    sns.lineplot(x=day, y=suc, label="Susceptible")
    sns.lineplot(x=day, y=rec, label="Recovered")
    
    plt.legend()
    plt.xlabel("Time (in days)")
    plt.ylabel("Fraction of Population")
    plt.show()

In [None]:
sir_model(I0=0.034,beta=0.11294725, gamma=0.0447071, days=730)

> This is an extremely high share of infected population, and it rests on the assumption that intervention is unsuccessful or fails.
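
To illustrate how much the projection depends on that assumption, the sketch below reruns the same discrete update with the transmission rate halved. The function `simulate_sir` and the halving factor are hypothetical and chosen only for illustration; they are not an estimate of any real intervention.

```python
def simulate_sir(i0, beta, gamma, days):
    """Iterate the discrete SIR update on population fractions.

    Returns (peak infectious fraction, day of the peak, total infected fraction).
    """
    s, i, c = 1.0 - i0, i0, i0
    peak, peak_day = i, 0
    for day in range(1, days + 1):
        new_inf = beta * s * i   # new infections on this day
        new_rec = gamma * i      # new recoveries (and deaths) on this day
        s = max(s - new_inf, 0.0)
        i = i + new_inf - new_rec
        c = c + new_inf
        if i > peak:
            peak, peak_day = i, day
    return peak, peak_day, c


# Parameters used in the cell above, and a hypothetical scenario with beta halved
base = simulate_sir(0.034, 0.11294725, 0.0447071, 730)
halved = simulate_sir(0.034, 0.11294725 / 2, 0.0447071, 730)

print(f"No intervention: peak {base[0]:.1%} on day {base[1]}, total infected {base[2]:.1%}")
print(f"Beta halved:     peak {halved[0]:.1%} on day {halved[1]}, total infected {halved[2]:.1%}")
```

Returning the values instead of printing them makes it easy to compare scenarios side by side.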

<a id="section-four"></a>
# 4. Stock analysis

In [None]:
germanystock = pd.DataFrame()
germanystock = yf.download("EWG",start = "2019-11-01", end = "2020-11-01")
germanystock = germanystock["Adj Close"]
germanystock = pd.DataFrame(germanystock)
germanystock.columns = ['Adj Close']

plt.plot(germanystock["Adj Close"],label = "EWG")
plt.title("Portfolio Adj.Close Price History")
plt.xlabel('Date')
plt.ylabel('Adj. Price ')
plt.legend(germanystock.columns.values,loc = 'upper left')
plt.show()
print('Volatility: ' + str(germanystock["Adj Close"].std()))

EWG suffered a huge loss in March after the spike in Covid-19 cases. We expect a large drop in November as well, due to the second spike in cases in Germany. As a result, we hope to diversify into four different areas.

1. Countries with similar correlation, but no spike in cases
2. Stocks/Funds/Bonds with low correlation with EWG
3. Stocks/Funds/Bonds with high negative correlation
4. Stocks/Funds/Bonds from Germany that are doing well despite the spike in Covid-19 cases
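
One way to screen candidates against the criteria above is to compute each candidate's return correlation with the benchmark and bucket the result. The sketch below uses synthetic returns rather than market data, and the thresholds in `classify` are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_days = 250

# Synthetic daily returns (illustrative only, not market data)
benchmark = rng.normal(0.0, 0.01, n_days)
returns = pd.DataFrame(
    {
        "benchmark": benchmark,
        "tracker": benchmark + rng.normal(0.0, 0.002, n_days),       # moves with the benchmark
        "inverse": -2 * benchmark + rng.normal(0.0, 0.002, n_days),  # moves against it
        "unrelated": rng.normal(0.0, 0.01, n_days),                  # independent noise
    }
)

# Pearson correlation of every candidate with the benchmark
corr_with_benchmark = returns.corr()["benchmark"].drop("benchmark")


def classify(rho, low=0.3, high=0.7):
    """Bucket a correlation coefficient using hypothetical thresholds."""
    if rho <= -high:
        return "strong negative (hedge)"
    if abs(rho) < low:
        return "low (diversifier)"
    if rho >= high:
        return "strong positive"
    return "moderate"


screen = corr_with_benchmark.apply(classify)
print(pd.DataFrame({"correlation": corr_with_benchmark.round(2), "bucket": screen}))
```

With real data, `returns` would hold the daily percentage changes of EWG and the candidate tickers over a common date range.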

<a id="subsection-four-one"></a>
# 4.1 Stock selection


1. 3690.HK (Meituan): an investment holding company that provides an e-commerce platform using technology to connect consumers and merchants
2. ZM (Zoom): e-conferencing stock that is used worldwide and is doing well, especially during the Covid-19 crisis
3. APX.AX (Appen Limited): together with its subsidiaries, provides data solutions and services for machine learning and artificial intelligence applications
4. BNTX (BioNTech): German biotech stock working on a Covid-19 vaccine; news on vaccine development is likely to drive prices even higher
5. DSD.PA (Lyxor Daily ShortDAX x2 UCITS ETF): the ShortDAX x2 index is linked inversely to the performance of the DAX index
6. DWNI.DE (Deutsche Wohnen SE): a German property company; housing prices and rents remain one of the most resilient markets in Germany
7. IEF (iShares 7-10 Year Treasury Bond ETF): the fund generally invests at least 90% of its assets in the bonds of the underlying index and at least 95% of its assets in U.S. government bonds; a safe protection from negative runs
8. LEG.DE (LEG Immobilien AG): a German property company, similar to DWNI
9. SPTI (SPDR Portfolio Intermediate Term Treasury ETF): its index is designed to measure the performance of intermediate-term (3-10 years) public obligations of the U.S. Treasury
10. 0700.HK (Tencent): Chinese multinational technology conglomerate holding company


<a id="subsection-four-one-one"></a>
# 4.1.1 Bear ETF and Treasury ETF

1. Lyxor Daily ShortDAX x2 UCITS ETF Acc
2. SPDR Portfolio Intermediate Term Treasury ETF
3. iShares 7-10 Year Treasury Bond ETF


In [None]:

SPTI = yf.download("SPTI",start = "2019-11-01", end = "2020-11-01")
SPTI = SPTI["Adj Close"]
SPTI = pd.DataFrame(SPTI)
SPTI.columns = ['Adj_Close']

Short_Dax = yf.download("DSD.PA",start = "2019-11-01", end = "2020-11-01")
Short_Dax = Short_Dax["Adj Close"]
Short_Dax = pd.DataFrame(Short_Dax)
Short_Dax.columns = ['Adj_Close']

IEF = yf.download("IEF",start = "2019-11-01", end = "2020-11-01")
IEF = IEF["Adj Close"]
IEF = pd.DataFrame(IEF)
IEF.columns = ['Adj_Close']

In [None]:
fig, ax = plt.subplots(3,1, figsize=(15, 15), facecolor='#f7f7f7')


fig.subplots_adjust(top=0.92)
fig.suptitle('Safety Stocks', fontsize=18)
fig.tight_layout(pad=8.0)

Short_Dax.Adj_Close.plot(kind='line', ax=ax[0], color='gold')
SPTI.Adj_Close.plot(kind='line', ax=ax[1], color='green')
IEF.Adj_Close.plot(kind='line',ax=ax[2], color ='red')



ax[0].set_title('Short_DAX', fontsize=14)
ax[1].set_title('SPTI', fontsize=14)
ax[2].set_title('IEF', fontsize=14)



plt.show()


These three ETFs experienced sharp growth in March 2020. While Short_DAX has since returned to its normal range, a sharp spike in cases should result in sharp growth again in November/December 2020.

<a id="subsection-four-one-two"></a>
# 4.1.2 Technology stocks

1. Zoom
2. Tencent
3. Appen


In [None]:
Tencent = yf.download("0700.HK",start = "2019-11-01", end = "2020-11-01")
Tencent = Tencent["Adj Close"]
Tencent = pd.DataFrame(Tencent)
Tencent.columns = ['Adj_Close']

APX = yf.download("APX.AX",start = "2019-11-01", end = "2020-11-01")
APX = APX["Adj Close"]
APX = pd.DataFrame(APX)
APX.columns = ['Adj_Close']

Zoom = yf.download("ZM",start = "2019-11-01", end = "2020-11-01")
Zoom = Zoom["Adj Close"]
Zoom = pd.DataFrame(Zoom)
Zoom.columns = ['Adj_Close']


In [None]:
fig, ax = plt.subplots(3,1, figsize=(15, 15), facecolor='#f7f7f7')


fig.subplots_adjust(top=0.92)
fig.suptitle('Technology Stocks', fontsize=18)
fig.tight_layout(pad=8.0)

Tencent.Adj_Close.plot(kind='line', ax=ax[0], color='gold')
APX.Adj_Close.plot(kind='line', ax=ax[1], color='green')
Zoom.Adj_Close.plot(kind='line',ax=ax[2], color ='red')



ax[0].set_title('Tencent', fontsize=14)
ax[1].set_title('Appen', fontsize=14)
ax[2].set_title('Zoom', fontsize=14)




plt.show()

These three technology stocks have shown steady growth from March onwards. Under our assumption of further lockdowns in the next year, reliance on online communication will be higher than before. Appen is also positively correlated with the German market index, but since it is based in Australia, a country relatively unaffected by Covid-19, we expect a smaller dip in future when cases rise.

<a id="subsection-four-one-three"></a>
# 4.1.3 Consumer Cyclical stocks

1. Meituan


In [None]:
Meituan = yf.download("3690.HK",start = "2019-11-01", end = "2020-11-01")
Meituan = Meituan["Adj Close"]
Meituan = pd.DataFrame(Meituan)
Meituan.columns = ['Adj_Close']

In [None]:
fig, ax = plt.subplots(figsize=(15, 7), facecolor='#f7f7f7')
fig.subplots_adjust(top=0.92)
fig.suptitle('Consumer Cyclical Stock', fontsize=18)
fig.tight_layout(pad=8.0)
Meituan.Adj_Close.plot(kind='line', color='gold')
ax.set_title('Meituan', fontsize=14)
plt.show()

> China has managed the Covid-19 situation better than most countries and is already preparing to open up for tourism and travel. Meituan provides services between companies and consumers, and is expected to grow as China's economy reopens.

<a id="subsection-four-one-four"></a>
# 4.1.4 Real estate stocks

1. Deutsche Wohnen
2. LEG Immobilien AG

In [None]:

DWNI = yf.download("DWNI.DE",start = "2019-11-01", end = "2020-11-01")
DWNI = DWNI["Adj Close"]
DWNI = pd.DataFrame(DWNI)
DWNI.columns = ['Adj_Close']

LEG = yf.download("LEG.DE",start = "2019-11-01", end = "2020-11-01")
LEG = LEG["Adj Close"]
LEG = pd.DataFrame(LEG)
LEG.columns = ['Adj_Close']



In [None]:
fig, ax = plt.subplots(2,1, figsize=(15, 15), facecolor='#f7f7f7')


fig.subplots_adjust(top=0.92)
fig.suptitle('Real Estate Stocks', fontsize=18)
fig.tight_layout(pad=8.0)

DWNI.Adj_Close.plot(kind='line', ax=ax[0], color='gold')
LEG.Adj_Close.plot(kind='line', ax=ax[1], color='green')



ax[0].set_title('DWNI', fontsize=14)
ax[1].set_title('LEG', fontsize=14)




plt.show()

These two stocks have a high correlation with the German stock market. LEG and DWNI are both real estate stocks. In March 2020, they suffered a smaller dip in prices than EWG and have since recovered to new highs. In addition, several sources show the resilient nature of real estate stocks in a downturn, as sales and rental prices continue to remain stable.
https://www.globalpropertyguide.com/Europe/Germany/Price-History
https://www.globalpropertyguide.com/news-germanys-house-price-rises-continue-to-accelerate-4096
https://www.orrick.com/en/Insights/2020/08/Investments-in-Germany-under-COVID-19-Turning-a-Crisis-into-Opportunities

<a id="subsection-four-one-five"></a>
# 4.1.5 Healthcare stocks

1. BioNTech


In [None]:
Biotech = yf.download("BNTX",start = "2019-11-01", end = "2020-11-01")
Biotech = Biotech["Adj Close"]
Biotech = pd.DataFrame(Biotech)
Biotech.columns = ['Adj_Close']

In [None]:
fig, ax = plt.subplots(figsize=(15, 7), facecolor='#f7f7f7')
fig.subplots_adjust(top=0.92)
fig.suptitle('Healthcare Stocks', fontsize=18)
fig.tight_layout(pad=8.0)
Biotech.Adj_Close.plot(kind='line', color='gold')
ax.set_title('BNTX', fontsize=14)
plt.show()

> BioNTech is one of the leading healthcare companies developing vaccines for Covid-19 and has partnered with Pfizer on a vaccine candidate reported to be 90% effective. Its stock price is expected to rise sharply once the vaccine is fully developed.
https://www.bbc.com/pidgin/tori-54884656
https://www.pfizer.com/news/press-release/press-release-detail/pfizer-and-biontech-announce-vaccine-candidate-against
https://markteinblicke.de/157988/2020/09/biontech-erreicht-neuen-meilenstein/
https://www.fool.com/investing/2020/10/20/3-critical-near-term-milestones-to-watch-with-pfiz/
https://www.clinicaltrialsarena.com/news/pfizer-covid-vaccine-early-data/

In [None]:
symbols_list = ["BNTX","EURUSD=X"]
start = datetime(2019,11,1)
end = datetime(2020,11,9)
data = yf.download(symbols_list, start=start, end=end)
df = data['Adj Close']
df =df.pct_change()[1:]

In [None]:
fig, ax = plt.subplots(figsize=(18, 10))
ax.plot(df.index,df['BNTX'],'bo',)
ax.set_title('Vaccine Milestones')
ax.set_ylabel('Returns')

ax.annotate("90% effectiveness",
            xy=(pd.Timestamp('2020-11-9'), -0.01),\
            xytext=(pd.Timestamp('2020-10-01'), -0.3),
            arrowprops={'arrowstyle': '->', 'lw': 1, 'color': 'black'},
            va='center')

ax.annotate("Testing in Germany authorized",
            xy=(pd.Timestamp('2020-9-7'), 0.0251513),\
            xytext=(pd.Timestamp('2020-9-01'), 0.3),
            arrowprops={'arrowstyle': '->', 'lw': 1, 'color': 'black'},
            va='center')

ax.annotate("Data from Phase I/II",
            xy=(pd.Timestamp('2020-8-13'), 0.027348),\
            xytext=(pd.Timestamp('2020-8-01'), 0.6),
            arrowprops={'arrowstyle': '->', 'lw': 1, 'color': 'black'},
            va='center')

ax.annotate("Phase III clinical trial",
            xy=(pd.Timestamp('2020-7-27'), 0.028039),\
            xytext=(pd.Timestamp('2020-7-01'), -0.3),
            arrowprops={'arrowstyle': '->', 'lw': 1, 'color': 'black'},
            va='center')

ax.annotate("Announcement of Pfizer and BionTech collaboration",
            xy=(pd.Timestamp('2020-3-17'), 0.665),\
            xytext=(pd.Timestamp('2020-1-01'), 0.45),
            arrowprops={'arrowstyle': '->', 'lw': 1, 'color': 'black'},
            va='center')


> The scatter plot shows the daily returns of BNTX from November 2019 to November 2020. Key dates such as the Pfizer collaboration and the testing phases are annotated in the plot above, and the news of 90% effectiveness is expected to drive the price higher.

<a id="section-five"></a>
# 5. Portfolio optimization

<a id="subsection-five-one"></a>
# 5.1 Stock correlation

In [None]:
# Plot heatmap of the relationships across different sectors
comparisons = ["0700.HK", "3690.HK","ZM","BNTX","DSD.PA","IEF","SPTI","DWNI.DE","LEG.DE","APX.AX","EWG"]
stockStartDate= '2000-01-01'
stockStartDate = datetime.strptime(stockStartDate,'%Y-%m-%d')
today = datetime.today().strftime('%Y-%m-%d')
df = yf.download(comparisons, start=stockStartDate, end=today)
df = df['Adj Close']
baseline_corr = df[["0700.HK", "3690.HK","ZM","BNTX","DSD.PA","IEF","SPTI","DWNI.DE","LEG.DE","APX.AX","EWG"]].dropna().corr() # dropna() means drop the missing value
# light color: strong correlation
# dark: negative correlation
# mirror image again
fig, ax = plt.subplots(figsize=(20,10)) 
sns.heatmap(baseline_corr, annot=True, ax = ax)
#df.tail()

> The correlation is positive for most pairs. The exception is the short index ETF, which is designed to hedge against a fall in the DAX, the index of the largest listed companies in Germany.

In [None]:
assets = ["0700.HK", "3690.HK","ZM","BNTX","DSD.PA","IEF","SPTI","DWNI.DE","LEG.DE","APX.AX"]

stockStartDate= '2000-01-01'
stockStartDate = datetime.strptime(stockStartDate,'%Y-%m-%d')
today = datetime.today().strftime('%Y-%m-%d')
df = pd.DataFrame()

df = yf.download(assets, start=stockStartDate, end=today)
df = df['Adj Close']
df = df.drop(df.index[len(df) -1])
#print(df)

#for stock in assets:
    #df[stock]= web.DataReader(stock,data_source='yahoo',start = stockStartDate, end = today )['Adj Close']
#print(df)


my_stocks = df
for c in my_stocks.columns.values:
    plt.plot(my_stocks[c],label = c)
plt.title("Portfolio Adj.Close Price History")
plt.xlabel('Date')
plt.ylabel('Adj. Price ')
plt.legend(my_stocks.columns.values,loc = 'upper left')
plt.show()

e_r = df.resample('Y').last().pct_change().mean()
#print(e_r)
cov_matrix = df.pct_change().apply(lambda x: np.log(1+x)).cov()
#print(cov_matrix)
sd = df.pct_change().apply(lambda x: np.log(1+x)).std().apply(lambda x: x*np.sqrt(250))
#print(sd)
assets = pd.concat([e_r,sd],axis = 1)
assets.columns = ['Returns','Volatility']
#print(assets)

<a id="subsection-five-two"></a>
# 5.2 Simulation of portfolio performance

In [None]:
p_ret = []
p_vol = []
p_weights = []

num_assets = len(assets)
num_portfolios = 10000


In [None]:
np.random.seed(42)
for portfolio in range(num_portfolios):
    weights = np.random.random(num_assets)
    weights = weights/np.sum(weights)
    p_weights.append(weights)
    
    returns = np.dot(weights, e_r)
    p_ret.append(returns)
    
    var = cov_matrix.mul(weights,axis=0).mul(weights,axis = 1).sum().sum()
    sd = np.sqrt(var)
    ann_sd = sd*np.sqrt(250)
    p_vol.append(ann_sd)
    

In [None]:
data = {'Returns':p_ret,'Volatility':p_vol}
for counter,symbol in enumerate(df.columns.tolist()):
    data[symbol+ ' weightage'] = [w[counter] for w in p_weights]
portfolios = pd.DataFrame(data)
portfolios['Sharpe Ratio']= portfolios['Returns']/portfolios['Volatility']
#print(portfolios.head())

In [None]:
plt.figure(figsize = (10,4))
plt.scatter(portfolios['Volatility'],portfolios['Returns'], c = portfolios['Returns']/portfolios['Volatility'])
plt.xlabel('Volatility')
plt.ylabel('Returns')
plt.colorbar(label= 'Sharpe Ratio')


> Using 10,000 simulated portfolios, we created an efficient frontier of portfolio return versus portfolio volatility and used it to find the most suitable portfolio for investors.
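
Each simulated portfolio needs a return and a variance. The sketch below checks, on synthetic data, that the element-wise expression used in the simulation loop equals the usual matrix form of the portfolio variance. The three assets and their returns are made up, and the annual return here is assumed to be the daily mean times 250 rather than the yearly resampling used in the notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic daily log returns for three hypothetical assets
log_returns = pd.DataFrame(
    rng.normal(0.0005, 0.01, size=(500, 3)), columns=["A", "B", "C"]
)
cov_matrix = log_returns.cov()
mean_annual_return = log_returns.mean() * 250  # assumption: 250 trading days

# One random long-only portfolio whose weights sum to 1
weights = rng.random(3)
weights = weights / weights.sum()

# Matrix form of the portfolio variance: w^T Sigma w
var_matrix = weights @ cov_matrix.values @ weights

# Element-wise form, as in the simulation loop
var_elementwise = cov_matrix.mul(weights, axis=0).mul(weights, axis=1).sum().sum()

annual_vol = np.sqrt(var_matrix) * np.sqrt(250)
portfolio_return = float(np.dot(weights, mean_annual_return))

print(f"Variance (matrix form):       {var_matrix:.8f}")
print(f"Variance (element-wise form): {var_elementwise:.8f}")
print(f"Annualised volatility: {annual_vol:.4f}, expected return: {portfolio_return:.4f}")
```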

In [None]:
min_variance = portfolios.iloc[portfolios['Volatility'].idxmin()]
print(min_variance)

In [None]:
optimal_port = portfolios.iloc[portfolios['Sharpe Ratio'].idxmax()]
print(optimal_port)
optimal_port = pd.DataFrame(optimal_port)
optimal_port.columns = ['Values']

weightage = optimal_port['Values'].tolist()

index = [0,1,12]
for x in sorted(index,reverse = True):
    del weightage[x]
#print(weightage)

> We decided to maximize the Sharpe ratio of the portfolio, so that investors receive the best return per unit of risk.
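
For reference, the Sharpe ratio is usually defined with a risk-free rate $R_f$. The `Sharpe Ratio` column computed above divides returns by volatility directly, which corresponds to assuming $R_f = 0$:

```latex
\text{Sharpe ratio} = \frac{E[R_p] - R_f}{\sigma_p},
\qquad
R_f = 0 \;\Rightarrow\; \text{Sharpe ratio} = \frac{E[R_p]}{\sigma_p}
```

Here $E[R_p]$ is the expected annual portfolio return and $\sigma_p$ its annualised volatility.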

In [None]:
port_return = pd.DataFrame()

returns = assets["Returns"].to_list()
#print(returns)
total_return = [a * b for a,b in zip(weightage,returns)]
total_return = sum(total_return)

print(total_return)

> The backtest confirms that the chosen weightage would have generated the return printed above for investors, based on each asset's mean annual return.

<a id="subsection-five-three"></a>
# 5.3 Portfolio allocation

In [None]:
# Take the labels from the price table so that they follow the same column order as the weights
labels = my_stocks.columns.tolist()
#fig, ax = plt.subplots()
#fig = plt.figure(figsize=(20,20))
#ax.pie(data, explode=explode, labels=labels, autopct='%1.1f%%',
        #shadow=True, startangle=90)
#ax.axis('equal')

#plt.show()

portfolio_weights = DataFrame(dict( Weightage = weightage, tickets = labels)).reset_index()
#print(portfolio_weights)
portfolio_weights.plot(kind='pie',
                            figsize=(15, 15),
                            autopct='%1.1f%%', 
                            startangle=90,    
                            shadow=True,       
                            labels=None,                 # turn off labels on pie chart
                            pctdistance=1.12,            # the ratio between the pie center and start of text label
                            y ='Weightage')
plt.legend(labels = portfolio_weights.tickets)
plt.show()

> The portfolio allocates the largest weights to Appen (21.1%), LEG Immobilien AG (20.2%) and the SPDR Portfolio Intermediate Term Treasury ETF (18.0%). To remain active in Germany, we will sell only a percentage of our previous portfolio to buy into this new portfolio of 10 holdings.

In [None]:
port_return = pd.DataFrame()
returns = my_stocks.pct_change()

for i in range(10):
    port_return[i] = returns.iloc[:,i]*weightage[i]

#print(port_return)
port_return = port_return.dropna().sum(axis=1)
#print(port_return)

return_acc = 100
list_return = []
for i,j in enumerate(port_return.dropna()):
    return_acc = return_acc * (1 + port_return.dropna().iloc[i])
    list_return.append(return_acc)
    
portfolio_index = pd.DataFrame(data = list_return, index = port_return.dropna().index)
portfolio_index.columns = ['Portfolio Index']



stockStartDate= '2000-01-01'
stockStartDate = datetime.strptime(stockStartDate,'%Y-%m-%d')
today = datetime.today().strftime('%Y-%m-%d')
df = pd.DataFrame()


EWG = yf.download('EWG', start=stockStartDate, end=today)
EWG = EWG['Adj Close'].pct_change()[1:]
EWG = pd.DataFrame(EWG).rename(columns={'Adj Close':'EWG'})


port_return = port_return.dropna()
port_return = pd.DataFrame(port_return)


final_port = port_return.merge(EWG, on='Date',how='inner')
final_port = final_port.rename(columns={0:'Portfolio'})


# Blend the new portfolio (20%) with the existing EWG holding (75%); 5% stays in cash
final_port_return = pd.DataFrame()
weights_2 = [0.20,0.75]
for i in range(2):
    # Store each weighted component in its own column so that neither is overwritten
    final_port_return[i] = final_port.iloc[:,i]*weights_2[i]


final_port_return = final_port_return.dropna().sum(axis=1)


# Accumulate the blended daily returns into an index starting at 100
return_acc = 100
list_return = []
for daily_return in final_port_return:
    return_acc = return_acc * (1 + daily_return)
    list_return.append(return_acc)
    
final_portfolio_index = pd.DataFrame(data = list_return, index = final_port_return.index)
final_portfolio_index.columns = ['Portfolio Index']


fig = px.line(final_portfolio_index, x = final_portfolio_index.index, y='Portfolio Index')
fig.show()

> While the graph seems to show returns of 150% and more, it is important to note that these are due to extraordinary circumstances and are not likely to last for a long time. The portfolio selected is aimed at generating profits in the short term, while cases in Germany continue to rise and lockdowns remain necessary. After the second wave passes, the portfolio must be reviewed and readjusted for better returns and risk management.

> The weightage adds up to only 95%, as 5% will be kept as cash. This will be used to take advantage of new opportunities once the second wave dies down.
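
The blending of the two return series and the 5% cash position can be sketched as follows. The three daily returns are hypothetical, and cash is assumed to earn a zero return.

```python
import pandas as pd

# Hypothetical daily returns for the new portfolio and the existing EWG holding
daily = pd.DataFrame(
    {"Portfolio": [0.010, -0.005, 0.020], "EWG": [-0.020, 0.010, 0.000]},
    index=pd.to_datetime(["2020-11-02", "2020-11-03", "2020-11-04"]),
)

weights = pd.Series({"Portfolio": 0.20, "EWG": 0.75})
cash_weight = 1.0 - weights.sum()  # the remaining share is held as cash

# Weighted sum per day; cash is assumed to contribute a zero return
blended = daily.mul(weights, axis=1).sum(axis=1)

# Compound the blended returns into an index that starts from 100
blended_index = 100 * (1 + blended).cumprod()

print(f"Cash weight: {cash_weight:.2f}")
print(blended_index)
```

Using `cumprod` gives the same result as accumulating the returns in a loop, in a single vectorised step.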

<a id="subsection-five-four"></a>
# 5.4 Portfolio returns relationship to new cases

In [None]:
baseline2020 = port_return[port_return.index >= '2020-01-01']


baseline2020 = pd.merge(baseline2020,germanycountry, how='left', on='Date')
baseline2020['New Cases'] = baseline2020['New Cases'].fillna(0)
baseline2020 = baseline2020.rename(columns={0:'Returns'})


sns.jointplot(x = 'New Cases', y = 'Returns', data = baseline2020, kind='reg')

> There is little to no correlation between new cases and the returns of our portfolio, so we conclude that the portfolio should be able to perform regardless of the number of cases in Germany.
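
The visual impression from the joint plot can be backed by a number. The sketch below computes the Pearson correlation and the least-squares line on a synthetic table with the same column names; the values are random stand-ins, not real case or price data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_days = 200

# Synthetic stand-ins for the merged table (not real case or price data)
sample = pd.DataFrame(
    {
        "New Cases": rng.integers(0, 20_000, n_days),
        "Returns": rng.normal(0.001, 0.01, n_days),
    }
)

# Pearson correlation coefficient between the two columns
rho = sample["New Cases"].corr(sample["Returns"])

# Slope and intercept of the least-squares line, the same kind of fit that kind='reg' draws
slope, intercept = np.polyfit(sample["New Cases"], sample["Returns"], deg=1)

print(f"Pearson correlation: {rho:.3f}")
print(f"Regression line: Returns = {slope:.2e} * New Cases + {intercept:.4f}")
```

Applied to `baseline2020`, a coefficient close to zero would support the reading of the plot, while a clearly negative value would contradict it.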

In [None]:
# Forecast horizon in rows (trading days)
future_days = 54
# Target: the portfolio index value future_days rows ahead
final_portfolio_index['Prediction'] = final_portfolio_index[['Portfolio Index']].shift(-future_days)
#print(final_portfolio_index)
X = np.array(final_portfolio_index.drop(columns=['Prediction']))[:-future_days]
#print(X)
y = np.array(final_portfolio_index['Prediction'])[:-future_days]
#print(y)

In [None]:
x_train,x_test,y_train,y_test = train_test_split(X,y, test_size = 0.2)

In [None]:
tree = DecisionTreeRegressor().fit(x_train,y_train)
lr = LinearRegression().fit(x_train,y_train)

x_future = final_portfolio_index.drop(columns=['Prediction'])[:-future_days]
x_future = np.array(x_future.tail(future_days))
#print(final_portfolio_index)

In [None]:
tree_prediction = tree.predict(x_future)
#print(tree_prediction)


In [None]:
lr_prediction = lr.predict(x_future)
#print(lr_prediction)

In [None]:
predictions = tree_prediction
valid = final_portfolio_index[X.shape[0]:].copy()  # copy so the new column is not written to a slice
valid['Predictions'] = predictions
plt.figure(figsize=(16,8))
plt.title('Decision tree model')
plt.xlabel('Date')
plt.ylabel('Portfolio Index')
plt.plot(final_portfolio_index['Portfolio Index'])
plt.plot(valid[['Portfolio Index','Predictions']])
plt.legend(['Orig','Val','Pred'])
plt.show()

In [None]:
predictions = lr_prediction
valid = final_portfolio_index[X.shape[0]:].copy()  # copy so the new column is not written to a slice
valid['Predictions'] = predictions
plt.figure(figsize=(16,8))
plt.title('Linear regression model')
plt.xlabel('Date')
plt.ylabel('Portfolio Index')
plt.plot(final_portfolio_index['Portfolio Index'])
plt.plot(valid[['Portfolio Index','Predictions']])
plt.legend(['Orig','Val','Pred'])
plt.show()

<a id="section-six"></a>
# 6. Conclusion
While Germany successfully overcame the first wave, the massive second wave it is experiencing right now brings a lot of uncertainty. Even though the German government has provided large rescue packages, the outlook remains pessimistic. No significant decline in new cases can yet be seen in Germany, and if this continues the German market is not going to perform well. While the portfolio generates an extraordinarily high return, it is designed to hedge against uncertainty over a short time horizon of at most one year. Reallocation might be needed depending on the future state of Germany and of the world, also taking into consideration the possible distribution of a vaccine.