# COVID-19 California Prediction Model

## Intro

In this notebook we build two models to forecast COVID-19 in California:
* Model to predict confirmed cases
* Model to predict fatalities

## HyMagic and Useful Macros

In [1]:
!pip install hy > /dev/null

In [2]:
# Hy Magic
import IPython
def hy_eval(*args):import hy;return hy.eval(hy.read_str("(do\n"+"".join(map(lambda s:s or "",args))+"\n)\n"),globals())
@IPython.core.magic.register_line_cell_magic
def h(*args):hy_eval(*args) # Silent: Does not print result.
@IPython.core.magic.register_line_cell_magic
def hh(*args): return hy_eval(*args) # Verbose: Prints result.
del h, hh

In [3]:
%%h
(import  [useful [*]])
(require [useful [*]])

## Covid Data Paths

In [4]:
%%h
; Figure out location of data
(s covid-root-kaggle "/kaggle/input")
(s covid-root-laptop "$HOME/d")
(s covid-root 
   (-> "/kaggle" 
       (os.path.exists) 
       (if covid-root-kaggle covid-root-laptop)
       (os.path.expandvars)))
(s covid-prefix (+ covid-root "/covid19-local-us-ca-forecasting-week-1/ca_"))
(s covid-train  (+ covid-prefix "train.csv"))
(s covid-test   (+ covid-prefix "test.csv"))
(s covid-submit (+ covid-prefix "submission.csv"))

## Data Exploration and Visualization

In [5]:
%%hh
; Find out day of week
(-> covid-train
  (pd.read-csv) 
  (pd-keep ["Date" "ConfirmedCases" "Fatalities"]) 
  (.assign :Date (fn [x] (pd.to-datetime :yearfirst True x.Date)))
  (.assign :Day  (fn [x] (-> (x.Date.dt.day_name))))
  (.head)
)

In [6]:
%%h
; Plot the data

(import math)
(-> 39.94 (* 1000) (* 1000) (math.log1p) (s1 ca-pop-log))

(-> covid-train
  (pd.read-csv) 
  (pd-keep ["Date" "ConfirmedCases" "Fatalities"]) 
  (.where (fn1-> (. ConfirmedCases) (> 0.0))) (.dropna)
  (.assign :ConfirmedCases (fn1-> (. ConfirmedCases) (np.log1p))) 
  (.assign :Fatalities     (fn1-> (. Fatalities)     (np.log1p)))
  (.assign :Date           (fn1-> (. Date)           (pd.to-datetime :yearfirst True)))
  (.assign :DayOfYear      (fn1-> (. Date) (. dt) (. dayofyear)))
  (.assign :Day            (fn1-> (. Date) (. dt) (.day_name)))
  (.dropna)
  (s1 df1))

(-> df1
  (pd-keep ["Date" "ConfirmedCases" "Fatalities"])
  (.set-index "Date")
  (pd-plot "Log1p")
  (display))

## Diffusion Model

The model comes from a marketing paper by Emmanuelle Le Nagard and Alexandre Steyer that attempts to reflect the social structure of a diffusion process. The paper is available (in French) [here](https://www.jstor.org/stable/40588987). This notebook is inspired by the following sources:

* [Carl Kirstein's notebook](https://www.kaggle.com/carlkirstein/covid-19-prediction-competition)
* [Alix Martin's notebook](https://www.kaggle.com/alixmartin/covid-19-predictions)
* [Emmanuelle Le Nagard and Alexandre Steyer's paper (in French)](https://www.jstor.org/stable/40588987?seq=1)

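The essential idea is to take the diffusion formula for product innovation and apply it to viral infection. We can build two models from the same expression: one for confirmed cases and one for fatalities. Here is the formula, where $t$ is time, $t_0$ is the time at which the spread starts, and $a$ and $\alpha$ are parameters to be fitted.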

\begin{equation*}
N (1 - e^{-a(t - t_0)})^\alpha
\end{equation*}

The main difference in our implementation is that we set $N = 1$. Here is the justification.

* When $t = t_0$ the value of this expression is $0$, i.e. zero infections or deaths.
* When $t$ is extremely large (and $a > 0$) the exponential term vanishes and the expression converges to $N$ for any $\alpha > 0$. A larger $\alpha$ delays the rise of the curve but does not change the limit. The model assumes that $\alpha$ is not less than $1$.
* $N$ is the population, in our case the population of California.
* To make the math easier, let us normalize to $N = 1$. This means we divide the case counts by the population of California, so instead of absolute numbers of cases or fatalities we work with fractions of the population.

With this change the expression becomes simpler.

\begin{equation*}
(1 - e^{-a(t-t_0)})^\alpha
\end{equation*}
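
To make the shape of the normalized curve concrete, here is a minimal Python sketch of the expression above. The function name `diffusion_fraction` and the parameter values are illustrative assumptions, not fitted values from this notebook. The clamp to zero before $t_0$ mirrors what the Hy model cell does.

```python
import math

def diffusion_fraction(t, a, alpha, t0):
    """Normalized diffusion curve with N = 1: (1 - exp(-a (t - t0)))**alpha.

    Returns 0.0 before the start time t0.
    """
    dt = t - t0
    if dt < 0:
        return 0.0
    return (1.0 - math.exp(-a * dt)) ** alpha

# Illustrative (not fitted) parameters.
a, alpha, t0 = 0.1, 2.0, 5.0

# Print the fraction of the population at a few time points.
for t in (0, 5, 20, 60, 200):
    print(t, diffusion_fraction(t, a, alpha, t0))
```

With these values the printed fraction is zero up to $t_0$ and then rises toward 1 as $t$ grows.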


In [7]:
%%h
; Define data frames.

(s MILLION    (-> 1000 (* 1000)))
(s population (-> 39.94 (* MILLION)))

(-> covid-train
  (pd.read-csv) 
  (pd-keep ["Date" "ConfirmedCases" "Fatalities"]) 
  (.rename :columns {"ConfirmedCases" "Conf" "Fatalities" "Dead" })
  (.assign :Conf (fn1-> (. Conf) (/ population))) 
  (.assign :Dead (fn1-> (. Dead) (/ population)))
  (.dropna)
  (s1 df))

(setv conf-actual (df.Conf.rename "Actual"))
(setv dead-actual (df.Dead.rename "Actual"))

In [8]:
%%h
(import [scipy.optimize [minimize]])
(import [math [exp]])

; Define conf model then run it.

(defn conf-model [a alpha t0 t]
  (setv t-delta (- t t0))
  (if (< t-delta 0)
    0.0
    (** (- 1 (exp (* (- a) t-delta))) alpha)))

(defn conf-model-loss [x df]
  (setv (, a alpha t0) x)
  (setv r 0)
  (for [t (range (len df))]
    (+= r (-> (conf-model a alpha t0 t) (- (get df t)) (** 2))))
  r)

(-> conf-model-loss 
    (minimize :x0 (np.array [0.1 1.0 5]) 
              :args conf-actual 
              :method "Nelder-Mead" :tol 1e-6)
    (s1 conf-opt))
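
For readers less familiar with Hy, the fit in the cell above can be sketched in plain Python. This is only an approximation of the notebook's code under stated assumptions: the data here is synthetic (generated from made-up parameters rather than read from `ca_train.csv`), and the guard that rejects non-positive `a` or `alpha` is an addition that the Hy cell does not have.

```python
import math

import numpy as np
from scipy.optimize import minimize

def conf_model(a, alpha, t0, t):
    """Fraction of the population confirmed at day t."""
    dt = t - t0
    if dt < 0:
        return 0.0
    return (1.0 - math.exp(-a * dt)) ** alpha

def conf_model_loss(x, actual):
    """Sum of squared errors between the model and the observed fractions."""
    a, alpha, t0 = x
    # Assumption: reject parameters outside the model's domain.
    if a <= 0 or alpha <= 0:
        return float("inf")
    return sum((conf_model(a, alpha, t0, t) - y) ** 2
               for t, y in enumerate(actual))

# Synthetic observations from made-up parameters (not real data).
true_params = (0.05, 2.0, 3.0)
actual = np.array([conf_model(*true_params, t) for t in range(40)])

# Same starting point, method and tolerance as the Hy cell.
x0 = np.array([0.1, 1.0, 5.0])
result = minimize(conf_model_loss, x0=x0, args=(actual,),
                  method="Nelder-Mead", tol=1e-6)

print("fitted parameters:", result.x)
print("final loss:", result.fun)
```

Nelder-Mead needs no gradients, which suits this loss because the clamp at `t0` makes it non-smooth. The result depends on the starting point, so it is worth trying a few values of `x0`.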

In [9]:
%%h

; Define dead model then run it.

(defn dead-model [death-rate lag t]
  (s (, a alpha t0) conf-opt.x)
  (s t (- t lag))
  (s conf (conf-model a alpha t0 t))
  (s dead (* conf death-rate)))

(defn dead-model-loss [x df]
  (s (, death-rate lag) x)
  (s (, a alpha t0) conf-opt.x)
  (s r 0)
  (for [t (range (len df))]
    (+= r (-> (dead-model death-rate lag t) (- (get df t)) (** 2))))
  r)

(-> dead-model-loss 
    (minimize :x0 (np.array [0.01 15]) 
              :args dead-actual
              :method "Nelder-Mead" :tol 1e-6)
    (s1 dead-opt))

[conf-opt dead-opt]
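
The fatality model above reuses the confirmed-cases curve: it shifts the curve by `lag` days and scales it by `death-rate`. A minimal Python sketch of that relationship follows. The names `conf_fraction` and `dead_fraction` and all parameter values are illustrative assumptions, not the fitted values.

```python
import math

def conf_fraction(t, a, alpha, t0):
    """Confirmed-cases curve as a fraction of the population."""
    dt = t - t0
    if dt < 0:
        return 0.0
    return (1.0 - math.exp(-a * dt)) ** alpha

def dead_fraction(t, death_rate, lag, a, alpha, t0):
    """Fatalities at day t: the confirmed curve `lag` days earlier, scaled by death_rate."""
    return death_rate * conf_fraction(t - lag, a, alpha, t0)

# Illustrative (not fitted) parameters.
a, alpha, t0 = 0.1, 2.0, 5.0
death_rate, lag = 0.02, 15

for t in (10, 20, 45, 90):
    print(t, conf_fraction(t, a, alpha, t0),
          dead_fraction(t, death_rate, lag, a, alpha, t0))
```

Because only `death_rate` and `lag` are fitted in the second step, any error in the confirmed-cases fit carries over into the fatality predictions.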

In [10]:
%%h
(defn model-to-fn [model opt] 
  (fn [&rest args]
    (setv params (-> opt (. x) (list) (+ (list args))))
    (-> model (apply params))))

(-> conf-model (model-to-fn conf-opt) (s1 conf-fn))
(-> dead-model (model-to-fn dead-opt) (s1 dead-fn))

In [11]:
%%h
; Compare actual vs predictions
(-> conf-actual (len) (range) (map1 conf-fn) (pd.Series :name "Predict") (s1 conf-predict))
(-> (pd.concat [conf-actual conf-predict] :axis 1) (s1 conf-eval))
(-> conf-eval (* population) (.plot :title "Confirmed Cases"))

(-> dead-actual (len) (range) (map1 dead-fn) (pd.Series :name "Predict") (s1 dead-predict))
(-> (pd.concat [dead-actual dead-predict] :axis 1) (s1 dead-eval))
(-> dead-eval (* population) (.plot :title "Fatalities"))

In [12]:
%%h
(import [sklearn [metrics]])

; Calculate conf errors.
(print "Confirmed Cases Errors")
(print "Conf MSE =" (metrics.mean-squared-error (conf-eval.Actual.to-numpy) (conf-eval.Predict.to-numpy)))
(print "Conf MAE =" (metrics.mean-absolute-error (conf-eval.Actual.to-numpy) (conf-eval.Predict.to-numpy)))
(print "Conf RMSE =" (np.sqrt (metrics.mean-squared-error (conf-eval.Actual.to-numpy) (conf-eval.Predict.to-numpy))))

; Calculate dead errors.
(print "Fatalities Errors")
(print "Dead MSE =" (metrics.mean-squared-error (dead-eval.Actual.to-numpy) (dead-eval.Predict.to-numpy)))
(print "Dead MAE =" (metrics.mean-absolute-error (dead-eval.Actual.to-numpy) (dead-eval.Predict.to-numpy)))
(print "Dead RMSE =" (np.sqrt (metrics.mean-squared-error (dead-eval.Actual.to-numpy) (dead-eval.Predict.to-numpy))))
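
Note that the errors above are computed on fractions of the population, so they are very small numbers. The sketch below, using made-up values rather than the notebook's data, shows how the same metrics can be computed with NumPy and how they rescale to head counts: MAE and RMSE scale with the population, MSE with its square.

```python
import numpy as np

def error_metrics(actual, predict):
    """Return MSE, MAE and RMSE for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predict = np.asarray(predict, dtype=float)
    err = predict - actual
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    return {"MSE": mse, "MAE": mae, "RMSE": float(np.sqrt(mse))}

population = 39.94e6

# Made-up counts, expressed as fractions of the population.
actual = np.array([10.0, 20.0, 40.0]) / population
predict = np.array([12.0, 18.0, 43.0]) / population

frac = error_metrics(actual, predict)
cases = error_metrics(actual * population, predict * population)

print("fraction scale:", frac)
print("head-count scale:", cases)
```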

## Populate Submission File

In [13]:
%%h
; Next, let's build out the test set.
(defn pd-head-tail [df]
  (print "Rows =" (-> df (len)))
  (-> df (.head 1) (display))
  (-> df (.tail 1) (display)))

(-> covid-train  (pd.read-csv) (s1 df-train))
(-> covid-test   (pd.read-csv) (s1 df-test))

;(pd-head-tail df-train)
;(pd-head-tail df-test)

In [14]:
%%h
; Compute date0 of training data.
(-> df-train (. Date) (get 0) (dateparser.parse) (s1 date0-train))

; Useful functions.
(defn date-string->t [d] 
  (-> d (dateparser.parse) (- date0-train) (. days)))
(defn date-string->confirmed-cases [d]
  (-> d (date-string->t) (conf-fn) (* population) (int)))
(defn date-string->fatalities [d]
  (-> d (date-string->t) (dead-fn) (* population) (int)))

; Submission.
(-> df-test 
    (.to-dict "records") 
    (map1 (fn [r] 
            (-> r
              (assoc1 "ConfirmedCases" (-> r (get "Date") (date-string->confirmed-cases)))
              (assoc1 "Fatalities"     (-> r (get "Date") (date-string->fatalities))))))
    (pd.DataFrame)
    (pd-keep ["ForecastId" "ConfirmedCases" "Fatalities"])
    (.to-csv "submission.csv" :index False))
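
The date handling in the cell above can be sketched in Python with only the standard library. This is an approximation under stated assumptions: it expects ISO `YYYY-MM-DD` date strings (the notebook uses `dateparser`, which accepts more formats), the start date is hypothetical, and `fraction_to_count` is an illustrative name for the scale-and-truncate step.

```python
from datetime import date

def date_string_to_t(d, date0):
    """Days elapsed between the ISO date strings date0 and d."""
    return (date.fromisoformat(d) - date.fromisoformat(date0)).days

def fraction_to_count(fraction, population):
    """Scale a population fraction to a head count, truncating like int() in the Hy cell."""
    return int(fraction * population)

date0 = "2020-03-01"  # hypothetical first date of the training data

print(date_string_to_t("2020-03-01", date0))  # 0
print(date_string_to_t("2020-03-15", date0))  # 14
print(date_string_to_t("2020-04-01", date0))  # 31
```

Since `int` truncates toward zero, predictions are always rounded down; `round` would be an alternative if nearest-integer counts are preferred.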