# Cox Model 
A common disadvantage of the parametric PH and AFT models is that they all assume a specific functional form for the baseline hazard $\lambda_0(t)$. Avoiding a specification of $\lambda_0(t)$ can be a good idea, for two reasons. <b>First, it reduces the risk of model misspecification. Second, we often do not care much about the baseline hazard itself. What we care about most is 'why did this person spend less time on job search than that person did?', i.e., we are interested in the relative difference in hazards and in how that difference is explained by the independent variables.</b> To illustrate, go back to the PH model and note that
$$\frac{\lambda(x_i,t)}{\lambda(x_j,t)}=\frac{\lambda_0(t)e^{x'_i\beta}}{\lambda_0(t)e^{x'_j\beta}}=e^{(x_i-x_j)'\beta}$$
That is, the hazard ratio between $i$ and $j$ depends only on $x_i$, $x_j$ and $\beta$, not on $t$ or on $\lambda_0(t)$. This suggests a new way to write the likelihood function.
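As a quick numerical check of the cancellation, the sketch below computes the hazard ratio of two individuals under two different baseline hazards. The coefficient `beta = 0.5`, the covariate values, and both baselines (a constant and a Weibull-shaped hazard with shape 1.5 and scale 1) are hypothetical choices for illustration; a scalar covariate is assumed.

```python
import math

def hazard(t, x, beta, baseline):
    """PH hazard lambda(t, x) = lambda_0(t) * exp(x * beta), scalar covariate."""
    return baseline(t) * math.exp(x * beta)

beta = 0.5           # hypothetical coefficient
x_i, x_j = 4.0, 1.0  # hypothetical covariate values

# Two arbitrary (hypothetical) baseline hazards.
baselines = {
    "constant": lambda t: 0.2,
    "weibull": lambda t: 1.5 * t ** 0.5,  # shape 1.5, scale 1
}

ratios = []
for name, lam0 in baselines.items():
    for t in (1.0, 2.0, 6.0):
        ratios.append(hazard(t, x_i, beta, lam0) / hazard(t, x_j, beta, lam0))

# Every ratio equals exp((x_i - x_j) * beta), whatever lambda_0 and t are.
expected = math.exp((x_i - x_j) * beta)
print(all(math.isclose(r, expected) for r in ratios))
```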
## A Working Example
Consider the following situation

|individual|t|x|
|-|-|-|
|1|2|4|
|2|3|1|
|3|6|3|
|4|12|2|

Here we have four individuals. Individual 1 finds a job 2 months after graduating, $t_1=2$; individual 2 finds a job 3 months after graduating, $t_2=3$; similarly $t_3=6$ and $t_4=12$. In the basic (parametric) model we write down the following likelihood function
$$f(t_1=2|x_1)f(t_2=3|x_2)f(t_3=6|x_3)f(t_4=12|x_4) \tag{2}$$
which requires us to specify and estimate $\lambda_0(t)$.
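To make the dependence on $\lambda_0(t)$ concrete, here is a minimal sketch of the log of (2), assuming (hypothetically) a constant baseline hazard `lam0`, so that $f(t\mid x)=\lambda_0 e^{x'\beta}\exp(-\lambda_0 t\, e^{x'\beta})$. The function name `full_log_likelihood` is ours, not from any library.

```python
import math

# Data from the table: (t in months, x)
data = [(2, 4), (3, 1), (6, 3), (12, 2)]

def full_log_likelihood(beta, lam0, data):
    """Log of likelihood (2) under a HYPOTHETICAL constant baseline hazard lam0.

    With lambda(t|x) = lam0 * exp(x * beta), the density is
    f(t|x) = lambda(t|x) * exp(-lam0 * t * exp(x * beta)).
    """
    total = 0.0
    for t, x in data:
        rate = lam0 * math.exp(x * beta)
        total += math.log(rate) - rate * t
    return total

# The value changes with lam0 even for fixed beta, so lam0 must be estimated too.
for lam0 in (0.05, 0.1, 0.2):
    print(lam0, full_log_likelihood(0.0, lam0, data))
```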

The table, however, also contains information on the 'order' in which people find jobs. Consider the event <b>1 finds a job fastest, 2 second, 3 third, and 4 slowest</b>, and denote its probability by $p$. What is the expression for $p$?
- At $t=2$, i.e., two months after graduation, we know that among 1, 2, 3, 4, exactly one person finds a job. Which one? It is 1. Denote $p_1:=$ the probability that 1 finds a job <b>given that exactly one person among 1, 2, 3, 4 finds a job</b>.
- At $t=3$, i.e., three months after graduation, we know that among 2, 3, 4 (i.e., given that 2, 3, 4 are still searching), one finds a job. Which one? It is 2. Denote $p_2:=$ the probability that 2 finds a job <b>given that exactly one person among 2, 3, 4 finds a job</b>.
- At $t=6$, i.e., six months after graduation, we know that among 3, 4 (i.e., given that 3, 4 are still searching), one finds a job. Which one? It is 3. Denote $p_3:=$ the probability that 3 finds a job <b>given that exactly one person among 3, 4 finds a job</b>.
- At $t=12$, i.e., twelve months after graduation, 4 finds a job. Denote $p_4:=$ the probability that 4 finds a job <b>given that one person among {4} finds a job</b>. Of course $p_4=1$.

Chaining these conditional probabilities (each step conditions on who is still searching), we have
$$p=p_1 p_2 p_3 p_4 \tag{3}$$
That is, instead of the duration likelihood (2), (3) models the relative speed of finding a job. The expressions for $p_1$, $p_2$, $p_3$ follow directly from their definitions:
$$p_1= \frac{\lambda(2|x_1)}{\lambda(2|x_1)+\lambda(2|x_2)+\lambda(2|x_3)+\lambda(2|x_4)}=\frac{e^{x'_1\beta}}{e^{x'_1\beta}+e^{x'_2\beta}+e^{x'_3\beta}+e^{x'_4\beta}}$$
$$p_2 =\frac{\lambda(3|x_2)}{\lambda(3|x_2)+\lambda(3|x_3)+\lambda(3|x_4)}=\frac{e^{x'_2\beta}}{e^{x'_2\beta}+e^{x'_3\beta}+e^{x'_4\beta}}$$
$$p_3=\frac{\lambda(6|x_3)}{\lambda(6|x_3)+\lambda(6|x_4)}=\frac{e^{x'_3\beta}}{e^{x'_3\beta}+e^{x'_4\beta}}$$
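A small sketch that evaluates $p_1,\dots,p_4$ for the table, once with a baseline hazard included and once with only $e^{x'\beta}$, shows that both forms agree. The coefficient `beta = 0.5` and the linear baseline `lam0` are hypothetical; the risk set at $t_i$ is taken to be everyone with $t_j \ge t_i$ (no ties, no censoring).

```python
import math

# Data from the table: individual -> (t, x)
data = {1: (2, 4), 2: (3, 1), 3: (6, 3), 4: (12, 2)}
beta = 0.5  # hypothetical coefficient

def lam0(t):
    """Arbitrary (hypothetical) baseline hazard; any positive function works."""
    return 0.1 + 0.05 * t

def p_k(i, beta, use_baseline=True):
    """P(i is the one who exits at t_i | exactly one exit in the risk set at t_i)."""
    t_i = data[i][0]
    risk_set = [j for j, (t_j, _) in data.items() if t_j >= t_i]

    def h(j):
        base = lam0(t_i) if use_baseline else 1.0
        return base * math.exp(data[j][1] * beta)

    return h(i) / sum(h(j) for j in risk_set)

for i in data:
    print(i, round(p_k(i, beta), 6), round(p_k(i, beta, use_baseline=False), 6))
```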

<b>Because hazards are proportional, the baseline hazard $\lambda_0(t)$ cancels from both the numerator and the denominator.</b> Given this, we can write down the expression for $p$ immediately, $p=p_1p_2p_3$ (since $p_4=1$). <b>To summarize, $p$ captures the probability of the observed 'order' of job finding, not the absolute time points. In this sense the likelihood uses only part of the information in the data, which is why it is called the 'partial likelihood function'.</b>
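Generalising the example, and assuming no ties and no censoring (neither is handled here), the partial likelihood is
$$L(\beta)=\prod_{i=1}^{n}\frac{e^{x'_i\beta}}{\sum_{j\in R(t_i)}e^{x'_j\beta}},\qquad R(t_i)=\{j:\ t_j\ge t_i\}.$$
The sketch below maximises $\log L(\beta)$ for a scalar $\beta$ on the first example by Newton-Raphson, using the score $\sum_i (x_i-\bar x_{R(t_i)})$ and the information $\sum_i \operatorname{Var}_{R(t_i)}(x)$, where the mean and variance are weighted by $e^{x'_j\beta}$. The function names are illustrative, and there is no step-halving safeguard, which is adequate for this small, well-behaved example but not in general.

```python
import math

# Example 1: (t, x) for individuals 1..4, all spells observed (no censoring)
data = [(2, 4.0), (3, 1.0), (6, 3.0), (12, 2.0)]

def log_pl_and_derivs(beta, data):
    """Log partial likelihood, score and information for a scalar beta (no ties)."""
    loglik, score, info = 0.0, 0.0, 0.0
    for t_i, x_i in data:
        risk = [x_j for t_j, x_j in data if t_j >= t_i]
        w = [math.exp(x_j * beta) for x_j in risk]
        s0 = sum(w)
        s1 = sum(wj * xj for wj, xj in zip(w, risk))
        s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
        mean = s1 / s0
        loglik += x_i * beta - math.log(s0)
        score += x_i - mean
        info += s2 / s0 - mean * mean  # weighted variance within the risk set
    return loglik, score, info

def fit_cox(data, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson maximisation of the log partial likelihood."""
    for _ in range(max_iter):
        _, score, info = log_pl_and_derivs(beta, data)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

beta_hat = fit_cox(data)
print(beta_hat)
```

Note that no baseline hazard appears anywhere in the code; only the ordering of the $t_i$ and the covariates enter.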

## Another Working Example 
In the above example, all individuals have different $t$. Now consider a case with a tie:

|individual|t|x|
|-|-|-|
|1|2|4|
|2|3|1|
|3|3|3|
|4|12|2|

Now both 2 and 3 find a job 3 months after graduating. Consider the event <b>1 finds a job fastest, 2 and 3 second (tied), and 4 slowest</b>, and denote its probability by $p$. What is the expression for $p$?
- At $t=2$, i.e., two months after graduation, we know that among 1, 2, 3, 4, exactly one person finds a job. Which one? It is 1. Denote $p_1:=$ the probability that 1 finds a job <b>given that exactly one person among 1, 2, 3, 4 finds a job</b>.
- At $t=3$, i.e., three months after graduation, we know that among 2, 3, 4 (i.e., given that 2, 3, 4 are still searching), two people find jobs. Which two? They are 2 and 3. Denote $p_2:=$ the probability that 2 and 3 find jobs <b>given that exactly two people among 2, 3, 4 find a job</b>.
- At $t=12$, i.e., twelve months after graduation, 4 finds a job. Denote $p_4:=$ the probability that 4 finds a job <b>given that one person among {4} finds a job</b>. Of course $p_4=1$.

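The expression for $p_1$ is the same as in the first example, and $p=p_1p_2p_4$. The expression for $p_2$ also follows from its definition: the numerator is the probability that exactly 2 and 3 exit, and the denominator sums over every pair that could have exited from the risk set {2, 3, 4}.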
$$p_2 =\frac{\lambda(3|x_2)\lambda(3|x_3)}{\lambda(3|x_2)\lambda(3|x_3)+\lambda(3|x_3)\lambda(3|x_4)+\lambda(3|x_4)\lambda(3|x_2)}=\frac{e^{x'_2\beta}e^{x'_3\beta}}{e^{x'_2\beta}e^{x'_3\beta}+e^{x'_3\beta}e^{x'_4\beta}+e^{x'_4\beta}e^{x'_2\beta}}$$
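The same construction works for any number of tied exits: sum the product of relative risks over every subset of the risk set with the same size as the set that actually exited. This is sometimes called the "exact" treatment of ties (software often offers cheaper approximations such as Breslow or Efron instead). Below is a minimal sketch with a hypothetical `beta = 0.5` and an illustrative helper `exact_tie_prob`.

```python
import math
from itertools import combinations

beta = 0.5  # hypothetical coefficient
x = {2: 1.0, 3: 3.0, 4: 2.0}  # covariates of the risk set at t = 3 (second example)

def exact_tie_prob(failed, risk_set, x, beta):
    """P(exactly the set `failed` exits | len(failed) exits occur among risk_set).

    The denominator sums the product of relative risks exp(x_j * beta) over
    every subset of risk_set that has the same size as `failed`.
    """
    def rel_risk(group):
        return math.prod(math.exp(x[j] * beta) for j in group)

    d = len(failed)
    denom = sum(rel_risk(group) for group in combinations(risk_set, d))
    return rel_risk(failed) / denom

p2 = exact_tie_prob((2, 3), (2, 3, 4), x, beta)
print(p2)
```

As a sanity check, the probabilities of all three possible pairs {2,3}, {2,4}, {3,4} sum to one.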


## Stratified Cox Model
From the above analysis, one assumption of the Cox model is that it must satisfy the proportional hazards specification, that is,
$$\lambda(t,x)=\lambda_0(t)e^{x'\beta}$$
<b>Only under this assumption can we write down the above likelihood functions free of the baseline hazard $\lambda_0(t)$!</b> But sometimes this condition may not hold. Take gender: it is quite reasonable that men and women have different baseline hazards, which means that the gender variable cannot be separated from the baseline hazard, and we can no longer eliminate the baseline hazard from the likelihood function.
<p>One natural approach is to allow different baseline hazards for men, $\lambda^m_0(t)$, and for women, $\lambda^f_0(t)$, and to write down the partial likelihood function for each group separately. The hazards for men and women are respectively:
    $$\lambda(t,x,g)=\begin{cases}
    \lambda^{m}_0(t)e^{x'\beta},g=male\\
    \lambda^{f}_0(t)e^{x'\beta},g=female\\
    \end{cases}
    $$
    To be specific, for men we consider the relative order of finding a job among men only and write down the corresponding likelihood function. Since men share a common baseline hazard, this likelihood does not depend on the baseline hazard. The same holds for women. The overall likelihood is the product of the two group-specific partial likelihoods, with a common $\beta$. Intuitively, this approach has two main disadvantages.</p>

- We cannot estimate the effect of gender on the job-search hazard, because it is absorbed into the separate baseline hazards.
- Information loss: we use only the order of job finding within each gender group; comparisons of order between individuals of different genders are not used.
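A minimal sketch of the stratified log partial likelihood, assuming no ties and no censoring: risk sets are formed only within each stratum, and the within-stratum terms are summed with a common $\beta$. The gender split below (individuals 1 and 3 male, 2 and 4 female from the first table) and `beta = 0.5` are hypothetical, chosen only for illustration.

```python
import math

def log_partial_likelihood(beta, data):
    """Log partial likelihood for scalar beta; data is a list of (t, x), no ties."""
    total = 0.0
    for t_i, x_i in data:
        risk = [x_j for t_j, x_j in data if t_j >= t_i]
        total += x_i * beta - math.log(sum(math.exp(x_j * beta) for x_j in risk))
    return total

def stratified_log_pl(beta, strata):
    """Sum of within-stratum log partial likelihoods with a common beta.

    Risk sets are built inside each stratum only, so each stratum's baseline
    hazard cancels separately and cross-stratum ordering is never used.
    """
    return sum(log_partial_likelihood(beta, data) for data in strata.values())

# Hypothetical gender split of the first example's four individuals.
strata = {
    "male": [(2, 4.0), (6, 3.0)],     # individuals 1 and 3
    "female": [(3, 1.0), (12, 2.0)],  # individuals 2 and 4
}
pooled = strata["male"] + strata["female"]
print(stratified_log_pl(0.5, strata), log_partial_likelihood(0.5, pooled))
```

The two printed values differ because the pooled version also compares, for example, individual 1 with individuals 2 and 4 in the first risk set, which is exactly the information the stratified version gives up.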