# 8. Survival Analysis: Nonparametric, Semiparametric, Parametric

```stata
use wk7output/student_pressure, clear   // forward slashes work on macOS, Windows, and Unix
set seed 340600
g withdraw = rbinomial(1, .1)           // each session record: 10% chance of withdrawal
sort student_id session_date
list if inrange(student_id, 2, 5)
g withdraw_dt = session_date if withdraw==1
sum session_date
*replace withdraw_dt = r(max)
format withdraw_dt %td
list if inrange(student_id, 2, 5)
by student_id: egen end = min(withdraw_dt)   // first withdrawal date; missing if none
format end %td
list if inrange(student_id, 2, 5)
tab end, mi
sum session_date
replace end=r(max) if missing(end)       // censor non-withdrawers at the last session date
list if inrange(student_id, 2, 5)
tab end, mi
by student_id: egen begin = min(session_date)
format begin %td
list if inrange(student_id, 2, 5)
by student_id: replace withdraw = 1 if withdraw[_n-1]==1   // carry withdrawal forward
list if inrange(student_id, 2, 5)
/* --- insertion ---*/
bys student_id: egen inelig=min(withdraw)   // 1 only if the student withdrew at the first session
l if inelig==1
drop if inelig==1 
codebook student_id 
keep student_id session sbp withdraw end begin 
order student_id session begin end withdraw sbp 
/* --- continue ---*/
stset end, fail(withdraw) enter(begin) origin(begin) scale(7.0242308)
g stressed = sbp>140 if !missing(sbp)    // missing SBP would otherwise count as >140
#delimit ;
sts graph, 
    fail per(100)
    ylab(,format(%2.0f))
    by(stressed)
    tmax(7)
    xlab(0(1)7)
    xti("Week")
    ti("Stata I dropout rate, %")
    /*
    legend(
        order(1 2)
        lab(1 "SBP<=140mmHg")
        lab(2 "SBP>140mmHg")
        ring(0)
        pos(11)
    )
    */
    risktable
    note("Source: From Simulated Data",
        pos(7)
    )
;
#delimit cr
graph export wk7output/class_attrition_wk8.png, replace 
list in 1
capture isid student_id
if _rc !=0 {
    di "These data are nested and each student has several records"
}
else {
    di "There's only a single record per student"
}
stcox stressed
//creturn list
ereturn list 
return list
matrix m = r(table)
matrix list m
lincom stressed 
return list
qui {
    // numeric p-value, kept separate from its display text
    local pval = r(p)
    local hr : di %3.2f exp(r(estimate))
    local lb : di %3.2f exp(r(lb))
    local ub : di %3.2f exp(r(ub))
    if `pval' < 0.01 {
        local p "< 0.01"
    }
    else if `pval' <= 0.05 {
        local p : di %3.2f `pval'
        local p "= `p'"
    }
    else {
        local p : di %2.1f `pval'
        local p "= `p'"
    }
    if `pval' < .05 {
        noi di "There's an association between SBP>140mmHg and withdrawal from Stata I: HR=`hr', 95%CI: `lb'-`ub', p `p'"
    }
    else {
        noi di "There's no association between SBP>140mmHg and withdrawal from Stata I: HR=`hr', 95%CI: `lb'-`ub', p `p'"
    }
}

egen representation = tag(student_id)
l in 1/10 
keep if representation==1
count
capture isid student_id
if _rc !=0 {
    di "These data are nested and each student has several records"
}
else {
    di "There's only a single record per student"
}
```




![](wk7output/class_attrition_wk8.png)

### 8.1 Nonparametric

#### ChatGPT annotation
This Stata script processes student data for an analysis of dropout rates and generates a survival graph. Below is a detailed explanation of each step in the script:

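1. **Loading the Data**:
   - The script checks the operating system (`c(os)`) to choose between a forward-slash and a backslash path to the same dataset. On Linux, `c(os)` is `"Unix"`, so the backslash branch also runs there, where a backslash is not a directory separator; this produces errors such as `file wk7output\student_pressure.dta not found` (r(601)). Stata accepts forward slashes on all platforms, so a single command is enough:
     ```stata
     use wk7output/student_pressure, clear
     ```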

2. **Setting Seed and Simulating Withdrawal**:
   - Sets a seed so the simulation is reproducible, then gives each session record an independent 10% chance of `withdraw = 1`:
     ```stata
     set seed 340600
     g withdraw = rbinomial(1, .1)
     ```

3. **Sorting and Listing Data**:
   - Sorts the data by `student_id` and `session_date` and lists a subset of the data:
     ```stata
     sort student_id session_date
     list if inrange(student_id, 2, 5)
     ```

4. **Generating Withdrawal Date and Summary Statistics**:
   - Creates a variable `withdraw_dt` to record the session date if the student withdraws and provides summary statistics:
     ```stata
     g withdraw_dt = session_date if withdraw==1
     sum session_date
     ```

5. **Formatting and Listing Data**:
   - Formats the `withdraw_dt` variable and lists the data again:
     ```stata
     format withdraw_dt %td
     list if inrange(student_id, 2, 5)
     ```

6. **Generating and Formatting End Date**:
   - Creates `end`, the earliest withdrawal date for each student (missing for students who never withdraw), and formats it:
     ```stata
     by student_id: egen end = min(withdraw_dt)
     format end %td
     list if inrange(student_id, 2, 5)
     ```

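7. **Handling Missing End Dates**:
   - Students who never withdraw are censored at the latest session date in the dataset: `summarize` stores that date in `r(max)`, which then fills the missing `end` values before the result is tabulated:
     ```stata
     sum session_date
     replace end=r(max) if missing(end)
     tab end, mi
     ```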

8. **Generating and Formatting Begin Date**:
   - Creates a `begin` variable for the earliest session date for each student and formats it:
     ```stata
     by student_id: egen begin = min(session_date)
     format begin %td
     list if inrange(student_id, 2, 5)
     ```

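9. **Carrying Withdrawal Forward**:
   - Once a student withdraws, all of that student's later records are also marked as withdrawn. Because `replace` works through the observations in sort order, a 1 propagates through every subsequent record of the same student: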
     ```stata
     by student_id: replace withdraw = 1 if withdraw[_n-1]==1  
     list if inrange(student_id, 2, 5)
     ```

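10. **Marking and Dropping Students Who Withdraw at Their First Session**:
    - After step 9, `min(withdraw)` equals 1 only when *every* record of a student is flagged, that is, when the student withdrew at the very first session. Such students have `end == begin` and therefore zero follow-up time, so they are listed and dropped. Students who withdraw later are kept: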
      ```stata
      bys student_id: egen inelig=min(withdraw)
      l if inelig==1
      drop if inelig==1 
      ```

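11. **Preparing for Survival Analysis**:
    - Keeps and orders the relevant variables, then declares survival-time data with `stset`: `withdraw` is the failure indicator, each record enters and starts its clock at `begin`, and `scale(7.0242308)` divides the elapsed days by about 7 so that analysis time is roughly in weeks. Because no `id()` option is given, every session record is treated as a separate observation: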
      ```stata
      codebook student_id 
      keep student_id session sbp withdraw end begin 
      order student_id session begin end withdraw sbp 
      stset end, fail(withdraw) enter(begin) origin(begin) scale(7.0242308)
      ```

12. **Creating a Stress Indicator**:
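    - Creates a `stressed` indicator for high systolic blood pressure (SBP > 140 mmHg). Stata treats a missing value as larger than any number, so the `if !missing(sbp)` qualifier keeps records with missing SBP from being coded as stressed:
      ```stata
      g stressed = sbp>140 if !missing(sbp)
      ```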

13. **Generating Survival Graph**:
    - Plots Kaplan-Meier failure curves (cumulative dropout, in percent) by `stressed` over the first 7 weeks with a number-at-risk table, then exports the graph as a PNG file:
      ```stata
      #delimit ;   
      sts graph, 
          fail per(100)
          ylab(,format(%2.0f))
          by(stressed)
          tmax(7)
          xlab(0(1)7)
          xti("Week")
          ti("Stata I dropout rate, %")
          /*
          legend(
              order(1 2)
              lab(1 "SBP<=140mmHg")
              lab(2 "SBP>140mmHg")
              ring(0)
              pos(11)
          )
          */
          risktable
          note("Source: From Simulated Data",
              pos(7)
          )
      ; 
      #delimit cr 
      graph export wk7output/class_attrition_wk8.png, replace 
      ```

14. **Checking Data Structure**:
    - Checks whether `student_id` uniquely identifies the records (`isid` exits with a nonzero return code when a student has several records) and displays a message accordingly:
      ```stata
      list in 1
      capture isid student_id
      if _rc !=0 {
          di "These data are nested and each student has several records"    
      }
      else {
          di "There's only a single record per student"
      }
      ```
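To make steps 6 to 11 concrete, here is a minimal sketch that runs the same steps on a tiny hypothetical dataset: three students, with `session_date` as a plain day number instead of a `%td` date, and `scale(7)` in place of the original scale factor. The values are invented for illustration. Student 1 withdraws on day 8, student 2 withdraws at the first session and should be dropped, and student 3 never withdraws and should be censored on day 15, the last session date.

```stata
* Hypothetical toy data: 3 students, session_date in days
clear
input student_id session_date withdraw
1  1 0
1  8 1
1 15 0
2  1 1
2  8 0
3  1 0
3  8 0
3 15 0
end

sort student_id session_date
g withdraw_dt = session_date if withdraw==1
* first withdrawal day per student, missing if none
by student_id: egen end = min(withdraw_dt)
* censor non-withdrawers at the last session day (15)
sum session_date
replace end = r(max) if missing(end)
by student_id: egen begin = min(session_date)
* carry withdrawal forward within student
by student_id: replace withdraw = 1 if withdraw[_n-1]==1
* inelig is 1 only for student 2 (withdrew at the first session)
bys student_id: egen inelig = min(withdraw)
drop if inelig==1

* Analysis time in weeks; without id(), each record is its own observation
stset end, fail(withdraw) enter(time begin) origin(time begin) scale(7)
list student_id session_date begin end withdraw _t _d, sepby(student_id)
```

Student 1 ends up with two records flagged `_d == 1`, one for each record marked during the carry-forward step. This is what calling `stset` without `id()` implies for the real data as well.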

In summary, the script simulates withdrawal events, builds each student's entry (`begin`) and exit (`end`) dates, declares the data as survival-time data with `stset`, and plots cumulative dropout by SBP group.
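`sts graph, fail per(100)` plots 100 × (1 - S(t)), where S(t) is the Kaplan-Meier estimate: the running product of (1 - d/n) over event times, with d events among n subjects at risk. A minimal check on five hypothetical subjects compares `sts generate` with the hand calculation: S = 4/5 = 0.8 after the event at t = 1, and S = 0.8 × (1 - 2/3) ≈ 0.267 after the two tied events at t = 3, when three subjects are at risk.

```stata
* Hypothetical data: 5 subjects, 3 failures (one at t=1, two tied at t=3)
clear
input id time died
1 1 1
2 2 0
3 3 1
4 3 1
5 4 0
end

stset time, failure(died)
* Kaplan-Meier survivor function evaluated at each subject's _t
sts generate S = s
* cumulative failure in percent, the quantity sts graph, fail per(100) plots
generate F = 100*(1 - S)
list id _t _d S F
```

Subject 2 is censored at t = 2, so S(t) stays at 0.8 there: censoring removes a subject from later risk sets without lowering the curve.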
### 8.2 Semiparametric
#### ChatGPT annotation
This Stata script performs a Cox proportional hazards regression, extracts results, and formats the output. Here’s an annotated breakdown of the script:

1. **Cox Proportional Hazards Model**:
   - The script fits a Cox model with `stressed` as the predictor:
     ```stata
     stcox stressed
     ```

2. **Listing Stored Results**:
   - Lists estimation results and other stored results:
     ```stata
     //creturn list
     ereturn list 
     return list
     ```

3. **Storing Results in a Matrix**:
   - Stores the results of the Cox model in a matrix `m` and lists the matrix:
     ```stata
     matrix m = r(table)
     matrix list m
     ```

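4. **Linear Combination of Estimates**:
   - Runs `lincom` on the single `stressed` coefficient. With only one term this reproduces the Cox estimate, but it leaves the estimate, p-value, and confidence bounds in `r()` (`r(estimate)`, `r(p)`, `r(lb)`, `r(ub)`) for the next step: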
     ```stata
     lincom stressed 
     return list
     ```

5. **Extracting and Formatting Results**:
   - Quietly extracts the p-value, hazard ratio (HR), and its 95% confidence interval (CI), formats them, and prints a message based on the significance of the association. The numeric p-value is kept in its own macro, `pval`, because the display text (for example `< 0.01`) cannot be compared with `.05`:
     ```stata
     qui {  
         // numeric p-value for the comparisons below
         local pval = r(p)
         local hr : di %3.2f exp(r(estimate))
         local lb : di %3.2f exp(r(lb))
         local ub : di %3.2f exp(r(ub))
         
         if `pval' < 0.01 {
             local p "< 0.01"
         }
         else if `pval' <= 0.05 {
             local p : di %3.2f `pval'
             local p "= `p'"
         }
         else {
             local p : di %2.1f `pval'
             local p "= `p'"
         }
         
         if `pval' < .05 {
             noi di "There's an association between SBP>140mmHg and withdrawal from Stata I: HR=`hr', 95%CI: `lb'-`ub', p `p'"
         }
         else {
             noi di "There's no association between SBP>140mmHg and withdrawal from Stata I: HR=`hr', 95%CI: `lb'-`ub', p `p'"
         }
     } 
     ```
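The hazard ratio and its Wald 95% CI can also be computed directly from the stored coefficient, since `stcox` keeps the log hazard ratio in `_b[stressed]` and its standard error in `_se[stressed]`: HR = exp(b) and CI = exp(b ± 1.96·SE). Below is a sketch on simulated data, assuming exponential survival times whose hazard is doubled when `stressed == 1`; the seed, sample size, and data are illustrative and not from the course dataset.

```stata
* Hypothetical simulated data: true hazard ratio = 2 for stressed == 1
clear
set seed 12345
set obs 500
generate stressed = runiform() < 0.5
* higher hazard -> shorter times (rate 2 instead of 1 when stressed)
generate time = rexponential(1) / exp(ln(2)*stressed)
* every subject fails, so there is no censoring
generate fail = 1
stset time, failure(fail)
stcox stressed, nolog

* _b[] holds the log hazard ratio; exponentiate the estimate and the Wald bounds
local b  = _b[stressed]
local se = _se[stressed]
local z  = invnormal(0.975)
local hr : di %4.2f exp(`b')
local lb : di %4.2f exp(`b' - `z'*`se')
local ub : di %4.2f exp(`b' + `z'*`se')
di "HR = `hr', 95% CI: `lb'-`ub'"
```

The printed values should match the hazard ratio and interval in the `stcox` output table, so this is mainly a way to see where those numbers come from.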

#### Explanation of Key Components
1. **Fitting the Cox Model**:
   - `stcox stressed` fits a Cox proportional hazards model with the variable `stressed` (indicating SBP > 140 mmHg) as the predictor.
   
2. **Listing Results**:
   - `ereturn list` displays the `e()` results saved by `stcox`, the last estimation command; `return list` displays the `r()` results, which after `stcox` include the coefficient table `r(table)`.
   
3. **Extracting Results into a Matrix**:
   - `matrix m = r(table)` stores the results of the Cox model in a matrix named `m`, which is then listed using `matrix list m`.
   
4. **Calculating Linear Combinations**:
   - `lincom stressed` evaluates the linear combination consisting of the `stressed` coefficient alone, which leaves its estimate, p-value, and confidence bounds in `r()`; `return list` then displays them.
   
5. **Quiet Block for Extracting and Formatting**:
   - `qui` (quietly) suppresses output while extracting and formatting results.
   - The `local` commands assign results (p-value, HR, lower and upper bounds of the 95% CI) to local macros.
   - The `: di` (display) macro function applies number formats such as `%3.2f` while storing each value.
   - The script checks the significance of the p-value and prints a message indicating whether there is a statistically significant association between high SBP and withdrawal from the Stata I class.
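The p-value formatting rules in step 5 can be wrapped in a small r-class program so they are easy to test on their own. `fmt_p` is a hypothetical name, not part of Stata. This sketch keeps the script's cut-offs but uses `%4.2f` and `%3.1f`, which are wider than the script's `%3.2f` and `%2.1f` and leave room for the leading zero.

```stata
capture program drop fmt_p
program define fmt_p, rclass
    // Hypothetical helper: format a numeric p-value as "< 0.01" or "= x.xx"
    args pval
    if `pval' < 0.01 {
        return local ptxt "< 0.01"
    }
    else if `pval' <= 0.05 {
        // two decimals between 0.01 and 0.05
        local s : di %4.2f `pval'
        return local ptxt "= `s'"
    }
    else {
        // one decimal above 0.05, as in the script
        local s : di %3.1f `pval'
        return local ptxt "= `s'"
    }
end

fmt_p 0.034
di "p `r(ptxt)'"
```

After `lincom`, `local pval = r(p)` followed by `` fmt_p `pval' `` would produce the text for the message.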

Together, these steps fit a Cox model, pull the hazard ratio, its 95% CI, and the p-value out of the stored results, and print a one-sentence interpretation based on statistical significance.
### 8.3 Parametric