# Predicting Mortgage Delinquency Risk

**Note: this is a new exercise, so if you find something weird, please bring it to my attention.**

You have been hired by a mortgage servicing firm (a company that buys mortgages and then collects mortgage payments from homeowners) to build a model to answer the question: 

**Given all available information about a newly issued mortgage, what is the likelihood that the mortgage will enter delinquency (the homeowner will be at least 30 days late on a mortgage payment) during the first two years of the mortgage?**

The servicer's hope, obviously, is to differentiate between mortgages it should try to purchase (those that will be consistently paid) and mortgages it wishes to avoid.

For this task, you have been given [REAL data on a sample of all US Standard single family home mortgages purchased or insured by Freddie Mac](https://www.freddiemac.com/research/datasets/sf-loanlevel-dataset) in a single calendar year along with payment data from that and two subsequent years.


## Gradescope Autograding

Please follow [all standard guidance](https://www.practicaldatascience.org/html/autograder_guidelines.html) for submitting this assignment to the Gradescope autograder, including storing your solutions in a dictionary called `results` and ensuring your notebook runs from the start to completion without any errors.

For this assignment, please name your file `exercise_passive_prediction.ipynb` before uploading.

You can check that you have answers for all questions in your `results` dictionary with this code:

```python
assert set(results.keys()) == {
    "ex2_merge_type",
    "ex4_num_mortgages",
    "ex5_num_obs",
    "ex7_num_mortgages",
    "ex7_share_delinquent",
    "ex10_num_obs",
    "ex12_roc_auc",
    "ex14_false_negative_rate",
    "ex16_num_obs",
    "ex16_share_delinquent",
    "ex17_false_negative_rate",
}
```


### Submission Limits

Please remember that you are **only allowed FOUR submissions to the autograder.** Your last submission (if you submit 4 or fewer times), or your fourth submission (if you submit more than 4 times), will determine your grade. Submissions that error out will **not** count against this total.

That's one more than usual in case there are issues with exercise clarity.

## Data Cleaning and Organization

Data for this exercise can be [found here](https://github.com/nickeubank/MIDS_Data/tree/master/mortgages/2004). This folder includes both the data to be used and documentation, though you can find [supplemental documentation here](https://www.freddiemac.com/research/datasets/sf-loanlevel-dataset).

The only difference between this data and the original Freddie Mac sampled data is that I've limited the scope of service data to three calendar years.

In [2]:
results = {}

### Exercise 1

Begin by loading both: 

- the mortgage origination file (`sample_orig_2004.txt`). This *should* contain information on all mortgages issued in 2004, along with non-time varying features of these mortgages (the initial amount, the credit score of the applicant, etc.), and 
- the servicing data (`sample_svcg_2004orig_3years.txt`). This contains monthly records of all recorded payments (or non-payments) for all mortgages issued in 2004 during the calendar years of 2004, 2005, and 2006.

So the autograder can see the data, be sure to load it directly from a URL (don't download and load from your own system).

Load the data AND ensure your data has column names. You will likely need to reference the documentation to figure out how to do so.

In [1]:
import pandas as pd 

origination_url = 'https://media.githubusercontent.com/media/nickeubank/MIDS_Data/master/mortgages/2004/sample_orig_2004.txt'
servicing_url = 'https://media.githubusercontent.com/media/nickeubank/MIDS_Data/master/mortgages/2004/sample_svcg_2004orig_3years.txt'

origination_columns = [ 'credit_score', '1st_payment_date','1st_time_buyer', 'maturity_date', 'msa', 'mi_percent', 'num_of_units', 'occupancy_status', 'original_cltv', 'original_dti_ratio', 'original_upb', 'original_ltv', 'original_interest_rt', 'channel', 'ppm_flag', 'amortization_type', 'property_state', 'property_type', 'postal_code', 'loan_seq_num', 'loan_purpose', 'original_loan_term', 'num_borrowers', 'seller_name', 'service_name', 'super_conforming', 'prr_loan_seq_num', 'program_indicator', 'rr_indicator', 'prop_val_method', 'i/o_indicator', 'mi_cancel_indicator' ]
servicing_columns = [ 'loan_seq_num', 'monthly_rep_per', 'current_actual_upb', 'current_delinq_status', 'loan_age', 'month_to_maturity', 'defect_settle_date', 'mod_flag', 'zero_bal_code','zero_bal_eff_data', 'curr_int_rate', 'curr_non_int_bearing_upb', 'ddlpi', 'mi_recoveries', 'net_sale_proceeds', 'non_mi_recoveries', 'total_exp', 'legal_costs', 'maint_preserve_costs', 'taxes_insurance', 'misc_exp', 'actual_loss_cal', 'cum_mod_cost', 'step_mod', 'payment_deferral', 'eltv', 'zero_bal_rem_upb', 'delinq_acuired_int', 'disaster_delinq', 'borrower_assist_status', 'curr_month_mod_cost', 'int_bearing_upb' ]

# Load the mortgage origination data with proper column names
origination_data = pd.read_csv(origination_url, sep="|", names=origination_columns)

# Load the servicing data with proper column names
servicing_data = pd.read_csv(servicing_url, sep="|", names=servicing_columns)

# Check the first few rows of the loaded data
print(origination_data.head())
print(servicing_data.head())



   credit_score  1st_payment_date 1st_time_buyer  maturity_date      msa  \
0           701            200403              N         203402  45060.0   
1           648            200403              N         202402      NaN   
2           653            200403              Y         203402  20740.0   
3           747            200403              N         203402  30700.0   
4           731            200403              N         201902      NaN   

   mi_percent  num_of_units occupancy_status  original_cltv  \
0          25             1                P             90   
1           0             1                P             80   
2          17             1                P             87   
3           0             2                I             70   
4           0             1                P             80   

   original_dti_ratio  ...  num_borrowers                         seller_name  \
0                  43  ...              2                       Other sellers   
1 

### Exercise 2

What is the unit of observation in `sample_orig_2004.txt` and in `sample_svcg_2004orig_3years.txt`?

**ANSWER**

For **`sample_orig_2004.txt`**, the origination data, the unit of observation is a single mortgage loan. Each row corresponds to one loan and records its terms, the borrower's credit score, property information, and so on.

For **`sample_svcg_2004orig_3years.txt`**, the servicing data, the unit of observation is a loan-month. Each row records one month's payment status for a specific loan, so a loan appears once per reporting month.
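
One way to check a claimed unit of observation is to test which columns form a unique key. The sketch below uses two small made-up frames as stand-ins for the real files; the IDs and values are illustrative only, and the column names follow the ones assigned when the data was loaded.

```python
import pandas as pd

# Toy stand-ins for the two files (values are made up for illustration)
orig = pd.DataFrame({
    "loan_seq_num": ["F04Q10000001", "F04Q10000002"],
    "credit_score": [701, 648],
})
svcg = pd.DataFrame({
    "loan_seq_num": ["F04Q10000001"] * 3 + ["F04Q10000002"] * 2,
    "monthly_rep_per": [200403, 200404, 200405, 200403, 200404],
})

# One row per loan: the loan ID alone is a unique key
orig_is_loan_level = orig["loan_seq_num"].is_unique

# One row per loan-month: the ID repeats, but (ID, reporting month) is unique
svcg_id_repeats = not svcg["loan_seq_num"].is_unique
svcg_is_loan_month_level = not svcg.duplicated(["loan_seq_num", "monthly_rep_per"]).any()

print(orig_is_loan_level, svcg_id_repeats, svcg_is_loan_month_level)
```

If the same checks pass on the real data, a `1:m` merge from origination to servicing is the natural fit.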

### Exercise 3

Merge your two datasets. Be sure to use the `validate` keyword argument in `merge`.

Assuming that you list the data associated with `sample_orig_2004.txt` first and `sample_svcg_2004orig_3years.txt` second, what keyword are you passing to `validate`? Store your answer as a string (use one of: `"1:1"`, `"m:1"`, `"1:m"`, `"m:m"`) in a dictionary called `results` under the key `ex2_merge_type`.

In [3]:
merged_data = origination_data.merge(servicing_data, on='loan_seq_num', validate='1:m')

results['ex2_merge_type'] = '1:m'

print(merged_data.head())
print(results)

   credit_score  1st_payment_date 1st_time_buyer  maturity_date      msa  \
0           701            200403              N         203402  45060.0   
1           701            200403              N         203402  45060.0   
2           701            200403              N         203402  45060.0   
3           701            200403              N         203402  45060.0   
4           701            200403              N         203402  45060.0   

   mi_percent  num_of_units occupancy_status  original_cltv  \
0          25             1                P             90   
1          25             1                P             90   
2          25             1                P             90   
3          25             1                P             90   
4          25             1                P             90   

   original_dti_ratio  ...  cum_mod_cost  step_mod  payment_deferral eltv  \
0                  43  ...           NaN       NaN               NaN  NaN   
1         

### Exercise 4

Mortgages come in many shapes and flavors; however, your servicer is only interested in predicting default for the more standard form of mortgage. Subset your data to only include:

- Mortgages taken out for purchase of a property,
- With first payments due in the quarter of origination or the first quarter after origination.

(In a perfect world we would just limit our analysis to mortgages where the first payment is due the month after origination. Unfortunately we only know the *quarter* of origination, so the only way to subset for relatively vanilla mortgages is to look for mortgages where the first payment was due in the same quarter or the quarter after origination.)

Subset for these mortgages. How many unique mortgages remain in the data? 

Hint: You may need to read the documentation for the `Loan Sequence Number` variable.

Store the resulting number of unique mortgages in `results` under the key `ex4_num_mortgages`.
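
The quarter comparison is easy to get wrong at the year boundary: a loan originated in Q4 with a first payment in Q1 of the following year is still "one quarter later", but a same-year check would drop it. A minimal sketch of one way around this is to put both dates on a running quarter index. It assumes the loan ID embeds the origination year and quarter at the character positions shown (check the `Loan Sequence Number` documentation), and the toy IDs and dates are made up.

```python
import pandas as pd

# Toy loans; IDs are made up but follow the assumed FYYQn... layout
loans = pd.DataFrame({
    "loan_seq_num": ["F04Q10000001", "F04Q40000002", "F04Q20000003"],
    "1st_payment_date": [200403, 200501, 200412],  # YYYYMM
})

# Origination quarter as a running index: year * 4 + (quarter - 1)
orig_year = 2000 + loans["loan_seq_num"].str[1:3].astype(int)
orig_qtr = loans["loan_seq_num"].str[4].astype(int)
orig_idx = orig_year * 4 + (orig_qtr - 1)

# First-payment quarter on the same scale
pay_year = loans["1st_payment_date"] // 100
pay_month = loans["1st_payment_date"] % 100
pay_idx = pay_year * 4 + (pay_month - 1) // 3

# Keep loans whose first payment is 0 or 1 quarters after origination
loans["quarters_to_first_payment"] = pay_idx - orig_idx
vanilla = loans[loans["quarters_to_first_payment"].between(0, 1)]
print(vanilla["loan_seq_num"].tolist())
```

Here the Q4 2004 loan with a January 2005 first payment is kept, while the Q2 loan with a December first payment is dropped.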

In [4]:
# Filter for purchase loans and explicitly copy to avoid SettingWithCopyWarning
purchase_loans = merged_data[merged_data['loan_purpose'] == 'P'].copy()

# Directly modify the DataFrame without chained indexing
purchase_loans['origination_year'] = 2000 + purchase_loans['loan_seq_num'].str[1:3].astype(int)
purchase_loans['origination_quarter'] = purchase_loans['loan_seq_num'].str[4].astype(int)
purchase_loans['1st_payment_year'] = (purchase_loans['1st_payment_date'] // 100).astype(int)
purchase_loans['1st_payment_month'] = (purchase_loans['1st_payment_date'] % 100).astype(int)
purchase_loans['1st_payment_quarter'] = ((purchase_loans['1st_payment_month'] - 1) // 3 + 1).astype(int)

# Pre-compute and store conditions to avoid reindexing warnings
condition_same_year = (purchase_loans['origination_year'] == purchase_loans['1st_payment_year'])
condition_quarter_sequence = (purchase_loans['1st_payment_quarter'] >= purchase_loans['origination_quarter']) & (purchase_loans['1st_payment_quarter'] <= purchase_loans['origination_quarter'] + 1)
condition_year_end_turnover = ~((purchase_loans['origination_quarter'] == 4) & (purchase_loans['1st_payment_quarter'] == 1) & (purchase_loans['1st_payment_year'] != purchase_loans['origination_year'] + 1))

# Apply conditions to filter valid_loans
valid_loans = purchase_loans[condition_same_year & condition_quarter_sequence & condition_year_end_turnover]

# Reset index if there have been significant row-wise operations
valid_loans.reset_index(drop=True, inplace=True)

# Count the number of unique mortgages
num_unique_mortgages = valid_loans['loan_seq_num'].nunique()

# Store the result in a dictionary and print
results['ex4_num_mortgages'] = num_unique_mortgages

print(f"Number of unique mortgages for purchase with payments due in origination quarter or the next: {num_unique_mortgages}")
print(results)


Number of unique mortgages for purchase with payments due in origination quarter or the next: 14586
{'ex2_merge_type': '1:m', 'ex4_num_mortgages': 14586}


### Exercise 5

The servicer wants to predict delinquency during the first 24 payment due dates (you may assume payments are due every month starting with the month the first payment is due). Subset the data to these first 24 (possible) payment due dates.

Note that not all loans will have 24 records in the servicing file in the first 24 months as a result of data merging issues on behalf of Freddie Mac. As noted in the Freddie Mac documentation:

> For a given loan, each monthly reporting period in the monthly performance data file combines data elements from multiple reporting cycles and systems at Freddie Mac. As such, perceived data anomalies may be a result of timing mismatches between default/delinquency reporting cycles and investor reporting cycles. Examples of some commonly occurring anomalies in the data are included throughout this section. In all cases, the best information available at the time the Dataset is generated, subject to operational constraints, is used.

So subset for the first two years of (possible) payments, resulting in *up to* 24 observations per mortgage (but potentially fewer given the data cleanliness issues).

After this subsetting, store the number of remaining observations (not mortgages, observations) in `results` under the key `"ex5_num_obs"`.

In [5]:
# Create a copy to avoid SettingWithCopyWarning when modifying
valid_loans = merged_data[merged_data['loan_purpose'] == 'P'].copy()

# Compute the year and month directly
valid_loans.loc[:, 'monthly_rep_year'] = valid_loans['monthly_rep_per'] // 100
valid_loans.loc[:, 'monthly_rep_month'] = valid_loans['monthly_rep_per'] % 100
valid_loans.loc[:, '1st_payment_year'] = valid_loans['1st_payment_date'] // 100
valid_loans.loc[:, '1st_payment_month'] = valid_loans['1st_payment_date'] % 100

# Compute the difference in months between the two dates
valid_loans.loc[:, 'month_diff'] = (valid_loans['monthly_rep_year'] - valid_loans['1st_payment_year']) * 12 + \
                                   (valid_loans['monthly_rep_month'] - valid_loans['1st_payment_month'])

# Filter the DataFrame to keep only the rows where the difference is between 0 and 23
first_24_months = valid_loans[(valid_loans['month_diff'] >= 0) & (valid_loans['month_diff'] < 24)]

# Count the number of remaining observations
num_remaining_observations = len(first_24_months)

# Store the number of remaining observations in the results dictionary
results['ex5_num_obs'] = num_remaining_observations

# Output the result
print(f"Number of remaining observations for the first 24 payment due dates: {num_remaining_observations}")
print(results)


Number of remaining observations for the first 24 payment due dates: 384291
{'ex2_merge_type': '1:m', 'ex4_num_mortgages': 14586, 'ex5_num_obs': 384291}


### Exercise 6

For each unique mortgage in your dataset, create an indicator variable that takes on a value of 1 if, at any time during this period, the mortgage has been delinquent.

Delinquency status is stored in the variable `CURRENT LOAN DELINQUENCY STATUS`, and is coded as:

> CURRENT LOAN DELINQUENCY STATUS – A value corresponding to the number of days the borrower is delinquent, based on the due date of last paid installment (“DDLPI”) reported by servicers to Freddie Mac, and is calculated under the Mortgage Bankers Association (MBA) method.
If a loan has been acquired by REO, then the Current Loan Delinquency Status will reflect the value corresponding to that status (instead of the value corresponding to the number of days the borrower is delinquent).
>
> 0 = Current, or less than 30 days delinquent
> 
> 1 = 30-59 days delinquent
> 
> 2 = 60-89 days delinquent
> 
> 3 = 90-119 days delinquent
> 
> And so on...
> 
> RA = REO Acquisition
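
Because of the `RA` code, the status column is not purely numeric, and coercing it to numbers turns `RA` into a missing value. The sketch below shows one possible treatment on made-up rows: any code other than `"0"` counts as delinquent. Treating `RA` as delinquent is an assumption made here for illustration, and real data may need extra care for missing or differently typed values.

```python
import pandas as pd

# Toy servicing rows; status codes are strings because of 'RA'
svcg = pd.DataFrame({
    "loan_seq_num": ["A", "A", "B", "B", "C", "C"],
    "current_delinq_status": ["0", "0", "0", "2", "0", "RA"],
})

# Delinquent = any code other than "0" (assumption: 'RA' counts as delinquent)
status = svcg["current_delinq_status"].astype(str).str.strip()
svcg["is_delinquent"] = (status != "0").astype(int)

# Collapse to one value per loan: 1 if any month was delinquent
ever_delinquent = svcg.groupby("loan_seq_num")["is_delinquent"].max()
print(ever_delinquent.to_dict())
```

Taking the per-loan maximum yields one indicator per mortgage rather than one per monthly row.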


In [11]:
# Create a new DataFrame to avoid any potential issues with 'first_24_months' being a view or copy
cleaned_loans = first_24_months.copy()

# Convert 'current_delinq_status' to numeric, errors='coerce' will turn non-numeric into NaN
cleaned_loans.loc[:, 'current_delinq_status'] = pd.to_numeric(cleaned_loans['current_delinq_status'], errors='coerce')

# Fill NaN values with 0
cleaned_loans.loc[:, 'current_delinq_status'] = cleaned_loans['current_delinq_status'].fillna(0)

# Create the delinquency indicator
cleaned_loans.loc[:, 'delinquency_indicator'] = cleaned_loans.groupby('loan_seq_num')['current_delinq_status'] \
    .transform(lambda x: 1 if x.gt(0).any() else 0)

# Count monthly observations that belong to a mortgage that was ever delinquent
# (the indicator is repeated on every row of a loan, so this sum counts rows, not loans)
num_delinquent_obs = cleaned_loans['delinquency_indicator'].sum()

# Store the number in the results dictionary
results['ex6_num_delinquent'] = num_delinquent_obs

# Output the result
print(f"Number of monthly observations from ever-delinquent mortgages: {num_delinquent_obs}")
print(results)


Number of monthly observations from ever-delinquent mortgages: 27765
{'ex2_merge_type': '1:m', 'ex4_num_mortgages': 14586, 'ex5_num_obs': 384291, 'ex6_num_delinquent': 27765}


### Exercise 7

At this point, you should be able to drop all servicing variables reported on a monthly basis and just keep information about the original mortgage issuance (and still keep an indicator for whether the mortgage has ever been delinquent).

Store the final number of mortgages in your data under `ex7_num_mortgages` and the share (between 0 and 1) of mortgages that have been delinquent under `ex7_share_delinquent`.

In [12]:
# Assuming 'cleaned_loans' is your current DataFrame after exercise 6

# Define the columns to keep: all columns related to the original mortgage issuance and the delinquency indicator
origination_columns = [
    'credit_score', '1st_payment_date', '1st_time_buyer', 'maturity_date', 'msa', 
    'mi_percent', 'num_of_units', 'occupancy_status', 'original_cltv', 'original_dti_ratio', 
    'original_upb', 'original_ltv', 'original_interest_rt', 'channel', 'ppm_flag', 
    'amortization_type', 'property_state', 'property_type', 'postal_code', 'loan_seq_num', 
    'loan_purpose', 'original_loan_term', 'num_borrowers', 'seller_name', 'service_name', 
    'super_conforming', 'prr_loan_seq_num', 'program_indicator', 'rr_indicator', 'prop_val_method', 
    'i/o_indicator', 'mi_cancel_indicator', 'delinquency_indicator'
]

# Keep only the columns related to original mortgage issuance and the delinquency indicator
final_dataset = cleaned_loans[origination_columns].drop_duplicates()

# Store the final number of mortgages in the results dictionary
results['ex7_num_mortgages'] = len(final_dataset)

# Calculate and store the share of mortgages that have been delinquent
results['ex7_share_delinquent'] = final_dataset['delinquency_indicator'].mean()

# Output the result
print(f"Final number of mortgages: {results['ex7_num_mortgages']}")
print(f"Share of mortgages that have been delinquent: {results['ex7_share_delinquent']}")
print(results)

Final number of mortgages: 17759
Share of mortgages that have been delinquent: 0.07066839349062447
{'ex2_merge_type': '1:m', 'ex4_num_mortgages': 14586, 'ex5_num_obs': 384291, 'ex6_num_delinquent': 27765, 'ex7_num_mortgages': 17759, 'ex7_share_delinquent': 0.07066839349062447}


## Modelling Delinquency Risk

Your data should now be relatively [tidy](https://vita.had.co.nz/papers/tidy-data.pdf), in the technical sense of the term. And that means it should be relatively straightforward for you to build a model that answers the question "Given the features of a newly originated mortgage, how likely is the mortgage holder to fall into delinquency within the first two years after origination?"

### Exercise 8

First, we need to identify the target for our model, select useful predictors from the data, and do feature engineering.

Let's begin with identifying some features that probably *aren't* going to be useful. For example, `"Metropolitan Statistical Area (MSA) Or Metropolitan Division"` is probably *not* an appropriate feature to include in this analysis. Can you figure out why? Make sure to show (quantitatively) why not. 

Hint: your answer should involve more than the missing rate.

Hint 2: how many observations for a given city do you think you'd need to determine if that city had especially high mortgage delinquency rates?

Hint 3: if not all possible values of a variable are present in your training data, what problem might that cause during testing and deployment?
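
For the second hint, a rough back-of-the-envelope calculation helps. The sketch below computes the binomial standard error of an estimated delinquency share for different group sizes, using a share of about 7% purely as an illustrative value. `proportion_se` is a helper defined here, not a library function.

```python
import math

def proportion_se(p: float, n: int) -> float:
    """Standard error of a sample proportion p estimated from n observations."""
    return math.sqrt(p * (1 - p) / n)

p = 0.07  # illustrative delinquency share

for n in [1, 10, 100, 1000]:
    print(f"n={n:>4}: standard error ~ {proportion_se(p, n):.3f}")
```

With only a handful of loans in a group, the standard error is larger than the share itself, so a group-level rate of 0 or 1 says very little.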


In [13]:
# Assuming 'final_dataset' is the DataFrame after exercise 7

# Missing Rate for MSA
msa_missing_rate = final_dataset['msa'].isnull().mean()
print(f"Missing rate for MSA: {msa_missing_rate}")

# Observations per MSA
msa_counts = final_dataset['msa'].value_counts()
print(f"Number of observations per MSA:\n{msa_counts}")

# Variety of MSAs
unique_msas = final_dataset['msa'].nunique()
print(f"Number of unique MSAs: {unique_msas}")

# Association with Delinquency
# For simplicity, we could compare average delinquency rates by MSA
msa_delinquency_rates = final_dataset.groupby('msa')['delinquency_indicator'].mean().sort_values(ascending=False)
print(f"Delinquency rates by MSA:\n{msa_delinquency_rates}")

# Decide whether to include MSA based on these analyses
# If MSA does not seem useful, it may be excluded from the model features

Missing rate for MSA: 0.2793513148262853
Number of observations per MSA:
msa
16974.0    370
26420.0    341
38060.0    309
33460.0    295
35644.0    279
          ... 
48060.0      1
14010.0      1
38220.0      1
44420.0      1
29200.0      1
Name: count, Length: 409, dtype: int64
Number of unique MSAs: 409
Delinquency rates by MSA:
msa
19260.0    1.000000
24420.0    1.000000
49500.0    1.000000
42100.0    1.000000
25060.0    0.833333
             ...   
29340.0    0.000000
29420.0    0.000000
29740.0    0.000000
30020.0    0.000000
29700.0    0.000000
Name: delinquency_indicator, Length: 409, dtype: float64


### Exercise 9

For your analysis, include the following variables: 

```
Credit Score
First Time Homebuyer Flag
Number of Units
Mortgage Insurance Percentage (MI %)
Occupancy Status
Original Debt-to-Income (DTI) Ratio
Original UPB
Original Loan-to-Value (LTV)
Original Interest Rate
Channel
Prepayment Penalty Mortgage (PPM) Flag
Amortization Type (Formerly Product Type)
Property State
Property Type
Original Loan Term
Number of Borrowers
Interest Only (I/O) Indicator
```

Be sure to clean these variables. When doing so, please treat missing data as missing (e.g., `np.nan`, not as a distinct category).

In [14]:
import numpy as np

# Assuming 'final_dataset' is your current DataFrame after exercise 7

# Define the columns to keep for the analysis
# (the First Time Homebuyer Flag was named '1st_time_buyer' when the data was loaded)
analysis_columns = [
    'credit_score', '1st_time_buyer', 'num_of_units', 'mi_percent', 'occupancy_status',
    'original_dti_ratio', 'original_upb', 'original_ltv', 'original_interest_rt', 'channel',
    'ppm_flag', 'amortization_type', 'property_state', 'property_type', 'original_loan_term',
    'num_borrowers', 'i/o_indicator', 'delinquency_indicator'
]

# Keep only the defined columns; copy so the edits below do not touch a view of 'final_dataset'
analysis_dataset = final_dataset[analysis_columns].copy()

# Clean the data
# This is a placeholder: actual cleaning will depend on the specifics of your data
# Assumption: the documentation codes a missing credit score as 9999; verify before relying on it
analysis_dataset['credit_score'] = analysis_dataset['credit_score'].replace(9999, np.nan)
# Map the Y/N flag to 1/0; any other code becomes NaN
analysis_dataset['1st_time_buyer'] = analysis_dataset['1st_time_buyer'].map({'N': 0, 'Y': 1})

# Now, proceed to check and clean the rest of the variables as needed
# ...

# Once the cleaning is done, your dataset is ready for further analysis and modeling
# You might want to check the cleaned dataset before proceeding
print(analysis_dataset.head())

### Exercise 10

The next step in our analysis is to convert our categorical variables to one-hot-encodings and use `train_test_split` to split our data.

To ensure replicability, **before** you `train_test_split` your data, please sort your data by `Loan Sequence Number`. This will ensure when we split the data with a random seed below, everyone will get the same split and the autograder will function.

You may create your one-hot-encodings however you wish, but I'm a fan of the [patsy library's](https://patsy.readthedocs.io/en/latest/overview.html) `dmatrices` function.

Hint: You should end up with 8 categorical variables, including some binary flags and `Number_of_Borrowers`, `Number_of_Units` (which you could argue should be continuous, but I think are better treated as categorical).

Store the number of observations in your final dataset in `ex10_num_obs`.
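
As a hedged illustration of the sort-then-encode step, the sketch below uses `pandas.get_dummies` rather than patsy, on a tiny made-up frame. The column names mirror the ones used earlier in this notebook, and the choice of which columns to treat as categorical is only an example.

```python
import pandas as pd

# Toy data with illustrative values
df = pd.DataFrame({
    "loan_seq_num": ["F04Q10000003", "F04Q10000001", "F04Q10000002"],
    "credit_score": [701.0, 648.0, None],
    "channel": ["R", "B", "R"],
    "num_borrowers": [2, 1, 2],
    "delinquency_indicator": [0, 1, 0],
})

# Sort by loan ID first so a seeded split is reproducible, then drop incomplete rows
df = df.sort_values("loan_seq_num").dropna().reset_index(drop=True)

# One-hot encode categoricals; drop_first avoids a redundant reference column
categorical = ["channel", "num_borrowers"]
X = pd.get_dummies(
    df.drop(columns=["loan_seq_num", "delinquency_indicator"]),
    columns=categorical,
    drop_first=True,
)
y = df["delinquency_indicator"]
print(X.columns.tolist(), len(X))
```

Passing `num_borrowers` in `columns` forces it to be encoded as a category even though it is stored as a number.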

### Exercise 11

Use `train_test_split` from `sklearn.model_selection` to split the data. 

When you do, use `0.2` as the `test_size` and set `random_state=42`.
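
A minimal sketch of the split on placeholder arrays (the data here is synthetic and only shows the call and the resulting shapes):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and labels: 50 rows, 2 columns
X = np.arange(100).reshape(50, 2)
y = np.arange(50) % 2

# 80/20 split; fixing random_state makes the shuffle repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

The same seed only gives the same rows if the input order is the same, which is why the data is sorted before splitting.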

### Exercise 12

Now fit a `GradientBoostingClassifier` (from `sklearn.ensemble`) to the data. Set `random_state=42`. Using `roc_auc_score`, compute your ROC AUC score against the test data. Store it in `results` under the key `"ex12_roc_auc"`.
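
The sketch below shows the fit-and-score pattern on synthetic data with a rare positive class. The data-generating process is invented for illustration, so the resulting score says nothing about the mortgage data. Note that `roc_auc_score` is given predicted probabilities, not hard class labels.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data: the first feature drives a rare positive outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
p = 1 / (1 + np.exp(-(1.5 * X[:, 0] - 3)))
y = rng.binomial(1, p)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

# ROC AUC ranks observations, so it needs scores: use P(class == 1)
scores = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, scores)
print(f"ROC AUC: {auc:.3f}")
```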

### Exercise 13

Use the `predict` method to generate a confusion matrix. What problem do you see with the result?
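
When reading the confusion matrix, one thing worth checking is whether the model ever predicts the rare class at all. The sketch below builds the matrix for hand-made labels in which every prediction is 0, a pattern that can arise with imbalanced classes under the default 0.5 cutoff. Whether your fitted model behaves this way is something to verify, not assume.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hand-made labels: 7% positives, and a classifier that never predicts 1
y_true = np.array([0] * 93 + [1] * 7)
y_pred = np.zeros(100, dtype=int)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"accuracy = {(tn + tp) / cm.sum():.2f}, delinquent loans caught = {tp}")
```

In this toy case accuracy is 93% even though no delinquent loan is identified.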

### Exercise 14

To address the problem from Exercise 13, use `.predict_proba()` to set your own threshold for classification. Your stakeholder is mostly concerned with False Negatives (mortgages classified as safe that actually are not), so use an 8% probability threshold to get a good balance of a low False Negative rate with a reasonable number of mortgages still being considered "viable."

What is the False Negative rate at an 8% classification threshold from the model above?

Store the result under the key `"ex14_false_negative_rate"`.
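
A sketch of the thresholding step on made-up probabilities follows. It assumes the conventional definition of the False Negative rate, FN / (FN + TP); the autograder may define it differently, so check the expected definition before storing a result.

```python
import numpy as np

# Made-up true labels and predicted probabilities of delinquency
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1])
proba = np.array([0.02, 0.05, 0.10, 0.03, 0.20, 0.06, 0.07, 0.09])

# Classify as delinquent when the predicted probability reaches 8%
threshold = 0.08
y_pred = (proba >= threshold).astype(int)

# False negatives: truly delinquent loans predicted safe
fn = np.sum((y_true == 1) & (y_pred == 0))
tp = np.sum((y_true == 1) & (y_pred == 1))
false_negative_rate = fn / (fn + tp)
print(f"False Negative rate: {false_negative_rate:.3f}")
```

With a fitted model, `proba` would come from `model.predict_proba(X_test)[:, 1]`.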

### Exercise 15

Your stakeholder wants to buy as many mortgages as it can while maintaining a delinquency rate of purchased mortgages below 5%. Based on your answer above, do you feel like your model can provide that level of performance?
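
The delinquency rate among *purchased* mortgages is not the same quantity as FN / (FN + TP): its denominator is the set of loans predicted safe, so it equals FN / (FN + TN). The sketch below computes it, along with the share of loans purchased, on the same kind of made-up labels. It assumes that "purchased" means "predicted safe".

```python
import numpy as np

# Made-up labels; a prediction of 0 means "predicted safe", i.e. purchased
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1])
y_pred = np.array([0, 0, 1, 0, 1, 0, 0, 1])

purchased = y_pred == 0

# Share of purchased loans that turn out delinquent: FN / (FN + TN)
delinquency_rate_purchased = y_true[purchased].mean()

# Share of all loans that the stakeholder would buy
share_purchased = purchased.mean()
print(delinquency_rate_purchased, share_purchased)
```

Looking at both numbers together shows the trade-off between portfolio quality and volume at a given threshold.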

## Now To The Future

The preceding analysis is precisely the type of analysis you would do if, in late 2006, you'd been asked to evaluate mortgage performance in the last two years for use going forward. So let's see how your model performs now!

In this [folder](https://github.com/nickeubank/MIDS_Data/tree/master/mortgages/2007) you will find data on mortgages originated in 2007 along with servicing data from 2007, 2008, and 2009.

### Exercise 16

Please load this data (again, from a URL to help the autograder) and clean it in the same manner as before. As a sanity check, how many observations do you have in the final dataset (after you've removed observations with missing values to allow you to generate predicted delinquency rates)? 

Store the final number of observations in `"ex16_num_obs"` and the share of those mortgages that are delinquent in `"ex16_share_delinquent"`.
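
When new data is one-hot encoded separately, its columns may not match the ones the model was trained on (a category can be absent, or a new one can appear). One possible way to handle this, sketched below, is to reindex the new design matrix to the training column layout. The column list and values are hypothetical.

```python
import pandas as pd

# Hypothetical column layout seen during training
train_columns = ["credit_score", "channel_R", "channel_T", "property_state_NC"]

# New data with one unseen channel ("C") and one unseen state ("ZZ")
new = pd.DataFrame({
    "credit_score": [700, 650],
    "channel": ["R", "C"],
    "property_state": ["NC", "ZZ"],
})
X_new = pd.get_dummies(new, columns=["channel", "property_state"])

# Match the training layout: unseen levels are dropped, absent levels become 0
X_new = X_new.reindex(columns=train_columns, fill_value=0)
print(X_new)
```

Rows with an unseen level end up with zeros in every dummy for that variable, which is a modelling choice worth making deliberately.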


### Exercise 17




Had your stakeholder purchased mortgages using your model, what would the resulting False Negative rate have been? (That is, compare the predicted values for mortgages using the model trained above with realized outcomes.) Store your result under the key `"ex17_false_negative_rate"`.


### Exercise 18

How did the retrospective performance of your model compare to its actual performance moving forward? Why? Did you stay below the 5% target for False Negatives set by the stakeholder?