# Creating necessary files for library

## Weight files

CMS provides the coefficients for the categories in a SAS CPORT file, a proprietary file format that SAS does not document openly. Thus, the only
way to see the coefficients (also referred to as weights) for the categories is to use the Final Announcement documents, which contain the weights in tables. The announcement documents can be
found on [CMS's website](https://www.cms.gov/medicare/payment/medicare-advantage-rates-statistics/announcements-and-documents).

One important thing to be aware of is that when CMS uses the same model for multiple years without any coefficient changes, it does not republish the information in each year's announcement document. Thus, you must go back to the first time the model was used to find the information. To make this concrete, consider CMS Model Version V24. It was first used in Payment Year (PY) 2020 and appears in the [2020 Final Notice](https://www.cms.gov/medicare/health-plans/medicareadvtgspecratestats/announcements-and-documents-items/2020announcement). It was also used in [PY 2021](https://www.cms.gov/medicarehealth-plansmedicareadvtgspecratestatsannouncements-and-documents/2021-0), [PY 2022](https://www.cms.gov/medicarehealth-plansmedicareadvtgspecratestatsannouncements-and-documents/2022), [PY 2023](https://www.cms.gov/medicarehealth-plansmedicareadvtgspecratestatsannouncements-and-documents/2023), etc. However, none of those later announcement documents contain coefficients for V24, as they did not change from the original implementation in 2020.

To minimize manual work, the [tabula-py](https://pypi.org/project/tabula-py/) library is used to read tables from the PDFs. It requires Java to be installed; follow the instructions in the official tabula-py documentation to get it working.

Key Assumptions:
1. The headers of the tables are identified and mapped in `column_dict`.
2. Manual editing is sometimes still needed depending on the table layout, so the output file should be checked.
3. The code might require substantial editing/customization depending on the PDF.

In [8]:
import tabula
import pandas as pd
from pathlib import Path

# Specify the path to your PDF file
file_path = Path.cwd() / 'cms_files'
pdf_file = "Announcement2020.pdf"

# Use tabula to extract tables from the PDF
# You have to open the PDF to find the relevant pages
tables = tabula.read_pdf(file_path / pdf_file, pages="75-80", multiple_tables=True)

# Stack the per-page tables into a single DataFrame
combined_df = pd.concat(tables, ignore_index=True)

In [9]:
combined_df

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,"Community,","Community,.1","Community,.2","Community,.3","Community,.4","Community,.5",Unnamed: 3
0,,Variable,Description Label,"NonDual,","NonDual,","FBDual,","FBDual,","PBDual,","PBDual,",Institutional
1,,,,Aged,Disabled,Aged,Disabled,Aged,Disabled,
2,Medicaid,,,-,-,-,-,-,-,0.061
3,"Originally Disabled, Female",,,0.250,-,0.173,-,0.136,-,-
4,"Originally Disabled, Male",,,0.147,-,0.182,-,0.083,-,-
...,...,...,...,...,...,...,...,...,...,...
150,Payment HCC Counts,,,,,,,,,
151,D1,1 payment HCC,-,-,-,-,-,-,-,
152,D2,2 payment HCCs,-,-,-,-,-,-,-,
153,D3,3 payment HCCs,-,-,-,-,-,-,-,
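
The output above shows why the header needs attention: tabula spreads each column name over the header row and the first two data rows (`Community,` / `NonDual,` / `Aged`), while `column_dict` expects joined names such as `Community,NonDual,Aged`. The sketch below is one possible way to join those pieces. `flatten_header` is a hypothetical helper (not part of tabula-py or pandas), and it assumes the pieces only need to be concatenated column by column after dropping the `Unnamed: N` placeholders and `.1`-style duplicate suffixes that pandas adds.

```python
import re


def flatten_header(header_rows):
    """Join the pieces of a multi-row header, column by column."""
    names = []
    for parts in zip(*header_rows):
        cleaned = []
        for part in parts:
            # Skip empty cells; NaN is the only value that is not equal to itself
            if part is None or part != part:
                continue
            text = str(part).replace("\r", "").strip()
            # Skip the placeholder names pandas gives to unnamed columns
            if text.startswith("Unnamed:"):
                continue
            # Remove the ".1", ".2", ... suffix pandas adds to duplicate names
            text = re.sub(r"\.\d+$", "", text)
            if text:
                cleaned.append(text)
        names.append("".join(cleaned))
    return names


# A few columns copied from the output above: header row, row 0, row 1
header_rows = [
    ["Unnamed: 1", "Community,", "Community,.1", "Unnamed: 3"],
    ["Variable", "NonDual,", "NonDual,", "Institutional"],
    [None, "Aged", "Disabled", None],
]
print(flatten_header(header_rows))
# ['Variable', 'Community,NonDual,Aged', 'Community,NonDual,Disabled', 'Institutional']
```

With the DataFrame above, the rows could be passed as `[list(combined_df.columns), list(combined_df.iloc[0]), list(combined_df.iloc[1])]`, with the two header rows dropped afterwards. The result should still be checked against the PDF, since other announcements may lay out the header differently.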


In [None]:
# This may change as needed based on the headers
column_dict = {
    'Variable': 'category',
    'Description Label': 'description',
    'Community,NonDual,Aged': 'CNA',
    'Community,NonDual,Disabled': 'CND',
    'Community,FBDual,Aged': 'CFA',
    'Community,FBDual,Disabled': 'CFD',
    'Community,PBDual,Aged': 'CPA',
    'Community,PBDual,Disabled': 'CPD',
    'Institutional': 'INS'
}
final_df = (
    combined_df
    .rename(columns=lambda x: x.replace('\r', ''))
    .rename(columns=column_dict)
)
# Strip carriage returns that tabula leaves inside multi-line cells; NaN cells are left untouched
final_df['category'] = final_df['category'].str.replace('\r', '', regex=False)
# final_df['description'] = final_df['description'].str.replace('\r', '', regex=False)
final_df.to_csv(file_path / 'v24_weights.txt', index=False, header=True, sep='|')
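
Since the output file should be checked after it is written, it can help to read it back and convert the cells to numbers. The following is a minimal sketch under stated assumptions: `parse_weight` and `load_weights` are hypothetical helpers, the file is assumed to have a `category` column followed by one column per segment, and the category names in the sample rows (`LTIMCAID`, `OriginallyDisabled_Female`) are assumed to have been filled in during manual editing. A `-` in the table is treated as "no coefficient".

```python
import csv
import io


def parse_weight(cell):
    """Convert one table cell to a float; '-' and blanks mean 'no coefficient'."""
    text = (cell or "").strip()
    if text in ("", "-"):
        return None
    return float(text)


def load_weights(handle):
    """Read a pipe-delimited weights file into {category: {segment: weight}}."""
    reader = csv.DictReader(handle, delimiter="|")
    weights = {}
    for row in reader:
        category = row.pop("category")
        weights[category] = {
            segment: parse_weight(value) for segment, value in row.items()
        }
    return weights


# Illustrative rows in the pipe-delimited layout; values come from the table shown earlier
sample = io.StringIO(
    "category|CNA|CND|INS\n"
    "LTIMCAID|-|-|0.061\n"
    "OriginallyDisabled_Female|0.250|-|-\n"
)
weights = load_weights(sample)
print(weights["LTIMCAID"]["INS"])  # 0.061
```

For the real file, the handle would come from `open(file_path / 'v24_weights.txt', newline='')`. A row that raises `ValueError` in `float()` is a good pointer to a cell that still needs manual cleanup.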

## Category Definition files

CMS provides a SAS file that contains the category definitions (also called label definitions). However, this file only contains the category definitions for "disease HCCs", in other words HCCs that have a number. It does not contain information for demographic categories or disease interactions. These exist in other SAS files as part of the macro and need to be extracted manually.

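The code below reads in that file and parses it into a dictionary, which is then written out as JSON. It assumes that each label line contains an `HCC<number>` identifier followed by `=` and a quoted description.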
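
As a minimal sketch of that parsing step, the function below handles a single line. It is a slightly stricter variant of the notebook's approach: it uses one regular expression that requires the quoted description, rather than splitting on the HCC identifier. `parse_label_line` and `LABEL_RE` are hypothetical names, and the sample lines are made up purely to exercise the parser; they are not copied from the CMS file.

```python
import re

# An HCC identifier, '=', then a double-quoted description
LABEL_RE = re.compile(r'(HCC(\d+))\s*=\s*"([^"]*)"')


def parse_label_line(line):
    """Return (code, definition) for a label line, or None if the line has no label."""
    match = LABEL_RE.search(line)
    if match is None:
        return None
    code, number, descr = match.groups()
    return code, {'descr': descr.strip(), 'number': int(number), 'type': 'disease'}


# Hypothetical input lines, written only to exercise the parser
sample_lines = [
    ' %MACRO EXAMPLE;',
    '   HCC17 = "Example description A"',
    '   HCC18 ="Example description B "',
]
# Lines without a label return None and are filtered out
parsed = dict(filter(None, map(parse_label_line, sample_lines)))
print(parsed['HCC18'])
# {'descr': 'Example description B', 'number': 18, 'type': 'disease'}
```

Requiring the quotes means stray mentions of an HCC in comments are skipped rather than turned into entries, at the cost of missing any label that is formatted differently.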


## V24 Category File

In [8]:
import re
import yaml
from pathlib import Path

# Define category lists
demo_category_list = [
        'F0_34', 'F35_44', 'F45_54', 'F55_59', 'F60_64',
        'F65_69', 'F70_74', 'F75_79', 'F80_84', 'F85_89', 
        'F90_94', 'F95_GT', 'M0_34', 'M35_44', 'M45_54', 'M55_59', 'M60_64',
        'M65_69', 'M70_74', 'M75_79', 'M80_84', 'M85_89', 
        'M90_94', 'M95_GT'
    ]
nedemo_category_list = [
    'NEF0_34', 'NEF35_44', 'NEF45_54', 'NEF55_59', 'NEF60_64',
        'NEF65', 'NEF66', 'NEF67', 'NEF68', 'NEF69', 'NEF70_74', 
        'NEF75_79', 'NEF80_84', 'NEF85_89', 'NEF90_94', 'NEF95_GT',
        'NEM0_34', 'NEM35_44', 'NEM45_54', 'NEM55_59', 'NEM60_64',
        'NEM65', 'NEM66', 'NEM67', 'NEM68', 'NEM69', 'NEM70_74', 
        'NEM75_79', 'NEM80_84', 'NEM85_89', 'NEM90_94', 'NEM95_GT'
]
demo_interaction_dict = {
    'LTIMCAID': {'descr': 'Medicaid', 'type': 'demographic_interaction'},
    'OriginallyDisabled_Female': {'descr': 'Originally Disabled, Female', 'type': 'demographic_interaction'},
    'OriginallyDisabled_Male': {'descr': 'Originally Disabled, Male', 'type': 'demographic_interaction'}
}
disease_interaction_dict = {
    'HCC47_gCancer': {'descr': 'Immune Disorders*Cancer', 'type': 'disease_interaction'},
    'DIABETES_CHF': {'descr': 'Congestive Heart Failure*Diabetes', 'type': 'disease_interaction'},
    'CHF_gCopdCF': {'descr': 'Congestive Heart Failure*Chronic Obstructive Pulmonary Disease', 'type': 'disease_interaction'},
    'HCC85_gRenal_V24': {'descr': 'Congestive Heart Failure*Renal', 'type': 'disease_interaction'},
    'gCopdCF_CARD_RESP_FAIL': {'descr': 'Cardiorespiratory Failure*Chronic Obstructive Pulmonary Disease', 'type': 'disease_interaction'},
    'HCC85_HCC96': {'descr': 'Congestive Heart Failure*Specified Heart Arrhythmias', 'type': 'disease_interaction'},
    'gSubstanceUseDisorder_gPsych': {'descr': 'Substance Use Disorder*Psychiatric', 'type': 'disease_interaction'},
    'SEPSIS_PRESSURE_ULCER': {'descr': 'Sepsis*Pressure Ulcer', 'type': 'disease_interaction'},
    'SEPSIS_ARTIF_OPENINGS': {'descr': 'Sepsis*Artificial Openings for Feeding or Elimination', 'type': 'disease_interaction'},
    'ART_OPENINGS_PRESS_ULCER': {'descr': 'Artificial Openings for Feeding or Elimination*Pressure Ulcer', 'type': 'disease_interaction'},
    'gCopdCF_ASP_SPEC_B_PNEUM': {'descr': 'Chronic Obstructive Pulmonary Disease*Aspiration and Specified Bacterial Pneumonias', 'type': 'disease_interaction'},
    'ASP_SPEC_B_PNEUM_PRES_ULC': {'descr': 'Aspiration and Specified Bacterial Pneumonias*Pressure Ulcer', 'type': 'disease_interaction'},
    'SEPSIS_ASP_SPEC_BACT_PNEUM': {'descr': 'Sepsis*Aspiration and Specified Bacterial Pneumonias', 'type': 'disease_interaction'},
    'SCHIZOPHRENIA_gCopdCF': {'descr': 'Schizophrenia*Chronic Obstructive Pulmonary Disease', 'type': 'disease_interaction'},
    'SCHIZOPHRENIA_CHF': {'descr': 'Schizophrenia*Congestive Heart Failure', 'type': 'disease_interaction'},
    'SCHIZOPHRENIA_SEIZURES': {'descr': 'Schizophrenia*Seizure Disorders and Convulsions', 'type': 'disease_interaction'},
    'DISABLED_HCC85': {'descr': 'Disabled, Congestive Heart Failure', 'type': 'disease_interaction'},
    'DISABLED_PRESSURE_ULCER': {'descr': 'Disabled, Pressure Ulcer', 'type': 'disease_interaction'},
    'DISABLED_HCC161': {'descr': 'Disabled, Chronic Ulcer of the Skin, Except Pressure Ulcer', 'type': 'disease_interaction'},
    'DISABLED_HCC39': {'descr': 'Disabled, Bone/Joint Muscle Infections/Necrosis', 'type': 'disease_interaction'},
    'DISABLED_HCC77': {'descr': 'Disabled, Multiple Sclerosis', 'type': 'disease_interaction'},
    'DISABLED_HCC6': {'descr': 'Disabled, Opportunistic Infections', 'type': 'disease_interaction'},
}

cat_dict = {}
demo_category_dict = {}
demo_pttrn = r'[MF]\d{1,2}_?|[MF]6[5-9]|OriginallyDisabled_[MF]|LTIMCAID|ORIGDS'

# Helper function
def replace_chars(input_string, chars_to_replace, replacement):
    for char in chars_to_replace:
        input_string = input_string.replace(char, replacement)
    return input_string
    
# Get demo categories
for category in demo_category_list:
    x = re.search(demo_pttrn, category)
    # print(type(x.group(0)))
    # print(x.group(0))
    # print(x.group(0)[-1])
    first_age = replace_chars(x.group(0), ['F', 'M', '_'], '')
    last_age = category[category.find('_')+1:]
    if category.startswith('F'):
        gender = 'Female'
    else:
        gender = 'Male'
    if last_age == 'GT':
        demo_category_dict[category] = {'descr': f'{gender}, {first_age} Years old or Over', 'type': 'demographic'}
    else:
        demo_category_dict[category] = {'descr': f'{gender}, {first_age} to {last_age} Years old', 'type': 'demographic'}

for category in nedemo_category_list:
    x = re.search(demo_pttrn, category)
    # print(type(x.group(0)))
    # print(x.group(0))
    # print(x.group(0)[-1])
    first_age = replace_chars(x.group(0), ['F', 'M', '_'], '')
    last_age = category[category.find('_')+1:]
    if category.startswith('NEF'):
        gender = 'Female'
    else:
        gender = 'Male'
    if last_age == 'GT':
        demo_category_dict[category] = {'descr': f'New Enrollee, {gender}, {first_age} Years old or Over', 'type': 'demographic'}
    else:
        if 'NE' in last_age:
            demo_category_dict[category] = {'descr': f'New Enrollee, {gender}, {first_age} Years old', 'type': 'demographic'}
        else:
            demo_category_dict[category] = {'descr': f'New Enrollee, {gender}, {first_age} to {last_age} Years old', 'type': 'demographic'}
        

# update cat dict
cat_dict.update(demo_category_dict)
cat_dict.update(demo_interaction_dict)
cat_dict.update(disease_interaction_dict)
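
The two loops above differ only in the "New Enrollee" prefix and the single-age categories, so the logic could be folded into one function. This is a sketch, not the notebook's code: `describe_demo` and `DEMO_RE` are hypothetical names, and the regular expression is assumed to cover exactly the category names in the two lists above.

```python
import re

# Optional 'NE' prefix, sex, first age, then an optional '_<last age>' or '_GT'
DEMO_RE = re.compile(r'^(NE)?([FM])(\d+)(?:_(\d+|GT))?$')


def describe_demo(category):
    """Build the definition entry for one demographic category name."""
    match = DEMO_RE.match(category)
    if match is None:
        raise ValueError(f'Unrecognized demographic category: {category}')
    new_enrollee, sex, first_age, last_age = match.groups()
    prefix = 'New Enrollee, ' if new_enrollee else ''
    gender = 'Female' if sex == 'F' else 'Male'
    if last_age == 'GT':
        age = f'{first_age} Years old or Over'
    elif last_age is None:
        age = f'{first_age} Years old'
    else:
        age = f'{first_age} to {last_age} Years old'
    return {'descr': f'{prefix}{gender}, {age}', 'type': 'demographic'}


examples = ['F0_34', 'M95_GT', 'NEF65', 'NEM70_74']
demo_examples = {name: describe_demo(name) for name in examples}
print(demo_examples['NEF65']['descr'])  # New Enrollee, Female, 65 Years old
```

Unlike the loops above, an unexpected name raises an error instead of silently producing an odd description, which may or may not be the behavior you want.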



In [9]:
# Disease categories

# This assumes the file is in the current directory, change to the appropriate location
file_path = Path.cwd() / 'cms_files'
# Will need to change the below as appropriate
input_file_name = 'V24H86L1.txt'
output_file_name = 'v24_category_definition.yaml'

with open(file_path / input_file_name, mode="r") as file:
    lines = file.readlines()
hcc_cat_dict = {}

for line in lines:
    cat_code = re.search(r'HCC\d+', line)
    if cat_code:
        s = re.split(r'HCC\d+', line)
        descr = s[1].replace('=', '').replace('"', "").strip()
        number = cat_code[0].split('HCC')[1]
        hcc_cat_dict[cat_code[0]] = {
            'descr': descr, 
            'number': int(number),
            'type': 'disease'
        }

cat_dict.update(hcc_cat_dict)



cat_dict

{'F0_34': {'descr': 'Female, 0 to 34 Years old', 'type': 'demographic'},
 'F35_44': {'descr': 'Female, 35 to 44 Years old', 'type': 'demographic'},
 'F45_54': {'descr': 'Female, 45 to 54 Years old', 'type': 'demographic'},
 'F55_59': {'descr': 'Female, 55 to 59 Years old', 'type': 'demographic'},
 'F60_64': {'descr': 'Female, 60 to 64 Years old', 'type': 'demographic'},
 'F65_69': {'descr': 'Female, 65 to 69 Years old', 'type': 'demographic'},
 'F70_74': {'descr': 'Female, 70 to 74 Years old', 'type': 'demographic'},
 'F75_79': {'descr': 'Female, 75 to 79 Years old', 'type': 'demographic'},
 'F80_84': {'descr': 'Female, 80 to 84 Years old', 'type': 'demographic'},
 'F85_89': {'descr': 'Female, 85 to 89 Years old', 'type': 'demographic'},
 'F90_94': {'descr': 'Female, 90 to 94 Years old', 'type': 'demographic'},
 'F95_GT': {'descr': 'Female, 95 Years old or Over', 'type': 'demographic'},
 'M0_34': {'descr': 'Male, 0 to 34 Years old', 'type': 'demographic'},
 'M35_44': {'descr': 'Male, 3

In [12]:
import json
with open(file_path / "v24_category.json", "w") as json_file:
    json.dump(cat_dict, json_file)

## V28 Category File

In [2]:
import re
import yaml
from pathlib import Path

# Define category lists
demo_category_list = [
        'F0_34', 'F35_44', 'F45_54', 'F55_59', 'F60_64',
        'F65_69', 'F70_74', 'F75_79', 'F80_84', 'F85_89', 
        'F90_94', 'F95_GT', 'M0_34', 'M35_44', 'M45_54', 'M55_59', 'M60_64',
        'M65_69', 'M70_74', 'M75_79', 'M80_84', 'M85_89', 
        'M90_94', 'M95_GT'
    ]
nedemo_category_list = [
    'NEF0_34', 'NEF35_44', 'NEF45_54', 'NEF55_59', 'NEF60_64',
        'NEF65', 'NEF66', 'NEF67', 'NEF68', 'NEF69', 'NEF70_74', 
        'NEF75_79', 'NEF80_84', 'NEF85_89', 'NEF90_94', 'NEF95_GT',
        'NEM0_34', 'NEM35_44', 'NEM45_54', 'NEM55_59', 'NEM60_64',
        'NEM65', 'NEM66', 'NEM67', 'NEM68', 'NEM69', 'NEM70_74', 
        'NEM75_79', 'NEM80_84', 'NEM85_89', 'NEM90_94', 'NEM95_GT'
]
demo_interaction_dict = {
    'LTIMCAID': {'descr': 'Medicaid', 'type': 'demographic_interaction'},
    'OriginallyDisabled_Female': {'descr': 'Originally Disabled, Female', 'type': 'demographic_interaction'},
    'OriginallyDisabled_Male': {'descr': 'Originally Disabled, Male', 'type': 'demographic_interaction'}
}
disease_interaction_dict = {
    'DIABETES_HF_V28': {'descr': 'Diabetes*Heart Failure', 'type': 'disease_interaction'},
    'HF_CHR_LUNG_V28': {'descr': 'Heart Failure*Chronic Lung Disorder', 'type': 'disease_interaction'},
    'HF_KIDNEY_V28': {'descr': 'Heart Failure*Kidney', 'type': 'disease_interaction'},
    'CHR_LUNG_CARD_RESP_FAIL_V28': {'descr': 'Chronic Lung Disorder*Cardiorespiratory Failure', 'type': 'disease_interaction'},
    'HF_HCC238_V28': {'descr': 'Heart Failure*Specified Heart Arrhythmias', 'type': 'disease_interaction'},
    'gSubUseDisorder_gPsych_V28': {'descr': 'Substance Use Disorder*Psychiatric', 'type': 'disease_interaction'},
    'DISABLED_CANCER_V28': {'descr': 'Disabled, Cancer', 'type': 'disease_interaction'},
    'DISABLED_NEURO_V28': {'descr': 'Disabled, Neurological', 'type': 'disease_interaction'},
    'DISABLED_HF_V28': {'descr': 'Disabled, Heart Failure', 'type': 'disease_interaction'},
    'DISABLED_CHR_LUNG_V28': {'descr': 'Disabled, Chronic Lung Disorder', 'type': 'disease_interaction'},
    'DISABLED_ULCER_V28': {'descr': 'Disabled, Skin Ulcer', 'type': 'disease_interaction'},
    'D1': {'descr': '1 payment HCC', 'type': 'disease_count'},
    'D2': {'descr': '2 payment HCCs', 'type': 'disease_count'},
    'D3': {'descr': '3 payment HCCs', 'type': 'disease_count'},
    'D4': {'descr': '4 payment HCCs', 'type': 'disease_count'},
    'D5': {'descr': '5 payment HCCs', 'type': 'disease_count'},
    'D6': {'descr': '6 payment HCCs', 'type': 'disease_count'},
    'D7': {'descr': '7 payment HCCs', 'type': 'disease_count'},
    'D8': {'descr': '8 payment HCCs', 'type': 'disease_count'},
    'D9': {'descr': '9 payment HCCs', 'type': 'disease_count'},
    'D10P': {'descr': '10 or more payment HCCs', 'type': 'disease_count'},
}

cat_dict = {}
demo_category_dict = {}
demo_pttrn = r'[MF]\d{1,2}_?|[MF]6[5-9]|OriginallyDisabled_[MF]|LTIMCAID|ORIGDS'

# Helper function
def replace_chars(input_string, chars_to_replace, replacement):
    for char in chars_to_replace:
        input_string = input_string.replace(char, replacement)
    return input_string
    
# Get demo categories
for category in demo_category_list:
    x = re.search(demo_pttrn, category)
    # print(type(x.group(0)))
    # print(x.group(0))
    # print(x.group(0)[-1])
    first_age = replace_chars(x.group(0), ['F', 'M', '_'], '')
    last_age = category[category.find('_')+1:]
    if category.startswith('F'):
        gender = 'Female'
    else:
        gender = 'Male'
    if last_age == 'GT':
        demo_category_dict[category] = {'descr': f'{gender}, {first_age} Years old or Over', 'type': 'demographic'}
    else:
        demo_category_dict[category] = {'descr': f'{gender}, {first_age} to {last_age} Years old', 'type': 'demographic'}

for category in nedemo_category_list:
    x = re.search(demo_pttrn, category)
    # print(type(x.group(0)))
    # print(x.group(0))
    # print(x.group(0)[-1])
    first_age = replace_chars(x.group(0), ['F', 'M', '_'], '')
    last_age = category[category.find('_')+1:]
    if category.startswith('NEF'):
        gender = 'Female'
    else:
        gender = 'Male'
    if last_age == 'GT':
        demo_category_dict[category] = {'descr': f'New Enrollee, {gender}, {first_age} Years old or Over', 'type': 'demographic'}
    else:
        if 'NE' in last_age:
            demo_category_dict[category] = {'descr': f'New Enrollee, {gender}, {first_age} Years old', 'type': 'demographic'}
        else:
            demo_category_dict[category] = {'descr': f'New Enrollee, {gender}, {first_age} to {last_age} Years old', 'type': 'demographic'}
        

# update cat dict
cat_dict.update(demo_category_dict)
cat_dict.update(demo_interaction_dict)
cat_dict.update(disease_interaction_dict)


In [3]:
# Disease categories

# This assumes the file is in the current directory, change to the appropriate location
file_path = Path.cwd() / 'cms_files'
# Will need to change the below as appropriate
input_file_name = 'V28115L3.txt'

with open(file_path / input_file_name, mode="r") as file:
    lines = file.readlines()
hcc_cat_dict = {}

for line in lines:
    cat_code = re.search(r'HCC\d+', line)
    if cat_code:
        s = re.split(r'HCC\d+', line)
        descr = s[1].replace('=', '').replace('"', "").strip()
        number = cat_code[0].split('HCC')[1]
        hcc_cat_dict[cat_code[0]] = {
            'descr': descr, 
            'number': int(number),
            'type': 'disease'
        }

cat_dict.update(hcc_cat_dict)

In [4]:
cat_dict

{'F0_34': {'descr': 'Female, 0 to 34 Years old', 'type': 'demographic'},
 'F35_44': {'descr': 'Female, 35 to 44 Years old', 'type': 'demographic'},
 'F45_54': {'descr': 'Female, 45 to 54 Years old', 'type': 'demographic'},
 'F55_59': {'descr': 'Female, 55 to 59 Years old', 'type': 'demographic'},
 'F60_64': {'descr': 'Female, 60 to 64 Years old', 'type': 'demographic'},
 'F65_69': {'descr': 'Female, 65 to 69 Years old', 'type': 'demographic'},
 'F70_74': {'descr': 'Female, 70 to 74 Years old', 'type': 'demographic'},
 'F75_79': {'descr': 'Female, 75 to 79 Years old', 'type': 'demographic'},
 'F80_84': {'descr': 'Female, 80 to 84 Years old', 'type': 'demographic'},
 'F85_89': {'descr': 'Female, 85 to 89 Years old', 'type': 'demographic'},
 'F90_94': {'descr': 'Female, 90 to 94 Years old', 'type': 'demographic'},
 'F95_GT': {'descr': 'Female, 95 Years old or Over', 'type': 'demographic'},
 'M0_34': {'descr': 'Male, 0 to 34 Years old', 'type': 'demographic'},
 'M35_44': {'descr': 'Male, 3

In [5]:
import json
with open(file_path / "v28_category.json", "w") as json_file:
    json.dump(cat_dict, json_file)

## Hierarchy Definition Files

CMS provides a SAS file that contains the hierarchy definitions: the name of each hierarchy and the categories that are dropped by the "higher" category.

The code below reads in that file and parses it into a dictionary, which is then written out as JSON. It assumes the file contents look like the below.
```sas
%MACRO V24H86H1; 
 %**********************************************************************
 1  MACRO NAME: V24H86H1
 2  PURPOSE:    HCC HIERARCHIES: version 24 of HCCs,
                only 86 CMS HCCs are included
 3  COMMENT:    it is assumed that:
                 -MAX number of CCs are placed into global macro 
                  variable N_CC in the main program
                 -the following arrays are set in the main program
                  ARRAY C(&N_CC)   CC1-CC&N_CC
                  ARRAY HCC(&N_CC) HCC1-HCC&N_CC
                 -format ICD to CC creates only 86 out of &N_CC CMS CCs
 **********************************************************************;
 %* set to 0 HCCs in HIER parameter;
 %MACRO SET0( CC=, HIER= );
     IF HCC&CC=1 THEN DO I = &HIER; HCC(I) = 0; END;
 %MEND SET0;

 %*to copy CC into HCC;
  DO K=1 TO &N_CC;
     HCC(K)=C(K);
  END;

 %*imposing hierarchies;
 /*Neoplasm 1 */   %SET0(CC=8     , HIER=%STR(9, 10, 11, 12 ));
 /*Neoplasm 2 */   %SET0(CC=9     , HIER=%STR(10, 11, 12 ));
 /*Neoplasm 3 */   %SET0(CC=10    , HIER=%STR(11, 12 ));
 /*Neoplasm 4 */   %SET0(CC=11    , HIER=%STR(12 ));
 /*Diabetes 1 */   %SET0(CC=17    , HIER=%STR(18, 19 ));
 
```
Note: For V28, some hierarchy definitions span multiple lines, as shown below. These had to be manually edited onto one line
for the code to work.
```sas
 /*Spinal 2 */     %SET0(CC=181   , HIER=%STR(182, 254 ));
 /*Neuro 2 */      %SET0(CC=191   , HIER=%STR(
180, 181, 182, 192, 253, 254 ));
 /*Neuro 3 */      %SET0(CC=192   , HIER=%STR(180, 181, 182, 253, 254
));
 /*Neuro 6 */      %SET0(CC=195   , HIER=%STR(196 ));
```
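
As an alternative to editing the file by hand, the continuation lines could be merged in code before parsing. The sketch below is one way to do that, under stated assumptions: `join_set0_calls` is a hypothetical helper, a hierarchy statement is assumed to start on a line containing `%SET0(`, and the leading comment is assumed not to contain parentheses of its own. Lines are accumulated until the parentheses balance.

```python
import re


def join_set0_calls(lines):
    """Merge physical lines so that each %SET0(...) call sits on a single line."""
    joined = []
    buffer = None  # holds a %SET0 call whose parentheses are still open
    for raw in lines:
        text = raw.rstrip("\r\n")
        if buffer is not None:
            # Continuation of an unfinished call
            buffer = f"{buffer} {text.strip()}"
        elif "%SET0(" in text:
            buffer = text
        else:
            joined.append(text)
            continue
        # The call is complete once every '(' has a matching ')'
        if buffer.count("(") <= buffer.count(")"):
            joined.append(buffer)
            buffer = None
    if buffer is not None:
        joined.append(buffer)  # keep an unterminated call rather than drop it
    return joined


# The V28 lines shown above
v28_lines = [
    " /*Spinal 2 */     %SET0(CC=181   , HIER=%STR(182, 254 ));\n",
    " /*Neuro 2 */      %SET0(CC=191   , HIER=%STR(\n",
    "180, 181, 182, 192, 253, 254 ));\n",
    " /*Neuro 3 */      %SET0(CC=192   , HIER=%STR(180, 181, 182, 253, 254\n",
    "));\n",
    " /*Neuro 6 */      %SET0(CC=195   , HIER=%STR(196 ));\n",
]
merged = join_set0_calls(v28_lines)
print(len(merged))  # 4
print(re.findall(r"\d+", merged[1].split("HIER=")[1]))
# ['180', '181', '182', '192', '253', '254']
```

The merged lines can then be fed to the same per-line parsing used for the single-line format.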


In [1]:
import re
import json
from pathlib import Path

hier_dict = {}
# This assumes the file is in the current directory, change to the appropriate location
file_path = Path.cwd() / 'cms_files'
# Will need to change the below as appropriate
input_file_name = 'V28115H1.TXT'
output_file_name = 'v28_hierarchy_definition.yaml'

with open(file_path / input_file_name, mode="r") as file:
    h_lines = file.readlines()
for line in h_lines:
    cc = re.search(r'CC=\d+', line)
    hier = re.search(r'HIER=%STR', line)
    if cc and hier:
        # Get Category Number
        number = cc[0].split('=')[1]
        # Get the Drop Numbers
        drop_string = line.split('HIER=')[1]
        s = re.search(r'\((.*?)\)', drop_string)
        drop_list_s = re.findall(r'\d+', s[0])
        # int() normalizes the number without evaluating the text as code, which eval() would do
        drop_numbers = ['HCC' + str(int(i)) for i in drop_list_s]
        # Get the hierarchy name
        hier_descr = re.search(r'\*(.*?)\*', line)[0].replace('*', '').strip()
        # Populate the dict
        hier_dict['HCC'+number] = {'descr': hier_descr, 'remove_code': drop_numbers}


import json
with open(file_path / 'v28_hierarchy.json', "w") as json_file:
    json.dump(hier_dict, json_file)
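
To illustrate how the resulting structure mirrors the `SET0` macro, the sketch below applies a hierarchy dictionary to a set of HCCs. `apply_hierarchy` is a hypothetical function written for this example, not a confirmed part of the library's API. The sample entries follow the V24 lines shown earlier. As in the SAS code, rules are processed in file order, and a rule only fires if its category is still present.

```python
def apply_hierarchy(hccs, hier_dict):
    """Drop every HCC that is superseded by a higher category in the same hierarchy."""
    kept = set(hccs)
    for code, rule in hier_dict.items():
        # Mirrors "IF HCC&CC=1 THEN ... HCC(I) = 0" in the SET0 macro
        if code in kept:
            kept.difference_update(rule['remove_code'])
    return kept


# Entries in the shape produced by the parsing code above (from the V24 sample lines)
hier_dict = {
    'HCC8': {'descr': 'Neoplasm 1', 'remove_code': ['HCC9', 'HCC10', 'HCC11', 'HCC12']},
    'HCC9': {'descr': 'Neoplasm 2', 'remove_code': ['HCC10', 'HCC11', 'HCC12']},
    'HCC17': {'descr': 'Diabetes 1', 'remove_code': ['HCC18', 'HCC19']},
}
result = apply_hierarchy({'HCC8', 'HCC10', 'HCC18'}, hier_dict)
print(sorted(result))  # ['HCC18', 'HCC8']
```

Here `HCC10` is dropped because `HCC8` is present, while `HCC18` is kept because `HCC17` is not.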