# Team Meeting Update

## Objective
Standardize the historical SIC and NAICS codes in the dataset to the 2022 NAICS standard so that industry codes are consistent and complete across records.

---

## Approach

1. **SIC-to-NAICS Conversion:**
   - Where `NAICS_CODE` is missing but `SIC_CODE` is available, map `SIC_CODE` to `NAICS_CODE` using the 1987 SIC-to-2002 NAICS concordance.

2. **NAICS Code Harmonization:**
   - Apply the NAICS concordance tables in sequence (e.g., 2002→2007, then 2007→2012) until each older NAICS code has been carried forward to its 2022 equivalent.

3. **Final Mapping Application:**
   - Replace all NAICS codes in the dataset with their harmonized 2022 equivalents.
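
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the helper names (`fill_naics_from_sic`, `compose_concordances`, `harmonize`) are hypothetical, the codes are made-up placeholders rather than real SIC or NAICS codes, and each concordance is assumed to be a one-to-one dict. Real concordance tables contain splits and merges that need an explicit rule.

```python
import pandas as pd


def fill_naics_from_sic(df, sic_to_naics):
    """Step 1: fill missing NAICS_CODE values from SIC_CODE using a lookup dict."""
    out = df.copy()
    missing = out["NAICS_CODE"].isna() & out["SIC_CODE"].notna()
    out.loc[missing, "NAICS_CODE"] = out.loc[missing, "SIC_CODE"].map(sic_to_naics)
    return out


def compose_concordances(steps):
    """Step 2: chain vintage-to-vintage dicts into one old-to-latest dict.

    Assumption: a code that is absent from a later table is unchanged.
    """
    composed = {}
    for i, step in enumerate(steps):
        for code in step:
            target = code
            for later in steps[i:]:
                target = later.get(target, target)
            # Keep the earliest vintage's interpretation if a code repeats.
            composed.setdefault(code, target)
    return composed


def harmonize(codes, latest):
    """Step 3: replace codes with their latest equivalent, keeping the rest."""
    return codes.map(latest).fillna(codes)


# Placeholder codes only: "S1" stands in for a SIC code, "A"/"B"/"C" for
# the same industry in three successive NAICS vintages.
sic_to_naics = {"S1": "A"}
steps = [{"A": "B"}, {"B": "C"}]

df = pd.DataFrame(
    {"SIC_CODE": ["S1", None, "S9"], "NAICS_CODE": [None, "B", None]}
)
filled = fill_naics_from_sic(df, sic_to_naics)
latest = compose_concordances(steps)
filled["NAICS_CODE"] = harmonize(filled["NAICS_CODE"], latest)
print(filled)
```

Composing the tables once and applying a single lookup avoids re-mapping the whole dataset for every vintage. The third row stays empty because its placeholder SIC code has no entry in the lookup.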

---

## Outcome
The dataset now has harmonized NAICS codes aligned to the 2022 standard, ensuring consistency across historical records. Unmapped codes, if any, are flagged for review.
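
One way the flagging could look, as a sketch under assumptions: the `NEEDS_REVIEW` column, the `flag_unmapped` helper, and the set of valid codes are all hypothetical, and the codes are placeholders rather than real NAICS codes.

```python
import pandas as pd


def flag_unmapped(df, valid_codes, code_col="NAICS_CODE"):
    """Mark rows whose code is missing or not in the set of valid codes."""
    out = df.copy()
    # isin() is False for missing values, so they are flagged as well.
    out["NEEDS_REVIEW"] = ~out[code_col].isin(valid_codes)
    return out


# Placeholder codes: "C" is treated as a valid 2022 code, "Z" is not.
records = pd.DataFrame({"NAICS_CODE": ["C", "Z", None]})
flagged = flag_unmapped(records, valid_codes={"C"})
print(flagged[flagged["NEEDS_REVIEW"]])
```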


In [1]:
import pandas as pd
import numpy as np
import os

from config_management import UnifiedConfiguration
from raw_processing import (
    cehd_cleaning, 
    usis_cleaning,
    osha_cleaning,
    cehd_processing,
    usis_processing,
    osha_processing
)
import data_management
import plot

config = UnifiedConfiguration()

In [2]:
usis_cleaner = usis_cleaning.UsisCleaner(config.usis, config.path, config.comptox)
cehd_cleaner = cehd_cleaning.CehdCleaner(config.cehd, config.path, config.comptox)

In [3]:
usis_data = usis_cleaner.clean_exposure_data()
cehd_data = cehd_cleaner.clean_exposure_data()

In [9]:
usis_data.reset_index().to_feather(config.path['test_usis_file'])
cehd_data.reset_index().to_feather(config.path['test_cehd_file'])

In [3]:
usis_data = pd.read_feather(config.path['test_usis_file']).set_index('index')
cehd_data = pd.read_feather(config.path['test_cehd_file']).set_index('index')

In [4]:
# Option 1: build the combined targets from the raw files.
combined_targets = osha_processing.combined_targets_from_raw(
    config.usis,
    config.cehd,
    config.path,
    comptox_settings=config.comptox,
    write_dir=None  # config.path['target_dir']
)

In [10]:
# Option 2: build the combined targets from the already-cleaned data.
combined_targets = osha_processing.combined_targets_from_data(
    usis_data,
    cehd_data,
    config.usis,
    config.cehd,
    write_dir=None  # config.path['target_dir']
)

In [13]:
combined_test_targets = data_management.read_targets(config.path['target_dir'])

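for k, y in combined_targets.items():
    print(k)
    y_test = combined_test_targets[k]
    # assert_series_equal returns None and raises AssertionError on a mismatch,
    # so there is nothing useful to print.
    pd.testing.assert_series_equal(y, y_test, check_index=False)
    break  # only the first target is checked

sector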
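
The check above stops after the first target. A possible extension, sketched here with a hypothetical `compare_targets` helper and toy series in place of the real targets, compares every key and collects the failures instead of stopping at the first one. It assumes both collections behave like dicts of `pd.Series`.

```python
import pandas as pd


def compare_targets(actual, expected):
    """Return {key: message} for every series that is missing or differs."""
    mismatches = {}
    for key, series in actual.items():
        if key not in expected:
            mismatches[key] = "missing from expected targets"
            continue
        try:
            pd.testing.assert_series_equal(
                series, expected[key], check_index=False
            )
        except AssertionError as err:
            mismatches[key] = str(err)
    return mismatches


# Toy data: "a" matches, "b" differs in one value, "c" has no counterpart.
actual = {
    "a": pd.Series([1, 2]),
    "b": pd.Series([1, 2]),
    "c": pd.Series([0]),
}
expected = {"a": pd.Series([1, 2]), "b": pd.Series([1, 3])}

mismatches = compare_targets(actual, expected)
print(sorted(mismatches))
```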
