This project provides the code for the data interpolation, data imbalance correction and principal component regression methods used in the paper "Dealing with the missing, imbalanced and sparse features problems in emergency medicine data: a machine learning approach".
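As a rough illustration of the first two steps, the sketch below fills missing values and then balances the classes by random under-sampling. It is not the paper's code. It assumes that "interpolation" here means filling missing values column by column, it substitutes scikit-learn's `SimpleImputer` (median strategy) for whatever interpolation scheme the paper actually uses, and it assumes every class is randomly under-sampled down to the size of the smallest class. The function names `impute_median` and `random_undersample` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def impute_median(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for the paper's interpolation step:
    # replace each NaN with the median of its column.
    imputer = SimpleImputer(strategy="median")
    filled = imputer.fit_transform(df)
    return pd.DataFrame(filled, columns=df.columns, index=df.index)


def random_undersample(X: np.ndarray, y: np.ndarray, seed: int = 0):
    # Keep a random subset of each class, sized to the smallest class,
    # sampled without replacement.
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]


# Toy example (illustrative data, not from the paper)
df = pd.DataFrame({
    "age": [70, np.nan, 55, 62, np.nan, 81],
    "hr": [88, 102, np.nan, 75, 90, 110],
})
y = np.array([0, 0, 0, 0, 1, 1])

filled = impute_median(df)
X_bal, y_bal = random_undersample(filled.to_numpy(), y)
print(np.bincount(y_bal))  # [2 2]
```

Random under-sampling discards majority-class rows, so it is usually applied only to the training split, after imputation, to avoid leaking information into evaluation data.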
The programs were developed on Windows 10 (Intel(R) Core(TM) i5-9500 CPU, 3 GHz). All data preprocessing and model building were done in Python 3.8 (Anaconda distribution) using several Python data science libraries, including NumPy, pandas, Matplotlib and scikit-learn (sklearn). See the supplementary materials for the code covering data interpolation, under-sampling of imbalanced data, principal component regression and model training.
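Principal component regression (PCR) regresses the target on a small number of principal components instead of on the raw, possibly sparse features. The sketch below shows one common way to build it with scikit-learn; it is an assumption, not the paper's implementation. It assumes features are standardized before PCA and that ordinary least squares is fitted on the retained components. `build_pcr` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_pcr(n_components: int):
    # Standardize -> project onto the leading principal components -> OLS
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        LinearRegression(),
    )


# Synthetic data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=200)

pcr = build_pcr(n_components=5)
pcr.fit(X, y)
# Fraction of feature variance kept by the 5 components
print(pcr.named_steps["pca"].explained_variance_ratio_.sum())
# R^2 on the training data
print(pcr.score(X, y))
```

Keeping all components makes PCR equivalent to ordinary least squares; the dimensionality reduction comes from keeping fewer, and their number could be chosen with cross-validation (for example `sklearn.model_selection.cross_val_score`).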