This project explores and analyzes the Building and Construction Industry Security of Payment (SOP) adjudication data to understand and predict dispute outcomes and adjudication amounts. The code and data files serve the following purposes:
- Preprocessing and Wrangling: Data from `adjudication.csv`, `australian_postcodes.csv`, and `post_code_data.csv` is cleaned and transformed for analysis.
- K-Nearest Neighbors (KNN) Prediction: Uses KNN to predict the determination status of a claim from the datasets above.
- Linear Regression Analysis: Applies linear regression to forecast the adjudication amount based on the claimed amount in `adjudication.csv`.
- Visualization: Generates various visual representations to provide insights into the results.
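
The KNN step can be illustrated with a minimal sketch. It does not reproduce `3_knn.py`: the data is synthetic, the two features (`claimed` and `ses`) and the binary label are hypothetical stand-ins for whatever columns the project actually uses, and the feature scaling and `n_neighbors=5` are assumptions made only for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 200

# Hypothetical features: a claimed amount and a socioeconomic score.
claimed = rng.uniform(1_000, 100_000, n_samples)
ses = rng.uniform(800, 1_200, n_samples)
X = np.column_stack([claimed, ses])

# Synthetic binary label standing in for the determination status.
y = (claimed > 50_000).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# KNN is distance-based, so the features are standardized first.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Scaling matters here because the two features sit on very different ranges; without it, the claimed amount would dominate the distance calculation.
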
Data sources:

- `adjudication.csv`: Contains data related to claims made to the VBA based on the SOP Act. Source
- `australian_postcodes.csv`: Includes information on Australian postcodes. Source
- `post_code_data.csv`: Encompasses socioeconomic status data of Australian postcodes. Source
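
As a rough illustration of how the three files might be combined during wrangling, the sketch below joins small in-memory stand-ins on a shared postcode column. All column names (`postcode`, `claimed_amount`, `locality`, `ses_score`) and all values are hypothetical; the real files may be structured differently.

```python
import pandas as pd

# Tiny stand-ins for the three CSV files; every column name and value
# here is hypothetical and chosen only for illustration.
adjudication = pd.DataFrame({
    "postcode": ["3000", "3121", "9999"],
    "claimed_amount": ["$12,500.00", "8000", "1,200"],
})
postcodes = pd.DataFrame({
    "postcode": ["3000", "3121"],
    "locality": ["MELBOURNE", "RICHMOND"],
})
socioeconomic = pd.DataFrame({
    "postcode": ["3000", "3121"],
    "ses_score": [1020, 1065],
})

# Strip currency symbols and thousands separators, then convert to float.
adjudication["claimed_amount"] = (
    adjudication["claimed_amount"]
    .str.replace(r"[^0-9.]", "", regex=True)
    .astype(float)
)

# Left joins keep every claim, even when its postcode has no match.
merged = (
    adjudication
    .merge(postcodes, on="postcode", how="left")
    .merge(socioeconomic, on="postcode", how="left")
)
print(merged)
```

Claims whose postcode is missing from the support datasets end up with empty values in the joined columns, which a later step would need to drop or impute.
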
Scripts:

- `1a_data_wrangling.py`: Performs preprocessing and wrangling of the adjudication dataset.
- `1b_normal_distribution_claimedamount.py`: Produces a normal distribution plot of claimed amounts.
- `2_corr_matrix.py`: Generates a correlation matrix based on features of the adjudication dataset.
- `3_knn.py`: Implements the KNN supervised learning model for prediction.
- `4_lin_regression.py`: Implements a linear regression supervised learning model for forecasting.
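
The linear regression step, forecasting the adjudication amount from the claimed amount, can be sketched as follows. This is not the code in `4_lin_regression.py`: the data is synthetic, and the 0.8 slope and noise level used to generate it are arbitrary assumptions, not findings from the adjudication dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Synthetic claimed amounts (hypothetical values, not real data).
claimed = rng.uniform(1_000, 100_000, size=100)

# Synthetic adjudicated amounts: an arbitrary linear relation plus noise.
adjudicated = 0.8 * claimed + rng.normal(0, 2_000, size=100)

# scikit-learn expects a 2D feature matrix, hence the reshape.
X = claimed.reshape(-1, 1)

model = LinearRegression()
model.fit(X, adjudicated)

slope = model.coef_[0]
intercept = model.intercept_
r_squared = model.score(X, adjudicated)

# Forecast the adjudicated amount for a hypothetical claim of 50,000.
forecast = model.predict(np.array([[50_000]]))[0]

print(f"slope={slope:.3f}, intercept={intercept:.1f}, R^2={r_squared:.3f}")
print(f"Forecast for a 50,000 claim: {forecast:.0f}")
```

On real data, the fitted slope and R² would indicate how closely adjudicated amounts track claimed amounts.
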
Datasets:

- `adjudication.csv`: Primary dataset for analysis.
- `australian_postcodes.csv` and `post_code_data.csv`: Support datasets for in-depth analysis.
Run each script from the command line with Python, replacing "[document name]" with the script file names in the order listed above:
python [document name]
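
To avoid typing the command once per script, a small runner along these lines could execute them in order. It is only a sketch: `run_all` and its `dry_run` flag are hypothetical helpers that are not part of the repository, and the script names are taken from the list in this README.

```python
import subprocess
import sys

# Script names in the order given in this README.
SCRIPTS = [
    "1a_data_wrangling.py",
    "1b_normal_distribution_claimedamount.py",
    "2_corr_matrix.py",
    "3_knn.py",
    "4_lin_regression.py",
]


def run_all(scripts, dry_run=True):
    """Build and optionally run one `python <script>` command per script.

    With dry_run=True the commands are only printed. With dry_run=False
    each script is executed and the first failure raises an error.
    """
    commands = [[sys.executable, name] for name in scripts]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
    return commands


# Dry run: prints the commands without executing anything.
commands = run_all(SCRIPTS)
```

Calling `run_all(SCRIPTS, dry_run=False)` from the project directory would execute the scripts one after another, using the same interpreter that runs the runner.
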
Requirements:

- Python Version: 3.x
- Libraries Used:
  - pandas
  - NumPy
  - scikit-learn (`sklearn`)
  - matplotlib
  - seaborn
  - re (Python standard library)
To do:

- Add a table.
- Add a tool where you can enter empirical data and it returns yes or no, and why.
- Add the video.