CZ4041---ML-SSL

CZ4041 Machine Learning -- Semi-Supervised Learning (Research-Based Project)

Version Information

Python 3.6.2

Install Python requirements

pip install -r requirements.txt

Description of files in root folder

dual-learning-ssl.py: Python source code implementing dual-learning-based safe semi-supervised learning (SSL). It trains on the nursery-ssl10 dataset in the data/ folder and writes its output .csv files to the result folder.

Description of files in data folder

1. `data\nursery-ssl10-10-1tra.csv`

This CSV file is the training dataset: 10% of its instances are labelled and the remaining 90% are unlabelled. It is used to train the initial SSL model, which then labels the 90% of unlabelled instances. The SSL model is then retrained on the dataset it has just labelled.
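The train, label, retrain cycle described above can be sketched as follows. This is a minimal illustration and not the code in dual-learning-ssl.py: the nearest-centroid classifier, the helper names `fit_centroids`, `predict` and `self_train`, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict

def fit_centroids(X, y):
    """Return {label: centroid} for numeric feature vectors (hypothetical helper)."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(centroids, row):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))

def self_train(X_labelled, y_labelled, X_unlabelled):
    # Step 1: train the initial model on the labelled portion only.
    model = fit_centroids(X_labelled, y_labelled)
    # Step 2: let the model label the unlabelled portion.
    pseudo = [predict(model, row) for row in X_unlabelled]
    # Step 3: retrain on the labelled data plus the newly labelled data.
    model = fit_centroids(X_labelled + X_unlabelled, y_labelled + pseudo)
    return model, pseudo

# Toy data standing in for the 10% labelled / 90% unlabelled split.
X_lab = [[0.0, 0.0], [1.0, 1.0]]
y_lab = ["a", "b"]
X_unl = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.0]]

model, pseudo_labels = self_train(X_lab, y_lab, X_unl)
print(pseudo_labels)  # ['a', 'b', 'a']
```

Any classifier with a fit and predict step could take the place of the nearest-centroid model here; it was chosen only to keep the sketch dependency-free.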

2. `data\nursery-ssl10-10-1trs.csv`

This CSV file contains the class labels that are not available in the nursery-ssl10-10-1tra.csv dataset. In other words, it is the answer key for the unlabelled instances in the training dataset. We use it to check how well our SSL model labels the unlabelled training data.
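Checking the model's labels against the answer key amounts to an accuracy computation. The sketch below assumes the predicted labels and the answer-key labels have already been loaded into two lists in the same row order; the function name `labelling_accuracy` and the sample labels are illustrative only.

```python
def labelling_accuracy(predicted, answer_key):
    """Fraction of unlabelled instances whose predicted label matches the answer key."""
    if len(predicted) != len(answer_key):
        raise ValueError("predicted and answer_key must have the same length")
    if not predicted:
        return 0.0
    correct = sum(p == t for p, t in zip(predicted, answer_key))
    return correct / len(predicted)

# Illustrative labels only; not taken from the repository's output.
predicted = ["not_recom", "priority", "spec_prior", "priority"]
answer_key = ["not_recom", "priority", "priority", "priority"]

print(labelling_accuracy(predicted, answer_key))  # 0.75
```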

3. `data\nursery-ssl10-10-1tst.csv`

This CSV file is the test dataset used to measure the accuracy of the implemented SSL model.

Description of files in result folder

1. `result\dataset-ssl-safe-unlabelled.csv`

This file is generated by dual-learning-ssl.py. It contains a safe subset of data\nursery-ssl10-10-1tra.csv, selected by the dual-learning algorithm. It therefore contains both labelled and unlabelled instances, and it can be used safely for training by any SSL algorithm.

2. `result\dataset-ssl-safe-w-real-y.csv`

This file is generated by dual-learning-ssl.py. It differs from dataset-ssl-safe-unlabelled.csv in that all of its instances are labelled, whereas dataset-ssl-safe-unlabelled.csv still contains unlabelled instances.

3. `result\dataset-ssl-safe.csv`

This file is generated by dual-learning-ssl.py and is a safe subset of the training dataset. Each unlabelled instance in it has been pre-labelled with the prediction of a Regularised Least Squares model, so the class labels in this .csv file may not be 100% correct. Another supervised model can be trained on this dataset to complete the SSL model.
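For readers unfamiliar with Regularised Least Squares, the sketch below shows the standard closed-form solution w = (XᵀX + λI)⁻¹Xᵀy on a toy binary problem. It is not the implementation in dual-learning-ssl.py: the helper names `solve`, `rls_fit` and `rls_predict`, the ±1 label coding, the value of λ and the data are all assumptions, and for simplicity the bias column is regularised along with the feature weight.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def rls_fit(X, y, lam):
    """Return weights w minimising ||Xw - y||^2 + lam * ||w||^2."""
    d = len(X[0])
    # A = X^T X + lam * I
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    # b = X^T y
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(A, b)

def rls_predict(w, row):
    """Classify by the sign of the linear score."""
    score = sum(wi * xi for wi, xi in zip(w, row))
    return 1 if score >= 0 else -1

# Toy data: first column is a constant bias term, second is one feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [-1, -1, 1, 1]
w = rls_fit(X, y, lam=0.1)

# Pre-label two "unlabelled" instances with the model's prediction.
print(rls_predict(w, [1.0, 0.5]), rls_predict(w, [1.0, 2.5]))  # -1 1
```

For a multi-class target such as the nursery classes, one common extension is to fit one weight vector per class against one-hot targets and predict the class with the highest score; whether the script does this is not stated here.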

4. `result\dual-ssl-stats.csv`

This file is generated by dual-learning-ssl.py. It contains the statistics for the risk calculation performed by the dual learning model. It counts the true positives and false positives for whether a data instance is truly safe at a given risk threshold. A data instance is defined as truly safe when its risk is below the threshold and the dual learning model's predicted label is exactly the same as its real label.
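The definition above can be turned into a small counting routine. In this sketch a true positive is an instance below the threshold whose predicted label matches its real label, and a false positive is taken to be an instance below the threshold whose predicted label does not match; that reading of "false positive", the function name `safe_counts`, and the sample risks and labels are assumptions, and the actual column layout of dual-ssl-stats.csv may differ.

```python
def safe_counts(risks, predicted, real, threshold):
    """Count (true positives, false positives) among instances with risk below threshold."""
    tp = fp = 0
    for risk, p, r in zip(risks, predicted, real):
        if risk < threshold:      # the model treats this instance as safe
            if p == r:
                tp += 1           # truly safe: prediction equals the real label
            else:
                fp += 1           # flagged safe, but the prediction is wrong
    return tp, fp

# Illustrative values only.
risks = [0.05, 0.20, 0.40, 0.10]
predicted = ["a", "b", "a", "b"]
real = ["a", "b", "b", "a"]

for threshold in (0.15, 0.25, 0.50):
    print(threshold, safe_counts(risks, predicted, real, threshold))
# 0.15 (1, 1)
# 0.25 (2, 1)
# 0.5 (2, 2)
```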

5. `result\tst-dummy.csv`

This file is the dummy-coded version of the data\nursery-ssl10-10-1tst.csv test dataset.
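Dummy coding replaces each categorical column with one 0/1 indicator column per category. The sketch below shows the idea on two made-up rows and columns; the function name `dummy_code` and the column naming scheme are assumptions, and the script may instead drop one reference level per column or name the columns differently.

```python
def dummy_code(rows, columns):
    """One-hot encode the given categorical columns of a list of dict rows."""
    # Collect the categories seen in each column, in sorted order.
    categories = {col: sorted({row[col] for row in rows}) for col in columns}
    header = [f"{col}_{cat}" for col in columns for cat in categories[col]]
    encoded = [
        [1 if row[col] == cat else 0 for col in columns for cat in categories[col]]
        for row in rows
    ]
    return header, encoded

# Illustrative rows only.
rows = [
    {"parents": "usual", "health": "recommended"},
    {"parents": "pretentious", "health": "not_recom"},
    {"parents": "usual", "health": "not_recom"},
]

header, encoded = dummy_code(rows, ["parents", "health"])
print(header)
# ['parents_pretentious', 'parents_usual', 'health_not_recom', 'health_recommended']
print(encoded)
# [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 1, 0]]
```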
