ForestSubtype

An ensemble learning approach for breast cancer subtype classification using the public, high-dimensional TCGA dataset.

Code Organization

1.preprocessing

idconvert.zip # Breast cancer data from the Sangerbox 3.0 platform. Unzip the dataset before use.
breastPam50Classification.R # Coarse stratification of samples by PAM50 subtype.

2.ProjectCode

comparison # Comparative experiments against the sparesk method.
dataset # The datasets.
imageResults # Processing results.
model # Trained parameters of the autoencoder (AE).
processData # Data loading and preprocessing.
sClass # Supervised models, evaluations of machine learning methods, and the selected top features.
usClass # The autoencoder (AE) model.
flag.py # Parameter settings.
main_jupyter.ipynb # Extraction of the top features, guided by the a priori space.
main.py # Discovery of cancer subtypes.
MainSteps.py # Wrapper for the methods used by main.py.

Requirements

Python
R

Use the software

Data format: a CSV file (filename.csv).
Data set description:

X: Numerical matrix. All columns other than the first two. Each row is a sample and each column is a gene.

Y: Numerical vector. The PAM50 column, i.e. column 0. The i-th element indicates the class to which the i-th sample belongs.
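
As a rough illustration of this layout, the sketch below splits a CSV into labels and features using only the Python standard library. The header row, the role of the second column (assumed here to be a non-feature column such as a sample identifier), and the function name `split_features_labels` are assumptions and are not taken from the project code.

```python
import csv
import io


def split_features_labels(csv_text):
    """Split CSV text into (X, y, gene_names) following the layout described above.

    Assumptions (not confirmed by the project code): the first row is a header,
    column 0 holds the PAM50 class, and column 1 is not a feature and is skipped.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    gene_names = header[2:]  # every column after the first two is a gene
    X, y = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        y.append(int(float(row[0])))           # class label of this sample
        X.append([float(v) for v in row[2:]])  # expression values of this sample
    return X, y, gene_names


# Tiny made-up example; the values are placeholders, not real TCGA data.
example = "PAM50,sample,GENE_A,GENE_B\n0,s1,1.5,2.0\n2,s2,0.1,3.4\n"
X, y, genes = split_features_labels(example)
print(y)
print(genes)
```

For a file on disk, the same function can be applied to the text returned by `open(path).read()`.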

Parameters:

DATA_DIR   # Dataset storage directory
DATA_FILE     # Dataset file: breast_1211_23900.csv or breast_2133_20000.csv
standardization_way   # Standardization method
isOneHot  # Whether to use one-hot encoding: 1 = yes, 0 = no
RESULT_DIR  # Directory where result visualizations are stored
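
To make the `isOneHot` switch concrete, here is a minimal sketch of one-hot encoding a label vector. It assumes the labels are integer class indices starting at 0; the helper `one_hot` is hypothetical and is not the function used in processData.

```python
def one_hot(labels, n_classes):
    """Return one row per label with a 1 at the label's index and 0 elsewhere."""
    encoded = []
    for label in labels:
        if not 0 <= label < n_classes:
            raise ValueError(f"label {label} outside [0, {n_classes})")
        row = [0] * n_classes
        row[label] = 1
        encoded.append(row)
    return encoded


# Three samples belonging to classes 0, 2 and 1 out of 3 classes.
encoded = one_hot([0, 2, 1], n_classes=3)
print(encoded)
```

With `isOneHot` set to 0, the label vector would presumably be left as plain class indices.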

'''Clustering'''
DATA_CLUSTER_DIR  # top feature data subset storage directory
DATA_CLUSTER_FILE # top feature data subset samples
DATA_CLUSTER_FILE_y # top feature data subset labels
n_clusters  #Number of clusters

'''AE'''
IN_OUT_SIZES  # Number of input and output neurons
Encoder_hidden_size  # Number of neurons in the encoder hidden layer
hidden_size  # Number of neurons in the core (bottleneck) layer
Decoder_hidden_size  # Number of neurons in the decoder hidden layer
AE_epochs  # Number of training epochs
AE_batch_size  # Batch size
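
The AE size parameters can be read as the widths of consecutive layers. The sketch below shows how they might chain together, assuming each parameter is a single integer and that there is one hidden layer on each side of the core layer; the actual model in usClass may differ. The function name and the numbers are placeholders.

```python
def autoencoder_layer_shapes(in_out_size, encoder_hidden, core_hidden, decoder_hidden):
    """Return (inputs, outputs) for each dense layer of a symmetric autoencoder."""
    widths = [in_out_size, encoder_hidden, core_hidden, decoder_hidden, in_out_size]
    # Pair each width with the next one to get the shape of every layer.
    return list(zip(widths[:-1], widths[1:]))


# Placeholder sizes chosen only for illustration, not the project's settings.
shapes = autoencoder_layer_shapes(
    in_out_size=1000, encoder_hidden=256, core_hidden=64, decoder_hidden=256
)
for n_in, n_out in shapes:
    print(f"{n_in} -> {n_out}")
```

The first and last widths are equal because an autoencoder reconstructs its input, which is why a single `IN_OUT_SIZES` value covers both ends.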

Run the program:

First, unpack the breast_1211_23900 dataset located in the dataset directory.

Then run main_jupyter.ipynb, and finally run main.py.
