Gene Expression Analysis Project

This project involves the analysis of gene expression data to identify patterns and clusters related to different types of cancer. The analysis includes hierarchical clustering, k-means clustering, and k-medoids clustering methods to explore the relationships between gene expression profiles and cancer types.

Overview

The project aims to:

Identify gene expression patterns associated with different types of cancer.
Cluster genes based on their expression profiles to uncover potential biomarkers.
Evaluate the effectiveness of different clustering algorithms in grouping similar genes and cancer types.

Data

The gene expression data used in this project is sourced from the Golub et al. (1999) study, which includes gene expression profiles for patients with acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). The dataset comprises gene expression levels for oncogenes and antigens.

Analysis Methods

Hierarchical Clustering

Hierarchical clustering is performed using single and complete linkage methods to group genes based on their expression profiles. The resulting dendrograms visualize the hierarchical relationships between genes.

K-means Clustering

K-means clustering is applied to identify clusters of genes with similar expression patterns. The optimal number of clusters is determined by evaluating the sum of squared errors (SSE) for different values of k.

K-medoids Clustering

K-medoids clustering, also known as partitioning around medoids (PAM), is utilized to cluster genes based on dissimilarity measures. The clustering results are compared with those obtained from k-means clustering to assess the effectiveness of each method.

Results and Interpretation

The clustering analyses reveal distinct groups of genes associated with different types of cancer. By examining the clusters and their gene composition, potential biomarkers for specific cancer types can be identified. Additionally, the comparison between hierarchical, k-means, and k-medoids clustering methods provides insights into their respective strengths and limitations in clustering gene expression data.

How to Use

To replicate the analysis:

Load the gene expression data.
Perform hierarchical clustering using single and complete linkage methods.
Apply k-means clustering with different values of k and evaluate SSE.
Conduct k-medoids clustering with appropriate dissimilarity measures.
Compare the clustering results and interpret the findings in the context of cancer biology.

Dependencies

R programming language
Required R packages: multtest, cluster, FNN

Credits

Golub et al. (1999) for providing the gene expression dataset.
R Core Team and contributors for developing the R programming language and associated packages.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
gene_expression_analysis.R		gene_expression_analysis.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Gene Expression Analysis Project

Overview

Data

Analysis Methods

Hierarchical Clustering

K-means Clustering

K-medoids Clustering

Results and Interpretation

How to Use

Dependencies

Credits

About

Releases

Packages

Languages

njifack/gene-expression-analysis

Folders and files

Latest commit

History

Repository files navigation

Gene Expression Analysis Project

Overview

Data

Analysis Methods

Hierarchical Clustering

K-means Clustering

K-medoids Clustering

Results and Interpretation

How to Use

Dependencies

Credits

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages