Biological Data Project: Protein Domain Family Characterization

This repository contains the code, models, data, and documentation for the project "Functional and Structural Characterization of a Protein Domain Family". The focus of the project is the Lipase/Vitellogenin domain family (Pfam ID: PF00151), specifically in Homo sapiens. The analysis integrates model building, functional characterization, and motif identification.

Repository Structure

Function/: Contains the outputs generated by the notebook, such as processed data and results.
Model/:
- Building/: Contains outputs related to the PSSM and HMM model construction.
- Evaluation/: Contains evaluation results of the models against SwissProt annotations.
Taxonomy/: Taxonomic analysis outputs, including lineage data and phylogenetic trees.
Motifs/: Outputs related to motif discovery within disordered regions.
BD Report/: Contains the final PDF of the project report.

Project Overview

Objective

The aim of the project is to characterize the Lipase/Vitellogenin domain family by:

Constructing sequence models (PSSM and HMM).
Evaluating these models against SwissProt database annotations.
Analyzing taxonomic distribution, functional enrichment, and motif conservation.
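As an illustration of the functional enrichment step, the sketch below computes a one-sided hypergeometric p-value for a single GO term. It is a minimal example under stated assumptions: the notebook may use a different test or library, and the function name and the counts in the usage example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: number of proteins in the background set
    K: background proteins annotated with the GO term
    n: number of proteins in the family set
    k: family proteins annotated with the GO term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability mass of observing k or more annotated proteins.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 3 of 4 family proteins carry a term found in 3 of 10 background proteins.
p_value = hypergeom_enrichment_p(k=3, n=4, K=3, N=10)
print(f"p = {p_value:.4f}")
```

When many GO terms are tested, the resulting p-values would normally need a multiple-testing correction, which is left out here.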

Key Features

Sequence Models: Built using PSI-BLAST and HMMER based on a representative sequence.
Taxonomy Analysis: Phylogenetic insights into protein distribution across species.
Functional Insights: Gene Ontology enrichment analysis for biological functions.
Motif Discovery: Identification of conserved motifs within disordered regions using ProSite and ELM patterns.
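The motif search can be sketched as follows. ProSite patterns use their own syntax, while ELM classes are already written as regular expressions, so the example converts a ProSite pattern to a regex and keeps only hits that fall entirely inside a disordered region. This is a simplified sketch, not the notebook's implementation: the function names, the toy pattern, the sequence, and the region coordinates are all hypothetical, and anchors written inside brackets (such as `[<M]`) are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate basic ProSite pattern syntax into a Python regular expression."""
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("x", ".")                      # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")   # {P} = any residue except P
    regex = regex.replace("(", "{").replace(")", "}")    # x(2,4) = 2 to 4 repeats
    regex = regex.replace("<", "^").replace(">", "$")    # N- and C-terminal anchors
    return regex


def find_motifs(sequence, pattern, disordered_regions):
    """Return (start, end, match) for hits fully inside a disordered region.

    Coordinates are 1-based and inclusive. A lookahead is used so that
    overlapping hits are reported as well.
    """
    lookahead = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    hits = []
    for m in lookahead.finditer(sequence):
        start = m.start() + 1
        end = m.start() + len(m.group(1))
        if any(r_start <= start and end <= r_end
               for r_start, r_end in disordered_regions):
            hits.append((start, end, m.group(1)))
    return hits


# Toy data: one hypothetical disordered region covering residues 18 to 27.
toy_sequence = "AAASGGAEAAAATGGPEAAASKKLEAA"
motif_hits = find_motifs(toy_sequence, "[ST]-x(2)-{P}-E", [(18, 27)])
print(motif_hits)
```

An ELM regular expression could be searched the same way by compiling it directly instead of passing it through `prosite_to_regex`.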

Usage

Prerequisites

Software:
- NCBI-BLAST+
- HMMER
- Clustal Omega
- JalView
- Python 3.x
Databases:

Steps to Reproduce

Clone the repository:

git clone https://github.com/maloooon/Protein_Domain_Family_Characterization.git
cd Protein_Domain_Family_Characterization

Open the Jupyter Notebook:

jupyter notebook BD_Project_notebook.ipynb

Follow the steps in the notebook:
- Model Building: Generate PSSM and HMM models.
- Evaluation: Assess the models against SwissProt annotations.
- Taxonomy Analysis: Visualize lineage distribution.
- Motif Discovery: Identify conserved motifs in disordered regions.
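For orientation, the model building and search steps could be driven from Python roughly as sketched below. The command-line options shown (`-in_msa`, `-out_pssm`, `-in_pssm`, `--domtblout`) are standard PSI-BLAST and HMMER options, but the exact invocations used in the notebook may differ. The function name and all file names are hypothetical, and the sketch assumes that a multiple sequence alignment and a BLAST database already exist.

```python
from pathlib import PurePosixPath


def build_model_commands(msa, blast_db, seq_db, out_dir="Model/Building"):
    """Return the command lines for building and searching the PSSM and HMM.

    msa: multiple sequence alignment of the domain family (FASTA)
    blast_db: name of a BLAST protein database (assumed to be built already)
    seq_db: FASTA file of sequences to scan with hmmsearch
    """
    out = PurePosixPath(out_dir)
    pssm = str(out / "family.pssm")   # hypothetical output names
    hmm = str(out / "family.hmm")
    return {
        # Derive a PSSM from the alignment.
        "build_pssm": ["psiblast", "-in_msa", msa, "-db", blast_db,
                       "-out_pssm", pssm],
        # Search the database with the saved PSSM, tabular output.
        "search_pssm": ["psiblast", "-in_pssm", pssm, "-db", blast_db,
                        "-outfmt", "6", "-out", str(out / "psiblast_hits.tsv")],
        # Build a profile HMM from the same alignment.
        "build_hmm": ["hmmbuild", hmm, msa],
        # Scan the sequence database and write a per-domain hit table.
        "search_hmm": ["hmmsearch", "--domtblout",
                       str(out / "hmmsearch_hits.domtblout"), hmm, seq_db],
    }


commands = build_model_commands("family_msa.fasta", "swissprot", "swissprot.fasta")
for name, argv in commands.items():
    print(name, "->", " ".join(argv))
```

Each command list can be passed to `subprocess.run(argv, check=True)` once the input files are in place.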

Results

Model Performance: PSSM and HMM models evaluated for precision, recall, and other metrics.
Taxonomic Insights: Visualization of domain conservation across species.
Functional Enrichment: Key biological processes identified via GO annotations.
Motif Discovery: Conserved motifs linked to functional hotspots.
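The model performance metrics can be computed at the protein level as in the sketch below, which compares the set of accessions retrieved by a model with the set annotated with the domain in SwissProt. The choice of F1 and the Matthews correlation coefficient as the "other metrics" is an assumption, and the function name and accessions are hypothetical placeholders.

```python
from math import sqrt


def evaluate_model(predicted: set, reference: set, universe: set) -> dict:
    """Compare predicted hits with reference annotations over a protein universe."""
    tp = len(predicted & reference)              # retrieved and annotated
    fp = len(predicted - reference)              # retrieved but not annotated
    fn = len(reference - predicted)              # annotated but missed
    tn = len(universe - predicted - reference)   # correctly left out

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Placeholder accessions, not real results.
metrics = evaluate_model(
    predicted={"P1", "P2", "P3"},
    reference={"P2", "P3", "P4"},
    universe={"P1", "P2", "P3", "P4", "P5", "P6"},
)
print(metrics)
```

The same comparison can be repeated at the residue level by replacing accessions with (accession, position) pairs.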

Contributions

This project was developed by:

License

This project is licensed under the MIT License. See the LICENSE file for details.
