maloooon/Protein_Domain_Family_Characterization

Biological Data Project: Protein Domain Family Characterization

This repository contains the code, models, data, and documentation for the project "Functional and Structural Characterization of a Protein Domain Family". The focus of the project is the Lipase/Vitellogenin domain family (Pfam ID: PF00151), specifically in Homo sapiens. The analysis integrates model building, functional characterization, and motif identification.


Repository Structure

  • Function/: Contains the functional-analysis outputs generated by the notebook, such as processed data and results.
  • Model/:
    • Building/: Contains outputs related to the PSSM and HMM model construction.
    • Evaluation/: Contains evaluation results of the models against SwissProt annotations.
  • Taxonomy/: Taxonomic analysis outputs, including lineage data and phylogenetic trees.
  • Motifs/: Outputs related to motif discovery within disordered regions.
  • BD Report/: Contains the final PDF of the project report.

Project Overview

Objective

The aim of the project is to characterize the Lipase/Vitellogenin domain family by:

  1. Constructing sequence models (PSSM and HMM).
  2. Evaluating these models against SwissProt database annotations.
  3. Analyzing taxonomic distribution, functional enrichment, and motif conservation.
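
As an illustration of step 2, the sketch below compares the set of proteins retrieved by a model with the set annotated with the domain in a reference database. It is a minimal example, not the notebook's actual code: the function name `evaluate_hits`, the protein-level comparison, and the accession IDs are all assumptions made for illustration.

```python
from math import sqrt

def evaluate_hits(predicted, annotated, universe):
    """Compare model hits with reference annotations at the protein level.

    predicted: accessions retrieved by the model (PSSM or HMM)
    annotated: accessions carrying the domain annotation in the reference
    universe:  all accessions in the searched database
    """
    predicted, annotated, universe = set(predicted), set(annotated), set(universe)
    tp = len(predicted & annotated)             # found and annotated
    fp = len(predicted - annotated)             # found but not annotated
    fn = len(annotated - predicted)             # annotated but missed
    tn = len(universe - predicted - annotated)  # correctly not reported

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1, "mcc": mcc}

# Hypothetical accessions, for illustration only.
universe = {f"P{i}" for i in range(1, 11)}
metrics = evaluate_hits({"P1", "P2", "P3", "P4"},
                        {"P1", "P2", "P3", "P5"},
                        universe)
print(metrics)
```

The same counts can also be taken at the residue level (domain positions rather than whole proteins) by passing sets of `(accession, position)` pairs instead of accessions.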

Key Features

  • Sequence Models: PSSM built with PSI-BLAST and HMM built with HMMER, both starting from a representative sequence.
  • Taxonomy Analysis: Phylogenetic insights into protein distribution across species.
  • Functional Insights: Gene Ontology enrichment analysis for biological functions.
  • Motif Discovery: Identification of conserved motifs within disordered regions using ProSite and ELM patterns.
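
To make the motif discovery idea concrete, the sketch below converts a ProSite-style pattern into a Python regular expression and searches for it only inside given disordered regions. It is an illustrative example under stated assumptions: the helper names `prosite_to_regex` and `find_in_regions`, the toy sequence, and the region coordinates are hypothetical, and the converter covers only the common pattern elements (`x`, `[..]`, `{..}`, `(n)`, `(n,m)`).

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple ProSite pattern into Python regex syntax.

    Handles x (any residue), [ABC] (allowed), {ABC} (forbidden) and
    (n) / (n,m) repeats. Terminus anchors (< and >) are translated too,
    but they are only meaningful when matching a full sequence.
    """
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("{", "[^").replace("}", "]")  # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")   # repeat counts
    regex = regex.replace("x", ".")                     # any residue
    regex = regex.replace("<", "^").replace(">", "$")   # terminus anchors
    return regex

def find_in_regions(sequence, regions, pattern):
    """Return (start, end, match) hits lying fully inside the regions.

    regions: list of (start, end) tuples, 1-based and inclusive.
    Reported coordinates are 1-based and inclusive as well.
    """
    regex = re.compile(prosite_to_regex(pattern))
    hits = []
    for start, end in regions:
        segment = sequence[start - 1:end]
        for m in regex.finditer(segment):
            hits.append((start + m.start(), start + m.end() - 1, m.group()))
    return hits

# Toy sequence and disordered region, for illustration only.
seq = "MKTNGSAAPLNPSQ"
hits = find_in_regions(seq, [(3, 10)], "N-{P}-[ST]-{P}.")
print(prosite_to_regex("N-{P}-[ST]-{P}."), hits)
```

ELM classes are distributed as regular expressions already, so under the same assumptions they could be passed to `re.compile` directly, without the conversion step.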

Usage

Prerequisites

Steps to Reproduce

  1. Clone the repository:
    git clone https://github.com/maloooon/Protein_Domain_Family_Characterization.git
    cd Protein_Domain_Family_Characterization
  2. Open the Jupyter Notebook:
    jupyter notebook BD_Project_notebook.ipynb
  3. Follow the steps in the notebook:
    • Model Building: Generate PSSM and HMM models.
    • Evaluation: Assess the models against SwissProt annotations.
    • Taxonomy Analysis: Visualize lineage distribution.
    • Motif Discovery: Identify conserved motifs in disordered regions.
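
For the taxonomy step, a lineage distribution can be summarized by counting how many family members fall under each taxon at a chosen depth of the lineage. The sketch below is a simplified illustration rather than the notebook's code: the function name `count_taxa`, the input layout (accession mapped to a list of taxon names from root to leaf), and the example entries are assumptions.

```python
from collections import Counter

def count_taxa(lineages, depth):
    """Count proteins per taxon at a given lineage depth (0 = root-most).

    lineages: dict mapping accession -> list of taxon names, root first.
    Proteins whose lineage is shorter than the requested depth are skipped.
    """
    counts = Counter()
    for lineage in lineages.values():
        if len(lineage) > depth:
            counts[lineage[depth]] += 1
    return counts

def relative_abundance(counts):
    """Convert raw counts to fractions of the counted proteins."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}

# Hypothetical accessions and truncated lineages, for illustration only.
lineages = {
    "A1": ["Eukaryota", "Metazoa", "Chordata"],
    "A2": ["Eukaryota", "Metazoa", "Arthropoda"],
    "A3": ["Eukaryota", "Metazoa", "Chordata"],
    "A4": ["Eukaryota", "Fungi"],
}
by_kingdom = count_taxa(lineages, depth=1)
by_phylum = count_taxa(lineages, depth=2)
print(by_kingdom, by_phylum, relative_abundance(by_phylum))
```

Counts of this kind can then be used to size or color the nodes of a tree plot.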

Results

  • Model Performance: PSSM and HMM models evaluated for precision, recall, and other metrics.
  • Taxonomic Insights: Visualization of domain conservation across species.
  • Functional Enrichment: Key biological processes identified via GO annotations.
  • Motif Discovery: Conserved motifs linked to functional hotspots.
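
GO enrichment is commonly assessed with a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. The sketch below shows that calculation using only the standard library. It is an assumption about how such a test could be set up, not a description of the notebook's implementation, and the function names and example counts are hypothetical.

```python
from math import comb

def enrichment_pvalue(k, n, K, N):
    """Probability of seeing at least k annotated proteins by chance.

    k: family proteins annotated with the GO term
    n: family size
    K: background proteins annotated with the GO term
    N: background size (for example, all SwissProt proteins considered)
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

def fold_enrichment(k, n, K, N):
    """Ratio of the term's frequency in the family to the background."""
    return (k / n) / (K / N)

# Hypothetical counts, for illustration only.
p = enrichment_pvalue(k=5, n=5, K=5, N=10)
fold = fold_enrichment(k=5, n=5, K=5, N=10)
print(p, fold)
```

When many GO terms are tested, the resulting p-values would normally need a multiple-testing correction before terms are reported as enriched.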

Contributions

This project was developed by:


License

This project is licensed under the MIT License. See the LICENSE file for details.


Acknowledgments

Python, Jupyter, Visual Studio Code, Anaconda, Google Colab, LaTeX, Overleaf
