jpoullet2000/cgs

The CGS project

CGS stands for Centralized Genomics System. The goal of this project is to offer a big data infrastructure for genomics data, and in particular for variant information.

[Figure: CGS project]

This project contains several modules that are related to each other but can still be used independently (to some extent). The modules are available as the following GitHub repositories:

  • cgs-data: this package defines how the data are organized in this project. In particular, it covers how to upload variant information (VCF files) into HBase and how to build a corresponding table in the metastore so that tools like Hive and Impala can access it, allowing you to run SQL-like queries on your database. It also lets you configure security in HBase, at the level of column families or even individual cells.
  • cgs-apps: this package provides apps. For instance, one app lets you access resources (or data) similar to those you would access with Google Genomics. These apps are developed as plugins for Hue.
  • cgs-analysis: this package implements analysis tools that can be run on the data available in this project, for instance machine learning techniques distributed within the Hadoop framework.
  • cgs-benchmarks: this package presents different benchmark studies that have been performed using this system (CGS).
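The cgs-data description can be made more concrete with a small example. Below is a minimal sketch of how a single VCF data line could be mapped to an HBase-style row, meaning a row key plus a dictionary of "family:qualifier" cells. The row-key layout, the column family names `v` and `i`, and the function `vcf_line_to_hbase_row` are all hypothetical and are not taken from cgs-data. The sketch only builds the row in memory and does not connect to HBase. Storing INFO fields in their own column family is one way to make family-level permissions possible.

```python
from typing import Dict, Tuple

# Hypothetical column families (assumption, not the actual cgs-data schema):
# core variant fields in "v", INFO annotations in "i", so that permissions
# could be set separately per family.
VARIANT_CF = "v"
INFO_CF = "i"


def vcf_line_to_hbase_row(line: str) -> Tuple[str, Dict[str, str]]:
    """Map one tab-separated VCF data line to (row_key, {"family:qualifier": value})."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    # Hypothetical row key: chromosome, zero-padded position, REF and ALT.
    # Zero-padding keeps rows of one chromosome in genomic-position order,
    # because HBase sorts row keys lexicographically.
    row_key = f"{chrom}-{int(pos):010d}-{ref}-{alt}"
    cells = {
        f"{VARIANT_CF}:id": vid,
        f"{VARIANT_CF}:ref": ref,
        f"{VARIANT_CF}:alt": alt,
        f"{VARIANT_CF}:qual": qual,
        f"{VARIANT_CF}:filter": flt,
    }
    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, _, value = entry.partition("=")
        # Bare flags (no "=") are stored as "true".
        cells[f"{INFO_CF}:{key}"] = value if value else "true"
    return row_key, cells


if __name__ == "__main__":
    example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB"
    key, row = vcf_line_to_hbase_row(example)
    print(key)  # 20-0000014370-G-A
    print(row["i:DP"], row["i:DB"])  # 14 true
```

With an HBase client library such as happybase, a pair like this could then be written with `Table.put(row_key, cells)`, after encoding keys and values to bytes. How cgs-data actually performs the upload is not shown here.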

Since cgs-apps may depend on cgs-analysis, which in turn may depend on cgs-data, we can represent the packages as a stack, as illustrated below.

[Figure: the CGS packages represented as a stack]
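The dependency chain described above can also be written as a small graph. In this sketch, the `DEPENDS_ON` mapping encodes only the dependencies stated in this README. cgs-benchmarks is omitted because no dependencies are stated for it. The helper names are hypothetical. The sketch uses Python's standard `graphlib` module (Python 3.9+) to derive an order in which the packages could be installed, then reverses that order to read the stack from top to bottom.

```python
from graphlib import TopologicalSorter

# Dependencies as stated in the README (each key maps to the packages it may
# depend on). cgs-benchmarks is left out because no dependencies are stated.
DEPENDS_ON = {
    "cgs-apps": {"cgs-analysis"},
    "cgs-analysis": {"cgs-data"},
    "cgs-data": set(),
}


def install_order(graph):
    """Return the packages so that each one appears after its dependencies."""
    # TopologicalSorter treats each value as the set of predecessors of its key,
    # so dependencies are emitted before the packages that need them.
    return list(TopologicalSorter(graph).static_order())


def stack(graph):
    """Return the stack from top to bottom (the package nothing depends on first)."""
    return list(reversed(install_order(graph)))


if __name__ == "__main__":
    print(install_order(DEPENDS_ON))  # ['cgs-data', 'cgs-analysis', 'cgs-apps']
    print(stack(DEPENDS_ON))  # ['cgs-apps', 'cgs-analysis', 'cgs-data']
```

Because the stated dependencies form a single chain, only one valid order exists. `TopologicalSorter` would also raise a `CycleError` if a circular dependency were ever introduced.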

This project has been initiated within the BridgeIris project funded by Innoviris.

Note that this project is at a very early stage.
