Database of circadian rhythms
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.

README.rdoc

CircaDB

A Database of Circadian Gene Expression Profiles

Welcome to the CircaDB project. This is a data set of time course series data, highlighting circadian gene expression cycles through the use of the Cosopt algorithm for determining rhythmicity.

It is a Ruby on Rails application with a single controller and large data pre-set data set. Notable aspects include full-text search capability of gene annotation data via the Sphinx search engine, and data graphs dynamically produced by the Google Chart API.

If you have any questions contact the Hogenesch Lab at the University of Pennsylvania

bioinf.itmat.upenn.edu/hogeneschlab

License

All of the source code for data loading and the web application is licensed under the GNU General Public License (GPL-2.0). See the LICENSE file for more details.

Installing

Prerequisites

Install Ruby (www.ruby-lang.org/en/), RubyGems (rubygems.org/) and the bundler gem (gembundler.com/). Install MySQL (version >= 5.0) or have one you can connect to and create a database Install the Sphinx full text search engine (freelancing-god.github.com/ts/en/installing_sphinx.html)

Install

If you are only interested in reviewing the site and running a local instance, then:

  1. Configure the config/database.yml file from the example config/database.yml.example to connect to your MySQL database.

  2. Use the provided rake tasks to create the database

rake bootstrap:all

Visualize your own dataset

If you are interested in loading your own data, then you will need to load the array annotation information, add a Assay object for your experiment, then add the raw and statistic data from your experiment. See the lib/tasks/seed.rake for an example of our routines that manipulate and load data from source data files.

Examples to prepare your data can be found in /prepare_data_scripts.

  1. You will need to find the matching annotation file for your gene chip. These can be found at websites like Affymetrix. To make the annotation files compatible with the database, they need to be in the following format (Checkout prepare_affy_annotation for an example):

    %w{ gene_chip_id probeset_name genechip_name species annotation_date sequence_type
    sequence_source transcript_id target_description representative_public_id
    archival_unigene_cluster unigene_id genome_version  alignments
    gene_title gene_symbol chromosomal_location unigene_cluster_type
    ensembl entrez_gene swissprot ec omim
    refseq_protein_id refseq_transcript_id flybase agi  wormbase
    mgi_name rgd_name sgd_accession_number go_biological_process
    go_cellular_component go_molecular_function
    pathway interpro trans_membrane qtl annotation_description
    annotation_transcript_cluster
    transcript_assignments annotation_notes }.join(",")
  2. Now you will need to prepare the microarray data itself. To fill the database the following format is expected (Checkout prepare_hughes_liver_data for an example), where Cubase is the link to the google API plot:

    %w{ probe_set time_points.join(",") data_points.join(",") cubase }.join("@")
  3. After the statistic test are finished you can use prepare_stats.

    %w{row_names p_lomb_scargle q_lomb_scargle period_lomb_scargle phase_shift_lomb_scargle p_values_lichtenberg, q_values_lichtenberg
    period_lichtenberg BH_Q_jtk ADJ_P_jtk
    PER_jtk LAG_jtk AMP_jtk}.join(",")
  4. Place your prepared datasets into /seed_data folder.

  5. Before the dataset can be added with rake seed:fill , we will need to take a look at /lib/tasks/seed.rake. It is important to comment out the pre-existing datasets that are not required for your own database.

    task :genechips => :environment do
      c = ActiveRecord::Base.connection
      c.execute "delete from gene_chips"
      g  = GeneChip.new(:slug => "Mouse430_2", :name => "Mouse Genome 430 2.0 ( Affymetrix)")
      g.save
      ...
      ### Add your gene chip here
      #g  = GeneChip.new(:slug => "SLUG-NAME", :name => "DESCRIPTION")
      #g.save
    end
    ...
    desc "Seed SLUG-NAME annotations"
    task :SLUG-NAME_probesets => :environment do
      # probes
      fields = %w{ gene_chip_id probeset_name genechip_name species   annotation_date sequence_type sequence_source transcript_id   target_description representative_public_id archival_unigene_cluster  unigene_id genome_version alignments gene_title gene_symbol  chromosomal_location unigene_cluster_type ensembl entrez_gene swissprot ec   omim refseq_protein_id refseq_transcript_id flybase agi wormbase mgi_name   rgd_name sgd_accession_number go_biological_process go_cellular_component   go_molecular_function pathway interpro trans_membrane qtl   annotation_description annotation_transcript_cluster transcript_assignments   annotation_notes }
      g  = GeneChip.find(:first, :conditions => ["slug like ?", "SLUG-NAME"])
      count = 0
      buffer = []
      puts "=== Begin Probeset insert ==="
      FasterCSV.foreach("#{RAILS_ROOT}/seed_data/prepared_SLUG-NAME.csv",     :headers=> true ) do |ps|
        count += 1
        buffer << [g.id] + ps.values_at
        if count % 1000 == 0
          Probeset.import(fields,buffer)
          buffer = []
          puts count
        end
      end
      Probeset.import(fields,buffer)
      puts count
      puts "=== End Probeset insert ==="
    end
    ...
    v = [["liver","Mouse Liver 48 hour (Affymetrix)", affy_id],
       ["pituitary","Mouse Pituitary 48 hour (Affymetrix)",affy_id],
       ["NIH3T3","NIH 3T3 Immortilized Cell Line 48 hour (Affymetrix)",affy_id],
       ["WT_liver","Wild Type Liver (GNF microarray)", gnf_id],
       ["WT_muscle","Wild Type Muscle (GNF microarray)",gnf_id],
       ["WT_SCN","Wild Type SCN (GNF microarray)", gnf_id],
       ["panda_liver","Liver Panda 2002 (Affymetrix)",u74av1_id],
       ["panda_SCN_MAS4","SCN MAS4 Panda 2002 (Affymetrix)", u74av1_id],
       ["panda_SCN_gcrma","SCN gcrma Panda 2002 (Affymetrix)", u74av1_id]]
       ### ADD YOUR ASSAY HERE
       # ["SLUG-NAME","description", GENE-CHIP_id]]
    
    #v = []
    Assay.import(f,v)
    ...
    ### Add your dataset
    %w{ YOUR_PREPARED_DATA_SET }.each do |etype|
    count = 0
    buffer = []
    a = Assay.find(:first, :conditions => ["slug = ?", etype])
    puts "=== Raw Data #{etype} insert starting ==="
    
    File.open("#{RAILS_ROOT}/seed_data/#{etype}_data","r" ).each do |line|
      count += 1
      line = line.split("@")
      time_points = line[1].split(",").map {|element| element}
      data_points = line[2].split(",").map {|element| element}
      cubase = line[3]
      psid = probesets[line[0]]
      buffer << [a.id(), a.slug, psid, line[0], time_points.to_json, data_points.to_json,cubase]
    
      if count % 1000 == 0
        ProbesetData.import(fields,buffer)
        puts count
        buffer = []
      end
    end
    ProbesetData.import(fields,buffer)
    puts "=== Raw Data #{etype} end (count= #{count}) ==="
    ...
    ### Add your stats
    %w{ YOUR_PREPARED_DATA_SET }.each do |etype|
    
      count = 0
      buffer = []
      a = Assay.find(:first, :conditions => ["slug = ?", etype])
      puts "=== Stat Data #{etype} start ==="
    
      FasterCSV.foreach("#{RAILS_ROOT}/seed_data/hughes_#{etype}_stats") do |row|
        count += 1
        aslug, psname = 0,row[0].to_i
        psid = probesets[row[0]]
        buffer << [a.id, a.slug,psid, psid, psname] + row[1..-1].to_a
        if count % 1000 == 0
          ProbesetStat.import(fields,buffer)
          buffer = []
          puts count
        end
      end
      ProbesetStat.import(fields,buffer)
      puts "=== Stat Data #{etype} end (count = #{count}) ==="
    end
    ...
  6. Run:

    rake seed:fill