# Package Manager

RubyGems is a package manager: that is, it is software that makes it easier to find, share, and reuse other people's classes.

The Website for RubyGems is:  http://rubygems.org

Browse to that site now...

Search for "xml": we are looking for a package that will give us an easy way to manage XML files  (please tell me if you need me to give you a lecture on XML...). XML is a syntax for carrying structured data, organized as a tree. The package we are about to use reads that tree and returns the information as Ruby hashes and lists.
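To make "hashes and lists" concrete, here is a hand-written sketch of the kind of nested structure a parser could hand back for a small XML document. The document, the key names, and the exact shape are illustrative assumptions, not real xml-simple output; the gem's documentation describes its actual conventions.

```ruby
# The XML document we have in mind (shown as a comment for reference):
#
#   <genus name="Arabidopsis">
#     <species name="Arabidopsis thaliana"/>
#     <species name="Arabidopsis lyrata"/>
#   </genus>

# A hand-written Ruby structure carrying the same information:
# each element becomes a Hash, and repeated child elements become an Array.
genus = {
  "name"    => "Arabidopsis",
  "species" => [
    { "name" => "Arabidopsis thaliana" },
    { "name" => "Arabidopsis lyrata" }
  ]
}

# Walking the tree is then ordinary Hash and Array access.
species_names = genus["species"].map { |species| species["name"] }
puts species_names
```

Once the data is in this form, you no longer think about XML at all; you only use the Hash and Array methods you already know.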



Found one here:  https://rubygems.org/gems/xml-simple

        
## xml-simple 1.1.5

A simple API for XML processing.


Click on the "documentation" link:  http://www.rubydoc.info/gems/xml-simple/1.1.5


    Class: XmlSimple

    Inherits:    Object
    Includes:  REXML
    Defined in:   lib/xmlsimple.rb


The "Class" line tells you the name of the class.
The "Defined in" line tells you the file that defines it, and therefore what you need to require: lib/xmlsimple.rb means require 'xmlsimple'.




In [1]:
require 'xmlsimple'
simple = XmlSimple.new

#<XmlSimple:0x0000000001867238 @default_options=nil, @options={}, @_var_values=nil>

# INSTALLING A GEM

To install this package (called a "Gem" in Ruby):

Open a terminal window, and type:

    $ gem install xml-simple  (note - same capitalization and punctuation as the **package** name)
    
After you install a Gem, you will need to restart the Ruby kernel in your Jupyter Notebook.  In the menu bar of this web page, you will see "Kernel", select that, then "restart and clear output".

Now your package is available to use in your code!


## Look at more of the documentation


### Class Method Summary

    .xml_in(string = nil, options = nil) ⇒ Object
    This is the functional version of the instance method xml_in.
    
    
    .xml_out(hash, options = nil) ⇒ Object
    This is the functional version of the instance method xml_out.

### Instance Method Summary

    #initialize(defaults = nil) ⇒ XmlSimple constructor
    Creates and initializes a new XmlSimple object.
    
    
    #xml_in(string = nil, options = nil) ⇒ Object
    Converts an XML document in the same way as the Perl module XML::Simple.
    
    
    #xml_out(ref, options = nil) ⇒ Object
    Converts a data structure into an XML document.


The documentation tells you that this object is extremely simple: basically, it can do two things, read XML in and write XML out.   Interestingly, it also tells you that the object has both Class methods and Instance methods, and that the Class methods (xml_in and xml_out) are "functional versions" of the Instance methods with the same names.  

That means that these do the same thing:



In [2]:
require 'xmlsimple'

simple = XmlSimple.new  # create an instance of XmlSimple
data1 = simple.xml_in("<xml>hello1</xml>")  # call the instance xml_in method
data2 = XmlSimple.xml_in("<xml>hello2</xml>")  # call the class xml_in method

puts data1
puts data2

hello1
hello2
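How can a class offer the same method both ways? One common pattern is for the class method to create a throwaway instance and forward the call to it. The sketch below uses a made-up class (Shouter) purely for illustration; it is an assumption about the general pattern, not a copy of how xml-simple is written.

```ruby
# Hypothetical class, used only to illustrate the pattern.
class Shouter
  # Instance method: does the real work.
  def shout(text)
    text.upcase + "!"
  end

  # Class method: the "functional version". It builds a throwaway
  # instance and forwards the call to the instance method.
  def self.shout(text)
    new.shout(text)
  end
end

via_instance = Shouter.new.shout("hello1")  # call the instance method
via_class    = Shouter.shout("hello1")      # call the class method

puts via_instance
puts via_class
```

The class method saves you a line when you only need one call; the instance is more useful when you want to set options once and reuse them.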




# Let's get some interesting XML data


surf to:  http://rest.ensembl.org  (https://academic.oup.com/bioinformatics/article/31/1/143/2366240)

This is another API into the EnsEMBL database.  Like DB Fetch, it provides predictably structured URLs for access to the data in EnsEMBL (these ones are somewhat "cleaner" than DB Fetch, but DB Fetch can access things that this API cannot)

Scroll down to the "Ontologies and Taxonomy" section.  

Click on "taxonomy/id/:id"

The documentation tells you that this will retrieve the taxonomy information for a given taxon.  Their example is human (taxon:9606).  Arabidopsis is taxon:3701.

We want to know the names of all Arabidopsis species.  We will retrieve the taxon information for Arabidopsis in XML format and parse it using our new XmlSimple package:

http://rest.ensembl.org/taxonomy/id/3701?content-type=text/xml


Open this link in a new tab so you can see the structure of the document.  
**NOTE:  The EnsEMBL XML is incorrectly formatted right now, so your browser will reject it!  You must press CTRL-U to see it!    ...sorry.... not my fault...**



In [3]:
require 'net/http'   # this is how you access the Web
require 'xmlsimple'
#require 'pp'

address = URI('http://rest.ensembl.org/taxonomy/id/3701?content-type=text/xml')  # create a "URI" object
response = Net::HTTP.get_response(address)  # use the Net::HTTP object "get_response" method to call that address
#puts response.body

# The code below fixes a problem with the EnsEMBL REST output XML
# EnsEMBL fixed this problem themselves, but have not published the change yet
# you can try to understand what it was doing!  :-)
# http://ruby-doc.org/core-1.9.3/String.html#method-i-gsub 
cleaned_body = response.body.gsub(/<(\/?)(\w+)\s(\w+)>/, '<\1\2\3>')
cleaned_body.gsub!(/<(\/?)(\w+)\s(\w+)\s(\w+)>/, '<\1\2\3\4>')

data = XmlSimple.xml_in(cleaned_body)   # create the XML object (hashes and lists)
data["data"][0]["children"].each do |child|  # loop over the list of child taxa
  puts child["name"]
end

Arabidopsis thaliana x Arabidopsis arenosa
Arabidopsis thaliana x Arabidopsis lyrata
Arabidopsis sp. hda9-2
Arabidopsis petrogena
Arabidopsis arenosa x Arabidopsis thaliana
Arabidopsis thaliana x Arabidopsis halleri subsp. gemmifera
Arabidopsis pedemontana
Arabidopsis halleri
Arabidopsis cebennensis
Arabidopsis thaliana x Arabidopsis halleri
Arabidopsis sp. NH-2014a
Arabidopsis arenicola
Arabidopsis lyrata
Arabidopsis thaliana
Arabidopsis sp.
Arabidopsis croatica
Arabidopsis neglecta
Arabidopsis arenosa
Arabidopsis lyrata x Arabidopsis halleri
Arabidopsis suecica
Arabidopsis kamchatica
(Arabidopsis thaliana x Arabidopsis arenosa) x Arabidopsis suecica
Arabidopsis umezawana


[{"name"=>"Arabidopsis thaliana x Arabidopsis arenosa", "id"=>"1240361", "leaf"=>"1", "scientific_name"=>"Arabidopsis thaliana x Arabidopsis arenosa", "tags"=>[{"name"=>["Arabidopsis thaliana x Arabidopsis arenosa"], "scientific_name"=>["Arabidopsis thaliana x Arabidopsis arenosa"]}]}, {"name"=>"Arabidopsis thaliana x Arabidopsis lyrata", "id"=>"869750", "leaf"=>"1", "scientific_name"=>"Arabidopsis thaliana x Arabidopsis lyrata", "tags"=>[{"name"=>["Arabidopsis thaliana x Arabidopsis lyrata"], "scientific_name"=>["Arabidopsis thaliana x Arabidopsis lyrata"]}]}, {"name"=>"Arabidopsis sp. hda9-2", "id"=>"1746102", "leaf"=>"1", "scientific_name"=>"Arabidopsis sp. hda9-2", "tags"=>[{"name"=>["Arabidopsis sp. hda9-2"], "scientific_name"=>["Arabidopsis sp. hda9-2"]}]}, {"name"=>"Arabidopsis petrogena", "id"=>"302551", "leaf"=>"0", "scientific_name"=>"Arabidopsis petrogena", "tags"=>[{"name"=>["Arabidopsis petrogena"], "authority"=>["Arabidopsis petrogena (A.Kern.) V.I.Dorof.", "Cardaminopsis
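If you want to check your understanding of the gsub line in the cell above, here is the first pattern applied to a single made-up tag. The sample input is an assumption (we no longer know exactly what the malformed tags looked like); the regular expression is the one from the cell.

```ruby
# Hypothetical malformed input: element names must not contain spaces.
broken = "<scientific name>Arabidopsis</scientific name>"

# Captures: \1 = optional "/", \2 = first word, \3 = second word.
# The replacement glues the two words together, dropping the space.
fixed = broken.gsub(/<(\/?)(\w+)\s(\w+)>/, '<\1\2\3>')

puts fixed
```

The second gsub! in the cell does the same job for tags made of three words.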

## TASK 5A: Prove that you understand Gems and Documentation by using a different Web resource

Use the Gene Ontology again.  Find the Ruby Gem that handles Gene Ontology (GO) files.

Reading the documentation, you see that it reads GO from a file, so you will need to create that data file.  Jupyter has specific locations for data files (see the documentation here:  http://jupyter.readthedocs.io/en/latest/projects/jupyter-directories.html)

In your code, retrieve the GO Slim Plant Ontology:
http://www.geneontology.org/ontology/subsets/goslim_plant.obo
and write it to a file (this is how you do that:)

<code>
    File.open('geneontology.obo', 'w') do |myfile|  # w makes it writable
      myfile.puts geneontologycontent  
    end  
</code>
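A minimal sketch of the write-then-check cycle, so that you can see the whole round trip. The content here is a short placeholder string (an assumption, standing in for the downloaded ontology), and the file goes into a temporary folder so nothing is left behind; in your task the content would be the body of the HTTP response and the file would live next to your notebook.

```ruby
require 'tmpdir'

# Placeholder text standing in for the downloaded ontology.
geneontologycontent = "[Term]\nid: GO:0005634\nname: nucleus\n"
read_back = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'geneontology.obo')

  File.open(path, 'w') do |myfile|   # 'w' opens the file for writing
    myfile.puts geneontologycontent
  end                                # the file is closed automatically here

  read_back = File.read(path)        # read the whole file back as one String
end

puts read_back
```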

Now follow the documentation for the GeneOntology object and try to create a Ruby script that outputs the name and definition for GO:0005634

In [5]:
require 'gene_ontology'
go = GeneOntology.new.from_file("geneontology.obo")
term = go.id_to_term["GO:0005634"]
term.name
term.xref

["NIF_Subcellular:sao1702920020", "Wikipedia:Cell_nucleus"]

# Learn something new in Ruby --> Blocks

Look at the documentation for the "each" method of the GeneOntology Term object:

--------------

### Instance Method Details

__#each(&block) ⇒ Object__

Starting with that term, it traverses upwards in the tree. So if you call term.each, it will go up the Gene Ontology tree until it gets to the "root" (depending on the tree, this will be "biological process", "molecular function", or "cellular component")... but what does it do with those terms???

This is what a "&block" is.  It gives you the chance to tell the Object what __you want__ it to do!

&block is, therefore, a piece of code that you provide to the method, where the object sends information into your block of code.

For example:



In [7]:
require 'net/http'   # this is how you access the Web
require 'gene_ontology'  # the gem for gene ontology obo files

go = GeneOntology.new.from_file("geneontology.obo")
term = go.id_to_term['GO:0003676']

# There are two ways to pass a block of code.  You can do it all on one line:
term.each {|thisterm| puts "The term #{thisterm.name} is at level #{thisterm.level} of the ontology"}
puts ""; puts""

# or you can do it on multiple lines as follows
term.each do |thisterm| 
  puts "The term #{thisterm.name} is at level #{thisterm.level} of the ontology"
end

The term nucleic acid binding is at level 2 of the ontology
The term binding is at level 1 of the ontology
The term molecular_function is at level 0 of the ontology


The term nucleic acid binding is at level 2 of the ontology
The term binding is at level 1 of the ontology
The term molecular_function is at level 0 of the ontology


[<[1]GO:0005488: binding is_a.size=1>]
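Seen from the other side, this is roughly what it looks like to write a method that accepts a block. The method name (each_codon) and the sequence are made up for illustration: the method decides when the block is called, and the caller decides what the block does.

```ruby
# Hypothetical method: cut a sequence into groups of three letters
# and hand each group to whatever block the caller supplied.
def each_codon(sequence)
  sequence.scan(/.../) { |codon| yield codon }
end

codons = []
each_codon("ATGGCCTGA") { |codon| codons << codon.downcase }
puts codons.inspect
```

Here "yield" plays the same role as block.call: it sends one value into your block of code.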

## Note that this is an example of polymorphism (the same method name, ".each", on a different kind of object)

You already know what ".each" does on list objects...



In [8]:
[1,2,3,4].each do |number|
  puts "#{number} plus 1 equals #{number + 1}"
end

1 plus 1 equals 2
2 plus 1 equals 3
3 plus 1 equals 4
4 plus 1 equals 5


[1, 2, 3, 4]

The author of the GeneOntology object wanted to provide exactly the same functionality, but could not use the native .each method of the object, because.... (well... honestly, because the object is not a list!)  So the author implemented their own ".each" method, which takes a block, and then traverses along the ontology tree.  You can see the code in 

    /home/osboxes/.rvm/gems/ruby-2.4.2/gems/gene_ontology-0.0.1/lib/gene_ontology.rb

<code>
    def each(&block)
      block.call(self)
      is_a.each do |term|
        term.each(&block)
      end
    end

</code>

So now their code functions exactly like .each does on a list, but the list is generated by calling other methods (is_a) that traverse up the ontology tree.

That's quite cool!
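You can try the same trick on a class of your own. The sketch below uses a made-up Node class as a stand-in for an ontology term. Including Enumerable is an extra step I have added for illustration (I am not claiming the gene_ontology gem does this): once your class has an each method, Enumerable gives you map, select, and friends for free.

```ruby
# Hypothetical stand-in for an ontology term: each node knows its parents.
class Node
  include Enumerable   # builds map, select, to_a, ... on top of each

  attr_reader :name, :parents

  def initialize(name, parents = [])
    @name = name
    @parents = parents
  end

  # Hand this node to the block, then recurse upwards through the parents.
  def each(&block)
    block.call(self)
    parents.each { |parent| parent.each(&block) }
  end
end

root = Node.new("molecular_function")
mid  = Node.new("binding", [root])
leaf = Node.new("nucleic acid binding", [mid])

names = leaf.map(&:name)   # map comes from Enumerable and uses our each
puts names
```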

# Prove you understand

Create code that searches for the term "receptor activity", then reports the GO number, GO Term, and the definition (def) for each of the parent terms.

# Ruby Documentation

The documentation provided on the rubygems website is contained inside of the Gems.  The authors provide documentation in either RDoc format, or YARD format.  We are going to look at YARD.

Good documentation is __critical__ if you write code for others to use!  

__NOTE:  I will now start including the quality of documentation in my evaluation of your assignments!!!__


YARD is explained on their website:  https://yardoc.org/guides/index.html

We will begin with a simple Patient.rb class:



In [9]:
class Patient

  attr_accessor :name
  attr_accessor :age 
  
  def initialize(thisname = "Some Person", thisage = 10)
      @name = thisname 
      @age = thisage   # use the value passed in (default is the Integer 10)
  end
  
end

:initialize

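This class has an "initialize" method (called via Patient.new()), plus the "name" and "age" accessors.  Each attr_accessor defines both a reader and a writer (for example, "name" and "name=").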
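Strictly speaking, each attr_accessor line defines two methods: a reader and a writer. A quick way to see this for yourself is to ask the class which public methods it defines (the sort is only there to make the order predictable):

```ruby
class Patient
  attr_accessor :name
  attr_accessor :age
end

# false = list only the methods defined in Patient itself, not inherited ones
accessor_methods = Patient.instance_methods(false).sort
puts accessor_methods.inspect
```

"initialize" does not appear in the list because Ruby makes it private; you reach it through Patient.new.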

We should document the behavior of these.   This is how it looks when we document the object using YARD tags


In [10]:
# == Patient
#
# This is a simple representation of a patient
# with name and age attributes
#
# == Summary
# 
# This can be used to represent aspects of sick people
#

class Patient

  # Get/Set the patient's name
  # @!attribute [rw]
  # @return [String] The name
  attr_accessor :name

  # Get/Set the patient's age
  # @!attribute [rw]
  # @return [Integer] The age
  attr_accessor :age 
  
  # Create a new instance of Patient
  #
  # @param name [String] the name of the patient as a String
  # @param age [Integer] the age of the patient as an Integer
  # @return [Patient] an instance of Patient
  def initialize (name = "Some Person", age = 10) 
      @name = name 
      @age = age
  end
end

:initialize
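The same tags work for ordinary methods. Here is a sketch with a made-up method (grow_older is not part of the Patient class above; I have invented it to show a few more YARD tags: @raise for errors and @example for a usage snippet).

```ruby
class Patient
  # Get/Set the patient's age
  # @return [Integer] The age
  attr_accessor :age

  # Create a new instance of Patient
  #
  # @param age [Integer] the age of the patient
  def initialize(age = 10)
    @age = age
  end

  # Add years to the patient's age.
  #
  # @param years [Integer] how many years to add
  # @return [Integer] the new age
  # @raise [ArgumentError] if years is negative
  # @example
  #   patient = Patient.new(10)
  #   patient.grow_older(2) #=> 12
  def grow_older(years = 1)
    raise ArgumentError, "years must not be negative" if years.negative?
    @age += years
  end
end

patient = Patient.new(10)
puts patient.grow_older(2)
```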

# Generating YARD documentation

Open a terminal window.  Browse to /home/osboxes/UPM_BioinfoCourse/Lectures

Edit the Patient.rb file to include the YARD documentation above, then save it.

Now, in your terminal type:

    $  yard doc Patient.rb
    
You get a short report about how many things were documented.  All of the documentation is in a new folder called "doc"

Staying at the command prompt, type:

    $ firefox ./doc/index.html
    
There is your documentation!  



# Try it yourself

Explore the documentation for yourself.  Look at the http://www.rubydoc.info/gems/yard/file/docs/GettingStarted.md  website and try some other tags and markup.  Also look up RDoc, since the YARD documentation tool can also understand RDoc instructions.
    

# Good Code "Hygiene"

When you create a Class that you expect others to use, you should be polite and put your Classes into their own "Namespace".

For example, if you have a class "Patient", you could reasonably assume that someone else, somewhere in the world, has also defined a class called "Patient".  If one of your users has downloaded both your code and that other person's code, they will now have two different Class definitions for "Patient", and 

<code>require 'Patient'</code>

will now be an ambiguous statement.

What to do?   Simply put your objects into a folder that has a name that is unlikely to be used by anyone else - for example, your initials, or the project name.  

Look in the <code>UPM_BioinfoCourse/Lectures/</code> folder.  You will see there is a folder called "Mdw_objects".  Inside of that folder I have a file (patient.rb) containing a Class definition (Patient).

The content of that file is slightly different: it uses a "Module", which is a way to group classes into a namespace.  The code is:


In [11]:
module Mdw_objects
  class Patient  # this could also be Mdw_objects::Patient, if you want to be extremely clear!
    attr_accessor :name
    def initialize(thisname = "Some Person")
      @name = thisname
    end
  end
end

:initialize

The way you use this object is as follows:

In [12]:
require './mdw_objects/patient'  # my class definition is in a different namespace, so it won't collide with someone else's

p = Mdw_objects::Patient.new()

#<Mdw_objects::Patient:0x0000000002a7b460 @name="Some Person">
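To convince yourself that the namespace really prevents the collision, here is a sketch in which two modules each define their own Patient class. The second module (Other_lab) and the describe method are made up for the demonstration.

```ruby
module Mdw_objects
  class Patient
    def describe
      "Patient from Mdw_objects"
    end
  end
end

# Hypothetical second library that also defines a Patient class.
module Other_lab
  class Patient
    def describe
      "Patient from Other_lab"
    end
  end
end

mine   = Mdw_objects::Patient.new
theirs = Other_lab::Patient.new

puts mine.describe
puts theirs.describe
```

Both classes are called Patient, but their full names are different, so Ruby never confuses them.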

## NOTE:  the naming conventions for Ruby are a bit surprising...

see:  http://guides.rubygems.org/name-your-gem/


    GEM NAME               REQUIRE STATEMENT               MAIN CLASS OR MODULE
    ruby_parser            require 'ruby_parser'           RubyParser
    rdoc-data              require 'rdoc/data'             RDoc::Data
    net-http-persistent    require 'net/http/persistent'   Net::HTTP::Persistent
    net-http-digest_auth   require 'net/http/digest_auth'  Net::HTTP::DigestAuth
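The require column of that table follows a mechanical rule: a dash in the Gem name becomes a folder separator, and underscores are left alone. The helper below (require_path_for is my own name, not part of RubyGems) is a small sketch of that rule. Remember that it is only a convention: the xml-simple Gem we used earlier is required as 'xmlsimple', so not every Gem follows it.

```ruby
# Dashes in a gem name mark namespace boundaries, so they become
# directory separators in the require path; underscores are kept.
def require_path_for(gem_name)
  gem_name.tr('-', '/')
end

%w[ruby_parser rdoc-data net-http-persistent net-http-digest_auth].each do |gem_name|
  puts "#{gem_name} -> require '#{require_path_for(gem_name)}'"
end
```

The class-name column cannot be derived so easily (nothing in "net-http" tells you that HTTP is written in capitals), which is one reason to check the documentation.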

## That was easy!

The next thing you will want to do is help your users install your code.  You want to create your own Ruby Gem.

Go to your terminal window, and type:

<code>gem install bundler</code>

Bundler is a Ruby Gem that builds Ruby Gems.  Our "Patient" class (Mdw_objects::Patient) should follow the naming conventions shown above.  i.e., the name of the Gem should be:  mdw_objects-patient

To begin creating your Gem go to the command line and type:

<code>bundle gem mdw_objects-patient</code>

 *  It will ask you if you want to generate "test files" - for the moment, say "none"
 * Answer "y" to the question about licenses (you can change your mind later!)
 * Answer "n" to the code-of-conduct question (you can change your mind later!)
 
 It will now create a mdw_objects-patient folder for you.  Now, in your terminal, type the commands:
 
 <code>$  cd mdw_objects-patient</code>

 <code>mdw_objects-patient$  ls</code>
 
 <code>bin  Gemfile  lib  LICENSE.txt  mdw_objects-patient.gemspec  Rakefile  README.md</code>

## now look at the metadata file

<code>
**mdw_objects-patient$ cat mdw_objects-patient.gemspec **


lib = File.expand_path("../lib", _\_FILE\_\_)
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
require "mdw_objects/patient/version"

Gem::Specification.new do |spec|
  spec.name          = "mdw_objects-patient"
**  spec.version       = MdwObjects::Patient::VERSION **
**  spec.authors       = ["Mark Wilkinson"]**
**  spec.email         = ["markw@illuminae.com"]**

**  spec.summary       = %q{TODO: Write a short summary, because RubyGems requires one.} **
**  spec.description   = %q{TODO: Write a longer description or delete this line.} **
**  spec.homepage      = "TODO: Put your gem's website or public repo (your GitHub!) URL here." **
  spec.license       = "MIT"

  
  if spec.respond_to?(:metadata)
    spec.metadata["allowed_push_host"] = "TODO: Set to 'http://mygemserver.com'"
  else
    raise "RubyGems 2.0 or newer is required to protect against " \
      "public gem pushes."
  end

  spec.files         = `git ls-files -z`.split("\x0").reject do |f|
    f.match(%r{^(test|spec|features)/})
  end
  spec.bindir        = "exe"
  spec.executables   = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
  spec.require_paths = ["lib"]

  spec.add_development_dependency "bundler", "~> 1.16"
  spec.add_development_dependency "rake", "~> 10.0"
end
</code>

The GEM Bundler has done some nice things for us!  It has created a "VERSION" constant (do you know what a "constant" is?)
  
The constant is available from <code>MdwObjects::Patient::VERSION</code>
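In case "constant" is new to you: in Ruby, any name that starts with a capital letter is a constant, and you reach a constant inside a module with "::". The sketch below defines the same nested modules by hand and then uses Gem::Version (a class that ships with RubyGems) to compare version numbers properly; the "0.2.0" value is just an example.

```ruby
module MdwObjects
  module Patient
    VERSION = "0.1.0"   # a constant: the name starts with a capital letter
  end
end

# Reach into the nested modules with "::"
puts MdwObjects::Patient::VERSION

# Version strings are best compared as versions, not as plain text.
current = Gem::Version.new(MdwObjects::Patient::VERSION)
newer   = Gem::Version.new("0.2.0")
puts current < newer
```

Ruby will let you assign a new value to a constant, but it prints a warning when you do, which is a hint that you are not supposed to.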

To see this, go into the "Lectures/mdw_objects-patient/lib" folder, then start the Ruby command line:
  
  <pre>
  /mdw_objects-patient/lib$   irb
    
  <code>2.4.2 :002 > require './mdw_objects/patient/version'</code>
  <code> => true </code>
  <code>2.4.2 :003 > require './mdw_objects/patient'</code>
  ....ignore all of the error messages here
  ....
  ....
  ....
  <code>2.4.2 :004 > puts MdwObjects::Patient::VERSION</code>
  0.1.0
  => nil 
  2.4.2 :005 > exit
  
  /mdw_objects-patient/lib$   cat ./mdw_objects/patient/version.rb

  module MdwObjects
    module Patient
      VERSION = "0.1.0"   # "VERSION" is defined here.  You will update this every time you release a new version of your Gem
    end
  end
  </pre>

## last step - move your code 

Notice that Bundler has already created a mdw_objects/patient.rb file for you, but it is almost empty:


    require "mdw_objects/patient/version"

    module MdwObjects
       module Patient
           #### Your code goes here...
       end
    end

All you need to do is copy your code into that module (or add the first line   <code>require "mdw_objects/patient/version"</code>    to your existing patient.rb file and then replace this file with it).


##  Release your Gem to the world!

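We will not do this, but the final step is to build the Gem file and then push it:

    Lectures/mdw_objects-patient$  gem build mdw_objects-patient.gemspec
    Lectures/mdw_objects-patient$  gem push mdw_objects-patient-0.1.0.gem
    
You are then asked to log in to RubyGems.  Once you do, your Gem will be deposited so that anyone can install it by typing 

     "gem install mdw_objects-patient"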
