# Creating an InterMine workflow using the API

We are going to use the Python API to re-create the workflow we built earlier in the web interface.

We start by importing the `Service` class from InterMine's `webservice` module. To access your HumanMine account through the API you need an API token. You can get your token by logging into [HumanMine](http://www.humanmine.org/) and going to the account details tab within MyMine. Copy and paste your token into the code below.

In [None]:
from intermine.webservice import Service
service = Service("http://www.humanmine.org/humanmine/service", token = "Your Token")


Our first query looked at genes that are upregulated in adipose tissue. Using the API, we can generate either a query object or a template object to do this. The code below shows how to generate a query object. The string `"AtlasExpression"` passed to `service.new_query` defines the root class of the query.

Running the template through the API is very similar, except that we generate a template object rather than a query object: `template = service.get_template('TissueAtlas_Expression')`, where `TissueAtlas_Expression` is the name of the template.

In [None]:
# Create a new query against the root class "AtlasExpression"
# Syntax: service.new_query("root_class_here")
query = service.new_query("AtlasExpression")

First we will define the output columns that we want in our result, i.e. the view. Note that we have started our query from the `AtlasExpression` class. `condition`, `expression`, `pValue` and `tStatistic` are attributes of this class. The `Gene` class is referenced from the `AtlasExpression` class, so to return the gene information we give the path to that information from the `AtlasExpression` class, e.g. `gene.primaryIdentifier`.

In [2]:
# Now select the following views: 
#
# "condition", "gene.primaryIdentifier", "gene.symbol", "gene.name",
# "expression", "pValue", "tStatistic", "dataSets.name"
#
# The syntax to do so is query.add_view("comma","separated", "set", "of", "views")
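The idea of a view path can be illustrated without contacting HumanMine. The sketch below is plain Python, not the InterMine API: `record` and `resolve_path` are hypothetical stand-ins, and the values are made up. It only shows how a dotted path such as `gene.symbol` starts at the root object and follows a reference to reach an attribute.

```python
# Stand-in for one AtlasExpression result (hypothetical, made-up values).
record = {
    "condition": "Adipose tissue",
    "expression": "UP",
    "gene": {"primaryIdentifier": "ID1", "symbol": "GENE1"},
}


def resolve_path(obj, path):
    """Follow a dotted path such as 'gene.symbol' through nested dicts."""
    for part in path.split("."):
        obj = obj[part]
    return obj


# An attribute of the root class needs no prefix ...
print(resolve_path(record, "condition"))
# ... while an attribute of a referenced class is reached through the reference.
print(resolve_path(record, "gene.symbol"))
```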



Next, add the constraints to your query.  We are only interested in genes expressed in Adipose tissue with a pValue <= 0.01.

In [3]:
# Syntax: query.add_constraint("view_name", "operator", "value")
#
# Let's add two constraints: 
# - Set "condition" to be equal to "Adipose tissue"
# - Set "pValue" to be less than or equal to "0.01"



Now, let's check what the query returns by looping through the rows and printing the results:

In [None]:
for row in query.rows():
    print (row["condition"], row["gene.primaryIdentifier"], row["gene.symbol"], row["gene.name"], 
        row["expression"], row["pValue"], row["tStatistic"], row["dataSets.name"])

Note that this gives a lot of rows.  If we just want to check we are getting the right results we could print just the first 10 rows:

In [None]:
# To add size, the syntax for query.rows becomes 
# query.rows(start=some_number, size=number_of_results_wanted)
#
# Try it yourself - print in a for loop, the same as above, 
# but in the query.rows method we add the two arguments: 
# - start should be set to 0 (i.e. start at the first result)
# - size should be set to 10 (please only show the first ten results)
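As a rough picture of what `start` and `size` mean, here is a plain-Python sketch that pages through a stand-in list. It assumes, as the comments above describe, that `start` is an offset and `size` is a maximum number of results; `all_rows` and `page` are hypothetical names and are not part of the InterMine client.

```python
# Stand-in for a full result set: 25 made-up row labels.
all_rows = ["row{}".format(i) for i in range(25)]


def page(rows, start=0, size=None):
    """Return at most `size` rows beginning at offset `start`."""
    end = None if size is None else start + size
    return rows[start:end]


first_ten = page(all_rows, start=0, size=10)   # rows 0 to 9
next_ten = page(all_rows, start=10, size=10)   # rows 10 to 19
last_page = page(all_rows, start=20, size=10)  # only 5 rows are left

print(len(first_ten), len(next_ten), len(last_page))
```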



Remember that when we looked at the results table, we used the filter options to show only the genes whose expression is "UP" in adipose tissue. We can do the same here by adding another constraint to our query. (We could have included it in our first set of constraints.)

In [None]:
# Same constraint syntax as before! 
# query.add_constraint("view_name", "operator", "value")
# This time, the view "expression" should equal the value "UP",
# and we'll add a 4th argument - code = "A"
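To see what the three constraints select together, the following plain-Python sketch applies the same conditions to a few made-up rows. It does not use the InterMine API, and it assumes the constraints are combined with AND; `rows` and `passes` are hypothetical names.

```python
# Made-up stand-in rows; only the first one satisfies all three conditions.
rows = [
    {"condition": "Adipose tissue", "expression": "UP", "pValue": 0.001, "gene": "G1"},
    {"condition": "Adipose tissue", "expression": "DOWN", "pValue": 0.005, "gene": "G2"},
    {"condition": "Adipose tissue", "expression": "UP", "pValue": 0.2, "gene": "G3"},
    {"condition": "Liver", "expression": "UP", "pValue": 0.001, "gene": "G4"},
]


def passes(row):
    """Mirror the three constraints, assuming they are ANDed together."""
    return (
        row["condition"] == "Adipose tissue"
        and row["pValue"] <= 0.01
        and row["expression"] == "UP"
    )


selected = [row["gene"] for row in rows if passes(row)]
print(selected)
```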

 

Now let's check our results again.

In [None]:
for row in query.rows(start=0, size=10):
    print (row["condition"], row["gene.primaryIdentifier"], row["gene.symbol"], row["gene.name"], 
        row["expression"], row["pValue"], row["tStatistic"], row["dataSets.name"])

We want to save this set of genes that are upregulated ("UP") in adipose tissue for further analysis. To do this, we define a Python list and loop through our results again. This time, instead of printing the results, we append only the primary identifier from each row to our list.

In [None]:
# let's make an empty python list called UpinAdipose
UpinAdipose = list()

# now let's use a for loop on query results and select just 
# the gene primary identifiers 
# then append them to our UpinAdipose list. 



Then check that the list we have created looks correct:

In [None]:
print(UpinAdipose)
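The append loop can be rehearsed offline before running it against HumanMine. This is a minimal sketch over made-up stand-in rows (plain dictionaries, not real query results), with hypothetical names throughout. It also shows one optional way to drop repeated identifiers, in case the same gene appears in more than one row.

```python
# Made-up stand-in rows keyed by the same path used in the view.
stand_in_rows = [
    {"gene.primaryIdentifier": "ID1", "gene.symbol": "AAA"},
    {"gene.primaryIdentifier": "ID2", "gene.symbol": "BBB"},
    {"gene.primaryIdentifier": "ID1", "gene.symbol": "AAA"},
]

# Collect just the primary identifier from each row.
identifiers = list()
for row in stand_in_rows:
    identifiers.append(row["gene.primaryIdentifier"])

# Optional: remove duplicates while keeping the original order.
unique_identifiers = list(dict.fromkeys(identifiers))

print(identifiers)
print(unique_identifiers)
```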

We now need to save the list to our InterMine account so we can use it again in a later query. The `ListManager` class provides methods to manage list contents and operations.

In [5]:
# first let's make a new list manager assigned to the variable lm
# the syntax to make a list manager is service.list_manager()
 

# next, we want to put the contents of UpinAdipose into an InterMine list.
# The syntax is lm.create_list(content=a_list_of_ids, list_type="identifier_class", name="some name")
# In this case, you'll want to set the following arguments:
# - content should be UpinAdipose
# - list_type is "Gene"
# - name - could be anything you want, but let's be consistent and call it "UpinAdipose"



[Log in to HumanMine](http://www.humanmine.org/) and check your list has been created.

Our second query looked at whether any of the genes that were UP expressed in adipose tissue interact with the pparg gene. First, we define our new query object.  This time we start our query from the Gene class:

In [6]:
# Create a new query against the root class "Gene"
# Syntax: service.new_query("root_class_here")
query2 = service.new_query("Gene")

Add the views and constraints:

In [None]:
query2.add_view(
    "primaryIdentifier", "symbol",
    "interactions.participant2.primaryIdentifier",
    "interactions.participant2.symbol", "interactions.details.type",
    "interactions.details.role1", "interactions.details.role2",
    "interactions.details.experiment.interactionDetectionMethods.name",
    "interactions.details.experiment.publication.pubMedId",
    "interactions.details.dataSets.name"
)



In [None]:
# Syntax: query.add_constraint("root_class", "operator", "value", "optional_extra_value", constraint_code)
#
# Constraint A - lookup a gene called "pparg" in H.sapiens
# Constraint B - set "interactions.participant2" to be IN the "UpinAdipose" list




An interaction has two participants. Our first participant is from the `Gene` class, and we have constrained it to be the gene PPARG. Note that the pparg constraint is a LOOKUP. The LOOKUP operator searches through all the fields of a particular class for the value specified. In this example, it searches through the entire `Gene` class to find any field that contains an occurrence of "pparg". The advantage is that you do not need to remember whether pparg is a symbol, a name or a primaryIdentifier.

Our second participant is reached through the interactions and is called `participant2`. It is a bioentity, like `Gene`, and so shares some of its attributes, such as primary identifier and symbol.
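The idea behind LOOKUP can be sketched in plain Python: instead of testing one named field, every candidate field is tested for the search term. This is only an illustration with made-up records. The `lookup` function, the choice of fields, and the case-insensitive exact matching are assumptions, and the real operator's matching rules may differ.

```python
# Made-up stand-in gene records (hypothetical identifiers and names).
genes = [
    {"primaryIdentifier": "ID1", "symbol": "PPARG", "name": "made-up name one"},
    {"primaryIdentifier": "ID2", "symbol": "AAA", "name": "made-up name two"},
]


def lookup(records, term, fields=("primaryIdentifier", "symbol", "name")):
    """Return records where any listed field equals `term`, ignoring case."""
    term = term.lower()
    return [
        record
        for record in records
        if any(str(record.get(field, "")).lower() == term for field in fields)
    ]


matches = lookup(genes, "pparg")
print([gene["primaryIdentifier"] for gene in matches])
```

The caller never says which field holds "pparg", which is the convenience the paragraph above describes.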

Check the results:

In [None]:
for row in query2.rows():
    print (row["primaryIdentifier"], row["symbol"], 
        row["interactions.participant2.primaryIdentifier"], row["interactions.participant2.symbol"], 
        row["interactions.details.type"], row["interactions.details.role1"], 
        row["interactions.details.role2"], 
        row["interactions.details.experiment.interactionDetectionMethods.name"], 
        row["interactions.details.experiment.publication.pubMedId"], 
        row["interactions.details.dataSets.name"])

Save the genes that interact with pparg to a Python list, then save this list to your InterMine account.

In [None]:
# Make a new (python) list to store the interesting genes
UpinAdiposeInteractPparg = list()

# Loop through the identifiers and store them in the new list 
for row in query2.rows():
    UpinAdiposeInteractPparg.append(row["interactions.participant2.primaryIdentifier"])

In [None]:
# We created a list manager earlier, called lm.
# Now we need to use the list manager to save another list.
# 
# syntax reminder: lm.create_list(content=a_list_of_ids, list_type="identifier_class", name="some name")
# 
# - content should be UpinAdiposeInteractPparg
# - list_type is "Gene"
# - name is "UpinAdiposeInteractPparg"



Next, run the third query, which finds genes associated with the disease diabetes and which we originally created using the query builder. Again, save the set of genes that are returned to your InterMine account.

In [None]:
query3 = service.new_query("Gene")

# Let's add views for "primaryIdentifier" and "symbol" using query.add_view()


# And let's give it some constraints using 
# query.add_constraint("view_name", "operator", "value", code = "constraint_code")
# 
# Constraint A: organism.name should equal Homo sapiens
# Constraint B: diseases.name should contain diabetes (operator is CONTAINS) 

# We've written the code to print it out for you. 
for row in query3.rows():
    print (row["primaryIdentifier"], row["symbol"])

In [None]:
# Make a python list of gene identifiers
diabetesGenes = list()
for row in query3.rows():
    diabetesGenes.append(row["primaryIdentifier"])

In [None]:
# One last time, we'll create a list and save it to our HumanMine account
#
# syntax: lm.create_list(content=a_list_of_ids, list_type="identifier_class", name="some name")
# 
# - content should be diabetesGenes
# - list_type is "Gene"
# - name is "diabetesGenes"
# Try it now: 


Finally, we used a list intersection to find the genes that are upregulated in adipose tissue, interact with pparg, and are also associated with the disease diabetes. We need to intersect the second (UpinAdiposeInteractPparg) and third (diabetesGenes) lists that we created. We can do this using the `intersect` method from the `ListManager` class.

In [None]:
# The syntax to create an InterMine list intersection is
# lm.intersect(["comma_separated", "list", "of_intermine_lists"], "name for new list")
#
# We want to intersect the last two lists we created - 
# "UpinAdiposeInteractPparg" and "diabetesGenes"
# try it now: 
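Conceptually, the intersection keeps only the identifiers present in both lists. The sketch below shows that idea with plain Python lists of made-up identifiers. It is not the `lm.intersect` call itself, and the variable names are hypothetical.

```python
# Made-up identifiers standing in for the two saved lists.
interacting_genes = ["ID1", "ID2", "ID3"]
diabetes_genes = ["ID3", "ID4", "ID1"]

# Use a set for fast membership tests, but keep the order of the first list.
diabetes_set = set(diabetes_genes)
common = [gene for gene in interacting_genes if gene in diabetes_set]

print(common)
```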



The list intersection was stored in our HumanMine account, so we need to use the `get_list` method to retrieve it from HumanMine.

In [7]:
# Syntax: lm.get_list("name of the intersected list you just created")
# Store it in a variable called final, so we can print it in the next step


In [None]:
print(final)

In [None]:
for gene in final:
    print (gene.primaryIdentifier, gene.symbol)