# Intermine-Python: Tutorial 5: Query Results

This tutorial covers how to work with the results of a query. You can either store the results in a file (using a library like `csv`) or process them immediately after you retrieve them.

We will write a short query and will then explore the Results class of InterMine. 

In [None]:
from intermine.webservice import Service
service = Service("www.flymine.org/flymine/service")
query = service.new_query("Gene")
query.select("publications.*")
query.add_constraint("Gene", "LOOKUP", "zen", extra_value="D. melanogaster")

Once we have added our constraints and views, we are ready to look at the results. Results can be returned row by row as dictionaries, lists, ResultRow objects (the most common choice), or as strings in CSV or TSV format.

##### Valid row types are: 

- dict
- list
- rr
- csv
- tsv 

In [None]:
# This syntax is probably familiar from previous tutorials:
for gene in query.rows():
    print(gene)

In [None]:
# There's also an alternate syntax:
# for row in query.results(row_type):
#     # code to output results here,
#     # e.g. maybe print(row).
# Try writing a for loop that uses the "rr" (ResultRow) type.



Iterating through `query.results(row="rr")` and iterating through `query.rows()` are equivalent. Feel free to use whichever you feel more comfortable with. If you want to extract only specific columns, it may be easier to use `"list"` instead of `"rr"`. Suppose you want the second and third columns (indexes 1 and 2), i.e. `publications.doi` and `publications.firstAuthor`: call `query.results("list")` and print the columns by index.

In [None]:
# Try writing a for loop that iterates over query.results("list")
# and prints the column at index 1 (doi) and the column at index 2 (author) of each row.
# The syntax to access a single column from a list is list[index_in_list],
# e.g. for the second item in a list, access list[1],
# since lists start counting at 0.



Maybe we only want to print those rows where `publications.doi` is not `None`. We can add an `if` condition to do this.

In [None]:
# Same as before, but add a check such as `if row[1] is not None:` before you print.
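
The index-and-filter pattern can be tried offline first. The sketch below does not contact FlyMine: `sample_rows` is a hypothetical stand-in for what `query.results("list")` might yield, with invented values, and `rows_with_doi` is a helper name introduced only for this illustration.

```python
# Hypothetical stand-in for the rows yielded by query.results("list").
# All values are invented for illustration; column order is assumed to be
# title (index 0), doi (index 1), first author (index 2).
sample_rows = [
    ["Example title A", "10.1000/example.a", "Author A"],
    ["Example title B", None, "Author B"],
    ["Example title C", "10.1000/example.c", "Author C"],
]


def rows_with_doi(rows, doi_index=1):
    """Yield only the rows whose DOI column is not None."""
    for row in rows:
        if row[doi_index] is not None:
            yield row


kept = list(rows_with_doi(sample_rows))
for row in kept:
    # Print the DOI (index 1) and the first author (index 2).
    print(row[1], row[2])
```

Using `is not None` rather than `!= None` is the usual Python idiom for this check.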



You can pass two more parameters when calling `query.results()`: `start` and `size`. `start` is the index of the row that you want to start processing from. By default this is set to 0 (the first row). `size` is the number of rows that you want to retrieve. Let's say we want to print rows 10 and 11 only.

In [None]:
# use a for loop to print rows, 
# - starting at 10, 
# - with a size of 2, 
# - with the row type "rr".
# Syntax to limit the query: query.results(row="row_type", size=some_size, start=start_index)
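
To see which rows a given `start` and `size` select, here is a local analogue built on `itertools.islice`. It is only a sketch of the semantics described above (a zero-based offset plus a row count), not the library's implementation; `window` and `all_rows` are names made up for this example.

```python
from itertools import islice


def window(rows, start=0, size=None):
    """Skip `start` rows, then yield at most `size` rows (all remaining if size is None)."""
    stop = None if size is None else start + size
    return islice(rows, start, stop)


# Twenty placeholder rows labelled by their zero-based index.
all_rows = [f"row {i}" for i in range(20)]

page = list(window(all_rows, start=10, size=2))
print(page)  # ['row 10', 'row 11']
```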


### Working with CSVs

If you prefer to work with rows as strings, i.e. the `csv` and `tsv` row types, you can use them too. First, we need to import the `csv` library. To read your results in CSV format, create a `csv.reader` object. Let's try it below.

In [None]:
import csv

In [None]:
# Let's create a csv reader object assigned to the variable csv_reader
# The syntax to create a reader is csv.reader(input_data, delimiter=",", quotechar='"')
# 
# In this case the input data is a query.results() iterator with 
# - row type set to csv,
# - start at result 10
# - select 10 results
# e.g. query.results(row="csv", size=10, start=10)
# Now we just need to glue those elements together:
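
`csv.reader` accepts any iterable that yields strings, which is why an iterator of CSV rows can be passed to it directly. The sketch below shows this with a plain list of strings standing in for the query results; the lines in `csv_lines` are invented, and `sample_reader` is a name chosen for this example only.

```python
import csv

# Hypothetical stand-in for query.results(row="csv"):
# an iterable of CSV-formatted strings, one per result row.
csv_lines = [
    '"Example title, with a comma","10.1000/example.a","Author A"',
    '"Example title B","","Author B"',
]

# The quote character lets a field contain the delimiter itself.
sample_reader = csv.reader(csv_lines, delimiter=",", quotechar='"')
parsed = list(sample_reader)

for row in parsed:
    print(row[0])
```

Each parsed row is a list of strings, so individual columns can be picked out by index just as with the `"list"` row type.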



In [None]:
for row in csv_reader:
    print(row[0])

If you have used the csv library before, try writing your results to a CSV file using the writer class. If you have not used it before, try going through the documentation first and then writing the code on your own. It is fairly self-explanatory. The documentation can be found at: https://docs.python.org/2/library/csv.html
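
As a starting point, here is a minimal sketch of `csv.writer` that assumes Python 3. The rows are invented placeholders rather than real query results, the header names are chosen only for illustration, and the file goes to a temporary directory so nothing in your working folder is overwritten.

```python
import csv
import os
import tempfile

# Placeholder rows; in practice these would come from query.results("list").
rows = [
    ["title", "doi", "firstAuthor"],  # header row (names chosen for illustration)
    ["Example title A", "10.1000/example.a", "Author A"],
    ["Example title B", "", "Author B"],
]

out_path = os.path.join(tempfile.mkdtemp(), "publications.csv")

# newline="" is what the csv module recommends when opening files in Python 3.
with open(out_path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerows(rows)

# Read the file back to confirm that what was written matches the input.
with open(out_path, newline="") as handle:
    round_trip = list(csv.reader(handle))

print(round_trip)
```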

### Summarized results

The last thing that we will look at in this tutorial is the `summarise` method. This method is particularly useful when we want some basic statistics for a particular column. We will look at statistics for the length of the genes in the list `PL FlyAtlas_brain_top`, which contains the most enriched genes in the adult fly brain. We begin by creating a query, then add the views and the list constraint.

In [None]:
# query setup
query2 = service.new_query()
query2.select("Gene.*", "organism.*")
query2.add_constraint("Gene", "IN", "PL FlyAtlas_brain_top")

We then print out the first 10 rows of results. 

In [None]:
for row in query2.rows(size=10):
    print(row)

We can look at the summary of the length of each gene. This contains some useful information such as the average length and the maximum and minimum length. 

In [None]:
# Let's print the summary of the genes' length
# Syntax for a summary: query2.summarise(field_name)
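
To make the statistics concrete, the sketch below computes the same kind of figures (minimum, maximum and average) locally on a handful of invented lengths. It is only an illustration of what such a summary describes: it is not how InterMine calculates its summary, and `local_summary` and `sample_lengths` are hypothetical names.

```python
from statistics import mean


def local_summary(values):
    """Return min, max and average of the non-missing values."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "average": mean(present),
    }


# Invented gene lengths; None marks a gene with no recorded length.
sample_lengths = [1200, 3400, 560, None, 2840]

summary = local_summary(sample_lengths)
print(summary)
```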



This brings us to the end of the fifth tutorial. The next tutorial will be about further management of results. 