In [1]:
# !AUTOEXEC

%reload_ext gp
%reload_ext gp_widgets
%reload_ext gp_magics

# Don't have the GenePattern library? It can be downloaded from: 
# http://genepattern.broadinstitute.org/gp/downloads/gp-python.zip 
import gp

# The following widgets are components of the GenePattern Notebook extension.
try:
    from gp_widgets import GPAuthWidget, GPJobWidget, GPTaskWidget
except ImportError:
    def GPAuthWidget(input):
        print("GP Widget Library not installed. Please visit http://genepattern.org")
    def GPJobWidget(input):
        print("GP Widget Library not installed. Please visit http://genepattern.org")
    def GPTaskWidget(input):
        print("GP Widget Library not installed. Please visit http://genepattern.org")

# The gpserver object holds your authentication credentials and is used to
# make calls to the GenePattern server through the GenePattern Python library.
# Your actual username and password have been removed from the code shown
# below for security reasons.
gpserver = gp.GPServer("http://127.0.0.1:8080/gp", "", "")

# Return the authentication widget to view it
GPAuthWidget(gpserver)

GenePattern IPython Module Loaded!




In [5]:
# GenomeSpace Authentication

import httplib2, urllib, hashlib, base64, json

# Set login credentials, obtain gs-token
h = httplib2.Http('.cache')
username = 'gsnb'
password = 'gsnb1234'
h.add_credentials(username, password)

response = [] # list to cache log of responses (for debugging)
content = [] # list to cache log of content (for debugging)
(resp, cont) = h.request('https://identity.genomespace.org/identityServer/basic', 'GET')
response.append(resp)
content.append(cont)

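try:
    # "token" stores the "gs-token" cookie, which must be sent with every subsequent HTTP request
    token = {'Cookie': resp['set-cookie']}
except KeyError:
    # Stop here: without the cookie, "token" is undefined and every later request would fail
    raise RuntimeError('Authentication failed: no gs-token cookie in the response.')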

# Build default header for subsequent HTTP requests
header_default = token.copy()
header_default['Accept'] = 'application/json,text/plain'

# Setup JSON decoder
json_dec = json.JSONDecoder()

# Find subnetworks of differentially expressed genes and identify associated biological functions

## Summary
This recipe provides one method of using genes that are differentially expressed between two phenotypes, such as normal and tumor, to find subnetworks of interacting proteins and determine their functional annotations using Gene Ontology. In particular, this recipe makes use of several GenePattern modules to identify differentially regulated genes, then uses several Cytoscape plugins to identify potential interactions between gene products, and to visualize the resulting network.

>**Why differential expression analysis?** We assume that most genes are not expressed all the time, but rather are expressed in specific tissues, at specific stages of development, or under certain conditions. Genes whose expression in one condition, such as cancerous tissue, differs from their expression under normal conditions are said to be differentially expressed. To identify which genes change in response to specific conditions (e.g. cancer), we must filter or process the dataset to remove genes which are not informative.

>**Why protein interaction network analysis?** Gene expression analysis results in a list of differentially expressed genes, but it does not explain whether these genes are connected biologically in a pathway or network. To better understand the underlying biology that drives changes in gene expression, we can perform network analysis to determine whether gene products (e.g. proteins) are reported to interact. To identify potential networks or pathways, we search for highly interconnected subnetworks within a large interaction network.

## Input
To complete this recipe, we will need a gene expression dataset describing two conditions or phenotypes, such as cancer vs. normal tissue. In this example, we will use gene expression data from a study in which committed granulocyte macrophage progenitor cells (normal phenotype) were transformed into leukemia stem cells (leukemic phenotype) by introduction of the MLL-AF9 protein. This example data is derived from mouse (_Mus musculus_) cell lines. We will need the following datasets, which can be downloaded from GenomeSpace's Public folder:
>**Normals_Leu.gct**: This file contains gene expression data for two phenotypes: normal and leukemic. The file is in GenePattern's GCT format.

>**Normals_Leu.cls**: This file contains the class assignment (normal or leukemic) of every sample in the GCT file, in GenePattern's CLS format.
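
To make the two file formats concrete, here is a minimal sketch of readers for them. It assumes the GCT v1.2 layout (a `#1.2` line, a row/column count line, a tab-separated header starting with `Name` and `Description`, then one row per gene) and a categorical CLS file whose third line holds integer class indices. The function names `parse_gct` and `parse_cls` and the toy data are illustrative only and are not part of the GenePattern library or of the recipe's dataset.

```python
import io


def parse_gct(handle):
    """Parse a GCT v1.2 file into (sample_names, {gene_name: [values]})."""
    version = handle.readline().strip()
    if version != '#1.2':
        raise ValueError('Unsupported GCT version: %r' % version)
    counts = handle.readline().split()
    n_rows, n_cols = int(counts[0]), int(counts[1])
    header = handle.readline().rstrip('\r\n').split('\t')
    samples = header[2:]  # the first two columns are Name and Description
    if len(samples) != n_cols:
        raise ValueError('Header lists %d samples, expected %d' % (len(samples), n_cols))
    data = {}
    for line in handle:
        if not line.strip():
            continue  # tolerate trailing blank lines
        fields = line.rstrip('\r\n').split('\t')
        data[fields[0]] = [float(v) for v in fields[2:]]
    if len(data) != n_rows:
        raise ValueError('Read %d genes, expected %d' % (len(data), n_rows))
    return samples, data


def parse_cls(handle):
    """Parse a categorical CLS file into (class_names, per_sample_labels)."""
    counts = handle.readline().split()
    n_samples, n_classes = int(counts[0]), int(counts[1])
    # Second line: '#' followed by the class names
    class_names = handle.readline().strip().lstrip('#').split()
    # Third line: one class index per sample (assumed to be integers here)
    labels = [int(x) for x in handle.readline().split()]
    if len(class_names) != n_classes or len(labels) != n_samples:
        raise ValueError('CLS header does not match its contents')
    return class_names, labels


# Toy data for illustration only; these are not values from Normals_Leu.gct
TOY_GCT = (u"#1.2\n"
           u"2\t4\n"
           u"Name\tDescription\tN1\tN2\tL1\tL2\n"
           u"GeneA\tna\t1.0\t2.0\t10.0\t12.0\n"
           u"GeneB\tna\t5.0\t5.5\t5.2\t4.9\n")
TOY_CLS = u"4 2 1\n# normal leukemic\n0 0 1 1\n"

samples, expression = parse_gct(io.StringIO(TOY_GCT))
class_names, labels = parse_cls(io.StringIO(TOY_CLS))
print(samples)
print(class_names)
```

The order of the labels in the CLS file follows the order of the sample columns in the GCT file, which is why the two files are always used as a pair.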

**Getting Data**

1. Log into GenomeSpace and create a new working folder, e.g., _diff_subnetworks_.
2. Navigate to the following Public data folder: `Public > SharedData > Demos > Analysis Recipe 1`
3. Copy the following two files into your new working folder, e.g., _diff_subnetworks_:
  * Normals_Leu.gct
  * Normals_Leu.cls

<a href="http://www.youtube.com/watch?feature=player_embedded&v=9_9qsYWE_K8" target="_blank"><img src="http://img.youtube.com/vi/9_9qsYWE_K8/0.jpg" alt="Video: Copying data from the GenomeSpace Public Folder into your personal GenomeSpace account." width="480" height="360" border="5" /></a>

In [None]:
# Getting Data

# GenomeSpace url prefixes
dm_url = 'https://dm.genomespace.org/datamanager/v1.0'

# 1. Create a new working folder, e.g., diff_subnetworks
dirname = 'diff_subnetworks'
user_dir = '/users/' + username
working_dir = user_dir + '/' + dirname

# Create folder with PUT
working_url = dm_url + '/file' + working_dir  # URL of working folder
(resp, cont) = h.request(working_url,
                         method='PUT', headers=header_default,
                         body="{\"isDirectory\":true}")
response.append(resp)
content.append(cont)

# 2. Navigate to the following Public data folder: `Public > SharedData > Demos > Analysis Recipe 1`
# 3. Copy the following two files into your new working folder, e.g., _diff_subnetworks_:
#  * Normals_Leu.gct
#  * Normals_Leu.cls
fnames = ['Normals_Leu.gct', 'Normals_Leu.cls']
from_dir = '/users/SharedData/Demos/Analysis Recipe 1'
for f in fnames:
    from_url = from_dir + '/' + f
    to_url = working_url + '/' + f
    header_tmp = header_default.copy()
    header_tmp['x-gs-copy-source'] = from_url
    (resp, cont) = h.request(to_url,
                             method='PUT', headers=header_tmp)
    response.append(resp)
    content.append(cont)

#(resp_header, content) = h.request(tmp_url, method='GET', headers=header_default)
#(content_dec, n) = json_dec.raw_decode(content) # decode GSDirectoryListing JSON object
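
The cell above stores every response but never checks whether a request succeeded. One possible way to make failures visible is sketched below: the URL and header construction is pulled into a small pure function, and a helper tests the HTTP status code. `build_copy_request` and `is_success` are hypothetical helper names, not part of httplib2 or of any GenomeSpace client. The `x-gs-copy-source` header and the URL layout are taken from the cell above, and treating any 2xx code as success is an assumption about how the data manager reports results.

```python
def build_copy_request(dm_url, working_dir, from_dir, fname, base_headers):
    """Return (url, headers) for a server-side copy of fname into working_dir."""
    to_url = dm_url + '/file' + working_dir + '/' + fname
    headers = dict(base_headers)  # copy, so the shared default header is not mutated
    headers['x-gs-copy-source'] = from_dir + '/' + fname
    return to_url, headers


def is_success(status):
    """Assumption: any 2xx HTTP status code means the request succeeded."""
    return 200 <= int(status) < 300


# Example with placeholder values; no network request is made here
example_url, example_headers = build_copy_request(
    'https://dm.genomespace.org/datamanager/v1.0',
    '/users/gsnb/diff_subnetworks',
    '/users/SharedData/Demos/Analysis Recipe 1',
    'Normals_Leu.gct',
    {'Cookie': 'gs-token=placeholder'})
print(example_url)
```

Inside the copy loop, the pair returned by `build_copy_request` could be passed to `h.request(to_url, method='PUT', headers=headers)`, and `is_success(resp.status)` could then decide whether to report the file as copied.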


# Find subnetworks of differentially expressed genes and identify associated biological functions
1. Identify the top 50 differentially expressed genes in our dataset, using GenePattern:
  * Filter out genes which are not up- or down-regulated (PreprocessDataset)
  * Identify genes which can discriminate between conditions, e.g. normal vs. leukemic (ComparativeMarkerSelection).
  * Create a new file containing the differentially expressed genes, so that we can move data back into GenomeSpace (ExtractComparativeMarkerResults, SelectFileMatrix)
2. Identify protein-protein interaction subnetworks associated with these genes, using Cytoscape:
  * Identify connections between the differentially expressed genes, e.g. genetic interactions or protein-protein interactions, and examine functional annotation of the subnetworks (GeneMANIA)
  * Identify highly interconnected clusters within a network (MCODE)
  * Visualize the subnetworks of differentially expressed genes, and examine the functional annotation of these subnetworks
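
The idea behind step 1 can be illustrated with a small, self-contained sketch: score each gene by how well it separates the two classes, then keep the highest-scoring genes. The sketch uses a Welch t-statistic as the score. This is a stand-in chosen for illustration and is not a reproduction of what ComparativeMarkerSelection computes, and the function names and toy values are assumptions. Each class needs at least two samples for the variance to be defined.

```python
import math


def mean(xs):
    return sum(xs) / float(len(xs))


def sample_var(xs):
    """Unbiased sample variance; requires at least two values."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / float(len(xs) - 1)


def welch_t(a, b):
    """Welch t-statistic for class b versus class a."""
    diff = mean(b) - mean(a)
    denom = math.sqrt(sample_var(a) / len(a) + sample_var(b) / len(b))
    if denom == 0.0:
        # No within-class variation: perfectly separated or identical
        return 0.0 if diff == 0.0 else math.copysign(float('inf'), diff)
    return diff / denom


def top_genes(expression, labels, n_top):
    """Return the n_top (gene, score) pairs with the largest absolute score."""
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == 0]
        b = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = welch_t(a, b)
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return [(g, scores[g]) for g in ranked[:n_top]]


# Toy values for illustration only: two normal samples, then two leukemic samples
toy_expression = {
    'GeneA': [1.0, 2.0, 10.0, 12.0],   # up in class 1
    'GeneB': [5.0, 5.5, 5.2, 4.9],     # roughly unchanged
    'GeneC': [8.0, 9.0, 1.0, 2.0],     # down in class 1
}
toy_labels = [0, 0, 1, 1]

for gene, score in top_genes(toy_expression, toy_labels, 2):
    print('%s\t%.3f' % (gene, score))
```

Ranking by the absolute value of the score keeps both up- and down-regulated genes, which matches the recipe's goal of finding genes that discriminate between the conditions in either direction.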


In [2]:
# !AUTOEXEC

task = gp.GPTask(gpserver, 'urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020:5')
GPTaskWidget(task)

In [9]:
!curl -o BMID000000140222.xml "http://www.ebi.ac.uk/biomodels-main/download?mid=BMID000000140222"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 9112k    0 9112k    0     0  1247k      0 --:--:--  0:00:07 --:--:-- 1756k
