# Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (CESC)
[Jump to the URLs to download the GCT and CLS files](#Downloads)

**Authors:** Marylu Villa, Alejandra Ramos, and Edwin Juarez  
**Contact info:** Email Edwin at [ejuarez@cloud.ucsd.edu](mailto:ejuarez@cloud.ucsd.edu) or post a question at http://www.genepattern.org/help

This notebook provides the steps to download all the CESC samples from The Cancer Genome Atlas (TCGA) contained in the Genomic Data Commons (GDC) Data Portal. The samples can be downloaded as a GCT file, and the phenotype labels (primary tumor vs. normal samples) as a CLS file. Both files are compatible with other GenePattern analyses.

![image.png](attachment:image.png)

# Overview

<p>Cervical cancer is a type of cancer that occurs in the cells of the cervix, the lower part of the uterus that connects to the vagina. Various strains of the human papillomavirus (HPV), a sexually transmitted infection, play a role in causing most cervical cancers.</p>

<p>Endocervical Adenocarcinoma, Usual Type is the most common histological variant of cervical adenocarcinoma, accounting for about 90% of cervical adenocarcinomas. Histological subtypes are classified by the appearance of the cells when a pathologist examines them under a microscope.</p>


![image.png](attachment:image.png)

## CESC Statistics

Cervical cancer encompasses several histologic types, of which squamous cell carcinoma (SCC) is the most common (70 percent). The incidence of invasive cervical adenocarcinoma and its variants has increased dramatically over the past few decades; this cell type now accounts for about 25 percent of all invasive cervical cancers diagnosed in the United States (US).

Number of New Cases and Deaths per 100,000: The number of new cases of cervical cancer was 7.4 per 100,000 women per year. The number of deaths was 2.3 per 100,000 women per year. These rates are age-adjusted and based on 2011-2015 cases and deaths.

Lifetime Risk of Developing Cancer: Approximately 0.6 percent of women will be diagnosed with cervical cancer at some point during their lifetime, based on 2013-2015 data.

Prevalence of This Cancer: In 2015, there were an estimated 257,524 women living with cervical cancer in the United States.

![image.png](attachment:image.png)
Source: https://seer.cancer.gov/statfacts/html/cervix.html

## Dataset's Demographic Information

<p>TCGA contained 309 CESC samples from 304 people: 304 primary tumor samples and 3 normal tissue samples; the remaining 2 samples are ignored. Below is a summary of the demographic information represented in this dataset. To view the complete study and its files on the GDC Data Portal, follow <a href="https://portal.gdc.cancer.gov/repository?facetTab=files&amp;filters=%7B%22op%22%3A%22and%22%2C%22content%22%3A%5B%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22cases.project.project_id%22%2C%22value%22%3A%5B%22TCGA-CESC%22%5D%7D%7D%2C%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22files.analysis.workflow_type%22%2C%22value%22%3A%5B%22HTSeq%20-%20Counts%22%5D%7D%7D%2C%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22files.experimental_strategy%22%2C%22value%22%3A%5B%22RNA-Seq%22%5D%7D%7D%5D%7D&amp;searchTableTab=files" target="_blank">this link</a> (these data were gathered on July 17th, 2018).</p>
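
The query string of the GDC link above is a URL-encoded JSON filter that selects the TCGA-CESC project, the "HTSeq - Counts" workflow, and the RNA-Seq experimental strategy. The sketch below rebuilds that filter and encodes it the same way, which can help when adapting the link to another project. The helper names `in_filter` and `build_repository_url` are hypothetical, and whether the current portal still accepts this link format is not something this notebook confirms.

```python
import json
from urllib.parse import parse_qs, quote, urlsplit


def in_filter(field, values):
    """Build one 'in' clause in the shape used by the link's filter JSON."""
    return {"op": "in", "content": {"field": field, "value": list(values)}}


# The three clauses visible in the encoded link, combined with 'and'.
filters = {
    "op": "and",
    "content": [
        in_filter("cases.project.project_id", ["TCGA-CESC"]),
        in_filter("files.analysis.workflow_type", ["HTSeq - Counts"]),
        in_filter("files.experimental_strategy", ["RNA-Seq"]),
    ],
}


def build_repository_url(filter_dict):
    """Percent-encode the compact JSON filter and place it in the query string."""
    compact = json.dumps(filter_dict, separators=(",", ":"))
    encoded = quote(compact, safe="")
    return (
        "https://portal.gdc.cancer.gov/repository"
        "?facetTab=files&filters=" + encoded + "&searchTableTab=files"
    )


url = build_repository_url(filters)

# Decoding the query string recovers the original filter.
decoded = json.loads(parse_qs(urlsplit(url).query)["filters"][0])
print(decoded == filters)
```

Changing the project ID in the first clause is the only edit needed to point the same query at a different TCGA project.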


![image.png](attachment:image.png)

# Login to GenePattern

<div class="alert alert-info">
<h3 style="margin-top: 0;"> Instructions <i class="fa fa-info-circle"></i></h3>

<ol>
    <li>Log in to the <em>GenePattern Cloud</em> server.</li>
</ol>

</div>

```python
# Requires GenePattern Notebook: pip install genepattern-notebook
import gp
import genepattern

# Username and password removed for security reasons.
genepattern.display(genepattern.session.register("https://cloud.genepattern.org/gp", "", ""))
```

# Downloading RNA-Seq HTSeq Counts Using TCGAImporter


Use the TCGAImporter module to download RNA-Seq HTSeq counts from the GDC Data Portal using a Manifest file and a Metadata file.

<p><strong>Input files</strong></p>

<ul>
	<li><em>Manifest file:</em> a file containing the list of RNA-Seq samples to be downloaded.</li>
	<li><em>Metadata file:</em> a file containing information about the files present at the GDC Data Portal. Instructions for downloading the Manifest and Metadata files can be found here: <a href="https://github.com/genepattern/TCGAImporter/blob/master/how_to_download_a_manifest_and_metadata.pdf" target="_blank">https://github.com/genepattern/TCGAImporter/blob/master/how_to_download_a_manifest_and_metadata.pdf</a></li>
</ul>
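
Before loading a Manifest file into TCGAImporter, it can be useful to check how many files it lists and how much data they represent. The sketch below assumes the usual GDC manifest layout, a tab-delimited table with `id`, `filename`, `md5`, `size`, and `state` columns. The two rows in `SAMPLE_MANIFEST` are made-up placeholders, not real TCGA entries, and `read_manifest` and `summarize` are hypothetical helper names.

```python
import csv
import io

# Placeholder manifest with invented IDs, file names, and checksums.
SAMPLE_MANIFEST = (
    "id\tfilename\tmd5\tsize\tstate\n"
    "00000000-0000-0000-0000-000000000001\tsample_a.htseq.counts.gz\t"
    "d41d8cd98f00b204e9800998ecf8427e\t250000\treleased\n"
    "00000000-0000-0000-0000-000000000002\tsample_b.htseq.counts.gz\t"
    "d41d8cd98f00b204e9800998ecf8427e\t260000\treleased\n"
)


def read_manifest(handle):
    """Return the manifest rows as a list of dictionaries keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def summarize(entries):
    """Count the listed files and add up their sizes in bytes."""
    return {
        "n_files": len(entries),
        "total_bytes": sum(int(entry["size"]) for entry in entries),
    }


entries = read_manifest(io.StringIO(SAMPLE_MANIFEST))
summary = summarize(entries)
print(summary)
```

For a real manifest, replace the `io.StringIO` wrapper with `open("CESC_manifest.txt", newline="")`.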

<p><strong>Output files</strong></p>

<ul>
	<li><em>CESC_TCGA.gct</em> - A tab-delimited file that contains the gene expression (HTSeq counts) of the samples listed in the Manifest file. For more information on GCT files, see reference <a href="#References">1</a>.</li>
	<li><em>CESC_TCGA.cls</em> - The CLS file defines the phenotype labels (in this case, Primary Tumor and Normal Sample) and associates each sample in the GCT file with a label. For more information on CLS files, see reference <a href="#References">2</a>.</li>
</ul>
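
To make the two output formats concrete, here is a minimal sketch that parses a tiny GCT (version 1.2) file and a matching CLS file, then pairs each sample with its phenotype label. The sample names, gene names, and values are invented for illustration, and the sketch assumes the CLS label line uses numeric indices into the class-name line; the files produced by TCGAImporter may differ in detail.

```python
# Invented example data: 2 genes x 3 samples.
SAMPLE_GCT = (
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tsample_1\tsample_2\tsample_3\n"
    "GENE_A\tna\t10\t0\t7\n"
    "GENE_B\tna\t3\t12\t5\n"
)

# Line 1: <samples> <classes> 1; line 2: class names; line 3: one label per sample.
SAMPLE_CLS = (
    "3 2 1\n"
    "# Primary_Tumor Normal\n"
    "0 0 1\n"
)


def parse_gct(text):
    """Return (sample names, {gene name: [values]}) from GCT v1.2 text."""
    lines = text.strip().splitlines()
    if lines[0].strip() != "#1.2":
        raise ValueError("expected a GCT v1.2 header")
    n_rows, n_cols = (int(x) for x in lines[1].split("\t")[:2])
    samples = lines[2].split("\t")[2:]  # skip the Name and Description columns
    expression = {}
    for line in lines[3:]:
        fields = line.split("\t")
        expression[fields[0]] = [float(value) for value in fields[2:]]
    if len(expression) != n_rows or len(samples) != n_cols:
        raise ValueError("declared dimensions do not match the data")
    return samples, expression


def parse_cls(text):
    """Return one class name per sample, in sample order."""
    lines = text.strip().splitlines()
    n_samples, n_classes = (int(x) for x in lines[0].split()[:2])
    class_names = lines[1].lstrip("#").split()
    labels = [class_names[int(index)] for index in lines[2].split()]
    if len(labels) != n_samples or len(class_names) != n_classes:
        raise ValueError("declared counts do not match the labels")
    return labels


samples, expression = parse_gct(SAMPLE_GCT)
phenotype = dict(zip(samples, parse_cls(SAMPLE_CLS)))
print(phenotype)
```

The pairing relies on the CLS labels being listed in the same order as the sample columns of the GCT file.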


<div class="alert alert-info">
<h3 style="margin-top: 0;"> Instructions <i class="fa fa-info-circle"></i></h3>

<ol>
    <li>Load the manifest file into the <strong>Manifest</strong> parameter.</li>
    <li>Load the metadata file into the <strong>Metadata</strong> parameter.</li>
    <li>Click <strong>run</strong>.</li>
</ol>

</div>

<p><strong>Estimated run time for TCGAImporter</strong> : ~ 5 minutes</p>


```python
# Set up the TCGAImporter module with the manifest and metadata files,
# requesting both the GCT and the CLS output.
tcgaimporter_task = gp.GPTask(genepattern.session.get(0), 'urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00369')
tcgaimporter_job_spec = tcgaimporter_task.make_job_spec()
tcgaimporter_job_spec.set_parameter("manifest", "https://cloud.genepattern.org/gp/users/marylu257/tmp/run868551132683601753.tmp/CESC_manifest.txt")
tcgaimporter_job_spec.set_parameter("metadata", "https://cloud.genepattern.org/gp/users/marylu257/tmp/run895752084419223965.tmp/CESC_metadata.json")
tcgaimporter_job_spec.set_parameter("output_file_name", "CESC_TCGA")
tcgaimporter_job_spec.set_parameter("gct", "True")
# Gene IDs are not translated at this step (translate_gene_id is False).
tcgaimporter_job_spec.set_parameter("translate_gene_id", "False")
tcgaimporter_job_spec.set_parameter("cls", "True")
genepattern.display(tcgaimporter_task)

# Display the results of a completed TCGAImporter job.
job35208 = gp.GPJob(genepattern.session.get(0), 35208)
genepattern.display(job35208)
```

```python
# Set up the CollapseDataset module to collapse the Ensembl gene IDs in the GCT file
# to gene symbols, keeping the maximum value when several IDs map to one symbol.
collapsedataset_task = gp.GPTask(genepattern.session.get(0), 'urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00134')
collapsedataset_job_spec = collapsedataset_task.make_job_spec()
collapsedataset_job_spec.set_parameter("dataset.file", "https://cloud.genepattern.org/gp/jobResults/35082/CESC_TCGA.gct")
collapsedataset_job_spec.set_parameter("chip.platform", "ftp://ftp.broadinstitute.org/pub/gsea/annotations/ENSEMBL_human_gene.chip")
collapsedataset_job_spec.set_parameter("collapse.mode", "Maximum")
collapsedataset_job_spec.set_parameter("output.file.name", "<dataset.file_basename>.collapsed")
genepattern.display(collapsedataset_task)

# Display the results of a completed CollapseDataset job.
job35083 = gp.GPJob(genepattern.session.get(0), 35083)
genepattern.display(job35083)
```
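
The CollapseDataset cell above uses the "Maximum" collapse mode. As a rough illustration of what that mode means, the sketch below groups rows that map to the same gene symbol and keeps the per-sample maximum. It is not the module's implementation: the gene IDs, symbols, and values are invented, `collapse_max` is a hypothetical helper, and dropping IDs that are absent from the annotation is an assumption of this sketch.

```python
from collections import defaultdict


def collapse_max(expression, id_to_symbol):
    """Collapse rows sharing a gene symbol by taking the maximum per sample."""
    grouped = defaultdict(list)
    for gene_id, values in expression.items():
        symbol = id_to_symbol.get(gene_id)
        if symbol is None:
            continue  # assumption: IDs missing from the annotation are dropped
        grouped[symbol].append(values)
    # zip(*rows) walks the rows column by column, i.e. sample by sample.
    return {
        symbol: [max(column) for column in zip(*rows)]
        for symbol, rows in grouped.items()
    }


# Invented expression rows (three samples each) and an invented annotation.
expression = {
    "ENSG_EXAMPLE_1": [1, 5, 2],
    "ENSG_EXAMPLE_2": [3, 4, 0],
    "ENSG_EXAMPLE_3": [7, 7, 7],
    "ENSG_EXAMPLE_4": [9, 9, 9],  # not present in the annotation
}
id_to_symbol = {
    "ENSG_EXAMPLE_1": "GENE1",
    "ENSG_EXAMPLE_2": "GENE1",
    "ENSG_EXAMPLE_3": "GENE2",
}

collapsed = collapse_max(expression, id_to_symbol)
print(collapsed)
```

Here the two rows mapped to `GENE1` become a single row holding the larger value for each sample.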

# Downloads

<p>You can download the input and output files of TCGAImporter for this cancer type here:</p>

<p><strong>Inputs:</strong></p>

<ul>
	<li><a href="https://datasets.genepattern.org/data/TCGA_HTSeq_counts/CESC/CESC_MANIFEST.txt" target="_blank">https://datasets.genepattern.org/data/TCGA_HTSeq_counts/CESC/CESC_MANIFEST.txt</a></li>
	<li><a href="https://datasets.genepattern.org/data/TCGA_HTSeq_counts/CESC/CESC_METADATA.json" target="_blank">https://datasets.genepattern.org/data/TCGA_HTSeq_counts/CESC/CESC_METADATA.json</a></li>
</ul>

<p><strong>Outputs:</strong></p>

<ul>
	<li><a href="https://datasets.genepattern.org/data/TCGA_HTSeq_counts/CESC/CESC_TCGA.gct" target="_blank">https://datasets.genepattern.org/data/TCGA_HTSeq_counts/CESC/CESC_TCGA.gct</a></li>
	<li><a href="https://datasets.genepattern.org/data/TCGA_HTSeq_counts/CESC/CESC_TCGA.cls" target="_blank">https://datasets.genepattern.org/data/TCGA_HTSeq_counts/CESC/CESC_TCGA.cls</a></li>
</ul>


If you'd like to download similar files for other TCGA datasets, visit this link: 
- https://datasets.genepattern.org/?prefix=data/TCGA_HTSeq_counts/
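
The four CESC links above follow one naming pattern, so the URLs can be generated from the cancer-type abbreviation. The sketch below assumes the other TCGA datasets at that location use the same pattern, which this notebook does not confirm; `tcga_file_urls` and `download_outputs` are hypothetical helpers, and `download_outputs` is defined but not called because it needs network access.

```python
import os
from urllib.request import urlretrieve

BASE_URL = "https://datasets.genepattern.org/data/TCGA_HTSeq_counts"


def tcga_file_urls(cancer_type):
    """Build the input and output file URLs for a TCGA abbreviation such as 'CESC'."""
    code = cancer_type.upper()
    prefix = f"{BASE_URL}/{code}/{code}"
    return {
        "manifest": f"{prefix}_MANIFEST.txt",
        "metadata": f"{prefix}_METADATA.json",
        "gct": f"{prefix}_TCGA.gct",
        "cls": f"{prefix}_TCGA.cls",
    }


def download_outputs(cancer_type, dest_dir="."):
    """Download the GCT and CLS files into dest_dir and return their local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    urls = tcga_file_urls(cancer_type)
    paths = []
    for key in ("gct", "cls"):
        local_path = os.path.join(dest_dir, os.path.basename(urls[key]))
        urlretrieve(urls[key], local_path)
        paths.append(local_path)
    return paths


cesc_urls = tcga_file_urls("cesc")
print(cesc_urls["gct"])
```

Calling `download_outputs("CESC", "data")` would save both output files into a local `data` folder.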

# References

[1] http://software.broadinstitute.org/cancer/software/genepattern/file-formats-guide#GCT

[2] http://software.broadinstitute.org/cancer/software/genepattern/file-formats-guide#CLS

[3] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5506840/

[4] https://www.uptodate.com/contents/invasive-cervical-adenocarcinoma

[5] https://www.mayoclinic.org/diseases-conditions/cervical-cancer/symptoms-causes/syc-20352501

[6] https://seer.cancer.gov/statfacts/html/cervix.html

[7] https://www.dovemed.com/diseases-conditions/endocervical-adenocarcinoma-usual-type/
