Use bz2 to open counts file
jni committed Mar 18, 2019
1 parent 3b37839 commit ab8c0b4
Showing 3 changed files with 8 additions and 7 deletions.
2 changes: 0 additions & 2 deletions Makefile
@@ -42,8 +42,6 @@ ipynb/ch7.ipynb: $(FIGURES)/optimization_comparison.png
 
 ipynb/ch8.ipynb: data/dm6.fa
 
-.SECONDARY: data/counts.txt data/dm6.fa data/dm6.fa.gz
-
 data/counts.txt: data/counts.txt.bz2
 	bunzip2 -d -k -f data/counts.txt.bz2
 
7 changes: 4 additions & 3 deletions markdown/ch1.markdown
@@ -373,7 +373,7 @@ It allows us to express complex operations concisely and efficiently.
 ## Exploring a Gene Expression Dataset
 
 The dataset that we'll be using is an RNAseq experiment of skin cancer samples from The Cancer Genome Atlas (TCGA) project (http://cancergenome.nih.gov/).
-We've already cleaned and sorted the data for you, so you can just use `data/counts.txt`
+We've already cleaned and sorted the data for you, so you can use `data/counts.txt.bz2`
 in the book repository.
 In Chapter 2 we will be using this gene expression data to predict mortality in skin cancer patients, reproducing a simplified version of [Figures 5A and 5B](http://www.cell.com/action/showImagesData?pii=S0092-8674%2815%2900634-0) of a [paper](http://dx.doi.org/10.1016/j.cell.2015.05.044) from the TCGA consortium.
 But first we need to get our heads around the biases in our data, and think about how we could improve it.
@@ -395,12 +395,13 @@ In later chapters we will see a bit more of pandas, but for details, read *Pytho
 for Data Analysis* (O'Reilly) by the creator of pandas, Wes McKinney.
 
 ```python
+import bz2
 import numpy as np
 import pandas as pd
 
 # Import TCGA melanoma data
-filename = 'data/counts.txt'
-with open(filename, 'rt') as f:
+filename = 'data/counts.txt.bz2'
+with bz2.open(filename, 'rt') as f:
     data_table = pd.read_csv(f, index_col=0)  # Parse file with pandas
 
 print(data_table.iloc[:5, :5])
6 changes: 4 additions & 2 deletions markdown/ch2.markdown
@@ -93,12 +93,14 @@ As in Chapter 1, first we will use pandas to make our job of reading in the data
 First we will read in our counts data as a pandas table.
 
 ```python
+import bz2
 import numpy as np
 import pandas as pd
 
 # Import TCGA melanoma data
-filename = 'data/counts.txt'
-data_table = pd.read_csv(filename, index_col=0)  # Parse file with pandas
+filename = 'data/counts.txt.bz2'
+with bz2.open(filename, mode='rt') as f:
+    data_table = pd.read_csv(f, index_col=0)  # Parse file with pandas
 
 print(data_table.iloc[:5, :5])
 ```
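
Note on the `'rt'` mode used in both chapters: `bz2.open` defaults to binary mode (`'rb'`), so the explicit text mode is what gives the parser a stream of `str` lines. The following is only a minimal sketch of that difference, not the book's code. It assumes a small made-up table written to a temporary file in place of `data/counts.txt.bz2`, and it swaps in the standard-library `csv` module for pandas so that it runs without third-party packages.

```python
import bz2
import csv
import os
import tempfile

# Hypothetical stand-in for data/counts.txt.bz2: made-up genes and counts.
rows = [["", "sampleA", "sampleB"], ["G1", "10", "0"], ["G2", "3", "7"]]
tmpdir = tempfile.mkdtemp()
filename = os.path.join(tmpdir, "counts.txt.bz2")

# Write a small compressed CSV through a text-mode handle.
with bz2.open(filename, "wt", newline="") as f:
    csv.writer(f).writerows(rows)

# Default mode is binary, so reading yields bytes.
with bz2.open(filename) as f:
    first_raw = f.readline()

# Text mode decodes on the fly, so a CSV parser can consume the handle directly.
with bz2.open(filename, mode="rt", newline="") as f:
    parsed = list(csv.reader(f))

print(type(first_raw).__name__)
print(parsed[1])
```

In the commit itself the same kind of text-mode handle is passed to `pd.read_csv`, which is why no decompressed `data/counts.txt` has to exist on disk for the chapters to run.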