# Importing expression data

The previous section showed how to create an `ExpressionProfile`. Most of the time, however, the data we need to analyse is stored in CSV (comma-separated values) or TSV (tab-separated values) files. `ExpressionProfile` integrates tightly with `pandas.DataFrame`, which makes reading from and writing to these files easier.

## Loading data from .csv files

In [1]:
from driven import ExpressionProfile

toy_data = ExpressionProfile.from_csv("toy_data.csv", index_col=0)
toy_data

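| Unnamed: 0 | 0 hr | 2 hrs | 4 hrs | 0 hr 2 hrs p-value | 0 hr 4 hrs p-value | 2 hrs 4 hrs p-value |
| ---------- | ---- | ----- | ----- | ------------------ | ------------------ | ------------------- |
| Gene A     | 0.1  | 0.2   | 0.3   | 0.1                | 0.4                | 0.3                 |
| Gene B     | 0.3  | 0.4   | 0.5   | 0.2                | 0.3                | 0.1                 |
| Gene C     | 0.8  | 0.9   | 1.0   | 0.3                | 0.2                | 0.1                 |
| Gene D     | 0.5  | 0.6   | 0.7   | 0.4                | 0.1                | 0.3                 |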


As you can see, we pass the `index_col` argument of `pandas.read_csv()` because of the format of our .csv file: its first column holds the gene names. You can pass any other `read_csv()` arguments that your file requires in the same way.
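To see what `index_col=0` changes, here is a minimal sketch that uses `pandas` on its own, without `driven`. The in-memory text is a hypothetical two-gene stand-in for a file like `toy_data.csv`, not the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for a small expression file; the first header cell is empty.
csv_text = """,0 hr,2 hrs
Gene A,0.1,0.2
Gene B,0.3,0.4
"""

# Without index_col, the unnamed first column is read as ordinary data.
without_index = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column becomes the row index (the gene names).
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(without_index.columns))
print(list(with_index.columns))
print(list(with_index.index))
```

With the index set, every remaining column is either a condition or a p-value column, which is presumably the layout the parser expects.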

### Format of .csv file

Let's have a look at our `toy_data.csv` file:

| <empty> | 0 hr | 2 hrs | 4 hrs | p-value 0-2 | p-value 0-4 | p-value 2-4 |
| ------- | ---- | ---- | ---- | ----------- | ----------- | ----------- |
| Gene A  | 0.10 | 0.20 | 0.30 | 0.10        | 0.40        | 0.30        |
| Gene B  | 0.30 | 0.40 | 0.50 | 0.20        | 0.30        | 0.10        |
| Gene C  | 0.80 | 0.90 | 1.00 | 0.30        | 0.20        | 0.10        |
| Gene D  | 0.50 | 0.60 | 0.70 | 0.40        | 0.10        | 0.30        |

You can see that the p-values are stored in columns whose names contain "*p-value*". This is **required** for our internal parser to assign the values correctly. The column order also matters: the condition columns come first, followed by one p-value column for each pair of conditions, i.e., |col-1|, |col-2|, |col-3|, |col-1-2 p-value|, |col-1-3 p-value|, |col-2-3 p-value|. This order is **required** because the p-value columns are parsed as pairwise *combinations* of the condition columns.
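The pairwise ordering described above can be reproduced with the standard library. This is only a sketch: it assumes the parser's ordering matches that of `itertools.combinations`, and the name pattern is copied from the p-value column names in the output shown earlier, not from the `driven` source code.

```python
from itertools import combinations

# Condition columns, in the order they appear in the file.
conditions = ["0 hr", "2 hrs", "4 hrs"]

# All unordered pairs, in the order itertools generates them:
# (1, 2), (1, 3), (2, 3)
pairs = list(combinations(conditions, 2))

# Assumed naming pattern, mirroring the displayed p-value column names.
expected_p_value_columns = ["{} {} p-value".format(a, b) for a, b in pairs]

for name in expected_p_value_columns:
    print(name)
```

For three conditions this yields three p-value columns. In general, n conditions give n(n-1)/2 pairs, so the number of p-value columns grows quickly as conditions are added.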

## Loading data from pandas.DataFrame

Now that we understand how the .csv file is parsed, let's look at how to load data from a `pandas.DataFrame`, given how well `pandas` is integrated.

In [2]:
df = toy_data.data_frame
df = df.drop(['0 hr 2 hrs p-value', '0 hr 4 hrs p-value', '2 hrs 4 hrs p-value'], axis=1)
modified_toy_data = ExpressionProfile.from_data_frame(df)
modified_toy_data

| Unnamed: 0 | 0 hr | 2 hrs | 4 hrs |
| ---------- | ---- | ----- | ----- |
| Gene A     | 0.1  | 0.2   | 0.3   |
| Gene B     | 0.3  | 0.4   | 0.5   |
| Gene C     | 0.8  | 0.9   | 1.0   |
| Gene D     | 0.5  | 0.6   | 0.7   |


Here, we took the DataFrame behind `toy_data`, dropped its p-value columns, and used the result to instantiate a new `ExpressionProfile`.
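When there are many conditions, listing every p-value column by hand becomes tedious. Since p-value columns are identified by the "p-value" marker in their names, they can also be filtered out by name. The following is a `pandas`-only sketch; the small DataFrame is a hypothetical stand-in for `toy_data.data_frame`.

```python
import pandas as pd

# Hypothetical stand-in for a profile's DataFrame with one p-value column.
df = pd.DataFrame(
    {
        "0 hr": [0.1, 0.3],
        "2 hrs": [0.2, 0.4],
        "0 hr 2 hrs p-value": [0.1, 0.2],
    },
    index=["Gene A", "Gene B"],
)

# Boolean mask marking the columns whose name contains "p-value".
is_p_value = df.columns.str.contains("p-value")

# Keep only the condition columns.
conditions_only = df.loc[:, ~is_p_value]

print(list(conditions_only.columns))
```

The resulting DataFrame could then be passed to `ExpressionProfile.from_data_frame()` in the same way as `df` in the cell above.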