# Sample information and accessions

## Introduction

Once your samples have been sequenced or imported, it can be useful to match the internal lane identifiers to the sample and supplier identifiers. We can look at the relationship between lane and sample using `pf info`, which returns values for:

  * Lane name
  * Sample name
  * Supplier name
  * Public name
  * Strain

Alternatively, you might want to know the EBI sample and lane accession numbers for a particular lane or sample. To get these, you can use `pf accession`, which returns:

  * Sample name
  * Sample accession
  * Lane name
  * Lane accession

For more information about the format of EBI accession numbers, see [www.ebi.ac.uk/ena/submit/read-data-format](https://www.ebi.ac.uk/ena/submit/read-data-format#accession_number_format).
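The prefix of an accession indicates what kind of ENA record it refers to. As a rough sketch based on the prefixes described on that ENA page, the hypothetical shell function `accession_type` below (our own name, not part of `pf`) classifies an accession. It only checks the prefix, not the full format, so treat its answer as a guess rather than a validation.

```bash
# Guess the type of ENA record from an accession's prefix.
# Hypothetical helper: checks only the leading characters, not the full format.
accession_type() {
    case "$1" in
        PRJ[EDN][A-Z]*) echo "study (project)" ;;      # e.g. PRJEB...
        [EDS]RP[0-9]*)  echo "study" ;;                # ERP/SRP/DRP
        SAM[EDN]*)      echo "sample (BioSamples)" ;;  # e.g. SAMEA...
        [EDS]RS[0-9]*)  echo "sample" ;;               # ERS/SRS/DRS
        [EDS]RX[0-9]*)  echo "experiment" ;;           # ERX/SRX/DRX
        [EDS]RR[0-9]*)  echo "run" ;;                  # ERR/SRR/DRR
        [EDS]RZ[0-9]*)  echo "analysis" ;;             # ERZ/SRZ/DRZ
        *)              echo "unknown" ;;
    esac
}
```

For example, with a made-up accession, `accession_type ERR000001` prints `run`.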

You can also use `pf` to generate a spreadsheet of supplementary data, which can be useful for publication. `pf supplementary` returns:

  * Sample name
  * Sample accession
  * Lane name
  * Lane accession
  * Supplier name
  * Public name
  * Strain
  * Study ID
  * Study accession

Optionally, `pf supplementary` can also return the sample description.

In this section of the tutorial we will cover:

  * using `pf info` to get sample metadata
  * using `pf accession` to get sample accessions
  * using `pf supplementary` to get supplementary data


## Exercise 3

**Before running the following commands, make sure you're in the directory you created at the start, as described in the [index notebook](index.ipynb). This makes it easier to remove everything at the end.**

### Metadata

We can get the metadata associated with our lanes using `pf info`.

**Let's take a look at the usage information for `pf info`.**

```bash
pf info -h
```

**Let's get the sample name that corresponds to lane 5477_6#1.**

```bash
pf info -t lane -i 5477_6#1
```

Here we can see that several pieces of metadata have been returned. One of these is the sample name: **Tw01_0055**.

**Now, let's get the sample names for all lanes associated with study 664.**

```bash
pf info -t study -i 664
```

We can write this information to a file using the `-o` or `--outfile` option.

**Let's write our lane metadata to a file.**

```bash
pf info -t study -i 664 -o
```

Because we didn't give a file name, this generates a new file called "infofind.csv", which contains our comma-separated lane metadata.
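Since the output file is plain comma-separated text, standard tools such as `awk` can pull out a single column. The sketch below defines a hypothetical helper, `csv_column`, that prints one column of such a file. It assumes, without confirmation from the `pf` documentation, that the first line is a header and that no field itself contains a comma; if your file has no header line, remove the `NR > 1` condition.

```bash
# Print column number $1 of the comma-separated file $2.
# Hypothetical helper: skips the first line (assumed to be a header) and
# splits naively on commas, so quoted fields containing commas will break it.
csv_column() {
    local col="$1" file="$2"
    awk -F',' -v c="$col" 'NR > 1 { print $c }' "$file"
}
```

If the columns follow the order listed in the introduction (lane name first, then sample name), `csv_column 2 infofind.csv` would print the sample names.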

```bash
cat infofind.csv
```

We can also give the output file a different name.

**Let's call the metadata file for study 664 "study_664_info.csv".**

```bash
pf info -t study -i 664 -o study_664_info.csv
```

This generates the file "study_664_info.csv", which contains our metadata.

```bash
cat study_664_info.csv
```

### Accessions

If they are available, we can also get the EBI sample and raw sequence data accessions for the lanes associated with study 664 using `pf accession`.

**Let's take a look at the usage information for `pf accession`.**

```bash
pf accession -h
```

**Let's get the EBI accessions for all lanes associated with study 664.**

```bash
pf accession -t study -i 664
```

As with `pf info`, we can also write the output of `pf accession` to a comma-separated file.

**Let's write the accessions associated with study 664 to a file called "study_664_accessions.csv".**

```bash
pf accession -t study -i 664 -o study_664_accessions.csv
```

This generates the file "study_664_accessions.csv", which contains our comma-separated accessions.

```bash
cat study_664_accessions.csv
```
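Accessions are only returned if they are available, so it can be worth checking which lanes lack one. The hypothetical function below (our own name, not part of `pf`) prints the lane name for every row whose lane accession is empty or does not look like an ENA run accession (`ERR`, `SRR` or `DRR` followed by digits). It assumes the column order listed in the introduction (sample name, sample accession, lane name, lane accession), a header line, and that lane accessions are run accessions; check these assumptions against your own file.

```bash
# Print lane names (column 3) whose lane accession (column 4) is missing or
# does not match the ENA run accession pattern. Hypothetical helper; assumes
# a header line and naive comma splitting.
lanes_without_run_accession() {
    awk -F',' 'NR > 1 {
        acc = $4
        gsub(/[ \t\r]/, "", acc)          # ignore stray whitespace / CR
        if (acc !~ /^[EDS]RR[0-9]+$/) print $3
    }' "$1"
}
```

For example, `lanes_without_run_accession study_664_accessions.csv` prints nothing if every lane has a run accession.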

### Supplementary data

We can get the supplementary data associated with our lanes using `pf supplementary`.

**Let's take a look at the usage information for `pf supplementary`.**

```bash
pf supplementary -h
```

**Let's get the supplementary data for all lanes associated with study 664.**

```bash
pf supplementary -t study -i 664
```

As with `pf info` and `pf accession`, we can also write the output of `pf supplementary` to a comma-separated file.

**Let's write the supplementary data associated with study 664 to a file called "study_664_supplementary.csv".**

```bash
pf supplementary -t study -i 664 -o study_664_supplementary.csv
```

This generates the file "study_664_supplementary.csv", which contains our comma-separated supplementary data.

```bash
cat study_664_supplementary.csv
```
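Before sharing a supplementary table, a quick sanity check is to confirm that every line splits into the same number of fields. The sketch below is a hypothetical helper, `check_field_count`, that reports any line whose comma-separated field count differs from the number you expect. If the file contains exactly the nine columns listed in the introduction, that number would be 9; this is an assumption, as is the naive splitting, which miscounts fields that themselves contain commas.

```bash
# Report lines of file $2 whose comma-separated field count is not $1.
# Hypothetical helper; prints nothing when every line matches.
check_field_count() {
    local expected="$1" file="$2"
    awk -F',' -v n="$expected" 'NF != n { printf "line %d: %d fields\n", NR, NF }' "$file"
}
```

For example, `check_field_count 9 study_664_supplementary.csv` prints nothing if every line has nine fields.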

Finally, we can include the sample description in the supplementary data by using the `-d` or `--description` option.

**Let's get the supplementary data for all lanes associated with study 664, including the sample description.**

```bash
pf supplementary -t study -i 664 -d
```

## Questions

**Q1: What is the sample name that corresponds with lane 10018_1#1?**

```bash
# Enter your answer here
```

**Q2: What lane name(s) correspond with sample APP_T1_OP2?**

```bash
# Enter your answer here
```

**Q3: What are the sample names of the lanes in the file "lanes.txt" that you created in the introduction notebook? (If you have removed this file, go back to the [introduction notebook](introduction.ipynb).)**  
_Hint: you could use `awk` to get only the column you want._

```bash
# Enter your answer here
```

**Q4: What are the sample and lane accessions for lane 5477_6#1?**

```bash
# Enter your answer here
```

## What's next?

You can head back to [finding your data](finding-your-data.ipynb).

Otherwise, let's move on to looking at [analysis pipeline status](pipeline-status.ipynb).