# Sample information and accessions

***

## Introduction

Once your samples have been sequenced or imported, it can be useful to match the internal lane identifiers with the sample and supplier identifiers. We can look at the relationship between lane and sample using `pf info`, which returns values for:

  * Lane name
  * Sample name
  * Supplier name
  * Public name
  * Strain

Alternatively, you might want to know the EBI sample and submission numbers for a particular lane or sample. To get these, you can use `pf accession`, which returns:

  * Sample name
  * Sample accession
  * Lane name
  * Lane accession

For more information about the EBI accession number format, please see [www.ebi.ac.uk/ena/submit/read-data-format](https://www.ebi.ac.uk/ena/submit/read-data-format#accession_number_format).

In this section of the tutorial we will cover:

  * using `pf info` to get sample metadata
  * using `pf accession` to get sample accessions



***

## Exercise 3

**First, let's tell the system the location of our tutorial configuration file.**

In [None]:
export PF_CONFIG_FILE=$PWD/data/pathfind.conf
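
Before running any `pf` command, it can be worth confirming that the variable points at a real file. The following is a minimal sketch using only standard shell tests; `check_config` is a hypothetical helper name introduced here for illustration and is not part of the `pf` tools.

```shell
# Hypothetical helper: report whether a given config path exists.
check_config() {
  if [ -f "$1" ]; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# Check the path held in PF_CONFIG_FILE (reports "missing" if it is unset).
check_config "${PF_CONFIG_FILE:-}"
```

If this reports "missing", check that you are running the command from the tutorial directory, since the path is built from `$PWD`.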

### Metadata

We can get the metadata associated with our lanes using `pf info`.

**Let's get the sample name that corresponds to lane 5477_6#1.**

In [None]:
pf info -t lane -i 5477_6#1

Here we can see that several pieces of metadata have been returned. One of these is the sample name: **Tw01_0055**.

**Now, let's get the sample names for all lanes associated with study 664.**

In [None]:
pf info -t study -i 664

We can write this information to a file using the `-o` or `--outfile` option.

**Let's write our lane metadata to a file.**

In [None]:
pf info -t study -i 664 -o

This generates a new file, "infofind.csv", which contains our comma-separated lane metadata.

In [None]:
cat infofind.csv
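
Because the output is plain CSV, standard shell tools can pull out individual columns. The sketch below builds a small stand-in file so that it runs on its own. The header names and row values are made up for illustration and may not match what `pf info` writes, so check the first line of your own file before relying on column positions.

```shell
# Stand-in for infofind.csv: hypothetical header and rows, for illustration only.
cat > example_info.csv <<'EOF'
Lane,Sample,Supplier Name,Public Name,Strain
1234_5#1,sample_A,supplier_A,public_A,strain_A
1234_5#2,sample_B,supplier_B,public_B,strain_B
EOF

# Show the header to see which column is which.
head -1 example_info.csv

# Print the lane (column 1) and sample (column 2) for every row after the header.
lane_sample=$(tail -n +2 example_info.csv | cut -d, -f1,2)
echo "$lane_sample"
```

Note that `cut` splits on every comma, so this simple approach assumes that no field contains a quoted comma.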

We can also give the output file a different name.

**Let's call the metadata file for study 664 "study_664_info.csv".**

In [None]:
pf info -t study -i 664 -o study_664_info.csv

This generates the file "study_664_info.csv" which contains our metadata.

In [None]:
cat study_664_info.csv

### Accessions

If available, we can also get the EBI raw sequence data and sample accessions for the lanes associated with study 664 using `pf accession`.

**Let's get the EBI accessions for all lanes associated with study 664.**

In [None]:
pf accession -t study -i 664

As with `pf info`, we can also write the output of `pf accession` to a comma-delimited file.

**Let's write the accessions associated with study 664 to a file called "study_664_accessions.csv".**

In [None]:
pf accession -t study -i 664 -o study_664_accessions.csv

This generates the file "study_664_accessions.csv" which contains our comma-separated accessions.

In [None]:
cat study_664_accessions.csv
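
Once the accessions are in a file, a single lane can be looked up without re-running the query. The following is a minimal sketch with `awk`, using a made-up stand-in file so that it runs on its own. The column order follows the list in the introduction (sample name, sample accession, lane name, lane accession) and the accession values are placeholders, so both are assumptions to check against your own output.

```shell
# Stand-in for an accessions file: hypothetical header and placeholder values.
cat > example_accessions.csv <<'EOF'
Sample name,Sample accession,Lane name,Lane accession
sample_A,ERS000001,1234_5#1,ERR000001
sample_B,ERS000002,1234_5#2,ERR000002
EOF

# Print the sample and lane accessions for one lane (matched on column 3).
lane="1234_5#2"
hit=$(awk -F, -v lane="$lane" 'NR > 1 && $3 == lane { print $2, $4 }' example_accessions.csv)
echo "$hit"
```

Passing the lane name with `-v` keeps the `#` character out of the `awk` program text, and comparing with `==` matches the whole field rather than a substring.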

Finally, we can get the EBI URLs to download the raw data using the `-f` or `--fastq` option. By default, these will be written to a file called "fastq_urls.txt".

**Let's get the URLs for downloading the FASTQ files for study 664 from the European Nucleotide Archive (ENA).**

In [None]:
pf accession -t study -i 664 -f

This generates a file called "fastq_urls.txt" which contains the URLs to download the raw sequencing data, one URL per file.

In [None]:
cat fastq_urls.txt
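
A file with one URL per line can be fed straight to a download tool. The sketch below only prints what it would fetch, using a stand-in file of placeholder URLs (example.org addresses, not real ENA paths). On your own data you could point the loop at "fastq_urls.txt" and replace the `echo` with a downloader such as `wget` or `curl -O`.

```shell
# Stand-in for fastq_urls.txt: placeholder URLs, one per line.
cat > example_urls.txt <<'EOF'
ftp://example.org/fastq/ERR000001_1.fastq.gz
ftp://example.org/fastq/ERR000001_2.fastq.gz
EOF

# Read one URL per line and report the file name each would produce.
planned=""
while IFS= read -r url; do
  [ -z "$url" ] && continue          # skip blank lines
  file=$(basename "$url")
  echo "would download: $file"
  planned="$planned$file "
done < example_urls.txt
```

If `wget` is installed, `wget -i fastq_urls.txt` is a shorter alternative, since its `-i` option reads the URLs to fetch from a file.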

***

## Questions

**Q1: What is the sample name that corresponds with lane 10050_2#1?**

**Q2: What is the lane name that corresponds with sample 2363STDY5509320?**

**Q3: What are the sample and lane names of the last lane in the file "data/lanes_to_search.txt"?**  
_Hint: use `tail -1` to get the last line of the output_

**Q4: What are the sample and lane accessions for lane 10050_2#1?**

**Q5: What are the two URLs which can be used to download the FASTQ files for lane 10050_2#1 from the ENA?**

***

## What's next?

You can head back to [finding your data](finding-your-data.ipynb).

Otherwise, let's move on to looking at [analysis pipeline status](pipeline-status.ipynb).