# Retrieving a sample

Let's say that, following the previous tutorials, you have already submitted your samples. You're happy and competent, and life is good at your longevous 14 years of age.

However, a couple of months later, you don't remember your samples' accessions. You probably saved the output somewhere, but computers are tricky, and they usually delete or hide the stuff you totally saved in a safe location.

Well, you don't need to worry! BioSamples offers a search service based on either `free text` or `attribute` filtering, making it possible to retrieve your samples at any point, even if they're private (as long as you search with the same account you submitted them with).

In this notebook, we will retrieve a couple of samples using each approach, just to show how it's done. As an example, we will use samples from the MICROBE consortium.

## Setting up

As always, we need to set up a couple of things. Not that many this time, though!

In [2]:
from biobroker.authenticator import WebinAuthenticator # BioSamples uses the WebinAuthenticator
from biobroker.api import BsdApi # BioSamples Database (BSD) API

username = "" # Your username goes here
password = "" # Your password goes here
authenticator = WebinAuthenticator(username=username, password=password)

api = BsdApi(authenticator=authenticator)

2024-10-14 10:42:46,219 - BsdApi - INFO - Set up BSD API successfully: using base uri 'https://www.ebi.ac.uk/biosamples/samples'
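Typing your password straight into the cell above makes it easy to leak it when you share the notebook. A minimal sketch, assuming you export your Webin credentials as environment variables; the names `WEBIN_USERNAME` and `WEBIN_PASSWORD` are hypothetical choices, not something `biobroker` reads on its own:

```python
import os


def load_webin_credentials(user_var="WEBIN_USERNAME", password_var="WEBIN_PASSWORD"):
    """Read Webin credentials from environment variables.

    The default variable names are hypothetical; use whatever you export in your shell.
    Raises KeyError listing every variable that is missing or empty.
    """
    missing = [name for name in (user_var, password_var) if not os.environ.get(name)]
    if missing:
        raise KeyError(f"Missing environment variable(s): {', '.join(missing)}")
    return os.environ[user_var], os.environ[password_var]
```

You would then write `username, password = load_webin_credentials()` and pass both to `WebinAuthenticator` exactly as in the setup cell.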
Retrieving samples: [38;2;0;255;0m100%[39m (226/226) ðŸ€± Time:  0:00:09        
Retrieving samples: [38;2;0;255;0m100%[39m (77/77) ðŸ€± Time:  0:00:02          
2024-10-14 10:44:31,082 - BsdApi - INFO - Trying to retrieve sample with accession SAMEA115657829


## Using attributes

We will start with attributes. Let's say you remember setting certain attributes on your samples; in this case, for the MICROBE samples, I remember setting `project name`: `MICROBE`, `biome`: `soil`, and `center`: `HMGU`. Let's search for them!

In [3]:
attributes_to_search = {
    'project name': 'MICROBE',
    'biome': 'soil',
    'center': 'HMGU'
}
my_samples = api.search_samples(attributes=attributes_to_search)

Attributes are always provided as key:value pairs. What happens behind the scenes is not that important, since the `BsdApi` object handles everything, but this dictionary is transformed into a query that is sent to a BioSamples endpoint.
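For the curious, here is a rough sketch of what such a query can look like. It assumes the BioSamples filter syntax `attr:<name>:<value>` described in the BioSamples search guide, and it reuses the base URI printed in the setup log. It only builds the URL (no request is made), and it is an illustration, not necessarily what `BsdApi` sends internally:

```python
from urllib.parse import urlencode

# Base URI as reported by the BsdApi setup log above.
BIOSAMPLES_SAMPLES_URL = "https://www.ebi.ac.uk/biosamples/samples"


def build_search_url(attributes=None, text=None):
    """Build a BioSamples search URL from attribute filters and/or free text.

    Each attribute becomes one `filter=attr:<name>:<value>` parameter (assumed syntax);
    free text goes into a single `text` parameter.
    """
    params = []
    if text:
        params.append(("text", text))
    for name, value in (attributes or {}).items():
        params.append(("filter", f"attr:{name}:{value}"))
    # urlencode percent-encodes ':' and turns spaces into '+'.
    return f"{BIOSAMPLES_SAMPLES_URL}?{urlencode(params)}"
```

For example, `build_search_url(attributes_to_search)` produces one `filter` parameter per attribute in the dictionary, in insertion order.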

Please note that, depending on the number of samples, the search may take a while. It is not displayed in the notebook, but I added a cool progress bar for impatient people (mostly for me)!

Let's see how many samples we got, and a teaser of the content!

In [4]:
print(len(my_samples))
print(my_samples[0].entity)

226
{'characteristics': {'Effective(%)': [{'text': '98.9'}], 'Error(%)': [{'text': '0.01'}], 'GC(%)': [{'text': '64.58'}], 'Library_Flowcell_Lane': [{'text': 'MKDN240001513-1A_22GYCMLT3_L5'}], 'Q20(%)': [{'text': '97.75'}], 'Q30(%)': [{'text': '93.65'}], 'Raw data': [{'text': '5547457200'}], 'Raw reads': [{'text': '36983048'}], 'SRA accession': [{'text': 'ERS20120911'}], 'analysis date': [{'text': '2023-07-01T00:00:00Z'}], 'biome': [{'text': 'soil'}], 'biome.1': [{'text': 'soil'}], 'broad-scale environmental context': [{'text': 'temperate biome'}], 'center': [{'text': 'HMGU'}], 'checklist': [{'text': 'ERC000022'}], 'collection date': [{'text': '2023-05-01T00:00:00Z'}], 'cryoprotectant': [{'text': 'none'}], 'cultivation': [{'text': 'not provided'}], 'depth': [{'text': 'not provided'}], 'elevation': [{'text': 'not provided'}], 'environmental medium': [{'text': 'Bulk soil'}], 'freezing method': [{'text': 'not provided'}], 'geographic location (country and/or sea)': [{'text': 'Germany'}], 
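As the output shows, each attribute lives under `characteristics` as a list of `{'text': ...}` entries. A small helper sketch to pull values out, assuming `.entity` is a plain dictionary shaped like the one printed above (`characteristic_values` is a hypothetical name, not part of `biobroker`):

```python
def characteristic_values(entity, name):
    """Return every 'text' value stored under entity['characteristics'][name].

    Assumes the layout printed above; returns an empty list if the attribute is absent.
    A list is returned because each attribute is stored as a list of entries.
    """
    entries = entity.get("characteristics", {}).get(name, [])
    return [entry["text"] for entry in entries if "text" in entry]
```

With the sample printed above, `characteristic_values(my_samples[0].entity, 'SRA accession')` should give `['ERS20120911']`.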

## Using free text

Sometimes, unfortunately, you won't remember which attributes you set on your samples to identify them (BAD SCIENTIST! BAD! No treats today).

For these cases, BioSamples also provides a free text search. For more information on the advanced search tricks you can use to narrow it down, take a look at the [BioSamples guide](https://www.ebi.ac.uk/biosamples/docs/guides/search#_advanced_search).

For this example, let's say I remember putting `AEG19_23` somewhere. Let's make the query!

In [5]:
query = 'AEG19_23'
my_samples_free_text = api.search_samples(text=query)

In [6]:
print(len(my_samples_free_text))
print(my_samples_free_text[0].entity)

77
{'characteristics': {'Effective(%)': [{'text': '99.16'}], 'Error(%)': [{'text': '0.01'}], 'GC(%)': [{'text': '63.0'}], 'Library_Flowcell_Lane': [{'text': 'MKDN240001563-1A_22GYCMLT3_L5'}], 'Q20(%)': [{'text': '98.18'}], 'Q30(%)': [{'text': '94.85'}], 'Raw data': [{'text': '6378478200'}], 'Raw reads': [{'text': '42523188'}], 'SRA accession': [{'text': 'ERS20120838'}], 'analysis date': [{'text': '2023-07-01T00:00:00Z'}], 'biome': [{'text': 'soil'}], 'biome.1': [{'text': 'soil'}], 'broad-scale environmental context': [{'text': 'temperate biome'}], 'center': [{'text': 'HMGU'}], 'checklist': [{'text': 'ERC000022'}], 'collection date': [{'text': '2023-05-01T00:00:00Z'}], 'cryoprotectant': [{'text': 'none'}], 'cultivation': [{'text': 'not provided'}], 'depth': [{'text': 'not provided'}], 'elevation': [{'text': 'not provided'}], 'environmental medium': [{'text': 'Bulk soil'}], 'freezing method': [{'text': 'not provided'}], 'geographic location (country and/or sea)': [{'text': 'Germany'}], '

Honestly, I really do not like the free text search. It doesn't always work as intended: complex searches return nothing at all most of the time (either that or I am really stoopid, but... yeah, it's probably the second one).

In any case, I **always** recommend relying on attributes.
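If a free text search returns too many hits, you can still narrow them down locally by attributes. A minimal sketch, assuming each result's `.entity` is a dictionary shaped like the outputs above; the helper name `matches_attributes` is hypothetical, and the comparison is exact and case-sensitive:

```python
def matches_attributes(entity, attributes):
    """Return True if every name/value pair in `attributes` appears in the entity's characteristics."""
    characteristics = entity.get("characteristics", {})
    for name, wanted in attributes.items():
        values = [entry.get("text") for entry in characteristics.get(name, [])]
        if wanted not in values:
            return False
    return True
```

For instance, `[s for s in my_samples_free_text if matches_attributes(s.entity, {'biome': 'soil', 'center': 'HMGU'})]` keeps only the free text hits that also carry those attributes. This filters results already downloaded; it does not replace a server-side attribute search.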

## Using an accession

You can also retrieve samples by accession. This is usually the easiest route, and `retrieve` is the method that **must be defined** for all the API entities.

In [7]:
my_sample = api.retrieve(['SAMEA115657829'])
print(len(my_sample))
print(my_sample[0].entity)

1
{'characteristics': {'Effective(%)': [{'text': '99.16'}], 'Error(%)': [{'text': '0.01'}], 'GC(%)': [{'text': '63.0'}], 'Library_Flowcell_Lane': [{'text': 'MKDN240001563-1A_22GYCMLT3_L5'}], 'Q20(%)': [{'text': '98.18'}], 'Q30(%)': [{'text': '94.85'}], 'Raw data': [{'text': '6378478200'}], 'Raw reads': [{'text': '42523188'}], 'SRA accession': [{'text': 'ERS20120838'}], 'analysis date': [{'text': '2023-07-01T00:00:00Z'}], 'biome': [{'text': 'soil'}], 'biome.1': [{'text': 'soil'}], 'broad-scale environmental context': [{'text': 'temperate biome'}], 'center': [{'text': 'HMGU'}], 'checklist': [{'text': 'ERC000022'}], 'collection date': [{'text': '2023-05-01T00:00:00Z'}], 'cryoprotectant': [{'text': 'none'}], 'cultivation': [{'text': 'not provided'}], 'depth': [{'text': 'not provided'}], 'elevation': [{'text': 'not provided'}], 'environmental medium': [{'text': 'Bulk soil'}], 'freezing method': [{'text': 'not provided'}], 'geographic location (country and/or sea)': [{'text': 'Germany'}], 'g

Please note that you can **provide multiple accessions** as elements of the list.
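Accessions copied from old notes often carry stray spaces, duplicates or the wrong identifier type (such as an `ERS` accession). A small sketch to tidy a list before passing it to `api.retrieve`, assuming BioSamples accessions start with `SAM` (like `SAMEA115657829`); `clean_accessions` is a hypothetical helper, not part of `biobroker`:

```python
def clean_accessions(accessions):
    """Strip whitespace and drop blanks and duplicates, keeping first-seen order.

    Raises ValueError for entries not starting with "SAM" (assumed BioSamples prefix),
    so typos surface before any request is made.
    """
    seen = set()
    cleaned = []
    for raw in accessions:
        accession = raw.strip()
        if not accession or accession in seen:
            continue
        if not accession.startswith("SAM"):
            raise ValueError(f"Not a BioSamples accession: {accession!r}")
        seen.add(accession)
        cleaned.append(accession)
    return cleaned
```

You would then call `api.retrieve(clean_accessions(my_accessions))`, where `my_accessions` stands for your own (hypothetical) list.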