# Welcome to the interactive Galaxy IPython Notebook.

You can access your data via the dataset number. Using a Python kernel, you can access dataset number 42 with ``handle = open(get(42), 'r')``.
To save data, write your data to a file, and then call ``put('filename.txt')``. The dataset will then be available in your Galaxy history.

When using a non-Python kernel, ``get`` and ``put`` are available as command-line tools, which can be called through system calls in R, Julia, and Ruby. For example, to read dataset number 42 into R, you can write ``handle <- file(system('get -i 42', intern = TRUE))``.
To save data in R, write the data to a file and then call ``system('put -p filename.txt')``.
Notebooks can be saved to Galaxy by clicking the large green button at the top right of the IPython interface.

More help and information can be found on the project [website](https://github.com/bgruening/docker-jupyter-notebook).

In [1]:
!pip install jsonapi_client

Collecting jsonapi_client
  Downloading jsonapi_client-0.9.9-py3-none-any.whl (33 kB)
Collecting aiohttp
  Downloading aiohttp-3.8.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
Collecting async-timeout<5.0,>=4.0.0a3
  Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Collecting charset-normalizer<4.0,>=2.0
  Downloading charset_normalizer-3.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (199 kB)
Collecting aiosignal>=1.1.2
  Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Collecting multidict<7.0,>=4.5
  Downloading multidict-6.0.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (121 kB)
Collecting frozenlist>=1.1.1
  Downloading frozenlist-1.4.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux

In [16]:
from jsonapi_client import Session
import pandas as pd
import requests
import os

# MGnify study accessions to fetch
study_accessions = ["MGYS00002220", "MGYS00002352", "MGYS00005656", "MGYS00002353", "MGYS00002266"]

# Label of the download to fetch for each study, as listed in the MGnify downloads
#data_type = "Taxonomic assignments SSU"
data_type = "Phylum level taxonomies SSU"

data_output_folder = 'outputs/collection'
os.makedirs(data_output_folder, exist_ok=True)

for study_accession in study_accessions:

    print(study_accession)

    # Collect the metadata and download URL of every file offered for this study
    with Session("https://www.ebi.ac.uk/metagenomics/api/v1") as mgnify:

        dfs = []
        for r in mgnify.iterate(f'studies/{study_accession}/downloads'):
            df = pd.json_normalize(r.json)
            df['url'] = str(r.links.self)
            dfs.append(df)

    try:
        # One row per downloadable file for this study
        # (pd.concat raises ValueError if the study has no downloads)
        main_df = pd.concat(dfs)

        # Pick the URL of the file whose label matches data_type
        # (.iloc[0] raises IndexError if no file has that label)
        url = main_df.loc[main_df["attributes.description.label"] == data_type, "url"].iloc[0]
        response = requests.get(url)
        response.raise_for_status()  # do not save an HTTP error page as data

        data_output_path = os.path.join(data_output_folder, f"{study_accession}.txt")
        with open(data_output_path, "w") as f:
            f.write(response.text)

        # Add the file to the Galaxy history (put() is provided by the Galaxy notebook)
        put(data_output_path)

    except (ValueError, KeyError, IndexError, requests.RequestException):
        print(f"Could not fetch data for: {study_accession} using the label: {data_type}")

MGYS00002220
MGYS00002352
MGYS00005656
Could not fetch data for: MGYS00005656 using the label: Phylum level taxonomies SSU
MGYS00002353
MGYS00002266
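The `except` branch above only reports that something failed; it does not say whether MGYS00005656 lacks a file with that label or whether the request itself failed. A minimal sketch of two hypothetical helpers, `pick_download_url` and `available_labels`, that work on the `main_df` table built in the loop (columns `attributes.description.label` and `url`). They make the missing-label case explicit and list which labels a study does offer.

```python
import pandas as pd

LABEL_COL = "attributes.description.label"


def pick_download_url(downloads: pd.DataFrame, label: str):
    """Return the first download URL whose label equals `label`, or None if there is none."""
    if downloads.empty or LABEL_COL not in downloads.columns:
        return None
    matches = downloads.loc[downloads[LABEL_COL] == label, "url"]
    return matches.iloc[0] if not matches.empty else None


def available_labels(downloads: pd.DataFrame):
    """Return the sorted, de-duplicated download labels offered for a study."""
    if LABEL_COL not in downloads.columns:
        return []
    return sorted(downloads[LABEL_COL].dropna().unique().tolist())
```

Inside the loop, `url = pick_download_url(main_df, data_type)` followed by a `None` check could replace the `.iloc[0]` lookup, and printing `available_labels(main_df)` for a failing study would show which labels to try instead.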
