# Data Mining of Human Skin Microbiome from EBI-Metagenomics Portal

_Matin Nuhamunada_<sup>1*</sup>, _Gregorius Altius Pratama_<sup>1</sup>, _Setianing Wikanthi_<sup>2</sup>, and _Mohamad Khoirul_<sup>1</sup>

<sup>1</sup>Department of Tropical Biology, Universitas Gadjah Mada;   
Jl. Teknika Selatan, Sekip Utara, Bulaksumur, Yogyakarta, Indonesia, 55281;   

<sup>2</sup>Department of Agricultural Microbiology, Universitas Gadjah Mada;  

*Correspondence: [matin_nuhamunada@ugm.ac.id](mailto:matin_nuhamunada@mail.ugm.ac.id)

---
## Abstract
The human skin microbiome is unique to each individual and is shaped by many factors, including behaviour, environment, and possibly genetics. To better understand the distribution of the human skin microbiome across the globe, we compared several skin microbiome studies available in the EBI-Metagenomics portal. Study data were acquired using the EBI-Metagenomics API, and samples were selected based on sex, location, and body site. The biological observation matrices (BIOM) from the analysis results of the selected samples were compared using MEGAN.

### Keywords
Human Skin, Microbiome, EBI-Metagenomics


## Import Python Modules

In [2]:
from pandas import DataFrame

try:
    from urllib import urlencode
except ImportError:
    from urllib.parse import urlencode

In [3]:
from jsonapi_client import Session, Filter

API_BASE = 'https://www.ebi.ac.uk/metagenomics/api/latest/'

In [4]:
import pycurl
import os, sys
import pandas as pd

## Get Study
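The study list is exported as CSV from the portal search for the term `skin` within the `root:Host-associated:Human` biome lineage:

https://www.ebi.ac.uk/metagenomics/projects/doExportDetails?searchTerm=skin&includingChildren=true&biomeLineage=root%3AHost-associated%3AHuman&search=Search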

In [5]:
filename = 'data.csv'
print(filename)
if not os.path.isfile(filename):
    with open(filename, 'wb') as f:
        c = pycurl.Curl()
        c.setopt(c.URL, 'https://www.ebi.ac.uk/metagenomics/projects/doExportDetails?searchTerm=skin&includingChildren=true&biomeLineage=root%3AHost-associated%3AHuman&search=Search')
        c.setopt(c.WRITEDATA, f)
        c.perform()
        c.close()

data.csv
done
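
The export link used in the download cell is an ordinary query string, so it can be composed from its parts instead of being pasted whole. The following is a minimal sketch using only the standard library; the helper name `build_export_url` is hypothetical, and the parameter names are simply those visible in the link above.

```python
from urllib.parse import urlencode

# Base path of the portal's CSV export, copied from the link used in this notebook
EXPORT_BASE = 'https://www.ebi.ac.uk/metagenomics/projects/doExportDetails'


def build_export_url(search_term, biome_lineage):
    """Compose the export URL from its query parameters (hypothetical helper)."""
    params = [
        ('searchTerm', search_term),
        ('includingChildren', 'true'),
        ('biomeLineage', biome_lineage),
        ('search', 'Search'),
    ]
    # urlencode percent-encodes the colons in the lineage (':' becomes '%3A')
    return EXPORT_BASE + '?' + urlencode(params)


export_url = build_export_url('skin', 'root:Host-associated:Human')
print(export_url)
```

Changing the search term or lineage then only requires different arguments, which may be convenient when repeating the search for another biome.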


In [7]:
# This script extracts the table data from the CSV file
df1 = pd.read_csv("data.csv")
print(df1)

     Study ID                                         Study Name  \
0   ERP104068  EMG produced TPA metagenomics assembly of the ...   
1   SRP002480  Gene-Environment Interactions at the Skin Surface   
2   ERP018577  Human skin bacterial and fungal microbiotas an...   
3   ERP022958  Impact of the Mk VI SkinSuit on skin microbiot...   
4   ERP019566  Longitudinal study of the diabetic skin and wo...   
5   ERP016629  Microbiome samples derived from Buruli ulcer w...   
6   ERP021525  Micromes on salmon skin and surrounding sea water   
7   SRP056364  Skin microbiome in human volunteers inoculated...   
8   ERP104518                  skin microbiota in infected frogs   
9   ERP104520                 Skin microbiota of Scinax alcatraz   
10  ERP104516  Variations on the diversity of amphibian skin ...   

    Number Of Samples Submitted Date  Analysis NCBI Project ID  \
0                  45     2017-11-15  Finished      PRJEB22388   
1                2560     2016-02-03  Finished     

In [32]:
df2 = df1.set_index("Study ID", drop = False)
#print(df2["Study Abstract"])
df3 = df2["Study ID"]
#print(df3)

In [31]:
# Open one session and reuse it for every study instead of reconnecting each time
with Session(API_BASE) as s:
    # Iterating over the Series yields the study accessions directly
    for study_id in df3:
        study = s.get('studies', study_id).resource
        print('Study id:', study.id)
        print('Study name:', study.study_name)
        print('Study abstract:', study.study_abstract)
        for biome in study.biomes:
            print('Biome:', biome.biome_name, biome.lineage)
        print('_____________________________________________________________')

Study id: ERP104068
Study name: EMG produced TPA metagenomics assembly of the Raw reads of the microbiota of premature infant mouth, skin, and gut (human gut metagenome) data set
Study abstract: The human gut metagenome Third Party Annotation (TPA) assembly was derived from the primary whole genome shotgun (WGS) data set PRJNA327106. This project includes samples from the following biomes : Human gut.
Biome: Human root:Host-associated:Human
_____________________________________________________________
Study id: SRP002480
Study name: Gene-Environment Interactions at the Skin Surface
Study abstract: 16S rRNA gene sequences amplified from subjects with eczema and age-matched healthy controls.  Microbes living in and on humans are ten times more numerous than human cells. Culture-based methods have been the primary techniques used to study microbes inhabiting humans; however, many species are not successfully grown in culture. The NIH Roadmap for Medical Research Human Microbiome Project (

In [7]:
study1 = 'SRP002480'

### List samples with biomes for the given study

Get study: https://www.ebi.ac.uk/metagenomics/api/latest/studies/SRP002480

List samples: https://www.ebi.ac.uk/metagenomics/api/latest/studies/SRP002480/samples


Fetch samples for the given study accession: https://www.ebi.ac.uk/metagenomics/api/latest/samples?study_accession=SRP002480
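
The three links above follow one pattern relative to `API_BASE`, so they can be derived from a study accession. This is a small sketch under that assumption; `study_endpoints` is a hypothetical helper, not part of `jsonapi_client` or the EBI API.

```python
from urllib.parse import urlencode, urljoin

API_BASE = 'https://www.ebi.ac.uk/metagenomics/api/latest/'


def study_endpoints(accession):
    """Return the study, sample-list, and filtered-sample URLs for one study."""
    study_url = urljoin(API_BASE, 'studies/' + accession)
    query = urlencode({'study_accession': accession})
    return {
        'study': study_url,
        'samples': study_url + '/samples',
        'samples_filtered': urljoin(API_BASE, 'samples') + '?' + query,
    }


endpoints = study_endpoints('SRP002480')
for name, url in endpoints.items():
    print(name, url)
```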


In [75]:
df = DataFrame(columns=('sample name', 'lineage', 'sex', 'sample metadata', 'description'))
df.index.name = 'accession'

with Session(API_BASE) as s:
    params = {
        'study_accession': study1,
        'page_size': 100,
    }
    f = Filter(urlencode(params))
    for sample in s.iterate('samples', f):
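        df.loc[sample.accession] = [
            sample.sample_name,
            sample.biome.id,
            # NOTE: the entry at a given position in sample_metadata differs
            # between samples, so these two positional lookups may not match
            # the 'sex' and 'sample metadata' column labels
            sample.sample_metadata[0]["value"],
            sample.sample_metadata[1]["value"],
            sample.sample_desc
        ]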
df

| accession | sample name | lineage | sex | sample metadata | description |
|---|---|---|---|---|---|
| SRS731971 | MET0299 | root:Host-associated:Human:Skin | 38.984652 N 77.094709 W | USA | Human Skin Metagenome |
| SRS731972 | MET0300 | root:Host-associated:Human:Skin | 38.984652 N 77.094709 W | USA | Human Skin Metagenome |
| SRS731973 | MET0301 | root:Host-associated:Human:Skin | 38.984652 N 77.094709 W | USA | Human Skin Metagenome |
| SRS731974 | MET0302 | root:Host-associated:Human:Skin | 38.984652 N 77.094709 W | USA | Human Skin Metagenome |
| SRS731975 | MET0303 | root:Host-associated:Human:Skin | 38.984652 N 77.094709 W | USA | Human Skin Metagenome |
| SRS731976 | MET0304 | root:Host-associated:Human:Skin | 38.984652 N 77.094709 W | USA | Human Skin Metagenome |
| SRS732091 | MET0305 | root:Host-associated:Human:Skin | 38.984652 N 77.094709 W | USA | Human Skin Metagenome |
| SRS732092 | MET0306 | root:Host-associated:Human:Skin | 38.984652 N 77.094709 W | USA | Human Skin Metagenome |
| SRS732093 | MET0308 | root:Host-associated:Human:Skin | 38.984652 N 77.094709 W | USA | Human Skin Metagenome |
| SRS732094 | MET0309 | root:Host-associated:Human:Skin | 38.984652 N 77.094709 W | USA | Human Skin Metagenome |
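
In the table above, the column labelled `sex` holds a coordinate string rather than a sex value for these samples. If such strings are to be used for selecting samples by location, they first need converting to signed decimal degrees. The sketch below assumes the four-token `value hemisphere value hemisphere` layout seen in these ten rows; other layouts are not handled, and `parse_lat_lon` is a hypothetical helper.

```python
def parse_lat_lon(text):
    """Convert a string such as '38.984652 N 77.094709 W' to (lat, lon) in signed degrees."""
    lat_value, lat_hemi, lon_value, lon_hemi = text.split()
    # Southern latitudes and western longitudes are negative by convention
    latitude = float(lat_value) * (-1 if lat_hemi.upper() == 'S' else 1)
    longitude = float(lon_value) * (-1 if lon_hemi.upper() == 'W' else 1)
    return latitude, longitude


coords = parse_lat_lon('38.984652 N 77.094709 W')
print(coords)  # (38.984652, -77.094709)
```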


In [72]:
#print(sample.accession)
#print(sample.analysis_completed)
#print(sample.as_resource_identifier_dict)
#print(sample.attributes)
#print(sample.biome)
#print(sample.biosample)
#print(sample.collection_date)
#print(sample.commit)
#print(sample.create_map)
#print(sample.delete)
#print(sample.dirty_fields)
#print(sample.environment_biome)
#print(sample.environment_feature)
#print(sample.environment_material)
#print(sample.fields)
#print(sample.geo_loc_name)
#print(sample.host_tax_id)
#print(sample.id)
#print(sample.is_dirty)
print(sample.sample_metadata[1])
print(sample.sample_metadata[0])
print(sample.sample_desc)

{'value': 'inguinal crease', 'unit': None, 'key': 'body site'}
{'value': 'male', 'unit': None, 'key': 'sex'}
Non-tumor DNA sample from inguinal crease of a human male participant in the dbGaP study "Skin Microbiome in Disease States: Atopic Dermatitis and Immunodeficiency"
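
Each metadata entry printed above carries its own `key`, so values can be looked up by name rather than by list position. A minimal sketch, assuming every entry is a dict with `key` and `value` fields as in this output; `metadata_to_dict` is a hypothetical helper.

```python
def metadata_to_dict(sample_metadata):
    """Map each metadata key (lower-cased) to its value."""
    return {entry['key'].lower(): entry['value'] for entry in sample_metadata}


# The two entries printed above, used here as example input
example_metadata = [
    {'value': 'male', 'unit': None, 'key': 'sex'},
    {'value': 'inguinal crease', 'unit': None, 'key': 'body site'},
]

meta = metadata_to_dict(example_metadata)
print(meta.get('sex'), '|', meta.get('body site'))  # male | inguinal crease
# A key that is absent yields None instead of raising an error
print(meta.get('collection date'))  # None
```

Looking values up by key keeps the column labels of a sample table correct even when the API returns the entries in a different order for different samples.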


In [76]:
df.to_csv('List Sample'+study1+'.csv',index=True,header=True)

In [8]:
def get_metadata(metadata, key):
    import html
    for m in metadata:
        if m['key'].lower() == key.lower():
            value = m['value']
            unit = html.unescape(m['unit']) if m['unit'] else ""
            return "{value} {unit}".format(value=value, unit=unit)
    return None

depth_label = 'geographic location (depth)'
temp_label = 'temperature'
df = DataFrame(columns=('sample name', 'biome', 'temperature', 'depth', 'longitude', 'latitude'))
df.index.name = 'accession'

with Session(API_BASE) as s:
    params = {
        'study_accession': study1,
        'include': 'biome',
        'page_size': 100,
    }
    f = Filter(urlencode(params))
    for sample in s.iterate('samples', f):
        df.loc[sample.accession] = [
            sample.sample_name, sample.biome.id,
            get_metadata(sample.sample_metadata, temp_label),
            get_metadata(sample.sample_metadata, depth_label),
            sample.longitude, sample.latitude
        ]
df

| accession | sample name | biome | temperature | depth | longitude | latitude |
|---|---|---|---|---|---|---|
| SRS731971 | MET0299 | root:Host-associated:Human:Skin | | | | |
| SRS731972 | MET0300 | root:Host-associated:Human:Skin | | | | |
| SRS731973 | MET0301 | root:Host-associated:Human:Skin | | | | |
| SRS731974 | MET0302 | root:Host-associated:Human:Skin | | | | |
| SRS731975 | MET0303 | root:Host-associated:Human:Skin | | | | |
| SRS731976 | MET0304 | root:Host-associated:Human:Skin | | | | |
| SRS732091 | MET0305 | root:Host-associated:Human:Skin | | | | |
| SRS732092 | MET0306 | root:Host-associated:Human:Skin | | | | |
| SRS732093 | MET0308 | root:Host-associated:Human:Skin | | | | |
| SRS732094 | MET0309 | root:Host-associated:Human:Skin | | | | |
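
The abstract describes selecting samples by sex, location, and body site. One way this selection step might look, once the metadata has been collected per sample, is sketched below. The records, field names, and the `select_samples` helper are all hypothetical and serve only to illustrate the filtering; they are not data from the study.

```python
def select_samples(records, **criteria):
    """Return the records whose fields equal every given criterion (case-insensitive)."""
    def matches(record):
        return all(
            str(record.get(field, '')).lower() == str(wanted).lower()
            for field, wanted in criteria.items()
        )
    return [record for record in records if matches(record)]


# Hypothetical records for illustration only
records = [
    {'accession': 'S1', 'sex': 'male', 'location': 'USA', 'body_site': 'inguinal crease'},
    {'accession': 'S2', 'sex': 'female', 'location': 'USA', 'body_site': 'inguinal crease'},
    {'accession': 'S3', 'sex': 'male', 'location': 'USA', 'body_site': 'antecubital fossa'},
]

selected = select_samples(records, sex='male', body_site='inguinal crease')
print([record['accession'] for record in selected])  # ['S1']
```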


In [None]:
# https://www.ebi.ac.uk/metagenomics//projects/SRP002480/samples/SRS451457/runs/SRR919567/results/versions/2.0/taxonomy/OTU-table-HDF5-BIOM
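
The link above names a study, a sample, a run, and a pipeline version. If the same path layout holds for other runs, which is an assumption based only on this single link, the address of an OTU table could be composed as in the sketch below. `otu_biom_url` is a hypothetical helper, and it emits a single slash after `metagenomics` on the assumption that the doubled slash in the link is incidental.

```python
PORTAL_BASE = 'https://www.ebi.ac.uk/metagenomics'


def otu_biom_url(study, sample, run, version='2.0'):
    """Compose the OTU table (HDF5 BIOM) link for one run, following the link above."""
    return (
        f'{PORTAL_BASE}/projects/{study}/samples/{sample}/runs/{run}'
        f'/results/versions/{version}/taxonomy/OTU-table-HDF5-BIOM'
    )


biom_url = otu_biom_url('SRP002480', 'SRS451457', 'SRR919567')
print(biom_url)
```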