
In [1]:
!pip install num2words
import num2words

Collecting num2words
  Downloading num2words-0.5.10-py3-none-any.whl (101 kB)
[?25l[K     |███▎                            | 10 kB 16.6 MB/s eta 0:00:01[K     |██████▌                         | 20 kB 21.9 MB/s eta 0:00:01[K     |█████████▊                      | 30 kB 27.3 MB/s eta 0:00:01[K     |█████████████                   | 40 kB 30.3 MB/s eta 0:00:01[K     |████████████████▏               | 51 kB 28.5 MB/s eta 0:00:01[K     |███████████████████▍            | 61 kB 31.1 MB/s eta 0:00:01[K     |██████████████████████▋         | 71 kB 30.6 MB/s eta 0:00:01[K     |█████████████████████████▉      | 81 kB 18.1 MB/s eta 0:00:01[K     |█████████████████████████████   | 92 kB 19.7 MB/s eta 0:00:01[K     |████████████████████████████████| 101 kB 8.3 MB/s 
Installing collected packages: num2words
Successfully installed num2words-0.5.10


Import the libraries used throughout the notebook:

* **requests** is used to make HTTP calls
* **json** is used to encode Python objects as JSON strings and to decode JSON strings back into Python objects
* **string** is used to perform text manipulation and checking
* **getpass** is used to do non-echoing password input

In [2]:
import requests
import json
import string
import getpass

**base_url** holds the URL of the SEEK instance that the notebook talks to.

**headers** holds the HTTP headers that are sent with every HTTP call:

* **Content-type: application/vnd.api+json** - indicates that any data sent will be in JSON API format
* **Accept: application/vnd.api+json** - indicates that the notebook expects any data returned to be in JSON API format
* **Accept-Charset: ISO-8859-1** - indicates that the notebook expects any text returned to be in ISO-8859-1 character set

In [3]:
base_url = 'https://sandbox10.fairdomhub.org/'

headers = {"Content-type": "application/vnd.api+json",
           "Accept": "application/vnd.api+json",
           "Accept-Charset": "ISO-8859-1"}
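To see what these headers amount to on a single request, here is a stdlib-only sketch that attaches the same dict to a `urllib.request.Request` and prints it without sending anything. The body is a made-up, minimal JSON:API document used only for illustration; it is not a payload the notebook sends.

```python
import json
import urllib.request

# Same headers as the notebook's `headers` dict
headers = {"Content-type": "application/vnd.api+json",
           "Accept": "application/vnd.api+json",
           "Accept-Charset": "ISO-8859-1"}

# Hypothetical minimal JSON:API body, for illustration only
body = json.dumps({"data": {"type": "projects", "attributes": {"title": "Example"}}})

# Build (but do not send) a POST request that carries those headers
req = urllib.request.Request(
    "https://sandbox10.fairdomhub.org/projects",
    data=body.encode("utf-8"),
    headers=headers,
    method="POST",
)

# urllib normalises header names with str.capitalize(), e.g. "Accept-charset"
for name, value in req.header_items():
    print(f"{name}: {value}")
```

HTTP header names are case-insensitive, so the normalised spelling does not change what the server receives.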

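Create a **requests** HTTP **Session**. A **Session** keeps settings such as **headers** and applies them to every request it sends, reusing the underlying connection where possible.

The **authorization** uses HTTP Basic authentication with a username and password; the user is prompted for both.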

In [4]:
session = requests.Session()
session.headers.update(headers)
session.auth = (input('Username: '), getpass.getpass('Password: '))

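Setting `session.auth` to a `(username, password)` tuple makes requests use HTTP Basic authentication, so every call carries an `Authorization` header derived from the credentials. The sketch below builds that header by hand with the standard library to show what is actually sent; `basic_auth_header` is a hypothetical helper and the credentials are placeholders. Basic credentials are only base64-encoded, not encrypted, so they should travel over HTTPS only, as they do with the sandbox URL.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (hypothetical helper)."""
    # requests encodes str credentials as Latin-1 before base64-encoding them
    token = f"{username}:{password}".encode("latin-1")
    return "Basic " + base64.b64encode(token).decode("ascii")

# Placeholder credentials, not real ones
header = basic_auth_header("alice", "s3cret")
print(header)
```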


In [25]:
project_data = '''{
  "data": {
    "type": "projects",
    "attributes": {
      "discussion_links": [],
      "avatar": null,
      "title": "TBD",
      "description": "",
      "web_page": null,
      "wiki_page": null,
      "default_license": "CC-BY-4.0",
      "start_date": null,
      "end_date": null,
      "default_policy": {
        "access": "download",
        "permissions": [
          {
            "resource": {
              "id": "2",
              "type": "people"
            },
            "access": "manage"
          }
        ]
      },
      "members": [
        {
          "person_id": "2",
          "institution_id": "1"
        }
      ],
      "use_default_policy": true
    },
    "relationships": {
      "project_administrators": {
        "data": [
          {
            "id": "2",
            "type": "people"
          }
        ]
      },
      "people": {
        "data": [
          {
            "id": "2",
            "type": "people"
          }
        ]
      },
      "institutions": {
        "data": [
          {
            "id": "1",
            "type": "institutions"
          }
        ]
      }
    }
  }
}
'''

In [26]:
from num2words import num2words

from datetime import datetime


In [27]:
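def create_project(datafile_count):
  # Timestamp the title so that each run produces a distinct project title
  time = datetime.now().strftime("%m/%d/%Y, %H:%M:%S")
  # Parse the template afresh so that each call gets an independent dict
  project = json.loads(project_data)
  # e.g. 10 -> "Ten datafiles project at <timestamp>"
  project['data']['attributes']['title'] = num2words(datafile_count).capitalize() + ' datafiles project at ' + time
  return project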
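Parsing a string template on each call, as `create_project` does with `json.loads(project_data)`, gives every payload its own dict, so editing one never affects the template or earlier payloads. A stdlib sketch of the pattern with a cut-down stand-in template (`template` and `make_payload` are hypothetical names, and the numbers are spelled out by hand instead of by num2words):

```python
import json
from datetime import datetime

# Cut-down stand-in for project_data; the real template has many more fields
template = '{"data": {"type": "projects", "attributes": {"title": "TBD"}}}'

def make_payload(title: str) -> dict:
    # json.loads builds a brand-new dict on every call
    payload = json.loads(template)
    payload["data"]["attributes"]["title"] = title
    return payload

stamp = datetime.now().strftime("%m/%d/%Y, %H:%M:%S")
first = make_payload("Ten datafiles project at " + stamp)
second = make_payload("Twenty datafiles project at " + stamp)
print(first["data"]["attributes"]["title"])
print(second["data"]["attributes"]["title"])
```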

In [28]:
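def handle_project(datafile_count):
  # POST the new project; raise an HTTPError if SEEK rejects it
  r = session.post(base_url + 'projects', json=create_project(datafile_count))
  r.raise_for_status()
  j = r.json()
  print(j)
  # The id of the created project is in the JSON:API response body
  project_id = j['data']['id']
  return project_id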

In [29]:
%time handle_project(10)

HTTPError: ignored
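Colab hides the details of the failure here (`HTTPError: ignored`). A JSON:API error response can carry an `errors` array describing what went wrong, with optional `title`, `detail` and `source.pointer` members. The hypothetical helper below flattens such a document into readable lines; the example error document is invented for illustration, not captured from the sandbox.

```python
from typing import List

def describe_jsonapi_errors(body: dict) -> List[str]:
    """Flatten a JSON:API error document into one readable line per error."""
    messages = []
    for err in body.get("errors", []):
        # "detail", "title" and "source" are all optional in a JSON:API error object
        pointer = err.get("source", {}).get("pointer", "")
        text = err.get("detail") or err.get("title") or "unknown error"
        messages.append(f"{pointer}: {text}" if pointer else text)
    return messages

# Invented error document, shaped like a JSON:API validation failure
example = {"errors": [
    {"title": "Invalid", "detail": "Title can't be blank",
     "source": {"pointer": "/data/attributes/title"}},
    {"title": "Forbidden"},
]}
for line in describe_jsonapi_errors(example):
    print(line)
```

In the notebook, the helper could be applied inside an `except requests.HTTPError as e:` block to `e.response.json()`, assuming the server returned a JSON body.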

In [None]:
study_data = '''{
  "data": {
    "type": "studies",
    "attributes": {
      "title": "TBD",
      "description": ""
    },
    "relationships": {
      "investigation": {
        "data": {
          "id": "1",
          "type": "investigations"
        }
      },
      "submitter": {
        "data": [
          {
            "id": "2",
            "type": "people"
          }
        ]
      }
    }
  }
}'''

In [None]:
assay_data = {
    "data": {
        "type": "assays",
        "attributes": {
            "title": "TBD",
            "assay_class": {
              "key": "EXP"
            },
            "assay_type": {
              "uri": "http://jermontology.org/ontology/JERMOntology#Transcriptomics"
            },
            "technology_type": {
              "uri": "http://jermontology.org/ontology/JERMOntology#RNA-Seq"
            }
        },
        "relationships": {
            "creators": {
                "data": [
                    {
                        "id": "2",
                        "type": "people"
                    }
                ]
            },
            "submitter": {
                "data": [
                    {
                        "id": "2",
                        "type": "people"
                    }
                ]
            },
            "study": {
                "data": {
                    "id": "TBD",
                    "type": "studies"
                }
            },
            "people": {
                "data": [
                    {
                        "id": "2",
                        "type": "people"
                    }
                ]
            }
        }
    }
}

In [None]:
from pprint import pprint

In [None]:
import copy


In [None]:
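def create_study(assay_count):
  # Hypothetical helper modelled on create_project: handle_study needs it, but
  # its definition is not shown, so the title format here is an assumption
  time = datetime.now().strftime("%m/%d/%Y, %H:%M:%S")
  study = json.loads(study_data)
  study['data']['attributes']['title'] = num2words(assay_count).capitalize() + ' assays study at ' + time
  return study

def handle_study(assay_count):
  # Create the study, then build one assay payload per assay, all linked to it
  r = session.post(base_url + 'studies', json=create_study(assay_count))
  r.raise_for_status()
  study_id = r.json()['data']['id']

  singletons = []
  for i in range(1, assay_count + 1):
    # Deep-copy the template so that no nested dict is shared between payloads
    s = copy.deepcopy(assay_data)
    s['data']['attributes']['title'] = 't_' + str(i)
    s['data']['relationships']['study']['data']['id'] = study_id
    singletons.append(s)

  return singletons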
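Building one assay payload per assay from the shared `assay_data` template needs a deep copy. `dict.copy()` only copies the top level, so nested dicts such as `data` stay shared and editing them edits the template; `copy.deepcopy()` duplicates every level. A short stdlib demonstration on a cut-down template:

```python
import copy

template = {"data": {"attributes": {"title": "TBD"}}}

# dict.copy() is shallow: the nested "data" dict is shared with the template
shallow = template.copy()
shallow["data"]["attributes"]["title"] = "t_1"
print(template["data"]["attributes"]["title"])  # prints t_1: the template changed too

template["data"]["attributes"]["title"] = "TBD"  # reset the template

# copy.deepcopy() duplicates every nested level, so the template stays intact
deep = copy.deepcopy(template)
deep["data"]["attributes"]["title"] = "t_2"
print(template["data"]["attributes"]["title"])  # prints TBD
```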

In [None]:
def post_assays(posts):
  for s in posts:
    r = session.post(base_url + 'assays', json=s)
    r.raise_for_status()
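`%time` reports only the total for a whole batch of posts. For a stress test it can be more informative to record each request's latency, so that any slow-down as the study grows becomes visible. The sketch below uses a hypothetical helper, `time_each`, that wraps any send function with `time.perf_counter`; `fake_send` stands in for a real POST so that the example runs offline.

```python
import statistics
import time
from typing import Callable, Iterable, List

def time_each(send: Callable[[object], object], payloads: Iterable[object]) -> List[float]:
    """Call send(payload) for each payload and return the wall-clock seconds per call."""
    durations = []
    for payload in payloads:
        start = time.perf_counter()
        send(payload)
        durations.append(time.perf_counter() - start)
    return durations

# Stand-in for posting one assay; in the notebook this could be
# lambda s: session.post(base_url + 'assays', json=s).raise_for_status()
def fake_send(payload):
    time.sleep(0.001)

durations = time_each(fake_send, range(20))
print(f"n={len(durations)} mean={statistics.mean(durations):.4f}s max={max(durations):.4f}s")
```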

In [None]:
%time post_assays(handle_study(1))

In [None]:
%time post_assays(handle_study(20))

In [None]:
%time post_assays(handle_study(200))

In [None]:
%time post_assays(handle_study(2000))

In [None]:
def search_assays(query):
  # Pass the query as params so that requests URL-encodes it
  r = session.get(base_url + 'search', params={'search_type': 'assays', 'q': query})
  r.raise_for_status()

In [None]:
%time search_assays('t_3')
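Search terms have to be URL-encoded: a term containing spaces, `&` or `#` would otherwise corrupt the query string. requests performs this encoding for values passed via `params=`, and the standard library exposes the same form encoding through `urllib.parse.urlencode`. A sketch with a hypothetical `search_url` helper:

```python
from urllib.parse import urlencode

base_url = 'https://sandbox10.fairdomhub.org/'

def search_url(query: str) -> str:
    """Build the assay search URL with the query string properly encoded."""
    return base_url + 'search?' + urlencode({'search_type': 'assays', 'q': query})

print(search_url('t_3'))             # plain terms pass through unchanged
print(search_url('RNA-Seq & more'))  # space becomes '+', '&' becomes '%26'
```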