merging current docs to main (#571)
Co-authored-by: thetechnocrat-dev <josh.mcmenemy@openzyme.bio>
acashmoney and thetechnocrat-dev committed Aug 4, 2023
1 parent d9f2a0b commit cc1a1fb
Showing 5 changed files with 542 additions and 326 deletions.
11 changes: 11 additions & 0 deletions docs/docs/tutorials/protein-folding-nft-minting.md
@@ -0,0 +1,11 @@
---
title: Minting ProofOfScience tokens
sidebar_label: ProofOfScience NFTs
sidebar_position: 3
---

import OpenInColab from '../../src/components/OpenInColab.js';

The following interactive notebook demonstrates minting ProofOfScience tokens with PLEX. It extends the Protein Folding tutorial with a few additional modules appended at the end. To use the notebook, open the Google Colab link below.

<OpenInColab link="https://colab.research.google.com/drive/1312M2VOx_YpTFgy60ZYChgR9h3a7aorr?usp=sharing"></OpenInColab>
73 changes: 33 additions & 40 deletions docs/docs/tutorials/protein-folding.md
@@ -1,18 +1,24 @@
---
title: Folding proteins with ColabFold
sidebar_label: Protein Folding
sidebar_position: 3
sidebar_position: 2
---

import OpenInColab from '../../src/components/OpenInColab.js';

<OpenInColab link="https://colab.research.google.com/drive/1AmxLoU5W2vYoi9KDw9IDoj3k4ijSCqoh?usp=sharing"></OpenInColab>
<OpenInColab link="https://colab.research.google.com/drive/1AfnJ50Ei4_9KXdKgexwdmEiwwDDXsfWJ?usp=sharing"></OpenInColab>

## Protein folding in silico

Predicting how proteins fold is a crucial step in drug discovery: it helps researchers understand the 3D structure of experimental proteins and identify potential drug targets. With PLEX, predicting a protein's 3D structure from its amino acid sequence is streamlined and efficient.
In this tutorial, we perform protein folding with PLEX.

In this tutorial, we'll walk through an example of how to use PLEX to predict a protein's 3D structure using [ColabFold](https://www.nature.com/articles/s41592-022-01488-1).
There are multiple reasons we believe PLEX is a new standard for computational biology 🧫:
1. With a simple Python interface, running containerised tools on your data takes only a few commands
2. The infrastructure of the compute network is fully open source - use the public network or work with us to set up your own node
3. Every event on the compute network is tracked - results are no longer lost in an interactive compute session. You can base your decisions and publications on fully reproducible results.
4. We made adding new tools to the network as easy as possible - moving your favorite tool to PLEX is one JSON document away.

We'll walk through an example of how to use PLEX to predict a protein's 3D structure using [ColabFold](https://www.nature.com/articles/s41592-022-01488-1). We will use the sequence of the Streptavidin protein for this demo.

![img](../../static/img/protein-folding-graphic.png)

@@ -47,29 +53,19 @@ We'll download a `.fasta` file containing the sequence of the protein we want to


```python
!pip install requests

import requests

def download_file(url, directory, filename=None):
    local_filename = filename if filename else url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(os.path.join(directory, local_filename), 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return local_filename
!wget https://rest.uniprot.org/uniprotkb/P22629.fasta -O {dir_path}/P22629.fasta # Streptavidin
```

url = 'https://rest.uniprot.org/uniprotkb/P22629.fasta' # Streptavidin
--2023-08-01 21:39:21-- https://rest.uniprot.org/uniprotkb/P22629.fasta
Resolving rest.uniprot.org (rest.uniprot.org)... 193.62.193.81
Connecting to rest.uniprot.org (rest.uniprot.org)|193.62.193.81|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 264 [text/plain]
Saving to: ‘/content/project/P22629.fasta’

fasta_filepath = download_file(url, dir_path)
```
/content/project/P2 100%[===================>] 264 --.-KB/s in 0s

Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (2.27.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests) (2023.5.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests) (3.4)
2023-08-01 21:39:21 (144 MB/s) - ‘/content/project/P22629.fasta’ saved [264/264]
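The `!wget` line relies on IPython's shell escape, so it only works inside a notebook. For a plain Python script, here is a minimal standard-library sketch of the same download step, plus a reader that pulls the sequence back out of the downloaded FASTA file. `fetch_file` and `read_fasta_sequence` are hypothetical helpers written for this page, not part of PLEX, and the reader assumes an ordinary FASTA layout (a `>` header line followed by possibly wrapped sequence lines).

```python
import os
import shutil
import urllib.request


def fetch_file(url, directory, filename=None):
    """Download `url` into `directory` and return the local file path.

    Hypothetical standard-library helper; not part of PLEX.
    """
    local_name = filename or url.rstrip("/").split("/")[-1]
    local_path = os.path.join(directory, local_name)
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
        # copyfileobj streams the body to disk in chunks instead of
        # loading the whole response into memory.
        shutil.copyfileobj(response, out)
    return local_path


def read_fasta_sequence(path):
    """Return (header, sequence) for the first record in a FASTA file."""
    header, parts = None, []
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    break  # a second record begins; stop here
                header = line[1:]
            elif header is not None:
                parts.append(line)  # sequence lines may be wrapped
    return header, "".join(parts)
```

For example, `fasta_filepath = fetch_file('https://rest.uniprot.org/uniprotkb/P22629.fasta', dir_path)` saves the file under `dir_path`, the project directory later passed to `plex_create`, and `read_fasta_sequence(fasta_filepath)` returns its header and sequence.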


## Fold the protein
@@ -79,16 +75,14 @@ With the sequence downloaded, we can now use ColabFold to fold the protein.
```python
from plex import CoreTools, plex_create

sequences = [fasta_filepath]

initial_io_cid = plex_create(CoreTools.COLABFOLD_MINI.value, dir_path)
```

Plex version (v0.8.3) up to date.
Temporary directory created: /tmp/2604ada3-04ec-4d58-9ecc-1e65134c15674117000244
Plex version (v0.8.4) up to date.
Temporary directory created: /tmp/9ed8c638-c1b0-43da-bf92-7f054517d45c2889128719
Reading tool config: QmcRH74qfqDBJFku3mEDGxkAf6CSpaHTpdbe1pMkHnbcZD
Creating IO entries from input directory: /content/project
Initialized IO file at: /tmp/2604ada3-04ec-4d58-9ecc-1e65134c15674117000244/io.json
Initialized IO file at: /tmp/9ed8c638-c1b0-43da-bf92-7f054517d45c2889128719/io.json
Initial IO JSON file CID: QmUhysTE4aLZNw2ePRMCxHWko868xmQoXnGP25fKM1aofb

This code prepares the folding job: `plex_create` builds the initial IO JSON from the input directory and returns its CID. Next, we run the job to complete the operation.
@@ -99,29 +93,28 @@ from plex import plex_run
completed_io_cid, completed_io_filepath = plex_run(initial_io_cid, dir_path)
```

Plex version (v0.8.3) up to date.
Created working directory: /content/project/03ef6ae4-b2ff-424b-894c-05f8fbe48888
Initialized IO file at: /content/project/03ef6ae4-b2ff-424b-894c-05f8fbe48888/io.json
Plex version (v0.8.4) up to date.
Created working directory: /content/project/2ef79c16-6f59-4e44-aea7-c39db85280cb
Initialized IO file at: /content/project/2ef79c16-6f59-4e44-aea7-c39db85280cb/io.json
Processing IO Entries
Starting to process IO entry 0
Job running...
Bacalhau job id: ac42f8de-1fea-4e09-9644-75c940bdbd5c
Bacalhau job id: 476d232b-e1c6-42d6-b1c0-2f4d237244b1

Computing default go-libp2p Resource Manager limits based on:
- 'Swarm.ResourceMgr.MaxMemory': "6.8 GB"
- 'Swarm.ResourceMgr.MaxFileDescriptors': 524288

Applying any user-supplied overrides on top.
Run 'ipfs swarm limit all' to see the resulting limits.

Success processing IO entry 0
Finished processing, results written to /content/project/03ef6ae4-b2ff-424b-894c-05f8fbe48888/io.json
Finished processing, results written to /content/project/2ef79c16-6f59-4e44-aea7-c39db85280cb/io.json
Completed IO JSON CID: QmdnjMsUar6nTqGwgjCwN1Fyjaan4i3zyht9SE9L235YRm
2023/07/20 04:50:10 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Receive-Buffer-Size for details.

## Viewing the results

After the job is complete, we can retrieve and view the results.
After the job is complete, we can retrieve and view the results. The state of each object is recorded in a JSON file, and every file has a unique content address.


```python
@@ -190,4 +183,4 @@ with open(completed_io_filepath, 'r') as f:

The output is a JSON file with information about the folded protein structures. This can be used for further analysis, visualization, and more.
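Because files are tracked by content address, a quick way to see what a run produced is to pull every CID out of the completed IO JSON. The exact schema is not shown on this page, so the sketch below avoids assuming any key names: it walks the parsed JSON and reports every string shaped like a CIDv0 (the `Qm…` identifiers that appear in the logs above), together with its JSON path. `find_cids` and `list_cids` are hypothetical helpers, not part of PLEX, and they deliberately ignore CIDv1 strings.

```python
import json
import re

# A CIDv0 is a base58btc-encoded SHA-256 multihash: 46 characters beginning with "Qm".
# The base58 alphabet omits 0, O, I and l.
CIDV0 = re.compile(r"Qm[1-9A-HJ-NP-Za-km-z]{44}")


def find_cids(node, path="$"):
    """Yield (json_path, cid) for every string in `node` that is a CIDv0."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_cids(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_cids(value, f"{path}[{index}]")
    elif isinstance(node, str) and CIDV0.fullmatch(node):
        yield path, node


def list_cids(io_json_path):
    """Load an IO JSON file and return every CIDv0 it contains, in document order."""
    with open(io_json_path) as f:
        return list(find_cids(json.load(f)))
```

For example, `list_cids(completed_io_filepath)` returns `(path, cid)` pairs for the completed run; each CID can then be retrieved from IPFS, for instance with `ipfs get <cid>`.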

<OpenInColab link="https://colab.research.google.com/drive/1AmxLoU5W2vYoi9KDw9IDoj3k4ijSCqoh?usp=sharing"></OpenInColab>
<OpenInColab link="https://colab.research.google.com/drive/1AfnJ50Ei4_9KXdKgexwdmEiwwDDXsfWJ?usp=sharing"></OpenInColab>
