|
6 | 6 | "id": "jMGcXXPabEN4"
|
7 | 7 | },
|
8 | 8 | "source": [
|
9 |
| - "# Uni-Fold Colab\n", |
| 9 | + "# Uni-Fold Notebook\n", |
10 | 10 | "\n",
|
11 |
| - "This Colab notebook provides an online runnable version of [Uni-Fold](https://github.com/dptech-corp/Uni-Fold/) for users to predict the structure of a protein, single chain or multimer, with custom settings.\n", |
| 11 | + "This notebook provides a protein structure prediction service based on [Uni-Fold](https://github.com/dptech-corp/Uni-Fold/) and [UF-Symmetry](https://www.biorxiv.org/content/10.1101/2022.08.30.505833v1). Predictions of both protein monomers and multimers are supported. The homology search in this notebook uses the [MMSeqs2](https://github.com/soedinglab/MMseqs2.git) server provided by [ColabFold](https://github.com/sokrypton/ColabFold). For results more consistent with the original AlphaFold(-Multimer), please refer to the open-source repository of [Uni-Fold](https://github.com/dptech-corp/Uni-Fold/), or our convenient web server at [Hermite™](https://hermite.dp.tech/).\n", |
12 | 12 | "\n",
|
13 |
| - "Thanks to [MMSeqs2](https://github.com/soedinglab/MMseqs2.git) and the server provided by [ColabFold](https://github.com/sokrypton/ColabFold), the homogeneous searching in this notebook is very fast and is comparable with the original AlphaFold(-Multimer). If you want more consistent results with the original AlphaFold(-Multimer), you can use the [full open source Uni-Fold](https://github.com/dptech-corp/Uni-Fold/), or the convenient web server at [Hermite™](https://hermite.dp.tech/).\n", |
14 |
| - "\n", |
15 |
| - "Please note that this Colab notebook is not a finished product and is provided as an early-access prototype. It is provided for theoretical modeling only and caution should be exercised in its use. \n", |
| 13 | + "Please note that this notebook is provided as an early-access prototype, and is NOT an official product of DP Technology. It is provided for theoretical modeling only and caution should be exercised in its use. \n", |
16 | 14 | "\n",
|
17 | 15 | "**Licenses**\n",
|
18 | 16 | "\n",
|
|
23 | 21 | "\n",
|
24 | 22 | "Please cite the following papers if you use this notebook:\n",
|
25 | 23 | "\n",
|
26 |
| - "* Jumper et al. \"[Highly accurate protein structure prediction with AlphaFold.](https://doi.org/10.1038/s41586-021-03819-2)\" Nature (2021)\n", |
27 |
| - "* Evans et al. \"[Protein complex prediction with AlphaFold-Multimer.](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1)\" biorxiv (2021)\n", |
28 |
| - "* Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. \"[ColabFold: Making protein folding accessible to all.](https://www.nature.com/articles/s41592-022-01488-1)\" Nature Methods (2022) \n", |
29 | 24 | "* Ziyao Li, Xuyang Liu, Weijie Chen, Fan Shen, Hangrui Bi, Guolin Ke, Linfeng Zhang. \"[Uni-Fold: An Open-Source Platform for Developing Protein Folding Models beyond AlphaFold.](https://www.biorxiv.org/content/10.1101/2022.08.04.502811v1)\" bioRxiv (2022)\n",
|
30 | 25 | "* Ziyao Li, Shuwen Yang, Xuyang Liu, Weijie Chen, Han Wen, Fan Shen, Guolin Ke, Linfeng Zhang. \"[Uni-Fold Symmetry: Harnessing Symmetry in Folding Large Protein Complexes.](https://www.biorxiv.org/content/10.1101/2022.08.30.505833v1)\" bioRxiv (2022)\n",
|
31 |
| - "\n", |
| 26 | + "* Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. \"[ColabFold: Making protein folding accessible to all.](https://www.nature.com/articles/s41592-022-01488-1)\" Nature Methods (2022)\n", |
32 | 27 | "\n",
|
33 | 28 | "**Acknowledgements**\n",
|
34 | 29 | "\n",
|
35 |
| - "We thank [@sokrypton](https://twitter.com/sokrypton) for many helpful suggestions to this notebook.\n" |
| 30 | + "The model architecture of Uni-Fold is largely based on [AlphaFold](https://doi.org/10.1038/s41586-021-03819-2) and [AlphaFold-Multimer](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1). The design of this notebook draws directly on [ColabFold](https://www.nature.com/articles/s41592-022-01488-1). We especially thank [@sokrypton](https://twitter.com/sokrypton) for his helpful suggestions on this notebook.\n", |
| 31 | + "\n", |
| 32 | + "Copyright © 2022 DP Technology. All rights reserved." |
36 | 33 | ]
|
37 | 34 | },
|
38 | 35 | {
|
|
127 | 124 | "output_dir_base = \"./prediction\"\n",
|
128 | 125 | "os.makedirs(output_dir_base, exist_ok=True)\n",
|
129 | 126 | "\n",
|
130 |
| - "\n", |
131 | 127 | "def clean_and_validate_sequence(\n",
|
132 | 128 | " input_sequence: str, min_length: int, max_length: int) -> str:\n",
|
133 | 129 | " \"\"\"Checks that the input sequence is ok and returns a clean version of it.\"\"\"\n",
|
|
203 | 199 | "def add_hash(x,y):\n",
|
204 | 200 | " return x+\"_\"+hashlib.sha1(y.encode()).hexdigest()[:5]\n",
|
205 | 201 | "\n",
|
| 202 | + "jobname = 'unifold_colab' #@param {type:\"string\"}\n", |
206 | 203 | "\n",
|
207 | 204 | "sequence_1 = 'LILNLRGGAFVSNTQITMADKQKKFINEIQEGDLVRSYSITDETFQQNAVTSIVKHEADQLCQINFGKQHVVCTVNHRFYDPESKLWKSVCPHPGSGISFLKKYDYLLSEEGEKLQITEIKTFTTKQPVFIYHIQVENNHNFFANGVLAHAMQVSI' #@param {type:\"string\"}\n",
|
208 | 205 | "sequence_2 = '' #@param {type:\"string\"}\n",
|
209 | 206 | "sequence_3 = '' #@param {type:\"string\"}\n",
|
210 | 207 | "sequence_4 = '' #@param {type:\"string\"}\n",
|
211 | 208 | "\n",
|
| 209 | + "#@markdown Use symmetry group `C1` for default Uni-Fold predictions.\n", |
| 210 | + "#@markdown Or, specify a **cyclic** symmetry group (e.g. `C4`) and\n", |
| 211 | + "#@markdown the sequences of the asymmetric unit (i.e. **do not copy\n", |
| 212 | + "#@markdown them multiple times**) to predict with UF-Symmetry.\n", |
| 213 | + "\n", |
212 | 214 | "symmetry_group = 'C1' #@param {type:\"string\"}\n",
|
213 | 215 | "\n",
|
214 | 216 | "use_templates = True #@param {type:\"boolean\"}\n",
|
215 | 217 | "msa_mode = \"MMseqs2\" #@param [\"MMseqs2\",\"single_sequence\"]\n",
|
216 | 218 | "\n",
|
217 | 219 | "input_sequences = [sequence_1, sequence_2, sequence_3, sequence_4]\n",
|
218 | 220 | "\n",
|
219 |
| - "jobname = 'unifold_colab' #@param {type:\"string\"}\n", |
220 |
| - "\n", |
221 | 221 | "basejobname = \"\".join(input_sequences)\n",
|
222 | 222 | "basejobname = re.sub(r'\\W+', '', basejobname)\n",
|
223 | 223 | "target_id = add_hash(jobname, basejobname)\n",
|
|
1046 | 1046 | },
|
1047 | 1047 | "gpuClass": "standard",
|
1048 | 1048 | "kernelspec": {
|
1049 |
| - "display_name": "Python 3.8.10 ('ProteinMD')", |
| 1049 | + "display_name": "Python 3.8.10 64-bit", |
1050 | 1050 | "language": "python",
|
1051 | 1051 | "name": "python3"
|
1052 | 1052 | },
|
|
1056 | 1056 | },
|
1057 | 1057 | "vscode": {
|
1058 | 1058 | "interpreter": {
|
1059 |
| - "hash": "af92dc656850d97b5469b75c9ef2009aaa936e713f0093b069a7ff14eeb2ca8d" |
| 1059 | + "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" |
1060 | 1060 | }
|
1061 | 1061 | }
|
1062 | 1062 | },
|
|