In [3]:
import AlphaStream as mit

# Blueprints

***
### Things to remember:

> 1. When this window gets annoying, click the arrow pointing down to the left of "Blueprints".
> 2. AlphaStream functions will create files in your current directory.
> 3. Before using the functions, run the first block of code one cell above: `import AlphaStream as mit`
> 4. If you decide to use the `file_name` parameter, make sure it’s a string containing only letters, like `'thisisafilename'`. The filename will be timestamped to avoid overwriting files with the same name.
> 5. This version currently supports protein combinations only.
> 6. In `split`, `ace`, `rico` and `pair`, you can input the sequence of a protein as a string instead of its name. In the output file, the first 7 letters of the sequence will be used as the name for that protein.
> 7. Future versions will include:
>     - Parameters to include ligands, ions, DNA/RNA molecules etc.
>     - A function that lets you run the jobs on a locally installed AlphaFold
***
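
Two of the rules above (letters-only file names and the 7-letter label for sequences) can be checked by hand before you call anything. This is only a rough sketch in plain Python: the helper names `is_letters_only` and `sequence_label` are made up for illustration and are not part of AlphaStream, and the way AlphaStream itself validates names is not shown here.

```python
def is_letters_only(file_name):
    """Return True if file_name contains only ASCII letters (no digits, spaces or underscores)."""
    return file_name.isascii() and file_name.isalpha()


def sequence_label(sequence):
    """Return the first 7 letters of a sequence, the label described in point 6 above."""
    return sequence[:7]


print(is_letters_only("thisisafilename"))
print(is_letters_only("my_file_1"))
print(sequence_label("MGNVKVGIVLCDALDKGWEKK"))
```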


## To get a sequence:
Pass the name of a protein as a string (in quotation marks) to get its sequence from UniProt:
```python
mit.get_uniprot_sequence("mam33")
```  

## SPLIT:
Takes one or more protein names as strings and makes AlphaFold predict each one separately.  
You can add a **list** of counts to change the count for every protein — the order must match the order of the protein names.  
  
**INPUT:**  
   - protein_names: one or more strings  
   - count: integer OR list of integers — must match the order of protein names  
   - file_name: string — use only letters

```python
mit.split("MGNVKVGIVLCDALDKGWEKKKKYPQNIVQLQDADGQLTERSVKIIARKTTLDGLHNIEQAKRHFNQVAADYYEACSVASKYETGIRNPVLGLNVGVPIATEGARALTPPVHWDLGKQDLGDADVEEVIIAELSKRQ")  
mit.split("YBL022C", count = 3)  
mit.split("YBL022C", "Pim1", count = [2,3], file_name = "DenkDirWasAus")
```
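
To see why the order of the counts matters, here is a small sketch of how names and counts line up. It is not AlphaStream code: `match_counts` is a hypothetical helper, and it assumes that a single integer applies to every protein and that the count is 1 when you leave it out.

```python
def match_counts(protein_names, count=1):
    """Pair each protein name with its count, position by position."""
    if isinstance(count, int):
        # Assumption: one integer is used for every protein.
        count = [count] * len(protein_names)
    if len(count) != len(protein_names):
        raise ValueError("count must have one entry per protein name")
    return list(zip(protein_names, count))


# The first count belongs to the first name, the second to the second, and so on.
print(match_counts(["YBL022C", "Pim1"], count=[2, 3]))
```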


## ACE:
This function lets you predict multiple proteins together.  
It takes one or more dictionaries that map protein names to their counts  
and creates one job for every dictionary.  
  
**INPUT:**  
   - protein_compounds: one or more dictionaries like this: `{"key": count, "anotherkey": count}`  
        - key: string — the protein name or sequence  
        - value: integer — the count  
   - file_name: string — use only letters
```python
mit.ace({"Pim1": 1, "YBL022C": 22}, {"whatever_you_want": 3, "something_else": 2}, file_name = "NowYouSeeMe")
```


## RICO:
This function lets you predict a combination of proteins while iterating over a range of counts for each protein.  
You pass *one dictionary* with the proteins you want to predict together and give each of them a range: the different numbers of times that protein can appear in the structure.  
This function creates *combinations of the count ranges* in your dictionary.  
It creates a job for every combination.  
  
**INPUT:**  
   - One single dictionary like this:  
     `{ "ProteinOrSequence": range(1,3), "AnotherProtein": range(2,5) }`  
     **Important Note:** The last digit in `range` is not included. So `range(1,3)` gives 1 and 2.  
   - file_name: string — use only letters
```python
mit.rico({"Pim1": range(1,3), "fcyx": range(2,5)}, file_name = "RicoUndOscar")
```
And this example would create jobs for:  
1x Pim1 and 2x fcyx,  
1x Pim1 and 3x fcyx,  
1x Pim1 and 4x fcyx,  
2x Pim1 and 2x fcyx,  
2x Pim1 and 3x fcyx,  
2x Pim1 and 4x fcyx.
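
The list above is the Cartesian product of the two ranges. If you want to preview the combinations before creating jobs, here is a minimal sketch using only the standard library. `expand_ranges` is a hypothetical helper, not an AlphaStream function, and it assumes the combinations are formed as described above.

```python
from itertools import product


def expand_ranges(protein_ranges):
    """Return one {protein: count} dictionary per combination of counts."""
    names = list(protein_ranges)
    return [dict(zip(names, counts)) for counts in product(*protein_ranges.values())]


jobs = expand_ranges({"Pim1": range(1, 3), "fcyx": range(2, 5)})
for job in jobs:
    print(" and ".join(f"{count}x {name}" for name, count in job.items()))
```

The number of jobs is the product of the range lengths, here 2 × 3 = 6, so wide ranges add up quickly.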

## PAIR:
This function takes a list of proteins (called *protein_subjects*)  
and as many additional proteins (called *protein_objects*) as you want.  
It pairs the protein_subjects (all of them together) with each protein_object, using the specified counts.

**INPUT:**  
   - protein_subjects: a list (e.g. `["Pim1", "fcyx"]`)  
   - protein_objects: any number of proteins as strings  
   - count: list of integers — must match the order of protein_objects (not for protein_subjects)
   - file_name: string — use only letters
```python
mit.pair(["Pim1", "fcyx"], "Mrpl15", "Mrpl20", count = [2, 3], file_name = "PairIsPronouncedPear")
```
The example above will give you jobs for:  
1×Pim1 + 1×fcyx + 2×Mrpl15, and  
1×Pim1 + 1×fcyx + 3×Mrpl20.  

**Note:** There is no count parameter for *protein_subjects*.  
If you want to have 2× Pim1 as a subject, include it twice like this: `["Pim1", "Pim1"]`, ...  
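
To make the pairing rule concrete, here is a sketch of which jobs come out of a given call. It only mirrors the behaviour described above; `pair_jobs` is a hypothetical helper and not how AlphaStream is necessarily implemented.

```python
def pair_jobs(protein_subjects, *protein_objects, count):
    """Build one {protein: count} dictionary per protein_object."""
    jobs = []
    for protein_object, object_count in zip(protein_objects, count):
        job = {}
        # Every subject counts once; listing a subject twice gives it a count of 2.
        for subject in protein_subjects:
            job[subject] = job.get(subject, 0) + 1
        job[protein_object] = job.get(protein_object, 0) + object_count
        jobs.append(job)
    return jobs


print(pair_jobs(["Pim1", "fcyx"], "Mrpl15", "Mrpl20", count=[2, 3]))
print(pair_jobs(["Pim1", "Pim1"], "Mrpl15", count=[2]))
```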





## SEARCH:
Once you've downloaded the AlphaFold results, place the result folder in the same directory as your scripts.  
You can then run the `search` function to extract the results into a DataFrame.
```python
mit.search("name_of_the_folder_with_the_results")
```

The resulting DataFrame ranks the downloaded files by their **highest ranking score** and also lists the corresponding **ptm**, **iptm** and a few more values.  
The ranking score explained by AlphaFold:  
```
For ranking of the full complex use the ranking_score (higher is better). This score uses overall structure confidences (pTM and ipTM), but also includes terms that penalize clashes and encourage disordered regions not to have spurious helices – these extra terms mean the score should only be used to rank structures.

ranking_score: A scalar in the range [-100, 1.5] that can be used for ranking predictions, it incorporates ptm, iptm, fraction_disordered and has_clash into a single number with the following equation: 0.8 × ipTM + 0.2 × pTM + 0.5 × disorder − 100 × has_clash.
```
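
If you want to check a value by hand, the equation quoted above can be written as a small function. This is just a sketch of that formula: the function name `ranking_score` and its argument names are chosen here for illustration, and `search` may compute or read the score differently.

```python
def ranking_score(iptm, ptm, fraction_disordered, has_clash):
    """Combine the four values as in the quoted equation (higher is better)."""
    return 0.8 * iptm + 0.2 * ptm + 0.5 * fraction_disordered - 100 * has_clash


# has_clash is 0 (no clash) or 1 (clash); a clash pushes the score far below zero.
print(ranking_score(iptm=0.7, ptm=0.8, fraction_disordered=0.1, has_clash=0))
print(ranking_score(iptm=0.7, ptm=0.8, fraction_disordered=0.1, has_clash=1))
```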

### Notes & Tips for SEARCH

- When downloading from the AlphaFold server, **select all jobs** (that belong together in your research) **using the checkboxes** — the files will be bundled into a single folder.  
  Use this folder name in the `search` function once you have moved the folder into your directory with the AlphaStream files.
  
- The last columns of the DataFrame correspond to the protein combination used, **but not by name** if you entered a sequence.  
  In that case they show the **first 7 letters** of the amino acid sequence instead.  


# Your Code:

In [5]:
# Message me if anything doesn't work