<a href="https://colab.research.google.com/github/holehouse-lab/supportingdata/blob/master/other/cider_colab/localcider_kappa_calculator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CIDER colab notebook
This notebook lets you reproduce the basic sequence parameter analysis offered by [CIDER](http://pappulab.wustl.edu/CIDER/analysis/), and allows FASTA file upload as well as individual sequence analysis.

See the end of the notebook for more detailed instructions on how to use it.

In [19]:
#@title Step 1: Setup
#@markdown Run this cell to set up the notebook (press the right-facing triangle on the left!)

from IPython.utils import io


print('Setting up the environment....')
with io.capture_output() as captured:
  !pip install localcider
  !pip install protfasta


  # NOTE: the stdlib `io` module is deliberately not imported here, because
  # it would shadow the IPython `io` helper imported at the top of this cell.
  from google.colab import files
  from tqdm import tqdm

  import numpy as np
  import os
  import protfasta

  from localcider import plots
  from localcider.sequenceParameters import SequenceParameters

Setting up the environment....


In [None]:
#@title Step 2: Upload and analysis
#@markdown <h1>Enter an amino acid sequence or upload a FASTA file:</h1>

#@markdown *NB*: A sequence cannot be provided at the same time as a FASTA file.

# define the function that will be called when the form is submitted
def process_form():
    uploaded = files.upload()

    # get filename
    try:
      # this ENSURES we overwrite an existing
      # file if it was there before...
      fn = list(uploaded.keys())[0]
      with open(fn,'wb') as fh:
        fh.write(uploaded[fn])
    except Exception:
      raise Exception('No file uploaded')

    # Load protein objects
    try:
      protein_objs = protfasta.read_fasta(fn, return_list=True, expect_unique_header=False, invalid_sequence_action='convert')
    except TypeError:
      print("Received TypeError: perhaps you didn't upload a file?")
      raise

    return protein_objs



sequence = "" #@param {type:"string"}
fasta_file = False #@param {type:"boolean"}


if fasta_file and len(sequence) > 0:
  raise Exception("Cannot specify both a sequence and a list of sequences from a FASTA file simultaneously")

if not fasta_file and len(sequence) == 0:
  raise Exception("Must specify either a sequence or a list of sequences from a FASTA file")

if fasta_file:
    protein_objs = process_form()

else:
    protein_objs = [['input_sequence', sequence]]


# ANALYSIS from here on down

c = 0
outstring = 'localcider_properties.csv'
print('Calculating property values...')
with open(outstring,'w') as fh:
  fh.write('FASTA ID, FCR, NCPR, hydrophobicity, kappa, disorder_promoting, sequence\n')
  for k in tqdm(protein_objs):
    og = k[0]
    seq = k[1]

    uid = og.replace(',',';') # remove commas so we can make a legit csv

    try:
      S = SequenceParameters(seq)
      fh.write(f"{uid}, {S.get_FCR()}, {S.get_NCPR()}, {S.get_uversky_hydropathy()}, {S.get_kappa()}, {S.get_fraction_disorder_promoting()}, {seq}\n")
      c = c + 1
    except Exception:
      print(f'ERROR on sequence {og}: {seq}')
      print('Skipping....')


files.download(outstring)
print(f'\nCalculated properties for {c} out of {len(protein_objs)} sequences!')


# How to use this notebook!
Colab notebooks execute code in your browser using Google's computing infrastructure. The notebook is made up of cells. Each cell executes a different bit of code. To use this notebook, you need to

1. Execute the setup cell (Step 1). This should take ~30 seconds to run but ONLY needs to be run once. Running a cell involves pressing the right-facing arrow on its left (i.e. a "play" button).

2. Then run the second cell (Step 2), which takes either a FASTA file or a single sequence. The sequence (or sequences) will then be analyzed, and a CSV file with the various sequence properties will be downloaded. You're done!
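
To give a feel for what two of the CSV columns mean, here is a minimal plain-Python sketch of FCR (fraction of charged residues) and NCPR (net charge per residue). This is an illustration only, not localcider's implementation: it assumes K and R count as positive, D and E as negative, and every other residue (including histidine) as neutral. The function names and the example sequence are made up for this sketch.

```python
# Illustrative only: hand-rolled versions of two of the reported properties.
# The notebook itself relies on localcider's SequenceParameters for real values.

POSITIVE = set("KR")  # lysine, arginine (assumed positively charged)
NEGATIVE = set("DE")  # aspartate, glutamate (assumed negatively charged)


def fraction_charged(seq: str) -> float:
    """Fraction of residues that are K, R, D or E."""
    seq = seq.upper()
    charged = sum(1 for aa in seq if aa in POSITIVE or aa in NEGATIVE)
    return charged / len(seq)


def net_charge_per_residue(seq: str) -> float:
    """(count of K/R minus count of D/E) divided by the sequence length."""
    seq = seq.upper()
    pos = sum(1 for aa in seq if aa in POSITIVE)
    neg = sum(1 for aa in seq if aa in NEGATIVE)
    return (pos - neg) / len(seq)


example = "KKEEGGSS"  # hypothetical 8-residue sequence
print(fraction_charged(example))        # 0.5
print(net_charge_per_residue(example))  # 0.0
```

A sequence can therefore be highly charged (large FCR) while still being net neutral (NCPR near zero), which is why the two values are reported separately.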

#### What is a FASTA file?
FASTA files are just text files with the following format:

```
>identifier 1
KNVPPGS......VTKEGWVATKKAR


>identifier 2
GQLKATV......ERRSRRSLPTSA

>identifier 3
MAWFSKK......PRIAVDTEPQAE

...

>identifier n
VPAASAS......GKKRKGAAALLA

```

Where the `...` just indicates sequence we're skipping over in this example.
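
To make the format concrete, the sketch below parses FASTA-formatted text into `[identifier, sequence]` pairs using only plain Python. It is a simplified illustration, not how protfasta reads files; the function name `parse_fasta` and the toy records are made up here.

```python
def parse_fasta(text):
    """Return a list of [identifier, sequence] pairs from FASTA-formatted text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(">"):
            # a '>' line starts a new record; the rest of the line is the identifier
            records.append([line[1:].strip(), ""])
        elif records:
            # a sequence may be split over several lines, so keep appending
            records[-1][1] += line
    return records


# Toy input with made-up identifiers and sequences
demo = """>protein_A
MKVLA
GSPE

>protein_B
GSDEEKRRG
"""

print(parse_fasta(demo))
# [['protein_A', 'MKVLAGSPE'], ['protein_B', 'GSDEEKRRG']]
```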


Note that a text file should have the extension `.txt`. In macOS, you can create a text file using the program TextEdit. To do this, create a new document and then select Format -> Make Plain Text. You can now create a file that follows the formatting described above, and in this way upload multiple sequences at once.
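
If you would rather not use a text editor, a FASTA-formatted `.txt` file can also be written from Python. The following is a minimal sketch: the helper name `write_fasta`, the identifiers, the sequences, and the output filename are all placeholders chosen for illustration.

```python
import tempfile
from pathlib import Path


def write_fasta(records, path):
    """Write (identifier, sequence) pairs to `path` in FASTA format."""
    lines = []
    for identifier, sequence in records:
        lines.append(f">{identifier}")  # header line starts with '>'
        lines.append(sequence)          # sequence goes on the following line
    Path(path).write_text("\n".join(lines) + "\n")


# Made-up records, purely for illustration
records = [("protein_A", "MKVLAGSPE"), ("protein_B", "GSDEEKRRG")]

# Written to the system temp directory here; any writable location works
out_path = Path(tempfile.gettempdir()) / "my_sequences.txt"
write_fasta(records, out_path)
print(out_path.read_text())
```

The resulting file can then be uploaded in Step 2 with the `fasta_file` box ticked.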


## Changelog

### V1 (2024-02-28)
* Initial release

### V1.1 (2024-08-02)
* Added additional CIDER parameters
* Added ability to paste sequence or use a FASTA file