# Use ellipticbn to measure the ellipticity of cucurbituril hosts


### Description

*ElliptiCBn is a collaboration between the Pluth and Harms labs at the University of Oregon*

Arman Garcia, Michael Shavlik PhD, Mike Harms PhD, Mike Pluth PhD

A manuscript describing the software is forthcoming.



#### ellipticbn performs the following steps

<img src="https://github.com/harmsm/ElliptiC/raw/main/images/pipeline_image.png" width="600px"/>

1. Extract the coordinates of all C, N, O, and H atoms from an xyz file.

2. Identify separate molecules by finding strongly-connected components.

3. Identify macrocycles using patterns of bonds, cycle connectivity and cycle size. 

4. Use principal component analysis (PCA) to calculate the variance along both major axes of the central cycle.

5. Calculate ellipticity. This is done by two methods:

   A.  *pca_ellip*: $(V_{ax1}-V_{ax2})/V_{ax1}$ where $V_{ax1}$ is the PCA variance on the longest axis (length) and $V_{ax2}$ is the PCA variance on the second-longest axis (width).  

   B.  *orig_ellip*: Use the perimeter and largest carbon-to-centroid distance to infer ellipticity.

6. Generate outputs, which include annotated structures and a spreadsheet with ellipticities.
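
The *pca_ellip* formula above can be illustrated with a short NumPy sketch. This is not the ElliptiCBn implementation; the function name `pca_ellipticity` and the test rings are hypothetical, and the sketch assumes the variances come from the eigenvalues of the coordinate covariance matrix.

```python
import numpy as np

def pca_ellipticity(coords):
    """Return (V_ax1 - V_ax2) / V_ax1 for an (N, 3) array of ring-atom coordinates.

    Hypothetical helper for illustration only.
    """
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)

    # Eigenvalues of the covariance matrix are the variances along the
    # principal axes; sort them from largest to smallest.
    cov = np.cov(centered, rowvar=False)
    variances = np.sort(np.linalg.eigvalsh(cov))[::-1]

    v_ax1, v_ax2 = variances[0], variances[1]
    return (v_ax1 - v_ax2) / v_ax1

# Two made-up planar rings of 12 atoms: a circle and a 6 x 4 ellipse.
theta = np.linspace(0, 2 * np.pi, 12, endpoint=False)
circle = np.column_stack([5 * np.cos(theta), 5 * np.sin(theta), np.zeros_like(theta)])
ellipse = np.column_stack([6 * np.cos(theta), 4 * np.sin(theta), np.zeros_like(theta)])

circle_ellip = pca_ellipticity(circle)
ellipse_ellip = pca_ellipticity(ellipse)
print(f"circle: {circle_ellip:.3f}, ellipse: {ellipse_ellip:.3f}")
```

With this definition a perfectly circular ring gives a value near 0, and the value approaches 1 as the ring flattens toward a line.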

### Input

ElliptiCBn takes molecular structures in [XYZ format](https://en.wikipedia.org/wiki/XYZ_file_format). The first two lines are ignored. We assume the coordinates are in angstroms. XYZ files can be generated from other structure formats using software like [Open Babel](http://openbabel.org). 
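
As a rough illustration of the input format, the sketch below reads an XYZ file in the way described above: it skips the first two lines and keeps the element symbol and three coordinates from each remaining line. The `read_xyz` function and its `keep` argument are hypothetical and are not part of the ellipticbn API; the water molecule is made-up example data.

```python
import io
import numpy as np

def read_xyz(handle, keep=("C", "N", "O", "H")):
    """Read element symbols and coordinates (assumed angstroms) from an XYZ file.

    Hypothetical helper for illustration only.
    """
    # The first two lines (atom count and comment) are ignored.
    lines = handle.read().splitlines()[2:]

    elements, coords = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        symbol = fields[0].capitalize()
        if symbol not in keep:
            continue  # keep only the requested elements
        elements.append(symbol)
        coords.append([float(v) for v in fields[1:4]])

    return elements, np.array(coords, dtype=float).reshape(-1, 3)

example = io.StringIO(
    "3\n"
    "water\n"
    "O 0.000 0.000 0.117\n"
    "H 0.000 0.757 -0.469\n"
    "H 0.000 -0.757 -0.469\n"
)
elements, coords = read_xyz(example)
print(elements, coords.shape)
```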

In [None]:
#@title Set up environment

#@markdown This cell configures the computing environment to run ellipticbn. Run
#@markdown this cell before uploading an XYZ file.

#@markdown To run the cell, click the "Play" button to the left.

try:
    import google.colab
    RUNNING_IN_COLAB = True
except ImportError:
    RUNNING_IN_COLAB = False
except Exception as e:
    err = "Could not figure out if running in a colab notebook\n"
    raise Exception(err) from e

if RUNNING_IN_COLAB:
    !git clone https://github.com/harmslab/ellipticbn
    %cd ellipticbn
    %pip install .

# ------------------------------------------------------------------------------
# Imports

import ellipticbn as ec

import numpy as np
import pandas as pd

import shutil
import os
import inspect
import re

# Set these here in case the user does not run the cell below.
min_num_carbons = 10 
max_num_carbons = 20 
guest_search_radius = 3 
output_dir = "."
overwrite = False

# Note that we're getting get_macrocycles doc string because this has the
# relevant user-settable parameters for a colab-level user. If someone wants
# to set parameters to ec.run_all, they can run help on that within a
# python environment.
print("\nParameter descriptions\n")
doc_string = dict(inspect.getmembers(ec.get_macrocycles))["__doc__"]
doc_string = doc_string.split("Returns")[0]

lines = doc_string.split("\n")

print("\n".join(lines[7:]))

In [None]:
#@title Set calculation parameters

#@markdown The default parameters should work for most calculations, but you can
#@markdown edit them if necessary to pull out the correct macrocycles. After
#@markdown setting the parameters, press the "Play" button on the left before
#@markdown proceeding. Parameter descriptions are above.

min_num_carbons = 10 #@param
max_num_carbons = 20 #@param
guest_search_radius = 3 #@param


In [None]:
#@title Upload a file and run analysis

#@markdown You can either upload a single .xyz file OR a .zip file containing
#@markdown multiple .xyz files.

#@markdown Press the "Play" button to upload the file.


if RUNNING_IN_COLAB:

    f = google.colab.files.upload()
    filename = list(f.keys())[0]

    out_base = f"{filename}_out"
    if not os.path.isdir(out_base):
        os.mkdir(out_base)

    if filename.split(".")[-1].lower() == "zip":

        input_dir = ".".join(filename.split(".")[:-1])

        shutil.unpack_archive(filename,input_dir)

        # Match file names that end in .xyz (case-insensitive)
        match_pattern = re.compile(r"\.xyz$",re.IGNORECASE)
        file_input = []
        for root, _, files in os.walk(input_dir):
            for xyz_file in files:
                # Skip macOS resource-fork files
                if xyz_file.startswith("._"):
                    continue
                if not match_pattern.search(xyz_file):
                    continue

                file_input.append(os.path.join(root,xyz_file))

        file_input.sort()

        output_dir = f"{input_dir}_output"

    else:

        input_dir = ".".join(filename.split(".")[:-1])
        file_input = [filename]
        output_dir = f"{input_dir}_output"

fig = ec.run_all(file_input,
                 min_num_carbons=min_num_carbons,
                 max_num_carbons=max_num_carbons,
                 guest_search_radius=guest_search_radius,
                 output_dir=output_dir,
                 overwrite=overwrite)

if fig is not None:
    fig.show()


In [None]:
#@title Download results

#@markdown To download the results, press the "Play" button on the left.
#@markdown The zip file will have a spreadsheet with the calculated
#@markdown ellipticities, as well as an html file showing the ellipses
#@markdown graphically.

if RUNNING_IN_COLAB:

    shutil.make_archive(base_name=output_dir,
                        format="zip",
                        base_dir=output_dir)
    print(f"Put results in {output_dir}.zip")

    google.colab.files.download(f"{output_dir}.zip")