# Use ElliptiCBn to measure the ellipticity of cucurbituril hosts

This software measures host ellipticity using the following method:

<img src="https://github.com/harmsm/ElliptiC/raw/main/images/pipeline_image.png" width="600px"/>

1. Extract the coordinates of all heavy (non-hydrogen) atoms from an xyz file.

2. Identify separate molecules by finding strongly-connected components.

3. Identify candidate hosts by filtering on aspect ratio, which differentiates between long, skinny molecules and short, fat molecules.

4. Identify the central macrocycle of each host by finding carbons that are not in bonds with oxygen. The cycle identity is further validated by the number of carbons and their connectivity.

5. Use principal component analysis (PCA) to calculate the variance along the two major axes of the host ring.

6. Calculate ellipticity. This is done by two methods:

   A.  *pca_ellip*: $(V_{ax1}-V_{ax2})/V_{ax1}$ where $V_{ax1}$ is the variance on the longest axis (length) and $V_{ax2}$ is the variance on the second-longest axis (width).

   B.  *orig_ellip*: Use the perimeter and largest carbon-to-centroid distance to infer ellipticity.

7. Generate outputs, which include annotated structures and a spreadsheet with ellipticities.
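Steps 2, 5, and 6A above can be illustrated with a short NumPy sketch. This is not ElliptiCBn's actual implementation: the function names `split_molecules` and `pca_ellipticity` are hypothetical, and the sketch assumes that two atoms belong to the same molecule when they are closer than a distance cutoff (2.5 here, matching the notebook's `bond_dist` default). It only shows the idea of grouping atoms by connectivity and then computing $(V_{ax1}-V_{ax2})/V_{ax1}$ from the PCA variances.

```python
import numpy as np


def split_molecules(coords, bond_dist=2.5):
    """Label each atom with a molecule index (hypothetical helper).

    Assumption: atoms closer than bond_dist are bonded, and each connected
    component of the resulting bond graph is treated as one molecule.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise distance matrix; fine for structures with a few thousand atoms.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    bonded = dists < bond_dist

    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search from an atom that has no label yet.
        stack = [start]
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(bonded[i]):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def pca_ellipticity(ring_coords):
    """Return (V_ax1 - V_ax2) / V_ax1 for a set of ring atom coordinates."""
    ring_coords = np.asarray(ring_coords, dtype=float)
    centered = ring_coords - ring_coords.mean(axis=0)
    # Eigenvalues of the covariance matrix are the variances along the
    # principal axes; sort them from largest to smallest.
    variances = np.sort(np.linalg.eigvalsh(np.cov(centered, rowvar=False)))[::-1]
    v_ax1, v_ax2 = variances[0], variances[1]
    return (v_ax1 - v_ax2) / v_ax1


# Example: a flat ring with semi-axes 2 and 1, sampled at 12 evenly spaced angles.
angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
ring = np.column_stack([2.0 * np.cos(angles), np.sin(angles), np.zeros(12)])
print(pca_ellipticity(ring))
```

A perfectly circular ring gives an ellipticity near 0, and the value approaches 1 as the ring flattens toward a line.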


*ElliptiCBn is a collaboration between the Pluth and Harms labs at the University of Oregon*

Arman Garcia, Michael Shavlik PhD, Mike Harms PhD, Mike Pluth PhD

For more details, please see the documentation:

https://github.com/harmslab/ElliptiC

A manuscript describing this approach is forthcoming.

In [1]:
#@title Set up environment.

#@markdown This cell configures the computing environment to run ElliptiCBn. Run
#@markdown this cell before uploading an XYZ file.

#@markdown To run the cell, click the "Play" button to the left.

try:
    import google.colab
    RUNNING_IN_COLAB = True
except ImportError:
    RUNNING_IN_COLAB = False
except Exception as e:
    err = "Could not figure out if running in a colab notebook\n"
    raise Exception(err) from e

if RUNNING_IN_COLAB:
    %pip install git+https://github.com/harmsm/ElliptiC
    #%pip install -q ElliptiCBn

# ------------------------------------------------------------------------------
# Imports

import ElliptiCBn as ec

import numpy as np
import pandas as pd

import shutil
import os
import inspect
import re

# Set these here in case the user does not run the cell below.
bond_dist = 2.5
aspect_ratio_filter = 3
oxygen_dist_cutoff = 2.9
min_num_carbons = 10
max_num_carbons = 20
min_cycle_cc_bond_length = 1.3
max_cycle_cc_bond_length = 1.7
output_dir = "."
overwrite = False

# Note that we're getting get_macrocycles doc string because this has the
# relevant user-settable parameters for a colab-level user. If someone wants
# to set parameters to ec.run_all, they can run help on that within a
# python environment.
print("\nParameter descriptions\n")
doc_string = dict(inspect.getmembers(ec.get_macrocycles))["__doc__"]
doc_string = doc_string.split("Returns")[0]

lines = doc_string.split("\n")

print("\n".join(lines[7:]))

Collecting git+https://github.com/harmsm/ElliptiC
  Cloning https://github.com/harmsm/ElliptiC to /tmp/pip-req-build-0o772w_6
  Running command git clone --filter=blob:none --quiet https://github.com/harmsm/ElliptiC /tmp/pip-req-build-0o772w_6
  Resolved https://github.com/harmsm/ElliptiC to commit e3ced33a251733c3991cadffa4438a1335a1f701
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: ElliptiCBn
  Building wheel for ElliptiCBn (setup.py) ... done
  Created wheel for ElliptiCBn: filename=ElliptiCBn-1.2.1-py3-none-any.whl size=15345 sha256=814611d58494d5229ff859aea375c462c24a0c7a475654c8a142974e405f9fea
  Stored in directory: /tmp/pip-ephem-wheel-cache-7l4uef08/wheels/a0/d8/b3/5f0f4eee3692d9cb98579dc1d5cf042a4de699a60f0bd3702a
Successfully built ElliptiCBn
Installing collected packages: ElliptiCBn
Successfully installed ElliptiCBn-1.2.1

Parameter descriptions

    bond_dist : float, default=2.5
        any atoms closer than bond 

In [2]:
#@title Set calculation parameters

#@markdown The default parameters should work for most calculations, but you can
#@markdown edit them if necessary to pull out the correct macrocycles. After
#@markdown setting the parameters, press the "Play" button on the left before
#@markdown proceeding. Parameter descriptions are above.

bond_dist = 2.5 #@param
aspect_ratio_filter = 3 #@param
oxygen_dist_cutoff = 2.9 #@param
min_num_carbons = 10 #@param
max_num_carbons = 20 #@param
min_cycle_cc_bond_length = 1.3 #@param
max_cycle_cc_bond_length = 1.7 #@param


In [29]:
#@title Analyze a file.

#@markdown You can either upload a single .xyz file OR a .zip file containing
#@markdown multiple .xyz files.

#@markdown Press the "Play" button to upload the file.


if RUNNING_IN_COLAB:

    f = google.colab.files.upload()
    filename = list(f.keys())[0]

    out_base = f"{filename}_out"
    if not os.path.isdir(out_base):
        os.mkdir(out_base)

    if filename.split(".")[-1].lower() == "zip":

        input_dir = ".".join(filename.split(".")[:-1])

        shutil.unpack_archive(filename,input_dir)

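        # match_pattern was not defined anywhere in this notebook. Assumption:
        # it should select files ending in .xyz (case-insensitive).
        match_pattern = re.compile(r"\.xyz$", re.IGNORECASE)

        file_input = []
        for root, _, files in os.walk(input_dir):
            for name in files:
                # Skip macOS resource-fork files that end up in zip archives.
                if name.startswith("._"):
                    continue
                if not match_pattern.search(name):
                    continue

                file_input.append(os.path.join(root, name))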

        file_input.sort()

        output_dir = f"{input_dir}_output"

    else:

        file_input = [filename]
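        # input_dir is only defined in the zip branch; build the output
        # directory name from the uploaded file's base name instead.
        output_dir = f"{os.path.splitext(filename)[0]}_output"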

fig = ec.run_all(file_input,
                 bond_dist=bond_dist,
                 aspect_ratio_filter=aspect_ratio_filter,
                 oxygen_dist_cutoff=oxygen_dist_cutoff,
                 min_num_carbons=min_num_carbons,
                 max_num_carbons=max_num_carbons,
                 min_cycle_cc_bond_length=min_cycle_cc_bond_length,
                 max_cycle_cc_bond_length=max_cycle_cc_bond_length,
                 output_dir=output_dir,
                 overwrite=overwrite)

if fig is not None:
    fig.show()


Saving test.zip to test (4).zip
Analyzing test (4)/test/AXOCAU.xyz.
1 macrocycles identified.

Calculating ellipticities for 1 macrocycles.

Results:
    id  size  pca_ellip  orig_ellip
0  0.0    12   0.071895    0.075669

Saving plot to test (4)_output/AXOCAU.xyz.html
Analyzing test (4)/test/BATWAW.xyz.
2 macrocycles identified.

Calculating ellipticities for 2 macrocycles.

Results:
    id  size  pca_ellip  orig_ellip
0  0.0    10   0.037337    0.078137
1  1.0    10   0.045687    0.088511

Saving plot to test (4)_output/BATWAW.xyz.html
Analyzing test (4)/test/CADXOY.xyz.
2 macrocycles identified.

Calculating ellipticities for 2 macrocycles.

Results:
    id  size  pca_ellip  orig_ellip
0  0.0    16   0.180619    0.153017
1  1.0    16   0.099105    0.086870

Saving plot to test (4)_output/CADXOY.xyz.html
Analyzing test (4)/test/GAYSIL.xyz.
3 macrocycles identified.

Calculating ellipticities for 3 macrocycles.

Results:
    id  size  pca_ellip  orig_ellip
0  0.0    20   0.033643    0

In [35]:
#@title Download results.

#@markdown To download the results, press the "Play" button on the left.
#@markdown The zip file will have a spreadsheet with the calculated
#@markdown ellipticities, as well as an html file showing the ellipses
#@markdown graphically.

if RUNNING_IN_COLAB:

    shutil.make_archive(base_name=output_dir,
                        format="zip",
                        base_dir=output_dir)
    print(f"Put results in {output_dir}.zip")

    google.colab.files.download(f"{output_dir}.zip")

Put results in test (4)_output.zip

