# <center>`sas_helper` for Protein Study with SAXS and SANS Techniques </center>

## Overview
This Jupyter notebook provides an introduction to two important techniques used in protein studies: Small-Angle X-ray Scattering (SAXS) and Small-Angle Neutron Scattering (SANS). Small-Angle Scattering (SAS) techniques are powerful tools used in the study of protein structure and dynamics in solution. These techniques provide valuable information about the size, shape, flexibility, and interactions of biological macromolecules.

SAS techniques involve measuring the scattering pattern produced when a beam of X-rays or neutrons interacts with a sample. By analyzing the scattering pattern, researchers can gain insights into the structural properties of proteins and other biomolecules.

The term "small-angle" refers to the range of scattering angles typically observed in SAS experiments. These angles correspond to scattering events that deviate only slightly from the direction of the incident beam. The scattered intensity at small angles contains information about the overall shape and size of the particles in the sample.

## Table of Contents
1. [Key Principles of SAXS and SANS](#key-principles)
2. [Main Differences between SAXS and SANS](#differences)
3. [Output: Intensity Curve I(Q) and Fourier Transformation](#output-saxs)
4. [Sample Measurement in Solution for SAXS and SANS Studies](#sample-solution)
5. [Protein Conformation Modeling](#protein-modelling)
6. [sas_helper Module Overview](#sashelper-module-overview)

## Key Principles of SAXS and SANS <a class="anchor" id="key-principles"></a>

### SAXS (Small-Angle X-ray Scattering)
SAXS is a technique used to study the shape, size, and structural properties of biological macromolecules such as proteins. Here are the key principles:

- **Measurement**: In SAXS, a beam of X-rays is directed at a sample containing proteins in solution. The scattered X-rays are then collected and analyzed.

- **Intensity**: The intensity of the scattered X-rays is measured as a function of the scattering angle, typically in the range of 0.1° to 10°. The scattered intensity provides information about the distribution of electron density within the protein sample.

- **Output**: The output of a SAXS experiment is a plot of the scattered intensity $I(\theta)$ as a function of the scattering angle $\theta$. This plot is known as the SAXS curve or scattering profile.

- **Equation**: At low angles (roughly $Q \cdot R_g \lesssim 1.3$ for globular particles), the intensity is well approximated by the Guinier equation:


$$ I(Q) = I_0 \cdot \exp\left(-R_g^2 \cdot Q^2 / 3\right) $$


where:
- $I_0$: Intensity at $Q = 0$ (intercept on the y-axis)
- $R_g$: Radius of gyration, a measure of the protein's overall size
- $Q$: Scattering vector, given by $Q = \frac{4\pi}{\lambda} \sin{\left(\frac{\theta}{2}\right)}$
- $\lambda$: Wavelength of the incident X-rays
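
The definitions above can be tried out numerically. The sketch below is not part of `sas_helper`: the helper names `scattering_vector` and `guinier_fit` are illustrative, and the data are synthetic, generated from assumed values of $R_g$ and $I_0$. It converts a scattering angle to $Q$ and recovers $R_g$ and $I_0$ from a straight-line fit of $\ln I$ against $Q^2$.

```python
import numpy as np

def scattering_vector(theta_deg, wavelength):
    """Q = 4*pi/lambda * sin(theta/2), with theta the full scattering angle in degrees."""
    theta = np.radians(theta_deg)
    return 4.0 * np.pi / wavelength * np.sin(theta / 2.0)

def guinier_fit(q, intensity):
    """Fit ln I = ln I0 - (Rg^2 / 3) * Q^2 and return (Rg, I0)."""
    slope, intercept = np.polyfit(q**2, np.log(intensity), 1)
    return np.sqrt(-3.0 * slope), np.exp(intercept)

# Synthetic Guinier-region data for a particle with Rg = 20 Angstrom (assumed values).
rg_true, i0_true = 20.0, 100.0
q = np.linspace(0.005, 0.06, 30)          # in 1/Angstrom, so Q * Rg <= 1.2
intensity = i0_true * np.exp(-(rg_true * q) ** 2 / 3.0)

rg_fit, i0_fit = guinier_fit(q, intensity)
print(f"Rg = {rg_fit:.2f} A, I0 = {i0_fit:.2f}")

# Q for a 1 degree scattering angle at a wavelength of 1.54 Angstrom
print(f"Q = {scattering_vector(1.0, 1.54):.4f} 1/A")
```

With real data the fit should be restricted to the low-$Q$ points where the Guinier approximation holds, and the result checked for sensitivity to the chosen range.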

### SANS (Small-Angle Neutron Scattering) 
SANS is a complementary technique to SAXS, which uses neutrons instead of X-rays. Here are the key principles:

- **Measurement**: In SANS, a beam of neutrons is directed at the protein sample in solution. The scattered neutrons are then detected and analyzed.

- **Intensity**: The intensity of the scattered neutrons is measured as a function of the scattering angle, similar to SAXS. The scattered intensity provides information about the distribution of nuclear density within the protein sample.

- **Output**: The output of a SANS experiment is also a plot of the scattered intensity $I(\theta)$ as a function of the scattering angle $\theta$, similar to SAXS.

- **Equation**: The intensity in SANS is often written in a factorized form that separates the contrast, the particle shape, and the interparticle interactions:

$$ I(Q) = I_0 \cdot \frac{1}{V} \cdot (\Delta \rho \cdot F(Q))^2 \cdot S(Q) $$

  - $I_0$: Intensity at $Q = 0$ (intercept on the y-axis)
  - $V$: Volume of the scattering particle
  - $\Delta \rho$: Contrast, the difference in scattering length densities between the particle and the solvent
  - $F(Q)$: Form factor, which describes the shape of the particle
  - $S(Q)$: Structure factor, which accounts for interparticle interactions

## Main Differences between SAXS and SANS <a class="anchor" id="differences"></a>

- **Contrast**: SAXS utilizes the contrast in electron density between the protein and solvent, while SANS relies on the contrast in scattering length density. This difference in contrast mechanisms allows for different types of information to be obtained from the two techniques.

- **Deuteration**: Neutrons scatter very differently from hydrogen and deuterium, so the SANS contrast can be tuned by deuterating the protein or the buffer (contrast variation). This makes it possible to highlight or mask individual components of a complex. X-ray scattering depends on electron density, so SAXS is nearly insensitive to hydrogen atoms and to H/D substitution.

- **Resolution**: Both techniques yield low-resolution information (overall size and shape on the nanometre scale); neither resolves individual atomic positions. SANS instruments typically have a broader wavelength spread and beam divergence than SAXS instruments, so SANS analysis generally has to account for the instrumental resolution function (smearing) when a model is compared with the data.

- **Damage**: X-rays used in SAXS can cause radiation damage to proteins, potentially leading to conformational changes or degradation. Neutrons used in SANS have a lower damaging effect, enabling longer exposure times and the study of delicate or radiation-sensitive samples.

- **Atomic Structure Factors**: SAXS measures the electron density distribution, related to the magnitude of the atomic structure factors. SANS measures the nuclear density distribution, related to the magnitude of the nuclear structure factors. This difference in scattering origin contributes to the complementary nature of SAXS and SANS in protein studies.

These differences make SAXS and SANS valuable tools for investigating different aspects of protein structure and dynamics, providing complementary information about protein conformations, overall shape, and interactions at different levels of detail.

## Output: Intensity Curve $I(Q)$ and Fourier Transformation <a class="anchor" id="output-saxs"></a>

The output of both SAXS and SANS experiments is the intensity curve, $I(Q)$, where $Q$ represents the magnitude of the scattering vector. Here's some information about the intensity curve and the Fourier transformation involved:

- **Intensity Curve $I(Q)$**: The intensity curve, $I(Q)$, represents the scattered intensity as a function of the magnitude of the scattering vector, $Q$. The scattering vector, $Q$, is related to the scattering angle, $\theta$, and the incident wavelength of the X-rays or neutrons. The intensity curve provides valuable information about the distribution of scattering centers within the protein sample.

- **Scattering Vector**: The scattering vector, $Q$, is defined as $Q = \frac{4\pi}{\lambda} \cdot \sin{\frac{\theta}{2}}$, where $\lambda$ is the wavelength of the incident radiation and $\theta$ is the scattering angle. By varying the scattering angle, the intensity curve is obtained over a range of $Q$ values, providing different levels of information about the protein structure at different length scales.

- **Reciprocal Space**: In SAXS and SANS, the intensity curve, $I(Q)$, is measured in reciprocal space. Reciprocal space is related to real space by a Fourier transform, so each point of the intensity curve corresponds to a spatial frequency, or equivalently to a distance of roughly $d \approx 2\pi/Q$ in the protein structure.

- **Fourier Transformation**: The Fourier transformation is the mathematical operation that converts the intensity curve from reciprocal space (the scattering pattern, a function of $Q$) to real space (a function of the distance $r$). This transformation allows for the extraction of structural information, such as the overall size, shape, and arrangement of scattering centers in the protein sample.

- **Structural Analysis**: Various structural parameters can be derived from the intensity curve. These include the radius of gyration $(R_g),$ which characterizes the overall size of the protein and can be obtained directly from the low-$Q$ region, and the pair-distance distribution function $(P(r)),$ which is the real-space result of the Fourier transformation and provides information about the distances between scattering centers in the protein.

The Fourier transformation from reciprocal space to real space enables the extraction of valuable structural information from the intensity curve in SAXS and SANS experiments. This information plays a crucial role in understanding the size, shape, and arrangement of proteins in solution.
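
The link between the two representations can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not the method used by `sas_helper`, FoXS, or Pepsi-SAXS: it treats the particle as identical point scatterers inside a sphere (radius and point count are made up), omits the self-terms, builds $P(r)$ as a histogram of pair distances, and obtains $I(Q)$ from $P(r)$ through the $\sin(Qr)/(Qr)$ kernel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "particle": 300 point scatterers filling a sphere of radius 20 Angstrom (assumed).
radius = 20.0
points = rng.uniform(-radius, radius, size=(3000, 3))
points = points[np.linalg.norm(points, axis=1) <= radius][:300]

# Real space: the histogram of all pair distances gives P(r).
diff = points[:, None, :] - points[None, :, :]
dist = np.linalg.norm(diff, axis=-1)
pair_dist = dist[np.triu_indices(len(points), k=1)]
counts, edges = np.histogram(pair_dist, bins=80, range=(0.0, 2 * radius))
r = 0.5 * (edges[:-1] + edges[1:])               # bin centres
p_r = counts / counts.sum()                      # normalised so that sum(P) = 1

# Reciprocal space: I(Q) = sum_r P(r) * sin(Qr)/(Qr), normalised to I(0) = 1.
q = np.linspace(0.0, 0.5, 101)
# np.sinc(x) = sin(pi x)/(pi x), hence the division by pi.
i_q = (p_r[None, :] * np.sinc(q[:, None] * r[None, :] / np.pi)).sum(axis=1)

# Rg from real space: Rg^2 = sum(r^2 P) / (2 sum(P)).
rg = np.sqrt((r**2 * p_r).sum() / 2.0)
print(f"Rg from P(r): {rg:.1f} A (ideal uniform sphere: {np.sqrt(3 / 5) * radius:.1f} A)")
```

Going in the opposite direction, from a measured $I(Q)$ to $P(r)$, is an ill-posed inverse problem in practice because the data are noisy and cover a limited $Q$ range, which is why dedicated regularized methods are normally used for it.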

## Sample Measurement in Solution for SAXS and SANS Studies <a class="anchor" id="sample-solution"></a>

Both SAXS and SANS techniques are well-suited for studying proteins in solution, allowing for the investigation of their structural properties under more biologically relevant conditions. Here's some information about the sample measurement in solution:

- **Non-crystalline Samples**: Unlike X-ray crystallography, where proteins are typically crystallized, SAXS and SANS experiments are performed on proteins in their native or near-native states, which are often maintained in solution. This enables the study of proteins in more physiologically relevant conformations.

- **Flexibility and Conformational Heterogeneity**: Proteins in solution can adopt a range of conformations, including multiple flexible or dynamic states. SAXS and SANS are particularly well-suited for characterizing such conformational heterogeneity, providing information about the ensemble of conformations present in solution.

- **Size and Shape Analysis**: SAXS and SANS measurements in solution provide valuable information about the overall size and shape of proteins, including their radius of gyration and molecular weight. This is especially relevant for proteins that may undergo conformational changes or exist as multi-domain or multi-subunit complexes.

- **Solvent Effects**: SAXS and SANS experiments are sensitive to the presence of solvent molecules surrounding the protein, allowing for the study of solvent-accessible regions and interactions. These techniques provide insights into protein-solvent interactions, hydration, and the influence of the surrounding environment on protein structure and dynamics.

- **Solution Conditions**: The choice of solution conditions, such as pH, temperature, ionic strength, and the presence of ligands or co-factors, can be carefully controlled during SAXS and SANS measurements. This enables the investigation of how different solution conditions affect the protein's structure and dynamics.

Overall, the ability to study proteins in solution using SAXS and SANS provides important insights into their conformational flexibility, size, shape, and interactions under conditions that more closely mimic their natural biological environments.

## Protein Conformation Modeling <a class="anchor" id="protein-modelling"></a>

Protein conformation modeling is a crucial step in understanding the three-dimensional structure and dynamics of proteins. Here, we will focus on two techniques for protein conformation modeling: NOLB and RRT.

### NOLB (Non-Linear rigid Block NMA approach)

- **Overview**: NOLB is a new conceptually simple and computationally efficient method for non-linear normal mode analysis in protein conformation modeling. It aims to capture the complex non-linear dynamics of proteins by introducing a block-wise approximation scheme.

- **Methodology**: NOLB divides the protein into rigid blocks that can move independently, providing a more realistic representation of the protein's conformational space. It uses this block-wise approximation to study the collective motions and fluctuations of proteins, going beyond the linear harmonic motion assumption of traditional NMA methods.

- **Advantages**: NOLB offers several advantages in protein conformation modeling:
    - Conceptual Simplicity: It provides a straightforward and intuitive approach to model the non-linear dynamics of proteins.
    - Computational Efficiency: By working with rigid blocks, NOLB reduces the computational cost compared to traditional NMA methods.
    - Accurate Representation: The block-wise approximation in NOLB captures essential protein dynamics and provides insights into conformational changes.


### RRT (Rapidly-exploring Random Tree Sampling)

- **Overview**: The Rapidly Exploring Random Tree (RRT) method is a powerful algorithm commonly used for exploring high-dimensional configuration spaces. It has been adapted for protein conformation calculations to efficiently sample and navigate the conformational landscape.

- **Methodology**: In the context of protein conformations, RRT constructs a tree-like structure by iteratively expanding random conformations. Starting from an initial conformation, RRT generates new conformations by perturbing the current conformation within defined constraints. These perturbed conformations are then added to the tree, progressively exploring the conformational space.

- **Advantages**: The RRT method offers several advantages for protein conformation calculations:
    - Efficient Sampling: RRT rapidly explores the conformational landscape, allowing for the efficient sampling of diverse protein conformations.
    - Constraint Satisfaction: RRT incorporates constraints, such as bond lengths and angles, to ensure the generated conformations are physically realistic.
    - Scalability: RRT is well-suited for high-dimensional configuration spaces, making it applicable to large proteins and complex molecular systems.

RRT has been successfully applied in various protein-related tasks, including protein folding, structure prediction, and ligand binding studies. Its ability to efficiently navigate the conformational space and generate diverse conformations makes it a valuable tool for exploring protein structures and dynamics.
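
The tree-growing loop described above can be sketched in a few lines. This is a generic, minimal RRT in a two-dimensional torsion-angle space, written for illustration only; it is not the implementation behind `rrt_sample`. The function name `rrt_sample_toy`, its parameters, and the `no_clash` constraint are all hypothetical, and angle periodicity is ignored for simplicity.

```python
import math
import random

def rrt_sample_toy(start, is_valid, num_iter=2000, num_nodes=50, step=0.3, seed=0):
    """Grow a tree of valid conformations in torsion-angle space (radians).

    Returns (nodes, parents); parents[i] is the index of the node that nodes[i] grew from.
    """
    rng = random.Random(seed)
    nodes, parents = [tuple(start)], [-1]
    for _ in range(num_iter):
        if len(nodes) >= num_nodes:
            break
        # 1. Draw a random target conformation.
        target = tuple(rng.uniform(-math.pi, math.pi) for _ in start)
        # 2. Find the tree node closest to the target (plain Euclidean distance).
        nearest = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], target))
        # 3. Step from that node towards the target by at most `step`.
        d = math.dist(nodes[nearest], target)
        frac = min(1.0, step / d) if d > 0 else 0.0
        new = tuple(a + frac * (b - a) for a, b in zip(nodes[nearest], target))
        # 4. Keep the new conformation only if it satisfies the constraints.
        if is_valid(new):
            nodes.append(new)
            parents.append(nearest)
    return nodes, parents

# Hypothetical constraint: the two torsions must not both be near zero
# (a stand-in for a steric clash check).
def no_clash(conf):
    return math.hypot(*conf) > 0.5

nodes, parents = rrt_sample_toy(start=(1.0, 1.0), is_valid=no_clash)
print(f"{len(nodes)} conformations sampled")
```

The sketch has two separate limits: `num_iter` caps the number of attempts, while `num_nodes` caps the number of accepted conformations, so a run can end with fewer nodes than requested when many attempts are rejected.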

Molecular dynamics (MD) simulations are another common technique for protein conformation modeling, but they can be time-consuming and require significant input from the user. This notebook therefore covers only the NOLB and RRT techniques, which offer efficient approaches for exploring protein conformational space and generating ensembles.

## `sas_helper` Module Overview <a class="anchor" id="sashelper-module-overview"></a>

The `sas_helper` module is a powerful tool that provides various functionalities to assist users in protein structure analysis and visualization. Here are the key features of the module:

- **Visualizing PDB Files**: The `sas_helper` module utilizes the `nglview`[1] and `pytraj`[2] libraries/modules for visualizing PDB (Protein Data Bank) files. These libraries enable interactive and dynamic visualization of protein structures, helping users gain insights into the molecular architecture.

- **SAXS and SANS Profile Calculation**: The `sas_helper` module provides the capability to compute Small-Angle X-ray Scattering (SAXS) and Small-Angle Neutron Scattering (SANS) profiles. This functionality is particularly useful for training purposes and exploring the impact of different parameters on the intensity of the scattering signal. By inputting a protein structure, users can generate SAXS and SANS profiles and examine how the intensity evolves with varying parameters.

- **Fitting SAXS/SANS Data**: With the help of the `FoXS`[3], `Pepsi-SAXS`[4] and `Pepsi-SANS`[5] tools, the `sas_helper` module facilitates fitting experimental SAXS and SANS data. This fitting process enables researchers to compare experimental data with theoretical models and obtain structural insights.

- **Multi-Model Fitting**: The `sas_helper` module supports multi-model fitting using the `multi_foxs` tool [3]. This approach fits an experimental profile with a weighted combination of several structural models, which makes it possible to explore protein conformational ensembles.

By leveraging the `NOLB`[6] and `rrt_sample`[3] techniques, the `sas_helper` module offers efficient and powerful tools for protein conformation modeling. These tools enable researchers to explore the conformational space of proteins, generate ensembles, and study their dynamics.

Through the integration of various libraries and modules, including `FoXS`[3], `Pepsi-SAXS`[4], and `Pepsi-SANS`[5], the `sas_helper` module simplifies the fitting process for SAXS and SANS data. Researchers can perform data analysis and structural modeling without the need to install separate software packages or become familiar with different interfaces.

Overall, the `sas_helper` module provides a user-friendly and comprehensive solution for scientists working with protein structural data. It streamlines the analysis and visualization processes, empowering researchers to gain valuable insights into protein structure and dynamics.

**References:**

[1] NGL Viewer (http://nglviewer.org)

[2] PyTraj (https://amber-md.github.io/pytraj/latest/index.html)

[3] IMP (https://github.com/salilab/imp)

[4] Pepsi-SAXS (https://team.inria.fr/nano-d/software/pepsi-saxs/)

[5] Pepsi-SANS (https://team.inria.fr/nano-d/software/pepsi-sans/)

[6] NOLB - Normal Modes (https://team.inria.fr/nano-d/software/nolb-normal-modes/)



# Get started

To get started, import the necessary module and set up the back-end for interactive plotting.

```python
from sas_helper import *
%matplotlib notebook
```

Now you're ready to work with the sas_helper module and create interactive plots!

In [1]:
from sas_helper import *
%matplotlib notebook



# Prepare PDB file

Before you begin, make sure you have the necessary PDB file.

If you already have the PDB file, you can upload it to the folder with your Jupyter notebook environment by clicking on the "Upload" button and selecting the file.

Alternatively, you can use the `download_pdb` function to download the PDB file directly from the Protein Data Bank (PDB) website: [https://www.rcsb.org](https://www.rcsb.org). 

To download a PDB file, first find the entry you need; each entry is identified by a unique four-character PDB ID (for example `8P9G`). Then pass the link to the `download_pdb` function in the form `https://files.rcsb.org/download/<PDB ID>.pdb`.

To download a PDB file, use the following command:

```python
download_pdb(link, pdb_file)
``` 

In this example, the first argument is the link to the PDB file on the website, and the second argument is the preferred filename for the downloaded PDB file.

Once you have the PDB file in the same folder as your Jupyter notebook, you can proceed with your analysis.

Below you will find an example of how to download a PDB file.

In [2]:
download_pdb('https://files.rcsb.org/download/8P9G.pdb', '8p9g.pdb')

File 8p9g.pdb has been successfully downloaded
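
If you download many entries, building the link by hand becomes error-prone. The sketch below shows one possible way to do it with the standard library only. The helpers `rcsb_url` and `fetch_pdb` are hypothetical names introduced for illustration; they are not part of `sas_helper` and do not describe how `download_pdb` is implemented.

```python
from urllib.request import urlretrieve

def rcsb_url(pdb_id):
    """Build the RCSB download link for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a valid PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id}.pdb"

def fetch_pdb(pdb_id, filename=None):
    """Download a PDB entry into the current folder and return the local file name."""
    filename = filename or f"{pdb_id.lower()}.pdb"
    urlretrieve(rcsb_url(pdb_id), filename)  # needs network access
    return filename

print(rcsb_url("8p9g"))  # https://files.rcsb.org/download/8P9G.pdb
```

The generated link can also be passed straight to the notebook's own function, for example `download_pdb(rcsb_url("8p9g"), "8p9g.pdb")`.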


# Protein visualisation

To visualize PDB files in your Jupyter notebook, you can use the `show_pdb` command.

To display a PDB file, use the following command:

```python
show_pdb(pdb_file)
```
Replace `pdb_file` with the name of the PDB file you want to visualize, given as a string.

After executing the `show_pdb` command, you will see an interactive 3D visualization of the protein structure. You can rotate the protein, zoom in and out, and explore different parts of the model.

Additionally, helpful hints will be displayed, providing information such as the number of chains, atoms, and more.

Note: Ensure that you have the necessary PDB file and the `sas_helper.py` file in the same folder as your Jupyter notebook before executing the `show_pdb` command.

In [3]:
show_pdb('8p9g.pdb')

[Interactive output: NGL widget showing the structure in `8p9g.pdb`.]

# Calculation of SAXS Profiles

To calculate SAXS profiles, you can use either the `foxs` or `Pepsi-SAXS` software. There is no critical difference between them, except for the intensity scale.

In Python, you can use the `saxs_profile` function to calculate SAXS profiles for a given PDB file. By default, the function uses the `foxs` software. 

To calculate the SAXS profile using the `foxs` software, use the following command:

```python
saxs_profile(pdb_file)
```
or alternatively
```python
saxs_profile(pdb_file, core="foxs")
```


To calculate the SAXS profile using the `Pepsi-SAXS` software, use the following command:
```python
saxs_profile(pdb_file, core="pepsisaxs")
```
Replace `pdb_file` with the name of a PDB file or with a list of names; the profiles of multiple PDB files are drawn on the same plot. By default, all profiles are plotted. When several profiles are plotted, checkboxes appear below the plot so that you can show or hide individual profiles.

The calculated SAXS profile(s) will be visualized using the matplotlib notebook backend, which allows you to interactively zoom in/out on a specific region of interest. You can also change the scale between the options: linear(I)/linear(Q), log(I)/linear(Q), and log(I)/log(Q).

To change the title of the plot, type the new title into the corresponding text field below the plot.

To save the final figure, you can click on "Save as" and specify the desired name with a .png extension.

Note: Make sure you have the necessary PDB files and the `sas_helper.py` file in the same folder as your Jupyter notebook before executing the saxs_profile command.

The example below calculates the SAXS profile first for a single PDB file and then for several PDB files.

In [4]:
saxs_profile("8p9g.pdb")
saxs_profile(["8p9g.pdb","1fguA.pdb"],core="pepsisaxs")

[Interactive output, shown once per `saxs_profile` call: the SAXS profile plot with a "Scale" dropdown (linear/linear, linear/log, log/log), one checkbox per PDB file, and a "Save As..." button.]

# Calculation of SANS Profiles

To calculate SANS profiles, you can use the `sans_profile` command in Python. It provides similar functionality to `saxs_profile`, but with additional options.

The command to calculate SANS profiles is as follows:

```python
sans_profile(pdb_files, deut_level=[0], d2o_level=[0], exchange=[0])
```
- **deut_level** stands for Molecule deuteration. It represents the level of deuteration in the molecules under investigation. Deuteration refers to the replacement of hydrogen atoms with deuterium atoms. The **deut_level** value should be in the range from 0 to 1.
- **d2o_level** stands for Buffer deuteration. It represents the level of deuteration in the solvent or buffer used in the experiment. The **d2o_level** value should be in the range from 0 to 1.
- **exchange** stands for Exchange rate. Because the value is a number between 0 and 1, it is best read as the fraction of exchangeable (labile) hydrogen atoms in the molecule that have been replaced by deuterium from the buffer, rather than as a kinetic rate: 0 means no exchange and 1 means complete exchange. The **exchange** value should be in the range from 0 to 1.

If you simply write `sans_profile(pdb_files)` without specifying any additional options, it will calculate the SANS profiles with the default values: deut_level=0, d2o_level=0, and exchange=0.

To explore how different parameter values affect the SANS profiles, you can provide a list of different numbers for each parameter. The `sans_profile` function will calculate all the combinations of those numbers.

For example, if you specify d2o_level = [0, 1] and exchange = [0.5, 0.8, 1], it will calculate 6 profiles for the different combinations of those parameters. If multiple profiles are provided, it will visualize all the combinations on the same plot.
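
The number of profiles is simply the product of the list lengths. The sketch below reproduces that count with `itertools.product`. The helper `sans_combinations` is hypothetical, and the order in which `sans_profile` actually iterates over the parameters is an assumption; only the total count follows from the description above.

```python
from itertools import product

def sans_combinations(deut_level=(0,), d2o_level=(0,), exchange=(0,)):
    """List every (deut_level, d2o_level, exchange) triple, one per expected profile."""
    return list(product(deut_level, d2o_level, exchange))

combos = sans_combinations(d2o_level=[0, 1], exchange=[0.5, 0.8, 1])
print(len(combos))  # 6
for deut, d2o, exch in combos:
    print(f"Deut: {deut} - D2O: {d2o} - Exchange: {exch}")
```

By the same reasoning, three `deut_level` values, two `d2o_level` values, and two `exchange` values give 3 × 2 × 2 = 12 profiles.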

Here's an example to illustrate the usage of different options:

In [5]:
sans_profile("8p9g.pdb", deut_level=[0, 0.2, 0.5], d2o_level=[0, 1], exchange=[0.5, 1])


[Interactive output: the SANS profile plot with a "Scale" dropdown (linear/linear, linear/log, log/log), one checkbox per parameter combination (for example "8p9g.pdb - Deut: 0 - D2O: 0 - Exchange: 0.5"), and a "Save As..." button.]

# Modeling with RRT

The first modeling option is RRT (Rapidly-exploring Random Tree) sampling, in which a user-defined set of flexible residues is allowed to move during the modeling process.

To start RRT modeling, you need to provide both the PDB file and a file that specifies the flexible residues (`flex_file`). An explanation of the usage can be found here: [RRT Modeling Help](https://modbase.compbio.ucsf.edu/multifoxs/help).

To initiate the RRT modeling, use the following command:

```python
modelpdb_flex(pdb_file, flex_file, num_iter=100, num_modes=100, rad=0.5, models="all")
```
Here are the available options:

- **num_iter** represents the number of iterations. It determines the number of attempts the algorithm makes to create a new node in the RRT tree. A higher number of iterations allows for a more extensive exploration of the conformational space.
- **num_modes** specifies the number of nodes to create. Each node represents a distinct conformation.
- **rad** is the radii scaling parameter. It should be set between 0.3 and 1, controlling the scaling of the radii for each node in the RRT tree.
- **models** determines the number of models in the output PDB files. Set it to **"all"** if you want all the rotations (or models) to be saved in a single PDB file. Alternatively, set it to **"sep"** if you prefer each conformation to be saved in a separate file. Choosing **"sep"** can be more efficient for fitting purposes.

The modeling process may take some time, depending on the specified parameters and the complexity of the system, especially if a large number of nodes (more than 100) is created. While it runs, the message **"Modeling is in progress..."** is displayed. Once the modeling is finished, you will see the message **"Modelling finished"** along with the exact number of nodes created, indicated as **"Done RRT <span style="color:green">{value}</span>."**

If you specified `models = "all"`, all conformations are shown together as an animation. If you chose `models = "sep"`, a dropdown menu lets you browse the separate models that have been created.

In [6]:
modelpdb_flex("1fguA.pdb","1fguA_linkers.txt",num_modes=200,models="all")

Modelling finished
Done RRT 49


[Interactive output: NGL widget with the sampled conformations as animation frames (`max_frame=49`).]

# Modeling with NOLB

In addition to RRT, a second modeling method is available: NOLB (NOn-Linear rigid Block NMA), a conceptually simple and computationally efficient method for non-linear normal mode analysis.

The key observation of the NOLB method is that the angular velocity of a residue can be interpreted as the result of an implicit force, allowing the motion of the residue to be considered as a pure rotation about a certain center.
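
Why a pure rotation matters can be seen with a toy rigid block. The sketch below is an illustration of the idea only, not NOLB code: the function names and the three-atom block are made up. It compares a true rotation about an axis (Rodrigues' formula) with the first-order, linear displacement produced by the same angular velocity. The rotation preserves interatomic distances, while the linear step stretches them at large amplitudes.

```python
import numpy as np

def rotate_about_axis(coords, center, axis, angle):
    """Rotate points about an axis through `center` (Rodrigues' rotation formula)."""
    k = np.array(axis, dtype=float)
    k = k / np.linalg.norm(k)
    v = coords - center
    rotated = (v * np.cos(angle)
               + np.cross(k, v) * np.sin(angle)
               + np.outer(v @ k, k) * (1.0 - np.cos(angle)))
    return rotated + center

def linear_step(coords, center, axis, angle):
    """First-order (linear) displacement for the same rotation: x + angle * (k x v)."""
    k = np.array(axis, dtype=float)
    k = k / np.linalg.norm(k)
    return coords + angle * np.cross(k, coords - center)

def bond(c):
    """Distance between the first two atoms of the block."""
    return np.linalg.norm(c[1] - c[0])

# A toy rigid block of three atoms (coordinates in Angstrom, made up for illustration).
block = np.array([[1.0, 0.0, 0.0], [2.5, 0.0, 0.0], [2.5, 1.5, 0.0]])
center, axis, angle = np.zeros(3), [0.0, 0.0, 1.0], np.radians(60)

print(f"original: {bond(block):.3f}")
print(f"rotation: {bond(rotate_about_axis(block, center, axis, angle)):.3f}")
print(f"linear:   {bond(linear_step(block, center, axis, angle)):.3f}")
```

For small angles the two results nearly coincide, which is why linear normal modes work for small fluctuations but distort the structure when extrapolated to large motions.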

To perform NOLB modeling, you need to provide the `pdb_file` and specify the following options:

- **num_iter**: This parameter represents the number of iterations for the NOLB algorithm. It determines how many iterations the algorithm will perform to generate the models. Increasing the number of iterations can lead to a more refined sampling of conformational space.

- **num_modes**: This parameter specifies the number of modes (or models) to be generated. Each mode represents a distinct conformation.

To execute NOLB modeling, you can use the following command:

```python
modelpdb_nolb(pdb_file, num_iter=500, num_modes=10)
```

After executing the command, the NOLB algorithm will generate the specified number of modes. You can examine the created nodes (modes) in the output.

While RRT modeling explores the conformational space through a tree-like structure, NOLB focuses on generating modes based on non-linear normal mode analysis. NOLB identifies implicit forces and considers the motion of residues as pure rotations about certain centers. This approach captures a different aspect of the conformational space, providing complementary information to RRT modeling.




In [7]:
modelpdb_nolb("1fguA.pdb")

[Interactive output: `pdb_file` dropdown listing the generated models (`1fguA_nlb_1.pdb`, `1fguA_nlb_2.pdb`, …).]

# Fit SAXS Experimental Data with FoXS and Pepsi-SAXS

To fit SAXS experimental data, you can use the `fitsaxs` command in Python. This command allows you to fit the provided PDB files against the specified data file. Depending on the core chosen, either "foxs" or "pepsisaxs", different options are available.

If you choose `core="foxs"`, the following options are taken into account:

```python
fitsaxs(pdb_files, data_file, core="foxs", c1_low=0.99, c1_up=1.05, c2_low=-2, c2_up=4, bg=0, 
        hyd=False)
```
- **c1** is the scaling of the atomic radius, which controls the excluded volume of the molecule. The default value is c1 = 1.0. During fitting, a range of values is allowed, with a 1% decrease and up to a 5% increase in the radius (0.99 ≤ c1 ≤ 1.05). The **c1_low** and **c1_up** parameters define this range.
- **c2** is used to adjust the difference between the densities of the hydration layer and the bulk water. It controls the density of the water layer around the molecule. The default value is c2 = 0.0. The value of c2 can vary from 0 to 4.0, representing up to four water molecule neighbors for an exposed solute atom. Negative values are also allowed (-2.0 ≤ c2 ≤ 4.0) to account for a lower hydration shell density. The **c2_low** and **c2_up** parameters define this range.
- **bg** is an option for background adjustment, which is not used by default.
- **hyd** is a boolean flag that indicates whether to explicitly consider hydrogens in the PDB files. The default value is False. If you want to use hydrogens, set `hyd=True`, assuming that all hydrogen atoms are listed in the PDB file.

If you choose `core="pepsisaxs"`, the following options are available:

```python
fitsaxs(pdb_files, data_file, core="pepsisaxs", bg=0, hyd=False, scale=1, int_0=1, neg=False,
        no_smear=False, hyd_shell=5, conc=1, abs_int=0, bulk_SLD=1e-5)
```
- **bg** is an option for background adjustment, which is not used by default.
- **hyd** is a boolean flag that indicates whether to explicitly consider hydrogens in the PDB files.
- **scale** is a scaling factor between the experimental intensity $I_{exp}$ and the theoretical intensity $I_{theory}$.
- **int_0** sets $I(0)$ to a constant value.
- **neg** is a flag that allows for a negative contrast of the hydration shell upon fitting.
- **no_smear** disables the data smearing during fitting.
- **hyd_shell** represents the hydration shell contrast as a percentage of the bulk value. The default is 5%. If this parameter is omitted, the contrast will be adjusted automatically during fitting.
- **conc** specifies the sample concentration in mg/mL. The default is 1 mg/mL. This parameter is only used when the `abs_int` option is enabled.
- **abs_int** enables the fitting of absolute intensity; the value is given in ±%.
- **bulk_SLD** allows for the explicit specification of the bulk SLD (Scattering Length Density) if different from water.

Both forms of the `fitsaxs` command accept multiple PDB files, each of which is fitted against the provided data file. The output includes a table with the calculated chi-squared ($\chi^2$) values of the fits.

During the fitting process, the command identifies the 10 fits with the lowest chi-squared values and plots them. These 10 best fits are also listed in the table for easy reference.

Please note that the fitting process may take some time, especially if a large number of PDB files is provided. While the fitting is in progress, you will see the message "Fitting in progress" in the output.

In the example below, we first model different conformations and then fit them with the `fitsaxs` command.

In [4]:
modelpdb_flex("1fguA.pdb","1fguA_linkers.txt",num_modes=20,models="sep")
fitsaxs(["nodes{}.pdb".format(i) for i in range(1,21)],"1fguA_iq.dat")

pdb_file,Chi^2,fit_file
nodes3.pdb,17.3578184464364,nodes3_1fguA_iq.fit
nodes2.pdb,18.0233330102213,nodes2_1fguA_iq.fit
nodes1.pdb,21.7220647685564,nodes1_1fguA_iq.fit
nodes4.pdb,34.5119938479943,nodes4_1fguA_iq.fit
nodes5.pdb,53.6073276794449,nodes5_1fguA_iq.fit
nodes10.pdb,66.5326171874025,nodes10_1fguA_iq.fit
nodes6.pdb,70.7293791205521,nodes6_1fguA_iq.fit
nodes11.pdb,78.0329015207416,nodes11_1fguA_iq.fit
nodes7.pdb,84.2246899583559,nodes7_1fguA_iq.fit
nodes12.pdb,87.5045917105764,nodes12_1fguA_iq.fit



Plot saved as 1fguA_fitting.png


# Fit SANS Experimental Data with Pepsi-SANS

To fit SANS experimental data, you can use the `fitsans` command in Python. This command allows you to fit the provided PDB files against the specified data file. Several PDB files can be provided to fit against one data file.

The command has the following options:

```python
fitsans(pdb_files, data_file, deut_level=[0], d2o_level=[0], exchange=[0], bg=0, hyd=False, scale=1,
        neg=False, no_smear=False, hyd_shell=5, conc=1, abs_int=0, bulk_SLD=1e-5)
```

- **deut_level** stands for Molecule deuteration. It represents the level of deuteration in the molecules under investigation. Deuteration refers to the replacement of hydrogen atoms with deuterium atoms. The **deut_level** value should be in the range from 0 to 1.
- **d2o_level** stands for Buffer deuteration. It represents the level of deuteration in the solvent or buffer used in the experiment. The **d2o_level** value should be in the range from 0 to 1.
- **exchange** stands for Exchange rate. It represents the rate at which hydrogen atoms in the molecule exchange with deuterium atoms. Higher exchange rates indicate faster exchange between hydrogen and deuterium atoms. The **exchange** value should be in the range from 0 to 1.
- **bg** is an option for background adjustment, which is not used by default.
- **hyd** is a boolean flag that indicates whether to explicitly consider hydrogens in the PDB files.
- **scale** is a scaling factor between the experimental intensity $I_{exp}$ and the theoretical intensity $I_{theory}$.
- **neg** is a flag that allows for a negative contrast of the hydration shell upon fitting.
- **no_smear** disables the data smearing during fitting.
- **hyd_shell** represents the hydration shell contrast as a percentage of the bulk value. The default is 5%. If this parameter is omitted, the contrast will be adjusted automatically during fitting.
- **conc** specifies the sample concentration in mg/mL. The default is 1 mg/mL. This parameter is only used when the `abs_int` option is enabled.
- **abs_int** enables fitting of the absolute intensity; the value is the allowed deviation in ±%.
- **bulk_SLD** allows for the explicit specification of the bulk SLD (Scattering Length Density) if different from water.

Note that `deut_level`, `d2o_level`, and `exchange` can be single numbers or lists. If they are lists, all possible combinations of the parameters are taken into account. For example, if you specify `d2o_level=[0, 1]` and `exchange=[0.5, 0.8, 1]`, the command calculates 6 profiles, one for each combination of those parameters. If multiple PDB files are provided, it calculates 6 profiles for each PDB file.
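The way list-valued parameters multiply can be sketched with `itertools.product`. This only illustrates the counting; how `fitsans` enumerates the combinations internally is not shown here, and `n_pdb_files` is a hypothetical value.

```python
from itertools import product

deut_level = [0]            # default: a single value
d2o_level = [0, 1]
exchange = [0.5, 0.8, 1]

# Every (deut, d2o, exchange) triple corresponds to one calculated profile
combinations = list(product(deut_level, d2o_level, exchange))
for deut, d2o, exch in combinations:
    print(f"deut={deut}, d2o={d2o}, exchange={exch}")

n_pdb_files = 3             # hypothetical number of input PDB files
total_profiles = n_pdb_files * len(combinations)
print(len(combinations), "profiles per PDB file,", total_profiles, "in total")
```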

If more than 10 profiles in total are calculated and fitted, only the best 10 $\chi^2$ scores will be shown.

Additionally, for SANS, the data file might include a resolution column as the 4th column. The command will automatically calculate profiles taking into account the instrument resolution. This is an important consideration in SANS data analysis and one of the main differences between SAXS and SANS.
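To give an idea of what accounting for the instrument resolution means, here is a minimal sketch of Gaussian smearing of a theoretical profile. It assumes that the 4th column holds a Gaussian width $\sigma_q$ for each data point and that the theoretical curve is sampled on the same, roughly uniform q grid. The function `smear_profile` is hypothetical and is not the routine used by Pepsi-SANS.

```python
import numpy as np

def smear_profile(q, intensity, sigma_q):
    """Replace each I(q_i) by a normalized Gaussian-weighted average of I(q)."""
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    sigma_q = np.asarray(sigma_q, dtype=float)  # must be strictly positive
    # kernel[i, j] is the weight of theoretical point j in smeared point i
    kernel = np.exp(-0.5 * ((q[None, :] - q[:, None]) / sigma_q[:, None]) ** 2)
    kernel /= kernel.sum(axis=1, keepdims=True)
    return kernel @ intensity

q = np.linspace(0.01, 0.3, 100)
sigma_q = np.full_like(q, 0.005)           # assumed constant resolution width
sharp = np.exp(-(q * 20.0) ** 2 / 3.0)     # synthetic Guinier-like profile
smeared = smear_profile(q, sharp, sigma_q)
```

Smearing broadens sharp features of the curve, so comparing an unsmeared model with resolution-limited SANS data can bias the fit.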

Below we show an example of SANS fitting taken from the SASBDB database, entry [SASDPV4](https://www.sasbdb.org/data/SASDPV4/) (all the files are already downloaded in the parent folder).

In [30]:
fitsans("SASDPV4_fit2_model1.pdb","SASDPV4.dat",deut_level=[0,1],d2o_level=[0,1],exchange=[0.8,1])

pdb_file,Chi^2,Deut,D2O,H-exchange,fit_file
SASDPV4_fit2_model1.pdb,3.57,1.0,1.0,1.0,SASDPV4_fit2_model1-SASDPV4_deut0_d2o100_exch100.fit
SASDPV4_fit2_model1.pdb,3.79,1.0,1.0,0.8,SASDPV4_fit2_model1-SASDPV4_deut0_d2o100_exch80.fit
SASDPV4_fit2_model1.pdb,54.56,0.0,0.0,0.8,SASDPV4_fit2_model1-SASDPV4_deut100_d2o0_exch80.fit
SASDPV4_fit2_model1.pdb,54.56,0.0,0.0,1.0,SASDPV4_fit2_model1-SASDPV4_deut100_d2o0_exch100.fit
SASDPV4_fit2_model1.pdb,82.65,0.0,0.0,0.8,SASDPV4_fit2_model1-SASDPV4_deut0_d2o0_exch80.fit
SASDPV4_fit2_model1.pdb,82.65,0.0,0.0,1.0,SASDPV4_fit2_model1-SASDPV4_deut0_d2o0_exch100.fit
SASDPV4_fit2_model1.pdb,118.66,1.0,1.0,0.8,SASDPV4_fit2_model1-SASDPV4_deut100_d2o100_exch80.fit
SASDPV4_fit2_model1.pdb,133.45,1.0,1.0,1.0,SASDPV4_fit2_model1-SASDPV4_deut100_d2o100_exch100.fit



# Multi-State Modeling with `multimodelfit`

In data analysis, dealing with heterogeneous samples is common, where heterogeneity can be both in composition and conformation. For interpreting data collected from such samples, a multi-state model is essential. A multi-state model involves multiple co-existing structural states and parameters, including weights for each state [3].
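In such a model, the profile of the mixture is commonly written as a weighted sum of the per-state profiles, $I(q) = \sum_n w_n I_n(q)$ with $w_n \ge 0$ and $\sum_n w_n = 1$. The sketch below illustrates this with two synthetic profiles; the function `multi_state_profile` and the example weights are assumptions made for illustration only.

```python
import numpy as np

def multi_state_profile(profiles, weights):
    """Weighted sum of per-state profiles I_n(q); weights must sum to 1."""
    profiles = np.asarray(profiles, dtype=float)   # shape (n_states, n_q)
    weights = np.asarray(weights, dtype=float)     # shape (n_states,)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return weights @ profiles

q = np.linspace(0.01, 0.3, 50)
compact = np.exp(-(q * 15.0) ** 2 / 3.0)    # synthetic compact state
extended = np.exp(-(q * 30.0) ** 2 / 3.0)   # synthetic extended state
mixture = multi_state_profile([compact, extended], [0.7, 0.3])
```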

The `multimodelfit` command in Python allows for multi-state modeling using either SAXS or SANS data. The command provides an array of options to tailor the fitting process.

```python
multimodelfit(pdb_files, data_file, type="saxs", ensemble_size=10, bestK=1000, chi_perc=0.3, chi=0,
              min_weight=0.05, max_q=0.5, c1_low=0.99, c1_up=1.05, c2_low=-0.5, c2_up=2, multimodel=1,
              bg=0, nnls=False,
              deut_level=[0], d2o_level=[0], exchange=[0], conc=1, abs_int=0, hyd=False,
              bulk_SLD=1e-5, no_smear=False, scale=1, neg=False, hyd_shell=5)
```

For both SAXS and SANS multi-state modeling, the following parameters are common:

- **pdb_files**: A collection of PDB files for multi-model fitting, with a minimum of 2 files.
- **data_file**: The data file to fit, typically in .dat format.
- **ensemble_size**: The maximum ensemble size, with a default of 10.
- **bestK**: The number of best-scoring models kept at each step of the multi-state enumeration, with a default of 1000.
- **chi_perc**: The chi value percentage threshold for profile similarity, defaulting to 0.3.
- **chi**: A chi-based threshold, defaulting to 0.
- **min_weight**: The minimum weight threshold for a profile to contribute to the ensemble, defaulting to 0.05.
- **max_q**: The maximum q value, with a default of 0.5.
- **c1_low**: The minimum c1 value, defaulting to 0.99.
- **c1_up**: The maximum c1 value, defaulting to 1.05.
- **c2_low**: The minimum c2 value, defaulting to -0.5.
- **c2_up**: The maximum c2 value, defaulting to 2.
- **multimodel**: Option for reading multi-model PDB files, with choices 1, 2, or 3. 1: read the first model only (default); 2: read each model into a separate structure; 3: read all models into a single structure.
- **bg**: Background adjustment option, not used by default.
- **nnls**: Runs non-negative least squares (NNLS) on all profiles, defaulting to False.

When using `multimodelfit` on SANS data, the following options are also available:

- **deut_level** stands for Molecule deuteration. It represents the level of deuteration in the molecules under investigation. Deuteration refers to the replacement of hydrogen atoms with deuterium atoms. The **deut_level** value should be in the range from 0 to 1.
- **d2o_level** stands for Buffer deuteration. It represents the level of deuteration in the solvent or buffer used in the experiment. The **d2o_level** value should be in the range from 0 to 1.
- **exchange** stands for Exchange rate. It represents the rate at which hydrogen atoms in the molecule exchange with deuterium atoms. Higher exchange rates indicate faster exchange between hydrogen and deuterium atoms. The **exchange** value should be in the range from 0 to 1.
- **conc** specifies the sample concentration in mg/mL. The default is 1 mg/mL. This parameter is only used when the `abs_int` option is enabled.
- **abs_int** enables fitting of the absolute intensity; the value is the allowed deviation in ±%.
- **hyd** is a boolean flag that indicates whether to explicitly consider hydrogens in the PDB files.
- **bulk_SLD** allows for the explicit specification of the bulk SLD (Scattering Length Density) if different from water.
- **no_smear** disables the data smearing during fitting.
- **scale** is a scaling factor between the experimental intensity $I_{exp}$ and the theoretical intensity $I_{theory}$.
- **neg** is a flag that allows for a negative contrast of the hydration shell upon fitting.
- **hyd_shell** represents the hydration shell contrast as a percentage of the bulk value. The default is 5%. If this parameter is omitted, the contrast will be adjusted automatically during fitting.


The example below illustrates multi-state fitting of SAXS data. It follows the example from the MultiFoXS help page on the IMP website: https://modbase.compbio.ucsf.edu/multifoxs/help

Similarly, for SANS data, multi-model fitting is possible, with or without a resolution column. If a resolution column is provided, new files named *_3col will be generated as inputs for multi_foxs [3].
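The conversion to a three-column file can be pictured as follows. This is a minimal sketch that assumes a whitespace-separated file with the columns q, I(q), error and resolution; the helper `write_three_column_copy` and the exact output file name are hypothetical and may not match what `multimodelfit` does internally.

```python
from pathlib import Path

import numpy as np

def write_three_column_copy(data_file):
    """Write the q, I(q) and error columns to <stem>_3col<suffix>; return the path used."""
    data_file = Path(data_file)
    data = np.loadtxt(data_file, comments="#")
    if data.ndim != 2 or data.shape[1] < 4:
        return data_file  # no resolution column, nothing to strip
    out_file = data_file.with_name(f"{data_file.stem}_3col{data_file.suffix}")
    np.savetxt(out_file, data[:, :3])
    return out_file
```

For a hypothetical four-column file `sans.dat`, `write_three_column_copy("sans.dat")` would write `sans_3col.dat` next to it and leave the original file untouched.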




In [2]:
modes = modelpdb_flex("prion.pdb","prion_linkers.txt",models="sep")

Modelling finished
Done RRT 102



In [3]:
# `modes`, returned by modelpdb_flex in the previous step, gives the number of generated models (nodes1.pdb ... nodes{modes}.pdb)
df=multimodelfit(["nodes{}.pdb".format(i) for i in range(1,modes+1)],"prion_iq.dat")

Chi^2,fit_file,Contributing PDB file(s),Weight(s)
4.9371339889815,multi_state_model_2_1_1.fit,"nodes84.pdb, nodes52.pdb","0.721, 0.279"
5.30693421569028,multi_state_model_1_1_1.fit,nodes84.pdb,1.0
7.34204358526882,multi_state_model_2_2_1.fit,"nodes52.pdb, nodes98.pdb","0.881, 0.119"
7.83019109202911,multi_state_model_1_2_1.fit,nodes52.pdb,1.0
10.5766786927221,multi_state_model_2_3_1.fit,"nodes34.pdb, nodes98.pdb","0.881, 0.119"
11.0063378475594,multi_state_model_1_3_1.fit,nodes34.pdb,1.0
14.6142657864746,multi_state_model_2_4_1.fit,"nodes29.pdb, nodes98.pdb","0.763, 0.237"
16.1256921731337,multi_state_model_2_5_1.fit,"nodes29.pdb, nodes53.pdb","0.932, 0.068"
16.1628795078189,multi_state_model_1_4_1.fit,nodes29.pdb,1.0
17.6998210070676,multi_state_model_2_6_1.fit,"nodes88.pdb, nodes98.pdb","0.532, 0.468"



In [28]:
# Then we can look at the contributing PDB file(s)
string=df["Contributing PDB file(s)"][9] #the index is the same as in the table above
file_list = [file.strip() for file in string.split(',')]
interact(show_pdb,pdb_file=file_list)

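The same table also lists the weight of each contributing file. As a small sketch, the two comma-separated fields of one row can be paired up like this; `parse_states` is a hypothetical helper, and the example values are taken from the first row of the table above.

```python
def parse_states(pdb_field, weight_field):
    """Pair each contributing PDB file with its weight for one table row."""
    files = [name.strip() for name in str(pdb_field).split(",")]
    weights = [float(w) for w in str(weight_field).split(",")]
    if len(files) != len(weights):
        raise ValueError("number of PDB files and weights differ")
    return list(zip(files, weights))

states = parse_states("nodes84.pdb, nodes52.pdb", "0.721, 0.279")
for pdb_file, weight in states:
    print(f"{pdb_file}: {weight:.1%}")
```

With the DataFrame returned by `multimodelfit`, the two arguments would be `df["Contributing PDB file(s)"][i]` and `df["Weight(s)"][i]` for row `i`, assuming the columns are named as in the printed table.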