# Assign exposure groups to particles  
[![DOI](https://zenodo.org/badge/598765943.svg)](https://zenodo.org/badge/latestdoi/598765943)

After [running `kmeans_groups.py`](https://github.com/kookjookeem/kmeans-beamtilt) on your data, assign the resulting group IDs to your CryoSPARC particles [via `cryosparc-tools`](https://tools.cryosparc.com/intro.html).  
See the [wiki page](https://github.com/kookjookeem/kmeans-beamtilt/wiki) of the `kmeans-beamtilt` GitHub repo for instructions on running `kmeans_groups.py`.  

This workflow requires the following packages in addition to `cryosparc-tools`:  
* `scikit-learn`  
* `pandas`  
* `matplotlib`  

First initialize the `CryoSPARC` client:

In [None]:
from cryosparc.tools import CryoSPARC

cs = CryoSPARC(
    license=LICENSE,
    host=HOSTNAME,
    base_port=39000,
    email=EMAIL,
    password=PASSWORD
)

assert cs.test_connection()
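
`LICENSE`, `HOSTNAME`, `EMAIL`, and `PASSWORD` above are placeholders that must be defined before the cell runs. One way to do this without writing credentials into the notebook is to read them from environment variables. The sketch below is only an illustration: the function `load_credentials` and the `CRYOSPARC_*` variable names are hypothetical and not part of `cryosparc-tools`.

```python
import os


def load_credentials(env=os.environ):
    """Collect connection settings from environment variables.

    The variable names are hypothetical; rename them to suit your setup.
    """
    return {
        "license": env["CRYOSPARC_LICENSE_ID"],
        "host": env.get("CRYOSPARC_HOST", "localhost"),
        # Environment values are strings, so convert the port explicitly.
        "base_port": int(env.get("CRYOSPARC_BASE_PORT", "39000")),
        "email": env["CRYOSPARC_EMAIL"],
        "password": env["CRYOSPARC_PASSWORD"],
    }


# Demonstration with a fake environment (no real credentials involved).
example_env = {
    "CRYOSPARC_LICENSE_ID": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "CRYOSPARC_EMAIL": "user@example.com",
    "CRYOSPARC_PASSWORD": "not-a-real-password",
}
settings = load_credentials(example_env)
print(settings["host"], settings["base_port"])
```

With real variables set, the client could then be created with `CryoSPARC(**load_credentials())`, since the dictionary keys match the keyword arguments used in the cell above.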

In this example, the particles come from Non-uniform Refinement job `J748` in project `P32`.   
Locate the project and the job with `find_project` and `find_job`.  

Load the `particles` as a `pandas` dataframe:

In [None]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.float_format', None)

project = cs.find_project("P32")
job = cs.find_job("P32", "J748")
particles = job.load_output("particles")

particles_df = pd.DataFrame(particles.rows()) # It might take a little time to load

Load the CSV file written by `kmeans_groups.py` and name its two columns `filename` and `expgroup`.  
Set `csv_file` to the absolute path of that CSV file.

In [None]:
csv_file = '/Users/kookjookim/Downloads/job030_mic_class.csv'
csv_data = pd.read_csv(csv_file, header=None, names=['filename', 'expgroup'])
print(csv_data)
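
Before using the CSV, it can help to confirm that it really has the two-column, header-less layout that `header=None, names=[...]` assumes, and that no file name appears twice (a dictionary built from the columns would silently keep only the last entry for a repeated name). The following is a minimal sketch on made-up rows held in memory; only the first file name is taken from this example, and the group numbers are invented.

```python
import io

import pandas as pd

# Made-up rows, only to illustrate the expected layout: filename,group
sample_csv = io.StringIO(
    "19apr05e_00009hln_00003enn,1\n"
    "19apr05e_00009hln_00004enn,1\n"
    "19apr05e_00010hln_00001enn,2\n"
)
sample = pd.read_csv(sample_csv, header=None, names=["filename", "expgroup"])

# Repeated file names or empty group cells would make the later mapping ambiguous.
duplicated = sample["filename"].duplicated().any()
missing_groups = sample["expgroup"].isna().any()

# Number of micrographs assigned to each exposure group.
group_sizes = sample["expgroup"].value_counts().sort_index()
print(group_sizes)
```

The same three checks can be run on `csv_data` directly once it is loaded.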

Extract `location/micrograph_path` to create a column named `filename`.

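Set the offsets for parsing. The directory path, the 22-character UID prefix, and the `.mrc` extension are always removed; `x` and `y` trim further:  
`x`: Number of additional characters to omit from the START of the file name (default: 0, so only the path and UID are excluded)  
`y`: Number of additional characters to omit from the END of the file name, before `.mrc` (here: 11, to drop `_frames_030`)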

In this example, `J481/imported/015577085289234993444_19apr05e_00009hln_00003enn_frames_030.mrc` is parsed to `19apr05e_00009hln_00003enn` to match the `filename` in `csv_data`.

In [None]:
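x = 0  # extra characters to drop after the 22-character UID prefix
y = 11 # extra characters to drop before '.mrc', here the length of '_frames_xxx'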

# Extract from 'location/micrograph_path' to 'filename'
particles_df['filename'] = particles_df['location/micrograph_path'].str.split('/').str[-1].str[22 + x:-4 - y]
path_columns = particles_df.filter(items=['location/micrograph_path','filename', 'ctf/exp_group_id'])
path_columns
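
The slice can be checked on a single path in plain Python before applying it to the whole dataframe. The sketch below reproduces the offsets used above on the example path. The helper names `by_offsets` and `by_pattern` are hypothetical. `by_pattern` is an alternative that is not part of this notebook: it assumes the UID prefix is all digits and that every file name ends in `_frames_<digits>.mrc`, which may not hold for your data.

```python
import re

UID_PREFIX_LEN = 22  # 21-digit UID plus underscore, as in the example path
EXT_LEN = 4          # length of ".mrc"


def by_offsets(micrograph_path, x=0, y=11):
    """Same slicing as the pandas expression: drop path, UID, suffix, extension."""
    name = micrograph_path.split("/")[-1]
    return name[UID_PREFIX_LEN + x : -EXT_LEN - y]


def by_pattern(micrograph_path):
    """Alternative sketch using a regular expression instead of fixed offsets."""
    name = micrograph_path.split("/")[-1]
    match = re.fullmatch(r"\d+_(.+)_frames_\d+\.mrc", name)
    return match.group(1) if match else None


path = "J481/imported/015577085289234993444_19apr05e_00009hln_00003enn_frames_030.mrc"
print(by_offsets(path))
print(by_pattern(path))
```

Fixed offsets are simple but break if the suffix length varies between micrographs; a pattern tolerates that at the cost of assuming a naming scheme.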

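Create a dictionary that maps each `filename` in `csv_data` to its `expgroup`.  
Replace `ctf/exp_group_id` with the mapped `expgroup` for each particle; particles whose `filename` is not in the CSV keep their original `ctf/exp_group_id`.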

In [None]:
file_to_exp = dict(zip(csv_data['filename'], csv_data['expgroup']))
particles_df['ctf/exp_group_id'] = particles_df['filename'].map(file_to_exp).fillna(particles_df['ctf/exp_group_id'])

ctf_expgroup_columns = particles_df.filter(like='ctf/exp_group_id')
ctf_expgroup_columns.head(60) # Showing the first 60 rows. Peep the new 'ctf/exp_group_id'
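
Because unmatched particles silently keep their old group ID, it may be worth counting them: a large number usually means the parsing offsets do not produce the names used in the CSV. Mapping through a dictionary with missing keys also yields `NaN`, which turns the column into floats, so the sketch below casts back to the original integer type. It runs on a small made-up dataframe; the file names, group numbers, and the `uint32` type are illustrative assumptions, not values read from a real job.

```python
import pandas as pd

# Toy stand-ins for particles_df and file_to_exp.
toy_df = pd.DataFrame({
    "filename": ["mic_a", "mic_a", "mic_b", "mic_c"],
    "ctf/exp_group_id": pd.Series([0, 0, 0, 0], dtype="uint32"),
})
toy_map = {"mic_a": 3, "mic_b": 7}  # "mic_c" is deliberately missing

mapped = toy_df["filename"].map(toy_map)  # NaN where no match was found
unmatched = mapped.isna()
n_unmatched = int(unmatched.sum())
unmatched_files = sorted(toy_df.loc[unmatched, "filename"].unique())

# Fall back to the old ID for unmatched rows, then restore the integer dtype.
original_dtype = toy_df["ctf/exp_group_id"].dtype
new_ids = mapped.fillna(toy_df["ctf/exp_group_id"]).astype(original_dtype)

print(n_unmatched, unmatched_files)
print(new_ids.tolist())
```

Applied to the real data, `unmatched_files` lists the micrographs to inspect when adjusting `x` and `y`.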

Create a deep copy of the `particles` dataset and replace its `ctf/exp_group_id` field with the updated column from the `particles_df` dataframe.

In [None]:
updated_particles = particles.copy()
updated_particles["ctf/exp_group_id"] = particles_df["ctf/exp_group_id"]
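
Before saving, a quick look at how many particles ended up in each group can catch obvious problems, such as everything landing in one group. Groups with very few particles may also give poorly constrained per-group estimates. The sketch below is a hypothetical helper, not part of `cryosparc-tools`, and the `min_size` threshold is an arbitrary illustration rather than a recommended value.

```python
from collections import Counter


def summarize_groups(group_ids, min_size=100):
    """Count particles per exposure group and list groups below min_size."""
    counts = Counter(int(g) for g in group_ids)
    small = sorted(g for g, n in counts.items() if n < min_size)
    return counts, small


# Made-up group IDs for demonstration.
example_ids = [1, 1, 1, 2, 2, 3]
counts, small = summarize_groups(example_ids, min_size=2)
print(dict(counts), small)
```

The helper only iterates over its input, so it should also accept the `ctf/exp_group_id` column of `particles_df`.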

Save the particles with the new `ctf/exp_group_id` assignment.  
This creates an external job in the chosen workspace with the updated particles as its output.

In [None]:
project.save_external_result(
    workspace_uid="W13",
    dataset=updated_particles,
    type="particle",
    name="km_beamtilt_particles",
    slots=["ctf"],
    passthrough=(job.uid, "particles"),
    title="Beamtilt grouped particles"
)

You can now run Homogeneous Refinement, Non-uniform Refinement, or Global CTF Refinement to optimize the per-group (global) CTF parameters.