# CellRake Example Workflow

To identify cells in fluorescence images, **CellRake** needs a model that decides which segmented regions correspond to cells and which do not. In this workflow, we will train a machine-learning classifier for that purpose using a set of input images.

We will start by importing the necessary packages for our analysis, including the functions from `cellrake`. Ensure that:

- You have correctly installed `cellrake` in your Conda environment following the instructions in [README.md](../README.md).
- You are running this notebook with the `cellrake` environment (top right corner of the notebook if you are using VSCode).

## 1. Import packages

In [None]:
from cellrake.main import CellRake
from pathlib import Path

## 2. Initialize project

We need to start by initializing the class with two directories:

- Image folder (`image_folder`): where your TIFF images are located.
- Project directory (`project_dir`): where results will be stored.

You can also pass a third optional argument (`segmented_data`), which corresponds to a Python dictionary of already segmented images. This will skip the segmentation step on your `image_folder` and will use the `segmented_data` for the training and analysis steps.

In [None]:
tutorial_project = CellRake(
    image_folder=Path("./sample_images"),
    project_dir=Path("./tutorial_project")
)
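
Before initializing the project, it can help to confirm that the image folder actually contains TIFF files. The snippet below is a minimal sketch using only the standard library; `list_tiff_images` is a hypothetical helper (not part of `cellrake`), and it assumes the images use a `.tif` or `.tiff` extension.

```python
from pathlib import Path


def list_tiff_images(image_folder: Path) -> list[Path]:
    """Return the TIFF files found directly inside image_folder, sorted by name."""
    tiff_suffixes = {".tif", ".tiff"}  # assumption: both extensions are in use
    return sorted(
        p for p in image_folder.iterdir()
        if p.is_file() and p.suffix.lower() in tiff_suffixes
    )


image_folder = Path("./sample_images")
if image_folder.is_dir():
    images = list_tiff_images(image_folder)
    print(f"Found {len(images)} TIFF image(s) in {image_folder}")
else:
    print(f"{image_folder} does not exist; check the path before initializing CellRake")
```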

## 3. Train a model

When running the `.train(threshold_rel, model_type, samples)` method, CellRake will:

- Segment images into potential cells (ROIs).
- Extract features (intensity, texture, shape).
- Ask you to manually label a few ROIs.
- Use label spreading to assign pseudo-labels to similar ROIs.
- Train a classifier: Random Forest, Support Vector Machine, Extra-Trees, or Logistic Regression.
- Report training & test metrics.

Key arguments:
- `threshold_rel`: controls sensitivity of segmentation (0–1).
- `model_type`: choose classifier ("rf" = Random Forest by default).
- `samples`: number of ROIs you will manually label.
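
To make the label-spreading and classifier steps above more concrete, here is a rough sketch of the general technique using scikit-learn on synthetic data. It is not CellRake's internal code: the use of `LabelSpreading` and `RandomForestClassifier`, the synthetic features, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import LabelSpreading

# Synthetic stand-in for ROI features: 200 "ROIs", 4 features, 2 classes
X, y_true = make_blobs(n_samples=200, centers=2, n_features=4, random_state=0)

# Pretend only 10 ROIs per class were labeled by hand; -1 marks unlabeled ROIs
rng = np.random.default_rng(0)
y_partial = np.full(len(y_true), -1)
for cls in (0, 1):
    labeled_idx = rng.choice(np.flatnonzero(y_true == cls), size=10, replace=False)
    y_partial[labeled_idx] = cls

# Spread the few manual labels to similar ROIs to obtain pseudo-labels
spreader = LabelSpreading(kernel="knn", n_neighbors=7)
spreader.fit(X, y_partial)
pseudo_labels = spreader.transduction_

# Train a classifier on the pseudo-labeled data and evaluate on a held-out split
X_train, X_test, y_train, y_test = train_test_split(
    X, pseudo_labels, test_size=0.25, random_state=0, stratify=pseudo_labels
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print(f"Test accuracy against pseudo-labels: {test_accuracy:.2f}")
```

Note that the test score here is measured against pseudo-labels, so it reflects agreement with the spread labels rather than with a fully hand-annotated ground truth.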

In [None]:
tutorial_project.train()

After training, the model is stored in `tutorial_project.model` and the metrics in `tutorial_project.metrics`. The plots and data files are also saved in our `project_dir`.

In [None]:
tutorial_project.model # Access your trained model
tutorial_project.metrics # Access your training performance metrics

## 4. Saving the model and segmentation

After training, we can save both the `model` and the `segmented_data` in our `project_dir`. This way, we can load them later in the same session or in another one.

In [None]:
tutorial_project.save_model('sample_rf_model') # We can save it using a custom name
tutorial_project.save_segmentation('sample_segmented') # We can save it using a custom name

tutorial_project.load_model('sample_rf_model') # Load the saved model by name
tutorial_project.load_segmentation('sample_segmented') # Load the saved segmentation by name

## 5. Run analysis

Once the model is trained (or loaded), we can analyze our images. We can use a new folder of images (`image_folder`) or, if defined, a `segmented_data` dictionary. When running the `.analyze(threshold_rel, cmap)` method, CellRake will:

- Segment the images (or reuse a segmented dictionary if already defined).
- Classify the ROIs with the trained model.
- Export the results to your `project_dir`:
    - Images of the identified cells.
    - `cell_counts.csv`: number of positive cells per image.
    - `cell_features.csv`: extracted features (intensity, area, etc.) per ROI.

Key arguments:
- `threshold_rel`: segmentation sensitivity, used only if segmentation takes place.
- `cmap`: colormap for visualization ("Reds", "Blues", etc.).

In [None]:
tutorial_project.analyze()

After running, results are also available as attributes:

In [None]:
tutorial_project.counts # Access the pd.DataFrame with the count results
tutorial_project.features # Access the pd.DataFrame with the feature extraction
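
The exported CSV files can also be inspected from disk in a later session, without re-running the analysis. The sketch below uses only the standard library; `summarize_csv` is a hypothetical helper, and because the exact location of the files inside `project_dir` is an assumption, it searches subfolders as well.

```python
import csv
from pathlib import Path


def summarize_csv(csv_path: Path) -> tuple[list[str], int]:
    """Return the header row and the number of data rows of a CSV file."""
    with csv_path.open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        n_rows = sum(1 for _ in reader)
    return header, n_rows


project_dir = Path("./tutorial_project")
if project_dir.is_dir():
    # rglob also looks into subfolders, since the output layout is assumed
    for csv_path in sorted(project_dir.rglob("cell_*.csv")):
        header, n_rows = summarize_csv(csv_path)
        print(f"{csv_path}: {n_rows} rows, columns = {header}")
```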