Running a single sample #177

Closed
josenimo opened this issue Jan 22, 2024 · 5 comments

@josenimo

Hey @gjbaker ,

Thank you for this amazing tool. I have a couple of questions regarding CyLinter.

I am running MCMICRO using mesmer, so I have to adapt a little bit to conform to the input file structure.

First, what is the structure of the seg files? I remember that these come from the S3segmenter QC folder. From what I could explore, the file is made up of two channels. The first channel is a binary segmentation mask: 0 for background and 42594 for foreground. The inside of the cells is also 0, meaning the cells are not filled (as I would expect). The second channel seems to be the nuclear channel used for the segmentation step.
How critical is this file? I will try to replicate exactly what I find in the example files. Are there any details that matter downstream?

Second, can I run a single sample? I am just trying to go through CyLinter with a single sample. However, a KeyError: 0 shows up at the aggregateData step, which I assume is not necessary for a single sample. Is there a way to bypass this step for a single sample?

StackTrace below:

(cylinter-env47) CMP06623:P23_Core19_PoC jnimoca$ cylinter Cylinter/config.yml 
INFO: Reading configuration file
INFO: Executing pipeline
Running: <function aggregateData at 0x1550615a0>
INFO: ======================================================================
INFO: RUNNING MODULE: aggregateData

Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/cylinter-env47/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3790, in get_loc
    return self._engine.get_loc(casted_key)
  File "index.pyx", line 152, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 181, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 2606, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 2630, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/cylinter-env47/bin/cylinter", line 10, in <module>
    sys.exit(main())
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/cylinter-env47/lib/python3.10/site-packages/cylinter/cylinter.py", line 49, in main
    pipeline.run_pipeline(config, args.module)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/cylinter-env47/lib/python3.10/site-packages/cylinter/pipeline.py", line 132, in run_pipeline
    data = module(data, qc, config)  # getattr(qc, module)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/cylinter-env47/lib/python3.10/site-packages/cylinter/components.py", line 56, in wrapper
    result = func(*args, **kwargs)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/cylinter-env47/lib/python3.10/site-packages/cylinter/modules/aggregateData.py", line 16, in aggregateData
    markers, dna1, dna_moniker, abx_channels = read_markers(
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/cylinter-env47/lib/python3.10/site-packages/cylinter/utils.py", line 321, in read_markers
    dna1 = markers['marker_name'][markers['channel_number'] == markers['channel_number'].min()][0]
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/cylinter-env47/lib/python3.10/site-packages/pandas/core/series.py", line 1040, in __getitem__
    return self._get_value(key)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/cylinter-env47/lib/python3.10/site-packages/pandas/core/series.py", line 1156, in _get_value
    loc = self.index.get_loc(label)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/cylinter-env47/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3797, in get_loc
    raise KeyError(key) from err
KeyError: 0

Config file here, as .txt:
config.txt

Thank you again
Best,
Jose

@josenimo
Author

Dear @gjbaker ,

I figured out what is happening. It has nothing to do with it being a single sample.
It fails at this line of code:
dna1 = markers['marker_name'][markers['channel_number'] == markers['channel_number'].min()][0]
https://github.com/josenimo/cylinter/blob/0e1cb69bec85ca5f603f8df920f5d7b5a1e5b1be/cylinter/utils.py#L321

The [0] looks up the channel name with index 0. The issue in my case is that I am excluding 5 channels, so the lowest channel number has index 5. When the [0] lookup runs, it cannot find any row with that index and raises a KeyError.
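A minimal sketch of this behaviour, using a made-up markers table (the names and values are illustrative, not taken from a real markers.csv). It assumes rows are dropped before the lookup so that the surviving rows keep their original index labels; how CyLinter actually filters the table is not shown here. On a pandas Series with an integer index, `[0]` is a label lookup, while `.iloc[0]` is positional.

```python
import pandas as pd

# Hypothetical markers table; a real markers.csv will have different contents.
markers = pd.DataFrame({
    "channel_number": [1, 2, 3, 4],
    "marker_name": ["DNA1", "CD3", "CD8", "PanCK"],
})

# Simulate rows being removed before the lookup: index labels are now 2 and 3.
markers = markers.iloc[2:]

# Same shape of expression as the failing line, without the trailing [0].
selected = markers["marker_name"][
    markers["channel_number"] == markers["channel_number"].min()
]

try:
    selected[0]  # label-based: there is no row labelled 0
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Positional lookup returns the first remaining row, whatever its label.
dna1 = selected.iloc[0]
```

Replacing the trailing `[0]` with `.iloc[0]` would avoid the KeyError, but it would still pick the channel with the lowest remaining channel number.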

I also have another problem: my nuclear stain is not on the first channel, it is actually on the last channel. This means that .min() would still not work for me. Do you think there could be an easy way for me to specify the channel name as a string in the config.yml? This would be nice.

Best,
Jose

@gjbaker
Member

gjbaker commented Jan 23, 2024

Hi @josenimo,

The line of code you shared looks for the name of the channel with the lowest index in the channel_number column of the markers.csv file. In our lab's standard workflow, this index almost always corresponds to the first counterstain channel which coincides with the first channel in the OME-TIFF file. The [0] at the end of the line just extracts the column value (i.e., channel name) from the single-row dataframe returned by the pandas slicing operation.

I assume you are using the markersToExclude parameter in config.yml to exclude the 5 channels. Note that for cyclic studies this should only include immunomarker channels, not counterstain channels, as these are used by the cycleCorrelation module to correlate counterstain signals across imaging cycles in order to drop cells that have failed to remain stable over the course of imaging.

A relatively easy workaround for your counterstain channel being the last channel in the image is to reassign the dna1 variable in utils.py to explicitly take the name of the channel you wish to consider as your default DNA channel.
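One way such a reassignment could look, as a sketch only. `pick_dna_channel` and its `preferred_name` argument are hypothetical and not part of CyLinter; the column names `marker_name` and `channel_number` are the ones used in the line quoted above, and the table contents are invented.

```python
import pandas as pd


def pick_dna_channel(markers, preferred_name=None):
    """Hypothetical helper (not part of CyLinter): choose the default DNA channel.

    If preferred_name is given it must exist in the table; otherwise fall back
    to the channel with the lowest channel_number, selected by position.
    """
    if preferred_name is not None:
        if preferred_name not in set(markers["marker_name"]):
            raise ValueError(f"{preferred_name!r} is not in the markers table")
        return preferred_name
    lowest = markers["channel_number"] == markers["channel_number"].min()
    return markers.loc[lowest, "marker_name"].iloc[0]


# Invented example in which the counterstain is the last channel.
example = pd.DataFrame({
    "channel_number": [1, 2, 3],
    "marker_name": ["CD3", "CD8", "DAPI"],
})

dna1_explicit = pick_dna_channel(example, preferred_name="DAPI")
dna1_fallback = pick_dna_channel(example)
```

Passing the name explicitly keeps the choice in one place, which is close to what a config.yml option would do, although no such option is described in this thread.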

Regarding your earlier question about the form and function of seg files: these are binary images showing segmentation contours, which the program uses as a fiducial for evaluating segmentation quality. Seg files output by MCMICRO consist of two channels, as you noted. That said, CyLinter only reads the binary channel, so this is the only channel one would actually need to recreate.
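For anyone recreating that binary channel from another segmenter, here is a rough sketch using NumPy only. It assumes the segmentation is available as a 2D integer label mask (0 for background, one positive integer per cell). The function name, the foreground value, and the uint16 output type are assumptions, and the sketch does not cover writing the result to a file in the layout CyLinter expects.

```python
import numpy as np


def label_mask_to_outlines(labels, foreground=1):
    """Hypothetical helper: mark cell pixels that touch a differently labelled pixel.

    Cell interiors and background stay 0, so the cells are not filled.
    """
    labels = np.asarray(labels)
    height, width = labels.shape
    # Edge padding: pixels on the image border are not treated as boundaries.
    padded = np.pad(labels, 1, mode="edge")
    differs = np.zeros(labels.shape, dtype=bool)
    # Compare each pixel with its four direct neighbours (up, down, left, right).
    for dy, dx in ((0, 1), (2, 1), (1, 0), (1, 2)):
        neighbour = padded[dy:dy + height, dx:dx + width]
        differs |= neighbour != labels
    outline = differs & (labels > 0)
    return np.where(outline, foreground, 0).astype(np.uint16)


# Toy label mask: one 3x3 cell in the middle of a 5x5 image.
toy_labels = np.zeros((5, 5), dtype=np.int32)
toy_labels[1:4, 1:4] = 1
toy_outlines = label_mask_to_outlines(toy_labels)
```

In the toy example the eight ring pixels of the cell are set and its centre pixel stays 0, which matches the unfilled contours described above.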

Hope this helps,
-Greg

@josenimo
Author

Hey @gjbaker ,
Thank you for your time and explanation, that does make sense.

I am hard-coding a specific channel name for my counterstain.
A small question:
Would I be able to modify the Python code inside the mamba environment? I just can't find it, and I assume it is compiled as a binary. Alternatively, would I be able to create a new environment with all the requirements and then install from a modified clone of the repo?

Thank you again,
Best,
Jose

@gjbaker
Member

gjbaker commented Jan 24, 2024

Hi @josenimo,

Assuming you are on a Mac, all one would have to do is update the read_markers function in the file found here: ~/miniconda3/envs/cylinter/lib/python3.10/site-packages/cylinter/utils.py (or an analogous path) with the default DNA channel you wish to use.
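If the exact path differs, the active environment can report where a package is installed. The sketch below uses only the standard library; `find_package_file` is a hypothetical helper, and it is demonstrated on the standard-library `json` package so that it runs without CyLinter installed.

```python
import importlib.util
import os


def find_package_file(package, filename):
    """Hypothetical helper: path of `filename` inside an installed package.

    Returns None if the package cannot be found or has no file location.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; take its directory.
    return os.path.join(os.path.dirname(spec.origin), filename)


# Demonstration on a standard-library package.
json_decoder_path = find_package_file("json", "decoder.py")

# In the CyLinter environment, the analogous call would be
# find_package_file("cylinter", "utils.py").
```

The stack trace earlier in this thread shows the package living under site-packages as plain .py files, so the file can be edited directly once it is located.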

-Greg

@josenimo
Author

Hey @gjbaker ,

Thank you, I had never hacked into my conda environments :)

I noticed that the channel being loaded was still the first one.
I realized that the culprit was this line of code: selectROI.py, line 293.
I just switched it to the correct one (4 in my case). I guess I might have to change this for all other modules, but that should be a simple fix. Just writing it here for documentation.

Thanks again, I will close the issue now.
