Preprocessing status and outputs #24
Comments
So the two key flags are --preprocess and --patches. --preprocess creates a zarr file in place of the png, and --patches will construct or append a SQL db. Can you change your npy file to end in _mask.npy instead of just .npy?
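The requested rename follows a simple convention (same name, with a _mask suffix before the .npy extension). A minimal helper that derives the new name, with hypothetical filenames:

```python
import os

def mask_name(npy_path: str) -> str:
    """Return the path renamed to follow the `_mask.npy` convention."""
    root, ext = os.path.splitext(npy_path)
    if root.endswith("_mask"):
        return npy_path  # already conforms, leave untouched
    return root + "_mask" + ext

# e.g. mask_name("slide_001.npy") -> "slide_001_mask.npy"
```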
I get the same output after changing the file name:
I don't see a zarr file or anything new in the directory, unless it's hidden somehow. I think this function would benefit from some text responses to the user showing what has been done and what was saved (it currently outputs just '512').
And what's #26 you mentioned here? Is it a problem with the mask orientation that I'm using here?
It’s a reference to a potential bug we may need to work out from a previous patch.
@jlevy44 See the pending question above about the missing outputs.
Fair enough, we will add more progress updates. There should be some display, though, that indicates progress. You also have a forward slash near your stainID that should not be there.
Great. But again, what outputs should I see right now to evaluate whether it ran and completed appropriately or not? For example, what files should I see being created? I don't see any files but I don't know what to look for. There isn't any Zarr file in the directory. @jlevy44
You should at least see outputs such as printed here: Have you adjusted your command as previously discussed?
Yes, I removed the forward slash and it’s still printing only 512. Was the package updated in the last week or two to do these things I don’t see? Maybe I don’t have the latest version.
It is possible that you do not have the latest software. As far as I am aware, this has been a long time feature of the package. Your command syntax and print appear incorrect:
Now I'm getting the no such command issue with preprocess:
The package is clearly installed and loaded on a GPU instance, so what can the issue be?
Here's the image Saturn Cloud has created for me to run the PFAI environment. Is it helpful on your end to load it and see what the issue is? I don't see any other way I can resolve this.
Just by looking at the YAML files, you have an old version of pathflowai specified. For the latest version of pathflowai, we recommend running:
Add this within the Docker container. Of course, if there are any bugs, I would highly encourage having the flexibility to rebuild the Docker image within the HPC environment, or at least replacing that Docker image with another one housing the latest patch.
Great, thanks. It's working now. See the outputs below. Can you let me know if it looks okay? I'm basically running a loop obtaining slides and both region masks, which I'd like to segment, as well as a background mask from HistoQC which I use to mask the image. I calculate the hematoxylin channel because I don't want the stain to drive the segmentation here. I mask the hematoxylin and segmentation mask matrices to remove the tissue background, save as PNG and NPY, and run the preprocess per stain with flags

Also, the process crashed after about 4 slides although I allocated 32 GB mem, 40 GB HD and 1 GPU. Pushing it to 64 GB mem. Does it sound reasonable just for preprocessing, or am I doing something wrong?

output:
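The masking step described above can be sketched with NumPy. All array names and shapes here are hypothetical stand-ins; the actual stain deconvolution and HistoQC steps are not shown:

```python
import numpy as np

# Hypothetical inputs: a grayscale hematoxylin channel, a segmentation
# mask with 7 classes, and a HistoQC background mask (True = tissue).
hematoxylin = np.random.rand(64, 64).astype(np.float32)
segmentation = np.random.randint(0, 7, size=(64, 64), dtype=np.uint8)
tissue_mask = np.zeros((64, 64), dtype=bool)
tissue_mask[16:48, 16:48] = True

# Zero out the background in both the image and the segmentation mask,
# so only tissue pixels survive.
hematoxylin_masked = np.where(tissue_mask, hematoxylin, 0.0)
segmentation_masked = np.where(tissue_mask, segmentation, 0)

# Save the masked segmentation under the _mask.npy naming convention
# (hypothetical filename).
np.save("slide_001_mask.npy", segmentation_masked)
```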
Yeah, the memory utilization is likely due to running the processes in a for loop through Jupyter, which is prone to memory leaks. Typically, I would deploy each of these processes across the HPC. I would also check the resulting SQL database and make sure this is the patch size that you want. You can also add other patch sizes to capture info at a different resolution. You also need the masks in the same directory as the WSI, with the same basename, just replacing the extension with _mask.npy. You don’t want to preprocess the masks as if they were WSI. Everything else looks ok.
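The file-layout convention described here (mask next to the WSI, same basename, extension replaced by _mask.npy) can be checked with a small helper; the paths are hypothetical examples:

```python
import os

def expected_mask_path(wsi_path: str) -> str:
    """For a WSI like /data/slide_001.ndpi, the mask is expected at
    /data/slide_001_mask.npy (same directory, same basename)."""
    root, _ = os.path.splitext(wsi_path)
    return root + "_mask.npy"

def mask_is_in_place(wsi_path: str) -> bool:
    """True if the companion mask file already exists on disk."""
    return os.path.exists(expected_mask_path(wsi_path))
```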
What do you mean by 'You don’t want to preprocess the masks as if they were WSI'?
In addition to the above, I'm going to follow your advice regarding multiple patch sizes per image so I would do this:
Omitting the preprocess flag from the second run per slide, and using size 512 and then 1024.
That looks right to me. Do you have 7 output classes? -tc 7 Also, are you using a black background for the slides? Please convert the background to white if so.
I do have 7 segmentation (not classification, to be clear) classes, including the background as the first channel: Background, Bile Ducts, Normal, Tumor, Stroma, Tissue Fold, Lymphoid Aggregate. Why do you need the background white? I'm using the deconvolved grayscale hematoxylin image, so most of the image is black. Actually, that might be the problem: maybe they're all being filtered out because they don't exceed the intensity threshold, which you may have optimized for RGB images? How do I determine that?
That info should be in the db file and can be visualized using any of our visualization functions (going to update). We remove the background based on whether it is white, which will be especially pertinent when we implement Otsu thresholding. You can change the threshold intensity, but it will grab the entire background if you use black, which you will have to filter out manually.
One way to set the intensity is to Otsu-threshold one of your images, then take 255 - otsu_threshold as the intensity.
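A sketch of that suggestion, with Otsu's method implemented directly in NumPy so it is self-contained (assumes an 8-bit grayscale image; the example image is a hypothetical dark-background array):

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Otsu's method: pick the 0-255 threshold that maximizes
    the between-class variance of the intensity histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    cum_count = np.cumsum(hist)                    # pixels at or below t
    cum_mean = np.cumsum(hist * np.arange(256))    # intensity mass at or below t
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 = cum_count[t] / total
        w1 = 1.0 - w0
        if w0 == 0.0 or w1 == 0.0:
            continue  # one class is empty, variance undefined
        mu0 = cum_mean[t] / cum_count[t]
        mu1 = (cum_mean[-1] - cum_mean[t]) / (cum_count[-1] - cum_count[t])
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Hypothetical dark-background image: mostly 0s with a brighter region.
img = np.zeros((64, 64), dtype=np.uint8)
img[20:40, 20:40] = 200

t = otsu_threshold(img)
intensity = 255 - t  # the intensity setting suggested above
```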
Hi @jlevy44
How do I know the status of the preprocessing procedure during and after execution, and what are the outputs I should see in terms of files being written?
The following ran for a couple of seconds and just printed '512' at the end; I don't see any new files in the directory specified here.
Output:
And these are the input files I have in that folder (just one sample for now to test if it works):
And to make sure my plan is compatible with this function: I plan to run it as a last step in a loop that processes each slide from ndpi separately and prepares the input to PFAI. That's assuming the PFAI preprocess command will concatenate the data when it's called on new slides. I'll just remove the '--preprocess' flag after the first iteration. In that context, every time I run the command with --preprocess, do I basically instruct it to redefine the database? Does it delete the old one? And where are they stored?
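I can't speak for the package internals, but one way to check whether repeated runs append to or overwrite the patch database is to compare the SQLite table row counts between runs. The db filename and table names below are hypothetical; use whatever the --patches run actually produced:

```python
import sqlite3

def describe_db(db_path: str) -> dict:
    """Map each table in a SQLite file to its row count."""
    with sqlite3.connect(db_path) as conn:
        tables = [r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
        # Table names come from sqlite_master itself, so interpolating
        # them into the COUNT query is safe for this inspection sketch.
        return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
                for t in tables}

# Compare counts before and after processing a new slide:
# before = describe_db("patch_info.db")   # hypothetical filename
# ... run the preprocessing command on the next slide ...
# after = describe_db("patch_info.db")
# appended = all(after[t] >= before.get(t, 0) for t in after)
```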