CL2D producing fewer classes than requested #230

Closed

dallakyan opened this issue Jan 24, 2020 · 3 comments

@dallakyan

Thank you to the XMIPP developers for your contributions. We are running CL2D (xmipp_mpi_classify_CL2D) with ~10k particles and requesting 40 classes (--nref 40). However, it produces only 4 classes. When we run it on a subset of ~5k particles with the same parameters, it is able to produce all 40 classes. This seems counterintuitive, and when I dug deeper into the code, I saw that it stops when the number of assignment changes is less than 0.5% of the total particles.
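To make the threshold concrete, here is a minimal Python sketch of that stopping rule (names are illustrative only, not the actual Xmipp code):

```python
def should_stop(n_changes, n_particles, threshold=0.005):
    """Illustrative convergence check: stop once fewer than 0.5% of
    particles changed class assignment in the last iteration."""
    return n_changes < threshold * n_particles

# With ~10k particles the cutoff is 50 changes, so a sluggish early
# iteration (e.g. only 40 reassignments) already triggers the stop:
print(should_stop(40, 10000))  # True  -> run halts with few classes
print(should_stop(40, 5000))   # False -> cutoff is 25, run continues
```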

I was wondering if anyone else has seen this issue and can comment on it. One workaround to get more classes is to reduce the number of iterations, but I'm not sure this is an optimal solution.
Thanks.

@DStrelak
Collaborator

Hello @dallakyan,

thanks for reporting this issue!
To summarize the problem:

  1. you have a set S of ~10k particles
  2. you run xmipp_mpi_classify_CL2D --nref 40 on S and get only 4 classes
  3. you create a subset of S, say s, with ~5k particles
  4. you run xmipp_mpi_classify_CL2D --nref 40 on s and get all 40 classes (a sketch of both runs follows this list)
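For reference, a minimal sketch of the two runs. Only --nref 40 comes from the thread; the -i input flag, the metadata file names, and the mpirun invocation are assumptions based on common Xmipp/MPI conventions and may differ on your installation:

```python
import subprocess

# Hypothetical reproduction of both runs; adjust flags and paths to your setup.
for input_set in ("particles_S_10k.xmd", "particles_s_5k.xmd"):
    subprocess.run(
        ["mpirun", "-np", "8",        # assumed MPI launch
         "xmipp_mpi_classify_CL2D",
         "-i", input_set,             # assumed input flag
         "--nref", "40"],             # from the thread
        check=True,
    )
```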

Is that correct?
Can you send us (dstrelak@cnb.csic.es) Scipion logs from both protocols?
Thanks.

@DStrelak added the help wanted label Jan 27, 2020
@ajimoreno
Contributor

Hello @dallakyan,

thanks for your report.
I took a look at the code. You're right: the run stops when the number of assignment changes is less than 0.5%. Perhaps, with the whole input set, the program gets stuck, unable to evolve from the initial classes, and hits this stopping condition early. With the subset of the input particles, on the other hand, the program seems to generate initial classes with more differences among them and is able to evolve from those.
I was thinking about how to manage this problem. One option is to run a couple of classifications on subsets of the input particles and then concatenate the resulting classes. Another option is to use the "Number of initial classes" field in the protocol: if you request a higher number of random initial classes (e.g. 8), the program may find enough differences among them to evolve and generate the desired number of output classes.
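As a rough illustration of the first option, here is a minimal Python sketch of splitting the input into random subsets before running separate classifications (the helper is hypothetical; in practice the split and the final concatenation of classes would be done with Scipion or Xmipp metadata tools):

```python
import random

def split_particles(particles, n_subsets=2, seed=0):
    """Shuffle the particle entries and deal them into n_subsets
    roughly equal random subsets, one per CL2D run."""
    shuffled = list(particles)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]

# e.g. two ~5k subsets from a 10k set; classify each separately,
# then concatenate the resulting classes into one output set.
subsets = split_particles(range(10000), n_subsets=2)
print(len(subsets[0]), len(subsets[1]))  # 5000 5000
```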
I hope one of these recommendations helps you. Let me know if the problem persists.

Thanks.

@dallakyan
Author

Thank you @DStrelak and @ajimoreno.
