Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

low cpu usage for given proto file #1588

Closed
gregy4 opened this issue Sep 19, 2019 · 21 comments
Closed

low cpu usage for given proto file #1588

gregy4 opened this issue Sep 19, 2019 · 21 comments
Assignees
Labels
Bug Lang: C++ Native implementation issue
Milestone

Comments

@gregy4
Copy link

gregy4 commented Sep 19, 2019

For given proto file I use 8 search workers. During execution I found that cpu usage for long periods doesn't match with number of search workers. Cpu utilization is with other proto files around 25-30% (comply with number of threads / available cores), with this proto file spectrum is much wider between 4 and 30%. It seems that lower cpu utilization has something to do with situation when a solution is found and after call of solution callback cpu utilization is improved.
Given example consume during search around 40G of memory in ortools 7.3

proto.zip

@ghost
Copy link

ghost commented Sep 19, 2019

Your question sounds similar to #1420, but I'm not 100% sure if it still works like that

@lperron
Copy link
Collaborator

lperron commented Sep 19, 2019

Most of the time is spent presolving the model, which is sequential.

Solving is fast and uses all threads.

I0919 12:00:42.190474 185435 sat_runner.cc:168] Reading a CpModelProto.
I0919 12:00:43.612012 185435 cp_model_solver.cc:2251] Parameters: num_search_workers: 8
I0919 12:00:43.753345 185435 cp_model_solver.cc:2265] Satisfaction model '':
#Variables: 119200

  • 119200 in [0,1]
    #kLinear2: 1631700
    #kLinearN: 1192
    I0919 12:00:43.877019 185435 cp_model_solver.cc:2308] *** starting model presolve at 0.27s
    I0919 12:01:02.079699 185435 cp_model_presolve.cc:3873] - 0 affine relations were detected.
    I0919 12:01:02.079766 185435 cp_model_presolve.cc:3875] - 0 variable equivalence relations were detected.
    I0919 12:01:02.079798 185435 cp_model_presolve.cc:3883] - rule 'bool_or: implications' was applied 1631700 times.
    I0919 12:01:02.079804 185435 cp_model_presolve.cc:3883] - rule 'linear: empty' was applied 37 times.
    I0919 12:01:02.079809 185435 cp_model_presolve.cc:3883] - rule 'linear: negative clause' was applied 1631700 times.
    I0919 12:01:02.079813 185435 cp_model_presolve.cc:3883] - rule 'linear: positive equal one' was applied 999 times.
    I0919 12:01:02.079818 185435 cp_model_presolve.cc:3883] - rule 'linear: singleton column' was applied 37 times.
    I0919 12:01:02.079825 185435 cp_model_presolve.cc:3883] - rule 'presolve: iteration' was applied 2 times.
    I0919 12:01:02.137112 185435 cp_model_solver.cc:2333] Satisfaction model '':
    #Variables: 115500
  • 115500 in [0,1]
    #kAtMostOne: 41699 (#literals: 622800)
    #kBoolAnd: 3100 (#enforced: 3100) (#literals: 3100)
    #kBoolOr: 999 (#literals: 99900)
    #kLinearN: 156
    I0919 12:01:02.223026 185435 cp_model_solver.cc:2196] *** starting Search at 18.61s with 8 workers and strategies: [ auto, less encoding, no_lp, max_lp, random, random_1, random_2, random_3 ]
    I0919 12:01:03.448972 185547 synchronization.cc:110] simple spelling correction #1 19.84s no_lp num_bool:115500
    I0919 12:01:03.451136 185552 synchronization.cc:110] use the PSAPI version from Win2008 to be compatible with that OS #2 19.84s less encoding num_bool:115500
    I0919 12:01:05.104082 185435 cp_model_solver.cc:2454] CpSolverResponse:
    status: FEASIBLE
    objective: NA
    best_bound: NA
    booleans: 115500
    conflicts: 0
    branches: 250032
    propagations: 8889668
    integer_propagations: 458400
    walltime: 21.4315
    usertime: 21.4315
    deterministic_time: 2.02935
    primal_integral: 0

@lperron lperron closed this as completed Sep 19, 2019
@gregy4
Copy link
Author

gregy4 commented Sep 19, 2019

Sorry I have to disagree.
Of course I know that presolve use only single thread, I talk about next phase - searching.
I think the issue is not related to #1420, since additional search workers are used for lns and when no solution exists yet it doesn't make sense to execute them.

Attached proto file solves optimization and not feasibility problem.
There's my code :
File initialFile = new File("c:/temp/test2.proto");
InputStream targetStream = new FileInputStream(initialFile);

    CpModel model = new CpModel();
    model.getBuilder().mergeFrom(targetStream);

    CpSolver cpSolver = new CpSolver();

    cpSolver.getParameters().setLogSearchProgress(true);
    cpSolver.getParameters().setNumSearchWorkers(8);

    CpSolverSolutionCallback solutionCallBack =  new CpSolverSolutionCallback()
    {
        @Override
        public void onSolutionCallback()
        {
            System.out.println("solution found");
        }
    };

    cpSolver.solveWithSolutionCallback(model, solutionCallBack);

@lperron
Copy link
Collaborator

lperron commented Sep 19, 2019

Sorry, too many files from support with the same names

Here is what I get on master:

216428 lperron 20 0 5177696 4.2g 19720 S 792.1 2.2 11:57.87 sat_runner

It uses 8 CPUs after 1 min and 30s, consistently.

At worst, it will use 5 threads when searching for a solution, and launch all 8 when/if it finds them.

@gregy4
Copy link
Author

gregy4 commented Sep 19, 2019

Understand :-)
After 3hours of execution I still see periods when cpu usage is around 11% that corresponds to 4 threads at max for my cpu. I'll try to test it on master, run it on another OS and get some statistics about cpu usage. Maybe some synchronization between search workers is causing this behaviour.

@lperron
Copy link
Collaborator

lperron commented Sep 19, 2019 via email

@gregy4
Copy link
Author

gregy4 commented Sep 19, 2019

16 real cores, with hyperthreading shown as 32 cores. threadripper 1950x

@lperron
Copy link
Collaborator

lperron commented Sep 19, 2019 via email

@gregy4
Copy link
Author

gregy4 commented Sep 19, 2019

I agree, this is how it should look like.
Meanwhile I used performance profiler from visual studio and have more correct data.
Phase one - presolve, one thread is used - in my case cpu usage is around 3-4%
Phase two - solve started, no solution is found yet - cpu utilization varies between 9-30%
Phase three - solution is found, displayed message
#1 202.16s best:5460 next:[5018,5459] no_lp num_bool:6511250
Phase four - Called solution callback

Between information that solution was found and execution of callback is minimally 40 seconds interval when cpu utilization is between 3-4%.

I can send you report file from VC, but it is little bit big - 30mb per 3 minutes.

@lperron
Copy link
Collaborator

lperron commented Sep 19, 2019 via email

@gregy4
Copy link
Author

gregy4 commented Sep 19, 2019

Definitely much better ... no interval where only one core is used.
This time I choose to use 16 cores but only for short time 16cores (about 50%) were used - on graph between 2:45 and 3:45. After 3:45 cpu usage is between 10 - 16% so callback is probably not the only one reason.
cpu_profile

@lperron
Copy link
Collaborator

lperron commented Sep 19, 2019 via email

@gregy4
Copy link
Author

gregy4 commented Sep 19, 2019

No, optimal solution wasn't found and probably is not reachable in reasonable time.

@lperron
Copy link
Collaborator

lperron commented Sep 19, 2019 via email

@gregy4
Copy link
Author

gregy4 commented Sep 19, 2019

I checked it on another problem, and cpu temperature was 5 degrees (67) above with constant 50% cpu utilization. For this problem temperature is at max around 63 degrees.
I see memory bandwith as more probable reason, 40gb can be quite a lot for memory controller. Unfortunatelly for amd I'm not aware about software that can measure it.

@Mizux Mizux added this to the v8.0 milestone Feb 12, 2021
@timothylowjc
Copy link

Hello!

I have roughly the same issue here, currently have my solvers running on AWS Batch using a c5.metal EC2 instance, which is 96 vCPUs and 192GB RAM.

Proto is a too large for github, but this should work

Anyway,
image

I've currently set num_search_workers to 192, on the off chance I've misunderstood something about logical threads, but it appears that CPU utilization never breaks 30%.

Doubt its a temperature problem as mentioned, given vCPUs. CPU Usage ends in the graph after a certain amount of time has passed (400 seconds).

Any help is appreciated!

@lperron lperron reopened this Mar 23, 2021
@lperron
Copy link
Collaborator

lperron commented Mar 23, 2021

You are right. I could reproduce on my machine.

@timothylowjc
Copy link

Good to know its reproducible, thanks for the hard work! Looking forward to seeing the fix on master

@lperron
Copy link
Collaborator

lperron commented Mar 24, 2021

I have done a first pass and committed to master. If possible, can you try ?

@timothylowjc
Copy link

Hi @lperron ,

Sorry about the late reply. I'll let you know sometime this week, we gotta figure out how to deploy master branches using AWS Batch first. Thanks for the fix!

@Mizux Mizux modified the milestones: v8.0, v9.0 Mar 29, 2021
@Mizux Mizux added Bug Lang: C++ Native implementation issue labels Mar 29, 2021
@timothylowjc
Copy link

image

Alright, looks like the results speak for themselves.

Thanks for the fix @lperron !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Lang: C++ Native implementation issue
Projects
None yet
Development

No branches or pull requests

4 participants