Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

12311 auto memory #2595

Merged
merged 50 commits into from Jul 9, 2014
Merged

12311 auto memory #2595

merged 50 commits into from Jul 9, 2014

Conversation

joshmoore
Copy link
Member

See: http://trac.openmicroscopy.org.uk/ome/ticket/12311

This PR removes the need to use perl or sed to modify the configuration properties on deploy. In fact, all references to -Xmx and MaxPermGen have been removed from templates.xml. Instead, the omero.install.memory module now calculates the appropriate settings on every startup and sets them in config.xml.

Things to check:

  • bin/omero admin memory command should modify config.xml and print out what it's doing
  • bin/omero config set can be used to modify those values
  • bin/omero admin start should make use of those properties.

Items that could use work/discussion:

  • tests for PercentStrategy (done)
  • addition of min/max values (max done)
  • optimization of default percents
  • other strategies? (adaptive added; suggestions welcome)

/cc @kennethgillen @sbesson @manics @rleigh-dundee @stick @bpindelski

So each strategy can make decisions based on what
type of service it's responsible for, the first
ctor argument is now 'name'.
All memory settings from templates.xml have now been removed
in favor of a single `${omero.mem.NAME.option}` property which
is set during the `start(async)` and `deploy` commands by
calling memory.adjust_settings.
This required passing a complete Settings object to the
Strategy constructor so that we could use the default
property resolution before creating the instance.
If available psutil will be used to check the free
and total physical memory for the system. Otherwise,
a call will be made to Java which uses the JMX OS
bean to get the same values. These are then used by
the `PercentStrategy`.
For use in the templates.xml file, all of our
properties must also be available as variables
before loading the templates.
@joshmoore
Copy link
Member Author

Opening for discussion, but this will need 1-2 more commits.

The `<option/>` element does not handle splitting
on whitespace, so there must be a number of properties
rather than one large one.
@joshmoore
Copy link
Member Author

With the last commits, this now works, but is less pretty than I would like:

$ bin/omero config get
...
omero.mem.blitz.generated_dump=
omero.mem.blitz.generated_heap=-Xmx10093m
omero.mem.blitz.generated_perm=-XX:MaxPermSize=128m
omero.mem.indexer.generated_dump=
omero.mem.indexer.generated_heap=-Xmx2523m
omero.mem.indexer.generated_perm=-XX:MaxPermSize=128m
omero.mem.pixeldata.generated_dump=
omero.mem.pixeldata.generated_heap=-Xmx5046m
omero.mem.pixeldata.generated_perm=-XX:MaxPermSize=128m
omero.mem.repository.generated_dump=
omero.mem.repository.generated_heap=-Xmx2523m
omero.mem.repository.generated_perm=-XX:MaxPermSize=128m
...
omero.web.application_server=development
omero.web.debug=true

I could try to hide these if that would be preferable.

@joshmoore joshmoore removed the exclude label Jun 10, 2014
@joshmoore
Copy link
Member Author

Exclude removed. I haven't done any configuration on octopus for tomorrow. Probably best to get this in to start testing performance, but review might should wait on a documentation PR. @sbesson has suggested we might should run this via the breaking queue.

@joshmoore
Copy link
Member Author

Docs PR opened to help explain what's going on here: ome/omero-documentation#819

@sbesson
Copy link
Member

sbesson commented Jun 11, 2014

http://ci.openmicroscopy.org/view/Failing/job/OMERO-5.0-merge-build/ICE=3.4,label=ome-c6100-1/22/console and http://ci.openmicroscopy.org/job/OMERO-5.0-merge-integration/296/console failed to start with this PR in. Logs copied under /home/hudson/OMERO-5.0-merge-integration_296_logs with the following in master.err:

Error: Could not find or load main class 
Error: Could not find or load main class 
-! 06/11/14 05:35:54.511 OMERO.Glacier2: warning: unable to contact permissions verifier `BlitzVerifier@BlitzAdapters'
   Reference.cpp:1566: Ice::NoEndpointException:
   no suitable endpoint available for proxy `BlitzVerifier -t @ BlitzAdapters'
-! 06/11/14 05:35:54.526 OMERO.Glacier2: warning: unable to contact session manager `BlitzManager@BlitzAdapters'
   Reference.cpp:1566: Ice::NoEndpointException:
   no suitable endpoint available for proxy `BlitzManager -t @ BlitzAdapters'
Error: Could not find or load main class 

Excluding and rebuilding

manics and others added 3 commits July 1, 2014 18:24
Also test use_total with min_total/max_total
To simplify the logic, AS instances now just
scale *up* the percentage of active RAM (as
specified by `use_total` and `max_total`)
when more memory is available.
To simplify the use of `AdaptiveStrategy`, the
total memory is used rather than the min/max'd
"active" memory. In order to modify the value,
set `use_total`. Scaling, however, cuts off at
24000.
@manics
Copy link
Member

manics commented Jul 2, 2014

What am I missing?

$ ./dist/bin/omero config get | grep omero.mem
omero.mem.use_total=32000

$ ./dist/bin/omero admin memory
Memory settings:
================
blitz=Settings({'use_total': '32000'}) -Xmx4800m -XX:MaxPermSize=1g
indexer=Settings({'use_total': '32000'}) -Xmx3200m -XX:MaxPermSize=1g
pixeldata=Settings({'use_total': '32000'}) -Xmx4800m -XX:MaxPermSize=1g
repository=Settings({'use_total': '32000'}) -Xmx3200m -XX:MaxPermSize=1g```

df94a59 should limit 32000 to 24000, and use 2*perc = 30% for Blitz, which should be 7200 not 4800.

@joshmoore
Copy link
Member Author

The call to get_heap_size is calling PercentStrategy's calculate_heap_size which is using active:

233  ->         percent = self.get_percent()
234             calculated = active * int(percent) / 100
235             return calculated

😦 I'll likely override another method to short-circuit this.

@joshmoore
Copy link
Member Author

About to push a commit. Problem is by removing 'active' from calculate_heap_size there's no upper limit again. So with 2.0 * perc with get:

      4000    -Xmx600
      8000   -Xmx1440
     12000   -Xmx2520
     16000   -Xmx3840
     20000   -Xmx5400
     24000   -Xmx7200
     28000   -Xmx8400
     32000   -Xmx9600

and with 1.5x we get:

      4000    -Xmx600
      8000   -Xmx1280
     12000   -Xmx2160
     16000   -Xmx3040
     20000   -Xmx4200
     24000   -Xmx5280
     28000   -Xmx6160
     32000   -Xmx7040

Previous commits removed the "active" memory level from
interpolation calculations, but the calculate_heap_size
method was still using "active" internally, leading to an
early cutoff.

With this, the adaptive strategy stops scaling up the
percentage of memory that it uses, but it does keep
using more memory as more is available.

Also improved docs.
@mtbc
Copy link
Member

mtbc commented Jul 2, 2014

Travis failure.

@joshmoore
Copy link
Member Author

Travis Guava download failure; restarted

@manics
Copy link
Member

manics commented Jul 3, 2014

We should've scoped this before starting ... i.e. who does this need to work for. I reckon it's OK as is for general users, if we accept that we'll need to configure our own limit internally.

[hudson@ome-c6100-1 ~]$ OMERO-5.0-merge-deploy/bin/omero admin memory
Memory settings:
================
blitz=Settings({}) -Xmx40596m -XX:MaxPermSize=1g
indexer=Settings({}) -Xmx27064m -XX:MaxPermSize=1g
pixeldata=Settings({}) -Xmx40596m -XX:MaxPermSize=1g
repository=Settings({}) -Xmx27064m -XX:MaxPermSize=1g

@joshmoore
Copy link
Member Author

I'd propose adding back in max_total which was intended to prevent exactly this. Then we could decide if one uses max_total or use_total to up what's used. (And I'm open to other name suggestions).

@manics
Copy link
Member

manics commented Jul 3, 2014

Would the limit only apply to the adaptive strategy?

In an attempt to get this merged, AdaptiveStrategy has
been removed and the permgen logic migrated to the
PercentStrategy. In addition, the default max_total for
PercentStrategy has been bumped to 48000 which gives a
default max for blitz (assuming RAM is sufficient) of
7200MB which is close enough to our production settings
to not matter.
@joshmoore
Copy link
Member Author

@manics, pushed. See description of joshmoore@8de2ae1

@manics
Copy link
Member

manics commented Jul 9, 2014

Good to merge.

My only concern (which can be handled later) is the parameter name use_total. To me "use" suggests the actual amount of memory that'll be used, whereas it's the amount used for the calculation of how much to use.

@joshmoore
Copy link
Member Author

@manics: I do need to open one more naming PR so you'll get a chance for specifically that. Merging so this can get onto latest, etc. Be thinking of suggestions 😄 : mem_cap, visible_memory, etc.

@joshmoore joshmoore removed the ON_HOLD label Jul 9, 2014
joshmoore added a commit that referenced this pull request Jul 9, 2014
@joshmoore joshmoore merged commit e956585 into ome:dev_5_0 Jul 9, 2014
@joshmoore joshmoore deleted the 12311-auto-memory branch July 9, 2014 09:57
@joshmoore
Copy link
Member Author

--rebased-to #2778

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants