This repository has been archived by the owner on Nov 9, 2022. It is now read-only.

Support additional config XMLs for forests/replicas, roles and users #704

Open
tternquist opened this issue Nov 20, 2016 · 5 comments

@tternquist

I'm working with a customer that will need to create and manage dozens of nodes and likely hundreds of forests and replicas. We're able to explicitly create forests in ml-config.xml, but a challenge we have is that the forest configurations will be different between production and the lower environments. This means we would have to create different ml-configs for each of the different topologies. To complicate things further we will also have dozens of range indexes and hundreds of roles.

It would be much easier for us to manage if we could pull out pieces of the ml-config so we could have different topology files for forests, without having to keep everything else in the config in sync. Pulling out users and roles would be another big plus for us.

I've done some initial work on a fork, but want to run this approach by folks before any further testing or a PR.

I'm currently supporting three additional files: content-forests.xml, roles.xml and users.xml and have properties in ml-config for substitutions.
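Placeholders such as `${app-name}` in these files only work if something expands them before bootstrap. Here is a minimal sketch, assuming placeholders are plain `${name}` tokens looked up in a hash of properties. The helper `substitute_placeholders` is hypothetical. It is not part of Roxy, and the fork may expand placeholders differently:

```ruby
# Hypothetical helper: replace ${name} placeholders with values from a
# properties hash. Unknown names raise KeyError so typos fail loudly
# instead of leaving a literal "${...}" in the generated config.
def substitute_placeholders(text, props)
  text.gsub(/\$\{([\w.-]+)\}/) { props.fetch(Regexp.last_match(1)) }
end

forest = '<forest-name>${app-name}-content-Forest1</forest-name>'
puts substitute_placeholders(forest, { 'app-name' => 'myapp' })
```

Failing on a missing key is deliberate. With hundreds of generated forest names, a silently unexpanded placeholder would be harder to track down than an error at build time.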

users.xml and roles.xml contain the users and roles elements as-is, pulled straight out of ml-config. content-forests.xml has the following form:

<forest-assignments>
	<primary-forests>
		<assignments xmlns="http://marklogic.com/xdmp/assignments" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://marklogic.com/xdmp/assignments assignments.xsd">
			<assignment>
				<forest-name>${app-name}-content-Forest1</forest-name>
				<host-name>xxx1.test.com</host-name>
				<replica-names>
					<replica-name>${app-name}-content-Forest1-R</replica-name>
				</replica-names>
			</assignment>
			<assignment>
				<forest-name>${app-name}-content-Forest2</forest-name>
				<host-name>xxx2.test.com</host-name>
				<replica-names>
					<replica-name>${app-name}-content-Forest2-R</replica-name>
				</replica-names>
			</assignment>
			<assignment>
				<forest-name>${app-name}-content-Forest3</forest-name>
				<host-name>xxx3.test.com</host-name>
				<replica-names>
					<replica-name>${app-name}-content-Forest3-R</replica-name>
				</replica-names>
			</assignment>
		</assignments>
	</primary-forests>
	<replica-forests>
		<assignments xmlns="http://marklogic.com/xdmp/assignments" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://marklogic.com/xdmp/assignments assignments.xsd">
			<assignment>
				<forest-name>${app-name}-content-Forest1-R</forest-name>
				<host-name>xxx2.test.com</host-name>
			</assignment>
			<assignment>
				<forest-name>${app-name}-content-Forest2-R</forest-name>
				<host-name>xxx3.test.com</host-name>
			</assignment>
			<assignment>
				<forest-name>${app-name}-content-Forest3-R</forest-name>
				<host-name>xxx1.test.com</host-name>
			</assignment>
		</assignments>
	</replica-forests>
</forest-assignments>
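The primary/replica layout follows a regular pattern: each replica sits on the next host in the list. Because of that, a file like this could be generated per environment instead of being maintained by hand. Below is a minimal Ruby sketch, assuming one primary forest per host and exactly one replica per forest. `build_forest_assignments` and `assignment` are hypothetical helpers, and the `xsi:schemaLocation` attributes are omitted for brevity:

```ruby
# Hypothetical generator for the proposed content-forests.xml format.
# Assumes one primary forest per host and one replica per forest, placed on
# the next host in the list (round-robin), as in the example above.
ASSIGNMENTS_NS = 'xmlns="http://marklogic.com/xdmp/assignments"'

# Render a single <assignment> element; the replica name is optional.
def assignment(forest, host, replica = nil)
  lines = ['      <assignment>',
           "        <forest-name>#{forest}</forest-name>",
           "        <host-name>#{host}</host-name>"]
  if replica
    lines += ['        <replica-names>',
              "          <replica-name>#{replica}</replica-name>",
              '        </replica-names>']
  end
  (lines << '      </assignment>').join("\n")
end

def build_forest_assignments(hosts, prefix = '${app-name}-content')
  # a replica on the same host as its primary gives no protection
  raise ArgumentError, 'need at least two hosts' if hosts.size < 2

  primaries = []
  replicas = []
  hosts.each_with_index do |host, i|
    forest = "#{prefix}-Forest#{i + 1}"
    replica = "#{forest}-R"
    replica_host = hosts[(i + 1) % hosts.size]
    primaries << assignment(forest, host, replica)
    replicas << assignment(replica, replica_host)
  end

  ['<forest-assignments>',
   '  <primary-forests>',
   "    <assignments #{ASSIGNMENTS_NS}>",
   *primaries,
   '    </assignments>',
   '  </primary-forests>',
   '  <replica-forests>',
   "    <assignments #{ASSIGNMENTS_NS}>",
   *replicas,
   '    </assignments>',
   '  </replica-forests>',
   '</forest-assignments>'].join("\n") + "\n"
end

puts build_forest_assignments(%w[xxx1.test.com xxx2.test.com xxx3.test.com])
```

Run against the three example hosts, this produces the same primary/replica placement as the file above. A production host list would then yield the production topology from the same code, which keeps the DBA-owned topology out of the shared ml-config.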

Please let me know what you think. I'm especially curious if this is a challenge for other customers as well.

@RobertSzkutak
Contributor

RobertSzkutak commented Nov 20, 2016

This is definitely a growing pain point that I am also seeing at larger orgs. How do you feel about this alternative? #692 I could put it together pretty quickly for you if it would work.

@tternquist
Author

I think that would be an improvement, but I'm concerned about ml-config becoming even more massive and error-prone. Long term, we may be dealing with thousands of roles, and I don't even want to think about how big the config would be then.

Something to consider is the separation of duties with respect to what goes into ml-config. For the project I'm looking at, the forest topology is designed and managed by the DBA team, while the roles will largely be managed by a BA team (although they are still ultimately deployed by the DBA team).

If you'd like to check out the changes I made, you can see them on my fork (on master). I neglected to read the Roxy contributing guidelines before jumping in, so apologies for doing things in the wrong order!

@dmcassel
Collaborator

The config.file property in default.properties accepts a comma-separated list of values, so you can have something like:

config.file=ml-config.xml,content-forests.xml,roles.xml,users.xml

Is that sufficient for what you're looking to do?
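If each environment needs a different forest topology, one option is to override `config.file` in that environment's properties file so it points at a different forests file. The sketch below models that layering in plain Ruby. It assumes that later properties files override earlier ones, so check how your Roxy version loads `default.properties`, `build.properties` and environment-specific files. `parse_properties`, `config_files` and `prod-forests.xml` are hypothetical names, not Roxy's own code:

```ruby
# Hypothetical model of layered properties files: later layers override
# earlier ones, key by key.
def parse_properties(text)
  text.each_line.with_object({}) do |line, props|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    key, value = line.split('=', 2)
    props[key.strip] = value.to_s.strip
  end
end

# Merge the layers and split the comma-separated config.file value.
def config_files(*layers)
  merged = layers.map { |text| parse_properties(text) }.reduce({}, :merge)
  merged.fetch('config.file').split(',').map(&:strip)
end

default_props = "config.file=ml-config.xml,content-forests.xml,roles.xml,users.xml\n"
prod_props = "config.file=ml-config.xml,prod-forests.xml,roles.xml,users.xml\n"
p config_files(default_props, prod_props)
```

With this layering, roles.xml and users.xml stay shared across environments, and only the forests file changes per environment.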

@tternquist
Author

Ah, I had no idea you could do that! That works perfectly for what I want to do with users and roles, but forests and replicas still seem to be a special case: you need to create the forests and their replicas within assignments and also attach the primaries to the database under the database element.

If the feature Rob is suggesting were implemented and worked on any element, it would help my use case. I would prefer to have a separate file for managing forest topology, but for now it's not a big deal. I can try to gather feedback from the customer on how the various approaches would fit their process (they have ML installs but are only just learning Roxy).

@ezbc
Contributor

ezbc commented Feb 10, 2017

@tternquist, I had a similar problem on a project. I worked around it by adding a step to the app_specific.rb script that edits the configuration file based on comments specifying the environment. A new configuration file is compiled for the target environment and then used for the bootstrap.

Our implementation searches for environment-specific configuration comments with the following structure:

<!-- env-config=env1,env2 -->
  <config12/>
<!-- env-config -->
<!-- env-config=env3 -->
  <config3/>
<!-- env-config -->

which an app_specific step compiles to <config12/> when deploying to environment "env1" or "env2", and to <config3/> when deploying to environment "env3". Bootstrap then uses the compiled config.

For example, in the /configuration/databases/database element I have:

    <!-- env-config=local,dev,uat -->
    <forests-per-host>@ml.content-forests-per-host</forests-per-host>
    <forests>
      <forest-id name="@ml.content-db"/>
      @ml.forest-data-dir-xml
    </forests>
    <!-- env-config -->

so that the <forests-per-host> and <forests> elements are only bootstrapped for my environments named "local", "dev", or "uat". When I deploy to the "prod" environment, these elements are removed from the compiled configuration.

Below is the ruby function definition I used in app_specific.rb, run before the bootstrap.

    def compile_ml_config
      # get the deployment environment (e.g. "local", "dev", "prod")
      env = @properties["environment"]

      # derive the compiled file name; never overwrite the source config
      filename = @properties["ml.config.file"]
      filename_new = filename.sub(/\.xml\z/, '') + "-compiled-for-#{env}.xml"

      s = File.read(filename)

      # collect the environment list from each opening env-config comment
      comment_envs = s.scan(/<!-- env-config=(.*?) -->/).flatten.uniq
      comment_envs.each do |comment_env|
        envs = comment_env.split(',').map(&:strip)

        # if the current env is listed, keep the block untouched
        next if envs.include?(env)

        # otherwise delete everything between the opening and closing comments
        comment_1 = "<!-- env-config=#{comment_env} -->"
        comment_2 = '<!-- env-config -->'
        pattern = /#{Regexp.escape(comment_1)}(.*?)#{Regexp.escape(comment_2)}/m
        # use gsub rather than gsub!, which returns nil when nothing matches
        s = s.gsub(pattern, "#{comment_1}\r\n#{comment_2}")
      end

      File.write(filename_new, s)
      @logger.info("Compiled configuration file for #{env} environment to #{filename_new}")

      # point bootstrap at the compiled file
      @properties["ml.config.file"] = filename_new
    end
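The method above depends on `@properties`, `@logger` and the file system, which makes it hard to test on its own. Here is a minimal sketch of the same comment-stripping logic as a pure function. It assumes blocks are never nested and that every opening `env-config=` comment has a matching `<!-- env-config -->`. The name `compile_env_config` is hypothetical:

```ruby
# Hypothetical pure-function version of the env-config compile step.
# Blocks look like:
#   <!-- env-config=env1,env2 --> ... <!-- env-config -->
# Assumes blocks are not nested and every opening comment is closed.
ENV_CONFIG_BLOCK = /(<!-- env-config=(.*?) -->)(.*?)(<!-- env-config -->)/m

def compile_env_config(xml, env)
  xml.gsub(ENV_CONFIG_BLOCK) do
    open_tag, env_list, body, close_tag = Regexp.last_match.captures
    if env_list.split(',').map(&:strip).include?(env)
      # target environment is listed: keep the block as-is
      "#{open_tag}#{body}#{close_tag}"
    else
      # otherwise keep only the marker comments
      "#{open_tag}\n#{close_tag}"
    end
  end
end

sample = <<~XML
  <!-- env-config=env1,env2 -->
    <config12/>
  <!-- env-config -->
  <!-- env-config=env3 -->
    <config3/>
  <!-- env-config -->
XML

puts compile_env_config(sample, 'env3')
```

Inside app_specific.rb, the file handling could then shrink to `File.write(filename_new, compile_env_config(File.read(filename), env))`. Handling every block in a single `gsub` pass also avoids re-scanning the file once per comment.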


4 participants