Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Completes https://jira.duraspace.org/browse/FCREPO-2402 #1182

Merged
merged 3 commits into from Apr 9, 2017

Conversation

bbranan
Copy link
Contributor

@bbranan bbranan commented Mar 16, 2017

Adds new repository.json files to support binary storage in S3 along with object storage in mysql, postgresql, and as a file.

Performance of these configuration options have been tested and recorded (by Daniel Bernstein) here: https://wiki.duraspace.org/display/FF/Many+Members+Performance+Testing

Configuration to support binary storage in S3 is currently listed here: https://github.com/fcrepo4-labs/fcrepo-webapp-plus-cloud. In short, the additional parameters are as follows:

JAVA_OPTS="${JAVA_OPTS} -Daws.accessKeyId="
JAVA_OPTS="${JAVA_OPTS} -Daws.secretKey="
JAVA_OPTS="${JAVA_OPTS} -Daws.bucket="

},
"binaryStorage" : {
"type" : "s3",
"username" : "${aws.accessKeyId}",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with the other fcrepo configuration elements, would you please change the following:

aws.accessKeyId  -> fcrepo.aws.accessKeyId
aws.secretKey  -> fcrepo.aws.secretKey
aws.bucket  -> fcrepo.aws.bucket

...for each of the three configuration files in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concerns about adding support for JAVA_OPTS="${JAVA_OPTS} -Daws.secretKey=..."

This effectively encourages Fedora users to put their AWS secret key in the JAVA_OPTS configuration, which can be retrieved by anyone with shell access to the machine:

ps -ax | grep aws.secretKey

While this is certainly convenient, I'm not sure that it is a usage pattern that should be encouraged, let alone supported.

These same concerns apply to the use of database passwords.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main alternative that I'm familiar with is storing this kind of sensitive information in a file (which has more robust protection from other users). I don't know if there is any support for reading Java system properties from a file, or if we'd need to manage that process ourselves.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the best alternative, given the current structure of the application, is to encourage users to create a custom repository.json file with the appropriate sensitive information included there. That way, any sensitive information can be easily protected.

The implication of this approach is to actually remove the database (and this current) configuration from fcrepo-config and simply document how they should be used separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a separate configuration file with the sensitive info directly in the file is certainly possible now — but I also think it's fine to continue using system properties, which can be convenient for deployments that aren't concerned about hostile users having access to their process list. Maybe keeping the system properties as-is, but adding a note about the security implications (in the config files themselves and to the wiki where the usage is described) is a good way to address the problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's not proper JSON, but Modeshape does not appear to care if you include comments e.g.,

       "username" : "${fcrepo.postgresql.username}",
        // Please note that passing passwords using command-line parameters...
        "password" : "${fcrepo.postgresql.password}"
    }

So it's possible that Modeshape is already pre-processing the JSON, or using a JSON library that tolerates comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The system property names aws.accessKeyId and aws.secretKey were chosen specifically to coordinate with the documented AWS credentials chain, as described here: http://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-roles.html. Changing them to add a fcrepo. prefix will work, but won't be consistent with an existing AWS pattern. The "aws.bucket" property was chosen simply to coordinate with the other two, so its name can be whatever is preferred.

With regard to security, using external repository.json files seems an appropriate recommendation to include in the documentation. Including these repository.json files in the baseline is useful for reference (this is what would be copied out to an external file) and use in development. I'm not excited about making these invalid json documents by adding comments in-line, given that modeshape could swap out their parser at any time.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bbranan , in addition to adding documentation in the wiki detailing the suggestion to externalize production repository.json files, an alternative to adding inline comments in the json files would be to add a README to the resources/config/ directory detailing the recommendation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@awoods Using a README would certainly be my preference over inline comments. To avoid duplication and the inevitable drift in consistency, it might be best to just include a link in the README to the recommendations which are captured in the wiki, once those are written. This would seem to fit nicely into the follow-on ticket you suggested in fcrepo-2402.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a concern for AWS keys (which, if generated and managed properly, can be granted narrow access only to the specific S3 bucket needed), it's also a concern for the database credentials used in the mysql and postgres configs. IOW, the issue isn't specific to whether this PR should be merged as-is. The code in this PR is consistent with current practice – passing everything through JAVA_OPTS. If that's not a good practice, that's probably best handled by a larger, top-level issue that also changes the handling of other secrets at the same time.

@awoods
Copy link

awoods commented Mar 24, 2017

Unrelated to the content of this pull-request and in an attempt to make pull-requests more transparent at a glance, in the future, please ensure the subject text of the pull-request is a brief but meaningful description:
https://wiki.duraspace.org/display/FF/Git+Workflow#GitWorkflow-Committing

A JIRA number as the subject is opaque for all except the person/people who are actively thinking about a give JIRA ticket.

As you can see, this is an issue that is beginning to proliferate:
https://github.com/fcrepo4/fcrepo4/pulls

@awoods
Copy link

awoods commented Apr 2, 2017

I have created the following documentation ticket:
https://jira.duraspace.org/browse/FCREPO-2421

I suggest squash/merging this ticket, with the assumption that the above ticket will be worked in the very near term.

@escowles
Copy link
Contributor

escowles commented Apr 6, 2017

@awoods This PR looks fine to me, but I don't have S3 setup to test it. I take your comments and including this in the tech call agenda to mean you've tested it and just want another committer to merge?

@awoods
Copy link

awoods commented Apr 9, 2017

@escowles Correct. I am just looking for another set of eyes. I have successfully verified the functionality, as has @dbernstein as documented here:
https://wiki.duraspace.org/display/FF/Many+Members+Performance+Testing#ManyMembersPerformanceTesting-n-members.sh (bottom rows in table, as well as subsequent tables)

@escowles escowles merged commit 3a8b785 into fcrepo:master Apr 9, 2017
dbernstein pushed a commit that referenced this pull request Jun 20, 2017
* FCREPO-2402 - Adds initial config to allow for using AWS S3 for binary storage
* Adds garbageCollection block to new repository.json files based on FCREPO-2406
* Adds cacheSize setting to new repository.json files based on changes recommended in FCREPO-2105
dbernstein pushed a commit that referenced this pull request Jul 7, 2017
* FCREPO-2402 - Adds initial config to allow for using AWS S3 for binary storage
* Adds garbageCollection block to new repository.json files based on FCREPO-2406
* Adds cacheSize setting to new repository.json files based on changes recommended in FCREPO-2105
awoods pushed a commit that referenced this pull request Aug 15, 2017
* FCREPO-2402 - Adds initial config to allow for using AWS S3 for binary storage
* Adds garbageCollection block to new repository.json files based on FCREPO-2406
* Adds cacheSize setting to new repository.json files based on changes recommended in FCREPO-2105
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants