4878 simple r install #4891

Merged
merged 14 commits into from
Aug 2, 2018
Binary file not shown.
7 changes: 4 additions & 3 deletions doc/sphinx-guides/source/installation/external-tools.rst
@@ -9,10 +9,11 @@ External tools can provide additional features that are not part of Dataverse it
Inventory of External Tools
---------------------------

Support for external tools is just getting off the ground but TwoRavens has been converted into an external tool. See the :doc:`/user/data-exploration/tworavens` section of the User Guide for more information on TwoRavens from the user perspective and :doc:`r-rapache-tworavens` for more information on installing TwoRavens.
Support for external tools is just getting off the ground but the following tools have been successfully integrated with Dataverse:

- TwoRavens: a system of interlocking statistical tools for data exploration, analysis, and meta-analysis: http://2ra.vn
- Data Explorer: a GUI which lists the variables in a tabular data file allowing searching, charting and cross tabulation analysis. For installation instructions see the README.md file at https://github.com/scholarsportal/Dataverse-Data-Explorer.
- TwoRavens: a system of interlocking statistical tools for data exploration, analysis, and meta-analysis: http://2ra.vn. See the :doc:`/user/data-exploration/tworavens` section of the User Guide for more information on TwoRavens from the user perspective and the :doc:`r-rapache-tworavens` section of the Installation Guide.

- Data Explorer: a GUI that lists the variables in a tabular data file and supports searching, charting, and cross-tabulation analysis. See the README.md file at https://github.com/scholarsportal/Dataverse-Data-Explorer for instructions on adding Data Explorer to your Dataverse, and the :doc:`prerequisites` section of the Installation Guide for instructions on setting up the **required basic R configuration** (specifically, Dataverse uses R to generate the .prep metadata files that Data Explorer needs).
- [Your tool here! Please get in touch! :) ]

Downloading and Adjusting an External Tool Manifest File
111 changes: 111 additions & 0 deletions doc/sphinx-guides/source/installation/prerequisites.rst
@@ -1,3 +1,5 @@
.. role:: fixedwidthplain

=============
Prerequisites
=============
@@ -269,7 +271,116 @@ If the installed location of the convert executable is different from ``/usr/bin

(see the :doc:`config` section for more information on the JVM options)

R
-

Dataverse uses `R <https://cran.r-project.org/>`_ to handle
tabular data files. The instructions below describe a **minimal** R
installation. It will allow you to ingest R (.RData) files as tabular
data; to export tabular data as .RData files; and to run `Data
Explorer <https://github.com/scholarsportal/Dataverse-Data-Explorer>`_
(specifically, R is used to generate the .prep metadata files that Data
Explorer uses). R is an optional component: without it you can still
run and use Dataverse, but the tabular data functionality
mentioned above will not be available to your users. **Note** that if
you choose to also install `TwoRavens
<https://github.com/IQSS/TwoRavens>`_, it will require some extra R
components and libraries. Please consult the instructions in the
TwoRavens section of the Installation Guide.


Installing R
============

R can be installed with :fixedwidthplain:`yum`::

  yum install R-core R-core-devel

The EPEL distribution is strongly recommended. The version of R currently available from EPEL6 and EPEL7 is 3.5; it has been tested and is known to work on RedHat and CentOS versions 6 and 7.

If :fixedwidthplain:`yum` isn't configured to use the EPEL repositories (https://fedoraproject.org/wiki/EPEL), RHEL/CentOS users can install the :fixedwidthplain:`epel-release` RPM. For RHEL/CentOS 7::

  yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

For RHEL/CentOS 6::

  yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm

RHEL users will want to log in to their organization's respective RHN interface, find the particular machine in question and:

• click on "Subscribed Channels: Alter Channel Subscriptions"
• enable EPEL, Server Extras, Server Optional
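The choice between the two ``epel-release`` packages can also be scripted. The following is only a minimal sketch, not part of the Dataverse scripts: the function name ``epel_release_url`` is hypothetical, and it assumes the caller already knows the major OS version. It prints the install command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print the epel-release RPM URL for a RHEL/CentOS major version.
epel_release_url() {
    case "$1" in
        6|7) echo "https://dl.fedoraproject.org/pub/epel/epel-release-latest-$1.noarch.rpm" ;;
        *)   echo "unsupported RHEL/CentOS major version: $1" >&2; return 1 ;;
    esac
}

# Print the command an administrator would run (as root) on a version 7 system.
echo "yum install $(epel_release_url 7)"
```

Only versions 6 and 7 are handled, because those are the only ones covered by this guide.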

Installing the required R libraries
===================================

The following R packages (libraries) are required::

  R2HTML
  rjson
  DescTools
  Rserve
  haven

Install them following the normal R package installation procedures. For example, with the following R commands::

  install.packages("R2HTML", repos="https://cloud.r-project.org/", lib="/usr/lib64/R/library" )
  install.packages("rjson", repos="https://cloud.r-project.org/", lib="/usr/lib64/R/library" )
  install.packages("DescTools", repos="https://cloud.r-project.org/", lib="/usr/lib64/R/library" )
  install.packages("Rserve", repos="https://cloud.r-project.org/", lib="/usr/lib64/R/library" )
  install.packages("haven", repos="https://cloud.r-project.org/", lib="/usr/lib64/R/library" )
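If you prefer to drive the installation from the shell, the same commands can be generated in a loop. This is a sketch under stated assumptions: the helper name ``r_install_cmd`` is hypothetical, and the script only prints one ``Rscript -e`` invocation per package (a dry run) so that you can review the commands before running them as root.

```shell
#!/bin/sh
# Package names, repository and library path are taken from the list above.
PACKAGES="R2HTML rjson DescTools Rserve haven"
REPO="https://cloud.r-project.org/"
LIB="/usr/lib64/R/library"

# Hypothetical helper: print the R command that installs one package.
r_install_cmd() {
    printf 'install.packages("%s", repos="%s", lib="%s")\n' "$1" "$REPO" "$LIB"
}

# Dry run: print one Rscript invocation per package instead of executing it.
for pkg in $PACKAGES; do
    echo "Rscript -e '$(r_install_cmd "$pkg")'"
done
```

Printing rather than executing keeps the sketch harmless on a machine where R is not installed yet.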

Rserve
======

Dataverse uses `Rserve <https://rforge.net/Rserve/>`_ to communicate
with R. Rserve is installed as a library package, as described in the
step above. It runs as a daemon process on the server, accepting
network connections on a dedicated port. This requires some extra
configuration, and we provide a script (:fixedwidthplain:`scripts/r/rserve/rserve-setup.sh`) for setting it up.
Run the script as follows (as root)::

  cd <DATAVERSE SOURCE TREE>/scripts/r/rserve
  ./rserve-setup.sh

The setup script will create a system user :fixedwidthplain:`rserve`
that will run the daemon process. It will install the startup script
for the daemon (:fixedwidthplain:`/etc/init.d/rserve`), so that it
gets started automatically when the system boots. This is an
:fixedwidthplain:`init.d`-style startup file. If this is a
RedHat/CentOS 7 system, you may want to use the
:fixedwidthplain:`systemctl`-style file
:fixedwidthplain:`rserve.service` instead (copy it into the
:fixedwidthplain:`/usr/lib/systemd/system/` directory).



Note that the setup will also set the Rserve password to
":fixedwidthplain:`rserve`". The Rserve daemon runs under a
non-privileged user id, so unauthorized access has limited potential
for damage. It is nevertheless a good idea
**to change the password**. The password is specified in
:fixedwidthplain:`/etc/Rserv.pwd`. You can consult the `Rserve
documentation <https://rforge.net/Rserve/doc.html>`_ for more
information on password encryption and access security.
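Changing the password amounts to rewriting the single ``user password`` line in the password file. The sketch below assumes that plain-text format (it is the format of the :fixedwidthplain:`Rserv.pwd` file shipped in :fixedwidthplain:`scripts/r/rserve`); the helper name ``write_rserv_pwd`` and the example password are hypothetical, and the demonstration writes to a temporary file rather than to :fixedwidthplain:`/etc/Rserv.pwd`.

```shell
#!/bin/sh
# Hypothetical helper: write a "user password" line to an Rserve password file
# and make the file readable and writable by its owner only.
write_rserv_pwd() {
    pwd_file=$1; user=$2; password=$3
    ( umask 077 && printf '%s %s\n' "$user" "$password" > "$pwd_file" )
    chmod 600 "$pwd_file"
}

# Demonstration against a temporary file rather than /etc/Rserv.pwd.
demo_file=$(mktemp)
write_rserv_pwd "$demo_file" rserve "example-password"
cat "$demo_file"
rm -f "$demo_file"
```

On a real system the file would also need to be owned by the :fixedwidthplain:`rserve` user, as the setup script arranges with ``install -o rserve -g rserve``, and the same password must be set in the :fixedwidthplain:`dataverse.rserve.password` JVM option.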

You should already have the following 4 JVM options added to your
:fixedwidthplain:`domain.xml` by the Dataverse installer::

  <jvm-options>-Ddataverse.rserve.host=localhost</jvm-options>
  <jvm-options>-Ddataverse.rserve.port=6311</jvm-options>
  <jvm-options>-Ddataverse.rserve.user=rserve</jvm-options>
  <jvm-options>-Ddataverse.rserve.password=rserve</jvm-options>

If you have changed the password, make sure it is correctly specified
in the :fixedwidthplain:`dataverse.rserve.password` option above. If
Rserve is running on a host that's different from your Dataverse
server, change the :fixedwidthplain:`dataverse.rserve.host` option
above as well (and make sure the port 6311 on the Rserve host is not
firewalled from your Dataverse host).
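A quick way to confirm that all four options are present is to search :fixedwidthplain:`domain.xml` for them. The following is a rough sketch, not a supported tool: the function name ``missing_rserve_options`` is hypothetical, the path to your :fixedwidthplain:`domain.xml` is left to the caller, and the demonstration runs against a temporary file that defines only two of the options.

```shell
#!/bin/sh
# Hypothetical helper: list the Rserve JVM options that are missing from a domain.xml file.
missing_rserve_options() {
    for name in host port user password; do
        grep -q -- "-Ddataverse\.rserve\.$name=" "$1" || echo "dataverse.rserve.$name"
    done
}

# Demonstration on a temporary file that defines only two of the four options.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<jvm-options>-Ddataverse.rserve.host=localhost</jvm-options>
<jvm-options>-Ddataverse.rserve.port=6311</jvm-options>
EOF
missing_rserve_options "$demo"
rm -f "$demo"
```

The function prints nothing when all four options are found, so empty output means the configuration is complete.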

Now that you have all the prerequisites in place, you can proceed to the :doc:`installation-main` section.

71 changes: 50 additions & 21 deletions doc/sphinx-guides/source/installation/r-rapache-tworavens.rst
@@ -21,6 +21,14 @@ of Dataverse v.4.6.1) version of the installer scripts and updated this guide. W
installation process, particularly the difficult process of installing
correct versions of the required third party R packages.

**Note that the installation process below supersedes the basic R
setup described in the "Prerequisites" portion of the Installation
Guide. Once completed, it installs everything needed to
run TwoRavens, PLUS all the libraries and components required to
ingest RData files, export as RData, and use Data Explorer.**



Please be warned:

- This process may still require some system administration skills.
@@ -92,7 +100,6 @@ install TwoRavens in the past, and it didn't work, please see the part of
section ``1.b.`` where we explain how to completely erase all the previously
built packages.


1. Prerequisites
++++++++++++++++

@@ -132,18 +139,16 @@ change it to
b. R:
-----

The simplest way to install R on RHEL/CentOS 6 systems is with yum, using the EPEL repository:
The simplest way to install R on RHEL/CentOS systems is with yum, using the EPEL repository::

  yum install epel-release
  yum install R R-devel

EPEL6 provides R-3.3, which is known to work well. Some installations have run EPEL7's former 3.4 release with success, but EPEL7 currently provides R-3.5, a significant release with many new features which may challenge backwards compatibility. You may wish to compile the older 3.3 or 3.4 versions [from source](https://cran.r-project.org/src/base/R-3/).
  yum install R-core R-core-devel

If you have an installed R 3.3 or 3.4 installation from EPEL, you may lock that version in place using the yum versionlock plugin, or simply add this line to the "epel" section of /etc/yum.repos.d/epel.repo::
Both EPEL6 and EPEL7 currently provide R 3.5, which has been tested and appears to work well. R 3.4, which EPEL offered previously, also works well. We recommend using the currently available EPEL version for all new installations. If you already have a working R 3.4 installation from EPEL and have no specific need to upgrade, you may lock that version in place using the ``yum-versionlock`` yum plugin, or simply add this line to the "epel" section of /etc/yum.repos.d/epel.repo::

  exclude=R-*,openblas-*,libRmath*
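To verify that such an ``exclude`` line really sits in the "epel" section, and not in a neighboring section of the same file, a small check like the following can help. It is a sketch only: the function name ``epel_has_exclude`` is hypothetical, and the repository file contents in the demonstration are illustrative, not copied from a real epel.repo.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the [epel] section of a yum repo file
# contains an "exclude=" line.
epel_has_exclude() {
    awk '
        /^\[/                  { in_epel = ($0 == "[epel]") }
        in_epel && /^exclude=/ { found = 1 }
        END                    { exit(found ? 0 : 1) }
    ' "$1"
}

# Demonstration on a temporary file with illustrative contents.
demo_repo=$(mktemp)
cat > "$demo_repo" <<'EOF'
[epel]
name=Extra Packages for Enterprise Linux
exclude=R-*,openblas-*,libRmath*

[epel-debuginfo]
name=Extra Packages for Enterprise Linux - Debug
EOF
if epel_has_exclude "$demo_repo"; then
    echo "R packages are excluded from EPEL updates"
else
    echo "no exclude line in the [epel] section"
fi
rm -f "$demo_repo"
```

The check only looks for the presence of an ``exclude=`` line; it does not validate which packages the line names.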

RHEL users may want to log in to their organization's respective RHN interface, find the particular machine in question and:
RHEL users may need to log in to their organization's respective RHN interface, find the particular machine in question and:

• click on "Subscribed Channels: Alter Channel Subscriptions"
• enable EPEL, Server Extras, Server Optional
@@ -154,27 +159,44 @@ R completely**, erasing all the extra R packages that may have been already buil

Uninstall R::

  yum erase R R-devel
  yum erase R-core R-core-devel

Wipe clean any R packages that were left behind::

  rm -rf /usr/lib64/R/library/*
  rm -rf /usr/share/R/library/*

... then install R with :fixedwidthplain:`yum`.
... then re-install R with :fixedwidthplain:`yum install`

c. rApache:
-----------

For RHEL/CentOS 6, we recommend that you download :download:`rapache-1.2.6-rpm0.x86_64.rpm <../_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.6-rpm0.x86_64.rpm>` and install it with::
We maintain the following rpms of rApache, built for the following versions of the RedHat/CentOS distribution and of R:

For RHEL/CentOS 6 and R 3.4, download :download:`rapache-1.2.6-rpm0.x86_64.rpm <../_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.6-rpm0.x86_64.rpm>` and install it with::

  yum install rapache-1.2.6-rpm0.x86_64.rpm

If you are using RHEL/CentOS 7, you can download our experimental :download:`rapache-1.2.7-rpm0.x86_64.rpm <../_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.7-rpm0.x86_64.rpm>` and install it with::
For RHEL/CentOS 6 and R 3.5, download :download:`rapache-1.2.9_R-3.5-RH6.x86_64.rpm <../_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.9_R-3.5-RH6.x86_64.rpm>` and install it with::

  yum install rapache-1.2.9_R-3.5-RH6.x86_64.rpm

If you are using RHEL/CentOS 7 and R 3.4, download :download:`rapache-1.2.7-rpm0.x86_64.rpm <../_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.7-rpm0.x86_64.rpm>` and install it with::

  yum install rapache-1.2.7-rpm0.x86_64.rpm

If you are using RHEL/CentOS 7 in combination with R-3.5, you may install :download:`rapache-1.2.9_R-3.5.x86_64.rpm <../_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.9_R-3.5.x86_64.rpm>` which was built against R-3.5.
If you are using RHEL/CentOS 7 in combination with R 3.5, download :download:`rapache-1.2.9_R-3.5.x86_64.rpm <../_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.9_R-3.5.x86_64.rpm>` and install it with::

  yum install rapache-1.2.9_R-3.5.x86_64.rpm

**Please note:**
The rpms above cannot be *guaranteed* to work on your
system. You may have a collection of system libraries installed on
your system that will create a version conflict. If that's the case,
or if you are trying to install on an operating system that's not listed
above, do not despair: simply build rApache from `source
<http://rapache.net/downloads.html>`_. **Make sure** to build with
the same version of R that you are planning to use.

d. Install the build environment for R:
---------------------------------------
@@ -192,25 +214,32 @@ Depending on how your system was originally set up, you may end up needing to in

We provide a shell script (``r-setup.sh``) that will try to install all the needed packages. **Note:** the script is now part of the TwoRavens distribution (it **used to be** in the Dataverse source tree).


The script will attempt to download the packages from CRAN (or a mirror), so the system must have access to the Internet.

In order to run the script:

Download the TwoRavens distribution from `https://github.com/IQSS/TwoRavens/archive/a6869eb.zip <https://github.com/IQSS/TwoRavens/archive/a6869eb.zip>`_.
Note that the link above points to a specific snapshot of the sources. Do not download the master distribution, as it may have changed since this guide, and
the installation scripts were written.
Download the current snapshot of the "dataverse-distribution" branch
of TwoRavens from github:
`https://github.com/IQSS/TwoRavens/archive/dataverse-distribution.zip
<https://github.com/IQSS/TwoRavens/archive/dataverse-distribution.zip>`_.
Once again, it is important that you download the
"dataverse-distribution" branch, and NOT the master distribution!
Unpack the zip file, then run the script::

  unzip a6869eb.zip
  cd TwoRavens-a6869eb28693d6df529e7cb3888c40de5f302b66/r-setup
  unzip dataverse-distribution.zip
  cd TwoRavens-dataverse-distribution/r-setup
  chmod +x r-setup.sh
  ./r-setup.sh


See the section ``II.`` of the Appendix for trouble-shooting tips.

For the Rserve package the setup script will also create a system user :fixedwidthplain:`rserve`, and install the startup script for the daemon (``/etc/init.d/rserve``).
The script will skip this part, if this has already been done on this system (i.e., it should be safe to run it repeatedly).
For the Rserve package the setup script will also create a system user
:fixedwidthplain:`rserve`, and install the startup script for the
daemon (``/etc/init.d/rserve``). The script will skip this part, if
this has already been done on this system (i.e., it should be safe to
run it repeatedly).

Note that the setup will set the Rserve password to :fixedwidthplain:`"rserve"`.
Rserve daemon runs under a non-privileged user id, and there appears to be a
@@ -238,7 +267,7 @@ b. Rename the resulting directory "dataexplore" ...

...and place it in the web root directory of your apache server. We'll assume ``/var/www/html/dataexplore`` in the examples below::

  mv TwoRavens-a6869eb28693d6df529e7cb3888c40de5f302b66 /var/www/html/dataexplore
  mv TwoRavens-dataverse-distribution /var/www/html/dataexplore
Member


I guess this is better than a6869eb? I guess we want to be able to change or fix the distributions of TwoRavens. Longer term, the TwoRavens team should tag their releases with version numbers.



c. run the installer
@@ -306,7 +335,7 @@ Compare the two files. **It is important that the two copies are identical**.
- unless this is a brand new Dataverse installation, it may have cached summary statistics fragments that were produced with the older version of this R code. You **must remove** all such cached files::

    cd <DATAVERSE FILES DIRECTORY>
    find . -name '*.prep' | while read file; do /bin/rm $f; done
    find . -name '*.prep' | while read file; do /bin/rm $file; done

*(Yes, this is a HACK! We are working on finding a better way to ensure this compatibility between
TwoRavens and Dataverse!)*
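The cached ``.prep`` files can also be removed by letting ``find`` call ``rm`` directly, which copes with file names that contain spaces. This is an alternative sketch, not the command used in this guide: the function name ``remove_prep_files`` is hypothetical, and the demonstration works in a temporary directory rather than in the Dataverse files directory.

```shell
#!/bin/sh
# Hypothetical helper: delete every regular file named *.prep under the given directory.
# find passes the file names to rm itself, so no unquoted shell variable is involved.
remove_prep_files() {
    find "$1" -type f -name '*.prep' -exec rm -f {} +
}

# Demonstration in a temporary directory rather than the Dataverse files directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.prep" "$demo_dir/with space.prep" "$demo_dir/keep.tab"
remove_prep_files "$demo_dir"
ls "$demo_dir"
rm -rf "$demo_dir"
```

Running ``find <directory> -type f -name '*.prep' -print`` first shows which files would be deleted.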
14 changes: 14 additions & 0 deletions scripts/r/rserve/Rserv.conf
@@ -0,0 +1,14 @@
workdir /tmp/Rserv
pwdfile /etc/Rserv.pwd
remote enable
auth required
plaintext disable
fileio enable

port 6311
maxinbuf 262144

maxsendbuf 0
gid 97
uid 97
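Because the port in this file has to match the :fixedwidthplain:`dataverse.rserve.port` JVM option, it can be handy to read values back out of the configuration. The following sketch assumes the simple ``key value`` layout shown above; the helper name ``rserv_conf_get`` is hypothetical, and the demonstration parses a temporary copy rather than :fixedwidthplain:`/etc/Rserv.conf`.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key from an Rserv.conf-style file
# (one "key value" pair per line).
rserv_conf_get() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Demonstration on a temporary file holding a few of the settings shown above.
demo_conf=$(mktemp)
cat > "$demo_conf" <<'EOF'
workdir /tmp/Rserv
pwdfile /etc/Rserv.pwd
port 6311
EOF
echo "Rserve port: $(rserv_conf_get port "$demo_conf")"
rm -f "$demo_conf"
```

A key that is absent from the file yields empty output.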

1 change: 1 addition & 0 deletions scripts/r/rserve/Rserv.pwd
@@ -0,0 +1 @@
rserve rserve
62 changes: 62 additions & 0 deletions scripts/r/rserve/rserve-setup.sh
@@ -0,0 +1,62 @@
#!/bin/sh

echo
echo "Configuring Rserve."
echo

sleep 10


echo
echo "checking if rserve user already exists:"

RSERVEDIR=/tmp/Rserv

/usr/sbin/groupadd -g 97 -o -r rserve >/dev/null 2>/dev/null || :
/usr/sbin/useradd -g rserve -o -r -d $RSERVEDIR -s /bin/bash \
    -c "Rserve User" -u 97 rserve 2>/dev/null || :

echo

if [ ! -f /etc/Rserv.conf ]
then
    echo "installing Rserv configuration file."
    install -o rserve -g rserve Rserv.conf /etc/Rserv.conf
    echo
else
    echo "Rserve configuration file (/etc/Rserv.conf) already exists."
fi

if [ ! -f /etc/Rserv.pwd ]
then
    echo "Installing Rserve password file."
    echo "Please change the default password in /etc/Rserv.pwd"
    echo "(and make sure this password is set correctly as a"
    echo "JVM option in the glassfish configuration of your DVN)"
    install -m 0600 -o rserve -g rserve Rserv.pwd /etc/Rserv.pwd
    echo
else
    echo "Rserve password file (/etc/Rserv.pwd) already exists."
fi

if [ ! -f /etc/init.d/rserve ]
then
    echo "Installing Rserve startup file."
    install rserve-startup.sh /etc/init.d/rserve
    chkconfig rserve on
    echo "You can start Rserve daemon by executing"
    echo " service rserve start"
    echo
    echo "If this is a RedHat/CentOS 7 system, you may want to use the systemctl file rserve.service instead (provided in this directory)"
else
    echo "Rserve startup file already in place."
fi

echo
echo "Successfully installed Dataverse Rserve framework."
Member


It would be nice to have all this code exercised in Vagrant some day.

Contributor Author


Would be very simple to add this to vagrant, yes. What this script does is super straightforward/non-controversial.

echo


service rserve start

exit 0