FAQ

nadyawilliams edited this page Jun 7, 2014 · 10 revisions

These questions and answers have been culled from the Rocks mailing list. If you see a useful answer there or anywhere else, please add it here.

To search the mailing list, try this link

Installation

Q: How do I burn a CD or DVD with an ISO image on Windows?

or CopyToDVD from VSO Software. https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2006-March/017419.html

Q: Which Linux distributions are compatible with Rocks 6.1.1?

A: RPMs from CentOS 6.5 are compatible. Here is a mirror with RPMs for various architectures: http://mirror.stanford.edu/pub/mirrors/centos/6.5/os/x86_64/

Q: How do I work around frontend installation problems?

A: A common workaround for problems with the frontend installer is to type frontend mem=1024M at the boot prompt. This helps the installer work correctly on frontends that have several gigabytes of RAM. Another common source of problems is a corrupted CD or DVD, so be sure to verify the checksum of the ISO image and burn it at a slow speed. You may also want to specify the networking parameters at the boot prompt: build ip=<ip addr> netmask=<netmask> gw=<gateway> dns=8.8.8.8 ks-device=<mac of public>
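
The networking parameters have to be typed on a single line at the boot prompt, so it can help to assemble that line beforehand. The following is only a sketch: build_boot_line is a hypothetical helper, not part of Rocks, and the addresses are made up. The parameter names are the ones given in the answer above.

```shell
# Sketch: assemble the boot-prompt line described above.
# build_boot_line is a hypothetical helper, not a Rocks command.
#   $1 = public IP      $2 = netmask     $3 = gateway
#   $4 = DNS server     $5 = MAC address of the public interface
build_boot_line() {
    printf 'build ip=%s netmask=%s gw=%s dns=%s ks-device=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Example with made-up addresses:
build_boot_line 192.0.2.10 255.255.255.0 192.0.2.1 8.8.8.8 00:11:22:33:44:55
```

The printed line can then be copied to the boot prompt by hand.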

Q: Why isn't insert-ethers working?

A: The insert-ethers program parses the /var/log/messages and /var/log/httpd/ssl_access_log files. Specifically, it looks for DHCPDISCOVER entries in the former. If insert-ethers is started after the compute node has sent its DHCP requests, it may miss the DHCPDISCOVER entries, because syslog reports "last message repeated N times" instead of logging each one. If you see this message, try restarting syslogd:

service syslog restart
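
Before restarting syslogd, it may be worth checking what the log actually contains. This is a minimal sketch, assuming the log lines contain the literal strings "DHCPDISCOVER" and "last message repeated" as described above; dhcp_log_summary is a hypothetical helper.

```shell
# Sketch: count the log lines that matter to insert-ethers.
# dhcp_log_summary is a hypothetical helper; $1 is the log file to inspect
# (on the frontend this would be /var/log/messages).
dhcp_log_summary() {
    log="$1"
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps that from being treated as an error.
    discovers=$(grep -c 'DHCPDISCOVER' "$log" || true)
    repeats=$(grep -c 'last message repeated' "$log" || true)
    echo "DHCPDISCOVER lines: $discovers"
    echo "repeated-message lines: $repeats"
}

# Example: dhcp_log_summary /var/log/messages
```

If the second count grows while the first does not, the repeated-message collapsing described above is a likely cause.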

Configuration

Q: Is there a guide for configuring the extend-compute.xml file to customize the compute nodes?

A: Depending on your version of Rocks:

If-then Statements

You can use if-then statements in extend-compute.xml and extend-nas.xml files. They are perhaps only useful for extend-nas.xml files, if you have non-uniform hardware or do specific things with certain NAS appliances. The format is a bit different, depending on whether you're inside the <post></post> section or not.

To selectively install certain packages (found in /home/install/contrib/5.0/*/RPMS/):

<kickstart>
<main>
</main>

<package>emacs</package>
<package>emacs-common</package>
<package>emacs-leim</package>


<eval mode="xml">
if [ "$Node_Hostname" == "nas-0-2"  ]; then
echo "&lt;package&gt;kmod-xfs&lt;/package&gt;"
echo "&lt;package&gt;dmapi&lt;/package&gt;"
echo "&lt;package&gt;xfsdump&lt;/package&gt;"
echo "&lt;package&gt;xfsprogs&lt;/package&gt;"
echo "&lt;package&gt;kernel&lt;/package&gt;"
fi
</eval>

<post>
...

To selectively execute certain post-install snippets of code:

...

<post>
if [ "<var name="Node_Hostname"/>" == "nas-0-0" ]; then


chkconfig xfs on

<file name="/etc/exports" mode="append">
/media/usbdisk  10.0.0.0/255.0.0.0(rw,no_root_squash,async)
</file>

mkdir /mnt/raid

sed -i "s/id:5:initdefault/id:3:initdefault/" /etc/inittab

fi 

...
</post>
</kickstart>

Q: Why is there a 5-second delay when I ssh to compute nodes?

A: Often this has to do with X11 forwarding. Try:

# ssh -x compute-0-0
If that is faster, you can set "ForwardX11 no" in /etc/ssh/ssh_config.

Alternatively, you can try adding "ForwardX11Trusted yes" to /etc/ssh/ssh_config (in addition to enabling "ForwardX11").
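
To see what the client configuration currently says, something like the following sketch may help. forwardx11_setting is a hypothetical helper; it assumes the "Keyword value" form separated by whitespace, and it ignores which Host block a line belongs to.

```shell
# Sketch: print the first ForwardX11 value found in an ssh client config.
# forwardx11_setting is a hypothetical helper; $1 is the config file path.
forwardx11_setting() {
    # ssh_config keywords are case-insensitive, so compare in lower case.
    # Commented-out lines do not match because their first field starts
    # with "#". Stop at the first uncommented ForwardX11 line.
    awk 'tolower($1) == "forwardx11" { print $2; found = 1; exit }
         END { if (!found) print "unset" }' "$1"
}

# Example: forwardx11_setting /etc/ssh/ssh_config
```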

Q: Why do I always get a password prompt when launching a job or logging into the compute node?

A: This is because Rocks does not yet automatically add the user's public key as an authorized key on each compute node, or because of a problem mounting the /export/home file system on the compute nodes. The following is quoted from a post by G. Bruno:

as a normal user try:

    $ ssh-agent $SHELL
    $ ssh-add
    $ cluster-fork hostname

if you are still asked for a password, then as root try:

    # make -C /var/411 force
    # cluster-fork 'service autofs reload'

then, as a normal user try:

    $ ssh-agent $SHELL
    $ ssh-add
    $ cluster-fork hostname

if that still asks for a password, as root try:

    # cluster-fork '411get --all'
    # cluster-fork 'service autofs reload'

then, as a normal user try:

    $ ssh-agent $SHELL
    $ ssh-add
    $ cluster-fork hostname
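
One way to narrow down which of the two causes applies is to check, on a compute node, whether your public key is listed in your authorized_keys file. The sketch below is not from the quoted post; key_is_authorized is a hypothetical helper, and the file paths in the example are assumptions about a typical setup.

```shell
# Sketch: check whether a public key is listed in an authorized_keys file.
# key_is_authorized is a hypothetical helper.
#   $1 = public key file    $2 = authorized_keys file
# Returns 0 if a line of $2 is exactly equal to the key line in $1.
key_is_authorized() {
    # -f: read the pattern from the key file, -F: treat it as a fixed string,
    # -x: require a whole-line match, -q: print nothing, just set the status
    grep -qxFf "$1" "$2"
}

# Example (paths are assumptions):
#   if key_is_authorized ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys; then
#       echo "key is authorized"
#   else
#       echo "key missing, or home directory not mounted"
#   fi
```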

Administration

Q: Do you have tips for interacting with the RPM framework?

A: Yes!

 rpm -qa   ## lists all currently installed packages
 rpm -qa | grep whatever ## checks to see if "whatever" is installed
 rpm -Va    ## lists differences between rpm's database and current files, for all packages
 rpm -qil packagename ## shows the info and file list stored in the rpm database for "packagename"
           ## package name without version number, something like "gcc" or "firefox"
 rpm --rebuilddb ## rebuilds the rpm database

Suppose you have downloaded an RPM called "packagename.x86_64.rpm".

 rpm -qilp packagename.x86_64.rpm ## give info "i" and file list "l" on a given rpm package
 rpm -Uvh packagename.x86_64.rpm ## install the package in upgrade mode, replaces older version
 rpm -ivh packagename.x86_64.rpm ## install the package as a new install; unlike -U, it does not remove an older version
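
As a complement to the "rpm -qa | grep" check, the exit status of "rpm -q" can be used in scripts, since it is non-zero for a package that is not installed. A minimal sketch, where report_installed is a hypothetical helper:

```shell
# Sketch: report whether each named package is installed.
# report_installed is a hypothetical helper; it relies on "rpm -q"
# returning a non-zero exit status for a package that is not installed.
report_installed() {
    for pkg in "$@"; do
        if rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$pkg: installed"
        else
            echo "$pkg: not installed"
        fi
    done
}

# Example: report_installed gcc firefox
```

Unlike grep, this matches the exact package name, so "gcc" does not also match "gcc-objc".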

Yum is a convenience program that can scan remote repositories and download or install rpm packages. The repositories are added as files in /etc/yum.repos.d, and all repositories should provide a GPG security key. yumex is a graphical interface to yum. One should be cautious about adding repositories to the yum framework because they might introduce inconsistencies.

When yum runs, it downloads packages into a directory structure under /var/cache/yum. If the administrator adds the option

 keepcache=1

in /etc/yum.conf, then all downloaded rpms will be saved. This can be useful for record keeping or testing of packages.
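
If you would rather not edit the file by hand, the change can be scripted. This is a sketch under the assumption that the file has a [main] section; with_keepcache is a hypothetical helper that only writes to standard output, so the original file is left untouched.

```shell
# Sketch: print a copy of a yum configuration with keepcache=1 in [main].
# with_keepcache is a hypothetical helper; $1 is the path to yum.conf.
# If the file has no [main] line, nothing is added.
with_keepcache() {
    awk '
        /^keepcache *=/ { next }               # drop any existing setting
        { print }                              # copy every other line
        /^\[main\]/ { print "keepcache=1" }    # add it right after [main]
    ' "$1"
}

# Example (review the result before replacing the real file):
#   with_keepcache /etc/yum.conf > /tmp/yum.conf.new
```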

Paul Johnson pauljohn@ku.edu 2010-01-03

Q: How can I install new/updated RPM packages in all nodes?

I will keep accumulating information and add it here when it's ready. I have run this by the Rocks email list once for suggestions. The one weakness I cannot fix is that my comments are confined to my experience with Rocks 5.2 and 5.3, so I do not address the needs of people running earlier or later versions.

The first step is to check to see if the program is packaged in an rpm that is already available in your Rocks install collection.

If you installed Rocks using the full OS set (not the smaller excerpt from the OS provided by Rocks), then there are many packages available that are not yet installed. When I installed Rocks, I used the CentOS Final release DVD as the OS roll. All of the RPMs in the distribution were copied to /state/partition1/rocks/install/rolls/Final... and torrent files were created for all of them. Those same RPM files appear under /state/partition1/rocks/install/rocks-dist/x86_64, and they are accessible at /var/www/html/install/rocks-dist/x86_64 as a yum repository. (Are these copied or hard-linked files? I can't tell for sure; it appears to be a "mount --bind" usage.)

Note that the rpm packages have the "actual software" in them, while the torrent files in the same directories hold the BitTorrent configuration needed to spread the burden of sharing the rpm files among the compute nodes within the cluster.

You could inspect the rpm files under /state/partition1/rocks/install/rolls, but it is also easy to use yum to check on that. For example, to search for a package "gcc-objc", run

  yum list gcc-objc

There are also "info" and "search" options for yum.

Case 1. The rpm file is already available.

If the rpm is available, a quick way to install it is to run "yum install gcc-objc" on every node. Doing that by hand would be tedious. To hit them all at once, run this as root:

 rocks run host "yum -y install gcc-objc"

That change is temporary: it will be lost when the compute nodes are reinstalled.

In order to make it permanent, it is necessary to make the change in the extend-compute.xml file that is described in the Rocks User Guide section "Adding Packages to Compute Nodes." One must insert the package name in the extend-compute.xml file and then run:

 cd /export/rocks/install
 rocks create distro

Case 2. The rpm file is not available in the cluster.

Following the instructions in "Adding Packages to Compute Nodes," place a copy of the RPM file in this folder

/state/partition1/rocks/install/contrib

Then add the name of the rpm to extend-compute.xml. And then run:

 cd /export/rocks/install
 rocks create distro

A "quick fix" to install the RPMs into the running systems would be to run "yum install XYZ" on all systems. Since those package names were added to extend-compute.xml, they will be reinstalled when the compute nodes are reinstalled.

Case 3. An old version of the package is available in the Rocks framework and installed in the nodes, but you want to use a newer one.

If you have a new version of the rpm available, the first thing is to try a manual install to make sure it works as expected (run "rpm -Uvh XYZ.rpm") and then test.

If the program works as desired, then copy the rpm files into the contrib folder and rerun the "rocks create distro" command. That command triggers a series of actions, one of which is to update the yum repository, so the new version of the package will be available for yum updates.

Useful Tidbit 1

I tested this yesterday and I believe it is correct. I believe it is tedious to edit extend-compute.xml over and over again to list every little package you want to add. Not only is it tedious, it may also be harmful if there are slight changes in the makeup of the toolchain for some packages.

One way to simplify this is to add "meta package" names to the extend-compute.xml file. For example, we use the statistical program R. R is distributed in several separate packages, one of which is called R. That package has no files in it, but it requires a list of packages, the ones that actually do work. If we add R to the extend-compute.xml file, then the yum system will "take note" of the fact that R requires R-core, R-devel, libRmath, and whatever else is the "flavor of the month." It is NOT necessary to list out all of these individual packages in extend-compute.xml. Only the meta package is required.

If you have never built an RPM, perhaps this is hard to imagine. But if you spend just a little while studying the rpm build process, you will see it is very easy to create a meta package. Here is a succinct example of a spec file that will create a meta package, the only purpose of which is to require other packages.

 Summary: My Meta Package includes Lots
 Name: mymeta
 Version: 1.0
 Release: 1
 Vendor: My Corporation
 Group: Development/Libraries
 License: GPL
 Requires: java-1.6.0-openjdk >= 1.6.0.0
 Requires: java-1.6.0-openjdk-devel >= 1.6.0.0
 Requires: gtk+-devel >= 2.0
 Requires: pretend.package
 BuildArch: noarch
 %description
 This may be the smallest valid spec file
 %files
 %changelog
 * Sun Jan 03 2010 Paul Johnson pauljohn32@gmail.com - 1.0
 - Initial version

Supposing you have the rpm-build package installed and have created the directory hierarchy required for building RPMs, save the spec file in the SPECS directory as "mymeta.spec" and run "rpmbuild -ba mymeta.spec". It should create the rpm in RPMS/noarch.
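
If you maintain several meta packages, the spec file itself can be generated from a list of package names. The following is a sketch, not a tested packaging workflow: make_meta_spec is a hypothetical helper, and the Summary, Group and License values are placeholders modeled on the spec shown above.

```shell
# Sketch: print a meta-package spec file that only requires other packages.
# make_meta_spec is a hypothetical helper.
#   $1 = package name, $2 = version, remaining arguments = required packages
make_meta_spec() {
    name="$1"; version="$2"; shift 2
    echo "Summary: Meta package requiring other packages"
    echo "Name: $name"
    echo "Version: $version"
    echo "Release: 1"
    echo "Group: Development/Libraries"
    echo "License: GPL"
    # One Requires line per remaining argument.
    for req in "$@"; do
        echo "Requires: $req"
    done
    echo "BuildArch: noarch"
    echo "%description"
    echo "Meta package; it contains no files."
    echo "%files"
}

# Example, using the R packages mentioned earlier:
#   make_meta_spec mymeta 1.0 R-core R-devel libRmath > mymeta.spec
```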

Useful Tidbit 2

The RedHat "createrepo" program does the work of scanning a directory structure and building a list that programs like yum can access. It is a fully recursive program, so in the contrib folder one is free to create subdirectories in order to keep programs tidy. I also have some scripts that scan a directory structure and spot "old" versions of packages; I'll try to remember to post them here.
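
Until those scripts are posted, here is a rough sketch of the idea, not the author's script. multi_version_names is a hypothetical helper that assumes the usual name-version-release.arch.rpm file naming. It only reports package names that occur more than once under a directory tree and does not decide which version is the older one.

```shell
# Sketch: list package names that appear more than once under a directory.
# multi_version_names is a hypothetical helper; $1 is the directory to scan.
# A name is also reported if the same file exists in two subdirectories.
multi_version_names() {
    find "$1" -name '*.rpm' |
        sed -e 's#.*/##' \
            -e 's/-[^-]*-[^-]*\.[^.]*\.rpm$//' |
        sort | uniq -d
}
# The first sed expression strips the directory part of each path; the
# second strips "-version-release.arch.rpm", leaving the package name.

# Example: multi_version_names /state/partition1/rocks/install/contrib
```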

Useful Tidbit 3: RPM tricks I need to try or ask about

It seems like an "obvious" thing to erase an outmoded rpm package from /state/partition1/rocks/install/rolls, but I have not been brave enough to try that yet.

I also think there is some sense in taking the newer versions of rpm packages and placing them into the install/rolls folder hierarchy, but I don't know if it would work. It would work as far as createrepo is concerned, but it might not work with the torrent framework.

Useful Tidbit 4: Make a new roll with the new rpms.

One can wrap those additional RPMS into a Roll for the Rocks Reinstall, as described in this Wiki, including the following section called "After installing the frontend, how do I add additional packages or RPM's manually?" and the entry Upgrade kernel in kernel roll



In that writeup, the focus is on the kernel rpm, but there is no reason the same idea cannot be used for other rpms as well. The first two steps put a new, updated version of an RPM in

   /export/rocks/install/contrib/5.1/arch/RPMS/
 

and then, instead of installing with "rpm -Uvh kernel*.rpm" on the head node, "rocks create distro" is run.

Useful Tidbit 5

Don't add repositories to your yum configuration. I've only been running Rocks for 6 weeks, and in the email list the usage of untested rpms from various yum servers is the most common source of trouble.

Instead, if you are eager to use new rpms in your cluster, I'd suggest you keep a separate CentOS Linux workstation and use it as a test bed for rpms. Install whatever repositories you want there. In your yum.conf file, set keepcache=1, so that yum saves all the RPMs you are using in /var/cache/yum. Those RPMs can be transferred to the cluster using the methods described above.
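
On the test-bed workstation, the cached RPMs are spread over per-repository subdirectories, so gathering them into one directory before the transfer can be convenient. A minimal sketch, where collect_cached_rpms is a hypothetical helper and the staging path in the example is made up:

```shell
# Sketch: copy every RPM kept in the yum cache into one staging directory.
# collect_cached_rpms is a hypothetical helper.
#   $1 = yum cache directory (normally /var/cache/yum)
#   $2 = staging directory (created if it does not exist)
collect_cached_rpms() {
    mkdir -p "$2" &&
    find "$1" -type f -name '*.rpm' -exec cp {} "$2"/ \;
}

# Example: collect_cached_rpms /var/cache/yum /tmp/rpms-for-cluster
```

The staging directory can then be copied to the contrib folder on the frontend as described in Case 2.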

If you choose to ignore this advice, well, do so at your own peril. In my experience, the only third-party repository that is fairly dependable is EPEL, which aims to provide packages that ARE NOT provided by CentOS or Red Hat. EPEL does not try to provide updates for CentOS packages. Other repositories, such as rpmforge or rpmfusion, may carry updates for CentOS packages that cause breakage in other packages. These bugs are notoriously difficult to track down. I recently spent several days trying to debug random crashes in a CentOS 5.4 workstation that resulted from upgrading gtk+ from one of those repositories. The RPM system is pretty good at protecting you from these mistakes, but it is not perfect.

Paul Johnson pauljohn@ku.edu 2010-01-03

Accessing the cluster through the web

Q: How do I access the frontend using FTP?

A: Use the SFTP protocol on port 22 for secure FTP.

Q: How do I log in to WordPress to reconfigure it?

A: Use the following login:

user: admin
password: your root password for the cluster

Q: I've noticed that the cluster can give status updates through RSS. How do I access these?

A: The cluster RSS feed is not yet integrated into the main WordPress page. To access it, point your browser to http://your.cluster/rss/ganglia
