Create a simple HDP + IBM BigSQL + IBM Spectrum Scale cluster on your machine using KVM virtualization.
Before proceeding, you need a local IBM Spectrum Scale repository containing all the rpms required for a successful installation. More details: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_localrepo.htm
IBM Spectrum Scale Ambari extension
HortonWorks + IBM Spectrum Scale Redbook
Hadoop + IBM Spectrum Scale integration, overview
IBM Spectrum Scale Knowledge Center
IBM Spectrum Scale and HortonWorks
4 KVM machines
- MGM1 : management node + GPFS Master
- MGM2 : management node
- DATA1 : data and compute node + 40 GB disk attached
- DATA2 : data and compute node + 40 GB disk attached
OS: CentOS 7.4 Linux
The disks are created with the qemu-img command:
qemu-img create -f qcow2 DISK3.qcow2 40G
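Both data nodes need their own disk, so the command has to be repeated and each image attached to its guest. The following is a minimal sketch, assuming the guests are managed by libvirt. The domain names DATA1 and DATA2, the image directory and the image file names are placeholders and are not taken from this setup. The script only prints the commands so they can be reviewed before running them.

```shell
#!/bin/sh
# Directory for the disk images (assumption: the libvirt default location).
IMAGE_DIR="${IMAGE_DIR:-/var/lib/libvirt/images}"

# Print the commands that create one 40 GB qcow2 image per guest and
# attach it as vdb. Arguments: libvirt domain names (placeholders).
disk_commands() {
    for domain in "$@"; do
        image="$IMAGE_DIR/$domain-gpfs.qcow2"
        echo "qemu-img create -f qcow2 $image 40G"
        echo "virsh attach-disk $domain $image vdb --driver qemu --subdriver qcow2 --persistent"
    done
}

# Dry run: review the output, then pipe it to sh to execute it.
disk_commands DATA1 DATA2
```

The target name vdb matches the /dev/vdb device used later in the stanza file; if the guest already has a vdb disk, a different target is needed.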
Download the latest HDFS Transparency rpm files (part1 and part2). Unpack them and upload them, together with the IBM Spectrum Scale rpms, to the local repository. The repository content should look like this:
Index of /gpfs_rpms
Parent Directory
gpfs.base-5.0.0-0.x86_64.rpm
gpfs.callhome-ecc-client-5.0.0-0.noarch.rpm
gpfs.compression-5.0.0-0.x86_64.rpm
gpfs.docs-5.0.0-0.noarch.rpm
gpfs.ext-5.0.0-0.x86_64.rpm
gpfs.gpl-5.0.0-0.noarch.rpm
gpfs.gskit-8.0.50-79.x86_64.rpm
gpfs.gui-5.0.0-0.noarch.rpm
gpfs.hdfs-protocol-2.7.3-1.x86_64.rpm
gpfs.hdfs-protocol-2.7.3-2.180202.121800.x86_64.rpm
gpfs.java-5.0.0-0.x86_64.rpm
gpfs.license.std-5.0.0-0.x86_64.rpm
gpfs.msg.en_US-5.0.0-0.noarch.rpm
repodata/
rhel7/
sles12/
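Before starting the installation it can help to confirm that nothing is missing from the repository. The sketch below compares a directory listing with a list of package name prefixes copied from the listing above. Which of these packages are strictly required is an assumption, and REPO_DIR is a hypothetical path to the directory served as /gpfs_rpms.

```shell
#!/bin/sh
# Package name prefixes expected in the repository (copied from the listing
# above; which ones are strictly required is an assumption).
REQUIRED="gpfs.base gpfs.ext gpfs.gpl gpfs.gskit gpfs.hdfs-protocol gpfs.license.std gpfs.msg.en_US"

# Read a directory listing on stdin and print every expected prefix
# that no file name starts with.
missing_packages() {
    listing=$(cat)
    for pkg in $REQUIRED; do
        printf '%s\n' "$listing" |
            awk -v p="$pkg-" 'index($0, p) == 1 { found = 1 } END { exit !found }' ||
            echo "$pkg"
    done
}

# Hypothetical document root of the web server that serves /gpfs_rpms.
REPO_DIR="${REPO_DIR:-/var/www/html/gpfs_rpms}"
if [ -d "$REPO_DIR" ]; then
    ls "$REPO_DIR" | missing_packages
fi
```

No output means every expected prefix was found.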
Copy the management pack archive to the host where the Ambari server is installed and unpack it.
tar xvfz ../SpectrumScaleMPack-2.4.2.4.180131.120413.noarch.tar.gz
./SpectrumScaleIntegrationPackageInstaller-2.4.2.4.bin
./SpectrumScaleMPackInstaller.py
./SpectrumScaleMPackUninstaller.py
./SpectrumScale_UpgradeIntegrationPackage
./sum.txt
Make sure that Ambari server is running.
ambari-server status
Using python /usr/bin/python
Ambari-server status
Ambari Server running
Found Ambari Server PID: 1307 at: /var/run/ambari-server/ambari-server.pid
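When the installation is scripted, the same check can be done by looking for the phrase "Ambari Server running" in the status output shown above. This is a small sketch and not part of the management pack; it relies only on that phrase appearing in the output.

```shell
#!/bin/sh
# Succeed only if the status text on stdin reports a running server.
# The matched phrase is the one shown in the output above.
ambari_running() {
    grep -q "Ambari Server running"
}

# Run the check only where the ambari-server command is available.
if command -v ambari-server >/dev/null 2>&1; then
    if ambari-server status | ambari_running; then
        echo "Ambari server is running"
    else
        echo "Ambari server is not running, start it with: ambari-server start" >&2
    fi
fi
```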
Install management pack.
./SpectrumScaleIntegrationPackageInstaller-2.4.2.4.bin
END OF TERMS AND CONDITIONS
Do you agree to the above license terms? [yes or no]
yes
Installing...
INFO: ***Starting the Mpack Installer***
Enter Ambari Server Port Number. If it is not entered, the installer will take default port 8080 :
INFO: Taking default port 8080 as Ambari Server Port Number.
Enter Ambari Server IP Address. Default=127.0.0.1 :
INFO: Ambari Server IP Address not provided. Taking default Amabri Server IP Address as "127.0.0.1".
Enter Ambari Server Username, default=admin :
INFO: Taking default username "admin" as Ambari Server Username.
Enter Ambari Server Password :
INFO: Verifying Ambari Server Address, Username and Password.
INFO: Verification Successful.
INFO: Adding Spectrum Scale MPack : ambari-server install-mpack --mpack=SpectrumScaleExtension-MPack-2.4.2.4.tar.gz -v
INFO: Spectrum Scale MPack Successfully Added. Continuing with Ambari Server Restart...
INFO: Performing Ambari Server Restart.
INFO: Ambari Server Restart Completed Successfully.
INFO: Running command - curl -u admin:******* -H 'X-Requested-By: ambari' -X POST -d '{"ExtensionLink": {"stack_name":"HDP", "stack_version": "2.6", "extension_name": "SpectrumScaleExtension", "extension_version": "2.4.2.4"}}' http://127.0.0.1:8080/api/v1/links/
INFO: Extension Link Created Successfully.
INFO: Starting Spectrum Scale Changes.
INFO: wrote system_action_definitions.xml successfully.
INFO: Spectrum Scale Changes Successfully Completed.
INFO: Performing Ambari Server Restart.
INFO: Ambari Server Restart Completed Successfully.
INFO: Backing up original HDFS files to hdfs-original-files-backup
INFO: Running command cp -f -r -p -u /var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts/ hdfs-original-files-backup
Done.
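The installer log above shows the extension link being created with a POST to /api/v1/links/. To confirm that the link exists, the same endpoint can be queried with a GET. The sketch below assumes that the endpoint answers a GET with JSON containing an extension_name field, which is inferred from the POST body and not verified here. AMBARI_PASSWORD is a variable name made up for this example.

```shell
#!/bin/sh
# Succeed if the JSON on stdin mentions the Spectrum Scale extension.
has_extension_link() {
    grep -Eq '"extension_name"[[:space:]]*:[[:space:]]*"SpectrumScaleExtension"'
}

# Address and user are the installer defaults shown above; the password is
# read from AMBARI_PASSWORD (hypothetical variable, set it before running).
if [ -n "${AMBARI_PASSWORD:-}" ]; then
    curl -s -u "admin:$AMBARI_PASSWORD" -H 'X-Requested-By: ambari' \
        http://127.0.0.1:8080/api/v1/links/ | has_extension_link &&
        echo "extension link found"
fi
```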
The IBM Spectrum Scale package should now be visible in the list of services ready to install.
The GPFS cluster will comprise two disks attached to the data nodes GDP3 and GDP4. Each disk is visible as the /dev/vdb device. The stanza file should be created in the /var/lib/ambari-server/resources/ directory.
cd /var/lib/ambari-server/resources/
vi gpfs_nsd
DISK|gdp3.sb.com:/dev/vdb
DISK|gdp4.sb.com:/dev/vdb
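The stanza file can also be generated instead of typed by hand, which avoids stray whitespace. A minimal sketch that prints one line per host in the format shown above; make_stanza is a helper name introduced for this example only.

```shell
#!/bin/sh
# Print one "DISK|host:device" line per host, in the format shown above.
# $1 is the device, the remaining arguments are host names.
make_stanza() {
    device=$1
    shift
    for host in "$@"; do
        printf 'DISK|%s:%s\n' "$host" "$device"
    done
}

# Preview the content; on the Ambari host, redirect the output to
# /var/lib/ambari-server/resources/gpfs_nsd.
make_stanza /dev/vdb gdp3.sb.com gdp4.sb.com
```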
Make sure that all services are stopped.
Install the GPFS Master on the host where the Ambari server runs.
Install GPFS nodes on all nodes in the cluster, not only on the data nodes.
Prepare a URL pointing to the repository that contains the GPFS and HDFS Transparency packages (here http://mirror/gpfs_rpms/).
Enter the GPFS stanza file name in the property window. Make sure that there is no trailing space in the file name!
Also decrease all replica numbers from the default 3 to 2, because only two disks are available here.
After a successful installation, the Ambari dashboard should show IBM Spectrum Scale and HDFS Transparency as installed.
Very important: restart ambari-server before doing anything else!
ambari-server restart
Start all services.
After the NameNode is restarted, check the GPFS mount point. The directory structure should reflect the HDFS directory layout.
[root@gdp1 resources]# ls /bigpfs/
app-logs ats hdp mapred mr-history tmp user
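The same check can be scripted by comparing the mount point with the directory names from the listing above. This is a sketch; the list of expected directories is copied from this particular cluster and may differ with other sets of installed services.

```shell
#!/bin/sh
# Directories that the listing above shows under the mount point.
EXPECTED_DIRS="app-logs ats hdp mapred mr-history tmp user"

# Print every expected directory that is absent below the mount point $1.
missing_dirs() {
    for dir in $EXPECTED_DIRS; do
        [ -d "$1/$dir" ] || echo "$dir"
    done
}

# Run the check only where the mount point exists.
if [ -d /bigpfs ]; then
    missing_dirs /bigpfs
fi
```

No output means all expected directories are present.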
Verify that GPFS replaced HDFS.
touch /bigpfs/tmp/hello.gpfs
[root@gdp1 resources]# su - hdfs
[hdfs@gdp1 ~]$ hdfs dfs -ls /tmp
Found 2 items
drwxr-xr-x - hdfs hdfs 0 2018-04-29 21:15 /tmp/entity-file-history
-rw-r--r-- 2 root root 0 2018-04-29 21:21 /tmp/hello.gpfs
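The manual test above can be wrapped in a small script: create a marker file through the GPFS mount point and ask the HDFS client whether it exists. This is a sketch, assuming it runs as a user who may write to /bigpfs/tmp and read /tmp in HDFS. check_roundtrip and the marker file name are introduced here for illustration.

```shell
#!/bin/sh
# Create a marker file through the GPFS mount point $1 and look for it
# through the HDFS client. Returns 0 if HDFS sees the file.
check_roundtrip() {
    name="gpfs-check-$$"
    touch "$1/tmp/$name" || return 1
    result=0
    hdfs dfs -test -e "/tmp/$name" || result=$?
    # Remove the marker file again on the GPFS side.
    rm -f "$1/tmp/$name"
    return "$result"
}

# Run the check only where the hdfs client and the mount point exist.
if command -v hdfs >/dev/null 2>&1 && [ -d /bigpfs/tmp ]; then
    if check_roundtrip /bigpfs; then
        echo "HDFS sees files created on GPFS"
    else
        echo "file created on GPFS is not visible through HDFS" >&2
    fi
fi
```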
Install all other services.