Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
317 lines (238 sloc) 23 KB

NexentaStor High-Availability iSCSI with Linux Clients

Revision info:
Last modified: 12/02/2011
Version: 1.0.3

Notes:
Document not ready for public distribution.

Depending upon criticality of your production environment, and in most cases a requirement when running various Clusters on Linux, such as Heartbeat or RedHat Cluster, it may be necessary to ensure highly-available access to allocated iSCSI LUNs. There are a number of ways to accomplish this, but in this document we are going to focus on an approach where we define two targets each bound to a Target Portal Group, which in-turn is related to a network interface. Each interface is on its own subnet. End-result of this is a configuration where we should always have two distinct paths, by which we can reach our Storage. If one path completely fails, we should still be able to operate in a reduced capacity, but not sustain a complete failure.

Objective:

The objective of this document is to present one of multiple designs for iSCSI multipathing between the NexentaStor SAN and a Linux client which in a real-world scenario may be part of large HA cluster. Same steps apply whether this is a single stand-alone system or a cluster. We are expecting that our Linux client(s) will be configured in a way where four paths exist to each block device mapped from the SAN. There will essentially be two paths per iSCSI interface from the client(s). This is due to the fact that each iSCSI interface will be bound to a dedicated physical network interface, and will login into both targets. It is best to establish a meaningful naming convention in advance of this setup, and follow the convention as much as possible. For clarity, we use the name of network interfaces in the Target Portal Group names, to easily identify which TPG ties to which physical network interface.

We are going to effectively observe a configuration similar to the following, as a result of completing steps in this document.

    Login to [iface: sw-iscsi-1, target: iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58, portal: 10.10.8.90,3260] successful.
    Login to [iface: sw-iscsi-0, target: iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58, portal: 10.10.8.90,3260] successful.
    Login to [iface: sw-iscsi-1, target: iqn.2011-09.dom.homer:02:0e4f921a-407e-eb9f-9f53-aded8d99a723, portal: 10.10.9.90,3260] successful.
    Login to [iface: sw-iscsi-0, target: iqn.2011-09.dom.homer:02:0e4f921a-407e-eb9f-9f53-aded8d99a723, portal: 10.10.9.90,3260] successful.

Assumptions:

We are making a number of assumptions about the environment, and they are as follows:

  • Client(s) are CentOS 6.
  • Client(s) and SAN will both have two physical interfaces, each configured on a unique subnet 10.10.8 and 10.10.9.
  • We are not using a single physical interface with multiple aliases on either client(s) or SAN.
  • SAN and client(s) are either directly attached, or are switched via a network that does not exhibit single points of failure, such a single switch, single path between switches, etc.
  • SAN is configured with two targets, each bound to a Target Portal Group.
  • Each Target Portal Group is tied to a single IP address, assigned to a physical interface, not an aliased interface and default iSCSI port 3260.
  • Each network interface assigned to each Target Portal Group has a completely isolated physical path to the client.
  • Both targets belong to single SCSI Target Group.
  • iscsid and dm packages are correctly installed on the client, and dm-multipath service set to start automatically.
  • For the purposes of this guide we are assuming that Remote Host and Target groups already exist. Please refer to the User's Guide for further details about creation of Host and Target groups.

Config Settings Used:

  • iSCSI (COMSTAR) Target Names

      iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58
      iqn.2011-09.dom.homer:02:0e4f921a-407e-eb9f-9f53-aded8d99a723
    
  • iSCSI (COMSTAR) Target Portal Group name(s) and associated IP(s) TGP / ip:port

      tpg_lab9_e1000g1 / 10.10.8.90:3260 
      tpg_lab9_e1000g2 / 10.10.9.90:3260
    
  • SCSI (COMSTAR) Target Group name(s)

      tg_hydra_databases
    
  • SCSI (COMSTAR) Initiator Group(s)

      hg_hydra_databases
    
  • Initiator(s) from the CentOS client

      iqn.2011-10.dom.homer:1:server-lab6-583bf04eae36
    

Steps on Nexenta SAN:

Perform following steps via Management GUI (NMV).

  1. We are going to create target portal groups first, to which we are going to bind our iscsi targets.

    1. Navigate to Data Management > SCSI Target/Target Plus > iSCSI > Target Portal Groups. Click on Create under Manage iSCSI TPGs.

    2. Enter the name, in our case tpg_lab9_e1000g1 in the Name field, and IP Address:Port, in our case 10.10.8.90:3260 in the Addresses field, then click Create. We choose to use a descriptive name for Target Portal Groups, including the name of network interface to which the group is bound.

    3. Repeat above step for second Target Portal Group. Each group is bound to a single IP Address. If more than two interfaces are being used, repeat for each interface/IP.

  2. Next, we are adding iSCSI Targets, Target Groups and Initiator Groups.

    1. Navigate to Data Management > SCSI Target/Target Plus > iSCSI > Targets. Click Create under Manage iSCSI Targets.

    2. Enter the name, in our case iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58 in the Name field. You many choose to leave this field empty to auto-generate a name.

    3. We are not using aliases, as they are optional, but you may choose to setup an alias for each target.

    4. Under iSCSI Target Portal Groups select one of the Target Portal groups created earlier. For simplicity of management, we used :01: and :02: in the name of the targets to designate first and second target and we relate first target to tpg_lab9_e1000g1 and second target to tpg_lab9_e1000g2, then click Create.

    5. Navigate to Data Management > SCSI Target/Target Plus > SCSI Target > Target Groups. Click on Create under Manage Target Groups.

    6. Enter the name, in our case tg_hydra_databases in the Name field, and select all targets, in our case, we select two targets we created earlier, then click Create.

    7. Navigate to Data Management > SCSI Target/Target Plus > SCSI Target > Target Groups.

    8. Enter the name, in our case hg_hydra_databases in the Name field, and if Initiator names are already known you may enter them manually in the Additional Initiators field, otherwise we will update this group later after logging in from our client(s), then click Create.

  3. At this point we can create mappings as necessary. We have to have ZVOLs created first, prior to setting up mappings. Please, refer to the User's Guide for further details on creation of ZVOLs and mappings. For the purposes of this document we are using four ZVOLs, with LUN ID's 10 through 13 and hg_hydra_databases as the Host Group and tg_hydra_databases as the Target Group.

Steps Client-Side:

  1. Quick validation of the iscsid service is necessary to make sure that indeed it is setup correctly. Command chkconfig --list iscsid should return state of the iscsid service. We expect to have it enabled in runlevels 3, 4, and 5. If not enabled, run chkconfig iscsid on, which will assume defaults and enable service in runlevels 3, 4, and 5. Keep in mind, different distribution of Linux may have different names for services and tools/methods to enable/disable automatic startup of services.

     prague# chkconfig --list iscsid
     iscsid          0:off   1:off   2:off   3:on    4:on    5:on    6:off
    
  2. Next, we validate multipathd service is working correctly. The mpathconf utility will return information about the state of the multipath configuration.

     prague# mpathconf 
     multipath is enabled
     find_multipaths is disabled
     user_friendly_names is enabled
     dm_multipath module is loaded
     multipathd is chkconfiged on
    
  3. Now we will configure our iSCSI initiator settings. Depending upon your client, configuration files may/may not be in the same location(s). We will modify the /etc/iscsi/initiatorname.iscsi with a custom initiator name. By default, the file will already have an InitiatorName entry. We replace it with a custom entry, but this is strictly optional, and should be part of your naming convention decisions.

     InitiatorName=iqn.2011-10.dom.homer:1:server-lab6-583bf04eae36
    
  4. We are going to create a virtual iSCSI interface for each physical network interface, and bind the physical interfaces to the virtual iSCSI interfaces. The end-result will be two new iscsi interfaces: sw-iscsi-0 and sw-iscsi-1, bound to physical network interfaces eth1 and eth2. Avoid naming logical interfaces with the same names as the physical NICs.

    1. First, we create the logical interfaces for which a corresponding configuration file named sw-iscsi-0 and sw-iscsi-0 will be generated under /var/lib/iscsi/ifaces.

       prague# iscsiadm --mode iface --op=new --interface sw-iscsi-0
       prague# iscsiadm --mode iface --op=new --interface sw-iscsi-1
      
    2. Following the two iscsiadm commands we are going to modify two newly created config files with additional parameters. Here's an example of one of the configuration files modified with details about the physical interface. Note, we are defining parameters here specific to each interface, and your configuration will certainly vary. To quickly collect information about each interface you could simply use the ip command: ip addr show <interface-name>|egrep 'inet|link'. This configuration explicitly binds our virtual interfaces to physical interfaces. Each interface is on its own network, in our case 10.10.8 and 10.10.9. We can always validate our configuration with this: for i in 0 1; do iscsiadm -m iface -I sw-iscsi-$i; done, replacing 0 and 1 with whatever number of the iSCSI interface(s) we used.

       prague# cat /var/lib/iscsi/ifaces/sw-iscsi-0
       # BEGIN RECORD 2.0-872
       iface.iscsi_ifacename = sw-iscsi-0
       iface.net_ifacename = eth1
       iface.hwaddress = 00:0c:29:94:5b:83
       iface.ipaddress = 10.10.8.60
       iface.transport_name = tcp
       # END RECORD
       
       prague# cat /var/lib/iscsi/ifaces/sw-iscsi-1
       # BEGIN RECORD 2.0-872
       iface.iscsi_ifacename = sw-iscsi-1
       iface.net_ifacename = eth2
       iface.hwaddress = 00:0c:29:94:5b:8d
       iface.ipaddress = 10.10.9.60
       iface.transport_name = tcp
       # END RECORD
      
  5. Depending upon your client, tcp/ip kernel parameter rp_filter may need to be tuned, in order to allow for correct multipathing with iSCSI. For the purposes of this guide we set 0. For each physical interface we add an entry to /etc/sysctl.conf. In our example, we are modifying this tunable for eth1 and eth2. If the kernel tuning is being applied, either set the parameters via the sysctl command, or reboot system prior to next steps.

     prague# grep eth[0-9].rp_filter /etc/sysctl.conf 
     net.ipv4.conf.eth1.rp_filter = 0
     net.ipv4.conf.eth2.rp_filter = 0
    
  6. Next, we will perform a discovery of the targets via the portals that we exposed previously. We can do a discovery against one of both portals, and the result should be identical.

    1. We discover targets for each configured logical iSCSI interface. If you choose to have more than two logical interfaces, and therefore more than two paths to the SAN, perform the following step for each logical interface.

       prague# iscsiadm -m discovery -t sendtargets --portal=10.10.8.90 -I sw-iscsi-0 --discover
       prague# iscsiadm -m discovery -t sendtargets --portal=10.10.9.90 -I sw-iscsi-1 --discover
      
    2. We should validate nodes created as a result of the discovery. We expect to see two nodes for each portal on the SAN.

       prage# iscsiadm -m node
       10.10.8.90:3260,2 iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58
       10.10.8.90:3260,2 iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58
       10.10.9.90:3260,2 iqn.2011-09.dom.homer:02:0e4f921a-407e-eb9f-9f53-aded8d99a723
       10.10.9.90:3260,2 iqn.2011-09.dom.homer:02:0e4f921a-407e-eb9f-9f53-aded8d99a723
      
  7. At this point we have each logical iSCSI interface configured to login into all known portals on the SAN, and logging into all known targets. Thus, we have a choice to make. We can allow for this configuration to remain, or we can choose to isolate each logical interface to a single portal on the SAN.
    Each node will have a directory under /var/lib/iscsi/nodes with name identical to target name on the SAN, and a sub-directory for each portal. Here, we can see that leaf objects of this tree structure are files named identical to our logical iSCSI interfaces. In fact, these are config files generated upon successful target discovery for each interface.

     prague# ls -l /var/lib/iscsi/nodes/iqn.2011-09.dom.homer:02:0e4f921a-407e-eb9f-9f53-aded8d99a723/10.10.9.90,3260,2
     total 8
     -rw-------. 1 root root 1784 Oct 16 12:42 sw-iscsi-0
     -rw-------. 1 root root 1784 Oct 16 12:42 sw-iscsi-1
    
     prague# ls -l /var/lib/iscsi/nodes/iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58/10.10.8.90,3260,2
     total 8
     -rw-------. 1 root root 1784 Oct 16 12:42 sw-iscsi-0
     -rw-------. 1 root root 1784 Oct 16 12:42 sw-iscsi-1  
    

    Deletion of one of the interface configuration files under each node will in effect restrict that interface from logging into the target. At any time, you can choose which interfaces will login into which portals. For the purposes of this document we are going to assume that it is acceptable for each logical iSCSI interface to login into both portals.

  8. Next, we validate ability to login into both portals. We expect to see a successful login for each logical iSCSI interface into each configured portal.

     prague#  iscsiadm -m node -l
     Logging in to [iface: sw-iscsi-1, target: iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58, portal: 10.10.8.90,3260]
     ...trimmed...
     Login to [iface: sw-iscsi-1, target: iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58, portal: 10.10.8.90,3260] successful.
     ...trimmed...
    
  9. We validate that we are able see new iSCSI LUNs after logging into the portals by listing contents of /dev/disk/by-* directories. We expect to see four device files for each LUN, assuming we did not remove any interface configuration files as described in step 7 above.

     prague# ls /dev/disk/by-path/|egrep "10.10.*.90"|egrep lun-1[0-3]
     ip-10.10.8.90:3260-iscsi-iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58-lun-10
     ip-10.10.8.90:3260-iscsi-iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58-lun-11
     ip-10.10.8.90:3260-iscsi-iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58-lun-12
     ip-10.10.8.90:3260-iscsi-iqn.2011-09.dom.homer:01:13611547-b4e6-e4b9-d97b-d1f4df275f58-lun-13
     ...trimmed...
    
  10. Because we have one additional layer between OS and iSCSI (DM-MP layer), we want to tune iSCSI parameters adjusting the time it takes to hand-off failed commands to the DM-MP layer. Typically, it takes 120 seconds for the iSCSI service to give-up on a command when there are issues with completing the command. We want this period to be a lot shorter, and allow DM-MP to try and switch to another path instead of potentially trying down the same path for 2 minutes. We need to modify /etc/iscsi/iscsid.conf, and comment out the default entry with value 120 (seconds), and add new entry with value 10.

    prague# grep node.session.timeo.replacement_timeout /etc/iscsi/iscsid.conf
    #node.session.timeo.replacement_timeout = 120
    node.session.timeo.replacement_timeout = 10
    
  11. There are a number of parameters that we have to configure in order for the device mapper to correctly multipath with these LUNs. First we are going to configure our /etc/multipath.conf configuration file, with basic settings necessary to properly manage multipath behavior and path failure. This is not an end-all-be-all configuration, rather a very good starting point for most NexentaStor deployments. Please be aware that this configuration of multipath may fail on systems with older version of multipath. If you are running Debian and older RedHat-based distributions, parameters that will most likely need to be adjusted are: checker_timer,getuid_callout, path_selector. Be certain to review your distribution's multipath documentation, or a commented sample multipath.conf file.

    defaults {
            checker_timer               120
            getuid_callout              "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
            no_path_retry               12
            path_checker                directio
            path_grouping_policy        group_by_serial
            prio                        const
            polling_interval            10
            queue_without_daemon        yes
            rr_min_io                   1000
            rr_weight                   uniform
            selector                    "round-robin 0"
            udev_dir                    /dev
            user_friendly_names         yes
    }
    blacklist {
            devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
            devnode "^hd[a-z]"
            devnode "^sda"
            devnode "^sda[0-9]"
            device {
                    vendor DELL
                    product "PERC|Universal|Virtual"
            }
    }
    devices {
          device {
                  ## This section Applicable to ALL NEXENTA/COMSTAR provisioned LUNs
                  ## And will set sane defaults, necessary to multipath correctly
                  ## Specific parameters and deviations from the defaults should be
                  ## configured in the multipaths section on a per LUN basis
                  ##
                  vendor                NEXENTA
                  product               COMSTAR
                  path_checker          directio
                  path_grouping_policy  group_by_node_name
                  failback              immediate
                  getuid_callout        "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
                  rr_min_io             2000
                  ##
                  ##
                  ## Adjust `node.session.timeo.replacement_timeout` in /etc/iscsi/iscsid.conf
                  ## in order to rapidly fail commands down to the multipath layer
                  ## and allow DM-MP to manage path selection after failure
                  ## set node.session.timeo.replacement_timeout = 10
                  ##
                  ## The features parameter works with replacement_timeout adjustment
                  features              "1 queue_if_no_path"
            }
    }
    
  12. Next, we describe each LUN that we are going to access via multiple paths. Any parameters that we already set in device and defaults sections, we can skip, assuming we are accepting the global settings in the the device and defaults sections. Notice, we explicitly define WWID for each LUN. We also supply an alias, which will make it much easier to specifically identify LUNs; this is strictly optional. Parameters vendor and product will always be NEXENTA and COMSTAR respectively, unless explicitly changed on the SAN, which is out of scope of this configuration.

    multipaths {
                ## Define specifics about each LUN in this section, including any
                ## parameters that will be different from defaults and device
                ##
           multipath {
                  alias                 hydra_apps-a01
                  wwid                  3600144f0f484400000004e8685e70001
                  vendor                NEXENTA
                  product               COMSTAR
                  path_selector         "service-time 0"
                  failback              immediate
           }
           multipath {
                  alias                 hydra_db-a01
                  wwid                  3600144f0f484400000004e8685f80002
                  vendor                NEXENTA
                  product               COMSTAR
                  path_selector         "service-time 0"
                  failback              immediate
            }
            multipath {
                  alias                 hydra_db-a02
                  wwid                  3600144f0f484400000004e86860a0003
                  vendor                NEXENTA
                  product               COMSTAR
                  path_selector         "service-time 0"
                  failback              immediate
           }
            multipath {
                  alias                 hydra_db-a03
                  wwid                  3600144f0f484400000004e8686210004
                  vendor                NEXENTA
                  product               COMSTAR
                  path_selector         "service-time 0"
                  failback              immediate
           }
    }
    
  13. After we save the file as /etc/multipath.conf, we are going to flush and reload device-mapper maps. At this point we are assuming that the multipathd service is running on the system. Running multipath -v2 gives verbose enough details to make sure that maps are being created correctly.

    prague# multipath -F
    prague# multipath -v2  
    

    A typical multipath configuration for any single LUN will look very similar to the following:

    hydra_db-a02 (3600144f0f484400000004e86860a0003) dm-4 NEXENTA,COMSTAR
    size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
    `-+- policy='service-time 0' prio=-1 status=active
      |- 10:0:0:12 sdf 8:80  active undef running
      `- 11:0:0:12 sdg 8:96  active undef running
    
  14. Because we assigned an alias to each LUN, by default we will observe symbolic links to devices representing the LUNs, created under /dev/mapper. At this point we are ready to format these LUNs and apply a filesystem over them. Aliases are very convenient and will allow you to maintain access to the multipath device via a more accessible name.

    prague# ls -l /dev/mapper
    total 0
    crw-rw----. 1 root root 10, 58 Oct 22 15:25 control
    lrwxrwxrwx. 1 root root      7 Oct 23 07:42 hydra_apps-a01 -> ../dm-6
    lrwxrwxrwx. 1 root root      7 Oct 23 07:42 hydra_db-a01 -> ../dm-2
    lrwxrwxrwx. 1 root root      7 Oct 23 07:42 hydra_db-a02 -> ../dm-4
    lrwxrwxrwx. 1 root root      7 Oct 23 07:42 hydra_db-a03 -> ../dm-3
    ...trimmed...
    
You can’t perform that action at this time.