Clarification needed on AWS CFE examples #17

C0missar · 2020-03-03T20:33:21Z

The AWS section of the CFE user guide leaves a lot of questions open, or if the answers are there, I didn't understand them.

https://clouddocs.f5.com/products/extensions/f5-cloud-failover/latest/userguide/aws.html

The drawing doesn't match either the text or the example declarations. It would help to have a single drawing, with the declarations matching it exactly, and discussion using that scenario and IPs - preferably two examples, a same-AZ and an across-AZ case. The routing considerations are substantially different.

• What route(s) are to be updated? The Big-IPs can be in different subnets.
• The examples talk about both the default route and RFC 1918 routes being updated.
• Must the web servers' default route be pointed at the Big-IPs' internal interface?
• Is iLX installation required? It appears so.
• Can CFE share the same S3 bucket as the one created by the CFT? It appears so.
• The failover drawing shows that VIPs must be in traffic group 'none' – why?
• Using addresses like '10.0.1.10' and '10.0.11.10' is confusing and hard to read. Why not '10.0.20.x' and '10.0.30.x' so the differences stand out?

When it comes to operations, I haven't been able to make CFE do anything. Although it accepted my declaration and responds appropriately to status and failover triggers, nothing is actually happening. Not terribly surprising, as I still don't understand it, but I should get some indication back.

• How do you troubleshoot CFE?
• Why does a call to Trigger Failover return "SUCCEEDED" when nothing happened?
• The Across-AZ CFT creates an EIP and a private VIP on bigip1, but no private VIP on bigip2, so there is nothing to associate the EIP with on failover.

Thanks,
Stan


shyawnkarim · 2020-03-06T20:55:13Z

Thanks for reaching out to us about our documentation. We released a new version of the Cloud Failover Extension last night and have made many additions and improvements to our documentation. Please take a look at it and let me know if there are items that still need clarification.

chen23 · 2020-03-17T08:17:27Z

I'll add to @C0missar's comments that the "quick start" omits details about how CFE behaves when deployed via the CFT. Specifically:

• the "stack name" will map to "mydeployment"
• all fields are "required" (for those that have used the HA iApp previously, it was a "surprise" to have an S3 dependency)

In my own testing I could only get it to work using routeTag and not static. It would be helpful to add screenshots of the EIP / route updates to illustrate the desired outcome. Here's an example of what my route table looks like after applying tags.

With the term "failover.scopingTags" in the docs, it wasn't clear to me whether you should specify a custom key name or just the key's value. After looking at the outcome of the CFT, it became apparent that the key defaults to the value used in the documentation. It would be nice to have a note in the example/doc that references the default values when using the CFT.

In the debug steps, it would be helpful to mention where to look for logs (restnoded.log) and how to trigger the script manually (either by forcing the device to standby or by running the scripts directly via bash).
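As a rough illustration of the log-hunting step, here is a minimal Python sketch that keeps only the CFE-related lines from restnoded.log. The marker string `f5-cloud-failover` and the function name `cfe_lines` are assumptions made for illustration, not something confirmed in this thread; adjust the marker to whatever your log lines actually contain.

```python
# Assumed marker: CFE entries in restnoded.log are expected to mention the
# extension name. Change this if your log lines look different.
CFE_MARKER = "f5-cloud-failover"


def cfe_lines(lines, marker=CFE_MARKER):
    """Return only the log lines that contain the CFE marker."""
    return [line.rstrip("\n") for line in lines if marker in line]
```

One way to use it is to copy restnoded.log to a machine with Python 3 and call `cfe_lines(open("restnoded.log", errors="replace"))`, then print the result.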

AFAIK traffic group NONE helps in cases where you have an active/active and/or you want the BIG-IP to still accept traffic while traffic is being sent to the stand-by device during failover (otherwise the traffic would immediately get dropped).

It would help to mention the destination of the JSON body. I had to hunt around back to the quickstart page to get the URL. Maybe the quickstart could be per-environment (AWS, Azure, GCP) and make the assumption that you are starting with a CFT, ARM, or GDM template? It gets a bit repetitive, but it also makes it easier to start and end on the same page.

Here's an example of my JSON output that I used in my environment using the CFT deployment.

{
    "class": "Cloud_Failover",
    "environment": "aws",
    "externalStorage": {
        "scopingTags": {
            "f5_cloud_failover_label": "erchen-cross-az"
        }
    },
    "failoverAddresses": {
        "scopingTags": {
            "f5_cloud_failover_label": "erchen-cross-az"
        }
    },
    "failoverRoutes": {
        "scopingTags": {
            "f5_cloud_failover_label": "erchen-cross-az"
        },
        "scopingAddressRanges": [
            {
                "range": "192.168.1.0/24"
            }
        ],
        "defaultNextHopAddresses": {
            "discoveryType": "routeTag"
        }
    }
}
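Since the destination of the JSON body was hard to find, here is a minimal Python sketch of how a declaration like the one above might be packaged for the BIG-IP. It assumes the declare endpoint is `/mgmt/shared/cloud-failover/declare` and that HTTP basic auth is accepted; verify both against the quickstart page for your CFE version. The function name `build_declare_request` is hypothetical, and the host and credentials in the usage note are placeholders.

```python
import base64
import json
import urllib.request

# Assumed endpoint path; confirm it on the quickstart page.
CFE_DECLARE_PATH = "/mgmt/shared/cloud-failover/declare"


def build_declare_request(host, user, password, declaration):
    """Build (but do not send) a POST request carrying a CFE declaration."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"https://{host}{CFE_DECLARE_PATH}",
        data=json.dumps(declaration).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

For example, `build_declare_request("bigip1.example.com", "admin", "placeholder", declaration)` returns a request object that can be inspected before sending it with `urllib.request.urlopen`. A BIG-IP with a self-signed certificate would also need an appropriate SSL context, which is left out here.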

C0missar · 2020-03-17T18:09:55Z

I honestly didn't see much in the way of documentation changes. The drawing in AWS now shows the pair of VSs in TG1, but that was all I noticed (in the AWS section).

I would like to have a clearer statement about how CFE relates to traffic groups and traditional HA. I get that the mechanism is dual VIPs and API based rather than GARP, but will normal box failover events trigger the standby box to make the API calls and reassociate the EIPs? If you force a TG to standby, will that trigger CFE?

I'd also like to understand why Active-Active is not recommended. Other articles on failover (non-CFE) have recommended Active-Active, and it seems to me that for CFE, it wouldn't really matter, since you are failing over to a different VIP anyway.

On routing, the only way I can think of to make HA across AZs work is with a SNATpool or SNAT automap. I don't see the use case for route updates, and I'd like to know what assumptions are made about Web server routing/default gateways.

I have yet to see this work. My declaration looks like that of Chen23 above, except that the discoveryType is "static", and the scopingAddressRanges is "[]". I don't understand the intent for this field, and my PS consultant had this exact declaration working in his lab using that empty value.

{
    "class": "Cloud_Failover",
    "environment": "aws",
    "externalStorage": {
        "scopingTags": {
            "f5_cloud_failover_label": "AWSUSE2PROD-cfe"
        }
    },
    "failoverAddresses": {
        "scopingTags": {
            "f5_cloud_failover_label": "AWSUSE2PROD-cfe"
        }
    },
    "failoverRoutes": {
        "scopingTags": {
            "f5_cloud_failover_label": "AWSUSE2PROD-cfe"
        },
        "scopingAddressRanges": [],
        "defaultNextHopAddresses": {
            "discoveryType": "static",
            "items": []
        }
    },
    "controls": {
        "class": "Controls",
        "logLevel": "silly"
    }
}
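Both declarations in this thread reuse one label across externalStorage, failoverAddresses, and failoverRoutes. The sketch below only surfaces differences between those sections; whether CFE requires the tags to be identical is not established here. The helper names are hypothetical.

```python
SECTIONS = ("externalStorage", "failoverAddresses", "failoverRoutes")


def scoping_tags_by_section(declaration):
    """Map each section present in the declaration to its scopingTags dict."""
    return {
        name: declaration[name].get("scopingTags", {})
        for name in SECTIONS
        if name in declaration
    }


def sections_share_tags(declaration):
    """Return True when every present section carries identical scopingTags."""
    tags = list(scoping_tags_by_section(declaration).values())
    return all(t == tags[0] for t in tags[1:])
```

Loading the declaration with `json.load` and printing `scoping_tags_by_section(declaration)` makes a mistyped label in one section easy to spot.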

The results I'm getting are a) forced failover of TG1 via the Config UI has absolutely no effect on CFE or the contents of the CFE state file in the S3 bucket; b) A trigger via declaration returns "SUCCEEDED" and the state file is updated, but the EIP is not re-associated and the state of TG1 remains unchanged.

The S3 bucket has the f5_cloud_failover_label tag matching the declaration, and the EIP contains both the f5_cloud_failover_label tag and matching value, and the VIPS tag containing the two VIPs separated by a comma as shown in the example.
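The tag check described above can be sketched in Python as follows. It assumes the EIP's tags have already been fetched into a plain dict, that the address tag key is literally `VIPS` as described in this comment, and that exactly two comma-separated addresses are expected (one per BIG-IP). The function name `check_eip_tags` is hypothetical. An empty result only means the tags look consistent with the declaration, not that failover will work.

```python
import ipaddress


def check_eip_tags(declaration, eip_tags, vips_key="VIPS"):
    """Compare an EIP's tags with failoverAddresses.scopingTags.

    Returns a list of human-readable problems (empty if none were found).
    """
    problems = []
    scoping = declaration.get("failoverAddresses", {}).get("scopingTags", {})
    for key, expected in scoping.items():
        actual = eip_tags.get(key)
        if actual != expected:
            problems.append(f"tag {key!r}: expected {expected!r}, found {actual!r}")

    # Split the VIPS tag value and ignore stray whitespace or empty entries.
    vips = [v.strip() for v in eip_tags.get(vips_key, "").split(",") if v.strip()]
    if len(vips) != 2:
        problems.append(f"tag {vips_key!r}: expected two addresses, found {vips}")
    for vip in vips:
        try:
            ipaddress.ip_address(vip)
        except ValueError:
            problems.append(f"tag {vips_key!r}: {vip!r} is not a valid IP address")
    return problems
```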

shyawnkarim · 2020-03-17T21:31:17Z

Thanks for all these details on how to improve our documentation. I've created internal issue AUTOSDK-230 to improve our documentation. I've included detailed notes for our documentation team on all the points you have raised here.

C0missar · 2020-03-17T21:59:06Z

I also don't understand these terms or how to configure them. None of them are questions that come up in traditional failover discussions.

• defaultNextHopAddresses - Next hop for what? On which interface?
• discoveryType - What is discovered, given that each failover object is declared individually?
• scopingAddressRanges - Isn't that what is specified in the VIPS tag on the EIP?
• What assumptions are made for the web server's routing?

And what does tgactive.sh have to do with anything - the AWS CFE page had it being removed for same net deployments, but didn't even mention anything for across-AZ. There was discussion of tgactive.sh on Issue #19 as if it was something I needed to be concerned with, but I don't know what or why.

Is any of this stuff written down, or am I supposed to be reading the source code? I've been messing with this for several weeks and still can't get it to work, and support won't open a ticket on it. I'd post more info on my config, but don't know what would be useful.

rafkruczkowski · 2020-03-26T03:25:34Z

I'm getting lost on which element fails over. In the gif diagram, the private IP address of the NIC is shown, but would that be a manually configured floating IP on the F5, or an AWS object?

Also, the doc hints that the declaration can be used for failover addresses and/or failover routes. My deployment only has one NIC, so I posted the declaration below and got a success back, but I'm still not sure how to test the ingress traffic to the active unit.

{
    "class": "Cloud_Failover",
    "environment": "aws",
    "externalStorage": {
        "scopingTags": {
            "f5_cfe": "alpha"
        }
    },
    "failoverAddresses": {
        "scopingTags": {
            "f5_cfe": "alpha"
        }
    }
}

Lastly, the docs indicate that the keys can be anything, and that looks to be the case, but the only one that should be set is the NIC mapping one, and that one needs to be f5_cloud_failover_nic_map. I would request clarity in the documentation and the declaration on this point.

alaari-f5 · 2020-04-16T23:14:10Z

Closing this issue

As of release CFE 1.2 we moved this CFE repo under F5Networks. Your issue was recreated there. To follow-up on this issue visit:

F5Networks/f5-cloud-failover-extension#9

C0missar mentioned this issue Mar 17, 2020

400 error on any POST to CFE #19

Closed

shyawnkarim added the documentation label Apr 3, 2020

alaari-f5 mentioned this issue Apr 16, 2020

Clarification needed on AWS CFE examples F5Networks/f5-cloud-failover-extension#9

Closed

alaari-f5 closed this as completed Apr 16, 2020
