Map Okta groups to AWS roles and add Okta Application ReplyUris (#191)

* Fix for issue #31
* internal_commit
* adding factors
* trusted origin, full sync ok
* api doc
* run clean - added transform layer
* Update oktaintel.py working full sync
* Update oktaintel.py
* updated doc
* unit test foundation
* Update __init__.py
* lint
* Update oktaintel.py
* Update oktaintel.py
* Update oktaintel.py
* Update oktaintel.py
* unit tests
* cred setup
* sync integration
* Update okta_import_cleanup.json
* lint bs
* lint bs
* Group member bug fix
* PR Feedback
* Update oktaintel.py
* Update README.md
* removing cli parameter
* evan review part 1
* evan feedback part 2 - refactoring into smaller chunks
* evan feedback - part 2 - CLI parameters
* lint
* utils
* testing
* bug fix
* Update cli.py
* splitting get and transform
* Update users.py
* Update groups.py
* Update roles.py
* fix doc
* change unit test
* Update test_syntax.py
* fix unit test
* fix index
* Added in reply uris for okta applications
* Add in dns querying for reply urls
* added awssaml module
* Adding in awssaml module into okta
* Uncomment other syncs
* Made CLI updates to get regex and fixed the applications unit test
* added awssaml unittest
* Add in CAN_ASSUME_ROLE mapping between AWSRole and Humans. Added in cleanup jobs
* Updated readme's
* Fix numbering in the readme
* Fixed lint
* Add an index for ReplyUri's
* get reply urls in alphabetical order
* Added CLI parameter for replyuri dns resolution
* Reorder okta cleanup
* Fix unit test to make sure its all working
* Revert "Added CLI parameter for replyuri dns resolution" This reverts commit 9b65a68.
* Revert "Fix unit test to make sure its all working" This reverts commit 6cb885f.
* fix unit test
* remove unneded list traversal
* Fix docstring
* Fix a comment in the cli help
* Fixing PR feedback
* removing extra line
* reformat cleanup
lyft · Nov 6, 2019 · 80908b6
1 parent d0317e0
commit 80908b6

Showing 12 changed files with 429 additions and 106 deletions.
diff --git a/README.md b/README.md
@@ -59,55 +59,57 @@ Time to set up the server that will run Cartography.  Cartography _should_ work
 
 			⚠️ At this time we run our automated tests on Neo4j version 3.5.\*.  Other versions may work but are not explicitly supported. ⚠️
 
-	2. [Install](https://neo4j.com/docs/operations-manual/current/installation/) Neo4j on the server you will run Cartography on.
+	1. [Install](https://neo4j.com/docs/operations-manual/current/installation/) Neo4j on the server you will run Cartography on.
 
-2. If you're an AWS user, **prepare your AWS account(s)**
+1. If you're an AWS user, **prepare your AWS account(s)**
 
 	- **If you only have a single AWS account**
 
 		1. Set up an AWS identity (user, group, or role) for Cartography to use.  Ensure that this identity has the built-in AWS [SecurityAudit policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_job-functions.html#jf_security-auditor) (arn:aws:iam::aws:policy/SecurityAudit) attached.  This policy grants access to read security config metadata.
-		2. Set up AWS credentials to this identity on your server, using a `config` and 	`credential` file.  For details, see AWS' [official guide](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
+		1. Set up AWS credentials to this identity on your server, using a `config` and 	`credential` file.  For details, see AWS' [official guide](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
 
  	- **If you want to pull from multiple AWS accounts**, see [here](#multiple-aws-account-setup).
 
- 3. If you're a GCP user, **prepare your GCP credential(s)**
+ 1. If you're a GCP user, **prepare your GCP credential(s)**
 
     1. Create an identity - either a User Account or a Service Account - for Cartography to run as
-    2. Ensure that this identity has the [securityReviewer](https://cloud.google.com/iam/docs/understanding-roles) role attached to it.
-    3. Ensure that the machine you are running Cartography on can authenticate to this identity.
+    1. Ensure that this identity has the [securityReviewer](https://cloud.google.com/iam/docs/understanding-roles) role attached to it.
+    1. Ensure that the machine you are running Cartography on can authenticate to this identity.
         - **Method 1**: You can do this by setting your `GOOGLE_APPLICATION_CREDENTIALS` environment variable to point to a json file containing your credentials.  As per SecurityCommonSense™️, please ensure that only the user account that runs Cartography has read-access to this sensitive file.
         - **Method 2**: If you are running Cartography on a GCE instance or other GCP service, you can make use of the credential management provided by the default service accounts on these services.  See the [official docs](https://cloud.google.com/docs/authentication/production) on Application Default Credentials for more details.
 
-4. If you're a CRXcavator user, **prepare your CRXcavator API key**
+1. If you're a CRXcavator user, **prepare your CRXcavator API key**
 
     1. Generate an API key from your CRXcavator [user page](https://crxcavator.io/user/settings#)
-    2. Populate the following environment variables in the shell running Cartography
+    1. Populate the following environment variables in the shell running Cartography
         1. CRXCAVATOR_URL - the full URL to the CRXcavator API. https://api.crxcavator.io/v1 as of 07/09/19
-        2. CREDENTIALS_CRXCAVATOR_API_KEY - your API key generated in the previous step. Note this is a credential and should be stored in an appropriate secret store to be populated securely into your runtime environment.
-    3. If the credentials are configured, the CRXcavator module will run automatically on the next sync
+        1. CREDENTIALS_CRXCAVATOR_API_KEY - your API key generated in the previous step. Note this is a credential and should be stored in an appropriate secret store to be populated securely into your runtime environment.
+    1. If the credentials are configured, the CRXcavator module will run automatically on the next sync
 
-5.  If you're using GSuite, **prepare your GSuite Credential**
+1.  If you're using GSuite, **prepare your GSuite Credential**
 
     Ingesting GSuite Users and Groups utilizes the [Google Admin SDK](https://developers.google.com/admin-sdk/).
 
     1. [Enable Google API access](https://support.google.com/a/answer/60757?hl=en)
-    2. Create a new G Suite user account and accept the Terms of Service. This account will be used as the domain-wide delegated access.
-    3. [Perform G Suite Domain-Wide Delegation of Authority](https://developers.google.com/admin-sdk/directory/v1/guides/delegation)
-    4.  Download the service account's credentials
-    5.  Export the environmental variables:
+    1. Create a new G Suite user account and accept the Terms of Service. This account will be used as the domain-wide delegated access.
+    1. [Perform G Suite Domain-Wide Delegation of Authority](https://developers.google.com/admin-sdk/directory/v1/guides/delegation)
+    1.  Download the service account's credentials
+    1.  Export the environmental variables:
         1. `GSUITE_GOOGLE_APPLICATION_CREDENTIALS` - location of the credentials file.
-        2. `GSUITE_DELEGATED_ADMIN` - email address that you created in step 2
+        1. `GSUITE_DELEGATED_ADMIN` - email address that you created in step 2
 
-6. If you're using Okta intel module, **prepare your Okta API token**
+1. If you're using Okta intel module, **prepare your Okta API token**
     1. Generate your API token by following the steps from [Okta Create An API Token documentation](https://developer.okta.com/docs/guides/create-an-api-token/overview/)
-    2. Populate an environment variable with the API token. You can pass the environment variable name via the cli --okta-api-key-env-var parameter
-    3. Use the cli --okta-org-id parameter with the organization id you want to query. The organization id is the first part of the Okta url for your organization.
+    1. Populate an environment variable with the API token. You can pass the environment variable name via the cli --okta-api-key-env-var parameter
+    1. Use the cli --okta-org-id parameter with the organization id you want to query. The organization id is the first part of the Okta url for your organization.
+    1. If you use Okta to [administer AWS as a SAML provider](https://saml-doc.okta.com/SAML_Docs/How-to-Configure-SAML-2.0-for-Amazon-Web-Service#scenarioC), the module automatically matches each OktaGroup to the AWSRole it controls access to.
+        - If your regex differs from the standard Okta group-to-role regex `^aws\#\S+\#(?{{role}}[\w\-]+)\#(?{{accountid}}\d+)$` defined in [Step 5: Enabling Group Based Role Mapping in Okta](https://saml-doc.okta.com/SAML_Docs/How-to-Configure-SAML-2.0-for-Amazon-Web-Service#scenarioC), specify it with the `--okta-saml-role-regex` parameter.
 
-7. **Get and run Cartography**
+1. **Get and run Cartography**
 
 	1. Run `pip install cartography` to install our code.
 
-	2. Finally, to sync your data:
+	1. Finally, to sync your data:
 
 		- If you have one AWS account, run
 

diff --git a/cartography/cli.py b/cartography/cli.py
@@ -141,6 +141,17 @@ def _build_parser(self):
                 'Required if you are using the Okta intel module. Ignored otherwise.'
             ),
         )
+        parser.add_argument(
+            '--okta-saml-role-regex',
+            type=str,
+            default=r"^aws\#\S+\#(?{{role}}[\w\-]+)\#(?{{accountid}}\d+)$",
+            help=(
+                'The regex used to map Okta groups to AWS roles when using Okta as a SAML provider. '
+                'The regex is the one entered in Step 5: Enabling Group Based Role Mapping in Okta '
+                'https://saml-doc.okta.com/SAML_Docs/How-to-Configure-SAML-2.0-for-Amazon-Web-Service#c-step5 '
+                'The regex must contain the {{role}} and {{accountid}} tags.'
+            ),
+        )
         return parser
 
     def main(self, argv):
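
To illustrate how a group name might be matched against the default value of `--okta-saml-role-regex`, here is a minimal sketch. It assumes the `{{role}}` and `{{accountid}}` tags are rewritten into Python named groups before matching; the `awssaml` module is not shown in this part of the diff, so the helper name `group_name_to_role_arn` and the sample group names are hypothetical.

```python
import re

# Default value of --okta-saml-role-regex, copied from the diff above.
DEFAULT_ROLE_REGEX = r"^aws\#\S+\#(?{{role}}[\w\-]+)\#(?{{accountid}}\d+)$"


def group_name_to_role_arn(group_name, mapping_regex=DEFAULT_ROLE_REGEX):
    """
    Hypothetical helper: map an Okta group name to an AWS role ARN.
    Returns None when the group name does not match the regex.
    """
    # Assumption: the Okta-style tags are turned into Python named groups.
    pattern = (
        mapping_regex
        .replace("{{role}}", "P<role>")
        .replace("{{accountid}}", "P<accountid>")
    )
    match = re.search(pattern, group_name)
    if match is None:
        return None
    # Standard IAM role ARN layout: arn:aws:iam::<account id>:role/<role name>
    return f"arn:aws:iam::{match.group('accountid')}:role/{match.group('role')}"
```

With the default regex, a group named `aws#prod#admin-role#123456789012` would resolve to the role `admin-role` in account `123456789012`, while a group such as `engineering` would be skipped.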

diff --git a/cartography/config.py b/cartography/config.py
@@ -23,6 +23,8 @@ class Config:
     :param okta_org_id: Okta organization id. Optional.
     :type okta_api_key: str
     :param okta_api_key: Okta API key. Optional.
+    :type okta_saml_role_regex: str
+    :param okta_saml_role_regex: The regex used to map okta groups to AWS roles. Optional.
     """
 
     def __init__(
@@ -35,6 +37,7 @@ def __init__(
         analysis_job_directory=None,
         okta_org_id=None,
         okta_api_key=None,
+        okta_saml_role_regex=None,
     ):
         self.neo4j_uri = neo4j_uri
         self.neo4j_user = neo4j_user
@@ -44,3 +47,4 @@ def __init__(
         self.analysis_job_directory = analysis_job_directory
         self.okta_org_id = okta_org_id
         self.okta_api_key = okta_api_key
+        self.okta_saml_role_regex = okta_saml_role_regex
diff --git a/cartography/data/indexes.cypher b/cartography/data/indexes.cypher
@@ -62,13 +62,15 @@ CREATE INDEX ON :OktaOrganization(id);
 CREATE INDEX ON :OktaUser(id);
 CREATE INDEX ON :OktaUser(email);
 CREATE INDEX ON :OktaGroup(id);
+CREATE INDEX ON :OktaGroup(name);
 CREATE INDEX ON :OktaApplication(id);
 CREATE INDEX ON :OktaUserFactor(id);
 CREATE INDEX ON :OktaTrustedOrigin(id);
 CREATE INDEX ON: OktaAdministrationRole(id);
 CREATE INDEX ON :PublicIpAddress(ip);
 CREATE INDEX ON :RDSInstance(db_instance_identifier);
 CREATE INDEX ON :RDSInstance(id);
+CREATE INDEX ON :ReplyUri(id);
 CREATE INDEX ON :S3Acl(id);
 CREATE INDEX ON :S3Bucket(name);
 CREATE INDEX ON :User(arn);
diff --git a/cartography/data/jobs/cleanup/okta_import_cleanup.json b/cartography/data/jobs/cleanup/okta_import_cleanup.json
@@ -4,7 +4,7 @@
       "query": "MATCH (:OktaOrganization{id: {OKTA_ORG_ID}})-[:RESOURCE]->(n:OktaUser)-[:FACTOR]->(n:OktaUserFactor) WHERE n.lastupdated <> {UPDATE_TAG} WITH n LIMIT {LIMIT_SIZE} DETACH DELETE (n) return COUNT(*) as TotalCompleted",
       "iterative": true,
       "iterationsize": 100,
-      "__comment__": "Deletate stale OktaUserFactor"
+      "__comment__": "Delete stale OktaUserFactor nodes"
     },
     {
       "query": "MATCH (:OktaOrganization{id: {OKTA_ORG_ID}})-[:RESOURCE]->(:OktaUser)-[r:FACTOR]->(:OktaUserFactor)  WHERE r.lastupdated <> {UPDATE_TAG} WITH r LIMIT {LIMIT_SIZE} DELETE (r) return COUNT(*) as TotalCompleted",
@@ -18,6 +18,12 @@
       "iterationsize": 100,
       "__comment__": "Delete stale OktaUser"
     },
+    {
+      "query": "MATCH (:AWSRole)<-[r:ALLOWED_BY]-(:OktaGroup) WHERE r.lastupdated <> {UPDATE_TAG} WITH r LIMIT {LIMIT_SIZE} DELETE r return COUNT(*) as TotalCompleted",
+      "iterative": true,
+      "iterationsize": 100,
+      "__comment__": "Delete Stale OktaGroup to AWSRole ALLOWED_BY relationship"
+    },
     {
       "query": "MATCH (:OktaOrganization{id: {OKTA_ORG_ID}})-[:RESOURCE]->(:OktaGroup)<-[r:MEMBER_OF_OKTA_GROUP]-(:OktaUser) WHERE r.lastupdated <> {UPDATE_TAG} WITH r LIMIT {LIMIT_SIZE} DELETE (r) return COUNT(*) as TotalCompleted",
       "iterative": true,
@@ -64,6 +70,24 @@
       "query": "MATCH (n:OktaOrganization{id: {OKTA_ORG_ID}}) WHERE n.lastupdated <> {UPDATE_TAG} DELETE (n)",
       "iterative": false,
       "__comment__": "Delete stale OktaOrganization"
+    },
+    {
+      "query": "MATCH (n:ReplyUri) WHERE n.lastupdated <> {UPDATE_TAG} WITH n LIMIT {LIMIT_SIZE} DETACH DELETE (n) return COUNT(*) as TotalCompleted",
+      "iterative": true,
+      "iterationsize": 100,
+      "__comment__": "Delete Stale ReplyUri nodes"
+    },
+    {
+      "query": "MATCH (:ReplyUri)<-[r:REPLYURI]-(:OktaApplication) WHERE r.lastupdated <> {UPDATE_TAG} WITH r LIMIT {LIMIT_SIZE} DELETE r return COUNT(*) as TotalCompleted",
+      "iterative": true,
+      "iterationsize": 100,
+      "__comment__": "Delete Stale ReplyUri relationships"
+    },
+    {
+      "query": "MATCH (:AWSRole)<-[r:CAN_ASSUME_ROLE]-(:Human) WHERE r.lastupdated <> {UPDATE_TAG} WITH r LIMIT {LIMIT_SIZE} DELETE r return COUNT(*) as TotalCompleted",
+      "iterative": true,
+      "iterationsize": 100,
+      "__comment__": "Delete Stale Human to AWSRole CAN_ASSUME_ROLE relationship"
     }
   ],
   "name": "Okta intel module cleanup"

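The new cleanup entries are marked `"iterative": true` with an `"iterationsize"` of 100, and each query returns a `TotalCompleted` count. A rough sketch of how such a job could be driven, assuming the runner repeats the statement until a batch deletes nothing. `run_iterative_cleanup` and `FakeGraph` are hypothetical stand-ins, not Cartography's actual job runner or a Neo4j session.

```python
def run_iterative_cleanup(run_query, query, update_tag, limit_size=100):
    """
    Hypothetical driver: repeat a batched delete until nothing is left.
    run_query must return the TotalCompleted value of one execution.
    """
    total_deleted = 0
    while True:
        deleted = run_query(query, UPDATE_TAG=update_tag, LIMIT_SIZE=limit_size)
        total_deleted += deleted
        if deleted == 0:
            # An empty batch means no stale items remain.
            break
    return total_deleted


class FakeGraph:
    """In-memory stand-in that only tracks how many stale items remain."""

    def __init__(self, stale_count):
        self.stale_count = stale_count
        self.calls = 0

    def run_query(self, query, UPDATE_TAG, LIMIT_SIZE):
        self.calls += 1
        deleted = min(self.stale_count, LIMIT_SIZE)
        self.stale_count -= deleted
        return deleted


graph = FakeGraph(stale_count=250)
deleted_total = run_iterative_cleanup(
    graph.run_query,
    "MATCH (n:ReplyUri) WHERE n.lastupdated <> {UPDATE_TAG} ...",  # abbreviated
    update_tag=1573000000,
)
```

Deleting in bounded batches keeps each transaction small, at the cost of one extra query to confirm that nothing is left.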
diff --git a/cartography/intel/okta/__init__.py b/cartography/intel/okta/__init__.py
@@ -3,6 +3,7 @@
 from okta.framework.OktaError import OktaError
 
 from cartography.intel.okta import applications
+from cartography.intel.okta import awssaml
 from cartography.intel.okta import factors
 from cartography.intel.okta import groups
 from cartography.intel.okta import organization
@@ -49,6 +50,7 @@ def start_okta_ingestion(neo4j_session, config):
     applications.sync_okta_applications(neo4j_session, config.okta_org_id, config.update_tag, config.okta_api_key)
     factors.sync_users_factors(neo4j_session, config.okta_org_id, config.update_tag, config.okta_api_key, state)
     origins.sync_trusted_origins(neo4j_session, config.okta_org_id, config.update_tag, config.okta_api_key)
+    awssaml.sync_okta_aws_saml(neo4j_session, config.okta_saml_role_regex, config.update_tag)
 
     # need creds with permission
     # soft fail as some won't be able to get such high priv token

diff --git a/cartography/intel/okta/applications.py b/cartography/intel/okta/applications.py
@@ -1,47 +1,44 @@
 # Okta intel module - Applications
 import json
 import logging
+from datetime import datetime
 
-from okta import AppInstanceClient
 from okta.framework.OktaError import OktaError
 
 from cartography.intel.okta.utils import create_api_client
 from cartography.intel.okta.utils import is_last_page
 
-logger = logging.getLogger(__name__)
-
 
-def _create_application_client(okta_org, okta_api_key):
-    """
-    Create Okta AppInstanceClient
-    :param okta_org: Okta organization name
-    :param okta_api_key: Okta API key
-    :return: Instance of AppInstanceClient
-    """
-    app_client = AppInstanceClient(
-        base_url=f"https://{okta_org}.okta.com/",
-        api_token=okta_api_key,
-    )
-
-    return app_client
+logger = logging.getLogger(__name__)
 
 
-def _get_okta_applications(app_client):
+def _get_okta_applications(api_client):
     """
     Get application data from Okta server
-    :param app_client: api client
+    :param api_client: api client
     :return: application data
     """
     app_list = []
 
-    page_apps = app_client.get_paged_app_instances()
-
+    next_url = None
     while True:
-        for current_application in page_apps.result:
-            app_list.append(current_application)
-        if not page_apps.is_last_page():
-            # Keep on fetching pages of users until the last page
-            page_apps = app_client.get_paged_app_instances(url=page_apps.next_url)
+        try:
+            # https://developer.okta.com/docs/reference/api/apps/#list-applications
+            if next_url:
+                paged_response = api_client.get(next_url)
+            else:
+                params = {
+                    'limit': 500,
+                }
+                paged_response = api_client.get_path('/', params)
+        except OktaError as okta_error:
+            logger.debug(f"Got error while listing applications {okta_error}")
+            break
+
+        app_list.extend(json.loads(paged_response.text))
+
+        if not is_last_page(paged_response):
+            next_url = paged_response.links.get("next").get("url")
         else:
             break
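
The rewritten loop follows the `next` link of each paged response until no further page is advertised. Below is a self-contained sketch of that pattern against stub objects. `FakeResponse`, `FakeApiClient`, `has_next_page` and `fetch_all_pages` are hypothetical; in particular, `has_next_page` only approximates what `is_last_page` from `cartography.intel.okta.utils` is assumed to check, since that helper is not shown in this diff.

```python
import json


class FakeResponse:
    """Stub for a paged HTTP response exposing .text and .links."""

    def __init__(self, items, next_url=None):
        self.text = json.dumps(items)
        self.links = {"next": {"url": next_url}} if next_url else {}


class FakeApiClient:
    """Stub client that serves canned pages keyed by URL."""

    def __init__(self, first_page, other_pages):
        self._first_page = first_page
        self._other_pages = other_pages

    def get_path(self, path, params=None):
        return self._first_page

    def get(self, url):
        return self._other_pages[url]


def has_next_page(response):
    # Assumption: a further page exists when a "next" link is present.
    return "next" in response.links


def fetch_all_pages(api_client):
    results = []
    next_url = None
    while True:
        if next_url:
            response = api_client.get(next_url)
        else:
            response = api_client.get_path("/", {"limit": 500})
        results.extend(json.loads(response.text))
        if not has_next_page(response):
            break
        next_url = response.links["next"]["url"]
    return results


client = FakeApiClient(
    first_page=FakeResponse([{"id": "app1"}], next_url="page2"),
    other_pages={"page2": FakeResponse([{"id": "app2"}])},
)
all_apps = fetch_all_pages(client)
```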
 
@@ -181,34 +178,50 @@ def transform_okta_application_list(okta_applications):
 
 
 def transform_okta_application(okta_application):
-    # https://github.com/okta/okta-sdk-python/blob/master/okta/models/app/AppInstance.py
     app_props = {}
-    app_props["id"] = okta_application.id
-    app_props["name"] = okta_application.name
-    app_props["label"] = okta_application.label
-    if okta_application.created:
-        app_props["created"] = okta_application.created.strftime("%m/%d/%Y, %H:%M:%S")
+    app_props["id"] = okta_application["id"]
+    app_props["name"] = okta_application["name"]
+    app_props["label"] = okta_application["label"]
+    if "created" in okta_application and okta_application["created"]:
+        app_props["created"] = datetime.strptime(
+            okta_application["created"], "%Y-%m-%dT%H:%M:%S.%fZ",
+        ).strftime("%m/%d/%Y, %H:%M:%S")
     else:
         app_props["created"] = None
 
-    if okta_application.lastUpdated:
-        app_props["okta_last_updated"] = okta_application.lastUpdated.strftime("%m/%d/%Y, %H:%M:%S")
+    if "lastUpdated" in okta_application and okta_application["lastUpdated"]:
+        app_props["okta_last_updated"] = datetime.strptime(
+            okta_application["lastUpdated"], "%Y-%m-%dT%H:%M:%S.%fZ",
+        ).strftime("%m/%d/%Y, %H:%M:%S")
     else:
         app_props["okta_last_updated"] = None
 
-    app_props["status"] = okta_application.status
+    app_props["status"] = okta_application["status"]
 
-    if okta_application.activated:
-        app_props["activated"] = okta_application.activated.strftime("%m/%d/%Y, %H:%M:%S")
+    if "activated" in okta_application and okta_application["activated"]:
+        app_props["activated"] = datetime.strptime(
+            okta_application["activated"], "%Y-%m-%dT%H:%M:%S.%fZ",
+        ).strftime("%m/%d/%Y, %H:%M:%S")
     else:
         app_props["activated"] = None
 
-    app_props["features"] = okta_application.features
-    app_props["sign_on_mode"] = okta_application.signOnMode
+    app_props["features"] = okta_application["features"]
+    app_props["sign_on_mode"] = okta_application["signOnMode"]
 
     return app_props
 
 
+def transform_okta_application_extract_replyurls(okta_application):
+    """
+    Extracts the reply uri information from an okta app
+    """
+
+    if "oauthClient" in okta_application["settings"]:
+        if "redirect_uris" in okta_application["settings"]["oauthClient"]:
+            return okta_application["settings"]["oauthClient"]["redirect_uris"]
+    return None
+
+
 def _load_okta_applications(neo4j_session, okta_org_id, app_list, okta_update_tag):
     """
     Add application into the graph
@@ -303,6 +316,39 @@ def _load_application_group(neo4j_session, app_id, group_list, okta_update_tag):
     )
 
 
+def _load_application_reply_urls(neo4j_session, app_id, reply_urls, okta_update_tag):
+    """
+    Add reply urls to their applications
+    :param neo4j_session: session with the Neo4j server
+    :param app_id: application to map the reply urls to
+    :param reply_urls: reply urls to map
+    :param okta_update_tag: The timestamp value to set our new Neo4j resources with
+    :return: Nothing
+    """
+    if not reply_urls:
+        return
+    ingest = """
+    MATCH (app:OktaApplication{id: {APP_ID}})
+    WITH app
+    UNWIND {URL_LIST} as url_list
+    MERGE (uri:ReplyUri{id: url_list})
+    ON CREATE SET uri.firstseen = timestamp()
+    SET uri.uri = url_list,
+    uri.lastupdated = {okta_update_tag}
+    WITH app, uri
+    MERGE (uri)<-[r:REPLYURI]-(app)
+    ON CREATE SET r.firstseen = timestamp()
+    SET r.lastupdated = {okta_update_tag}
+    """
+
+    neo4j_session.run(
+        ingest,
+        APP_ID=app_id,
+        URL_LIST=reply_urls,
+        okta_update_tag=okta_update_tag,
+    )
+
+
 def sync_okta_applications(neo4j_session, okta_org_id, okta_update_tag, okta_api_key):
     """
     Sync okta application
@@ -314,15 +360,13 @@ def sync_okta_applications(neo4j_session, okta_org_id, okta_update_tag, okta_api
     """
     logger.debug("Syncing Okta Applications")
 
-    app_client = _create_application_client(okta_org_id, okta_api_key)
+    api_client = create_api_client(okta_org_id, "/api/v1/apps", okta_api_key)
 
-    okta_app_data = _get_okta_applications(app_client)
+    okta_app_data = _get_okta_applications(api_client)
     app_data = transform_okta_application_list(okta_app_data)
     _load_okta_applications(neo4j_session, okta_org_id, app_data, okta_update_tag)
 
-    api_client = create_api_client(okta_org_id, "/api/v1/apps", okta_api_key)
-
-    for app in app_data:
+    for app in okta_app_data:
         app_id = app["id"]
         user_list_data = _get_application_assigned_users(api_client, app_id)
         user_list = transform_application_assigned_users_list(user_list_data)
@@ -331,3 +375,6 @@ def sync_okta_applications(neo4j_session, okta_org_id, okta_update_tag, okta_api
         group_list_data = _get_application_assigned_groups(api_client, app_id)
         group_list = transform_application_assigned_groups_list(group_list_data)
         _load_application_group(neo4j_session, app_id, group_list, okta_update_tag)
+
+        reply_urls = transform_okta_application_extract_replyurls(app)
+        _load_application_reply_urls(neo4j_session, app_id, reply_urls, okta_update_tag)
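
Because applications are now read as raw JSON dictionaries rather than SDK objects, timestamps arrive as strings and reply URIs sit under `settings.oauthClient.redirect_uris`. The sketch below walks one sample record through both steps. The sample payload, `convert_okta_timestamp` and `extract_reply_urls` are hypothetical illustrations; only the two format strings and the dictionary keys are taken from the diff.

```python
from datetime import datetime

# Format strings copied from transform_okta_application in the diff.
OKTA_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"
GRAPH_TIME_FORMAT = "%m/%d/%Y, %H:%M:%S"


def convert_okta_timestamp(value):
    """Reformat an Okta timestamp string, or return None when it is absent."""
    if not value:
        return None
    return datetime.strptime(value, OKTA_TIME_FORMAT).strftime(GRAPH_TIME_FORMAT)


def extract_reply_urls(okta_application):
    """Return the OAuth redirect URIs of an application dict, or None."""
    settings = okta_application.get("settings", {})
    return settings.get("oauthClient", {}).get("redirect_uris")


# Made-up application record, trimmed to the fields used here.
sample_app = {
    "id": "0oa1example",
    "created": "2019-11-06T18:30:45.000Z",
    "settings": {
        "oauthClient": {
            "redirect_uris": ["https://app.example.com/callback"],
        },
    },
}

created = convert_okta_timestamp(sample_app.get("created"))
reply_urls = extract_reply_urls(sample_app)
```

Unlike the function in the diff, this variant uses `.get` for `settings` as well, so a record without that key yields `None` instead of raising `KeyError`.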