
Added Support For Azure CosmosDB #573

Merged: 57 commits, Apr 22, 2021

Conversation

kedarghule (Collaborator):

This PR adds initial support for Azure CosmosDB. The following resources are populated:

  • Azure Database Accounts
  • Azure CosmosDB Locations
  • Azure Database Accounts Cors Policies
  • Azure Database Accounts Failover Policies
  • Private Endpoint Connections
  • Virtual Network Rules
  • SQL Databases
  • SQL Containers
  • Cassandra Keyspaces
  • Cassandra Tables
  • MongoDB Databases
  • MongoDB Collections
  • CosmosDB Table Resources

mpurusottamc and others added 30 commits December 5, 2020 07:47
Review threads on docs/schema/azure.md and cartography/intel/azure/cosmosdb.py:

```
cassandra_keyspaces = get_cassandra_keyspaces(credentials, subscription_id, database_account)
mongodb_databases = get_mongodb_databases(credentials, subscription_id, database_account)
table_resources = get_table_resources(credentials, subscription_id, database_account)
yield database_account['id'], database_account['name'], database_account[
```
Contributor:

[question] Why use yield here when the convention is to return a list of dicts? Did you run into memory issues when testing? If so, add a comment on the reason here.

Collaborator Author:

@achantavy I looked at the AWS code and figured the convention was to use yield in a function like this, where we get the secondary resources stemming from our primary resource. I followed that for SQL and Storage, so I did the same here too.
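For context, a minimal sketch of that generator pattern (function and field names here are illustrative, not the actual cartography code):

```python
from typing import Dict, Generator, List, Tuple

def get_database_account_details(
    database_accounts: List[Dict],
) -> Generator[Tuple[str, str, List[Dict]], None, None]:
    """Yield one tuple per primary resource (database account)."""
    for account in database_accounts:
        # In the real module these would be API calls such as
        # get_sql_databases(credentials, subscription_id, account);
        # here we fake it with a plain dict lookup.
        sql_databases = account.get("sql_databases", [])
        yield account["id"], account["name"], sql_databases
```

With yield, the secondary resources for each account are produced lazily as the sync loop consumes them, rather than being accumulated up front in one large list.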

```
        return []
    except HttpResponseError as e:
        logger.warning(f"Error while retrieving Cassandra keyspaces list - {e}")
        return []
```
Contributor:

Something to keep in mind for the other get functions: if we have a transient HTTP error, this function will return an empty list, and then the cleanup functions will run, deleting all data from a previous run.

Not going to block on this since this is a larger problem with cartography itself (we should make error handling configurable and expose that to all modules), but just wanted to make sure you considered this.

mpurusottamc (Contributor):

@achantavy Yes, this makes a lot of sense for handling transient HTTP errors. I will open an issue to keep track of this and work on all the get functions for both Azure and AWS.

One idea is to return multiple values from the get functions: the first value would be an error code and the second the actual list. If there's an error, we skip the load and cleanup steps. Let me know what you think about this.
cc: @kedarghule
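One possible shape of that idea, sketched below (the helper name and the error-as-first-value convention are hypothetical, not cartography's actual API):

```python
import logging
from typing import Callable, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)

def get_resources_safe(
    fetch: Callable[[], List[Dict]],
) -> Tuple[Optional[Exception], List[Dict]]:
    """Return (error, results) so a transient failure is distinguishable
    from 'this subscription genuinely has zero resources'."""
    try:
        return None, fetch()
    except Exception as e:  # in the real code, HttpResponseError
        logger.warning(f"Error while retrieving resource list - {e}")
        return e, []

err, keyspaces = get_resources_safe(lambda: [{"id": "ks1"}])
if err is None:
    pass  # only now is it safe to run the load and cleanup steps
```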

Contributor:

@mpurusottamc Let's continue the conversation on this in #166.

```
    try:
        client = get_client(credentials, subscription_id)
        mongodb_database_list = list(
            map(
```
Contributor:

[question] I'm a bit confused by this syntax, can you explain? I'm guessing list_mongo_db_databases returns a list of objects and you use as_dict() to convert it to a list of dicts?

Collaborator Author:

Yes @achantavy, you're right. It returns a list of objects and I am converting them to a list of dicts.
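A small self-contained illustration of that conversion (the SDK client is replaced with a stand-in class; Azure SDK model objects do expose an as_dict() method):

```python
from typing import Dict, List

class StubMongoDBDatabase:
    # Stand-in for an Azure SDK model object returned by
    # list_mongo_db_databases(); real SDK models expose .as_dict() too.
    def __init__(self, name: str) -> None:
        self.name = name

    def as_dict(self) -> Dict:
        return {"name": self.name}

sdk_objects = [StubMongoDBDatabase("db1"), StubMongoDBDatabase("db2")]

# The pattern under discussion: map each model object to a plain dict.
mongodb_database_list = list(map(lambda x: x.as_dict(), sdk_objects))

# An equivalent, arguably clearer, list comprehension:
mongodb_database_list = [x.as_dict() for x in sdk_objects]
```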

```
    mongodb_databases: List[Dict] = []
    table_resources: List[Dict] = []

    for account_id, name, resourceGroup, sql_database, cassandra_keyspace, mongodb_database, table in details:
```
Contributor:

Same comment here about decoupling the data transforms from the load functions.

I also think it's more maintainable to split these by resource type: one function for sql dbs, one for cassandra, one for mongo, one for tables.

Collaborator Author:

@achantavy - I added a transform function for the manipulations happening here. My bad, I forgot to do that.

About the splitting up: first, it would just repeat a lot of code, and I have already split things up later for the individual resources under SQL databases, Cassandra keyspaces, and Mongo. Second, these are the direct resources under Database Accounts (our primary resource), so I figured I'd keep them in this function, since similar conventions are followed in the AWS code. Lastly, details gets populated in get_database_account_details() (line 436); if I split up the code for each resource, I'll need some major overhauls, which I feel would repeat code.

Contributor:

Fair points on copying prior patterns. Disclaimer: there is a lot of prior code in the AWS modules that isn't exactly a shining example of great software :-)

In general, I think it's less bad to repeat code than it is to tightly couple and generalize functions too early ("write code that is easy to delete" and all that).

All that said, we're working to improve things by standardizing on the load steps in the near future: https://docs.google.com/document/d/1IZ12R3oROn11LcYj5XunokyOjJkKu-H2O1TEk065Dsk/edit#

Will not block for now.

Collaborator Author:

Alright, I'll definitely check out this document :) Thanks for letting me know :)

```
    for loc in database_account['locations']:
        locations.append(loc)
    # Selecting only the unique location entries
    loc = [i for n, i in enumerate(locations) if i not in locations[n + 1:]]
```
Contributor:

How does this select only uniques? Could we use a set instead?

kedarghule (Collaborator Author), Apr 6, 2021:

@achantavy - So locations is a list of dictionaries, with each item having the following sample structure:

```
{
    "id": "DA1-eastus",
    "location_name": "East US",
    "document_endpoint": "https://DA1-eastus.documents.azure.com:443/",
    "provisioning_state": "Succeeded",
    "failover_priority": 0,
}
```

And the plan is to return only the unique dicts from this list. Since locations is a list of dicts, set() can't be used: dicts are unhashable, so I got a TypeError when I tried it (and found the explanation on Stack Overflow).

So I used method 2 from this link - https://www.geeksforgeeks.org/python-removing-duplicate-dicts-in-list/ - and it worked like a peach! Tested this code and it works :)
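That dedupe technique, run on a small sample (keys trimmed for brevity):

```python
locations = [
    {"id": "DA1-eastus", "failover_priority": 0},
    {"id": "DA1-eastus", "failover_priority": 0},   # exact duplicate
    {"id": "DA1-eastus", "failover_priority": 10},  # same id, different priority
]

# Keep an entry only if no identical dict appears later in the list.
# O(n^2), but dicts are unhashable, so a plain set() is not an option here.
unique_locations = [d for n, d in enumerate(locations) if d not in locations[n + 1:]]
```

Note that it deduplicates on the full dict contents, not just the id, which matters for the failover_priority discussion below.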

Contributor:

Are all the items unique by their "id"? Here's a nicer way you could get the unique items, borrowing from the third example in your link.

```
>>> test_list = [{'id': i} for i in range(3)] * 2
>>> test_list
[{'id': 0}, {'id': 1}, {'id': 2}, {'id': 0}, {'id': 1}, {'id': 2}]
>>> uniques = {x['id']: x for x in test_list}.values()
>>> uniques
dict_values([{'id': 0}, {'id': 1}, {'id': 2}])
>>> list(uniques)
[{'id': 0}, {'id': 1}, {'id': 2}]
```

Collaborator Author:

The id field is not unique. Hence I stuck to the method I used.

Contributor:

Can you provide an example?

Collaborator Author:

Something like this - you may have this in read_locations:

```
{
    "id": "DA1-eastus",
    "location_name": "East US",
    "document_endpoint": "https://DA1-eastus.documents.azure.com:443/",
    "provisioning_state": "Succeeded",
    "failover_priority": 0,
}
```

And you may get this in write_locations:

```
{
    "id": "DA1-eastus",
    "location_name": "East US",
    "document_endpoint": "https://DA1-eastus.documents.azure.com:443/",
    "provisioning_state": "Succeeded",
    "failover_priority": 10,
}
```

In this case, they need to be treated as two different "locations" because of the different values in failover_priority.

Contributor:

I think I understand now. And later on in line 161-170 you do another filtering of this list. I recommend creating separate lists for each of the types "read", "write", and "associated". Would these individual lists each need to be reduced to have their items be unique?

For example:

```
write_locations = database_account.get('write_locations', [])
```

And then I think for your _load_..._locations() you might also use UNWIND.

Collaborator Author:

Done!
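A sketch of the suggested split (the field names follow the discussion above; the Cypher string is illustrative of the UNWIND idea, not the exact query in the module):

```python
database_account = {
    "write_locations": [{"id": "DA1-eastus", "failover_priority": 0}],
    "read_locations": [{"id": "DA1-westus", "failover_priority": 1}],
    "locations": [
        {"id": "DA1-eastus", "failover_priority": 0},
        {"id": "DA1-westus", "failover_priority": 1},
    ],
}

# One list per relationship type, defaulting to [] when the key is absent.
write_locations = database_account.get("write_locations", [])
read_locations = database_account.get("read_locations", [])
associated_locations = database_account.get("locations", [])

# The load step can then pass each list as a single query parameter, e.g.:
ingest_write_locations = """
UNWIND $Locations AS loc
MERGE (l:AzureCosmosDBLocation {id: loc.id})
SET l.failover_priority = loc.failover_priority
"""
```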

```
        {'UPDATE_TAG': azure_update_tag, 'AZURE_SUBSCRIPTION_ID': subscription_id},
    )

    if 'cors' in database_account and len(database_account['cors']) > 0:
```
Contributor:

This function does too many things at once. It does a transform+load for each type of policies/rules, and it does a transform+load for each type of location.

I'd recommend separating this for each resource.

I'll clarify this in a follow-up comment if needed; hold off on editing if you're confused. It's getting late here but I wanted to make sure I gave you a somewhat timely second pass. :)

kedarghule (Collaborator Author):

@achantavy Made necessary changes and left replies for you :) Lemme know of any more changes.

ramonpetgrave64 (Contributor) left a comment:

Please address the comments about the _load_... discussion.

kedarghule (Collaborator Author):

> Please address the comments about the _load_... discussion.

Made the necessary changes. Lemme know if there are any more changes.

- Azure Database Account has write permissions from, read permissions from, and is associated with Azure CosmosDB Locations.

```
(AzureCosmosDBAccount)-[WRITE_PERMISSIONS_FROM]->(AzureCosmosDBLocation)
```
Contributor:

I'm a little confused about the wording of this since I am not familiar with Cosmos. What does it mean for a CosmosDBAccount to have read permissions from a CosmosDBLocation versus having it be associated with a CosmosDBLocation?

Collaborator Author:

read_locations: an array that contains the read locations enabled for the Cosmos DB account.

write_locations: an array that contains the write locations for the Cosmos DB account.

locations: this is the one that creates the "associated with" relationship. It is an array that contains all of the locations enabled for the Cosmos DB account, i.e., the regions in which the Azure Cosmos DB database account is deployed.

Contributor:

Maybe something like CAN_READ and CAN_WRITE?

Collaborator Author:

Database accounts can't read a location. They will have read permission from a location. Likewise for write permissions.

Contributor:

What do you mean by "They will have read permission from a location"? For a particular database and location, do you mean the accounts will have permissions to read the data when accessing the database through that location?

How about CAN_READ_FROM?

Collaborator Author:

Yes, you are right. Okay! I'll change it to CAN_READ_FROM and CAN_WRITE_FROM.
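With that rename, the schema lines quoted earlier would read:

```
(AzureCosmosDBAccount)-[CAN_READ_FROM]->(AzureCosmosDBLocation)
(AzureCosmosDBAccount)-[CAN_WRITE_FROM]->(AzureCosmosDBLocation)
```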

kedarghule (Collaborator Author):

@achantavy Made the changes. Let me know if there are any more :)

kedarghule (Collaborator Author):

@achantavy and @ramonpetgrave64 All changes made :) Let me know if there are any more! :)

achantavy (Contributor) left a comment:

Thanks again for the contribution!

kedarghule (Collaborator Author):

@achantavy Updated the latest in this branch. You can merge into master :)

ramonpetgrave64 (Contributor) left a comment:

Thanks for your patience!

@ramonpetgrave64 ramonpetgrave64 merged commit 142bc2b into lyft:master Apr 22, 2021