# Search and Find Data Assets using Google Data Catalog API

Google Data Catalog API Reference:
https://googleapis.dev/python/datacatalog/latest/index.html

Google Cloud Data Catalog hands-on guide: search, get & lookup with Python:
https://medium.com/google-cloud/data-catalog-hands-on-guide-search-get-lookup-with-python-82d99bfb4056

Set up the environment and import the necessary libraries.

In [None]:
%env GOOGLE_APPLICATION_CREDENTIALS=

In [2]:
from google.cloud import datacatalog_v1

In [3]:
datacatalog = datacatalog_v1.DataCatalogClient()

In [4]:
scope = datacatalog_v1.types.SearchCatalogRequest.Scope()
scope.include_project_ids.append('sarun-project')

## Search Catalog using Query

The `query` argument uses the same search syntax as the Google Data Catalog UI, so any query that works there also works here.
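The queries in this section are plain strings made of free-text keywords and `qualifier=value` predicates separated by spaces. As a minimal sketch, the helper below assembles such a string; `build_search_query` is a hypothetical name that is not part of the Data Catalog client, and it only covers the `=` form (not the `:` form used in Ex4 and Ex5).

```python
def build_search_query(*keywords, **qualifiers):
    """Join keywords and qualifier=value predicates with single spaces.

    Hypothetical helper for illustration; it does not validate the
    qualifier names against the Data Catalog search syntax.
    """
    parts = list(keywords)
    parts += [f"{name}={value}" for name, value in qualifiers.items()]
    return " ".join(parts)


# Rebuilds the query string used in Ex1.
query = build_search_query(system="bigquery", type="dataset")
print(query)
```

The resulting string can be passed as the `query` argument of `search_catalog` in the same way as the hand-written strings.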

### Ex1: search with filter for finding BigQuery datasets

In [5]:
results = datacatalog.search_catalog(scope=scope, query='system=bigquery type=dataset')

In [6]:
results

SearchCatalogPager<results {
  search_result_type: ENTRY
  search_result_subtype: "entry.dataset"
  relative_resource_name: "projects/sarun-project/locations/us/entryGroups/@bigquery/entries/cHJvamVjdHMvc2FydW4tcHJvamVjdC9kYXRhc2V0cy9kc2lfZGF0YXNldA"
  linked_resource: "//bigquery.googleapis.com/projects/sarun-project/datasets/dsi_dataset"
  integrated_system: BIGQUERY
}
>
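The last segment of `relative_resource_name` in the output above looks opaque, but for this `@bigquery` entry it appears to be the URL-safe base64 encoding (without padding) of the BigQuery resource path. This is an observation from the output, not documented behavior to rely on. A small sketch for inspecting such an ID, with `decode_bigquery_entry_id` as a hypothetical helper name:

```python
import base64


def decode_bigquery_entry_id(relative_resource_name):
    """Decode the final path segment as unpadded URL-safe base64.

    Assumption: the entry ID is base64url text, as observed in the
    notebook output; other entry groups use plain IDs instead.
    """
    entry_id = relative_resource_name.rsplit("/", 1)[-1]
    padding = "=" * (-len(entry_id) % 4)  # restore stripped '=' padding
    return base64.urlsafe_b64decode(entry_id + padding).decode("utf-8")


name = (
    "projects/sarun-project/locations/us/entryGroups/@bigquery/entries/"
    "cHJvamVjdHMvc2FydW4tcHJvamVjdC9kYXRhc2V0cy9kc2lfZGF0YXNldA"
)
print(decode_bigquery_entry_id(name))
```

For everyday use the `linked_resource` field already carries the readable path, so decoding is mainly useful for debugging.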

### Ex2: search with filter for finding Tag Templates

In [7]:
results = datacatalog.search_catalog(scope=scope, query='type=TAG_TEMPLATE')

In [8]:
# Iterating over the pager fetches every page of results
fetched_results = list(results)
fetched_results

[search_result_type: TAG_TEMPLATE
 search_result_subtype: "tag_template"
 relative_resource_name: "projects/sarun-project/locations/us/tagTemplates/data_loss_prevention"
 linked_resource: "//datacatalog.googleapis.com/projects/sarun-project/locations/us/tagTemplates/data_loss_prevention",
 search_result_type: TAG_TEMPLATE
 search_result_subtype: "tag_template"
 relative_resource_name: "projects/sarun-project/locations/us/tagTemplates/test_column_metadata"
 linked_resource: "//datacatalog.googleapis.com/projects/sarun-project/locations/us/tagTemplates/test_column_metadata",
 search_result_type: TAG_TEMPLATE
 search_result_subtype: "tag_template"
 relative_resource_name: "projects/sarun-project/locations/us/tagTemplates/test_table_metadata"
 linked_resource: "//datacatalog.googleapis.com/projects/sarun-project/locations/us/tagTemplates/test_table_metadata",
 search_result_type: TAG_TEMPLATE
 search_result_subtype: "tag_template"
 relative_resource_name: "projects/sarun-project/locations/

### Ex3: search using keywords

In [9]:
results = datacatalog.search_catalog(scope=scope, query='Python')
results

SearchCatalogPager<results {
  search_result_type: TAG_TEMPLATE
  search_result_subtype: "tag_template"
  relative_resource_name: "projects/sarun-project/locations/us/tagTemplates/python_tag_template"
  linked_resource: "//datacatalog.googleapis.com/projects/sarun-project/locations/us/tagTemplates/python_tag_template"
}
>

### Ex4: search with filter for finding data assets with a policy tag named PII

In [10]:
results = datacatalog.search_catalog(scope=scope, query='policyTag:PII')
results

SearchCatalogPager<results {
  search_result_type: ENTRY
  search_result_subtype: "entry.table"
  relative_resource_name: "projects/sarun-project/locations/us/entryGroups/@bigquery/entries/cHJvamVjdHMvc2FydW4tcHJvamVjdC9kYXRhc2V0cy9kc2lfZGF0YXNldC90YWJsZXMvb3JnX2V4cGVkaXRlX2ludHVpdGl2ZV9wYXJhZGlnbXNfM2RlMA"
  linked_resource: "//bigquery.googleapis.com/projects/sarun-project/datasets/dsi_dataset/tables/org_expedite_intuitive_paradigms_3de0"
  integrated_system: BIGQUERY
}
results {
  search_result_type: ENTRY
  search_result_subtype: "entry.table"
  relative_resource_name: "projects/sarun-project/locations/us/entryGroups/@bigquery/entries/cHJvamVjdHMvc2FydW4tcHJvamVjdC9kYXRhc2V0cy9kc2lfZGF0YXNldC90YWJsZXMvdGF4aV90cmlwcw"
  linked_resource: "//bigquery.googleapis.com/projects/sarun-project/datasets/dsi_dataset/tables/taxi_trips"
  integrated_system: BIGQUERY
}
>

### Ex5: search with filter for finding data assets tagged with a specific tag template

In [11]:
results = datacatalog.search_catalog(scope=scope, query='tag:sarun-project.python_tag_template')
results

SearchCatalogPager<results {
  search_result_type: ENTRY
  search_result_subtype: "entry.table"
  relative_resource_name: "projects/sarun-project/locations/us/entryGroups/@bigquery/entries/cHJvamVjdHMvc2FydW4tcHJvamVjdC9kYXRhc2V0cy9kc2lfZGF0YXNldC90YWJsZXMvdGF4aV90cmlwcw"
  linked_resource: "//bigquery.googleapis.com/projects/sarun-project/datasets/dsi_dataset/tables/taxi_trips"
  integrated_system: BIGQUERY
}
results {
  search_result_type: ENTRY
  search_result_subtype: "entry.table"
  relative_resource_name: "projects/sarun-project/locations/us/entryGroups/test/entries/public_company"
  linked_resource: "ec2-52-20-66-171.compute-1.amazonaws.com/public/company"
  user_specified_system: "test"
}
>
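The examples above only display the pager returned by `search_catalog`. To work with the results programmatically, the pager can be iterated like any other iterable. The sketch below collects the `linked_resource` of each result; `collect_linked_resources` and its `limit` parameter are hypothetical, and the client is passed in as an argument so the function only assumes a `search_catalog(scope=..., query=...)` method as used in this notebook.

```python
from itertools import islice


def collect_linked_resources(client, scope, query, limit=None):
    """Return the linked_resource of each search result, up to `limit`.

    `client` is expected to behave like the notebook's DataCatalogClient;
    `limit=None` means every result is consumed.
    """
    results = client.search_catalog(scope=scope, query=query)
    return [result.linked_resource for result in islice(results, limit)]
```

With the objects defined earlier, a call would look like `collect_linked_resources(datacatalog, scope, 'policyTag:PII')`.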

## List Entry Groups and Entries

In [12]:
project_id = 'sarun-project'
location = 'us'

First, list all entry groups in the project and location.

In [13]:
results = datacatalog.list_entry_groups(parent=f'projects/{project_id}/locations/{location}')
fetched_results = list(results)
fetched_results

[name: "projects/sarun-project/locations/us/entryGroups/postgresql"
 data_catalog_timestamps {
   create_time {
     seconds: 1610692312
     nanos: 374000000
   }
   update_time {
     seconds: 1610692312
     nanos: 374000000
   }
 },
 name: "projects/sarun-project/locations/us/entryGroups/test"
 data_catalog_timestamps {
   create_time {
     seconds: 1610692843
     nanos: 581000000
   }
   update_time {
     seconds: 1610692843
     nanos: 581000000
   }
 }]
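The timestamps in this output are printed as `seconds` and `nanos` since the Unix epoch. If the raw numbers are all you have, they can be converted to a timezone-aware `datetime` as sketched below. `to_utc_datetime` is a hypothetical helper; depending on the client library version, these fields may already be exposed as datetime-like objects, in which case no conversion is needed.

```python
from datetime import datetime, timedelta, timezone


def to_utc_datetime(seconds, nanos=0):
    """Convert epoch seconds plus nanoseconds to a UTC datetime.

    datetime only has microsecond resolution, so sub-microsecond
    digits are dropped.
    """
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(microseconds=nanos // 1000)


# create_time of the "postgresql" entry group from the output above
created = to_utc_datetime(1610692312, 374000000)
print(created.isoformat())
```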

Second, fetch the entries from the first entry group in the list.

In [14]:
datacatalog.list_entries(parent=fetched_results[0].name)

ListEntriesPager<entries {
  name: "projects/sarun-project/locations/us/entryGroups/postgresql/entries/public"
  display_name: "public"
  linked_resource: "ec2-52-20-66-171.compute-1.amazonaws.com/public"
  user_specified_type: "schema"
  user_specified_system: "postgresql"
}
entries {
  name: "projects/sarun-project/locations/us/entryGroups/postgresql/entries/public_company"
  display_name: "company"
  schema {
    columns {
      type: "character"
      mode: "NULLABLE"
      column: "address"
    }
    columns {
      type: "integer"
      mode: "NULLABLE"
      column: "age"
    }
    columns {
      type: "integer"
      mode: "NULLABLE"
      column: "id"
    }
    columns {
      type: "text"
      mode: "NULLABLE"
      column: "name"
    }
    columns {
      type: "real"
      mode: "NULLABLE"
      column: "salary"
    }
  }
  linked_resource: "ec2-52-20-66-171.compute-1.amazonaws.com/public/company"
  user_specified_type: "table"
  user_specified_system: "postgresql"
}
>
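The two steps above can be combined to walk every entry group returned for a location. A minimal sketch, assuming a client that offers `list_entry_groups(parent=...)` and `list_entries(parent=...)` as used in this notebook; `iter_all_entries` is a hypothetical name:

```python
def iter_all_entries(client, project_id, location):
    """Yield (entry_group_name, entry) for each entry in each listed group.

    Only groups returned by list_entry_groups are visited; in the output
    above that is `postgresql` and `test`, not `@bigquery`.
    """
    parent = f"projects/{project_id}/locations/{location}"
    for group in client.list_entry_groups(parent=parent):
        for entry in client.list_entries(parent=group.name):
            yield group.name, entry
```

For example, `for group_name, entry in iter_all_entries(datacatalog, project_id, location): print(group_name, entry.display_name)` would print one line per entry.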

## Lookup Entry

Use `lookup_entry` to fetch the catalog entry for a specific Google Cloud Platform resource by its resource name.

### Ex1: finding a BigQuery table

In [15]:
project_id = 'sarun-project'
location = 'us'
dataset_id = 'dsi_dataset'
table_id = 'taxi_trips'

In [16]:
# Full resource name of the BigQuery table to look up
resource_name = (
    f'//bigquery.googleapis.com/projects/{project_id}'
    f'/datasets/{dataset_id}/tables/{table_id}'
)
table_entry = datacatalog.lookup_entry(request={"linked_resource": resource_name})
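The resource-name pattern used in the cell above matches the `linked_resource` values shown in the search results earlier, for both datasets and tables. As a sketch, the hypothetical helper below builds either form; it is plain string formatting and does not check that the resource exists.

```python
def bigquery_linked_resource(project_id, dataset_id, table_id=None):
    """Build the linked_resource name for a BigQuery dataset or table.

    The pattern is copied from the linked_resource values printed in
    this notebook; pass table_id to get a table name.
    """
    name = f"//bigquery.googleapis.com/projects/{project_id}/datasets/{dataset_id}"
    if table_id is not None:
        name += f"/tables/{table_id}"
    return name


print(bigquery_linked_resource("sarun-project", "dsi_dataset"))
print(bigquery_linked_resource("sarun-project", "dsi_dataset", "taxi_trips"))
```

Either string can then be passed as `linked_resource` in the `lookup_entry` request.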

In [17]:
table_entry

name: "projects/sarun-project/locations/us/entryGroups/@bigquery/entries/cHJvamVjdHMvc2FydW4tcHJvamVjdC9kYXRhc2V0cy9kc2lfZGF0YXNldC90YWJsZXMvdGF4aV90cmlwcw"
type: TABLE
schema {
  columns {
    type: "STRING"
    description: "A code indicating the LPEP provider that provided the record. 1= Creative Mobile Technologies, LLC; 2= VeriFone Inc."
    mode: "REQUIRED"
    column: "vendor_id"
  }
  columns {
    type: "DATETIME"
    description: "The date and time when the meter was engaged"
    mode: "NULLABLE"
    column: "pickup_datetime"
  }
  columns {
    type: "DATETIME"
    description: "The date and time when the meter was disengaged"
    mode: "NULLABLE"
    column: "dropoff_datetime"
  }
  columns {
    type: "STRING"
    description: "This flag indicates whether the trip record was held in vehicle memory before sending to the vendor, aka “store and forward,” because the vehicle did not have a connection to the server. Y= store and forward trip N= not a store