Collect usage history (access logs) data #265

Closed
mabdh opened this issue Aug 23, 2022 · 1 comment · Fixed by #331
Assignees
Labels
access enhancement New feature or request roadmap

Comments

@mabdh
Member

mabdh commented Aug 23, 2022

Summary
As part of #171, one scope of access monitoring is analytics from provider access logs: which resources are actually being queried, and how frequently. We need to figure out how Guardian will use access usage history data for data access monitoring.

Questions this should help answer

  1. How many times a resource is accessed
  2. How many times sensitive resources are getting accessed
  3. Identify what operations are done on a resource (read/write)
  4. Identify users with excess access rights (access that is active but is not used)
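The first three questions reduce to simple aggregations over access-log records. The sketch below is only an illustration under stated assumptions: the `Activity` type, the `SummarizeUsage` and `CountSensitive` functions, and the idea of passing sensitive resources as a set of IDs are all hypothetical and are not part of Guardian.

```go
package main

import "fmt"

// Activity is a hypothetical, minimal access-log record used only in this sketch.
type Activity struct {
	ResourceID string
	AccountID  string
	Action     string // e.g. "read" or "write"
}

// ResourceUsage summarizes how one resource was accessed.
type ResourceUsage struct {
	Total    int            // question 1: how many times the resource was accessed
	ByAction map[string]int // question 3: which operations were performed
}

// SummarizeUsage counts accesses per resource, broken down by action.
func SummarizeUsage(logs []Activity) map[string]*ResourceUsage {
	usage := make(map[string]*ResourceUsage)
	for _, l := range logs {
		u, ok := usage[l.ResourceID]
		if !ok {
			u = &ResourceUsage{ByAction: make(map[string]int)}
			usage[l.ResourceID] = u
		}
		u.Total++
		u.ByAction[l.Action]++
	}
	return usage
}

// CountSensitive answers question 2: it returns the number of accesses
// to resources whose ID is in the (assumed) sensitive set.
func CountSensitive(logs []Activity, sensitive map[string]bool) int {
	n := 0
	for _, l := range logs {
		if sensitive[l.ResourceID] {
			n++
		}
	}
	return n
}

func main() {
	logs := []Activity{
		{ResourceID: "orders", AccountID: "a@example.com", Action: "read"},
		{ResourceID: "orders", AccountID: "b@example.com", Action: "write"},
		{ResourceID: "payroll", AccountID: "a@example.com", Action: "read"},
	}
	usage := SummarizeUsage(logs)
	fmt.Println("orders total:", usage["orders"].Total)
	fmt.Println("orders reads:", usage["orders"].ByAction["read"])
	fmt.Println("sensitive accesses:", CountSensitive(logs, map[string]bool{"payroll": true}))
}
```

Question 4 needs grant data in addition to logs, so it is not covered by these aggregations.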

Proposed solution

  1. Track access logs to determine which user accesses which resource
  2. Check which providers support usage history extraction
    • BigQuery
    • GCS
    • Gcloud IAM
    • Metabase
    • Tableau
    • Grafana
  3. For providers that do not support access log extraction, expose APIs to submit access logs so that their activity can still be captured
  4. Add labels to resources to mark them as sensitive
  5. Define a generic data model

Additional context
This issue can be closed once the following points are clearly answered:

  • Which providers support usage history extraction and which do not
  • A generic data model for how usage history/activity logs will be persisted in Guardian
  • An approach for labeling a resource as sensitive
  • Clear approaches for collecting usage history
    • From providers that support usage history extraction
    • From providers that do not support usage history extraction
      • Exposed API to submit access logs
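For providers without log extraction, the "exposed API" could be as small as one endpoint that accepts a batch of activities. The following is a minimal sketch, assuming a JSON-over-HTTP API; the `/activities` path, the `SubmittedActivity` payload, its JSON field names, and the in-memory `MemoryStore` are all hypothetical and were not decided in this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
	"time"
)

// SubmittedActivity is a hypothetical request payload for this sketch.
type SubmittedActivity struct {
	ProviderID string    `json:"provider_id"`
	ResourceID string    `json:"resource_id"`
	AccountID  string    `json:"account_id"`
	Action     string    `json:"action"`
	Timestamp  time.Time `json:"timestamp"` // RFC 3339, e.g. "2022-09-02T12:00:00Z"
}

// MemoryStore keeps activities in memory; a real store would persist them.
type MemoryStore struct {
	mu    sync.Mutex
	items []SubmittedActivity
}

// Add appends one activity to the store.
func (s *MemoryStore) Add(a SubmittedActivity) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items = append(s.items, a)
}

// Len returns the number of stored activities.
func (s *MemoryStore) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.items)
}

// SubmitHandler accepts a JSON array of activities via POST.
// The whole batch is rejected if any entry lacks a required field.
func SubmitHandler(store *MemoryStore) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		var batch []SubmittedActivity
		if err := json.NewDecoder(r.Body).Decode(&batch); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		for _, a := range batch {
			if a.ProviderID == "" || a.ResourceID == "" || a.AccountID == "" {
				http.Error(w, "provider_id, resource_id and account_id are required", http.StatusBadRequest)
				return
			}
		}
		for _, a := range batch {
			store.Add(a)
		}
		w.WriteHeader(http.StatusAccepted)
	}
}

// post sends body to the handler in-process and returns the HTTP status code.
func post(h http.Handler, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/activities", strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	store := &MemoryStore{}
	h := SubmitHandler(store)
	body := `[{"provider_id":"p1","resource_id":"r1","account_id":"user@example.com","action":"read","timestamp":"2022-09-02T12:00:00Z"}]`
	fmt.Println("status:", post(h, body), "stored:", store.Len())
}
```

Validating the whole batch before storing anything keeps a submission all-or-nothing, which is one possible design choice rather than a requirement stated here.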
@mabdh mabdh added the access label Aug 23, 2022
@mabdh mabdh changed the title feat: collecting usage history (access logs) data Data access monitoring: Collecting usage history (access logs) data Aug 23, 2022
@mabdh mabdh added the roadmap label Aug 23, 2022
@rahmatrhd rahmatrhd added the enhancement New feature or request label Aug 25, 2022
@rahmatrhd
Member

rahmatrhd commented Sep 2, 2022

import "time"

// ProviderActivity is a single activity (access log) entry collected from a provider.
type ProviderActivity struct {
	ID         string
	ProviderID string
	ResourceID string
	AccountID  string
	Timestamp  time.Time
	// Action correlates with grant role/permissions
	Action string // read | write | ...
	// Type is specific to the provider. It defines what kind of activity is being run
	Type     string // query | view | export | etc...
	Metadata map[string]interface{}
}

// example
var bqQueryLog = ProviderActivity{
	ID:         "123",
	ProviderID: "<provider-id>",
	ResourceID: "<resource-id>",
	AccountID:  "user@example.com",
	// Timestamp is a time.Time, so it cannot be assigned a string literal.
	Timestamp: time.Date(2022, time.September, 2, 12, 0, 0, 0, time.UTC),
	Action:    "read",
	Type:      "query",
	Metadata:  map[string]interface{}{ /* ... */ },
}

I think this is how the activity log entity should look, at a minimum.
Note: this is only a first proposal and may be updated in the future.

I will check which Types each provider supports in its activity logs.
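One way this model could answer the "excess access rights" question from the summary is to compare active grants against recent activity. This is a rough sketch, not a proposed implementation: the `Grant` type, the `UnusedGrants` function, and the lookback cutoff are assumptions made for illustration, and `ProviderActivity` is trimmed to the fields the check needs.

```go
package main

import (
	"fmt"
	"time"
)

// ProviderActivity mirrors the struct proposed above, without Type and Metadata.
type ProviderActivity struct {
	ID         string
	ProviderID string
	ResourceID string
	AccountID  string
	Timestamp  time.Time
	Action     string
}

// Grant is a hypothetical stand-in for an active access grant.
type Grant struct {
	AccountID  string
	ResourceID string
}

// UnusedGrants returns the grants that have no recorded activity at or after since.
func UnusedGrants(grants []Grant, activities []ProviderActivity, since time.Time) []Grant {
	used := make(map[Grant]bool)
	for _, a := range activities {
		if !a.Timestamp.Before(since) {
			used[Grant{AccountID: a.AccountID, ResourceID: a.ResourceID}] = true
		}
	}
	var unused []Grant
	for _, g := range grants {
		if !used[g] {
			unused = append(unused, g)
		}
	}
	return unused
}

// sampleData builds a small illustrative data set with made-up values.
func sampleData() ([]Grant, []ProviderActivity, time.Time) {
	since := time.Date(2022, time.August, 1, 0, 0, 0, 0, time.UTC)
	grants := []Grant{
		{AccountID: "a@example.com", ResourceID: "orders"},
		{AccountID: "b@example.com", ResourceID: "orders"},
	}
	activities := []ProviderActivity{
		// Recent read by a@example.com: this grant counts as used.
		{ID: "1", AccountID: "a@example.com", ResourceID: "orders", Action: "read",
			Timestamp: time.Date(2022, time.September, 2, 12, 0, 0, 0, time.UTC)},
		// Activity by b@example.com is older than the cutoff: the grant counts as unused.
		{ID: "2", AccountID: "b@example.com", ResourceID: "orders", Action: "read",
			Timestamp: time.Date(2022, time.January, 15, 9, 0, 0, 0, time.UTC)},
	}
	return grants, activities, since
}

func main() {
	grants, activities, since := sampleData()
	for _, g := range UnusedGrants(grants, activities, since) {
		fmt.Println("unused grant:", g.AccountID, "on", g.ResourceID)
	}
}
```

The sketch matches on account and resource only; a fuller version would probably also compare `Action` against the role or permissions of the grant.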

@ravisuhag ravisuhag changed the title Data access monitoring: Collecting usage history (access logs) data Collect usage history (access logs) data Sep 21, 2022
@ravisuhag ravisuhag linked a pull request Nov 30, 2022 that will close this issue