
Removed all cai fallback. #3116

Merged

hshin-g merged 2 commits into inventory-optimization from remove-cai-fallback on Aug 21, 2019

Conversation

@joecheuk (Contributor) commented Aug 21, 2019

Resolves #3102

@blueandgold (Contributor) left a comment

Sweet!!!!

@hshin-g hshin-g merged commit 5873894 into inventory-optimization Aug 21, 2019
@dekuhn dekuhn added this to the v2.22.0 milestone Sep 16, 2019
hshin-g added a commit that referenced this pull request Oct 10, 2019
* Incremented version to 2.18.0.

* Added missing comma in the iam roles list.

* Speed up import of cloudasset data into temporary sql table.

* Strip unused data before writing to the database
* Switch from SQLAlchemy ORM to Core style bulk CAI inserts
* Ensure the maximum number of rows are written on each bulk insert

This is one of a series of changes to reduce the time to complete
a forseti inventory snapshot for large organizations.

This change reduces the time to import data into a cloudsql database
from a test system by about 50%, from about 312 seconds to about 144
seconds for around 350 megabytes of raw data.
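
A minimal sketch of the ORM-to-Core switch described above, assuming a hypothetical `cai_temp_table` layout and a 1,000-row batch cap; Forseti's actual schema and limits may differ:

```python
from sqlalchemy import Column, MetaData, String, Table, Text, create_engine

engine = create_engine('sqlite:///cai_import.db')  # placeholder URL
metadata = MetaData()
cai_temp_table = Table(
    'cai_temp_table', metadata,
    Column('name', String(2048)),
    Column('data', Text),
)
metadata.create_all(engine)

# Cap rows per executemany() batch instead of adding one ORM object
# at a time; a single multi-row INSERT per batch is far cheaper.
MAX_ROWS_PER_INSERT = 1000

def bulk_insert(rows):
    """Insert a list of row dicts in fixed-size Core batches."""
    with engine.begin() as conn:
        for i in range(0, len(rows), MAX_ROWS_PER_INSERT):
            conn.execute(cai_temp_table.insert(),
                         rows[i:i + MAX_ROWS_PER_INSERT])
```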

* Ignore unactionable ResourceWarning messages in tests.

* Update CAI temporary table read methods to use SQLAlchemy Core.

* Remove per thread sessions from CAI crawler implementation
* Remove ORM overhead when reading assets from CAI temporary table

This is part of a series of changes to reduce the run time of the
forseti inventory crawler for large organizations.
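
The read side of the same idea, again with hypothetical table and column names: reflect the temporary table and stream raw rows through a Core `select`, skipping ORM object construction and per-thread sessions:

```python
from sqlalchemy import MetaData, Table, create_engine, select

engine = create_engine('sqlite:///cai_import.db')  # placeholder URL
# Reflect the already-populated temporary table.
cai_temp_table = Table('cai_temp_table', MetaData(),
                       autoload=True, autoload_with=engine)

def iter_assets():
    """Yield (name, data) tuples without ORM overhead."""
    query = select([cai_temp_table.c.name, cai_temp_table.c.data])
    with engine.connect() as conn:
        for row in conn.execute(query):
            yield row.name, row.data
```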

* Move the CAI temporary table to a local sqlite database.

* This reduces network load by 50% by cutting round trips to the
cloudsql server, removing 700MB of network traffic when 350MB of test
assets are imported.
* Total time to import data into the sqlite database is further reduced
from 150 seconds to 30 seconds for 350MB of test assets.
* Temporary file is cleaned up at the end of inventory, reducing storage
requirements on the server (see the sketch below).
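
A sketch of the local-sqlite pattern under assumed function names, using the forseti-cai file prefix that a later commit in this list cleans up:

```python
import os
import tempfile

from sqlalchemy import create_engine

def create_temp_cai_engine():
    """Create a throwaway sqlite database for the CAI import."""
    fd, path = tempfile.mkstemp(prefix='forseti-cai-', suffix='.db')
    os.close(fd)
    return create_engine('sqlite:///{}'.format(path)), path

engine, db_path = create_temp_cai_engine()
try:
    pass  # ... import CAI data and run the crawler against `engine` ...
finally:
    engine.dispose()
    os.unlink(db_path)  # free server storage once inventory completes
```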

* Further optimize sqlite database interface.

* Set pragmas to speed up writes and reads (sketched below). Can write
350MB in 11 seconds on the test system (down from 330 seconds for the
version 2.18 baseline).

* Optimize per-thread reading: can complete over 100 queries per second
on 10 threads with an 8-CPU virtual machine. Queried 5000 project
cloudsql instances in 45 seconds on the test VM.
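
The commit does not list the exact pragmas, so the set below is an assumption about which sqlite knobs give this kind of write speedup; all are reasonable for a throwaway database that is rebuilt on every run:

```python
import sqlite3

conn = sqlite3.connect('/tmp/forseti-cai-example.db')  # placeholder path
# Skip fsync and keep the rollback journal in memory: durability does
# not matter for an import database that can simply be regenerated.
conn.execute('PRAGMA synchronous = OFF')
conn.execute('PRAGMA journal_mode = MEMORY')
conn.execute('PRAGMA temp_store = MEMORY')
conn.execute('PRAGMA cache_size = -64000')  # ~64 MB page cache
```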

* Fix db_migrator and lint issue.

* Pass thread count consistently across constructors.

* Update dependent google python libraries to latest versions. (#3021)

* Stream CloudAsset data straight to sqlite database.

* Change cloudasset implementation to stream data from GCS to the local
sqlite database using OS pipes instead of downloading the full file to a
temporary location and then reading it from there.

This change reduces storage requirements on Forseti Server and allows
data to be processed only once instead of writing and rereading each row
of data from the CloudAsset export.
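
A hedged sketch of the pipe approach, with placeholder bucket and blob names: the GCS client writes into one end of an os.pipe() from a thread while the importer consumes newline-delimited assets from the other end (error handling omitted):

```python
import os
import threading

from google.cloud import storage

def stream_cai_dump(bucket_name, blob_name, handle_line):
    """Process each line of a CloudAsset dump as it streams from GCS."""
    read_fd, write_fd = os.pipe()

    def _download():
        # Writer side: stream the blob into the pipe, then close it so
        # the reader sees EOF.
        with os.fdopen(write_fd, 'wb') as write_pipe:
            blob = storage.Client().bucket(bucket_name).blob(blob_name)
            blob.download_to_file(write_pipe)

    writer = threading.Thread(target=_download)
    writer.start()
    # Reader side: each row is processed exactly once, straight off the
    # pipe, with nothing written to a temporary file.
    with os.fdopen(read_fd, 'rb') as read_pipe:
        for line in read_pipe:
            handle_line(line)
    writer.join()
```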

* Fix flake.

* Increase max asset name size to 2048 and add additional comments.

* Remove extra imports in test.

* Move methods to read from the inventory table into DataAccess class.

This cleans up the base Storage class to focus on efficient writing of
data into the Inventory table.

* Refactored inventory to remove global write lock on database.

* Use SQLAlchemy Core for writing inventory rows to Cloud SQL.
* Removed all updates to inventory data, each row is only written once.
* Added full_name to the inventory table for each resource.
* Created a memory cache of written resources that detects and skips duplicate full names (see the sketch after this list). This introduces a small global lock around cache updates, but the hold time averages under a millisecond.
* Ensured warning messages written to the inventory index table always contain the full context.
* Fix external project scanner to use new DAO for inventory data.
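
The duplicate-detection cache might look like this sketch; the class and method names are illustrative, not Forseti's actual API:

```python
import threading

class WrittenResourceCache(object):
    """Remembers full names already written so duplicates are skipped."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def mark_written(self, full_name):
        """Return True the first time a full_name is seen, else False."""
        with self._lock:  # tiny critical section, sub-millisecond hold
            if full_name in self._seen:
                return False
            self._seen.add(full_name)
            return True
```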

* Fix flakes.

* Fix docstring.

* Address comments.

* Update Forseti version (#3047)

* + Add _iter_bigquery_tables func
+ Add bigquery.googleapis.com/Table

* + Add mock data

* Revert "Merge branch 'inventory-optimization-3035' into inventory-optimization"

This reverts commit babc2ef, reversing
changes made to 52d0a0e.

* Move inventory warning messages to a child table of inventory_index.

This change allows warnings to be written in parallel from multiple
threads, and removes the need to update the existing inventory_index row
for each new warning.

Warnings are loaded alongside the inventory index through the DAO.

* Add a warnings_count virtual column to InventoryIndex.

The warnings_count field is more efficient when code just needs to check
if warnings exist.

Added additional comments to inventory tables to make the current usage
of various columns more explicit.
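
Taken together, the two commits above suggest models along these lines; the table and column names are hypothetical and the real Forseti schema may differ:

```python
from sqlalchemy import Column, ForeignKey, Integer, Text, func, select
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import column_property, relationship

Base = declarative_base()

class InventoryWarning(Base):
    """Child table: warnings append in parallel, no index row updates."""
    __tablename__ = 'inventory_index_warnings'
    id = Column(Integer, primary_key=True)
    inventory_index_id = Column(Integer, ForeignKey('inventory_index.id'))
    message = Column(Text)

class InventoryIndex(Base):
    __tablename__ = 'inventory_index'
    id = Column(Integer, primary_key=True)
    warnings = relationship('InventoryWarning')
    # Virtual column: callers can test for warnings without loading
    # every warning row.
    warnings_count = column_property(
        select([func.count(InventoryWarning.id)])
        .where(InventoryWarning.inventory_index_id == id)
        .correlate_except(InventoryWarning))
```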

* Add full resource name field to output warning messages.

* Add Bigquery table data (#3057)

* + Add Bigquery table data

* + Update test for bigquery table

* Reformatted code.

* Ensure errors during inventory run are handled correctly.

* Make sure errors are recorded in inventory index table.
* Make sure errors are handled by the service without raising a gRPC
exception.
* Add a unit test to verify functionality.

* Allow inventory module to ingest GCS dump files directly. (#3065)

* Added dump file path variable in config file.

* Removed unnecessary changes.

* Removed unused variables in docstring.

* Addressed PR comments.

* updates

* updates.

* Removed unused import.

* [CAI] Add compute.googleapis.com/RegionDisk from CAI to Forseti Inventory (#3073)

* + Add try except block for CAI export

* pylint fix

* Added locking before incrementing object count. (#3080)

* Updated to use the engine execute method on rollback / commit. (#3105)

* Updated to use the execute method on rollback / commit.

* Updates

* Added additional checks for FAILURE status on commit.

* flake8 format updates.

* Added expunge to inventory index object.

* updates.

* updates.

* Added extra log statement.

* Add Cloud Profiler to Forseti server (#3113)

* Update python base image

* Update python base image

* Explicit package version, pylint fixes

* pylint fixes, nits

* Add cloud profiler to optional packages, add try-except for cloud profiler import

* Remove cloud profiler from requirements.txt

* Removed all cai fallback. (#3116)

* Removed all cai fallback.

* updates.

* Added command to remove tmp files prefixed with forseti-cai. (#3129)

* Added command to remove tmp files prefixed with forseti-cai.

* updates.

* Updates completed_at_datetime with a UTC timestamp. (#3130)

* Commit instead of flush while building the data model.

* Commit once every 500k resources flushed.
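
A sketch of the periodic-commit cadence; the 500k threshold is from the commit message above, while the flush interval and session plumbing are assumptions:

```python
COMMIT_EVERY = 500000  # resources per transaction
FLUSH_EVERY = 1000     # resources pushed to the DB per flush

def add_resources(session, resources):
    """Bound transaction size while building the data model."""
    count = 0
    for resource in resources:
        session.add(resource)
        count += 1
        if count % FLUSH_EVERY == 0:
            session.flush()    # send rows, keep the transaction open
        if count % COMMIT_EVERY == 0:
            session.commit()   # cap transaction and memory growth
    session.commit()
```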

* Fix Flaky Firewall Test (#3118)

* Fix flaky test

* fix replay test

* add comment

* add print

* add print

* add print

* add print

* remove print

(cherry picked from commit cde7b83)

* Fix Flaky Replay Test (#3119)

* test

* test

* fake request.uri

* fix in test

* revert replay.py

* add comment

(cherry picked from commit 8d3d162)

* Updated to commit once every 100k rows flushed. (#3154)

* commit per scanner run.

* Commit per scanner run (#3158)

* Commit per resource type, use add() instead of add_all() when inserting to the session.

* Check for role uniqueness properly.

* Updated to use the .get() method when retrieving values from a dict in _convert_role().

* Added comprehensive debug messages.

* Updated to use resource count instead of flush count when logging.

* Added collation to name column in Role table to make sure the column is case sensitive.

* Updated binding table to reference the role name column with the correct type.
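
In SQLAlchemy the pair of changes might look like this; the column length and the exact MySQL binary collation are assumptions:

```python
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

# A binary collation makes comparisons case sensitive in MySQL.
ROLE_NAME_TYPE = String(128, collation='utf8mb4_bin')

class Role(Base):
    __tablename__ = 'roles'
    name = Column(ROLE_NAME_TYPE, primary_key=True)

class Binding(Base):
    __tablename__ = 'bindings'
    id = Column(Integer, primary_key=True)
    # Same type as Role.name so the foreign-key comparison matches.
    role_name = Column(ROLE_NAME_TYPE, ForeignKey('roles.name'))
```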

* + Add Service Usage API

* pylint fixes

* Added paging for config validator scanner. (#3191)

* Updated config validator scanner to page the input & output.

* Updated max page size to 3.5 MB.

* Updated the mechanism to estimate dictionary size.

* Added debug log statement.

* Updated to audit once every 50 MB.

* Add unittests

* Nit changes
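
One plausible shape for the size-estimated paging; the commits give the 3.5 MB cap but not the estimator, so measuring serialized JSON length here is an assumption:

```python
import json

MAX_PAGE_BYTES = int(3.5 * 1024 * 1024)

def page_by_size(items):
    """Yield lists of items whose serialized size stays under the cap."""
    page, page_bytes = [], 0
    for item in items:
        item_bytes = len(json.dumps(item))  # rough per-item estimate
        if page and page_bytes + item_bytes > MAX_PAGE_BYTES:
            yield page
            page, page_bytes = [], 0
        page.append(item)
        page_bytes += item_bytes
    if page:
        yield page
```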

* Updated scanner_iter method to use query.slice() instead of yield_per and increased the CV audit size to 100 MB per audit. (#3244)

* Use query.slice() instead of yield_per to avoid losing connection to mysql server.

* updates

* Updated to audit every 100 MB.

* Updated block size to 4096.
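
A sketch of the slice()-based iteration: yield_per() keeps one streaming cursor open for the life of the scan, which can outlast the MySQL connection, while slice() issues short LIMIT/OFFSET queries. The model and session names are placeholders:

```python
PAGE_SIZE = 1024

def scanner_iter(session, model):
    """Iterate a large table in fixed-size slices."""
    offset = 0
    while True:
        page = (session.query(model)
                .order_by(model.id)
                .slice(offset, offset + PAGE_SIZE)
                .all())
        if not page:
            break
        for row in page:
            yield row
        offset += PAGE_SIZE
```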

* Merge changes from RC (#3253)

* commit per scanner run.

* Commit per resource type, use add() instead of add_all() when inserting to the session.

* Check for role uniqueness properly.

* Updated to use the .get() method when retrieving values from a dict in _convert_role().

* Added comprehensive debug messages.

* Updated to use resource count instead of flush count when logging.

* Added collation to name column in Role table to make sure the column is case sensitive.

* Updated binding table to reference the role name column with the correct type.

* + Add Service Usage API

* pylint fixes

* Add unittests

* Nit changes

* Updated iter() method in storage.py to use slice() instead of yield_per(). (#3255)

* Updated iter() method in storage.py to use slice() instead of yield_per().

* updates.

* Lowercased generator in docstring.

* Removed page_query usage in storage.py (#3271)

* Update sizes of columns (#3267)

* Update sizes of columns

* Update sizes of columns

* Updated to use yield_per for all the scanners except CV. (#3274)

* Fix bigtable CAI resources (#3297)

* Fix bigtable CAI resources

* lint fixes

* Fix CAI Disk resource (#3298)
@gkowalski-google deleted the remove-cai-fallback branch on February 25, 2020