This repository has been archived by the owner on Jun 5, 2023. It is now read-only.

Update dependent google python libraries to latest versions. #3021

Merged
merged 1 commit into inventory-optimization from requirements-update
Aug 5, 2019

Conversation

ahoying
Collaborator

@ahoying ahoying commented Aug 3, 2019

Thanks for opening a Pull Request!

Here's a handy checklist to ensure your PR goes smoothly.

These guidelines and more can be found in our contributing guidelines.

@ahoying ahoying added this to the v2.20.0 milestone Aug 3, 2019
@ahoying ahoying requested a review from joecheuk August 3, 2019 23:16
@joecheuk joecheuk changed the base branch from dev to inventory-optimization August 5, 2019 16:10
@joecheuk joecheuk merged commit b81c0e5 into inventory-optimization Aug 5, 2019
@joecheuk joecheuk deleted the requirements-update branch August 5, 2019 16:17
hshin-g added a commit that referenced this pull request Aug 8, 2019
* Replace mariadb version in Dockerfile (#2980)

Error from apt:
Package libmariadbclient18 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
  libmariadb3

* Fixing base image

* Normalize network name in FirewallRules when a short network name is supplied (#2979)

* Merge release-2.18.0 into dev (#2990)

* Incremented version to 2.18.0.

* Added missing comma in the iam roles list.

* use WatchedFileHandler to better handle log rotation of forseti.log (#2994)

* use WatchedFileHandler to better handle log rotation of forseti.log

* fix lint issue

* Add ability to specify server endpoint on CLI.

* CSCC API exceptions are now muted (#2987)

* cscc api exceptions are now muted

* not raising exception when finding is not created

* fixed pylint error

* nits

* Silence error if finding already exists

* Removed blank line

* addressed comments

* nit

* Fixed pylint error

* checking for error code instead of error message

* updated logger to log finding

* Add Deprecation Message to Python Installer (#2985)

* Add deprecation message to installer.

* remove import

* test

* re-enable pause

* add url for documentation

* remove the input

* Add Root Resources to the Inventory Summary (#3007)

* Add root resources to the inventory summary

* fix lint

* fix test

* Update export_assets method to support different outputs. (#3013)

* Create helper functions to build correct Output Config message for
each supported destination.
* Update mixin to expect an OutputConfig dict instead of a destination
object.
* Add stub for bigquery destination pending support in v1 API.
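
For context, a minimal sketch of what such per-destination helpers might look like. The field names follow the Cloud Asset v1 REST `OutputConfig` message; the function names and the BigQuery stub layout are illustrative, not Forseti's actual code.

```python
def build_gcs_output_config(gcs_uri):
    """Build an OutputConfig dict for a GCS export destination.

    Args:
        gcs_uri (str): Destination object, e.g. 'gs://my-bucket/assets-dump.json'.

    Returns:
        dict: OutputConfig message for the exportAssets request body.
    """
    return {'gcsDestination': {'uri': gcs_uri}}


def build_bigquery_output_config(dataset, table):
    """Stub for a BigQuery destination, pending support in the v1 API."""
    return {'bigqueryDestination': {'dataset': dataset, 'table': table}}
```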

* Allow users to exclude resources during the inventory phase. (#2997)

* Added resource exclusion in inventory module.

* Updated inventory config to default to empty list when exclude_resources is not specified.

* Added unit tests.

* Updates.

* Addressed PR comments.
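
A minimal sketch of the defaulting behavior described above, assuming the exclusion list lives under an `exclude_resources` key in the inventory section of the server config; the function name is illustrative.

```python
def get_excluded_resources(inventory_config):
    """Return the configured exclusion list, or [] when the key is absent.

    'exclude_resources' is optional, so the crawler behaves as if an empty
    list had been supplied when the operator leaves the key out.
    """
    return inventory_config.get('exclude_resources') or []
```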

* Mute 501 Not Implemented for Listing AppEngine Instances (#3014)

* Mute 501 Not Implemented for Listing AppEngine Instances

* disable lint

* tweak
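
A hedged sketch of muting the 501 response on the App Engine instances listing; `list_instances` stands in for whatever wrapper actually issues the API call, and only the status check reflects the change described above.

```python
from googleapiclient import errors


def list_instances_or_empty(appengine_api, project_id, service_id, version_id):
    """List App Engine instances, treating 501 Not Implemented as 'no data'.

    Some projects return HTTP 501 for this call; there is nothing to
    inventory in that case, so return an empty list instead of failing the
    whole crawl.
    """
    try:
        return appengine_api.list_instances(project_id, service_id, version_id)
    except errors.HttpError as e:
        if e.resp.status == 501:
            return []
        raise
```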

* Fix Missing Group Members in Inventory (#3002)

* Fix missing group members in inventory

* add unit test

* add unit test

* add unit test

* Add Error Handling When Root Resource Is Not Configured Properly (#3008)

* Make it clear that Explain must be disabled when starting the server.

* add error handling

* tweak error message

* tweak

* change check

* Speed up import of cloudasset data into temporary sql table.

* Strip unused data before writing to the database
* Switch from SQLAlchemy ORM to Core style bulk CAI inserts
* Ensure the maximum number of rows are written on each bulk insert

This is one of a series of changes to reduce the time to complete
a forseti inventory snapshot for large organizations.

This change reduces the time to import data into a cloudsql database
from a test system by about 50%, from about 312 seconds to about 144
seconds for around 350 megabytes of raw data.
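
A rough sketch of the ORM-to-Core switch described above, using a stand-in table; the real CAI temporary table has different columns.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, Text

METADATA = MetaData()

# Stand-in for the CAI temporary asset table; column names are illustrative.
CAI_ASSETS = Table(
    'cai_temporary_store', METADATA,
    Column('id', Integer, primary_key=True),
    Column('name', String(2048)),
    Column('data', Text),
)


def bulk_insert_assets(engine, rows, chunk_size=1000):
    """Write rows with one Core INSERT per chunk.

    Compared to adding one ORM object per asset, executemany-style inserts
    avoid identity-map bookkeeping and per-row flush overhead.
    """
    with engine.begin() as conn:
        for start in range(0, len(rows), chunk_size):
            conn.execute(CAI_ASSETS.insert(), rows[start:start + chunk_size])
```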

* Ignore unactionable ResourceWarning messages in tests.

* Update CAI temporary table read methods to use SQLAlchemy Core.

* Remove per thread sessions from CAI crawler implementation
* Remove ORM overhead when reading assets from CAI temporary table

This is part of a series of changes to reduce the run time of the
forseti inventory crawler for large organizations.

* Move the CAI temporary table to a local sqlite database.

* This reduces network load by 50% by reducing round trips to the
cloudsql server, removing 700MB of network traffic when 350MB of test
assets is imported.
* Total time to import data into the sqlite database is further reduced
from 150 seconds to 30 seconds for 350MB of test assets.
* Temporary file is cleaned up at the end of inventory, reducing storage
requirements on server.
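
A minimal sketch of keeping the temporary table in a local sqlite file; the `forseti-cai` prefix matches the temporary files mentioned later in this log, everything else is illustrative.

```python
import os
import tempfile

from sqlalchemy import create_engine


def create_local_cai_engine():
    """Create a throwaway sqlite database for the CAI temporary table.

    Reads and writes stay on local disk instead of making round trips to
    Cloud SQL; the caller removes the file once inventory completes.
    """
    handle, db_path = tempfile.mkstemp(prefix='forseti-cai-', suffix='.db')
    os.close(handle)
    return create_engine('sqlite:///{}'.format(db_path)), db_path


def delete_local_cai_db(db_path):
    """Clean up the temporary sqlite file at the end of the inventory run."""
    if os.path.exists(db_path):
        os.remove(db_path)
```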

* Further optimize sqlite database interface.

* Set pragmas to speed up writes and reads. Can write 350MB in 11
seconds on test system (down from 330 seconds for version 2.18 baseline).

* Optimize per thread reading, can complete over 100 queries per second
on 10 threads with an 8 CPU virtual machine. Queried 5000 project
cloudsql instances in 45 seconds on test VM.
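
The pragmas below show the general idea of trading durability for write speed on a throwaway database; the exact values Forseti settled on are not recorded in this log.

```python
from sqlalchemy import create_engine, event

ENGINE = create_engine('sqlite:////tmp/forseti-cai-example.db')


@event.listens_for(ENGINE, 'connect')
def set_sqlite_pragmas(dbapi_connection, connection_record):
    """Apply write-speed pragmas on every new sqlite connection.

    Durability does not matter here: if the server crashes, the whole
    inventory run is restarted and the file is recreated from scratch.
    """
    cursor = dbapi_connection.cursor()
    cursor.execute('PRAGMA journal_mode = MEMORY')
    cursor.execute('PRAGMA synchronous = OFF')
    cursor.execute('PRAGMA temp_store = MEMORY')
    cursor.close()
```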

* Fix db_migrator and lint issue.

* Pass thread count consistently across constructors.

* Update dependent google python libraries to latest versions. (#3021)

* Stream CloudAsset data straight to sqlite database.

* Change cloudasset implementation to stream data from GCS to the local
sqlite database using OS pipes instead of downloading the full file to a
temporary location and then reading it from there.

This change reduces storage requirements on Forseti Server and allows
data to be processed only once instead of writing and rereading each row
of data from the CloudAsset export.
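
A sketch of the pipe-based streaming, assuming the google-cloud-storage client; the real implementation wires this into the sqlite loader rather than yielding lines.

```python
import os
import threading

from google.cloud import storage


def stream_gcs_lines(bucket_name, object_name):
    """Yield lines of a GCS object without staging it on local disk.

    A background thread downloads the object into the write end of an OS
    pipe while the caller consumes the read end, so each row of the
    CloudAsset dump is read exactly once.
    """
    read_fd, write_fd = os.pipe()
    reader = os.fdopen(read_fd, 'rb')
    writer = os.fdopen(write_fd, 'wb')

    def download():
        try:
            blob = storage.Client().bucket(bucket_name).blob(object_name)
            blob.download_to_file(writer)
        finally:
            writer.close()  # Closing the write end signals EOF to the reader.

    thread = threading.Thread(target=download)
    thread.start()
    try:
        for line in reader:
            yield line
    finally:
        reader.close()
        thread.join()
```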

* Fix flake.

* Increase max asset name size to 2048 and add additional comments.

* Remove extra imports in test.

* Move methods to read from the inventory table into DataAccess class.

This cleans up the base Storage class to focus on efficient writing of
data into the Inventory table.

* Refactored inventory to remove global write lock on database.

* Use SQLAlchemy Core for writing inventory rows to Cloud SQL.
* Removed all updates to inventory data, each row is only written once.
* Added full_name to the inventory table for each resource.
* Created a memory cache for written resources, detects and skips duplicate full names. This introduces a small global lock on updating the cache, but is less than a millisecond on average.
* Ensured warning messages written to the inventory index table always contain the full context.
* Fix external project scanner to use new DAO for inventory data.
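
A minimal sketch of the duplicate-detection cache mentioned above; the critical section is one set lookup plus insert, which is why the remaining global lock stays so cheap.

```python
import threading


class FullNameCache(object):
    """Track which resource full_names have already been written."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def add_if_new(self, full_name):
        """Return True for a new full_name, False for a duplicate."""
        with self._lock:
            if full_name in self._seen:
                return False
            self._seen.add(full_name)
            return True
```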

* Fix flakes.

* Fix docstring.

* Address comments.

* Update Forseti version (#3047)
hshin-g added a commit that referenced this pull request Oct 10, 2019
* Incremented version to 2.18.0.

* Added missing comma in the iam roles list.

* Speed up import of cloudasset data into temporary sql table.

* Strip unused data before writing to the database
* Switch from SQLAlchemy ORM to Core style bulk CAI inserts
* Ensure the maximum number of rows are written on each bulk insert

This is one of a series of changes to reduce the time to complete
a forseti inventory snapshot for large organizations.

This change reduces the time to import data into a cloudsql database
from a test system by about 50%, from about 312 seconds to about 144
seconds for around 350 megabytes of raw data.

* Ignore unactionable ResourceWarning messages in tests.

* Update CAI temporary table read methods to use SQLAlchemy Core.

* Remove per thread sessions from CAI crawler implementation
* Remove ORM overhead when reading assets from CAI temporary table

This is part of a series of changes to reduce the run time of the
forseti inventory crawler for large organizations.

* Move the CAI temporary table to a local sqlite database.

* This reduces network load by 50% by reducing round trips to the
cloudsql server, removing 700MB of network traffic when 350MB of test
assets is imported.
* Total time to import data into the sqlite database is further reduced
from 150 seconds to 30 seconds for 350MB of test assets.
* Temporary file is cleaned up at the end of inventory, reducing storage
requirements on server.

* Further optimize sqlite database interface.

* Set pragmas to speed up writes and reads. Can write 350MB in 11
seconds on test system (down from 330 seconds for version 2.18 baseline).

* Optimize per thread reading, can complete over 100 queries per second
on 10 threads with an 8 CPU virtual machine. Queried 5000 project
cloudsql instances in 45 seconds on test VM.

* Fix db_migrator and lint issue.

* Pass thread count consistently across constructors.

* Update dependent google python libraries to latest versions. (#3021)

* Stream CloudAsset data straight to sqlite database.

* Change cloudasset implementation to stream data from GCS to the local
sqlite database using OS pipes instead of downloading the full file to a
temporary location and then reading it from there.

This change reduces storage requirements on Forseti Server and allows
data to be processed only once instead of writing and rereading each row
of data from the CloudAsset export.

* Fix flake.

* Increase max asset name size to 2048 and add additional comments.

* Remove extra imports in test.

* Move methods to read from the inventory table into DataAccess class.

This cleans up the base Storage class to focus on efficient writing of
data into the Inventory table.

* Refactored inventory to remove global write lock on database.

* Use SQLAlchemy Core for writing inventory rows to Cloud SQL.
* Removed all updates to inventory data, each row is only written once.
* Added full_name to the inventory table for each resource.
* Created a memory cache for written resources, detects and skips duplicate full names. This introduces a small global lock on updating the cache, but is less than a millisecond on average.
* Ensured warning messages written to the inventory index table always contain the full context.
* Fix external project scanner to use new DAO for inventory data.

* Fix flakes.

* Fix docstring.

* Address comments.

* Update Forseti version (#3047)

* + Add _iter_bigquery_tables func
+ Add bigquery.googleapis.com/Table

* + Add mock data

* Revert "Merge branch 'inventory-optimization-3035' into inventory-optimization"

This reverts commit babc2ef, reversing
changes made to 52d0a0e.

* Move inventory warning messages to a child table of inventory_index.

This change allows warnings to be written in parallel from multiple
threads, and removes the need to update the existing inventory_index row
for each new warning.

Warnings are loaded alongside the inventory index through the DAO.

* Add a warnings_count virtual column to InventoryIndex.

The warnings_count field is more efficient when code just needs to check
if warnings exist.

Added additional comments to inventory tables to make the current usage
of various columns more explicit.
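
Taken together, the two changes above amount to a child table plus a computed count on the parent. A SQLAlchemy 1.3-style sketch, with table and column names that are illustrative rather than Forseti's actual schema:

```python
from sqlalchemy import (BigInteger, Column, ForeignKey, Integer, String, Text,
                        func, select)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import column_property

BASE = declarative_base()


class InventoryWarning(BASE):
    """Child table: one row per warning, insertable from any crawler thread."""
    __tablename__ = 'inventory_warnings'

    id = Column(Integer, primary_key=True)
    inventory_index_id = Column(BigInteger, ForeignKey('inventory_index.id'))
    resource_full_name = Column(String(2048))
    warning_message = Column(Text)


class InventoryIndex(BASE):
    """Parent table; the real columns are omitted here."""
    __tablename__ = 'inventory_index'

    id = Column(BigInteger, primary_key=True)

    # Virtual column: a correlated COUNT over the child table, so callers
    # that only need to know whether warnings exist never load the bodies.
    warnings_count = column_property(
        select([func.count(InventoryWarning.id)])
        .where(InventoryWarning.inventory_index_id == id)
        .correlate_except(InventoryWarning)
    )
```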

* Add full resource name field to output warning messages.

* Add Bigquery table data (#3057)

* + Add Bigquery table data

* + Update test for bigquery table

* Reformatted code.

* Ensure errors during inventory run are handled correctly.

* Make sure errors are recorded in inventory index table.
* Make sure errors are handled by the service without raising grpc
exception.
* Add a unit test to verify functionality.
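
A loose sketch of the error handling described above; `run`, `set_complete`, and the status strings are placeholders for whatever the service layer actually uses.

```python
import logging

LOGGER = logging.getLogger(__name__)


def run_inventory(crawler, inventory_index):
    """Run the crawler and record the outcome on the inventory index row.

    Failures are written to the index (and logged) instead of escaping as a
    gRPC exception, so the CLI sees a FAILURE status rather than a broken
    channel.
    """
    try:
        crawler.run(inventory_index)
        inventory_index.set_complete(status='SUCCESS')
    except Exception as e:  # pylint: disable=broad-except
        LOGGER.exception('Inventory run failed: %s', e)
        inventory_index.set_complete(status='FAILURE', message=str(e))
```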

* Allow inventory module to ingest GCS dump files directly. (#3065)

* Added dump file path variable in config file.

* Removed unnecessary changes.

* Removed unused variables in docstring.

* Addressed PR comments.

* updates

* updates.

* Removed unused import.

* [CAI] Add compute.googleapis.com/RegionDisk from CAI to Forseti Inventory (#3073)

* + Add try except block for CAI export

* pylint fix

* Added locking before incrementing object count. (#3080)

* Updated to use the engine execute method on rollback / commit. (#3105)

* Updated to use the execute method on rollback / commit.

* Updates

* Added additional checks for FAILURE status on commit.

* flake8 format updates.

* Added expunge to inventory index object.

* updates.

* updates.

* Added extra log statement.

* Add Cloud Profiler to Forseti server (#3113)

* Update python base image

* Update python base image

* explicit package version, pylint fixes

* pylint fixes, nits

* Add cloud profiler to optional packages, add try-except for cloud profiler import

* Remove cloud profiler from requirements.txt

* Removed all cai fallback. (#3116)

* Removed all cai fallback.

* updates.

* Added command to remove tmp files prefixed with forseti-cai. (#3129)

* Added command to remove tmp files prefixed with forseti-cai.

* updates.
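
The cleanup likely amounts to something like the following; whether Forseti does this in Python or with a shell command is not shown in this log, and the helper name is illustrative, but the `forseti-cai` prefix comes from the temporary files created during the CAI import.

```python
import glob
import os
import tempfile


def purge_cai_temporary_files():
    """Delete leftover forseti-cai* files from the system temp directory."""
    pattern = os.path.join(tempfile.gettempdir(), 'forseti-cai*')
    for path in glob.glob(pattern):
        try:
            os.remove(path)
        except OSError:
            pass  # Already gone, or still in use by another run.
```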

* updates completed_at_datetime with a UTC timestamp. (#3130)

* Commit instead of flush while building the data model.

* Commit once every 500k resources flushed.
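
A sketch of the commit cadence, assuming a plain SQLAlchemy session; the 500k figure comes from the commit message above, everything else is illustrative.

```python
COMMIT_EVERY = 500000  # resources per commit while building the data model


def add_model_rows(session, rows):
    """Add rows to the session, committing periodically rather than only flushing.

    Periodic commits bound the size of the open transaction and let the
    session release the objects it has already written.
    """
    for count, row in enumerate(rows, start=1):
        session.add(row)
        if count % COMMIT_EVERY == 0:
            session.commit()
    session.commit()
```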

* Fix Flaky Firewall Test (#3118)

* Fix flaky test

* fix replay test

* add comment

* add print

* add print

* add print

* add print

* remove print

(cherry picked from commit cde7b83)

* Fix Flaky Replay Test (#3119)

* test

* test

* fake request.uri

* fix in test

* revert replay.py

* add comment

(cherry picked from commit 8d3d162)

* Updated to commit once every 100k rows flushed. (#3154)

* commit per scanner run.

* Commit per scanner run (#3158)

* Commit per resource type, use add() instead of add_all() when inserting into the session.

* Check for role uniqueness properly.

* Updated to use the .get() method when retrieving values in a dict in method _convert_role().

* Added comprehensive debug messages.

* Updated to use resource count instead of flush count when logging.

* Added collation to name column in Role table to make sure the column is case sensitive.

* Updated binding table to reference the role name column with the correct type.
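
A sketch of the case-sensitivity fix, assuming MySQL; `utf8mb4_bin` is one binary (case-sensitive) collation, and the table layout here is illustrative only. The key point is that the foreign-key column must reuse the same type and collation as the column it references.

```python
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base

BASE = declarative_base()

# A binary collation makes names that differ only in case distinct rows.
ROLE_NAME_TYPE = String(256, collation='utf8mb4_bin')


class Role(BASE):
    __tablename__ = 'roles'

    name = Column(ROLE_NAME_TYPE, primary_key=True)
    title = Column(String(256))


class Binding(BASE):
    __tablename__ = 'bindings'

    id = Column(Integer, primary_key=True)
    # The referencing column must match roles.name exactly, collation
    # included, or MySQL will reject the foreign key.
    role_name = Column(ROLE_NAME_TYPE, ForeignKey('roles.name'))
```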

* + Add Service Usage API

* pylint fixes

* Added paging for config validator scanner. (#3191)

* Updated config validator scanner to page the input & output.

* Updated max page size to 3.5 MB.

* Updated the mechanism to estimate dictionary size.

* Added debug log statement.

* Updated to audit once every 50 MB.

* Add unittests

* Nit changes
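
A rough sketch of the paging logic: estimate the size of each serialized item and start a new page before the 3.5 MB limit is crossed, presumably to stay under gRPC's default 4 MB message cap. Names and the estimation method are illustrative.

```python
import json

MAX_PAGE_BYTES = int(3.5 * 1024 * 1024)


def page_by_size(items):
    """Group dicts into pages whose serialized size stays under the limit.

    len(json.dumps(item)) is a cheap estimate that tracks the size of the
    eventual wire message closely enough for paging purposes.
    """
    page, page_bytes = [], 0
    for item in items:
        size = len(json.dumps(item))
        if page and page_bytes + size > MAX_PAGE_BYTES:
            yield page
            page, page_bytes = [], 0
        page.append(item)
        page_bytes += size
    if page:
        yield page
```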

* Updated scanner_iter method to use query.slice() instead of yield_per and increased the CV audit size to 100 MB. (#3244)

* Use query.slice() instead of yield_per to avoid losing connection to mysql server.

* updates

* Updated to audit every 100 MB.

* Updated block size to 4096.
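
The slice-based iteration is roughly the following; the page size is illustrative. Each page is its own short query, so a slow consumer never keeps a streaming result set open long enough to lose the connection to the MySQL server.

```python
PAGE_SIZE = 1024


def iter_in_pages(query):
    """Iterate a large SQLAlchemy Query in fixed-size pages via slice().

    Unlike yield_per(), no server-side result set stays open between pages,
    which avoids dropped connections on long scans.
    """
    offset = 0
    while True:
        page = query.slice(offset, offset + PAGE_SIZE).all()
        if not page:
            break
        for row in page:
            yield row
        offset += PAGE_SIZE
```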

* Merge changes from RC (#3253)

* commit per scanner run.

* Commit per resource type, use add() instead of add_all() when inserting into the session.

* Check for role uniqueness properly.

* Updated to use the .get() method when retrieving values in a dict in method _convert_role().

* Added comprehensive debug messages.

* Updated to use resource count instead of flush count when logging.

* Added collation to name column in Role table to make sure the column is case sensitive.

* Updated binding table to reference the role name column with the correct type.

* + Add Service Usage API

* pylint fixes

* Add unittests

* Nit changes

* Updated iter() method in storage.py to use slice() instead of yield_p… (#3255)

* Updated iter() method in storage.py to use slice() instead of yield_per().

* updates.

* Lowercased generator in docstring.

* Removed page_query usage in storage.py (#3271)

* Update sizes of columns (#3267)

* Update sizes of columns

* Update sizes of columns

* Updated to use yield_per for all the scanners except CV. (#3274)

* Fix bigtable CAI resources (#3297)

* Fix bigtable CAI resources

* lint fixes

* Fix CAI Disk resource (#3298)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.