## Compute Services

### Compute Engine
- Virtual Machines
- Linux/Windows server

### Kubernetes Engine
- Cluster of Servers
- Cluster of Containers

### Cloud Run
- Serverless service for stateless, containerized applications
- It can deploy a service or a job
- A service keeps running and stays available to answer requests, e.g., an API that provides weather data
- A job runs to completion and then exits; it can be run on a schedule, e.g., every night
- Every Cloud Run service is given an HTTPS endpoint on a unique subdomain of the `.run.app` domain
- Fully autoscaled
- Built-in traffic management

Cloud Run is focused on container-based development, allowing you to run applications serving multiple endpoints on a larger scale and with fewer architectural restrictions.
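
As a rough illustration of what a stateless containerized service looks like, the sketch below uses only the Python standard library. It assumes the Cloud Run convention that the container listens on the port given in the `PORT` environment variable (8080 if unset). The names `resolve_port`, `WeatherHandler`, and `serve`, and the response payload, are hypothetical and chosen only for this example.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def resolve_port(env):
    """Return the port to listen on, read from PORT with 8080 as the fallback."""
    return int(env.get("PORT", "8080"))


class WeatherHandler(BaseHTTPRequestHandler):
    """Hypothetical stateless handler: each response depends only on the request."""

    def do_GET(self):
        body = b'{"city": "example", "temp_c": 21}'  # placeholder data, not a real API
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve():
    """Start the HTTP server; intended to be called from the container entrypoint."""
    server = HTTPServer(("0.0.0.0", resolve_port(os.environ)), WeatherHandler)
    server.serve_forever()
```

Because the handler keeps no state between requests, any number of container instances can answer traffic interchangeably, which is what makes autoscaling straightforward.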

### Cloud Functions
- Event driven serverless functions
- It uses triggers for this, for example Pub/Sub. Pub/Sub is a messaging service that lets independently running services exchange information. When a message is published to a Pub/Sub topic, a Cloud Function can run in response.
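
A minimal sketch of a Pub/Sub-triggered function follows. It assumes the background-function style signature `(event, context)`, where the message body arrives base64-encoded under `event["data"]`; other runtimes or generations may deliver the message in a different shape. The function name `handle_pubsub_event` and the sample event are made up for illustration.

```python
import base64


def handle_pubsub_event(event, context=None):
    """Decode the base64 payload of a Pub/Sub-style event and return it as text."""
    raw = event.get("data", "")
    return base64.b64decode(raw).decode("utf-8") if raw else ""


# Simulate the event a trigger would deliver for the message "hello".
sample_event = {"data": base64.b64encode(b"hello").decode("ascii")}
print(handle_pubsub_event(sample_event))  # prints: hello
```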

## Storage Services

### Cloud Storage - Object storage
Serverless and managed service with low cost and high scalability

Use cases
- Cloud native applications
- Analytics and ML
- Backup and archive
- Media

Objects in Cloud Storage 
- Bucket: An abstraction for organizing folders and objects or files. A bucket is a top level point of entry into object storage for the data that we want to organize as a unit. 

Commands
- list files: `gsutil ls` (equivalent: `gcloud storage ls`)
- `gcloud storage` is better optimized for large data transfers.
- copy files: `gsutil cp gs://path/to/file path/to/dir`
- delete file: `gsutil rm gs://path/to/file`

Storage Classes
- Standard
- Nearline
- Coldline
- Archive  
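The storage class of existing objects can be changed automatically by creating a `Rule` under the `Lifecycle` menu. The change is unidirectional: a rule can only move objects to a colder class (for example, Standard to Nearline), never back to a warmer one.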
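
The one-way nature of lifecycle transitions can be sketched as an ordering check. This is only a model of the rule described above, not a call into any Cloud Storage API; `STORAGE_CLASSES` and `can_transition` are names invented for the example.

```python
# Storage classes ordered from warmest (most frequently accessed) to coldest.
STORAGE_CLASSES = ["STANDARD", "NEARLINE", "COLDLINE", "ARCHIVE"]


def can_transition(current, target):
    """Return True only if the target class is strictly colder than the current one."""
    return STORAGE_CLASSES.index(target) > STORAGE_CLASSES.index(current)


print(can_transition("STANDARD", "COLDLINE"))  # True
print(can_transition("ARCHIVE", "NEARLINE"))   # False
```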

Protection menu
- Object Versioning
- Retention Policy


### Persistent Disks - Block storage
Used with VMs and containers

### Database Storage
- Relational databases: Cloud SQL, Cloud Spanner
- NoSQL databases: Cloud Firestore, Bigtable
- Analytical database: BigQuery

**Cloud SQL**
- MySQL
- PostgreSQL
- SQL Server

**Cloud Firestore**: serverless Document database
- Collections - analogous to tables
- Documents - analogous to rows
- Fields - analogous to columns or attributes
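
To make the analogy concrete, the sketch below models a collection as plain nested dictionaries. It does not use the Firestore client library; the `users` collection, its documents, and the helper `get_field` are all hypothetical.

```python
# A hypothetical "users" collection: document IDs map to documents,
# and each document maps field names to values.
users = {
    "alice": {"email": "alice@example.com", "age": 30},
    "bob": {"email": "bob@example.com"},  # documents need not share the same fields
}


def get_field(collection, doc_id, field, default=None):
    """Look up one field of one document, returning a default if either is missing."""
    return collection.get(doc_id, {}).get(field, default)


print(get_field(users, "alice", "age"))  # 30
print(get_field(users, "bob", "age"))    # None
```

Unlike rows in a relational table, documents in the same collection do not have to carry the same set of fields, which is why the lookup needs a default.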

## Data Services
- BigQuery: data warehouse with BI, analytics (ad hoc querying), and ML services
- Dataflow: managed service based on Apache Beam for Batch and Stream processing of data
- Dataproc: managed service, supports both Apache Hadoop and Spark

## Most widely used commands
- `gcloud`: for most Google Cloud services
- `bq`: specific to BigQuery
- `kubectl`: Kubernetes clusters
- `gsutil`: Cloud Storage

## Google Kubernetes Engine
- Managed service to deploy and use Kubernetes clusters
- Lets you orchestrate a large number of containers that work together
- Lets you run stateful applications in containers

Key Concepts
- Container: A lightweight, isolated environment that packages an application with the configuration needed to run it.
- Pod: In distributed applications, containers often work closely together and have very similar lifecycles. Such a set of tightly coupled containers is treated as a single unit, which Kubernetes calls a Pod. A pod contains one or more containers.
- Node: Pods are deployed to a resource known as a Node. Nodes can run on virtual machines or physical servers. A node can have multiple pods running on it.
- Cluster: A set of nodes that run pods is known as a Cluster.

A Kubernetes cluster lets you manage multiple nodes (servers) as a single logical unit. This is advantageous when designing systems that are highly available and highly scalable.

Nodes can be grouped into a Node Pool: a set of similarly configured nodes. Node pools are often used to set up a particular kind of node and then ensure that certain pods run on those nodes, e.g., nodes with GPUs for ML tasks.
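
The relationship between containers, pods, nodes, node pools, and a cluster can be sketched with a few data classes. This is a toy model for intuition only: the `schedule` method below simply picks the least-loaded node in a named pool, whereas the real Kubernetes scheduler weighs resource requests, node selectors, taints, and more. All class and node names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pod:
    name: str
    containers: List[str]  # one or more tightly coupled containers


@dataclass
class Node:
    name: str
    pool: str  # node pool this node belongs to, e.g. "default" or "gpu"
    pods: List[Pod] = field(default_factory=list)


@dataclass
class Cluster:
    nodes: List[Node] = field(default_factory=list)

    def schedule(self, pod, pool):
        """Place the pod on the least-loaded node of the requested pool."""
        candidates = [n for n in self.nodes if n.pool == pool]
        if not candidates:
            raise ValueError(f"no nodes in pool {pool!r}")
        target = min(candidates, key=lambda n: len(n.pods))
        target.pods.append(pod)
        return target


cluster = Cluster(nodes=[Node("cpu-1", "default"), Node("gpu-1", "gpu")])
trainer = Pod("trainer", containers=["train", "metrics-sidecar"])
print(cluster.schedule(trainer, pool="gpu").name)  # gpu-1
```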

### Cluster Modes
- Autopilot: Google Cloud manages the cluster and its size, adding nodes as needed. It does not allocate any infrastructure until it is given a workload.

- Standard: The cluster is managed by the user. Some infrastructure is allocated when the cluster is created. This mode lets you configure node pools.

When we create nodes or node pools, VMs are created on Compute Engine. These VMs are managed by the GKE cluster.

### Monitor Kubernetes cluster states
The Observability menu shows metrics and logs. Log entries are color-coded by severity:
- cyan - info
- yellow - warning
- red - error

## BigQuery
A managed, serverless data warehousing and analytics platform that is constantly being expanded. It now includes support for ML using SQL.

- Serverless data warehouse
    - Petabyte scale
    - Uses SQL but not a relational database
    - Analytical database

- Other features
    - BigQuery ML for machine learning
    - BigQuery BI Engine for high performance ad hoc querying
    - BigQuery GIS for geographic information systems
    - BigQuery Omni, a version of BigQuery that can run in other clouds

**Ways to Query data in BigQuery**
- SQL GUI
- bq command
- Storage API (to pull data into your platform or framework)
    - Spark
    - TensorFlow
    - Dataflow
    - Pandas
    - Scikit-learn

When querying with SQL, BigQuery names tables with the convention `project_id.dataset_name.table_name`: the project ID comes first, then the dataset name, and finally the table name, separated by dots.
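
The naming convention can be sketched as a small helper that assembles the three parts. The function name `qualified_table` is hypothetical. Wrapping the result in backticks is an assumption made here so that names containing hyphens (common in project IDs) stay valid in a query.

```python
def qualified_table(project, dataset, table):
    """Join project, dataset, and table with dots and wrap the name in backticks."""
    return f"`{project}.{dataset}.{table}`"


name = qualified_table("bigquery-public-data", "google_analytics_sample", "ga_sessions_20170801")
print(f"SELECT COUNT(*) FROM {name}")
```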

Create a view over a table in a dataset:
1. Create a dataset by clicking the three stacked dots (the actions menu).
2. Run the following code in the SQL GUI:
```sql
-- The view is created in dataset_name; the source table is fully qualified.
CREATE VIEW dataset_name.view_name AS
SELECT
    column_1, column_2, column_3
FROM
    `project_id.dataset_name.table_name`
```


### BigQuery ML Model Types
- Logistic Regression

- K-means
- PCA

- AutoML Classifier
- Boosted Tree Classifier
- Random Forest Classifier

- ARIMA Plus  

and more ...

**Create and use a ML model**  

- BigQuery has extended SQL to include a new statement called `CREATE MODEL`

```sql
-- Train a logistic regression model; the column aliased as label is the target.
CREATE MODEL gce_bqml.gce_model_1
OPTIONS(model_type='logistic_reg') AS
SELECT
    IF(totals.transactions IS NULL, 0, 1) AS label,  -- 1 if the session had a transaction
    IFNULL(device.operatingSystem, "") AS os,
    device.isMobile AS is_mobile,
    IFNULL(geoNetwork.country, "") AS country,
    IFNULL(totals.pageviews, 0) AS pageviews
FROM
    -- Wildcard table names must be enclosed in backticks.
    `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
    _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'  -- training date range
```

> \* is a wildcard that lets the query read from every table whose name matches the pattern. The `WHERE` clause on `_TABLE_SUFFIX` restricts which of those tables are used.
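
The effect of the wildcard together with the `_TABLE_SUFFIX` filter can be imitated in plain Python. This is only a sketch of the selection logic, not of how BigQuery executes the query; the table list and the function `match_wildcard` are made up. It relies on the fact that `YYYYMMDD` strings sort in date order.

```python
def match_wildcard(tables, prefix, start, end):
    """Return the tables that match prefix* and whose suffix lies in [start, end]."""
    matched = []
    for name in tables:
        if name.startswith(prefix):
            suffix = name[len(prefix):]  # the part the * stands for
            if start <= suffix <= end:   # same inclusive bounds as BETWEEN
                matched.append(name)
    return matched


tables = [
    "ga_sessions_20160731",
    "ga_sessions_20160801",
    "ga_sessions_20170630",
    "ga_sessions_20170701",
]
print(match_wildcard(tables, "ga_sessions_", "20160801", "20170630"))
```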


- BigQuery also provides a library of ML-related functions under the `ML` prefix, such as `ML.EVALUATE`.

```sql
SELECT
    *
FROM
    ML.EVALUATE(MODEL gce_bqml.gce_model_1, (
        -- Evaluation data: same columns as training, from a later date range.
        SELECT
            IF(totals.transactions IS NULL, 0, 1) AS label,
            IFNULL(device.operatingSystem, "") AS os,
            device.isMobile AS is_mobile,
            IFNULL(geoNetwork.country, "") AS country,
            IFNULL(totals.pageviews, 0) AS pageviews
        FROM
            `bigquery-public-data.google_analytics_sample.ga_sessions_*`
        WHERE
            _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
```
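
For a classifier such as logistic regression, evaluation typically reports metrics like precision, recall, accuracy, and F1 score. The sketch below computes these four from true labels and predicted labels in plain Python, to show what the numbers mean. It is not BigQuery's implementation, and the function name `evaluate` and the sample data are invented for the example.

```python
def evaluate(labels, predictions):
    """Compute basic binary classification metrics from 0/1 labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # true negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1_score": f1}


metrics = evaluate([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(metrics)
```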

## IAM
**Identity Types**  
- Google account: A Gmail account associated with Google Cloud
- Google group: Accounts that need the same level of privileges are added to a group, and a role or permissions are granted to the group
- Google Workspace or Cloud Identity: An identity created by an organization using the Cloud Identity service
- Service account: Gives roles or permissions to a service so it can perform certain actions without a username and password

**Access Management**  
- Identities are granted access to resources
- Permissions specify allowed operations
- Permissions are assigned to roles
- Roles are assigned to identities
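
The chain from identities to roles to permissions can be sketched as two lookups. This is a simplified model, not the IAM API: the principal is hypothetical, and the permission set shown for `roles/storage.objectViewer` is only a subset chosen for illustration.

```python
# Roles bundle permissions (illustrative subset only).
ROLE_PERMISSIONS = {
    "roles/storage.objectViewer": {"storage.objects.get", "storage.objects.list"},
}

# Identities are bound to roles (hypothetical principal).
BINDINGS = {
    "user:analyst@example.com": {"roles/storage.objectViewer"},
}


def is_allowed(principal, permission):
    """Return True if any role bound to the principal contains the permission."""
    roles = BINDINGS.get(principal, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


print(is_allowed("user:analyst@example.com", "storage.objects.get"))     # True
print(is_allowed("user:analyst@example.com", "storage.objects.delete"))  # False
```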

**3 Kinds of Roles**  
- Predefined
- Custom
- Basic

- Principal: A term for an identity, such as a person or a service account, that can perform certain actions. In Google Cloud, email addresses (like 837782-compute@developer.gserviceaccount.com) are used to identify principals.

- Naming convention for permissions: `service_name.resource_name.operation_name`
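
The naming convention can be illustrated by splitting a permission string into its three parts. The helper `parse_permission` and the `Permission` tuple are hypothetical names used only for this sketch.

```python
from typing import NamedTuple


class Permission(NamedTuple):
    service: str
    resource: str
    operation: str


def parse_permission(name):
    """Split a permission of the form service.resource.operation into its parts."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected service.resource.operation, got {name!r}")
    return Permission(*parts)


print(parse_permission("compute.instances.start"))
```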

## Networking
**VPC (Virtual Private Cloud)**  
A partitioned-off section of the Google Cloud infrastructure that is managed by users. Networking in Google Cloud is organized in terms of virtual private clouds.

- Flow logs: Networking logs that help with debugging networking issues. They can grow large quickly and incur significant storage costs.

Terms
- Network
- Subnet
- CIDR (Classless Inter-Domain Routing): Notation that specifies the range of internal addresses that are private to a subnet.
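
A CIDR block can be explored with Python's standard `ipaddress` module. The range `10.128.0.0/20` below is just an example value and is not taken from any particular VPC.

```python
import ipaddress

# A /20 block leaves 32 - 20 = 12 host bits, so it spans 2**12 addresses.
subnet = ipaddress.ip_network("10.128.0.0/20")

print(subnet.num_addresses)                             # 4096
print(ipaddress.ip_address("10.128.3.7") in subnet)     # True
print(ipaddress.ip_address("10.128.16.1") in subnet)    # False
print(subnet.is_private)                                # True
```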

## Artifact and Container Registry
- A single registry for storing and managing packages and container images
    - Integrates Artifact Registry with Google Cloud CI/CD services
    - Deploy artifacts to Google Cloud compute resources

- Protect software supply chain
    - Scan for container vulnerabilities
    - Enforce deployment policies with Binary Authorization

- Create multiple regional repositories within a project

> A registry is a location to store and manage packages, images and libraries.

Artifact Registry integrates the components it stores with Google Cloud's continuous integration and continuous deployment services, such as Cloud Build. It provides a central place to perform operations on container images, such as scanning for vulnerabilities or defining policies for additional security controls.
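
Container images stored in a regional Docker repository are addressed by a path of the form `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG`. The sketch below assembles such a path; the helper name `image_path` and all argument values are hypothetical.

```python
def image_path(location, project, repository, image, tag="latest"):
    """Build the address of a container image in a regional Docker repository."""
    return f"{location}-docker.pkg.dev/{project}/{repository}/{image}:{tag}"


print(image_path("us-central1", "my-project", "my-repo", "weather-api", "v1"))
```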

## Cloud Build Repositories
- Create connections to source code repositories
    - GitHub
    - GitHub Enterprise
    - GitLab Enterprise Edition
    - Bitbucket Server
    - Bitbucket Data Center

- Repository generations
    - 1st generation - manual, Pub/Sub, and webhook triggers
    - 2nd generation - Terraform integration

> Artifact Registry is designed to manage the objects or components of an application. Cloud Build is then used to assemble those components and build deployable images that can run on Cloud Run or Kubernetes Engine.