# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.



# Get statistics on number of observations per asset and per snapshot




This notebook is designed to analyze asset and observation data from Imagery Insights. It performs the following key tasks:

1.  **Data Retrieval**: Connects to a specified BigQuery project and dataset to query the `all_assets` and `all_observations` tables.
2.  **SQL Query Construction**: Formulates a SQL query to join the asset and observation data, count unique observations per asset and snapshot, and create geographical points from location coordinates.
3.  **Data Processing**: Reads the query results into a pandas DataFrame for in-memory analysis.
4.  **Statistical Calculation**: Computes overall statistics, including the total number of unique assets, observations, and snapshots. It also determines the distribution of observations per asset.
5.  **Data Visualization**: Generates a pie chart to visually represent the distribution of assets based on their number of observations.
6.  **Tabular Output**: Formats and displays key statistics and data samples in a clear, tabular format.

In [None]:
import pandas_gbq

# Define project ID and dataset ID
project_id = "" # @param {type:"string"}
dataset_id = "" # @param {type:"string"}
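Both parameters default to empty strings and are interpolated directly into the SQL text. An empty or malformed value therefore only shows up later as a BigQuery error. The sketch below is one way to fail early. `table_ref` is a hypothetical helper that is not part of `pandas_gbq` or the original notebook. Its allowed character set is a simplifying assumption, not BigQuery's exact naming rules.

```python
import re

# Hypothetical helper: a simplified allow-list (letters, digits, _ . : -), not BigQuery's full rules.
_SAFE_ID = re.compile(r"^[A-Za-z0-9_.:-]+$")


def table_ref(project_id: str, dataset_id: str, table: str) -> str:
    """Return a backtick-quoted `project`.`dataset`.`table` reference, or raise ValueError."""
    for name, value in (("project_id", project_id), ("dataset_id", dataset_id), ("table", table)):
        if not value:
            raise ValueError(f"{name} must not be empty")
        if not _SAFE_ID.match(value):
            raise ValueError(f"{name} contains unexpected characters: {value!r}")
    return f"`{project_id}`.`{dataset_id}`.`{table}`"


print(table_ref("my-project", "my_dataset", "all_assets"))  # `my-project`.`my_dataset`.`all_assets`
```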

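### Construct the SQL query with variables

This query joins the `all_assets` and `all_observations` tables on `asset_id` and `snapshot_id`. It counts the unique observations for each asset and snapshot and creates a `GEOGRAPHY` point (`ST_GEOGPOINT`) from the location's longitude and latitude. Because it is an inner join, asset-snapshot pairs with no observations do not appear in the results.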

In [None]:

sql_query = f"""

SELECT
  t1.asset_id,
  t1.snapshot_id,
  t1.location,
  ST_GEOGPOINT(t1.location.longitude, t1.location.latitude) AS latlong,
  COUNT(DISTINCT t2.observation_id) AS count_of_unique_observations
FROM
  `{project_id}`.`{dataset_id}`.`all_assets` AS t1
INNER JOIN
  `{project_id}`.`{dataset_id}`.`all_observations` AS t2
ON
  t1.asset_id = t2.asset_id
  AND t1.snapshot_id = t2.snapshot_id
GROUP BY
  t1.asset_id,
  t1.snapshot_id,
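  t1.location;
"""

# Run the query and load the results into a pandas DataFrame
df = pandas_gbq.read_gbq(sql_query, project_id=project_id)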
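The following is a rough pandas equivalent of the join and aggregation, run on a tiny made-up dataset to make the logic concrete. It is an illustration only and assumes both tables carry `asset_id` and `snapshot_id` columns, as the query implies. It leaves out `location` and `ST_GEOGPOINT`, and the table contents are invented.

```python
import pandas as pd

# Made-up stand-ins for the all_assets and all_observations tables.
assets = pd.DataFrame({
    "asset_id": ["a1", "a1", "a2", "a3"],
    "snapshot_id": ["s1", "s2", "s1", "s1"],
})
observations = pd.DataFrame({
    "asset_id": ["a1", "a1", "a1", "a2", "a2"],
    "snapshot_id": ["s1", "s1", "s2", "s1", "s1"],
    "observation_id": ["o1", "o2", "o3", "o4", "o4"],  # o4 is duplicated on purpose
})

# INNER JOIN ... ON asset_id AND snapshot_id: a3 has no observations, so it drops out.
joined = assets.merge(observations, on=["asset_id", "snapshot_id"], how="inner")

# GROUP BY asset_id, snapshot_id with COUNT(DISTINCT observation_id).
counts = (
    joined.groupby(["asset_id", "snapshot_id"])["observation_id"]
    .nunique()
    .reset_index(name="count_of_unique_observations")
)
print(counts)
```

The duplicated `o4` row is counted once, matching `COUNT(DISTINCT ...)` rather than `COUNT(*)`.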



## Print overall statistics of the dataset

This cell prints summary statistics for the dataset, such as the number of unique assets and snapshots, in a human-readable format. It then prints the distribution of assets by their number of observations.


In [None]:
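# Summary statistics; each row of df is one asset-snapshot pair
total_assets = df['asset_id'].nunique()
total_snapshots = df['snapshot_id'].nunique()
total_observations = int(df['count_of_unique_observations'].sum())  # assumes no observation spans two pairs
observations_per_asset = df['count_of_unique_observations'].value_counts().sort_index()

print("--- Overall Statistics ---")
print(f"Total unique assets: {total_assets:,}")
print(f"Total unique observations: {total_observations:,}")
print(f"Total unique snapshots: {total_snapshots:,}")
print("--- Distribution of Assets by Number of Observations ---")
for num_observations, count in observations_per_asset.items():
    print(f"  {count:,} assets have {num_observations:,} observation(s)")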
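Each row of the query result is an asset-snapshot pair. Counting rows by `count_of_unique_observations` therefore counts an asset once for every snapshot it appears in. For a per-asset view, one option is to total each asset's observations across snapshots first. The sketch below contrasts the two approaches on made-up data. It assumes an observation ID belongs to only one snapshot, so summing does not double count.

```python
import pandas as pd

# Made-up query output: asset a1 appears in two snapshots.
df_demo = pd.DataFrame({
    "asset_id": ["a1", "a1", "a2"],
    "snapshot_id": ["s1", "s2", "s1"],
    "count_of_unique_observations": [2, 1, 1],
})

# Distribution over asset-snapshot rows: a1 contributes two rows.
per_row = df_demo["count_of_unique_observations"].value_counts().sort_index()

# Distribution over assets: total each asset's observations across snapshots first
# (assumes an observation_id belongs to a single snapshot).
per_asset = (
    df_demo.groupby("asset_id")["count_of_unique_observations"]
    .sum()
    .value_counts()
    .sort_index()
)

print("Per asset-snapshot row:", per_row.to_dict())
print("Per asset:", per_asset.to_dict())
```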


### Pie chart visualization

This cell uses matplotlib to draw a pie chart from the `observations_per_asset` counts computed in the statistics cell. Each slice shows the share of assets that have a given number of observations.

In [None]:


import matplotlib.pyplot as plt

# Prepare data for the pie chart
pie_labels = observations_per_asset.index.astype(str) + ' observation(s)'
pie_sizes = observations_per_asset.values

# Create a pie chart
plt.figure(figsize=(10, 8))
plt.pie(pie_sizes, labels=pie_labels, autopct='%1.1f%%', startangle=140)
plt.title('Distribution of Assets by Number of Observations')
plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.show()
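The percentages drawn on the slices come from simple arithmetic. matplotlib divides each size by the total, multiplies by 100, and formats the result with the `autopct` string. This sketch reproduces that calculation on a made-up distribution, which can help when checking the chart against the printed counts.

```python
import pandas as pd

# Made-up distribution: number of observations -> number of assets.
observations_per_asset_demo = pd.Series({1: 6, 2: 3, 3: 1})

# Each slice's share is its count divided by the total, times 100.
shares = observations_per_asset_demo / observations_per_asset_demo.sum() * 100

# '%1.1f%%' is the printf-style string passed as autopct: one decimal place plus a percent sign.
labels = ['%1.1f%%' % pct for pct in shares]
print(labels)  # ['60.0%', '30.0%', '10.0%']
```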

## Create a new column combining asset_id and snapshot_id

This cell adds an `asset_snapshot_id` column that joins `asset_id` and `snapshot_id` with a colon. It then displays a table of these combined IDs alongside their count of unique observations, giving a row-by-row view of observation counts per asset and snapshot.


In [None]:

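# Combine the IDs as "asset_id:snapshot_id"; astype(str) guards against non-string ID columns
df['asset_snapshot_id'] = df['asset_id'].astype(str) + ':' + df['snapshot_id'].astype(str)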

# Select and rename the relevant columns for the desired output
output_df = df[['asset_snapshot_id', 'count_of_unique_observations']].copy()
output_df.rename(columns={'count_of_unique_observations': 'number of observations'}, inplace=True)

# Display the table; pandas may truncate long tables in the notebook output
display(output_df)
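The following sketch builds the combined key and splits it back apart, using made-up IDs. It applies `astype(str)` so the concatenation also works if either ID column is not stored as text, which is an assumption about the data rather than something the query guarantees. Splitting on the first colon only recovers the original IDs if `asset_id` never contains a colon.

```python
import pandas as pd

# Made-up IDs; snapshot_id is an integer here to show why astype(str) helps.
demo = pd.DataFrame({"asset_id": ["a1", "a2"], "snapshot_id": [101, 102]})

# Without astype(str), adding a str Series to an int Series raises a TypeError.
demo["asset_snapshot_id"] = demo["asset_id"].astype(str) + ":" + demo["snapshot_id"].astype(str)

# Split back on the first colon only (n=1); assumes asset_id contains no ':'.
parts = demo["asset_snapshot_id"].str.split(":", n=1, expand=True)
parts.columns = ["asset_id", "snapshot_id"]

print(demo["asset_snapshot_id"].tolist())  # ['a1:101', 'a2:102']
```

Note that the split returns strings, so the snapshot IDs come back as `'101'` and `'102'` rather than integers.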

In [None]:
# dataframe: output_df
# output_variable:

import google.colabsqlviz.explore_dataframe as _vizcell
_vizcell.explore_dataframe(output_df)