In [1]:
import sys
sys.path.append('../src')

import os
import pickle
import numpy as np
from importlib import reload
from glob import glob
from joblib import Parallel, delayed
from IPython.display import Markdown

from geoq import gemini




## Embeddings of a random chip

In [2]:
descriptions_dir = '/datasets/genai-geo-embeddings/chips'
files = glob(f'{descriptions_dir}/*.pkl')
len(files)

48313

In [3]:
file = np.random.choice(files)
with open(file, 'rb') as f:
    z = pickle.load(f)

Markdown(z['description'])

Here is a detailed textual description of the satellite image.

### Dominant Land Cover
The image is dominated by a mixture of agricultural and barren land.
*   **Agricultural Land:** Covers approximately 60% of the image. It is characterized by a patchwork of rectangular fields, most of which are a reddish-brown or dark brown color, suggesting fallow or dry soil. A significant portion of these fields, particularly in the eastern half, are stark white.
*   **Barren/Desert Land:** Constitutes about 35% of the view. This land is light tan or beige in color with a soft texture, appearing as sand or dry, eroded soil. It is most prominent in a large, irregular swath on the right side of the image and in patches encroaching on the agricultural areas.
*   **Water Bodies / Salt Flats:** Make up the remaining 5%, visible as small, scattered features.

### Terrain
The terrain is predominantly a flat plain, characteristic of a river valley or an alluvial fan in an arid region. There are no significant elevations like mountains or hills. The main topographical variation comes from:
*   **Type:** Low-lying sandy areas, possibly small dunes or aeolian deposits.
*   **Location:** These features are most concentrated in a wide, irregular band running from the bottom-center towards the upper-right. They also appear as smaller incursions into the agricultural land from the edges.
*   **Shape and Attributes:** The sandy areas have soft, rounded, and sometimes lobe-like shapes that contrast sharply with the geometric agricultural grid. Their texture appears fine-grained and wind-swept.

### Vegetation
Visible vegetation is almost exclusively agricultural, with very little natural vegetation apparent.
*   **Type:** Cultivated fields.
*   **Location:** The fields are organized in a grid-like pattern across most of the image, except for the large barren swath on the right.
*   **Extent:** The patchwork is dense, with fields of various rectangular sizes.
*   **Health and Appearance:** The majority of the fields are reddish-brown, indicating they are likely fallow, recently tilled, or contain dry vegetation. There is no visible lush green vegetation.
*   **Patterns:** The most distinctive pattern is the presence of numerous bright white rectangular plots. These are likely plastic-covered greenhouses or fields with heavy salt encrustation (salinization).

### Water Bodies
Water features are small and sparse.
*   **Type:** Small ponds or salt flats.
*   **Location:** The most distinct feature is located in the upper-central part of the image. It has a small area of turquoise-green water surrounded by a larger, bright white crust, indicative of a salt pan or evaporation pond. Other smaller, irregularly shaped white patches scattered across the landscape may be ephemeral water bodies or salt deposits.
*   **Type:** A narrow canal or drainage channel.
*   **Location:** A thin, dark, meandering line can be seen winding from the central area towards the upper right, crossing through both agricultural and barren lands.

### Man-Made Structures
Human activity is extensive and clearly defines the landscape.
*   **Type:** Agricultural infrastructure (fields, greenhouses/salt pans, canals).
*   **Location:** The rectangular fields form a widespread grid. The bright white rectangular plots (likely greenhouses) are particularly concentrated in the eastern half of the image.
*   **Arrangement:** The fields create a distinct rectilinear pattern.
*   **Type:** Transportation corridor and settlement.
*   **Location:** A prominent, dense, linear feature runs vertically through the center of the image. This appears to be a primary road or canal lined with a dense concentration of small buildings, greenhouses, or other structures. A small, more clustered settlement is visible on the far-left edge of the frame. A much thinner road or track cuts diagonally across the upper-right quadrant.

### Geological Features
*   **Type:** Aeolian (wind-blown) deposits.
*   **Location:** These sandy deposits form the large, light-tan areas, most notably on the right side of the image.
*   **Shape, Color, and Texture:** They are light tan with a soft, mounded texture, appearing to encroach upon the darker, developed agricultural land.
*   **Type:** Soil Salinization.
*   **Location:** Evidenced by the bright white crusts around the water body in the upper-center and potentially on the surface of many rectangular fields. This is a common geological process in arid, irrigated regions.

### Other Distinctive Features
The image starkly illustrates the interaction and conflict between human land use and a natural arid environment. There is a clear demarcation between the intensively cultivated and irrigated areas (brown and white plots) and the encroaching tan-colored desert. The central, heavily developed vertical corridor acts as a spine of human activity within this landscape.

---
### Coverage estimation
```json
{
  "Agricultural Land (brown soil)": "45%",
  "Barren/Sandy Land": "35%",
  "Agricultural Land (white plots/greenhouses)": "15%",
  "Water Bodies/Salt Flats": "5%"
}
```
### Geographical location
```json
{
    "plus_code": "6GQX+P2",
    "political": "Xinhe County, Xinjiang",
    "locality": "Aksu Prefecture",
    "administrative_area_level_1": "Xinjiang",
    "country": "China",
    "coords": {
        "lon": "82.5475",
        "lat": "41.2393"
    }
}
```
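The description ends with two fenced JSON blocks (coverage and location), so they can be pulled out of the markdown text and parsed. The following is a minimal sketch, not part of `geoq`: `extract_json_blocks` and `coverage_to_fractions` are hypothetical helper names, and the sample string is a shortened stand-in for `z['description']`.

```python
import json
import re

# Build the fence marker programmatically so this example can itself sit inside a fence.
FENCE = "`" * 3

# Matches blocks opened with the fence followed by "json"; DOTALL lets '.' span newlines.
JSON_FENCE = re.compile(re.escape(FENCE) + r"json\s*(.*?)" + re.escape(FENCE), re.DOTALL)


def extract_json_blocks(markdown_text):
    """Return every fenced JSON block in the text as a parsed Python object."""
    blocks = []
    for match in JSON_FENCE.finditer(markdown_text):
        try:
            blocks.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            # Model output is not guaranteed to be valid JSON; skip unparsable blocks.
            continue
    return blocks


def coverage_to_fractions(coverage):
    """Convert {"class": "45%"} into {"class": 0.45}."""
    return {name: float(value.rstrip("%")) / 100.0 for name, value in coverage.items()}


# Shortened stand-in for z['description'] (assumption: same layout as the output above).
sample = "\n".join([
    "### Coverage estimation",
    FENCE + "json",
    '{"Agricultural Land (brown soil)": "45%", "Barren/Sandy Land": "35%"}',
    FENCE,
    "### Geographical location",
    FENCE + "json",
    '{"country": "China", "coords": {"lon": "82.5475", "lat": "41.2393"}}',
    FENCE,
])

coverage, location = extract_json_blocks(sample)
fractions = coverage_to_fractions(coverage)
lon = float(location["coords"]["lon"])
lat = float(location["coords"]["lat"])
```

Coordinates are stored as strings in the description, so they need an explicit `float` conversion before any numeric use.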
    

In [4]:
z['text_embedding_model'], z['description_model']

('gemini-embedding-001', 'gemini-2.5-pro')

In [5]:
gem = gemini.GeminiMultimodalModel(api_key='../../secrets/gemini.txt', verbose=True, generation_model_name='gemini-2.5-flash-lite')
emb = gem.get_embedding(z['description'])

2025-08-10 22:46:05.389 | INFO     | geoq.gemini:__init__:115 - using generation model gemini-2.5-flash-lite
2025-08-10 22:46:05.390 | INFO     | geoq.gemini:__init__:116 - using embeddings model gemini-embedding-001
2025-08-10 22:46:05.391 | INFO     | geoq.gemini:__init__:117 - using config {'temperature': 1, 'top_p': 0.95, 'max_output_tokens': 8192, 'response_mime_type': 'text/plain'}


In [6]:
emb, z['text_embedding']

(array([ 0.00656826,  0.01263861,  0.00414742, ..., -0.0055393 ,
        -0.01376249, -0.01900553], shape=(3072,)),
 array([ 0.00656826,  0.01263861,  0.00414742, ..., -0.0055393 ,
        -0.01376249, -0.01900553], shape=(3072,)))
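The printed arrays show only their first and last three values, so eyeballing them does not confirm that all 3072 components agree. A numeric check is more reliable. The sketch below uses synthetic vectors in place of `emb` and `z['text_embedding']`; `cosine_similarity` is a hypothetical helper, and the tolerance is an assumption that may need loosening if the API is not bit-for-bit deterministic.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D vectors."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-ins for the stored and recomputed embeddings (assumption: 3072 dims).
rng = np.random.default_rng(0)
stored = rng.normal(size=3072)
recomputed = stored.copy()
unrelated = rng.normal(size=3072)

# Element-wise agreement within a small absolute tolerance.
same = bool(np.allclose(stored, recomputed, atol=1e-6))

# Cosine similarity is close to 1 for matching vectors and near 0 for unrelated ones.
sim_same = cosine_similarity(stored, recomputed)
sim_other = cosine_similarity(stored, unrelated)
```

In the notebook, the equivalent check would be `np.allclose(emb, z['text_embedding'])`.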

In [7]:
z.keys()

dict_keys(['chipset_id', 'chip_id', 'season', 'description', 'img', 'geometry', 'image_embedding', 'text_embedding', 'lonlat', 'description_model', 'text_embedding_model'])

## Get the embeddings of a single description

In [8]:
gem = gemini.GeminiMultimodalModel(api_key='../../secrets/gemini.txt', verbose=True, generation_model_name='gemini-2.5-flash-lite')

2025-08-10 22:46:12.295 | INFO     | geoq.gemini:__init__:115 - using generation model gemini-2.5-flash-lite
2025-08-10 22:46:12.296 | INFO     | geoq.gemini:__init__:116 - using embeddings model gemini-embedding-001
2025-08-10 22:46:12.297 | INFO     | geoq.gemini:__init__:117 - using config {'temperature': 1, 'top_p': 0.95, 'max_output_tokens': 8192, 'response_mime_type': 'text/plain'}


In [36]:
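def f(file):
    """Recompute the text embedding of one chip and rewrite its pickle in place."""
    # Use `fh` for the file handle so it does not shadow the function name `f`.
    with open(file, 'rb') as fh:
        z = pickle.load(fh)

    # The client is created inside the function, so every joblib task builds its own.
    gem = gemini.GeminiMultimodalModel(api_key='../../secrets/gemini.txt', verbose=False, generation_model_name='gemini-2.5-flash-lite')
    z['text_embedding'] = gem.get_embedding(z['description'])
    z['text_embedding_model'] = gem.embeddings_model_name

    with open(file, 'wb') as fh:
        pickle.dump(z, fh)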
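The function above overwrites each pickle in place, so a worker killed in the middle of the write could leave a truncated file. One common safeguard is to write to a temporary file in the same directory and then swap it in with `os.replace`. This is a minimal sketch and not something the notebook does; `atomic_pickle_dump` is a hypothetical helper name.

```python
import os
import pickle
import tempfile


def atomic_pickle_dump(obj, path):
    """Pickle `obj` to `path` via a temporary file so readers never see a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the target directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(obj, tmp)
        # Replaces the destination in a single step if it already exists.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file on any failure, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Demonstration on a throwaway directory with a toy record.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "chip.pkl")
    atomic_pickle_dump({"text_embedding_model": "old"}, target)
    atomic_pickle_dump({"text_embedding_model": "gemini-embedding-001"}, target)
    with open(target, "rb") as fh:
        restored = pickle.load(fh)
    leftovers = [name for name in os.listdir(workdir) if name.endswith(".tmp")]
```

Because the temporary files end in `.tmp`, a `*.pkl` glob like the one used earlier would not pick them up if a run is interrupted.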

In [37]:
_ = Parallel(n_jobs=100, verbose=5)(delayed(f)(file) for file in files)

[Parallel(n_jobs=100)]: Using backend LokyBackend with 100 concurrent workers.
[Parallel(n_jobs=100)]: Done  88 tasks      | elapsed:    1.6s
[Parallel(n_jobs=100)]: Done 250 tasks      | elapsed:    5.7s
[Parallel(n_jobs=100)]: Done 448 tasks      | elapsed:   10.7s
[Parallel(n_jobs=100)]: Done 682 tasks      | elapsed:   16.8s
[Parallel(n_jobs=100)]: Done 952 tasks      | elapsed:   23.6s
[Parallel(n_jobs=100)]: Done 1258 tasks      | elapsed:   31.5s
[Parallel(n_jobs=100)]: Done 1600 tasks      | elapsed:  1.3min
[Parallel(n_jobs=100)]: Done 1978 tasks      | elapsed:  1.5min
[Parallel(n_jobs=100)]: Done 2392 tasks      | elapsed:  1.7min
[Parallel(n_jobs=100)]: Done 2842 tasks      | elapsed:  2.5min
[Parallel(n_jobs=100)]: Done 3328 tasks      | elapsed:  2.7min
[Parallel(n_jobs=100)]: Done 3850 tasks      | elapsed:  3.4min
[Parallel(n_jobs=100)]: Done 4408 tasks      | elapsed:  3.6min
[Parallel(n_jobs=100)]: Done 5002 tasks      | elapsed:  4.5min
[Parallel(n_jobs=100)]: Done 5
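A pass over tens of thousands of files can be interrupted, and re-running it from scratch repeats API calls for chips that are already done. Since each record stores `text_embedding_model`, a rerun could skip the records already embedded with the target model. The sketch below is an assumption about how that check might look; `needs_embedding` is a hypothetical helper and the records are toy dictionaries.

```python
def needs_embedding(record, model_name):
    """True when the record has no embedding produced by `model_name`."""
    return (
        record.get("text_embedding") is None
        or record.get("text_embedding_model") != model_name
    )


# Toy records that mimic the relevant keys of a chip pickle.
target_model = "gemini-embedding-001"
records = [
    {"chip_id": "a", "text_embedding": [0.1, 0.2], "text_embedding_model": target_model},
    {"chip_id": "b", "text_embedding": [0.3, 0.4], "text_embedding_model": "some-older-model"},
    {"chip_id": "c", "description": "no embedding yet"},
]

# Only chips "b" and "c" would be sent to the embedding API on a rerun.
pending = [r["chip_id"] for r in records if needs_embedding(r, target_model)]
```

The trade-off is that every pickle still has to be opened to run the check, which costs disk reads but no API calls.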

In [20]:
ze

array([-0.01055726,  0.01688335, -0.00482449, ...,  0.00298951,
       -0.01554386, -0.00740291], shape=(3072,))