# Wikidata Item Creation Example (GKC)

This notebook demonstrates the end-to-end item creation workflow using GKC:

1. Define a mapping configuration
2. Define source data
3. Transform records to Wikidata JSON
4. Optionally validate against ShEx
5. Prepare a submission to Wikidata (dry run by default)

## Prerequisites

- Set environment variables for your bot credentials:
  
  ```bash
  export WIKIVERSE_USERNAME="YourUsername@YourBot"
  export WIKIVERSE_PASSWORD="your_bot_password"
  ```

- The mapping configuration and source data are embedded directly in this notebook for portability.

If you want to swap in your own mapping/data later, update the `mapping_config` and `source_data` definitions in the **Setup** section.
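
The credential variables above can be checked before any of the later cells run. The following is a minimal sketch using only the standard library; the helper name `missing_credentials` is hypothetical and is not part of GKC.

```python
import os
from collections.abc import Mapping

# Variable names taken from the Prerequisites section above
REQUIRED_VARS = ("WIKIVERSE_USERNAME", "WIKIVERSE_PASSWORD")


def missing_credentials(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; not a GKC function.
    """
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("Bot credentials found in the environment.")
```

Passing the environment as an argument keeps the check easy to exercise with a plain dictionary.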

## Setup

Import the modules and define inline mapping/data so this notebook runs without external files.

In [None]:
import json

from gkc import WikiverseAuth
from gkc.bottler import DataTypeTransformer, Distillate
from gkc.spirit_safe import SpiritSafeValidator

# Inline mapping configuration (trim or replace with your own as needed)
mapping_config = {
    "$schema": "https://example.com/gkc/mapping-schema.json",
    "version": "1.0",
    "metadata": {
        "name": "Federally Recognized Tribe Mapping",
        "description": (
            "Maps tribal data to Wikidata items conforming to EntitySchema E502"
        ),
        "author": "GKC",
        "created": "2024-01-15",
        "entity_schema_id": "E502",
        "target_entity_type": "Q7840353",
    },
    "reference_library": {
        "stated_in_federal_register": [
            {
                "property": "P248",
                "value": "Q106648236",
                "datatype": "wikibase-item",
                "comment": "Stated in: Federal Register source",
            },
            {
                "property": "P813",
                "value": "current_date",
                "datatype": "time",
                "comment": "Retrieved date",
            },
        ],
        "stated_in_with_url": [
            {
                "property": "P248",
                "value_from": "source_reference_qid",
                "datatype": "wikibase-item",
            },
            {
                "property": "P854",
                "value_from": "source_url",
                "datatype": "url",
                "comment": "Reference URL",
            },
        ],
    },
    "qualifier_library": {
        "point_in_time": [
            {
                "property": "P585",
                "source_field": "point_in_time_date",
                "datatype": "time",
                "comment": "Point in time qualifier",
            }
        ]
    },
    "mappings": {
        "labels": [
            {"source_field": "tribe_name", "language": "en", "required": True}
        ],
        "aliases": [
            {
                "source_field": "tribe_name_alt",
                "language": "en",
                "required": False,
            },
            {
                "source_field": "tribe_name_aliases",
                "language": "en",
                "required": False,
                "separator": ";",
                "comment": "Multiple aliases separated by semicolon",
            },
        ],
        "descriptions": [
            {
                "source_field": "description",
                "language": "en",
                "default": "Federally recognized tribe in the United States",
            }
        ],
        "claims": [
            {
                "property": "P31",
                "comment": "Instance of: Federally recognized tribe",
                "value": "Q7840353",
                "datatype": "wikibase-item",
                "required": True,
                "references": [{"name": "stated_in_federal_register"}],
            },
            {
                "property": "P30",
                "comment": "Continent: North America",
                "value": "Q49",
                "datatype": "wikibase-item",
            },
            {
                "property": "P17",
                "comment": "Country: United States",
                "value": "Q30",
                "datatype": "wikibase-item",
            },
            {
                "property": "P1705",
                "comment": "Native label",
                "source_field": "native_name",
                "datatype": "monolingualtext",
                "required": False,
                "transform": {
                    "type": "monolingualtext",
                    "language_from": "native_language_code",
                },
                "references": [{"name": "stated_in_federal_register"}],
            },
            {
                "property": "P571",
                "comment": "Inception (date of establishment)",
                "source_field": "established_date",
                "datatype": "time",
                "required": False,
                "transform": {"type": "iso_date_to_wikidata_time", "precision": "year"},
                "references": [{"name": "stated_in_federal_register"}],
            },
            {
                "property": "P2124",
                "comment": "Member count",
                "source_field": "member_count",
                "datatype": "quantity",
                "required": False,
                "transform": {"type": "number_to_quantity", "unit": "1"},
                "qualifiers": [
                    {
                        "property": "P585",
                        "comment": "Point in time",
                        "source_field": "member_count_date",
                        "datatype": "time",
                        "transform": {
                            "type": "iso_date_to_wikidata_time",
                            "precision": "day",
                        },
                    }
                ],
                "references": [{"name": "stated_in_with_url"}],
            },
            {
                "property": "P856",
                "comment": "Official website",
                "source_field": "website_url",
                "datatype": "url",
                "required": False,
                "qualifiers": [
                    {
                        "property": "P407",
                        "comment": "Language of work or name",
                        "value": "Q1860",
                        "datatype": "wikibase-item",
                    }
                ],
            },
            {
                "property": "P159",
                "comment": "Headquarters location",
                "source_field": "headquarters_qid",
                "datatype": "wikibase-item",
                "required": False,
                "qualifiers": [
                    {
                        "property": "P625",
                        "comment": "Coordinate location",
                        "source_field": "headquarters_coordinates",
                        "datatype": "globe-coordinate",
                        "required": False,
                        "transform": {
                            "type": "lat_lon_to_globe_coordinate",
                            "latitude_field": "headquarters_lat",
                            "longitude_field": "headquarters_lon",
                        },
                    },
                    {
                        "property": "P6375",
                        "comment": "Street address",
                        "source_field": "headquarters_address",
                        "datatype": "monolingualtext",
                        "required": False,
                        "transform": {"type": "monolingualtext", "language": "en"},
                    },
                ],
            },
        ],
    },
    "validation": {
        "pre_submit": True,
        "entity_schema": "E502",
        "fail_on_validation_error": True,
    },
    "notes": [
        "This mapping assumes source data includes Wikidata QIDs for reference items",
        "Headquarters location should be pre-resolved to a Wikidata item QID",
        "Member count dates should be in ISO format (YYYY-MM-DD)",
        "The 'source_reference_qid' field should contain the QID of the source document",
    ],
}

# Inline source data (replace or extend as needed)
source_data = [
    {
        "tribe_name": "Cherokee Nation",
        "tribe_name_alt": "Cherokee Nation of Oklahoma",
        "tribe_name_aliases": "CNO; United Keetoowah Band; Eastern Band of Cherokee Indians",
        "description": "Federally recognized tribe in Oklahoma",
        "native_name": "ᏣᎳᎩ ᎠᏰᎵ",
        "native_language_code": "chr",
        "established_date": "1839",
        "member_count": 450000,
        "member_count_date": "2023-01-01",
        "website_url": "https://www.cherokee.org/",
        "headquarters_qid": "Q986506",
        "headquarters_lat": 35.9149,
        "headquarters_lon": -94.8703,
        "headquarters_address": "17675 S. Muskogee Ave, Tahlequah, OK 74464",
        "source_reference_qid": "Q106648236",
        "data_source_qid": "Q106648236",
    },
    {
        "tribe_name": "Navajo Nation",
        "tribe_name_aliases": "Diné Bikéyah; Navajoland",
        "description": "Federally recognized tribe in the southwestern United States",
        "native_name": "Naabeehó Bináhásdzo",
        "native_language_code": "nv",
        "established_date": "1868-06-01",
        "member_count": 399494,
        "member_count_date": "2021-01-01",
        "website_url": "https://www.navajo-nsn.gov/",
        "headquarters_qid": "Q79848",
        "headquarters_lat": 35.6744,
        "headquarters_lon": -109.5505,
        "headquarters_address": "P.O. Box 9000, Window Rock, AZ 86515",
        "source_reference_qid": "Q106648236",
        "data_source_qid": "Q106648236",
    },
    {
        "tribe_name": "Choctaw Nation of Oklahoma",
        "tribe_name_aliases": "Choctaw Nation; CNO",
        "description": "Federally recognized tribe in Oklahoma",
        "native_name": "Chahta Yakni",
        "native_language_code": "cho",
        "established_date": "1830",
        "member_count": 223279,
        "member_count_date": "2022-01-01",
        "website_url": "https://www.choctawnation.com/",
        "headquarters_qid": "Q79876",
        "headquarters_lat": 34.0176,
        "headquarters_lon": -95.7719,
        "headquarters_address": "1802 Chukka Hina, Durant, OK 74701",
        "source_reference_qid": "Q106648236",
        "data_source_qid": "Q106648236",
    },
]

# Instantiate a distillate from the inline configuration
distillate = Distillate(mapping_config)

print(f"Loaded mapping with {len(mapping_config.get('mappings', {}).get('claims', []))} claims")
print(f"Loaded {len(source_data)} source records")
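
Several mapping entries are flagged `"required": True`. One way to catch incomplete records before transforming them is to collect those `source_field` names and compare them against each record. The sketch below assumes the mapping layout used in this notebook; the helper names and the trimmed sample data are hypothetical, and GKC may already perform a similar check internally.

```python
def required_source_fields(mapping: dict) -> list[str]:
    """Collect source_field names from mapping entries flagged as required."""
    sections = mapping.get("mappings", {})
    fields = []
    for section in ("labels", "aliases", "descriptions", "claims"):
        for entry in sections.get(section, []):
            # Entries with a fixed "value" have no source_field and are skipped
            if entry.get("required") and "source_field" in entry:
                fields.append(entry["source_field"])
    return fields


def find_incomplete_records(records: list[dict], mapping: dict) -> dict[int, list[str]]:
    """Map record index to the required fields that are missing or empty."""
    required = required_source_fields(mapping)
    problems = {}
    for index, record in enumerate(records):
        missing = [field for field in required if not record.get(field)]
        if missing:
            problems[index] = missing
    return problems


# Trimmed-down sample in the same shape as mapping_config (illustrative only)
sample_mapping = {
    "mappings": {
        "labels": [{"source_field": "tribe_name", "language": "en", "required": True}],
        "claims": [
            {"property": "P31", "value": "Q7840353", "required": True},
            {"property": "P856", "source_field": "website_url", "required": False},
        ],
    }
}
sample_records = [
    {"tribe_name": "Cherokee Nation"},
    {"website_url": "https://example.org/"},
]

print(find_incomplete_records(sample_records, sample_mapping))
```

In the notebook itself, the same call would be `find_incomplete_records(source_data, mapping_config)`.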

## Data Summary

Before running the workflow, take a quick look at what is embedded in the notebook: number of records, available fields, and a preview of the first record.

In [None]:
def summarize_source(records: list[dict]) -> dict:
    """Count the records and report how many of them carry each field."""
    field_counts: dict[str, int] = {}
    for record in records:
        for key in record:
            field_counts[key] = field_counts.get(key, 0) + 1
    return {
        "record_count": len(records),
        "field_counts": dict(sorted(field_counts.items())),
        # Despite the name, this is the sorted union of field names across all records
        "fields_per_record": sorted(field_counts),
    }

summary = summarize_source(source_data)

print("Source data summary")
print("-" * 60)
print(f"Records: {summary['record_count']}")
print(f"Fields ({len(summary['fields_per_record'])}): {', '.join(summary['fields_per_record'])}")

print("\nField coverage (count of records with field):")
for field, count in summary["field_counts"].items():
    print(f"  - {field}: {count}")

print("\nPreview of first record:")
print(json.dumps(source_data[0], indent=2))

## Example 1: Dry Run (Transform Without Submission)

This example transforms a single record into Wikidata JSON without submitting anything.

In [None]:
# Transform first record (dry run)
record = source_data[0]

print("Processing first record (dry run)...")
item_json = distillate.transform_to_wikidata(record)
print(json.dumps(item_json, indent=2))

## Example 2: Transform to Wikidata JSON

This example repeats the transform for one record, pretty-prints the JSON output, and then lists the English aliases found in the result.

In [None]:
record = source_data[0]

wikidata_json = distillate.transform_to_wikidata(record)

print("Resulting Wikidata JSON:")
print(json.dumps(wikidata_json, indent=2))

# Highlight aliases if present
aliases = wikidata_json.get("aliases", {}).get("en", [])
if aliases:
    print("\nAliases extracted from source data:")
    for alias in aliases:
        print(f"  - {alias.get('value')}")
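
The mapping declares a `separator` of `";"` for `tribe_name_aliases`. The sketch below shows one way such a field could be split into alias entries in the Wikibase JSON shape (`{"language": ..., "value": ...}`). It is an illustration under that assumption, not GKC's actual implementation, and both helper names are hypothetical.

```python
def split_aliases(raw: str, separator: str = ";") -> list[str]:
    """Split a delimited alias string, trimming whitespace and dropping
    empty or repeated entries while preserving order."""
    if not raw:
        return []
    seen = set()
    aliases = []
    for part in raw.split(separator):
        alias = part.strip()
        if alias and alias not in seen:
            seen.add(alias)
            aliases.append(alias)
    return aliases


def to_alias_entries(values: list[str], language: str = "en") -> list[dict]:
    """Wrap alias strings in the language/value objects used by Wikibase JSON."""
    return [{"language": language, "value": value} for value in values]


entries = to_alias_entries(split_aliases("Diné Bikéyah; Navajoland"))
for entry in entries:
    print(entry)
```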

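## Example 3: Validate Against ShEx (Dry Run)

This example demonstrates ShEx validation against EntitySchema E502 using `SpiritSafeValidator`. As written, the validator is given an item ID (`qid`) rather than the JSON produced in the earlier examples, so it appears to check an existing item. `Q7840353` is the class item this mapping uses as the `P31` value; substitute the QID of a specific tribe item to validate an instance.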

In [None]:
# Create validator for EntitySchema E502
validator = SpiritSafeValidator(eid="E502", qid="Q7840353")

print("Running validation (dry run)...")
validator.check()
print(f"Validation result: {validator.is_valid()}")

## Example 4: Batch Processing (Dry Run)

Process multiple records in a single batch and summarize the results.

In [None]:
print(f"Processing {len(source_data)} records (dry run)...")

results = []
for record in source_data:
    item_json = distillate.transform_to_wikidata(record)
    results.append({"record": record, "item_json": item_json})

print(f"\nTransformed: {len(results)}")

if results:
    print("\nSuccessfully processed:")
    for item in results:
        record = item["record"]
        print(f"  - {record.get('tribe_name', 'Unknown')}")
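
The loop above stops at the first record that raises. For larger batches it can be useful to record failures and keep going. The following is a minimal sketch of that pattern; `process_batch` and `demo_transform` are hypothetical names, and `demo_transform` merely stands in for `distillate.transform_to_wikidata`, whose error behavior is not documented here.

```python
from typing import Any, Callable


def process_batch(
    records: list[dict],
    transform: Callable[[dict], Any],
) -> tuple[list[dict], list[dict]]:
    """Apply transform to every record, collecting successes and failures separately."""
    successes, failures = [], []
    for index, record in enumerate(records):
        try:
            successes.append({"record": record, "item_json": transform(record)})
        except Exception as exc:  # broad on purpose: one bad record should not stop the batch
            failures.append({"index": index, "record": record, "error": str(exc)})
    return successes, failures


def demo_transform(record: dict) -> dict:
    """Hypothetical stand-in for a real transform; rejects records without a label."""
    if not record.get("tribe_name"):
        raise ValueError("missing required field: tribe_name")
    return {"labels": {"en": {"language": "en", "value": record["tribe_name"]}}}


ok, failed = process_batch(
    [{"tribe_name": "Navajo Nation"}, {"description": "record without a label"}],
    demo_transform,
)
print(f"Transformed: {len(ok)}, failed: {len(failed)}")
for failure in failed:
    print(f"  - record {failure['index']}: {failure['error']}")
```

In the notebook, `process_batch(source_data, distillate.transform_to_wikidata)` would apply the same pattern to the real transform.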

## Example 5: Actual Submission (Optional, Use With Care)

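This section shows the guarded path toward a live submission: it logs in, prepares the item JSON for one record, and logs out. The submission client itself is not included, so nothing is written to Wikidata even when the guard is enabled. The guard is **disabled by default**.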

- Set `RUN_LIVE = True` only if you understand the consequences
- Only run with bot credentials you control
- Consider testing in Wikidata's sandbox or with a dry run first

In [None]:
RUN_LIVE = False

if RUN_LIVE:
    auth = WikiverseAuth()
    if not auth.is_authenticated():
        print("No credentials found. Set WIKIVERSE_USERNAME and WIKIVERSE_PASSWORD.")
    else:
        print(f"Authenticating as: {auth.username}")
        try:
            auth.login()
            print("✓ Successfully logged in")
        except Exception as e:
            print(f"Login failed: {e}")
            raise

        try:
            record = source_data[0]
            print(f"Preparing item for: {record.get('tribe_name', 'Unknown')}")

            item_json = distillate.transform_to_wikidata(record)
            print("Ready to submit item JSON (submission client not included here).")
        finally:
            # Always close the session, even if the transform raises
            auth.logout()
            print("Logged out")
else:
    print("Live submission is disabled. Set RUN_LIVE = True to enable.")
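
The cell above stops short of sending anything. For orientation, creating an item through the MediaWiki API is done with the `wbeditentity` action, posted to the wiki's `api.php` endpoint together with a CSRF token from the logged-in session. The sketch below only builds the request parameters and performs no network call; `build_wbeditentity_params` is a hypothetical helper and does not describe how GKC submits items.

```python
import json


def build_wbeditentity_params(
    item_json: dict,
    csrf_token: str,
    summary: str = "",
    bot: bool = True,
) -> dict:
    """Build POST parameters for creating a new item with wbeditentity.

    Hypothetical helper for illustration; nothing is sent anywhere.
    """
    params = {
        "action": "wbeditentity",
        "new": "item",  # create a new item instead of editing an existing one
        "data": json.dumps(item_json, ensure_ascii=False),
        "token": csrf_token,
        "format": "json",
    }
    if summary:
        params["summary"] = summary
    if bot:
        params["bot"] = "1"  # mark the edit as a bot edit
    return params


example_item = {"labels": {"en": {"language": "en", "value": "Example label"}}}
params = build_wbeditentity_params(
    example_item, csrf_token="PLACEHOLDER+\\", summary="Dry-run example"
)
print(json.dumps(params, indent=2))
```

Pointing such a request at the test instance (`https://test.wikidata.org/w/api.php`) before the production wiki is a sensible first step.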

## Example 6: Datatype Transformation Examples

`DataTypeTransformer` converts plain Python values into Wikidata datavalue structures.

In [None]:
transformer = DataTypeTransformer()

print("1. Wikibase Item (QID):")
print(json.dumps(transformer.to_wikibase_item("Q7840353"), indent=2))

print("\n2. Quantity:")
print(json.dumps(transformer.to_quantity(450000), indent=2))

print("\n3. Time/Date:")
print(json.dumps(transformer.to_time("2023-01-01"), indent=2))

print("\n4. Monolingual Text:")
print(json.dumps(transformer.to_monolingualtext("Cherokee language", "en"), indent=2))

print("\n5. Globe Coordinate:")
print(json.dumps(transformer.to_globe_coordinate(35.9149, -94.8703), indent=2))

print("\n6. URL:")
print(json.dumps(transformer.to_url("https://www.cherokee.org/"), indent=2))
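
To make the target structure concrete, the sketch below builds the value part of a Wikibase `time` datavalue from an ISO-style date string, inferring precision from how much of the date is present (9 for year, 10 for month, 11 for day). It is written independently of GKC; `iso_to_time_value` is a hypothetical name, the output of `DataTypeTransformer.to_time` may differ in detail, and only positive Gregorian years are handled.

```python
GREGORIAN_CALENDAR = "http://www.wikidata.org/entity/Q1985727"


def iso_to_time_value(iso_date: str) -> dict:
    """Convert 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD' into a Wikibase time value."""
    parts = iso_date.split("-")
    if not 1 <= len(parts) <= 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"Unsupported date format: {iso_date!r}")

    year = int(parts[0])
    # Unknown month/day components are written as 00, matching the reduced precision
    month = int(parts[1]) if len(parts) > 1 else 0
    day = int(parts[2]) if len(parts) > 2 else 0
    precision = (9, 10, 11)[len(parts) - 1]  # year, month, day

    return {
        "time": f"+{year:04d}-{month:02d}-{day:02d}T00:00:00Z",
        "timezone": 0,
        "before": 0,
        "after": 0,
        "precision": precision,
        "calendarmodel": GREGORIAN_CALENDAR,
    }


for sample in ("1839", "1868-06-01"):
    value = iso_to_time_value(sample)
    print(sample, "->", value["time"], "precision", value["precision"])
```

The two samples mirror the `established_date` values in `source_data`, one year-only and one full date.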

## Summary

This notebook walked through the item creation workflow:

- Defining the mapping and source data inline
- Transforming records to Wikidata JSON
- Validating against ShEx (optional)
- Running a dry-run batch
- Preparing a guarded live submission
- Converting values to Wikidata datatypes

### Next Steps

- Adapt `mapping_config` and `source_data` for your own project
- Add additional checks or validation for your domain-specific requirements
- Review the [Authentication Guide](../authentication.md) for production setup details
- Review the [Claims Map Builder](../claims_map_builder.md) documentation for mapping concepts