[WIP] merging the current exploratory work into the main exploratory repo #1

rdhyee · 2024-09-19T01:46:00Z

Work in progress.....

Created development guide including:
- Poetry dependency management and testing commands
- Three-tier client architecture overview (IsbClient, IsbClient2, ISamplesBulkHandler)
- Jupyter notebook integration patterns
- Docker environment setup for reproducible development
- Key configuration constants and development patterns

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

- Updated README.md with architecture overview and geoparquet focus
- Added STATUS.md documenting current WIP state and loose ends
- Enhanced CLAUDE.md with API offline status and troubleshooting
- Created examples/README.md with detailed notebook documentation
- Added sample data files (geoparquet, parquet, Excel)
- Included experimental notebooks and JavaScript exploration files
- Added Node.js configuration files for multi-language development

Focus shift: API-dependent workflows → offline-first geoparquet analysis
Key notebooks: geoparquet.ipynb (lonboard viz), isample-archive.ipynb (DuckDB)

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

- Add node_modules/ and npm log files
- Ignore .claude/ temporary files from Claude Code
- Exclude Office temporary files (~$*.xlsx, etc.)

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

- package.json with plantuml-encoder dependency for diagram generation
- Separate from root clasp dependency for Google Apps Script
- Supports multi-tool JavaScript experimentation in different contexts

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

- CROSS_REPO_ALIGNMENT.md: Comprehensive strategy for aligning isamples-python with isamplesorg.github.io companion repository
- DATA_SOURCES.md: Shared data source documentation and coordination protocols
- Enhanced README.md with ecosystem integration section linking to website repo
- Updated examples/README.md with browser tutorial cross-references

Strategic alignment recognizes parallel evolution toward geoparquet+DuckDB workflows:
- Python repo: Advanced local analysis, lonboard visualization patterns
- Website repo: Universal browser access, proven geoparquet migration success
- Shared technology: DuckDB, HTTP range requests, same Zenodo data sources
- Complementary roles: Deep analysis ↔ Public accessibility

Implementation phases:
1. Documentation alignment (complete)
2. Pattern extraction (lonboard → Observable JS)
3. Infrastructure sharing (validation, testing)

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

- Add complete property graph model documentation and query patterns
- Create enhanced Jupyter notebook with DuckDB analysis examples
- Add self-executing Quarto document for browser-based exploration
- Include modular sections for easy integration into existing workflows
- Document entity relationships, performance optimization, and data quality

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

- Correct relationship paths in sample location queries
- Update notebook to use proper graph traversal (Sample->Event->Location)
- Move enhanced .qmd to isamplesorg.github.io/tutorials/ directory
- Fix visualization function to return actual coordinate data

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

Major enhancements:
- Fix critical bug in sample location queries (0 → 1M+ results)
- Add comprehensive Ibis examples for readable multi-step joins
- Update documentation with corrected relationship paths
- Performance comparison showing Ibis ~7% overhead vs raw SQL
- Enhanced README highlighting new capabilities

Technical improvements:
- Proper property graph traversal patterns
- Step-by-step query construction examples
- Type-safe query building with better error handling
- Modular, reusable query components

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

…ation

Enhanced the oc_parquet_analysis notebook to clearly differentiate between:
- Generic PQG (Property Graph) framework: Domain-agnostic graph representation
- OpenContext-specific implementation: Archaeological entity types and predicates

Key changes:
- Added comprehensive explanation of the two-layer architecture
- Annotated code to show which operations are framework vs domain
- Updated all query examples with clear distinctions
- Added comments identifying OpenContext entity types and predicates
- Clarified that otype values are domain-specific, not part of PQG

This makes the notebook more educational and helps users understand what's transferable to other PQG implementations vs what's specific to the archaeological domain.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

Fixed markdown cell formatting to use proper JSON array structure instead of single-line strings, improving readability and maintenance of the Jupyter notebook cells.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

Completed the integration of generic PQG framework vs OpenContext-specific analysis distinction throughout the notebook. All cells now properly identify which operations are domain-agnostic patterns versus archaeological-specific implementations.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

Updated documentation to correctly reflect the three-layer architecture:
1. PQG framework (generic property graph structure)
2. iSamples metadata model (domain-agnostic for all scientific samples)
3. Provider data (OpenContext, SESAR, GEOME provide domain-specific values)

Changes:
- README.md: Updated descriptions to emphasize cross-domain capabilities
- oc_parquet_analysis_enhanced.ipynb: Rewrote intro to distinguish layers
- ISAMPLES_MODEL_ACTION_PLAN.md: Action plan for systematic correction

This correction makes the model's true power clearer - it's a universal framework for material sample metadata across geology, biology, archaeology, environmental science, and other scientific domains.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

…omments

Changed Path 1 and Path 2 documentation to use precise PQG entity type name "MaterialSampleRecord" instead of generic "Sample". This eliminates semantic confusion and aligns with Eric's query terminology from open-context-py.

The term "Sample" is overloaded in scientific contexts. Using the formal PQG entity type name makes the relationship between iSamples model and queries explicit, and helps AI assistants provide more accurate guidance.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

Added three markdown cells explaining:
- Path 1 (direct event location) vs Path 2 (via sampling site) concepts
- Full relationship map showing all path types (geo, agent, concept)
- Detailed analysis of Eric's 4 query functions and which paths they use

This documentation clarifies:
- Why "Path 1" and "Path 2" are useful organizing concepts
- How Eric's queries from open-context-py map to these paths
- That the graph has many more relationships beyond just geographic paths
- Direction matters: most queries go sample→geo, but one reverses

Key insight: SamplingEvent is the central hub, except for IdentifiedConcept which attaches directly to MaterialSampleRecord.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

Path 1 and Path 2 were being presented as "alternative ways to get the same coordinates" which was misleading. This commit clarifies they provide different levels of geographic granularity:
- Path 1 (sample_location): Precise field GPS coordinates for individual events
- Path 2 (site_location): Administrative site grouping location

Key changes:
- Updated cells 14-17 to explain complementary nature vs alternatives
- Added PKAP Survey Area example (1 site with 544 different sample locations)
- Added Suberde example (1 site with 1 location - they converge)
- Refined Eric's query function documentation to show path usage patterns
- Removed empty cells and executed full notebook successfully

This reconciles the conceptually sound Path 1/Path 2 framework with Eric Kansa's clarification that they shouldn't be presented as interchangeable.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
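The granularity difference between the two paths can be illustrated with a small sketch. It uses Python's built-in `sqlite3` as a stand-in for DuckDB and flattens the graph into two hypothetical tables (`event_location`, `site_location`). The table names, column names, and coordinates are all made up for illustration and are not taken from the notebook or the real data.

```python
import sqlite3

# In-memory toy database; sqlite3 is only a stand-in for DuckDB here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Path 1: one row per sampling event with its own field coordinates (made-up values).
CREATE TABLE event_location (event_id TEXT, site_id TEXT, lat REAL, lon REAL);
-- Path 2: one row per sampling site with a single grouping coordinate (made-up values).
CREATE TABLE site_location (site_id TEXT, lat REAL, lon REAL);

INSERT INTO event_location VALUES
  ('event-1', 'site-A', 34.10, 33.10),
  ('event-2', 'site-A', 34.20, 33.20),
  ('event-3', 'site-A', 34.20, 33.20),
  ('event-4', 'site-B', 37.50, 31.50);
INSERT INTO site_location VALUES
  ('site-A', 34.15, 33.15),
  ('site-B', 37.50, 31.50);
""")

# For each site, count how many distinct event-level coordinates it groups.
rows = conn.execute("""
SELECT s.site_id,
       COUNT(DISTINCT e.lat || ',' || e.lon) AS n_sample_locations
FROM site_location AS s
JOIN event_location AS e ON e.site_id = s.site_id
GROUP BY s.site_id
ORDER BY s.site_id
""").fetchall()

for site_id, n_sample_locations in rows:
    print(site_id, n_sample_locations)
```

In this toy data, `site-A` groups several distinct sample locations (the PKAP-style case), while `site-B` has a single sample location that coincides with its site location (the Suberde-style case).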

Demonstrates empirically that Path 1 and Path 2 are not just common patterns but the ONLY mathematically possible paths from MaterialSampleRecord to GeospatialCoordLocation in the iSamples model.

New content added:
- Markdown cell explaining the proof concept and why it matters
- Step 1 query: Proves only SamplingEvent and SamplingSite connect TO GeospatialCoordLocation (1,096,274 and 18,213 edges respectively)
- Step 2 query: Proves MaterialSampleRecord has ZERO direct edges to GeospatialCoordLocation
- Step 3 query: Shows ALL outbound edges from MaterialSampleRecord, demonstrating only produced_by→SamplingEvent leads to geo data
- Step 4 conclusion: Enumerates the exactly 2 paths and explains why any other path is mathematically impossible

Key finding: This is a structural constraint of the iSamples metadata model itself, not just a data observation. Future iSamples implementations MUST follow this graph topology to be compliant.

Validates that the Path 1/Path 2 framework is complete and exhaustive.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
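The exhaustiveness argument can be sketched at the type level with plain Python, independent of the notebook's DuckDB queries. The edge list below is an assumption for illustration: `produced_by`, `sample_location`, and `site_location` are named in the commit notes, while `sampling_site`, `has_context_category`, `responsibility`, and the `Agent` type are assumed names and may not match the actual model.

```python
from collections import defaultdict

# Type-level edges as (subject type, predicate, object type).
# Only produced_by, sample_location and site_location come from the notes above;
# the remaining predicate names and the Agent type are assumptions.
SCHEMA_EDGES = [
    ("MaterialSampleRecord", "produced_by", "SamplingEvent"),
    ("MaterialSampleRecord", "has_context_category", "IdentifiedConcept"),
    ("SamplingEvent", "sample_location", "GeospatialCoordLocation"),
    ("SamplingEvent", "sampling_site", "SamplingSite"),
    ("SamplingEvent", "responsibility", "Agent"),
    ("SamplingSite", "site_location", "GeospatialCoordLocation"),
]


def enumerate_paths(edges, start, goal):
    """Return every simple path from start to goal as a list of predicate names."""
    outgoing = defaultdict(list)
    for subject, predicate, obj in edges:
        outgoing[subject].append((predicate, obj))

    paths = []

    def walk(node, visited, predicates):
        if node == goal:
            paths.append(list(predicates))
            return
        for predicate, next_node in outgoing[node]:
            if next_node not in visited:  # never revisit a type on the same path
                walk(next_node, visited | {next_node}, predicates + [predicate])

    walk(start, {start}, [])
    return paths


geo_paths = enumerate_paths(
    SCHEMA_EDGES, "MaterialSampleRecord", "GeospatialCoordLocation"
)
for path in geo_paths:
    print(" -> ".join(path))
```

Under these assumed edges, the search finds exactly two predicate chains, corresponding to Path 1 (via `sample_location`) and Path 2 (via the sampling site and `site_location`).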

- Updated CLAUDE.md with prominent warning about view_state syntax change
- Fixed record_counts.ipynb cell 80:
  * Changed map_kwargs from old zoom/center to new view_state format
  * Added LIMIT 100000 to prevent loading 6M+ rows (was causing 5+ min hangs)
- Added geoparquet0.ipynb as working reference implementation

Lonboard 0.12+ requires:
    map_kwargs={"view_state": {"zoom": 1, "latitude": 0, "longitude": 0}}
Instead of:
    map_kwargs={"zoom": 1, "center": {"lat": 0, "lon": 0}}

Performance fix prevents timeout issues when visualizing large parquet datasets.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
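For notebooks that still carry the old-style dictionary, the change described above can be sketched as a small conversion helper. `to_view_state_kwargs` is a hypothetical name, not part of lonboard, and the sketch only reshapes a plain dict in the two formats quoted in the commit message; it does not call lonboard.

```python
def to_view_state_kwargs(map_kwargs):
    """Convert old-style {"zoom", "center": {"lat", "lon"}} kwargs to view_state form.

    Hypothetical helper for illustration; not a lonboard function.
    """
    if "view_state" in map_kwargs:
        # Already in the new format: return a shallow copy unchanged.
        return dict(map_kwargs)

    # Keep every key except the two old-style ones.
    converted = {k: v for k, v in map_kwargs.items() if k not in ("zoom", "center")}

    view_state = {}
    if "zoom" in map_kwargs:
        view_state["zoom"] = map_kwargs["zoom"]
    center = map_kwargs.get("center")
    if center is not None:
        view_state["latitude"] = center["lat"]
        view_state["longitude"] = center["lon"]

    if view_state:
        converted["view_state"] = view_state
    return converted


old_style = {"zoom": 1, "center": {"lat": 0, "lon": 0}}
new_style = to_view_state_kwargs(old_style)
print(new_style)
```

The resulting dictionary has the shape shown under "Lonboard 0.12+ requires" and could then be passed as `map_kwargs`.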

Created comprehensive jupytext pairing setup for better notebook version control:
1. JUPYTEXT_WORKFLOW.md - Full guide with workflows, troubleshooting, examples
2. QUICKREF_NOTEBOOKS.md - Quick reference card and command cheatsheet
3. .gitattributes - Git configuration for notebook handling
4. Updated CLAUDE.md - Added notebook workflow guidance for future sessions

Key benefits:
- Pair .ipynb with .py companions for clean git diffs
- Edit .py files in Claude Code to avoid token limits on large notebooks
- Commit both files: .ipynb for outputs, .py for clean code diffs
- Auto-sync changes between paired files

Helper script location: ~/bin/nb_pair.sh

Related tools:
- nb_source_diff.py for one-off diffs without outputs
- jupytext pairing for permanent workflow

🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

Changes:
- Updated examples/basic/geoparquet0.ipynb with execution outputs
- Updated examples/basic/oc_parquet_analysis.ipynb
- Updated examples/basic/oc_parquet_analysis_enhanced.ipynb with latest analysis
- Added jupysql, duckdb-engine, toml to dependencies

New dependencies support SQL magic commands in notebooks for better DuckDB integration and interactive queries.

🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

rdhyee added 30 commits November 9, 2023 13:24

first pass at scraping the fields in the UI

ac679de

latest progress in working with iSamples API

b812b80

current logic of how to adapt pysolr to query /thing/select

26347b0

debugging pysolr get vs post query to /thing/select

f6bfc82

current state of my iSamples work on 2024.01.24 before I make major cleanup.

4870018

simple use Jupyter widgets

e2b6fc3

a pass at getting this to run as a Docker container and also on mybinder

c8a390b

add a first draft of a Python client for bulk data handling in the iSamples API

1ab522c

add ipytree to requirements.in

4171f00

a little start to cleaning up isbclient.py

6561ffe

add a record_count method to IsbClient2

cae3890

adapted the facets function for IsbClient2

717044a

adapted pivot for IsbClient2

6260351

Using Claude Sonnet 3.5 + back and forth from RY to produce a tutorial notebook on geoparquet and duckdb

1863eec

add new dependencies

4288730

new version of tutorial with using polars and reading into pandas -- the calculation of the distances of the cities are screwy though.

694f456

reaching the limits of the Claude-assisted tutorial generation for geoparquet_duckdb_tutorial.md

531ed59

first version of trying to analyze the geoparquet files coming out of the isample export

9759d2b

update the version of minimal-notebook and removing jupytext in requirements.in

533a867

add code to install requirements if in google colab

01ea3f5

catchup: 2024.08.29

9779564

refactoring to make package pip installable

0d08e4b

setting up basic package structure for isamples_client

2563fb4

add git+https://github.com/rdhyee/isamples-python.git@exploratory#egg=isamples_client to requirements.in

f2bfc12

ooops forgot to add pyproject.toml to the repo

0ea6998

installing poetry not working yet -- but a stepping stone towards it working in Dockerfile

f069370

runs until the end...but permission problem

2b33382

clean up Dockerfile

8f12090

changes to try to use poetry to install dependencies

436bf23

seems like we can use poetry now to install dependencies in google colab

68b90dc

rdhyee and others added 30 commits February 18, 2025 10:17

some preliminary work on Eric's parquet files

0a12313

catchup: 2025.02.24

2e8d56f

added mysql to dockerfile

6b94d3c

starter sample code of geoparquet

f456abe

capture the current state before asking for major refactor of geoparquet.ipynb for GitHub copilot to make for easier parameterization of the map

b20218a

working version of geoparquet -- rough cut of some interactivity

0f567fa

put code to grab the data file from Zenodo if the file is not available.

689687d

incorporating some documentation of geoparquet exploration

06f3dd0

first draft of isample-archive.ipynb to do a simple duckdb calculation on the iSamples archive parquet file in zenodo

68fce27

show the efficiencies of using duckdb to compute on a remote geoparquet file on zenodo

c6d9228

more complicated analyses of geoparquet

482f45d
