@rdhyee rdhyee commented Sep 19, 2024

Work in progress.

rdhyee added 30 commits November 9, 2023 13:24
…the calculation of the distances between the cities is screwy, though.
rdhyee and others added 30 commits February 18, 2025 10:17
…quet.ipynb for GitHub copilot to make for easier parameterization of the map
…n on the iSamples archive parquet file in zenodo
Created development guide including:
- Poetry dependency management and testing commands
- Three-tier client architecture overview (IsbClient, IsbClient2, ISamplesBulkHandler)
- Jupyter notebook integration patterns
- Docker environment setup for reproducible development
- Key configuration constants and development patterns

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Updated README.md with architecture overview and geoparquet focus
- Added STATUS.md documenting current WIP state and loose ends
- Enhanced CLAUDE.md with API offline status and troubleshooting
- Created examples/README.md with detailed notebook documentation
- Added sample data files (geoparquet, parquet, Excel)
- Included experimental notebooks and JavaScript exploration files
- Added Node.js configuration files for multi-language development

Focus shift: API-dependent workflows → offline-first geoparquet analysis
Key notebooks: geoparquet.ipynb (lonboard viz), isample-archive.ipynb (DuckDB)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add node_modules/ and npm log files
- Ignore .claude/ temporary files from Claude Code
- Exclude Office temporary files (~$*.xlsx, etc.)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- package.json with plantuml-encoder dependency for diagram generation
- Separate from root clasp dependency for Google Apps Script
- Supports multi-tool JavaScript experimentation in different contexts

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- CROSS_REPO_ALIGNMENT.md: Comprehensive strategy for aligning isamples-python
  with isamplesorg.github.io companion repository
- DATA_SOURCES.md: Shared data source documentation and coordination protocols
- Enhanced README.md with ecosystem integration section linking to website repo
- Updated examples/README.md with browser tutorial cross-references

Strategic alignment recognizes parallel evolution toward geoparquet+DuckDB workflows:
- Python repo: Advanced local analysis, lonboard visualization patterns
- Website repo: Universal browser access, proven geoparquet migration success
- Shared technology: DuckDB, HTTP range requests, same Zenodo data sources
- Complementary roles: Deep analysis ↔ Public accessibility

Implementation phases:
1. Documentation alignment (complete)
2. Pattern extraction (lonboard → Observable JS)
3. Infrastructure sharing (validation, testing)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add complete property graph model documentation and query patterns
- Create enhanced Jupyter notebook with DuckDB analysis examples
- Add self-executing Quarto document for browser-based exploration
- Include modular sections for easy integration into existing workflows
- Document entity relationships, performance optimization, and data quality

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Correct relationship paths in sample location queries
- Update notebook to use proper graph traversal (Sample->Event->Location)
- Move enhanced .qmd to isamplesorg.github.io/tutorials/ directory
- Fix visualization function to return actual coordinate data
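
A minimal sketch of the traversal fix described above, assuming a toy two-table layout (`node`, `edge`) that is hypothetical and not the actual parquet schema. The standard-library `sqlite3` module stands in for DuckDB only so the example runs without extra dependencies. It shows why a query that looks for a direct sample-to-location edge returns nothing, while the two-hop Sample->Event->Location join returns the location.

```python
import sqlite3

# Hypothetical toy graph: one sample, one event, one location.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, otype TEXT, label TEXT);
CREATE TABLE edge (s INTEGER, p TEXT, o INTEGER);
INSERT INTO node VALUES
  (1, 'MaterialSampleRecord',    'sample-A'),
  (2, 'SamplingEvent',           'event-A'),
  (3, 'GeospatialCoordLocation', 'loc-A');
INSERT INTO edge VALUES
  (1, 'produced_by',     2),
  (2, 'sample_location', 3);
""")

# Incorrect path: expects an edge straight from the sample to a location.
direct = con.execute("""
SELECT s.label, g.label
FROM node s
JOIN edge e ON e.s = s.id
JOIN node g ON g.id = e.o AND g.otype = 'GeospatialCoordLocation'
WHERE s.otype = 'MaterialSampleRecord'
""").fetchall()

# Corrected path: sample -> produced_by -> event -> sample_location -> location.
via_event = con.execute("""
SELECT s.label, g.label
FROM node s
JOIN edge e1 ON e1.s = s.id  AND e1.p = 'produced_by'
JOIN node ev ON ev.id = e1.o AND ev.otype = 'SamplingEvent'
JOIN edge e2 ON e2.s = ev.id AND e2.p = 'sample_location'
JOIN node g  ON g.id = e2.o  AND g.otype = 'GeospatialCoordLocation'
WHERE s.otype = 'MaterialSampleRecord'
""").fetchall()

print("direct:", direct)
print("via event:", via_event)
```

The empty result from the direct join mirrors the "0 results" symptom: the join is valid SQL, it simply follows an edge that the graph does not contain.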

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Major enhancements:
- Fix critical bug in sample location queries (0 → 1M+ results)
- Add comprehensive Ibis examples for readable multi-step joins
- Update documentation with corrected relationship paths
- Performance comparison showing Ibis ~7% overhead vs raw SQL
- Enhanced README highlighting new capabilities

Technical improvements:
- Proper property graph traversal patterns
- Step-by-step query construction examples
- Type-safe query building with better error handling
- Modular, reusable query components

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ation

Enhanced the oc_parquet_analysis notebook to clearly differentiate between:
- Generic PQG (Property Graph) framework: Domain-agnostic graph representation
- OpenContext-specific implementation: Archaeological entity types and predicates

Key changes:
- Added comprehensive explanation of the two-layer architecture
- Annotated code to show which operations are framework vs domain
- Updated all query examples with clear distinctions
- Added comments identifying OpenContext entity types and predicates
- Clarified that otype values are domain-specific, not part of PQG

This makes the notebook more educational and helps users understand
what's transferable to other PQG implementations vs what's specific
to the archaeological domain.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Fixed markdown cell formatting to use proper JSON array structure
instead of single-line strings, improving readability and maintenance
of the Jupyter notebook cells.
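
As an illustration of the formatting change above, here is a small sketch that converts a cell whose `source` is one long string into the list-of-lines form. The helper name `split_cell_source` and the sample cell are hypothetical; the only assumption about the notebook format is that a cell's `source` may be either a string or a list of strings.

```python
import json

def split_cell_source(cell: dict) -> dict:
    """Return a copy of the cell whose 'source' is a list of lines."""
    source = cell.get("source", "")
    if isinstance(source, str):
        # keepends=True preserves the newline at the end of each line,
        # so joining the list reproduces the original string exactly.
        source = source.splitlines(keepends=True)
    return {**cell, "source": source}

# Hypothetical markdown cell stored as a single-line JSON string.
cell = {
    "cell_type": "markdown",
    "metadata": {},
    "source": "# Title\n\nSome text.\n- item",
}

fixed = split_cell_source(cell)
print(json.dumps(fixed, indent=1))
```

With one array element per line, a one-line edit to the cell shows up as a one-line change in a diff of the `.ipynb` file.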

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Completed the integration of generic PQG framework vs OpenContext-specific
analysis distinction throughout the notebook. All cells now properly
identify which operations are domain-agnostic patterns versus
archaeological-specific implementations.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Updated documentation to correctly reflect the three-layer architecture:
1. PQG framework (generic property graph structure)
2. iSamples metadata model (domain-agnostic for all scientific samples)
3. Provider data (OpenContext, SESAR, GEOME provide domain-specific values)

Changes:
- README.md: Updated descriptions to emphasize cross-domain capabilities
- oc_parquet_analysis_enhanced.ipynb: Rewrote intro to distinguish layers
- ISAMPLES_MODEL_ACTION_PLAN.md: Action plan for systematic correction

This correction makes the model's true power clearer - it's a universal
framework for material sample metadata across geology, biology, archaeology,
environmental science, and other scientific domains.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…omments

Changed Path 1 and Path 2 documentation to use precise PQG entity type name
"MaterialSampleRecord" instead of generic "Sample". This eliminates semantic
confusion and aligns with Eric's query terminology from open-context-py.

The term "Sample" is overloaded in scientific contexts. Using the formal PQG
entity type name makes the relationship between iSamples model and queries
explicit, and helps AI assistants provide more accurate guidance.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Added three markdown cells explaining:
- Path 1 (direct event location) vs Path 2 (via sampling site) concepts
- Full relationship map showing all path types (geo, agent, concept)
- Detailed analysis of Eric's 4 query functions and which paths they use

This documentation clarifies:
- Why "Path 1" and "Path 2" are useful organizing concepts
- How Eric's queries from open-context-py map to these paths
- That the graph has many more relationships beyond just geographic paths
- Direction matters: most queries go sample→geo, but one reverses

Key insight: SamplingEvent is the central hub, except for IdentifiedConcept
which attaches directly to MaterialSampleRecord.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Path 1 and Path 2 were being presented as "alternative ways to get the same
coordinates" which was misleading. This commit clarifies they provide
different levels of geographic granularity:

- Path 1 (sample_location): Precise field GPS coordinates for individual events
- Path 2 (site_location): Administrative site grouping location

Key changes:
- Updated cells 14-17 to explain that the two paths are complementary rather than alternatives
- Added PKAP Survey Area example (1 site with 544 different sample locations)
- Added Suberde example (1 site with 1 location - they converge)
- Refined Eric's query function documentation to show path usage patterns
- Removed empty cells and executed full notebook successfully

This reconciles the conceptually sound Path 1/Path 2 framework with Eric
Kansa's clarification that they shouldn't be presented as interchangeable.
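
The difference in granularity can be sketched with a toy graph. Everything below is made up for illustration: the node names, the `follow` helper, and the `sampling_site` predicate name are assumptions, while `produced_by`, `sample_location`, and `site_location` are the names used in these commit messages.

```python
# Hypothetical toy graph: (subject, predicate) -> object.
edges = {
    ("sample-1", "produced_by"): "event-1",
    ("sample-2", "produced_by"): "event-2",
    # Path 1: each event has its own field location.
    ("event-1", "sample_location"): "loc-field-1",
    ("event-2", "sample_location"): "loc-field-2",
    # Path 2: both events belong to the same site, which has one location.
    ("event-1", "sampling_site"): "site-X",
    ("event-2", "sampling_site"): "site-X",
    ("site-X", "site_location"): "loc-site-X",
}

def follow(start, *predicates):
    """Walk the graph from start along the given predicates; None if a hop is missing."""
    node = start
    for predicate in predicates:
        node = edges.get((node, predicate))
        if node is None:
            return None
    return node

def path1(sample):
    """Event-level location: sample -> event -> sample_location."""
    return follow(sample, "produced_by", "sample_location")

def path2(sample):
    """Site-level location: sample -> event -> site -> site_location."""
    return follow(sample, "produced_by", "sampling_site", "site_location")

for sample in ("sample-1", "sample-2"):
    print(sample, "path1:", path1(sample), "path2:", path2(sample))
```

In this toy data the two samples get different Path 1 locations but the same Path 2 location, which is the PKAP-style case on a small scale. A site whose only event location equals its site location would be the Suberde-style case where the paths converge.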

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Demonstrates empirically that Path 1 and Path 2 are not just common
patterns but the ONLY mathematically possible paths from
MaterialSampleRecord to GeospatialCoordLocation in the iSamples model.

New content added:
- Markdown cell explaining the proof concept and why it matters
- Step 1 query: Proves only SamplingEvent and SamplingSite connect TO
  GeospatialCoordLocation (1,096,274 and 18,213 edges respectively)
- Step 2 query: Proves MaterialSampleRecord has ZERO direct edges to
  GeospatialCoordLocation
- Step 3 query: Shows ALL outbound edges from MaterialSampleRecord,
  demonstrating only produced_by→SamplingEvent leads to geo data
- Step 4 conclusion: Enumerates the exactly 2 paths and explains why
  any other path is mathematically impossible

Key finding: This is a structural constraint of the iSamples metadata
model itself, not just a data observation. Future iSamples implementations
MUST follow this graph topology to be compliant.

Validates that the Path 1/Path 2 framework is complete and exhaustive.
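
The exhaustiveness argument can be sketched at the type level by enumerating every simple path in a small schema graph. The schema below is hand-written for illustration and is an assumption: `produced_by`, `sample_location`, and `site_location` come from these commit messages, while `sampling_site`, `has_concept`, `responsibility`, and the `Agent` type name are hypothetical placeholders. The notebook's own proof runs queries against the data. This sketch only shows that, given such a set of edge types, exactly two paths reach `GeospatialCoordLocation`.

```python
# Hypothetical type-level schema: entity type -> [(predicate, target type), ...]
schema = {
    "MaterialSampleRecord": [
        ("produced_by", "SamplingEvent"),
        ("has_concept", "IdentifiedConcept"),
    ],
    "SamplingEvent": [
        ("sample_location", "GeospatialCoordLocation"),
        ("sampling_site", "SamplingSite"),
        ("responsibility", "Agent"),
    ],
    "SamplingSite": [
        ("site_location", "GeospatialCoordLocation"),
    ],
}

def simple_paths(graph, start, goal, path=None):
    """Yield every cycle-free path from start to goal as [type, predicate, type, ...]."""
    path = path or [start]
    if start == goal:
        yield path
        return
    for predicate, target in graph.get(start, []):
        if target in path[::2]:  # even positions hold the types already visited
            continue
        yield from simple_paths(graph, target, goal, path + [predicate, target])

paths = list(simple_paths(schema, "MaterialSampleRecord", "GeospatialCoordLocation"))
for p in paths:
    print(" -> ".join(p))
```

Adding a new edge type to `schema` and re-running the enumeration is a quick way to check whether a proposed model change would introduce a third route to location data.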

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Updated CLAUDE.md with prominent warning about view_state syntax change
- Fixed record_counts.ipynb cell 80:
  * Changed map_kwargs from old zoom/center to new view_state format
  * Added LIMIT 100000 to prevent loading 6M+ rows (was causing 5+ min hangs)
- Added geoparquet0.ipynb as working reference implementation

Lonboard 0.12+ requires:
  map_kwargs={"view_state": {"zoom": 1, "latitude": 0, "longitude": 0}}
Instead of:
  map_kwargs={"zoom": 1, "center": {"lat": 0, "lon": 0}}

Performance fix prevents timeout issues when visualizing large parquet datasets.
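
The syntax change above can be sketched as a small conversion helper. `to_view_state` is a hypothetical name and is not part of Lonboard. It only rewrites a dictionary from the old zoom/center shape to the `view_state` shape quoted in this commit message and does not call Lonboard itself.

```python
def to_view_state(map_kwargs: dict) -> dict:
    """Convert old-style zoom/center map_kwargs to the view_state form."""
    if "view_state" in map_kwargs:
        # Already in the new format; return an unchanged copy.
        return dict(map_kwargs)

    # Keep any unrelated keys as they are.
    converted = {k: v for k, v in map_kwargs.items() if k not in ("zoom", "center")}

    view_state = {}
    if "zoom" in map_kwargs:
        view_state["zoom"] = map_kwargs["zoom"]
    center = map_kwargs.get("center") or {}
    if "lat" in center:
        view_state["latitude"] = center["lat"]
    if "lon" in center:
        view_state["longitude"] = center["lon"]

    if view_state:
        converted["view_state"] = view_state
    return converted

old_kwargs = {"zoom": 1, "center": {"lat": 0, "lon": 0}}
new_kwargs = to_view_state(old_kwargs)
print(new_kwargs)
```

A helper like this could be useful when several notebooks still carry the old keyword shape and need to be migrated in one pass.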

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Created comprehensive jupytext pairing setup for better notebook version control:

1. JUPYTEXT_WORKFLOW.md - Full guide with workflows, troubleshooting, examples
2. QUICKREF_NOTEBOOKS.md - Quick reference card and command cheatsheet
3. .gitattributes - Git configuration for notebook handling
4. Updated CLAUDE.md - Added notebook workflow guidance for future sessions

Key benefits:
- Pair .ipynb with .py companions for clean git diffs
- Edit .py files in Claude Code to avoid token limits on large notebooks
- Commit both files: .ipynb for outputs, .py for clean code diffs
- Auto-sync changes between paired files

Helper script location: ~/bin/nb_pair.sh

Related tools:
- nb_source_diff.py for one-off diffs without outputs
- jupytext pairing for permanent workflow
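
To illustrate why source-only diffs are cleaner, here is a standard-library sketch that extracts code-cell source from notebook JSON and diffs it, ignoring outputs. This is not `nb_source_diff.py` or jupytext. The function `code_source` and the two tiny notebooks are hypothetical.

```python
import difflib
import json

def code_source(nb_json: str) -> list[str]:
    """Return the code-cell source lines of a notebook, without outputs."""
    nb = json.loads(nb_json)
    lines = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines
            source = "".join(source)
        lines.append(f"# cell {index}")
        lines.extend(source.splitlines())
    return lines

# Two hypothetical notebook versions: one changed line, different outputs.
old_nb = json.dumps({"cells": [
    {"cell_type": "code", "source": ["x = 1\n", "print(x)"], "outputs": [{"text": "1"}]},
]})
new_nb = json.dumps({"cells": [
    {"cell_type": "code", "source": ["x = 2\n", "print(x)"], "outputs": [{"text": "2"}]},
]})

diff = list(difflib.unified_diff(code_source(old_nb), code_source(new_nb), lineterm=""))
print("\n".join(diff))
```

Only the changed source line appears in the diff. The differing outputs never enter the comparison, which is the same property a paired `.py` file provides on every commit.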

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Changes:
- Updated examples/basic/geoparquet0.ipynb with execution outputs
- Updated examples/basic/oc_parquet_analysis.ipynb
- Updated examples/basic/oc_parquet_analysis_enhanced.ipynb with latest analysis
- Added jupysql, duckdb-engine, toml to dependencies

New dependencies support SQL magic commands in notebooks for better
DuckDB integration and interactive queries.

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>