[WIP] merging the current exploratory work into the main exploratory repo #1
Draft
rdhyee wants to merge 73 commits into isamplesorg:main from rdhyee:exploratory
…l notebook on geoparquet and duckdb
…the calculation of the distances of the cities are screwy though.
…oparquet_duckdb_tutorial.md
… the isample export
…working in Dockerfile
…quet.ipynb for GitHub copilot to make for easier parameterization of the map
…n on the iSamples archive parquet file in zenodo
…et file on zenodo
Created development guide including:
- Poetry dependency management and testing commands
- Three-tier client architecture overview (IsbClient, IsbClient2, ISamplesBulkHandler)
- Jupyter notebook integration patterns
- Docker environment setup for reproducible development
- Key configuration constants and development patterns
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Updated README.md with architecture overview and geoparquet focus
- Added STATUS.md documenting current WIP state and loose ends
- Enhanced CLAUDE.md with API offline status and troubleshooting
- Created examples/README.md with detailed notebook documentation
- Added sample data files (geoparquet, parquet, Excel)
- Included experimental notebooks and JavaScript exploration files
- Added Node.js configuration files for multi-language development
Focus shift: API-dependent workflows → offline-first geoparquet analysis
Key notebooks: geoparquet.ipynb (lonboard viz), isample-archive.ipynb (DuckDB)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add node_modules/ and npm log files
- Ignore .claude/ temporary files from Claude Code
- Exclude Office temporary files (~$*.xlsx, etc.)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- package.json with plantuml-encoder dependency for diagram generation
- Separate from root clasp dependency for Google Apps Script
- Supports multi-tool JavaScript experimentation in different contexts
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- CROSS_REPO_ALIGNMENT.md: Comprehensive strategy for aligning isamples-python with isamplesorg.github.io companion repository
- DATA_SOURCES.md: Shared data source documentation and coordination protocols
- Enhanced README.md with ecosystem integration section linking to website repo
- Updated examples/README.md with browser tutorial cross-references
Strategic alignment recognizes parallel evolution toward geoparquet+DuckDB workflows:
- Python repo: Advanced local analysis, lonboard visualization patterns
- Website repo: Universal browser access, proven geoparquet migration success
- Shared technology: DuckDB, HTTP range requests, same Zenodo data sources
- Complementary roles: Deep analysis ↔ Public accessibility
Implementation phases:
1. Documentation alignment (complete)
2. Pattern extraction (lonboard → Observable JS)
3. Infrastructure sharing (validation, testing)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add complete property graph model documentation and query patterns
- Create enhanced Jupyter notebook with DuckDB analysis examples
- Add self-executing Quarto document for browser-based exploration
- Include modular sections for easy integration into existing workflows
- Document entity relationships, performance optimization, and data quality
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Correct relationship paths in sample location queries
- Update notebook to use proper graph traversal (Sample->Event->Location)
- Move enhanced .qmd to isamplesorg.github.io/tutorials/ directory
- Fix visualization function to return actual coordinate data
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Major enhancements:
- Fix critical bug in sample location queries (0 → 1M+ results)
- Add comprehensive Ibis examples for readable multi-step joins
- Update documentation with corrected relationship paths
- Performance comparison showing Ibis ~7% overhead vs raw SQL
- Enhanced README highlighting new capabilities
Technical improvements:
- Proper property graph traversal patterns
- Step-by-step query construction examples
- Type-safe query building with better error handling
- Modular, reusable query components
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
…ation
Enhanced the oc_parquet_analysis notebook to clearly differentiate between:
- Generic PQG (Property Graph) framework: Domain-agnostic graph representation
- OpenContext-specific implementation: Archaeological entity types and predicates
Key changes:
- Added comprehensive explanation of the two-layer architecture
- Annotated code to show which operations are framework vs domain
- Updated all query examples with clear distinctions
- Added comments identifying OpenContext entity types and predicates
- Clarified that otype values are domain-specific, not part of PQG
This makes the notebook more educational and helps users understand what's transferable to other PQG implementations vs what's specific to the archaeological domain.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixed markdown cell formatting to use proper JSON array structure instead of single-line strings, improving readability and maintenance of the Jupyter notebook cells.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Completed the integration of generic PQG framework vs OpenContext-specific analysis distinction throughout the notebook. All cells now properly identify which operations are domain-agnostic patterns versus archaeological-specific implementations.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Updated documentation to correctly reflect the three-layer architecture:
1. PQG framework (generic property graph structure)
2. iSamples metadata model (domain-agnostic for all scientific samples)
3. Provider data (OpenContext, SESAR, GEOME provide domain-specific values)
Changes:
- README.md: Updated descriptions to emphasize cross-domain capabilities
- oc_parquet_analysis_enhanced.ipynb: Rewrote intro to distinguish layers
- ISAMPLES_MODEL_ACTION_PLAN.md: Action plan for systematic correction
This correction makes the model's true power clearer: it's a universal framework for material sample metadata across geology, biology, archaeology, environmental science, and other scientific domains.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…omments
Changed Path 1 and Path 2 documentation to use precise PQG entity type name "MaterialSampleRecord" instead of generic "Sample". This eliminates semantic confusion and aligns with Eric's query terminology from open-context-py.
The term "Sample" is overloaded in scientific contexts. Using the formal PQG entity type name makes the relationship between iSamples model and queries explicit, and helps AI assistants provide more accurate guidance.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added three markdown cells explaining:
- Path 1 (direct event location) vs Path 2 (via sampling site) concepts
- Full relationship map showing all path types (geo, agent, concept)
- Detailed analysis of Eric's 4 query functions and which paths they use
This documentation clarifies:
- Why "Path 1" and "Path 2" are useful organizing concepts
- How Eric's queries from open-context-py map to these paths
- That the graph has many more relationships beyond just geographic paths
- Direction matters: most queries go sample→geo, but one reverses
Key insight: SamplingEvent is the central hub, except for IdentifiedConcept which attaches directly to MaterialSampleRecord.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Path 1 and Path 2 were being presented as "alternative ways to get the same coordinates" which was misleading. This commit clarifies they provide different levels of geographic granularity:
- Path 1 (sample_location): Precise field GPS coordinates for individual events
- Path 2 (site_location): Administrative site grouping location
Key changes:
- Updated cells 14-17 to explain complementary nature vs alternatives
- Added PKAP Survey Area example (1 site with 544 different sample locations)
- Added Suberde example (1 site with 1 location - they converge)
- Refined Eric's query function documentation to show path usage patterns
- Removed empty cells and executed full notebook successfully
This reconciles the conceptually sound Path 1/Path 2 framework with Eric Kansa's clarification that they shouldn't be presented as interchangeable.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
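The granularity difference described in this commit can be illustrated with a toy graph. The following is only a sketch under stated assumptions: the single `edge(s, p, o)` table is a simplified, hypothetical layout rather than the actual PQG parquet schema, the `sampling_site` predicate name is assumed (the commit only names `sample_location` and `site_location`), and `sqlite3` from the standard library stands in for DuckDB so the example runs without extra dependencies.

```python
import sqlite3


def build_toy_graph():
    """Build an in-memory edge table: one site, three samples, three event locations.

    Hypothetical simplified layout: edge(s, p, o) = subject, predicate, object.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE edge (s TEXT, p TEXT, o TEXT)")
    edges = [("site1", "site_location", "geo_site")]
    for i in range(3):
        edges += [
            (f"sample{i}", "produced_by", f"event{i}"),
            (f"event{i}", "sample_location", f"geo{i}"),
            # 'sampling_site' is an assumed predicate name for event -> site.
            (f"event{i}", "sampling_site", "site1"),
        ]
    con.executemany("INSERT INTO edge VALUES (?, ?, ?)", edges)
    return con


# Path 1: sample -> event -> precise event location.
PATH1_SQL = """
SELECT COUNT(DISTINCT loc.o)
FROM edge AS pb
JOIN edge AS loc ON loc.s = pb.o AND loc.p = 'sample_location'
WHERE pb.p = 'produced_by'
"""

# Path 2: sample -> event -> site -> site-level location.
PATH2_SQL = """
SELECT COUNT(DISTINCT loc.o)
FROM edge AS pb
JOIN edge AS ss ON ss.s = pb.o AND ss.p = 'sampling_site'
JOIN edge AS loc ON loc.s = ss.o AND loc.p = 'site_location'
WHERE pb.p = 'produced_by'
"""


def distinct_locations(con, sql):
    """Return the number of distinct location nodes reached by the given path query."""
    return con.execute(sql).fetchone()[0]
```

In this toy data, Path 1 reaches one location per sampling event while Path 2 collapses every sample at the site onto the single site-level location, which mirrors the "1 site with many sample locations" situation the commit describes.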
Demonstrates empirically that Path 1 and Path 2 are not just common patterns but the ONLY mathematically possible paths from MaterialSampleRecord to GeospatialCoordLocation in the iSamples model.
New content added:
- Markdown cell explaining the proof concept and why it matters
- Step 1 query: Proves only SamplingEvent and SamplingSite connect TO GeospatialCoordLocation (1,096,274 and 18,213 edges respectively)
- Step 2 query: Proves MaterialSampleRecord has ZERO direct edges to GeospatialCoordLocation
- Step 3 query: Shows ALL outbound edges from MaterialSampleRecord, demonstrating only produced_by→SamplingEvent leads to geo data
- Step 4 conclusion: Enumerates the exactly 2 paths and explains why any other path is mathematically impossible
Key finding: This is a structural constraint of the iSamples metadata model itself, not just a data observation. Future iSamples implementations MUST follow this graph topology to be compliant.
Validates that the Path 1/Path 2 framework is complete and exhaustive.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
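The exhaustiveness argument in this commit amounts to enumerating every simple path between two entity types in a type-level graph. A minimal sketch of that idea follows. It is not the notebook's actual queries: the edge list is a hand-written assumption based on the relationships named in these commit messages, and the `sampling_site` and `has_concept` predicate names are hypothetical placeholders.

```python
# Type-level edges as (subject type, predicate, object type).
# Assumed from the commit messages; two predicate names are hypothetical.
SCHEMA_EDGES = [
    ("MaterialSampleRecord", "produced_by", "SamplingEvent"),
    ("MaterialSampleRecord", "has_concept", "IdentifiedConcept"),  # hypothetical name
    ("SamplingEvent", "sample_location", "GeospatialCoordLocation"),
    ("SamplingEvent", "sampling_site", "SamplingSite"),  # hypothetical name
    ("SamplingSite", "site_location", "GeospatialCoordLocation"),
]


def enumerate_paths(edges, start, goal):
    """Return every simple path from start to goal as a list of (predicate, type) steps."""
    adjacency = {}
    for subject, predicate, obj in edges:
        adjacency.setdefault(subject, []).append((predicate, obj))

    paths = []

    def walk(node, visited, trail):
        if node == goal:
            paths.append(trail)
            return
        for predicate, obj in adjacency.get(node, []):
            if obj not in visited:  # never revisit a type, so paths stay simple
                walk(obj, visited | {obj}, trail + [(predicate, obj)])

    walk(start, {start}, [])
    return paths


if __name__ == "__main__":
    for path in enumerate_paths(
        SCHEMA_EDGES, "MaterialSampleRecord", "GeospatialCoordLocation"
    ):
        print(" -> ".join(f"{p}:{t}" for p, t in path))
```

Under these assumed edges the search finds two paths, one through `sample_location` and one through the sampling site. The result depends entirely on the edge list, so against real data the list would have to be derived from the parquet file rather than typed in by hand.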
- Updated CLAUDE.md with prominent warning about view_state syntax change
- Fixed record_counts.ipynb cell 80:
* Changed map_kwargs from old zoom/center to new view_state format
* Added LIMIT 100000 to prevent loading 6M+ rows (was causing 5+ min hangs)
- Added geoparquet0.ipynb as working reference implementation
Lonboard 0.12+ requires:
map_kwargs={"view_state": {"zoom": 1, "latitude": 0, "longitude": 0}}
Instead of:
map_kwargs={"zoom": 1, "center": {"lat": 0, "lon": 0}}
Performance fix prevents timeout issues when visualizing large parquet datasets.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
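The two fixes in this commit (the `view_state` format and the row limit) can be captured in small helpers. This is a sketch only: `to_view_state_kwargs` and `with_row_limit` are hypothetical names that do not appear in the repository, and the code touches neither lonboard nor DuckDB. It only reshapes the dictionaries and SQL strings shown in the commit message.

```python
def to_view_state_kwargs(old_kwargs):
    """Convert old-style {"zoom": ..., "center": {"lat": ..., "lon": ...}}
    map kwargs into the {"view_state": {...}} form shown in the commit message.

    Hypothetical helper; kwargs that already use view_state are returned unchanged.
    """
    if "view_state" in old_kwargs:
        return dict(old_kwargs)

    # Keep unrelated keys, then fold zoom/center into view_state.
    new_kwargs = {k: v for k, v in old_kwargs.items() if k not in ("zoom", "center")}
    view_state = {}
    if "zoom" in old_kwargs:
        view_state["zoom"] = old_kwargs["zoom"]
    center = old_kwargs.get("center", {})
    if "lat" in center:
        view_state["latitude"] = center["lat"]
    if "lon" in center:
        view_state["longitude"] = center["lon"]
    if view_state:
        new_kwargs["view_state"] = view_state
    return new_kwargs


def with_row_limit(sql, limit=100_000):
    """Wrap a query in a subselect with a LIMIT so a map never loads millions of rows.

    Hypothetical helper; the default mirrors the LIMIT 100000 mentioned in the commit.
    """
    inner = sql.rstrip().rstrip(";")
    return f"SELECT * FROM ({inner}) AS q LIMIT {int(limit)}"
```

For example, `to_view_state_kwargs({"zoom": 1, "center": {"lat": 0, "lon": 0}})` produces the new-style dictionary quoted in the commit message, which can then be passed as `map_kwargs`.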
Created comprehensive jupytext pairing setup for better notebook version control:
1. JUPYTEXT_WORKFLOW.md - Full guide with workflows, troubleshooting, examples
2. QUICKREF_NOTEBOOKS.md - Quick reference card and command cheatsheet
3. .gitattributes - Git configuration for notebook handling
4. Updated CLAUDE.md - Added notebook workflow guidance for future sessions
Key benefits:
- Pair .ipynb with .py companions for clean git diffs
- Edit .py files in Claude Code to avoid token limits on large notebooks
- Commit both files: .ipynb for outputs, .py for clean code diffs
- Auto-sync changes between paired files
Helper script location: ~/bin/nb_pair.sh
Related tools:
- nb_source_diff.py for one-off diffs without outputs
- jupytext pairing for permanent workflow
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Changes:
- Updated examples/basic/geoparquet0.ipynb with execution outputs
- Updated examples/basic/oc_parquet_analysis.ipynb
- Updated examples/basic/oc_parquet_analysis_enhanced.ipynb with latest analysis
- Added jupysql, duckdb-engine, toml to dependencies
New dependencies support SQL magic commands in notebooks for better DuckDB integration and interactive queries.
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Work in progress.....