<a href="https://colab.research.google.com/github/elephant-xyz/notebook/blob/main/PhotoMedtaData.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🐘 Welcome to Step 4 of Elephant Mining

Congratulations on reaching **Step 4**! By now, you’ve successfully **minted your County Data Group**. In this notebook, you'll use your **seed data** and **property images** to mint your **Photo Data Group**.

---

## 🧠 What You’ll Do in This Step

This notebook allows you to:

- Upload your property images  
- Mint a new **Photo Data Group**  
- Automatically generate a **fact sheet** based on the image metadata  

This step completes the visual layer of your dataset, setting you up for further data enrichment.

---

## ✅ Prerequisites

Before continuing, make sure you’ve completed the following two notebooks:

1. [📗 Notebook 1: Seed Minting](https://colab.research.google.com/drive/14tSNSP8Pe-mY4VwX9JhXgfyOvzmN3kC0?usp=chrome_ntp)  
2. [📘 Notebook 2: County Data Minting](https://colab.research.google.com/drive/1ZI_eScKFh2kDIZgwXljhOgBIgrenDhRi?usp=chrome_ntp)

After running both, you should have the following output files ready:

- `upload-results.csv`  
- `submit.zip`

Also ensure you have:

- **OpenAI API Key**: Valid API key for AI processing capabilities
- **AWS Account with Credentials**: AWS Access Key ID and Secret Access Key for cloud services
- **Pinata Account**: JWT token for IPFS storage services
- **Elephant Address**: Private key for blockchain integration

The required environment variables are listed in the table under Step 1.

---

## 📸 In This Notebook

Once your image files are uploaded:

1. The images will be minted into the **Photo Data Group**  
2. A **fact sheet** will be generated for inspection  
3. You can continue with **image-based metadata extraction**  
4. This will lead to a complete and enriched data product  

---



## 📥 Step 1: Upload the `.env` File

This notebook requires a `.env` file containing your API keys and credentials. Create a file with the following environment variables:

| Variable Name | Purpose |
|---|---|
| `OPENAI_API_KEY` | Access to OpenAI API |
| `AWS_ACCESS_KEY_ID` | AWS access key |
| `AWS_SECRET_ACCESS_KEY` | AWS secret access key |
| `AWS_DEFAULT_REGION` | AWS region |
| `S3_BUCKET_NAME` | Your S3 bucket name |
| `IMAGE_FOLDER_NAME` | Image directory |
| `IMAGES_DIR` | Directory path for images |
| `ELEPHANT_PRIVATE_KEY` | Elephant wallet private key |
| `PINATA_JWT` | Pinata authentication token |

### To upload:
1. Click the **folder icon** 📂 in the left sidebar
2. Click the **"Upload"** button
3. Select your `.env` file

### Example `.env` file:
```env
OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXXXXXXXXXX
AWS_ACCESS_KEY_ID=XXXXXX
AWS_SECRET_ACCESS_KEY=XXXXXX
AWS_DEFAULT_REGION=your-aws-region-here
S3_BUCKET_NAME=your-s3-bucket-name-here
IMAGES_DIR=images
IMAGE_FOLDER_NAME=images
ELEPHANT_PRIVATE_KEY=xxxxx
PINATA_JWT=xxxxx
```

> ⚠️ **Security Note:** Never commit your `.env` file to version control or share it publicly.
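
Before running the later cells, it can help to confirm that the uploaded `.env` defines every variable from the table above. The sketch below does this with the standard library only and prints variable names, never values. The helper names `parse_env` and `missing_env_vars` and the `/content/.env` location are assumptions made for illustration; they are not part of the Elephant tooling.

```python
from pathlib import Path

# Variable names taken from the table above.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "S3_BUCKET_NAME",
    "IMAGE_FOLDER_NAME",
    "IMAGES_DIR",
    "ELEPHANT_PRIVATE_KEY",
    "PINATA_JWT",
]


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_env_vars(text, required=REQUIRED_VARS):
    """Return the required names that are absent or empty, in table order."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]


# Assumed location: the file was uploaded to /content/.env.
env_path = Path("/content/.env")
if env_path.exists():
    missing = missing_env_vars(env_path.read_text())
    print("Missing variables:", missing or "none")
else:
    print(f"{env_path} not found; upload it first.")
```

An empty value counts as missing here, which catches placeholders that were cleared but never filled in.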


## Step 2: Upload `upload-results.csv`

Upload the `upload-results.csv` file to the `/content/` directory.

> 📌 **Important**: This file was generated by running **Step 2** of the [Seed Data Notebook](https://colab.research.google.com/drive/14tSNSP8Pe-mY4VwX9JhXgfyOvzmN3kC0?usp=sharing#scrollTo=OFKp4E49651Z)

After downloading the file from that notebook, upload it here so that it is available at `/content/upload-results.csv`.

## Step 3: Upload `submit.zip`

Upload the `submit.zip` file to the `/content/` directory.

> 📌 **Important**: This file was generated by running **Step 3** of the [County Data Notebook](https://colab.research.google.com/drive/1ZI_eScKFh2kDIZgwXljhOgBIgrenDhRi#scrollTo=HA0ppLFpUm1j)

After downloading the file from that notebook, upload it here so that it is available at `/content/submit.zip`.

## Step 4: Verify Data Exists

Once both files are uploaded to `/content/`, you can proceed with the main workflow that depends on these generated datasets.

**Expected file locations:**
- `/content/upload-results.csv`
- `/content/submit.zip`







In [50]:
!ls -la /content/upload-results.csv
!ls -la /content/submit.zip

-rw-r--r-- 1 root root 449 Jul 26 00:32 /content/upload-results.csv
-rw-r--r-- 1 root root 30282 Jul 26 00:32 /content/submit.zip
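
The `ls` check above only shows that the files exist. As a slightly stricter check, the sketch below also confirms that the CSV is not empty and that `submit.zip` is a readable zip archive. The function name `check_inputs` is an assumption for illustration; the file names and the `/content` directory come from the expected locations listed above.

```python
import zipfile
from pathlib import Path


def check_inputs(content_dir="/content"):
    """Return a dict mapping each expected file name to a status string."""
    base = Path(content_dir)
    csv_path = base / "upload-results.csv"
    zip_path = base / "submit.zip"
    status = {}

    if not csv_path.is_file():
        status[csv_path.name] = "missing"
    elif csv_path.stat().st_size == 0:
        status[csv_path.name] = "empty"
    else:
        status[csv_path.name] = "ok"

    if not zip_path.is_file():
        status[zip_path.name] = "missing"
    elif not zipfile.is_zipfile(zip_path):
        status[zip_path.name] = "not a valid zip archive"
    else:
        status[zip_path.name] = "ok"
    return status


for name, state in check_inputs().items():
    print(f"{name}: {state}")
```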


## Step 5: Install the Package and Set Up Folders

This step:

- Installs the `photo-meta-data-ai` package from GitHub
- Creates all necessary folders for the project
- Saves installation details to a log file (`/content/install_log.txt`) for troubleshooting

It takes 1-2 minutes to complete. Once finished, you'll have the AI package installed and the folder structure ready for processing photos.



In [51]:
# 1. Install the package
!pip install --force-reinstall --no-cache-dir git+https://github.com/elephant-xyz/photo-meta-data-ai.git > /content/install_log.txt 2>&1

# 2. Set up folders
!colab-folder-setup


## Step 6: Upload Images to Property Subfolders

Upload your property images into the pre-created subfolders:

- Each property already has a subfolder named with its Parcel ID under the `images` folder
- Drag and drop your images into the correct property subfolder
- All images for a specific property should go in that property's designated folder

The AI will process each property's images and generate metadata organized by Parcel ID.
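
To confirm that every property subfolder received its images before processing, a short count per Parcel ID can help. The sketch below assumes the images live in a folder named `images` relative to the working directory (matching `IMAGES_DIR=images` in the example `.env`) and uses the extensions that the processing step reports as supported. The helper `count_images` is hypothetical.

```python
from pathlib import Path

# Extensions listed as supported in the processing output of this notebook.
SUPPORTED = {".jpeg", ".jpg", ".png", ".webp"}


def count_images(images_dir="images"):
    """Map each property subfolder name to its number of supported image files."""
    root = Path(images_dir)
    counts = {}
    if not root.is_dir():
        return counts
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[folder.name] = sum(
            1
            for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() in SUPPORTED
        )
    return counts


for parcel_id, n in count_images().items():
    flag = "" if n else "  <-- no images yet"
    print(f"{parcel_id}: {n} image(s){flag}")
```

Files with other extensions are ignored by this count, so a folder showing zero may still contain files in an unsupported format.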






## Step 8: Uploading Photos

In [52]:
!process-photo-data
!npx -y @elephant-xyz/cli@latest validate-and-upload photo_data_group --output-csv photos.csv

  - Processed bafkreigzz5foh5ts76vvhxphzulptpnjwznog6lcnxw5wsvfqa7zlxeioa: 74 images, root.json created

Processing complete:
  - Processed 1 directories with images
  - Generated 74 image metadata files
  - Each directory has its own root.json
  - Supported formats: .jpeg, .jpg, .png, .webp
🐘 Elephant Network CLI - Validate and Upload

Initializing     | 100% | 0/0 | Errors: 0 | Skipped: 0 | 0s | ETA: 0s
Processing Files | 0% | 0/1 | Errors: 0 | Skipped: 0 | 1s | ETA: NFs

## Step 9: Submitting Your Data to the Blockchain

### Submitting Your Data

After running the upload command in the notebook:

1. **Download your results file**
   - The notebook will generate `photos.csv`
   - This file contains your data hashes and IPFS CIDs
   - Download it to your computer

2. **Visit the Oracle Submission Portal**
   - Go to https://oracle.elephant.xyz/
   - Connect your MetaMask wallet when prompted
   - Upload your `photos.csv` file

3. **Submit transactions**
   - The portal will read your CSV and prepare transactions
   - Click "Submit to Contract" to begin
   - MetaMask will pop up for each data entry
   - Confirm each transaction (small gas fee applies)
   - Wait for confirmations between submissions

Once complete, your data is permanently recorded on the blockchain. You'll receive vMahout tokens as rewards after consensus is reached (when 3 different oracles submit matching data hashes).
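
Before downloading `photos.csv`, it can be useful to confirm that it has a header row and at least one data row. The sketch below makes no assumption about the column names the CLI writes; it reports whatever header it finds. The helper `summarize_csv` is hypothetical.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        rows = list(reader)
        return list(reader.fieldnames or []), len(rows)


csv_path = Path("photos.csv")
if csv_path.exists():
    columns, n_rows = summarize_csv(csv_path)
    print("Columns:", columns)
    print("Data rows:", n_rows)
else:
    print("photos.csv not found; run the upload cell first.")
```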

## Step 10: Set Up and Run AWS Rekognition

The system automatically sets up and runs Amazon Rekognition to analyze your property images:

- Connects to AWS Rekognition service for AI-powered image analysis
- Processes all images in your property folders automatically
- Extracts detailed information like room types, architectural features, and property characteristics

No action needed from you - the system handles everything automatically and will notify you when processing is complete.
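
For readers curious what a Rekognition-based categorization could look like, here is a minimal sketch. It is not the implementation used by `photo-categorizer`, which this notebook does not show. The `boto3` `detect_labels` call is the standard Rekognition API; the label-to-category mapping, the confidence threshold, and the function names are assumptions made for illustration.

```python
# Hypothetical mapping from Rekognition label names to room categories;
# the real photo-categorizer may use different labels and rules.
LABEL_TO_CATEGORY = {
    "Kitchen": "kitchen",
    "Bedroom": "bedroom",
    "Living Room": "living_room",
    "Garage": "garage",
    "Closet": "closet",
    "Pool": "pool",
    "Laundry": "laundry",
}


def categorize(labels, min_confidence=80.0):
    """Pick the category of the highest-confidence mapped label, else 'other'."""
    best = None
    for label in labels:
        category = LABEL_TO_CATEGORY.get(label["Name"])
        if category and label["Confidence"] >= min_confidence:
            if best is None or label["Confidence"] > best[1]:
                best = (category, label["Confidence"])
    return best[0] if best else "other"


def detect_labels(bucket, key, max_labels=20):
    """Call Rekognition DetectLabels on an image stored in S3."""
    import boto3  # imported lazily so categorize() can be tried without AWS

    client = boto3.client("rekognition")
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
    )
    return response["Labels"]


# Offline example with labels shaped like a DetectLabels response.
sample = [
    {"Name": "Indoors", "Confidence": 99.0},
    {"Name": "Kitchen", "Confidence": 93.5},
]
print(categorize(sample))  # prints "kitchen"
```

Keeping the mapping separate from the API call means the categorization rule can be tested without AWS credentials.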


In [53]:
!bucket-manager
!unzip-county-data
!upload-to-s3
!photo-categorizer


📊 COMPREHENSIVE CATEGORIZATION SUMMARY

🏠 TOTAL PROPERTIES PROCESSED: 1
🖼️  TOTAL IMAGES: 74
✅ TOTAL CATEGORIZED: 74
📈 SUCCESS RATE: 100.0%

📁 OVERALL CATEGORY BREAKDOWN:
   exterior: 30 images
   living_room: 14 images
   kitchen: 12 images
   other: 6 images
   bedroom: 6 images
   closet: 2 images
   garage: 2 images
   laundry: 1 images
   pool: 1 images

🏠 PROPERTY-BY-PROPERTY BREAKDOWN:
--------------------------------------------------------------------------------

📍 Property: 52434205310037080
   Address: Property 52434205310037080
   Total Images: 74
   Categorized: 74
   Success Rate: 100.0%
   Categories:
     • exterior: 30 images
     • living_room: 14 images
     • kitchen: 12 images
     • other: 6 images
     • bedroom: 6 images
     • closet: 2 images
     • garage: 2 images
     • laundry: 1 images
     • pool: 1 images
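
The totals in the summary above can be checked by hand: the per-category counts should add up to the number of categorized images, and the success rate is that sum divided by the total. A short verification using the numbers printed above:

```python
# Category counts copied from the summary output above.
category_counts = {
    "exterior": 30,
    "living_room": 14,
    "kitchen": 12,
    "other": 6,
    "bedroom": 6,
    "closet": 2,
    "garage": 2,
    "laundry": 1,
    "pool": 1,
}
total_images = 74

categorized = sum(category_counts.values())
success_rate = 100.0 * categorized / total_images
print(f"Categorized {categorized} of {total_images} images ({success_rate:.1f}%)")
# prints: Categorized 74 of 74 images (100.0%)
```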



## Step 11: Running AI to Extract Data from Images

The AI system now analyzes your property images to extract valuable metadata:

1. **Image Analysis**: AI examines each photo to identify rooms, features, and property details
2. **Data Extraction**: System pulls out structured information like room types, square footage estimates, architectural elements, and condition assessments

The process runs automatically across all your uploaded property images, generating comprehensive metadata reports for each parcel.



In [85]:
!ai-analyzer --local-folders --parallel-categories --all-properties
!property-summarizer --all-properties


PROPERTY SUMMARY: 52434205310037080

📋 LAYOUTS (13 total)
----------------------------------------
Space Types:
  • Laundry Room
  • Bedroom
  • Home Office
  • Dining Room
  • Pantry
  • Attached Garage
  • Living Room
  • Full Bathroom
  • Closet
  • Kitchen
  • Laundry Room: 
  • Bedroom: 
  • Home Office: 
  • Bedroom: 
  • Dining Room: 
  • Pantry: 
  • Attached Garage: 
  • Living Room: 
  • Full Bathroom: 
  • Closet: 
  • Full Bathroom: 
  • Kitchen: 
  • Full Bathroom: 

🏠 STRUCTURE
----------------------------------------
  No structure data found

🌳 LOT
----------------------------------------
  No lot data found

⚡ UTILITIES
----------------------------------------
  No utility data found

🔌 APPLIANCES (5 total)
----------------------------------------
Types: Oven, Microwave, Washing Machine, Refrigerator
  • Oven
  • Microwave
  • Washing Machine
  • Refrigerator


🎉 COMPLETED: Summarized 1 properties
  • 52434205310037080: 13 layouts, 5 appliances


## Step 12: Data Validation and Submission

The system validates extracted data and prepares it for final submission:

1. **Data Validation**: Reviews and verifies all extracted metadata for accuracy
2. **Submission Preparation**: Validated data is formatted and organized for CLI submission
3. **CLI Submission**: System automatically submits the processed data through the command line interface
4. **Fact Sheet Generation**: Creates comprehensive property fact sheets with all extracted information, images, and metadata

Final deliverables include validated property reports and detailed fact sheets ready for use.


In [91]:
!fix-schema-validation
!copy-all-data-for-submission
!copy-all-files-from-zip
!npx @elephant-xyz/cli@1.16.2 validate-and-upload submit-photo --output-csv submit-results.csv

🐘 Elephant Network CLI - Validate and Upload

Initializing     | 100% | 0/0 | Errors: 0 | Skipped: 0 | 0s | ETA: 0s
Fetching Schemas | 100% | 2/2 | Errors: 0 | Skipped: 0 | 0s | ETA: 0s
Processing Files | 0% | 0/2 | Errors: 0 | Skipped: 0 | 0s | ETA: NFs

## Step 13: Submitting Your Data to the Blockchain

### Submitting Your Data

After running the upload command in the notebook:

1. **Download your results file**
   - The notebook will generate `submit-results.csv`
   - This file contains your data hashes and IPFS CIDs
   - Download it to your computer

2. **Visit the Oracle Submission Portal**
   - Go to https://oracle.elephant.xyz/
   - Connect your MetaMask wallet when prompted
   - Upload your `submit-results.csv` file

3. **Submit transactions**
   - The portal will read your CSV and prepare transactions
   - Click "Submit to Contract" to begin
   - MetaMask will pop up for each data entry
   - Confirm each transaction (small gas fee applies)
   - Wait for confirmations between submissions

Once complete, your data is permanently recorded on the blockchain. You'll receive vMahout tokens as rewards after consensus is reached (when 3 different oracles submit matching data hashes).
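
The consensus rule stated above (three different oracles submitting matching data hashes) can be illustrated with a toy model. This is only a sketch of the rule as described in this notebook; the actual on-chain contract logic is not shown here and may differ. The function name and the sample addresses and hashes are invented for illustration.

```python
from collections import defaultdict

REQUIRED_ORACLES = 3  # threshold stated in the text above


def consensus_hash(submissions, required=REQUIRED_ORACLES):
    """Return the first data hash submitted by `required` distinct oracles.

    `submissions` is an iterable of (oracle_address, data_hash) pairs in
    submission order. Returns None if no hash reaches the threshold.
    """
    oracles_by_hash = defaultdict(set)
    for oracle, data_hash in submissions:
        oracles_by_hash[data_hash].add(oracle)
        if len(oracles_by_hash[data_hash]) >= required:
            return data_hash
    return None


# Invented sample data: a repeat submission by the same oracle does not count twice.
submissions = [
    ("0xA", "hash1"),
    ("0xB", "hash1"),
    ("0xA", "hash1"),
    ("0xC", "hash2"),
    ("0xC", "hash1"),
]
print(consensus_hash(submissions))  # prints "hash1"
```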

## Step 14: Package and Download Results

This step creates downloadable files with all your processed data. The system will:

1. **Create Download Package**: Automatically zip the `submit-photo` folder containing all fact sheets and processed images

2. **Download Results**: Two files will be made available for download

**Files to Download:**
- `submit-results.csv` - Structured data with all extracted property metadata
- `submit-photo.zip` - Complete package containing fact sheets and processed images

Your processed property data is now saved locally for use.

In [None]:
!zip -r submit-photo.zip submit-photo/ > /dev/null 2>&1
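
As an alternative to the shell command above, the same archive can be built from Python and handed to the browser with Colab's `files.download`. This is a sketch: `package_results` is a hypothetical helper, and the `google.colab` import only succeeds when running inside Colab, so the code falls back to a message elsewhere.

```python
import shutil
from pathlib import Path


def package_results(folder="submit-photo"):
    """Zip `folder` next to itself and return the archive path, or None if absent."""
    src = Path(folder)
    if not src.is_dir():
        return None
    # make_archive appends ".zip" to the base name it is given.
    archive = shutil.make_archive(
        str(src), "zip", root_dir=str(src.parent), base_dir=src.name
    )
    return Path(archive)


archive = package_results()
try:
    from google.colab import files  # only available inside Colab
except ImportError:
    files = None

if archive is None:
    print("submit-photo/ not found; run the earlier steps first.")
elif files is None:
    print(f"Created {archive}; download it manually outside Colab.")
else:
    files.download(str(archive))
    if Path("submit-results.csv").exists():
        files.download("submit-results.csv")
```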



## Step 15: Cleanup

This final, optional step cleans up the workspace. After you have downloaded your results, the cell below removes all temporary files and folders, including:

- `images` folder (uploaded property photos)
- `output` folder (processing files)
- `county-data` folder (temporary data)
- `submit-photo` folder (final results)
- `logs` folder (processing logs)
- `photo_data_group` folder (data uploaded in Step 8)
- Top-level `.csv`, `.zip`, `.log`, and `.txt` files, plus the `.env` file

**Important**: Make sure to download your results before the cleanup step, as all files will be permanently deleted from the workspace.
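
Since the deletion is permanent, a dry run that lists what would be removed can be a useful last check. The sketch below mirrors the folder names and file patterns used by the cleanup cell in this step but deletes nothing; `cleanup_targets` is a hypothetical helper.

```python
from pathlib import Path

# Folders and top-level file patterns targeted by the cleanup cell.
FOLDERS = ["images", "output", "county-data", "submit-photo", "logs", "photo_data_group"]
FILE_PATTERNS = ["*.csv", "*.zip", "*.log", "*.txt", ".env"]


def cleanup_targets(workdir="."):
    """List the folders and top-level files the cleanup would remove."""
    base = Path(workdir)
    targets = [base / name for name in FOLDERS if (base / name).is_dir()]
    for pattern in FILE_PATTERNS:
        targets.extend(p for p in base.glob(pattern) if p.is_file())
    return sorted(targets)


for path in cleanup_targets():
    kind = "folder" if path.is_dir() else "file"
    print(f"would remove {kind}: {path}")
```

If `submit-results.csv` or `submit-photo.zip` still appears in this list and has not been downloaded yet, download it before running the cleanup.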

In [92]:
!rm -rf images/ output/ county-data/ submit-photo/ logs/ photo_data_group/ > /dev/null 2>&1
!find . -maxdepth 1 -type f \( -name "*.csv" -o -name "*.zip" -o -name "*.log" -o -name "*.txt" -o -name ".env" \) -exec rm -f {} \;

!rm -rf /root/.local/bin/fact-sheet
!rm -rf fact-sheet-template/
!rm -rf /root/.elephant-fact-sheet

In [89]:
%%bash

set -e

# Colors (won’t display in Colab but kept for compatibility)
GREEN='\033[0;32m'
BLUE='\033[0;34m'
RED='\033[0;31m'
NC='\033[0m'

INSTALL_DIR="${HOME}/.elephant-fact-sheet"
BIN_DIR="${HOME}/.local/bin"

echo -e "${BLUE}🐘 Elephant Fact Sheet Template Installer${NC}"

# Check Node.js
if ! command -v node &> /dev/null; then
    echo -e "${RED}❌ Node.js is not installed${NC}"
    exit 1
fi

NODE_VERSION=$(node -v | cut -d'v' -f2 | cut -d'.' -f1)
if [ "$NODE_VERSION" -lt 18 ]; then
    echo -e "${RED}❌ Node.js 18+ required${NC}"
    exit 1
fi

echo -e "${GREEN}✓ Node.js $(node -v) and npm $(npm -v) detected${NC}"

# Cleanup
rm -rf "$INSTALL_DIR"
mkdir -p "$BIN_DIR"

# Clone and build
echo "$INSTALL_DIR"
git clone https://github.com/elephant-xyz/fact-sheet-template.git "$INSTALL_DIR"
cd "$INSTALL_DIR"
npm install
npm run build
npm link

# Symlink
ln -sf "$INSTALL_DIR/bin/fact-sheet.js" "$BIN_DIR/fact-sheet"
chmod +x "$INSTALL_DIR/bin/fact-sheet.js"

echo ""
echo -e "${GREEN}✅ Installed successfully in $INSTALL_DIR${NC}"
echo "To run it:"
echo "    node $INSTALL_DIR/bin/fact-sheet.js --help"


🐘 Elephant Fact Sheet Template Installer
✓ Node.js v20.19.0 and npm 10.8.2 detected
/root/.elephant-fact-sheet

> @elephant/fact-sheet@1.0.0 postinstall
> npm run build


> @elephant/fact-sheet@1.0.0 build
> tsc


> @elephant/fact-sheet@1.0.0 prepare
> npm run build


> @elephant/fact-sheet@1.0.0 build
> tsc


added 242 packages, and audited 243 packages in 12s

63 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities

> @elephant/fact-sheet@1.0.0 build
> tsc


up to date, audited 3 packages in 10s

found 0 vulnerabilities

✅ Installed successfully in /root/.elephant-fact-sheet
To run it:
    node /root/.elephant-fact-sheet/bin/fact-sheet.js --help


Cloning into '/root/.elephant-fact-sheet'...
npm warn deprecated inflight@1.0.6: This module is not supported, and leaks memory. Do not use it. Check out lru-cache if you want a good and tested way to coalesce async requests by a key value, which is much more comprehensive and powerful.
npm warn deprecated @humanwhocodes/config-array@0.13.0: Use @eslint/config-array instead
npm warn deprecated rimraf@3.0.2: Rimraf versions prior to v4 are no longer supported
npm warn deprecated @humanwhocodes/object-schema@2.0.3: Use @eslint/object-schema instead
npm warn deprecated glob@7.2.3: Glob versions prior to v9 are no longer supported
npm warn deprecated eslint@8.57.1: This version is no longer supported. Please see https://eslint.org/version-support for other options.


In [94]:
%%bash

# Elephant Fact Sheet Template Installer
# This script clones the repository, builds it, and sets up the command globally

set -e  # Exit on error

# Colors for output
GREEN='\033[0;32m'
BLUE='\033[0;34m'
RED='\033[0;31m'
NC='\033[0m' # No Color

# Default installation directory
INSTALL_DIR="${HOME}/.elephant-fact-sheet"
BIN_DIR="${HOME}/.local/bin"

echo -e "${BLUE}🐘 Elephant Fact Sheet Template Installer${NC}"
echo ""

# Check if Node.js is installed
if ! command -v node &> /dev/null; then
    echo -e "${RED}❌ Node.js is not installed. Please install Node.js 18+ first.${NC}"
    exit 1
fi

# Check Node.js version
NODE_VERSION=$(node -v | cut -d'v' -f2 | cut -d'.' -f1)
if [ "$NODE_VERSION" -lt 18 ]; then
    echo -e "${RED}❌ Node.js 18+ is required. Current version: $(node -v)${NC}"
    exit 1
fi

# Check if npm is installed
if ! command -v npm &> /dev/null; then
    echo -e "${RED}❌ npm is not installed. Please install npm first.${NC}"
    exit 1
fi

echo -e "${GREEN}✓ Node.js $(node -v) and npm $(npm -v) detected${NC}"
echo ""

# Remove existing installation if present
if [ -d "$INSTALL_DIR" ]; then
    echo "Removing existing installation..."
    rm -rf "$INSTALL_DIR"
fi

# Clone the repository
echo "Cloning repository..."
git clone https://github.com/elephant-xyz/fact-sheet-template.git "$INSTALL_DIR"

# Navigate to installation directory
cd "$INSTALL_DIR"

# Install dependencies
echo ""
echo "Installing dependencies..."
npm install

# Build the project
echo ""
echo "Building project..."
npm run build

# Create bin directory if it doesn't exist
mkdir -p "$BIN_DIR"

STATIC_TARGET_DIR="$BIN_DIR/assets/static"
mkdir -p "$STATIC_TARGET_DIR"
cp -r "$INSTALL_DIR/template/assets/static/"* "$STATIC_TARGET_DIR"

# Create symlink for global command
echo ""
echo "Setting up global command..."
ln -sf "$INSTALL_DIR/bin/fact-sheet.js" "$BIN_DIR/fact-sheet"

# Make the script executable
chmod +x "$INSTALL_DIR/bin/fact-sheet.js"

# Check if ~/.local/bin is in PATH
if [[ ":$PATH:" != *":$BIN_DIR:"* ]]; then
    echo ""
    echo -e "${BLUE}ℹ️  Add the following line to your shell configuration file (.bashrc, .zshrc, etc.):${NC}"
    echo ""
    echo "    export PATH=\"\$HOME/.local/bin:\$PATH\""
    echo ""
    echo "Then reload your shell configuration:"
    echo "    source ~/.bashrc  # or source ~/.zshrc"
    echo ""
else
    echo -e "${GREEN}✓ $BIN_DIR is already in your PATH${NC}"
fi

echo ""
echo -e "${GREEN}✅ Installation complete!${NC}"
echo ""
echo "You can now use the fact-sheet command:"
echo "    fact-sheet generate --input ./data --output ./websites"
echo ""
echo "For help:"
echo "    fact-sheet --help"
echo ""
echo "To uninstall, run:"
echo "    rm -rf $INSTALL_DIR"
echo "    rm $BIN_DIR/fact-sheet"

🐘 Elephant Fact Sheet Template Installer

✓ Node.js v20.19.0 and npm 10.8.2 detected

Removing existing installation...
Cloning repository...

Installing dependencies...

> @elephant/fact-sheet@1.0.0 postinstall
> npm run build


> @elephant/fact-sheet@1.0.0 build
> tsc


> @elephant/fact-sheet@1.0.0 prepare
> npm run build


> @elephant/fact-sheet@1.0.0 build
> tsc


added 242 packages, and audited 243 packages in 12s

63 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities

Building project...

> @elephant/fact-sheet@1.0.0 build
> tsc


Setting up global command...

ℹ️  Add the following line to your shell configuration file (.bashrc, .zshrc, etc.):

    export PATH="$HOME/.local/bin:$PATH"

Then reload your shell configuration:
    source ~/.bashrc  # or source ~/.zshrc


✅ Installation complete!

You can now use the fact-sheet command:
    fact-sheet generate --input ./data --output ./websites

For help:


Cloning into '/root/.elephant-fact-sheet'...
npm warn deprecated inflight@1.0.6: This module is not supported, and leaks memory. Do not use it. Check out lru-cache if you want a good and tested way to coalesce async requests by a key value, which is much more comprehensive and powerful.
npm warn deprecated @humanwhocodes/config-array@0.13.0: Use @eslint/config-array instead
npm warn deprecated rimraf@3.0.2: Rimraf versions prior to v4 are no longer supported
npm warn deprecated @humanwhocodes/object-schema@2.0.3: Use @eslint/object-schema instead
npm warn deprecated glob@7.2.3: Glob versions prior to v9 are no longer supported
npm warn deprecated eslint@8.57.1: This version is no longer supported. Please see https://eslint.org/version-support for other options.
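
The installer output above reports that `~/.local/bin` is not on the `PATH`. In a notebook, editing `.bashrc` has no effect on the running kernel, so one option is to extend `PATH` for the current process instead. This is a sketch: `prepend_to_path` is a hypothetical helper, and it assumes that later `!` commands inherit the kernel's environment.

```python
import os
from pathlib import Path


def prepend_to_path(directory, env=os.environ):
    """Put `directory` at the front of PATH unless it is already listed."""
    directory = str(directory)
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        env["PATH"] = os.pathsep.join([directory] + parts)
    return env["PATH"]


# Directory the installer used for the fact-sheet symlink.
prepend_to_path(Path.home() / ".local" / "bin")
print(os.environ["PATH"].split(os.pathsep)[0])
```

After this cell runs, `!fact-sheet --help` should resolve the symlink created by the installer, provided the installation itself succeeded.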
