# Wallet Agent Analysis Notebook

This notebook loads real wallet data, builds a prompt for an LLM (OpenAI GPT-4), and displays the analysis.

**Dependencies:**
- requests
- openai
- pydantic
- jupyter

You can install them with:
```python
!pip install openai requests pydantic
```

---

### 2. **Code**
```python
# Install dependencies if needed
# Uncomment the next line if running for the first time
# !pip install openai requests pydantic
```

---

### 3. **Markdown**
```markdown
## Set your OpenAI API key securely
Read the key from an environment variable or prompt for it at runtime; avoid hardcoding it in the notebook.
```

---

### 4. **Code**
```python
import os
import openai

# Option 1: Set via environment variable
openai.api_key = os.getenv('OPENAI_API_KEY')

# Option 2: Prompt for the key without echoing it (uncomment if needed)
# import getpass
# openai.api_key = getpass.getpass('Enter your OpenAI API key: ')
```

---

### 5. **Markdown**
```markdown
## Load agent input data
This should be generated by fetch_wallet_data.py for a real wallet.
```

---

### 6. **Code**
```python
import json
from pathlib import Path

# Look in the current directory first, then fall back to the parent directory
input_path = Path('agent_input.json')
if not input_path.exists():
    input_path = Path('../agent_input.json')
if not input_path.exists():
    raise FileNotFoundError('agent_input.json not found. Please run fetch_wallet_data.py first.')

with open(input_path, 'r') as f:
    agent_input = json.load(f)

# Show the loaded data
import pprint
pprint.pprint(agent_input)
```

---

### 7. **Markdown**
```markdown
## Edit the instruction (if needed)
You can change the instruction for the LLM here.
```

---

### 8. **Code**
```python
# Edit the instruction
agent_input['instruction'] = "Provide a comprehensive summary of this wallet's trading activity, performance, and any notable behavioral or risk patterns."
print('Instruction set to:', agent_input['instruction'])
```

---

### 9. **Markdown**
```markdown
## Build the LLM prompt
This cell formats the agent input for the LLM.
```

---

### 10. **Code**
```python
def build_prompt(agent_input):
    prompt = (
        f"Wallet Analysis Request:\n"
        f"Instruction: {agent_input['instruction']}\n"
        f"Summary: {json.dumps(agent_input['summary'], indent=2)}\n"
        f"PNL Overview: {json.dumps(agent_input['pnl_overview'], indent=2)}\n"
        f"Behavior: {json.dumps(agent_input['behavior'], indent=2)}\n"
        f"Token Performance: {json.dumps(agent_input['token_performance'], indent=2)}\n"
    )
    if agent_input.get('similarity'):
        prompt += f"Similarity: {json.dumps(agent_input['similarity'], indent=2)}\n"
    return prompt

prompt = build_prompt(agent_input)
print(prompt[:1000] + ('...\n' if len(prompt) > 1000 else ''))
```
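
`build_prompt` indexes each section directly, so an input file that lacks one of them raises `KeyError`. The following is a minimal sketch of a more tolerant variant, assuming it is acceptable to omit absent sections from the prompt. The name `build_prompt_safe` and the sample input are hypothetical and used for illustration only.

```python
import json

# Section labels and keys mirror the build_prompt cell above.
SECTIONS = [
    ("Summary", "summary"),
    ("PNL Overview", "pnl_overview"),
    ("Behavior", "behavior"),
    ("Token Performance", "token_performance"),
    ("Similarity", "similarity"),
]

def build_prompt_safe(agent_input):
    """Hypothetical variant of build_prompt that skips missing or empty sections."""
    lines = [
        "Wallet Analysis Request:",
        f"Instruction: {agent_input.get('instruction', '')}",
    ]
    for label, key in SECTIONS:
        value = agent_input.get(key)
        if value:  # omit sections that are absent or empty
            lines.append(f"{label}: {json.dumps(value, indent=2)}")
    return "\n".join(lines) + "\n"

# Made-up sample input, not real wallet data
sample = {"instruction": "Summarize.", "summary": {"trades": 3}}
sample_prompt = build_prompt_safe(sample)
print(sample_prompt)
```

Whether silently dropping a section is the right behavior depends on the analysis; raising an explicit error may be preferable when every section is required.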

---

### 11. **Markdown**
```markdown
## Call OpenAI and display the result
This cell sends the prompt to GPT-4 and prints the response.
```

---

### 12. **Code**
```python
# Requires openai>=1.0; the legacy openai.ChatCompletion interface was removed in that release
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a crypto wallet analysis expert."},
        {"role": "user", "content": prompt}
    ],
    max_tokens=800,
    temperature=0.7,
)
print(response.choices[0].message.content)
```

---

### 13. **Markdown**
```markdown
## Next steps and extension ideas
- Try different instructions and prompt formats.
- Add structured output parsing (e.g., ask LLM to return JSON).
- Analyze each section separately and then synthesize.
- Integrate with your dashboard or API.
```

---

**How to use:**  
- In VSCode or Jupyter, click “+ Code” or “+ Markdown” to add a new cell of the correct type.
- Paste the content above into each cell, in order.


In [None]:
# Install dependencies if needed
%pip install openai requests pydantic anthropic

## Set your OpenAI API key securely
Read the keys from environment variables or prompt for them at runtime; avoid hardcoding them in the notebook.

In [None]:
import os
import openai
from anthropic import Anthropic

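# Read API keys from the environment instead of hardcoding them in the notebook
openai_key = os.getenv("OPENAI_API_KEY")
anthropic_key = os.getenv("ANTHROPIC_API_KEY")

# Report only whether each key is present, never its value
print("✓ OpenAI API key set" if openai_key else "✗ OPENAI_API_KEY is not set")
print("✓ Anthropic API key set" if anthropic_key else "✗ ANTHROPIC_API_KEY is not set")

openai.api_key = openai_key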

## Load agent input data
This should be generated by fetch_wallet_data.py for a real wallet.

In [7]:
import json
import os
from pathlib import Path

# Load the wallet data (using the correct filename)
input_path = Path('agent_input_gake.json')
if not input_path.exists():
    input_path = Path('../agent_input_gake.json')  # fall back to the parent directory
if not input_path.exists():
    raise FileNotFoundError('agent_input_gake.json not found. Please run fetch_wallet_data.py first.')

with open(input_path, 'r') as f:
    agent_input = json.load(f)

# Show the loaded data
import pprint
print("Loaded wallet data:")
pprint.pprint(agent_input)

Loaded wallet data:
{'behavior': {'active_trading_periods': {'activity_focus_score': 28.26486817903127,
                                         'hourly_trade_counts': {'0': 81,
                                                                 '1': 123,
                                                                 '10': 106,
                                                                 '11': 66,
                                                                 '12': 51,
                                                                 '13': 40,
                                                                 '14': 58,
                                                                 '15': 81,
                                                                 '16': 112,
                                                                 '17': 47,
                                                                 '18': 50,
                                                                 '19':

## Load the 2 prompt templates
Load the 2 different wallet analysis prompts for testing.

In [None]:
# Load only the two good prompt files
prompt_files = {
    'gemini_v2': '../wallet_analysis_prompt_v2_gemini.md',
    'sonnet_4': '../wallet_analysis_prompt_v2_sonnet_4.md'
}

prompts = {}
for name, file_path in prompt_files.items():
    with open(file_path, 'r', encoding='utf-8') as f:
        prompts[name] = f.read()
    print(f"✓ Loaded {name}")

print(f"\n✅ Loaded {len(prompts)} prompts ready for testing with GPT-4 and Claude")

## Build the LLM prompt
This cell formats the agent input for the LLM.

In [9]:
# Prepare wallet data as JSON for the prompts
wallet_json = json.dumps(agent_input, indent=2)
print(f"Wallet data prepared: {len(wallet_json)} characters")
print("Ready to test all prompts with this data")

Wallet data prepared: 13285 characters
Ready to test all prompts with this data
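
Before sending the data, it can help to check roughly whether the prompt template, the wallet JSON, and the reply allowance fit the model's context window. The sketch below uses a characters-per-token ratio of 4, which is an assumed rule of thumb and not a measurement; JSON often tokenizes less efficiently than prose. The function names and the `context_budget` value are hypothetical.

```python
import math

CHARS_PER_TOKEN = 4  # assumed rule of thumb, not a measured value

def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Coarse token estimate: character count divided by an assumed ratio, rounded up."""
    return math.ceil(len(text) / chars_per_token)

def fits_budget(prompt_text, data_text, max_output_tokens, context_budget):
    """Return (fits, estimated_total) for prompt + wallet JSON + reply allowance."""
    needed = estimate_tokens(prompt_text) + estimate_tokens(data_text) + max_output_tokens
    return needed <= context_budget, needed

# Placeholder strings; 13285 matches the character count printed above,
# while the 2000-character prompt and the 8000-token budget are invented.
ok, needed = fits_budget("p" * 2000, "d" * 13285, max_output_tokens=1500, context_budget=8000)
print(ok, needed)
```

For exact counts, a tokenizer matching the target model would be needed; this estimate is only meant to flag inputs that are clearly too large.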


## Call OpenAI and Anthropic and display the results
This cell runs both prompts against both models (four combinations) and displays each response.

In [None]:
from openai import OpenAI
from anthropic import Anthropic
from datetime import datetime  # needed for the timestamps recorded below
import time

# Initialize clients
openai_client = OpenAI(api_key=openai_key)
anthropic_client = Anthropic(api_key=anthropic_key)

# Test matrix: 2 prompts × 2 models = 4 combinations
models = {
    'gpt4': {'client': openai_client, 'model': 'gpt-4', 'type': 'openai'},
    'claude': {'client': anthropic_client, 'model': 'claude-3-5-sonnet-20241022', 'type': 'anthropic'}
}

# Collect all results for assessment
analysis_results = {}

for prompt_name, prompt_text in prompts.items():
    for model_name, model_config in models.items():
        test_key = f"{prompt_name}_{model_name}"
        print(f"\n{'='*60}")
        print(f"TESTING: {prompt_name.upper()} + {model_name.upper()}")
        print(f"{'='*60}")
        
        # Prepare the full prompt with wallet data
        if "<PASTE WALLET JSON OBJECT HERE>" in prompt_text:
            full_prompt = prompt_text.replace("<PASTE WALLET JSON OBJECT HERE>", wallet_json)
        elif "<PASTE ONE OR MORE WALLET JSON OBJECTS>" in prompt_text:
            full_prompt = prompt_text.replace("<PASTE ONE OR MORE WALLET JSON OBJECTS>", wallet_json)
        else:
            full_prompt = f"{prompt_text}\n\nHere is the wallet data:\n```json\n{wallet_json}\n```"
        
        try:
            if model_config['type'] == 'openai':
                response = model_config['client'].chat.completions.create(
                    model=model_config['model'],
                    messages=[
                        {"role": "system", "content": "You are an expert cryptocurrency analyst."},
                        {"role": "user", "content": full_prompt}
                    ],
                    max_tokens=1500,
                    temperature=0.7,
                )
                content = response.choices[0].message.content
                tokens = response.usage.total_tokens
                
            else:  # Anthropic
                response = model_config['client'].messages.create(
                    model=model_config['model'],
                    max_tokens=1500,
                    temperature=0.7,
                    system="You are an expert cryptocurrency analyst.",
                    messages=[{"role": "user", "content": full_prompt}]
                )
                content = response.content[0].text
                tokens = response.usage.input_tokens + response.usage.output_tokens
            
            # Store result for assessment
            analysis_results[test_key] = {
                'prompt_name': prompt_name,
                'model_name': model_name,
                'content': content,
                'tokens': tokens,
                'timestamp': datetime.now().isoformat()
            }
            
            print(content)
            print(f"\n💰 Tokens used: {tokens}")
            
        except Exception as e:
            print(f"❌ Error: {str(e)}")
            analysis_results[test_key] = {
                'prompt_name': prompt_name,
                'model_name': model_name,
                'content': f"ERROR: {str(e)}",
                'tokens': 0,
                'timestamp': datetime.now().isoformat()
            }
        
        time.sleep(2)  # Slightly longer pause between calls

print(f"\n✅ Completed {len(analysis_results)} analyses - ready for quality assessment")
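
The loop above records a failed call as an error and moves on after a fixed two-second pause. If transient failures such as rate limits or timeouts are common, retrying with exponential backoff is one alternative. This is a minimal sketch: `call_with_retry` is a hypothetical helper, it retries on any exception for simplicity, and the `sleep` parameter exists only so the demo runs without actually waiting.

```python
import time

def call_with_retry(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**i and retry, re-raising the last error."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** i)

# Demo with a stand-in function that fails twice before succeeding
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

delays = []  # record the requested delays instead of sleeping
result = call_with_retry(flaky, attempts=3, base_delay=0.5, sleep=delays.append)
print(result, delays)
```

In the loop, each API call could be wrapped in a zero-argument `lambda` and passed as `fn`. In practice it is usually better to retry only on the specific error types the client library raises for transient conditions.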

## Next steps and extension ideas
- Compare which prompt gives the most actionable insights
- Try different instructions and prompt formats
- Add structured output parsing (e.g., ask LLM to return JSON)
- Analyze each section separately and then synthesize
- Integrate with your dashboard or API
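
For the structured-output idea in the list above, one possible approach is to ask the model to reply with a JSON object and then extract it defensively, since replies often wrap JSON in a code fence or surround it with prose. The helper `extract_json` and the sample reply below are hypothetical; this is a sketch, not a complete parser.

```python
import json
import re

FENCE = "`" * 3  # markdown code-fence delimiter, built this way to keep the example readable

def extract_json(text):
    """Return the first JSON object found in an LLM reply, or None if none parses."""
    # Prefer a fenced block (optionally tagged json) if the model produced one
    fenced = re.search(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, text, re.DOTALL)
    if fenced:
        candidate = fenced.group(1)
    else:
        # Otherwise take everything from the first "{" to the last "}"
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            return None
        candidate = text[start:end + 1]
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None

# Made-up reply for illustration
reply = "Here is the result:\n" + FENCE + 'json\n{"risk": "high", "win_rate": 0.42}\n' + FENCE
parsed = extract_json(reply)
print(parsed)
```

Returning `None` instead of raising lets the calling loop record an unparseable reply as a failed analysis and continue.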

## Automated Quality Assessment
Automatically evaluate and score the quality of each analysis output.

In [None]:
import re
import json
from datetime import datetime
from collections import Counter

def assess_analysis_quality(analysis_text, wallet_data):
    """
    Automatically assess the quality of wallet analysis output
    Returns a comprehensive quality score and breakdown
    """
    scores = {}
    
    # 1. DATA INTEGRATION SCORE - How well does it use the actual wallet data?
    key_metrics = [
        'win_rate', 'realized_pnl', 'unrealized_pnl', 'total_trade_count',
        'unique_tokens_traded', 'average_transaction_value', 'session_count',
        'hourly_trade_counts', 'days_active', 'reentry_rate'
    ]
    
    data_usage = sum(1 for metric in key_metrics if metric.lower().replace('_', ' ') in analysis_text.lower())
    scores['data_integration'] = min(data_usage / len(key_metrics) * 100, 100)
    
    # 2. INSIGHT DEPTH SCORE - Beyond just stating numbers
    insight_keywords = [
        'strategy', 'behavior', 'pattern', 'timing', 'conviction', 'discipline',
        'risk management', 'edge', 'approach', 'psychology', 'methodology'
    ]
    
    insight_count = sum(1 for keyword in insight_keywords if keyword in analysis_text.lower())
    scores['insight_depth'] = min(insight_count / len(insight_keywords) * 100, 100)
    
    # 3. ACTIONABILITY SCORE - Practical recommendations
    action_patterns = [
        r'should (consider|focus|avoid|implement|diversify|leverage)',
        r'recommend[s]?.*to',
        r'(next steps?|moving forward|going forward)',
        r'(opportunity|risk).*to',
        r'could (improve|enhance|benefit)'
    ]
    
    actionable_items = sum(1 for pattern in action_patterns 
                          if re.search(pattern, analysis_text.lower()))
    scores['actionability'] = min(actionable_items * 25, 100)
    
    # 4. SPECIFICITY SCORE - Concrete vs generic analysis
    # Check for specific token mentions, time periods, percentages
    specific_tokens = len(re.findall(r'[A-Z]{2,10}(?:\s+[A-Za-z]+)*', analysis_text))
    specific_numbers = len(re.findall(r'\d+\.?\d*%|\d+\.?\d*\s*(SOL|hours?|days?)', analysis_text))
    time_references = len(re.findall(r'(UTC|morning|evening|hours?|0-4|Asian|European|American)', analysis_text))
    
    specificity = (specific_tokens * 2 + specific_numbers * 3 + time_references * 5)
    scores['specificity'] = min(specificity, 100)
    
    # 5. CLASSIFICATION ACCURACY - Challenges automated labels appropriately
    confidence_mentions = len(re.findall(r'confidence.*score', analysis_text.lower()))
    correction_patterns = [
        'corrected classification', 'reclassified', 'incorrect.*label',
        'challenge.*classification', 'automated.*label.*wrong'
    ]
    corrections = sum(1 for pattern in correction_patterns 
                     if re.search(pattern, analysis_text.lower()))
    
    scores['classification_accuracy'] = min((confidence_mentions * 20 + corrections * 40), 100)
    
    # 6. READABILITY SCORE - Natural vs robotic language  
    # Penalize excessive technical jargon
    technical_terms = [
        'percent_of_value_in_current_holdings', 'buy_sell_symmetry',
        'trading_time_distribution', 'averageTransactionValueSol'
    ]
    
    jargon_penalty = sum(5 for term in technical_terms if term in analysis_text)
    
    # Reward natural language patterns
    natural_patterns = [
        r'this trader (appears|seems|demonstrates|shows)',
        r'(suggests|indicates|reveals) that',
        r'what.*(tells us|shows|means)',
        r'in other words',
        r'essentially'
    ]
    
    natural_language = sum(1 for pattern in natural_patterns 
                          if re.search(pattern, analysis_text.lower()))
    
    # Clamp to 0-100 so readability stays on the same scale as the other scores
    scores['readability'] = min(max(100 - jargon_penalty + natural_language * 10, 0), 100)
    
    # 7. COMPREHENSIVE SCORE - Covers multiple dimensions
    analysis_sections = [
        'performance', 'risk', 'behavior', 'strategy', 'recommendation',
        'strength', 'vulnerability', 'timing', 'portfolio'
    ]
    
    coverage = sum(1 for section in analysis_sections 
                  if section in analysis_text.lower())
    scores['comprehensiveness'] = min(coverage / len(analysis_sections) * 100, 100)
    
    # OVERALL QUALITY SCORE (weighted average)
    weights = {
        'data_integration': 0.20,
        'insight_depth': 0.20,
        'actionability': 0.15,
        'specificity': 0.15,
        'classification_accuracy': 0.10,
        'readability': 0.10,
        'comprehensiveness': 0.10
    }
    
    overall_score = sum(scores[metric] * weight for metric, weight in weights.items())
    
    return {
        'overall_score': round(overall_score, 1),
        'breakdown': {k: round(v, 1) for k, v in scores.items()},
        'analysis_length': len(analysis_text),
        'timestamp': datetime.now().isoformat()
    }

# Test the assessment function (we'll use this after running the analyses)
print("✅ Quality assessment function ready")
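
To make the weighting concrete, the overall score is the sum of each dimension's score multiplied by its weight, and the weights sum to 1.0. The worked example below copies the weights from `assess_analysis_quality`; the per-dimension scores are invented for illustration and do not come from any real run.

```python
import math

# Weights copied from assess_analysis_quality above
weights = {
    'data_integration': 0.20,
    'insight_depth': 0.20,
    'actionability': 0.15,
    'specificity': 0.15,
    'classification_accuracy': 0.10,
    'readability': 0.10,
    'comprehensiveness': 0.10,
}

# Hypothetical per-dimension scores (0-100), made up for this example
example_scores = {
    'data_integration': 50,
    'insight_depth': 60,
    'actionability': 75,
    'specificity': 100,
    'classification_accuracy': 20,
    'readability': 100,
    'comprehensiveness': 80,
}

# Points each dimension contributes to the overall score
contributions = {name: example_scores[name] * w for name, w in weights.items()}
overall = sum(contributions.values())

for name, points in contributions.items():
    print(f"{name:<24} {points:6.2f}")
print(f"{'overall':<24} {overall:6.2f}")
```

With these made-up inputs the overall score comes to roughly 68.25. A low `classification_accuracy` costs at most 10 points, whereas `data_integration` and `insight_depth` can each move the total by up to 20.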

In [None]:
# Run quality assessment on all analyses
quality_scores = {}

print("="*70)
print("AUTOMATED QUALITY ASSESSMENT RESULTS")
print("="*70)

for test_key, result in analysis_results.items():
    if not result['content'].startswith('ERROR:'):
        score = assess_analysis_quality(result['content'], agent_input)
        quality_scores[test_key] = score
        
        print(f"\n🎯 {test_key.upper()}")
        print(f"   Overall Score: {score['overall_score']}/100")
        print(f"   📊 Data Integration: {score['breakdown']['data_integration']}")
        print(f"   🧠 Insight Depth: {score['breakdown']['insight_depth']}")
        print(f"   ⚡ Actionability: {score['breakdown']['actionability']}")
        print(f"   🎯 Specificity: {score['breakdown']['specificity']}")
        print(f"   ✅ Classification Accuracy: {score['breakdown']['classification_accuracy']}")
        print(f"   📖 Readability: {score['breakdown']['readability']}")
        print(f"   📋 Comprehensiveness: {score['breakdown']['comprehensiveness']}")
        print(f"   📝 Length: {score['analysis_length']} chars")

# Find the best performing combination
if quality_scores:
    best_combo = max(quality_scores.keys(), key=lambda k: quality_scores[k]['overall_score'])
    best_score = quality_scores[best_combo]['overall_score']
    
    print(f"\n🏆 WINNER: {best_combo.upper()} with {best_score}/100")
    
    # Save detailed results to file
    detailed_results = {
        'wallet_data': agent_input,
        'analysis_results': analysis_results,
        'quality_scores': quality_scores,
        'best_combination': {
            'name': best_combo,
            'score': best_score,
            'content': analysis_results[best_combo]['content']
        },
        'timestamp': datetime.now().isoformat()
    }
    
    # Save to timestamped file
    filename = f"wallet_analysis_comparison_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(detailed_results, f, indent=2, ensure_ascii=False)
    
    print(f"\n💾 Detailed results saved to: {filename}")
else:
    print("\n❌ No successful analyses to assess")
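
The cell above reports only the winner. Since the scores are keyword heuristics, it may be worth viewing the full ranking and the gap to the best score before drawing conclusions. The sketch below assumes a dict shaped like `quality_scores`; `rank_combinations` is a hypothetical helper and the demo numbers are invented.

```python
def rank_combinations(quality_scores):
    """Return (name, overall_score, gap_to_best) tuples, best first."""
    ranked = sorted(
        quality_scores.items(),
        key=lambda item: item[1]["overall_score"],
        reverse=True,
    )
    if not ranked:
        return []
    best = ranked[0][1]["overall_score"]
    return [
        (name, score["overall_score"], round(best - score["overall_score"], 1))
        for name, score in ranked
    ]

# Invented scores shaped like the quality_scores dict above
demo_scores = {
    "gemini_v2_gpt4": {"overall_score": 61.5},
    "gemini_v2_claude": {"overall_score": 70.0},
    "sonnet_4_gpt4": {"overall_score": 58.0},
    "sonnet_4_claude": {"overall_score": 68.5},
}

ranking = rank_combinations(demo_scores)
for name, score, gap in ranking:
    print(f"{name:<20} score {score:>5}  gap {gap}")
```

A small gap between the top entries suggests the automated ranking alone is not enough to choose between them, and the outputs should be read side by side.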