# IPC Query Tutorial

This notebook demonstrates how to use the refactored `ipc_query.py` module to search for patents by International Patent Classification (IPC) **subclasses** through the EPO OPS API.

## Important Note

**This module only supports IPC subclasses (4-character codes like A61K, B66B, H01L).** Wildcards (*) are not supported due to EPO OPS API restrictions.

## Overview

The IPC Query module provides a clean, object-oriented interface for:
- Searching patents by IPC subclasses
- Extracting structured results
- Displaying formatted output

## Valid IPC Subclass Examples
- `A61K` - Medical preparations
- `B66B` - Elevators, lifts, escalators
- `H01L` - Semiconductor devices
- `G06F` - Digital data processing
- `C07D` - Heterocyclic compounds

## Setup

In [ ]:
# Imports assumed by the cells below
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from ipc_query import IPCQuery, query_ipc
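The module needs EPO OPS credentials, which the notebook expects in a `.env` file as `OPS_KEY` and `OPS_SECRET`. Before running any queries, it can help to confirm that both values are visible to the Python process. The sketch below assumes the values end up in the process environment; how `ipc_query.py` actually loads them is not shown in this notebook, and `missing_ops_credentials` is a hypothetical helper, not part of the module.

```python
import os
from typing import Mapping

# Variable names taken from this notebook; the loading mechanism is an assumption.
REQUIRED_VARS = ("OPS_KEY", "OPS_SECRET")


def missing_ops_credentials(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_ops_credentials()
if missing:
    print(f"Missing credentials: {', '.join(missing)}")
else:
    print("OPS credentials found")
```

Passing the environment as a mapping keeps the check easy to exercise with a plain dictionary instead of the real environment.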

## Basic Usage

### Simple IPC Query

The simplest way to query IPC codes is using the convenience function:

In [ ]:
# Example 1: Query medical patents (A61K - preparations for medical, dental, or toilet purposes)
result = query_ipc("A61K")  # No wildcards needed

### Using the IPCQuery Class

For more control, you can use the `IPCQuery` class directly:

In [ ]:
# Create an IPCQuery instance
ipc_query = IPCQuery()

# Search for elevator/lift patents (B66B - elevators, lifts, escalators)
elevator_result = ipc_query.search("B66B", verbose=True)

## Working with Results

The `IPCQueryResult` object provides structured access to search results:

In [ ]:
# Access result properties
print(f"Total results: {elevator_result.total_results}")
print(f"Response time: {elevator_result.response_time} seconds")
print(f"Number of patents retrieved: {len(elevator_result.patents)}")

# Display patents with custom limit
elevator_result.display_patents(limit=5)

## Data Analysis Examples

### Convert Results to DataFrame

For analysis, convert patent results to a pandas DataFrame:

In [ ]:
# Convert patents to DataFrame
df = pd.DataFrame(elevator_result.patents)
print("Patent DataFrame:")
print(df.head())

# Basic statistics
print("\nCountry distribution:")
print(df['country'].value_counts())

## Error Handling and IPC Subclass Validation

The refactored code includes validation for IPC subclasses. IPC subclasses follow a specific format:
- Start with a letter A-H
- Followed by 2 digits
- End with exactly one letter
- Total length: 4 characters

Examples of valid IPC subclasses:
- `A61K` (medical preparations)
- `B66B` (elevators, lifts, escalators)
- `H01L` (semiconductor devices)
- `G06F` (digital data processing)
- `C07D` (heterocyclic compounds)

**Note:** Wildcards (*) are removed automatically; wildcard searches are not supported.

Let's see the validation in action:

In [ ]:
# Test IPC subclass validation
test_codes = [
    "A61K",     # Valid: medical preparations
    "B66B",     # Valid: elevators, lifts, escalators
    "H01L",     # Valid: semiconductor devices
    "G06F",     # Valid: digital data processing
    "A61K*",    # Valid: wildcard will be removed
    "A61",      # Invalid: too short
    "INVALID",  # Invalid: doesn't follow IPC format
    "Z99Z",     # Invalid: Z is not a valid IPC section
]

ipc_query = IPCQuery()

for code in test_codes:
    try:
        print(f"Testing {code}: ", end="")
        result = ipc_query.search(code, verbose=False)
        print(f"✓ Valid - Found {result.total_results} results")
    except ValueError as e:
        print(f"✗ Invalid - {e}")
    except Exception as e:
        print(f"⚠ Error - {e}")
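The format rules described in this section (section letter A-H, two digits, one closing letter, wildcards stripped) can be sketched as a small regular-expression check that runs without network access. This is only an illustration of the rules, not the module's actual implementation: `normalize_ipc_subclass` is a hypothetical name, and trimming whitespace and upper-casing the input are assumptions added here.

```python
import re

# Section letter A-H, two digits, one letter: four characters in total.
_SUBCLASS_RE = re.compile(r"[A-H][0-9]{2}[A-Z]")


def normalize_ipc_subclass(code: str) -> str:
    """Strip trailing wildcards, then validate the remaining 4-character subclass.

    Whitespace trimming and upper-casing are assumptions made for this sketch.
    """
    cleaned = code.strip().upper().rstrip("*")
    if not _SUBCLASS_RE.fullmatch(cleaned):
        raise ValueError(f"Not a valid IPC subclass: {code!r}")
    return cleaned


for candidate in ["A61K", "A61K*", "A61", "INVALID", "Z99Z"]:
    try:
        print(f"{candidate}: valid as {normalize_ipc_subclass(candidate)}")
    except ValueError as err:
        print(f"{candidate}: {err}")
```

Using `fullmatch` rather than `match` ensures that longer strings such as `A61K31` are rejected instead of being accepted on their first four characters.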

## Comparing Multiple IPC Subclasses

Let's compare different technology areas:

In [ ]:
# Define IPC subclasses for different technology areas
ipc_subclasses = {
    "Medical/Pharma": "A61K",
    "Elevators/Lifts": "B66B",
    "Semiconductors": "H01L",
    "Computing": "G06F",
    "Chemistry": "C07D"
}

# Query each IPC subclass
comparison_results = {}

for tech_area, ipc_subclass in ipc_subclasses.items():
    print(f"\nQuerying {tech_area} ({ipc_subclass})...")
    result = ipc_query.search(ipc_subclass, verbose=False)
    comparison_results[tech_area] = {
        'total_results': result.total_results,
        'response_time': result.response_time,
        'patents_retrieved': len(result.patents)
    }

# Convert to DataFrame for easy visualization
comparison_df = pd.DataFrame(comparison_results).T
print("\nComparison Results:")
print(comparison_df)

Pass `verbose=False` to suppress console output and process the results programmatically:

In [ ]:
# Silent query (no console output)
silent_result = query_ipc("A61P", verbose=False)  # Specific therapeutic activity

# Process results programmatically
if silent_result.patents:
    print(f"Found {len(silent_result.patents)} patents")
    print(f"Query took {silent_result.response_time} seconds")

    # Extract unique countries
    countries = set(patent['country'] for patent in silent_result.patents)
    print(f"Countries represented: {', '.join(sorted(countries))}")
else:
    print("No patents found")

In [ ]:
# Create subplots for comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Convert total_results to numeric; non-numeric values such as 'unbekannt' (German for "unknown") become NaN
comparison_df['total_results_numeric'] = pd.to_numeric(
    comparison_df['total_results'], 
    errors='coerce'
)

# Plot 1: Total Results
valid_totals = comparison_df.dropna(subset=['total_results_numeric'])
if not valid_totals.empty:
    sns.barplot(data=valid_totals.reset_index(), 
                x='total_results_numeric', 
                y='index', 
                ax=ax1)
    ax1.set_title('Total Patent Results by Technology Area')
    ax1.set_xlabel('Number of Patents')
    ax1.set_ylabel('Technology Area')

# Plot 2: Response Times
sns.barplot(data=comparison_df.reset_index(), 
            x='response_time', 
            y='index', 
            ax=ax2)
ax2.set_title('API Response Times by Technology Area')
ax2.set_xlabel('Response Time (seconds)')
ax2.set_ylabel('Technology Area')

plt.tight_layout()
plt.show()
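The numeric conversion in the plotting cell relies on `pd.to_numeric(..., errors='coerce')`, which replaces anything that cannot be parsed as a number with `NaN`. A minimal sketch of that behavior in isolation follows; the totals are made-up values for illustration only, not real query results.

```python
import pandas as pd

# Made-up totals for illustration; 'unbekannt' stands in for an unknown count.
raw_totals = pd.Series([1200, "unbekannt", "350"], index=["A61K", "B66B", "H01L"])

# errors="coerce" turns values that cannot be parsed as numbers into NaN.
numeric_totals = pd.to_numeric(raw_totals, errors="coerce")

# Drop the NaN rows before plotting, mirroring the dropna() call in the cell above.
plottable = numeric_totals.dropna()
print(plottable)
```

Coercing first and dropping afterwards keeps the original DataFrame intact, so rows with unknown totals can still appear in the response-time plot.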

An invalid subclass raises a `ValueError`, which can be caught separately from other errors:

In [ ]:
# Example of handling potential errors
try:
    # Try an invalid IPC subclass
    invalid_result = ipc_query.search("INVALID")
    print(f"Unexpected success: {invalid_result.total_results} results")
except ValueError as e:
    print(f"Expected validation error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

In [ ]:
# Create a bar plot of countries
plt.figure(figsize=(10, 6))
country_counts = df['country'].value_counts()
sns.barplot(x=country_counts.values, y=country_counts.index)
plt.title('Patent Distribution by Country - IPC Subclass: B66B')
plt.xlabel('Number of Patents')
plt.ylabel('Country')
plt.tight_layout()
plt.show()

## Summary

The refactored `ipc_query.py` module provides:

1. **Clean Architecture**: Separate classes for queries and results
2. **Type Safety**: Full type hints for better development experience
3. **Flexibility**: Both simple function calls and detailed class-based usage
4. **IPC Subclass Support**: Validates and processes 4-character IPC subclasses only
5. **No Wildcards**: Follows EPO OPS API restrictions by removing wildcards
6. **Error Handling**: Robust handling of API errors and validation issues
7. **Data Integration**: Easy conversion to pandas DataFrames for analysis

### Key Classes and Functions:
- `IPCQuery`: Main class for performing searches
- `IPCQueryResult`: Container for search results with display methods
- `query_ipc()`: Convenience function for simple queries

### Supported IPC Subclasses (Examples):
- `A61K`: Medical/pharmaceutical preparations
- `B66B`: Elevators, lifts, escalators
- `H01L`: Semiconductor devices
- `G06F`: Digital data processing
- `C07D`: Heterocyclic compounds

### Important Limitations:
- **Only 4-character IPC subclasses supported** (e.g., A61K, not A61K*)
- **No wildcards allowed** due to EPO OPS API restrictions
- **Authentication required** via .env file with OPS_KEY and OPS_SECRET