# Patent Applicants and Technology Distribution in Germany by Landkreis

This notebook analyzes patent applicants and technology distributions across Germany's NUTS Level 3 regions (Landkreise). 

Using SQL queries with the PATSTAT database, the notebook demonstrates how to:
1. Extract patent data at federal state and district levels.
2. Map NUTS codes to region names.
3. Add CPC subclass titles so that technology codes are human-readable.
4. Visualize the results interactively with **Pygwalker**.

The ultimate goal is to refactor the process into a clean, modular Python class to present it at the Patent Knowledge Forum 2024.

## Step 1: Setup and Import Libraries

This step imports all the necessary libraries for:
- Querying the PATSTAT database.
- Handling data with Pandas.
- Visualizing data using Pygwalker.
- Parsing XML to extract CPC subclass titles.

### Key Libraries
- **Pandas**: For data manipulation.
- **SQLAlchemy**: To handle database queries.
- **Pygwalker**: For interactive visualization.
- **lxml**: For XML parsing (to extract CPC subclass titles).
- **Geopandas**: Optional for geographical mapping.


In [None]:
# Import the time library for measuring SQL execution time
import time

# Import Geopandas for mapping if needed later
import geopandas as gpd

# Import pandas for DataFrame handling
import pandas as pd

# Import the pygwalker library for visualization
import pygwalker as pyg

# Import the EPO library module for PATSTAT
from epo.tipdata.patstat import PatstatClient

# Import lxml for parsing the IPC XML scheme (subclass titles)
from lxml import etree as ET

# Import sql library for easy sql execution
from sqlalchemy import create_engine, func
from sqlalchemy.sql import literal_column

The PATSTAT client is instantiated to connect to the test or production environment.

In [None]:
# Instantiate the client with the reduced data set (TEST) or the full data set (PROD)
#patstat = PatstatClient(env="TEST")
patstat = PatstatClient(env='PROD')

# Instantiate the ORM
db = patstat.orm()

# import all the tables we need
from epo.tipdata.patstat.database.models import (
    TLS201_APPLN,
    TLS202_APPLN_TITLE,
    TLS206_PERSON,
    TLS207_PERS_APPLN,
    TLS224_APPLN_CPC,
    TLS231_INPADOC_LEGAL_EVENT,
)

## Step 2: Develop Initial SQL Queries

### Test Query 1: Granted Applications Filed at EPO in 2010
This query retrieves a list of granted applications filed at the European Patent Office (EPO) in the year 2010. 

It showcases:
- Filtering by filing year.
- Filtering by application authority (`EP` for European Patent).
- Retrieving only granted applications.

The goal is to ensure the PATSTAT connection and ORM setup are working correctly.

In [None]:
# Test query 1

# Start the timer
start_time = time.time()

q = db.query(
    TLS201_APPLN.appln_id,
    TLS201_APPLN.appln_auth,
    TLS201_APPLN.appln_nr,
    TLS201_APPLN.appln_kind,
    TLS201_APPLN.appln_filing_date,
).filter(
    TLS201_APPLN.appln_filing_year == 2010,
    TLS201_APPLN.appln_auth == "EP",
    TLS201_APPLN.granted == "Y",
)

df = patstat.df(q)

# Stop the timer
end_time = time.time()

# Calculate and print the execution time
execution_time = end_time - start_time
print(f"Query execution time: {execution_time:.2f} seconds")

# Display the first few rows of the DataFrame
df
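
The start/stop timer pattern above is repeated in every query cell. As a minimal sketch, it could be wrapped in a small context manager; `timed` is a hypothetical helper name that is not part of the notebook or of any PATSTAT library, and the summation below only stands in for a query call such as `patstat.df(q)`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label="Query"):
    """Print the wall-clock time spent inside the `with` block (hypothetical helper)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label} execution time: {elapsed:.2f} seconds")


# Stand-in workload; in the notebook this would be the query execution
with timed("Demo"):
    total = sum(range(1_000_000))
```

Using `try`/`finally` means the elapsed time is still printed when the wrapped code raises an exception.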

### Test Query 2: Hitlist of Chinese Applicants at EPO

This query creates a hitlist of Chinese applicants who filed patents at the EPO. It demonstrates:
- Joining tables to link applicants with their filings.
- Filtering for Chinese applicants (`person_ctry_code = 'CN'`).
- Grouping by applicant name to count their applications.
- Ordering by the number of applications filed.
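
To make the join, filter, group, and order steps concrete, here is a small sketch that runs the roughly equivalent SQL against an in-memory SQLite database. The table and column names follow the PATSTAT names used in this notebook, but the rows and applicant names are invented toy data, and SQLite merely stands in for the real PATSTAT backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls206_person (person_id INTEGER, psn_name TEXT, person_ctry_code TEXT);
CREATE TABLE tls201_appln (appln_id INTEGER, appln_auth TEXT);
CREATE TABLE tls207_pers_appln (person_id INTEGER, appln_id INTEGER,
                                applt_seq_nr INTEGER, invt_seq_nr INTEGER);

-- Toy data (hypothetical applicants)
INSERT INTO tls206_person VALUES (1, 'Applicant A', 'CN'), (2, 'Applicant B', 'CN'),
                                 (3, 'Applicant C', 'DE');
INSERT INTO tls201_appln VALUES (10, 'EP'), (11, 'EP'), (12, 'CN'), (13, 'EP');
INSERT INTO tls207_pers_appln VALUES
    (1, 10, 1, 0), (1, 11, 1, 0), (1, 12, 1, 0),  -- A: two EP filings, one CN filing
    (2, 13, 1, 0), (2, 10, 0, 1),                 -- B: one EP filing as applicant, one as inventor
    (3, 11, 2, 0);                                -- C: German co-applicant
""")

hitlist = conn.execute("""
SELECT p.psn_name, p.person_ctry_code, COUNT(a.appln_id) AS applications_at_epo
FROM tls206_person AS p
JOIN tls207_pers_appln AS pa ON p.person_id = pa.person_id
JOIN tls201_appln AS a ON pa.appln_id = a.appln_id
WHERE p.person_ctry_code = 'CN'
  AND pa.applt_seq_nr > 0      -- listed as applicant
  AND pa.invt_seq_nr = 0       -- and not as inventor
  AND a.appln_auth = 'EP'
GROUP BY p.psn_name, p.person_ctry_code
ORDER BY applications_at_epo DESC
LIMIT 100
""").fetchall()

print(hitlist)
```

In this toy data, the inventor-only link and the filing at a non-EP authority are both excluded from the counts.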

In [None]:
# Test query 2

# Start the timer
start_time = time.time()

q = (
    db.query(
        TLS206_PERSON.psn_name,
        TLS206_PERSON.person_ctry_code,
        func.count(TLS201_APPLN.appln_id).label("APPLICATIONS_AT_EPO"),
    )
    .select_from(TLS206_PERSON)
    .join(TLS207_PERS_APPLN)
    .join(TLS201_APPLN)
    .filter(
        TLS206_PERSON.person_ctry_code == "CN",
        TLS207_PERS_APPLN.applt_seq_nr > 0,
        TLS207_PERS_APPLN.invt_seq_nr == 0,
        TLS201_APPLN.appln_auth == "EP",
    )
    .group_by(TLS206_PERSON.psn_name, TLS206_PERSON.person_ctry_code)
    .order_by(func.count(TLS201_APPLN.appln_id).desc())
    .limit(100)
)
df = patstat.df(q)

# Stop the timer
end_time = time.time()

# Calculate and print the execution time
execution_time = end_time - start_time
print(f"Query execution time: {execution_time:.2f} seconds")

# Display the first few rows of the DataFrame
df

## Step 3: Co-Develop a Query with Gen AI Chatbot

Using PATSTAT, this step introduces a SQL query to analyze patent applicants and technologies at the district level in Germany (NUTS Level 3). 

### Key Highlights:
1. Group by NUTS Level 3 (`Landkreis`) to map applicant activity across districts.
2. Add **technology fields** by including CPC subclass codes.
3. Visualize the results using Pygwalker for interactive exploration.

The query is refined step-by-step to include:
- Applicant names and NUTS codes.
- Application counts grouped by technology fields.
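
NUTS codes are hierarchical: a two-letter country code followed by one character per level. This is why the query can derive the federal state with `SUBSTR(nuts, 1, 3)`. The sketch below shows the same idea in plain Python; `nuts_ancestor` is a hypothetical helper name used only for illustration.

```python
def nuts_ancestor(code: str, level: int) -> str:
    """Return the ancestor of a NUTS code at the given level (0 = country, 1..3 = regions)."""
    length = 2 + level  # two-letter country code plus one character per level
    if not 0 <= level <= 3 or len(code) < length:
        raise ValueError(f"Cannot derive NUTS level {level} from {code!r}")
    return code[:length]


landkreis_code = "DE111"  # Stuttgart, Stadtkreis
print(nuts_ancestor(landkreis_code, 1))  # federal state code, same as SUBSTR(nuts, 1, 3)
```

Note that SQL `SUBSTR` is 1-indexed, while the equivalent Python slice is `code[:3]`.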

In [None]:
# Query patent applicants and technology distribution with filing year and grant status
## Extract the federal state (NUTS Level 1) from the NUTS code
## Extract the CPC subclass (CPC hierarchy level 3, e.g. B66B)

#### Start the timer
start_time = time.time()

q = (
    db.query(
        TLS206_PERSON.person_name.label("applicant"),
        TLS206_PERSON.nuts.label("nuts_code"),
        literal_column("SUBSTR(nuts, 1, 3)").label(
            "federal_state_code"
        ),  # Federal state (NUTS Level 1)
        TLS224_APPLN_CPC.cpc_class_symbol.label("technology_field"),  # Technology field
        literal_column("SUBSTR(cpc_class_symbol, 1, 4)").label("cpc_subclass"),
        TLS201_APPLN.appln_filing_year.label("filing_year"),  # Filing year
        TLS201_APPLN.granted.label("granted"),  # Grant status
        func.count(TLS201_APPLN.appln_id).label("appln_count"),  # Application count
    )
    .select_from(TLS206_PERSON)
    .join(TLS207_PERS_APPLN, TLS206_PERSON.person_id == TLS207_PERS_APPLN.person_id)
    .join(TLS201_APPLN, TLS207_PERS_APPLN.appln_id == TLS201_APPLN.appln_id)
    .join(TLS224_APPLN_CPC, TLS201_APPLN.appln_id == TLS224_APPLN_CPC.appln_id)
    .filter(
        TLS206_PERSON.nuts.startswith("DE"),  # Filter for Germany NUTS code
        TLS206_PERSON.nuts_level == 3,  # Limit to NUTS level 3
    )
    .group_by(
        TLS201_APPLN.appln_filing_year,  # Group by filing year
        TLS206_PERSON.nuts,  # Group by NUTS Level 3 code
        literal_column("SUBSTR(nuts, 1, 3)"),  # Group by federal state code
        TLS224_APPLN_CPC.cpc_class_symbol,  # Group by technology field
        TLS206_PERSON.person_name,  # Group by person name
        TLS201_APPLN.granted,  # Group by grant status
    )
    .order_by(TLS206_PERSON.nuts)
)

# Execute the query
df = patstat.df(q)

### Stop the timer
end_time = time.time()

# Calculate and print the execution time
execution_time = end_time - start_time
print(f"Query execution time: {execution_time:.2f} seconds")

# Display the first few rows of the DataFrame
df

## Step 4: Add Regional Mappings (NUTS Codes to Names)

To make the data more readable:
1. Load a mapping CSV file from Eurostat containing NUTS codes and corresponding names.
2. Extract mappings for:
   - **Federal States (NUTS Level 1)**: e.g., "Baden-Württemberg."
   - **Districts (NUTS Level 3)**: e.g., "Stuttgart, Stadtkreis."
3. Apply these mappings to the query results using Pandas.

This step ensures that the visualization contains user-friendly names instead of raw codes.
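
The lookup logic can be sketched without any dependencies, which also makes it easy to check for codes that have no name in the mapping file (for example, if the CSV and PATSTAT were based on different NUTS revisions). The column names `NUTS_ID`, `LEVEL`, and `NAME_LATIN` are the ones used in this notebook; the inline CSV and the code `DE999` are illustrative assumptions, not real data from the mapping file.

```python
import csv
import io

# Tiny stand-in for ./mappings/nuts_mapping.csv (illustrative rows only)
SAMPLE_CSV = """NUTS_ID,LEVEL,NAME_LATIN
DE1,1,Baden-Württemberg
DE11,2,Stuttgart
DE111,3,"Stuttgart, Stadtkreis"
"""

# Build one code -> name dictionary per NUTS level
names_by_level = {}
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    names_by_level.setdefault(int(row["LEVEL"]), {})[row["NUTS_ID"]] = row["NAME_LATIN"]

federal_state_mapping = names_by_level[1]
landkreis_mapping = names_by_level[3]

# "DE999" is a made-up code used to show what an unmapped district looks like
codes = ["DE111", "DE999"]
labelled = [
    (code, federal_state_mapping.get(code[:3]), landkreis_mapping.get(code))
    for code in codes
]
unmapped = [code for code, _, kreis in labelled if kreis is None]

print(labelled)
print("Unmapped NUTS codes:", unmapped)
```

In the pandas version used in the notebook, the same check would amount to looking for missing values in `landkreis_name` after the `map` call.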

In [None]:
# Add mapping for Bundesland NUTS Code 1 and Landkreis NUTS Code 3 with a mapping CSV from EUROSTAT

## Load the prepared CSV file
nuts_mapping = pd.read_csv("./mappings/nuts_mapping.csv", delimiter=",")

## Create separate mappings for federal states and districts
federal_state_mapping = (
    nuts_mapping[nuts_mapping["LEVEL"] == 1]
    .set_index("NUTS_ID")["NAME_LATIN"]
    .to_dict()
)
landkreis_mapping = (
    nuts_mapping[nuts_mapping["LEVEL"] == 3]
    .set_index("NUTS_ID")["NAME_LATIN"]
    .to_dict()
)

## Map federal states (NUTS Level 1)
df["federal_state_name"] = df["nuts_code"].str[:3].map(federal_state_mapping)

## Map Landkreise (NUTS Level 3)
df["landkreis_name"] = df["nuts_code"].map(landkreis_mapping)

# Display the first few rows of the DataFrame for checking
df

## Step 5: Map CPC Subclasses to Titles

To enrich the technology field analysis:
1. Use the IPC XML scheme provided by WIPO to look up titles for CPC subclasses (CPC-only subclasses are not covered by the IPC scheme).
2. Parse the XML to extract:
   - Subclass symbols (e.g., `B66B`).
   - Corresponding titles (e.g., "Elevators and Lifts").
3. Map these titles to the query results.

The result is a dataset with meaningful technology labels, enabling more insightful analysis.
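
The extraction can be tried on a small inline document first. The sketch below uses the standard library's `xml.etree.ElementTree` instead of `lxml`, and the sample XML is an assumption: it only mimics the element names (`ipcEntry`, `textBody`, `titlePart`, `text`) and attributes (`kind`, `symbol`) that the notebook code relies on, not the full WIPO file layout.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.wipo.int/classifications/ipc/masterfiles}"

# Minimal, hand-written sample (assumed structure, not the real scheme file)
SAMPLE_XML = """<IPCScheme xmlns="http://www.wipo.int/classifications/ipc/masterfiles">
  <ipcEntry kind="c" symbol="B66">
    <ipcEntry kind="u" symbol="B66B">
      <textBody><title><titlePart>
        <text>ELEVATORS; ESCALATORS OR MOVING WALKWAYS</text>
      </titlePart></title></textBody>
    </ipcEntry>
    <ipcEntry kind="u" symbol="B66C">
      <textBody><title><titlePart><text/></titlePart></title></textBody>
    </ipcEntry>
  </ipcEntry>
</IPCScheme>"""

root = ET.fromstring(SAMPLE_XML)
sub_class_titles = {}

for entry in root.iter(f"{NS}ipcEntry"):
    if entry.get("kind") != "u":  # keep only subclass entries
        continue
    node = entry.find(f".//{NS}textBody//{NS}titlePart//{NS}text")
    # Guard against both a missing element and an element without text
    raw_title = node.text if node is not None else None
    sub_class_titles[entry.get("symbol")] = raw_title.strip() if raw_title else "No Title"

print(sub_class_titles)
```

The second entry shows why the text is checked separately: an empty `<text/>` element exists but its `.text` is `None`.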

In [None]:
# Start measuring time
start = time.time()

# File path to the IPC XML
filename = "./mappings/EN_ipc_scheme_20210101.xml"

# Define the namespace and parser
ipc_namespace = "{http://www.wipo.int/classifications/ipc/masterfiles}"
ipcEntry = f"{ipc_namespace}ipcEntry"
text_body = f"{ipc_namespace}textBody"
title_part = f"{ipc_namespace}titlePart"
text = f"{ipc_namespace}text"
parser = ET.XMLParser(remove_blank_text=True)

# Parse the XML file
tree = ET.parse(filename, parser=parser)
root = tree.getroot()

# Initialize dictionary for sub-class mapping
sub_class_mapping = {}

# Iterate through the XML to extract sub-class information
for element in root.iter(ipcEntry):
    if element.attrib.get("kind") == "u":  # Focus on sub-classes
        symbol = element.attrib.get("symbol")  # Extract sub-class symbol

        # Locate the title text within the nested structure
        text_element = element.find(f".//{text_body}//{title_part}//{text}")
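        # Guard against a missing element as well as an element without text
        raw_title = text_element.text if text_element is not None else None
        title = raw_title.strip() if raw_title else "No Title"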

        sub_class_mapping[symbol] = title

# Print a sample of the extracted data
# for symbol, title in list(sub_class_mapping.items())[:20]:
#    print(f"{symbol}: {title}")

# Print execution time
print(
    f"Extracted {len(sub_class_mapping)} sub-classes in {(time.time() - start) * 1000:.0f} ms."
)

# execute the mapping
df["cpc_subclass_title"] = df["cpc_subclass"].map(sub_class_mapping)

# Display the first few rows of the DataFrame
df

## Step 6: Refactor the Notebook into a Python Class

With all components tested, the process is refactored into a modular Python class called `PatentDataProcessor`.

### Key Features:
1. **Class-Based Design**:
   - Encapsulates the entire process (querying, mapping, visualization).
2. **Modular Methods**:
   - `load_nuts_mapping`: Load the NUTS regional mappings.
   - `query_patent_data`: Execute the main SQL query.
   - `load_ipc_scheme`: Parse XML for CPC subclass titles.
   - `process_data`: Integrate mappings into the dataset.
   - `save_data`: Write the processed dataset to CSV, Excel, or JSON.
   - `visualize_data`: Launch Pygwalker for visualization.
3. **Parameterization**:
   - Paths to mapping files and the PATSTAT environment are configurable.

This structure makes the code reusable, maintainable, and easy to execute outside the notebook.

### Define and execute the Class

In [None]:
import time
import os

import geopandas as gpd
import pandas as pd
import pygwalker as pyg

from epo.tipdata.patstat import PatstatClient
from epo.tipdata.patstat.database.models import (
    TLS201_APPLN,
    TLS206_PERSON,
    TLS207_PERS_APPLN,
    TLS224_APPLN_CPC,
)

from lxml import etree as ET
from sqlalchemy import func, text

class PatentDataProcessor:
    def __init__(
        self,
        patstat_env="PROD",
        nuts_mapping_path="./mappings/nuts_mapping.csv",
        ipc_scheme_path="./mappings/EN_ipc_scheme_20210101.xml",
        ipc_namespace="{http://www.wipo.int/classifications/ipc/masterfiles}",
    ):
        self.patstat = PatstatClient(env=patstat_env)
        self.db = self.patstat.orm()
        self.nuts_mapping_path = nuts_mapping_path
        self.ipc_scheme_path = ipc_scheme_path
        self.ipc_namespace = ipc_namespace
        self.nuts_mapping = None
        self.sub_class_mapping = {}

    def load_nuts_mapping(self):
        print("Loading NUTS mapping...")
        self.nuts_mapping = pd.read_csv(self.nuts_mapping_path, delimiter=",")
        self.federal_state_mapping = (
            self.nuts_mapping[self.nuts_mapping["LEVEL"] == 1]
            .set_index("NUTS_ID")["NAME_LATIN"]
            .to_dict()
        )
        self.landkreis_mapping = (
            self.nuts_mapping[self.nuts_mapping["LEVEL"] == 3]
            .set_index("NUTS_ID")["NAME_LATIN"]
            .to_dict()
        )
        
    def query_patent_data(self):
        # Query patent data using raw SQL.
        print("Querying patent data with raw SQL...")
        start = time.time()
    
        # Define the raw SQL query
        sql_query = """
            SELECT
                tls206_person.person_name AS applicant,
                tls206_person.nuts AS nuts_code,
                tls201_appln.appln_filing_year AS filing_year,
                tls224_appln_cpc.cpc_class_symbol AS cpc_subclass,
                COUNT(DISTINCT tls201_appln.appln_id) AS appln_count
            FROM
                tls201_appln
            INNER JOIN tls207_pers_appln ON tls201_appln.appln_id = tls207_pers_appln.appln_id
            INNER JOIN tls206_person ON tls207_pers_appln.person_id = tls206_person.person_id
            INNER JOIN tls224_appln_cpc ON tls201_appln.appln_id = tls224_appln_cpc.appln_id
            WHERE
                tls206_person.nuts LIKE 'DE%' AND
                tls206_person.nuts_level = 3 AND
                tls201_appln.appln_filing_year >= EXTRACT(YEAR FROM CURRENT_DATE()) - 10
            GROUP BY
                tls206_person.person_name,
                tls206_person.nuts,
                tls201_appln.appln_filing_year,
                tls224_appln_cpc.cpc_class_symbol
            ORDER BY
                tls206_person.nuts, tls201_appln.appln_filing_year, appln_count DESC;
        """
        # Use self.db.bind to access the engine
        engine = self.db.bind
    
        # Execute the raw SQL query and fetch the rows while the connection is still open
        with engine.connect() as connection:
            result = connection.execute(text(sql_query))
            rows = result.fetchall()
            columns = list(result.keys())
    
        # Convert the result to a DataFrame
        df = pd.DataFrame(rows, columns=columns)
    
        print(f"Query execution time: {time.time() - start:.2f} seconds")
        return df
        
    def load_ipc_scheme(self):
        # Load CPC or IPC scheme for CPC subclass titles.
        # Note:
        # - This method currently uses the IPC scheme.
        # - Some CPC-specific subclasses (e.g., Y02 series) may not be covered.
        # - To achieve full coverage, switch to the CPC scheme from USPTO/EPO resources.
                    
        print("Loading IPC scheme...")
        start = time.time()
    
        ipc_namespace = self.ipc_namespace
        ipcEntry = f"{ipc_namespace}ipcEntry"
        text_body = f"{ipc_namespace}textBody"
        title_part = f"{ipc_namespace}titlePart"
        text = f"{ipc_namespace}text"
    
        parser = ET.XMLParser(remove_blank_text=True)
        tree = ET.parse(self.ipc_scheme_path, parser=parser)
        root = tree.getroot()
    
        for element in root.iter(ipcEntry):
            if element.attrib.get("kind") == "u":  # Subclass kind is "u"
                symbol = element.attrib.get("symbol")
                text_element = element.find(f".//{text_body}//{title_part}//{text}")
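                # Guard against a missing element as well as an element without text
                raw_title = text_element.text if text_element is not None else None
                title = raw_title.strip() if raw_title else "No Title"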
                self.sub_class_mapping[symbol] = title
    
        # Log a sample of loaded subclass titles
        # print(f"Sample CPC Subclass Titles: {list(self.sub_class_mapping.items())[:5]}")
        
        print(f"Loaded {len(self.sub_class_mapping)} IPC subclasses in {(time.time() - start):.2f} seconds.")

    def process_data(self, df):
        print("Processing data...")
        start = time.time()
    
        # Add federal state codes and names
        df["federal_state_code"] = df["nuts_code"].str[:3]
        df["federal_state_name"] = df["federal_state_code"].map(self.federal_state_mapping)
        df["landkreis_name"] = df["nuts_code"].map(self.landkreis_mapping)
    
        # Normalize CPC subclass
        if "cpc_subclass" in df.columns:
            df["normalized_cpc_subclass"] = df["cpc_subclass"].str[:4]
    
            # Map CPC titles
            df["cpc_subclass_title"] = df["normalized_cpc_subclass"].map(self.sub_class_mapping)
    
            # Combine subclass and title into one column
            df["cpc_combined"] = df.apply(
                lambda row: f"{row['normalized_cpc_subclass']} - {row['cpc_subclass_title']}"
                if pd.notna(row['cpc_subclass_title']) else row['normalized_cpc_subclass'],
                axis=1
            )
        else:
            print("Warning: 'cpc_subclass' column is missing. CPC subclass titles will not be added.")
    
        print(f"Processing time: {time.time() - start:.2f} seconds")
        return df

    def save_data(self, df, file_path, file_format="csv"):
        
        print("Saving data to disk...")
        
        start = time.time()
        
        supported_formats = ["csv", "excel", "json"]
        
        if file_format not in supported_formats:
            raise ValueError(f"Unsupported file format: {file_format}. Supported formats are {supported_formats}.")

        # Create the target directory only if the path contains one
        output_dir = os.path.dirname(file_path)
        if output_dir:
            os.makedirs(output_dir, exist_ok=True)
        
        if file_format == "csv":
            df.to_csv(file_path, index=False)
        elif file_format == "excel":
            df.to_excel(file_path, index=False, engine="openpyxl")
        elif file_format == "json":
            df.to_json(file_path, orient="records")
            
        print(f"Data saved successfully to {file_path} in {file_format.upper()} format in: {time.time() - start:.2f} seconds")
    
    def visualize_data(self, df):
        
        print("Launching visualization...")

        # Open the interactive Pygwalker explorer for the DataFrame
        pyg.walk(df)

if __name__ == "__main__":
    
    # Instantiate and execute the workflow
    processor = PatentDataProcessor(patstat_env="PROD")
    
    # Step 1: Load mappings
    processor.load_nuts_mapping()
    processor.load_ipc_scheme()
    
    # Step 2: Query data
    patent_data = processor.query_patent_data()
    
    # Step 3: Process data
    processed_data = processor.process_data(patent_data)
    
    # Step 4: Save the processed data
    processor.save_data(processed_data, file_path="./output/patent_data.csv", file_format="csv")
    
    # Step 5: Visualize the data (optional)
    processor.visualize_data(processed_data)


Loading NUTS mapping...
Loading IPC scheme...
Loaded 646 IPC subclasses in 0.24 seconds.
Querying patent data with raw SQL...
Query execution time: 40.15 seconds
Processing data...
Processing time: 18.97 seconds
Saving data to disk...
Data saved successfully to ./output/patent_data.csv in CSV format in: 11.47 seconds
Launching visualization...
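
The log above reports about 19 seconds for `process_data`. One plausible contributor (an assumption, not something measured here) is the row-wise `df.apply` that builds `cpc_combined`. The following is a minimal sketch of a vectorized alternative using pandas string operations; the sample symbols and the `titles` dictionary are illustrative stand-ins for the query result and `sub_class_mapping`.

```python
import pandas as pd

# Toy stand-ins for the query result and the subclass title mapping (illustrative values)
df = pd.DataFrame({"cpc_subclass": ["B66B   1/00", "Y02E  10/50"]})
titles = {"B66B": "Elevators and Lifts"}

code = df["cpc_subclass"].str[:4]  # normalized subclass, e.g. "B66B"
title = code.map(titles)           # missing for subclasses without a title

# Build "code - title" for all rows, then fall back to the bare code where no title exists
combined = code + " - " + title.fillna("")
df["cpc_combined"] = combined.where(title.notna(), code)

print(df["cpc_combined"].tolist())
```

Whether this noticeably shortens the processing step would need to be timed on the real dataset.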



# Additional Documentation: Analyzing the SQL Query in `PatentDataProcessor`

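This guide breaks down a variant of the SQL query used in the `query_patent_data` method of the `PatentDataProcessor` class. Compared with the method's query, the variant below also selects the grant status, counts joined rows instead of distinct applications, and covers the last 20 filing years instead of 10. It retrieves and joins data from PATSTAT tables to analyze patent applicants, their geographical locations, and associated technologies.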


```SQL
            SELECT
                tls206_person.person_name AS applicant,
                tls206_person.nuts AS nuts_code,
                tls201_appln.appln_filing_year AS filing_year,
                tls201_appln.granted,
                tls224_appln_cpc.cpc_class_symbol AS technology_field,
                COUNT(tls201_appln.appln_id) AS appln_count
            FROM
                tls201_appln
            INNER JOIN tls207_pers_appln ON tls201_appln.appln_id = tls207_pers_appln.appln_id
            INNER JOIN tls206_person ON tls207_pers_appln.person_id = tls206_person.person_id
            INNER JOIN tls224_appln_cpc ON tls201_appln.appln_id = tls224_appln_cpc.appln_id
            WHERE
                tls206_person.nuts LIKE 'DE%' AND
                tls206_person.nuts_level = 3 AND
                tls201_appln.appln_filing_year >= EXTRACT(YEAR FROM CURRENT_DATE) - 20
            GROUP BY
                tls201_appln.appln_filing_year,
                tls206_person.nuts,
                tls224_appln_cpc.cpc_class_symbol,
                tls206_person.person_name,
                tls201_appln.granted
            ORDER BY
                tls206_person.nuts;
```


### **Tables and Columns in Use**

The query utilizes the following PATSTAT tables:

#### 1. **`TLS206_PERSON` (Person Table)**
- Contains information about applicants, inventors, and their addresses.
- **Columns used:**
  - `person_name`: The name of the applicant or inventor.
  - `nuts`: The NUTS code (geographical region) for the person.
  - `nuts_level`: The granularity of the NUTS code (e.g., level 3 corresponds to districts or Landkreise).

#### 2. **`TLS207_PERS_APPLN` (Link Table: Person-Application)**
- Links persons (from `TLS206_PERSON`) to patent applications.
- **Columns used:**
  - `person_id`: Connects to the `TLS206_PERSON` table.
  - `appln_id`: Connects to the `TLS201_APPLN` table.

#### 3. **`TLS201_APPLN` (Application Table)**
- Contains metadata about patent applications.
- **Columns used:**
  - `appln_id`: The unique identifier for an application.
  - `appln_filing_year`: The year the application was filed.
  - `granted`: Indicates if the patent was granted (e.g., `'Y'` for yes).

#### 4. **`TLS224_APPLN_CPC` (Classification Table)**
- Contains CPC (Cooperative Patent Classification) codes for applications.
- **Columns used:**
  - `appln_id`: Connects to `TLS201_APPLN`.
  - `cpc_class_symbol`: The CPC classification code for the application.

---

### **What the Query Does**

#### **Selected Columns**
- `TLS206_PERSON.person_name` → Applicant name.
- `TLS206_PERSON.nuts` → Full NUTS code for geographical information.
- `TLS224_APPLN_CPC.cpc_class_symbol` → The CPC technology field for the application.
- `TLS201_APPLN.appln_filing_year` → The year the patent was filed.
- `TLS201_APPLN.granted` → Indicates whether the patent was granted.
- `COUNT(TLS201_APPLN.appln_id)` → Counts the number of applications grouped by the selected fields.

#### **Joins**
- Links the `TLS206_PERSON` table to the application data:
  - `TLS206_PERSON` → `TLS207_PERS_APPLN` → `TLS201_APPLN` → `TLS224_APPLN_CPC`.
- This links applicants, their geographical information, and the CPC classifications.

#### **Filters**
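- `TLS206_PERSON.nuts LIKE 'DE%'` → Restricts results to Germany.
- `TLS206_PERSON.nuts_level = 3` → Focuses on NUTS Level 3 regions (districts).
- `TLS201_APPLN.appln_filing_year >= EXTRACT(YEAR FROM CURRENT_DATE) - 20` → Keeps applications filed in the last 20 years.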

#### **Grouping**
The query groups data by:
- Filing year (`appln_filing_year`).
- Full NUTS code (`nuts`).
- CPC class symbol (`cpc_class_symbol`).
- Applicant name (`person_name`).
- Grant status (`granted`).
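
One point to keep in mind when the grouped rows are later rolled up to subclass level: an application can carry several CPC symbols within the same subclass, so counting joined rows and counting distinct applications can give different numbers. The sketch below illustrates this on an in-memory SQLite table with invented rows; the table and column names follow PATSTAT, while the symbol values are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls224_appln_cpc (appln_id INTEGER, cpc_class_symbol TEXT);
-- Toy data: application 1 has two symbols in subclass B66B
INSERT INTO tls224_appln_cpc VALUES
    (1, 'B66B   1/00'),
    (1, 'B66B   5/00'),
    (2, 'B66B   1/00'),
    (2, 'H01L  21/02');
""")

rows = conn.execute("""
SELECT SUBSTR(cpc_class_symbol, 1, 4) AS cpc_subclass,
       COUNT(appln_id)                AS row_count,
       COUNT(DISTINCT appln_id)       AS appln_count
FROM tls224_appln_cpc
GROUP BY cpc_subclass
ORDER BY cpc_subclass
""").fetchall()

for cpc_subclass, row_count, appln_count in rows:
    print(cpc_subclass, row_count, appln_count)
```

Here subclass `B66B` has three joined rows but only two distinct applications, which is the difference between `COUNT(...)` and `COUNT(DISTINCT ...)`.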

#### **Ordering**
- Orders results by the NUTS code for consistent geographical sorting.

---

### **Expected Output**

The query, extended with the two derived columns from Step 3 (`federal_state_code` and `cpc_subclass`), generates a dataset with the following columns:
1. **`applicant`**: Applicant name.
2. **`nuts_code`**: NUTS Level 3 region.
3. **`federal_state_code`**: Federal state, derived as the first three characters of the NUTS code.
4. **`technology_field`**: Full CPC class symbol of the application.
5. **`cpc_subclass`**: First four characters of the CPC class symbol, giving a coarser grouping (e.g., `B66B` for "Elevators and Lifts").
6. **`filing_year`**: Filing year of the application.
7. **`granted`**: Grant status (`Y` or `N`).
8. **`appln_count`**: Count of applications for the grouped criteria.

---

### **Purpose of the Query**

#### **1. Geographical Analysis**
- Understand patent activity across districts (NUTS Level 3) and federal states (NUTS Level 1).

#### **2. Technological Insights**
- Identify technology trends by CPC classifications.

#### **3. Applicant Activity**
- Recognize active applicants in specific regions and technology fields.

#### **4. Visualization**
- Prepares data for creating maps and interactive charts using tools like **Pygwalker** or **Datawrapper**.