## Build a Market Research Tool with Parallel Deep Research


Deep Research is a new feature in Parallel's [Task API](https://docs.parallel.ai/task-api/task-quickstart). It lets developers generate high-quality, long-form web research with in-line citations and traceability. In this cookbook, we walk through using Parallel Deep Research to build a market research solution, from Task design to implementation.

### Step 1: Prepare your Environment


Install the [Parallel Python SDK](https://pypi.org/project/parallel-web/) and set your Parallel API Key. Generate your key [here](https://platform.parallel.ai).

In [None]:
!pip install parallel-web
!pip install python-dotenv


import sys
import json
import os
import textwrap
from typing import Any, Dict, List, Optional

from getpass import getpass
from IPython.display import Markdown, display

from parallel import Parallel
from parallel.types import TaskSpecParam



# Set your API key
api_key = getpass("Enter your Parallel API key: ")
# Initialize the Parallel client
client = Parallel(api_key=api_key)

Collecting parallel-web
  Downloading parallel_web-0.2.0-py3-none-any.whl.metadata (16 kB)
Downloading parallel_web-0.2.0-py3-none-any.whl (113 kB)
Installing collected packages: parallel-web
Successfully installed parallel-web-0.2.0
Enter your Parallel API key: ··········
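
If you would rather not type the key on every run, one option is to read it from an environment variable and fall back to the interactive prompt. The sketch below is only an illustration: `resolve_api_key` is a hypothetical helper (not part of the SDK), and the variable name `PARALLEL_API_KEY` is an assumption, so substitute whatever name your environment uses.

```python
import os
from getpass import getpass


def resolve_api_key(env_var="PARALLEL_API_KEY", prompt=getpass):
    """Return the API key from the environment, or ask for it interactively.

    `env_var` is an assumed variable name. `prompt` is injectable so the
    interactive fallback can be replaced in scripts or tests.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    # Nothing usable in the environment: fall back to a hidden prompt.
    return prompt("Enter your Parallel API key: ").strip()
```

With this in place, the client could be created as `client = Parallel(api_key=resolve_api_key())`.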


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Step 2: Design your Task

The Task API can be used for one of two types of workflows:

*   Enrichment (repeatable web-research based enrichments with prescribed structure)
*   Deep Research (one-off web research with long-form auto-generated structure)

In this cookbook, we are using the Task API to generate Deep Research reports specifically for market research. Market research is a crucial web research artefact for investors, product managers, marketers, and more. Report structures and preferences differ by industry; we want to build a tool that can flexibly research any industry with specific desired outputs.

For Deep Research, the input is a plain-text string. The output can be one of:
* **Text**: A markdown-formatted report with in-line citations and references listed at the end of the report.
* **JSON**: An auto-generated JSON structured output with optimized fields. Each nested field has its own [Basis](https://docs.parallel.ai/task-api/features/task-deep-research#nested-fieldbasis), with citations, reasoning and excerpts.

The available Deep Research processors include `pro`, `ultra`, `ultra2x`, `ultra4x`, and `ultra8x`. Choosing between these processors is a decision based on cost, latency and output quality. Refer to [this docs page](https://docs.parallel.ai/task-api/core-concepts/choose-a-processor) for information on each processor. For a strong market research report with medium latency and medium complexity, we can choose `ultra` in this recipe.

```
{
    "input": "Create a comprehensive market research report on the HVAC industry in the USA including an analysis of recent M&A activity and other relevant details.",
    "task_spec": {
        "output_schema": {
            "type": "text"
        }
    },
    "processor": "ultra"
}
```

If `task_spec.output_schema` is omitted, the default output schema is `auto`.
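
As a rough sketch, the same request can be assembled in Python before it is sent. It uses the fields from the request above (`input`, `task_spec.output_schema`, `processor`) and passes the prompt as a plain-text string, as described earlier. `build_deep_research_request` is a hypothetical helper introduced here for illustration only.

```python
import json


def build_deep_research_request(prompt, output_type="text", processor="ultra"):
    """Assemble a Deep Research request body as a plain dict.

    Passing output_type=None leaves out task_spec, so the default
    ('auto') output schema would apply.
    """
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    body = {"input": prompt.strip(), "processor": processor}
    if output_type is not None:
        body["task_spec"] = {"output_schema": {"type": output_type}}
    return body


request_body = build_deep_research_request(
    "Create a comprehensive market research report on the HVAC industry in the USA "
    "including an analysis of recent M&A activity and other relevant details."
)
print(json.dumps(request_body, indent=2))
```

Keeping the body in one function makes it easy to switch between text and `auto` output without editing the request by hand.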


Let's generate the input string while allowing for custom user inputs:


*   Desired market to research
*   Geography focus if any
*   Desired information in the market research report


In [None]:
def make_research_input():
    print("Welcome to the Market Research Report Assistant.")

    # Ask for industry
    industry = input("What industry are you interested in creating a market research report on? ").strip()
    while not industry:
        industry = input("Market is required. Please enter a market/industry: ").strip()

    # Ask for geography (optional)
    geography = input("Specify any preferred geography: (Press Enter to skip) ").strip()
    if not geography:
        geography = "Not specified"

    # Ask for specific details (optional)
    details = input(
        "Are there any specific details you need in the market research report? "
        "(e.g., CAGR, M&A Activity, Public company research — Press Enter to skip) "
    ).strip()
    if not details:
        details = "Not specified"

    # Combine the inputs into a plain text summary with clean line breaks
    research_input = (
        "Generate a comprehensive market research report based on the following criteria:\n\n"
        "If geography is not specified, default to a global market scope.\n"
        "Ensure the report includes key trends, risks, metrics, and major players.\n"
        "Incorporate the specific details provided when applicable.\n\n"
        f"Industry: {industry}\n"
        f"Geography: {geography}\n"
        f"Specific Details Required: {details}"
    )

    return research_input
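
For scripted or batch use, where prompting with `input()` is not practical, the same prompt can be built from function arguments. This is a minimal sketch: `build_research_input` is a hypothetical non-interactive counterpart to the function above, and it reuses the same template text.

```python
def build_research_input(industry, geography=None, details=None):
    """Build the research prompt from arguments instead of interactive input."""
    industry = (industry or "").strip()
    if not industry:
        raise ValueError("industry is required")

    # Optional fields fall back to the same placeholder as the interactive version.
    geography = (geography or "").strip() or "Not specified"
    details = (details or "").strip() or "Not specified"

    return (
        "Generate a comprehensive market research report based on the following criteria:\n\n"
        "If geography is not specified, default to a global market scope.\n"
        "Ensure the report includes key trends, risks, metrics, and major players.\n"
        "Incorporate the specific details provided when applicable.\n\n"
        f"Industry: {industry}\n"
        f"Geography: {geography}\n"
        f"Specific Details Required: {details}"
    )
```

For example, `build_research_input("Coffee shop industry (non-chains)", "United States")` produces a prompt with the details line set to "Not specified".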

### Step 3: Execute your Task

#### Text Output

We can now call the Task API to conduct Deep Research for any user input. For more information on how to choose a Processor, view a Processor comparison [here](https://docs.parallel.ai/task-api/core-concepts/processors).


Note: Deep Research runs can take up to 45 minutes to complete. To scale and to improve the user experience, we recommend building with Parallel [Webhooks](https://docs.parallel.ai/features/webhooks). An example implementation is provided later in this cookbook.
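
If you do wait in-process rather than use webhooks, it helps to bound the wait explicitly. The sketch below is a generic, library-agnostic polling loop with a deadline. `wait_for_result` and its `fetch` callable are hypothetical: `fetch` stands in for whatever check you wire up against your own run, and is assumed to return `None` until a result is ready.

```python
import time


def wait_for_result(fetch, timeout_s=45 * 60, poll_interval_s=30,
                    sleep=time.sleep, clock=time.monotonic):
    """Call `fetch()` until it returns a non-None value or the deadline passes.

    `fetch` is a caller-supplied callable (an assumption of this sketch).
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"no result after {timeout_s} seconds")
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, remaining))
```

The default timeout mirrors the 45-minute upper bound mentioned above; the 30-second interval is an arbitrary choice.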

In [None]:
user_input = make_research_input()

# Create the task run with the input
task_run_text = client.task_run.create(
    input=user_input,
    processor="ultra",
    task_spec={
        "output_schema": {
            "type": "text",
        }
    },
)

print(task_run_text)

# Get the result
run_result_text = client.task_run.result(task_run_text.run_id)

# Print the result
print(run_result_text.output)

Welcome to the Market Research Report Assistant.
What industry are you interested in creating a market research report on? Coffee shop industry (non-chains)
Specify any preferred geography: (Press Enter to skip) United States
Are there any specific details you need in the market research report? (e.g., CAGR, M&A Activity, Public company research — Press Enter to skip) The best business models for single-location coffee shops
TaskRunTextOutput(basis=[FieldBasis(field='content', reasoning='The answer is synthesized from a wide range of industry reports and vendor data. Key market growth projections, which distinguish between the slowing overall market and the accelerating specialty coffee segment, come from a U.S. Coffee Shop Industry Market Analysis by mmcginvest.com. Data on owner sentiment, average profit margins, and primary challenges like staffing is based on "The 2025 Independent Coffee Shop Industry Report" from coffeeshopkeys.com. Startup cost estimates for different business mo

In [None]:
# Save the run_result output content as a text file
content_text = getattr(run_result_text.output, "content", "No content found.")

with open("run_result_text.txt", "w", encoding="utf-8") as f:
    f.write(content_text)

print("run_result output content saved as run_result_text.txt")

run_result output content saved as run_result_text.txt


Preview the length and detail of the Task API text output.

In [None]:
def preview_text_output(output_content, preview_length=500):
    """
    Generates a preview of the text output, including the beginning and the part after "References".
    """
    if not isinstance(output_content, str):
        print("Output content is not a string.")
        return

    # Preview the first part
    initial_preview = output_content[:preview_length]
    print("--- Initial Preview ---")
    print(initial_preview)
    if len(output_content) > preview_length:
        print(f"... and {len(output_content) - preview_length} more characters")

    # Preview the part after "References"
    references_index = output_content.find("References")
    if references_index != -1:
        after_references = output_content[references_index + len("References"):].lstrip()
        reference_preview = after_references[:preview_length]
        print("\n--- Preview After 'References' ---")
        print(reference_preview)
        if len(after_references) > preview_length:
            print(f"... and {len(after_references) - preview_length} more characters")
    else:
        print("\n'References' section not found in the output.")

# Assuming run_result_text is available from a previous cell execution
if 'run_result_text' in locals() and hasattr(run_result_text.output, 'content'):
    preview_text_output(run_result_text.output.content)
else:
    print("run_result_text or its content is not available.")

--- Initial Preview ---
# Independent U.S. Coffee Shops 2025-2030: Winning Models, Margins & Minefields ## Executive Summary The U.S. independent coffee shop market is at a critical inflection point. While the overall market is decelerating towards saturation, a distinct and high-growth premium segment is creating significant opportunities for savvy entrepreneurs. Success is no longer about simply opening a coffee shop; it's about strategically selecting a business model that aligns with specific financial goals, opera
... and 30546 more characters

--- Preview After 'References' ---
1. *U.S. Coffee Shop Industry Market Analysis*. https://www.mmcginvest.com/post/u-s-coffee-shop-industry-market-analysis-navigating-maturity-margin-pressure-and-the-mandate-fo
2. *The 2025 Independent Coffee Shop Industry Report*. https://www.coffeeshopkeys.com/post/the-2025-independent-coffee-shop-industry-report
3. *How Much Do Coffee Shops Make? A 2025 Outlook*. https://www.beansandbrews.com/franchise/b
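
The reference list in the preview follows a regular `N. *Title*. URL` pattern, so it can be turned into structured records with a regular expression. This is a sketch based only on the format visible in the output above, and other reports may format references differently. `parse_references` is a hypothetical helper, and the second entry in the sample is a placeholder.

```python
import re

# Matches lines such as: 2. *Some Title*. https://example.com/page
REFERENCE_LINE = re.compile(
    r"^\s*(\d+)\.\s+\*(.+?)\*\.\s+(https?://\S+)\s*$", re.MULTILINE
)


def parse_references(report_text):
    """Extract numbered references of the form 'N. *Title*. URL' from a report."""
    return [
        {"index": int(num), "title": title, "url": url}
        for num, title, url in REFERENCE_LINE.findall(report_text)
    ]


sample = (
    "## References\n"
    "1. *The 2025 Independent Coffee Shop Industry Report*. "
    "https://www.coffeeshopkeys.com/post/the-2025-independent-coffee-shop-industry-report\n"
    "2. *Placeholder Title*. https://example.com/placeholder\n"
)
references = parse_references(sample)
for ref in references:
    print(ref["index"], ref["title"], ref["url"])
```

Requiring an `http(s)://` URL at the end of the line keeps ordinary numbered lists in the report body from being picked up as references.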

#### JSON Output

In [8]:
# Create the task run with the input
task_run_json = client.task_run.create(
    input=user_input,
    processor="ultra"
)

print(task_run_json)

# Get the result
run_result_json = client.task_run.result(task_run_json.run_id)

# Print the result
print(run_result_json.output)

TaskRunJsonOutput(basis=[FieldBasis(field='licensing_and_permitting_overview', reasoning='The Health Department Permit concept is directly addressed by excerpts describing mobile food permits and health-related licensing. One excerpt outlines a Mobile Food Vending License as a permit for mobile food operations, which is a foundational health-permitting element for coffee trucks or carts. This aligns with the field’s emphasis on local health permits required for legal operation of various coffee formats (brick-and-mortar, mobile, kiosk). Other excerpts discuss related regulatory frameworks such as plan checks and health-permitting processes for mobile facilities, which underpin the practical steps a shop must undertake to remain compliant. Several excerpts document regulatory category updates and terminology changes (for example, Compact Mobile Food Operations and Mobile Food Facilities) and note new annual permit fees and plan-check requirements, illustrating how regulatory environment

In [9]:
# Save the run_result as a JSON file
with open("run_result_json.json", "w") as f:
    json.dump(run_result_json.to_dict(), f, indent=2)

print("run_result saved as run_result_json.json")

run_result saved as run_result_json.json


Preview the length and basis of the JSON structured output.

In [10]:
content_obj = getattr(run_result_json.output, "content", None)

if content_obj:
    try:
        # Convert dict or Pydantic-like object to JSON string
        content_str = json.dumps(content_obj, indent=2)
        text_len = len(content_str)
        estimated_pages = text_len // 3000

        print(f"\nDeep Research output generated! Total characters: {text_len:,} (~{estimated_pages} pages)")

        print("Preview of Content:\n" + "-" * 50)
        print(content_str[:1000] + "...\n")
        print("-" * 50)

    except Exception as e:
        print(f"Could not serialize content: {e}")
else:
    print("No `.content` found in output.")


# Display structure of the output schema
if hasattr(run_result_json.output, "basis") and isinstance(run_result_json.output.basis, list):
    print("Structured Output Fields Extracted:\n")
    for i, field in enumerate(run_result_json.output.basis[:5]):  # preview first 5 fields
        print(f"  {i+1}. Field: {field.field}")
        if hasattr(field, 'reasoning') and field.reasoning:
            print(f"     Reasoning: {field.reasoning[:80]}...")  # truncated preview
        if hasattr(field, 'citations') and field.citations:
            print(f"     Source: {field.citations[0].url}")
            print(f"     Excerpt: {field.citations[0].excerpts}")
        print()

    if len(run_result_json.output.basis) > 5:
        print(f"  ...and {len(run_result_json.output.basis) - 5} more fields captured!\n")



Deep Research output generated! Total characters: 34,665 (~11 pages)
Preview of Content:
--------------------------------------------------
{
  "executive_summary": "The U.S. independent coffee shop market presents a landscape of dual realities for the 2025-2030 period. While the overall U.S. coffee and snack shop sector, valued at $74.3 billion in 2025, is projected to experience a sharp deceleration in growth to just 1.3% annually, the specialty coffee segment is forecast to surge with a robust 9.5% CAGR, reaching an estimated $81.8 billion by 2030. This bifurcation signals a mature general market but significant opportunities within premium niches, where independent operators are best positioned to thrive. The market remains highly fragmented, with over 50% of establishments being independently owned, despite the formidable market share of chains like Starbucks (30-40%) and Dunkin' (26%). This fragmentation fosters intense competition but also allows for unique market positioning. 

##### (Optional) Map Output Fields with Citations, Reasoning and Confidence

For `auto` schema mode, the Parallel [Basis](https://docs.parallel.ai/core-concepts/basis) object contains a mapping of each output field (including leaf-level fields) with evidence supporting it. For example, if one output schema field `industry_overview` has nested fields `description`, `growth_cagr` and `key_players`, the corresponding part of the Basis object would resemble:



```
[
    {
        "field": "industry_overview.description",
        "citations": [
            {
                "url": "https://example.com",
                "excerpts": ["Sample excerpt..."]
            }
        ],
        "reasoning": "Sample reasoning...",
        "confidence": "high"
    },
    {
        "field": "industry_overview.growth_cagr",
        "citations": [
            {
                "url": "https://www.example.com",
                "excerpts": ["Sample excerpt..."]
            },
            {
                "url": "https://www.example2.com",
                "excerpts": ["Sample excerpt..."]
            }
        ],
        "reasoning": "Sample reasoning...",
        "confidence": "high"
    },
    {
        "field": "industry_overview.key_players.0",
        "citations": [
            {
                "url": "https://www.example.com",
                "excerpts": ["Sample excerpt..."]
            }
        ],
        "reasoning": "Sample reasoning...",
        "confidence": "high"
    },
    {
        "field": "industry_overview.key_players.1",
        "citations": [
            {
                "url": "https://www.example.com",
                "excerpts": ["Sample excerpt..."]
            }
        ],
        "reasoning": "Sample reasoning...",
        "confidence": "high"
    }
]
```
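
To see which dot-separated paths a given content object produces, and so which `field` values to look for in the Basis, the nested structure can be flattened. This is a small sketch: `flatten_paths` is a hypothetical helper, and the sample content and basis fields are placeholders, not real API output.

```python
def flatten_paths(node, prefix=""):
    """Yield (dot_path, leaf_value) pairs for a nested dict/list structure.

    List items are addressed by their index, e.g. 'key_players.0'.
    """
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = ((str(i), v) for i, v in enumerate(node))
    else:
        yield prefix, node
        return
    for key, value in items:
        child = f"{prefix}.{key}" if prefix else str(key)
        yield from flatten_paths(value, child)


# Placeholder content shaped like the example described above.
sample_content = {
    "industry_overview": {
        "description": "Sample description...",
        "growth_cagr": "Sample CAGR...",
        "key_players": ["Player A", "Player B"],
    }
}
paths = [path for path, _ in flatten_paths(sample_content)]

# Placeholder set of basis fields, used to spot leaves without evidence.
basis_fields = {"industry_overview.description", "industry_overview.growth_cagr"}
missing = [path for path in paths if path not in basis_fields]
print(paths)
print("Leaves without a basis entry:", missing)
```

Comparing the flattened paths against the `field` values in a real Basis is a quick way to find output fields that carry no supporting evidence.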


In Deep Research applications, in-place citations are helpful for end users. Below is a helper function that combines the fields in `output.content` with their corresponding `output.basis` information.


In [11]:

def map_basis_to_object(content: Any, basis: List[Any]) -> Dict[str, Any]:
    """
    Efficiently maps a nested content object to its basis metadata, if available.
    """

    # Preprocess basis into a fast dict using dotpaths as keys
    basis_map = {b.field: b for b in basis}

    def get_basis_info(path: str):
        b = basis_map.get(path)
        if not b:
            return None
        return {
            "field": b.field,
            "reasoning": getattr(b, "reasoning", None),
            "confidence": getattr(b, "confidence", None),
            "citations": [
                {
                    "url": getattr(c, "url", None),
                    "excerpts": getattr(c, "excerpts", []),
                }
                for c in getattr(b, "citations", []) or []
            ],
        }

    def walk(node: Any, path_parts: List[str]) -> Dict[str, Any]:
        path_str = ".".join(path_parts)

        if isinstance(node, list):
            result = [
                walk(item, path_parts + [str(i)]) for i, item in enumerate(node)
            ]
        elif isinstance(node, dict):
            result = {
                k: walk(v, path_parts + [k]) for k, v in node.items()
            }
        else:
            result = node

        return {
            "value": result,
            "basis": get_basis_info(path_str)
        }

    return walk(content, [])


mapped_output = map_basis_to_object(run_result_json.output.content, run_result_json.output.basis)

print("\n Mapped Output Preview (first 3 fields):")
print("-" * 50)

for field, data in list(mapped_output.get("value", {}).items())[:3]:
    val = data.get("value") if isinstance(data, dict) else None
    val_str = str(val).replace("\n", " ") if val is not None else "None"
    val_preview = val_str[:120] + ("..." if len(val_str) > 120 else "")

    basis = data.get("basis") if isinstance(data, dict) else {}
    citations = basis.get("citations", []) if basis else []
    source = citations[0].get("url") if citations else "N/A"

    print(f"• {field}")
    print(f"  → Value: {val_preview}")
    print(f"  → Source: {source}")
    print()


 Mapped Output Preview (first 3 fields):
--------------------------------------------------
• executive_summary
  → Value: The U.S. independent coffee shop market presents a landscape of dual realities for the 2025-2030 period. While the overa...
  → Source: https://www.mmcginvest.com/post/u-s-coffee-shop-industry-market-analysis-navigating-maturity-margin-pressure-and-the-mandate-fo

• best_business_models_analysis
  → Value: The most effective business models for single-location coffee shops include (1) Coffeehouse (community-centric), appeali...
  → Source: https://pos.toasttab.com/blog/on-the-line/how-much-does-it-cost-to-open-a-coffee-shop?srsltid=AfmBOoorE127uEliNl9_o3haRB_99WOrdxPwh9OXCptzduLqKIUmpTRO

• market_landscape_analysis
  → Value: {'market_size': {'value': 'The U.S. coffee shop industry is projected to have a total revenue of $74.3 billion in 2025, ...
  → Source: https://www.mmcginvest.com/post/u-s-coffee-shop-industry-market-analysis-navigating-maturity-margin-press
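
Once content and basis are combined, the mapped structure can be walked again to build a de-duplicated source list, for example for a report footer. The sketch below assumes the `{"value": ..., "basis": ...}` node shape returned by `map_basis_to_object`. `collect_citation_urls` is a hypothetical helper, and the sample data is made up for illustration.

```python
def collect_citation_urls(mapped_node):
    """Return unique citation URLs from a mapped output, in first-seen order."""
    urls = []
    seen = set()

    def visit(node):
        # Every mapped node is expected to be a dict with 'value' and 'basis'.
        if not isinstance(node, dict) or "value" not in node:
            return
        basis = node.get("basis") or {}
        for citation in basis.get("citations", []):
            url = citation.get("url")
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
        value = node["value"]
        if isinstance(value, dict):
            children = value.values()
        elif isinstance(value, list):
            children = value
        else:
            children = []
        for child in children:
            visit(child)

    visit(mapped_node)
    return urls


# Made-up mapped output with one repeated source.
sample_mapped = {
    "value": {
        "executive_summary": {
            "value": "Sample summary...",
            "basis": {"citations": [{"url": "https://example.com/a", "excerpts": []}]},
        },
        "key_players": {
            "value": [
                {"value": "Player A",
                 "basis": {"citations": [{"url": "https://example.com/b", "excerpts": []}]}},
                {"value": "Player B",
                 "basis": {"citations": [{"url": "https://example.com/a", "excerpts": []}]}},
            ],
            "basis": None,
        },
    },
    "basis": None,
}
print(collect_citation_urls(sample_mapped))
```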

#### Webhooks

Executing Deep Research Tasks with webhooks eliminates the need for polling and lets you notify the end user when the research is complete, for example by email or in-app notification. It also scales better: several Deep Research Tasks can be kicked off simultaneously without a polling loop for each one.

Webhooks are currently available on a per-run basis.

Documentation for Webhooks is available [here](https://docs.parallel.ai/task-api/features/webhooks). Below is an example of how you would use webhooks for this Task:



```
import requests

url = "https://api.parallel.ai/v1beta/tasks/runs"
headers = {
    "Content-Type": "application/json",
    "x-api-key": "YOUR_API_KEY"
}

payload = {
    "task_spec": {
        # Request a markdown report, matching the text-output run above
        "output_schema": {"type": "text"}
    },
    "input": user_input,
    "processor": "ultra",
    "metadata": {
        "key": "value"
    },
    "webhook": {
        "url": "https://your-domain.com/webhooks/parallel",
        "event_types": ["task_run.status"],
        "secret": "your-custom-secret"
    }
}

response = requests.post(url, json=payload, headers=headers)
print(response.json())
```
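
Because each run is registered with its own webhook, several research prompts can be submitted in one pass. The sketch below builds one payload per prompt, using the same fields as the request above, and sends them concurrently. `build_webhook_payload` and `submit_all` are hypothetical helpers, and `post` is a caller-supplied function (for instance a thin wrapper around `requests.post`), so nothing here depends on a particular HTTP client.

```python
from concurrent.futures import ThreadPoolExecutor


def build_webhook_payload(prompt, webhook_url, secret, processor="ultra"):
    """Build a Task run payload that registers a webhook for status events."""
    return {
        "input": prompt,
        "processor": processor,
        "task_spec": {"output_schema": {"type": "text"}},
        "webhook": {
            "url": webhook_url,
            "event_types": ["task_run.status"],
            "secret": secret,
        },
    }


def submit_all(prompts, post, webhook_url, secret, max_workers=4):
    """Submit one run per prompt concurrently; results keep the prompt order.

    `post` is any callable that takes a payload dict and returns the response.
    """
    payloads = [build_webhook_payload(p, webhook_url, secret) for p in prompts]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(post, payloads))
```

With the `url` and `headers` defined above, `post` could be `lambda payload: requests.post(url, json=payload, headers=headers).json()`.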

