# Search API Example 
This pipeline shows how to use Indexical's `search` action to integrate with SERP APIs and start scrapes with natural language queries. 

### Setup
In this section, we'll set up the environment to start using Indexical's API by importing the necessary libraries and saving the API key. Start by inputting your API key below. You can generate an API key from the Indexical console by selecting `Keys` and then hitting `New API Key`. 

In [1]:
import requests
import json


In [2]:

API_KEY="<REPLACE WITH YOUR API_KEY>" 
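
If you'd rather not paste the key into the notebook, one option is to read it from an environment variable. The sketch below is only an illustration: the variable name `INDEXICAL_API_KEY` and the helper `load_api_key` are assumptions made for this example, not something Indexical defines.

```python
import os


def load_api_key(env_var="INDEXICAL_API_KEY"):
    """Read the API key from an environment variable (the name is hypothetical)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty header.
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


# Usage (uncomment once the variable is set in your shell):
# API_KEY = load_api_key()
```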

### Add Pipeline
To create scalable data extraction workflows, Indexical allows you to construct a pipeline of high-level steps that describe how to go from a URL / query to clean, structured data. Each step contains an `action` that is tied to a task-specific agent on our end, as well as a natural language `goal` to instruct the agent. We'll start by importing the `search.json` file as our pipeline and printing it below. 

This pipeline assumes that the user will supply queries dynamically via the `runs` API endpoint. We will then use SERP APIs to submit each query to a search engine and click on the first result. Indexical will then navigate through the website to find the relevant page that answers the user's question, and extract the main content of the page as human-readable markdown. 


In [3]:
file_path="pipelines/search.json"
with open(file_path, 'r') as file:
    pipeline_steps = json.load(file)
    print(json.dumps(pipeline_steps, indent=4))


{
    "name": "search",
    "steps": [
        {
            "action": "search",
            "limit": 1
        },
        {
            "action": "extract",
            "goal": "extract the following information",
            "schema": {
                "content": "$mainContent"
            }
        }
    ]
}
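
If the `pipelines/search.json` file isn't at hand, the same pipeline can be written inline as a Python dict. This is a minimal sketch that simply mirrors the JSON printed above; it assumes nothing beyond the fields shown there.

```python
import json

# Same structure as the printed search.json: a search step limited to the
# first result, followed by an extract step that returns the main content.
pipeline_steps = {
    "name": "search",
    "steps": [
        {"action": "search", "limit": 1},
        {
            "action": "extract",
            "goal": "extract the following information",
            "schema": {"content": "$mainContent"},
        },
    ],
}

print(json.dumps(pipeline_steps, indent=4))
```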


Now, we'll call Indexical's `pipelines` API endpoint to save that pipeline for use in future data extraction workflows. Each API request requires an `x-api-key` header, and the JSON pipeline is sent as the request body.  

In [4]:
response = requests.post("https://app.indexical.dev/pipelines",
                          headers={'x-api-key': API_KEY}, 
                          json=pipeline_steps)

if response.status_code == 200: 
    # Convert the response to JSON format
    data = response.json()
else:
    print(response.status_code)
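
The status check above recurs in every cell of this notebook, so it can be factored into a small helper. The following is a sketch only: `parse_json_or_raise` is a hypothetical name, and it treats any status other than 200 as a failure, matching the checks used here.

```python
def parse_json_or_raise(response):
    """Return the parsed JSON body, or raise if the status code is not 200.

    Works with any object exposing `status_code`, `text`, and `json()`,
    which includes `requests.Response`.
    """
    if response.status_code != 200:
        # Include the start of the body so the failure is easier to diagnose.
        raise RuntimeError(
            f"Request failed with status {response.status_code}: {response.text[:200]}"
        )
    return response.json()


# Usage with the response from the cell above:
# data = parse_json_or_raise(response)
```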

### Run data extraction job
Once you've saved your pipeline, you can run it on any set of websites / queries. Indexical will use AI to handle the process of mapping that pipeline to relevant selectors and information on each page, gathering and transforming the website into a clean, standardized schema of your choosing. 

In this case, we'll run the `search` pipeline and input 2 queries at run-time. 

In [9]:
response = requests.post("https://app.indexical.dev/runs",
                          headers={'x-api-key': API_KEY}, 
                          json={
                              "name" : "search", 
                              "textInputs" : ["Perplexity privacy policy", "Terms of Service for Figma"], 
                              "proxiesEnabled" : True 
                          })



By default, the `runs` endpoint will run the data extraction pipeline and return the results asynchronously (either available for download on the developer console or transmitted to a subscriber URL via webhooks). As a response, the `runs` endpoint will return both the pipeline ID and the run ID, which appear as the `pipeline` and `id` keys in the output below. 

In [10]:
if response.status_code == 200: 
    # Convert the response to JSON format
    data = response.json()
    run_id = data['id']
    print(data)
else:
    print(response.status_code)

{'pipeline': 1004, 'id': 1601}


### Getting Results
To get the results programmatically, you can either use [webhooks](https://docs.indexical.dev/runs) or use the `outputs` endpoint. Simply call `https://app.indexical.dev/runs/:runId/outputs` with the `runID` returned by the `runs` endpoint.  

In [13]:
output_endpoint = "https://app.indexical.dev/runs/" + str(run_id) + "/outputs"
response = requests.get(output_endpoint,
                          headers={'x-api-key': API_KEY})
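
Because runs complete asynchronously, a single call to `outputs` may arrive before the run has finished. One way to handle this is to poll until the response looks complete. The sketch below is generic and makes assumptions: `poll_until` is a hypothetical helper, and the readiness test (a non-empty `results` or `errors` array) is a guess about how an unfinished run looks, not documented Indexical behavior.

```python
import time


def poll_until(fetch, is_ready, interval=5.0, timeout=300.0, sleep=time.sleep):
    """Call fetch() repeatedly until is_ready(result) is true or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_ready(result):
            return result
        # Stop if waiting one more interval would pass the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError("run did not finish before the timeout")
        sleep(interval)


# Hypothetical wiring with this notebook's variables:
# data = poll_until(
#     fetch=lambda: requests.get(output_endpoint,
#                                headers={'x-api-key': API_KEY}).json(),
#     is_ready=lambda d: bool(d.get("results") or d.get("errors")),
# )
```

Passing `fetch` and `is_ready` as callables keeps the loop independent of the HTTP client, so the readiness rule can be changed without touching the polling logic.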




By default, the results will be a JSON file with 2 keys: `results` and `errors`. Each key will contain an array of entries. Each result has a `seed` query/URL, the final `url` from which Indexical extracted the information, as well as a `data` key which has all of the data specified in the pipeline. If Indexical was not able to find information on the page that maps to a specific element, it will either return `null` or not output that key.   
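
Since a field may be `null` or missing entirely, it is safer to read the parsed response with `.get()` than with direct indexing. The sketch below walks that structure; `summarize_outputs` is a hypothetical helper, and `sample` is illustrative data shaped like the response described above (the second entry is invented to show a missing key).

```python
def summarize_outputs(data):
    """Return one (seed, url, content) tuple per result; content may be None."""
    rows = []
    for item in data.get("results", []):
        # `data` can be absent or null, so fall back to an empty dict.
        fields = item.get("data") or {}
        rows.append((item.get("seed"), item.get("url"), fields.get("content")))
    return rows


# Illustrative input only; real responses come from the `outputs` endpoint.
sample = {
    "results": [
        {
            "seed": "Perplexity privacy policy",
            "url": "https://www.perplexity.ai/hub/legal/privacy-policy",
            "data": {"content": "PERPLEXITY'S PRIVACY POLICY"},
        },
        {"seed": "example query", "url": "https://example.com", "data": {}},
    ],
    "errors": [],
}

for seed, url, content in summarize_outputs(sample):
    status = "found" if content is not None else "no content"
    print(f"{seed} -> {url} ({status})")
```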

In [14]:
if response.status_code == 200: 
    # Convert the response to JSON format
    data = response.json()
    print(json.dumps(data, indent=4))
else:
    print(response.status_code)

{
    "results": [
        {
            "id": 7763217,
            "pipeline": 1004,
            "run": 1601,
            "seed": "Perplexity privacy policy",
            "url": "https://www.perplexity.ai/hub/legal/privacy-policy",
            "data": {
                "content": "PERPLEXITY'S PRIVACY POLICY\n\nLast updated: June 4th, 2024.\n\nThis Privacy Notice describes how Perplexity AI, Inc. (\u201cwe\u201d, \u201cus,\u201d \u201cour\u201d)\ncollects, uses and discloses information about individuals who use our websites\n(www.perplexity.ai and https://labs.perplexity.ai), applications, services,\ntools and features, purchase our products or otherwise interact with us\n(collectively, the \u201cServices\u201d). For the purposes of this Privacy Notice, \u201cyou\u201d\nand \u201cyour\u201d means you as the user of the Services, whether you are a customer,\nwebsite visitor, job applicant, representative of a company with whom we do\nbusiness, or another individual whose information w