# Ecommerce Price API Example 
This pipeline shows how to use Indexical to scrape ecommerce prices. It pulls the `product_name`, `product_price`, `image_link`, and `product_listing_link` for each product on a product listing page.

### Setup
In this section, we'll set up the environment to start using Indexical's API by importing the necessary libraries and saving the API key. Start by inputting your API key below. You can generate an API key from the Indexical console by selecting `Keys` and then hitting `New API Key`. 

In [2]:
import requests
import json


In [3]:

API_KEY="<REPLACE WITH YOUR API_KEY>" 
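
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch of an alternative, the key can be read from an environment variable instead. The variable name `INDEXICAL_API_KEY` and the helper `load_api_key` are hypothetical names chosen for this example, not something Indexical defines.

```python
import os


def load_api_key(env_var="INDEXICAL_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

With the variable exported in your shell, `API_KEY = load_api_key()` can stand in for the assignment above.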

### Add Pipeline
To create scalable data extraction workflows, Indexical allows you to construct a pipeline of high-level steps that describe how to go from a URL / query to clean, structured data. Each step contains an `action` that is tied to a task-specific agent on our end, as well as a natural language `goal` to instruct the agent. We'll start by importing the `ecommerce_price.json` file as our pipeline and printing it below. 



In [4]:
file_path="pipelines/ecommerce_price.json"
with open(file_path, 'r') as file:
    pipeline_steps = json.load(file)
    print(json.dumps(pipeline_steps, indent=4))


{
    "name": "ecommerce_price",
    "steps": [
        {
            "action": "extract-many",
            "goal": "extract the listed information about each product listing",
            "schema": {
                "product_name": {
                    "type": "string",
                    "description": "the name of the product"
                },
                "product_price": {
                    "type": "string",
                    "description": "the price (discounted if available) of the product"
                },
                "image_link": {
                    "type": "string",
                    "description": "the link to the product image, if available"
                },
                "product_listing_link": {
                    "type": "string",
                    "description": "the link to the product listing details page"
                }
            }
        }
    ]
}
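
Before uploading a pipeline, it can help to sanity-check the schema locally, since a misspelled key is easy to miss in a long JSON file. The sketch below assumes every schema field carries exactly a `type` and a `description`, as the fields in this example do; Indexical may accept other keys, so treat the reported problems as hints rather than hard errors. The helper name `find_schema_problems` is hypothetical.

```python
def find_schema_problems(pipeline):
    """Return a list of human-readable problems found in a pipeline's step schemas."""
    # Assumption: each field is described by exactly these two keys.
    expected = {"type", "description"}
    problems = []
    for i, step in enumerate(pipeline.get("steps", [])):
        for field, spec in step.get("schema", {}).items():
            present = set(spec)
            missing = expected - present
            unknown = present - expected
            if missing:
                problems.append(f"step {i}, field '{field}': missing {sorted(missing)}")
            if unknown:
                problems.append(f"step {i}, field '{field}': unexpected {sorted(unknown)}")
    return problems
```

Calling `find_schema_problems(pipeline_steps)` returns an empty list when every field matches that shape.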


Now, we'll call Indexical's `pipelines` API endpoint to save that pipeline for use in future data extraction workflows. Each API request requires an `x-api-key` header, and the JSON pipeline goes in the request body.

In [5]:
response = requests.post("https://app.indexical.dev/pipelines",
                          headers={'x-api-key': API_KEY}, 
                          json=pipeline_steps)

if response.status_code == 200: 
    # Convert the response to JSON format
    data = response.json()
else:
    print(response.status_code)

### Run data extraction job
Once you've saved your pipeline, you can run it on any set of websites / queries. Indexical will use AI to map the pipeline to the relevant selectors and information on each page, gathering and transforming each website's content into a clean, standardized schema of your choosing.

In this case, we'll run the `ecommerce_price` pipeline and extract all of the product information from the following [URL](https://www.denydesigns.com/collections/bench).

In [13]:
response = requests.post("https://app.indexical.dev/runs",
                          headers={'x-api-key': API_KEY}, 
                          json={
                              "name" : "ecommerce_price", 
                              "urls" : ["https://www.denydesigns.com/collections/bench"],
                              "proxiesEnabled" : True 
                          })
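
Because `urls` is a list, the same request body can carry several listing pages. A small sketch of a helper that assembles that body while dropping blank and duplicate URLs follows. The field names `name`, `urls`, and `proxiesEnabled` come from the request above; the helper `build_run_payload` itself is a hypothetical name for this example.

```python
def build_run_payload(pipeline_name, urls, proxies_enabled=True):
    """Build the JSON body for a run, dropping blank and duplicate URLs."""
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        # Keep the first occurrence of each non-empty URL, preserving order.
        if url and url not in seen:
            seen.add(url)
            cleaned.append(url)
    if not cleaned:
        raise ValueError("at least one URL is required")
    return {"name": pipeline_name, "urls": cleaned, "proxiesEnabled": proxies_enabled}
```

The returned dictionary can be passed as the `json=` argument of the `requests.post` call to the `runs` endpoint.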



By default, the `runs` endpoint will run the data extraction pipeline and return the results asynchronously (either available for download on the developer console or transmitted to a subscriber URL via webhooks). In its response, the `runs` endpoint returns both the pipeline ID (under the `pipeline` key) and the run ID (under the `id` key).

In [14]:
if response.status_code == 200: 
    # Convert the response to JSON format
    data = response.json()
    run_id = data['id']
    print(data)
else:
    print(response.status_code)

{'pipeline': 514, 'id': 1546}


### Getting Results
To get the results programmatically, you can either use [webhooks](https://docs.indexical.dev/runs) or use the `outputs` endpoint. Send a GET request to `https://app.indexical.dev/runs/:runId/outputs`, replacing `:runId` with the run ID returned by the `runs` endpoint.

In [15]:
output_endpoint = "https://app.indexical.dev/runs/" + str(run_id) + "/outputs"
response = requests.get(output_endpoint,
                          headers={'x-api-key': API_KEY})
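
Since runs complete asynchronously, a request made right after starting a run may arrive before any outputs exist. One way to handle this is a simple polling loop, sketched below. How the `outputs` endpoint responds while a run is still in progress is not covered here, so treating an empty payload as "not ready yet" is an assumption of this sketch, and `poll_for_outputs` is a hypothetical helper name.

```python
import time


def poll_for_outputs(fetch, attempts=10, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch() until it returns a payload containing results or errors.

    fetch is any zero-argument callable that returns the decoded JSON body,
    or None if the request did not succeed. Treating an empty payload as
    "not ready yet" is an assumption of this sketch.
    """
    for attempt in range(attempts):
        payload = fetch()
        if payload and (payload.get("results") or payload.get("errors")):
            return payload
        # Wait between attempts, but not after the last one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError(f"no outputs after {attempts} attempts")
```

Here `fetch` could wrap the `requests.get` call above, returning `response.json()` when the status code is 200 and `None` otherwise. For production workloads, webhooks avoid polling altogether.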




By default, the results will be JSON with two keys, `results` and `errors`, each containing an array. Each result has a `seed` query/URL, the final `url` from which Indexical extracted the information, and a `data` key holding all of the data specified in the pipeline. If Indexical was not able to find information on the page that maps to a specific element, it will either return a null value for that key or omit the key entirely.
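
For downstream analysis, it is often convenient to flatten each result into a single row and turn the price string into a number. The sketch below relies only on the keys described above and on the field names from the `ecommerce_price` schema. It assumes prices use `.` as the decimal separator (as in `$349.00`), and it tolerates missing or empty fields. The helpers `parse_price` and `results_to_rows` are hypothetical names for this example.

```python
from decimal import Decimal, InvalidOperation


def parse_price(text):
    """Convert a price string such as "$349.00" to a Decimal, or None if it can't be parsed."""
    if not text:
        return None
    # Keep digits and the decimal point; drop currency symbols and thousands separators.
    cleaned = "".join(ch for ch in text if ch.isdigit() or ch == ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def results_to_rows(payload):
    """Flatten each result's `data` dict into one row, keeping the source URL."""
    rows = []
    for result in payload.get("results", []):
        data = result.get("data") or {}
        rows.append({
            "url": result.get("url"),
            "product_name": data.get("product_name"),
            "price": parse_price(data.get("product_price")),
            "image_link": data.get("image_link"),
            "product_listing_link": data.get("product_listing_link"),
        })
    return rows
```

The resulting list of dictionaries can be handed to `csv.DictWriter` or loaded into a DataFrame.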

In [16]:
if response.status_code == 200: 
    # Convert the response to JSON format
    data = response.json()
    print(json.dumps(data, indent=4))
else:
    print(response.status_code)

{
    "results": [
        {
            "id": 7594431,
            "pipeline": 514,
            "run": 1546,
            "seed": "https://www.denydesigns.com/collections/bench",
            "url": "https://www.denydesigns.com/collections/bench",
            "data": {
                "image_link": "https://www.denydesigns.com/cdn/shop/files/britt-mills-design-chinoiserie-garden-bench-whitebg-gold_large.jpg?v=1720707940",
                "product_name": "Britt Mills Design Chinoiserie Garden Bench",
                "product_price": "$349.00",
                "product_listing_link": "https://www.denydesigns.com/products/britt-mills-design-chinoiserie-garden-bench"
            },
            "created_at": "2024-07-13 00:53:43.712066"
        },
        {
            "id": 7594432,
            "pipeline": 514,
            "run": 1546,
            "seed": "https://www.denydesigns.com/collections/bench",
            "url": "https://www.denydesigns.com/collections/bench",
            "data": 