phucb/pulsarr


🤖 PulsarRPA

License: APACHE2



🌟 Introduction

💖 PulsarRPA: The AI-Powered, Lightning-Fast Browser Automation Solution! 💖

✨ Key Capabilities:

  • 🤖 AI Integration with LLMs – Smarter automation powered by large language models.
  • ⚡ Ultra-Fast Automation – Coroutine-safe concurrent browser automation with spider-level crawling performance.
  • 🧠 Web Understanding – Deep comprehension of dynamic web content.
  • 📊 Data Extraction APIs – Powerful tools to extract structured data effortlessly.

Automate the browser and extract data at scale using plain-text instructions:

Go to https://www.amazon.com/dp/B0C1H26C46
After page load: scroll to the middle.

Summarize the product.
Extract: product name, price, ratings.
Find all links containing /dp/.

🎥 Demo Videos

🎬 YouTube: Watch the video

📺 Bilibili: https://www.bilibili.com/video/BV1kM2rYrEFC


🚀 Quick Start Guide

▶️ Run PulsarRPA

📦 Run the Executable JAR - Best Experience

🧩 Download

# For Linux/macOS/Windows (with curl)
curl -L -o PulsarRPA.jar https://github.com/platonai/PulsarRPA/releases/download/v3.0.7/PulsarRPA.jar

🚀 Run

java -DDEEPSEEK_API_KEY=${DEEPSEEK_API_KEY} -jar PulsarRPA.jar

🔍 Tip: Make sure DEEPSEEK_API_KEY is set in your environment; otherwise AI features will not be available.
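
The tip above can be enforced with a small pre-flight check. The following is a minimal sketch, not part of PulsarRPA: `require_api_key` is a hypothetical helper name, and it only tests whether the variable is non-empty in the current shell.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (an illustration, not shipped with PulsarRPA):
# fail early when the API key is missing instead of starting without AI features.
require_api_key() {
  if [ -z "${DEEPSEEK_API_KEY:-}" ]; then
    echo "DEEPSEEK_API_KEY is not set; AI features will not be available." >&2
    return 1
  fi
}

# Usage: call it before the java command shown above, for example
#   require_api_key || exit 1
```

Keeping the check in a function leaves the launch command itself unchanged, so the same guard can also be placed in front of the Docker command.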


📂 Resources

▶ Run with IDE

  • Open the project in your IDE
  • Run the ai.platon.pulsar.app.PulsarApplicationKt main class

🐳 Docker Users

docker run -d -p 8182:8182 -e DEEPSEEK_API_KEY=${DEEPSEEK_API_KEY} galaxyeye88/pulsar-rpa:latest

🌟 For Beginners – Just Text, No Code!

Use the ai/command API to perform actions and extract data using natural language instructions.

📥 Example Request (Text-based):

curl -X POST "http://localhost:8182/api/ai/command" \
  -H "Content-Type: text/plain" \
  -d '
    Go to https://www.amazon.com/dp/B0C1H26C46
    After page load: click #title, then scroll to the middle.
    
    Summarize the product.
    Extract: product name, price, ratings.
    Find all links containing /dp/.
  '

📄 JSON-Based Version:

curl -X POST "http://localhost:8182/api/ai/command" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.amazon.com/dp/B0C1H26C46",
    "pageSummaryPrompt": "Provide a brief introduction of this product.",
    "dataExtractionRules": "product name, price, and ratings",
    "linkExtractionRules": "all links containing `/dp/` on the page",
    "onPageReadyActions": ["click #title", "scroll to the middle"]
  }'

💡 Tip: You don't need to fill in every field, only the ones you need.
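
Because fields can be left out, a request may carry only the page URL and a single rule. The sketch below builds such a reduced payload. It assumes the endpoint accepts a body with just `url` and `dataExtractionRules`, which the source implies but does not show. The helper names `build_extract_request` and `send_command` are hypothetical, and the naive formatting assumes the inputs contain no double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Build a reduced JSON payload with only two of the fields from the example above.
# $1: page URL, $2: extraction rules in plain language.
# Assumption: neither argument contains characters that need JSON escaping.
build_extract_request() {
  printf '{"url": "%s", "dataExtractionRules": "%s"}\n' "$1" "$2"
}

# Send a prepared JSON payload to the ai/command endpoint used in this README.
# Defined here but not invoked, so loading this file performs no network call.
send_command() {
  curl -s -X POST "http://localhost:8182/api/ai/command" \
    -H "Content-Type: application/json" \
    -d "$1"
}

# Usage:
#   send_command "$(build_extract_request 'https://www.amazon.com/dp/B0C1H26C46' 'product name, price')"
```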

🎓 For Advanced Users - LLM + X-SQL: Precise, Flexible, Powerful

Use the x/e API (served at /api/scrape/execute in the request below) for precise, flexible, LLM-assisted data extraction.

curl -X POST "http://localhost:8182/api/scrape/execute" -H "Content-Type: text/plain" -d "
select
  llm_extract(dom, 'product name, price, ratings') as llm_extracted_data,
  dom_base_uri(dom) as url,
  dom_first_text(dom, '#productTitle') as title,
  dom_first_slim_html(dom, 'img:expr(width > 400)') as img
from load_and_select('https://www.amazon.com/dp/B0C1H26C46', 'body');
"

Example of the extracted data:

{
  "llm_extracted_data": {
    "product name": "Apple iPhone 15 Pro Max",
    "price": "$1,199.00",
    "ratings": "4.5 out of 5 stars"
  },
  "url": "https://www.amazon.com/dp/B0C1H26C46",
  "title": "Apple iPhone 15 Pro Max",
  "img": "<img src=\"https://example.com/image.jpg\" />"
}
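
The query above can be parameterized so the same X-SQL template is reused for other pages. This is a minimal sketch that only reuses functions appearing in the example (`dom_base_uri`, `dom_first_text`, `load_and_select`). The helper names `build_xsql_query` and `run_query` are hypothetical, and the arguments are assumed to contain no single quotes.

```shell
#!/usr/bin/env bash
# Print an X-SQL query for a given page and CSS selector.
# $1: page URL, $2: CSS selector of the element whose text is wanted.
# Assumption: neither argument contains a single quote.
build_xsql_query() {
  cat <<EOF
select
  dom_base_uri(dom) as url,
  dom_first_text(dom, '$2') as title
from load_and_select('$1', 'body');
EOF
}

# Post a query to the endpoint used in the example above.
# Defined but not invoked here, so loading this file performs no network call.
run_query() {
  curl -s -X POST "http://localhost:8182/api/scrape/execute" \
    -H "Content-Type: text/plain" \
    -d "$1"
}

# Usage:
#   run_query "$(build_xsql_query 'https://www.amazon.com/dp/B0C1H26C46' '#productTitle')"
```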

👨‍💻 For Experts - Native API: Powerful!

🎮 Browser Control:

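// Assumes an existing PulsarRPA `session` and a target `url` defined elsewhere.
// Plain-language instructions, one action per line.
val prompts = """
move cursor to the element with id 'title' and click it
scroll to middle
scroll to top
get the text of the element with id 'title'
"""

val eventHandlers = DefaultPageEventHandlers()
// Run the instructions once the document is actually ready in the browser.
eventHandlers.browseEventHandlers.onDocumentActuallyReady.addLast { page, driver ->
    val result = session.instruct(prompts, driver)
}
session.open(url, eventHandlers)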

📝 Example: View Kotlin Code


🤖 Complete Robotic Process Automation Capabilities:

val options = session.options(args)
val event = options.eventHandlers.browseEventHandlers
event.onBrowserLaunched.addLast { page, driver ->
    warnUpBrowser(page, driver)
}
event.onWillFetch.addLast { page, driver ->
    waitForReferrer(page, driver)
    waitForPreviousPage(page, driver)
}
event.onWillCheckDocumentState.addLast { page, driver ->
    driver.waitForSelector("body h1[itemprop=name]")
    driver.click(".mask-layer-close-button")
}
session.load(url, options)

📝 Example: View Kotlin Code


🔍 Complex Data Extraction with X-SQL:

select
    llm_extract(dom, 'product name, price, ratings, score') as llm_extracted_data,
    dom_first_text(dom, '#productTitle') as title,
    dom_first_text(dom, '#bylineInfo') as brand,
    dom_first_text(dom, '#price tr td:matches(^Price) ~ td') as price,
    dom_first_text(dom, '#acrCustomerReviewText') as ratings,
    str_first_float(dom_first_text(dom, '#reviewsMedley .AverageCustomerReviews span:contains(out of)'), 0.0) as score
from load_and_select('https://www.amazon.com/dp/B0C1H26C46  -i 1s -njr 3', 'body');

📚 Example Code:


📜 Documents


🔧 Proxies - Unblock Websites


Set the environment variable PROXY_ROTATION_URL to the URL provided by your proxy service:

export PROXY_ROTATION_URL=https://your-proxy-provider.com/rotation-endpoint

Each time the rotation URL is accessed, it should return a response containing one or more fresh proxy IPs. Ask your proxy provider for such a URL.
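
Before relying on a rotation URL, it can help to check what it actually returns. The sketch below counts well-formed proxy entries in a response. The source does not specify the response format, so one `host:port` entry per line with an IPv4 host is purely an assumption here, and `count_proxies` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count lines on stdin that look like an IPv4 host:port proxy entry.
# Assumption: the provider returns one entry per line; adjust the pattern
# if your provider uses another format (JSON, credentials, hostnames).
count_proxies() {
  # grep -c prints the count but exits non-zero on zero matches; "|| true"
  # keeps the function usable in scripts running under "set -e".
  grep -Ec '^[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]{1,5}$' || true
}

# Usage:
#   curl -s "$PROXY_ROTATION_URL" | count_proxies
```

A count of 0 suggests the URL is wrong or the response uses a different format than assumed.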


✨ Features

🕷️ Web Spider

  • Scalable crawling
  • Browser rendering
  • AJAX data extraction

🤖 AI-Powered

  • Automatic field extraction
  • Pattern recognition
  • Accurate data capture

🧠 LLM Integration

  • Natural language web content analysis
  • Intuitive content description

🎯 Text-to-Action

  • Simple language commands
  • Intuitive browser control

🤖 RPA Capabilities

  • Human-like task automation
  • SPA crawling support
  • Advanced workflow automation

🛠️ Developer-Friendly

  • One-line data extraction
  • SQL-like query interface
  • Simple API integration

📊 X-SQL Power

  • Extended SQL for web data
  • Content mining capabilities
  • Web business intelligence

🛡️ Bot Protection

  • Advanced stealth techniques
  • IP rotation
  • Privacy context management

⚡ Performance

  • Parallel page rendering
  • High-efficiency processing
  • Block-resistant design

💰 Cost-Effective

  • 100,000+ pages/day
  • Minimal hardware requirements
  • Resource-efficient operation

✅ Quality Assurance

  • Smart retry mechanisms
  • Precise scheduling
  • Complete lifecycle management

🌐 Scalability

  • Fully distributed architecture
  • Massive-scale capability
  • Enterprise-ready

📦 Storage Options

  • Local File System
  • MongoDB
  • HBase
  • Gora support

📊 Monitoring

  • Comprehensive logging
  • Detailed metrics
  • Full transparency

📞 Contact Us

WeChat QR Code

About

Scrape web data at scale, completely and accurately, with high-performance distributed RPA.
