
Add JSON scrape support #717

Merged
merged 5 commits into from Aug 10, 2020


WithoutPants
Collaborator

Adds support for JSON scrapers. JSON scrapers work much like xpath scrapers, but use GJSON paths to query for values. Scenes may be scraped by fragment using a query URL. The query URL supports the following parameters:

  • {checksum} - the MD5 checksum of the scene
  • {oshash} - the oshash of the scene
  • {filename} - the base filename of the scene
  • {title} - the title of the scene

A scraper for ThePornDB is below:

name: ThePornDB
performerByName:
  action: scrapeJson
  queryURL: https://metadataapi.net/api/performers?q={}
  scraper: performerSearch
performerByURL:
  - action: scrapeJson
    url:
      - https://metadataapi.net/api/performers/
    scraper: performerScraper
sceneByURL:
  - action: scrapeJson
    url:
      - https://metadataapi.net/api/scenes/
    scraper: sceneScraper
sceneByFragment:
  action: scrapeJson
  queryURL: https://metadataapi.net/api/scenes?parse={filename}&limit=1
  scraper: sceneQueryScraper
jsonScrapers:
  performerSearch:
    performer:
      Name: data.#.name
      URL:
        selector: data.#.id
        replace:
          - regex: ^
            with: https://metadataapi.net/api/performers/

  performerScraper:
    common:
      $extras: data.extras
    performer:
      Name: data.name
      Gender: $extras.gender
      Birthdate: $extras.birthday
      Ethnicity: $extras.ethnicity
      Height: $extras.height
      Measurements: $extras.measurements
      Tattoos: $extras.tattoos
      Piercings: $extras.piercings
      Aliases: data.aliases
      Image: data.image

  sceneScraper:
    common:
      $performers: data.performers
    scene:
      Title: data.title
      Details: data.description
      Date: data.date
      URL: data.url
      Image: data.background.small
      Performers:
        Name: data.performers.#.name
      Studio:
        Name: data.site.name
      Tags:
        Name: data.tags.#.tag
  
  sceneQueryScraper:
    common:
      $data: data.0
      $performers: data.0.performers
    scene:
      Title: $data.title
      Details: $data.description
      Date: $data.date
      URL: $data.url
      Image: $data.background.small
      Performers:
        Name: $data.performers.#.name
      Studio:
        Name: $data.site.name
      Tags:
        Name: $data.tags.#.tag
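
To make the selectors in the configuration above easier to read, the following Python sketch imitates three things the config relies on: expanding a `common` variable such as `$extras`, evaluating a small subset of GJSON path syntax (object keys, numeric indexes, and `#` to map over an array), and applying a `replace` rule. It is an approximation for illustration only, not the GJSON library or the scraper code in this PR. The helper names and the sample response are hypothetical, and regex group references in `with` are not handled.

```python
import re


def expand_common(selector, common):
    """Replace a leading $variable with the path it stands for."""
    for name, path in common.items():
        if selector == name or selector.startswith(name + "."):
            return path + selector[len(name):]
    return selector


def get_path(doc, path):
    """Evaluate a tiny subset of GJSON paths: keys, indexes and '#'."""
    current = doc
    parts = path.split(".")
    for i, part in enumerate(parts):
        if part == "#":
            if not isinstance(current, list):
                return None
            rest = ".".join(parts[i + 1:])
            if not rest:
                return len(current)  # a trailing '#' yields the array length
            found = [get_path(item, rest) for item in current]
            return [value for value in found if value is not None]
        if isinstance(current, list):
            if not part.isdigit() or int(part) >= len(current):
                return None
            current = current[int(part)]
        elif isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return None
    return current


def apply_replace(value, rules):
    """Apply each regex/with pair in order, treating 'with' as literal text."""
    result = str(value)
    for rule in rules:
        result = re.sub(rule["regex"], lambda _m, r=rule: r["with"], result)
    return result


# Hypothetical search response shaped like the performerSearch selectors expect.
response = {"data": [{"id": "101", "name": "Example One"},
                     {"id": "102", "name": "Example Two"}]}
rules = [{"regex": "^", "with": "https://metadataapi.net/api/performers/"}]

names = get_path(response, "data.#.name")
urls = [apply_replace(i, rules) for i in get_path(response, "data.#.id")]
print(names, urls)
print(expand_common("$extras.gender", {"$extras": "data.extras"}))
```

Read this way, `data.#.name` collects the `name` of every element of `data`, `data.0` picks the first element, and the `^` rule in `performerSearch` simply prefixes each id with the performer URL.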

WithoutPants added the feature label Aug 9, 2020
WithoutPants added this to the Version 0.3.0 milestone Aug 9, 2020
Review comment on pkg/scraper/json.go (outdated, resolved)
@bnkai
Collaborator

bnkai commented Aug 9, 2020

From a quick test it seems to work fine

WithoutPants merged commit 7158e83 into stashapp:develop Aug 10, 2020
WithoutPants linked an issue Aug 12, 2020 that may be closed by this pull request
Tweeticoats pushed a commit to Tweeticoats/stash that referenced this pull request Feb 1, 2021
* Add support for scene fragment scrape in xpath
Successfully merging this pull request may close these issues.

[Feature] Template support to scrape JSON
2 participants