
Scraper and plugin manager #4242

Merged: 39 commits into stashapp:develop, Nov 21, 2023

Conversation

WithoutPants
Collaborator

Adds a package manager to the scraper and plugin settings pages.

These allow the configuration of multiple package sources.

A package source can be a local path or a URL; the URL must return a YAML index file listing all of the packages contained in the source:

- id: <package id>
  name: <package name>
  version: <version>
  date: <date>
  path: <path to package zip>
  sha256: <sha256 of zip>
...

I have example sources deployed at the following URLs:
https://withoutpants.github.io/CommunityScrapers/develop/index.yml
https://withoutpants.github.io/CommunityScripts/develop/index.yml
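
For illustration only, here is a minimal Go sketch of how a source index in this format could be modeled and parsed. The struct and field names are assumptions based on the index layout above, not necessarily what the stash codebase itself uses:

package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// IndexEntry mirrors one entry of the source index format shown above.
// Names are illustrative; the actual types in stash may differ.
type IndexEntry struct {
	ID      string `yaml:"id"`
	Name    string `yaml:"name"`
	Version string `yaml:"version"`
	Date    string `yaml:"date"`
	Path    string `yaml:"path"`   // path or URL to the package zip
	Sha256  string `yaml:"sha256"` // sha256 of the zip
}

func main() {
	data, err := os.ReadFile("index.yml")
	if err != nil {
		panic(err)
	}

	// the index is a top-level YAML sequence of entries
	var entries []IndexEntry
	if err := yaml.Unmarshal(data, &entries); err != nil {
		panic(err)
	}

	for _, e := range entries {
		fmt.Printf("%s %s -> %s\n", e.ID, e.Version, e.Path)
	}
}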

The package manager unzips each package zip into its own directory, named after the package id, within the applicable local directory (plugins or scrapers). It also writes a manifest file to track the package version and files:

id: DateParser
name: Date Parser
description: Find date in path or filename and add it
version: 0.2-32f6e33
date: 2023-10-26 14:20:23 +1100
source_repository: https://withoutpants.github.io/CommunityScripts/develop/index.yml
files:
- requirements.txt
- date_parser.py
- date_parser.yml
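
In the same vein, the manifest could map onto a struct like the one below; again this is only a sketch, with field names taken from the example manifest above rather than the actual stash types:

// Manifest records an installed package as written by the package manager.
// Field names follow the example manifest above; the real types may differ.
type Manifest struct {
	ID               string   `yaml:"id"`
	Name             string   `yaml:"name"`
	Description      string   `yaml:"description"`
	Version          string   `yaml:"version"`
	Date             string   `yaml:"date"`
	SourceRepository string   `yaml:"source_repository"`
	Files            []string `yaml:"files"` // files extracted from the package zip
}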

The main branches of https://github.com/WithoutPants/CommunityScripts and https://github.com/WithoutPants/CommunityScrapers have been modified to build the package sources from the source files. This should be considered a proof of concept and for testing purposes only.

Resolves #623

@WithoutPants WithoutPants added the feature Pull requests that add a new feature label Oct 26, 2023
@WithoutPants WithoutPants added this to the Version 0.24.0 milestone Oct 26, 2023
@stg-annon
Collaborator

stg-annon commented Oct 26, 2023

A small improvement may be to index the domains/URLs in scrapers so that you can search by them when looking up a scraper.

The tables could even be enumerated based on this value, as I think in most cases people are looking for a specific domain, not necessarily a scraper package:

☑️  URL          Package        Version
☑️  google.com   Alphabet Inc.  1234a5
☑️  youtube.com  Alphabet Inc.  1234a5

Selecting a domain would install the whole package, but I think this would be a better UX.

@stg-annon
Collaborator

For plugins, can the path be separate from the index URL?

Say I wanted to add performerBodyCalculator to a hosted/existing index.yml with the following:

- id: performerBodyCalculator
  name: Performer Body Calculator
  description: Tags performers based on existing metadata, with tags matching the performers body type
  version: 1.0-b455ac6
  date: 2023-10-26 17:26:15 +0000
  path: https://github.com/stg-annon/performerBodyCalculator/releases/download/v1.0/performerBodyCalculator-1.0.zip
  sha256: 57A899ACC459383C4E74A5E7118D675EFA97556A5B0F4E0E327A0CF6EA8FA32A

Should this be possible? It would make development easier, as we could manage a community index that pulls from various places after the entry has been reviewed for inclusion in the index.

@stg-annon
Collaborator

stg-annon commented Oct 27, 2023

I've encountered an issue: some scrapers have many YMLs tied to a single py script, and Algolia_* is a prime example of this. How do we want to deal with things like this, or with py_common? Direct the user to get the dependency from the repo?

Discussion in Discord led to the conclusion that dependencies are likely the solution here; the question would be implementation. In the simplest case, one package depends on another within the same repo.

Algolia_Site.yml

name: "Site"
requires:
  - algolia
sceneByURL:
  - action: script
    url:
      - site.com/en/video/
    script:
      - python
      - ../algolia/Algolia.py
      - site

algolia.yml

name: "Algolia Interface Package"
requires:
  - py_common

py_common.yml

name: "py_common Module"

This existing dependency example shows the need to examine chains of dependencies within a given repo.

Dependencies Across Sources?

Should we allow for/support dependencies across repos, and if so, what would that look like?

Say this scraper is from another Source/Repo

name: "Non Community Scraper"
requires:
  - CommunityScrapers/algolia

@WithoutPants
Collaborator Author

I've encountered an issue: some scrapers have many YMLs tied to a single py script, and Algolia_* is a prime example of this. How do we want to deal with things like this, or with py_common? Direct the user to get the dependency from the repo?

For Algolia, I would just put all of the related ymls into a single package. I haven't yet got a solution for dependencies, though the short-term solution would be to bundle the dependencies into the package. That obviously has issues with redundancy, but it's a lesser issue.

@Phasetime
Contributor

I encountered a few small issues while testing:

  1. Installing a scraper/plugin usually makes it show up instantly in the list of installed items, but if it's the first item you install and the folder on the filesystem does not exist yet, the installed items only show up after a page reload.
  2. For plugins, the installed section isn't shown at all if none are installed, but for scrapers it is.
  3. Check for Updates does add a column for the latest version, but it stays empty.

I have to say, great work with this already!! Installing/updating scrapers has been a major hassle and I love to see this addressed. Along with the UI plugin API, v24 shapes up to be an absolute gamechanger for versatility and usability.

@scruffynerf

scruffynerf commented Oct 28, 2023

I've encountered an issue: some scrapers have many YMLs tied to a single py script, and Algolia_* is a prime example of this. How do we want to deal with things like this, or with py_common? Direct the user to get the dependency from the repo?

For Algolia, I would just put all of the related ymls into a single package. I haven't yet got a solution for dependencies, though the short-term solution would be to bundle the dependencies into the package. That obviously has issues with redundancy, but it's a lesser issue.

But this doesn't solve the Cropper.js issue, as an example.
If we have three different crop plugins (Scene, Performer, Tags) all of which use Cropper.js, we don't want to include it 3 times.

It's far nicer if we have Cropper.js in its own plugin that other plugins can depend on.

Nor do we want more than one py_common, which has to be configured (and needs a settings rewrite anyway).

@stg-annon
Collaborator

stg-annon commented Oct 28, 2023

I think the solution here is a very limited dependency implementation, where you can only depend on packages within the same source, and the dependency is the package ID within that source:

requires:
 - py_common

Going through each dependency and recursively installing each pluginID/scraperID in the requires field should leave you with all the deps and the selected package.

This makes it more reliant on source/package maintainers organizing things in a way that works, but it should be relatively easy to implement on the stash side of things.

This will not do any cleanup of dependencies if a package is uninstalled; it essentially just automates what the user would otherwise have to do when a package calls for a dependency, i.e.

You need to install the package 'py_common' from the community repo!

The user then goes and checks the box for py_common and hits install.

Implementation

Say at manager.go#L156 we add a call to a ListPackageRequirements function which, if implemented, would return the list of IDs in the requires field (or nil):

reqs, err := m.ListPackageRequirements(remoteURL, id)
if err != nil {
	return fmt.Errorf("retrieving requirements: %w", err)
}
// install each required package before continuing with this one
for _, reqID := range reqs {
	if err := m.Install(remoteURL, reqID); err != nil {
		return fmt.Errorf("installing requirement %s: %w", reqID, err)
	}
}
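
As a purely hypothetical extension of the sketch above (still assuming a ListPackageRequirements function exists as described, and assuming a Manager receiver type), dependency chains could be followed recursively while guarding against cycles by tracking IDs that have already been visited:

// installWithRequirements is a hypothetical helper: it installs a package and,
// recursively, everything listed in its requires field, skipping IDs already seen.
func (m *Manager) installWithRequirements(remoteURL, id string, seen map[string]bool) error {
	if seen[id] {
		return nil // already handled; this also avoids dependency cycles
	}
	seen[id] = true

	reqs, err := m.ListPackageRequirements(remoteURL, id)
	if err != nil {
		return fmt.Errorf("retrieving requirements for %s: %w", id, err)
	}
	for _, reqID := range reqs {
		if err := m.installWithRequirements(remoteURL, reqID, seen); err != nil {
			return err
		}
	}
	return m.Install(remoteURL, id)
}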

Apologies if I am off base here; I don't know the nuances of Go, so I could be very wrong about how this would work.

Edit:
I've set up an example repo for what I believe a solution would look like for algolia:
stg-annon/CommunityScrapers Source URL

@scruffynerf

scruffynerf commented Oct 29, 2023

I'm confused about why zip is being used. Where are these zips coming from?

(I see the package manager unpacks them... but the creation is the part sticking in my craw.)

It might be nicer if it could just get all files in/under a directory and verify each,
something like https://gist.github.com/jaredhoward/f231391529efcd638bb7

The problem I see is that, similar to the way userscripts work, it looks like the dev has to rebuild in order to get the item updated.
(@stg-annon yes, wasn't this the reason we wanted to move to plugins, to end that?)

Meaning with every merged patch of the community repos, we'll have to rebuild?

@stg-annon
Collaborator

stg-annon commented Oct 29, 2023

@scruffynerf

I'm confused about why zip is being used. Where are these zips coming from?

ZIPs are transportable and can be hashed as a single file. They are generated and hosted on GitHub Pages so as not to run afoul of GitHub's TOS; it's more of an automated process with Actions than a manual one.

This approach helps with version control: once the package is archived it will stay that way with that hash unless updated, so there is no concern about alterations after a package is approved and added by maintainers (Plugin v1.2.3 == sha256 hash).

The way WP has implemented it is to work retroactively with the current community repos, but as I describe in one of my comments, we could separate the index from the hosting of the packages themselves: #4242 (comment)
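
For context, here is a minimal sketch of how a downloaded package zip could be verified against the sha256 listed in the index; this is not the actual stash implementation, just an illustration using the Go standard library:

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"strings"
)

// verifySHA256 checks that the file at path hashes to the expected hex digest,
// e.g. the sha256 value published for the package in the source index.
func verifySHA256(path, expected string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return err
	}

	got := hex.EncodeToString(h.Sum(nil))
	if !strings.EqualFold(got, expected) {
		return fmt.Errorf("sha256 mismatch: got %s, want %s", got, expected)
	}
	return nil
}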

@scruffynerf

To be clear, though... if the complexity of nested folders is an issue, forcing us to use a single folder per item, no nesting, I'm ok with that. We're not going to check out a repo into the scraper directory anymore, etc.

@WithoutPants
Collaborator Author

My suggestion would be to have a configurable value so that managed packages could be placed into their own folder, so continuing with the above example

The current assumption that packages are installed in their own folder, named with the package ID, makes the code for finding a given package very simple. Removing this assumption is going to take a bit of development time.

@stg-annon
Collaborator

The config would apply to all packages, not be a config for each package; similar to how we configure a plugins folder or scrapers folder within stash, this would be a "managed plugins folder".

It really should not change any assumptions about the packages besides where we start to look for them; the idea is that we shift the folder down one level under the "root" scraper/plugin folder, allowing the root to still be manually organized without cluttering that folder.

I saw two options to do this

  1. A static path, so we are always looking in the same place for any managed package. This would work the same as it currently does, except the package manager always looks in these folders instead of directly in the root folder:
    <plugins_path>/ManagedPlugins/
    <scrapers_path>/ManagedScrapers/

  2. Adding all managed packages into a folder relative to their index. This would create a folder for each source/index but otherwise work the same after that. The advantage is that this could avoid package collisions, since each index is separated, but it also means multiple indexes can't depend on each other:

    <plugins_path>/
    ├─ <index_source_name>/
    │  ├─ <plugin_id>/ 
    
    <scrapers_path>/
    ├─ <index_source_name>/
    │  ├─ <scraper_id>/ 
    

@WithoutPants
Collaborator Author

Added the ability to set local paths for package sources, and changed the caching behaviour so that package lists are stored in the cache directory (if set) and only re-downloaded if the remote copy is newer. Should be good for another round of testing.
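
The caching mechanism isn't spelled out in detail here; one common way to implement "only re-download if newer" is a conditional HTTP request keyed off the cached file's modification time. A rough sketch under that assumption, not necessarily how stash does it:

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// fetchIfNewer downloads url into cachePath only when the remote copy is newer
// than the cached file, using an If-Modified-Since conditional request.
func fetchIfNewer(url, cachePath string) error {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	if fi, err := os.Stat(cachePath); err == nil {
		req.Header.Set("If-Modified-Since", fi.ModTime().UTC().Format(http.TimeFormat))
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusNotModified:
		return nil // cached copy is still current
	case http.StatusOK:
		data, err := io.ReadAll(resp.Body)
		if err != nil {
			return err
		}
		return os.WriteFile(cachePath, data, 0o644)
	default:
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
}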

@stg-annon
Collaborator

This still appears to be an issue: #4242 (comment)

@WithoutPants
Collaborator Author

This still appears to be an issue: #4242 (comment)

Should be addressed now.

@stg-annon
Collaborator

This still appears to be an issue: #4242 (comment)

Should be addressed now.

Yup looks good

@stg-annon
Collaborator

How do we want to deal with the requires attribute? Is this something exclusive to the source repo, defined in some requirements map that CI will pick up, or will it live in the plugin/scraper .yml file?

Everything works as intended with the values defined in the source index.yml, but if they are defined in the scraper/plugin file, stash currently errors out when loading scrapers/plugins with this value defined:

yaml: unmarshal errors:
  line 2: field requires not found in type scraper.config

@WithoutPants
Collaborator Author

How do we want to deal with the requires attribute? Is this something exclusive to the source repo, defined in some requirements map that CI will pick up, or will it live in the plugin/scraper .yml file?

The requires field is package-specific, so not something that should be in the plugin/scraper yml file. Populating the requirements is up to the source repos to sort out. Possible methods might be to add a .requirements file to the scraper/plugin directory, or a comment in the yml file. Either way, the index-generation script should get this information and use it to populate the index file.
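
To make the .requirements idea concrete, an index-generation script might read one package ID per line and copy them into the package's requires list when writing index.yml. A hypothetical sketch, not part of this PR:

import (
	"os"
	"strings"
)

// readRequirements parses a hypothetical .requirements file (one package ID per
// line; blank lines and #-comments are ignored) when generating the source index.
func readRequirements(path string) ([]string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		if os.IsNotExist(err) {
			return nil, nil // no requirements declared for this package
		}
		return nil, err
	}

	var ids []string
	for _, line := range strings.Split(string(data), "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		ids = append(ids, line)
	}
	return ids, nil
}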

@WithoutPants WithoutPants changed the title [WIP] Scraper and plugin manager Scraper and plugin manager Nov 20, 2023
@WithoutPants WithoutPants merged commit 987fa80 into stashapp:develop Nov 21, 2023
2 checks passed
ChilledSlim added a commit to ChilledSlim/FansDB-SHALookup that referenced this pull request Dec 27, 2023
Create index file for stashapp/stash#4242 feature
@olivechicago

Wouldn't you want installed plugins to either be hidden or "greyed out"/marked as installed in the Available Plugins section of the manager?

Successfully merging this pull request may close these issues:

[Feature] CommunityScraper Management/Sync Within Stash UI