
Scraper and plugin manager #4242

Merged: 39 commits into stashapp:develop, Nov 21, 2023

Conversation

WithoutPants
Collaborator

Adds a package manager to the scraper and plugin settings pages.

These allow the configuration of multiple package sources.

A package source can be a local path or a URL; the URL must return a YAML index file listing all of the packages contained in the source:

- id: <package id>
  name: <package name>
  version: <version>
  date: <date>
  path: <path to package zip>
  sha256: <sha256 of zip>
...

I have example sources deployed at the following URLs:
https://withoutpants.github.io/CommunityScrapers/develop/index.yml
https://withoutpants.github.io/CommunityScripts/develop/index.yml
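
For illustration only, here is a minimal Go sketch of how a source index in this format could be modeled and parsed. The struct and field names are assumptions based on the index layout above, not necessarily what the stash codebase itself uses:

package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// IndexEntry mirrors one entry of the source index format shown above.
// Names are illustrative; the actual types in stash may differ.
type IndexEntry struct {
	ID      string `yaml:"id"`
	Name    string `yaml:"name"`
	Version string `yaml:"version"`
	Date    string `yaml:"date"`
	Path    string `yaml:"path"`   // path or URL to the package zip
	Sha256  string `yaml:"sha256"` // sha256 of the zip
}

func main() {
	data, err := os.ReadFile("index.yml")
	if err != nil {
		panic(err)
	}

	// the index is a top-level YAML sequence of entries
	var entries []IndexEntry
	if err := yaml.Unmarshal(data, &entries); err != nil {
		panic(err)
	}

	for _, e := range entries {
		fmt.Printf("%s %s -> %s\n", e.ID, e.Version, e.Path)
	}
}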

The package manager unzips each package zip into its own directory, named after the package id, within the applicable local directory (plugins or scrapers). It also writes a manifest file to track the package version and files:

id: DateParser
name: Date Parser
description: Find date in path or filename and add it
version: 0.2-32f6e33
date: 2023-10-26 14:20:23 +1100
source_repository: https://withoutpants.github.io/CommunityScripts/develop/index.yml
files:
- requirements.txt
- date_parser.py
- date_parser.yml
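
In the same vein, the manifest could map onto a struct like the one below; again this is only a sketch, with field names taken from the example manifest above rather than the actual stash types:

// Manifest records an installed package as written by the package manager.
// Field names follow the example manifest above; the real types may differ.
type Manifest struct {
	ID               string   `yaml:"id"`
	Name             string   `yaml:"name"`
	Description      string   `yaml:"description"`
	Version          string   `yaml:"version"`
	Date             string   `yaml:"date"`
	SourceRepository string   `yaml:"source_repository"`
	Files            []string `yaml:"files"` // files extracted from the package zip
}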

The main branches of https://github.com/WithoutPants/CommunityScripts and https://github.com/WithoutPants/CommunityScrapers have been modified to build the package sources from the source files. This should be considered a proof of concept and for testing purposes only.

Resolves #623

@WithoutPants WithoutPants added the feature Pull requests that add a new feature label Oct 26, 2023
@WithoutPants WithoutPants added this to the Version 0.24.0 milestone Oct 26, 2023
@stg-annon
Collaborator

stg-annon commented Oct 26, 2023

A small improvement may be to index the domains/URLs in scrapers so that you can search by them when looking up a scraper.

The tables could even be enumerated based on this value, as I think in most cases people are looking for a specific domain, not necessarily a scraper package:

☑️  URL          Package        Version
☑️  google.com   Alphabet Inc.  1234a5
☑️  youtube.com  Alphabet Inc.  1234a5

Selecting a domain would install the whole package, but I think this would be a better UX.

@stg-annon
Collaborator

For plugins, can the path be separate from the index URL?

Say I wanted to add performerBodyCalculator to a hosted/existing index.yml with the following:

- id: performerBodyCalculator
  name: Performer Body Calculator
  description: Tags performers based on existing metadata, with tags matching the performers body type
  version: 1.0-b455ac6
  date: 2023-10-26 17:26:15 +0000
  path: https://github.com/stg-annon/performerBodyCalculator/releases/download/v1.0/performerBodyCalculator-1.0.zip
  sha256: 57A899ACC459383C4E74A5E7118D675EFA97556A5B0F4E0E327A0CF6EA8FA32A

Should this be possible? It would make development easier, as we could manage a community index that pulls from various places after the entry has been reviewed for inclusion in the index.

@stg-annon
Collaborator

stg-annon commented Oct 27, 2023

I've encountered an issue: some scrapers have many YMLs tied to a single py script, and Algolia_* is a prime example of this. How do we want to deal with things like this, or with py_common? Direct the user to get the dependency from the repo?

Discussion in Discord led to the conclusion that dependencies are likely the solution here; the question would be implementation. In the simplest case, one package depends on another within the same repo.

Algolia_Site.yml

name: "Site"
requires:
  - algolia
sceneByURL:
  - action: script
    url:
      - site.com/en/video/
    script:
      - python
      - ../algolia/Algolia.py
      - site

algolia.yml

name: "Algolia Interface Package"
requires:
  - py_common

py_common.yml

name: "py_common Module"

This existing dependency example shows the need to examine chains of dependencies within a given repo.

Dependencies Across Sources?

Should we allow for/support dependencies across repos, and if so, what would that look like?

Say this scraper is from another Source/Repo

name: "Non Community Scraper"
requires:
  - CommunityScrapers/algolia

@WithoutPants
Collaborator Author

I've encountered an issue: some scrapers have many YMLs tied to a single py script, and Algolia_* is a prime example of this. How do we want to deal with things like this, or with py_common? Direct the user to get the dependency from the repo?

For Algolia, I would just put all of the related ymls into a single package. I haven't yet got a solution for dependencies, though the short-term solution would be to bundle the dependencies into the package. That obviously has issues with redundancy, but it's a lesser issue.

@Phasetime
Contributor

I encountered a few small issues while testing:

  1. Installing a scraper/plugin usually makes it show up instantly in the list of installed items, but if it's the first item you install and the folder on the filesystem does not exist yet, the installed items only show up after a page reload.
  2. For plugins, the installed section isn't shown at all if none are installed, but for scrapers it is.
  3. Check for Updates does add a column for the latest version, but it stays empty.

I have to say, great work with this already!! Installing/updating scrapers has been a major hassle and I love to see this addressed. Along with the UI plugin API, v24 shapes up to be an absolute gamechanger for versatility and usability.

@scruffynerf

scruffynerf commented Oct 28, 2023

I've encountered an issue: some scrapers have many YMLs tied to a single py script, and Algolia_* is a prime example of this. How do we want to deal with things like this, or with py_common? Direct the user to get the dependency from the repo?

For Algolia, I would just put all of the related ymls into a single package. I haven't yet got a solution for dependencies, though the short-term solution would be to bundle the dependencies into the package. That obviously has issues with redundancy, but it's a lesser issue.

But this doesn't solve the Cropper.js issue, as an example.
If we have three different crop plugins (Scene, Performer, Tags) all of which use Cropper.js, we don't want to include it 3 times.

It's far nicer if we have Cropper.js in its own plugin that other plugins can depend on.

Nor do we want more than one py_common, which has to be configured (and needs a settings rewrite anyway).

@stg-annon
Collaborator

stg-annon commented Oct 28, 2023

I think the solution here is a very limited dependency implementation, where you can only depend on packages within the same source, and the dependency is the package ID within that source:

requires:
 - py_common

Going through each dependency and recursively installing each pluginID/scraperID in the requires field should leave you with all the deps and the selected package.

This makes it more reliant on source/package maintainers organizing things in a way that works, but it should be relatively easy to implement on the stash side of things.

This will not do any cleanup of dependencies if a package is uninstalled; it essentially just automates what the user would otherwise have to do when a package calls for a dependency, i.e.

You need to install the package 'py_common' from the community repo!

The user then goes and checks the box for py_common and hits install.

Implementation

Say at manager.go#L156 we add a call to a ListPackageRequirements function which, if implemented, would return the list of IDs in the requires field (or nil):

reqs, err := m.ListPackageRequirements(remoteURL, id)
if err != nil {
	return fmt.Errorf("retrieving requirements: %w", err)
}
// install each required package before continuing with this one
for _, reqID := range reqs {
	if err := m.Install(remoteURL, reqID); err != nil {
		return fmt.Errorf("installing requirement %s: %w", reqID, err)
	}
}
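
As a purely hypothetical extension of the sketch above (still assuming a ListPackageRequirements function exists as described, and assuming a Manager receiver type), dependency chains could be followed recursively while guarding against cycles by tracking IDs that have already been visited:

// installWithRequirements is a hypothetical helper: it installs a package and,
// recursively, everything listed in its requires field, skipping IDs already seen.
func (m *Manager) installWithRequirements(remoteURL, id string, seen map[string]bool) error {
	if seen[id] {
		return nil // already handled; this also avoids dependency cycles
	}
	seen[id] = true

	reqs, err := m.ListPackageRequirements(remoteURL, id)
	if err != nil {
		return fmt.Errorf("retrieving requirements for %s: %w", id, err)
	}
	for _, reqID := range reqs {
		if err := m.installWithRequirements(remoteURL, reqID, seen); err != nil {
			return err
		}
	}
	return m.Install(remoteURL, id)
}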

Apologies if I am off base here; I don't know the nuances of Go, so I could be very wrong about how this would work.

Edit:
I've set up an example repo for what I believe a solution would look like for algolia:
stg-annon/CommunityScrapers Source URL

@scruffynerf

scruffynerf commented Oct 29, 2023

I'm confused about why zip is being used. Where are these zips coming from?

(I see the package manager unpacks them... but the creation is the part sticking in my craw.)

It might be nicer if it could just get all files in/under a directory and verify each,
something like https://gist.github.com/jaredhoward/f231391529efcd638bb7

The problem I see is that, similar to the way userscripts work, it looks like the dev has to rebuild in order to get the item updated.
(@stg-annon yes, wasn't this the reason we wanted to move to plugins, to end that?)

Meaning with every merged patch of the community repos, we'll have to rebuild?

@stg-annon
Collaborator

stg-annon commented Oct 29, 2023

@scruffynerf

I'm confused about why zip is being used. Where are these zips coming from?

ZIPs are transportable and can be hashed as a single file. They are generated and hosted on GitHub Pages so as not to run afoul of GitHub's TOS; it's more of an automated process with Actions than a manual one.

This approach helps with version control: once the package is archived it will stay that way with that hash unless updated, so there is no concern about alterations after a package is approved and added by maintainers (Plugin v1.2.3 == sha256 hash).

The way WP has implemented it is to work retroactively with the current community repos, but as I describe in one of my comments, we could separate the index from the hosting of the packages themselves: #4242 (comment)
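
For context, here is a minimal sketch of how a downloaded package zip could be verified against the sha256 listed in the index; this is not the actual stash implementation, just an illustration using the Go standard library:

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"strings"
)

// verifySHA256 checks that the file at path hashes to the expected hex digest,
// e.g. the sha256 value published for the package in the source index.
func verifySHA256(path, expected string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return err
	}

	got := hex.EncodeToString(h.Sum(nil))
	if !strings.EqualFold(got, expected) {
		return fmt.Errorf("sha256 mismatch: got %s, want %s", got, expected)
	}
	return nil
}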

@scruffynerf

To be clear, though... if the complexity of nested folders is an issue, forcing us to use a single folder per item, no nesting, I'm ok with that. We're not going to check out a repo into the scraper directory anymore, etc.

@WithoutPants
Collaborator Author

My suggestion would be to have a configurable value so that managed packages could be placed into their own folder, so continuing with the above example

The current assumption that packages are installed in their own folder, named with the package ID, makes the code for finding a given package very simple. Removing this assumption is going to take a bit of development time.

@stg-annon
Collaborator

The config would apply to all packages, not be a config for each package; similar to how we configure a plugins folder or scrapers folder within stash, this would be a "managed plugins folder".

It really should not change any assumptions about the packages besides where we start to look for them; the idea is that we shift the folder down one level under the "root" scraper/plugin folder, allowing the root to still be manually organized without cluttering that folder.

I saw two options to do this

  1. A static path, so we are always looking in the same place for any managed package. This would work the same as it currently does, except the package manager always looks in these folders instead of directly in the root folder:
    <plugins_path>/ManagedPlugins/
    <scrapers_path>/ManagedScrapers/

  2. Adding all managed packages into a folder relative to their index. This would create a folder for each source/index but otherwise work the same after that. The advantage is that this could avoid package collisions, since each index is separated, but it also means multiple indexes can't depend on each other:

    <plugins_path>/
    ├─ <index_source_name>/
    │  ├─ <plugin_id>/ 
    
    <scrapers_path>/
    ├─ <index_source_name>/
    │  ├─ <scraper_id>/ 
    

@WithoutPants
Collaborator Author

Added the ability to set local paths for package sources, and changed the caching behaviour so that package lists are stored in the cache directory (if set) and only re-downloaded if the remote copy is newer. Should be good for another round of testing.
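
The caching mechanism isn't spelled out in detail here; one common way to implement "only re-download if newer" is a conditional HTTP request keyed off the cached file's modification time. A rough sketch under that assumption, not necessarily how stash does it:

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// fetchIfNewer downloads url into cachePath only when the remote copy is newer
// than the cached file, using an If-Modified-Since conditional request.
func fetchIfNewer(url, cachePath string) error {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	if fi, err := os.Stat(cachePath); err == nil {
		req.Header.Set("If-Modified-Since", fi.ModTime().UTC().Format(http.TimeFormat))
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusNotModified:
		return nil // cached copy is still current
	case http.StatusOK:
		data, err := io.ReadAll(resp.Body)
		if err != nil {
			return err
		}
		return os.WriteFile(cachePath, data, 0o644)
	default:
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
}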

@stg-annon
Collaborator

This still appears to be an issue: #4242 (comment)

@WithoutPants
Collaborator Author

This still appears to be an issue: #4242 (comment)

Should be addressed now.

@stg-annon
Collaborator

This still appears to be an issue: #4242 (comment)

Should be addressed now.

Yup looks good

@stg-annon
Collaborator

How do we want to deal with the requires attribute? Is this something exclusive to the source repo, defined in some requirements map that CI will pick up, or will it live in the plugin/scraper .yml file?

Everything works as intended with the values defined in the source index.yml, but if they are defined in the scraper/plugin file, stash currently errors out when loading scrapers/plugins with this value defined:

yaml: unmarshal errors:
  line 2: field requires not found in type scraper.config

@WithoutPants
Collaborator Author

How do we want to deal with the requires attribute? Is this something exclusive to the source repo, defined in some requirements map that CI will pick up, or will it live in the plugin/scraper .yml file?

The requires field is package-specific, so not something that should be in the plugin/scraper yml file. Populating the requirements is up to the source repos to sort out. Possible methods might be to add a .requirements file to the scraper/plugin directory, or a comment in the yml file. Either way, the index-generation script should get this information and use it to populate the index file.
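
To make the .requirements idea concrete, an index-generation script might read one package ID per line and copy them into the package's requires list when writing index.yml. A hypothetical sketch, not part of this PR:

import (
	"os"
	"strings"
)

// readRequirements parses a hypothetical .requirements file (one package ID per
// line; blank lines and #-comments are ignored) when generating the source index.
func readRequirements(path string) ([]string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		if os.IsNotExist(err) {
			return nil, nil // no requirements declared for this package
		}
		return nil, err
	}

	var ids []string
	for _, line := range strings.Split(string(data), "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		ids = append(ids, line)
	}
	return ids, nil
}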

@WithoutPants WithoutPants changed the title [WIP] Scraper and plugin manager Scraper and plugin manager Nov 20, 2023
@WithoutPants WithoutPants merged commit 987fa80 into stashapp:develop Nov 21, 2023
2 checks passed
ChilledSlim added a commit to ChilledSlim/FansDB-SHALookup that referenced this pull request Dec 27, 2023
Create index file for stashapp/stash#4242 feature
@olivechicago

Wouldn't you want installed plugins to either be hidden or "greyed out"/marked as installed in the Available Plugins section of the manager?

Successfully merging this pull request may close these issues:

[Feature] CommunityScraper Management/Sync Within Stash UI