
Future of the horizon-pool repo #169

carrotIndustries opened this issue Aug 30, 2020 · 55 comments

@carrotIndustries
Member

As is probably evident from PRs not getting merged, I don't have the time and motivation to review external contributions. Let's use this issue to brainstorm ideas on how that situation can be improved.

Making the review process easier

To easily get PRs into my clone of the pool for review, I added a bit of libgit2-based code that pulls a PR into a local branch and merges it with master. Since that feature didn't meet my requirements in terms of robustness, it gathered dust on a local branch. To make this available to other people interested in reviewing PRs, I merged it into master: horizon-eda/horizon@e2d4cd2, but it needs to be enabled manually in prefs.json

In the grand scheme of things, I see two ways to make reviewing PRs easier:

  • Enhance the pool manager in such a way that the entire PR review process can be done from within Horizon EDA by using the GitHub API.
  • Have a bot that posts renderings of the changes to the PR's comments. Unfortunately, I couldn't figure out a way to do this without running my own server, as an untrusted PR needs to trigger scripts defined in a trusted repo that has access to API keys for posting a comment.

More automation

Automatically check for things such as missing items (e.g. PRs containing just a unit) or datasheet links pointing to secondary sources. Some of this can probably be supported by the existing GitHub Actions workflow: https://github.com/horizon-eda/horizon-pool/blob/master/.github/workflows/all.yml In the long run, I'm envisioning something equivalent to Debian's Lintian, but for packages, parts, etc.
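As a rough illustration of the kind of check I mean, here is a minimal sketch in Python. The "datasheet" key, the parts/ layout and the list of secondary-source hosts are assumptions made for this sketch, not a description of the actual pool schema or policy:

```python
#!/usr/bin/env python3
"""Hypothetical lint pass over the pool's part JSON files.

The "datasheet" key and the secondary-source host list are assumptions
for illustration, not the real schema or policy.
"""
import json
import sys
from pathlib import Path

SECONDARY_SOURCES = ("alldatasheet.", "datasheetspdf.", "datasheet4u.")

def lint_part(path: Path) -> list:
    problems = []
    data = json.loads(path.read_text())
    datasheet = data.get("datasheet", "")
    if not datasheet:
        problems.append("missing datasheet link")
    elif any(host in datasheet for host in SECONDARY_SOURCES):
        problems.append("datasheet points to a secondary source")
    return problems

if __name__ == "__main__":
    exit_code = 0
    for part_file in Path("parts").rglob("*.json"):
        for problem in lint_part(part_file):
            print(f"{part_file}: {problem}")
            exit_code = 1
    sys.exit(exit_code)
```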

More reviewers

To give more people than just me merge permissions, we probably need to work on the open issues in https://github.com/horizon-eda/horizon-pool-convention/ so that no matter who's reviewing a PR, we end up with the same result.

What else?

I don't believe that there's a single solution to this problem, so we probably need to do at least all of the things listed above. Any more ideas?

carrotIndustries pinned this issue Aug 30, 2020
@kliment

kliment commented Aug 31, 2020

About the untrusted PR: the Actions functionality in GitHub can run pre-defined tasks and produce artifacts given a docker image. A comment action can access those artifacts and post a comment. The build task doesn't need access to secrets and the API token is only accessible to the (trusted) script posting the comment. Example here: https://github.com/machine-learning-apps/pr-comment

@atoav
Contributor

atoav commented Aug 31, 2020

If servers are needed, I am running multiple that I could put the scripts on. Anyway, a cheap virtual Hetzner Ubuntu box costs around 4 euros per month, so that shouldn't be the problem.

I think we also have to talk about the organizational side of things, namely stuff like:

  • Who should be in charge of what and who is in charge of who is in charge etc
  • Who will be allowed to do what
  • What is needed for a PR to be accepted (how many people need to look at it)
  • Are reviewers allowed to accept their own PRs? While this would speed things up, it is probably not a good idea
  • How and where is this all clearly documented
  • Once there is a process that works, having tutorials helps people to grok things

Maybe a few things could also be done as a concerted effort (like creating and reviewing the most commonly used packages). If you know the package is alright, reviewing already becomes a lot easier.

This of course all ties in with automation. Investing a little more time in doing that properly now will probably save many people a lot of time and energy in the long run, so it is definitely something that pays off. Automation is what would enable such clear and easy workflows. If a lot of that could become part of the pool manager itself, that would be even better. I gladly use git from the shell to do things by hand, but I doubt that this is a very common thing among electronics people.

@fruchti
Contributor

fruchti commented Aug 31, 2020

A streamlined PR review process would greatly benefit the pool. I'd imagine stalled PRs without apparent reason can easily demotivate users and are, in a way, holding Horizon back because there are not as many contributions as there could be.

As @atoav mentions, the details need to be worked out there, but there needs to be a common, agreed-upon basis for all decisions, too. Let me know if I can do anything for the convention or its issues to get the discussion there going again.

As we are talking about processes, I also wanted to come back to the possibility of multiple ‘stages’, ‘channels’, or whatever you want to call them, containing parts with different levels of dependability:

Here I had left some ideas on how the review process and the responsibility for part correctness could be split up among users. I think it is still relevant as it could make the reviewers' lives easier, since they wouldn't have to shoulder everything, while providing more probably-correct parts for end users.

In a nutshell, my idea was to have separate ‘testing’ and ‘stable’ pools (branches of the repository or with some built-in tagging system). Contributions could be merged into ‘testing’ with a minimal amount of sanity checks from reviewers, where they would be available to end users after an explicit opt-in. Users building their schematics and layouts with these untested parts can report back in the part's tracking issue, testifying to what they validated. As soon as every item on a PR's check list (footprint, pinout, 3D model, …) has been independently validated, it can be merged into ‘stable’.

This could already be done today, with a ‘testing’ pool including the main ‘stable’ pool, without changes to Horizon. If we were to adopt something like this, it still might be a good idea to think about UX and possible direct integration into Horizon first, though. I would, for example, imagine a simple flag ‘confirmed’ for each unit/entity/symbol/etc.

The current ‘PR load’ isn't anywhere near too high for a small team of reviewers to shoulder, and letting average users cross-check parts instead of just trusted reviewers might be debatable, but I think a process like this could greatly increase the number of parts available to end users (if they are willing to cross-check everything, which most people probably do anyway).

@carrotIndustries
Member Author

The build task doesn't need access to secrets and the API token is only accessible to the (trusted) script posting the comment. Example here

Doesn't this just shift the issue? How does the comment-posting script ensure that it isn't handed spam to post?

If servers are needed, I am running multiple which I could put the scripts on. Anyways a cheap virtual Hetzner-Ubuntu box costs around 4 Euros per month, that shouldn't be the problem.

Thanks for the offer, but I try to stick with managed solutions such as GitHub Pages, Actions, Discourse, etc. that don't require any sysadmin work. Having thought about this a bit, I think I have something that could work, even though it doesn't seem particularly elegant:

  • Webhook in the horizon-pool repo
  • that triggers an AWS Lambda function (sketched below), which in turn
  • calls the repository_dispatch handler of a workflow in a trusted repo that'll then post to the PR.
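A rough sketch of that Lambda step, just to show the shape of it; the repository_dispatch endpoint is real GitHub REST API, but the event name, the target repo and the token handling are placeholders:

```python
import json
import os
import urllib.request

def handler(event, context):
    """Hypothetical AWS Lambda handler: forward the webhook payload as a
    repository_dispatch event to the trusted repo whose workflow may comment."""
    body = {
        "event_type": "pool-pr-review",  # made-up event name
        "client_payload": json.loads(event.get("body", "{}")),
    }
    req = urllib.request.Request(
        # Placeholder target; the trusted repo doesn't have to be horizon-pool itself.
        "https://api.github.com/repos/horizon-eda/horizon-pool/dispatches",
        data=json.dumps(body).encode(),
        headers={
            "Accept": "application/vnd.github.v3+json",
            "Authorization": "token " + os.environ["GITHUB_PAT"],
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return {"statusCode": resp.status}
```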

@kliment

kliment commented Sep 1, 2020

The comment posting script only posts a comment with a fixed template and the output artifacts from the PR. The script itself is using the version from a predefined branch rather than whatever comes in the PR. That's a weird configuration, which is why I linked to an example. If someone is motivated enough to provide their spam in the form of a horizon library item, well I guess we just have to live with that.

@atoav
Contributor

atoav commented Sep 1, 2020

That could be solved as well by having the AWS Lambda script look at the presence of certain markers or flags that only a privileged user is allowed to add. That would make this a semi-automated process, though.

I agree with @fruchti that conforming to the template is enough of a hurdle to get rid of random bots and low effort spammers. What might be more problematic would be well meaning users posting copyrighted material (e.g. 3D models), because that stuff isn't easy to figure out automatically.

Edit: I offered my servers because I already do the sysadmin work on them; however, an AWS Lambda would be a good fit as well.

@atoav
Contributor

atoav commented Sep 1, 2020

Channels UX

The idea of having multiple "channels" for parts makes a lot of sense, because then users won't have to wait to use their parts while still being able to contribute. If we can manage to basically distribute the work of checking onto the users as well, that would probably be the most scalable approach. This however would mean a lot of streamlining when it comes to UX:

  • It must be easy to opt in and out of different channels
  • If parts are unchecked, this must be displayed somehow
  • If someone checked a part, successfully manufactured a board or found a fault, this shouldn't be more than a few clicks as well (and in the backend we could just keep a tally of the number of checks and successful usages)
  • Maybe it makes sense to list "my parts" – i.e. stuff people contributed – to build a sense of ownership
  • The process of moving stuff from one channel to another must be seamless
  • The process of updating pools and updating projects that depend on it must be seamless as well

I also had to think about the way LibrePCB does this:
[Screenshot: LibrePCB library manager]
This is obviously a different concept given the different structure, but I think it is well worth thinking a little bit about what the current usability pitfalls of using multiple pools are and grinding down the edges that exist.

Because the experience with multiple pools as of now is like this:

  1. open horizon, see one or more pools
  2. create a new pool that is meant to be included in the official one and edit it
  3. realize you won't be able to depend on packages etc from the official pool
  4. ???

In a lot of cases it might be beneficial if the included/child pool could inherit from the parent – or am I overlooking something? As of now, parts/packages/etc. of included pools are colored differently in the pool editor – what if one were able to select whether a new part/package/etc. should be created in the parent or one of the child pools?

One problem I see with that is that the child pool doesn't know it is a child of anything, so if one were to edit it separately it would probably scream in horror about missing references.

Git integration

I think the current git integration needs a major overhaul: as of now it "works", but a lot of things are missing (e.g. the ability to review PRs, as mentioned). When I click the Git tab on my pool, the program freezes for 15 seconds, my fan starts to spin, and then eventually I see something.

I know the whole thing is probably a bit of a hairy issue that isn't much fun to work on, but it is definitely something that should be tackled given the role it plays for the development of the whole pool ecosystem. And precisely because of its hairy nature, a vision and some sort of goal to strive for is needed.

@fruchti
Contributor

fruchti commented Sep 1, 2020

What might be more problematic would be well meaning users posting copyrighted material (e.g. 3D models), because that stuff isn't easy to figure out automatically.

Very good point. There are also bots which make agreeing to the licence a prerequisite for contributions, i.e. contributors would have to sign that they have the rights to redistribute their 3D models under the pool's licence. As an example, this one is the first one I came across: https://github.com/finos/cla-bot

@carrotIndustries
Member Author

The comment posting script only posts a comment with a fixed template and the output artifacts from the PR. The script itself is using the version from a predefined branch rather than whatever comes in the PR. That's a weird configuration, which is why I linked to an example

I'm probably still not getting it: we add a workflow to this repo that gets triggered by forked PRs and doesn't have access to any useful secrets (the github_token is read-only), so how is this workflow supposed to do anything privileged such as commenting on a PR?

No matter how it's going to be deployed, I think that a bot that does as much linting as possible would help a lot, as the contributor will get immediate feedback on their PR and some common mistakes can be caught without opening the pool manager.

When I click the Git tab on my pool, the program freezes for 15 seconds

You should probably create an issue for that. Is either of the two lists particularly long?

Re multiple pools: One workflow that's currently supported, but not really intuitive, is to create a pool for each project and mix-and-match pools from other users.

As far as the git integration goes, for users not familiar with git, there's the "remote" tab that'll appear when downloading the pool using the pool manager.

@kliment

kliment commented Sep 2, 2020

I'm probably still not getting it: we add a workflow to this repo that gets triggered by forked PRs and doesn't have access to any useful secrets (the github_token is read-only), so how is this workflow supposed to do anything privileged such as commenting on a PR?

The workflow gets triggered by PRs. Said workflow has access to secrets, and can do privileged stuff, but you cannot modify the workflow itself from a PR without approval so it can neither leak secrets nor do anything you haven't preconfigured. The key thing is that the workflow is specified to use a particular version or branch when it's triggered. Separately from this, you run a docker image (that you control) that also cannot be modified from a PR without approval, that does whatever you like in the background (linting and rendering and all) and produces a set of artifacts (images, error/warning lists, whatever), and that has access to the PR's changes. Your commenting workflow gets access to these outputs, and can incorporate them into a comment in any way it wishes. The PR author has read-only access to the workflow script, read-only access to the docker image, and no access to the secrets. The workflow has access to the docker thing's output, and posting rights. The docker thing has code you control and data coming from the PR. Of course, both the workflow itself and the docker image it uses can be changed by a PR, but those changes are not effective (do not get run) until and unless a person with commit access approves them. They do get run in the submitter's repo, but don't have access to your repo's secrets.

Hope this helps – like I said, it is confusing; this is the example implementation.

@carrotIndustries
Member Author

The workflow gets triggered by PRs. Said workflow has access to secrets, and can do privileged stuff, but you cannot modify the workflow itself from a PR without approval so it can neither leak secrets nor do anything you haven't preconfigured.

As far as I understand workflows, they can be modified by PRs as well. The docs also state that the github token is read-only for forked PRs: https://docs.github.com/en/actions/reference/events-that-trigger-workflows#pull-request-events-for-forked-repositories Secrets aren't passed either: https://docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets#using-encrypted-secrets-in-a-workflow

@kliment

kliment commented Sep 2, 2020

From your link: When you create a pull request from a forked repository to the base repository, GitHub sends the pull_request event to the base repository and no pull request events occur on the forked repository.

The example I linked responds to that event - in our repository, not the forked one. And then we can do all the things we want. GITHUB_TOKEN is passed to the runner, but the runner runs in our context, not the forked repo's. GITHUB_TOKEN gets passed to workflows that are triggered by PRs, but the repo where the PR is coming from cannot access our GITHUB_TOKEN. And of course a PR can modify the workflow, but only once it's merged. Pre-merge, the workflow that gets run is the workflow that's in the base repo, even though it gets run on the changes in the PR. So here is what happens:

  1. Attacker forks repo at commit 0xg00dc0de, makes changes to workflow in commit 0xbadc0de, sends pr
  2. Our workflow in the base repo gets run, triggered by a pull_request event
  3. Our workflow triggers a docker image that runs the pool manager and has access to the new commit's data, and generates a bunch of files as output
  4. Our workflow then triggers a script (which can live in a separate repo or branch) that downloads said files and uses GITHUB_TOKEN to post a comment with them. The docker image never sees GITHUB_TOKEN and the PR author never sees anything except the result.
  5. We look at the commit that does evil things, and reject it.

So the workflow that gets executed is always one we control, and the code that accesses GITHUB_TOKEN is under our control and not the same code that accesses the attacker-controlled data.

Yeah, I know it's confusing and underdocumented. But it does actually allow a sensible workflow.
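For what it's worth, the privileged step at the end boils down to a single authenticated API call. A sketch assuming the untrusted build step has already produced a markdown report as an artifact (PR comments go through the issues endpoint; the file name and the PR_NUMBER variable are made up):

```python
import json
import os
import urllib.request

def post_pr_comment(repo, pr_number, body):
    """Post a comment on a PR; only this trusted step ever sees GITHUB_TOKEN."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments",
        data=json.dumps({"body": body}).encode(),
        headers={
            "Accept": "application/vnd.github.v3+json",
            "Authorization": "token " + os.environ["GITHUB_TOKEN"],
        },
        method="POST",
    )
    urllib.request.urlopen(req)

if __name__ == "__main__":
    # Hypothetical artifact written by the untrusted build/render step.
    with open("render-report.md") as f:
        post_pr_comment("horizon-eda/horizon-pool",
                        int(os.environ["PR_NUMBER"]),  # made-up variable name
                        f.read())
```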

@carrotIndustries
Member Author

I just found https://github.blog/2020-08-03-github-actions-improvements-for-fork-and-pull-request-workflows/ The pull_request_target event does exactly what I need without any Rube Goldberg contraptions, as it uses the workflow defined in the repo rather than the one from the merge commit.

@carrotIndustries
Member Author

I tested pull_request_target and it works as advertised: carrotIndustries/horizon-pool-test#10

@carrotIndustries
Member Author

We now have a rudimentary bot that can be triggered by a PR comment that starts with Bot!, like this: #160 (comment)

@viniciusmiguel

I believe the concept presented by @atoav is essential to increase the traction of this EDA. Users should be able to use different pools in the same project; this would solve the PR waiting-list time without depending on a centralized review committee to agree on rules or check the requests.
The strongest point of Horizon is its robust component management, but it would become its weakness if there is a lot of friction in keeping the pool up to date.

@jpleger

jpleger commented Jan 22, 2021

Sorry for resurrecting this.

I was thinking about this a bit more tonight. I'll preface this by saying that I am by no means using Horizon in my day-to-day work, but I have really enjoyed using it and comparing it against other EDA tools on a couple of small projects. I think that the library/pool management is critical to the success of an EDA tool and I am largely a fan of the way Horizon does it.

By far (vs. Altium or KiCad, for example), my favorite has been the way that Horizon manages libraries/pools. I think leveraging git to version control and manage revisions is the right approach. The big thing that strikes me as missing is the fact that you can't mix/match repos. This almost brings up a philosophical debate on the merits of a monorepo vs. a polyrepo approach with regards to managing pools.

In my mind, there is a strong argument to be made for using different repos, but in a structured/templated way. The biggest advantage of this is that a centralized GitHub action could be created and managed by the project that performs all the linting/testing no matter where the repo lives. For example, if you have a private pool, you could manage this in your personal GitHub account and leverage the project's linting/checks CI for new commits. GitHub has the concept of Template repositories, so you could have a structure that is sane by default and include the GH Actions by default.

If we had the ability to cross-reference items in pools, e.g. via the pool UUID as a namespace – for example, 6d752bc3-29b3-45df-8cf2-1e3aacefbcc6:762c84c2-187d-454a-af81-d3381bec5257 would reference a hole padstack in the default pool – this could open up a whole new way of managing pools. I could see there being a "base" or common pool, where things like padstacks (circular holes, smd rectangles), common symbols/units (fets, connectors, opamps) and packages could live. You could also create something like a depends_on list of UUIDs that the pool requires.

Then, you could organize repos with a bit more flexibility.

For example you could do something by vendors:

Pool Name                    Description
pool-base                    Common symbols, padstacks, etc.
pool-generic                 Generic parts/standard footprints (555, fets, etc.)
pool-texas-instruments-pool  TI-specific parts
pool-analog-devices          Analog Devices parts
pool-st-micro                ST Micro parts

Or you could do something by category:

Pool Name        Description
pool-base        Common symbols, padstacks, etc.
pool-jedec       JEDEC standard footprints
pool-connectors  Connectors
pool-passive     Passive components (resistors, caps, etc.)
pool-decals      Common decals/graphics

Or it could be done with a combination of the two, where maybe certain vendors with large product sets get their own pool, or where a pool is automatically generated via scripts. Then you could have those scripts live with the repo and get run via a scheduled CI action.

The other thing that might be interesting is that in the future, it might be possible to have a JSON of the rules/checks in the repo and then both horizon itself and the CI checks could reference those rules/checks.

Lastly, 3d models could live in separate repos as well. This way if someone didn't want the models, just the symbols/footprints, they could choose not to even download the 3d models.

All in all, I think there would be some major advantages to allowing multiple pools being used and/or being able to cross-reference items between pools.

I could see it being phased in by adding a few new features and then later on restructuring the pools. That way it isn't too much of an undertaking at once.

  • Add the ability to create cross repo items (reference padstacks, symbols, units, etc.)
  • Create a new dialogue for selecting default repos, maybe reference in the pool JSON a dict of related repos and descriptions.
  • Add rule checks inside the pool.json file (essentially linter rules)
  • Create/design a new GH action that uses the pool.json for checks in GH on PRs (there are ways to include the images for PR reviews in a different branch for example)
  • Refactor pools (by vendor or category)
  • Add related pools to the pool-base as they are refactored

@carrotIndustries
Member Author

The big thing that strikes me as missing is the fact that you can't mix/match repos.

You can in fact, even if it's a bit clunky: pools support including other pools (settings tab in the pool manager). So if you want to keep your parts in your own repo rather than messing around with branches, you can create your own pool and have it include the default pool.

If we had the ability to cross-reference items in pools

That's already given by the included pools mentioned above. Since UUIDs are globally unique, there's no need for namespacing. Right now, conflicting UUIDs are resolved by having the item in the local pool override the item from the included pool. We can do whatever we want with the pool as long as the new implementation implements the IPool interface: https://github.com/horizon-eda/horizon/blob/master/src/pool/ipool.hpp

My initial reservation against having multiple pools was that it leads to fragmentation. However, I think that this just makes the fragmentation that will happen/has happened more evident. Another plus for multiple repos is that I don't think that performance enhancements will keep pace with the pool's growth. Even right now, downloading the pool repo on Windows (slower disk I/O than Linux) isn't as fast as I want it to be. With the pool on a spinning disk, it's even worse.

Lastly, 3d models could live in separate repos as well. This way if someone didn't want the models, just the symbols/footprints, they could choose not to even download the 3d models.

As they take up a nontrivial amount of space, this could be the right approach. I however want to avoid messing with path variables as it's done in KiCad. If we want to do this, each pool (repo) can optionally have a single secondary repo that contains the 3D models.

Some ideas for implementation:
I think it makes sense to mix-and-match pools on a per-project basis, so each project will be accompanied by a project pool that'll include the pools needed for the project. That however will add another indirection for getting items from pools, i.e. pool → project pool → cache → schematic. To simplify this, we could try to merge the cache and the project pool as in the project pool also being the cache. Using an item from a pool would then automatically copy it into the project pool.
I see several advantages that this approach has:

  • It's easy to make one-off modifications to items as it's unfortunately needed sometimes, such as packages for RF components that are tuned to a particular stackup
  • Project-specific items such as footprints for zebra-strip connected LCD panels can be done with low overhead
  • The search in the part browser will reflect the contents of the cache

An implementation obstacle we need to overcome is that we now have to take care of updating the project pool to reflect changes made in the upstream pools. Right now, updating a pool with included pools completely ignores the pool database created by the included pools and looks at every item by itself. Before that implementation, I was experimenting with merging the pools on the fly by means of SQL views. That however turned out to be a bit too fragile. A middle-ground solution could be to just copy the rows from the included pools' databases to the project pool.
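A sketch of that middle-ground approach in plain SQLite; the table and column names here are invented for illustration and don't match the actual pool database schema:

```python
import sqlite3

def merge_included_pool(project_db, included_db, pool_uuid):
    """Illustrative only: copy item rows from an included pool's database
    into the project pool's database. Table/column names are made up."""
    con = sqlite3.connect(project_db)
    con.execute("ATTACH DATABASE ? AS included", (included_db,))
    # Local items win on UUID conflicts, mirroring the existing override rule
    # (assumes a UNIQUE constraint on uuid in the hypothetical items table).
    con.execute(
        """
        INSERT OR IGNORE INTO items (uuid, type, name, filename, pool_uuid)
        SELECT uuid, type, name, filename, ?
        FROM included.items
        """,
        (pool_uuid,),
    )
    con.commit()
    con.execute("DETACH DATABASE included")
    con.close()
```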

@atoav
Contributor

atoav commented Jan 23, 2021

I include another pool in my forked horizon-pool (work branch), for very odd parts, quick and dirty fixes or personal decals that have no place in the official pool. My forked pool only differs from the official pool in including the changes that have not been accepted as a PR yet (so I can e.g. add parts when I need them, use them right away and still contribute to the official pool).

In the included pool I had to duplicate padstacks, symbols, units, entities etc. from the official pool that I wanted to reuse. This works, but offloads a bit of cognitive load onto the user. It is not immediately obvious how one would do that, and as mentioned maybe it would be good if:

  • every pool could make use of a set of very common units, entities, symbols, padstacks
  • matching and mixing pools would be easy (as in: one doesn't have to think a lot about intended and unintended side effects)
  • Easier edits to parts on a per-project-basis. Imagine you place a part, realize you want the footprint to be slightly more universal, and just right click > Edit Package, edit it in the Package Editor, and then have the ability to choose whether this affects only that specific part, all parts with the same MPN, all parts with the same package or save it as a package variant or a new part into the pool.

So I really agree to the plan to make these changes.

Breaking out 3D-parts into their own repo/pool/bucket makes a lot of sense too, although I am not sure what would be the smartest solution there in terms of topology.

@carrotIndustries
Member Author

In the included pool I had to duplicate padstacks, symbols, units, entities etc. from the official pool that I wanted to reuse.

Why not include the default pool instead?

I'm aiming for a structure like this:
[Screenshot: proposed pool structure]

Breaking out 3D-parts into their own repo/pool/bucket makes a lot of sense too, although I am not sure what would be the smartest solution there in terms of topology.

Thinking about this a bit, this significantly complicates pull requests as we now have two repos to deal with. Could be solved by a bot though that automatically merges the corresponding PR (how should they be linked?) in the 3d models repo.

@jpleger

jpleger commented Jan 25, 2021

Thinking about this a bit, this significantly complicates pull requests as we now have two repos to deal with. Could be solved by a bot though that automatically merges the corresponding PR (how should they be linked?) in the 3d models repo.

Honestly, I think having separate PRs for 3d models vs. the part stuff isn't that big of a deal. I would assume there are others like myself who just care about the package/part side of things and there are others who are focused on 3d rendering of boards. TBH the only time I use 3d view is to take a look and make sure vias are tented appropriately and my soldermask looks good before sending out to be manufactured.

My understanding is the 3d model directory doesn't have JSON objects with it, but does have a UUID that is inside the package json. Would it make sense to pull the JSON out of the package and basically put a JSON object that lives with the 3d model and then just reference it in the package JSON by UUID instead?

Also, with 3d models, we would want to check/lint a couple of different things, like: do you have a license for the step file, what is the license for it, where did it originate from (URL), etc. But honestly, that could be added to the JSON file and then it would be simple to check against.

In the future it would be possible to generate lists of packages that don't have 3d models and track those inside the 3d model repo. Someone who is proficient with 3d models could tackle those or associate them with packages that are already created.

@carrotIndustries
Member Author

but does have a UUID that is inside the package json.

3D models are referenced purely by filename. The UUID in the package is used to tell multiple models of a package apart. If the same model is referenced by multiple packages, it'll have a different UUID.

TBH the only time I use 3d view is to take a look and make sure vias are tented appropriately and my soldermask looks good before sending out to be manufactured.

If your PCA ends up being more than an evalboard, chances are it'll have to go in some sort of enclosure and you need to import the PCA into mechanical CAD. To me at least, 3D models are an integral part of any modern PCB design tool.

Apart from the pull requests, we'd also need to integrate the separate 3D model repo in several other places:

  • pull request review bot
  • remote tab. This one's probably particularly difficult as it needs to juggle two repos and not lose track of what it's doing
  • pool download

We'd also need some way of assigning the 3D model repos to the regular ones. I've thought of making them a submodule, but that still adds considerable overhead.

The main/only(?) argument for separating them is disk space and download size. Right now, the pool contains 80MB of 3D models for vertical pin headers which isn't all that great. By splitting these pin headers into a separate pool/repo, we already get a significant size reduction as users who don't need these can just not download that pool.

@atoav
Contributor

atoav commented Jan 26, 2021

I also think the cleanest solution by far is to have the 3D-models next to the packages in the same pool.

Sometimes I wish for a way to procedurally generate the geometry on the fly, because then we'd only have to store the code that generates the geometry, and not the resulting data. This would also allow tuning the resolution of the resulting meshes. For very generic parts (SMD) this could even be part of the existing footprint (or a new set of generators). For more complex geometry STEP is still going to be the way in the end, but many things would get easier with such a feature IMO.

@kliment

kliment commented Jan 26, 2021

We can procedurally generate geometry - that's how a lot of KiCad models are built too. The problem is that they use cadquery, which is a horrible pain to install (the recommended method is using conda, which is a horrible pain in general).
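For reference, this is roughly what such a generator looks like with cadquery; the dimensions are picked arbitrarily for a 0603 chip body, not taken from any existing pool model:

```python
import cadquery as cq
from cadquery import exporters

# Rough 0603 chip body, dimensions in mm (picked arbitrarily for the example).
length, width, height, term = 1.6, 0.8, 0.45, 0.3

body = cq.Workplane("XY").box(length, width, height)
# Terminations modelled as two boxes at the ends of the body.
left = cq.Workplane("XY").box(term, width, height).translate(((term - length) / 2, 0, 0))
right = left.mirror("YZ")
model = body.union(left).union(right)

exporters.export(model, "resistor_0603.step")
```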

@carrotIndustries
Member Author

this would also allow tuning the resolution of the resulting meshes.

There are no meshes in STEP. It uses boundary representation.

the problem is that they use cadquery, which is a horrible pain to install

Exactly what I wanted to say :)

Some more specific thoughts on what will most likely get implemented: the key to making the use of multiple pools in a project feasible will be replacing/enhancing the pool cache with a special project pool. The pool will look like this on disk:

  • parts
    • cache
      • e4218da5-4136-487a-92a4-1c94b4049289.json
      • ... other items copied from the included pools
    • foo
      • my_cool_part.json
  • 3d_models
    • cache
      • uuid of the pool the model came from
        • complete path to the model
    • my_model.step

Items copied into the cache will get some additional data such as the original path and the pool they came from. Padstacks will get another field to indicate which package they came from.

I've thought about preserving file names and paths in the cache, but that can get difficult when items get moved.

One issue that we need to solve is that calling get_symbol, for example, on the project pool must not always cache the symbol. Looking at a symbol in the place symbol dialog should not cache it. Only placing it on the schematic should cache it.
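One way to express that distinction, sketched in Python rather than the actual C++ pool code (class and method names besides get_symbol are purely illustrative):

```python
class ProjectPool:
    """Illustrative only: separate read-only lookups from lookups that
    copy the item into the project pool / cache."""

    def __init__(self, included_pools, cache_dir):
        self.included_pools = included_pools
        self.cache_dir = cache_dir

    def peek_symbol(self, uuid):
        """Used by previews/dialogs; never copies anything into the cache."""
        for pool in self.included_pools:
            sym = pool.get_symbol(uuid)
            if sym is not None:
                return sym
        return None

    def use_symbol(self, uuid):
        """Called when the symbol is actually placed on the schematic."""
        sym = self.peek_symbol(uuid)
        if sym is not None:
            self._copy_into_cache(sym)
        return sym

    def _copy_into_cache(self, sym):
        # Would write the item's JSON under cache/ together with provenance
        # data (original path, source pool UUID), as described above.
        pass
```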

@carrotIndustries
Member Author

Apart from the implementation details, we also need to figure out how to organize these multiple repos. From my perspective, grouping solely by manufacturer or device type doesn't really cut it. TI for example makes almost everything that can be made out of silicon.

Some ideas for what should go in its own pool

  • MCU/FPGA series, i.e. have a pool each for STM32, AVR, MAX10 FPGAs etc
  • Generated passives
  • Generated connectors
  • Crystals/oscillators
  • Opamps
  • Variable resistors
  • Data converters
  • Voltage regulators (should we split this any further?)

Any suggestions what to do with odd devices like the Si53306-B-GM clock buffer or the LCR-0202 Vactrol analog optoisolator?

I have plans for providing a centralized pool registry to provide an index for installable pools in the application. It'd also be nice to have a way to find parts among all pools without having to guess which one to download.

@jpleger

jpleger commented Jan 27, 2021

Re: organizing... I think that makes a ton of sense. For the 2 examples, you could have a timing repo and then put crystals/oscillators in there as well. Likewise you could have an optical (optoisolators, LEDs, laser, etc.) repo or you could do isolation (maybe transformers go in there too).

Honestly, there are always going to be oddball devices, so having a catchall or "uncategorized" repo might not be a bad idea. Then, if a logical group starts to form, it can be split into its own repo. Having the ability to re-organize items in pools will be extremely important. Having a bot command to move the JSON to a different repo might be a really nice thing to have for this.

If/when this change happens, I can help out with that as well, just let me know how you want it done and I can work on it, since I know it can be tedious. I can also go through some distributor websites and just start randomly picking parts and see how the categorization would hold up and start documenting types/categories of things and where they go (which repo).

I think having documentation on the workflow/examples and maybe a video might be really helpful for folks who are new to horizon.

In my mind, some additional or renamed categories that could form (by no means is this list exhaustive):

  • Passives (non generated, probably specialty)
  • Timing (crystals, oscillators, clock sources)
  • Amplifiers (opamps, but also audio, rf)
  • Analog (ADC, DAC)
  • Sensors (encoders, motion sensing, current sensing, gas, lidar/ToF)
  • Potentiometers/variable resistor
  • Optics (LED, LCD, laser diodes, optoisolator)
  • Transformers
  • Interfaces (drivers, multiplexers, I/O expansion, usb/network phys)
  • Memory (controllers, eeprom, flash, sdram)
  • Logic (gates, shift registers, flip flops)
  • Filters
  • Radio (mixers, transceivers, smd antenna, balun, emi shielding)
  • Modules (ublox, laird, esp)
  • Power (Switching, LDO, voltage reference)
  • Switches (physical switches)
  • Buttons (physical buttons)
  • Relays

I think the important thing is to just have an index of where the UUIDs live pool-wise, like you mentioned, that way they can be re-organized as appropriate.

@atoav
Contributor

atoav commented Jan 27, 2021

Thanks for the clarifications. I think beyond a certain granularity deciding where a new part shall go will become a challenge for users. The strength of having multiple repos is that special purpose repos can be drawn up more quickly and that no full consensus has to exist about how to do it.

E.g. Bob could decide he needs a valve guitar amplifier repository, which only collects parts used for tube amps. For those who make tube amps this might be incredibly valuable, while for those who never even think about them it is out of the way. Sorting those tube sockets, specialized capacitors etc. by their oddball manufacturers would result in fragmented repos nobody needs.

I think the most useful order would orient itself towards usage. I think the proposed categories are not a bad start. However, it must be very easy to use all of them, mix and match, etc. Maybe drawing a distinction between official and 3rd party also makes sense.

@jue89
Contributor

jue89 commented Jan 27, 2021

To be honest, I don't like the idea of splitting the pool by categories. I see problems with the packages. Some common packages like SSOP or SOIC will be required in several categories. How to handle this?

  • Duplicate them? In that case modifications must be spread across all repos.
  • Cross-reference them beyond pool borders? Where is their home? A special packages repo? In that case this repo will gain a lot of size over time as packages get added with 3D models.

I'd rather change the way 3D models are stored. How about git lfs for step files? Cloning should get fast again. The 3D models are downloaded once they are actually needed.

@RX14
Collaborator

RX14 commented Jan 27, 2021

While being able to use multiple pools is a good idea for sharing internal symbols and footprints inside a group of projects or company, I'm not sure that splitting up this pool into smaller categories would really help with the contribution workflow. The way I see having a split pool benefiting contribution is allowing delegation of various component types to various people, but if these split pools are still official, is there any guarantee that maintainers would be found for these pools? Maintainership of different parts of the pool by different people is still possible with one repo.

In short, the problem seems more of a human organizational problem than one which should be solved by adding more pools and more features to horizon (though these features will most certainly be welcome and have other usecases). Having multiple official pools definitely has some disadvantages, and I'd tend to believe that adding more maintainers to this repo will have more of an effect for less effort than splitting this pool up.

@kliment

kliment commented Jan 27, 2021

There have been some developments with cadquery I was not aware of. Apparently this is now a thing https://github.com/CadQuery/cq-cli/releases and it packages everything necessary, entirely removing the need to install it. It's still a massive pain to keep up to date, but much less so than before. It's very limited compared to the main cadquery, but there's just enough functionality there to be able to generate STEP on demand. In addition, I've engaged a friend to look into the build process of OCP and cadquery in order to make it more sane and less conda-dependent. Perhaps this will enable a more practical solution.

@carrotIndustries
Member Author

Thanks for everyone's ideas and suggestions. This reminded me that one goal of the pool was to avoid inventing a taxonomy for parts. Splitting up the pool doesn't really line up with that. However I think it still makes sense to factor out large numbers of generated items as not all users might need them and they slow down the part browser.

Duplicate them? In that case modifications must be spread across all repos.

Of course not.

Cross-reference them beyond pool borders? Where is their home? A special packages repo?

Yes, that's the point of including pools in other pools.

However, having a separate packages pool could make contributions much harder than they are right now as we'd need to tell users that the new generic package they're adding in a PR needs to go in a different repo.

How about git lfs for step files? Cloning should get fast again. The 3D models are downloaded once they are actually needed.

I don't really see how LFS will improve things in our case. 3D models don't really change all that often so cloning the repo doesn't download that many unneeded files. It also looks like libgit2 (used by the pool manager for downloading pools) doesn't really support LFS.

Some 3D models (basically everything you get from manufacturers) don't allow for redistribution. However, in some cases the STEP file is available for download in a way that can be automated, such as https://www.molex.com/pdm_docs/stp/500901-0801_stp.zip We could then put this information in the pool in some way to automatically download the models (if needed?). This feature could also be used to automatically pull in 3D models from another repo or so.
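A sketch of what such an automated fetch could look like; the Molex URL is the one mentioned above, everything else (function name, destination directory) is just for illustration:

```python
import io
import zipfile
import urllib.request
from pathlib import Path

def fetch_step(url, dest_dir):
    """Download a manufacturer zip and extract the STEP files it contains."""
    with urllib.request.urlopen(url) as resp:
        archive = zipfile.ZipFile(io.BytesIO(resp.read()))
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    for name in archive.namelist():
        if name.lower().endswith((".stp", ".step")):
            target = dest / Path(name).name
            target.write_bytes(archive.read(name))
            extracted.append(target)
    return extracted

# Example with the URL mentioned above (availability not guaranteed):
# fetch_step("https://www.molex.com/pdm_docs/stp/500901-0801_stp.zip", "3d_models/molex")
```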

Maybe drawing a distinction between official and 3rd party also makes sense.

Definitely. I'm also thinking of having Horizon EDA register a URL scheme such as horizon-eda-pool on installation, so that random people's pools can link to it to trigger downloading/cloning that pool.

Having multiple official pools definitely has some disadvantages, and I'd tend to believe that adding more maintainers to this repo will have more of an effect for less effort than splitting this pool up.

There's certainly some truth to the point that we're trying to solve a human problem through technology...

As you can see, I haven't really made up my mind yet on how to move forward on this issue. Regardless of how this will pan out, I think that converting the project cache to a pool is a good thing to do as this will make the part browser more consistent since it'll then show the items as they're cached, even if they're not present in the actual pool. We can also use this opportunity to fix some buggy behavior such as looking at frames in the schematic properties dialog resulting in them being added to the cache.

@jue89
Contributor

jue89 commented Jan 28, 2021

However I think it still makes sense to factor out large numbers of generated items as not all users might need them and they slow down the part browser.

Hmmm, I guess the amount of parts imported into Horizon will grow either way. Generated parts are mostly passives like resistors and capacitors - and every project needs them. Furthermore, most users won't throw out pools after they've finished a project.

Wouldn't it make more sense to optimize the part browser? Like grouping parts by their first tag and displaying these groups collapsed if nothing has been filled into the filter above? We could end up with a handful of categories. Furthermore, this will make the parts more searchable, since at least the first tag must be picked carefully.

@carrotIndustries
Member Author

Generated parts are mostly passives like resistors and capacitors - and every project needs them.

Depending on what a user is building and their background, they might be somewhat overwhelmed by finding 80 100nF 0603 capacitors. For them, we could offer a pool that contains generic parts.

Furthermore, most users won't throw out pools after they've finished a project.

Suppose we also add a whole bunch of through hole resistors as a pool and I know that my project won't need them. I could then opt not to use the pool with the TH resistors in the project. That way, the part browser doesn't have to cope with parts that are never needed.

Like grouping parts by their first tag

Unfortunately, tags aren't ordered: https://github.com/horizon-eda/horizon/blob/7c3a8361fda4a599141b6703c37d40a1ef990424/src/pool/part.hpp#L49

@atoav
Contributor

atoav commented Jan 29, 2021

Depending on what a user is building and their background, they might be somewhat overwhelmed by finding 80 100nF 0603 capacitors. For them, we could offer a pool that contains generic parts.

I think the user story for "I want a part with a precise MPN" is quite good already. Users that just want "a resistor" (because they only draw a schematic) or just "a 0805 resistor" because they haven't decided on anything yet (or aren't experienced enough) have a harder time though. You are probably aware of circuitJS where you can scroll to set a resistor value. Sometimes I wish that I could just draw a schematic up in a similar fashion and decide which precise MPN 100k-resistor I want to take afterwards.

Suppose we also add a whole bunch of through hole resistors as a pool and I know that my project won't need them. I could then opt not to use the pool with the TH resistors in the project. That way, the part browser doesn't have to cope with parts that are never needed.

I don't think we would need multiple pools for this though. E.g. if all TH resistors have a tag called th, wouldn't it then be enough to allow users to white- or blacklist certain tags? E.g. if you blacklist the tag th, the pool could skip all parts that have that tag. How to do this efficiently is a different question (e.g. filtering once, building a cache and then searching that cache, etc.).

The good thing is that Horizon's pool model is flexible enough to improve in all directions. We really need to figure out which direction is good for the assumed growth of the pool.

@carrotIndustries
Member Author

carrotIndustries commented Mar 5, 2021

The recent commits add support for project pools and thus make it feasible to use multiple pools in a project.

@atoav
Contributor

atoav commented Mar 6, 2021

Very cool, thank you. I wonder: Maybe we have to add a new section in the docu (next to Why a Pool?) that is about pool organisation? So how to structure things, how to include things, explaining project pools and so on. Because right now, even to me as a long-time user/contributor, it is kind of foggy what is possible and what is actually a good way to do things etc.

@fruchti
Contributor

fruchti commented Mar 6, 2021

Awesome! This solves my remaining annoyances with part and project organisation. ‘Missing in pool’ items don’t seem to show up in the project pool cache tab, though?

@carrotIndustries
Member Author

carrotIndustries commented Mar 6, 2021

‘Missing in pool’ items don’t seem to show up in the project pool cache tab, though?

Good catch, fixed in horizon-eda/horizon@9df1dfa. Missing items are told apart from items only in this pool by being in the cache subdirectories.

To anyone who already migrated projects to the project pool: I noticed that packages didn't get migrated correctly. To fix this, delete the pool subdirectory and reset the version in the project file to 0 to redo the migration.

@fruchti
Contributor

fruchti commented Mar 6, 2021

Missing items are told apart from items only in this pool by being in the cache subdirectories.

Should it be noted somewhere that regular project pool entries shouldn’t reside in a subdirectory with this name?

Another nitpick: Cached items show up as overridden (blue) in the lists, which is a bit confusing.

@carrotIndustries
Member Author

Should it be noted somewhere that regular project pool entries shouldn’t reside in a subdirectory with this name?

Yes, probably in the same place as:

I wonder: Maybe we have to add a new section in the docu (next to Why a Pool?) that is about pool organisation?

Another nitpick: Cached items show up as overridden (blue) in the lists, which is a bit confusing.

With horizon-eda/horizon@119d3f3 they're differentiated from overriding items if the pool isn't a project pool. This only takes pool UUIDs into account, so items missing from the included pools will still show up as green.

@carrotIndustries
Member Author

Now that we have the support for multiple pools somewhat in place, I think we can go into detail on what should stay in the main pool and what shouldn't. From my perspective, we should keep as much as possible in the main pool and only factor out specific items:

  • Generated passive parts as these are detrimental to performance due to their sheer quantity
  • Parts that aren't available at major distributors and are of niche interest, such as the AS3320 Voltage Controlled Filter

To get access to the generic items in the main pool, all of the extra pools will include the main pool.

@bob-u

bob-u commented Mar 13, 2021

I got here while looking for generic parts...here's my 5c

I think any type of rigid part organization structure adds complexity for users because there's no one-size-fits-all scenario. It should be up to the user to organize their library the way it fits their needs. Also, for the growth and popularity of Horizon EDA, the community should be able to share parts, review, and comment. Therefore, an important feature would be an easy mechanism to move parts between pools individually and in batches to create new pools.

@fruchti
Contributor

fruchti commented Mar 14, 2021

  • Generated passive parts as these are detrimental to performance due to their sheer quantity

A few of these will be needed in pretty much every project, though, so you’d include the ‘passive components’ pool every time. At most, there could be separate THT and SMT passives pools because you’d likely only need one.

I think any type of rigid part organization structure adds complexity for users because there's no one-size-fits-all scenario.

Exactly. The main problem of other component libraries that the flat pool structure solves is the ambiguity of the ways parts can be grouped together. With a system with sublibraries, you can at least click through a few until you find the part you want, but pools are completely opaque until you include them in a project. If the subdivision of the pool is to be useful, the resulting ‘categories’ must be overlap-free and the component association intuitive to beginners.

  • Parts that aren't available at major distributors and are of niche interest, such as the AS3320 Voltage Controlled Filter

Currently, I add parts I’m unlikely to need in another project to the project-specific pool but it’d be very nice to have a way to share ‘niche’ parts without clogging the main pool.

Another idea: the part browser could have inverse filtering, too, perhaps by tag. By default, a list of tags would be excluded from display and search, e.g. ‘resistor’ and ‘capacitor’, which I’d assume would be good for performance. Because the filter is shown directly below the search fields, it is also more discoverable than additional pools you’d have to include.
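In pool-database terms this basically boils down to a NOT EXISTS clause; a sketch against a hypothetical parts/tags schema (the real pool database differs):

```python
import sqlite3

EXCLUDED_TAGS = ("resistor", "capacitor")

def search_parts(db_path, query):
    """Hypothetical part search that skips anything carrying an excluded tag."""
    con = sqlite3.connect(db_path)
    placeholders = ",".join("?" for _ in EXCLUDED_TAGS)
    rows = con.execute(
        f"""
        SELECT p.uuid, p.mpn FROM parts p
        WHERE p.mpn LIKE ?
          AND NOT EXISTS (
            SELECT 1 FROM tags t
            WHERE t.part_uuid = p.uuid AND t.tag IN ({placeholders})
          )
        """,
        ("%" + query + "%",) + EXCLUDED_TAGS,
    ).fetchall()
    con.close()
    return rows
```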

@jue89
Contributor

jue89 commented Mar 14, 2021

  • Generated passive parts as these are detrimental to performance due to their sheer quantity

Is this the main problem here? Isn't there a way to optimize GTK's list view? From my web development background, I know techniques like lazy loading. Is the list view used here capable of something similar?

@carrotIndustries
Member Author

Is this the main problem here?

It also makes pool updates slower. I've already played with putting the SQLite action into its own thread, but figured the complexity wasn't worth it yet.

At most, there could be separate THT and SMT passives pools because you’d likely only need one.

Chances are, one also doesn't need passives from every manufacturer under the sun.

Another idea: the part browser could have inverse filtering, too, perhaps by tag.

Sounds reasonable, any idea on how the UI should look?

but it’d be very nice to have a way to share ‘niche’ parts without clogging the main pool.

I'm glad you asked, since the latest commit added just that, though there's still some copywriting to be done: add your pool to https://github.com/horizon-eda/horizon-pool-index/tree/master/pools with the level set to community. Once it's merged, an action will pick it up and update https://pool-index.horizon-eda.org/index.json accordingly. This list is then used by the new pools window:

[Screenshot: pools window]

@RX14
Collaborator

RX14 commented Mar 15, 2021

Any first idea on what the boundary would look like for "too niche to be in core"?

@atoav
Contributor

atoav commented Mar 15, 2021

@RX14 As one of the main oddball contributors (it appears) I'd guess availability plays a huge role in this. So if the part is only available at a weird Chinese Aliexpress reseller, it might be oddball. If the part is not produced anymore but there is still stock being sold (e.g. certain germanium transistors), it is quite definitely oddball. If it is very special-purpose and not being sold on the typical reseller sites (Mouser/Digikey/...), it might be oddball as well (e.g. Coolaudio V2044A).

Drawing the line here will always be subjective I guess.

@carrotIndustries I would consider breaking out all of my audio stuff into an Audio/Synth pool. How would we deal with shared things like symbols? E.g. if I create an OTA Symbol/Unit/Entity for say a LM13700, this would probably be something that should be in the main pool as well? Should I add it via PR to both pools then?

On the issue of inverse filters: I think I proposed something similar a while back when I said maybe it would make sense to allow users to configure a set of global exclude/include filters in the preferences. This way, if someone feels particularly annoyed by a subset of parts, they could narrow down with filters what they see in their day-to-day usage of the software. Maybe adding a visible switch to the part browser to temporarily disable that filter makes sense as well, but I think these filters should be in the preferences and they should be persistent between restarts of the software (as opposed to being visible in the part browser and non-persistent between restarts).

@carrotIndustries
Member Author

E.g. if I create an OTA Symbol/Unit/Entity for say a LM13700, this would probably be something that should be in the main pool as well?

That part in particular could go into the main pool as it's readily available at digikey and other reputable distributors.

Any first idea on what the boundary would look like for "too niche to be in core"?

I think it makes more sense to decide this based on the part's availability. If a part is particularly hard to get or only available on ebay or so, it shouldn't be in the main pool.

@fruchti
Contributor

fruchti commented Mar 23, 2021

Another idea: the part browser could have inverse filtering, too, perhaps by tag.

Sounds reasonable, any idea on how the UI should look?

How about a filter exactly like the ’Tag’ filter, just called ‘Exclude tags’? It would be pre-populated with ‘resistor’ + ‘capacitor’ (+ ‘inductor’?). I’d rather not put very generic/vague tags like ‘passive’ in there by default so users

  • can remove specific exclusions when looking for e.g. an inductor, and
  • aren’t surprised when the search doesn’t bring anything up because it isn’t clear what parts are tagged ‘passive’.

Maybe ‘With tags’ + ‘Without tags’ would be a clearer labelling, not sure there.

@carrotIndustries
Member Author

I've created a reviewer team that can add tags to PRs; as of now, it's @RX14 and @sh-ow. Who else wants to take part? @fruchti, @jue89, @atoav?

@jue89
Contributor

jue89 commented Apr 3, 2021

Sure! I can't make any promises regarding the amount of time I can spend for reviews, but I'd love to help out whenever I find some time. The review guidelines are described in the horizon-pool-convention repo, right?

@fruchti
Contributor

fruchti commented Apr 3, 2021

Same for me, sign me up!

@atoav
Contributor

atoav commented Apr 3, 2021

Ah sure thing, I am all in.

@carrotIndustries
Member Author

The recent commits move all of the generated passives to their individual repos:
[Screenshot: the generated passives split into their own repos]
