Split opencog into multiple github repos! #3391

Closed
linas opened this issue Dec 20, 2018 · 16 comments

Comments

@linas
Member

linas commented Dec 20, 2018

This is an issue open for debate: there are various pros and cons. The proposal:

Split this repo into multiple parts. These would be:

What's left to be done:

  • ghost
  • openpsi
  • relex2logic
  • chatbots
  • robot-embodiment

The above are fairly tightly knit together in the current code base, so it might not be possible to split them up cleanly into distinct components. So maybe we can leave these, as they are!?

Pros:

  • It makes it easier to find interesting bits, and see how they fit together.
  • Work gets focused on the component that matters, without distraction from other component maintainers
  • Issue list easier to track.
  • Simpler git history
  • Less overall complexity
  • Different repos can have different owners, different merge policies.
  • Developers in one repo don't have to be experts in the other repos.
  • Test breakages in pre-alpha components don't affect mature components

Cons:

  • It's going to take work to split things apart.
  • Some dependencies might get complicated.
  • API changes in one module (e.g. openpsi) trickle down into ghost, and thus require coordination.

If you have new pros & cons to add to this list, please edit this first top-post. Other comments, including proposals for splitting along different directions, should go into comments below. I expect this to be somewhat controversial. It seems like a good idea, just right now.

@linas linas pinned this issue Dec 20, 2018
@linas linas changed the title from "Split opencog into six github repos!" to "Split opencog into nine github repos!" Dec 20, 2018
@ngeiswei ngeiswei changed the title from "Split opencog into nine github repos!" to "Split opencog into multiple github repos!" Dec 20, 2018
@ngeiswei
Member

I think it's a good idea.

The main con is that it's a lot of work, but I suppose it could be done iteratively, taking opencog apart piece by piece. The task could be distributed across the maintainers of each part; for instance, I would be in charge of moving the pattern miner and pln to their own repositories, @misgeatgit would be in charge of moving attention allocation, etc.

Worth considering at the very least...

@vsbogd
Contributor

vsbogd commented Dec 20, 2018

I am not that involved in opencog repo support, and it is not clear to me which particular things will be simplified. Do you mean that (1) it will allow updating core component A while component B keeps using an old version of A? Or that (2) components will be better decoupled when they are in different repositories? Maybe we can find a way of keeping things in one repo and still solve both issues?

For (1), I would propose looking at the dependency tree first to see if there are such cases.

For (2), I would start by having one folder for each component in the root of the repository.

@ngeiswei
Member

I would think the pros are in (2); in fact, I see (1) as a potential con. The reason I am leaning towards that proposal is that these components are still going to grow, as will, most likely, the number of people involved, so more compartmentalization feels right. But maybe the granularity of the split can be much coarser than what was originally suggested, TBD.

Certainly we have time to weigh the pros and cons.

@ngeiswei
Member

I'm not sure we can reap the benefit of splitting if things stay in the same repo, though I suppose flattening on the root directory (or rather the opencog directory) could be a reasonable intermediate step. The first step, obviously, is to come up with the right split, ideally a small partition of cohesive parts. Maybe Matt can help to calculate that with his Phi measure toolset. ;-)

@linas
Member Author

linas commented Dec 20, 2018

I just added: "Different repos can have different owners, different merge policies" -- so, for example, the space-time server, which is pre-alpha, should allow any kind of merges and breakages; a spacetime server commit should not be held up because unit tests fail.

Different repos can have different owners, controllers, review policies, etc. Everyone who works in repo X should be an expert in repo X. Right now, the pattern miner experts know nothing about ghost, and vice-versa.

@vsbogd: right now, the different components are already almost in different root directories. There's some overlap, but not much.

@linas
Member Author

linas commented Dec 20, 2018

Just to be clear: "Different repos can have different owners, different merge policies" is really a people-management issue, not a code-management issue. It allows different groups of people to set different policies and different mechanisms of control, to suit their collaborative style, to suit the maturity of that component.

@vsbogd
Contributor

vsbogd commented Dec 20, 2018

Ok, if it is more about formal code ownership and different release processes, then it makes sense.

Regarding code ownership, I don't see a big difference from using separate folders. GitHub usually suggests reviewers from among the people who have contributed to the code before, so having different repos is just a bit better. I have experience working in a setup where people simply owned folders. It was comfortable, and using a single repository was simpler because you don't need to pull and rebuild many repositories. But it required that the whole team knew who was responsible for each component.

The release process is a more complex matter: if we need different release and merge policies, then we need to use different repos.

@vsbogd
Contributor

vsbogd commented Jan 28, 2019

Some links on the topic of how to move files from one repository to another while preserving history:

In a few words:

  • make a local copy of the source repository
  • use git filter-branch to filter the history down to the files you need
  • move the filtered results under the proper folder in the source repository copy, if required
  • use git remote add or git remote set-url to set the target repository as the remote URL
  • use git pull --allow-unrelated-histories to sync the local copy with the remote target repository
  • do git push
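
The steps above can be sketched as a small Python helper that only builds the git command lines and leaves running them to the caller. This is a rough illustration, not a tested migration script: the function name `plan_history_move`, the working directory `split-work`, the branch name `master`, and the example URLs are all hypothetical. It also assumes the files to keep live in a single subdirectory (so `--subdirectory-filter` applies), skips the optional "move under the proper folder" step, and adds `--no-rebase` to the pull so that the merge behaviour is explicit.

```python
import shlex


def plan_history_move(source_url, subdir, target_url, branch="master"):
    """Return the git commands (as argv lists) for a history-preserving move.

    Nothing is executed here. All names and defaults are illustrative.
    """
    work = "split-work"  # hypothetical throwaway clone directory
    return [
        # Work on a fresh clone, because filter-branch rewrites history.
        ["git", "clone", source_url, work],
        # Keep only commits touching `subdir`; it becomes the new root.
        ["git", "-C", work, "filter-branch", "--prune-empty",
         "--subdirectory-filter", subdir, "--", branch],
        # Point the clone at the target repository.
        ["git", "-C", work, "remote", "set-url", "origin", target_url],
        # Merge in whatever the target already contains.
        ["git", "-C", work, "pull", "--no-rebase",
         "--allow-unrelated-histories", "origin", branch],
        # Publish the filtered history.
        ["git", "-C", work, "push", "origin", branch],
    ]


if __name__ == "__main__":
    # Example URLs and path are placeholders, not real repositories.
    for argv in plan_history_move("https://example.org/src.git",
                                  "some/subdir",
                                  "https://example.org/dst.git"):
        print(shlex.join(argv))
```

Printing the plan first makes it easy to review each command before running anything against a real repository, which matters because filter-branch rewrites commit hashes.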

@linas
Member Author

linas commented Jan 30, 2019

Thanks. See, however, the discussion in #3593 about the loss of git history.

@linas
Member Author

linas commented Feb 24, 2019

I just split out language-learning to here: https://github.com/opencog/learn and plan to replace the contents of nlp/learn with a git submodule/subtree ...

@ngeiswei
Member

@linas, I like the idea of splitting as you know, but I'm not sure about the name. learn is too generic to be dedicated to nlp alone, right? Or do you mean move anything that relates to learning under it, which is then probably too broad.

In opencog there is a learning directory that contains the pattern miner. I'm not sure it belongs in that directory anyway since, as currently implemented, it is actually closer to reasoning, as it runs on the URE. I would be fine with moving the pattern miner to its own repo anyway; I just wanted to explain why I question the name learn.

@linas
Member Author

linas commented Feb 25, 2019

I've convinced myself that the generic algorithm implemented there works just fine for biology, vision, facial expressions, robot movements, whatever; it's not just an NLP thing. I've kind of been saying that for five years now, and if no one wants to believe me .. I dunno. I get to say it again? At this time, I do not want to mash other code bases into that repo, mostly because the other code bases are distinct and stand alone.

@linas
Member Author

linas commented Feb 25, 2019

Regarding URE, PLN & pattern mining: perhaps it makes sense to re-group these. Perhaps they are sufficiently tightly coupled that they should move together as a group, whereas URE does not have a lot in common with the atomspace. This is partly a technical decision, and partly a marketing decision: it is kind of nice to say, in a marketing blurb: "New! Improved! The atomspace comes with an inference engine!!" But one could split it out, and then the sales brochure says: "New! Improved! The inference engine now comes with a pattern miner!"

I don't have a strong opinion one way or the other. The URE did not come out the way I had hoped it would. I was assuming that the URE would implement what I call "sheaves", but instead, it implements something else. I had also hoped/planned to use the pattern miner instead of having the one-two-three step of MI-pairs->MST->disjuncts->factorization but the pattern-miner does not give me graph factors. If it did, we could just unleash the pattern miner on the raw language data, and get the syntax graph popping out. But the pattern-miner didn't come out that way either. Whatever. I still think the MI-pairs->MST->disjuncts->factorization pipeline is a valid alternative to pattern-mining, we can continue to improve both, and see which one does better at extracting structure for which kinds of problems.

So anyway, we now have multiple subsystems that are similar, share some similar goals, have some similar ideas, take some similar inputs, but really ended up quite different. Because they are more different than they are similar, it makes sense to keep them separated.

@vsbogd
Contributor

vsbogd commented Sep 5, 2019

After moving repo parts into separate repos, the following functionality is not available in the opencog-dev:cli docker image or via octool installation:

  • ure
  • miner
  • learn
  • attentionbank
  • spacetime server
  • pattern-index
  • visualization
  • and cogserver will not be available after merge of Rm server #3608

To fix this, additional changes are needed in opencog/ocpkg and opencog/docker.

@vsbogd
Contributor

vsbogd commented Sep 5, 2019

The same applies to unit tests. The cogutil and atomspace CircleCI configs were designed to run unit tests for all dependent components. But after the separation, the unit tests for the following are not executed:

  • learn
  • spacetime
  • pattern-index
  • attention

@linas
Member Author

linas commented Jul 8, 2021

Closing. This project is more or less completed. All the parts that seem to be independently useful outside of the chatbot framework have been moved to their own repos. All that remains here is the chatbot code.

(It might still be interesting to split out openpsi on its own, or at least, the scheduling/prioritization part of openpsi.)

@linas linas closed this as completed Jul 8, 2021
@linas linas unpinned this issue Jul 8, 2021
3 participants