Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate gitcollector #51

Merged
merged 7 commits into from
Jun 19, 2019
Merged

Integrate gitcollector #51

merged 7 commits into from
Jun 19, 2019

Conversation

smacker
Copy link
Contributor

@smacker smacker commented Jun 17, 2019

The first part of #47

  • refactor workdir package to make clear separation of workdir name and
    workdir path
  • introduce new InitWithOrgs function
  • use envFile struct instead of a string that knows how to generate .env
    file depends on the fields
  • update docker-compose.yml with ghcollector and new gitbase
  • refactor bindings in docker-compose.yml to support docker volume as
    persistent storage which allows to easily prune it and avoid performance
    issues related to different host filesystem
  • add support for base64 workdir names

TODO in sequential PRs because this PR is rather big already:

  • rename init into local init
  • add validation for token and organization names correctness
  • there is initial support for multi-orgs but ghcollector supports only 1 for now. Most probably it also makes sense to sort them and maybe some other changes that are better to implement with underlying support is ready
  • move orgs workdirs into orgs subdir and local workdirs into local subdir (maybe)

@smacker smacker requested a review from a team June 17, 2019 14:30
@smacker smacker mentioned this pull request Jun 17, 2019
@smacker
Copy link
Contributor Author

smacker commented Jun 17, 2019

Ooops. The last change broke it. I'm fixing it.

@smacker
Copy link
Contributor Author

smacker commented Jun 17, 2019

Nobody reviewed it yet, so I just force-pushed the fix

Copy link
Contributor

@dpordomingo dpordomingo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestions for command descriptions

cmd/sourced/cmd/orgs.go Outdated Show resolved Hide resolved
cmd/sourced/cmd/orgs.go Outdated Show resolved Hide resolved
cmd/sourced/cmd/orgs.go Outdated Show resolved Hide resolved
@smacker smacker mentioned this pull request Jun 17, 2019
docker-compose.yml Outdated Show resolved Hide resolved

// InitWithOrgs initialize workdir with remote list of organizations
func InitWithOrgs(orgs []string, token string) (string, error) {
workdir := base64.StdEncoding.EncodeToString([]byte(strings.Join(orgs, ",")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we need to sort orgs first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please take a look in the description. It's in todo for multi-org support. Currently, it works only with 1 org. There is no need to sort 1 org.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ops, sorry I missed that. 👍

return "", err
}

realPath, err := filepath.EvalSymlinks(path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
realPath, err := filepath.EvalSymlinks(path)
realPath, err := ActivePath()

}

// common initialization for both local and remote data
func initWorkdir(workdirPath string, makeEnvContent func() string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not envContent envFile insteaf of makeEnvContent func() string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm. Makes sense to change. I added envFile after I wrote this function.

- refactor workdir package to make clear separation of workdir name and
workdir path
- introduce new InitWithOrgs function
- use envFile struct instead of a string that knows how to generate .env
file depends on the fields
- update docker-compose.yml with ghcollector and new gitbase
- refactor bindings in docker-compose.yml to support docker volume as
persistent storage which allows to easily prune it and avoid performance
issues related to different host filesystem
- add support for base64 workdir names

Signed-off-by: Maxim Sukharev <max@smacker.ru>
Signed-off-by: Maxim Sukharev <max@smacker.ru>
Signed-off-by: Maxim Sukharev <max@smacker.ru>
@smacker
Copy link
Contributor Author

smacker commented Jun 18, 2019

@dpordomingo @se7entyse7en please another pass.

Signed-off-by: Maxim Sukharev <max@smacker.ru>
@smacker
Copy link
Contributor Author

smacker commented Jun 18, 2019

Updated to ghcollector beta.3 and updated the config for srcd-ui to match what current master expects.
Update to beta.3 doesn't mean progress works, see: src-d/gitcollector#35

Copy link
Contributor

@dpordomingo dpordomingo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outdated comment; you can use message history if you want to make laughs about me ;)

@smacker
Copy link
Contributor Author

smacker commented Jun 18, 2019

I'll recheck. I didn't retest local mode after the very recent changes.

@dpordomingo
Copy link
Contributor

You can disregard my prev comment; the problem I had with GITBASE_REPOS_DIR was caused by an old version of docker-compose.yml

Copy link
Contributor

@dpordomingo dpordomingo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried sourced init and sourced orgs init, and they both works, so LGTM.
Reviewed code a bit fast, not in full depth because some parts were moved from A to B and diff is a bit ugly.

// InitWithOrgs initialize workdir with remote list of organizations
func InitWithOrgs(orgs []string, token string) (string, error) {
workdir := base64.StdEncoding.EncodeToString([]byte(strings.Join(orgs, ",")))
workdirPath, err := absolutePath(workdir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We mentioned that workdirs from orgs will live in orgs dir. It can be done in another PR, but it could be added in TODO PR description to avoid forgetting it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. updated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible you forgot to push? I don't see this change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO is updated. Not the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, you mean updated the description, I missed it on the first read

move orgs workdirs into orgs subdir and local workdirs into local subdir (maybe)

GITBASE_VOLUME_SOURCE=%s
GITBASE_SIVA=%s
GITHUB_ORGANIZATION=%s
GITHUB_TOKEN=%s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could omit GITBASE_SIVA, GITHUB_ORGANIZATION, GITHUB_TOKEN when not in an org .env

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea is not to use ${var:-} in docker-compose.yml so it easy to spot if something is broken (.env file and yml file aren't in sync). It complains if there is nothing set in such case.
And to avoid warning for a normal workflow we set env vars into empty values.

Though I don't feel strongly about it. We can change later if everybody prefers.

Signed-off-by: Maxim Sukharev <max@smacker.ru>
@@ -7,17 +7,37 @@ services:
- 9432:9432
restart: always

gitcollector:
image: srcd/gitcollector:v0.0.1-beta.3
# wait for db
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe dumb question, but which db? The superset's postgres? Isn't the

depends_on:
  - postgres

enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. not enough. depends_on handles only container creation. But postgres needs time to init and start before becoming available. You can read more about it here: https://docs.docker.com/compose/startup-order/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the link!

Copy link
Contributor

@se7entyse7en se7entyse7en left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

GITHUB_ORGANIZATION: ${GITHUB_ORGANIZATION:-}
GITHUB_TOKEN: ${GITHUB_TOKEN:-}
# use main db
GITCOLLECTOR_METRICS_DB_URI: postgresql://superset:superset@postgres:5432/superset?sslmode=disable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to use the main DB? Is it not better to use the metadata db?
I'm assuming at some point we may have a command to cleanup the gitcollector+ghsync data and download again. Removing the metadata db and gitbase_repositories volumes will leave everything in a clean state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used main db exactly to avoid wiping out stuff related to gitcollector.
I assumed it can update sivas (as it was shown during the demo). But maybe it's not true. I asked for confirmation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will update the discovered repositories which are already there, there's is an option to don't allow updates so it would do nothing

confirmed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too sure about this. In my mind it would be better to have gitcollector and ghsync totally in sync.
gitcollector can update sivas, but can it detect if a repo was deleted, or renamed?

Anyway to me this is something worth discussing, but for the initial download it's ok either way.

Copy link
Contributor

@carlosms carlosms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a missing base64 decode somewhere, I found this bug:

$ ./sourced workdirs
  carlosms-test-org
* /home/cmartin/repos

$ ./sourced prune --all
lstat /home/cmartin/.sourced/workdirs/carlosms-test-org: no such file or directory

$ tree -a .sourced/workdirs/
.sourced/workdirs/
├── __active__ -> /home/cmartin/.sourced/workdirs/carlosms-test-org
├── home
│   └── cmartin
│       └── repos
│           ├── docker-compose.yml -> /home/cmartin/.sourced/compose-files/__active__/docker-compose.yml
│           └── .env
└── Y2FybG9zbXMtdGVzdC1vcmc=
    ├── docker-compose.yml -> /home/cmartin/.sourced/compose-files/__active__/docker-compose.yml
    └── .env

Exported ListPath and SetActivePath functions to support prune all
logic. Another option would be to incapsulate this logic inside workdir
package.

Signed-off-by: Maxim Sukharev <max@smacker.ru>
@smacker
Copy link
Contributor Author

smacker commented Jun 19, 2019

Thanks. Fixed.

Exported ListPath and SetActivePath functions to support prune all
logic. Another option would be to encapsulate this logic inside workdir
package.

Though I would prefer to unify workdir names instead (base64 everywhere as local won't be the main use case, it's much easier to use org), it would simplify everything a lot.

Signed-off-by: Maxim Sukharev <max@smacker.ru>
@smacker
Copy link
Contributor Author

smacker commented Jun 19, 2019

Ooop. Everybody missed hard-coded SYNC_MODE: true for ui container. I hot fixed it by using GITBASE_SIVA. We can improve it later.

@smacker smacker merged commit c2335eb into src-d:master Jun 19, 2019
@carlosms carlosms mentioned this pull request Jun 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants