New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log `git checkout` and expose to users #3520

Merged
merged 26 commits into from Feb 16, 2018

Conversation

Projects
None yet
4 participants
@humitos
Member

humitos commented Jan 15, 2018

I've been taking a look at the code and I found some consideration that we need to take a look:

  1. BaseVCS class has it owns run method that works in a similar way than BuildEnvironment.run
    • the log_tmpl string differs
    • it adds some configurations automatically, like the cwd path
  2. All commands ran by BaseVCS are not logged as a BuildCommand
  3. BuildCommand needs to be instantiated with build_env to make be able to .save()/record the command for the current Build object.
  4. build_env could be LocalEnvironment or DockerEnvironment

Depending on how much we want to refactor, a minimalistic solution would involve:

  1. pass setup_env to update_imported_docs
  2. pass this env to version_repo.checkout
  3. under checkout consider use if that env if not None
  4. env.run with proper cwd

This is the what this PR implements but, this leaves a mix of calls to run method between BaseVCS and LocalEnvironment for the same purpose: update the repository's code. So, maybe we should consider to just pass the setup_env to the init of BaseVCS and execute everything under setup_env.

The problem of using always the setup_env (or maybe a new one called vcs_env) is that the record affects all the commands ran under that env and we want to record only clone and checkout commands --the run method could receive an optional record to override the default behaviour.

It's a good starting point to discuss a little about this.

This PR only affects to the git backend, that's why I think we need to think in a general solution under the BaseVCS class.

Closes #3397

@humitos

This comment has been minimized.

Member

humitos commented Jan 15, 2018

Also, we may want to remove the --quiet option from the git command?

captura de pantalla_2018-01-15_15-58-05

...

captura de pantalla_2018-01-15_15-59-36

but if it fails...

captura de pantalla_2018-01-15_16-07-21

@@ -173,7 +189,7 @@ def checkout(self, identifier=None):
self.fetch()

This comment has been minimized.

@humitos

humitos Jan 15, 2018

Member

We probably needs to pass the env here, also 😞

@ericholscher

This comment has been minimized.

Member

ericholscher commented Jan 16, 2018

I think we should be able to set a per-command option to choose whether to record that call ever. It seems like we haven't needed that in the past, but for the VCS stuff we'll want to have a subset of things that aren't recorded.

One question, is there a reason we really don't want all the VCS commands recorded? It might be extra info, but it doesn't seem inherently bad to show them.

@ericholscher

I think we should remove the BaseCLI:run() command from the VCS tools, and move to always just proxying those commands to the Environment:run(). We really shouldn't have two different ways of running commands in our build process, and the environments are a better abstraction.

@@ -593,7 +593,7 @@ def update_imported_docs(version_pk):
version_slug = version.slug
version_repo = project.vcs_repo(version_slug)
ret_dict['checkout'] = version_repo.checkout(version.identifier)
ret_dict['checkout'] = version_repo.checkout(version.identifier, env)

This comment has been minimized.

@ericholscher

ericholscher Jan 16, 2018

Member

Do we need the setup_env here, or could we just create a new vcs_env inside checkout?

This comment has been minimized.

@ericholscher

ericholscher Jan 16, 2018

Member

Seems we could pass it into the original call to project.vcs_repo(version_slug) in the line above, and stored on the repo object?

This comment has been minimized.

@ericholscher

ericholscher Jan 16, 2018

Member

I'm not suggesting this btw, just thinking out loud. Is it worth having a vcs_env we use for all commands, or is passing in the setup_env correct? I'm not sure!

This comment has been minimized.

@agjohnson

agjohnson Jan 16, 2018

Contributor

My initial thought was to just use the setup_env. The difference between setup_env and build_env is just that one is executed on the host, and one in the vm. Separating this further might make for more places to check for env exception vs env failure vs env pass as well, so reusing setup_env ends up cleaner too.

Do you have a scenario that you think it might help to have a separate env?

This comment has been minimized.

@agjohnson

agjohnson Jan 16, 2018

Contributor

I think passing env into the VCS backend instantiation is the correct pattern. I envisioned completely removing the run() method on the vcs backends, as we shouldn't be duplicating code there. Perhaps there are some commands that are not important to the user to expose in the UI and we can not record those, but executing all commands through the setup_env is important for this feature.

This comment has been minimized.

@humitos

humitos Jan 16, 2018

Member

The setup_env is not used for VCS checkouts since they are using their own run. In fact, the function/task didn't receive the setup_env, I had to added to it.

That could be a problem because that task (update_imported_docs) is called from a couple of places where a setup_env doesn't exist (e.g. from core/views/hooks.py). This could be one of the reason of why a vcs_env could be a better approach: we could create it inside the Project.vcs_repo instead of passing around through all the code or forcing other calls to create this environment just to log the commands.

This comment has been minimized.

@humitos

humitos Jan 16, 2018

Member

Something that we need to consider if we want to create a new vcs_env is that the envs are created with this dependencies:

  • self.project
  • self.version
  • self.config (from YAML)
  • self.build
  • record
  • env_vars. From get_env_vars, which depends on
    • self.version
    • self.project
    • self.config

In case we want to create the environment inside the Project.vcs_repo will need to consider this dependencies.

This comment has been minimized.

@humitos

humitos Jan 16, 2018

Member

I took a look at this, and I found that the only missing piece here is the build_pk (originally passed from core.utils.trigger_build after creating an Build object).

Solution 1

So, I'd say that this method update_imported_docs could receive the version_pk and build_pk (but when called from UpdateDocsTask we will be repeating a couple of actions like parsing the YAML and more) and create our own vcs_env for this. If build_pk is None we won't record (False) the commads since we don't know where to record them. This will allow other uses of this function to keep working.

Solution 2

Refactor all the things!

  • move update_imported_docs as a method of UpdateDocsTask
  • fix test_celery.py:test_update_imported_docs
  • refactor/remove core.view.hooks._build_url function that uses the function update_imported_docs
  • refactor/remove github_build, gitlab_build and bitbucket_build functions that use the _build_url but the docstrings say those functions are deprecated in favor of readthedocs.restapi.views.integrations classes (these classes use core.views.hooks._build_version which uses trigger_build which uses UpdateDocsTask --so we are safe)

I think the last solution is a better plan, but I may missing a lot of things in the middle :)

This comment has been minimized.

@humitos

humitos Jan 16, 2018

Member

We found that there are users using the old webhook endpoints. Only projects with Feature.ALLOW_DEPRECATED_WEBHOOK can hit these ones. But anyway, we can't remove them yet.

So, the refactor would include changing calls to projects.tasks.update_imported_docs,

if not built:
# Call update_imported_docs to update tag/branch info
update_imported_docs.delay(project.versions.get(slug=LATEST).pk)
msg = '(URL Build) Syncing versions for %s' % project.slug
log.info(msg)

by core.utils.trigger_build which will end up calling UpdateDocsTask in a proper way and "everything should keep working"

This comment has been minimized.

@ericholscher

ericholscher Jan 16, 2018

Member

I think we need think more about why we are calling into this function from other places, and build a supported API for that use case.

I believe all the other calls are so that we can re-sync the checkout on disk and the DB (to update versions on a push, for example). We should find a way to make that supported explicitly in the build tooling I think, so that we don't have to maintain a bunch of different logic. I think the best approach here is to have some kind of mechanism on the doc builds to say "Do a resync of all the on-disk stuff, but don't actually build the docs", which is mostly what your Solution 2 is talking about.

@@ -335,7 +335,7 @@ def setup_vcs(self):
self.setup_env.update_build(state=BUILD_STATE_CLONING)
self._log(msg='Updating docs from VCS')
update_imported_docs(self.version.pk)
update_imported_docs(self.version.pk, self.setup_env)

This comment has been minimized.

@ericholscher

ericholscher Jan 16, 2018

Member

Would like to see kwargs here. I'd love to get to a world where we're explicitly using kwargs in all function/method calls, at least until we can add type checking to the calls :)

This comment has been minimized.

@agjohnson

agjohnson Jan 16, 2018

Contributor

We can also just drop this arg. Other methods use the class local self.setup_env, so it might make sense to continue assuming this state exists.

I don't particularly like this pattern, but its at least consistent.

This comment has been minimized.

@humitos

humitos Jan 16, 2018

Member

update_imported_docs is a function outside this UpdateDocsTask class (it's used from other places). See #3520 (comment)

@agjohnson

The output is a huge improvement, I'm really excited to have this output included and unified with the rest of the command output finally!

I'll echo the sentiment to continue cleaning up the vcs backend code and completely remove the secondary run() method. I'll also echo the uncertainty of displaying all the commands, could you post some screenshots of the commands all executed through the build environment?

@@ -335,7 +335,7 @@ def setup_vcs(self):
self.setup_env.update_build(state=BUILD_STATE_CLONING)
self._log(msg='Updating docs from VCS')
update_imported_docs(self.version.pk)
update_imported_docs(self.version.pk, self.setup_env)

This comment has been minimized.

@agjohnson

agjohnson Jan 16, 2018

Contributor

We can also just drop this arg. Other methods use the class local self.setup_env, so it might make sense to continue assuming this state exists.

I don't particularly like this pattern, but its at least consistent.

@@ -593,7 +593,7 @@ def update_imported_docs(version_pk):
version_slug = version.slug
version_repo = project.vcs_repo(version_slug)
ret_dict['checkout'] = version_repo.checkout(version.identifier)
ret_dict['checkout'] = version_repo.checkout(version.identifier, env)

This comment has been minimized.

@agjohnson

agjohnson Jan 16, 2018

Contributor

I think passing env into the VCS backend instantiation is the correct pattern. I envisioned completely removing the run() method on the vcs backends, as we shouldn't be duplicating code there. Perhaps there are some commands that are not important to the user to expose in the UI and we can not record those, but executing all commands through the setup_env is important for this feature.

code, out, err = build_cmd.exit_code, build_cmd.output, build_cmd.error
else:
code, out, err = self.run('git', 'checkout',
'--force', '--quiet', revision)

This comment has been minimized.

@agjohnson

agjohnson Jan 16, 2018

Contributor

Based on your screenshots, --quiet is not what we want it seems. The output without --quiet was really helpful. So 👍 on removing that from the command.

@agjohnson

This comment has been minimized.

Contributor

agjohnson commented Jan 16, 2018

One more thought:

build_env could be LocalEnvironment or DockerEnvironment

We probably won't want to support this. In the case of our commercial hosting, this step is specifically kept out of the Docker build environment so that the build environment has no read access to the source deploy keys.

@ericholscher

This comment has been minimized.

Member

ericholscher commented Jan 16, 2018

@agjohnson I'd really like to run the VCS calls inside Docker on the .org though, so I think having it be configurable by a setting or similar is good.

@humitos humitos referenced this pull request Jan 17, 2018

Closed

Old webhooks gives 502 #3527

@humitos

This comment has been minimized.

Member

humitos commented Jan 17, 2018

I'll also echo the uncertainty of displaying all the commands, could you post some screenshots of the commands all executed through the build environment?

I'm working on running all the commands inside a BuildEnvironment. These are the commands executed in an already imported project (for example when a webhook comes) so far:

In [4]: vcs = Project.objects.get(slug='poliastro').vcs_repo()

In [5]: vcs.checkout()
[17/Jan/2018 15:01:55] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git status' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:55] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git remote set-url origin https://github.com/poliastro/poliastro' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:55] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git fetch --tags --prune' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:56] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git show-ref remotes/origin/master' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:56] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git checkout --force --quiet origin/master' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:56] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git clean -d -f -f' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:56] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git submodule sync' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
[17/Jan/2018 15:01:56] readthedocs.doc_builder.environments:112[13358]: INFO Running: 'git submodule update --init --recursive --force' [/home/humitos/rtfd/code/readthedocs.org/user_builds/poliastro/checkouts/latest]
Out[5]: (0, '', None)
@humitos

This comment has been minimized.

Member

humitos commented Jan 18, 2018

I just updated this PR with several changes implementing the idea that I commented you both. This is not a final solution but some code to discuss over something concrete.

Summarizing, the idea is to have a BaseEnvironment which doesn't depend on a Build where we can run arbitrary commands: this is the case of some VCS command that we need to run outside a build.

This code will probably needs some polishing and arrangement between us, but it shows the idea I think.

@agjohnson screenshot for all the VCS commands recorded

captura de pantalla_2018-01-18_12-47-44

@humitos

This comment has been minimized.

Member

humitos commented Jan 18, 2018

Initial import...

captura de pantalla_2018-01-18_12-55-32

Note that the first command failed, but it's under warn_only so I think we shouldn't record those ones to avoid this problems.

@humitos

This comment has been minimized.

Member

humitos commented Jan 18, 2018

Just imported repo without logging the warn_only commands:

captura de pantalla_2018-01-18_12-59-38

@humitos

I left some comments where I have doubts or where I wanted to explain a little more about that change.

Anyway, this code still need work for sure but I will wait for your comments before continue working on this.

cc @ericholscher @agjohnson

def _log_warn_only(self, msg):
log.warn(LOG_TEMPLATE.format(
project=self.project.slug,
version='latest',

This comment has been minimized.

@humitos

humitos Jan 19, 2018

Member

I don't like this, but I don't want to depend on a version instance to just get the slug. I would like to think in a better way to implement it, though

# :'(
log.warn(LOG_TEMPLATE.format(
project=self.project.slug,
version=self.version.slug,

This comment has been minimized.

@humitos

humitos Jan 19, 2018

Member

This one can use the version object because the *BuildEnvironment are instantiated with this object. No real problem here.

@@ -163,7 +163,7 @@ def save_environment_json(self):
with open(self.environment_json_path(), 'w') as fpath:
# Compatibility for Py2 and Py3. ``io.TextIOWrapper`` expects
# unicode but ``json.dumps`` returns str in Py2.
fpath.write(unicode(json.dumps(data)))
fpath.write(json.dumps(data))

This comment has been minimized.

@humitos

humitos Jan 19, 2018

Member

Someone proposed to use six.text_type, but I didn't try it yet

"""
max_retries = 5
default_retry_delay = (7 * 60)
name = __name__ + '.update_docs'
# TODO: the argument from the __init__ are used only in tests

This comment has been minimized.

@humitos

humitos Jan 19, 2018

Member

I noted this because I found these non-required attributes kind of confusing

This comment has been minimized.

@agjohnson

agjohnson Jan 23, 2018

Contributor

Perhaps some docs should exist on these? Optionally, we could use a separate pattern for this, such as a separate class method used just for testing. I have no strong opinions here though, just throwing out some options.

version_repo = self.project.vcs_repo(
self.version.slug,
# When ``sync_only`` we don't a setup_env
getattr(self, 'setup_env', None),

This comment has been minimized.

@humitos

humitos Jan 19, 2018

Member

This is also kind of hacky: sync_repo can be executed before (sync_only=True) or after the run_setup (the one that creates the self.setup_env we want to use here in case it exists)

@@ -21,7 +22,7 @@ class Backend(BaseVCS):
def update(self):
super(Backend, self).update()
retcode = self.run('bzr', 'status')[0]
retcode = self.run('bzr', 'status', warn_only=True)[0]

This comment has been minimized.

@humitos

humitos Jan 19, 2018

Member

This command is used to check the existence of the repository and call pull or clone for example.

@@ -66,21 +66,21 @@ def checkout_revision(self, revision=None):
revision = 'origin/%s' % branch
code, out, err = self.run('git', 'checkout',
'--force', '--quiet', revision)
'--force', revision)

This comment has been minimized.

@humitos

humitos Jan 19, 2018

Member

Since we are exposing the output to the user, I removed all the quiet arguments.

@@ -23,7 +24,7 @@ def pull(self):
(pull_retcode, _, _) = self.run('hg', 'pull')
if pull_retcode != 0:
raise RepositoryError
(update_retcode, stdout, stderr) = self.run('hg', 'update', '-C')
(update_retcode, stdout, stderr) = self.run('hg', 'update', '--clean')

This comment has been minimized.

@humitos

humitos Jan 19, 2018

Member

I expand the arguments to use the long form which is clearer.

"""
Base for VCS Classes.
Built on top of the BaseCLI.
VCS commands are ran inside a ``LocalEnvironment``.

This comment has been minimized.

@humitos

humitos Jan 19, 2018

Member

In fact, could be any Environment not just Local. By default, is LocalEnvironment

environment = os.environ.copy()
# TODO: kind of a hack
del environment['PATH']

This comment has been minimized.

@humitos

humitos Jan 19, 2018

Member

I need to delete this one because there is an assert in the code that doesn't allow the PATH in the environment

This comment has been minimized.

@agjohnson

agjohnson Jan 23, 2018

Contributor

Probably best anyways? I think we want tight control over this.

@agjohnson

This comment has been minimized.

Contributor

agjohnson commented Jan 22, 2018

Now that I've seen how many commands are raised to the user, I feel like there is perhaps some filtering of commands that we need to do. I think it's important to reduce redundancy in the implementations and use a central execution pattern, but it probably isn't useful for users to see all of these commands.

For instance, if we do a git status to check for a repository, this shouldn't be surfaced to the user, and it shouldn't show as a failed command, as it isn't important from the user's perspective. Commands like git checkout and git subodule ... are important though.

This will probably also get around the issue that a command like git status will normally fail and execution will continue regardless. We don't need to show this as a failure to the user.

@humitos

This comment has been minimized.

Member

humitos commented Jan 22, 2018

To add another place where we are running commands outside a Build and even outside an Environment, I just found this:

def run(*commands):
"""
Run one or more commands.
Each argument in `commands` can be passed as a string or as a list. Passing
as a list is the preferred method, as space escaping is more explicit and it
avoids the need for executing anything in a shell.
If more than one command is given, then this is equivalent to
chaining them together with ``&&``; if all commands succeed, then
``(status, out, err)`` will represent the last successful command.
If one command failed, then ``(status, out, err)`` will represent
the failed command.
:returns: ``(status, out, err)``
"""
environment = os.environ.copy()
environment['READTHEDOCS'] = 'True'
if 'DJANGO_SETTINGS_MODULE' in environment:
del environment['DJANGO_SETTINGS_MODULE']
if 'PYTHONPATH' in environment:
del environment['PYTHONPATH']
# Remove PYTHONHOME env variable if set, otherwise pip install of requirements
# into virtualenv will install incorrectly
if 'PYTHONHOME' in environment:
del environment['PYTHONHOME']
cwd = os.getcwd()
if not commands:
raise ValueError("run() requires one or more command-line strings")
for command in commands:
# If command is a string, split it up by spaces to pass into Popen.
# Otherwise treat the command as an iterable.
if isinstance(command, six.string_types):
run_command = command.split()
else:
try:
run_command = list(command)
command = ' '.join(command)
except TypeError:
run_command = command
log.debug('Running command: cwd=%s command=%s', cwd, command)
try:
p = subprocess.Popen(
run_command,
cwd=cwd,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
env=environment
)
out, err = p.communicate()
ret = p.returncode
except OSError:
out = ''
err = traceback.format_exc()
ret = -1
log.exception("Command failed")
return (ret, out, err)

(used for symlinks, for example)

@agjohnson

This comment has been minimized.

Contributor

agjohnson commented Jan 23, 2018

Yeah, that's the other one that needs to be unified. Let's address that separately though. I'll open a ticket for that.

@agjohnson

This looks great! I could use some more explanation around moving the update docs task into the main build task, as it seems like a conditional use of the main task. Maybe all that is required here is an abstraction though.

We already discussed the addition of a record on run(), but i agree the verbose cli options is a necessary addition for all of the vcs commands. It makes the user view more obvious and makes it easier for development as well.

class LocalEnvironment(BaseEnvironment):
# TODO: BuildCommand name doesn't make sense here, should be just Command

This comment has been minimized.

@agjohnson

agjohnson Jan 23, 2018

Contributor

True, this meaning has changed now. 👍 on changing

This comment has been minimized.

@agjohnson

agjohnson Jan 23, 2018

Contributor

However, maybe this is a later refactor after all. If we generalize the naming, we should probably also move out of doc_builder/ app.

This comment has been minimized.

@humitos

humitos Jan 23, 2018

Member

Opened a ticket for this: #3541

kwargs['environment'] = self.environment
# TODO: ``build_env`` is passed as ``kwargs`` when it's called from a ``*BuildEnvironment``
# kwargs['build_env'] = self

This comment has been minimized.

@agjohnson

agjohnson Jan 23, 2018

Contributor

I don't quite follow this comment, but I assume this note makes more sense for you :)

This comment has been minimized.

@humitos

humitos Jan 23, 2018

Member

I wanted to leave that comment to discuss a little about this.

build_env was self, but now if it comes, it's from the overriden run method from the class that is calling this method.

Sounds a little elaborated, that's why I'd appreciate your thoughts on this.

self.commands.append(build_cmd)
build_cmd.run()
# TODO: I don't like how it's handled this entry point here

This comment has been minimized.

@agjohnson

agjohnson Jan 23, 2018

Contributor

What is your concern here? I can see how warn_only would be confusing as we're now considering several different return states:

  • Command passed, report
  • Command failed, report
  • Command failed, report and fail the build
  • Command passed, don't report
  • Command failed, don't report
  • Command failed, don't report but fail the build (this probably shouldn't be used ever, but maybe there are steps that should raise a general error?)
  • Command failed, but report as pass? (I don't know if this still exists though)

This comment has been minimized.

@humitos

humitos Jan 23, 2018

Member

This TODO comment was moved accidentally. It belongs to this line: https://github.com/rtfd/readthedocs.org/pull/3520/files/7f6c098983acb4bb3afc90e5c780af7dae9ec9d8#r163242699

I agree with the return states, but I'd say that the last two shouldn't exist at all. The rest, are needed.

This comment has been minimized.

@humitos

humitos Jan 23, 2018

Member

Command failed, but report as pass? (I don't know if this still exists though)

This is tricky, but it does exist. Although, we are not using it but maybe we need it.

For example, hg tags command. I suppose that we want to record=True it but we want to warn_only=True also, since it could fail but we don't want the Build to fail. Besides, if it's recorded and failed (exit status != 0) it will be presented in red to the user and we don't want that.

Is that a good example of this use case?

This comment has been minimized.

@humitos

humitos Jan 23, 2018

Member

Sumarizing, I think we want the warn_only commands to be recorded but don't interfers in the build status nor in the command itself (when presented to the user).

Maybe what we need to change is the BuildCommandResultMixin.successful property to something like:

return self.exit_code == 0 and self.warn_only == False

but that would involve to save the warn_only in the database also and I don't know if this is getting more and more complicated :/

This comment has been minimized.

@humitos

humitos Jan 23, 2018

Member

Another possible idea is that if warn_only we record it, but we set the exit_code as 0 (this could end up with other problems like the Latex build fail and we will never know it, even when opening the Build interface since it won't be marked as failed -in red).

So, add another attribute? warn_only and force_success 😛 ?

This comment has been minimized.

@agjohnson

agjohnson Jan 24, 2018

Contributor

👍 Its sounding like we also want a force_success option as well here.

This comment has been minimized.

@humitos

humitos Jan 24, 2018

Member

Meh, I just realized that we need something like record_as_success instead of force_success because we are using git status exit code to decide if clone or pull, so we need the real exit code in the flow but we want to save a fake one in the databse...

if not warn_only:
# TODO: maybe receive ``record`` as an attribute for skip/record
# just specific commands but not all of them ran under the
# *BuildEnvironment

This comment has been minimized.

@agjohnson

agjohnson Jan 23, 2018

Contributor

This note sounds like we want a record kwarg on command execution? I don't quite understand the comment on *BuildEnvironment though.

This comment has been minimized.

@humitos

humitos Jan 23, 2018

Member

Hehe.

Yes, I was thinking on having the record attribute passed in the run method to override the one from the __init__ from *BuildEnvironment classes (now we have LocalEnvironment and (Local/Docker)BuildEnvironment)

This comment has been minimized.

@agjohnson

agjohnson Jan 24, 2018

Contributor

Yeah, i think a record arg on run() makes sense. Given __init__(record=False, ...), and run(record=True, ...), what do think we should do? I'd probably say don't record if __init__(record=False), thoughts?

This comment has been minimized.

@humitos

humitos Jan 24, 2018

Member

I think it's better to follow what run(record=) is set independently how was the class initialized.

Why? Because sometimes the LocalEnvironment is initialized far away (in another file and just passed as argument) from the code where the command is ran and you want/need to override the default record.

I implemented this at

if record is None:
# ``self.record`` only exists when called from ``*BuildEnvironment``
record = getattr(self, 'record', False)

# TODO: do we want to save commands that FAILED but not raised and
# exception? This will cause the first `git status` (when
# importing) to fail and be marked with RED in the Build command
# details

This comment has been minimized.

@agjohnson

agjohnson Jan 23, 2018

Contributor

Raised in general review of this, I'd say having a record option that wouldn't record this command at all is a better pattern.

This comment has been minimized.

@humitos

humitos Jan 23, 2018

Member

We need both: record=False and warn_only=True In fact, I'm thinking that record=False should implicit mean warn_only=True because the False case doesn't make sense.

This comment has been minimized.

@agjohnson

agjohnson Jan 24, 2018

Contributor

Oh, good point. I think you're right.

"""
max_retries = 5
default_retry_delay = (7 * 60)
name = __name__ + '.update_docs'
# TODO: the argument from the __init__ are used only in tests

This comment has been minimized.

@agjohnson

agjohnson Jan 23, 2018

Contributor

Perhaps some docs should exist on these? Optionally, we could use a separate pattern for this, such as a separate class method used just for testing. I have no strong opinions here though, just throwing out some options.

"""
Run a documentation build.
Run a documentation sync or sync n' build.

This comment has been minimized.

@agjohnson

agjohnson Jan 23, 2018

Contributor

What does sync mean in this context?

This comment has been minimized.

@humitos

humitos Jan 23, 2018

Member

sync means: "update docs from VCS (sync tags/branches and trigger stable build if the stable version has changed)"

Where sync is not a good summary for that :(

This comment has been minimized.

@agjohnson

agjohnson Jan 24, 2018

Contributor

Roger, this makes sense.

We do describe a lot of things as syncing (we also overuse the words backend and environment). This doesn't need to change though, I just was looking for some clarification to help understand the changes.

@@ -149,9 +145,15 @@ def run(self, pk, version_pk=None, build_pk=None, record=True,
self.build_force = force
self.config = None
if sync_only:
self.sync_repo()
return True

This comment has been minimized.

@agjohnson

agjohnson Jan 23, 2018

Contributor

Regarding my note above on the significance of sync, is there a strong reason to make this task conditionally operate like this? It seems like this should be a separate task instead?

If the purpose of this is code reuse, perhaps a mixin class that two tasks can share would make more sense.

If the sync operation requires the same amount of arguments/etc, then it's probably not a strong case for task isolation.

This comment has been minimized.

@humitos

humitos Jan 23, 2018

Member

This is something that I talked to @ericholscher, and I saw it very clear at the beginning while we were talking, but the implementation is not as good as I imagined.

I wanted to avoid the needing of handling the LocalEnvironment inside the update_imported_docs and also try to avoid the "jump to a middle step of the flow": update_imported_docs is like the third step of the whole UpdateDocsTask and was used from outside also -I didn't like that.

Besides, the use of this sync_only from outside was done only from

def _build_url(url, projects, branches):
and that endpoint is deprecated. So, I'd say that we shouldn't be calling this anymore in the future.


Regarding your proposals and considering what I wrote on this comment, maybe a Mixin class from where two different Task class inherit is the best approach: we share code, it's cleaner and easy to refactor/remove when finally we don't use the sync task alone anymore.

I will work on this implementation and see how it goes, but it should work :)

This comment has been minimized.

@humitos

humitos Jan 23, 2018

Member

I did a commit with this change at c79f9f1

Let me know if it's clearer for you now.

This comment has been minimized.

@agjohnson

agjohnson Jan 24, 2018

Contributor

Looks good! Another option could be to start using celery's builtin task chaining. Without looking deeply into this I don't know if this would clear up any confusion around the patterns we're using here. For now this accomplishes what we want however, perhaps moving to more common celery patterns can be a later batch of work.

try:
# Hit the API ``sync_versions`` which may trigger a new build
# for the stable version
api_v2.project(self.project.pk).sync_versions.post(version_post_data)

This comment has been minimized.

@agjohnson

agjohnson Jan 23, 2018

Contributor

I waffle on making this API driven, or just using executing this as a broadcast task directly. It seems calling the API is an unnecessary level of abstraction. Do you have any strong opinions here?

This comment has been minimized.

@humitos

humitos Jan 23, 2018

Member

No, I don't.

I didn't write this code, I moved from update_imported_docs and I don't fully understand it --I just added the comment :) So, I don't have a strong opinion on this.

This comment has been minimized.

@agjohnson

agjohnson Jan 24, 2018

Contributor

I'm not sure if there is historical significance here either. We should probably just address in a separate issue.

environment = os.environ.copy()
# TODO: kind of a hack
del environment['PATH']

This comment has been minimized.

@agjohnson

agjohnson Jan 23, 2018

Contributor

Probably best anyways? I think we want tight control over this.

# exception? This will cause the first `git status` (when
# importing) to fail and be marked with RED in the Build command
# details
self.record_command(build_cmd)

This comment has been minimized.

@humitos

humitos Jan 23, 2018

Member

I don't really like that because to record a command we need to be inside a build, but this LocalEnvironment knows nothings about a build. So, for this class I override it with just a pass and the *BuildEnvironment has its own to really record the commands

self.commands.append(build_cmd)
build_cmd.run()
if record and not warn_only:

This comment has been minimized.

@humitos

humitos Jan 23, 2018

Member

Not saving the warn_only commands involves another problem: "latex command are not being recorded" :(

This comment has been minimized.

@agjohnson

agjohnson Jan 24, 2018

Contributor

The use of warn_only should produce two outcomes for commands:

  • warn_only=False means that commands that fail will raise exceptions and halt the build
  • warn_only=True means that commands that fail will be recorded as failing in the database, but an exception won't be raised. The build will continue after this.

So, I don't think we want warn_only to imply that a task is not recorded, record=False should be used explicitly if we want a command that can fail and doesn't need to be recorded.

@humitos

This comment has been minimized.

Member

humitos commented Jan 24, 2018

All the commands logged with the latest changes in this branch:

captura de pantalla_2018-01-24_11-15-51

@humitos

This comment has been minimized.

Member

humitos commented Jan 30, 2018

In case we want to apply the same method that I used at #3560 (applying the styling in another PR to simplify the review), I wrote this command to do that in an easier way:

pre-commit run --files `git diff --name-only master $(git branch | grep \* | cut -d ' ' -f2)`

It runs pre-commit only in the files that where modified in the current branch.

@agjohnson

Looks great! I'm going to give this some QA before merge, but I just raised a couple of questions on some of the notes we left in code. We can probably update this to make sure our future selves know exactly what needs to change.

"""
if record is None:
# ``self.record`` only exists when called from ``*BuildEnvironment``
record = getattr(self, 'record', False)

This comment has been minimized.

@agjohnson

agjohnson Feb 16, 2018

Contributor

Raised in another comment of mine, it seems we probably want to respect the class record if it is False.

This comment has been minimized.

@humitos

humitos Feb 16, 2018

Member

If we do that, once you initialize the class with False there is no way to change the behaviour for a specific command. I supposed that we want to override the default value for a specific command.

This would be confusing from my point of view:

  1. initialize the class with False
  2. call this method with record=True
  3. it won't be recorded :/

To me, it's the same (opposite) case as "initialize the class with True but this and this command I don't want to record, so I call it False"

@@ -310,7 +416,9 @@ def __init__(self, project=None, version=None, build=None, config=None,
self.environment = environment or {}
self.update_on_success = update_on_success
# TODO: remove this one, comes from super

This comment has been minimized.

@agjohnson

agjohnson Feb 16, 2018

Contributor

Anything blocking this change?

This comment has been minimized.

@humitos

humitos Feb 16, 2018

Member

No. Done

if record:
# TODO: I don't like how it's handled this entry point here since
# this class should know nothing about a BuildCommand (which are the
# only ones that can be saved/recorded)

This comment has been minimized.

@agjohnson

agjohnson Feb 16, 2018

Contributor

I'm pretty sure we discussed this already, so apologies if we're rehashing this :)

What would be a better pattern to use here? Or is there something more specific we can assign to ourselves with this TODO?

This comment has been minimized.

@humitos

humitos Feb 16, 2018

Member

Well, we don't have an strong opinion in how to improve this but I was thinking on something like how the signal works: pre_something, post_something, etc. The behaviour will be the same, but it will be correctly named instead of using record for something that know nothing about recording the commands.

So, we could call methods like pre_run_command and post_run_command so inherited classes can do whatever they want (in this example, the post_run_command will record it)

What do you think? In case you like it, another PR or the same one?

This comment has been minimized.

@agjohnson

agjohnson Feb 16, 2018

Contributor

Oh, interesting thought, I do like the plan. I don't think we need to do this now, but we could always turn this todo into a ticket. I have a "Refactoring" milestone that we can assign this to if so -- this would be a great place for people to jump in and clean up code.

humitos added some commits Jan 18, 2018

Abstraction for UpdateDocsTask and SyncRepositoryTask
These are two separated tasks that share some code by inheriting from
SyncRepositoryMixin.

The final goal (as ``core.views.hooks._build_url`` is *DEPRECATED*) is
to finally remove the SyncRepositoryTask and merge that code into
UpdateDocsTask.
Allow to not record specific command
Either `record=False` or `warn_only=True` commands are not
recorded. This attributes can be set at initialization time or when
calling the `run()` method.
Use record_as_success instead of force_success
We need the real exit_code for some command since there are decisions
based on that code, but we want to record it as success.

So, the fake exit_code is saved in the database but the real exit_code
is used in the whole flow of the building process.

@agjohnson agjohnson merged commit 3c34456 into master Feb 16, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@agjohnson agjohnson deleted the humitos/vcs/log-checkout branch Feb 16, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment