
Migrator hash #197

Merged
merged 24 commits into regro:master
Jun 21, 2018

Conversation

CJ-Wright
Member

@CJ-Wright CJ-Wright commented Jun 20, 2018

Closes #189
closes #138

@codecov-io

codecov-io commented Jun 20, 2018

Codecov Report

Merging #197 into master will increase coverage by 1.25%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #197      +/-   ##
=========================================
+ Coverage   13.35%   14.6%   +1.25%     
=========================================
  Files           7       7              
  Lines         397     397              
=========================================
+ Hits           53      58       +5     
+ Misses        344     339       -5
Impacted Files Coverage Δ
conda_forge_tick/update_upstream_versions.py 0% <ø> (ø) ⬆️
conda_forge_tick/make_graph.py 0% <ø> (ø) ⬆️
conda_forge_tick/utils.py 86.36% <100%> (+22.72%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81036cb...8c206c0. Read the comment docs.

# If we've gotten this far then the node is good
attrs['bad'] = False
print('Removing feedstock dir')
rm -rf @(feedstock_dir)
return True
return migrate_return
Contributor

Won't migrate_return always be True if we get here?

Member Author

Ah, yes, I forgot to do the update in main.

@CJ-Wright
Member Author

@justcalamari @scopatz ready for review! This PR will make sure that we don't re-run migrations that we have already run.

@@ -15,7 +15,9 @@ from .git_utils import (get_repo, push_repo)
# TODO: move this back to the bot file as soon as the source issue is sorted
# https://travis-ci.org/regro/00-find-feedstocks/jobs/388387895#L1870
from .migrators import *
$MIGRATORS = [Version(), Compiler()]
$MIGRATORS = [Version(),
# Compiler()
Member

Compiler migration is not complete though

Contributor

right, isn't that why it is commented out?

Member Author
@CJ-Wright CJ-Wright Jun 21, 2018

Correct, I need to manually migrate the graph or we'll issue PRs to all outstanding compiler migrations. This issue was one of the driving factors for making these changes.

migrator.pr_head(), migrator.remote_branch())
except github3.GitHubError as e:
if e.msg != 'Validation Failed':
raise e
Contributor

This should just be a re-raise, i.e. a bare raise, not raise e
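The distinction the reviewer is pointing at can be sketched with a minimal standalone example (not the bot's actual code): a bare `raise` re-raises the active exception unchanged, which is the idiomatic way to propagate an error you only partially handled, while `raise e` re-raises it with an extra frame added at the `raise e` site.

```python
def validate():
    # stand-in for the github3 call that fails
    raise ValueError("Validation Failed")

def handle():
    try:
        validate()
    except ValueError as e:
        if str(e) != "Validation Failed":
            raise  # bare raise: propagate the active exception unchanged
        # expected failure: swallow it

def handle_unexpected():
    try:
        raise ValueError("something else")
    except ValueError as e:
        if str(e) != "Validation Failed":
            raise  # unexpected message, so propagate

handle()  # the expected error is swallowed; nothing escapes

try:
    handle_unexpected()
    escaped = False
except ValueError:
    escaped = True
assert escaped
```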

if e.msg != 'Validation Failed':
raise e
else:
pass
Contributor

else: pass is a no-op that you can remove.

re-raise
properly pass migrator_hash to PRed set
return f'{n}_{attrs["new_version"]}'

def _extract_version_from_hash(self, h):
return h.split('_')[-1]
Contributor

What happens when the version has an underscore in it?

Member Author

I'm not sure conda allows that, but I will put in some safeguards.

Member

conda does allow '_'
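A quick illustration of why the string-splitting extraction is fragile once versions may contain underscores (helper names here are illustrative; the PR ultimately moved to a structured uid with an explicit version key instead of parsing strings):

```python
def extract_version_from_hash(h):
    # the original approach: take everything after the last underscore
    return h.split('_')[-1]

# fine for ordinary versions...
assert extract_version_from_hash('numpy_1.14.5') == '1.14.5'
# ...but R-style versions like "1.3_2" get truncated
assert extract_version_from_hash('sample-r_1.3_2') == '2'

def extract_version_from_uid(uid):
    # structured form: no parsing, plus a safe default for entries
    # that predate version tracking
    return uid.get('version', '0.0.0')

assert extract_version_from_uid({'migrator': 'Version', 'version': '1.3_2'}) == '1.3_2'
assert extract_version_from_uid({'migrator': 'Compiler'}) == '0.0.0'
```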

'smithy_version': smithy_version,
if 'PRed' not in gx.nodes[node]:
gx.nodes[node]['PRed'] = set()
gx.nodes[node]['PRed'].add(migrator_hash)
Contributor

Maybe do gx.nodes[node].setdefault('PRed', set()).add(migrator_hash) here to reduce lines.
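For reference, the two forms are equivalent; a plain dict stands in for the node attributes here:

```python
node_attrs = {}  # stand-in for gx.nodes[node]

# three-line form
if 'PRed' not in node_attrs:
    node_attrs['PRed'] = set()
node_attrs['PRed'].add('hash-a')

# setdefault returns the existing value if the key is present,
# otherwise inserts the default and returns it
node_attrs.setdefault('PRed', set()).add('hash-b')

assert node_attrs['PRed'] == {'hash-a', 'hash-b'}
```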

or (VersionOrder(str(attrs['new_version'])) <=
VersionOrder(str(attrs['version'])))
# if PRed version is greater than newest version
or any(VersionOrder(self._extract_version_from_hash(h)) >=
Contributor

I think sometimes h will be False. Then _extract_version_from_hash will error.

Member Author

right, so now we default to '0.0.0'

pmy['meta_yaml'] = parse_meta_yaml(inp)
pmy['raw_meta_yaml'] = inp
pmy.update(kwargs)

m = migrator()
Contributor

Can we use a single instance of each migrator for all tests? This will make the tests more robust.
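One way to realize this suggestion (a sketch with a dummy migrator class, not the project's real test setup) is to build each instance once at module scope and have every test case pull from that table, so any state the migrator accidentally carries between runs surfaces as a test failure:

```python
class Version:
    """Dummy stand-in for the real Version migrator."""
    def __init__(self):
        self.calls = 0

    def migrate(self, attrs):
        self.calls += 1
        return dict(attrs, migrated=True)

# one shared instance per migrator class, instead of `m = migrator()`
# inside every parametrized test case
MIGRATOR_INSTANCES = {Version: Version()}

def run_case(migrator_cls, attrs):
    m = MIGRATOR_INSTANCES[migrator_cls]
    return m.migrate(attrs)

out = run_case(Version, {'new_version': '2.4.1'})
assert out == {'new_version': '2.4.1', 'migrated': True}
run_case(Version, {'new_version': '2.4.2'})
# the same object served both cases
assert MIGRATOR_INSTANCES[Version].calls == 2
```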

(Version, multi_source, updated_multi_source, {'new_version': '2.4.1'},
'Please check that the dependencies have not changed.'),
(Version, sample_r, updated_sample_r, {'new_version': '1.3_2'},
Contributor

Why was 1.3_2 changed to 1.3-2?

@CJ-Wright
Member Author

@justcalamari @scopatz @isuruf Thank you for the review! Would it be possible to get another round?

@@ -89,6 +93,9 @@ class Migrator:
"""Branch to use on local and remote"""
return 'bot-pr'

def migrator_hash(self, attrs):
return f'{self.__class__.__name__}_{self._class_version}'
Member Author

I'm trying to remember why we didn't go with a tuple for this. A tuple might make life easier.

Member Author

Or maybe a dict?

Contributor

You can always change it :)

Member Author

Ah, that is why, I can't put dicts into sets. I can put tuples though.

Member Author

But maybe I can live with lists since dicts seem nicer.

Contributor

A namedtuple might be what you want, actually

Member Author

Will namedtuple allow me to add keys as needed or do I need to put out a new one for each set of keys?
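To illustrate the constraint under discussion: namedtuple fields are fixed when the class is created, so an instance rejects new attributes and each distinct key set needs its own class (extending `_fields` makes that cheap). Names below are hypothetical:

```python
from collections import namedtuple

MigratorUID = namedtuple('MigratorUID', ['migrator_name', 'migrator_version'])
uid = MigratorUID('Version', 0)

# instances are immutable; you cannot bolt on new keys
try:
    uid.version = '1.2.3'
    added = True
except AttributeError:
    added = False
assert not added

# a new field set means a new class, built here from the old fields
VersionUID = namedtuple('VersionUID', MigratorUID._fields + ('version',))
vuid = VersionUID('Version', 0, '1.2.3')
assert vuid.version == '1.2.3'

# unlike dicts, namedtuples are hashable, so they can live in sets
assert vuid in {vuid}
```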

# don't run on bad nodes
return (bool(attrs.get('archived', False)
or self.migrator_hash(attrs) in attrs.get('PRed', []))
or attrs.get('bad', False))
Contributor

The problem with this is that the graph thinks that good nodes that used to be bad are still bad. We would need to set bad to False somewhere earlier in the code.

Member Author

Hmm ok, should we reset bad here? https://github.com/regro/cf-scripts/blob/master/conda_forge_tick/make_graph.py#L15
That way every node is good at least to start with.
Or would this introduce too much of a burden?

Contributor

I think that should be fine.

Member Author

We may need to think about this more, since failing on every bad feedstock every time the bot runs could be a problem.

@@ -89,6 +95,10 @@ class Migrator:
"""Branch to use on local and remote"""
return 'bot-pr'

def migrator_hash(self, attrs):
return {'class': self.__class__.__name__,
'class_version': self._class_version}
Contributor

Making this a dict complicates searching with whoosh. I can probably write a new whoosh field type for list of dict, but we need to decide what key names to use. Or maybe we don't care about searching PRed and it can be left out of the schema.

Member Author

Hmm, how are we going to handle multi-source packages, don't they have a list of dicts?
I wouldn't worry about searching this info yet. This will most likely need to go through a refactor when libcflib has a feedstocks graph.

Contributor

Multi-source packages have a list of dicts for what field?

Member Author

return pull request json object for later use

(e.g. close PRs if they are old, send pings to maintainers when the status
checks are finished)
# TODO: capture pinning here too!
gx.nodes[node].update({'PRed': attrs['new_version'],
'smithy_version': smithy_version,
migrator_hash = run(attrs=attrs, migrator=migrator, gh=gh,
Contributor

Run also returns PR json now.

gx.nodes[node].setdefault('PRed', []).append(migrator_hash)
# Stash the pr json data so we can access it later
gx.nodes[node].setdefault('PRed_json', {}).update(
{tuple(migrator_hash.keys()): pr_json})
Contributor

This seems like a weird key. Did you mean migrator_hash.values()?

Member Author

Yes, ugh
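The mix-up matters because `tuple(d.keys())` is identical for every migration produced by the same migrator, so every PR's json would land under one key; the values are what distinguish migrations. A short demonstration (hypothetical key names; relies on dicts preserving insertion order, guaranteed since Python 3.7):

```python
migrator_hash = {'class_name': 'Version', 'class_version': 0, 'version': '1.2.3'}
other_hash = {'class_name': 'Version', 'class_version': 0, 'version': '1.3.0'}

# keys collide across different migrations of the same migrator
assert tuple(migrator_hash.keys()) == tuple(other_hash.keys())

# the values (hashable, unlike the dict itself) identify the migration
pred_json = {}
pred_json[tuple(migrator_hash.values())] = {'number': 197}
pred_json[tuple(other_hash.values())] = {'number': 198}

assert len(pred_json) == 2
assert pred_json[('Version', 0, '1.2.3')] == {'number': 197}
```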

add migration script for moving the graph to the new PRed format
compiler_migrations.append(node)

last_compiler_pr = 'cxxopts'
last_compiler_index = compiler_migrations.index(last_compiler_pr)
Contributor

Is compiler_migrations in the same order every time?

Member Author

I think so (based off of the last time we hit the exact same recipes). I think it is alphabetical.

hash_type=attrs.get('hash_type', 'sha256'))

converted_hash = convert_dict_to_nt(migrator_hash)
gx.nodes[node].setdefault('PRed', set()).update(converted_hash)
Member Author

add not update

# if PRed version is greater than newest version
or any(VersionOrder(self._extract_version_from_hash(h)) >=
VersionOrder(attrs['new_version']
) for h in attrs.get('PRed', [])))
Member Author

Needs set

return n

def _extract_version_from_hash(self, h):
return h.get('version', '0.0.0')
Member Author

needs getattr

@CJ-Wright
Member Author

@justcalamari @scopatz @isuruf one last round?

Once this is merged, I'll:

  1. migrate the graph manually
  2. check the graph
  3. commit/push the graph
  4. Manually trigger a PRing round (without the compiler)
  5. If everything works then I'll put the compiler migration back in and run another round and monitor it for any issues.

I think after that we should be back to normal.

@CJ-Wright
Member Author

@mariusvniekerk (would you be interested in giving review?)

return bool(attrs.get('archived', False))
# don't run on things we've already done
# don't run on bad nodes
return bool(bool(attrs.get('archived', False)
Contributor

Is the bool casting needed?

Member Author

I was going to play it safe
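One of the two casts does earn its keep: Python's `or` returns one of its operands, not necessarily a bool, so a single `bool()` normalizes the return type; the nested `bool(bool(...))` adds nothing. For example:

```python
attrs = {'archived': 1}

# `or` short-circuits to the first truthy operand, unchanged
raw = attrs.get('archived', False) or attrs.get('bad', False)
assert raw == 1
assert raw is not True  # it is the int 1, not the bool True

# the cast guarantees a real bool, as the docstring promises
assert bool(raw) is True
# and the double cast is redundant
assert bool(bool(raw)) is bool(raw)
```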

rerender=rerender, protocol='https',
hash_type=attrs.get('hash_type', 'sha256'))

converted_hash = convert_dict_to_nt(migrator_hash)
Contributor

Why not have migrator_hash be a namedtuple to begin with?



class Migrator:
"""Base class for Migrators"""
rerender = False

_class_version = 0
Contributor

I would prefer if this was called migrator_version, and it seems like this should be part of the public API

@@ -89,6 +96,10 @@ class Migrator:
"""Branch to use on local and remote"""
return 'bot-pr'

def migrator_hash(self, attrs):
Contributor

docstring on this method. Also, it doesn't seem to return a hash (int)

@@ -89,6 +96,10 @@ class Migrator:
"""Branch to use on local and remote"""
return 'bot-pr'

def migrator_hash(self, attrs):
return {'class_name': self.__class__.__name__,
'class_version': self._class_version}
Contributor

I think you can drop the class_ from these keys.

Contributor

Or replace them with migrator_, which has more semantic meaning.

@@ -240,6 +257,15 @@ class Version(Migrator):
def remote_branch(self):
return self.attrs['new_version']

def migrator_hash(self, attrs):
n = super().migrator_hash(attrs)
n.update({'version': attrs["new_version"]})
Contributor

Oh, I see that this adds version

@@ -0,0 +1,29 @@
import networkx as nx
Contributor

This should probably have a #!/usr/bin/env python as a script

rerender=rerender, protocol='https',
hash_type=attrs.get('hash_type', 'sha256'))

converted_hash = migrator_uid
Contributor

I think this line can be removed and every instance of converted_hash changed to migrator_uid

@@ -44,7 +51,7 @@ class Migrator:
bool:
Contributor

The return type is different now


Returns
-------
migrate_return: dict
Contributor

type should be namedtuple

don't write to exceptions file, just put it into the graph
style changes
@CJ-Wright
Member Author

Interesting, trying to pickle the namedtuple is causing problems:

christopher@christopher-ubuntu ~/dev/cf-graph master $ python ../cf-scripts/migrate_to_hash.py 
Traceback (most recent call last):
  File "../cf-scripts/migrate_to_hash.py", line 30, in <module>
    nx.write_gpickle(g, '../cf-graph/graph.pkl')
  File "<decorator-gen-402>", line 2, in write_gpickle
  File "/home/christopher/mc/lib/python3.6/site-packages/networkx/utils/decorators.py", line 227, in _open_file
    result = func_to_be_decorated(*new_args, **kwargs)
  File "/home/christopher/mc/lib/python3.6/site-packages/networkx/readwrite/gpickle.py", line 70, in write_gpickle
    pickle.dump(G, path, protocol)
_pickle.PicklingError: Can't pickle <class 'conda_forge_tick.utils.converted'>: attribute lookup converted on conda_forge_tick.utils failed
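The failure above is a known pickle constraint: pickle serializes a class by its qualified name and re-imports it on load, so a namedtuple class created inside a function (here `converted` inside the conversion helper) cannot be found as a module attribute. A minimal reproduction, plus the usual fix of defining the class at module scope with the variable name matching the typename string (helper names are illustrative):

```python
import pickle
from collections import namedtuple

def convert_dict_to_nt(d):
    # the class only exists inside this call; pickle will later try
    # (and fail) to look up `converted` on the defining module
    NT = namedtuple('converted', sorted(d))
    return NT(**d)

uid = convert_dict_to_nt({'migrator': 'Version', 'version': '1.2.3'})
try:
    pickle.dumps(uid)
    pickled = True
except (pickle.PicklingError, AttributeError):
    pickled = False
assert not pickled

# module-level definition: pickle can now resolve the class by name
Converted = namedtuple('Converted', ['migrator', 'version'])
uid2 = Converted('Version', '1.2.3')
assert pickle.loads(pickle.dumps(uid2)) == uid2
```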

@CJ-Wright
Member Author

I think this is ready to go! I've tested this on the graph and it seems to produce the expected output.
Any last issues?
If not I'd like to have this merged by 6pm EST so I can start the PRing and do any troubleshooting.

Thank you everyone for the reviews! (sorry that this PR was a bit like pulling chicken teeth)

@CJ-Wright CJ-Wright merged commit 8619de4 into regro:master Jun 21, 2018
@CJ-Wright CJ-Wright deleted the migrator_hash branch June 21, 2018 22:20
Successfully merging this pull request may close these issues.

Migrator needs to put out a unique tuple Track PRs issued
5 participants