Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

liveness-based slicing should detect version of live symbols at the time the cell was run #54

Closed
smacke opened this issue Apr 10, 2021 · 4 comments · Fixed by #60
Closed
Assignees

Comments

@smacke
Copy link
Member

smacke commented Apr 10, 2021

Example failure:

# cell 1
x = 0

# cell 2
if True:
    y = 7
else:
    # even though this branch is not taken,
    # liveness-based usage should detect the
    # version of `x` used at the time it was
    # live, meaning cell 1 should get included
    # in the slice
    print(x)

# cell 3
x = 5

# cell 4
print(x + y)

The slice for reconstructing cell 4 should use all of cells 1, 2, 3, and 4, but the current slicer will leave out cell 1, since it will detect the more recent version of x used in cell 3 during liveness checking and therefore leave out the definition in cell 1.

@shreyashankar
Copy link
Contributor

Yeah my test case also has this issue, I realize

@smacke
Copy link
Member Author

smacke commented Apr 11, 2021

Suggested way to go about doing this: in safety.py, there's a function called _precheck_for_stale that is run before a cell is executed. In this function, we run liveness analysis and bail if it's detected that the cell has any live references to stale symbols.

This is a good place to record, for each live symbol, the symbol's timestamp / defined_cell_num at the time of liveness. We can add a field to DataSymbol to record this info; e.g. some kind of dictionary called version_by_liveness_timestamp that maps the current cell number to a symbol's defined_cell_num each time that symbol is live in _precheck_for_stale. There is a comment block in _precheck_for_stale with the suggested spot for where to update this field in each live data symbol with a TODO; i.e. it looks something like this:

    def _precheck_for_stale(self, cell: str) -> bool:
        ...
        ...
        ...
        # TODO: For each of the live symbols, record their `defined_cell_num` at the
        #   time of liveness, i.e. at the time `self.cell_counter()`, for use with the
        #   dynamic slicer.
        ###########
        # CODE HERE
        ###########

        self._last_refused_code = None
        return False

From here, we can use the same strategy for live references as we use for dynamic usages; i.e., in get_dependencies, we can grab all the static dependencies for every cell and then aggregate them along with all the dynamic ones:

        for sym in self.all_data_symbols():
            for live_timestamp, version in sym.version_by_liveness_timestamp.items():
                cell_num_to_static_deps[live_timestamp].add(version)

@shreyashankar
Copy link
Contributor

One other issue we may want to address is that sometimes the dynamic dependency set for cell n includes cell m where m > n. This gives us extra cells we may not need. For example:

# cell 1
x = 0

# cell 2
if True:
    y = 7
else:
    # even though this branch is not taken,
    # liveness-based usage should detect the
    # version of `x` used at the time it was
    # live, meaning cell 1 should get included
    # in the slice
    print(x)

# cell 3
x = 5

# cell 4
print(y)

In PR #60, calling get_dependencies for cell 4 will result in cells {1, 2, 3, 4} when you really only need cells {1, 2, 4}. (even cell 1 is not necessary, but this is a different problem)

@smacke
Copy link
Member Author

smacke commented Apr 17, 2021

I think I went in and fixed this for dynamic dependencies already -- I think this is happening because we're still running the liveness checker in _get_cell_dependencies, and the live symbols retrieved therein will all be the latest versions. For example, when we _get_cell_dependencies on cell 2, we'll see that x is live, but its version will already be associated with the value in cell 3, which is why we get the spurious extra dependency. With your most recent fix, I think we can just eschew running liveness analysis just-in-time altogether and instead only use the information you recorded in the PR for this issue. See 247dafe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants