pip install scopespace
ScopeSpace
is a context manager whose with
block has its own local scope.
x = 5
with ScopeSpace() as foo:
x = x + 1
print(x) # 5
print(foo.x) # 6
Notice we did not overwrite x
globally. Instead, we have foo
, a namespace with it's own version of x
:
Anything we declare within a ScopeSpace
is isolated within our chosen namespace.
with ScopeSpace() as bar:
stuff = 10
print(stuff) # NameError: name 'stuff' is not defined
print(bar.stuff) # 10
In the world of dataframes and notebook environments, naming things can be tough. You're often juggling multiple references to the same underlying data. Describing one of those with a variable name is difficult. And you're often forced to version your data by changing its name across cells, to make each cell idempotent. This confuses the reader.
There's a conflict of interest here:
- While the following may be true ...
- You want to use descriptive names.
- You want to incorporate versioning/renaming so cells can be run repeatedly.
- There are a few big problems with this:
- Renaming your data 10 times can make your code seem far more complex than it really is.
- In some libraries, like Pandas, you're strongly incentivized, naturally, to use short names, since names are often used repeatedly in the same command, to access columns.
You've just created a new notebook, and have quickly jotted down some code to perform a few manipulations on a Pandas dataframe, displaying the data at each stage.
2 problems arise from this code:
- The second and third cells will error if we try to run either of them twice in a row.
- The name,
df
is not very descriptive.
So we switch to more descriptive names, and version them across cells.
Now we've created a few more problems:
- Our code is verbose and redundant. The second line in the third cell now takes 92 characters
(the name of the dataframe occupies 57 of them), just to express
c = (c + b).astype(str)
. - It's not clear to the reader that we are only working with one dataset.
- We've created room for mistakes.
A common solution is to use functions with descriptive names and a df
parameter. This may present its own issues though:
- Introduces several unnecessary steps: 1. Declare function signature. 2. Return something. 3. Call it, and pass it arguments.
- We never intended to use it anywhere except immediately after declaration. Thus, its purpose is unclear to the reader.
"Namespaces are one honking great idea -- let's do more of those!" - Tim Peters, 'The Zen of Python'
Here, we took the idea of a notebook cell and gave it a logical structure - an improvement in both form, and function.
By isolating logical tasks to their own scopes:
- Each cell is expressive of its own purpose; the namespace labels are explicitly declared first
- We made our naming style consistent, with an obvious, repeatable pattern
- The code is more concise, but without sacrificing details
- We eliminated redundancy: Descriptive names (the namespace labels) no longer clutter our logic when working with the data.
By grouping similar tasks together under a common namespace:
- We get the logical grouping benefit that functions provide, but without needing to define or call one.
- We get the attribute organization benefit that class instances provide, but without having to define our own class.