```shell
pip install scopespace
```
`ScopeSpace` is a context manager whose `with` block has its own local scope.
```python
from scopespace import ScopeSpace

x = 5
with ScopeSpace() as foo:
    x = x + 1

print(x)      # 5
print(foo.x)  # 6
```
Notice we did not overwrite `x` globally. Instead, we have `foo`, a namespace with its own version of `x`.

Anything we declare within a `ScopeSpace` is isolated within our chosen namespace:
```python
with ScopeSpace() as bar:
    stuff = 10

print(bar.stuff)  # 10
print(stuff)      # NameError: name 'stuff' is not defined
```
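The README doesn't describe how `ScopeSpace` isolates the block, so here is a minimal sketch of one way such a context manager could work. It is not the library's implementation: `ScopeSketch` is a hypothetical name, it assumes the `with` block runs at module or notebook-cell top level (where assignments land in the frame's globals), and it relies on the CPython-specific `sys._getframe`.

```python
import sys
from types import SimpleNamespace


class ScopeSketch(SimpleNamespace):
    """Hypothetical stand-in for ScopeSpace, NOT the library's real code.

    Assumption: the `with` block runs at module / notebook-cell top level,
    so every assignment inside it lands in the caller's globals dict.
    """

    def __enter__(self):
        # Frame 1 is the code containing the `with` statement (CPython-specific).
        self._scope = sys._getframe(1).f_globals
        # Shallow snapshot of every binding that existed before the block.
        self._before = dict(self._scope)
        return self

    def __exit__(self, exc_type, exc, tb):
        scope, before = self._scope, self._before
        del self._scope, self._before
        for name, value in list(scope.items()):
            if value is self:
                continue  # the `as` target itself stays bound outside
            if name not in before or before[name] is not value:
                setattr(self, name, value)  # keep the block's version here ...
                if name in before:
                    scope[name] = before[name]  # ... and restore the outer one
                else:
                    del scope[name]  # ... or drop the block-only name
        for name, value in before.items():
            if name not in scope:
                scope[name] = value  # undo `del` statements from the block
        return False  # never swallow exceptions


x = 5
with ScopeSketch() as foo:
    x = x + 1

with ScopeSketch() as bar:
    stuff = 10

print(x, foo.x, bar.stuff)   # 5 6 10
print("stuff" in globals())  # False
```

Because bindings are compared by identity, the sketch isolates names only: mutating an object that already existed outside the block (for example, adding a column to an outer dataframe in place) is not undone. Inside a function body it would not work at all, since assignments there go to the function's locals rather than `f_globals`.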
In the world of dataframes and notebook environments, naming things can be tough. You're often juggling multiple references to the same underlying data, and describing any one of them with a variable name is difficult. You're also often forced to version your data by giving it a new name in each cell, so that each cell is idempotent (safe to run repeatedly). This confuses the reader.
There's a tension here:
- While both of the following may be true ...
  - You want to use descriptive names.
  - You want to version or rename your data so cells can be run repeatedly.
- ... there are a few big problems with this:
  - Renaming your data 10 times can make your code seem far more complex than it really is.
  - In some libraries, like Pandas, you're naturally pushed toward short names, because the dataframe's name is often repeated several times in a single command to access its columns.
You've just created a new notebook, and have quickly jotted down some code to perform a few manipulations on a Pandas dataframe, displaying the data at each stage.
![image](https://private-user-images.githubusercontent.com/90723578/238243814-cc6860a7-093c-4169-b3da-6ff38e948eeb.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NDAwMDk3NjcsIm5iZiI6MTc0MDAwOTQ2NywicGF0aCI6Ii85MDcyMzU3OC8yMzgyNDM4MTQtY2M2ODYwYTctMDkzYy00MTY5LWIzZGEtNmZmMzhlOTQ4ZWViLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE5VDIzNTc0N1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTNlYjI2ZWVmZWY0OWZhMmMwNjk1MDhlNTMzMzYwMjZjYWZmYTQ2ODEzM2UzMTU1MWM2MmJmNGI0NTEyZDE3MDQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.oS982UNfi8x7GHbE7MUAj5_LGXR3xSDOnXEeaG3vF-w)
Two problems arise from this code:
- The second and third cells will raise an error if we run either of them twice in a row.
- The name `df` is not very descriptive.
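The re-run problem is easy to see in isolation. A minimal sketch under stated assumptions: a plain dict of column lists stands in for a pandas DataFrame, and `drop_column` and the cell contents are hypothetical, not the ones in the screenshot. A cell that overwrites its own input fails on the second run, while a cell that writes to a new, versioned name can be re-run freely.

```python
def drop_column(table, name):
    """Hypothetical helper: copy of `table` without column `name` (KeyError if missing)."""
    if name not in table:
        raise KeyError(name)
    return {col: values for col, values in table.items() if col != name}


# Quick version of a cell: it overwrites `df`, destroying its own input.
df = {"a": [1, 2], "b": [3, 4]}
df = drop_column(df, "b")

# Running that same cell a second time fails, because "b" is already gone.
try:
    df = drop_column(df, "b")
except KeyError as err:
    rerun_error = err

# Versioned cell: the result gets a new name, so the input survives.
raw = {"a": [1, 2], "b": [3, 4]}
raw_without_b = drop_column(raw, "b")
raw_without_b = drop_column(raw, "b")  # second run, same result

print(repr(rerun_error), raw_without_b)  # KeyError('b') {'a': [1, 2]}
```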
So we switch to more descriptive names, and version them across cells.
![image](https://private-user-images.githubusercontent.com/90723578/238244661-5b1e85f1-85da-4a5f-981c-7b693e7b10e3.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NDAwMDk3NjcsIm5iZiI6MTc0MDAwOTQ2NywicGF0aCI6Ii85MDcyMzU3OC8yMzgyNDQ2NjEtNWIxZTg1ZjEtODVkYS00YTVmLTk4MWMtN2I2OTNlN2IxMGUzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE5VDIzNTc0N1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTExZTVlYzM0ODdhZTdkNTEwMTk4NTI5MGUxYmY2OGNmNWQ0OTk5MjBlYWE1MGJjNjM0ODdmZWI5N2E0MWI3N2EmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.bzZOrmYF8c6mBpJa5vqe_RXop_1h9wXuRLYDZWkW26k)
Now we've created a few more problems:
- Our code is verbose and redundant. The second line in the third cell now takes 92 characters (57 of them are the dataframe's name), just to express `c = (c + b).astype(str)`.
- It's not clear to the reader that we're only working with one dataset.
- We've created room for mistakes: every extra name is another chance to reference the wrong version of the data.
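Those figures are consistent with the dataframe's name appearing three times on the line (57 / 3 = 19 characters each) and columns being accessed with `df["c"]`-style indexing. Both are assumptions, since the line itself is only in the screenshot. The sketch below spells `c = (c + b).astype(str)` out with a hypothetical 19-character name and counts the characters.

```python
# Hypothetical 19-character name; the README's real name is only in the screenshot.
long_name = "sales_with_b_merged"


def spelled_out(df_name: str) -> str:
    # `c = (c + b).astype(str)`, written as pandas column access on `df_name`.
    return f'{df_name}["c"] = ({df_name}["c"] + {df_name}["b"]).astype(str)'


short_line = spelled_out("df")
long_line = spelled_out(long_name)
name_chars = long_line.count(long_name) * len(long_name)

print(len(short_line))             # 41
print(len(long_line), name_chars)  # 92 57
```

With a two-letter name the same logic fits in 41 characters; only the name changes.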
A common solution is to use functions with descriptive names and a `df` parameter. This may present its own issues, though:
- It introduces several unnecessary steps:
  1. Declare a function signature.
  2. Return something.
  3. Call the function and pass it arguments.
- We never intended to use the function anywhere except immediately after declaring it, so its purpose is unclear to the reader.
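For comparison, here is roughly what that function-based workaround looks like, with the three steps marked. The function name, the data, and the dict-of-lists table standing in for a pandas DataFrame are all hypothetical; the README's own version is only in the screenshot.

```python
# Hypothetical data: a dict of column lists stands in for a pandas DataFrame.
sales = {"b": [1, 2], "c": [10, 20]}


# Step 1: declare a signature, just to get a descriptive name and a short `df`.
def add_b_to_c_as_text(df):
    df = dict(df)  # work on a shallow copy so the caller's table is untouched
    df["c"] = [str(c + b) for c, b in zip(df["c"], df["b"])]
    # Step 2: remember to return the result.
    return df


# Step 3: call it immediately, passing the data in and binding yet another name.
sales_c_as_text = add_b_to_c_as_text(sales)
print(sales_c_as_text["c"])  # ['11', '22']
```

The function is used exactly once, right after its definition, which is the situation the list above describes.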
![image](https://private-user-images.githubusercontent.com/90723578/238264494-2dcef8c5-5f71-4e8b-bd83-ff2034a3878c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NDAwMDk3NjcsIm5iZiI6MTc0MDAwOTQ2NywicGF0aCI6Ii85MDcyMzU3OC8yMzgyNjQ0OTQtMmRjZWY4YzUtNWY3MS00ZThiLWJkODMtZmYyMDM0YTM4NzhjLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE5VDIzNTc0N1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTVkYzk1OWVhMmMyOWQyMGRiMDc1MTdiYzc0Zjc2MGIyNjliYjRmMmYzM2RjYjJlNWE1OWFhYzhjZWU0NmE1NDcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.cJ7CqRyWB_dcIGvJRp2mDQSt2mwqGVEjeZshXm4NI7k)
"Namespaces are one honking great idea -- let's do more of those!" - Tim Peters, 'The Zen of Python'
Here, we took the idea of a notebook cell and gave it a logical structure: an improvement in both form and function.
By isolating logical tasks to their own scopes:
- Each cell expresses its own purpose, since its namespace label is declared first
- We made our naming style consistent, with an obvious, repeatable pattern
- The code is more concise, but without sacrificing details
- We eliminated redundancy: descriptive names (the namespace labels) no longer clutter our logic when working with the data
By grouping similar tasks together under a common namespace:
- We get the logical grouping benefit that functions provide, but without needing to define or call one.
- We get the attribute organization benefit that class instances provide, but without having to define our own class.
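The standard library's `types.SimpleNamespace` already gives attribute grouping without a custom class. What it lacks is the scoped block: every intermediate has to be written with the namespace prefix, which is the redundancy a `ScopeSpace` block removes. The data and names below are hypothetical.

```python
from types import SimpleNamespace

# Attribute grouping with no class definition ...
prices = SimpleNamespace(raw=[3, 1, 2])
# ... but every read and write repeats the `prices.` prefix.
prices.ordered = sorted(prices.raw)
prices.total = sum(prices.ordered)

print(prices.ordered, prices.total)  # [1, 2, 3] 6
```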