New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Canalespy gsheet helper functions #961
base: main
Are you sure you want to change the base?
Conversation
`Args:` | ||
method: str | ||
The name of the Parsons GoogleSheets method to be attempted | ||
i: int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of exposing recursion control parameters to the public API, it is probably going to be cleaner to define an inner-function like this:
def foo(a, b):
def inner_foo(a, b, i):
# <Do Recursive Stuff Here...>
inner_foo(a, b, 0)
else: | ||
ordinal = "th" | ||
|
||
logger.info(f"trying to {method} for the {i}{ordinal} time") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be a debug log instead of an info log?
if i < max: | ||
time.sleep(wait_time) | ||
i += 1 | ||
output = self.attempt_gsheet_method(method, i, max, wait_time, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just do return self.attempt_gsheet_method(...)
instead of saving it as output
?
(It doesn't really matter because Python doesn't support tail call optimization anyway, but I do think it's cleaner...)
logger.info(f"trying to {method} for the {i}{ordinal} time") | ||
if i < max: | ||
time.sleep(wait_time) | ||
i += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of incrementing the argument, can we call the recursive call with i+1
? That way we're not mutating our arguments, which is bad-ish practice.
@@ -395,6 +399,148 @@ def format_cells(self, spreadsheet_id, range, cell_format, worksheet=0): | |||
ws.format(range, cell_format) | |||
logger.info("Formatted worksheet") | |||
|
|||
def attempt_gsheet_method(self, method, i=1, max=6, wait_time=15, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we adjust this to also take an *args
splat variable? That's the standard way to solve the positional arguments problem that you note in the docstring. The Python standard for function passthrough like you're doing here is to do:
def call_foo(my_function, *args, **kwargs):
# do stuff...
my_function(*args, **kwargs)
|
||
# Non-DB table options yield a list, convert to Parsons table with default worksheet col | ||
if sheet_id_list: | ||
sheet_id_tbl = Table( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the benefit of making this a table instead of a list of dictionaries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it just be sheet_id_tbl = [{"sheet_id": x, "worksheet_id": 0} for x in sheet_id_list]
? I think that would even work as-is. (Although we'd probably want to change the name...)
|
||
# Accumulate and materialize | ||
combined.concat(data) | ||
temp_files.append(combined.materialize_to_file()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the benefit to using the materialize_to_file
method? Looking at the source, it looks like it's still updating the table in memory?
combined.concat(data) | ||
temp_files.append(combined.materialize_to_file()) | ||
|
||
return combined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably close out all of the remaining temp files before we return. In fact, we might want to wrap that in a finally
block to make sure we don't leave temp files around in case this function errors.
data.add_column("spreadsheet_id", sheet_id[id_col]) | ||
|
||
# Retrieve sheet title (with attempts to handle rate limits) and add as a column | ||
globals()["sheet_obj"] = self.attempt_gsheet_method( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of polluting the global namespace, we should be able to store it in the current instance. This line and the next should probably be something like:
self.__sheet_obj = self.attempt_gsheet_method(...)
sheet_title = str(self.attempt_gsheet_method("self.__sheet_obj.title")
del self.__sheet_obj
"gs.gspread_client.open_by_key", key=sheet_id[id_col] | ||
) | ||
sheet_title = str(self.attempt_gsheet_method("sheet_obj.title")) | ||
data.add_column("spreadsheet_title", sheet_title) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you document the two columns that this function will add to the data in the docstring?
|
||
# Retrieve sheet title (with attempts to handle rate limits) and add as a column | ||
globals()["sheet_obj"] = self.attempt_gsheet_method( | ||
"gs.gspread_client.open_by_key", key=sheet_id[id_col] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the culprit of your test failures. gs
isn't in scope here. I think that it's supposed to be some GoogleSheets related module that gets imported as gs
?
Added some internal tmc canalespy gsheet functions to parsons! One function recursively re-attempts a specified gsheet method, while the other combines data from multiple gsheets into one parsons table. Also, not quite sure why there's the import issue in the unit tests?