Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canalespy gsheet helper functions #961

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

sharinetmc
Copy link
Contributor

@sharinetmc sharinetmc commented Dec 19, 2023

Added some internal tmc canalespy gsheet functions to parsons! One function recursively re-attempts a specified gsheet method, while the other combines data from multiple gsheets into one parsons table. Also, not quite sure why there's the import issue in the unit tests?

`Args:`
method: str
The name of the Parsons GoogleSheets method to be attempted
i: int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of exposing recursion control parameters to the public API, it is probably going to be cleaner to define an inner-function like this:

def foo(a, b):
    def inner_foo(a, b, i):
         # <Do Recursive Stuff Here...>
    inner_foo(a, b, 0)

else:
ordinal = "th"

logger.info(f"trying to {method} for the {i}{ordinal} time")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be a debug log instead of an info log?

if i < max:
time.sleep(wait_time)
i += 1
output = self.attempt_gsheet_method(method, i, max, wait_time, **kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just do return self.attempt_gsheet_method(...) instead of saving it as output?

(It doesn't really matter because Python doesn't support tail call optimization anyway, but I do think it's cleaner...)

logger.info(f"trying to {method} for the {i}{ordinal} time")
if i < max:
time.sleep(wait_time)
i += 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of incrementing the argument, can we call the recursive call with i+1? That way we're not mutating our arguments, which is bad-ish practice.

@@ -395,6 +399,148 @@ def format_cells(self, spreadsheet_id, range, cell_format, worksheet=0):
ws.format(range, cell_format)
logger.info("Formatted worksheet")

def attempt_gsheet_method(self, method, i=1, max=6, wait_time=15, **kwargs):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we adjust this to also take an *args splat variable? That's the standard way to solve the positional arguments problem that you note in the docstring. The Python standard for function passthrough like you're doing here is to do:

def call_foo(my_function, *args, **kwargs):
    # do stuff...
    my_function(*args, **kwargs)


# Non-DB table options yield a list, convert to Parsons table with default worksheet col
if sheet_id_list:
sheet_id_tbl = Table(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of making this a table instead of a list of dictionaries?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it just be sheet_id_tbl = [{"sheet_id": x, "worksheet_id": 0} for x in sheet_id_list]? I think that would even work as-is. (Although we'd probably want to change the name...)


# Accumulate and materialize
combined.concat(data)
temp_files.append(combined.materialize_to_file())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit to using the materialize_to_file method? Looking at the source, it looks like it's still updating the table in memory?

combined.concat(data)
temp_files.append(combined.materialize_to_file())

return combined
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably close out all of the remaining temp files before we return. In fact, we might want to wrap that in a finally block to make sure we don't leave temp files around in case this function errors.

data.add_column("spreadsheet_id", sheet_id[id_col])

# Retrieve sheet title (with attempts to handle rate limits) and add as a column
globals()["sheet_obj"] = self.attempt_gsheet_method(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of polluting the global namespace, we should be able to store it in the current instance. This line and the next should probably be something like:

self.__sheet_obj = self.attempt_gsheet_method(...)
sheet_title = str(self.attempt_gsheet_method("self.__sheet_obj.title")
del self.__sheet_obj

"gs.gspread_client.open_by_key", key=sheet_id[id_col]
)
sheet_title = str(self.attempt_gsheet_method("sheet_obj.title"))
data.add_column("spreadsheet_title", sheet_title)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you document the two columns that this function will add to the data in the docstring?


# Retrieve sheet title (with attempts to handle rate limits) and add as a column
globals()["sheet_obj"] = self.attempt_gsheet_method(
"gs.gspread_client.open_by_key", key=sheet_id[id_col]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the culprit of your test failures. gs isn't in scope here. I think that it's supposed to be some GoogleSheets related module that gets imported as gs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants