Load HTML and/or Consider Refactoring #10
Comments
Funny. I've just had the same conversation with another user. We decided the rename about 25 minutes ago.
In regard to HTML import. I'm totally okay with that. If you want to write the code and make a pull request I can merge it.
I've just pushed a refactor of the code in the latest commit. Could you have a look and tell me if it's easier to navigate?
Summary thus far:
You can have the latest version from pip as I've just packaged it.
@danieldjewell regarding the HTML import: Is it something like this you wanted to do: ?
Hehe. It is. Must be thinking on the same wavelength. 😁 🍻
Yeah. Both the rename and refactoring help loads. (The fact that the rename is a breaking change sucks, but I guess it's kinda like "Well, let's get this one over with because it's just going to be harder in the future." 😁 )
I would suggest moving

For example -- if you do a: dir(tablite) ... the result is nearly (?) everything in the package.

Imports

Semi Long Philosophical Paragraph Follows

Side note/philosophical note/disclaimer: I come from a pretty strong object-oriented background (PHP/Java/C#/C++) - For whatever reason, I never really liked the Pythonic way of importing specific classes/functions from a package (e.g.

That said, when a package imports nothing (see pyca/cryptography) I find it rather infuriating. (I personally do a lot of prototyping using either JupyterLab or more often an ipython session in a terminal - having to run multiple

tl;dr;: I recognize that my opinions may not be representative of what other Python developers might say. (The Python community seems to have lots of differing opinions on lots of things... ) That said, have a look at pandas' __init__.py and xarray's __init__.py (or even numpy's but that one has a lot more going on in it than is necessary to illustrate my point). Basically, everything that's needed for a user/developer to use/interact with the package is imported into the root namespace. So after a

So, specifically for

Lines 744 to 754 in 2072e8e
One interesting other way to do it would be to put the entire "

class GroupBy(object):
    from tablite.groupby_utils import GroupbyFunction, Max, Min, Sum, First, Last, Count, CountUnique, Average, StandardDeviation, Median, Mode

Of course this would be breaking if anyone is using

import tablite as tl
tab = tl.Table(...)
g = tab.groupby(keys=['a','b'],
                functions=[('f', tl.gb.Max)])
## if "gb" is just an alias for the GroupBy class and the GroupBy functions (Max,Min, etc.) are imported into that class, this becomes possible:
g = tab.gb(keys=['a', 'b'],
           .... )
### OR if you do something like
from tablite import GroupBy as gb  # "import tablite.GroupBy as gb" would only work if GroupBy were a module, not a class
## then
g = tab.groupby(keys=['a','b'],
                functions=[('f', gb.Max)])
### This is what I'm referring to when I mentioned the way I like to personally import things and not import things into the main global namespace

It might also be interesting to handle groupby a bit like pandas -- at least just the syntax -- e.g.

HTML Tables
Kinda. That particular code example is the general idea... And having sort of a 2-stage option would be really nice I think:
It just now occurred to me that pandas actually has a read_html() function ... The trouble always is that there are so many ways to write an HTML table. Colspan? Rowspan? (And I'm not even going to go to that place where some tables are built using

[Sorry this got kinda long... I started writing and then it was like "oh! one more idea! oh! can't forget to mention this other thing..." 😁 ]
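To make the colspan concern concrete, here is a minimal sketch that uses only the standard library's html.parser. It is not tablite's reader and it is not how pandas' read_html() works; TableRows and read_html_rows are hypothetical names, and the sketch assumes no nested tables and ignores rowspan entirely. A cell with a colspan is simply repeated across the columns it covers, which is one of several possible policies.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <tr> as a list of cell strings (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read
        self._span = 1     # colspan of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
            self._span = int(dict(attrs).get("colspan") or 1)

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = "".join(self._cell).strip()
            # Repeat the value so every row ends up with the same width.
            self._row.extend([text] * self._span)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def read_html_rows(html):
    """Return the table rows found in an HTML string."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


sample = """
<table id="tabid">
  <tr><th>a</th><th>b</th><th>c</th></tr>
  <tr><td colspan="2">1</td><td>3</td></tr>
  <tr><td>4</td><td>5</td><td>6</td></tr>
</table>
"""
rows = read_html_rows(sample)
```

The first row could then serve as column names and the remaining rows as data. Handling rowspan would need state carried across rows, which is where hand-rolled parsing starts to get painful.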
Hi Daniel, TL;DR: I think you've made a good case. Let's get it right. I've refactored into the branch name_space_review for you to review. All tests pass.

Imports
As the only things the user really needs are
For PyCharm users, the helper will give a compact presentation as shown below:
and for Jupyter users
There are dependencies between
which uses the Shorthand will also work fine: I've updated groupby_tests.py, but am unsure if there is a better way than this: tablite/tests/groupby_tests.py Lines 1 to 3 in cac3a90
tablite/tests/groupby_tests.py Lines 34 to 47 in cac3a90
https://github.com/root-11/tablite/blob/name_space_review/tests/groupby_tests.py#L34

HTML reader.
As you want to extend the file_readers with your own html_reader you can use:
.... Longer answer too, but in the face of ambiguity ten extra words may save ten days of work ;-)
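Going back to the groupby namespace idea from earlier in the thread, here is a toy sketch of what hanging the aggregates off the GroupBy class could look like. This is not tablite's implementation: the GroupBy class below, its constructor, and its result() method are stand-ins invented for illustration, and only the names Max, Min, Sum and Average are borrowed from the discussion.

```python
from statistics import mean


class GroupBy:
    """Toy stand-in for a groupby helper; NOT tablite's GroupBy."""

    # Aggregates live on the class, so callers can write GroupBy.Max
    # without importing each function into their own namespace.
    Max = staticmethod(max)
    Min = staticmethod(min)
    Sum = staticmethod(sum)
    Average = staticmethod(mean)

    def __init__(self, rows, keys, functions):
        self.rows = rows            # list of dicts, one per row
        self.keys = keys            # column names to group on
        self.functions = functions  # list of (column, aggregate) pairs

    def result(self):
        # Bucket the rows by their key values.
        groups = {}
        for row in self.rows:
            key = tuple(row[k] for k in self.keys)
            groups.setdefault(key, []).append(row)
        # Apply each aggregate to its column within each bucket.
        return {
            key: tuple(func([r[col] for r in members]) for col, func in self.functions)
            for key, members in groups.items()
        }


gb = GroupBy  # short alias, as suggested in the thread

rows = [
    {"a": 1, "b": "x", "f": 10},
    {"a": 1, "b": "x", "f": 30},
    {"a": 2, "b": "y", "f": 5},
]
summary = gb(rows, keys=["a", "b"], functions=[("f", gb.Max), ("f", gb.Average)]).result()
```

The point of the design is that a single import gives access to both the grouping machinery and its aggregates, while the root namespace of the package stays small.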
@danieldjewell - Hi - I'm just picking up this thread. Are you still reviewing?
@danieldjewell
Just stumbled across tablite. Really cool, definitely can agree with the use case (pandas and numpy are amazing, but also super massive when you just want to deal with a 5 column 20 row table.)
What are your thoughts on:
Loading HTML from a bs4.element.Tag (the result of page.select_one('table#tabid') from BeautifulSoup)? (Although I try to avoid suggesting adding dependencies, BeautifulSoup seems to be popular enough that making it an optional dependency for parsing HTML tables seems like a reasonable compromise to avoid having to parse manually - or with html5lib/lxml)
Renaming table to tablite to match the project name?
import «packagename» - which in this case, obviously won't work
The chances of someone having a table.py file somewhere in a project (or the current directory) are much higher than someone having a tablite.py file -- this would create an import conflict and would break tablite from loading. (Again, not sure if there's PEP guidance on this, but as a general rule, I think avoiding naming packages with common words is a good idea. For submodules/packages this isn't an issue if the main/parent name is completely unique.)
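The import-conflict concern can be reproduced with the standard library alone. In this sketch the temporary project directory and the resolve_import helper are made up for illustration. A project-local table.py is placed first on sys.path, which is where the running script's own directory normally sits, and Python then resolves "import table" to that file regardless of any installed package with the same name.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def resolve_import(name, project_dir):
    """Return the file Python loads for `import name` when project_dir is first on sys.path."""
    project_dir = str(project_dir)
    previous = sys.modules.pop(name, None)  # forget any already-imported module
    sys.path.insert(0, project_dir)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(name)
        return Path(module.__file__).resolve()
    finally:
        # Undo the changes so the experiment leaves no trace.
        sys.path.remove(project_dir)
        sys.modules.pop(name, None)
        if previous is not None:
            sys.modules[name] = previous


with tempfile.TemporaryDirectory() as tmp:
    # A hypothetical project that happens to contain its own table.py.
    local_file = Path(tmp) / "table.py"
    local_file.write_text("origin = 'local project file'\n")
    found = resolve_import("table", tmp)
    shadowed = found == local_file.resolve()
```

Here shadowed ends up True: the local file wins. A distinctive name such as tablite makes this kind of accidental collision far less likely.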