
Add a generic CsvImporter? #80

Closed
NickleDave opened this issue Sep 8, 2022 · 6 comments · Fixed by #81
@NickleDave

This is a discussion / feature request, not a bug.

Related to discussion here:
rhine3/bioacoustics-software#3 (comment)

I wonder if it would make sense to add an importer/scraper that can take a .csv directly? This way the data could be munged + cleaned, e.g. by a script, before handing it off to rse.

It seems like this might be possible by decoupling some of the logic in the google sheets scraper?

Like maybe make a mixin that does the specific sheets-accessing part, have a more generic csv parser class, and then combine that with another mixin that loads a csv from a local file.

So you'd have something like:

import csv
from io import StringIO


class SpreadSheetImporter(ScraperBase):
    def parse(self, f, args):
        # csv parsing logic that's in GoogleSheetImporter now,
        # starting with https://github.com/rseng/rse/blob/299ee56d33e4b9868deef4d624cca0608a786cbf/rse/main/scrapers/googlesheet.py#L58
        reader = csv.reader(f, delimiter=",")
        ...
        return self.results


class GoogleSheetMixIn:
    def scrape(self, args):
        ...
        f = StringIO(response.text)
        return f


class GoogleSheetImporter(SpreadSheetImporter, GoogleSheetMixIn):
    ...
    def scrape(self, args):
        # call the mixin's scrape explicitly to avoid infinite recursion
        f = GoogleSheetMixIn.scrape(self, args)
        return self.parse(f, args)


class CsvMixIn:
    def load(self, csv_path):
        ...
        f = pandas.read_csv(csv_path)
        return f


class CsvImporter(SpreadSheetImporter, CsvMixIn):
    ...
    def scrape(self, args):
        # load the local file, then reuse the shared parse logic
        f = self.load(args.file)  # attribute name here is illustrative
        return self.parse(f, args)

I know this really starts to stretch the definition of "scrape", but I hope you see what I'm suggesting here in terms of decoupling the functionality.

@vsoch
Contributor

vsoch commented Sep 8, 2022

yep that would be very easy to do! All of the above looks reasonable, but we'd use the native csv module (not pandas, that would add a dependency that isn't necessary) and I wouldn't call them Mixins, but just class names for what they do!
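As a sketch of that stdlib-only direction (the function and column names here are illustrative, not the actual rse code):

```python
import csv

def load_rows(csv_path):
    """Read a local csv using only the stdlib csv module (no pandas dependency)."""
    with open(csv_path, newline="") as f:
        # DictReader uses the header row as keys, yielding one dict per data row
        return list(csv.DictReader(f))
```

The parse step can then iterate over the same row dicts whether the csv came from a local file or from a downloaded sheet.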

I'll add to my queue of TODOs, thanks for the idea!

@vsoch
Contributor

vsoch commented Sep 8, 2022

All set! Please see #81

@vsoch vsoch closed this as completed in #81 Sep 13, 2022
@NickleDave
Author

> I wouldn't call them Mixins, but just class names for what they do!

Your approach definitely solves the problem, and I would not pretend to be an OO expert.
But you might find this interesting reading should you have nothing better to do: https://hynek.me/articles/python-subclassing-redux/

@vsoch
Contributor

vsoch commented Sep 13, 2022

Thanks! We definitely all have preferences for how to do the different implementations. I think (from his list) mine is "Approach 4: Create a Common Base Class, Then Specialize."
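For what it's worth, that "common base class, then specialize" shape applied to this thread's example might look roughly like the following (class names are illustrative, not the actual rse API):

```python
import csv
from io import StringIO

class SpreadsheetScraper:
    """Common base: shared logic for parsing csv text into rows."""
    def parse(self, f):
        return list(csv.reader(f, delimiter=","))

class CsvScraper(SpreadsheetScraper):
    """Specialization: the csv comes from a local file."""
    def scrape(self, csv_path):
        with open(csv_path, newline="") as f:
            return self.parse(f)

class GoogleSheetScraper(SpreadsheetScraper):
    """Specialization: the csv comes from an exported sheet's response text."""
    def scrape(self, response_text):
        return self.parse(StringIO(response_text))
```

Each subclass only supplies its own way of obtaining a file-like object; the parsing lives once in the base class, with no mixins involved.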

@NickleDave
Author

Agreed, and it's more than good enough for the CSV parsers. Thanks for your help and for being so efficient!

@vsoch
Contributor

vsoch commented Sep 13, 2022

haha sure, that's what I'm good for! 😆
