Opendata.dc.gov connection class #274

Closed
5 tasks done
NealHumphrey opened this issue May 20, 2017 · 8 comments

@NealHumphrey
Collaborator

NealHumphrey commented May 20, 2017

Opendata.dc.gov provides several of the data sets we need. It exposes an API endpoint for each data file, and the files are already in .csv format.

In our /python/housinginsights/sources folder there is a base.py file with a BaseApiConn class.

  • Create a new class called OpenDataApiConn() (in a new file called opendata.py) that inherits from the base class.
  • Add properties in __init__ with an appropriate path for saving each data type (e.g. /data/raw/tax or something like that).
  • Add a get_XXX method for each of the data sources (where XXX is the name of the data source).
  • Use get_XXX to download the csv file and save it to the appropriate subfolder of /data/raw (e.g. /data/raw/tax or /data/raw/other).
  • Add a get_data method that calls all of the get_XXX methods.
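
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the BaseApiConn here is a hypothetical stand-in (the real one lives in base.py and may have a different interface), and the data set names, URL paths, and folder names are placeholders, not real opendata.dc.gov endpoints.

```python
import os
import urllib.request


class BaseApiConn:
    """Hypothetical stand-in for the project's BaseApiConn in base.py."""

    def __init__(self, baseurl):
        self.baseurl = baseurl

    def fetch(self, urlpath):
        # Return the raw bytes served at baseurl + urlpath.
        with urllib.request.urlopen(self.baseurl + urlpath) as response:
            return response.read()


class OpenDataApiConn(BaseApiConn):
    # name -> (URL path, subfolder of the raw data root).
    # Both values are placeholders chosen for illustration only.
    DATASETS = {
        "tax": ("/datasets/EXAMPLE_TAX_ID.csv", "tax"),
        "other": ("/datasets/EXAMPLE_OTHER_ID.csv", "other"),
    }

    def __init__(self, raw_root="data/raw"):
        super().__init__("https://opendata.dc.gov")
        # One output path per data type, set up front in __init__.
        self.output_paths = {
            name: os.path.join(raw_root, folder, name + ".csv")
            for name, (_, folder) in self.DATASETS.items()
        }

    def _download(self, name):
        # Shared helper: fetch the csv and write it to its subfolder.
        urlpath, _ = self.DATASETS[name]
        out_path = self.output_paths[name]
        os.makedirs(os.path.dirname(out_path), exist_ok=True)
        with open(out_path, "wb") as f:
            f.write(self.fetch(urlpath))
        return out_path

    def get_tax(self):
        return self._download("tax")

    def get_other(self):
        return self._download("other")

    def get_data(self):
        # Call every get_XXX method and return the saved file paths.
        return [self.get_tax(), self.get_other()]
```

Keeping the download logic in one private helper means each get_XXX method stays a one-liner, and adding a data source is a new DATASETS entry plus a new get_XXX method.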

To start, get this data download class to work with two data sets:

@NealHumphrey
Collaborator Author

Look at /python/housinginsights/sources/mar.py for an example of inheriting from the base class, though it does not use the csv method.

@ajhalani
Collaborator

Relates to #187 as well.

@ajhalani
Collaborator

Please note I wrote the base opendata api conn class in #283. Would love feedback to avoid duplicate work.

@Creence
Collaborator

Creence commented May 21, 2017

I already wrote this yesterday and did not have a chance to check it in.

@Creence
Collaborator

Creence commented May 22, 2017

@ajhalani I added mine because the get_data method is supposed to call all of the get_XXX methods in #284 and yours seems to be a generic function for getting a particular file. This ticket was assigned to me yesterday when I worked on it. If you want to work on something you need to assign it to yourself to avoid duplicate work.

@ajhalani
Collaborator

ajhalani commented May 22, 2017

Of course. It was not my intention to do a story assigned to someone else :). Since the story is assigned to you, I defer to your PR. My crime data story needed the base class, so thank you!

@NealHumphrey - The way the requirements are written for this story, wouldn't opendata logic with pre-defined file paths be inconsistent with the existing get_api_data/mar.py, which take the path of the file as a parameter (--output)?
Also

add a get_data method that calls all of the get_XXX methods.

Will we have other non-opendata get_XXX methods? If so, should this "get_data" call the non-opendata get_XXX methods as well? How would this function know which date ranges etc. to use when calling them?

It would be nice if @Creence's PR #284 is merged soon, since then I can extend the same for completing #187. Thanks!

@NealHumphrey
Collaborator Author

@ajhalani I'll get this merged in today and resolve any duplicative approaches by pulling from these. The discovery of the .csv endpoint from opendata.dc.gov caused a couple of things that started out as separate issues to clash with each other a bit. I'm sorry for the confusion with these!

@ajhalani with respect to the get_data method - as we've started thinking about how to use all these ApiConn objects systematically (i.e. on a server run every week), I think we need a more consistent approach than the one provided by get_api_data.py, which was our first pass at how to call these classes. Forcing every ApiConn class to have a get_data method that downloads all of the normally expected .csv files seemed like a good approach to that, with other methods being supplemental. Then we can have a new script called get_all_api_data that just does: for each c in api_conn_list: c.get_data(). We haven't set up code to use this new approach yet.
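
A rough sketch of what such a get_all_api_data script might look like is below. It is an assumption-laden illustration, not existing project code: ExampleApiConn is a made-up placeholder class, and collecting failures instead of stopping at the first error is a design choice added here so that one broken source would not block the rest of a weekly run.

```python
class ExampleApiConn:
    """Hypothetical connection class; a real one would download .csv files."""

    def __init__(self, name, should_fail=False):
        self.name = name
        self.should_fail = should_fail
        self.downloaded = False

    def get_data(self):
        if self.should_fail:
            raise RuntimeError("download failed for " + self.name)
        self.downloaded = True


def get_all_api_data(api_conn_list):
    """Call get_data() on every connection; return (conn, error) pairs."""
    failures = []
    for c in api_conn_list:
        try:
            c.get_data()
        except Exception as exc:  # keep going so other sources still run
            failures.append((c, exc))
    return failures


if __name__ == "__main__":
    conns = [ExampleApiConn("opendata"), ExampleApiConn("mar")]
    for conn, error in get_all_api_data(conns):
        print("Failed:", conn.name, error)
```

The only contract the loop relies on is that every class in api_conn_list exposes a get_data method taking no arguments, which is the consistency requirement described above.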

@ajhalani
Collaborator

Think we can close this story!
