Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overload column #34

Closed
wants to merge 8 commits into from
Closed

Overload column #34

wants to merge 8 commits into from

Conversation

ckirby
Copy link
Contributor

@ckirby ckirby commented Oct 25, 2016

Cleaned up pull.
I wanted the ability to have two fields in my model to map to the same incoming data column. The second of the two fields is a transformation of the data. COPY with copy_templates runs quickly as compared to updating all the rows after a copy, and denormalizing is preferred for my use case I added this feature.
I thought it was better to be explicit with a new kwarg for overloaded_mapping as opposed to allowing multiple instances of a column in the mapping parameter.

@coveralls
Copy link

coveralls commented Oct 25, 2016

Coverage Status

Coverage remained the same at 100.0% when pulling 2be775d on ckirby:overload_column into 79f71ef on california-civic-data-coalition:master.

@palewire
Copy link
Owner

palewire commented Oct 26, 2016

Thanks for this creative patch. I don't have a strong opinion either way, but I wonder what are the merits of a new kwarg over simply allowing duplication in the standard mapping? What made you want to break this out?

@ckirby
Copy link
Contributor Author

ckirby commented Oct 28, 2016

Expediency mostly. I needed the functionality for another project, and
refactoring field_header_crosswalk (and its tests) to hold a list of fields
would have been a longer process than as another kwarg. That said, if you
like the functionality I can take the time to refactor it into the existing
parameter set.

On Thu, Oct 27, 2016 at 12:25 AM, Ben Welsh notifications@github.com
wrote:

Thanks for this creative patch. I don't have a strong opinion either way,
but I wonder are the merits of a new kwarg over simply allowing duplication
in the standard mapping? What made you want to break this out?


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#34 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAFeHrOIDZmlbPdUhbqWN99ww00kDAYNks5q38VPgaJpZM4Kf6Jd
.

@palewire
Copy link
Owner

I wonder if this could be handled in a similar "it just works" way as the skip functionality from your other pull request. I am going to borrow your tests here and see if I can get that to go.

palewire added a commit that referenced this pull request Nov 11, 2016
palewire added a commit that referenced this pull request Nov 12, 2016
…nd overloaded fields 'just work' and everything is more legible and straight forward. For #34.
@palewire
Copy link
Owner

@ckirby, moments ago I pushed a sweeping refactor of the library that I believe will have excluded headers and overloaded fields "just work" without any additional configuration. I've also rearranged a lot of the mechanics of the system to make it more legible and straightforward.

Your tests are passing and we're at 100% coverage but but before I ship the code I'd love to get your feedback on the master branch.

I'm also aiming to draft a blog post promoting this new release when it's ready. As part of that I'd like to share your contributions. If you have a second, let me know a little more about your work and why this library is useful to achieve your goals.

@ckirby
Copy link
Contributor Author

ckirby commented Nov 14, 2016

@palewire - The refactor looks nice, really cleaned up the code. I took it for a test drive and it works a charm also.

My usage - my company works in the healthcare space. One of our products does automated disease and condition detection based on patient data - lots of it. Due to the sensitive nature of the data, it also is run exclusively at client sites. We use django-postgres-copy on an data aggregating/reporting/analysis tool. The client sites export line-level anonymized data to this tool -20 to 50 million rows in a shot, per month. dpc give us a nice clean and quick way to load the data into our data store while respecting the django-ness of everything. dpc also sped up the processing from a naive csvreader implementation that took hours to running in about 10 minutes.

@palewire
Copy link
Owner

palewire commented Nov 14, 2016

Great. Glad to hear it's working. I'll close this ticket and you can expect version 0.1.0 to carry the new features. I'll have a post up on www.californiacivicdata.org before too long and will share it back here with you.

@palewire palewire closed this Nov 14, 2016
@palewire
Copy link
Owner

@ckirby
Copy link
Contributor Author

ckirby commented Nov 14, 2016

Awesome - thanks for the kind words!

@ckirby ckirby deleted the overload_column branch February 22, 2017 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants