-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Import donation data #42
Conversation
a705933
to
cedf21e
Compare
Implements two layers in the CSV importer command to allow the importion of huge CSV files: * `DiskCache` is a cache based on the OS filesystem, but which can be used together with Django cache (usually in memory) for greater performance * `CsvSpliter` splits a CSV file in numerous smaller CSV files so the system can process them with little hassle in terms of memory
cedf21e
to
fae0786
Compare
ae12911
to
ebf3729
Compare
@leportella I think this is ready for code review. I found out that even if the I think this is enough to code review and eventually merge this PR, but I'm opening an issue to debug data and the import process on both ends ( |
from perfil.person.models import Person | ||
|
||
|
||
def election_keys(election): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initally I thought about putting this in the ImportCSVCommand or another class to be inherited by our commands, so you wouldn't have to import it everytime. Don't have any problem to leave it this way, but maybe you didn't think about the other possibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It cannot be a method of ImportCsvCommand
(or Command
) otherwise we cannot access in in the definition of our custom Command
s. What I mean is that where we have:
class Command(ImportCsvCommand):
model = …
bulk_size = …
to_cache (Election, election_keys)
…
It would be an invalid Python code to use:
class Command(ImportCsvCommand):
model = …
bulk_size = …
to_cache (Election, self.election_keys)
…
This is invalid becauseself
doesn't exist yet. Some alternatives I've though about (but discarded them all):
- using
(Election, ImportCsvCommand.election_key)
would be possible if we use the@staticmethod
decorator, but unnecessarily verbose IMHO - Defining it as the default method to cache
Election
so we don't have to pass it in theto_cache
instance variable would work, but would put some boundaries in the way we do lookups in the database and cache for each given model (making it difficult or unnecessarily verbose to customize) - We could define it per
load_whatever.py
file, but we would repeat code in our code base (election_keys
as it is is used in two or three commands I guess)
That's why it is the way it is. Does it make sense? Do you prefer any other implementation? Let me know. IMHO importing it every time is tolerable comparing to the downside of the three alternatives I could think of. What about you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes total sense. Thank you for the detailed explanation :)
class Command(ImportCsvCommand): | ||
|
||
help = dedent(""" | ||
Import CSV generated by: https://github.com/rafapolo/tribuna |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can start a proper documentation to add this kind of comments? What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only concern regarding this suggestion: I don't want to keep two documentation references. So if we document it elsewhere (let's say a .md
file or GitHub wiki) we have to decide between:
- assuming the risk of having non-matching documentation and command help text (that is to say, we might update the
.md
file and forget to change thehelp
variable for the command) - or document everything elsewhere (
.md
or GitHub wiki in the exemple) creating a downside in terms of UX, that is to say: when the user is in fact at the terminal trying to import data, the help text wouldn't be helpful (would point him to the browser or to the.md
file)
That's why IMHO documenting Django commands should be it in the help message of the command themselves: no duplicates to be kept in sync, and the information is delivered in the environment in which it is needed.
What do you think about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I am not sure if docs are the best approach. I don't have a define argument for defend it or not.
perfil/election/models.py
Outdated
class Donation(models.Model): | ||
election = models.ForeignKey(Election, on_delete=models.CASCADE) | ||
donator = models.CharField(max_length=250) | ||
donator_id = models.CharField(max_length=14, null=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all the CharField fields should not have null=True
because we could both have empty string and None as a value. We should change it to blank=True
Avoid using null on string-based fields such as CharField and TextField. If a string-based field has null=True, that means it has two possible values for “no data”: NULL, and the empty string. In most cases, it’s redundant to have two possible values for “no data;” the Django convention is to use the empty string, not NULL.
on this part of documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in e8af033
@cuducos I think you created a nice structure and that will be easier to use. I just left you a comment about the models, and some suggestions I've been thinking about :) |
@cuducos can you please fix the codeclimate issue so we can merge this? |
Done in 9932c0b but TBH I tend to value PEP8 in a more flexible way, as Raymond Hettinger argues ; ) |
This PR depends on
Election
data to be available and, so, is untested (thus, WIP).