Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support copying specific columns #15

Closed
aboutaaron opened this issue Oct 20, 2015 · 1 comment
Closed

Support copying specific columns #15

aboutaaron opened this issue Oct 20, 2015 · 1 comment

Comments

@aboutaaron
Copy link
Collaborator

So, on occasion, I'll have a dataset where I want to keep the CSV intact but want to only load specific columns into a database. While I could manually do without COPY, django-postgres-copy is way quicker. I can also just remove the columns I'm not using, but I'll then need to maintain two datasets: the original and the one modified to load into the db.

So, is there a way to implement specific column targeting with PostgresSQL COPY? Here's what I explored.

If you create a dict() with your specific headers, django-postgres-copy will throw a KeyError:

# https://github.com/california-civic-data-coalition/django-postgres-copy/blob/master/postgres_copy/__init__.py#L42
...
        # Connect the headers from the CSV with the fields on the model
        self.field_header_crosswalk = []
        inverse_mapping = {v: k for k, v in self.mapping.items()}
        for h in self.get_headers():
            try:
                f_name = inverse_mapping[h]
            except KeyError:
                raise ValueError("Map does not include %s field" % h)
            try:
                f = [f for f in self.model._meta.fields if f.name == f_name][0]
            except IndexError:
                raise ValueError("Model does not include %s field" % f_name)
            self.field_header_crosswalk.append((f, h))

        self.temp_table_name = "temp_%s" % self.model._meta.db_table

We can alleviate the KeyError by checking to see if value in the headers belongs to the inverse_mapping dictionary:

        ....
        inverse_mapping = {v: k for k, v in self.mapping.items()}
        for h in self.get_headers():
            if h in inver_mapping:
                 f = [f for f in self.model._meta.fields if f.name == f_name][0] 

That worked, but then I received an error right around here: https://github.com/california-civic-data-coalition/django-postgres-copy/blob/master/postgres_copy/__init__.py#L86 in the save() method.

Unfortunately, I didn't log the errors and ended up just creating a separate dataset with the columns I needed. But, from what I researched, it seemed like the Postgres COPY function wants a 1-to-1 match from the db table and the source CSV.

Let me know what you think

@palewire
Copy link
Owner

I believe this is now solved in the master branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants