Add API to get non-standard fields for contributor #1321

TaiWilkin · 2021-04-23T16:19:05Z

Overview

Returns a list of all the field names that a contributor has submitted
either by API or list, minus our standard fields (country, name,
address, ppe_*, lat, lng).

Fields are parsed from list headers and from the keys of single-facility
API submissions.
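A minimal sketch of the filtering step described above. It assumes field names have already been collected as one list of strings per list header or API submission. `STANDARD_FIELDS`, `is_standard_field`, and `nonstandard_fields` are hypothetical names for illustration, not the code in this PR.

```python
from typing import Iterable, List

# Hypothetical constants mirroring the standard fields named above;
# any field whose name starts with "ppe_" is also treated as standard.
STANDARD_FIELDS = {"country", "name", "address", "lat", "lng"}
PPE_PREFIX = "ppe_"


def is_standard_field(field: str) -> bool:
    return field in STANDARD_FIELDS or field.startswith(PPE_PREFIX)


def nonstandard_fields(field_lists: Iterable[List[str]]) -> List[str]:
    """Return each non-standard field name once, in first-seen order."""
    # A dict keeps insertion order, so it doubles as an ordered set.
    seen = {}
    for fields in field_lists:
        for field in fields:
            if not is_standard_field(field):
                seen.setdefault(field, None)
    return list(seen)
```

Applied to the keys of the two facilities submitted in the testing instructions, this returns `["test", "custom_field"]`, with each field appearing once.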

Connects #1313

Demo

Testing Instructions

Log in as c2@example.com.
Navigate to facilities_create, authorize as c2@example.com, and submit a facility with custom fields:

{
	"country": "China",
	"name": "Nantong Jackbeanie Headwear & Garment Co. Ltd.",
	"address": "No.808,the third industry park,Guoyuan Town,Nantong 226500.",
	"ppe_product_types": ["Masks", "Gloves"],
	"ppe_contact_phone": "123-456-7890",
	"ppe_contact_email": "ppe@example.com",
	"ppe_website": "https://example.com/ppe",
	"test": "test data"
}

Submit a second facility with custom fields, some repeating:

{
	"country": "China",
	"name": "Nantong Jackbeanie Headwear 2, Garment Co. Ltd.",
	"address": "No.809,the third industry park,Guoyuan Town,Nantong 226500.",
	"ppe_product_types": ["Masks", "Gloves"],
	"ppe_contact_phone": "123-456-7890",
	"ppe_contact_email": "ppe@example.com",
	"ppe_website": "https://example.com/ppe",
	"test": "test data!!",
	"custom_field": "field data"
}

Submit and process a list with custom fields (test_list.csv).
Navigate to api/nonstandard-fields/
- You should see a list of custom fields that includes each of your custom fields exactly once

Checklist

fixup! commits have been squashed
CI passes after rebase
CHANGELOG.md updated with summary of features or fixes, following Keep a Changelog guidelines

jwalgran

This is a clear implementation of the feature as discussed, and the "happy path" works as described. I did find some edge cases and a lingering bug that require fixes. I have noted them inline, and here is a test case that exercises them, which can be used for test-driven fixing:

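import json

from django.contrib.gis.geos import Point
from django.urls import reverse
from rest_framework import status
from rest_framework.test import APITestCase

# Assumed location of the project's models
from api.models import (Contributor, FacilityList, FacilityListItem,
                        Source, User)


class NonstandardFieldsApiTest(APITestCase):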
    def setUp(self):
        self.url = reverse('nonstandard-fields-list')
        self.user_email = 'test@example.com'
        self.user_password = 'example123'
        self.user = User.objects.create(email=self.user_email)
        self.user.set_password(self.user_password)
        self.user.save()

        self.contributor = Contributor \
            .objects \
            .create(admin=self.user,
                    name='test contributor 1',
                    contrib_type=Contributor.OTHER_CONTRIB_TYPE)

        self.list = FacilityList \
            .objects \
            .create(header='country,name,address,extra_1',
                    file_name='one',
                    name='First List')

        self.list_source = Source \
            .objects \
            .create(facility_list=self.list,
                    source_type=Source.LIST,
                    is_active=True,
                    is_public=True,
                    contributor=self.contributor)

        self.list_item = FacilityListItem \
            .objects \
            .create(name='Item',
                    address='Address',
                    country_code='US',
                    row_index=1,
                    geocoded_point=Point(0, 0),
                    status=FacilityListItem.CONFIRMED_MATCH,
                    source=self.list_source)

        self.api_source = Source \
            .objects \
            .create(source_type=Source.SINGLE,
                    is_active=True,
                    is_public=True,
                    contributor=self.contributor)

        self.api_list_item = FacilityListItem \
            .objects \
            .create(name='Item',
                    address='Address',
                    country_code='US',
                    raw_data="{'country': 'US', 'name': 'Item', 'address': 'Address', 'extra_2': 'data'}",
                    row_index=1,
                    geocoded_point=Point(0, 0),
                    status=FacilityListItem.CONFIRMED_MATCH,
                    source=self.api_source)

    def test_nonstandard_fields(self):
        self.client.login(email=self.user_email,
                          password=self.user_password)
        response = self.client.get(self.url)
        self.assertEqual(response.status_code, status.HTTP_200_OK)
        content = json.loads(response.content)
        self.assertEqual(2, len(content))
        self.assertIn('extra_1', content)
        self.assertIn('extra_2', content)

    def test_doublequote_header(self):
        self.list.header = '"country","name","address","extra_1"'
        self.list.save()

        self.client.login(email=self.user_email,
                          password=self.user_password)
        response = self.client.get(self.url)
        self.assertEqual(response.status_code, status.HTTP_200_OK)
        content = json.loads(response.content)
        self.assertIn('extra_1', content)

    def test_escaped_singlequote_in_api_data(self):
        # The doubled backslash stores a literal \' in raw_data, as in production
        self.api_list_item.raw_data = "{'country': 'US', 'name': 'Item', 'address': 'Address', 'extra_2': 'd\\'ataé'}"
        self.api_list_item.save()

        self.client.login(email=self.user_email,
                          password=self.user_password)
        response = self.client.get(self.url)
        self.assertEqual(response.status_code, status.HTTP_200_OK)
        content = json.loads(response.content)
        self.assertIn('extra_2', content)

    def test_querydict_in_api_data(self):
        # Production data at the time this test was written includes a mix of
        # stringified dicts and stringified QueryDicts.
        self.api_list_item.raw_data = "<QueryDict: {'name': ['Item'], 'country': ['US'], 'address': ['Address'], 'extra_2': ['data']}>"
        self.api_list_item.save()

        self.client.login(email=self.user_email,
                          password=self.user_password)
        response = self.client.get(self.url)
        self.assertEqual(response.status_code, status.HTTP_200_OK)
        content = json.loads(response.content)
        # We plan to only store raw data as CSV or JSON in the future. Legacy
        # data can be ignored
        self.assertNotIn('extra_2', content)

jwalgran · 2021-04-26T20:20:51Z

src/django/api/views.py

+                                            'raw_data', flat=True)
+    single_facility_fields = []
+    for single_facility in single_facilities:
+        fields = list(json.loads(single_facility.replace("'", '"')).keys())


This method of converting a stringified Python dict to JSON works most of the time, but a handful of rows in the list item table have escaped single quotes (\') in them. As of this writing, the following query returns 9 records in production:

select raw_data from api_facilitylistitem i join api_source s on i.source_id = s.id where s.source_type ='SINGLE' and raw_data ilike '%\\''%';

We can trigger this issue in development by POSTing this example:

{ "country": "China", "name": "SO MANY SHIRTS", "address": "No.1000,the third industry park,Guoyuan Town,Nantong 226500.", "custom_single_quote": "field's data" }
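A small reproduction of why the quote swap fails on that payload. It assumes raw_data holds the `str()` of the submitted dict (the stringified-dict bug mentioned below), and the payload is trimmed to a few keys. Python's repr wraps a string that contains a single quote (and no double quote) in double quotes, so replacing every `'` with `"` closes that string early and produces invalid JSON.

```python
import json

# Assumption: raw_data is stored as str() of the submitted dict.
submitted = {
    "country": "China",
    "name": "SO MANY SHIRTS",
    "custom_single_quote": "field's data",
}
raw_data = str(submitted)
# {'country': 'China', 'name': 'SO MANY SHIRTS', 'custom_single_quote': "field's data"}

converted = raw_data.replace("'", '"')
# {"country": "China", "name": "SO MANY SHIRTS", "custom_single_quote": "field"s data"}

try:
    json.loads(converted)
    parsed_ok = True
except json.decoder.JSONDecodeError:
    parsed_ok = False

print(parsed_ok)  # False
```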

Saving the raw data this way is a bug (#1052), and the problem goes beyond stringified dicts: stringified QueryDicts are mixed into our raw_data fields as well.

Because we are only interested in additional data submitted in the future, I think for now we can catch json.decoder.JSONDecodeError within the loop and ignore it. After changing the API endpoint to properly serialize raw_data as JSON, we can revisit how to deal with the legacy values.

TaiWilkin

I added a nested try/except block that first tries to parse raw_data as JSON. If that fails, the inner try/except replaces single quotes with double quotes and parses the result; if that also fails, the fields from that object are left out of the response. This still doesn't handle every case of legacy data, but it does let some of it be parsed, although it's not pretty. I'm certainly open to alternatives.
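The fallback described in this comment can be sketched roughly as follows. `parse_raw_data_keys` is a hypothetical helper, not the PR's actual code. It expresses the two attempts as a loop over candidate strings instead of literally nested try/except blocks, and it also rejects JSON values that are not objects. It returns None for anything it cannot parse so the caller can skip that row.

```python
import json
from typing import List, Optional


def parse_raw_data_keys(raw_data: str) -> Optional[List[str]]:
    """Return the top-level keys of raw_data, or None if it can't be parsed."""
    # First try raw_data as-is (proper JSON), then the legacy fallback that
    # treats it as a stringified Python dict with single quotes.
    for candidate in (raw_data, raw_data.replace("'", '"')):
        try:
            parsed = json.loads(candidate)
        except json.decoder.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return list(parsed.keys())
    # Anything else, such as a stringified QueryDict, is ignored.
    return None


print(parse_raw_data_keys("{'country': 'US', 'extra_2': 'data'}"))
# ['country', 'extra_2']
```

With this shape, a literal `\'` inside a stringified dict becomes `\"` after the swap, which is a valid JSON escape, so those rows still parse; QueryDict strings start with `<` and fail both attempts.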

jwalgran

The nested fallback is a good approach. Thanks.

jwalgran · 2021-04-26T22:57:27Z

src/django/api/views.py

+                                                'header', flat=True)
+    list_fields = []
+    for header in list_headers:
+        list_fields = list_fields + header.split(",")


It is legal for CSV headers and fields to all be wrapped in double quotes:

"country","name","address"
"US","Azavea","990 Spring Garden St 5th Floor, Philadelphia, PA 19123"

azavea-with-everything-quoted.csv

The field name extractor as currently written does not remove these quotes.
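One possible fix, sketched under the assumption that each stored header is a single CSV row: let Python's standard `csv` module split it instead of `str.split(",")`. The reader strips enclosing quotes and keeps commas that appear inside a quoted name. `parse_header_fields` is a hypothetical helper name.

```python
import csv
from typing import List


def parse_header_fields(header: str) -> List[str]:
    """Split a CSV header row into field names, removing enclosing quotes."""
    # csv.reader accepts any iterable of lines; a one-element list is enough.
    row = next(csv.reader([header]))
    # Trim stray whitespace around unquoted names (e.g. "country, name").
    return [field.strip() for field in row]


print(parse_header_fields('"country","name","address","extra_1"'))
# ['country', 'name', 'address', 'extra_1']
```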

jwalgran

Thanks for working through these problematic issues with real-world data. I was no longer able to break it, and I confirmed the testing instructions.


TaiWilkin · 2021-05-03T21:03:36Z

Thanks for the review and feedback!

TaiWilkin requested a review from jwalgran April 23, 2021 16:38

TaiWilkin assigned jwalgran Apr 23, 2021

jwalgran reviewed Apr 26, 2021


jwalgran assigned TaiWilkin Apr 26, 2021

jwalgran added the task 005 Enable contributor to upload facility data fields that are visible only on the contributor's embed label Apr 26, 2021

TaiWilkin requested a review from jwalgran April 30, 2021 16:02

jwalgran approved these changes Apr 30, 2021


jwalgran removed their assignment Apr 30, 2021

TaiWilkin force-pushed the tw/add-nonstandard-fields-api branch from 33830db to 3aeb0d9 May 3, 2021 20:44

TaiWilkin merged commit 1d0cb7d into develop May 3, 2021

TaiWilkin deleted the tw/add-nonstandard-fields-api branch May 3, 2021 21:04


TaiWilkin commented Apr 23, 2021 (edited by jwalgran)
