Skip to content
This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Add API to get non-standard fields for contributor #1321

Merged
merged 1 commit into from May 3, 2021

Conversation

TaiWilkin
Copy link
Contributor

@TaiWilkin TaiWilkin commented Apr 23, 2021

Overview

Returns a list of all the field names that a contributor has submitted
either by API or list, minus our standard fields (country, name,
address, ppe_*, lat, lng).

Fields are parsed from list headers and from the keys of single-facility
API submissions.

Connects #1313

Demo

Screen Shot 2021-04-23 at 12 07 54 PM

Testing Instructions

{
	"country": "China",
	"name": "Nantong Jackbeanie Headwear & Garment Co. Ltd.",
	"address": "No.808,the third industry park,Guoyuan Town,Nantong 226500.",
	"ppe_product_types": ["Masks", "Gloves"],
	"ppe_contact_phone": "123-456-7890",
	"ppe_contact_email": "ppe@example.com",
	"ppe_website": "https://example.com/ppe",
	"test": "test data"
}
  • Submit a second facility with custom fields, some repeating:
{
	"country": "China",
	"name": "Nantong Jackbeanie Headwear 2, Garment Co. Ltd.",
	"address": "No.809,the third industry park,Guoyuan Town,Nantong 226500.",
	"ppe_product_types": ["Masks", "Gloves"],
	"ppe_contact_phone": "123-456-7890",
	"ppe_contact_email": "ppe@example.com",
	"ppe_website": "https://example.com/ppe",
	"test": "test data!!",
	"custom_field": "field data"
}
  • Submit and process a list with custom fields test_list.csv
  • Navigate to api/nonstandard-fields/
    • You should see a list of custom fields with includes each of your custom fields exactly once

Checklist

  • fixup! commits have been squashed
  • CI passes after rebase
  • CHANGELOG.md updated with summary of features or fixes, following Keep a Changelog guidelines

Copy link
Contributor

@jwalgran jwalgran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a clear implementation of the feature as discussed and the "happy path" works as described. I did find some edge cases and a lingering bug that require us to make some fixes. I have mentioned them inline and here is a test case that exercises them which can be used for some test-driven fixing.

class NonstandardFieldsApiTest(APITestCase):
    def setUp(self):
        self.url = reverse('nonstandard-fields-list')
        self.user_email = 'test@example.com'
        self.user_password = 'example123'
        self.user = User.objects.create(email=self.user_email)
        self.user.set_password(self.user_password)
        self.user.save()

        self.contributor = Contributor \
            .objects \
            .create(admin=self.user,
                    name='test contributor 1',
                    contrib_type=Contributor.OTHER_CONTRIB_TYPE)

        self.list = FacilityList \
            .objects \
            .create(header='country,name,address,extra_1',
                    file_name='one',
                    name='First List')

        self.list_source = Source \
            .objects \
            .create(facility_list=self.list,
                    source_type=Source.LIST,
                    is_active=True,
                    is_public=True,
                    contributor=self.contributor)

        self.list_item = FacilityListItem \
            .objects \
            .create(name='Item',
                    address='Address',
                    country_code='US',
                    row_index=1,
                    geocoded_point=Point(0, 0),
                    status=FacilityListItem.CONFIRMED_MATCH,
                    source=self.list_source)

        self.api_source = Source \
            .objects \
            .create(source_type=Source.SINGLE,
                    is_active=True,
                    is_public=True,
                    contributor=self.contributor)

        self.api_list_item = FacilityListItem \
            .objects \
            .create(name='Item',
                    address='Address',
                    country_code='US',
                    raw_data="{'country': 'US', 'name': 'Item', 'address': 'Address', 'extra_2': 'data'}",
                    row_index=1,
                    geocoded_point=Point(0, 0),
                    status=FacilityListItem.CONFIRMED_MATCH,
                    source=self.api_source)

    def test_nonstandard_fields(self):
        self.client.login(email=self.user_email,
                          password=self.user_password)
        response = self.client.get(self.url)
        self.assertEqual(response.status_code, status.HTTP_200_OK)
        content = json.loads(response.content)
        self.assertTrue(2, len(content))
        self.assertIn('extra_1', content)
        self.assertIn('extra_2', content)

    def test_doublequote_header(self):
        self.list.header='"country","name","address","extra_1"',
        self.list.save()

        self.client.login(email=self.user_email,
                          password=self.user_password)
        response = self.client.get(self.url)
        self.assertEqual(response.status_code, status.HTTP_200_OK)
        content = json.loads(response.content)
        self.assertIn('extra_1', content)

    def test_escaped_singlequote_in_api_data(self):
        self.api_list_item.raw_data = "{'country': 'US', 'name': 'Item', 'address': 'Address', 'extra_2': 'd\'ataé'}"
        self.api_list_item.save()

        self.client.login(email=self.user_email,
                          password=self.user_password)
        response = self.client.get(self.url)
        self.assertEqual(response.status_code, status.HTTP_200_OK)
        content = json.loads(response.content)
        self.assertIn('extra_2', content)

    def test_querydict_in_api_data(self):
        # Production data at the time this test was written includes a mix of
        # strigified dicts and stringified QueryDicts
        self.api_list_item.raw_data = "<QueryDict: {'name': ['Item'], 'country': ['US'], 'address': ['Address'], 'extra_2': ['data']}>"
        self.api_list_item.save()

        self.client.login(email=self.user_email,
                          password=self.user_password)
        response = self.client.get(self.url)
        self.assertEqual(response.status_code, status.HTTP_200_OK)
        content = json.loads(response.content)
        # We plan to only store raw data as CSV or JSON in the future. Legacy
        # data can be ignored
        self.assertNotIn('extra_2', content)

'raw_data', flat=True)
single_facility_fields = []
for single_facility in single_facilities:
fields = list(json.loads(single_facility.replace("'", '"')).keys())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method of converting a stringified Python dict to a JSON object works most of the time, but there are a handful of rows in the list item table that have escaped single quotes in them (\'). As of writing this, the following query returns 9 records in production

select raw_data from api_facilitylistitem i join api_source s on i.source_id = s.id where s.source_type ='SINGLE' and raw_data ilike '%\\''%';

We can trigger this issue in development by POSTing this example

{
	"country": "China",
	"name": "SO MANY SHIRTS",
	"address": "No.1000,the third industry park,Guoyuan Town,Nantong 226500.",
	"custom_single_quote": "field's data"
}

The fact that the raw data is being saved this way is a bug (#1052), and it is worse than just stringified dicts. We also have stringified QueryDicts mixed in to our raw_data fields as well.

Because we are only interested in additional data submitted in the future, I think that for now we can just catch json.decoder.JSONDecodeError within the loop and ignore it. After changing the API endpoint to properly serialize raw_data as JSON we can revisit how dealing with the legacy values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a double try/except block to try handling the raw_data as JSON. If it fails, it jumps into the nested try/except to try changing the single quotes to double quotes and parsing that; failing that, it doesn't include the fields from that object in the response. This still doesn't handle all of our cases of legacy data, but it does allow at least some legacy data to be parsed, although it's not pretty. I'm open to alternative options for certain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nested fallback is a good approach. Thanks.

'header', flat=True)
list_fields = []
for header in list_headers:
list_fields = list_fields + header.split(",")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is legal for CSV headers and fields to all be wrapped in quotes

"country","name","address"
"US","Azavea","990 Spring Garden St 5th Floor, Philadelphia, PA 19123"

azavea-with-everything-quoted.csv

The field name extractor as currently written does not remove these quotes.

@jwalgran jwalgran added the task 005 Enable contributor to upload facility data fields that are visible only on the contributor's embed label Apr 26, 2021
@TaiWilkin TaiWilkin requested a review from jwalgran April 30, 2021 16:02
Copy link
Contributor

@jwalgran jwalgran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working through these these problematic issues with real world data. I was no longer able to break it and confirmed the test instructions.

@jwalgran jwalgran removed their assignment Apr 30, 2021
Returns a list of all the field names that a contributor has submitted
either by API or list, minus our standard fields (country, name,
address, ppe_*, lat, lng).

Fields are parsed from list headers and from the keys of single-facility
API submissions.
@TaiWilkin TaiWilkin force-pushed the tw/add-nonstandard-fields-api branch from 33830db to 3aeb0d9 Compare May 3, 2021 20:44
@TaiWilkin
Copy link
Contributor Author

Thanks for the review and feedback!

@TaiWilkin TaiWilkin merged commit 1d0cb7d into develop May 3, 2021
@TaiWilkin TaiWilkin deleted the tw/add-nonstandard-fields-api branch May 3, 2021 21:04
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
task 005 Enable contributor to upload facility data fields that are visible only on the contributor's embed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants