Add API to get non-standard fields for contributor #1321
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a clear implementation of the feature as discussed and the "happy path" works as described. I did find some edge cases and a lingering bug that require us to make some fixes. I have mentioned them inline and here is a test case that exercises them which can be used for some test-driven fixing.
class NonstandardFieldsApiTest(APITestCase):
def setUp(self):
self.url = reverse('nonstandard-fields-list')
self.user_email = 'test@example.com'
self.user_password = 'example123'
self.user = User.objects.create(email=self.user_email)
self.user.set_password(self.user_password)
self.user.save()
self.contributor = Contributor \
.objects \
.create(admin=self.user,
name='test contributor 1',
contrib_type=Contributor.OTHER_CONTRIB_TYPE)
self.list = FacilityList \
.objects \
.create(header='country,name,address,extra_1',
file_name='one',
name='First List')
self.list_source = Source \
.objects \
.create(facility_list=self.list,
source_type=Source.LIST,
is_active=True,
is_public=True,
contributor=self.contributor)
self.list_item = FacilityListItem \
.objects \
.create(name='Item',
address='Address',
country_code='US',
row_index=1,
geocoded_point=Point(0, 0),
status=FacilityListItem.CONFIRMED_MATCH,
source=self.list_source)
self.api_source = Source \
.objects \
.create(source_type=Source.SINGLE,
is_active=True,
is_public=True,
contributor=self.contributor)
self.api_list_item = FacilityListItem \
.objects \
.create(name='Item',
address='Address',
country_code='US',
raw_data="{'country': 'US', 'name': 'Item', 'address': 'Address', 'extra_2': 'data'}",
row_index=1,
geocoded_point=Point(0, 0),
status=FacilityListItem.CONFIRMED_MATCH,
source=self.api_source)
def test_nonstandard_fields(self):
self.client.login(email=self.user_email,
password=self.user_password)
response = self.client.get(self.url)
self.assertEqual(response.status_code, status.HTTP_200_OK)
content = json.loads(response.content)
self.assertTrue(2, len(content))
self.assertIn('extra_1', content)
self.assertIn('extra_2', content)
def test_doublequote_header(self):
self.list.header='"country","name","address","extra_1"',
self.list.save()
self.client.login(email=self.user_email,
password=self.user_password)
response = self.client.get(self.url)
self.assertEqual(response.status_code, status.HTTP_200_OK)
content = json.loads(response.content)
self.assertIn('extra_1', content)
def test_escaped_singlequote_in_api_data(self):
self.api_list_item.raw_data = "{'country': 'US', 'name': 'Item', 'address': 'Address', 'extra_2': 'd\'ataé'}"
self.api_list_item.save()
self.client.login(email=self.user_email,
password=self.user_password)
response = self.client.get(self.url)
self.assertEqual(response.status_code, status.HTTP_200_OK)
content = json.loads(response.content)
self.assertIn('extra_2', content)
def test_querydict_in_api_data(self):
# Production data at the time this test was written includes a mix of
# strigified dicts and stringified QueryDicts
self.api_list_item.raw_data = "<QueryDict: {'name': ['Item'], 'country': ['US'], 'address': ['Address'], 'extra_2': ['data']}>"
self.api_list_item.save()
self.client.login(email=self.user_email,
password=self.user_password)
response = self.client.get(self.url)
self.assertEqual(response.status_code, status.HTTP_200_OK)
content = json.loads(response.content)
# We plan to only store raw data as CSV or JSON in the future. Legacy
# data can be ignored
self.assertNotIn('extra_2', content)
src/django/api/views.py
Outdated
'raw_data', flat=True) | ||
single_facility_fields = [] | ||
for single_facility in single_facilities: | ||
fields = list(json.loads(single_facility.replace("'", '"')).keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method of converting a stringified Python dict to a JSON object works most of the time, but there are a handful of rows in the list item table that have escaped single quotes in them (\'
). As of writing this, the following query returns 9 records in production
select raw_data from api_facilitylistitem i join api_source s on i.source_id = s.id where s.source_type ='SINGLE' and raw_data ilike '%\\''%';
We can trigger this issue in development by POSTing this example
{
"country": "China",
"name": "SO MANY SHIRTS",
"address": "No.1000,the third industry park,Guoyuan Town,Nantong 226500.",
"custom_single_quote": "field's data"
}
The fact that the raw data is being saved this way is a bug (#1052), and it is worse than just stringified dicts. We also have stringified QueryDicts
mixed in to our raw_data
fields as well.
Because we are only interested in additional data submitted in the future, I think that for now we can just catch json.decoder.JSONDecodeError
within the loop and ignore it. After changing the API endpoint to properly serialize raw_data
as JSON we can revisit how dealing with the legacy values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a double try/except block to try handling the raw_data
as JSON. If it fails, it jumps into the nested try/except to try changing the single quotes to double quotes and parsing that; failing that, it doesn't include the fields from that object in the response. This still doesn't handle all of our cases of legacy data, but it does allow at least some legacy data to be parsed, although it's not pretty. I'm open to alternative options for certain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nested fallback is a good approach. Thanks.
src/django/api/views.py
Outdated
'header', flat=True) | ||
list_fields = [] | ||
for header in list_headers: | ||
list_fields = list_fields + header.split(",") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is legal for CSV headers and fields to all be wrapped in quotes
"country","name","address"
"US","Azavea","990 Spring Garden St 5th Floor, Philadelphia, PA 19123"
azavea-with-everything-quoted.csv
The field name extractor as currently written does not remove these quotes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working through these these problematic issues with real world data. I was no longer able to break it and confirmed the test instructions.
Returns a list of all the field names that a contributor has submitted either by API or list, minus our standard fields (country, name, address, ppe_*, lat, lng). Fields are parsed from list headers and from the keys of single-facility API submissions.
33830db
to
3aeb0d9
Compare
Thanks for the review and feedback! |
Overview
Returns a list of all the field names that a contributor has submitted
either by API or list, minus our standard fields (country, name,
address, ppe_*, lat, lng).
Fields are parsed from list headers and from the keys of single-facility
API submissions.
Connects #1313
Demo
Testing Instructions
Checklist
fixup!
commits have been squashed