Skip to content
This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Processing results are a list #195

Merged
merged 3 commits into from Feb 12, 2019
Merged

Conversation

jwalgran
Copy link
Contributor

@jwalgran jwalgran commented Feb 8, 2019

Overview

Converts FacilityListItem.processing_results from an object to a list. processing_results now acts as a historical log of processing events and allows the same processing action to potentially be retried in the case of a failure without losing and previously logged information.

Connects #193

Demo

openapparelregistry=# select jsonb_pretty(processing_results) from api_facilitylistitem where facility_list_id = 2 and row_index = 0;
                                                    jsonb_pretty
--------------------------------------------------------------------------------------------------------------------
 [                                                                                                                 +
     {                                                                                                             +
         "error": false,                                                                                           +
         "action": "parse",                                                                                        +
         "started_at": "2019-02-08 20:28:34.461622",                                                               +
         "finished_at": "2019-02-08 20:28:34.462310"                                                               +
     },                                                                                                            +
     {                                                                                                             +
         "data": {                                                                                                 +
             "status": "OK",                                                                                       +
             "results": [                                                                                          +
                 {                                                                                                 +
                     "types": [                                                                                    +
                         "street_address"                                                                          +
                     ],                                                                                            +
                     "geometry": {                                                                                 +
                         "location": {                                                                             +
                             "lat": 27.966814,                                                                     +
                             "lng": 120.66201                                                                      +
                         },                                                                                        +
                         "viewport": {                                                                             +
                             "northeast": {                                                                        +
                                 "lat": 27.9681629802915,                                                          +
                                 "lng": 120.6633589802915                                                          +
                             },                                                                                    +
                             "southwest": {                                                                        +
                                 "lat": 27.9654650197085,                                                          +
                                 "lng": 120.6606610197085                                                          +
                             }                                                                                     +
                         },                                                                                        +
                         "location_type": "ROOFTOP"                                                                +
                     },                                                                                            +
                     "place_id": "ChIJ52tPIaZjTzQRPPPUsIeA2pE",                                                    +
                     "plus_code": {                                                                                +
                         "global_code": "7QV2XM86+PR",                                                             +
                         "compound_code": "XM86+PR Ouhai, Wenzhou, Zhejiang, China"                                +
                     },                                                                                            +
                     "partial_match": true,                                                                        +
                     "formatted_address": "8 Shuangbao W Rd, Ouhai Qu, Wenzhou Shi, Zhejiang Sheng, China, 325000",+
                     "address_components": [                                                                       +
                         {                                                                                         +
                             "types": [                                                                            +
                                 "street_number"                                                                   +
                             ],                                                                                    +
                             "long_name": "8",                                                                     +
                             "short_name": "8"                                                                     +
                         },                                                                                        +
                         {                                                                                         +
                             "types": [                                                                            +
                                 "route"                                                                           +
                             ],                                                                                    +
                             "long_name": "Shuangbao West Road",                                                   +
                             "short_name": "Shuangbao W Rd"                                                        +
                         },                                                                                        +
                         {                                                                                         +
                             "types": [                                                                            +
                                 "political",                                                                      +
                                 "sublocality",                                                                    +
                                 "sublocality_level_1"                                                             +
                             ],                                                                                    +
                             "long_name": "Ouhai Qu",                                                              +
                             "short_name": "Ouhai Qu"                                                              +
                         },                                                                                        +
                         {                                                                                         +
                             "types": [                                                                            +
                                 "locality",                                                                       +
                                 "political"                                                                       +
                             ],                                                                                    +
                             "long_name": "Wenzhou Shi",                                                           +
                             "short_name": "Wenzhou Shi"                                                           +
                         },                                                                                        +
                         {                                                                                         +
                             "types": [                                                                            +
                                 "administrative_area_level_1",                                                    +
                                 "political"                                                                       +
                             ],                                                                                    +
                             "long_name": "Zhejiang Sheng",                                                        +
                             "short_name": "Zhejiang Sheng"                                                        +
                         },                                                                                        +
                         {                                                                                         +
                             "types": [                                                                            +
                                 "country",                                                                        +
                                 "political"                                                                       +
                             ],                                                                                    +
                             "long_name": "China",                                                                 +
                             "short_name": "CN"                                                                    +
                         },                                                                                        +
                         {                                                                                         +
                             "types": [                                                                            +
                                 "postal_code"                                                                     +
                             ],                                                                                    +
                             "long_name": "325000",                                                                +
                             "short_name": "325000"                                                                +
                         }                                                                                         +
                     ]                                                                                             +
                 }                                                                                                 +
             ]                                                                                                     +
         },                                                                                                        +
         "error": false,                                                                                           +
         "action": "geocode",                                                                                      +
         "started_at": "2019-02-08 20:28:47.365138",                                                               +
         "finished_at": "2019-02-08 20:28:48.454136"                                                               +
     },                                                                                                            +
     {                                                                                                             +
         "error": false,                                                                                           +
         "action": "match",                                                                                        +
         "started_at": "2019-02-08 20:30:19.617584",                                                               +
         "finished_at": "2019-02-08 20:30:19.618835"                                                               +
     }                                                                                                             +
 ]
(1 row)

Testing Instructions

  • Verify that the unit tests pass.
  • Reset the database.
    • ./scripts/manage reset_schema --noinput
    • ./scripts/manage migrate
    • ./scripts/manage loadfixtures
  • Process list 2.
    • ./scripts/manage batch_process -a parse -l 2
    • ./scripts/manage batch_process -a geocode -l 2
    • ./scripts/manage batch_process -a match -l 2
  • Verify that processing event results are stored as an array
    • ./scripts/manage dbshell
    • select jsonb_pretty(processing_results) from api_facilitylistitem where facility_list_id = 2 and row_index = 0;

In this commit we convert `FacilityListItem.processing_results` from an object
to a list. `processing_results` now acts as a historical log of processing
events and allows the same processing action to potentially be retried in the
case of a failure without losing and previously logged information.
@jwalgran jwalgran changed the title WIP Processing results are a list Processing results are a list Feb 8, 2019
@rajadain
Copy link
Contributor

rajadain commented Feb 8, 2019

I did not know about jsonb_pretty! That's really neat!

@rajadain
Copy link
Contributor

rajadain commented Feb 11, 2019

Processed List 2, then inspected the database for errors and found a log of events in the processing_results field:

SELECT jsonb_pretty(processing_results)
FROM api_facilitylistitem
WHERE facility_list_id = 2
  AND status = 'ERROR'
LIMIT 1;
[
  {
    "error": false,
    "action": "parse",
    "started_at": "2019-02-11 20:13:33.446086",
    "finished_at": "2019-02-11 20:13:33.447321"
  },
  {
    "error": true,
    "trace": "Traceback (most recent call last):\n  File \"/usr/local/src/api/processing.py\", line 72, in geocode_facility_list_item\n    data = geocode_address(item.address, item.country_code)\n  File \"/usr/local/src/api/geocoding.py\", line 42, in geocode_address\n    raise ValueError(\"No results were found\")\nValueError: No results were found\n",
    "action": "geocode",
    "message": "No results were found",
    "started_at": "2019-02-11 20:14:25.191696",
    "finished_at": "2019-02-11 20:14:25.285854"
  }
]

where the traceback can be formatted to look like:

Traceback (most recent call last):
  File "/usr/local/src/api/processing.py", line 72, in geocode_facility_list_item
    data = geocode_address(item.address, item.country_code)
  File "/usr/local/src/api/geocoding.py", line 42, in geocode_address
    raise ValueError("No results were found")
    ValueError: No results were found

@rajadain
Copy link
Contributor

rajadain commented Feb 11, 2019

While parse and geocode ran successfully, running match gives the following strange errors:

Exception ignored in: <function CPointerBase.__del__ at 0x7f51c1018bf8>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/django/contrib/gis/ptr.py", line 36, in __del__
  File "/usr/local/lib/python3.7/site-packages/django/contrib/gis/geos/libgeos.py", line 155, in __call__
  File "/usr/local/lib/python3.7/site-packages/django/utils/functional.py", line 36, in __get__
  File "/usr/local/lib/python3.7/site-packages/django/contrib/gis/geos/libgeos.py", line 159, in func
ImportError: sys.meta_path is None, Python is likely shutting down

I get one of these for every item in the list.

If I run match from develop (with the results processed on this branch), I get:

Traceback (most recent call last):
  File "/usr/local/src/api/processing.py", line 124, in match_facility_list_item
    'finished_at': str(datetime.utcnow()),
TypeError: list indices must be integers or slices, not str

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.7/site-packages/django/core/management/__init__.py", line 371, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.7/site-packages/django/core/management/__init__.py", line 365, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.7/site-packages/django/core/management/base.py", line 288, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.7/site-packages/django/core/management/base.py", line 335, in execute
    output = self.handle(*args, **options)
  File "/usr/local/src/api/management/commands/batch_process.py", line 79, in handle
    facility, match = process(item)
  File "/usr/local/src/api/processing.py", line 134, in match_facility_list_item
    'finished_at': str(datetime.utcnow()),
TypeError: list indices must be integers or slices, not str

@rajadain
Copy link
Contributor

Can't find a clear code change that would indicate why this error is happening. Quick googling suggests it being a memory leak / garbage collection / initialization order issue.

@rajadain
Copy link
Contributor

This seems related: https://code.djangoproject.com/ticket/29543. Was fixed in django/django#10130, but isn't in a release yet, only pre-releases 2.2b1 and 2.2a1.

@jwalgran
Copy link
Contributor Author

Thanks for doing some research into that error on match. I'm skeptical that it isn't my fault, since both stack traces end in my code, although the traces are pretty confusing. I ran into this exception when I was converting the code and had a lingering item like

item.processing_results[ProcessingAction.GEOCODE]

Which was clearly tring to access what is now an array as if it was an object. That's not what is happening in the stack traces you posted, though. I'll do some more detective work.

@jwalgran
Copy link
Contributor Author

I could not reproduce the exceptions you ran into

vagrant@vagrant:/vagrant$ ./scripts/manage batch_process -a match -l 5
Starting vagrant_database_1 ... done
Value Error: Items to be matched must be in the GEOCODED status
match: 63 successes
match: 1 failures

until... I reset my database. Then:

vagrant@vagrant:/vagrant$ ./scripts/manage batch_process -a parse -l 5
Starting vagrant_database_1 ... done
Traceback (most recent call last):
  File "/usr/local/src/api/processing.py", line 50, in parse_facility_list_item
    'finished_at': str(datetime.utcnow()),
TypeError: list indices must be integers or slices, not str

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.7/site-packages/django/core/management/__init__.py", line 371, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.7/site-packages/django/core/management/__init__.py", line 365, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.7/site-packages/django/core/management/base.py", line 288, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.7/site-packages/django/core/management/base.py", line 335, in execute
    output = self.handle(*args, **options)
  File "/usr/local/src/api/management/commands/batch_process.py", line 87, in handle
    process(item)
  File "/usr/local/src/api/processing.py", line 59, in parse_facility_list_item
    'finished_at': str(datetime.utcnow()),
TypeError: list indices must be integers or slices, not str

Something with the fixture data is breaking an expectation in the code. I should be able to track it down.

@jwalgran
Copy link
Contributor Author

I was able to reproduce the ImportError: sys.meta_path is None, Python is likely shutting down on develop.

It turns out that our inconsistent fixture data led us into this problem. The fixtures already have some Facility rows created from FacilityListItems however those FacilityListItems have their state incorrectly set to UPLOADED rather than AUTOMATIC. Reprocessing them causes an unhandled IntegrityError. That exception is being masked by the ImportError: sys.meta_path is None, Python is likely shutting down bug, which has been fixed in the Django develop branch but not yet released.

My plan is to just make the fixtures consistent to avoid the integrity error and wait on a new Django release for the fix to the CPointer bug.

We were previously creating a `Facility` for every `FacilityListItem` but we
were only creating a `FacilityMatch` for a random third of the items and not
setting the status of the `FacilityListItem`s correctly. In this commit we have
updated the `makefixtures` command and the affected fixtures so that the data
is consistent.
@jwalgran
Copy link
Contributor Author

I pushed up a revision to the fixtures and the fixture generator that make the loaded data consistent and avoids the unhandled IntegrityErrors. After resetting the database and running through the tasks you will see many more "Items to be matched must be in the __ status" messages since we are now correctly setting the status to half of the items to MATCHED from the start.

Copy link
Contributor

@rajadain rajadain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 tested. Seeing more Items to be matched must be in the ??? status errors as predicted, but could run it all the way through to match successfully.

@rajadain rajadain assigned jwalgran and unassigned rajadain Feb 12, 2019
@rajadain
Copy link
Contributor

Great work!

@jwalgran
Copy link
Contributor Author

Thanks for finding and puzzling through that exception with me.

@jwalgran jwalgran merged commit a6e6896 into develop Feb 12, 2019
@jwalgran jwalgran deleted the jcw/processing-results-are-a-list branch February 12, 2019 20:52
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants