Skip to content
This repository has been archived by the owner on Dec 3, 2020. It is now read-only.

Improve price string parsing when there is a single price on the page #79

Closed
javaun opened this issue Aug 30, 2018 · 8 comments
Closed

Comments

@javaun
Copy link

javaun commented Aug 30, 2018

Edit (bdanforth): For a detailed explanation of the root causes of these failures, see this comment below.

TL;DR: These symptoms are primarily due to a brittle utility function that is used to reformat the price strings extracted from a product page that does not handle the case when parsing fails. Other symptoms covered by other issues include: how to handle pages with more than one price (#86 ) and failed extraction (#95 ).

See also #42.


Extension v0.1.0

Fuji XT2 Camera on Amazon:

STR:

  • New profile on Fx Nightly
  • While page is loading, I initially see the "currently tracking" item in the dropdown (a backpack)
  • When the page is done loading, I close then reopen the panel and it's blank

URL
https://www.amazon.com/Fujifilm-X-T2-Mirrorless-F2-8-4-0-Lens/dp/B01I3LNQ6M/ref=sr_1_2?ie=UTF8&qid=1535594119&sr=8-2&keywords=fuji+xt2+camera

I'm guessing extraction failed?

  1. Initial loading, I correctly see current items in the drop down
    screen shot 2018-08-29 at 10 01 17 pm

  2. When page is done loading, I close/reopen panel and it's now blank

screen shot 2018-08-29 at 9 59 43 pm

@javaun
Copy link
Author

javaun commented Aug 30, 2018

I copied the console from the Fuji page. I'm guessing it's just the first bit on ExtractedProductCard
https://www.amazon.com/Fujifilm-X-T2-Mirrorless-F2-8-4-0-Lens/dp/B01I3LNQ6M/ref=sr_1_2?ie=UTF8&qid=1535594119&sr=8-2&keywords=fuji+xt2+camera

The above error occurred in the <ExtractedProductCard> component:
    in ExtractedProductCard (created by Connect(ExtractedProductCard))
    in Connect(ExtractedProductCard) (created by EmptyOnboarding)
    in div (created by EmptyOnboarding)
    in EmptyOnboarding (created by BrowserActionApp)
    in BrowserActionApp (created by Connect(BrowserActionApp))
    in Connect(BrowserActionApp)
    in Provider

Consider adding an error boundary to your tree to customize error handling behavior.
Visit https://fb.me/react-error-boundaries to learn more about error boundaries. react-dom.development.js:14226
logCapturedError
react-dom.development.js:14226
logError
react-dom.development.js:14265
createRootErrorUpdate/update.callback
react-dom.development.js:14918
callCallback
react-dom.development.js:10878
commitUpdateQueue
react-dom.development.js:10922
commitLifeCycles
react-dom.development.js:14396
commitAllLifeCycles
react-dom.development.js:15462
callCallback
react-dom.development.js:100
invokeGuardedCallbackDev
react-dom.development.js:138
invokeGuardedCallback
react-dom.development.js:187
commitRoot
react-dom.development.js:15603
completeRoot
react-dom.development.js:16618
performWorkOnRoot
react-dom.development.js:16563
performWork
react-dom.development.js:16482
performSyncWork
react-dom.development.js:16454
requestWork
react-dom.development.js:16354
scheduleWork$1
react-dom.development.js:16218
scheduleRootUpdate
react-dom.development.js:16785
updateContainerAtExpirationTime
react-dom.development.js:16812
updateContainer
react-dom.development.js:16839
./node_modules/react-dom/cjs/react-dom.development.js/ReactRoot.prototype.render
react-dom.development.js:17122
legacyRenderSubtreeIntoContainer/<
react-dom.development.js:17262
unbatchedUpdates
react-dom.development.js:16679
legacyRenderSubtreeIntoContainer
react-dom.development.js:17258
render
react-dom.development.js:17317
./src/browser_action/index.jsx
index.jsx:22
__webpack_require__
bootstrap:19
<anonymous>
bootstrap:83
<anonymous>
moz-extension://c4fba691-901c-ab41-bb9a-05bb7f0f158a/browser_action.bundle.js:1:11
TypeError: priceMatch is null, can't access property 1 of it[Learn More] utils.js:54:8
priceStringToAmount
utils.js:54:8
priceFromExtracted
prices.js:327:12
priceWrapperFromExtracted
prices.js:338:26
render
ProductCard.jsx:220:18
render self-hosted:978:17 finishClassComponent
react-dom.development.js:13193
updateClassComponent
react-dom.development.js:13155
beginWork
react-dom.development.js:13824
performUnitOfWork
react-dom.development.js:15863
workLoop
react-dom.development.js:15902
callCallback
react-dom.development.js:100
invokeGuardedCallbackDev
react-dom.development.js:138
invokeGuardedCallback
react-dom.development.js:187
replayUnitOfWork
react-dom.development.js:15310
renderRoot
react-dom.development.js:15962
performWorkOnRoot
react-dom.development.js:16560
performWork
react-dom.development.js:16482
performSyncWork
react-dom.development.js:16454
requestWork
react-dom.development.js:16354
scheduleWork$1
react-dom.development.js:16218
scheduleRootUpdate
react-dom.development.js:16785
updateContainerAtExpirationTime
react-dom.development.js:16812
updateContainer
react-dom.development.js:16839
./node_modules/react-dom/cjs/react-dom.development.js/ReactRoot.prototype.render
react-dom.development.js:17122
legacyRenderSubtreeIntoContainer/<
react-dom.development.js:17262
unbatchedUpdates
react-dom.development.js:16679
legacyRenderSubtreeIntoContainer
react-dom.development.js:17258
render
react-dom.development.js:17317
./src/browser_action/index.jsx
index.jsx:22
__webpack_require__
bootstrap:19
<anonymous>
bootstrap:83
<anonymous>
moz-extension://c4fba691-901c-ab41-bb9a-05bb7f0f158a/browser_action.bundle.js:1:11
Webconsole context has changed

@javaun
Copy link
Author

javaun commented Aug 30, 2018

I broke it again on this Gap.com page (Gap wasn't in our list of 5 top sites but I was trying different sites anyway...)

https://www.gap.com/browse/product.do?pid=357386022&rrec=true&mlink=5050,12413545,HP_gaphome5_rr_2&clink=12413545

The above error occurred in the <ExtractedProductCard> component:
    in ExtractedProductCard (created by Connect(ExtractedProductCard))
    in Connect(ExtractedProductCard) (created by EmptyOnboarding)
    in div (created by EmptyOnboarding)
    in EmptyOnboarding (created by BrowserActionApp)
    in BrowserActionApp (created by Connect(BrowserActionApp))
    in Connect(BrowserActionApp)
    in Provider

Consider adding an error boundary to your tree to customize error handling behavior.
Visit https://fb.me/react-error-boundaries to learn more about error boundaries. react-dom.development.js:14226
logCapturedError
react-dom.development.js:14226
logError
react-dom.development.js:14265
createRootErrorUpdate/update.callback
react-dom.development.js:14918
callCallback
react-dom.development.js:10878
commitUpdateQueue
react-dom.development.js:10922
commitLifeCycles
react-dom.development.js:14396
commitAllLifeCycles
react-dom.development.js:15462
callCallback
react-dom.development.js:100
invokeGuardedCallbackDev
react-dom.development.js:138
invokeGuardedCallback
react-dom.development.js:187
commitRoot
react-dom.development.js:15603
completeRoot
react-dom.development.js:16618
performWorkOnRoot
react-dom.development.js:16563
performWork
react-dom.development.js:16482
performSyncWork
react-dom.development.js:16454
requestWork
react-dom.development.js:16354
scheduleWork$1
react-dom.development.js:16218
scheduleRootUpdate
react-dom.development.js:16785
updateContainerAtExpirationTime
react-dom.development.js:16812
updateContainer
react-dom.development.js:16839
./node_modules/react-dom/cjs/react-dom.development.js/ReactRoot.prototype.render
react-dom.development.js:17122
legacyRenderSubtreeIntoContainer/<
react-dom.development.js:17262
unbatchedUpdates
react-dom.development.js:16679
legacyRenderSubtreeIntoContainer
react-dom.development.js:17258
render
react-dom.development.js:17317
./src/browser_action/index.jsx
index.jsx:22
__webpack_require__
bootstrap:19
<anonymous>
bootstrap:83
<anonymous>
moz-extension://c4fba691-901c-ab41-bb9a-05bb7f0f158a/browser_action.bundle.js:1:11
TypeError: priceMatch is null, can't access property 1 of it[Learn More]
utils.js:54:8
priceStringToAmount
utils.js:54:8
priceFromExtracted
prices.js:327:12
priceWrapperFromExtracted
prices.js:338:26
render
ProductCard.jsx:220:18
render self-hosted:978:17 finishClassComponent
react-dom.development.js:13193
updateClassComponent
react-dom.development.js:13155
beginWork
react-dom.development.js:13824
performUnitOfWork
react-dom.development.js:15863
workLoop
react-dom.development.js:15902
callCallback
react-dom.development.js:100
invokeGuardedCallbackDev
react-dom.development.js:138
invokeGuardedCallback
react-dom.development.js:187
replayUnitOfWork
react-dom.development.js:15310
renderRoot
react-dom.development.js:15962
performWorkOnRoot
react-dom.development.js:16560
performWork
react-dom.development.js:16482
performSyncWork
react-dom.development.js:16454
requestWork
react-dom.development.js:16354
scheduleWork$1
react-dom.development.js:16218
scheduleRootUpdate
react-dom.development.js:16785
updateContainerAtExpirationTime
react-dom.development.js:16812
updateContainer
react-dom.development.js:16839
./node_modules/react-dom/cjs/react-dom.development.js/ReactRoot.prototype.render
react-dom.development.js:17122
legacyRenderSubtreeIntoContainer/<
react-dom.development.js:17262
unbatchedUpdates
react-dom.development.js:16679
legacyRenderSubtreeIntoContainer
react-dom.development.js:17258
render
react-dom.development.js:17317
./src/browser_action/index.jsx
index.jsx:22
__webpack_require__
bootstrap:19
<anonymous>
bootstrap:83
<anonymous>
moz-extension://c4fba691-901c-ab41-bb9a-05bb7f0f158a/browser_action.bundle.js:1:11

​```

@javaun
Copy link
Author

javaun commented Aug 30, 2018

It also breaks on amazon.co.uk
https://www.amazon.co.uk/dp/B07BRPYB15/ref=cm_gf_aap_ikt_d_p0_qd0______________________392EW6259VHRSH79GZ3N

The above error occurred in the <ExtractedProductCard> component:
    in ExtractedProductCard (created by Connect(ExtractedProductCard))
    in Connect(ExtractedProductCard) (created by EmptyOnboarding)
    in div (created by EmptyOnboarding)
    in EmptyOnboarding (created by BrowserActionApp)
    in BrowserActionApp (created by Connect(BrowserActionApp))
    in Connect(BrowserActionApp)
    in Provider

Consider adding an error boundary to your tree to customize error handling behavior.
Visit https://fb.me/react-error-boundaries to learn more about error boundaries. react-dom.development.js:14226
TypeError: priceMatch is null, can't access property 1 of it[Learn More] utils.js:54:8

​```

@javaun
Copy link
Author

javaun commented Aug 30, 2018

The doorhanger is also broken on Macys.com (a supported site for MVP)
https://www.macys.com/shop/product/barrett-4-pc.-sheet-sets-1400-thread-count-cotton-blend?ID=6244044&CategoryID=9915#fn=SHEET_TYPE%3DSheet%20Sets%26SIZE%3D%26sp%3D1%26spc%3D172%26ruleId%3D78%7CBOOST%20ATTRIBUTE%7CBOOST%20SAVED%20SET%26searchPass%3DmatchNone%26slotId%3D3


The above error occurred in the <ExtractedProductCard> component:
    in ExtractedProductCard (created by Connect(ExtractedProductCard))
    in Connect(ExtractedProductCard) (created by EmptyOnboarding)
    in div (created by EmptyOnboarding)
    in EmptyOnboarding (created by BrowserActionApp)
    in BrowserActionApp (created by Connect(BrowserActionApp))
    in Connect(BrowserActionApp)
    in Provider

Consider adding an error boundary to your tree to customize error handling behavior.
Visit https://fb.me/react-error-boundaries to learn more about error boundaries. react-dom.development.js:14226
logCapturedError
react-dom.development.js:14226
logError
react-dom.development.js:14265
createRootErrorUpdate/update.callback
react-dom.development.js:14918
callCallback
react-dom.development.js:10878
commitUpdateQueue
react-dom.development.js:10922
commitLifeCycles
react-dom.development.js:14396
commitAllLifeCycles
react-dom.development.js:15462
callCallback
react-dom.development.js:100
invokeGuardedCallbackDev
react-dom.development.js:138
invokeGuardedCallback
react-dom.development.js:187
commitRoot
react-dom.development.js:15603
completeRoot
react-dom.development.js:16618
performWorkOnRoot
react-dom.development.js:16563
performWork
react-dom.development.js:16482
performSyncWork
react-dom.development.js:16454
requestWork
react-dom.development.js:16354
scheduleWork$1
react-dom.development.js:16218
scheduleRootUpdate
react-dom.development.js:16785
updateContainerAtExpirationTime
react-dom.development.js:16812
updateContainer
react-dom.development.js:16839
./node_modules/react-dom/cjs/react-dom.development.js/ReactRoot.prototype.render
react-dom.development.js:17122
legacyRenderSubtreeIntoContainer/<
react-dom.development.js:17262
unbatchedUpdates
react-dom.development.js:16679
legacyRenderSubtreeIntoContainer
react-dom.development.js:17258
render
react-dom.development.js:17317
./src/browser_action/index.jsx
index.jsx:22
__webpack_require__
bootstrap:19
<anonymous>
bootstrap:83
<anonymous>
moz-extension://c4fba691-901c-ab41-bb9a-05bb7f0f158a/browser_action.bundle.js:1:11
TypeError: priceMatch is null, can't access property 1 of it[Learn More]
utils.js:54:8
priceStringToAmount
utils.js:54:8
priceFromExtracted
prices.js:327:12
priceWrapperFromExtracted
prices.js:338:26
render
ProductCard.jsx:220:18
render self-hosted:978:17 finishClassComponent
react-dom.development.js:13193
updateClassComponent
react-dom.development.js:13155
beginWork
react-dom.development.js:13824
performUnitOfWork
react-dom.development.js:15863
workLoop
react-dom.development.js:15902
callCallback
react-dom.development.js:100
invokeGuardedCallbackDev
react-dom.development.js:138
invokeGuardedCallback
react-dom.development.js:187
replayUnitOfWork
react-dom.development.js:15310
renderRoot
react-dom.development.js:15962
performWorkOnRoot
react-dom.development.js:16560
performWork
react-dom.development.js:16482
performSyncWork
react-dom.development.js:16454
requestWork
react-dom.development.js:16354
scheduleWork$1
react-dom.development.js:16218
scheduleRootUpdate
react-dom.development.js:16785
updateContainerAtExpirationTime
react-dom.development.js:16812
updateContainer
react-dom.development.js:16839
./node_modules/react-dom/cjs/react-dom.development.js/ReactRoot.prototype.render
react-dom.development.js:17122
legacyRenderSubtreeIntoContainer/<
react-dom.development.js:17262
unbatchedUpdates
react-dom.development.js:16679
legacyRenderSubtreeIntoContainer
react-dom.development.js:17258
render
react-dom.development.js:17317
./src/browser_action/index.jsx
index.jsx:22
__webpack_require__
bootstrap:19
<anonymous>
bootstrap:83
<anonymous>
moz-extension://c4fba691-901c-ab41-bb9a-05bb7f0f158a/browser_action.bundle.js:1:11

​

@Osmose Osmose added the demo label Aug 30, 2018
@biancadanforth
Copy link
Collaborator

Thank you Javaun! The pages you reference actually point primarily at two related but separate issues:

  1. Issue Improve price string parsing when there is a single price on the page #79 (this issue): Handle the case where there is only one price on a page by improving price string parsing (via Fathom, a parsing library, or with our own methods). If string parsing fails, fallback to the same UI as if extraction had failed (just show the tracked products list or empty onboarding screen in the popup). Sites you reference that match this case:
  • This Amazon page
    • Has a "," in the price string.
  • This Amazon UK page
    • Price string parsing after extraction failed because of the British pound sign. Since we are only supporting EN-US sites for the MVP, I think the follow-up here would be covered above by the fallback above when parsing fails.
  1. Issue What should the extension do when there is not a single price on a product page? #86 : Handle the case where there is more than one price on a page (for example: a product has different prices for different sizes, colors, etc.). Sites you reference that match this case:
  • This Walmart page
    • The price changes based on the color selected

Other pages you cited that don't fall into these categories:

  • Gap page
    • Extraction failed here; but it won't be investigated for now, since it's not one of our top 5 sites.
  • Macy's page
    • This page is a class of products page that contains no single price or range of prices above the fold. I would not expect extraction to work here; I would expect it to work on this product page further down after selecting a size, but it didn't. I filed Issue Macy's: Page not recognized correctly #95 for this page.

I'm going to edit the title of this issue to be more descriptive of situation 1, now that the digging is done.

@biancadanforth biancadanforth changed the title Door Hanger is blank (extraction issue?) Improve price string parsing when there is a single price on the page Sep 10, 2018
@biancadanforth
Copy link
Collaborator

After I renamed the issue, I realized this is a duplicate of an issue Osmose filed much earlier: #42 . Closing in favor of that one.

biancadanforth added a commit that referenced this issue Sep 12, 2018
* Update how we pull the price string from extracted Fathom price elements to provide main and subunits (e.g. dollars and cents) if available.
* Added price string cleaning methods to remove extra characters (like commas) that were causing price parsing to fail.
* Handle case when price string parsing still fails after cleaning by checking in the background script that the price string is formatted correctly before rendering the browserAction popup.
  * This will guarantee we never see the “blank panel” reported in #79/#88.

Price element innerText strings now supported as a result of these changes:
* "$1327 /each" ([Home Depot example page](https://www.homedepot.com/p/KitchenAid-Classic-4-5-Qt-Tilt-Head-White-Stand-Mixer-K45SSWH/202546032))
* "$1,049.00" ([Amazon example page](https://www.amazon.com/Fujifilm-X-T2-Mirrorless-F2-8-4-0-Lens/dp/B01I3LNQ6M/ref=sr_1_2?ie=UTF8&qid=1535594119&sr=8-2&keywords=fuji+xt2+camera))
* "US $789.99" ([Ebay example page](https://www.ebay.com/itm/Dell-Inspiron-7570-15-6-Touch-Laptop-i7-8550U-1-8GHz-8GB-1TB-NVIDIA-940MX-W10/263827294291))
* "$4.99+" ([Etsy example page](https://www.etsy.com/listing/555504975/frankenstein-2-custom-stencil?ga_order=most_relevant&ga_search_type=all&ga_view_type=gallery&ga_search_query=&ref=sr_gallery-1-13))

Note: This does not handle the case where there is more than one price for the product page (e.g. if we see a range of prices such as "$19.92 - $38.00" or if the price changes based on size/color, etc.); that’s handled by Issue #86.
biancadanforth added a commit that referenced this issue Sep 12, 2018
* Update how we pull the price string from extracted Fathom price elements to provide main and subunits (e.g. dollars and cents) if available.
* Add price string cleaning methods to remove extra characters (like commas) that were causing price parsing to fail.
* Handle case when price string parsing still fails after cleaning by checking in the background script that the price string is formatted correctly before rendering the browserAction popup.
  * This will guarantee we never see the “blank panel” reported in #79 and #88.

Price element innerText strings now supported as a result of these changes:
* "$1327 /each" ([Home Depot example page](https://www.homedepot.com/p/KitchenAid-Classic-4-5-Qt-Tilt-Head-White-Stand-Mixer-K45SSWH/202546032))
* "$1,049.00" ([Amazon example page](https://www.amazon.com/Fujifilm-X-T2-Mirrorless-F2-8-4-0-Lens/dp/B01I3LNQ6M/ref=sr_1_2?ie=UTF8&qid=1535594119&sr=8-2&keywords=fuji+xt2+camera))
* "US $789.99" ([Ebay example page](https://www.ebay.com/itm/Dell-Inspiron-7570-15-6-Touch-Laptop-i7-8550U-1-8GHz-8GB-1TB-NVIDIA-940MX-W10/263827294291))
* "$4.99+" ([Etsy example page](https://www.etsy.com/listing/555504975/frankenstein-2-custom-stencil?ga_order=most_relevant&ga_search_type=all&ga_view_type=gallery&ga_search_query=&ref=sr_gallery-1-13))

Note: This does not handle the case where there is more than one price for the product page (e.g. if we see a range of prices such as "$19.92 - $38.00" or if the price changes based on size/color, etc.); that’s handled by Issue #86.
biancadanforth added a commit that referenced this issue Sep 18, 2018
* Update how we pull the price string from extracted Fathom price elements to provide main and subunits (e.g. dollars and cents) if available.
* Add price string cleaning methods to remove extra characters (like commas) that were causing price parsing to fail.
* Handle case when price string parsing still fails after cleaning by checking in the background script that the price string is formatted correctly before rendering the browserAction popup.
  * This will guarantee we never see the “blank panel” reported in #79 and #88.

Price element innerText strings now supported as a result of these changes:
* "$1327 /each" ([Home Depot example page](https://www.homedepot.com/p/KitchenAid-Classic-4-5-Qt-Tilt-Head-White-Stand-Mixer-K45SSWH/202546032))
* "$1,049.00" ([Amazon example page](https://www.amazon.com/Fujifilm-X-T2-Mirrorless-F2-8-4-0-Lens/dp/B01I3LNQ6M/ref=sr_1_2?ie=UTF8&qid=1535594119&sr=8-2&keywords=fuji+xt2+camera))
* "US $789.99" ([Ebay example page](https://www.ebay.com/itm/Dell-Inspiron-7570-15-6-Touch-Laptop-i7-8550U-1-8GHz-8GB-1TB-NVIDIA-940MX-W10/263827294291))
* "$4.99+" ([Etsy example page](https://www.etsy.com/listing/555504975/frankenstein-2-custom-stencil?ga_order=most_relevant&ga_search_type=all&ga_view_type=gallery&ga_search_query=&ref=sr_gallery-1-13))

Note: This does not handle the case where there is more than one price for the product page (e.g. if we see a range of prices such as "$19.92 - $38.00" or if the price changes based on size/color, etc.); that’s handled by Issue #86.
biancadanforth added a commit that referenced this issue Sep 20, 2018
* Update how we pull the price string from extracted Fathom price elements to provide main and subunits (e.g. dollars and cents) if available.
* Add price string cleaning methods to remove extra characters (like commas) that were causing price parsing to fail.
* Handle case when price string parsing still fails after cleaning by checking in the background script that the price string is formatted correctly before rendering the browserAction popup.
  * This will guarantee we never see the “blank panel” reported in #79 and #88.

Price element innerText strings now supported as a result of these changes:
* "$1327 /each" ([Home Depot example page](https://www.homedepot.com/p/KitchenAid-Classic-4-5-Qt-Tilt-Head-White-Stand-Mixer-K45SSWH/202546032))
* "$1,049.00" ([Amazon example page](https://www.amazon.com/Fujifilm-X-T2-Mirrorless-F2-8-4-0-Lens/dp/B01I3LNQ6M/ref=sr_1_2?ie=UTF8&qid=1535594119&sr=8-2&keywords=fuji+xt2+camera))
* "US $789.99" ([Ebay example page](https://www.ebay.com/itm/Dell-Inspiron-7570-15-6-Touch-Laptop-i7-8550U-1-8GHz-8GB-1TB-NVIDIA-940MX-W10/263827294291))
* "$4.99+" ([Etsy example page](https://www.etsy.com/listing/555504975/frankenstein-2-custom-stencil?ga_order=most_relevant&ga_search_type=all&ga_view_type=gallery&ga_search_query=&ref=sr_gallery-1-13))

Note: This does not handle the case where there is more than one price for the product page (e.g. if we see a range of prices such as "$19.92 - $38.00" or if the price changes based on size/color, etc.); that’s handled by Issue #86.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants