Threw an error on ingestion #21

Truncated · 2024-05-08T22:03:09Z

1715205555277 | DEBUG | onValidate called

Caller: HTMLDivElement.<anonymous> (app://obsidian.md/app.js:1:2170951)

[
  {
    "enabled": true,
    "custom": false,
    "_key": "link",
    "_idx": 0,
    "id": "link",
    "metaFields": [
      "url",
      "og:url",
      "parsely-link",
      "twitter:url"
    ],
    "defaultIdx": 0,
    "defaultKey": "link",
    "description": "Page URL provided or a permalink discovered in metadata."
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "byline",
    "_idx": 1,
    "id": "byline",
    "metaFields": [
      "author",
      "article:author",
      "parsely-author",
      "cXenseParse:author"
    ],
    "defaultIdx": 1,
    "defaultKey": "byline",
    "description": "Name of the primary author or the first author detected."
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "site",
    "_idx": 2,
    "id": "siteName",
    "metaFields": [
      "og:site_name",
      "page.content.source",
      "application-name",
      "apple-mobile-web-app-title",
      "twitter:site"
    ],
    "defaultIdx": 2,
    "defaultKey": "site",
    "description": "Website or publication name."
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "date",
    "_idx": 3,
    "_format": "d|YYYY-MM-DDTHH:mm",
    "id": "publishedTime",
    "metaFields": [
      "article:published_time",
      "parsely-pub-date",
      "datePublished",
      "article.published"
    ],
    "defaultIdx": 3,
    "defaultKey": "date",
    "description": "Date/time that the page was initially published.",
    "defaultFormat": "d|YYYY-MM-DDTHH:mm"
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "updated",
    "_idx": 4,
    "_format": "d|YYYY-MM-DDTHH:mm",
    "id": "modifiedTime",
    "metaFields": [
      "article:modified_time",
      "dateModified",
      "dateLastPubbed"
    ],
    "defaultIdx": 4,
    "defaultKey": "updated",
    "description": "Date/time that the page was last modified, if available.",
    "defaultFormat": "d|YYYY-MM-DDTHH:mm"
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "type",
    "_idx": 5,
    "id": "type",
    "metaFields": [
      "og:type",
      "parsely-type",
      "medium",
      "page.content.type"
    ],
    "defaultIdx": 5,
    "defaultKey": "type",
    "description": "Type of publication, eg: \"page\", \"post\", \"article\"."
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "excerpt",
    "_idx": 6,
    "id": "excerpt",
    "metaFields": [
      "description",
      "og:description",
      "twitter:description"
    ],
    "defaultIdx": 6,
    "defaultKey": "excerpt",
    "description": "Often used for subtitles, excerpts, descriptions, and abstracts."
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "twitter",
    "_idx": 7,
    "_format": "s|https://twitter.com/{s}",
    "id": "twitter",
    "metaFields": [
      "twitter:creator",
      "twitter:site"
    ],
    "defaultIdx": 7,
    "defaultKey": "twitter",
    "description": "Twitter/X link for the author or site.",
    "defaultFormat": "s|https://twitter.com/{s}"
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "tags",
    "_idx": 8,
    "_format": "S|{prefix}/{tag}",
    "id": "tags",
    "metaFields": [
      "tags",
      "keywords",
      "article:tag",
      "parsely-tags",
      "news_keywords"
    ],
    "defaultIdx": 8,
    "defaultKey": "tags",
    "description": "Tags and keywords present in the page's metadata.",
    "defaultFormat": "S|{prefix}/{tag}"
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "onion",
    "_idx": 9,
    "id": "onion",
    "metaFields": [
      "onion-location"
    ],
    "defaultIdx": 9,
    "defaultKey": "onion",
    "description": "Link to a mirror of the content on Tor."
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "slurped",
    "_idx": 10,
    "_format": "d|YYYY-MM-DDTHH:mm",
    "id": "slurped",
    "defaultIdx": 10,
    "defaultKey": "slurped",
    "description": "Date/time that the page was accessed by Slurp.",
    "defaultFormat": "d|YYYY-MM-DDTHH:mm"
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "title",
    "_idx": 11,
    "id": "title",
    "metaFields": [
      "og:title",
      "twitter:title"
    ],
    "defaultIdx": 11,
    "defaultKey": "title",
    "description": "Page title as seen in the browser, falling back to the title presented in metadata."
  }
]


inhumantsar · 2024-05-09T03:36:05Z

hey thanks for the report! looks like neither readability nor slurp were able to find a title for this page. i'll probably have to submit a patch upstream of slurp for this one.

can you share the URL? i'm not seeing it in the logs

Truncated · 2024-05-09T15:08:40Z

These were from www.fastcompany.com
Any link does this. If I'm reading the log above correctly, it covers multiple different links, but I honestly didn't think to record the URLs alongside the auto-generated bug log. I will in the future (I've got a few more to submit).

inhumantsar · 2024-05-09T20:26:30Z

The logs were quoting a product page. I looked it up and found this: https://sparksoftcorp.com/dev-sec-ops-delivery

The site doesn't have any meta tags or even a title tag so there's not much that Slurp can do on its own. Filenames are sourced from the title. I could set it up to just call it Untitled Page or something but this feels like a pretty rare edge case.

I will be adding more options to the Slurp New Note dialog soon though. That will be the best place to manually give it a title to use.
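
For illustration, the "Untitled Page" fallback mentioned above could look roughly like the sketch below. The helper name `titleOrFallback` and its signature are hypothetical and are not taken from Slurp's code; it only assumes that several title candidates (title tag, `og:title`, `twitter:title`) may each be missing or blank.

```typescript
// Hypothetical helper, not part of Slurp's actual code.
// Returns the first non-blank title candidate, or a default name
// when the page offers no usable title at all.
function titleOrFallback(
  candidates: Array<string | null | undefined>,
  fallback: string = "Untitled Page",
): string {
  for (const candidate of candidates) {
    const trimmed = candidate?.trim();
    if (trimmed) {
      return trimmed;
    }
  }
  return fallback;
}
```

Since filenames are sourced from the title, a default like this would still need to handle name collisions when several untitled pages are saved.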

Truncated · 2024-05-10T14:00:41Z

That's a red herring: the Sparksoft pages were ones I had ingested earlier. Yes, there wasn't much to pull, but I was most concerned with the text and didn't care about the metadata.

It's the links from the Fast Company site that throw the error. The log output in settings didn't give me a reliable way to pick out only the part relevant to the error message, so you got both of the ingestions.

Literally any link from Fastcompany.com throws an error. Here's a clean example from https://www.fastcompany.com/91122708/heres-how-california-state-agencies-plan-use-generative-ai

1715349697499 | DEBUG | onValidate called

Caller: HTMLDivElement.<anonymous> (app://obsidian.md/app.js:1:2170951)

[
  {
    "enabled": true,
    "custom": false,
    "_key": "Source",
    "_idx": 0,
    "id": "link",
    "metaFields": [
      "url",
      "og:url",
      "parsely-link",
      "twitter:url"
    ],
    "defaultIdx": 0,
    "defaultKey": "link",
    "description": "Page URL provided or a permalink discovered in metadata."
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "byline",
    "_idx": 1,
    "id": "byline",
    "metaFields": [
      "author",
      "article:author",
      "parsely-author",
      "cXenseParse:author"
    ],
    "defaultIdx": 1,
    "defaultKey": "byline",
    "description": "Name of the primary author or the first author detected."
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "site",
    "_idx": 2,
    "id": "siteName",
    "metaFields": [
      "og:site_name",
      "page.content.source",
      "application-name",
      "apple-mobile-web-app-title",
      "twitter:site"
    ],
    "defaultIdx": 2,
    "defaultKey": "site",
    "description": "Website or publication name."
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "date",
    "_idx": 3,
    "_format": "d|YYYY-MM-DDTHH:mm",
    "id": "publishedTime",
    "metaFields": [
      "article:published_time",
      "parsely-pub-date",
      "datePublished",
      "article.published"
    ],
    "defaultIdx": 3,
    "defaultKey": "date",
    "description": "Date/time that the page was initially published.",
    "defaultFormat": "d|YYYY-MM-DDTHH:mm"
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "updated",
    "_idx": 4,
    "_format": "d|YYYY-MM-DDTHH:mm",
    "id": "modifiedTime",
    "metaFields": [
      "article:modified_time",
      "dateModified",
      "dateLastPubbed"
    ],
    "defaultIdx": 4,
    "defaultKey": "updated",
    "description": "Date/time that the page was last modified, if available.",
    "defaultFormat": "d|YYYY-MM-DDTHH:mm"
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "type",
    "_idx": 5,
    "id": "type",
    "metaFields": [
      "og:type",
      "parsely-type",
      "medium",
      "page.content.type"
    ],
    "defaultIdx": 5,
    "defaultKey": "type",
    "description": "Type of publication, eg: \"page\", \"post\", \"article\"."
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "excerpt",
    "_idx": 6,
    "id": "excerpt",
    "metaFields": [
      "description",
      "og:description",
      "twitter:description"
    ],
    "defaultIdx": 6,
    "defaultKey": "excerpt",
    "description": "Often used for subtitles, excerpts, descriptions, and abstracts."
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "twitter",
    "_idx": 7,
    "_format": "s|https://twitter.com/{s}",
    "id": "twitter",
    "metaFields": [
      "twitter:creator",
      "twitter:site"
    ],
    "defaultIdx": 7,
    "defaultKey": "twitter",
    "description": "Twitter/X link for the author or site.",
    "defaultFormat": "s|https://twitter.com/{s}"
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "tags",
    "_idx": 8,
    "_format": "S|{prefix}/{tag}",
    "id": "tags",
    "metaFields": [
      "tags",
      "keywords",
      "article:tag",
      "parsely-tags",
      "news_keywords"
    ],
    "defaultIdx": 8,
    "defaultKey": "tags",
    "description": "Tags and keywords present in the page's metadata.",
    "defaultFormat": "S|{prefix}/{tag}"
  },
  {
    "enabled": false,
    "custom": false,
    "_key": "onion",
    "_idx": 9,
    "id": "onion",
    "metaFields": [
      "onion-location"
    ],
    "defaultIdx": 9,
    "defaultKey": "onion",
    "description": "Link to a mirror of the content on Tor."
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "slurped",
    "_idx": 10,
    "_format": "d|YYYY-MM-DDTHH:mm",
    "id": "slurped",
    "defaultIdx": 10,
    "defaultKey": "slurped",
    "description": "Date/time that the page was accessed by Slurp.",
    "defaultFormat": "d|YYYY-MM-DDTHH:mm"
  },
  {
    "enabled": true,
    "custom": false,
    "_key": "title",
    "_idx": 11,
    "id": "title",
    "metaFields": [
      "og:title",
      "twitter:title"
    ],
    "defaultIdx": 11,
    "defaultKey": "title",
    "description": "Page title as seen in the browser, falling back to the title presented in metadata."
  }
]

inhumantsar · 2024-05-10T22:52:34Z

ah ok, yeah the error message slurp displays says that it got a 403 back from fast company, so I'm guessing that they block non-browsers from accessing their pages. I'll have a look but there's likely not much we can do about that
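
A rough sketch of how a status code like that 403 could be turned into a clearer message for the user. `describeHttpFailure` is a hypothetical helper written for this thread, not Slurp's actual error handling, and the wording of the messages is made up.

```typescript
// Hypothetical helper, not Slurp's actual error handling.
// Maps an HTTP status code to a short, user-facing explanation.
function describeHttpFailure(status: number, host: string): string {
  if (status === 403) {
    // 403 Forbidden: the server understood the request but refuses it,
    // which is what a site blocking non-browser clients typically returns.
    return `${host} returned 403 Forbidden; the site may be blocking non-browser clients.`;
  }
  if (status === 404) {
    return `${host} returned 404 Not Found; check the URL.`;
  }
  if (status >= 500) {
    return `${host} returned ${status}; the server failed, so trying again later may help.`;
  }
  return `${host} returned an unexpected status ${status}.`;
}
```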


inhumantsar · 2024-05-11T22:06:38Z

fast company does seem to block application access entirely, so i've added a validation step to new note creation which will complain if a fast company link is used. did the same for that product site too.

let me know if you find any other sites which just refuse to be slurped!
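
For anyone curious, a validation step of this kind can be sketched as below. This is a simplified illustration, not the code from the commit: `BLOCKED_HOSTS`, `validateUrl`, and the message text are hypothetical names, and the two hostnames are just the sites mentioned in this thread.

```typescript
// Hypothetical blocklist; the hostnames are the two sites from this thread.
const BLOCKED_HOSTS: string[] = ["fastcompany.com", "sparksoftcorp.com"];

// Returns an error message if the URL should be rejected, or null if it looks usable.
function validateUrl(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return "Not a valid URL";
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return "Only http(s) URLs are supported";
  }
  const host = url.hostname.toLowerCase();
  // Match the bare domain as well as any subdomain such as "www.".
  const blocked = BLOCKED_HOSTS.find(
    (b) => host === b || host.endsWith("." + b),
  );
  return blocked ? `${blocked} cannot be slurped` : null;
}
```

Matching on the parsed hostname rather than the raw string avoids false positives when a blocked domain only appears in the path or query of some other site's URL.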

inhumantsar added the bug label May 9, 2024

inhumantsar self-assigned this May 9, 2024

inhumantsar added a commit that referenced this issue May 11, 2024

release: 0.1.11

7ba8a4a

- fix: refactor new note modal, add validation (#21)
- fix: remove broken github link and useless log refresh button (#22)
- fix: avoid saving settings if no changes are detected

inhumantsar added a commit that referenced this issue May 11, 2024

fix: refactor new note modal, add validation (#21)

c6671c5

inhumantsar added a commit that referenced this issue May 11, 2024

release: 0.1.11

f0e1114

- fix: refactor new note modal, add validation (#21)
- fix: remove broken github link and useless log refresh button (#22)
- fix: avoid saving settings if no changes are detected

inhumantsar closed this as completed May 11, 2024
