Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Unit tests #126

Open
wants to merge 33 commits into
base: master
Choose a base branch
from
Open

[WIP] Unit tests #126

wants to merge 33 commits into from

Conversation

JKingweb
Copy link

@JKingweb JKingweb commented Jul 9, 2023

This patch constitutes a series of generic unit tests for microformat parsers, using the existing test format. While it is not possible to segregate certain parsing features entirely (notably implied property parsing), I've attempted to keep to the following design philosophy:

  • each file tests a single feature or facet of parsing, as much as possible
  • types of microformats and names of properties avoid known vocabularies to emphasize parsing is generic
  • no two microformats in a given file produce the same output, to make output comparsison easier
  • example.test is used instead of example.com, as the former is guaranteed never to be a real domain
  • tests try to be as thorough as practical
  • standard features which have experimental alternatives are tested in isolated files so that they may be skipped easily
  • under- or un-specified behaviour is tested in isolated files prefixed with tentative- so that they may be easily skipped; these files reflect common behaviour among established implementations where there is majority agreement
  • leave existing tests alone to validate that these tests do not contradict existing tests

At present the test suite offers only partial coverage. The to-do list is thus:

  • Microformat type and property name splitting/matching
  • Basic property parsing
  • ID parsing
  • textContent parsing
  • Value class pattern parsing
  • Value-title parsing
  • Value class date parsing
  • Implied property parsing
  • Nested microformat parsing
  • Link relation parsing
  • Template handling
  • Foreign content handling
  • URL resolution
  • URL normalization
  • Backcompat processing
    • vocabularies
      • adr
      • vcard
      • vevent
      • hfeed
      • hentry
      • geo
      • hproduct
      • hrecipe
      • hresume
      • hreview
      • hreview-aggregate
      • hnews (?)
    • includes
    • multi-roots
    • mixed v2/backcompat
    • mixed v2 implied/backcompat
  • Experimental features
    • lang parsing
    • alternate textContent parsing
    • Universal date parsing
    • Universal implied date
    • Implied time zone
    • Ignoring Tailwind types
    • srcset parsing

I am posting this while still far from finished to gather feedback early. I expect it will take me quite a while to do everything, but what's here can already be useful to implementers.

There is much divergent and poorly-documented on display here
As suggested by gRegor and Tantek Çelik, this avoids any implication we
are testing real vocabularies, past, present, or future
Copy link
Member

@tantek tantek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spot checked all the files and what I checked looked good. Lots here and at this point I think we should land this and see how existing parser implementations do, and then investigate any failures in detail to verify that any failing test is itself valid.

@JKingweb
Copy link
Author

I have no objections to merging the work thus far as a partial test suite, though I should probably write up some draft documentation (more or less what's detailed in the cover of this request) first. Is there a preferred format? Plain text, markdown, HTML, something else?

@Zegnat
Copy link
Member

Zegnat commented Nov 5, 2023

Personally I think a markdown file containing basically what is in the PR description would be nice. Linked to from README, I would say, and maybe even at the root level of the repository. But there does not seem to be any precedent.

There are some changelog files, but honestly I have never read them, in part because they are HTML files. That makes it basically a requirement to clone the repo and open it with a browser to read comfortably. That is why I would prefer markdown, GitHub has native support for it, and it will be easier to refer to it inside the repo.

@gRegorLove
Copy link
Member

I've been using https://keepachangelog.com/ format recently on some projects and liking it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants