Extract content from GitHub repos. #306

Merged: 5 commits, Mar 14, 2019

Conversation

benubois
Contributor

@benubois benubois commented Mar 4, 2019

This helps Mercury find README content for GitHub repos. Currently, running Mercury on a repository returns this:

You can’t perform that action at this time.

You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.

Thanks!

@postlight-org
Collaborator

🤖 Automated Parsing Preview 🤖

Commit: Extract content from GitHub repos.

Screenshot of fixture (this embed should work after repo is public)

Original Article | HTML Fixture | Parsed Content Preview

Parsed JSON
{
  "title": "steventroughtonsmith/marzipanify",
  "content": "<div><div><article class=\"markdown-body entry-content\">\n<p>marzipanify is an unsupported commandline tool to take an existing iOS Simulator binary (with minimum deployment target of iOS 12.0) and statically convert it and its embedded libraries &amp; frameworks to run on macOS 10.14&apos;s UIKit runtime (Marzipan).</p>\n<p>This isn&apos;t a tool to automatically port your iOS app to the Mac &#x2014; moreso something to get you up and running quickly.</p>\n<p>As an iOS Simulator app links against the iOS Simulator version of UIKit, it won&apos;t contain Marzipan-specific APIs like menu &amp; window toolbar support. It&apos;s up to the user to know how to class-dump UIKitCore from /System/iOSSupport/System/Library/PrivateFrameworks and check for the macOS-specific UIKit APIs at runtime so the app can be a good Mac citizen.</p>\n<p>N.B. You will still need all the relevant Marzipan-related enabler steps (like disabling SIP &amp; AMFI) before a converted app will run with your signature.</p>\n<h2><a id=\"user-content-usage\" class=\"anchor\" href=\"https://github.com/steventroughtonsmith/marzipanify#usage\"><svg class=\"octicon octicon-link\" width=\"16\" height=\"16\"><path/></svg></a>Usage</h2>\n<p><code>marzipanify MyApp.app|MyFramework.framework|MyBinary</code></p>\n<h2><a id=\"user-content-screenshot\" class=\"anchor\" href=\"https://github.com/steventroughtonsmith/marzipanify#screenshot\"><svg class=\"octicon octicon-link\" width=\"16\" height=\"16\"><path/></svg></a>Screenshot</h2>\n<p><a href=\"https://camo.githubusercontent.com/3b4a3c8b44a950b670dcbfb1e6eb86a57f9d5e6d/68747470733a2f2f686363646174612e73332e616d617a6f6e6177732e636f6d2f67685f6d61727a6970616e6966792e6a7067\"><img src=\"https://hccdata.s3.amazonaws.com/gh_marzipanify.jpg\" alt=\"screenshot\"></a></p>\n</article></div></div>",
  "author": null,
  "date_published": null,
  "lead_image_url": "https://avatars0.githubusercontent.com/u/45212?s=400&v=4",
  "dek": null,
  "next_page_url": null,
  "url": "https://github.com/steventroughtonsmith/marzipanify",
  "domain": "github.com",
  "word_count": 34,
  "direction": "ltr",
  "total_pages": 1,
  "rendered_pages": 1
}

null fields

  • author

  • date_published

  • dek

  • next_page_url

✅ All tests passed

@adampash
Contributor

adampash commented Mar 7, 2019

@benubois Do you think we could get author out of the page?

It also seems like we could use span[itemprop="dateModified"] relative-time for the date_published, and I think the span[itemprop="about"] would work for the dek (functions in the same basic manner)?
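
For illustration, here is a minimal sketch of how the two suggested selectors might look in the shape Mercury's custom extractors use, where each field lists CSS selectors and a `[selector, attribute]` pair reads an attribute rather than the element text. The object name `GithubRepoSketch`, the helper `describeSelector`, and the choice of the `datetime` attribute are assumptions made for this example, not code taken from this pull request.

```javascript
// Hypothetical sketch of the two fields discussed above. The object name and
// the `datetime` attribute are assumptions, not code from this pull request.
const GithubRepoSketch = {
  domain: 'github.com',

  date_published: {
    // A [selector, attribute] pair: read the attribute instead of the text.
    selectors: [['span[itemprop="dateModified"] relative-time', 'datetime']],
  },

  dek: {
    // A plain string selector: use the element's text.
    selectors: ['span[itemprop="about"]'],
  },
};

// Helper for this example only (not part of Mercury): normalize a selector
// entry into { selector, attr } so both forms can be inspected uniformly.
function describeSelector(entry) {
  return Array.isArray(entry)
    ? { selector: entry[0], attr: entry[1] }
    : { selector: entry, attr: null };
}

console.log(GithubRepoSketch.date_published.selectors.map(describeSelector));
console.log(GithubRepoSketch.dek.selectors.map(describeSelector));
```

Reading the timestamp from an attribute rather than the visible text avoids parsing a relative string such as "4 days ago".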

@benubois
Contributor Author

benubois commented Mar 7, 2019

Do you think we could get author out of the page?

I actually have a question about this. The generic extractor does get something for author. The docs mention that "it will fall back to its default generic extractor" if no matches are found from the selectors array, but that does not seem to be the case. Is there any way to force it to use the generic extractor for author rather than duplicating the logic in the custom extractor?

It also seems like we could use span[itemprop="dateModified"] relative-time for the date_published, and I think the span[itemprop="about"] would work for the dek (functions in the same basic manner)?

Sounds good! Done.

@adampash
Contributor

adampash commented Mar 7, 2019

The docs mention that "it will fall back to its default generic extractor" if no matches are found from the selectors array, but that does not seem to be the case.

So Mercury will fall back to the default generic extractor when it runs outside of the tests, but the tests pass the fallback option to ensure we're testing only the custom parser as it's written and not the fallbacks. Does that make sense?
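
The behavior described here can be sketched roughly as follows. This is an illustration only, not Mercury's actual implementation: `extractField`, `customSelect`, and `genericExtract` are hypothetical names, and the stand-in functions simply return fixed values taken from the outputs in this thread.

```javascript
// Illustrative only: not Mercury's real code. All names here are hypothetical.
// Try the custom extractor first; use the generic one only if fallback is on.
function extractField(field, { customSelect, genericExtract, fallback = true }) {
  const value = customSelect(field);
  if (value != null) return value;
  return fallback ? genericExtract(field) : null;
}

// Stand-ins: the custom extractor knows the title but has no author selector;
// the generic heuristics can find an author.
const customSelect = field =>
  field === 'title' ? 'steventroughtonsmith/marzipanify' : null;
const genericExtract = field =>
  field === 'author' ? 'steventroughtonsmith' : null;

// Normal run: author comes from the generic extractor.
console.log(extractField('author', { customSelect, genericExtract }));
// prints "steventroughtonsmith"

// Test run with fallback disabled: only the custom extractor is consulted.
console.log(extractField('author', { customSelect, genericExtract, fallback: false }));
// prints null
```

This matches what the thread shows: the parsing preview reports `author` as null, while the CLI output includes it.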

@benubois
Contributor Author

benubois commented Mar 7, 2019

Yes! But then does the custom extractor need anything for author? The output from the CLI does indeed include it:

{
  "title": "steventroughtonsmith/marzipanify",
  "content": "<div><div><article class=\"markdown-body entry-content\">\n<p>marzipanify is an unsupported commandline tool to take an existing iOS Simulator binary (with minimum deployment target of iOS 12.0) and statically convert it and its embedded libraries &amp; frameworks to run on macOS 10.14&apos;s UIKit runtime (Marzipan).</p>\n<p>This isn&apos;t a tool to automatically port your iOS app to the Mac &#x2014; moreso something to get you up and running quickly.</p>\n<p>As an iOS Simulator app links against the iOS Simulator version of UIKit, it won&apos;t contain Marzipan-specific APIs like menu &amp; window toolbar support. It&apos;s up to the user to know how to class-dump UIKitCore from /System/iOSSupport/System/Library/PrivateFrameworks and check for the macOS-specific UIKit APIs at runtime so the app can be a good Mac citizen.</p>\n<p>N.B. You will still need all the relevant Marzipan-related enabler steps (like disabling SIP &amp; AMFI) before a converted app will run with your signature.</p>\n<h2><a id=\"user-content-usage\" class=\"anchor\" href=\"https://github.com/steventroughtonsmith/marzipanify#usage\"><svg class=\"octicon octicon-link\" width=\"16\" height=\"16\"><path/></svg></a>Usage</h2>\n<p><code>marzipanify MyApp.app|MyFramework.framework|MyBinary</code></p>\n<h2><a id=\"user-content-screenshot\" class=\"anchor\" href=\"https://github.com/steventroughtonsmith/marzipanify#screenshot\"><svg class=\"octicon octicon-link\" width=\"16\" height=\"16\"><path/></svg></a>Screenshot</h2>\n<p><a href=\"https://camo.githubusercontent.com/3b4a3c8b44a950b670dcbfb1e6eb86a57f9d5e6d/68747470733a2f2f686363646174612e73332e616d617a6f6e6177732e636f6d2f67685f6d61727a6970616e6966792e6a7067\"><img src=\"https://hccdata.s3.amazonaws.com/gh_marzipanify.jpg\" alt=\"screenshot\"></a></p>\n</article></div></div><hr><h4>Page 2</h4><div><p id=\"ajax-error-message\" class=\"ajax-error-message\"> <svg class=\"octicon octicon-alert\" width=\"16\" height=\"16\"><path/></svg> 
You can&#x2019;t perform that action at this time. </p><p class=\"js-stale-session-flash\"> <svg class=\"octicon octicon-alert\" width=\"16\" height=\"16\"><path/></svg> <span class=\"signed-in-tab-flash\">You signed in with another tab or window. <a href=\"https://github.com/steventroughtonsmith/marzipanify/pull/2\">Reload</a> to refresh your session.</span> <span class=\"signed-out-tab-flash\">You signed out in another tab or window. <a href=\"https://github.com/steventroughtonsmith/marzipanify/pull/2\">Reload</a> to refresh your session.</span> </p><template id=\"site-details-dialog\"> <details class=\"details-reset details-overlay details-overlay-dark lh-default text-gray-dark\"> <summary></summary> <details-dialog class=\"Box Box--overlay d-flex flex-column anim-fade-in fast\"> </details-dialog> </details>\n</template></div>",
  "author": "steventroughtonsmith",
  "date_published": "2019-03-04T12:37:07.000Z",
  "lead_image_url": "https://avatars0.githubusercontent.com/u/45212?s=400&v=4",
  "dek": null,
  "next_page_url": "https://github.com/steventroughtonsmith/marzipanify/pull/2",
  "url": "https://github.com/steventroughtonsmith/marzipanify",
  "domain": "github.com",
  "excerpt": "Convert an iOS Simulator app bundle to an iOSMac (Marzipan) one (Unsupported & undocumented, WIP) - steventroughtonsmith/marzipanify",
  "word_count": 181,
  "direction": "ltr",
  "total_pages": 2,
  "pages_rendered": 2
}

@postlight-org
Collaborator

🤖 Automated Parsing Preview 🤖

Commit: Timezone fix.

Screenshot of fixture (this embed should work after repo is public)

Original Article | HTML Fixture | Parsed Content Preview

Parsed JSON
{
  "title": "steventroughtonsmith/marzipanify",
  "content": "<div><div><article class=\"markdown-body entry-content\">\n<p>marzipanify is an unsupported commandline tool to take an existing iOS Simulator binary (with minimum deployment target of iOS 12.0) and statically convert it and its embedded libraries &amp; frameworks to run on macOS 10.14&apos;s UIKit runtime (Marzipan).</p>\n<p>This isn&apos;t a tool to automatically port your iOS app to the Mac &#x2014; moreso something to get you up and running quickly.</p>\n<p>As an iOS Simulator app links against the iOS Simulator version of UIKit, it won&apos;t contain Marzipan-specific APIs like menu &amp; window toolbar support. It&apos;s up to the user to know how to class-dump UIKitCore from /System/iOSSupport/System/Library/PrivateFrameworks and check for the macOS-specific UIKit APIs at runtime so the app can be a good Mac citizen.</p>\n<p>N.B. You will still need all the relevant Marzipan-related enabler steps (like disabling SIP &amp; AMFI) before a converted app will run with your signature.</p>\n<h2><a id=\"user-content-usage\" class=\"anchor\" href=\"https://github.com/steventroughtonsmith/marzipanify#usage\"><svg class=\"octicon octicon-link\" width=\"16\" height=\"16\"><path/></svg></a>Usage</h2>\n<p><code>marzipanify MyApp.app|MyFramework.framework|MyBinary</code></p>\n<h2><a id=\"user-content-screenshot\" class=\"anchor\" href=\"https://github.com/steventroughtonsmith/marzipanify#screenshot\"><svg class=\"octicon octicon-link\" width=\"16\" height=\"16\"><path/></svg></a>Screenshot</h2>\n<p><a href=\"https://camo.githubusercontent.com/3b4a3c8b44a950b670dcbfb1e6eb86a57f9d5e6d/68747470733a2f2f686363646174612e73332e616d617a6f6e6177732e636f6d2f67685f6d61727a6970616e6966792e6a7067\"><img src=\"https://hccdata.s3.amazonaws.com/gh_marzipanify.jpg\" alt=\"screenshot\"></a></p>\n</article></div></div>",
  "author": null,
  "date_published": "2019-03-04T12:37:07.000Z",
  "lead_image_url": "https://avatars0.githubusercontent.com/u/45212?s=400&v=4",
  "dek": "Convert an iOS Simulator app bundle to an iOSMac (Marzipan) one (Unsupported & undocumented, WIP)",
  "next_page_url": null,
  "url": "https://github.com/steventroughtonsmith/marzipanify",
  "domain": "github.com",
  "word_count": 34,
  "direction": "ltr",
  "total_pages": 1,
  "rendered_pages": 1
}

null fields

  • author

  • next_page_url

✅ All tests passed

@adampash
Contributor

adampash commented Mar 8, 2019

But then does the custom extractor need anything for author?

I like to include it whenever possible just for the sake of redundancy and because custom selectors are a more direct route to the data we want — it's saying "author is here" rather than "use heuristics to find author." I know it might seem a little tedious, but the idea is that it's faster, and if it does eventually fail, it can always fall back to the heuristics.

@postlight-org
Collaborator

🤖 Automated Parsing Preview 🤖

Commit: Merge branch 'master' into readme_extractor

Screenshot of fixture (this embed should work after repo is public)

Original Article | HTML Fixture | Parsed Content Preview

Parsed JSON
{
  "title": "steventroughtonsmith/marzipanify",
  "content": "<div><div><article class=\"markdown-body entry-content\">\n<p>marzipanify is an unsupported commandline tool to take an existing iOS Simulator binary (with minimum deployment target of iOS 12.0) and statically convert it and its embedded libraries &amp; frameworks to run on macOS 10.14&apos;s UIKit runtime (Marzipan).</p>\n<p>This isn&apos;t a tool to automatically port your iOS app to the Mac &#x2014; moreso something to get you up and running quickly.</p>\n<p>As an iOS Simulator app links against the iOS Simulator version of UIKit, it won&apos;t contain Marzipan-specific APIs like menu &amp; window toolbar support. It&apos;s up to the user to know how to class-dump UIKitCore from /System/iOSSupport/System/Library/PrivateFrameworks and check for the macOS-specific UIKit APIs at runtime so the app can be a good Mac citizen.</p>\n<p>N.B. You will still need all the relevant Marzipan-related enabler steps (like disabling SIP &amp; AMFI) before a converted app will run with your signature.</p>\n<h2><a id=\"user-content-usage\" class=\"anchor\" href=\"https://github.com/steventroughtonsmith/marzipanify#usage\"><svg class=\"octicon octicon-link\" width=\"16\" height=\"16\"><path/></svg></a>Usage</h2>\n<p><code>marzipanify MyApp.app|MyFramework.framework|MyBinary</code></p>\n<h2><a id=\"user-content-screenshot\" class=\"anchor\" href=\"https://github.com/steventroughtonsmith/marzipanify#screenshot\"><svg class=\"octicon octicon-link\" width=\"16\" height=\"16\"><path/></svg></a>Screenshot</h2>\n<p><a href=\"https://camo.githubusercontent.com/3b4a3c8b44a950b670dcbfb1e6eb86a57f9d5e6d/68747470733a2f2f686363646174612e73332e616d617a6f6e6177732e636f6d2f67685f6d61727a6970616e6966792e6a7067\"><img src=\"https://hccdata.s3.amazonaws.com/gh_marzipanify.jpg\" alt=\"screenshot\"></a></p>\n</article></div></div>",
  "author": null,
  "date_published": "2019-03-04T12:37:07.000Z",
  "lead_image_url": "https://avatars0.githubusercontent.com/u/45212?s=400&v=4",
  "dek": "Convert an iOS Simulator app bundle to an iOSMac (Marzipan) one (Unsupported & undocumented, WIP)",
  "next_page_url": null,
  "url": "https://github.com/steventroughtonsmith/marzipanify",
  "domain": "github.com",
  "word_count": 34,
  "direction": "ltr",
  "total_pages": 1,
  "rendered_pages": 1
}

null fields

  • author

  • next_page_url

✅ All tests passed

Contributor

@adampash adampash left a comment

Thanks @benubois!

@adampash adampash merged commit a7e4c67 into postlight:master Mar 14, 2019