Complex Websites

sixcious edited this page Nov 10, 2023 · 10 revisions

While Infy was designed to work with as many websites as possible, some sites use techniques that make them complex (or even impossible!) to work with. The following is a list of these types of websites:

  • AJAX Sites and SPAs (Single-page Applications)
  • JavaScript Links
  • Forms
  • Sites that require Puppeteering

Important: All of the strategies presented here require you to use the new AJAX append mode.

AJAX Sites and SPAs (Single-page Applications)

This is the most common type you'll want to understand. Some sites are not designed to be viewed as multiple pages, but rather as AJAX Websites or Single-page Applications (SPAs). These sites typically don't use links for their pagination, which makes it difficult to append pages for them.

So, how can you tell if you're on an AJAX site or SPA? Here are three big giveaways:

  • The entire page (tab) isn't reloading as you navigate from page to page (in other words, only part of the page is being updated)
  • The address bar isn't updating as you navigate from page to page (this sign is not conclusive: many AJAX sites do update the address bar, and a site that doesn't update it may actually be a Form site)
  • The site isn't using links for its pagination/next page button (inspect the HTML content and see if it's actually using any a elements for links)

Example Website

https://www.pixiv.net/tags/オリジナル/illustrations?mode=safe

You can follow this example website more in depth here.

It's difficult to give just one example website because this type encompasses a lot of sites, such as:

  • Many modern and newer websites
  • Manga or Comic Readers
  • Novel Readers

Example Code

Because this category is so broad, no single snippet is representative. However, if you inspect a site's Next button and it isn't a link (an a element), the site might be an SPA, like the following:

<button>Next</button>

As the above example shows, this site is just using a button element and not an a element with an href attribute that points to a URL.
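The inspection described above can be sketched as a small check. This is only an illustration, not part of Infy: the function name classifyNextControl and its return labels are hypothetical, and it only looks at the three properties you would read off the element in DevTools. It also recognizes the javascript: and inline onclick styles covered in the JavaScript Links section below. An onclick handler added separately in a script will not show up as an attribute, so this check can miss it.

```javascript
// Hypothetical helper (not an Infy API): classify a "Next" control
// from its tag name, href attribute, and inline onclick attribute.
function classifyNextControl({ tagName, href = null, onclick = null }) {
  const tag = String(tagName).toLowerCase();
  if (tag !== "a") {
    // e.g. <button>Next</button>: likely an AJAX site, SPA, or form
    return "not-a-link";
  }
  const target = (href || "").trim();
  if (target.toLowerCase().startsWith("javascript:") || onclick) {
    // href="javascript:..." or an inline onclick handler
    return "javascript-link";
  }
  if (target === "" || target === "#") {
    // an a element that doesn't point to a real URL
    return "not-a-link";
  }
  return "normal-link";
}

console.log(classifyNextControl({ tagName: "button" }));
console.log(classifyNextControl({ tagName: "a", href: "/foo/2" }));
```

In a browser console, you could feed it a real element, for example by passing el.tagName, el.getAttribute("href"), and el.getAttribute("onclick") for an element el found with document.querySelector.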

Strategy

Infy should be compatible with many of these sites using the AJAX append mode. When you click the next button on an SPA or AJAX site, the site replaces the elements on the current page with the next page's elements. Try using Infy's Click Element action in conjunction with the new AJAX append mode.

JavaScript Links

Some sites have a next link that doesn't point to a normal URL, but rather to a javascript: protocol URL. Alternatively, the next link may use an onclick event handler, either declared inline in its HTML or added separately in a script. The JavaScript code usually calls a function that navigates you to the next page. This style of link is generally considered bad web design practice and dated by today's standards.

Example Code

Example 1

<a href="javascript:navigateToNextURL('foo', 2);">Next</a>

Example 2

<a href="#" onclick="self.location='/foo/2'; return false;">Next</a>

Strategy

You should be able to make most of these sites work by using the new AJAX append mode. Alternatively, you can try making it work in other ways. For example, try clicking the Next link and see what the final URL is in your browser's address bar; if it's just doing something simple like incrementing a page number, you can try using the Increment URL action.
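To make the "incrementing a page number" idea concrete, here is a minimal sketch. It is not Infy's actual Increment URL logic: the function name incrementLastNumber is hypothetical, and it simply assumes the page number is the last run of digits in the URL you observed in the address bar.

```javascript
// Sketch only (not Infy's implementation): bump the last run of digits in a URL.
function incrementLastNumber(url, step = 1) {
  const matches = [...url.matchAll(/\d+/g)];
  if (matches.length === 0) {
    return null; // nothing to increment
  }
  const last = matches[matches.length - 1];
  const next = String(Number(last[0]) + step);
  // Rebuild the URL around the incremented number
  return url.slice(0, last.index) + next + url.slice(last.index + last[0].length);
}

console.log(incrementLastNumber("https://www.example.com/foo/2"));
// https://www.example.com/foo/3
```

If the number you want to change isn't the last one in the URL, this simple assumption breaks down and you would need to select the number yourself.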

Forms

Some sites use forms as their next buttons, not links. The form usually uses the post method, so it doesn't update the URL in your address bar with the inputs it used to find the next page.

Example Website

https://www.startpage.com

Example Code

<form method="post" action="/search">
  <input type="hidden" name="query" value="foo">
  <input type="hidden" name="page" value="2">
  <button type="submit">Next</button>
</form>

Strategy

You should be able to make most of these sites work by using the Click Element action with the AJAX Iframe append mode. In the UI Window, open the Scripts dialog, check the Mirror the page (Forms/SPAs) setting, and select the By Importing option. This will clone the page (and its form), allowing the AJAX Iframe to start on the same page you're on before it starts clicking the form's submit button. Then proceed as normal to fill in the Click Element (the form's button/input) and AJAX Iframe settings.

Alternate Strategy

As a last resort, you may be able to make it work by using the Increment URL action with a URL that combines the form's action with the inputs the form is expecting. Try using DevTools to inspect the website and look at the HTML of the form. For example, if the form's action is /search and you can see the inputs inside the form's HTML are query and page, you could construct a URL like https://www.example.com/search?query=foo&page=2 and have it keep incrementing the page number. This may not work if the website requires the form to be submitted via the post method, but it is worth a try.
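The URL construction described above can be sketched as follows. The function name buildFormUrl is hypothetical, and the origin, action, and input values are the ones you would copy by hand from the form's HTML in DevTools. It uses the standard URL and URLSearchParams APIs, which also take care of encoding the values.

```javascript
// Sketch: combine a form's action and its inputs into a GET-style URL.
function buildFormUrl(origin, action, inputs) {
  const url = new URL(action, origin); // resolves "/search" against the site origin
  for (const [name, value] of Object.entries(inputs)) {
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}

console.log(buildFormUrl("https://www.example.com", "/search", { query: "foo", page: 2 }));
// https://www.example.com/search?query=foo&page=2
```

Whether the site accepts the resulting URL still depends on whether its server answers get requests for that form, as noted above.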

Sites that require Puppeteering

This is technically a sub-category of SPAs, but warrants its own category as it requires some special handling. Some sites are not only SPAs, but hide their page content in such a way that requires some form of manual clicking (puppeteering) to load the content you want to load. These sites require the AJAX Iframe append mode to mirror the page, as well as a puppeteering script.

For example, consider sites that load their page content (the part you want infinite scrolling on) in:

  • Dialogs or Popups
  • Collapsible elements (like Accordions)

When a site does this, Infy may not be able to see the content in the AJAX Iframe initially. The AJAX Iframe then needs some Puppeteering that repeats the clicks you performed, so that it reaches the same state as the page you're currently viewing. Luckily, there's an option that does just that in Infy's Scripts dialog.

Example Website

There's an example website in the InfyScroll Database you can look at. On this specific example site, the content we want infinite scrolling on is in its Reviews popup, which is hidden inside an accordion. (Note that this site may no longer be working by the time you read this.)

Strategy

First, make sure you are using the AJAX Iframe append mode. Then, open the Scripts dialog (bottom right corner) and:

  1. Check the Mirror the page (Forms/SPAs) setting and select the By Puppeteering option. You can then write a puppeteer-like script that contains the elements to click on.
  2. You'll almost always want to check the Watch for changes on the page to help enable late activation (AJAX/SPA) setting as well.

Currently, Infy only supports scripts that click elements, and the paths must be CSS Selectors (XPath is not allowed here). Each line in the Puppeteer script must therefore look something like this:

await page.click("<selector>");

Here's an example script for the above example website that clicks on its reviews accordion and then clicks a button opened by the accordion that opens the reviews in a dialog (a more complex case).

await page.click("[data-test='reviewsAccordionClick'] > summary");
await page.click("button[data-test='more-reviews']");
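If you want to sanity-check a script against the one-click-per-line format described above before pasting it into the Scripts dialog, a rough checker could look like this. It is a hypothetical sketch, not Infy's own parser: the regular expression and the function name extractClickSelectors are assumptions, and the leading-slash test is only a crude heuristic for spotting XPath.

```javascript
// Hypothetical checker (not Infy's parser): accept only lines of the form
//   await page.click("<selector>");
const CLICK_LINE = /^\s*await\s+page\.click\((["'])(.+)\1\)\s*;?\s*$/;

function extractClickSelectors(script) {
  return script
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "")
    .map((line, index) => {
      const match = CLICK_LINE.exec(line);
      if (!match) {
        throw new Error(`Statement ${index + 1} is not a page.click call: ${line}`);
      }
      const selector = match[2];
      if (selector.startsWith("/")) {
        // Crude heuristic: CSS selectors never start with "/", XPath often does
        throw new Error(`Statement ${index + 1} looks like XPath: ${selector}`);
      }
      return selector;
    });
}

const exampleScript = [
  `await page.click("[data-test='reviewsAccordionClick'] > summary");`,
  `await page.click("button[data-test='more-reviews']");`,
].join("\n");

console.log(extractClickSelectors(exampleScript));
```

Each returned selector can then be tried in the DevTools console with document.querySelector to confirm it matches the element you intend to click.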

If puppeteering isn't working, you can try using the AJAX Native append mode as a last resort.