Fix browsers blocking GET requests on files/html due CORS #94

fabiodrg · 2021-10-30T00:53:18Z

Fixes the development environment issue where unit tests would not work because of CORS when fetching HTML files dynamically. Closes CORS breaks existing unit testing approach #93
Adjust timeouts on some unit tests
Fixes table validation in Datatable extractor ( a bit off-topic, but...)
Attempts to extract the last row if its a footer. For now, only supports rows with an explicit Sigarra's class that indicates the row shows a total value. By removing the rows from the body, such rows are not considered for sorting/filtering, which I think is a nice touch
Ensures the DOM remains untouched if the DataTable fails by cloning the original <table> node and making all pre-processing and DataTable init in the cloned version. Only when there is a guarantee the table is valid, the DOM is updated.
Fixes loading GIF not showing in infinite scroll after first page loading
Enhances how new pages data are inserted in the table in infinite scroll: rather than destroying the table, it uses the existing APIs to add the new fetched items
Disables data table and infinite scroll in pages where we know in advance they will fail and could conflict with other extractors: bills, time table and exams

The updatejQueryContext is updated to prepend a localhost:3000 address on all URL requests. Besides, all extractors requeted paths are updated. Before, they started on /test/pages/<...>. Since this is only used for HTML pages under /test/pages/, I think we could remove that part. The server can actually have the root on that directory, recuding the length of paths in the code

The previous attempt was launching the web server programatically and then using Node.js process APIs to launch a browser on `src/tests.html` file. However, I realized that http-server CLI has the option to open a file, thus a single command can launch the webserver and open the default browser with the main tests page. This probably already works cross-platform, while my attempt would require special handling in Linux/Windows/Mac I'am afraid... oh well, sounds like a good approach.

One of the tests was failling due timeouts. Thus, I slightly increased the timeout for that particular test to 5 seconds

incorrect

The previous heuristic was to get all <tr> of a table and if the length is >= 2, then it is valid. The reasoning was one <tr> is for the header, thus we need two extra <tr> that should be the table body. Since there is a method that gathers the table header rows, the new approach is smarter because it does not assume a single header row. Computing the diff <all rows> - <header rows> gives the data/body rows, which then is used for validation

fabiodrg · 2021-11-02T09:27:51Z

I am converting this to a draft because I detected a potential conflict with datatables and other extractors. For instance, the exam extractor. I will look into it later today. I will probably push into this PR because I did some fixes in datatables here, soo.. just preventing git conflicts

So, some of the Sigarra code uses tables for layout. Very old technique, but still there. For instance, in the degree plan, it is possible to see multiple nested tables, and only the last one in the tree shows the UCs, ECTCs, etc.

By extracting some footers, for example showing a total value or similar, it would be possible to support more data tables. The heuristic is fairly simple: if the last table has unmatching columns with the table body, then wrap around it <tfoot>.

I noticed that sometimes DataTable starts modifying the DOM and fails later in process. Which means, the DOM suffered modifications, e.g. filter/buttons, sorting buttons, etc (which funny enough can work to some extent in some columns, but not all..). In such cases, the proposed change tries to aggressively restore the DOM as if there was no attempt to apply DataTable. Its achieved by cloning the table before any pre-processing, and then restoring the DOM if needed.

fabiodrg · 2021-11-07T00:13:58Z

Issue went away. Indeed applying DataTable was clashing with the bills extractor. But since DataTable fails, after performing the cleaning, the bills extractor still works. Nonetheless, I think we should review all extractors contexts and block DataTables if (a) data tables don't make sense on some specific page (i.e. exams page) or if there is any chance of conflicts (e.g. bills, which in fact we know in advance it will fail anyway). Will do that tomorrow

I noticed a few issues in some tables, so I reverted the heuristic used. Instead, it relies in a class used by Sigarra.

failure Before it was like: - Clone original table DOM - Try to apply DataTable - If fails, try to restaure the original DOM, cloned in first step The recovery is not that easy because the DataTable can fail at very different stages. Perhaps, it just added the wrapper containers for the buttons (csv, excel, ...) and search input. But I found cases where it goes further and inserts sorting event listeners and the buttons in some columns. There are also scenarios where it fails in very early stages, adding just a few classes in 'class' attribute. Thus, it becomes difficult to ensure a clean DOM like the original one. With this commit, the procedure is: - Clone original table - Try to pre-process and apply DataTable in the cloned version - If OK, then replace the original table in the DOM with the new DOM containing filter/search components, the DataTable itself, etc. Advantages for the new approach are: - 100% clean DOM upon failure - More performant. The DOM in the webpage is manipulated once per table on success. In case of failure, theres no change at all saving rendering/parsing cycles.

changes - Main issue was infinity scroll querying DOM elements at the constructor before the DataTable extractor completing its job. Therefore, the DOM would modify and the refereces in Infinite Scroll would be invalid - Another issue was the loading gif missing after the first load. The reason is when the DataTable is updated with new data, the DOM is reconstructured and the <img> node for the GIF is destroied. The fix is to attach this element in the DOM everytime. - This also changes how data is added to an existing table. There is no need to destroy the previous table. The API allows to pass the new elements and then redraw the table. Besides, you can pass HTML elements as input (i.e. <tr>)

In some pages we know in advance that data table or infinite scroll simply wont work and could conflict with other extractors. The manifest is updated to exlude the said extractors in specific pages: bills, exams and time table

invoked The '$' selector was creating a new DOM object everytime is invoked. Imagine an extract that uses '$' to select some DOM objects and modify the dom. A subsequent use of '$' would create a new DOM from the original HTML file and previous changes would be lost.

fabiodrg added 3 commits October 30, 2021 01:45

chore: Add 'http-server' to create a local server for unit testing

a7579de

See #93

wip: Initial experiment on launching local web server for unit testing

653b4b8

fabiodrg changed the base branch from master to dev October 30, 2021 00:53

fabiodrg added 4 commits October 30, 2021 11:43

test: Increase timeout for 'grades' unit tests

8c67882

One of the tests was failling due timeouts. Thus, I slightly increased the timeout for that particular test to 5 seconds

fix: The validation test for single row tables in datatables was

96e2df3

incorrect

fabiodrg marked this pull request as ready for review October 30, 2021 11:16

fabiodrg marked this pull request as draft November 2, 2021 09:26

fabiodrg self-assigned this Nov 2, 2021

fabiodrg added 4 commits November 6, 2021 22:12

feat: Perform cleanup if DataTable fails

991bfae

feat: Try to extract footers in tables

4ee40dd

By extracting some footers, for example showing a total value or similar, it would be possible to support more data tables. The heuristic is fairly simple: if the last table has unmatching columns with the table body, then wrap around it <tfoot>.

fabiodrg added 4 commits November 7, 2021 12:31

fix: Make footer extraction more conservative

d7a3a18

I noticed a few issues in some tables, so I reverted the heuristic used. Instead, it relies in a class used by Sigarra.

fabiodrg marked this pull request as ready for review November 7, 2021 23:36

fabiodrg requested a review from msramalho November 7, 2021 23:52

msramalho approved these changes Dec 2, 2021

View reviewed changes

fabiodrg added 2 commits March 13, 2022 19:41

test: Complete unit tests for datatable extractor

1a78fd0

fabiodrg merged commit 34c77e0 into dev Mar 13, 2022

fabiodrg deleted the fix/cors-unit-test branch March 13, 2022 20:12

fabiodrg added this to Done in v4.2 Mar 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix browsers blocking GET requests on files/html due CORS #94

Fix browsers blocking GET requests on files/html due CORS #94

fabiodrg commented Oct 30, 2021 •

edited

fabiodrg commented Nov 2, 2021

fabiodrg commented Nov 7, 2021 •

edited

Fix browsers blocking GET requests on files/html due CORS #94

Fix browsers blocking GET requests on files/html due CORS #94

Conversation

fabiodrg commented Oct 30, 2021 • edited

fabiodrg commented Nov 2, 2021

fabiodrg commented Nov 7, 2021 • edited

fabiodrg commented Oct 30, 2021 •

edited

fabiodrg commented Nov 7, 2021 •

edited