Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix browsers blocking GET requests on files/html due CORS #94

Merged
merged 17 commits into from
Mar 13, 2022

Conversation

fabiodrg
Copy link
Collaborator

@fabiodrg fabiodrg commented Oct 30, 2021

  • Fixes the development environment issue where unit tests would not work because of CORS when fetching HTML files dynamically. Closes CORS breaks existing unit testing approach #93
  • Adjust timeouts on some unit tests
  • Fixes table validation in Datatable extractor ( a bit off-topic, but...)
  • Attempts to extract the last row if its a footer. For now, only supports rows with an explicit Sigarra's class that indicates the row shows a total value. By removing the rows from the body, such rows are not considered for sorting/filtering, which I think is a nice touch
    image
  • Ensures the DOM remains untouched if the DataTable fails by cloning the original <table> node and making all pre-processing and DataTable init in the cloned version. Only when there is a guarantee the table is valid, the DOM is updated.
  • Fixes loading GIF not showing in infinite scroll after first page loading
  • Enhances how new pages data are inserted in the table in infinite scroll: rather than destroying the table, it uses the existing APIs to add the new fetched items
  • Disables data table and infinite scroll in pages where we know in advance they will fail and could conflict with other extractors: bills, time table and exams

The updatejQueryContext is updated to prepend a localhost:3000 address
on all URL requests. Besides, all extractors requeted paths are updated.
Before, they started on /test/pages/<...>. Since this is only used for
HTML pages under /test/pages/, I think we could remove that part. The
server can actually have the root on that directory, recuding the length
of paths in the code
@fabiodrg fabiodrg changed the base branch from master to dev October 30, 2021 00:53
The previous attempt was launching the web server programatically and
then using Node.js process APIs to launch a browser on `src/tests.html`
file. However, I realized that http-server CLI has the option to open a
file, thus a single command can launch the webserver and open the
default browser with the main tests page. This probably already works
cross-platform, while my attempt would require special handling in
Linux/Windows/Mac I'am afraid... oh well, sounds like a good approach.
One of the tests was failling due timeouts. Thus, I slightly increased
the timeout for that particular test to 5 seconds
The previous heuristic was to get all <tr> of a table and if the length
is >= 2, then it is valid. The reasoning was one <tr> is for the header,
thus we need two extra <tr> that should be the table body.

Since there is a method that gathers the table header rows, the new
approach is smarter because it does not assume a single header row.
Computing the diff <all rows> - <header rows> gives the data/body rows,
which then is used for validation
@fabiodrg fabiodrg marked this pull request as ready for review October 30, 2021 11:16
@fabiodrg fabiodrg marked this pull request as draft November 2, 2021 09:26
@fabiodrg
Copy link
Collaborator Author

fabiodrg commented Nov 2, 2021

I am converting this to a draft because I detected a potential conflict with datatables and other extractors. For instance, the exam extractor. I will look into it later today. I will probably push into this PR because I did some fixes in datatables here, soo.. just preventing git conflicts

@fabiodrg fabiodrg self-assigned this Nov 2, 2021
So, some of the Sigarra code uses tables for layout. Very old technique,
but still there. For instance, in the degree plan, it is possible to see
multiple nested tables, and only the last one in the tree shows the UCs,
ECTCs, etc.
By extracting some footers, for example showing a total value or
similar, it would be possible to support more data tables. The heuristic
is fairly simple: if the last table has unmatching columns with the
table body, then wrap around it <tfoot>.
I noticed that sometimes DataTable starts modifying the DOM and fails
later in process. Which means, the DOM suffered modifications, e.g.
filter/buttons, sorting buttons, etc (which funny enough can work to
some extent in some columns, but not all..). In such cases, the proposed
change tries to aggressively restore the DOM as if there was no attempt
to apply DataTable. Its achieved by cloning the table before any
pre-processing, and then restoring the DOM if needed.
@fabiodrg
Copy link
Collaborator Author

fabiodrg commented Nov 7, 2021

Issue went away. Indeed applying DataTable was clashing with the bills extractor. But since DataTable fails, after performing the cleaning, the bills extractor still works. Nonetheless, I think we should review all extractors contexts and block DataTables if (a) data tables don't make sense on some specific page (i.e. exams page) or if there is any chance of conflicts (e.g. bills, which in fact we know in advance it will fail anyway). Will do that tomorrow

I noticed a few issues in some tables, so I reverted the heuristic used.
Instead, it relies in a class used by Sigarra.
failure

Before it was like:
- Clone original table DOM
- Try to apply DataTable
- If fails, try to restaure the original DOM, cloned in first step

The recovery is not that easy because the DataTable can fail at very
different stages. Perhaps, it just added the wrapper containers for the
buttons (csv, excel, ...) and search input. But I found cases where it
goes further and inserts sorting event listeners and the buttons in some
columns. There are also scenarios where it fails in very early stages,
adding just a few classes in 'class' attribute. Thus, it becomes
difficult to ensure a clean DOM like the original one.

With this commit, the procedure is:
- Clone original table
- Try to pre-process and apply DataTable in the cloned version
- If OK, then replace the original table in the DOM with the new DOM
  containing filter/search components, the DataTable itself, etc.

Advantages for the new approach are:
- 100% clean DOM upon failure
- More performant. The DOM in the webpage is manipulated once per table
  on success. In case of failure, theres no change at all saving
rendering/parsing cycles.
changes

- Main issue was infinity scroll querying DOM elements at the
  constructor before the DataTable extractor completing its job.
Therefore, the DOM would modify and the refereces in Infinite Scroll
would be invalid
- Another issue was the loading gif missing after the first load. The
  reason is when the DataTable is updated with new data, the DOM is
reconstructured and the <img> node for the GIF is destroied. The fix is
to attach this element in the DOM everytime.
- This also changes how data is added to an existing table. There is no
  need to destroy the previous table. The API allows to pass the new
elements and then redraw the table. Besides, you can pass HTML elements
as input (i.e. <tr>)
In some pages we know in advance that data table or infinite scroll simply wont work
and could conflict with other extractors. The manifest is updated to
exlude the said extractors in specific pages: bills, exams and time
table
@fabiodrg fabiodrg marked this pull request as ready for review November 7, 2021 23:36
invoked

The '$' selector was creating a new DOM object everytime is invoked.
Imagine an extract that uses '$' to select some DOM objects and modify
the dom. A subsequent use of '$' would create a new DOM from the
original HTML file and previous changes would be lost.
@fabiodrg fabiodrg merged commit 34c77e0 into dev Mar 13, 2022
@fabiodrg fabiodrg deleted the fix/cors-unit-test branch March 13, 2022 20:12
@fabiodrg fabiodrg added this to Done in v4.2 Mar 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
v4.2
  
Done
Development

Successfully merging this pull request may close these issues.

CORS breaks existing unit testing approach
2 participants