Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add character encoding detection and decoding logic #206

Merged
merged 10 commits into from Feb 15, 2024

Conversation

cm-dyoshikawa
Copy link
Contributor

@cm-dyoshikawa cm-dyoshikawa commented Feb 2, 2024

Resolve #199

  • implement
  • Add test cases

@jshemas
Copy link
Owner

jshemas commented Feb 3, 2024

Looks like this causes a bunch of unit tests to fail. Can you look into that?

@dyoshikawa
Copy link
Contributor

@jshemas Thank you for your comment. Sure, I'm looking into those.

@cm-dyoshikawa cm-dyoshikawa marked this pull request as ready for review February 3, 2024 03:05
@cm-dyoshikawa
Copy link
Contributor Author

cm-dyoshikawa commented Feb 3, 2024

@jshemas The result of npm run mocha:unit has now become successful.

@jshemas
Copy link
Owner

jshemas commented Feb 7, 2024

Hello, can also look into why the integration tests are failing? npm run mocha:int (You can just focus on the failing tests in encoding.spec.ts )

@cm-dyoshikawa
Copy link
Contributor Author

Hello, can also look into why the integration tests are failing? npm run mocha:int (You can just focus on the failing tests in encoding.spec.ts )

Sorry, I worked on unit tests only. I'll also check on that.

@cm-dyoshikawa
Copy link
Contributor Author

@jshemas

I have confirmed that the following test execution command passes all tests:

npx ts-mocha --recursive \"./tests/integration/encoding.spec.ts\" --timeout 10000

The errors that were occurring were due to files intended to be in euc-jp or Shift_JIS being encoded in utf-8 on openGraphScraperPages.

I have converted these files to the expected character encoding and temporarily hosted them on my github.io for testing purposes, so please check it out.

I have created a separate Pull Request regarding this issue.

Once it is merged into openGraphScraperPages, I will make further adjustments to this test.

@cm-dyoshikawa
Copy link
Contributor Author

@jshemas

https://github.com/jshemas/openGraphScraper/actions/runs/7810736250/job/21348174777

  109 passing (19s)
  7 pending
  1 failing

  1) fetch
       setting a timeout:

      AssertionError: expected 'Page not found' to deeply equal 'The operation was aborted due to time…'
      + expected - actual

      -Page not found
      +The operation was aborted due to timeout
      
      at /home/runner/work/openGraphScraper/openGraphScraper/tests/integration/fetch.spec.ts:48:33

Using response.clone() seemed to cause this test to fail.

Therefore, I made changes to avoid using clone() without altering the behavior. This resolved the error.

@cm-dyoshikawa
Copy link
Contributor Author

@jshemas I would be happy if you could rerun the test. Sorry to bother you while you're busy.

@jshemas
Copy link
Owner

jshemas commented Feb 15, 2024

Thanks for fixing this issue! PR looks good to me.

@jshemas jshemas merged commit c487da6 into jshemas:master Feb 15, 2024
1 of 3 checks passed
@jshemas
Copy link
Owner

jshemas commented Feb 15, 2024

Changes are live in open-graph-scraper@6.4.0

@cm-dyoshikawa cm-dyoshikawa deleted the charset-issue branch February 15, 2024 04:44
@cm-dyoshikawa
Copy link
Contributor Author

@jshemas Thank you for checking and merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fail to detect charset from certain ShiftJIS page
3 participants