HtmlExtractors returns more characters than expected #66

evans · 2023-09-07T22:24:13Z

First, thanks for creating a wonderful library! I ran into a bug where HtmlExtractors creates a message with excess characters when HTML is used within a prop. The following script returns more than the expected text: Some good text. <a href="example.com">Learn more</a>.

import { GettextExtractor, HtmlExtractors } from 'gettext-extractor';

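// Extract the content of every element matching the [translated] attribute selector.
const markupExtractor = new GettextExtractor();
markupExtractor
  .createHtmlParser([HtmlExtractors.elementContent('[translated]', {})])
  // The glob matches .js files, so JSX source is fed to the HTML parser.
  .parseFilesGlob('**/*.js', undefined, {});

// Print each extracted message.
markupExtractor.getMessages().forEach((message) => {
  console.log(message.text);
});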

The parsed file:

const Text = ({ children }) => <div>{children}</div>;
const Container = ({ children, secondaryText }) => (
  <div>
    {children}
    {secondaryText}
  </div>
);

const Parent = () => {
  return (
    <Container
      secondaryText={
        <Text translated>
          Some good text. <a href="example.com">Learn more</a>.
        </Text>
      }
      maxlength={25}
    />
  );
};

evans · 2023-09-07T23:03:56Z

Looks like the issue lies in parse5, since it returns a node that includes more child nodes for Text than expected. I'm guessing this sort of usage isn't expected, so if you have advice for mixing this sort of HTML/JSX extraction, I'd love to hear your thoughts!

gettext-extractor/src/html/parser.ts

Line 15 in c5b19e9

let document = parse5.parse(source, {sourceCodeLocationInfo: true});

lukasgeiter · 2023-09-08T13:46:39Z

The extractor is made for HTML, not JSX. I'm also not really sure what you would expect.
If you only want Some good text extracted, why don't you do <Text translated>Some good text</Text> instead?
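
For anyone who still needs to pull messages out of JSX like the file above, one possible workaround is to skip the HTML parser for those files and collect the element content separately. The following is only a minimal sketch under stated assumptions: extractTranslated is a hypothetical helper, not part of gettext-extractor, and it relies on a regular expression rather than a real JSX parser. It assumes the translated attribute is written bare (no value), that translated elements are not nested inside each other, and that the word "translated" does not appear inside other attribute values.

```typescript
// Hypothetical helper (not part of gettext-extractor): returns the inner
// markup of every JSX element that carries a bare `translated` attribute.
// Assumption: translated elements are never nested inside one another.
function extractTranslated(source: string): string[] {
  // <Tag ... translated ...> inner </Tag>; \1 pairs the closing tag with
  // the opening one. [^<>] keeps the match inside a single opening tag.
  const pattern =
    /<([A-Za-z][\w.]*)\b[^<>]*?\btranslated\b[^<>]*>([\s\S]*?)<\/\1>/g;
  const messages: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    // Collapse JSX indentation and line breaks into single spaces.
    messages.push(match[2].replace(/\s+/g, ' ').trim());
  }
  return messages;
}

// The relevant part of the parsed file from the report above.
const sample = `
    <Container
      secondaryText={
        <Text translated>
          Some good text. <a href="example.com">Learn more</a>.
        </Text>
      }
      maxlength={25}
    />
`;

for (const text of extractTranslated(sample)) {
  console.log(text);
}
// prints: Some good text. <a href="example.com">Learn more</a>.
```

The returned strings could then be registered by hand, for example through the extractor's addMessage method, assuming that fits your setup. A regular expression is fragile here, so a proper JSX-aware parser would be the sturdier choice for real projects.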
