Need Help Adding Publisher and Retrieving Keywords: Assistance with CH Class Modification #525

Open
hamz0640 opened this issue May 19, 2024 · 1 comment
@hamz0640

Question

Hey there,

I'm trying to add the publisher aargauerzeitung.ch to the CH class, but I can't retrieve the topics (keywords). Here's the code for the parser class:

import datetime
from typing import List, Optional

from lxml.cssselect import CSSSelector

from fundus.parser import ArticleBody, BaseParser, ParserProxy, attribute
from fundus.parser.utility import (
    extract_article_body_with_selector,
    generic_author_parsing,
    generic_date_parsing,
    generic_topic_parsing,
)


class AARGAUERZEITUNGParser(ParserProxy):
    class V1(BaseParser):
        _paragraph_selector = CSSSelector("p.headline__lead")

        @attribute
        def body(self) -> ArticleBody:
            return extract_article_body_with_selector(
                self.precomputed.doc,
                paragraph_selector=self._paragraph_selector,
            )

        @attribute
        def publishing_date(self) -> Optional[datetime.datetime]:
            return generic_date_parsing(self.precomputed.ld.bf_search("datePublished"))

        @attribute
        def authors(self) -> List[str]:
            return generic_author_parsing(self.precomputed.ld.bf_search("author"))

        # This is the attribute that comes back empty.
        @attribute
        def topics(self) -> List[str]:
            return generic_topic_parsing(self.precomputed.meta.get("keywords"))

        @attribute
        def title(self) -> Optional[str]:
            return self.precomputed.meta.get("og:title")

Here's an example of an article from the publisher: https://www.aargauerzeitung.ch/aargau/baden/baden-kreativitaet-liegt-in-der-familie-schwester-von-esc-superstar-nemo-hat-kampagne-fuer-das-grand-casino-baden-produziert-ld.2619716?reduced=true

This is the current code structure for adding a publisher:

Appenzeller_Zeitung = PublisherSpec(
    name="Appenzeller Zeitung",
    domain="https://www.appenzellerzeitung.ch/",
    sources=[
        RSSFeed("https://www.appenzellerzeitung.ch/schweiz.rss"),
        NewsMap("https://www.appenzellerzeitung.ch/sitemap.xml"),
        Sitemap("https://www.appenzellerzeitung.ch/sitemap.xml"),
    ],
    parser=APPENZELLERZEITUNGParser,
)

I appreciate any help or guidance you can provide!

@hamz0640 hamz0640 added the question Further information is requested label May 19, 2024
@MaxDall
Collaborator

MaxDall commented May 20, 2024

Hey @hamz0640 thanks for the work and trying to add a new publisher 👍

Judging by the link you sent me, you're using the wrong key and data source: the keywords appear to live in the article's JSON-LD rather than in its meta tags.

Try this instead:

@attribute
def topics(self) -> List[str]:
    return generic_topic_parsing(self.precomputed.ld.bf_search("keywords"))
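
To make the difference between the two data sources concrete, here's a rough standalone sketch. It does not use fundus at all: `breadth_first_search` and `split_topics` are hypothetical stand-ins for what `bf_search` and `generic_topic_parsing` roughly do, and the sample meta tags and JSON-LD are invented for illustration, not copied from the real article.

```python
import json
from collections import deque
from typing import Any, List, Optional

# Hypothetical meta tags of a page: there is no "keywords" entry here.
SAMPLE_META = {"og:title": "Example headline"}

# Hypothetical JSON-LD of the same page: the keywords sit in a nested object.
SAMPLE_LD = json.loads(
    """
    {
      "@context": "https://schema.org",
      "@type": "NewsArticle",
      "headline": "Example headline",
      "mainEntity": {"keywords": "Baden, Kampagne, Casino"}
    }
    """
)


def breadth_first_search(data: Any, key: str) -> Optional[Any]:
    """Return the first value stored under `key`, searching level by level."""
    queue = deque([data])
    while queue:
        node = queue.popleft()
        if isinstance(node, dict):
            if key in node:
                return node[key]
            queue.extend(node.values())
        elif isinstance(node, list):
            queue.extend(node)
    return None


def split_topics(raw: Optional[str]) -> List[str]:
    """Split a comma-separated keyword string into a clean list of topics."""
    if not raw:
        return []
    return [part.strip() for part in raw.split(",") if part.strip()]


# Looking in the meta tags finds nothing, so the topic list ends up empty.
topics_from_meta = split_topics(SAMPLE_META.get("keywords"))

# Searching the JSON-LD finds the nested "keywords" value.
topics_from_ld = split_topics(breadth_first_search(SAMPLE_LD, "keywords"))

print(topics_from_meta)
print(topics_from_ld)
```

If the real page behaves like this sample, that would explain why the `meta` lookup yields no topics while the `ld` lookup does.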
