RSS Feed loader #942
loader = LangchainRSSFeedLoader(urls=[url])
data = loader.load()

for entry in data:
This is going to be really slow if the RSS feed is quite long. You might want to use multithreading to speed up the loader.
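As a rough illustration of this suggestion, here is a minimal sketch that runs the per-entry work on a thread pool. It assumes each entry looks like a LangChain `Document` (a `page_content` string plus a `metadata` dict with a `link` key); `Entry`, `entry_to_record`, and `load_entries_concurrently` are hypothetical names, and the `content`/`meta_data` record shape is an assumption, not this PR's actual output. Threads only help when the per-entry step is I/O-bound, for example fetching each article over the network.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Entry:
    """Hypothetical stand-in for one loaded feed entry (e.g. a LangChain Document)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def entry_to_record(entry: Entry) -> Dict[str, Any]:
    """Convert one entry into a content/meta_data record (assumed shape)."""
    url = entry.metadata.get("link", "")  # assumption: the entry URL lives under "link"
    return {"content": entry.page_content, "meta_data": {"url": url}}


def load_entries_concurrently(
    entries: List[Entry],
    process: Callable[[Entry], Dict[str, Any]] = entry_to_record,
    max_workers: int = 8,
) -> List[Dict[str, Any]]:
    """Apply `process` to every entry on a thread pool.

    Executor.map yields results in input order, so the returned list
    lines up with `entries` even though the calls run concurrently.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process, entries))
```

Because `Executor.map` preserves input order, the records stay in feed order; swap in an I/O-bound `process` (such as a per-article fetch) to see the speed-up.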
Got it, will update this.
@sidmohanty11 It seems the Substack loader calls the RSS loader in its load function. Is there a particular reason we need both? Am I missing something?
You're 100% correct; as of now this is incomplete. We need to figure out a way to get all RSS feed posts for the Substack URL and then map them through from there. Otherwise, we scrape the remaining posts and add them here.
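One possible direction, sketched under assumptions: derive the publication's feed URL from any Substack page URL, then merge the post URLs found in the feed with any scraped ones so no post is listed twice. The `/feed` path is an assumption about where Substack serves RSS, and `substack_feed_url` and `merge_post_urls` are hypothetical helpers, not code from this PR.

```python
from typing import Iterable, List
from urllib.parse import urlparse, urlunparse


def substack_feed_url(url: str) -> str:
    """Map any Substack page URL to the publication's feed URL.

    Assumption: the publication serves RSS at <scheme>://<host>/feed.
    """
    # Default to https when the caller passes a bare host name.
    parsed = urlparse(url if "://" in url else f"https://{url}")
    # Keep scheme and host; drop path, query, and fragment.
    return urlunparse((parsed.scheme, parsed.netloc, "/feed", "", "", ""))


def merge_post_urls(feed_urls: Iterable[str], scraped_urls: Iterable[str]) -> List[str]:
    """Union of post URLs, feed entries first, without duplicates."""
    # dict.fromkeys keeps first-seen order and drops repeats.
    return list(dict.fromkeys([*feed_urls, *scraped_urls]))
```

If the feed turns out to list only some of the posts, the scraped URLs fill the gap and the merge keeps each post once.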
Codecov Report
Coverage diff, main vs. #942:
Coverage: 64.33% to 63.75% (-0.58%)
Files: 118 to 120 (+2)
Lines: 4405 to 4448 (+43)
Hits: 2834 to 2836 (+2)
Misses: 1571 to 1612 (+41)
Branch updated from d61d357 to f90a1c2.
LGTM
@register_deserializable
class RSSFeedChunker(BaseChunker):
Let's use the common chunker instead (I created one earlier). Keep the default chunk_size=2000.
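The point here is to reuse one shared chunker configured with a default `chunk_size=2000` instead of adding a feed-specific class. The sketch below illustrates that pattern with a plain fixed-size character splitter; `ChunkConfig` and `CommonTextChunker` are hypothetical names, and the splitting strategy is a stand-in, not Embedchain's actual chunker implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ChunkConfig:
    """Hypothetical chunking settings shared by every data type."""
    chunk_size: int = 2000
    chunk_overlap: int = 0

    def __post_init__(self) -> None:
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")


class CommonTextChunker:
    """Hypothetical shared chunker: one class reused for RSS, Substack, etc."""

    def __init__(self, config: Optional[ChunkConfig] = None) -> None:
        # Fall back to the default chunk_size=2000 when no config is given.
        self.config = config or ChunkConfig()

    def split(self, text: str) -> List[str]:
        if not text:
            return []
        size = self.config.chunk_size
        overlap = self.config.chunk_overlap
        step = size - overlap
        # Stop once a window reaches the end so no chunk is fully
        # contained in the previous one.
        return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Per-source differences then live in the config rather than in separate chunker classes.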
Description
This PR adds support for loading RSS feeds directly from Embedchain.
Type of change
How Has This Been Tested?
Output:
Checklist: