Introducing the "update" module: Enabling temporal analysis #11
Work-in-Progress Update (April 2024)

Initially, we considered designing the module with a function that takes an …

While multithreading can improve processing speed for large datasets by enabling concurrent task execution, the Reddit API's queries-per-minute (QPM) limit acts as a bottleneck, constraining overall throughput regardless of the number of threads used. We therefore focused on optimising individual queries and the processing logic to maximise efficiency within the API's constraints. To achieve this, we implemented an automatic calculation of the update time interval based on the number of non-archived rows:

By dynamically adjusting the update interval to the dataset size, we aim to optimise resource utilisation while staying within the API's QPM limit, ensuring efficient and reliable data retrieval and processing.
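As an illustration only, here is a minimal Python sketch of one way such an interval calculation could work; it is not RedditHarbor's actual formula. It assumes a budget of 100 QPM, that metrics are refreshed in batches of up to 100 items per request (the batch size Reddit's `/api/info` lookup accepts), and that a post counts as archived once it is older than roughly 180 days, presumably because archived posts can no longer receive votes or comments. The names `ARCHIVE_AGE`, `count_non_archived` and `compute_update_interval` are hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone

# Assumption: posts older than about six months (approximated as 180 days)
# are archived and their metrics no longer change. Some subreddits differ.
ARCHIVE_AGE = timedelta(days=180)


def count_non_archived(created_times, now):
    """Count rows whose creation time falls inside the assumed archive window."""
    cutoff = now - ARCHIVE_AGE
    return sum(1 for created in created_times if created > cutoff)


def compute_update_interval(non_archived_rows, qpm=100, items_per_request=100,
                            min_interval_minutes=1):
    """Return a hypothetical update interval, in minutes, for one full pass.

    A full pass needs ceil(rows / items_per_request) requests; at `qpm`
    requests per minute, that takes ceil(requests / qpm) minutes.
    """
    if non_archived_rows <= 0:
        return min_interval_minutes
    requests_needed = math.ceil(non_archived_rows / items_per_request)
    minutes_needed = math.ceil(requests_needed / qpm)
    return max(min_interval_minutes, minutes_needed)


now = datetime(2024, 4, 1, tzinfo=timezone.utc)
rows = [now - timedelta(days=1), now - timedelta(days=200), now - timedelta(days=10)]
print(count_non_archived(rows, now))
print(compute_update_interval(25_000))
```

For example, 25,000 non-archived rows fetched 100 at a time need 250 requests, which at 100 QPM gives an interval of 3 minutes; fetching one item per request would stretch the same pass to 250 minutes, which is why batching matters more than adding threads under a fixed QPM cap.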
We're proposing the introduction of a new "update" module that will streamline and automate the process of updating crucial metrics for existing submissions and comments.
The primary goals of this module are:
- Update the upvote ratio, score, awards, and number of comments for submissions in our database.
- Update the score for comments in our database.

A key advantage of this update module is the ability to track how various metrics, such as the upvote ratio or score, change over time for specific posts. This capability sets RedditHarbor apart from many other Reddit database resources, such as Pushshift or Academic Torrents, which typically provide a static "snapshot" of submissions and comments taken at a single, arbitrary point in time.
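To make the temporal idea concrete, here is a minimal Python sketch of recording timestamped metric snapshots instead of overwriting a single value per post. The dataclasses and function names are hypothetical, not part of RedditHarbor. The attribute names `id`, `upvote_ratio`, `score` and `num_comments` match those exposed by PRAW's `Submission` objects (and `id`, `score` on `Comment`); reading awards from `total_awards_received` is an assumption, so the sketch stores `None` when that field is missing. Offline `SimpleNamespace` objects stand in for real PRAW objects.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from types import SimpleNamespace
from typing import Optional


@dataclass(frozen=True)
class SubmissionSnapshot:
    """One timestamped reading of a submission's metrics (hypothetical schema)."""
    submission_id: str
    captured_at: datetime
    upvote_ratio: float
    score: int
    num_comments: int
    awards: Optional[int]


@dataclass(frozen=True)
class CommentSnapshot:
    """One timestamped reading of a comment's score (hypothetical schema)."""
    comment_id: str
    captured_at: datetime
    score: int


def snapshot_submission(submission, captured_at: datetime) -> SubmissionSnapshot:
    """Copy the tracked metrics from a PRAW-like submission object."""
    return SubmissionSnapshot(
        submission_id=submission.id,
        captured_at=captured_at,
        upvote_ratio=submission.upvote_ratio,
        score=submission.score,
        num_comments=submission.num_comments,
        # Assumed field name; stored as None when the attribute is absent.
        awards=getattr(submission, "total_awards_received", None),
    )


def snapshot_comment(comment, captured_at: datetime) -> CommentSnapshot:
    """Copy the tracked metric (score) from a PRAW-like comment object."""
    return CommentSnapshot(comment_id=comment.id, captured_at=captured_at,
                           score=comment.score)


# Offline stand-ins using the same attribute names as PRAW objects.
now = datetime(2024, 4, 1, tzinfo=timezone.utc)
post = SimpleNamespace(id="abc123", upvote_ratio=0.93, score=412, num_comments=57)
reply = SimpleNamespace(id="def456", score=18)
print(snapshot_submission(post, now))
print(snapshot_comment(reply, now))
```

In this design each update run would append new snapshot rows keyed by ID and `captured_at`, so a post's history accumulates rather than being replaced by the latest values.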
By incorporating temporal analysis into our platform, researchers and data analysts can gain valuable insights into the evolution of Reddit content, enabling them to study dynamic patterns, identify trends, and uncover relationships that might otherwise be obscured in a static dataset.
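Once snapshots accumulate, a trend such as how quickly a post's score grows can be read off consecutive readings. The following minimal Python sketch computes the change, and the hourly rate of change, between each pair of consecutive snapshots; the function name `metric_changes` and the `(timestamp, value)` input format are hypothetical.

```python
from datetime import datetime, timedelta


def metric_changes(snapshots):
    """Given (timestamp, value) pairs, return per-interval changes.

    Each result is (start, end, delta, delta_per_hour), ordered by time.
    Zero-length intervals get a rate of NaN.
    """
    ordered = sorted(snapshots, key=lambda pair: pair[0])
    changes = []
    for (t0, v0), (t1, v1) in zip(ordered, ordered[1:]):
        hours = (t1 - t0).total_seconds() / 3600
        delta = v1 - v0
        rate = delta / hours if hours > 0 else float("nan")
        changes.append((t0, t1, delta, rate))
    return changes


start = datetime(2024, 4, 1, 12, 0)
score_history = [
    (start, 10),
    (start + timedelta(hours=2), 50),
    (start + timedelta(hours=3), 55),
]
for t0, t1, delta, rate in metric_changes(score_history):
    print(f"{t0:%H:%M} -> {t1:%H:%M}: {delta:+d} ({rate:.1f} per hour)")
```

Here the score rises by 40 over the first two hours (20.0 per hour) and by 5 in the following hour, the kind of slowing growth that a single static snapshot cannot reveal.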
We envision that the "update" module will open up new avenues for research and analysis, further solidifying RedditHarbor's position as a comprehensive and powerful resource for exploring the rich tapestry of Reddit data.
We welcome your feedback, suggestions, and contributions to make this module as robust and effective as possible, and to collectively shape the future direction of RedditHarbor!