Notes on files in the example directory...
-
top_n_items_redis writes the JSON data from the Hacker News API to Redis
-
redis_to_file writes the JSON data from Redis to a file
-
story_ids writes just the ids to a file
-
story_title_delete removes the ids that do not have a title
-
story_tile creates a file with lines of json including the id and title
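
A minimal sketch of the last two steps above (dropping items without a title, then writing lines of JSON with the id and title). This is not the actual script: the function name keep_titled is hypothetical, and it assumes the input is one JSON item per line, as redis_to_file might produce. Items such as comments or deleted entries typically carry no "title" field, so they are skipped.

```python
import json


def keep_titled(lines):
    """Yield one JSON line with "id" and "title" per item that has a title.

    Assumption: each input line is a single JSON item from the Hacker News API.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        item = json.loads(line)
        # The API can return null for a missing item; skip anything
        # that is not an object with both an id and a non-empty title.
        if not isinstance(item, dict):
            continue
        if "id" not in item or not item.get("title"):
            continue
        yield json.dumps({"id": item["id"], "title": item["title"]})


if __name__ == "__main__":
    sample = [
        '{"id": 1, "type": "story", "title": "Example story"}',
        '{"id": 2, "type": "comment", "text": "no title here"}',
        "null",
    ]
    for out_line in keep_titled(sample):
        print(out_line)
```

Filtering at this point keeps the id file and the title file in step, since both can be derived from the same filtered lines.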
The Redis data are dump.rdb files created by top_n_items_redis
We need to reprocess everything next time and remove these cases, along with making sure that the stories we store in Redis have a title
For now I will attempt to remove the following ids from Redis by hand; they look like this:
https://hacker-news.firebaseio.com/v0/item/21948540.json?print=pretty
https://hacker-news.firebaseio.com/v0/item/21949067.json?print=pretty
https://hacker-news.firebaseio.com/v0/item/21949136.json?print=pretty
https://hacker-news.firebaseio.com/v0/item/21949339.json?print=pretty
On the next processing run these IDs should no longer be there.
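
The manual cleanup could also be scripted. The sketch below is only an illustration under stated assumptions: the helper names ids_from_urls and delete_ids are hypothetical, and it assumes each story is stored in Redis under a key equal to its id as a string, which these notes do not confirm. Pass a different key_for function if the keys are named differently. The client only needs a delete(*names) method, as the redis-py client has.

```python
import re

# Matches the numeric id in an item URL such as .../v0/item/21948540.json
ITEM_URL = re.compile(r"/v0/item/(\d+)\.json")


def ids_from_urls(urls):
    """Extract the numeric item ids from Hacker News item URLs."""
    ids = []
    for url in urls:
        match = ITEM_URL.search(url)
        if match:
            ids.append(int(match.group(1)))
    return ids


def delete_ids(client, ids, key_for=str):
    """Delete the keys for the given ids and return how many were removed.

    Assumption: key_for maps an id to its Redis key (default: str(id)).
    """
    keys = [key_for(i) for i in ids]
    if not keys:
        return 0  # nothing to delete; avoid calling delete with no keys
    return client.delete(*keys)


if __name__ == "__main__":
    urls = [
        "https://hacker-news.firebaseio.com/v0/item/21948540.json?print=pretty",
        "https://hacker-news.firebaseio.com/v0/item/21949067.json?print=pretty",
        "https://hacker-news.firebaseio.com/v0/item/21949136.json?print=pretty",
        "https://hacker-news.firebaseio.com/v0/item/21949339.json?print=pretty",
    ]
    print(ids_from_urls(urls))
    # With a running Redis server and the redis package installed:
    # import redis
    # print(delete_ids(redis.Redis(), ids_from_urls(urls)))
```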