Link Archive for year 2020

Suite of projects

Goal

  • Archive purposes
  • Data analysis - makes it possible to measure link rot, etc.
  • Google sucks at providing results for various topics (dead internet)

Inspirations

Data

Daily Data

  • RSS links are captured for each source separately
  • two file formats for each day and source: JSON and Markdown
  • the Markdown file is generated as a preview; the JSON file can be reused and imported elsewhere (see the sketch below)
  • links can be bookmarked, but a bookmark does not necessarily mean endorsement. It shows particular interest in a topic and is an indication of importance. Such links are stored 'forever'
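
A minimal sketch of reading one day's export back, assuming each JSON file holds a list of entry objects; the directory layout and JSON structure here are assumptions for illustration, not a documented schema:

```python
# Sketch: load the daily JSON exports.
# The glob pattern and the "list of entries" layout are assumptions;
# check the actual files in this repository for the real schema.
import json
from pathlib import Path

def load_daily_export(json_path):
    """Load a single day/source JSON file and return its content."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)

if __name__ == "__main__":
    for path in sorted(Path(".").glob("**/*.json")):
        entries = load_daily_export(path)
        count = len(entries) if isinstance(entries, list) else "?"
        print(path, count)
```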

Sources

  • provided in the sources.json file
  • provides information about each source, such as title, URL and language (see the sketch below)
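
A short sketch of reading sources.json; treating the file as a list of objects with title, url and language keys follows the fields mentioned above, but the exact key names are an assumption:

```python
# Sketch: print basic information about every source.
# Key names (title, url, language) are assumed from the README, not verified.
import json

with open("sources.json", encoding="utf-8") as f:
    sources = json.load(f)

for source in sources:
    print(source.get("title"), source.get("url"), source.get("language"))
```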

Data analysis

With these data we can perform further analysis:

  • how many old links are no longer valid (link rot test; see the sketch after this list)
  • capture all domains from RSS links (internal, and leading outside?) and analyse which domains are most common
  • which site generates the most entries
  • we can capture all external links from entries, to see where these sites lead (network effects, etc.)
  • we can verify who reported first on certain topics
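
Two of these analyses are easy to sketch in a few lines, assuming the daily JSON export is a list of entries with a "link" field (an assumption, as noted above):

```python
# Sketch: count the most common domains and run a crude link rot check.
# The entry layout and the example file name are assumptions for illustration.
from collections import Counter
from urllib.parse import urlparse
import json
import urllib.request

def most_common_domains(entries, top=10):
    """Count which domains the captured links point to."""
    domains = Counter(urlparse(e["link"]).netloc for e in entries if e.get("link"))
    return domains.most_common(top)

def is_rotten(url, timeout=10):
    """Return True if the link no longer answers with a usable response."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status >= 400
    except Exception:
        return True

if __name__ == "__main__":
    with open("example-day.json", encoding="utf-8") as f:  # hypothetical file name
        entries = json.load(f)
    print(most_common_domains(entries))
    print(sum(is_rotten(e["link"]) for e in entries[:20]), "of the first 20 links look rotten")
```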

Problems, notes

  • Internet Archive (archive.org) does not provide snapshots for every day for all RSS sources, and it is sometimes pretty slow. We would like to be sure that such snapshots take place, therefore we export the links to this daily repository ourselves. The Django RSS app also makes requests to archive.org to trigger the snapshots (see the sketch after this list)
  • Google fails to surface content from small creators (blogs, private pages) and focuses on corporate hosting. The most common links point to YouTube, Google Maps, Facebook and Reddit
  • We cannot replace Google search
  • Google provides only 31 pages of news (in the news filter) and around 10 pages for an ordinary search. This is a very small number; it is like looking at the Internet through a keyhole
  • Link rot is real; my links may stop working after some time
  • Is the data relevant, or useful for anyone?
  • Either we record data from 'well-established sources' or we gather as many links as possible - I think web search engines do the latter. We cannot gather too much data, as it could overwhelm our potato servers
  • there are other RSS solutions like Feedly, but Feedly is an app, not data. You cannot parse it, you do not own the data, and you can only do what Feedly allows you to do
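
As a rough illustration of the snapshot idea above, the public Wayback Machine "Save Page Now" endpoint (https://web.archive.org/save/<url>) can be asked to capture a link. This is a sketch of the concept only, not the Django RSS app's actual code:

```python
# Sketch: ask archive.org to snapshot a URL via the Save Page Now endpoint.
# Error handling and the pause value are illustrative, not the app's behaviour.
import time
import urllib.request

def request_snapshot(url, pause=5):
    """Ask the Wayback Machine to capture the given URL; return True on success."""
    save_url = "https://web.archive.org/save/" + url
    try:
        with urllib.request.urlopen(save_url, timeout=60) as response:
            ok = response.status < 400
    except Exception:
        ok = False
    time.sleep(pause)  # be gentle: archive.org is sometimes pretty slow
    return ok

if __name__ == "__main__":
    print(request_snapshot("https://example.com"))
```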

Ending notes

All links belong to us!
