consider adding some of these misc links to books/articles/papers/etc #29
Hey @mortdeus. I've updated our README to include your links (issues #25, #26, #27, #28) in PR #31. Super thankful! This list is also great, and though we're not exactly sure what to do with non-papers yet, I took a subset of these and put them on a wiki page for further expansion, with a link to that page from the README. We'd definitely add some more as well!
Lol, soon we're going to need something like a Dewey Decimal System to keep everything organized. For example, I wasn't even aware you guys already had the CSP paper included when I posted the link above. Any ideas?
The reorganizing/sorting/categorizing issue(s) is something we, @papers-we-love/owners, have been discussing for a while. We plan to add a script (or scripts) and/or a hook to help with naming (from a PDF's title) and de-duping. We're also exploring some other options for the organizing, which is definitely a larger issue. No complete answers yet, but we're definitely on the same wavelength :).
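The naming half of that plan could start from something like this. It is a hypothetical helper (`filename_from_title` is not an existing script in the repo) that assumes the title has already been pulled out of the PDF by some other means, and it borrows the lowercase, underscore-separated convention visible in paths such as `/distributed_systems/the_google_file_system.pdf`.

```python
import re
import unicodedata


def filename_from_title(title: str) -> str:
    """Hypothetical helper: turn a paper title into a lowercase, underscore-separated .pdf name."""
    # Decompose accented characters, then drop anything that is not plain ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single underscore
    # and trim underscores left over at either end.
    slug = re.sub(r"[^a-z0-9]+", "_", ascii_title.lower()).strip("_")
    return f"{slug}.pdf"


if __name__ == "__main__":
    print(filename_from_title("The Google File System"))
    # the_google_file_system.pdf
```

Because the slug is deterministic, two contributors submitting the same title end up with the same filename, which gives a de-duping step something simple to compare.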
You guys also need to consider how to vet links to papers to make sure they honor copyright and licensing terms. For example, we have to be vigilant and not just allow anybody to post a direct link to a pirated PDF of the Dragon Book (aka Compilers: Principles, Techniques, and Tools) hosted and shared from their personal Google Drive storage. Also, have you guys made sure you are honoring the licenses of the papers being distributed in this repo?
For example, consider the license terms for /distributed_systems/the_google_file_system.pdf.
If the authors sent GitHub a DMCA takedown notice, I assume the whole repo (and probably its forks too) would have to come down with it.
Totally agree, and good catch. Admittedly, a PR was merged while everything was really taking off, and we needed to vet it more thoroughly. I was planning on doing that this weekend. I am working on a
But you are right, we really must be more careful in our approach. Obviously, we'd love to add copyright checking/parsing to our automated process.
That paper has been removed from the history. An update to the contributing guidelines and an audit are on the way.
I'd put a README.md in each folder with links to the papers we don't have permission to distribute directly, similar to the way I posted and formatted the links in my initial issue post.
I'm thinking it'd be better to just remove them and then take a better approach going forward, no? Or do you mean we should keep a README as more of an audit trail, after we remove these papers?
IMHO, it would be better to get rid of the PDFs and add their links to a .md file that best categorizes them. For example, say I have 5 hyperlinks to papers related specifically to operating systems. The first paper talks about generic operating system architecture design. The way I set this up in my Google Drive is that I make a folder called

This is basically the same general approach I would recommend you guys take, except that each folder has a README.md which we ensure conforms to a format we have specifically defined, so we can build automated tools that perform various tasks on the file hierarchy. (One tool would be a crawler bot that looks for URLs in the README.md files and can then test whether each link is still valid, etc.)

Another benefit of using URLs instead of PDFs is that an automated dedup tool could reliably assume that identical links reference the same paper, even when submitted by two different contributors. However, if the same tool were looking for duplicate PDFs by filename, it wouldn't be able to tell, strictly by name, whether or not
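A crawler along those lines might start like the sketch below. It assumes, hypothetically, that each README.md lists papers as inline Markdown links (`[title](https://...)`); the function names and the URL-normalization rules (lowercase scheme and host, drop the `#fragment` and any trailing slash) are illustrative choices, not a format the project has defined. The liveness check is a plain HTTP HEAD request from the standard library, so a `False` result is only a hint: some servers reject HEAD outright.

```python
import re
from collections import defaultdict
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen

# Matches inline Markdown links such as [title](https://...).
# Assumption: URLs containing ")" are not handled by this simple pattern.
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings compare equal."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host, keep path and query, drop the #fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def extract_links(markdown: str) -> list[tuple[str, str]]:
    """Return (title, normalized_url) pairs for every inline link in the text."""
    return [(title, normalize_url(url)) for title, url in LINK_RE.findall(markdown)]


def find_duplicates(readmes: dict[str, str]) -> dict[str, list[str]]:
    """Map each URL cited more than once to the list of files that cite it."""
    seen: dict[str, list[str]] = defaultdict(list)
    for name, text in readmes.items():
        for _title, url in extract_links(text):
            seen[url].append(name)
    return {url: names for url, names in seen.items() if len(names) > 1}


def load_readmes(root: str) -> dict[str, str]:
    """Read every README.md under root, keyed by its path."""
    return {str(p): p.read_text(encoding="utf-8") for p in Path(root).rglob("README.md")}


def link_is_alive(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and report whether it returned a non-error status."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError):
        # URLError/HTTPError and timeouts are all OSError subclasses;
        # ValueError covers malformed URLs.
        return False
```

Normalizing before comparing is what lets the dedup step treat `https://Example.com/gfs.pdf#p1` and `https://example.com/gfs.pdf` as the same paper.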
All interesting points.

- Dedup: we can compare SHA-1 hashes of the PDFs. Not perfect, but better than comparing filenames, and in combination with filenames it should be fairly robust.
- Copyright: agreed that we need to be vigilant about copyright, but we also want papers to be as accessible as possible. We would rather err on the side of having fewer papers with clear copyright than more papers with murky licensing status that we can't audit. This is why all PRs now require at least two +1s, and if there is a question we can require more thorough auditing.

The core focus of the repo (for now) is to provide access to foundational computer science papers that are often alluded to but difficult to find. Linking to foundational books is cool, as long as they're just links and we make some effort to check their legal status before adding them to a wiki page.
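A minimal sketch of the SHA-1-plus-filename check, assuming the PDFs all live under one root directory; `sha1_of_file` and `duplicate_report` are hypothetical names. Matching hashes only catch byte-identical copies (a re-scanned or re-exported PDF of the same paper will hash differently), which is why filename collisions are reported separately for a human to review.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large PDFs are never read into memory at once."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_report(root: str) -> dict[str, list[list[str]]]:
    """Group PDFs under root by identical content and by identical filename."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    by_name: dict[str, list[str]] = defaultdict(list)
    for pdf in sorted(Path(root).rglob("*.pdf")):
        rel = pdf.relative_to(root).as_posix()
        by_hash[sha1_of_file(pdf)].append(rel)
        by_name[pdf.name.lower()].append(rel)
    return {
        # Byte-identical files: almost certainly the same paper uploaded twice.
        "same_content": [group for group in by_hash.values() if len(group) > 1],
        # Same filename in different folders: possibly the same paper, needs a human look.
        "same_name": [group for group in by_name.values() if len(group) > 1],
    }
```

Hashing in fixed-size chunks keeps memory flat regardless of PDF size, and returning relative paths makes the report easy to paste into a PR review.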
100% agreed, @DarrenN.
I will say that we're planning to draft a combined plan going forward, @mortdeus. We'll be auditing the current set of papers. I still think a good number of them can, and should, stay in the repo. For those in murkier territory, or where only a URL link is allowed, we'll take something like your approach, which should give us a good balance of resources.
Closing this for now, but we can continue discussion if need be.
While these aren't all technically "research papers", they are still amazing, exotic, and free educational reading resources I've collected over the last few years, so I'll link them in case you guys want to display some of them.
- Operating Systems: Three Easy Pieces
- Communicating Sequential Processes (CSP)
- Unixca's historical links archive
- Programming UNIX Sockets in C - Frequently Asked Questions
- C Craft
- Ken Thompson Q&A
- The Cathedral and the Bazaar
- What every programmer should know about memory
- Linux Technology Reference
- 1024cores
- switch
- command center
- asktog
- The Art of Unix Programming
- Stitz Zeager math books
That should be enough for right now. I still have a bunch more, so if you guys want me to post more interesting links, just let me know with a reply in this issue's comments.