consider adding some of these misc links to books/articles/papers/etc #29

Closed
mortdeus opened this issue Mar 6, 2014 · 15 comments

Comments

@mortdeus

mortdeus commented Mar 6, 2014

While these aren't all technically "research papers", they are still amazing, exotic and free educational reading resources I've collected over the last few years, so I'll link them in case you guys want to display some of them.

Operating Systems: Three Easy Pieces
Communicating Sequential Processes (CSP)
Unixca's historical links archive
Programming UNIX Sockets in C - Frequently Asked Questions
C Craft
Ken Thompson Q&A
The Cathedral and the Bazaar
What every programmer should know about memory
Linux Technology Reference
1024cores
switch
command center
asktog
The Art of Unix Programming
stitz zeager math books

That should be enough for right now. I still have a bunch more, so if you guys want me to post more interesting links, just let me know with a reply in this issue's comments.

@zeeshanlakhani
Member

Hey @mortdeus. I've updated our readme to include your links (issues #25, #26, #27, #28) in this PR, #31. Super thankful! This list is also great, and though we're not exactly sure what to do with non-papers as of yet, I took a subset of these and included them in a wiki page for further expansion, as well as a link to the page from the readme.

We'd definitely add some more as well!

@mortdeus
Author

mortdeus commented Mar 6, 2014

Lol, soon we're going to need something like a Dewey Decimal System to keep everything organized. For example, I wasn't even aware you guys already had the CSP paper included when I posted the link above.
If we don't figure something out soon, we will run into the situation I have in my Google Drive PDF mega-library (having to spend hours reorganizing/sorting/categorizing/etc).

Any ideas?

@zeeshanlakhani
Member

The reorganizing/sorting/categorizing issue(s) is something we, @papers-we-love/owners, have been discussing for a while. We plan to add a script(s) and/or hook to help w/ naming (from a PDF's title) and de-duping. And we're exploring some other options to deal w/ the organizing, which is definitely a larger issue.

No fully-complete answers yet, but we're def. on the same wavelength :).
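
The naming half of such a script could look roughly like the sketch below. It is only an illustration: `filename_from_title` is a hypothetical helper, the lowercase-with-underscores convention is an assumption inferred from paths like `the_google_file_system.pdf`, and reading the title out of the PDF itself is left out.

```python
import re
import unicodedata


def filename_from_title(title, extension=".pdf"):
    """Turn a paper title into a lowercase, underscore-separated filename.

    The naming convention is an assumption, not a documented repo rule.
    """
    # Strip accents and any other non-ASCII characters.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", ascii_title.lower()).strip("_")
    return slug + extension


if __name__ == "__main__":
    print(filename_from_title("The Google File System"))
```

Two submissions of the same title would then map to the same filename, which gives a de-duping step something consistent to compare.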

@mortdeus
Author

mortdeus commented Mar 6, 2014

You guys also need to consider how to vet links to papers to make sure they honor the copyright license. For example, we have to be vigilant and not just allow anybody to post a direct link to a pirated PDF version of the Dragon Book (aka Compilers: Principles, Techniques, and Tools) hosted and shared from their personal Google Drive storage.

Also have you guys made sure you are honoring the licenses of the papers being distributed in this repo?

@mortdeus
Author

mortdeus commented Mar 6, 2014

For example, consider the license terms for /distributed_systems/the_google_file_system.pdf:

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SOSP’03, October 19–22, 2003, Bolton Landing, New York, USA. Copyright 2003 ACM 1-58113-757-5/03/0010 ...$5.00.

If the authors sent GitHub a DMCA takedown notice, I assume the whole git repo (probably forks too) would have to go down with it.

@zeeshanlakhani
Member

Totally agree, and good catch. Admittedly, there was a PR that was merged since everything started really taking off and that we needed to vet more thoroughly. I was planning on doing that this weekend. I am working on a Contributing.md file, as per #19, that will also explicitly mention this.

@zeeshanlakhani
Member

But, you are right, we really must be more tactful in our approach. Obviously, we'd love to add copyright check/parsing into our automated process.

@zeeshanlakhani
Member

That paper's been removed from the history. Contributing update and audit is on the way.

@mortdeus
Author

mortdeus commented Mar 6, 2014

I'd put a Readme.MD in each folder with links to papers we don't have permission to distribute directly. Similar to the way I posted and formatted the above links in my initial issue post.

@zeeshanlakhani
Member

I'm thinking it'd be better to just remove them and then take a better approach going forward, no? Or do you mean we should keep a README as more of an audit trail, after we remove these papers?

@mortdeus
Author

mortdeus commented Mar 6, 2014

IMHO, it would be better to get rid of the PDFs and add their links to an md file that best categorizes them.

For example, say I have 5 hyperlinks to papers related specifically to operating systems.

The first paper talks about generic operating system architecture design.
The next 2 papers are specifically about Linux's design.
The last 2 papers are related to plan9, but the last one is specific to the plan9-derived inferno operating system.

The way I set this up in my Google Drive fs is I make a folder called /os, and then I make the folders /os/unix/, /os/unix/linux, /os/plan9/, /os/plan9/inferno, etc. Then I just put all my PDFs where they belong.

This is basically the same general approach I would recommend you guys take, except each folder has a README.md which we ensure conforms to a format we have specifically defined, so we can build automated tools that perform various tasks on the file hierarchy. (One tool would be a crawler bot that looks for URLs in the README.md files and tests whether each link is still valid, etc.)
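
A minimal sketch of that crawler idea, in Python. It assumes the README format uses standard markdown links of the form `[title](url)`, which is a guess since no format has been defined yet, and the function names are hypothetical.

```python
import http.client
import re
import urllib.request
from pathlib import Path

# Matches markdown links such as [Some Paper](http://example.com/a.pdf).
# The link syntax is an assumption about the README format.
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")


def extract_links(markdown_text):
    """Return (title, url) pairs for every markdown link in the text."""
    return LINK_RE.findall(markdown_text)


def collect_links(root):
    """Walk the folder hierarchy and map each README.md path to its links."""
    found = {}
    for readme in sorted(Path(root).rglob("README.md")):
        found[str(readme)] = extract_links(readme.read_text(encoding="utf-8"))
    return found


def link_is_alive(url, timeout=10):
    """Best-effort check: True if a HEAD request completes without an error."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # Covers HTTP errors, network failures, timeouts and malformed URLs.
        return False


if __name__ == "__main__":
    for readme_path, links in collect_links(".").items():
        for title, url in links:
            status = "ok" if link_is_alive(url) else "BROKEN"
            print(f"{status}\t{readme_path}\t{title}\t{url}")
```

Some servers reject HEAD requests, so a real bot would probably need a GET fallback and some rate limiting before its "BROKEN" reports could be trusted.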

Another benefit of using URLs instead of PDFs is that an automated dedup tool could reliably assume that identical links reference the same paper, even when submitted by two different contributors. However, if the same tool tried to find duplicate PDFs by filename, it couldn't tell, strictly by name, whether /foo/$TITLE.pdf and /bar/$TITLE.pdf are the same paper. Any time the tool finds two PDFs that share the same $TITLE, it would have to compare their contents before we could trust its automated removal of suspected dupes.
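
To illustrate the URL-based dedup, here is a rough Python sketch. The normalization rules (lowercasing the host, dropping the fragment and a trailing slash) are my own assumptions about what should count as an "identical" link, and the function names and example URLs are made up.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def find_duplicate_links(submissions):
    """Group (contributor, url) pairs by normalized URL.

    Returns only the URLs that were submitted more than once, mapped to the
    contributors who submitted them.
    """
    groups = defaultdict(list)
    for contributor, url in submissions:
        groups[normalize_url(url)].append(contributor)
    return {url: who for url, who in groups.items() if len(who) > 1}


if __name__ == "__main__":
    submissions = [
        ("alice", "http://Example.com/papers/csp.pdf"),
        ("bob", "http://example.com/papers/csp.pdf#page=2"),
        ("carol", "http://example.com/other.pdf"),
    ]
    print(find_duplicate_links(submissions))
```

This only catches the same link submitted twice. The same paper hosted at two different URLs would still slip through.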

@DarrenN
Contributor

DarrenN commented Mar 7, 2014

All interesting points -

dedup: we can compare SHA-1s of the PDFs. Not perfect, but better than comparing filenames. In combination with filenames, it should be fairly robust.

copyright: agree that we need to be vigilant about copyright, but we also want papers to be as accessible as possible. Would rather err on the side of having fewer papers with clear copyright than more papers than we can audit with murky licensing status. This is why all PRs now require at least two +1s, and if there is a question we can require more thorough auditing.

The core focus (for now) of the repo is to provide access to foundational computer science papers that are alluded to often, but difficult to find. Linking to foundational books is cool, if they're just links and we can make some effort to check their legal status before dropping them into a wiki page.
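
For the dedup point above, the SHA-1 comparison could be sketched like this in Python. This is only an illustration under assumptions, not an existing repo script: the function names are hypothetical and it simply scans a directory tree for `*.pdf` files.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha1_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large PDFs are never loaded into memory whole."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_pdfs(root):
    """Return lists of PDF paths under root whose contents hash identically."""
    by_hash = defaultdict(list)
    for pdf in sorted(Path(root).rglob("*.pdf")):
        by_hash[sha1_of_file(pdf)].append(str(pdf))
    return [paths for paths in by_hash.values() if len(paths) > 1]


if __name__ == "__main__":
    for group in find_duplicate_pdfs("."):
        print("possible duplicates:", ", ".join(group))
```

A hash match only proves two files are byte-for-byte identical, so two differently encoded copies of the same paper would not be grouped, which is presumably where the filename comparison helps.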

@zeeshanlakhani
Member

100% agreed @DarrenN.

@zeeshanlakhani
Member

I will say that we're planning to draft up a combination plan going forward, @mortdeus. We'll be auditing the current set of papers. I still think a good number of them can and should stay in the repo. For those in murkier territory, or whose license only allows a URL link, we'll take something like your approach, which should create a good balance of resources.

@zeeshanlakhani
Member

Closing this for now, but we can continue discussion if need be.


3 participants