This repository has been archived by the owner on Aug 28, 2019. It is now read-only.

Add citation suggestion to README.md #2503

Closed
dhcodes opened this issue Oct 21, 2017 · 28 comments

Comments

@dhcodes
Contributor

dhcodes commented Oct 21, 2017

I worry some of the new content directly plagiarizes other sites. IMHO, we should work on a recommended way to cite other sources and discourage unattributed copying/pasting.

Thoughts?

@bryanchapel
Contributor

Haha! I tweeted Quincy about this just yesterday and already put in a pull request for one particular article I found. This is how I did it: https://github.com/freeCodeCamp/guides/pull/2337/files?short_path=230a905#diff-230a9052be3f27a5607aea2debfbf534

@dhcodes
Contributor Author

dhcodes commented Oct 21, 2017

I like it, @bryanchapel. Good work! We need to write it into the README, assuming others agree. Would you like to do that once everyone has had a chance to comment?

@bryanchapel
Contributor

Can do!

bryanchapel added a commit to bryanchapel/guides that referenced this issue Oct 21, 2017
 Update README.md with content attribution policy per issue freeCodeCamp#2503
@bryanchapel
Contributor

Since I'm kind of new to this, about how long should you wait before making a decision on an issue? I made a change to the README and referenced this issue in the commit above to help the discussion along. Let me know how it looks. :)

@davcri
Contributor

davcri commented Oct 21, 2017

I agree with everything you said!

Only one note: we should also pay attention to licenses and terms of service. For example, someone opened a pull request with text copied and pasted from Quora, which has this long ToS. I don't know if this is legal, but one thing is sure: it is not ethical!

@davcri
Contributor

davcri commented Oct 21, 2017

I found another PR, "Algorithm: Add AVL Tree Article", with content copied from tutorialspoint.

It's a sad situation, and it's difficult not to discourage contributors, but we are trying to create community content, not a copy-paste website.

@lvcoulter
Contributor

lvcoulter commented Oct 21, 2017

Not sure if this conversation is open, but I'm throwing in my thoughts. I wrote my first article last night. I was also an English teacher in my last incarnation, so I struggled with how to write concise content without it sounding exactly like my favorite resources. I opened several sources on the same topic and read them all, then closed them and wrote all my thoughts down without looking at any. That might be a good suggestion for your README. I did find that my own voice came out very similar to that favorite I mentioned, but I made sure to change up the examples and insert my own. Hope this helps.

@Ethan-Arrowood
Member

College student perspective: my university classes are all governed by very strict no-plagiarism rules. I think the guides should reflect similar values. Other content creators work hard to produce their work, and if we are not giving them proper citation, that is extremely unfair.

We could ask everyone to use something like APA citations (it's already a scientific standard). This would not only make the guide more professional, but also provide valuable SEO link-backs to creators.

@bryanchapel
Contributor

@Ethan-Arrowood Agreed. Didn't even think of the SEO benefits of this, haha!

@lvcoulter I think you're highlighting the difference between paraphrasing what you've researched and learned, and directly quoting. I think that's totally appropriate. My only suggestion would be to collect some of those references and stick them in a ### Other Resources section near the end of the doc so readers can further explore the topics.

@davcri I think the most important piece of Quora's ToS is this:

Subject to these Terms, Quora gives you a worldwide, royalty-free, revokable, non-assignable and non-exclusive license to re-post any of the Content on Quora anywhere on the rest of the web provided that the Content was added to the Service after April 22, 2010, and provided that the user who created the content has not explicitly marked the content as not for reproduction, and provided that you: (a) do not modify the Content; (b) attribute Quora by name in readable text and with a human and machine-followable link (an HTML anchor tag) linking back to the page displaying the original source of the content on http://quora.com on every page that contains Quora content; (c) upon request, either by Quora or a user, remove the user's name from Content which the user has subsequently made anonymous; (d) upon request, either by Quora or by a user who contributed to the Content, make a reasonable effort to update a particular piece of Content to the latest version on http://quora.com; and (e) upon request, either by Quora or by a user who contributed to the Content, make a reasonable attempt to delete Content that has been deleted or marked as not for reproduction on quora.com.

Also a good point I didn't think of. Users should be aware of a resource's ToS and the concepts of fair use.

I'll add these suggestions to the README changes I'm proposing in the commit referenced above. Good discussion all!

@bryanchapel
Contributor

bryanchapel commented Oct 21, 2017

On APA vs MLA formatting for citations, I agree that we could use APA since it's typically the scientific standard. I do like that the MLA specifies a "Date Accessed" for the citation. I think that's really helpful for possibly spotting references that may have changed or gone out of date since a topic was added to the guide, and we can amend/update these as needed.

Also, listing the link in the citation, as APA recommends, is a bit redundant as we should be creating a link to the resource itself in the markdown. I think that convention applies more to print/non-web citations.

I think the best format for our purposes would look something like this:

Author Last Name, Author First Name (if listed). "Article Title." Publication. Publisher. Date Published (if listed). Date Accessed.

And in the markdown it would look like:
[Author Last Name, Author First Name. "Article Title." *Publication.* Publisher. Date Published. Date Accessed.](https://LinkToSource.com)

Maybe we could do a bit of both? Thoughts?
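
As a rough illustration of the format proposed above, here is a minimal sketch that builds the Markdown link from its parts. The function name `formatCitation`, its field names, and the sample values are all hypothetical placeholders, not something agreed on in this thread.

```javascript
// Hypothetical helper: builds a citation in the proposed
// "Last, First. "Title." *Publication.* Publisher. Published. Accessed." shape.
function formatCitation({
  lastName,
  firstName,      // optional ("if listed")
  title,
  publication,
  publisher,
  datePublished,  // optional ("if listed")
  dateAccessed,
  url,
}) {
  const author = firstName ? `${lastName}, ${firstName}.` : `${lastName}.`;
  const parts = [author, `"${title}."`, `*${publication}.*`, `${publisher}.`];
  if (datePublished) parts.push(`${datePublished}.`);
  parts.push(`${dateAccessed}.`);
  // The whole citation becomes the link text, so the URL is not repeated in it.
  return `[${parts.join(" ")}](${url})`;
}

// Placeholder values for illustration only.
const citation = formatCitation({
  lastName: "Doe",
  firstName: "Jane",
  title: "Example Article",
  publication: "Example Blog",
  publisher: "Example Publisher",
  datePublished: "1 Jan. 2017",
  dateAccessed: "21 Oct. 2017",
  url: "https://example.com/article",
});
console.log(citation);
```

Keeping the optional fields optional mirrors the "(if listed)" notes in the proposed format.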

@davcri
Contributor

davcri commented Oct 21, 2017

@QuincyLarson @Bouncey @HKuz @timo (I'm tagging top contributors, since they are more experienced than me): can you please give feedback? I think this is a delicate subject.

@dhcodes
Contributor Author

dhcodes commented Oct 21, 2017

I think direct copy/paste should generally be discouraged unless directly quoted and integral to the article. Paraphrasing, which is what @lvcoulter is doing, is a-okay by me, assuming we cite sources. As mentioned, I like @bryanchapel's approach as it's similar to Wikipedia. We could even model Wikipedia's citation format if we wanted. I'm not sure it truly matters whether it's APA or MLA, as long as we give credit where credit is due.

@QuincyLarson
Contributor

QuincyLarson commented Oct 22, 2017

@dhcodes Sorry I'm so late to this thread.

Here are my thoughts on this: by forcing contributors to abide by a style guide, we're making it harder to contribute. Such a style guide should instead be enforced through an automated script. Just like we use ESLint for our JavaScript, we should use a style checker for our citations.

And we should tackle plagiarism the same way: by running a build script.

That way, if the build task detects what might be plagiarism, a human can look at it and make sure it's properly attributed.

Here's a library that does this. It hasn't been touched in a couple years, but we might be able to make it work. It's in Python, so @Ethan-Arrowood might be a good candidate for testing it out and seeing if we can get it running and incorporated into TravisCI: https://github.com/architshukla/Plagiarism-Checker

Again, my sentiment is that we should put up as few rules and as few impediments to contributing as absolutely necessary. And those rules should be enforced at the CI level, in a way that's transparent and consistent.
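
To make the "style checker for citations" idea concrete, here is a minimal sketch of what such a lint step could look like. The rule it enforces is an assumption made for illustration: entries under an `### Other Resources` heading must be Markdown links whose text contains a quoted title and ends with a period. Neither the rule nor the name `lintCitations` comes from the README.

```javascript
// Assumed rule: [ ...text with a "Quoted Title." ... ending in a period.](http(s)://url)
const CITATION = /^\[[^\]]*"[^"\]]+\."[^\]]*\.\]\(https?:\/\/[^)\s]+\)$/;

// Returns one entry per line that breaks the assumed citation rule.
function lintCitations(markdown) {
  const problems = [];
  let inSection = false;
  markdown.split("\n").forEach((line, index) => {
    if (/^#{1,6}\s/.test(line)) {
      // Only lines under the assumed "Other Resources" heading are checked.
      inSection = /^#{1,6}\s+Other Resources\s*$/.test(line);
      return;
    }
    // Strip a leading list marker ("- " or "* ") before matching.
    const item = line.replace(/^\s*[-*]\s+/, "").trim();
    if (inSection && item !== "" && !CITATION.test(item)) {
      problems.push({ line: index + 1, text: item });
    }
  });
  return problems;
}

// Placeholder article for illustration only.
const sample = [
  "# Title",
  "Some text.",
  "### Other Resources",
  '- [Doe, Jane. "Example Article." *Example Blog.* Example Publisher. 21 Oct. 2017.](https://example.com/article)',
  "- https://example.com/bare-link",
].join("\n");
console.log(lintCitations(sample));
```

A check like this only reports problems; whether a failed check should block a build or just warn is a separate decision.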

@QuincyLarson
Contributor

@davcri Thanks for spotting that case of clear plagiarism. I've closed that contributor's pull request and also reverted another PR from them that I spotted which had plagiarism. I gave him a stern one-time warning (the notion of plagiarism is less familiar in some parts of the world and I gave him the benefit of the doubt).

If we spot people plagiarizing, we should give them a one-time warning that they will be banned from contributing to the freeCodeCamp GitHub organization if they're caught again, and we should refer them to the Academic Honesty Policy: https://www.freecodecamp.org/academic-honesty

@Ethan-Arrowood
Member

Ethan-Arrowood commented Oct 22, 2017

I like the idea of a plagiarism checker. The Python module @QuincyLarson linked is now broken because the Google API it was using has been deprecated. Furthermore, it would be easier to run a Node script through the Travis build anyway. So I propose we add a plagiarism-checker Node.js script as a down-the-road feature. However, at the moment I am way too busy to start this project. I have a lot on my plate, including interview prep, university work, and personal projects (I started my own OS project this week). If no one else wants to take the lead on this, I can create a blank repo and begin work in a few months once my life calms down a bit.

In the meantime, I think the best course of action would be to write a CONTRIBUTING.md that highlights the basics of contributing as well as some additional details, such as our stance on plagiarism and citations. Here is a good resource (includes examples) on how to properly set up a CONTRIBUTING.md file.

@bryanchapel
Contributor

bryanchapel commented Oct 22, 2017

I agree about the checker script as well. There is a Node version by Copyleaks (https://www.npmjs.com/package/plagiarism-checker) that we might be able to use. I also think that writing one from scratch might not be that hard.

You could use the request-promise and cheerio libraries to send chunks of the committed text to Google, then parse the first 10 or so results and check the text chunk against them for a fuzzy match. If there's, say, 60% similarity or so, the PR gets flagged as needing review. Everything I've found so far was the first hit returned by Google when I copied and pasted parts of the article. See this article on unconscious bias as an example. This user might also need a warning, as outlined by @QuincyLarson above? I put in a PR to fix their issue with citations already.
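
The fuzzy-match step described above could be sketched roughly as follows. This leaves out the search and scraping part entirely: it assumes the text of the result pages has already been fetched and is passed in as strings. The similarity measure (the share of the chunk's three-word sequences that also appear in the page) is one possible choice, picked here for simplicity, and all function names are hypothetical.

```javascript
// Split text into lowercase words, ignoring punctuation.
function words(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Build the set of overlapping n-word sequences ("shingles") in a text.
function shingles(text, n = 3) {
  const w = words(text);
  const set = new Set();
  for (let i = 0; i + n <= w.length; i++) {
    set.add(w.slice(i, i + n).join(" "));
  }
  return set;
}

// Fraction (0 to 1) of the chunk's shingles that also occur in the page text.
function similarity(chunk, pageText, n = 3) {
  const chunkShingles = shingles(chunk, n);
  if (chunkShingles.size === 0) return 0;
  const pageShingles = shingles(pageText, n);
  let shared = 0;
  for (const s of chunkShingles) {
    if (pageShingles.has(s)) shared++;
  }
  return shared / chunkShingles.size;
}

// Flag the chunk if any page meets the threshold (60% as floated above).
function needsReview(chunk, pageTexts, threshold = 0.6) {
  return pageTexts.some((page) => similarity(chunk, page) >= threshold);
}

const page = "In computer science, an AVL tree is a self-balancing binary search tree.";
console.log(needsReview("an AVL tree is a self balancing binary search tree", [page]));
console.log(needsReview("bananas are yellow and grow in warm climates", [page]));
```

Measuring overlap relative to the chunk rather than the page means a short copied passage is still caught when the source page is long.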

At any rate, I added a note about the Academic Honesty Policy to my README commit, in addition to the stuff I added about proper attribution. Let me know how this looks, or if it should be pulled out into a separate CONTRIBUTING file as @Ethan-Arrowood suggests. Might even be best to mention it in both places just so it's clear and people don't have an excuse to say "I didn't see that guideline".

@davcri
Contributor

davcri commented Oct 23, 2017

@bryanchapel did you already make a PR with your updated README? If not, can you make one? That way we can discuss it (for me it's almost all right; my only doubt is about using HTML tags vs. Markdown).

I vote for writing about the Honesty Policy in both the README and the CONTRIBUTING files.

I also opened a new issue to discuss adding a plagiarism check to the Travis build: #3315

bryanchapel mentioned this issue Oct 23, 2017
@bryanchapel
Contributor

Just made the PR. #3371. This is just for the README. Didn't do anything for CONTRIBUTING.

@dhcodes
Contributor Author

dhcodes commented Oct 23, 2017

I added some small edits.

@QuincyLarson
Contributor

@bryanchapel @dhcodes I've merged your edits! Thanks! We should mirror this in CONTRIBUTING to make sure people see it. Then I believe we can close this issue.

@QuincyLarson
Contributor

@bryanchapel Nice find on the plagiarism checker! Yes - we would absolutely love your help implementing this.

Seeing that @Ethan-Arrowood is a bit busy at the moment, and has determined that the Python library definitely doesn't work, you're now our only hope on this.

QuincyLarson pushed a commit that referenced this issue Oct 24, 2017
* Update README.md with content attribution policy.

 Update README.md with content attribution policy per issue #2503

* Adding a link to the academic honesty policy.

* suggested edits to attribution section
@dhcodes
Contributor Author

dhcodes commented Oct 24, 2017

I've looked into this a bit more. Assuming we run a comparison search via a search engine (Google or Bing), we may need to limit the test to only files changed in the PR, since the free plan for Google Custom Search now limits you to 100 queries/day. I've looked for alternatives, but there aren't many; Bing has also removed its free plan.

I know Jest can run tests only on changed files, but I'm looking at alternatives as well. I'm not sure if this is a setting on Travis. Still researching.
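
One way to restrict the check to changed files, independent of the test runner, is to work from the list that `git diff --name-only` prints. The sketch below takes that output as a string, keeps only Markdown files, and splits the 100 queries/day limit mentioned above evenly across them. The function names and the even split are assumptions for illustration.

```javascript
// Parse the output of `git diff --name-only <base>...HEAD` into Markdown paths.
function changedMarkdownFiles(diffOutput) {
  return diffOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.endsWith(".md"));
}

// Spread a daily search-query budget evenly across the changed files.
// With more files than queries, each file gets 0 and the check would be skipped.
function planQueries(files, dailyLimit = 100) {
  if (files.length === 0) return [];
  const perFile = Math.floor(dailyLimit / files.length);
  return files.map((file) => ({ file, queries: perFile }));
}

// Placeholder diff output for illustration only.
const diffOutput = "src/pages/example/index.md\nREADME.md\npackage.json\n";
const plan = planQueries(changedMarkdownFiles(diffOutput));
console.log(plan);
```

Note that the limit is per day, not per build, so several PRs on the same day would share the same budget; this sketch does not track that.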

@QuincyLarson
Contributor

@dhcodes Yes - I agree. We should only test files changed in the PR.

@Ethan-Arrowood pointed out that we might want this to be part of our pre-commit step, so that we can point out possible plagiarism to the contributor before it even gets committed. Then if the contributor thinks there's a false positive, they could run the commit task again with --not-plagiarism and it would skip this step, but add a note to the commit description like "plagiarism check skipped" so we'd know to eye-ball their contribution for anything suspicious before accepting the PR.
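
A minimal sketch of that opt-out, assuming the commit task receives its arguments as an array and the commit message as a string. `--not-plagiarism` and the "plagiarism check skipped" note are taken from the comment above; the function name `planCommit` and its return shape are hypothetical.

```javascript
// Decide whether to run the check and what the final commit message should be.
function planCommit(argv, message) {
  const skip = argv.includes("--not-plagiarism");
  return {
    runCheck: !skip,
    // Record the skip in the commit description so reviewers know to look closer.
    message: skip ? `${message}\n\nplagiarism check skipped` : message,
  };
}

console.log(planCommit(["--not-plagiarism"], "Add AVL tree article"));
console.log(planCommit([], "Add AVL tree article"));
```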

@Bouncey
Member

Bouncey commented Oct 27, 2017

@dhcodes Travis can run any script you give it; we just have to write it. If any check we make fails, you can call process.exit(1) and Travis will mark the build as failed.

@QuincyLarson This check would be better as a Travis check due to the number of PRs coming in via the GitHub GUI. Pre-commit hooks only work when committing locally.
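
For the Travis route, the reporting end of such a script could look like the sketch below: print each flagged file and exit with a non-zero code so the build is marked as failed. The `flagged` list is a placeholder that a real check would fill in, and the function names are hypothetical.

```javascript
// A non-zero exit code tells the CI service that the step failed.
function exitCodeFor(flaggedFiles) {
  return flaggedFiles.length > 0 ? 1 : 0;
}

// Log every flagged file, then return the exit code to use.
function report(flaggedFiles, log = console.error) {
  flaggedFiles.forEach((file) => log(`Possible plagiarism: ${file}`));
  return exitCodeFor(flaggedFiles);
}

// Placeholder: a real script would fill this in from the plagiarism check.
const flagged = [];
const code = report(flagged);
if (code !== 0) process.exit(code);
```

Exiting only on failure lets a clean run finish normally, which is already treated as success.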

@dhcodes
Contributor Author

dhcodes commented Oct 27, 2017

@Bouncey Yeah, I think based on what everyone has said, it may be best to go the PR bot route. I'm currently working on making one in Probot, but I'm slow, so if someone else wants to give it a go, by all means, go for it.
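
For anyone picking this up, a PR bot along these lines might be shaped like the sketch below. It assumes a recent Probot release, where the handler context exposes `octokit`, `pullRequest()`, and `issue()`; releases from the time of this thread used different names. `checkFile` is a hypothetical callback standing in for the actual plagiarism check, and only the first page of changed files is examined.

```javascript
// Sketch only. `checkFile(filename, patch)` is hypothetical and not part of Probot:
// it should return true when the added text in a file looks copied.
function plagiarismBot(app, checkFile) {
  app.on(["pull_request.opened", "pull_request.synchronize"], async (context) => {
    // List the files touched by this pull request (first page of results only).
    const { data: files } = await context.octokit.pulls.listFiles(context.pullRequest());

    const flagged = files
      .filter((file) => file.filename.endsWith(".md"))
      // `patch` holds the diff text and can be missing for large or binary files.
      .filter((file) => checkFile(file.filename, file.patch || ""))
      .map((file) => file.filename);

    if (flagged.length === 0) return;

    // Leave a comment so a human reviewer can verify attribution.
    const list = flagged.map((name) => `- ${name}`).join("\n");
    const body = `Possible unattributed content, please review:\n${list}`;
    await context.octokit.issues.createComment(context.issue({ body }));
  });
}
```

In a Probot app this function would be what the module exports, with `checkFile` bound to a real implementation. Commenting instead of failing a build keeps a human in the loop for false positives.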

@dhcodes
Contributor Author

dhcodes commented Oct 30, 2017

I probably wasn't clear enough in my last post. If anyone wants to work on this, consider it open. There's no assurance that I'll get anything working and there are many other skilled programmers out there who could probably whip something up faster than I.

@FrancesCoronel
Contributor

Just to note, there are some PRs that reference this issue but in the interest of maintaining positive contributions, I am marking the PRs that have the majority of the content copied and pasted from external websites as invalid and closing them.

To restate what @davcri has said and what I ultimately agree with, "we are trying to create community content, not a copy-paste website".

@davcri
Contributor

davcri commented Nov 2, 2017

@dhcodes could you share some insights about the PR bot in #3315? I would like to know more about it.
