This repository has been archived by the owner on May 19, 2021. It is now read-only.

Security/Safety "Best Practices" for rOpenSci Package Developers/Reviewers #35

Open
hrbrmstr opened this issue Apr 24, 2018 · 16 comments

Comments

@hrbrmstr

We've done a bit of this ad-hoc, but we could spend some dedicated cycles ensuring that rOpenSci not only has the best technical and maintenance standards — which it most certainly does — but is also the de-facto standard to replicate when considering safety/security.

@elinw

elinw commented Apr 25, 2018

How are you thinking about safety/security? I think this is a great concept.

@karthik
Member

karthik commented Apr 25, 2018

We discuss this regularly in our staff channels and would be super grateful for your advice/help on this! cc @maelle

@maelle
Member

maelle commented Apr 25, 2018

We'd like to link to https://ropenscilabs.github.io/r-security-practices/ whenever it's ready. Just sayin' 👼

@mmulvahill

@hrbrmstr I'm interested in learning more about how to think about security/safety w.r.t. R. That's all I have to add for now 😉

@hrbrmstr
Author

hrbrmstr commented May 15, 2018

I somehow missed the comment 20d ago @elinw (apologies). https://github.com/hrbrmstr/rpwnd provides some context for the evil one can do with R and https://ropenscilabs.github.io/r-security-practices/ (which @stephlocke penned and @maelle noted) has a great start for that and other topics.

Packages with embedded other-lang libraries need care & feeding and some way to inform users they are in need of an update. Package authors may be putting vulnerable researchers (some of whom may not even know they fit that classification) and users in harm's way without even knowing it, depending on what type of internet calls they make or system traces they leave around.

We also started work last year on a way to help ensure package download safety (https://ropensci.org/blog/2017/07/25/notary/) but all of us who worked on it have been super busy and even if we weren't, it's somewhat moot b/c there's no backing infrastructure for it nor support in R itself for it (which is where it'd need to be).

One thing from the notary work that'd be an interesting "mandate" from rOpenSci is the requirement that all contributors use PGP and sign all commits and no GH merges or releases happen w/o that. Since R has no way for us to have "developer certs" like Apple or Android have for their apps, and since the package ecosystem is more collaborative in nature, the "everybody PGPs" approach at least provides a better guarantee that we can truly trace commits back to the person and not just the GH account.

In the context of ^^ perhaps one "fun" activity (I have weird ideas of what constitutes that) wld be to get everyone on Keybase at the unconf. I 💙💙💙 what @stephlocke is doing with that in her personal and professional R work and perhaps finishing https://github.com/hrbrmstr/keybase wld be a possible unconf project.

@noamross

noamross commented May 15, 2018 via email

@elinw

elinw commented May 15, 2018 via email

@hrbrmstr
Author

Aye. And there's "guidance" that might be useful to note in some API packages. For instance, I wrote epidata to access the Economic Policy Institute data and use the data from it for various classes. Each call out to that API I do from home is logged (Federal requirement and also a side-$-business) by Comcast and searchable by authorities or interested third-parties. They use that data to classify me as a left-leaning activist (when, in fact, I'm really just a non-affiliated anti-authority anarchist :-) I've seen evidence of that in various mailings, adverts on sites that manage to get through my ad-blocking infrastructure, etc. And, due to a job stint at one of the world's largest network providers, I've also maybe even seen said databases. It's worse in other countries/regions and many at-risk researchers (again, who don't even realize they're 'at-risk') do not realize they shld be using, say, a VPN for some API calls or using DNS-over-HTTPS or DNSCrypt since DNS leaks where you're going.

I'm not suggesting rOpenSci can solve or provide guidance on all the issues, but we (I say "we" despite working in a rly strange proto-science vs a real one like y'all) cld definitely up the safety game for those using R.

@hrbrmstr
Author

hrbrmstr commented May 15, 2018

@elinw (re: PGP) aye, it is no panacea and unless you're a die-hard infosec geek or have a die-hard infosec hobby, being religious about PGP configs and use is a pain, especially when setting up new systems. Keybase definitely helps a lot and perhaps we (like @noamross was alluding to) cld develop a "safety/security check" package/function similar to devtools::dr_devtools() or goodpractice as part of this to help both identify gaps and provide helpers or at least friendly tips on fixing things.
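To make the "safety/security check" idea a bit more concrete, here is a minimal sketch in Python (not R, and not an existing package). It assumes the check is fed the text printed by `git config --list` and only looks at two real git settings, `commit.gpgsign` and `user.signingkey`; the function names `parse_git_config` and `signing_gaps` are hypothetical, and a real checker would presumably cover far more than commit signing.

```python
# Hypothetical helper names; only the git config keys are real.
TRUE_VALUES = {"true", "yes", "on", "1"}  # spellings git accepts for boolean true


def parse_git_config(listing: str) -> dict:
    """Parse `git config --list` output (one key=value per line) into a dict."""
    settings = {}
    for line in listing.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that have no "="
            settings[key.strip().lower()] = value.strip()
    return settings


def signing_gaps(listing: str) -> list:
    """Return friendly tips for commit-signing settings that look missing."""
    settings = parse_git_config(listing)
    tips = []
    if settings.get("commit.gpgsign", "false").lower() not in TRUE_VALUES:
        tips.append("commit.gpgsign is not enabled: commits are not signed by default")
    if not settings.get("user.signingkey"):
        tips.append("user.signingkey is unset: git has no explicit key to sign with")
    return tips


if __name__ == "__main__":
    sample = "user.name=Example Dev\ncommit.gpgsign=true"
    for tip in signing_gaps(sample):
        print("tip:", tip)
```

With the sample listing above, only the `user.signingkey` tip is printed, since signing is enabled but no key is named.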

@noamross

If you want to do live testing of a package, like seeing what system files/folders it modifies, I'm working on a Dockerized setup for our standard package tests: https://github.com/noamross/launchboat, so one could run tests in an isolated environment before installing.

@boshek

boshek commented May 15, 2018

Oh this is all so interesting. After reading about notary last year and some linked horror stories I try to sign all my commits now. So thanks @hrbrmstr !

It occurs to me that this is related to this possible project and in fact may be a key component. It is so easy to build packages/scripts and miss significant security considerations (at least for me) that this area likely has many spaces that could be improved upon. Providing means for reviewers to identify and even just consider that as part of a reviewer suite of tools would likely be useful.

@hrbrmstr
Author

@noamross aye. been keeping an 👀 on launchboat and am also keen to be watching the network calls pkgs make.

@jennybc
Member

jennybc commented May 15, 2018

I'd appreciate knowing what the most realistic threat model is for the R package ecosystem and how that aligns against various measures to tighten things up.

Example: I am dimly aware of malicious packages in some other language's repository that had names very close to the "real" packages. And the Bad People exploited mis-spellings to get users to install and run them. That's a really different threat from, say, someone impersonating me and making commits to packages I maintain.

Which threats should we be most worried about and who has to do something to mitigate them?
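As an illustration of the name-squatting threat described above, the sketch below flags a package name that sits within one edit of a known name. It is written in Python under stated assumptions: the list of "known" names is a hypothetical stand-in for whatever list a reviewer trusts, and `edit_distance` and `near_misses` are illustrative helpers, not part of any repository's tooling.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def near_misses(candidate: str, known: list, max_distance: int = 1) -> list:
    """Known names that differ from `candidate` by at most `max_distance` edits."""
    return sorted(name for name in known
                  if name != candidate
                  and edit_distance(candidate, name) <= max_distance)


if __name__ == "__main__":
    known = ["ggplot2", "dplyr", "readxl"]  # hypothetical trusted list
    print(near_misses("ggplot3", known))    # prints ['ggplot2']
```

A check like this only addresses look-alike names; it says nothing about impersonated commits, which is the separate threat that commit signing targets.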

@batpigandme

+1 to all of this…
Also, and maybe this is limited audience (or just unrelated), but basic file threat-assessment. Sometimes you've gotta deal with someone else's data, and (e.g. with readxl) they have to get it to you some way…

@hrbrmstr
Author

That's a 👍 point @batpigandme. "Thankfully?" malicious XML and JSON docs are usually targeting browsers and wld have some serious impediments trying to account for various R interpreter environs. Similarly, malicious PDFs are usually targeting Acrobat or Preview or third-party Windows PDF readers. However, the pkgs in the R ecosystem are all using the same core, [vulnerable] libraries so there is room for caution. And, we all get Word docs, Excel docs, PDFs, etc which all have threat vectors.

@hrbrmstr
Author

@jennybc that's definitely a good unconf working-group mind-meld/group convo (since I'm likely far from the typical R user and cld use some examples of daily use patterns to help with said threat modeling :-)

9 participants