PEP: 458
Title: Surviving a Compromise of PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Trishank Karthik Kuppusamy <trishank@nyu.edu>,
        Vladimir Diaz <vladimir.diaz@nyu.edu>,
        Marina Moore <mm9693@nyu.edu>,
        Lukas Puehringer <lukas.puehringer@nyu.edu>,
        Donald Stufft <donald@stufft.io>,
        Justin Cappos <jcappos@nyu.edu>
BDFL-Delegate: Donald Stufft <donald@stufft.io>
Discussions-To: DistUtils mailing list <distutils-sig@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Sep-2013

Abstract
========

This PEP proposes how the Python Package Index (PyPI [1]_) should be integrated
with The Update Framework [2]_ (TUF). TUF was designed to be a flexible
security add-on to a software updater or package manager. The framework
integrates best security practices such as separating role responsibilities,
adopting the many-man rule for signing packages, keeping signing keys offline,
and revoking expired or compromised signing keys. For example, attackers
would have to steal multiple signing keys stored independently to compromise
a role responsible for specifying a repository's available files. Another role,
responsible for indicating the latest snapshot of the repository, may also have
to be compromised, independently of the first.

The proposed integration will allow modern package managers such as pip [3]_ to
be more secure against various types of security attacks on PyPI and protect
users from such attacks. Specifically, this PEP describes how PyPI processes
should be adapted to generate and incorporate TUF metadata (i.e., the minimum
security model). The minimum security model supports verification of PyPI
distributions that are signed with keys stored on PyPI: distributions uploaded
by developers are signed by PyPI, require no action from developers (other than
uploading the distribution), and are immediately available for download. The
minimum security model also minimizes PyPI administrative responsibilities by
automating much of the signing process.

This PEP does not prescribe how package managers such as pip should be adapted
to install or update projects from PyPI with TUF metadata. Package managers
interested in adopting TUF on the client side may consult TUF's `library
documentation`__, which exists for this purpose. Support for project
distributions that are signed by developers (maximum security model) is also
not discussed in this PEP, but is outlined in the appendix as a possible future
extension and covered in detail in PEP 480 [26]_. The PEP 480 extension
focuses on the maximum security model, which requires more PyPI administrative
work (but none by clients). It also proposes an easy-to-use key management
solution for developers, describes how to interface with a potential future
build farm on PyPI infrastructure, and discusses the feasibility of end-to-end
signing.

__ https://github.com/theupdateframework/tuf/tree/v0.11.1/tuf/client#updaterpy

PEP Status
==========

Due to the amount of work required to implement this PEP, it was deferred in
early 2019 until appropriate funding could be secured. The Python Software
Foundation secured this funding [27]_.

Motivation
==========

In January 2013, the Python Software Foundation (PSF) announced [4]_ that the
python.org wikis for Python, Jython, and the PSF were subjected to a security
breach that caused all of the wiki data to be destroyed on January 5, 2013.
Fortunately, the PyPI infrastructure was not affected by this security breach.
However, the incident is a reminder that PyPI should take defensive steps to
protect users as much as possible in the event of a compromise. Attacks on
software repositories happen frequently [5]_. The PSF must accept the
possibility of security breaches and prepare PyPI accordingly, because it is a
valuable resource used by thousands, if not millions, of people.

Before the wiki attack, PyPI used MD5 hashes to tell package managers, such as
pip, whether or not a package was corrupted in transit. However, the absence
of SSL made it hard for package managers to verify transport integrity to PyPI.
It was therefore easy to launch a man-in-the-middle attack between pip and
PyPI and change package content arbitrarily, tricking users into installing
malicious packages. After the wiki attack, several steps were proposed (some
of which were implemented) to deliver a much higher level of security than was
previously the case: requiring SSL to communicate with PyPI [6]_, restricting
project names [7]_, and migrating from MD5 to SHA-2 hashes [8]_.

These steps, though necessary, are insufficient because attacks are still
possible through other avenues. For example, a public mirror is trusted to
honestly mirror PyPI, but some mirrors may misbehave due to malice or accident.
Package managers such as pip are supposed to use signatures from PyPI to verify
packages downloaded from a public mirror [9]_, but none are known to actually
do so [10]_. Therefore, it would be wise to add more security measures to
detect attacks from public mirrors or content delivery networks [11]_ (CDNs).

Even though official mirrors are being deprecated on PyPI [12]_, there remain a
wide variety of other attack vectors on package managers [13]_. These attacks
can crash client systems, cause obsolete packages to be installed, or even
allow an attacker to execute arbitrary code. In `September 2013`__, a post was
made to the Distutils mailing list showing that the latest version of pip (at
the time) was susceptible to such attacks, and how TUF could protect users
against them [14]_. Specifically, testing was done to see how pip would
respond to these attacks with and without TUF. Attacks tested included replay
and freeze, arbitrary packages, slow retrieval, and endless data. The post
also included a demonstration of how pip would respond if PyPI were
compromised.

__ https://mail.python.org/pipermail/distutils-sig/2013-September/022755.html

With the intent to protect PyPI against infrastructure compromises, this PEP
proposes integrating PyPI with The Update Framework [2]_ (TUF). TUF helps
secure new or existing software update systems. Software update systems are
vulnerable to many known attacks, including those that can result in clients
being compromised or crashed. TUF solves these problems by providing a flexible
security framework that can be added to software updaters.

Threat Model
============

The threat model assumes the following:

* Offline keys are safe and securely stored.
* Attackers can compromise at least one of PyPI's trusted keys stored online,
  and may do so at once or over a period of time.
* Attackers can respond to client requests.

An attacker is considered successful if they can cause a client to install (or
leave installed) something other than the most up-to-date version of the
software the client is updating. If the attacker is preventing the installation
of updates, their goal is for clients not to realize that anything is wrong.

Definitions
===========

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in RFC 2119__.

__ http://www.ietf.org/rfc/rfc2119.txt

This PEP focuses on integrating TUF with PyPI; however, the reader is
encouraged to read about TUF's design principles [2]_ and SHOULD be
familiar with the TUF specification [16]_.

Terms used in this PEP are defined as follows:

* Projects: Projects are software components that are made available for
  integration. Projects include Python libraries, frameworks, scripts,
  plugins, applications, collections of data or other resources, and various
  combinations thereof. Public Python projects are typically registered on the
  Python Package Index [17]_.

* Releases: Releases are uniquely identified snapshots of a project [17]_.

* Distributions: Distributions are the packaged files that are used to publish
  and distribute a release [17]_.

* Simple index: The HTML page that contains internal links to the
  distributions of a project [17]_.

* Roles: There is one *root* role in PyPI. There are multiple roles whose
  responsibilities are delegated to them directly or indirectly by the *root*
  role. The term top-level role refers to the *root* role and any role
  specified directly by the *root* role, i.e., the *timestamp*, *snapshot*,
  and *targets* roles. Each role has a single metadata file that it is trusted
  to provide.

* Metadata: Metadata are signed files that describe roles, other metadata, and
  target files.

* Repository: A repository is a resource composed of named metadata and target
  files. Clients request metadata and target files stored on a repository.

* Consistent snapshot: A set of TUF metadata and PyPI targets that capture the
  complete state of all projects on PyPI as they existed at some fixed point in
  time.

* Developer: Either the owner or maintainer of a project who is allowed to
  update the TUF metadata as well as distribution metadata and files for the
  project.

* Online key: A private cryptographic key that MUST be stored on the PyPI
  server infrastructure. This is usually to allow automated signing with the
  key. However, an attacker who compromises the PyPI infrastructure will be
  able to read these keys.

* Offline key: A private cryptographic key that MUST be stored independently
  of the PyPI server infrastructure. This prevents automated signing with the
  key. An attacker who compromises the PyPI infrastructure will not be able to
  immediately read these keys.

* Threshold signature scheme: A role can increase its resilience to key
  compromises by specifying that at least t out of n keys are REQUIRED to sign
  its metadata. A compromise of t-1 keys is insufficient to compromise the
  role itself. Saying that a role requires (t, n) keys denotes the threshold
  signature property.

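The (t, n) threshold property can be made concrete with a short Python sketch. It is not the TUF reference implementation. The function name ``meets_threshold``, the signature record layout (``keyid``, ``sig``), and the pluggable ``verify`` callable are all hypothetical. The HMAC-based ``toy_sign``/``toy_verify`` pair is only a stand-in so the example runs with the standard library. It is not Ed25519 or any other public-key scheme.

```python
import hashlib
import hmac
from typing import Callable, Iterable, Mapping

# (key, signature_hex, payload) -> True if the signature is valid.
Verifier = Callable[[bytes, str, bytes], bool]


def meets_threshold(payload: bytes,
                    signatures: Iterable[Mapping[str, str]],
                    authorized_keys: Mapping[str, bytes],
                    threshold: int,
                    verify: Verifier) -> bool:
    """Return True if at least `threshold` distinct authorized keys
    produced valid signatures over `payload`."""
    valid_keyids = set()
    for sig in signatures:
        key = authorized_keys.get(sig["keyid"])
        if key is None:
            continue  # signed by a key this role does not trust
        if verify(key, sig["sig"], payload):
            valid_keyids.add(sig["keyid"])  # each key counts only once
    return len(valid_keyids) >= threshold


# Toy stand-in so the sketch runs with the standard library only.
# It is an HMAC tag, NOT an Ed25519 (or any public-key) signature.
def toy_sign(key: bytes, payload: bytes) -> str:
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def toy_verify(key: bytes, sig_hex: str, payload: bytes) -> bool:
    return hmac.compare_digest(toy_sign(key, payload), sig_hex)
```

The sketch counts distinct key IDs, not signatures. Counting signatures would let a single compromised key meet the threshold by repeating its own signature. That would defeat the guarantee that a compromise of t-1 keys is insufficient.
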
Overview of TUF
===============

At its highest level, TUF provides applications with a secure method of
obtaining files and knowing when new versions of files are available. On the
surface, this all sounds simple. The basic steps for updating applications are:

* Knowing when an update exists.
* Downloading a correct copy of the latest version of an updated file.

The problem is that updating applications is only simple when there are no
malicious activities in the picture. If an attacker is trying to interfere with
these seemingly simple steps, there is plenty they can do.

Assume a software updater takes the approach of most systems (at least the ones
that try to be secure). It downloads both the file it wants and a cryptographic
signature of the file. The software updater already knows which key it trusts
to make the signature. It checks that the signature is correct and was made by
this trusted key. Unfortunately, the software updater is still at risk in many
ways, including:

* An attacker keeps giving the software updater the same update file, so it
  never realizes there is an update.
* An attacker gives the software updater an older, insecure version of a file
  that it already has, so it downloads that one and blindly uses it thinking it
  is newer.
* An attacker gives the software updater a newer version of a file it has, but
  not the newest one. The file is newer to the software updater, but it may be
  insecure and exploitable by the attacker.
* An attacker compromises the key used to sign these files and now the software
  updater downloads a malicious file that is properly signed.

TUF is designed to address these attacks, and others, by adding signed metadata
(text files that describe the repository's files) to the repository and
referencing the metadata files during the update procedure. Repository files
are verified against the information included in the metadata before they are
handed off to the software update system. The framework also provides
multi-signature trust, explicit and implicit revocation of cryptographic keys,
and separation of metadata responsibilities, and it minimizes key risk. For a
full list and outline of the repository attacks and software updater
weaknesses addressed by TUF, see Appendix A.

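The sketch below shows how signed version numbers and expiration dates help against the freeze and rollback attacks listed above. It assumes the metadata signatures have already been verified. It represents metadata as a dict with ``version`` and ``expires`` fields, loosely modeled on TUF's JSON metadata. ``check_new_metadata`` and ``MetadataError`` are hypothetical names.

```python
from datetime import datetime, timezone


class MetadataError(Exception):
    """Raised when newly downloaded metadata must be rejected."""


def check_new_metadata(trusted: dict, new: dict, now: datetime) -> None:
    """Reject rollback and freeze attacks on metadata whose signatures
    have already been verified. `expires` uses the ISO 8601 UTC form
    "YYYY-MM-DDTHH:MM:SSZ"; `now` must be timezone-aware."""
    # Rollback: the repository offers an older version than we trust.
    if new["version"] < trusted["version"]:
        raise MetadataError(
            f"rollback: version {new['version']} < {trusted['version']}")
    # Freeze: stale metadata replayed indefinitely eventually expires,
    # so the client notices that it is not receiving updates.
    expires = datetime.strptime(new["expires"], "%Y-%m-%dT%H:%M:%SZ")
    if expires.replace(tzinfo=timezone.utc) <= now:
        raise MetadataError(f"freeze: metadata expired at {new['expires']}")
```

Neither check helps if the signing key itself is compromised, which is the last attack listed above. That case is handled by thresholds, role separation, and key revocation rather than by per-file checks.
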
Integrating TUF with PyPI
=========================

A software update system must complete two main tasks to integrate with TUF.
First, it must add the framework to the client side of the update system. For
example, TUF MAY be integrated with the pip package manager. Second, the
repository on the server side MUST be modified to provide signed TUF metadata.
This PEP is concerned with the second part of the integration: the changes
required on PyPI to support software updates with TUF.

What Additional Repository Files are Required on PyPI?
------------------------------------------------------

In order for package managers like pip to download and verify packages with
TUF, a few extra files MUST exist on PyPI. These extra repository files are
called TUF metadata. TUF metadata contains information such as which keys are
trustworthy, the cryptographic hashes of files, signatures over the metadata,
metadata version numbers, and the date after which the metadata should be
considered expired.

When a package manager wants to check for updates, it asks TUF to do the work.
That is, a package manager never has to deal with this additional metadata or
understand what's going on underneath. If TUF reports back that there are
updates available, a package manager can then ask TUF to download these files
from PyPI. TUF downloads them and checks them against the TUF metadata that it
also downloads from the repository. If the downloaded target files are
trustworthy, TUF then hands them over to the package manager.

The `Metadata`__ document provides information about each kind of required
metadata and its expected content. The next section covers the different
kinds of metadata RECOMMENDED for PyPI.

__ https://github.com/theupdateframework/tuf/blob/v0.11.1/docs/METADATA.md

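Before TUF hands a downloaded file to the package manager, it checks the file against the trusted metadata. A minimal sketch of that check compares the file's length and hashes with the trusted entry for its path. The entry layout (``length`` plus a ``hashes`` mapping) mirrors TUF's targets metadata. The ``verify_target`` function and the ``SUPPORTED_HASHES`` set are hypothetical. A real client would also cap the number of bytes while streaming the download rather than checking the length afterwards.

```python
import hashlib
import hmac

# Hypothetical choice of hash algorithms this client is willing to check.
SUPPORTED_HASHES = {"sha256", "sha512"}


def verify_target(data: bytes, target_info: dict) -> bool:
    """Return True only if `data` matches the length and every supported
    hash recorded in its (already trusted) targets metadata entry."""
    if len(data) != target_info["length"]:
        return False
    checked = 0
    for algorithm, expected_hex in target_info["hashes"].items():
        if algorithm not in SUPPORTED_HASHES:
            continue  # skip algorithms this client does not check
        actual_hex = hashlib.new(algorithm, data).hexdigest()
        if not hmac.compare_digest(actual_hex, expected_hex):
            return False
        checked += 1
    return checked > 0  # refuse files with no verifiable hash
```
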
PyPI and TUF Metadata
=====================

TUF metadata provides information that clients can use to make update
decisions. For example, the *targets* metadata lists the available
distributions on PyPI and includes the distributions' signatures,
cryptographic hashes, and file sizes. Different metadata files provide
different information. The various metadata files are signed by different
roles, which are indicated by the *root* role. The concept of roles allows TUF
to delegate responsibilities to multiple roles and minimizes the impact of a
compromised role.

TUF requires four top-level roles. These are *root*, *timestamp*, *snapshot*,
and *targets*. The *root* role specifies the public cryptographic keys of the
top-level roles (including its own). The *timestamp* role references the
latest *snapshot* and can signify when a new snapshot of the repository is
available. The *snapshot* role indicates the latest version of all the TUF
metadata files (other than *timestamp*). The *targets* role lists the file
paths of available target files together with their hashes. The file paths must
be specified relative to a base URL. This allows the actual target files to be
served from anywhere, as long as the base URL can be accessed by the client.

Each top-level role will serve its responsibilities without exception.
Figure 1 provides a table of the roles used in TUF.

.. image:: pep-0458-1.png

Figure 1: An overview of the TUF roles.

Signing Metadata and Repository Management
------------------------------------------

The top-level *root* role signs for the keys of the top-level *timestamp*,
*snapshot*, *targets*, and *root* roles. The *timestamp* role signs for every
new snapshot of the repository metadata. The *snapshot* role signs for *root*,
*targets*, and all delegated targets roles. The delegated targets role *bins*
further delegates to the *bin-n* roles, which sign for all distributions
belonging to registered PyPI projects.

Figure 2 provides an overview of the roles available within PyPI, which
includes the top-level roles and the roles delegated to by *targets*. The figure
also indicates the types of keys used to sign each role and which roles are
trusted to sign for files available on PyPI. The next two sections cover the
details of signing repository files and the types of keys used for each role.

.. image:: pep-0458-2.png

Figure 2: An overview of the role metadata available on PyPI.

The roles that change most frequently are *timestamp*, *snapshot*, and the
roles delegated to by *bins* (i.e., *bin-n*). The *timestamp* and *snapshot*
metadata MUST be updated whenever *root*, *targets*, or delegated metadata are
updated. Observe, though, that *root* and *targets* metadata are updated much
less often than delegated metadata. Similarly, the *bins* role will only be
updated when a *bin-n* role is added, updated, or removed. Therefore,
*timestamp*, *snapshot*, and *bin-n* metadata will most likely be updated
frequently (possibly every minute), because delegated metadata change
frequently in order to support continuous delivery of projects. Continuous
delivery is a set of processes that PyPI uses to produce snapshots that can
safely coexist and be deleted independently of other snapshots [18]_.

Every year, PyPI administrators SHOULD sign for *root* and *targets* role keys.
Automation will continuously sign for a timestamped snapshot of all projects.
A `repository management`__ tool is available that can sign metadata files,
generate cryptographic keys, and manage a TUF repository.

__ https://github.com/theupdateframework/tuf/blob/v0.11.1/docs/TUTORIAL.md#how-to-create-and-modify-a-tuf-repository

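The cascade described above can be sketched as a single publishing step. A change to one *bin-n* role forces new *snapshot* and *timestamp* metadata, while *root*, *targets*, and *bins* stay untouched. This is a simplified model with several assumptions. Metadata are plain dicts, signing with the online key is omitted, and the ``meta`` layout only loosely follows TUF's. ``publish_upload`` is a hypothetical name.

```python
import copy


def publish_upload(repo: dict, bin_name: str, path: str,
                   target_info: dict) -> dict:
    """Return a new repository state after adding one target to a
    bin-n role. Signing with the online key is omitted for brevity."""
    new = copy.deepcopy(repo)  # the previous state is left unmodified

    # 1. The bin-n role that covers this path gains the new target.
    bin_md = new["roles"][bin_name]
    bin_md["targets"][path] = target_info
    bin_md["version"] += 1

    # 2. Snapshot records the new version of the changed bin-n role.
    snapshot = new["roles"]["snapshot"]
    snapshot["meta"][f"{bin_name}.json"] = {"version": bin_md["version"]}
    snapshot["version"] += 1

    # 3. Timestamp points at the new snapshot.
    timestamp = new["roles"]["timestamp"]
    timestamp["meta"]["snapshot.json"] = {"version": snapshot["version"]}
    timestamp["version"] += 1
    return new
```

Copying the whole state is only for clarity. A repository with 16,384 bins would rewrite just the changed files, but the ordering (bin first, then snapshot, then timestamp) is the point being illustrated.
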
How to Establish Initial Trust in the PyPI Root Keys
----------------------------------------------------

Package managers like pip MUST ship the *root* metadata file with the
installation files that users initially download. This includes information
about the keys trusted for all top-level roles (including the root keys
themselves). Package managers must also bundle a TUF client library. Any new
version of *root* metadata that the TUF client library may download is verified
against the root keys that the package manager was initially bundled with. If a
root key is compromised, but a threshold of keys is still secure, then PyPI
administrators MUST push new *root* metadata that revokes trust in the
compromised keys. If a threshold of root keys is compromised, then the *root*
metadata MUST be updated out-of-band. (However, the threshold of root keys
should be chosen so that this event is extremely unlikely.)

Package managers do not necessarily need to be updated immediately if root
keys are revoked or added between new releases of the package manager: the TUF
update process automatically handles the cases where a threshold of previous
*root* keys sign for new *root* keys (assuming no backwards-incompatibility in
the TUF specification used). So, for example, if a package manager was
initially shipped with version 1 of the *root* metadata, a threshold of *root*
keys in version 1 signed version 2 of the *root* metadata, and a threshold of
*root* keys in version 2 signed version 3 of the *root* metadata, then the
package manager should be able to transparently update its copy of the *root*
metadata from version 1 to 3 using its TUF client library.

Thus, the latest good copy of *root* metadata and a TUF client library MUST
be included in any new version of pip shipped with CPython (via ensurepip). The
TUF client library inside the package manager then loads the *root* metadata
and downloads the rest of the roles, including updating the *root* metadata if
it has changed. An `outline of the update process`__ is available.

__ https://github.com/theupdateframework/specification/blob/master/tuf-spec.md#5-detailed-workflows

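The version-by-version *root* update described above can be sketched as a loop. In this minimal model, each *root* version lists its trusted ``keys`` and ``threshold``. Two hooks, ``fetch`` and ``count_valid``, stand in for downloading and signature verification. Both hooks and the ``update_root`` name are hypothetical. Following the TUF specification's workflow, each new *root* must be signed by a threshold of the previous version's keys and by a threshold of its own keys. Expiry checks and real cryptography are left out.

```python
from typing import Callable, Optional


def update_root(trusted: dict,
                fetch: Callable[[int], Optional[dict]],
                count_valid: Callable[[dict, set], int]) -> dict:
    """Walk root versions N+1, N+2, ... until no newer one exists.

    `fetch(v)` returns root metadata version v, or None if absent.
    `count_valid(root, keyids)` returns how many of `keyids` produced
    valid signatures over `root`."""
    while True:
        candidate = fetch(trusted["version"] + 1)
        if candidate is None:
            return trusted  # no newer root has been published
        if candidate["version"] != trusted["version"] + 1:
            raise ValueError("unexpected root version")
        # Signed by a threshold of the OLD keys (the chain of trust) ...
        if count_valid(candidate, set(trusted["keys"])) < trusted["threshold"]:
            raise ValueError("new root not signed by previous threshold")
        # ... and by a threshold of its OWN keys.
        if count_valid(candidate, set(candidate["keys"])) < candidate["threshold"]:
            raise ValueError("new root not signed by its own threshold")
        trusted = candidate
```

This loop lets a package manager shipped with version 1 reach version 3 without a new release. Each step only needs the keys trusted in the step before it.
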
Minimum Security Model
----------------------

There are two security models to consider when integrating TUF with PyPI. The
one proposed in this PEP is the minimum security model, which supports
verification of PyPI distributions that are signed with private cryptographic
keys stored on PyPI. Distributions uploaded by developers are signed by PyPI
and immediately available for download. A possible future extension to this
PEP, discussed in Appendix B, proposes the maximum security model and allows a
developer to sign for their project. Developer keys are not stored online:
therefore, projects are safe from PyPI compromises.

The minimum security model requires no action from a developer and protects
against malicious CDNs [19]_ and public mirrors. To support continuous
delivery of uploaded packages, PyPI signs for projects with an online key.
This level of security prevents projects from being accidentally or
deliberately tampered with by a mirror or a CDN because the mirror or CDN will
not have any of the keys required to sign for projects. However, it does not
protect projects from attackers who have compromised PyPI, since attackers can
manipulate TUF metadata using the keys stored online.

This PEP proposes that the *bin-n* roles sign for all PyPI projects with online
keys. The *targets* role, which only signs with an offline key, MUST delegate
all PyPI projects to the *bins* role. This means that when a package manager
such as pip (i.e., using TUF) downloads a distribution from a project on PyPI,
it will consult the *bins* role about the TUF metadata for the project. If no
*bin-n* role delegated by *bins* specifies the project's distribution, then
the project is considered to be non-existent on PyPI.

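The lookup rule above treats a distribution that no *bin-n* role lists as absent. It can be sketched as a walk over the delegations. Here the *bins* delegations are modeled as a list of ``(role_name, covers)`` pairs, where ``covers`` decides whether a path is delegated to that role. Each *bin-n* role is modeled as a dict of target entries. Both representations and ``find_target`` are hypothetical simplifications of TUF's delegation metadata.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (bin-n role name, predicate: is this path delegated to that role?)
Delegation = Tuple[str, Callable[[str], bool]]


def find_target(path: str,
                bins_delegations: List[Delegation],
                bin_roles: Dict[str, dict]) -> Optional[dict]:
    """Return the trusted metadata entry for `path`, or None.

    Only bin-n roles that bins delegates `path` to are consulted; if
    none of them lists the file, it is treated as non-existent."""
    for role_name, covers in bins_delegations:
        if not covers(path):
            continue  # this role is not trusted for this path
        entry = bin_roles.get(role_name, {}).get(path)
        if entry is not None:
            return entry
    return None
```

A role is consulted only for paths delegated to it. An entry that appears in the wrong *bin-n* role is therefore ignored rather than trusted.
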
Metadata Expiry Times
---------------------

The metadata for the *root*, *targets*, and *bins* roles SHOULD each expire in
one year, because these three metadata files are expected to change very
rarely.

The *timestamp*, *snapshot*, and *bin-n* metadata SHOULD each expire in one day
because a CDN or mirror SHOULD synchronize itself with PyPI every day.
Furthermore, this generous time frame also takes into account client clocks
that are highly skewed or adrift.

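The recommended lifetimes amount to a small table that a signing job might consult when it stamps new metadata. The sketch below makes two assumptions. The key ``"bin-n"`` stands for every hashed-bin role, and one year is approximated as 365 days. ``EXPIRY`` and ``expiry_for`` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Recommended lifetimes from this section (one year taken as 365 days).
EXPIRY = {
    "root": timedelta(days=365),
    "targets": timedelta(days=365),
    "bins": timedelta(days=365),
    "timestamp": timedelta(days=1),
    "snapshot": timedelta(days=1),
    "bin-n": timedelta(days=1),
}


def expiry_for(role: str, signed_at: datetime) -> str:
    """Return the ISO 8601 UTC `expires` string for newly signed metadata.
    `signed_at` should be timezone-aware."""
    expires = signed_at.astimezone(timezone.utc) + EXPIRY[role]
    return expires.strftime("%Y-%m-%dT%H:%M:%SZ")
```
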
Metadata Scalability
--------------------

Due to the growing number of projects and distributions, TUF metadata will also
grow correspondingly. For example, consider the *bins* role. In August 2013,
it was found that the size of the *bins* metadata was about 42MB when the
*bins* role itself signed for about 220K PyPI targets (which are simple indices
and distributions). This PEP does not delve into the details, but TUF features
a so-called "`lazy bin walk`__" scheme that splits a large targets metadata file
into many small ones. This allows a TUF client updater to intelligently
download only a small number of TUF metadata files in order to update any
project signed for by the *bins* role. For example, applying this scheme to
the previous repository resulted in pip downloading between 1.3KB and 111KB to
install or upgrade a PyPI project via TUF.

__ https://github.com/theupdateframework/tuf/blob/v0.11.1/docs/TUTORIAL.md#delegate-to-hashed-bins

Based on our findings as of the time this document was updated for
implementation (Oct 7, 2019), PyPI SHOULD split all targets in the *bins* role
by delegating them to 16,384 *bin-n* roles, each of which would sign for PyPI
targets whose hashes fall into that bin (see Figure 2). It was found__ that
this number of bins would result in a 12-17% metadata overhead for returning
users, and a 148% overhead for new users who are installing pip for the first
time.

__ https://docs.google.com/spreadsheets/d/11_XkeHrf4GdhMYVqpYWsug6JNz5ZK6HvvmDZX0__K2I/edit?usp=sharing

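Hashed-bin delegation assigns each target to a bin using a hash of its path. This lets a client work out which single *bin-n* file it has to fetch. The sketch below assumes SHA-256 over the target path and derives the bin from the leading hex digits. With 16,384 bins (2^14), four hex digits (65,536 prefixes) are needed, so each bin covers four consecutive prefixes. The exact hash input, prefix length, and role naming used by TUF's tooling may differ. ``bin_for_target``, ``prefix_len``, and the ``bin-<n>`` names are illustrative.

```python
import hashlib

NUMBER_OF_BINS = 16_384  # 2**14, as recommended above


def prefix_len(number_of_bins: int) -> int:
    """Number of hex digits needed so that every bin gets a prefix."""
    if number_of_bins < 2 or number_of_bins & (number_of_bins - 1):
        raise ValueError("number_of_bins must be a power of two >= 2")
    bits = number_of_bins.bit_length() - 1  # log2 of a power of two
    return (bits + 3) // 4                  # round up to whole hex digits


def bin_for_target(target_path: str,
                   number_of_bins: int = NUMBER_OF_BINS) -> str:
    """Map a target path to the (hypothetical) name of its bin-n role."""
    plen = prefix_len(number_of_bins)
    digest = hashlib.sha256(target_path.encode("utf-8")).hexdigest()
    prefix_value = int(digest[:plen], 16)
    prefixes_per_bin = 16 ** plen // number_of_bins
    return f"bin-{prefix_value // prefixes_per_bin}"
```

Because the client computes the bin locally, it needs only one small *bin-n* file per distribution. This is what keeps the per-install metadata cost low even though the total number of bins is large.
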
It is possible to make TUF metadata more compact by representing it in a binary
format as opposed to the JSON text format. Nevertheless, a sufficiently large
number of projects and distributions will introduce scalability challenges at
some point, and therefore the *bins* role will still need delegations (as
outlined in Figure 2) in order to address the problem. Furthermore, the JSON
format is an open and well-known standard for data interchange. Due to the
large number of delegated metadata, compressed versions of *snapshot* metadata
SHOULD also be made available to clients.

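*Snapshot* lists a version for every delegated role, so it grows with the number of bins. Its repetitive JSON compresses well. One rough way to gauge the savings is to build a synthetic *snapshot* body with one entry per bin and compress it with gzip. The field layout here is a simplification, and the file names and ``snapshot_sizes`` are illustrative. Actual sizes depend on the real metadata.

```python
import gzip
import json


def snapshot_sizes(number_of_bins: int) -> tuple:
    """Return (raw_bytes, gzipped_bytes) for a synthetic snapshot body."""
    meta = {f"bin-{i}.json": {"version": 1} for i in range(number_of_bins)}
    body = json.dumps({"_type": "snapshot", "version": 1, "meta": meta},
                      separators=(",", ":"), sort_keys=True).encode("utf-8")
    return len(body), len(gzip.compress(body))


raw, compressed = snapshot_sizes(16_384)
print(f"raw: {raw} bytes, gzip: {compressed} bytes")
```
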
PyPI and Key Requirements
=========================

In this section, the kinds of keys required to sign for TUF roles on PyPI are
examined. TUF is agnostic with respect to choices of digital signature
algorithms. For the purpose of discussion, it is assumed that all digital
signatures will be produced with the Ed25519 algorithm [25]_ as this algorithm
has native and well-tested Python support.

Nevertheless, we do NOT recommend any particular digital signature algorithm in
this PEP because there are a few important constraints: first, cryptography
changes over time; and second, TUF recommends diversity of keys for certain
applications.

Number and Type of Keys Recommended
-----------------------------------

The *root* role key is critical for security and should very rarely be used.
It is primarily used for key revocation, and it is the locus of trust for all
of PyPI. The *root* role signs for the keys that are authorized for each of
the top-level roles (including its own). Keys belonging to the *root* role are
intended to be very well-protected and used with the least frequency of all
keys. It is RECOMMENDED that every PSF board member own a (strong) root key.
A majority of them can then constitute a quorum to revoke or endow trust in all
top-level keys. Alternatively, the system administrators of PyPI could be
given responsibility for signing for the *root* role. In either case, the
*root* role SHOULD require (t, n) keys, where n is the number of either all
PyPI administrators or all PSF board members, and t > 1 (so that at least two
members must sign the *root* role).

The *targets* role will be used only to sign for the static delegation of all
targets to the *bins* role. Since these target delegations must be secured
against attacks in the event of a compromise, the keys for the *targets* role
MUST be offline and independent of other keys. For simplicity of key
management, without sacrificing security, it is RECOMMENDED that the keys of
the *targets* role be permanently discarded as soon as they have been created
and used to sign for the role. Therefore, the *targets* role SHOULD require
(2, 2) keys. Again, this is because the keys are going to be permanently
discarded, and more offline keys will not help resist key recovery
attacks [21]_ unless diversity of cryptographic algorithms is maintained.

For similar reasons, the keys for the *bins* role SHOULD be set up like the
keys for the *targets* role.

In order to support continuous delivery, the keys for the *timestamp*,
*snapshot*, and all *bin-n* roles MUST be online. There is little benefit in
requiring all of these roles to use different online keys, since attackers
would presumably be able to compromise all of them if they compromise PyPI.
Therefore, it is reasonable to use one online key for them all.

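The recommendations in this section can be collected into a small policy table and checked mechanically, for example before a key ceremony. The table encodes the thresholds stated above. The *root* threshold and key count are parameters because they depend on who holds the keys. The shared online key is modeled as (1, 1) for each online role, which is an assumption. The structure, ``key_policy``, and ``check_policy`` are hypothetical.

```python
def key_policy(root_t: int, root_n: int) -> dict:
    """Role -> (threshold, number of keys, key kind), per this section."""
    return {
        "root": (root_t, root_n, "offline"),
        "targets": (2, 2, "offline"),
        "bins": (2, 2, "offline"),
        # timestamp, snapshot and every bin-n share one online key.
        "timestamp": (1, 1, "online"),
        "snapshot": (1, 1, "online"),
        "bin-n": (1, 1, "online"),
    }


def check_policy(policy: dict) -> list:
    """Return a list of human-readable problems (empty if none)."""
    problems = []
    for role, (t, n, kind) in policy.items():
        if not 1 <= t <= n:
            problems.append(f"{role}: threshold {t} must be between 1 and {n}")
        if role in ("root", "targets", "bins") and kind != "offline":
            problems.append(f"{role}: key must be offline")
    root_t, _, _ = policy["root"]
    if root_t < 2:
        problems.append("root: at least two members must sign (t > 1)")
    return problems
```
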
Managing online keys
--------------------

The online key shared by the *timestamp*, *snapshot*, and all *bin-n* roles
MAY be stored, encrypted or not, on the Python infrastructure. For example,
the key MAY be kept on a self-hosted key management service (e.g., Hashicorp
Vault__), or a third-party one (e.g., AWS KMS__, Google Cloud KMS__, or Azure
Key Vault__).

__ https://www.vaultproject.io/
__ https://aws.amazon.com/kms/
__ https://cloud.google.com/kms/
__ https://docs.microsoft.com/en-us/azure/key-vault/basic-concepts

Some of these key management services allow keys to be stored on Hardware
Security Modules (HSMs) (e.g., Hashicorp Vault__, AWS CloudHSM__, Google
Cloud HSM__, Azure Key Vault__). This prevents attackers from exfiltrating
the online private key, though not from using it; their use of it may,
however, be cryptographically auditable. However, this requires modifying
the reference TUF implementation to support HSMs (WIP__).

__ https://www.vaultproject.io/docs/enterprise/hsm/index.html
__ https://aws.amazon.com/cloudhsm/
__ https://cloud.google.com/hsm/
__ https://docs.microsoft.com/en-us/azure/key-vault/key-vault-hsm-protected-keys
__ https://github.com/secure-systems-lab/securesystemslib/pull/170

Regardless of where and how this online key is kept, its use SHOULD be
carefully logged, monitored, and audited, ideally in such a manner that
attackers who compromise PyPI are unable to immediately turn off this logging,
monitoring, and auditing.

Managing offline keys
----------------------
As explained in the previous section, the *root*, *targets*, and *bins* role
keys MUST be offline for maximum security: these keys will be offline in the
sense that their private keys MUST NOT be stored on PyPI, though some of them
MAY be online in the private infrastructure of the project.
There SHOULD be an offline key ceremony to generate, back up, and store these
keys in such a manner that the private keys can be read only by the Python
administrators when necessary (e.g., when rotating the keys for the
top-level TUF roles). Thus, keys SHOULD be generated, preferably in a physical
location where side-channel attacks__ are not a concern, using:
1. A trusted, airgapped__ computer with a true random number generator__, and
with no **data** persisted after the ceremony
2. A trusted operating system
3. A trusted set of third-party packages (e.g., cryptographic libraries, the
TUF reference implementation)
__ https://en.wikipedia.org/wiki/Side-channel_attack
__ https://en.wikipedia.org/wiki/Air_gap_(networking)
__ https://en.wikipedia.org/wiki/Hardware_random_number_generator
In order to avoid persisting sensitive data (e.g., private keys) anywhere other
than on backup media after the ceremony, offline keys SHOULD be generated and
encrypted with strong passwords in one of the following places (in decreasing
order of trust): private HSMs (e.g., YubiHSM__), cloud-based HSMs (e.g., those
listed above), volatile memory (e.g., RAM), or nonvolatile memory
(e.g., SSD or microSD). If keys must be generated on nonvolatile memory,
then this memory MUST be irrecoverably destroyed after the keys have been
securely backed up.
__ https://www.yubico.com/products/yubihsm/
Passwords used to encrypt keys SHOULD be stored somewhere durable and
trustworthy to which only Python administrators have access.
In order to minimize OPSEC__ errors during the ceremony, scripts SHOULD be
written to automate tedious parts such as:
- Exporting to sneakernet__ all code and data (e.g., previous TUF metadata,
targets, and *root* keys) required to generate new keys and replace old ones
- Tightening the firewall, updating the entire operating system in order to
fix security vulnerabilities, and airgapping the computer
- Printing and saving cryptographic hashes of new TUF metadata
- Exporting *all* new TUF metadata, targets, and keys to encrypted backup media
- Exporting *only* new TUF metadata, targets, and online keys to encrypted backup
media
__ https://en.wikipedia.org/wiki/Operations_security
__ https://en.wikipedia.org/wiki/Sneakernet
Note that the one-time keys for the *targets* and *bins* roles MAY be safely
generated, used, and deleted during the offline key ceremony. Furthermore,
the *root* keys need not be generated during the offline key ceremony itself:
instead, a threshold t of n Python administrators, as discussed above, may
independently sign the *root* metadata **after** the offline key ceremony used
to generate all other keys.
How Should Metadata be Generated?
=================================
Project developers expect the distributions they upload to PyPI to be
immediately available for download. Unfortunately, there will be problems when
many readers and writers simultaneously access the same metadata and
distributions. That is, there needs to be a way to ensure consistency of
metadata and repository files when multiple developers simultaneously change the
same metadata or distributions. There are also issues with consistency on PyPI
without TUF, but the problem is more severe with signed metadata that MUST keep
track of the files available on PyPI in real-time.
Suppose that PyPI generates a *snapshot*, which indicates the latest version of
every metadata file except *timestamp*, at version 1, and a client requests this
*snapshot* from PyPI. While the client is busy downloading this *snapshot*,
PyPI then timestamps a new snapshot at, say, version 2. Without ensuring
consistency of metadata, the client would find itself with a copy of *snapshot*
that disagrees with what is available on PyPI, which is indistinguishable from
arbitrary metadata injected by an attacker. The problem would also occur for
mirrors attempting to sync with PyPI.
Consistent Snapshots
--------------------
To keep TUF metadata on PyPI consistent with the highly volatile target files,
consistent snapshots SHOULD be used. Each consistent snapshot captures the
state of all known projects at a given time and MAY safely coexist with any
other snapshot, or be deleted independently, without affecting any other
snapshot.
To maintain consistent snapshots, all TUF metadata MUST, when written to disk,
include a version number in their filename::

    VERSION_NUMBER.ROLENAME.json

where VERSION_NUMBER is an incrementing integer, and ROLENAME is one of the
top-level metadata roles -- *root*, *snapshot* or *targets* -- or one of
the delegated targets roles -- *bins* or *bin-n*.
The only exception is the *timestamp* metadata file, whose version is not known
in advance when a client performs an update; it is therefore always stored
under the fixed name ``timestamp.json``. The *timestamp* metadata lists the
version of the *snapshot* metadata, which in turn lists the versions of the
*targets* and delegated targets metadata, all part of a given consistent
snapshot.
Ultimately, *targets* or delegated targets metadata point to the actual target
files, including their `cryptographic hashes`__. Thus, to mark a target file as
part of a consistent snapshot, it MUST, when written to disk, include its hash
in its filename::

    HASH.FILENAME

where HASH is the `hex digest`__ of the `SHA-256`__ hash of the file
contents and FILENAME is the original filename.
__ https://en.wikipedia.org/wiki/Cryptographic_hash_function
__ https://docs.python.org/3.7/library/hashlib.html#hashlib.hash.hexdigest
__ https://en.wikipedia.org/wiki/SHA-2
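The two naming rules above can be sketched in Python as follows. This is a
minimal illustration rather than code from the TUF reference implementation:
the function names are hypothetical, and the sketch assumes that, for a target
path containing directories, the hash is prefixed to the final path component
only. The *timestamp* file keeps its fixed, unversioned name and is not covered.

```python
import hashlib
import posixpath


def versioned_metadata_filename(version: int, rolename: str) -> str:
    """Return VERSION_NUMBER.ROLENAME.json for a (non-timestamp) metadata role."""
    if version < 1:
        raise ValueError("version numbers are positive, incrementing integers")
    return f"{version}.{rolename}.json"


def hashed_target_filename(target_path: str, contents: bytes) -> str:
    """Return the target path with HASH. prefixed to its basename.

    Assumption: directory components of the target path stay unchanged and
    only the final component receives the SHA-256 hex digest prefix.
    """
    digest = hashlib.sha256(contents).hexdigest()
    dirname, basename = posixpath.split(target_path)
    return posixpath.join(dirname, f"{digest}.{basename}")
```

PyPI, when writing files, and clients, when requesting them, would build the
same names; for example, ``versioned_metadata_filename(3, "snapshot")`` yields
``3.snapshot.json``.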
Assuming infinite disk space, strictly incrementing version numbers, and no
`hash collisions`__, a client may safely read from one snapshot while PyPI
produces another snapshot.
__ https://en.wikipedia.org/wiki/Collision_(computer_science)
In this simple but effective manner, PyPI is able to capture a consistent
snapshot of all projects and the associated metadata at a given time. The next
subsection provides implementation details of this idea.
Note: This PEP does not prohibit using advanced file systems or tools to
produce consistent snapshots. There are two important reasons why this PEP
proposes the simple solution. First, the solution does not mandate that PyPI
use any particular file system or tool. Second, the generic file-system based
approach allows mirrors to use extant file transfer tools such as rsync to
efficiently transfer consistent snapshots from PyPI.
Producing Consistent Snapshots
------------------------------
Given a project, PyPI is responsible for updating the *bin-n* metadata. Every
project MUST upload its release in a single transaction. The uploaded set of
files is called the "project transaction". How PyPI MAY validate the files in
a project transaction is discussed in a later section. For now, the focus is
on how PyPI will respond to a project transaction.
When a project uploads a new transaction, the project transaction process MUST
add all new targets and relevant delegated *bin-n* metadata. Finally, the
project transaction process MUST inform the snapshot process about any new
*bin-n* metadata.
Project transaction processes SHOULD be automated and MUST also be applied
atomically: either all metadata and targets -- or none of them -- are added.
The project transaction and snapshot processes SHOULD work concurrently.
Finally, project transaction processes SHOULD use the latest *bin-n*
metadata so that they will be correctly updated in new consistent snapshots.
Signing updated *timestamp*, *snapshot*, and *bin-n* metadata needs to be done on each
update. Fortunately, the actual operation of signing is fast enough that this
may be done a thousand or more times per second. However, locking must be
used so that conflicting project transactions are handled sequentially. To
achieve this, all project transactions MAY be placed in a single queue and
processed serially. Alternatively, the queue MAY be processed concurrently in
order of appearance, provided that the following rules are observed:
1. No two project transaction processes may concurrently work on the same
project.
2. No two project transaction processes may concurrently work on
projects that belong to the same delegated *bin-n* role.
These rules MUST be observed so that metadata is not read from or written to
inconsistently.
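One way to enforce the two rules above is to hold one lock per *bin-n* role
while a project transaction runs. The sketch below assumes, purely for
illustration, that a project is mapped to a bin by hashing its name; the bin
count, ``bin_for_project`` and ``BinLocks`` are hypothetical, and the actual
assignment of targets to bins is defined elsewhere in this PEP.

```python
import hashlib
import threading

NUMBER_OF_BINS = 16384  # hypothetical; the real number of bins is a deployment choice


def bin_for_project(project_name: str, number_of_bins: int = NUMBER_OF_BINS) -> int:
    """Map a project to a bin-n index (assumed scheme: SHA-256 of the name)."""
    digest = hashlib.sha256(project_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % number_of_bins


class BinLocks:
    """One lock per bin-n role.

    Holding the lock for a project's bin satisfies both rules: transactions
    for the same project always map to the same bin (rule 1), and
    transactions for projects in the same bin are serialized (rule 2).
    """

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks = {}  # bin index -> threading.Lock

    def lock_for(self, project_name: str) -> threading.Lock:
        index = bin_for_project(project_name)
        with self._guard:
            return self._locks.setdefault(index, threading.Lock())
```

A worker would wrap each transaction in ``with locks.lock_for(project):``.
Preserving order of appearance within the queue is left to the queue itself
and is not shown.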
Snapshot Process
----------------
The snapshot process is fairly simple and SHOULD be automated. The snapshot
process SHOULD use the latest working set of the *targets* and all
delegated targets roles' (i.e., *bins* and *bin-n* roles) metadata. Upon an
update, the snapshot process will sign for this latest working set. (Recall
that project transaction processes continuously inform the snapshot process
about the latest delegated metadata in a concurrency-safe manner. The snapshot
process will actually sign for a copy of the latest working set while the
latest working set in memory will be updated with information that is
continuously communicated by the project transaction processes.) The snapshot
process MUST generate and sign new *timestamp* metadata that will vouch for the
*snapshot* metadata generated in the previous step (and, through it, for the
*targets* and delegated roles metadata). Finally, the snapshot process MUST make
available to clients the new *timestamp* and *snapshot* metadata representing
the latest snapshot.
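To make the snapshot process concrete, here is a sketch that builds unsigned
*snapshot* and *timestamp* metadata from a frozen copy of the working set. The
field names follow the general shape of TUF metadata, but the sketch is
simplified (it omits ``spec_version`` and signing with the online key), and
``build_snapshot_and_timestamp``, ``canonical_bytes`` and the example role
filenames are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def canonical_bytes(metadata: dict) -> bytes:
    # Assumption: sorted-key compact JSON stands in for the canonical JSON
    # encoding that a real TUF implementation signs.
    return json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_snapshot_and_timestamp(working_set, snapshot_version, timestamp_version,
                                 lifetime=timedelta(days=1)):
    """Build unsigned snapshot and timestamp metadata from a working set.

    working_set maps metadata filenames to their latest versions, e.g.
    {"targets.json": 3, "bins.json": 2, "bin-0.json": 17} (hypothetical names).
    """
    # Freeze a copy so project transaction processes can keep updating the
    # live working set while this snapshot is being signed.
    frozen = dict(working_set)
    expires = (datetime.now(timezone.utc) + lifetime).strftime("%Y-%m-%dT%H:%M:%SZ")
    snapshot = {
        "_type": "snapshot",
        "version": snapshot_version,
        "expires": expires,
        "meta": {name: {"version": version} for name, version in sorted(frozen.items())},
    }
    snapshot_bytes = canonical_bytes(snapshot)
    timestamp = {
        "_type": "timestamp",
        "version": timestamp_version,
        "expires": expires,
        # The timestamp vouches for exactly one snapshot: its version, length and hash.
        "meta": {
            "snapshot.json": {
                "version": snapshot_version,
                "length": len(snapshot_bytes),
                "hashes": {"sha256": hashlib.sha256(snapshot_bytes).hexdigest()},
            }
        },
    }
    return snapshot, timestamp
```

After signing, the snapshot would be published under its versioned name (e.g.,
``5.snapshot.json``) and the timestamp under ``timestamp.json``, in that order,
so that clients never see a timestamp pointing at a missing snapshot.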
A few implementation notes are now in order. So far, we have seen only that
new metadata and targets are added, but not that old metadata and targets are
removed. Practical constraints are such that eventually PyPI will run out of
disk space to produce a new consistent snapshot. In that case, PyPI MAY then
use something like a "mark-and-sweep" algorithm to delete sufficiently old
consistent snapshots: in order to preserve the latest consistent snapshot, PyPI
would walk objects beginning from the root (*timestamp*) of the latest
consistent snapshot, mark all visited objects, and delete all unmarked objects.
The last few consistent snapshots may be preserved in a similar fashion.
Deleting a consistent snapshot will cause clients to see nothing except HTTP
404 responses to any request for a file within that consistent snapshot.
Clients SHOULD then retry (as before) their requests with the latest consistent
snapshot.
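The "mark-and-sweep" idea can be sketched over an in-memory graph in which each
file lists the files it refers to (timestamp to snapshot, snapshot to targets
metadata, targets metadata to target files). The graph representation and the
function names are assumptions for illustration; a real implementation would
derive the edges by parsing the metadata on disk.

```python
from collections import deque
from typing import Dict, Iterable, Set


def mark(roots: Iterable[str], references: Dict[str, Iterable[str]]) -> Set[str]:
    """Return every file reachable from the given roots (breadth-first walk)."""
    marked: Set[str] = set()
    queue = deque(roots)
    while queue:
        name = queue.popleft()
        if name in marked:
            continue
        marked.add(name)
        queue.extend(references.get(name, ()))
    return marked


def sweep(all_files: Iterable[str], marked: Set[str]) -> Set[str]:
    """Return the unmarked files, i.e. the ones that may be deleted."""
    return {name for name in all_files if name not in marked}
```

To preserve the last few consistent snapshots, pass their versioned *snapshot*
files as extra roots. Since *root* metadata must be retained indefinitely, every
versioned *root* file should also be added to the roots.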
All clients, such as pip using the TUF protocol, MUST be modified to download
every metadata and target file (except for *timestamp* metadata) by including
in the requested filename the version of the file (for metadata) or the
cryptographic hash of the file (for target files).
Finally, PyPI SHOULD use a `transaction log`__ to record project transaction
processes and queues so that it will be easier to recover from errors after a
server failure.
__ https://en.wikipedia.org/wiki/Transaction_log
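A transaction log for project transactions could be as simple as an
append-only file of JSON records, one "begin" and one "commit" per transaction.
The record format and the ``TransactionLog`` class below are hypothetical; the
sketch only shows the recovery idea: after a crash, any transaction that began
but never committed must be rolled back or replayed.

```python
import json
import os
from typing import List


class TransactionLog:
    """Append-only JSON-lines log of project transactions (hypothetical format)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _append(self, record: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record, sort_keys=True) + "\n")
            log.flush()
            os.fsync(log.fileno())  # make the record durable before proceeding

    def begin(self, txid: str, project: str, files: List[str]) -> None:
        self._append({"event": "begin", "txid": txid, "project": project, "files": files})

    def commit(self, txid: str) -> None:
        self._append({"event": "commit", "txid": txid})

    def incomplete(self) -> List[dict]:
        """Return begin records without a matching commit, in log order."""
        if not os.path.exists(self.path):
            return []
        begun, committed = [], set()
        with open(self.path, encoding="utf-8") as log:
            for line in log:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # a torn trailing record counts as never written
                if record["event"] == "begin":
                    begun.append(record)
                elif record["event"] == "commit":
                    committed.add(record["txid"])
        return [record for record in begun if record["txid"] not in committed]
```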
Cleaning up old metadata
------------------------
Prior versions of snapshot, targets, and timestamp metadata do not need to
be kept indefinitely, although *root* metadata files must be retained
indefinitely. However, a client that performs an update MUST be able
to retrieve a consistent set of versions of the files on the repository.
For example, if a client downloads a snapshot file, it should retrieve the versions
of targets metadata that existed when that snapshot file was created, even if
the targets metadata has been updated concurrently with the client requests.
Fortunately, including versions and hashes in filenames handles this case
automatically, since clients request files unambiguously. Once no clients could
reasonably be requesting outdated targets, snapshot or timestamp files, those
files may be removed to save space. Thus, files that became obsolete some reasonable time
in the past (such as 1 hour) may be safely discarded.
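The retention rule can be sketched as a filter over obsolete files. This
assumes, hypothetically, that PyPI records when each file stopped being
referenced by the latest consistent snapshot; the grace period uses the
one-hour example above, and versioned *root* files are never selected.

```python
import re
import time
from typing import Dict, Optional, Set

GRACE_PERIOD_SECONDS = 60 * 60  # the "such as 1 hour" example from the text

# Versioned root metadata, e.g. "3.root.json", is never removed.
ROOT_METADATA = re.compile(r"^\d+\.root\.json$")


def removable_files(obsoleted_at: Dict[str, float], now: Optional[float] = None,
                    grace_seconds: float = GRACE_PERIOD_SECONDS) -> Set[str]:
    """Return files that have been obsolete for at least the grace period.

    obsoleted_at maps a filename to the Unix time at which a newer consistent
    snapshot stopped referring to it; files still in use are not listed.
    """
    if now is None:
        now = time.time()
    return {
        name for name, since in obsoleted_at.items()
        if not ROOT_METADATA.match(name) and now - since >= grace_seconds
    }
```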
Revoking Trust in Projects and Versions
=======================================
From time to time, either a project or a version of a package will need to be
revoked. To revoke trust in a version of a package, the associated *bin-n* role
can simply remove the corresponding targets and re-sign the *bin-n* metadata.
Similarly, an entire project may be removed by removing from the *bin-n*
metadata all references to its metadata and package versions.
All of these actions require only the online *bin-n* key.
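Revocation through *bin-n* metadata amounts to deleting entries from its
``targets`` mapping and bumping its version. The sketch below assumes a
simplified metadata dictionary and a hypothetical ``revoke_targets`` helper;
re-signing with the online key is not shown. Revoking a whole project would
pass every target path that belongs to it.

```python
import copy
from typing import Iterable


def revoke_targets(bin_metadata: dict, revoked_paths: Iterable[str]) -> dict:
    """Return new bin-n metadata with the given target paths removed.

    The input is left untouched; the result still has to be re-signed with
    the online bin-n key, which this sketch does not show.
    """
    updated = copy.deepcopy(bin_metadata)
    for path in revoked_paths:
        # Ignore paths that are already absent so revocation is idempotent.
        updated["targets"].pop(path, None)
    # Bump the version so the next consistent snapshot refers to the new file.
    updated["version"] += 1
    return updated
```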
Key Compromise Analysis
=======================
This PEP has covered the minimum security model, the TUF roles that should be
added to support continuous delivery of distributions, and how to generate and
sign the metadata of each role. The remaining sections discuss how PyPI
SHOULD audit repository metadata, and the methods PyPI can use to detect and
recover from a PyPI compromise.
Table 1 summarizes a few of the attacks possible when a threshold number of
private cryptographic keys (belonging to any of the PyPI roles) are
compromised. The leftmost column lists the roles (or a combination of roles)
that have been compromised, and the columns to its right show whether the
compromised roles leave clients susceptible to malicious updates, a freeze
attack, or metadata inconsistency attacks.
+-----------------+-------------------+----------------+--------------------------------+
| Role Compromise | Malicious Updates | Freeze Attack | Metadata Inconsistency Attacks |
+=================+===================+================+================================+
| timestamp | NO | YES | NO |
| | snapshot and | limited by | snapshot needs to cooperate |
| | targets or any | earliest root, | |
| | of the bins need | targets, or | |
| | to cooperate | bin expiry | |
| | | time | |
+-----------------+-------------------+----------------+--------------------------------+
| snapshot | NO | NO | NO |
| | timestamp and | timestamp | timestamp needs to cooperate |
| | targets or any of | needs to | |
| | the bins need to | cooperate | |
| | cooperate | | |
+-----------------+-------------------+----------------+--------------------------------+
| timestamp | NO | YES | YES |
| **AND** | targets or any | limited by | limited by earliest root, |
| snapshot | of the bins need | earliest root, | targets, or bin metadata |
| | to cooperate | targets, or | expiry time |
| | | bin metadata | |
| | | expiry time | |
+-----------------+-------------------+----------------+--------------------------------+
| targets | NO | NOT APPLICABLE | NOT APPLICABLE |
| **OR** | timestamp and | need timestamp | need timestamp and snapshot |
| bin | snapshot need to | and snapshot | |
| | cooperate | | |
+-----------------+-------------------+----------------+--------------------------------+
| timestamp | YES | YES | YES |
| **AND** | | limited by | limited by earliest root, |
| snapshot | | earliest root, | targets, or bin metadata |
| **AND** | | targets, or | expiry time |
| bin | | bin metadata | |
| | | expiry time | |
+-----------------+-------------------+----------------+--------------------------------+
| root | YES | YES | YES |
+-----------------+-------------------+----------------+--------------------------------+
Table 1: Attacks possible by compromising certain combinations of role keys.
In `September 2013`__, it was shown how the latest version (at the time) of pip
was susceptible to these attacks and how TUF could protect users against them
[14]_.
__ https://mail.python.org/pipermail/distutils-sig/2013-September/022755.html
Note that compromising *targets* or any delegated role (except for project
targets metadata) does not immediately allow an attacker to serve malicious
updates. The attacker must also compromise the *timestamp* and *snapshot*
roles (which are both online and therefore more likely to be compromised).
This means that in order to launch any attack, one must not only be able to
act as a man-in-the-middle but also compromise the *timestamp* key (or
compromise the *root* keys and sign a new *timestamp* key). To launch any
attack other than a freeze attack, one must also compromise the *snapshot* key.
Finally, a compromise of the PyPI infrastructure could introduce malicious
updates to projects delegated to *bin-n* roles, because the keys for these roles are online. The
maximum security model discussed in the appendix addresses this issue. PEP 480
also covers the maximum security model and goes into more detail on generating
developer keys and signing uploaded distributions.
In the Event of a Key Compromise
--------------------------------
A key compromise means that a threshold of keys (belonging to the metadata
roles on PyPI), as well as the PyPI infrastructure, have been compromised and
used to sign new metadata on PyPI.
If a threshold number of *timestamp*, *snapshot*, *targets*, *bins* or *bin-n*
keys have been compromised, then PyPI MUST take the following steps:
1. Revoke the *timestamp*, *snapshot* and *targets* role keys from
the *root* role. This is done by replacing the compromised *timestamp*,
*snapshot* and *targets* keys with newly issued keys.
2. Revoke the *bins* keys from the *targets* role by replacing their keys with
newly issued keys. Sign the new *targets* role metadata and discard the new
keys (because, as explained earlier, this increases the security of
*targets* metadata).
3. All targets of the *bin-n* roles SHOULD be compared with the last known
good consistent snapshot where none of the *timestamp*, *snapshot*,
*bins* or *bin-n* keys
were known to have been compromised. Added, updated or deleted targets in
the compromised consistent snapshot that do not match the last known good
consistent snapshot MAY be restored to their previous versions. After
ensuring the integrity of all *bin-n* targets, their keys should be renewed
in the *bins* metadata.
4. The *bins* and *bin-n* metadata MUST have their version numbers incremented,
expiry times suitably extended, and signatures renewed.
5. A new timestamped consistent snapshot MUST be issued.
Following these steps would preemptively protect all of these roles even though
only one of them may have been compromised.
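The comparison in step 3 can be sketched as a diff between two maps from target
path to SHA-256 hex digest, one taken from the last known good consistent
snapshot and one from the compromised snapshot. How those maps are extracted
from *bin-n* metadata is assumed rather than shown, and ``diff_targets`` is a
hypothetical name.

```python
from typing import Dict, List


def diff_targets(known_good: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify differences between two {target path: SHA-256 hex digest} maps."""
    added = sorted(current.keys() - known_good.keys())
    deleted = sorted(known_good.keys() - current.keys())
    updated = sorted(
        path for path in current.keys() & known_good.keys()
        if current[path] != known_good[path]
    )
    return {"added": added, "updated": updated, "deleted": deleted}
```

Each reported path is a candidate for restoration: added targets can be dropped,
and updated or deleted ones reverted to their known good hashes.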
If a threshold number of *root* keys have been compromised, then PyPI MUST take
the above steps and, in addition, replace all *root* keys in the *root* role.
It is also RECOMMENDED that PyPI sufficiently document compromises with
security bulletins. These security bulletins will be most informative when
users of pip-with-TUF are unable to install or update a project because the
keys for the *timestamp*, *snapshot* or *root* roles are no longer valid. They
could then visit the PyPI web site to consult security bulletins that would
help to explain why they are no longer able to install or update, and then take
action accordingly. When a threshold number of *root* keys have not been
revoked due to a compromise, then new *root* metadata may be safely updated
because a threshold number of existing *root* keys will be used to sign for the
integrity of the new *root* metadata. TUF clients will be able to verify the
integrity of the new *root* metadata with a threshold number of previously
known *root* keys. This will be the common case. Otherwise, in the worst
case, where a threshold number of *root* keys have been revoked due to a
compromise, an end-user may choose to update new *root* metadata with
`out-of-band`__ mechanisms.
__ https://en.wikipedia.org/wiki/Out-of-band#Authentication
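The threshold check that lets clients trust new *root* metadata can be sketched
as follows. Following the TUF specification, the sketch also requires the new
*root* metadata to meet its own threshold. The signature format, the
``Verifier`` callback and all function names are assumptions, and the HMAC
verifier is an illustration-only stand-in: real TUF metadata uses public-key
signatures.

```python
import hashlib
import hmac
from typing import Callable, Dict, Iterable, List

# A verifier takes (keyid, message, signature) and reports whether the
# signature is valid for that key.
Verifier = Callable[[str, bytes, bytes], bool]


def count_valid_signatures(message: bytes, signatures: List[dict],
                           trusted_keyids: Iterable[str], verify: Verifier) -> int:
    """Count distinct trusted keys that produced a valid signature."""
    trusted = set(trusted_keyids)
    valid = set()
    for signature in signatures:
        keyid = signature["keyid"]
        # Skip unknown keys and keys already counted, so that one key can
        # never be counted twice towards the threshold.
        if keyid not in trusted or keyid in valid:
            continue
        if verify(keyid, message, signature["sig"]):
            valid.add(keyid)
    return len(valid)


def root_update_is_trusted(message: bytes, signatures: List[dict],
                           old_keyids: Iterable[str], old_threshold: int,
                           new_keyids: Iterable[str], new_threshold: int,
                           verify: Verifier) -> bool:
    """The new root must meet both the old root's threshold and its own."""
    return (count_valid_signatures(message, signatures, old_keyids, verify) >= old_threshold
            and count_valid_signatures(message, signatures, new_keyids, verify) >= new_threshold)


def make_hmac_verifier(secrets: Dict[str, bytes]) -> Verifier:
    """Illustration-only stand-in for public-key signature verification."""
    def verify(keyid: str, message: bytes, signature: bytes) -> bool:
        secret = secrets.get(keyid)
        if secret is None:
            return False
        expected = hmac.new(secret, message, hashlib.sha256).digest()
        return hmac.compare_digest(expected, signature)
    return verify
```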
Auditing Snapshots
------------------
If a malicious party compromises PyPI, they can sign arbitrary files with any
of the online keys. The roles with offline keys (i.e., *root*, *targets* and *bins*)
are still protected. To safely recover from a repository compromise, snapshots
should be audited to ensure files are only restored to trusted versions.
When a repository compromise has been detected, the integrity of three types of
information must be validated:
1. If the online keys of the repository have been compromised, they can be
revoked by having the *targets* role sign new metadata delegating to a new
key.
2. If the role metadata on the repository has been changed, this would impact
the metadata that is signed by online keys. Any role information created
since the last period should be discarded. As a result, developers of new
projects will need to re-register their projects.
3. If the packages themselves may have been tampered with, they can be
validated using the stored hash information for packages that existed at the
time of the last period.
In order to safely restore snapshots in the event of a compromise, PyPI SHOULD
maintain a small number of its own mirrors to copy PyPI snapshots according to
some schedule. The mirroring protocol can be used immediately for this
purpose. The mirrors must be secured and isolated such that they are
responsible only for mirroring PyPI. The mirrors can be checked against one
another to detect accidental or malicious failures.
Another approach is to periodically generate the cryptographic hash of
*snapshot* and tweet it. If a user later comes forward with the actual
metadata, the repository maintainers can verify it against the published
hash. Alternatively, PyPI may periodically archive its own versions of
*snapshot* rather than rely on externally provided metadata. In this case,
PyPI SHOULD take the cryptographic hash of every package on the repository and
store this data on an offline device. If any package hash has changed, this
indicates an attack.
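Archiving package hashes on an offline device and checking them later can be
sketched with the standard library alone. The manifest format (relative path to
SHA-256 hex digest) and the function names are assumptions made for this
illustration.

```python
import hashlib
import os
from typing import Dict, List


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large distributions need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root_dir: str) -> Dict[str, str]:
    """Map each file path (relative to root_dir, '/'-separated) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            relative = os.path.relpath(full_path, root_dir).replace(os.sep, "/")
            manifest[relative] = sha256_file(full_path)
    return manifest


def changed_packages(archived: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Packages present in both manifests whose hashes differ: evidence of an attack."""
    return sorted(p for p in archived.keys() & current.keys() if archived[p] != current[p])
```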
Attacks that serve different versions of metadata, or that freeze a package at
a specific version, can be handled by TUF with techniques like implicit key
revocation and metadata mismatch detection [2]_.
Managing Future Changes to the Update Process
=============================================
If breaking changes are made to the update process, PyPI should implement these
changes without disrupting existing clients. For guidance on how to do so,
see ongoing discussion in the TAP repository__.
__ https://github.com/theupdateframework/taps/pull/107
Note that the changes to PyPI from this PEP will be backwards compatible. The
locations of target files and simple indices are not changed by this PEP, so any
existing PyPI clients will still be able to perform updates using these files.
This PEP adds the ability for clients to use TUF metadata to improve the
security of the update process.
Appendix A: Repository Attacks Prevented by TUF
===============================================
* **Arbitrary software installation**: An attacker installs anything they want
on the client system. That is, an attacker can provide arbitrary files in