#!/usr/bin/env python
__version__ = "2.0-beta15"
__doc__ = """\
OneTime: an open source encryption program that uses the one-time pad method.

(Run 'onetime --help' for usage information.)

The usual public-key encryption programs, such as GnuPG, are probably
secure for everyday purposes, but their implementations are too
complex for all but the most knowledgeable programmers to vet, and
in some cases there may be vulnerable steps in the supply chain
between their authors and the end user. When bootstrapping trust,
it helps to start with something you can trust by inspection.

Hence this script, OneTime, a simple program that encrypts plaintexts
against one-time pads. If you don't know what one-time pads are, this
program may not be right for you. If you do know what they are and
how to use them, this program can make using them more convenient.

OneTime handles some of the pad-management bureaucracy for you. It
avoids re-using pad data -- except when decrypting the same message
twice -- by maintaining records of pad usage in ~/.onetime/pad-records.
(The pads themselves are not typically stored there, just records
about pad usage.)

Recommended practice: if you are Alice communicating with Bob, then
keep two different pads, 'alice_to_bob.pad' and 'bob_to_alice.pad', as
opposed to sharing the same pad for both directions of communication.
With two separate pads, even if you each send a message simultaneously
to the other with no advance planning, you still won't accidentally
use any of the same pad data twice, assuming you let OneTime do its
bookkeeping naturally.

See http://en.wikipedia.org/wiki/One-time_pad for more information
about one-time pads in general.

OneTime is written by Karl Fogel and distributed under an MIT-style
open source license; run 'onetime --license' or see the LICENSE file
in the full distribution for complete licensing information.

OneTime's home page is http://www.red-bean.com/onetime/.
"""
import os
import sys
import stat
import getopt
import bz2
import base64
import hashlib
import re
import xml
import xml.dom
import xml.dom.minidom
import random
########################### Design Overview #############################
#
# To encrypt, OneTime first compresses input using bzip2, then XOR's
# the bzipped stream against the pad, and emits the result in base64
# encoding. It also records in ~/.onetime/pad-records that that
# length of pad data, starting from a particular offset, has been
# used, so that the user won't ever re-use that stretch of pad for
# encryption again.
#
# To decrypt, OneTime does the reverse: base64-decode the input, XOR
# it against the pad (starting at the pad offset specified in the
# encrypted message), and bunzip2 the result. Decryption also records
# that this range has been used, in ~/.onetime/pad-records, because
# the recipient of a message should of course never use the same
# pad data to encrypt anything else.
#
# The output format looks like this:
#
# -----BEGIN OneTime MESSAGE-----
# Format: internal << NOTE: OneTime 1.x and older cannot read this format. >>
# Pad ID: [...64 hexadecimal digits of unique pad ID...]
# Offset: [...a number expressed in decimal...]
#
# [...encrypted block, base64-encoded...]
#
# -----END OneTime MESSAGE-----
#
# The encrypted block has some structure, though -- it's not *just* a
# base64 encoding of pad-XOR'd bzipped plaintext data. Before and
# after the core data, there's some bookkeeping, all base64-encoded.
# Here's a diagram, with index increasing from left to right:
#
# FFHHTTRRRR******-------------------------------------DDDD*******
#
# The precise number of same characters in a row is not significant
# above, except as a rough guide to the relative lengths of the
# different sections. This is what the different sections mean:
#
# F == a few format indicator bytes:
#
# These tell OneTime what version of its internal format it
# is looking at. This is really just for future-proofing,
# since these internal format indicator bytes were only
# introduced in 2.0, and as of this writing OneTime is still
# using that first format (known as "internal format 0").
#
# H == a few head fuzz source length bytes:
#
# A few bytes of raw pad data, used to calculate
# the number of bytes of head fuzz (the first
# set of asterisks) that will be used.
#
# T == a few tail fuzz source length bytes:
#
# A few more bytes of raw pad data, used to calculate
# the number of bytes of tail fuzz (the concluding
# set of asterisks) that will be used.
#
# R == some bytes of raw pad input for the session hash
#
# These bytes are among the data fed to the session hash, so
# that the final message digest authenticates the pad as well
# as the plaintext. (The actual number of bytes used is
# PadSession._digest_source_length.)
#
# * == a random number of fuzz bytes (head fuzz or tail fuzz):
#
# A random number (derived from the pad -- see above) of
# runtime-generated random or pseudo-random bytes, XOR'd
# against the same number of pad bytes.
#
# When these encrypted bytes appear at the front, they are
# called "head fuzz", and because the number of bytes is
# based on pad data, they prevent an attacker from knowing
# exactly where in the encrypted text the message starts.
# When they appear at the end they are called "tail fuzz",
# and similarly, they mean that an attacker does not know
# exactly where the message ends.
#
# In other words, the real message (and its digest, described
# below) sits somewhere along a slider surrounded by fuzz on
# each side, and the precise amount of fuzz on each side is
# known only to those with the pad. This prevents a
# known-plaintext message substitution attack: because
# attackers cannot know where the message is, they cannot
# reliably replace a known plaintext with some other
# plaintext of the same length, even with channel control.
#
# The reason the fuzz regions are random data XOR'd against
# pad, instead of just being plain pad data (which would have
# been theoretically sufficient) is to avoid exposing weak
# pads. Even though it would be a bad mistake for a user to
# use a merely pseudo-random pad, instead of a truly random
# one, still at least OneTime should do its best to avoid
# exposing this mistake in an obvious way when it happens.
#
# - == encrypted plaintext
#
# Base64-encoded XOR'd bzipped plaintext.
#
# D == 32 bytes of digest
#
# A SHA256 digest, XOR'd with pad, of: the raw pad session
# hash input bytes, plus the raw head fuzz (that is, the
# random bytes from *before* they were XOR'd with the pad to
# produce the final head fuzz), plus the plaintext message.
#
# The purpose of this digest is to verify message integrity without
# revealing anything about the pad (because we don't want an
# attacker to be able to analyze whether the pad itself might
# have any weaknesses). Using a combined digest means that
# even in the case of a known plaintext there is no plaintext
# substitution attack and no way to recover any of the raw
# pad bytes.
#
# (Note that this is not a hexadecimal representation of the hash;
# to save space, it is the raw hash digest. However, any integrity
# errors display the hash in hexadecimal.)
#
# On decryption, OneTime verifies both the digest and that the tail
# fuzz is exactly as long as expected. If anything doesn't match,
# OneTime will raise an error -- though if the error was detected only
# in the digest or in the tail fuzz length, then there may still have
# been a successful decryption first, with plaintext output emitted.
#
# OneTime takes care never to expose even raw pad bytes that aren't
# going to be used for encryption. Although in theory the pad should
# be completely random, and therefore an exposure of pad bytes that
# aren't used for encryption should not reveal anything about other
# bytes that *are* used for encryption, in practice, if someone is
# using a pad that is not perfectly random, we don't want OneTime to
# make that situation any worse than it has to be. Therefore, in two
# places that could potentially expose such raw pad bytes, we don't:
#
# 1) To calculate the pad ID (which is recorded in the file
# ~/.onetime/pad-records, which we have to assume could get
# exposed to an attacker), we use the hexadecimal digest of a
# SHA256 hash of some bytes from the front of the pad, instead of
# just using a hex representation of those bytes in raw form.
#
# 2) When inserting head fuzz and tail fuzz into the encrypted
# message, we don't just use raw pad bytes, but rather XOR raw
# pad bytes against run-time-generated random bytes, so that the
# fuzz regions reveal as little as possible about the pad and
# are, ideally, in no way distinguishable on inspection from the
# encrypted data between them.
#
# As for the code, the design is pretty much what you'd expect:
#
# The PadSession class encapsulates one session of using a pad to
# encrypt or decrypt one contiguous data stream. It takes care of
# everything that happens in the encrypted block above.
#
# A PadSession object is created with a particular pad file and
# registered with a Configuration object. The Configuration object
# reads the ~/.onetime/pad-records file, to ensure that the PadSession
# starts from the right offset in the pad, and it records in
# ~/.onetime/pad-records that a new stretch of pad is consumed, once
# the PadSession is done.
#
# A PadSession is wrapped with a SessionEncoder if encrypting or with
# a SessionDecoder if decrypting. These wrapper classes take care of
# the base64 encoding.
#
########################################################################
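# Illustrative sketch only; nothing below is called by OneTime itself.
# It shows just the core pipeline from the overview above (bzip2, then
# XOR against the pad, then base64) and leaves out the inner headers,
# fuzz, digest, and pad-records bookkeeping. The _sketch_* names are
# hypothetical and exist only for this example.
import base64
import bz2

def _sketch_xor(data, pad_bytes):
  """XOR DATA against the same number of leading bytes of PAD_BYTES."""
  data, pad_bytes = bytearray(data), bytearray(pad_bytes)
  if len(pad_bytes) < len(data):
    raise ValueError("pad too short for this data")
  return bytes(bytearray(d ^ p for d, p in zip(data, pad_bytes)))

def _sketch_encrypt(plaintext, pad_bytes):
  """bzip2-compress PLAINTEXT, XOR it against PAD_BYTES, base64-encode."""
  return base64.b64encode(_sketch_xor(bz2.compress(plaintext), pad_bytes))

def _sketch_decrypt(encoded, pad_bytes):
  """Reverse _sketch_encrypt(): base64-decode, XOR, then bunzip2."""
  return bz2.decompress(_sketch_xor(base64.b64decode(encoded), pad_bytes))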
# The current format level.
#
# Some background is needed to understand what this means:
#
# The first releases of OneTime (the 1.x series) did not include any
# indication of the format in the plaintext headers of the output.
# This was deliberate: after all, if there *were* a format change in
# the future, a "Format:" header could be added, and its presence
# would indicate that the output was clearly from a later version than
# the 1.x series.
#
# Well, that has now happened -- but instead of specifying an exact
# format version in the plaintext header, we just specify that the
# format is "internal", and in the code we call that a "format level"
# instead of a "format version". We distinguish between this new
# "internal" level and the old level (now retroactively labeled
# "original"), for the purpose of supporting OneTime 1.x and earlier,
# but beyond that the plaintext header does not say anything about the
# format other than that label "internal".
#
# There are a couple of reasons to do it this way. One, to get away
# from the idea that the version of OneTime is relevant, since
# what really matters is just the output format -- which can and often
# will remain unchanged, or at least backward-compatible, from version
# to version of OneTime. Two, starting from OneTime 2.0, all detailed
# format version information is embedded in "inner headers" in the
# ciphertext (see the PadSession class for details), not in the plaintext
# headers. This avoids leaking information about the earliest
# possible date on which the message could have been encrypted,
# because the ciphertext will reveal only that the message must have
# been encrypted with OneTime 2.0 or higher.
#
# Therefore, in OneTime's code, instead of using numbers for the
# format level, we use one of two words: "internal" or "original".
#
# Note also that the "original" format had a (rather embarrassing) bug
# whereby plaintext was encrypted and then compressed, instead of the
# other way around. This is fixed in all the "internal" level
# formats, and of course any further format details are now embedded
# in the inner headers in the ciphertext, as described in class PadSession.
# (And no, http://blog.appcanary.com/2016/encrypt-or-compress.html
# does not contradict that compress-then-encrypt is right for OneTime.)
Format_Level = "internal"
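# Illustrative sketch only; not part of OneTime's own logic. It shows
# why the ordering bug mentioned above mattered for size: bytes that
# have already been XOR'd against a random pad look random, so bzip2
# can no longer shrink them. The helper name is hypothetical, and
# os.urandom() stands in for a real pad file purely as an assumption
# of this example.
import bz2
import os

def _sketch_order_sizes(plaintext):
  """Return (compress_then_xor_length, xor_then_compress_length)."""
  compressed = bytearray(bz2.compress(plaintext))
  pad = bytearray(os.urandom(max(len(plaintext), len(compressed))))
  # Internal-level order: compress first, then XOR against the pad.
  good = bytearray(c ^ p for c, p in zip(compressed, pad))
  # Original-level (buggy) order: XOR first, then try to compress.
  xored = bytearray(c ^ p for c, p in zip(bytearray(plaintext), pad))
  bad = bz2.compress(bytes(xored))
  return (len(good), len(bad))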
class Configuration:
  """A parsed representation of one user's ~/.onetime/ configuration area.
  A .onetime/ directory contains just a 'pad-records' file right now.

  Even in cases where we're operating without touching permanent
  storage, a Configuration instance is still created and updated
  internally. This is partly because the Configuration does some
  consistency checks on incoming/outgoing data, and partly because it
  would be useful if we're ever providing an API.
  """

  class ConfigurationError(Exception):
    """Exception raised if we encounter an impossible state in a
    Configuration."""
    pass

  def __init__(self, pad_session, path=None):
    """Initialize a new configuration with PAD_SESSION.

    If PATH is None, try to find or create the config area in the
    standard location in the user's home directory; otherwise, find or
    create it at PATH.

    If PATH is \"-\", instantiate a Configuration object but do not
    connect it to any activity on disk; it will neither read from nor
    write to permanent storage."""
    self._pad_session = pad_session
    self.config_area = path
    if self.config_area is None:
      self.config_area = os.path.join(os.path.expanduser("~"), ".onetime")
    self.pad_records_file = os.path.join(self.config_area, "pad-records")
    # Create the configuration area if necessary.
    if self.config_area != '-' and not os.path.isdir(self.config_area):
      # Legacy data check: if they have a ~/.otp dir, that's probably
      # from a previous incarnation of this program, when it was named
      # "otp". If so, just rename the old config area.
      old_config_area = os.path.join(os.path.expanduser("~"), ".otp")
      old_pad_records_file = os.path.join(old_config_area, "pad-records")
      if os.path.isfile(old_pad_records_file):
        os.rename(old_config_area, self.config_area)
      else:
        os.mkdir(self.config_area)
    # Create the pad-records file if necessary.
    if self.config_area != '-' and not os.path.isfile(self.pad_records_file):
      open(self.pad_records_file, "w").close()
    # Parse the pad-records file (if any) in the configuration area.
    self.pad_records = self._parse_pad_records_file()

  def _consolidate_used_ranges(self, used, allow_reconsumption=False):
    """Return a consolidated version of USED. USED is a list of
    tuples, indicating offsets and lengths:

       [ (OFFSET1, LENGTH1), (OFFSET2, LENGTH2), ... ]

    Consolidation means returning a list of equal or shorter length,
    that marks exactly the same ranges as USED, but expressed in the
    most compact way possible. For example:

       [ (0, 10), (10, 10), (20, 5) ]

    would become

       [ (0, 25) ]

    If ALLOW_RECONSUMPTION is False, raise a ConfigurationError
    exception if the input is incoherent, such as a range beginning
    inside another range. But if ALLOW_RECONSUMPTION is True, allow
    ranges to overlap. Typically, it will be False when encrypting and
    True when decrypting, because it's legitimate to decrypt a message
    multiple times, as long as no one re-uses that range for encrypting."""
    new_used = [ ]
    last_offset = None
    last_length = None
    for tup in used:
      (this_offset, this_length) = tup
      if last_offset is not None:
        if last_offset + last_length >= this_offset:
          # It's only reconsumption if the end of the previous range
          # extends past the next offset. So we error on that if
          # we're not allowing reconsumption...
          if (last_offset + last_length > this_offset
              and not allow_reconsumption):
            raise self.ConfigurationError(
              "pad's used ranges are incoherent:\n %s" % str(used))
          # ...but otherwise we just extend the range from the
          # original offset, whether it was a true overlap or a
          # snuggle-right-up-against kind of thing:
          else:
            # All the possible cases are:
            #
            # 1) first tuple entirely precedes second
            # 2) second tuple begins inside first but ends after it
            # 3) second tuple begins and ends inside first
            # 4) second tuple begins *before* first and ends in it
            # 5) second tuple begins and ends before first
            #
            # However, due to the conditional above, we must be in (2)
            # or (3), and we only need to adjust last_length if (2).
            if (this_offset + this_length) > (last_offset + last_length):
              last_length = (this_offset - last_offset) + this_length
        else:
          new_used.append((last_offset, last_length))
          last_offset = this_offset
          last_length = this_length
      else:
        last_offset = this_offset
        last_length = this_length
    if last_offset is not None:
      new_used.append((last_offset, last_length))
    return new_used

  def _get_next_offset(self, used):
    """Return the next free offset from USED, which is assumed to be in
    consolidated form. PadSession._id_source_length is the minimum
    returned; that way the pad ID stretch is always accounted for,
    even if USED was initialized from an old original-format pad record."""
    cur_offset = None
    # We don't do anything fancy, just get the earliest available
    # offset past the last used tuple. This means that any ranges in
    # between tuples are wasted. See comment in main() about
    # discontinuous ranges for why this is okay.
    for tup in used:
      (this_offset, this_length) = tup
      cur_offset = this_offset + this_length
    if cur_offset is None or cur_offset < PadSession._id_source_length:
      return PadSession._id_source_length
    else:
      return cur_offset

  def _parse_pad_records_file(self):
    """Return a dictionary representing this configuration's 'pad-records'
    file (e.g., ~/.onetime/pad-records). If the file is empty, just
    return an empty dictionary.

    The returned dictionary is keyed on pad IDs, with sub-dictionaries
    as values. Each sub-dictionary's keys are the remaining element
    names inside a pad element, and the value of the 'used' element is
    a list of tuples, each tuple of the form (OFFSET, LENGTH). So:

       returned_dict[PAD_ID] ==> subdict
       subdict['used'] ==> [(OFFSET1, LENGTH1), (OFFSET2, LENGTH2), ...]
       subdict['some_elt_name'] ==> SOME_ELT_VALUE <!-- if any -->
       subdict['another_elt_name'] ==> ANOTHER_ELT_VALUE <!-- if any -->

    A 'pad-records' file is an XML document like this:

      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE TYPE_OF_DOC SYSTEM/PUBLIC "dtd-name">
      <onetime-pad-records>
        <pad-record>
          <id>PAD_ID</id>
          <used><offset>OFFSET_A</offset>
                <length>LENGTH_A</length></used>
          <used><offset>OFFSET_B</offset>
                <length>LENGTH_B</length></used>
          ...
        </pad-record>
        <pad-record>
          <id>SOME_OTHER_PAD_ID</id>
          <used><offset>OFFSET_C</offset>
                <length>LENGTH_C</length></used>
          ...
        </pad-record>
        ...
      </onetime-pad-records>
    """
    dict = { }
    if self.config_area == '-':
      return dict
    try:
      dom = xml.dom.minidom.parse(self.pad_records_file)
      for pad in dom.firstChild.childNodes:
        id = None
        path = None
        used = [ ]
        if pad.nodeType == xml.dom.Node.ELEMENT_NODE:
          subdict = { }
          for pad_part in pad.childNodes:
            if pad_part.nodeType == xml.dom.Node.ELEMENT_NODE:
              if pad_part.nodeName == "id":
                id = pad_part.childNodes[0].nodeValue
              elif pad_part.nodeName == "used":
                offset = None
                length = None
                for used_part in pad_part.childNodes:
                  if used_part.nodeName == "offset":
                    offset = int(used_part.childNodes[0].nodeValue)
                  if used_part.nodeName == "length":
                    length = int(used_part.childNodes[0].nodeValue)
                used.append((offset, length))
                subdict["used"] = self._consolidate_used_ranges(used)
              else:
                # Parse unknown elements transparently.
                subdict[pad_part.nodeName] = pad_part.childNodes[0].nodeValue
          if not subdict.has_key("used"):
            # We don't require the "used" element to be present; if it's
            # absent, it just means none of this pad has been used yet.
            subdict["used"] = [ (0, 0) ]
          dict[id] = subdict
    except xml.parsers.expat.ExpatError:
      pass
    return dict

  def save(self):
    """Save the pad-records file."""
    if self.config_area == '-':
      return
    tempfile = self.pad_records_file + ".tmp"
    # Deliberately not setting binary mode here; this is a text file.
    fp = open(tempfile, 'w')
    fp.write("<onetime-pad-records>\n")
    for pad_id in self.pad_records.keys():
      fp.write(" <pad-record>\n")
      fp.write(" <id>%s</id>\n" % pad_id)
      for tuple in self._consolidate_used_ranges(
          self.pad_records[pad_id]["used"]):
        fp.write(" <used><offset>%d</offset>\n" % tuple[0])
        fp.write(" <length>%d</length></used>\n" % tuple[1])
      for key in self.pad_records[pad_id].keys():
        if key != "used":
          fp.write(" <%s>%s</%s>\n" % \
                   (key, self.pad_records[pad_id][key], key))
      fp.write(" </pad-record>\n")
    fp.write("</onetime-pad-records>\n")
    fp.close()
    # On some operating systems, renaming a file onto an existing file
    # doesn't just silently overwrite the latter -- according to
    # https://github.com/kfogel/OneTime/issues/13, Microsoft Windows
    # will throw an error, for example. So we do this rename very
    # carefully, and in such a way as not to destroy any pad
    # records that might be left over from a past failed rename.
    intermediate_tempfile = self.pad_records_file + ".int"
    if os.path.exists(intermediate_tempfile):
      # ConfigurationError is nested in this class, so it must be
      # reached through self (a bare name would raise NameError).
      raise self.ConfigurationError(
        "Leftover intermediate pad-records file found; "
        "please sort things out:\n"
        " %s" % intermediate_tempfile)
    os.rename(self.pad_records_file, intermediate_tempfile)
    os.rename(tempfile, self.pad_records_file)
    os.remove(intermediate_tempfile)

  def register(self):
    """Register this session's pad if it is not already registered, and
    set its offset based on previously used regions for that pad, if any."""
    next_offset = None
    # This is a little complicated only because we need to look for
    # old original-style pad IDs and upgrade them if present.
    if not self.pad_records.has_key(self._pad_session.id()):
      if self.pad_records.has_key(
          self._pad_session.id(format_level="original")):
        # Upgrade original-style record to internal style.
        self.pad_records[self._pad_session.id()] \
          = self.pad_records[self._pad_session.id(format_level="original")]
        del self.pad_records[self._pad_session.id(format_level="original")]
      else:
        # Initialize a new internal-style record.
        self.pad_records[self._pad_session.id()] = { "used" : [ ] }
    else:
      if self.pad_records.has_key(
          self._pad_session.id(format_level="original")):
        raise Configuration.ConfigurationError(
          "Pad has both v2 and v1 IDs present in pad-records file:\n" \
          " v2: %s\n" \
          " v1: %s\n" \
          "This is supposed to be impossible. Please resolve." \
          % (self._pad_session.id(),
             self._pad_session.id(format_level="original")))
    # One way or another, we now have an up-to-date v2 pad record.
    # Set the next offset accordingly.
    next_offset = self._get_next_offset(
      self.pad_records[self._pad_session.id()]["used"])
    self._pad_session.set_offset(next_offset)

  def record_consumed(self, allow_reconsumption):
    """Record the pad range currently used by self._pad_session.

    If ALLOW_RECONSUMPTION is False, raise a ConfigurationError
    if reconsuming any part of a range that has been consumed previously.
    But if ALLOW_RECONSUMPTION is True, allow ranges to overlap.
    Typically, it is False when encrypting and True when decrypting,
    because it's okay to decrypt a message multiple times, but not to
    re-use a range for encrypting."""
    used = self.pad_records[self._pad_session.id()]["used"]
    used.append((self._pad_session.offset(), self._pad_session.length()))
    self.pad_records[self._pad_session.id()]["used"] \
      = self._consolidate_used_ranges(used, allow_reconsumption)

  def show_pad_records(self):
    """Print pad records, presumably for debugging."""
    for pad_id in self.pad_records.keys():
      print "PadSession %s:" % pad_id
      print " used:", self.pad_records[pad_id]["used"]

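# Illustrative sketch only; not called by OneTime. It restates the idea
# behind Configuration._consolidate_used_ranges() as a standalone
# function, to make the (OFFSET, LENGTH) arithmetic easy to try out.
# Two deliberate differences, both assumptions of this example rather
# than OneTime behavior: it sorts its input first, and it always
# merges overlapping ranges instead of raising an error. The function
# name is hypothetical.
def _sketch_merge_ranges(used):
  """Merge (OFFSET, LENGTH) tuples in USED that touch or overlap."""
  merged = [ ]
  for (offset, length) in sorted(used):
    if merged and offset <= merged[-1][0] + merged[-1][1]:
      # This range starts inside or right at the end of the previous
      # one, so extend the previous range if this one reaches farther.
      (last_offset, last_length) = merged[-1]
      end = max(last_offset + last_length, offset + length)
      merged[-1] = (last_offset, end - last_offset)
    else:
      merged.append((offset, length))
  return merged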
class RandomEnough:
  """Class for providing [pseudo]random bytes that are XOR'd against pad.

  **************************************************
  ***                    NOTE:                   ***
  ***                                            ***
  ***  DON'T USE THIS FOR ENCRYPTING PLAINTEXT.  ***
  ***  THAT IS NOT WHAT IT IS FOR.  JUST DON'T.  ***
  ***                                            ***
  **************************************************

  There are a couple of places (the head fuzz and tail fuzz) where raw
  pad bytes would otherwise be exposed in the encrypted message,
  except that they're not exposed because they're XOR'd with bytes
  coming from this class. That's all this class is for.

  If the pad is truly random, as it should be, then the randomness
  produced by this class is irrelevant. If the pad is not truly
  random, then that's tragic, but this class at least helps disguise
  that fact a bit. Still, it's just a "best effort" kind of thing.

  In any case, these bytes are *never* to be used as a fallback
  replacement for missing pad data, obviously. They're just about
  making the fuzz fuzzier; they have nothing to do with real data.

  If TEST_MODE, then use pseudo-random numbers with a defined seed,
  so that the same stream of random numbers is always produced.
  """

  def __init__ (self, test_mode=False):
    """Initialize. If TEST_MODE, seed the pseudo-random number
    generator with a fixed value so that its output is repeatable."""
    if test_mode:
      random.seed(1729)

  def rand_bytes(self, num):
    """Return NUM random bytes."""
    try:
      # TODO: Right now we offer pseudo-random bytes based on whatever
      # random seed Python is using (unless test_mode, in which case
      # the seed is predefined). Even though the random bytes
      # produced here are not essential to the security of OneTime's
      # output, still it would be best if they were as random as we
      # could make them. We could use a random.SystemRandom object
      # here if one is available, but the Python documentation says
      # that class is not supported on all systems, while not saying
      # what error is raised if it's not supported. (Maybe one is
      # just supposed to check directly for it in random.__dict__?)
      #
      # NOTE: The unconditional raise below forces the fallback path,
      # so the os.urandom() call after it is currently never reached.
      raise NotImplementedError("just testing")
      return os.urandom(num)
    except NotImplementedError:
      ret_data = bytearray(b'\x00' * num)
      for i in range(num):
        ret_data[i] = chr(random.randint(0, 255))
      return str(ret_data)

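# Illustrative sketch only; not called by OneTime. It shows the masking
# idea behind head and tail fuzz: runtime-generated random bytes are
# XOR'd against pad bytes, so the fuzz region never exposes raw pad
# data, even if the pad turns out to be weak. The function name is
# hypothetical, and drawing the random bytes from os.urandom() is an
# assumption of this example, not necessarily what RandomEnough does.
import os

def _sketch_make_fuzz(pad_bytes):
  """Return (RAW, FUZZ): len(PAD_BYTES) random bytes, and those same
  bytes XOR'd against PAD_BYTES."""
  pad_bytes = bytearray(pad_bytes)
  raw = bytearray(os.urandom(len(pad_bytes)))
  fuzz = bytearray(r ^ p for r, p in zip(raw, pad_bytes))
  return (bytes(raw), bytes(fuzz))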
class PadSession:
  """An encrypter/decrypter associated with a pad at an offset.
  Feed bytes through convert() to XOR them against that pad.

  A PadSession is used for a single encryption or decryption session; it
  should not be used for subsequent sessions -- instead, a new PadSession
  object should be generated (it might refer to the same underlying pad file,
  but it still needs to be a new object due to certain initializations)."""

  # Length of the front stretch of pad used for the ID.
  _id_source_length = 32

  # The plaintext is authenticated with a SHA256 hash digest
  # (computed in self._session_hash) that is itself encrypted with the
  # pad and included in the ciphertext. This is the length of that
  # digest. Note it is the length of the raw digest, not the length
  # of a hexadecimal representation of the digest.
  _digest_length = 32

  # Number of contiguous raw pad bytes to use as the source
  # material for the digest computed in self._session_hash.
  _digest_source_length = 32

  def __init__(self, pad_path, config_area=None,
               no_trace=False, test_mode=False):
    """Make a new pad session, with padfile PAD_PATH.

    The pad session cannot be used for encrypting or decrypting until
    set_offset() is called. If CONFIG_AREA is not None, it is the
    directory containing the pad-records file. If NO_TRACE, then
    don't make any changes in the configuration area. If TEST_MODE,
    then use pseudo-random numbers with a defined seed, so that output
    is always the same when the input is the same."""
    self.pad_path = pad_path
    self.config = Configuration(self, config_area)
    self._no_trace = no_trace
    self.padfile = open(self.pad_path, "rb")
    self.pad_size = os.stat(self.pad_path)[stat.ST_SIZE]
    self._offset = None # where to start using pad bytes (must initialize)
    self._length = 0 # number of pad bytes used this time
    self._default_fuzz_source_length = 2 # See _get_fuzz_length_from_pad()
    self._default_fuzz_source_modulo = 512 # and see _make_inner_header().
    self._randy = RandomEnough(test_mode)
    # These are just caches for self.id(), which see.
    self._id = None
    self._original_format_level_id = None
    # If this session saw a particular format level, record it so we
    # can check that it remains consistent.
    self._format_level = None
    # On decrypting, a given call to convert() might not supply enough
    # string to use up the inner headers. Therefore we must remember
    # how much of the head_fuzz still needs to be used up.
    self._fuzz_remaining_to_consume = 0
    # We compute a hash of the plaintext head fuzz + plaintext message
    # and embed that hash into the encrypted text, for authentication
    # of the overall encrypted message. See self._initialize_hash().
    self._session_hash = None
    # There are both head and tail fuzz, but we only need to remember
    # the tail fuzz length, because we learn that length at the start
    # of processing but wait till the end to emit or consume it --
    # whereas head fuzz we emit/consume as soon as we know its length.
    self._tail_fuzz_length = None
    self._tail_fuzz_length_source_bytes = None
    # This buffer holds all encrypted head fuzz bytes that have not
    # yet been emitted by convert(). That is, when this buffer is not
    # empty, then it is what convert() needs to emit *before* it emits
    # anything else -- the first part of the output of the first call.
    self._head_buffer = ''
    # This buffer always holds the latest input, and must always be at
    # least as long as the digest + tail fuzz, so that we can
    # refrain from decrypting them as part of the original plaintext.
    self._tail_buffer = ''
    # Most of what a pad session does is the same for encrypting and
    # decrypting -- after all, the conversion step is symmetrical (XOR).
    #
    # However, before conversion can happen, the pad session needs to know
    # whether to write or read the inner header flag bytes -- so for
    # that it needs to know whether it's encrypting or decrypting. When
    # that step is done, the appropriate variable below is set;
    # exactly one of them *must* be set before any conversion happens.
    self._encrypting = False
    self._decrypting = False
    # False until conversion starts, True thereafter. (Conversion
    # starts after all the head fuzz has been consumed.)
    self._begun = False
    # Register with config as last thing we do.
    self.config.register()

class PadSessionUninitialized(Exception):
"Exception raised if PadSession hasn't been initialized yet."
pass
class OverPrepared(Exception):
"Exception raised if a PadSession is initialized or prepared twice."
pass
class PadShort(Exception):
"Exception raised if pad doesn't have enough data for this encryption."
pass
class FormatLevel(Exception):
"Exception raised for an unknown or inconsistent format level."
pass
class InnerFormat(Exception):
"Exception raised if something is wrong with the inner format."
pass
class FuzzMismatch(Exception):
"Exception raised if the amount of tail fuzz is incorrect."
# In practice this error can never happen for head fuzz, because
# if the length of the head fuzz is wrong, the digest will not
# match either, and we'll catch the digest error first.
pass
class DigestMismatch(Exception):
"Exception raised if a digest check fails."
pass
def _initialize_hash(self):
"""Initialize the session hash with some raw pad bytes."""
if self._offset + self._digest_source_length >= self.pad_size:
raise PadSession.PadShort(
"digest initialization failed because pad too short")
digest_source_bytes = self.padfile.read(self._digest_source_length)
self._length += self._digest_source_length
if self._session_hash is not None:
raise PadSession.OverPrepared(
"pad session hash was prematurely initialized")
self._session_hash = hashlib.sha256()
self.digest_gulp(digest_source_bytes)
def prepare_for_encryption(self):
"""Mark this PadSession as encrypting. This or prepare_for_decryption()
must be called exactly once, before any conversion happens."""
if self._encrypting:
raise PadSession.OverPrepared("already prepared for encryption")
if self._decrypting:
raise PadSession.OverPrepared(
"cannot prepare for both encryption and decryption")
self._head_buffer = self._make_inner_header()
self._encrypting = True
def prepare_for_decryption(self):
"""Mark this PadSession as decrypting. This or prepare_for_encryption()
must be called exactly once, before any conversion happens."""
if self._decrypting:
raise PadSession.OverPrepared("already prepared for decryption")
if self._encrypting:
raise PadSession.OverPrepared(
"cannot prepare for both decryption and encryption")
# The fact that we don't call self._handle_inner_header() here is
# an unfortunate asymmetry w.r.t. self.prepare_for_encryption().
# The reason for it is that the decryption code flow is
# complicated by the need to handle remainder input, in a way that
# encryption is not. This is why self._handle_inner_header() has
# to be called from self.convert().
self._decrypting = True
def set_offset(self, offset):
"""Set this pad session's encryption/decryption offset to OFFSET."""
if offset >= self.pad_size:
raise PadSession.PadShort("offset exceeds pad size, need more pad")
self._offset = offset
self.padfile.seek(self._offset)
def convert(self, string, format_level="internal"):
"""If STRING is not empty or None, return it as XORed against the pad;
else return the empty string. Note STRING may be empty on intermediate
calls simply because a compressor has not yet had enough incoming data to
work with, not necessarily because input is ended yet.
If FORMAT_LEVEL is "original", then don't handle the head and tail
used by the later format levels. Otherwise, do handle the head and
tail: for the head, consume the fuzz and include it in the
overall message digest; for the tail, just remember its length so we
can consume it later.
It is an error to call this multiple times with different FORMAT_LEVEL
values, for a given PadSession instance. Whatever you pass the first time
must be used for all subsequent calls with that instance.
"""
result = ''
if self._offset is None:
raise PadSession.PadSessionUninitialized(
"pad session not yet initialized (no offset)")
if self._format_level is None:
self._format_level = format_level
elif self._format_level != format_level:
raise PadSession.FormatLevel(
"inconsistent format levels requested: '%s' and '%s'"
% (self._format_level, format_level))
if format_level == "internal":
if self._encrypting and self._decrypting:
raise PadSession.OverPrepared(
"pad session cannot encrypt and decrypt simultaneously")
elif not self._encrypting and not self._decrypting:
raise PadSession.PadSessionUninitialized(
"pad session not yet prepared for either encrypting or decrypting")
elif not self._begun:
if self._decrypting:
# In the decryption case, the only way we receive any head
# fuzz material is during the initial call(s) to convert().
# So here we check for that and make sure to consume all the
# head fuzz before continuing on to regular decryption. In
# theory, this could involve multiple calls to convert,
# although in practice it always seems to get done during
# the first call.
#
# TODO (minor): It's possible string might be so short that
# it doesn't even contain enough information to know the fuzz
# length yet. The solution is easy: if we haven't yet begun,
# then just accumulate string to prepend to the next call(s),
# until a call comes when we have enough to work with.
#
# This is not an urgent problem, as in practice no I/O system
# is likely to deliver string in chunks so small. So we are
# leaving it to solve later.
# The complement of this call to self._handle_inner_header()
# is located in self.prepare_for_encryption() in the
# encryption case, which also prepares self._head_buffer for
# the first call to convert(). However, the decryption case
# is complicated by the need to return remainder
# information, in a way that the encryption case is not.
# This asymmetry is reflected in the code.
string, fuzz_remaining = self._handle_inner_header(string)
if string != "" and fuzz_remaining != 0:
raise PadSession.InnerFormat(
"Got both a result string and a pad remainder")
if fuzz_remaining > 0:
self._fuzz_remaining_to_consume = fuzz_remaining
if self._fuzz_remaining_to_consume > 0:
new_fuzz_remaining = 0
num_bytes_to_consume_now = None
if self._fuzz_remaining_to_consume > len(string):
new_fuzz_remaining = self._fuzz_remaining_to_consume - len(string)
num_bytes_to_consume_now = len(string)
else:
num_bytes_to_consume_now = self._fuzz_remaining_to_consume
self._fuzz_remaining_to_consume = new_fuzz_remaining
num_fuzz_bytes_remaining, string = self._consume_fuzz_bytes(
num_bytes_to_consume_now, string, is_head_fuzz=True)
self._fuzz_remaining_to_consume += num_fuzz_bytes_remaining
# Once we've handled any inner headers, buffer for decrypting.
if self._decrypting:
self._tail_buffer += string
# Reserve exactly the tail length each time, so that on the
# last iteration we can just check both parts of the tail
# (the digest and the fuzz) without emitting new output.
if len(self._tail_buffer) < (PadSession._digest_length
+ self._tail_fuzz_length):
string = '' # wait until we have more or are done
else:
string = self._tail_buffer[:(0 - (PadSession._digest_length
+ self._tail_fuzz_length))]
self._tail_buffer = self._tail_buffer[
(0 - (PadSession._digest_length + self._tail_fuzz_length)):]
string_len = len(string)
pad_str = self.padfile.read(string_len)
if len(pad_str) < string_len:
raise PadSession.PadShort(
"not enough pad data available to finish encryption")
for i in range(string_len):
result += chr(ord(string[i]) ^ ord(pad_str[i]))
self._length += string_len
self._begun = True
if self._head_buffer:
# In the encryption case, we generated the head fuzz entirely
# internally during the preparation stage, and just buffered it
# for prepending during the first call to convert(). So if
# we're here, then it must be the first call to convert() when
# encrypting, and it's time to use and empty that buffer.
#
# (The decryption case is not quite symmetrical. We can't
# consume the head fuzz during the preparation stage, because at
# that point we haven't received any of the input yet -- the
# only route for receiving input is via calls to convert(). So
# the complement of this code, in the decryption case, is the
# handling of head fuzz before the self._begun flag is set.)
result = self._head_buffer + result
self._head_buffer = ''
return result
def _get_id(self):
"""Get the ID for this session's underlying pad.
(The ID is the SHA-256 hex digest of the pad's first
PadSession._id_source_length bytes.)"""
# The astute reader may ask: why are we bothering to make a 32
# byte hash of 32 bytes worth of random data, instead of just
# using the data itself (expressed in hexadecimal) as the pad ID?
# The answer is just tradition, really. Well, and the very slight
# possibility that if there's *something* not quite random about
# the pad, even though that's bad, we can at least avoid revealing
# that fact by not exposing the first 32 bytes of the pad. If a
# pad-records file gets leaked, that shouldn't show anything of
# interest about the pad itself, only about how much the pad has
# been used.
saved_posn = self.padfile.tell()
self.padfile.seek(0)
sha256 = hashlib.sha256()
string = self.padfile.read(PadSession._id_source_length)
sha256.update(string)
self.padfile.seek(saved_posn)
return sha256.hexdigest()
def _get_original_format_level_id(self):
"""Get the OneTime \"original\" format level ID for the session pad.
In that format level, pad IDs were based on the first 1024
(octet) bytes of the pad. This was needlessly spendy, or rather,
it would have been needlessly spendy if OneTime 1.x had been
paranoid enough to not use any of those bytes for encryption.
Version 2.0 fixed this, reducing the number of bytes used for the ID
but also making sure they are not used for encryption."""
saved_posn = self.padfile.tell()
self.padfile.seek(0)
sha1 = hashlib.sha1()
string = self.padfile.read(1024)
sha1.update(string)
self.padfile.seek(saved_posn)
return sha1.hexdigest()
def id(self, format_level="internal"):
"""Return the pad ID of the pad belonging to this pad session.
If FORMAT_LEVEL is specified, return ID according to that level."""
if format_level == "internal":
if self._id is None:
self._id = self._get_id()
return self._id
elif format_level == "original":
if self._original_format_level_id is None:
self._original_format_level_id = self._get_original_format_level_id()
return self._original_format_level_id
else:
raise PadSession.FormatLevel("unknown format \"%s\" for ID"
% format_level)
def path(self):
"""Return the pad's path."""
return self.pad_path
def offset(self):
"""Return offset from which encryption/decryption starts."""
return self._offset
def length(self):
"""Return the number of pad bytes used so far."""
return self._length
def _get_fuzz_length_from_pad(self, num_bytes, modulo):
"""Calculate a fuzz length from the next NUM_BYTES bytes of pad,
reduced modulo MODULO, advancing the pad accordingly. Return that
length and the raw pad
data used to calculate it in a tuple of the form:
[calculated_length, source_bytes]
"""
# What's going on here? What is "fuzz"?
#
# "Fuzz" is some random data that pads the plaintext+digest on
# either side; the fuzz in front is "head fuzz" and the fuzz at
# the end is "tail fuzz". Fuzz consists of a random length of
# random bytes (the length is computed from pad, the bytes
# themselves are generated randomly at run time), XOR'd
# against pad, such that the position of the plaintext+digest is not
# known even to an attacker who can see the pad-records file.
#