---
title: "Primer on Python for R Users"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Primer on Python for R Users}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 72
---
<!-- ```{r setup, include=FALSE} -->
<!-- library(reticulate) -->
<!-- # this vignette requires python 3.8 or newer -->
<!-- eval <- tryCatch({ -->
<!-- config <- py_config() -->
<!-- numeric_version(config$version) >= "3.8" && py_numpy_available() -->
<!-- }, error = function(e) FALSE) -->
<!-- knitr::opts_chunk$set( -->
<!-- collapse = TRUE, -->
<!-- comment = "#>", -->
<!-- eval = eval -->
<!-- ) -->
<!-- ``` -->
``` r
library(reticulate)
```
## Primer on Python for R users
You may find yourself wanting to read and understand some Python, or
even port some Python to R. This guide is designed to enable you to do
these tasks as quickly as possible. As you'll see, R and Python are
similar enough that this is possible without necessarily learning all of
Python. We start with the basics of container types and work up to the
mechanics of classes, dunders, the iterator protocol, the context
protocol, and more!
### Whitespace
Whitespace matters in Python. In R, expressions are grouped into a code
block with `{}`. In Python, that is done by making the expressions share
an indentation level. For example, an expression with an R code block
might be:
``` r
if (TRUE) {
  cat("This is one expression. \n")
  cat("This is another expression. \n")
}
#> This is one expression.
#> This is another expression.
```
The equivalent in Python:
``` python
if True:
    print("This is one expression.")
    print("This is another expression.")
#> This is one expression.
#> This is another expression.
```
Python accepts tabs or spaces as the indentation spacer, but the rules
get tricky when they're mixed. Most style guides suggest (and IDEs
default to) using spaces only.
### Container Types
In R, the `list()` is a container you can use to organize R objects. R's
`list()` is feature packed, and there is no single direct equivalent in
Python that supports all the same features. Instead there are (at least)
4 different Python container types you need to be aware of: lists,
dictionaries, tuples, and sets.
#### Lists
Python lists are typically created using bare brackets `[]`. The Python
built-in `list()` function is more of a coercion function, closer in
spirit to R's `as.list()`. The most important thing to know about Python
lists is that they are modified in place. Note in the example below that
`y` reflects the changes made to `x`, because the underlying list object
which both symbols point to is modified in place.
``` python
x = [1, 2, 3]
y = x # `y` and `x` now refer to the same list!
x.append(4)
print("x is", x)
#> x is [1, 2, 3, 4]
print("y is", y)
#> y is [1, 2, 3, 4]
```
One Python idiom that might be concerning to R users is that of growing
lists through the `append()` method. Growing lists in R is typically
slow and best avoided. But because Python lists are modified in place
(and a full copy of the list is avoided when appending items), it is
efficient to grow Python lists in place.
Some syntactic sugar around Python lists you might encounter is the
usage of `+` and `*` with lists. These are concatenation and replication
operators, akin to R's `c()` and `rep()`.
``` python
x = [1]
x
#> [1]
x + x
#> [1, 1]
x * 3
#> [1, 1, 1]
```
You can index into lists with integers using trailing `[]`, but note
that indexing is 0-based.
``` python
x = [1, 2, 3]
x[0]
#> 1
x[1]
#> 2
x[2]
#> 3
try:
    x[3]
except Exception as e:
    print(e)
#> list index out of range
```
When indexing, negative numbers count from the end of the container.
``` python
x = [1, 2, 3]
x[-1]
#> 3
x[-2]
#> 2
x[-3]
#> 1
```
You can slice ranges of lists using the `:` inside brackets. Note that
the slice syntax is ***not*** inclusive of the end of the slice range.
You can optionally also specify a stride.
``` python
x = [1, 2, 3, 4, 5, 6]
x[0:2] # get items at index positions 0, 1
#> [1, 2]
x[1:] # get items from index position 1 to the end
#> [2, 3, 4, 5, 6]
x[:-2] # get items from beginning up to the 2nd to last.
#> [1, 2, 3, 4]
x[:] # get all the items (idiom used to copy the list so as not to modify in place)
#> [1, 2, 3, 4, 5, 6]
x[::2] # get all the items, with a stride of 2
#> [1, 3, 5]
x[1::2] # get all the items from index 1 to the end, with a stride of 2
#> [2, 4, 6]
```
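The `x[:]` copy idiom is worth seeing next to plain assignment. The
sketch below uses illustrative names (`alias`, `snapshot`) that are not
part of the examples above; it shows that a full slice builds a new
(shallow) list, while assignment only adds a second name for the same
list.

```python
x = [1, 2, 3]
alias = x          # another name for the same list object
snapshot = x[:]    # a new list holding the same items (a shallow copy)

x.append(4)

print(alias)       # the alias sees the change
#> [1, 2, 3, 4]
print(snapshot)    # the copy does not
#> [1, 2, 3]
print(alias is x, snapshot is x)
#> True False
```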
#### Tuples
Tuples behave like lists, except they are not mutable, and they don't
have the same modify-in-place methods like `append()`. They are
typically constructed using bare `()`, but parentheses are not strictly
required, and you may see an implicit tuple being defined just from a
comma separated series of expressions. Because parentheses can also be
used to specify order of operations in expressions like `(x + 3) * 4`, a
special syntax is required to define tuples of length 1: a trailing
comma. Tuples are most commonly encountered in functions that take a
variable number of arguments.
``` python
x = (1, 2) # tuple of length 2
type(x)
#> <class 'tuple'>
len(x)
#> 2
x
#> (1, 2)
x = (1,) # tuple of length 1
type(x)
#> <class 'tuple'>
len(x)
#> 1
x
#> (1,)
x = () # tuple of length 0
print(f"{type(x) = }; {len(x) = }; {x = }")
#> type(x) = <class 'tuple'>; len(x) = 0; x = ()
# the line above is an example of an interpolated string literal (an f-string)
x = 1, 2 # also a tuple
type(x)
#> <class 'tuple'>
len(x)
#> 2
x = 1, # beware a single trailing comma! This is a tuple!
type(x)
#> <class 'tuple'>
len(x)
#> 1
```
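To make "not mutable" concrete, here is a small sketch (the names
`point` and `longer` are only illustrative). Assigning to an element
fails, and "changing" a tuple really means building a new one.

```python
point = (1, 2)

try:
    point[0] = 99            # tuples do not support item assignment
except TypeError as e:
    print(e)
#> 'tuple' object does not support item assignment

# concatenation creates a new tuple; the original is untouched
longer = point + (3,)
print(longer)
#> (1, 2, 3)
print(point)
#> (1, 2)
```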
##### Packing and Unpacking
Tuples are the container that powers the *packing* and *unpacking*
semantics in Python. Python provides the convenience of allowing you to
assign multiple symbols in one expression. This is called *unpacking*.
For example:
``` python
x = (1, 2, 3)
a, b, c = x
a
#> 1
b
#> 2
c
#> 3
```
(You can access similar unpacking behavior from R using
`` zeallot::`%<-%` ``).
Tuple unpacking can occur in a variety of contexts, such as iteration:
``` python
xx = (("a", 1),
      ("b", 2))
for x1, x2 in xx:
    print("x1 = ", x1)
    print("x2 = ", x2)
#> x1 = a
#> x2 = 1
#> x1 = b
#> x2 = 2
```
If you attempt to unpack a container to the wrong number of symbols,
Python raises an error:
``` python
x = (1, 2, 3)
a, b, c = x # success
a, b = x # error, x has too many values to unpack
#> ValueError: too many values to unpack (expected 2)
a, b, c, d = x # error, x has not enough values to unpack
#> ValueError: not enough values to unpack (expected 4, got 3)
```
It is possible to unpack a variable number of arguments, using `*` as a
prefix to a symbol. Note that the starred symbol always receives a
list, even when unpacking a tuple. (You'll see the `*` prefix again
when we talk about functions.)
``` python
x = (1, 2, 3)
a, *the_rest = x
a
#> 1
the_rest
#> [2, 3]
```
You can also unpack nested structures:
``` python
x = ((1, 2), (3, 4))
(a, b), (c, d) = x
```
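A short sketch of a few more unpacking patterns you are likely to meet
when reading Python code. The variable names (`first`, `middle`,
`last`) are only illustrative.

```python
x = ((1, 2), (3, 4))
(a, b), (c, d) = x
print(a, b, c, d)
#> 1 2 3 4

# the right-hand side is packed into a tuple, then unpacked:
# a swap without a temporary variable
a, b = b, a
print(a, b)
#> 2 1

# a starred name can sit in the middle too; it always collects into a list
first, *middle, last = [1, 2, 3, 4, 5]
print(first, middle, last)
#> 1 [2, 3, 4] 5
```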
#### Dictionaries
Dictionaries are most similar to R environments. They are a container
where you can retrieve items by name, though in Python the name (called
a *key* in Python's parlance) does not need to be a string like in R. It
can be any *hashable* Python object, that is, any object that works
with the built-in `hash()` (most immutable objects, such as strings,
numbers, and tuples of these). Dictionaries can be created using syntax
like `{key: value}`. Like Python lists, they are modified in place. Note
that `r_to_py()` converts R named lists to dictionaries.
``` python
d = {"key1": 1,
     "key2": 2}
d2 = d
d
#> {'key1': 1, 'key2': 2}
d["key1"]
#> 1
d["key3"] = 3
d2 # modified in place!
#> {'key1': 1, 'key2': 2, 'key3': 3}
```
Like R environments (and unlike R's named lists), you cannot index into
a dictionary with an integer to get an item at a specific index
position; an integer inside the brackets is treated as a key.
Dictionaries are not positional containers. (Beginning with Python 3.7,
however, dictionaries do preserve the item insertion order.)
``` python
d = {"key1": 1, "key2": 2}
d[1] # error
#> KeyError: 1
```
The container that most closely matches the semantics of R's named list
is the
[`OrderedDict`](https://docs.python.org/3/library/collections.html#collections.OrderedDict),
but that's relatively uncommon in Python code so we don't cover it
further.
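For everyday dictionary work, a few built-in tools cover most of what
you would do with an R named list. The following is a minimal sketch
using only standard dictionary methods: testing for a key, looking up
with a fallback, and getting at items by position by first
materializing them into a list.

```python
d = {"key1": 1, "key2": 2}

print("key1" in d)          # membership test on keys
#> True
print(d.get("key3"))        # .get() returns None instead of raising KeyError
#> None
print(d.get("key3", 0))     # ...or a default you supply
#> 0

# positional access requires materializing the keys (or items) first
print(list(d)[0])
#> key1
print(list(d.items())[1])
#> ('key2', 2)
```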
#### Sets
Sets are a container that can be used to efficiently track unique items
or deduplicate lists. They are constructed using `{val1, val2}` (like a
dictionary, but without `:`). Note that a bare `{}` creates an empty
dictionary; use `set()` for an empty set. Think of them as a dictionary
where you only use the keys. Sets have many efficient methods for
membership operations, like `intersection()`, `issubset()`, `union()`
and so on.
``` python
s = {1, 2, 3}
type(s)
#> <class 'set'>
s
#> {1, 2, 3}
s.add(1)
s
#> {1, 2, 3}
```
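As a rough sketch of the deduplication and membership operations
mentioned above (wrapping results in `sorted()` only to get a
predictable print order, since sets do not guarantee one):

```python
x = [3, 1, 2, 3, 1]
unique = set(x)                  # similar in spirit to R's unique()
print(sorted(unique))            # sorted() returns a list
#> [1, 2, 3]

a = {1, 2, 3}
b = {3, 4}
print(a & b, a.intersection(b))  # operator and method forms agree
#> {3} {3}
print(sorted(a | b))             # union
#> [1, 2, 3, 4]
print(sorted(a - b))             # difference, like R's setdiff()
#> [1, 2]
print({1, 2}.issubset(a), 2 in a)
#> True True
```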
### Iteration with `for`
The `for` statement in Python can be used to iterate over any kind of
container.
``` python
for x in [1, 2, 3]:
    print(x)
#> 1
#> 2
#> 3
```
R has a relatively limited set of objects that can be passed to `for`.
Python, by comparison, provides an iterator protocol interface, which
means that authors can define custom objects with custom behavior that
is invoked by `for`. (We'll see an example of how to define a custom
iterable when we get to classes.) You may want to use a Python iterable
from R using reticulate, so it's helpful to peel back the syntactic
sugar a little to show what the `for` statement is doing in Python, and
how you can step through it manually.
There are two things that happen: first, an iterator is constructed from
the supplied object. Then, the new iterator object is repeatedly called
with `next()` until it is exhausted.
``` python
l = [1, 2, 3]
it = iter(l) # create an iterator object
it
#> <list_iterator object at 0x1402267a0>
# call `next` on the iterator until it is exhausted:
next(it)
#> 1
next(it)
#> 2
next(it)
#> 3
next(it)
#> StopIteration
```
In R, you can use reticulate to step through an iterator the same way.
``` r
library(reticulate)
l <- r_to_py(list(1, 2, 3))
it <- as_iterator(l)
iter_next(it)
#> 1.0
iter_next(it)
#> 2.0
iter_next(it)
#> 3.0
iter_next(it, completed = "StopIteration")
#> [1] "StopIteration"
```
Iterating over dictionaries first requires understanding if you are
iterating over the keys, values, or both. Dictionaries have methods that
allow you to specify which.
``` python
d = {"key1": 1, "key2": 2}
for key in d:
    print(key)
#> key1
#> key2
for value in d.values():
    print(value)
#> 1
#> 2
for key, value in d.items():
    print(key, ":", value)
#> key1 : 1
#> key2 : 2
```
#### Comprehensions
Comprehensions are special syntax that allow you to construct a
container like a list or a dict, while also executing a small operation
or single expression on each element. You can think of it as special
syntax for R's `lapply`.
For example:
``` python
x = [1, 2, 3]
# a list comprehension built from x, where you add 100 to each element
l = [element + 100 for element in x]
l
#> [101, 102, 103]
# a dict comprehension built from x, where the key is a string.
# Python's str() is like R's as.character()
d = {str(element) : element + 100
for element in x}
d
#> {'1': 101, '2': 102, '3': 103}
```
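Comprehensions can also filter with a trailing `if`, which is roughly
what combining `Filter()` and `lapply()` would do in R. The sketch
below (with the illustrative name `squares_of_evens`) shows a filtered
list comprehension next to the explicit loop it replaces.

```python
x = [1, 2, 3, 4]

# keep only the even elements, then square them
squares_of_evens = [element ** 2 for element in x if element % 2 == 0]
print(squares_of_evens)
#> [4, 16]

# the same result written as an explicit loop
result = []
for element in x:
    if element % 2 == 0:
        result.append(element ** 2)
print(result == squares_of_evens)
#> True
```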
### Defining Functions with `def`
Python functions are defined with the `def` statement. The syntax for
specifying function arguments and default values is very similar to R.
``` python
def my_function(name = "World"):
    print("Hello", name)
my_function()
#> Hello World
my_function("Friend")
#> Hello Friend
```
The equivalent R snippet would be
``` r
my_function <- function(name = "World") {
  cat("Hello", name, "\n")
}
my_function()
#> Hello World
my_function("Friend")
#> Hello Friend
```
Unlike R functions, the last value in a function is not automatically
returned. Python requires an explicit return statement.
``` python
def fn():
    1
print(fn())
#> None
def fn():
    return 1
print(fn())
#> 1
```
(Note for advanced R users: Python has no equivalent of R's argument
"promises". Function argument default values are evaluated once, when
the function is constructed. This can be surprising if you define a
Python function with a mutable object as a default argument value, like
a Python list!)
``` python
def my_func(x = []):
    x.append("was called")
    print(x)
my_func()
#> ['was called']
my_func()
#> ['was called', 'was called']
my_func()
#> ['was called', 'was called', 'was called']
```
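A common way to avoid this surprise is to default to `None` and create
the mutable object inside the function body. The sketch below is one
conventional pattern, not the only option; it returns the list rather
than printing it so the result is easy to inspect.

```python
def my_func(x=None):
    if x is None:
        x = []                # a fresh list is created on every call
    x.append("was called")
    return x

print(my_func())
#> ['was called']
print(my_func())
#> ['was called']
print(my_func(["existing"]))  # a caller-supplied list is still used as-is
#> ['existing', 'was called']
```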
You can also define Python functions that take a variable number of
arguments, similar to `...` in R. A notable difference is that R's `...`
makes no distinction between named and unnamed arguments, but Python
does. In a Python function signature, a parameter prefixed with a single
`*` captures the unnamed (positional) arguments, and a parameter
prefixed with `**` captures the named (*keyword*) arguments.
``` python
def my_func(*args, **kwargs):
    print("args = ", args) # args is a tuple
    print("kwargs = ", kwargs) # kwargs is a dictionary
my_func(1, 2, 3, a = 4, b = 5, c = 6)
#> args = (1, 2, 3)
#> kwargs = {'a': 4, 'b': 5, 'c': 6}
```
Whereas the `*` and `**` in a function definition signature *pack*
arguments, in a function call they *unpack* arguments. Unpacking
arguments in a function call is equivalent to using `do.call()` in R.
``` python
def my_func(a, b, c):
    print(a, b, c)
args = (1, 2, 3)
my_func(*args)
#> 1 2 3
kwargs = {"a": 1, "b": 2, "c": 3}
my_func(**kwargs)
#> 1 2 3
```
### Defining Classes with `class`
One could argue that in R, the preeminent unit of composition for code
is the `function`, and in Python, it's the `class`. You can be a very
productive R user and never use R6, reference classes, or similar R
equivalents to the object-oriented style of Python classes.
In Python, however, understanding the basics of how `class` objects work
is requisite knowledge, because classes are how you organize and find
methods in Python. (This contrasts with R's approach, where methods are
found by dispatching from a generic.) Fortunately, the basics of classes
are accessible.
Don't be intimidated if this is your first exposure to object oriented
programming. We'll start by building up a simple Python class for
demonstration purposes.
``` python
class MyClass:
    pass # `pass` means do nothing.
MyClass
#> <class '__main__.MyClass'>
type(MyClass)
#> <class 'type'>
instance = MyClass()
instance
#> <__main__.MyClass object at 0x14023b260>
type(instance)
#> <class '__main__.MyClass'>
```
Like the `def` statement, the `class` statement binds a new callable
symbol, `MyClass`. First, note the strong naming convention: classes are
typically `CamelCase`, and functions are typically `snake_case`. After
defining `MyClass`, you can interact with it, and see that it has type
`'type'`. Calling `MyClass()` creates a new object **instance** of the
class, which has type `'MyClass'` (ignore the `__main__.` prefix for
now). The instance prints with its memory address, which is a strong
hint that it's common to be managing many instances of a class, and that
the instance is mutable (modified-in-place by default).
In the first example, we defined an empty `class`, but when we inspect
it we see that it already comes with a bunch of attributes (`dir()` in
Python is equivalent to `names()` in R):
``` python
dir(MyClass)
#> ['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']
```
#### What are all the underscores?
Python typically indicates that something is special by wrapping the
name in double underscores. A special double-underscore-wrapped token is
commonly called a "dunder". "Special" is not a technical term, it just
means that the token invokes a Python language feature. Some dunder
tokens are merely ways code authors can plug into specific syntactic
sugars, others are values provided by the interpreter that would be
otherwise hard to acquire, yet others are for extending language
interfaces (e.g., the iteration protocol), and finally, a small handful
of dunders are truly complicated to understand. Fortunately, as an R
user looking to use some Python features through reticulate, you only
need to know about a few easy-to-understand dunders.
The most common dunder method you'll encounter when reading Python code
is `__init__()`. This is a function that is called when the class
constructor is called, that is, when a class is **instantiated**. It is
meant to initialize the new class instance. (In very sophisticated code
bases, you may also encounter classes where `__new__` is also defined,
this is called before `__init__`).
``` python
class MyClass:
    print("MyClass's definition body is being evaluated")
    def __init__(self):
        print(self, "is initializing")
#> MyClass's definition body is being evaluated
print("MyClass is finished being created")
#> MyClass is finished being created
instance = MyClass()
#> <__main__.MyClass object at 0x140266330> is initializing
print(instance)
#> <__main__.MyClass object at 0x140266330>
instance2 = MyClass()
#> <__main__.MyClass object at 0x11e3ad490> is initializing
print(instance2)
#> <__main__.MyClass object at 0x11e3ad490>
```
A few things to note:
- the `class` statement takes a code block that is defined by a common
indentation level. The code block has the same exact semantics as
any other expression that takes a code block, like `if` and `def`.
The body of the class is evaluated only **once**, when the class
constructor is first being created. Beware that any objects defined
here are shared by all instances of the class!
- `__init__` is just a normal function, defined with `def` like any
other function. Except it's inside the class body.
- `__init__` takes an argument: `self`. `self` is the class instance
being initialized (note the identical memory address between `self`
and `instance`). Also note that we didn't provide `self` when calling
`MyClass()` to create the class instance; `self` was spliced into
the function call by the interpreter.
- `__init__` is called each time a new instance is created.
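The warning in the first note, that objects defined in the class body
are shared by all instances, is easiest to see in a small sketch. The
class name `Tracker` and its attributes `shared` and `own` are
hypothetical, chosen only for this illustration.

```python
class Tracker:
    shared = []              # created once, when the class body is evaluated

    def __init__(self):
        self.own = []        # created anew each time an instance is initialized

a = Tracker()
b = Tracker()
a.shared.append("from a")
a.own.append("from a")
b.shared.append("from b")
b.own.append("from b")

print(a.own, b.own)          # each instance has its own list
#> ['from a'] ['from b']
print(a.shared)              # but both appended to the same class-level list
#> ['from a', 'from b']
print(a.shared is b.shared is Tracker.shared)
#> True
```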
Functions defined inside a `class` code block are called *methods*, and
the important thing to know about methods is that each time they are
called from a class instance, the instance is spliced into the function
call as the first argument. This applies to all functions defined in a
class, including dunders. (The sole exception is if the function is
decorated with something like `@classmethod` or `@staticmethod`).
``` python
class MyClass:
    def a_method(self):
        print("MyClass.a_method() was called with", self)
instance = MyClass()
instance.a_method()
#> MyClass.a_method() was called with <__main__.MyClass object at 0x11e3c7f20>
MyClass.a_method() # error, missing required argument `self`
#> TypeError: MyClass.a_method() missing 1 required positional argument: 'self'
MyClass.a_method(instance) # identical to instance.a_method()
#> MyClass.a_method() was called with <__main__.MyClass object at 0x11e3c7f20>
```
Other dunders worth knowing about are:
- `__getitem__`: the function invoked when subsetting an instance with
`[` (equivalent to defining a `[` S3 method in R).
- `__getattr__`: the function invoked when subsetting with `.`
(equivalent to defining a `$` S3 method in R).
- `__iter__` and `__next__`: functions invoked by `for`.
- `__call__`: invoked when a class instance is called like a function
(e.g., `instance()`).
- `__bool__`: invoked by `if` and `while` (equivalent to
`as.logical()` in R, but returning only a scalar, not a vector).
- `__repr__`, `__str__`, functions invoked for formatting and pretty
printing (akin to `format()`, `dput()`, and `print()` methods in R).
- `__enter__` and `__exit__`: functions invoked by `with`.
- Many [built-in](https://docs.python.org/3/library/functions.html)
Python functions are just sugar for invoking the dunder. For
example: calling `repr(x)` is identical to `x.__repr__()`. Other
builtins that are just sugar for invoking the dunder are `next()`,
`iter()`, `str()`, `list()`, `dict()`, `bool()`, `dir()`, `hash()`
and more!
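To see a few of these dunders working together, here is a minimal
sketch of a toy container. The class name `Deck` and its `_cards`
attribute are hypothetical; the point is only which syntax triggers
which method.

```python
class Deck:
    def __init__(self, cards):
        self._cards = list(cards)

    def __getitem__(self, index):      # invoked by deck[index]
        return self._cards[index]

    def __bool__(self):                # invoked by `if deck:` and bool(deck)
        return len(self._cards) > 0

    def __repr__(self):                # invoked by repr(deck); print() falls back to it
        return f"Deck({self._cards!r})"

    def __call__(self, n):             # invoked by deck(n)
        return self._cards[:n]

deck = Deck(["A", "K", "Q"])
print(deck[0], deck[-1])
#> A Q
print(deck)
#> Deck(['A', 'K', 'Q'])
print(deck(2))
#> ['A', 'K']
print(bool(deck), bool(Deck([])))
#> True False
```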
#### Iterators, revisited
Now that we have the basics of `class`, it's time to revisit iterators.
First, some terminology:
**iterable**: something that can be iterated over. Concretely, a class
that defines an `__iter__` method, whose job is to return an *iterator*.
**iterator**: something that iterates. Concretely, a class that defines
a `__next__` method, whose job is to return the next element each time
it is called, and then raises a `StopIteration` exception once it's
exhausted.
It's common to see classes that are both iterables and iterators, where
the `__iter__` method is just a stub that returns `self`.
Here is a custom iterable / iterator implementation of Python's `range`
(similar to `seq` in R). Note that, unlike the built-in `range`, this
version includes the `end` value.
``` python
class MyRange:
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __iter__(self):
        # reset our counter.
        self._index = self.start - 1
        return self

    def __next__(self):
        if self._index < self.end:
            self._index += 1 # increment
            return self._index
        else:
            raise StopIteration

for x in MyRange(1, 3):
    print(x)
#> 1
#> 2
#> 3
# doing what `for` does, but manually
r = MyRange(1, 3)
it = iter(r)
next(it)
#> 1
next(it)
#> 2
next(it)
#> 3
next(it)
#> StopIteration
```
### Defining Generators with `yield`
Generators are special Python functions that contain one or more `yield`
statements. As soon as `yield` is included in a code block passed to
`def`, the semantics change substantially. You're no longer defining a
mere function, but a generator constructor! In turn, calling a generator
constructor creates a generator object, which is just another type of
iterator.
Here is an example:
``` python
def my_generator_constructor():
    yield 1
    yield 2
    yield 3
# At first glance it presents like a regular function
my_generator_constructor
#> <function my_generator_constructor at 0x1402579c0>
type(my_generator_constructor)
#> <class 'function'>
# But calling it returns something special, a 'generator object'
my_generator = my_generator_constructor()
my_generator
#> <generator object my_generator_constructor at 0x11e3ff530>
type(my_generator)
#> <class 'generator'>
# The generator object is both an iterable and an iterator
# its __iter__ method is just a stub that returns `self`
iter(my_generator) == my_generator == my_generator.__iter__()
#> True
# step through it like any other iterator
next(my_generator)
#> 1
my_generator.__next__() # next() is just sugar for calling the dunder
#> 2
next(my_generator)
#> 3
next(my_generator)
#> StopIteration
```
Encountering `yield` is like hitting the pause button on a function's
execution: it preserves the state of everything in the function body and
returns control to whatever is iterating over the generator object.
Calling `next()` on the generator object resumes execution of the
function body until the next `yield` is encountered, or the function
finishes.
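Because local state is preserved between calls, a generator is often
the shortest way to write a custom iterator. As a sketch, here is an
inclusive range like the `MyRange` class from the iterator section,
written as a generator function. The name `my_range` is hypothetical.

```python
def my_range(start, end):
    index = start
    while index <= end:
        yield index      # pause here and hand `index` to the caller
        index += 1       # execution resumes here on the next `next()` call

print(list(my_range(1, 3)))
#> [1, 2, 3]

gen = my_range(1, 2)
print(next(gen), next(gen))
#> 1 2
# the two-argument form of next() returns a default instead of raising StopIteration
print(next(gen, "exhausted"))
#> exhausted
```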
### Iteration closing remarks
Iteration is deeply baked into the Python language, and R users may be
surprised by how many things in Python are iterables, iterators, or
powered by the iterator protocol under the hood. For example, the
built-in `map()` (roughly equivalent to R's `lapply()`) yields an
iterator, not a list. Similarly, a generator expression like
`(elem for elem in x)`, which looks like a tuple comprehension, produces
an iterator rather than a tuple. Most features dealing with files are
iterators, and so on.
Any time you find an iterator inconvenient, you can materialize all the
elements into a list using the Python built-in `list()`, or
`reticulate::iterate()` in R. Also, if you like the readability of
`for`, you can utilize similar semantics to Python's `for` using
`coro::loop()`.
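One practical consequence is that an iterator can be consumed only
once. The following sketch shows materializing a `map()` result with
`list()`, and what happens when you try to reuse an exhausted
iterator.

```python
x = [1, 2, 3]

m = map(str, x)          # nothing is computed yet; `m` is an iterator
print(list(m))           # materialize the elements into a list
#> ['1', '2', '3']
print(list(m))           # the iterator is now exhausted
#> []

g = (elem * 10 for elem in x)   # a generator expression is also an iterator
print(sum(g))
#> 60
print(sum(g))                   # already exhausted, so nothing is left to sum
#> 0
```

If you need to traverse the values more than once, keep the
materialized list rather than the iterator.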
### `import` and Modules
In R, authors can bundle their code into shareable extensions called R
packages, and R users can access objects from R packages via `library()`
or `::`. In Python, authors bundle code into *modules*, and users access
modules using `import`. Consider the line:
``` python
import numpy
```
This statement has Python go out to the file system, find an installed
Python module named 'numpy', load it (commonly meaning: evaluate its
`__init__.py` file and construct a `module` object), and bind it to the
symbol `numpy`.
The closest equivalent to this in R might be:
``` r
dplyr <- loadNamespace("dplyr")
```
#### Where are modules found?
In Python, the file system locations where modules are searched can be
accessed (and modified) from the list found at `sys.path`. This is
Python's equivalent to R's `.libPaths()`. `sys.path` will typically
contain paths to the current working directory, the Python installation
which contains the built-in standard library, administrator installed
modules, user installed modules, values from environment variables like
`PYTHONPATH`, and any modifications made directly to `sys.path` by other
code in the current Python session (though this is relatively uncommon
in practice).
``` python
import sys
sys.path
#> ['', '/Users/tomasz/.pyenv/versions/3.12.4/bin', '/Users/tomasz/.pyenv/versions/3.12.4/lib/python312.zip', '/Users/tomasz/.pyenv/versions/3.12.4/lib/python3.12', '/Users/tomasz/.pyenv/versions/3.12.4/lib/python3.12/lib-dynload', '/Users/tomasz/.virtualenvs/r-reticulate/lib/python3.12/site-packages', '/Users/tomasz/github/rstudio/reticulate/inst/python', '/Users/tomasz/.virtualenvs/r-reticulate/lib/python312.zip', '/Users/tomasz/.virtualenvs/r-reticulate/lib/python3.12', '/Users/tomasz/.virtualenvs/r-reticulate/lib/python3.12/lib-dynload']
```
You can inspect where a module was loaded from by accessing the dunder
`__path__` or `__file__` (especially useful when troubleshooting
installation issues):
``` python
import os
os.__file__
#> '/Users/tomasz/.virtualenvs/r-reticulate/lib/python3.12/os.py'
numpy.__path__
#> ['/Users/tomasz/.virtualenvs/r-reticulate/lib/python3.12/site-packages/numpy']
```
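When troubleshooting, it can also help to ask where a module *would* be
loaded from without actually importing it. The sketch below uses the
standard library's `importlib.util.find_spec()`, with the standard
library module `json` as a stand-in. The name
`no_such_module_hopefully` is made up and assumed not to be installed.

```python
import importlib.util
import sys

spec = importlib.util.find_spec("json")   # locate the module on sys.path
print(spec.name)
#> json
print(spec.origin)                        # file path; varies by installation

# a top-level module that cannot be found yields None rather than an error
missing = importlib.util.find_spec("no_such_module_hopefully")
print(missing is None)
#> True

import json
print(sys.modules["json"] is json)        # loaded modules are cached in sys.modules
#> True
```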
Once a module is loaded, you can access symbols from the module using
`.` (equivalent to `::`, or maybe `$.environment`, in R).
``` python
numpy.abs(-1)
#> 1
```
There is also special syntax for specifying the symbol a module is bound
to upon import, and for importing only some specific symbols.
``` python
import numpy # import
import numpy as np # import and bind to a custom symbol `np`
np is numpy # test for identity, similar to identical(np, numpy)
#> True
from numpy import abs # import only `numpy.abs`, bind it to `abs`
abs is numpy.abs
#> True
from numpy import abs as abs2 # import only `numpy.abs`, bind it to `abs2`
abs2 is numpy.abs
#> True
```
If you're looking for the Python equivalent of R's `library()`, which
makes all of a package's exported symbols available, it might be using
`import` with a `*` wildcard, though it's relatively uncommon to do so.
The `*` wildcard expands to all the symbols listed in the module's
`__all__`, if it is defined, and otherwise to all the symbols in the
module whose names do not begin with an underscore.
``` python
from numpy import *
```
Python doesn't make a distinction like R does between package exported
and internal symbols. In Python, all module symbols are equal, though
there is the naming convention that intended-to-be-internal symbols are
prefixed with a single leading underscore. (Two leading underscores
invoke an advanced language feature called "name mangling", which is
outside the scope of this introduction).
### Integers and Floats
R users generally don't need to be aware of the difference between
integers and floating point numbers, but that's not the case in Python.
If this is your first exposure to numeric data types, here are the
essentials:
- integer types can only represent whole numbers like `1` or `2`, not
floating point numbers like `1.2`.
- floating-point types can represent any number, but with some degree
of imprecision.
In R, writing a bare literal number like `12` produces a floating point
type, whereas in Python, it produces an integer. You can produce an
integer literal in R by appending an `L`, as in `12L`. Many Python
functions expect integers, and will error when provided a float.
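To get a feel for the distinction on the Python side, here is a short
sketch using only built-ins. It shows that `12` and `12.0` compare
equal but have different types, and that the built-in `range()` is one
function that rejects floats.

```python
print(type(12), type(12.0))
#> <class 'int'> <class 'float'>
print(12 == 12.0)                 # equal in value...
#> True
print(isinstance(12.0, int))      # ...but not the same type
#> False

try:
    range(3.0)                    # range() insists on integers
except TypeError as e:
    print(e)
#> 'float' object cannot be interpreted as an integer

print(list(range(int(3.0))))      # convert explicitly with int()
#> [0, 1, 2]
```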
For example, say we have a Python function that expects an integer:
``` python
def a_strict_Python_function(x):
    assert isinstance(x, int), "x is not an int"
    print("Yay! x was an int")
```
When calling it from R, you must be sure to call it with an integer: