# Python Optimizations

We will use CPython, the standard (or reference) Python
implementation, which is written in C.<br>

But there are other Python implementations out there. These include:
• Jython – written in Java; it can import and use any Java class, and it compiles Python code to Java bytecode that runs in a JVM.<br>
• IronPython – written in C#; it targets the .NET (and Mono) CLR.<br>
• PyPy – written in RPython, a statically-typed subset of Python that is specifically designed for writing interpreters.<br>
• and many more…<br>
See https://wiki.python.org/moin/PythonImplementations for a fuller list.<br>

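`Interning`: reusing objects on demand.<br>
At startup, CPython pre-loads (caches) a global list of integers in the range [-5, 256]. Whenever an integer in that range is referenced, Python uses the cached object instead of creating a new one.<br>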



In [1]:
a = 10 
b = 10

print(id(a), id(b) )

140703527811136 140703527811136


In [2]:
a = 257

b = 257

print(id(a), id(b) )

2779271001104 2779271001168


Outside that range, Python does not use the global list, and a separate object is typically created each time the value is produced:

In [3]:
a = 300

b = 300

print(id(a), id(b) )

2779271000240 2779271001328
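
The cached range can also be checked without relying on literals, which the compiler may share within a single block of code. The sketch below builds the integers at runtime with `int()`. The value `1000000` is an arbitrary choice of an integer well outside the cache, and the behaviour shown is a CPython implementation detail rather than a language guarantee.

```python
# Build integers at runtime so the compiler cannot merge equal literals.
small_a = int("256")
small_b = int("256")

big_a = int("1000000")   # arbitrary value well outside [-5, 256]
big_b = int("1000000")

# Cached small integers are one shared object (CPython detail).
print(small_a is small_b)

# Larger integers are separate objects that still compare equal.
print(big_a is big_b, big_a == big_b)
```

This is also why integers should be compared with `==` rather than `is`: identity only happens to match for the cached values.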


#### String Interning

Some strings are also automatically interned – but not all!<br>

As the Python code is compiled, identifiers are interned, and string literals that look like identifiers may be interned as well.<br>

Identifiers must start with _ or a letter and can contain only _, letters and digits.<br>

Why do this?<br>

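It’s all about optimization: speed and, possibly, memory. Two interned strings can be compared by identity (`a is b`), which is faster than a character-by-character comparison (`a == b`).<br>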



In [5]:
a = 'some_long_string'
b = 'some_long_string'

print( id(a), id(b))

2779271182256 2779271182256


In [6]:
a is b

True

In [7]:
a == b

True

In [8]:
a = 'hello world'
b = 'hello world'

print( id(a), id(b))

2779270951664 2779270951472


In [9]:
a is b

False

In [10]:
a == b

True

Not all strings are automatically interned by Python: `'hello world'` contains a space, so it does not look like an identifier.<br>
But you can force strings to be interned by using the `sys.intern()` function.



In [11]:
import sys

a = sys.intern('the quick brown fox')
b = sys.intern('the quick brown fox')

In [12]:
a is b

True

In [13]:
a == b

True
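
The literals in the cells above are known at compile time. Here is a minimal sketch of the same idea with strings built at runtime, where no automatic interning applies. The variable names are illustrative only.

```python
import sys

# Build two equal strings at runtime; each join() call creates a new object.
parts = ['the', 'quick', 'brown', 'fox']
raw_a = ' '.join(parts)
raw_b = ' '.join(parts)

print(raw_a == raw_b)   # compares contents
print(raw_a is raw_b)   # compares identity: these are two distinct objects

# sys.intern() returns the single canonical copy for that content.
a = sys.intern(raw_a)
b = sys.intern(raw_b)

print(a is b)           # both names now refer to the same object
```

Note that `sys.intern()` returns the interned string, so the result has to be kept; the call does not change the object passed in.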

When should you do this?<br>

• When dealing with a large number of strings that could have high repetition,
e.g. tokenizing a large corpus of text (NLP).<br>

• When performing lots of string comparisons.<br>
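
As a rough sketch of the tokenizing case, assuming a tiny made-up corpus and a plain whitespace split in place of a real tokenizer, interning collapses repeated tokens onto one object per distinct word:

```python
import sys

# Hypothetical miniature corpus; a real one would be far larger.
corpus = "the cat saw the dog and the dog saw the cat"

# str.split() builds a string object for every token it returns.
raw_tokens = corpus.split()

# Interning maps equal tokens onto one shared object each.
tokens = [sys.intern(tok) for tok in raw_tokens]

unique_words = set(raw_tokens)
distinct_objects = {id(tok) for tok in tokens}
print(len(raw_tokens), len(unique_words), len(distinct_objects))

# Equal interned tokens can be matched by identity alone.
target = sys.intern("dog")
dog_count = sum(1 for tok in tokens if tok is target)
print(dog_count)
```

After interning, the number of distinct objects equals the number of distinct words, so memory use grows with the vocabulary rather than with the corpus length.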

### Other Optimizations at Compile Time

#### Constant Expressions

Here is how Python reduces constant expressions for optimization purposes:

In [14]:
def my_func():
    a = 24 * 60
    b = (1, 2) * 5
    c = 'abc' * 3
    d = 'ab' * 11
    e = 'the quick brown fox' * 10
    f = [1, 2] * 5


In [15]:
my_func.__code__.co_consts

(None,
 1440,
 (1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
 'abcabcabc',
 'ababababababababababab',
 'the quick brown foxthe quick brown foxthe quick brown foxthe quick brown foxthe quick brown foxthe quick brown foxthe quick brown foxthe quick brown foxthe quick brown foxthe quick brown fox',
 1,
 2,
 5)

As you can see in the example above, `24 * 60` was pre-calculated and cached as a constant (`1440`).

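Similarly, `(1, 2) * 5` was cached as `(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)` and `'abc' * 3` was cached as `'abcabcabc'`.

In the output above, `'the quick brown fox' * 10` was pre-calculated as well. CPython skips pre-calculation only when the resulting sequence would be too long, and that length limit depends on the CPython version (older versions did **not** pre-calculate a string of this length).

On the other hand, `[1, 2] * 5` was not pre-calculated since a list is *mutable*, and hence not a *constant*; only its parts (`1`, `2` and `5`) appear among the constants.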
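
One way to probe these rules on your own interpreter is to compile small expressions and look at the constants of the resulting code objects. In the sketch below, `constants_of` is a hypothetical helper, and the repetition count of `5000` is an assumption chosen to sit comfortably above the folding limit of current CPython versions.

```python
def constants_of(expression):
    """Compile an expression and return the constants stored in its code object."""
    return compile(expression, "<example>", "eval").co_consts

short_consts = constants_of("24 * 60")
long_consts = constants_of("'ab' * 5000")    # result would be 10,000 characters
list_consts = constants_of("[1, 2] * 5")

# A short constant expression is folded into a single constant.
print(1440 in short_consts)

# A result that is too long is left to be computed at runtime.
print(('ab' * 5000) in long_consts)

# A list is mutable, so only its parts can be stored as constants.
print(list_consts)
```

Since the limits are implementation details, the exact contents of `co_consts` can differ between Python versions.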

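#### Membership Tests

For membership tests against a literal made of constants, Python replaces the mutable literal with its immutable counterpart: lists become tuples and sets become frozensets.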

In [18]:
def my_func(e):
    if e in [1, 2, 3]:
        pass
    

In [20]:
my_func.__code__.co_consts


(None, (1, 2, 3))

In [21]:
def my_func(e):
    if e in {1, 2, 3}:
        pass
    

In [22]:
my_func.__code__.co_consts

(None, frozenset({1, 2, 3}))

In [23]:
import string
import time 

char_list = list(string.ascii_letters)
char_tuple = tuple(string.ascii_letters)
char_set = set(string.ascii_letters)

print(char_list)
print()
print(char_tuple)
print()
print(char_set)

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']

('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z')

{'v', 'O', 'i', 'o', 'Y', 'I', 'J', 'x', 'c', 'l', 'D', 'E', 'n', 'b', 'K', 'h', 'd', 'X', 'B', 'f', 'H', 'r', 'a', 'g', 'e', 'k', 's', 'N', 'F', 'q', 'M', 'Z', 'm', 'G', 'Q', 'P', 'z', 'W', 'w', 'y', 'L', 'T', 'j', 'V', 'u', 'C', 'U', 'S', 't', 'A', 'p', 'R'}


In [24]:
def membership_test(n, container):
    for i in range(n):
        if 'p' in container:
            pass
        

In [25]:
start = time.perf_counter()
membership_test(10000000, char_list)
end = time.perf_counter()
print('list membership: ', end-start)


list membership:  3.912742762000107


In [26]:
start = time.perf_counter()
membership_test(10000000, char_set)
end = time.perf_counter()
print('set membership: ', end-start)


set membership:  0.6530779730001086
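
In the timings above the set test ran several times faster than the list test, because a set lookup is hash-based while a list lookup scans the elements one by one. As an alternative sketch, the same comparison can be run with the standard `timeit` module, which handles the timing loop itself. The helper name `time_membership` and the repetition count are illustrative choices, and the tuple, which was built above but never timed, is included as well.

```python
import string
import timeit

char_list = list(string.ascii_letters)
char_tuple = tuple(string.ascii_letters)
char_set = set(string.ascii_letters)

def time_membership(container, number=200_000):
    """Return the seconds taken by `number` membership tests for 'p'."""
    return timeit.timeit(lambda: 'p' in container, number=number)

results = {
    'list': time_membership(char_list),
    'tuple': time_membership(char_tuple),
    'set': time_membership(char_set),
}

for name, seconds in results.items():
    print(f'{name} membership: {seconds:.3f}')
```

The absolute numbers depend on the machine and include the overhead of calling the lambda, so only the relative ordering is meaningful.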
