Some strings are also automatically *interned* - but not all!

As the Python code is compiled, identifiers are interned, such as:
- variable names
- function names
- etc.

Identifiers:
- must start with _ or a letter
- can only contain _, letters, and digits
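
The rules above can be checked with the built-in `str.isidentifier()` method. The following is a minimal sketch; the sample strings are made up for illustration, and the check only tells us whether a string *looks like* an identifier, not whether Python has actually interned it.

```python
# Hypothetical sample strings, chosen only to illustrate the rules above
candidates = ['hello', '_private', 'snake_case_1', 'hello world', '2fast']

# str.isidentifier() returns True when the string is a valid identifier
identifier_like = {text: text.isidentifier() for text in candidates}

for text, ok in identifier_like.items():
    print(f'{text!r} looks like an identifier: {ok}')
```

Note that `'hello world'` fails the check because of the space, and `'2fast'` fails because it starts with a digit.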

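**Some** string literals may also be automatically interned. In CPython these are typically literals that look like identifiers, but the exact rules are an implementation detail that can change between versions, so do not rely on them.

Python does this as a means of optimization: two interned strings can be compared by memory address (`is`) instead of character by character (`==`).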


In [1]:
a = 'hello'
b = 'hello'

In [2]:
print(id(a), id(b))

140567978570352 140567978570352


In [3]:
a = 'hello world'
b = 'hello world'

In [4]:
print(id(a), id(b))

140567978632944 140567978632880


In [5]:
a == b

True

In [6]:
a is b

False

In [12]:
a = 'hello'
b = 'hello'

In [13]:
a == b

True

In [14]:
a is b

True

In [16]:
a = '_this_is_a_long_string_that_could_be_used_as_an_identifier'

In [17]:
b = '_this_is_a_long_string_that_could_be_used_as_an_identifier'

In [18]:
a is b

True

In [19]:
import sys

In [20]:
a = sys.intern('hello world')

In [21]:
b = sys.intern('hello world')

In [22]:
c = 'hello world'

In [23]:
print(id(a), id(b), id(c))

140568509441776 140568509441776 140569322255792


In [24]:
a == b

True

In [25]:
a is b

True
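
Manual interning is mostly useful for strings created at run time, which are not compile-time literals. Here is a small sketch, assuming CPython; the `build` helper is hypothetical and exists only to produce equal strings at run time.

```python
import sys

def build(parts):
    # Hypothetical helper: joining at run time creates a new string object
    return ' '.join(parts)

x = build(['hello', 'world'])
y = build(['hello', 'world'])

# Equal values; on CPython these are usually two separate objects
same_before = x is y

# sys.intern returns the single shared copy for each distinct value
x = sys.intern(x)
y = sys.intern(y)
same_after = x is y

print(same_before, same_after)
```

After interning, both names refer to the same object, so `is` can safely stand in for `==`, provided *every* string taking part in the comparison has gone through `sys.intern`.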

In [26]:
def compare_using_equals(n):
    a = 'a long string that is not interned' * 200
    b = 'a long string that is not interned' * 200
    for i in range(n):
        if a == b:
            pass

In [27]:
def compare_using_interning(n):
    a = sys.intern('a long string that is not interned' * 200)
    b = sys.intern('a long string that is not interned' * 200)
    for i in range(n):
        if a is b:
            pass

In [28]:
import time

In [30]:
start = time.perf_counter()
compare_using_equals(10000000)
end = time.perf_counter()
print('equality', end-start)

equality 2.527094568000166


In [31]:
start = time.perf_counter()
compare_using_interning(10000000)
end = time.perf_counter()
print('identity', end-start)

identity 0.2821131460000288
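
The same comparison can also be sketched with the standard library's `timeit` module, which handles the timing loop for us. This is only an illustration under the same assumptions as the cells above; the variable names and the iteration count are arbitrary, and the absolute timings will vary by machine and Python version.

```python
import sys
import timeit

base = 'a long string that is not interned'

# Built at run time, so these are equal strings held in separate objects
plain_a = base * 200
plain_b = base * 200

# sys.intern returns one shared object for both equal strings
interned_a = sys.intern(base * 200)
interned_b = sys.intern(base * 200)

# Explicit namespace so the timed statements can see our variables
namespace = {
    'plain_a': plain_a, 'plain_b': plain_b,
    'interned_a': interned_a, 'interned_b': interned_b,
}

eq_time = timeit.timeit('plain_a == plain_b', globals=namespace, number=200_000)
is_time = timeit.timeit('interned_a is interned_b', globals=namespace, number=200_000)

print('equality', eq_time)
print('identity', is_time)
```

The equality check has to walk both strings, while the identity check only compares two addresses, which is where the difference comes from.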
