# Python Optimizations - String Interning

Some strings are interned automatically, but don't count on it.

Identifiers are interned:

- variable names
- function names
- class names
- etc.

Some string literals may be automatically interned:

- string literals that look like identifiers (they follow variable naming rules)
- string literals that would look like identifiers except that they start with a digit

Interning is a speed (and possibly memory) optimization.

Not all strings are interned, but you can force a string to be interned by using the `sys.intern()` function.

When should you do this?

- repetitive string comparisons where the strings hardly change
- lots of string comparisons
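
As a rough sketch of the first case, consider a stream of tokens where the same few values repeat many times. The helper name `intern_tokens` and the sample data are illustrative assumptions, not part of any library. Strings built at runtime (here via `str.split`) are generally not interned automatically, so interning them once lets later comparisons use identity.

```python
import sys


def intern_tokens(text):
    """Split text on whitespace and intern every token (hypothetical helper)."""
    return [sys.intern(token) for token in text.split()]


tokens = intern_tokens("error warning error info error warning")

# Equal tokens now refer to one shared object, so `is` can stand in for `==`.
first_error = tokens[0]
error_count = sum(1 for token in tokens if token is first_error)
unique_objects = len({id(token) for token in tokens})

print(error_count)     # number of tokens that are the same object as tokens[0]
print(unique_objects)  # number of distinct string objects held by the list
```

The identity check only works because every token went through `sys.intern()`; a string from any other source would have to be interned too before comparing it with `is`.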

In [1]:
a = "hello"
b = "hello"

In [2]:
a is b

True

In [3]:
a = "hello world"
b = "hello world"

In [4]:
a is b

False

In [5]:
a = "_this_is_a_string_that_follows_rules_for_variable_naming"
b = "_this_is_a_string_that_follows_rules_for_variable_naming"

In [6]:
a is b

True

# Interning example

In [7]:
import sys

In [8]:
a = sys.intern("Hello World")
b = sys.intern("Hello World")
c = "Hello World"

In [9]:
a is b # Interned strings

True

In [10]:
a is c # c was not interned

False

In [11]:
a == c # comparing strings character by character

True
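
The same idea applies to strings built at runtime, which is where `sys.intern()` tends to be most useful. The sketch below assumes CPython, and the variable names are illustrative. Whether the two joined strings start out as distinct objects is an implementation detail, but `sys.intern()` returning one shared object for equal strings is documented behavior.

```python
import sys

parts = ["Hello", " ", "World"]

# Two equal strings assembled at runtime.
x = "".join(parts)
y = "".join(parts)

print(x == y)  # equal contents
print(x is y)  # implementation detail; typically False in CPython

# sys.intern() returns the canonical object for that string value.
x_interned = sys.intern(x)
y_interned = sys.intern(y)

print(x_interned is y_interned)
print(x_interned == "Hello World")
```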

# Example comparing non-interned vs interned

In [12]:
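def compare_using_equals(n):
    # Two equal strings built at runtime, so they are separate objects
    a = "a long string that is not interned" * 200
    b = "a long string that is not interned" * 200

    for _ in range(n):
        # == has to compare the contents of the two strings
        if a == b:
            pass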

In [18]:
def compare_using_interning(n):
    # sys.intern() returns the same object for equal strings
    a = sys.intern("a long string that is not interned" * 200)
    b = sys.intern("a long string that is not interned" * 200)

    for _ in range(n):
        # is only compares object identity
        if a is b:
            pass

In [15]:
import time

In [16]:
start = time.perf_counter()
compare_using_equals(10_000_000)
end = time.perf_counter()
f"Equality: {end - start}"

'Equality: 3.8765763060655445'

In [20]:
start = time.perf_counter()
compare_using_interning(10_000_000)
end = time.perf_counter()
f"Interning: {end - start}"

'Interning: 0.6557386880740523'
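
A single `perf_counter()` measurement can be noisy. As an alternative sketch, the standard-library `timeit` module can repeat the measurement and report the best run. The helper names `time_equality` and `time_identity`, the loop count, and the repeat count are assumptions chosen for illustration; actual timings depend on the machine and the Python version.

```python
import sys
import timeit

TEXT = "a long string that is not interned"

# Build the strings at runtime so they start out as separate objects.
a = "".join([TEXT] * 200)
b = "".join([TEXT] * 200)
a_interned = sys.intern(a)
b_interned = sys.intern(b)


def time_equality(number=100_000, repeat=3):
    """Best-of-`repeat` time for `number` equality comparisons."""
    return min(timeit.repeat(lambda: a == b, number=number, repeat=repeat))


def time_identity(number=100_000, repeat=3):
    """Best-of-`repeat` time for `number` identity checks on interned strings."""
    return min(
        timeit.repeat(lambda: a_interned is b_interned, number=number, repeat=repeat)
    )


equality_seconds = time_equality()
identity_seconds = time_identity()
print(f"Equality: {equality_seconds:.4f}s")
print(f"Identity: {identity_seconds:.4f}s")
```

Both timings include the overhead of calling the lambda, so the gap between them understates the difference between the two comparisons themselves.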