### STRING INTERNING

Some strings are also automatically interned:
* Identifiers
  * variable names
  * function names
  * class names
  * etc.

In [1]:
a = "some_long_string"
b = "some_long_string"

In [3]:
a == b  # == (__eq__) compares the two string objects character by character

True

### About interning
If we know that both values are interned, then `a` and `b` (in this example) hold the same string exactly when they point to the same memory address. In that case we can use `a is b` instead of `a == b`: it compares two integers (the memory addresses), which is much faster than a character-by-character comparison.

*Not all strings are automatically interned by Python; it tries to intern strings that look like identifiers.*
*Interning strings manually is not needed in most cases.*

But we can force a string to be interned by using the `sys.intern()` function.

```python
import sys

a = sys.intern('the quick brown fox')
b = sys.intern('the quick brown fox')
```
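To make the effect visible, here is a minimal sketch that builds two equal strings at runtime (so the compiler cannot share a single constant) and compares them before and after interning. The variable names are illustrative, and the identity result for the non-interned strings is a CPython implementation detail rather than a language guarantee.

```python
import sys

# Build the strings at runtime so they are separate objects.
words = ["the", "quick", "brown", "fox"]
a = " ".join(words)
b = " ".join(words)

print(a == b)  # True: the contents are equal
print(a is b)  # typically False in CPython: two distinct objects

# sys.intern() returns the single shared copy for a given string value.
a_interned = sys.intern(a)
b_interned = sys.intern(b)

print(a_interned is b_interned)  # True: both names refer to the same object
```

After interning, the identity check `is` is a safe replacement for `==` only as long as every string involved went through `sys.intern()`.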

Examples of usage:
* Dealing with a large number of strings that could have high repetition, for example, tokenizing a large corpus of text (NLP)
* Lots of string comparisons
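As a rough illustration of the tokenizing case, the sketch below interns every token so that repeated words share one object. The `tokenize` helper and the sample corpus are hypothetical and only split on whitespace; a real NLP pipeline would use a proper tokenizer.

```python
import sys


def tokenize(text):
    """Split on whitespace and intern each token (hypothetical helper)."""
    return [sys.intern(token) for token in text.split()]


corpus = "the cat saw the dog and the dog saw the cat"
tokens = tokenize(corpus)

# Count distinct string objects, not distinct values.
unique_objects = {id(token) for token in tokens}

print(len(tokens))          # 11 tokens in total
print(len(unique_objects))  # 5 distinct objects: the, cat, saw, dog, and
```

Because repeated tokens are the same object, they can be compared with `is`, and memory is spent only once per distinct word.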

In [9]:
a = 'hello world'
b = 'hello world'

In [12]:
import sys

In [13]:
a = sys.intern('hello world')

In [14]:
b = sys.intern('hello world')

In [15]:
c = 'hello world'

In [16]:
print(id(a), id(b), id(c))

4443886704 4443886704 4443894384


In [21]:
def compare_using_equals(n):
    a = 'a long string that is not interned' * 200
    b = 'a long string that is not interned' * 200
    for i in range(n):
        if a == b:
            pass

In [22]:
def compare_using_interning(n):
    a = sys.intern('a long string that is not interned' * 200)
    b = sys.intern('a long string that is not interned' * 200)
    for i in range(n):
        if a is b:
            pass

In [23]:
import time

In [24]:
start = time.perf_counter()
compare_using_equals(10000000)
end = time.perf_counter()
print(f"compare_using_equals() - Took {end - start}s")  # perf_counter() returns seconds

compare_using_equals() - Took 3.047971916035749s


In [25]:
start = time.perf_counter()
compare_using_interning(10000000)
end = time.perf_counter()
print(f"compare_using_interning() - Took {end - start}s")  # perf_counter() returns seconds

compare_using_interning() - Took 0.3341918099904433s
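
For a steadier measurement, the same comparison can be sketched with the standard `timeit` module, which reports elapsed time in seconds. The iteration count and variable names here are illustrative assumptions, and absolute timings will vary by machine and Python version.

```python
import sys
import timeit

base = 'a long string that is not interned'

# Multiplying at runtime yields two separate, equal string objects.
a = base * 200
b = base * 200

# Interned copies of the same value refer to one shared object.
a_interned = sys.intern(base * 200)
b_interned = sys.intern(base * 200)

n = 100_000  # illustrative iteration count

eq_seconds = timeit.timeit(lambda: a == b, number=n)
is_seconds = timeit.timeit(lambda: a_interned is b_interned, number=n)

print(f"==  took {eq_seconds:.4f}s for {n} comparisons")
print(f"is  took {is_seconds:.4f}s for {n} comparisons")
```

Note that both timings include the overhead of calling the lambda, so the gap between them understates the difference between the raw comparisons.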
