In [1]:
#hide

In [2]:
#hide
import utils
utils.hero("Getting To Know 'str' Better")

In [3]:
#hide
utils.h1("Text Representation")

Text representation is used to describe human-readable information. To store it, Python has a built-in data type called 'str' (string).

In [4]:
#hide
utils.h1("Important information about data type")

As mentioned previously, for any data type, one must know the following:
1. How to create an object/instance of a data type class?
2. How much memory does it consume?
3. How to find out the memory location where it is stored?
4. Can you make changes in the data once created? (Mutable and Immutable datatypes)
5. What are the common methods that can be applied to the value?
6. How is it represented internally? (small integer caching, string interning)
7. Is it iterable or indexable?

We will answer these questions for the **string** data type.

In [5]:
#hide
utils.h1("String")

In [6]:
#hide
utils.h2("Declaration")

In [7]:
# Declaration
a = "Hello" # Literal declaration, no additional overhead
print(f"{a} {type(a)}")
# OR 
b = str("Hi") # Calling the constructor, additional overhead
print(f"{b} {type(b)}")
print("-"*100)

Hello <class 'str'>
Hi <class 'str'>
----------------------------------------------------------------------------------------------------


In [8]:
#hide
utils.note("The variables 'a' and 'b' only store the memory locations of the objects, not the values themselves")

In [9]:
#hide
utils.note("Use literal declaration unless intended for type conversion")
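The note above recommends the literal form unless a type conversion is intended. As a small sketch of the conversion case (the variable names are illustrative and not part of the original notebook), `str()` turns other objects into their text form:

```python
# Illustrative values; the names are hypothetical.
count = 42
ratio = 3.5
items = [1, 2, 3]

# str() is the right tool when the input is not already a string.
count_text = str(count)
ratio_text = str(ratio)
items_text = str(items)

print(count_text, type(count_text))
print(ratio_text, type(ratio_text))
print(items_text, type(items_text))
```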

In [10]:
#hide
utils.h2("Finding Memory Consumption")

In [11]:
# To find the memory consumption, we can use "sys" module
import sys
print(f"Memory used by the object a={a} is {sys.getsizeof(a)} bytes")
print(f"Memory used by the object b={b} is {sys.getsizeof(b)} bytes")
# Every object also has a '__sizeof__' method that returns the size without the garbage collector overhead that 'sys.getsizeof' may add
print(f"Memory used by the object a={a} is {a.__sizeof__()} bytes")
print(f"Memory used by the object b={b} is {b.__sizeof__()} bytes")
print("-"*100)

Memory used by the object a=Hello is 46 bytes
Memory used by the object b=Hi is 43 bytes
Memory used by the object a=Hello is 46 bytes
Memory used by the object b=Hi is 43 bytes
----------------------------------------------------------------------------------------------------
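Both calls report the same number for strings. A minimal sketch of why, assuming CPython: `sys.getsizeof` adds garbage collector overhead only for objects the collector manages, which includes lists but not strings. The variable names below are illustrative.

```python
import sys

text = "Hello"
numbers = [1, 2, 3]

# str objects are not managed by the cyclic garbage collector,
# so sys.getsizeof and __sizeof__ agree.
str_overhead = sys.getsizeof(text) - text.__sizeof__()

# list objects are managed by the collector, so sys.getsizeof
# adds the collector's per-object header on top of __sizeof__.
list_overhead = sys.getsizeof(numbers) - numbers.__sizeof__()

print(f"str  overhead: {str_overhead} bytes")
print(f"list overhead: {list_overhead} bytes")
```

The exact list overhead depends on the interpreter build, so the sketch only prints it rather than assuming a value.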


In [12]:
# Let's try to see how much memory every character adds to the string object
a = "hello !"
for i in range(len(a)+1):
    print(f"{a[0:i]}-->{sys.getsizeof(a[0:i])} bytes")

-->41 bytes
h-->42 bytes
he-->43 bytes
hel-->44 bytes
hell-->45 bytes
hello-->46 bytes
hello -->47 bytes
hello !-->48 bytes


In [13]:
#hide
utils.note("In this run, each additional ASCII character adds 1 byte, and the empty string object itself takes 41 bytes of overhead")
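The 1 byte per character seen above holds for ASCII text. As a hedged sketch, assuming CPython 3.3 or later (where strings use a flexible internal representation), characters outside Latin-1 take 2 or 4 bytes each. The helper `bytes_per_char` is hypothetical and only measures the growth from one extra character.

```python
import sys

def bytes_per_char(ch, n=10):
    """Return how many bytes one extra copy of `ch` adds to a string."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)

samples = {
    "ascii": "a",             # plain ASCII letter
    "latin1": "\u00e9",       # e with acute accent
    "bmp": "\u20ac",          # euro sign
    "astral": "\U0001F600",   # emoji outside the Basic Multilingual Plane
}

widths = {label: bytes_per_char(ch) for label, ch in samples.items()}
for label, width in widths.items():
    print(f"{label:>6}: {width} byte(s) per character")
```

Measuring the difference between two lengths keeps the fixed object overhead out of the result, which also differs between these representations.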

In [14]:
#hide
utils.h2("Finding Memory Location")

In [15]:
# To find the memory location of an object, we have a built-in function in python called 'id'
a = "Hello"
print(f"'a' points to {id(a)} | {hex(id(a))}")
print("-"*100)

'a' points to 4549567920 | 0x10f2ce5b0
----------------------------------------------------------------------------------------------------


In [16]:
#hide
utils.exercise(f"Given the memory location {hex(id(a))}, how can you find the value stored in that location?")

In [17]:
# Your solution (hint: Use ctypes module)

In [18]:
# Solution

import ctypes
value = ctypes.cast(obj=id(a), typ=ctypes.py_object).value # For demonstration only; use with extreme caution, an invalid address can crash the interpreter
print(value, type(value))
print("-"*100)

Hello <class 'str'>
----------------------------------------------------------------------------------------------------


In [19]:
#hide
utils.h2("Mutable/Immutable?")

Everything we declare in Python is an object, that is, an instance of a defined 'class'. \
When you create an object and assign it to a variable, the variable stores the memory location of the object; we say the variable points to the object. If the object at that memory location can be modified at runtime, the object and its data type are said to be **mutable**; otherwise they are **immutable**.

In [20]:
a = "Hello"
print(f"Variable 'a' points to {id(a)} where the stored value is {a}")
a = "Hi"
print(f"Variable 'a' points to {id(a)} where the stored value is {a}")
print("-"*100)

Variable 'a' points to 4549567920 where the stored value is Hello
Variable 'a' points to 4549568400 where the stored value is Hi
----------------------------------------------------------------------------------------------------


In [21]:
#hide
utils.question("What happened when we changed the value of a to 'Hi'? Is 'str' mutable?")

No, 'str' is immutable. It looks as if we changed the value from 'Hello' to 'Hi', but under the hood the reassignment created a new object ('Hi') at a different memory location, and 'a' then started pointing to that new object.
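To see the immutability directly, here is a small sketch that tries to modify a string in place and contrasts it with a list, which is mutable. The variable names are illustrative.

```python
text = "Hello"
error_message = None
try:
    text[0] = "J"  # in-place modification is not allowed for str
except TypeError as exc:
    error_message = str(exc)
    print(f"TypeError: {error_message}")

# A list holding the same characters can be changed in place,
# and it stays the same object (same id) afterwards.
letters = list("Hello")
id_before = id(letters)
letters[0] = "J"
id_after = id(letters)

print("".join(letters), "| same object:", id_before == id_after)
```

The string is left untouched by the failed assignment, while the list keeps its identity even though its content changed.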

In [22]:
#hide
utils.exercise("Find out the value stored at the memory location that 'a' was pointing to before the reassignment")

In [23]:
# Your solution

In [24]:
#hide
utils.h2("String Interning/Caching?")

In [25]:
a = "hello"
print(f"Variable a points at {id(a)} which has {a}")
b = "hello"
print(f"Variable b points at {id(b)} which has {b}")
print("-"*100)

Variable a points at 4544472832 which has hello
Variable b points at 4544472832 which has hello
----------------------------------------------------------------------------------------------------


Based on the output above, we observe that both 'a' and 'b' point to the same address. Python caches (interns) some string objects, but there is more to it.

In [26]:
def is_cached(a, b):
    print(f"Is '{a}' cached? -> {a is b}")

a = "H"
b = "H"
is_cached(a, b)

a = "1"
b = "1"
is_cached(a, b)

a = "!"
b = "!"
is_cached(a, b)

a = "Hello"
b = "Hello"
is_cached(a, b)

a = "Hello World"
b = "Hello World"
is_cached(a, b)

a = "Hello123World"
b = "Hello123World"
is_cached(a, b)

Is 'H' cached? -> True
Is '1' cached? -> True
Is '!' cached? -> True
Is 'Hello' cached? -> True
Is 'Hello World' cached? -> False
Is 'Hello123World' cached? -> True


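As per the observation (in this CPython run):
1. A single-character string such as 'H', '1', or '!' is interned
2. Longer strings made up only of letters and digits ('Hello', 'Hello123World') are interned, while a string containing a space ('Hello World') is not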
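Strings that are not interned automatically can be interned explicitly with `sys.intern`. A minimal sketch, assuming CPython: the strings are built at runtime with `join` so that the compiler cannot merge them as identical constants.

```python
import sys

# Build two equal strings at runtime; they are separate objects.
parts = ["Hello", " ", "World"]
a = "".join(parts)
b = "".join(parts)
same_before = a is b

# sys.intern returns the single shared copy for a given value.
a = sys.intern(a)
b = sys.intern(b)
same_after = a is b

print(f"Equal values: {a == b}")
print(f"Same object before interning: {same_before}")
print(f"Same object after interning:  {same_after}")
```

Identity checks with `is` are only meaningful here as a way to observe interning; comparing string values should still use `==`.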

In [27]:
#hide
utils.nav("./04-04.html", "")