## Strings


In Python, a string (type `str`) is a sequence of Unicode characters. String literals are written in single quotes ('...'), double quotes ("..."), or triple quotes ('''...''' or """...""") for text that spans several lines. Strings are one of the most commonly used data types in Python. They represent text, including text that looks like other data: the digits in "42" form a string, not a number, until they are converted with `int()`.

Strings are immutable: once a string is created, its contents cannot be changed. Operations that seem to modify a string, such as concatenation with `+`, slicing, or methods like `replace()`, return a new string instead.
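A short sketch of what immutability means in practice: assigning to an index fails, while concatenation builds a new string and leaves the original untouched.

```python
greeting = "Hello"

# Strings do not support item assignment, so this raises TypeError
try:
    greeting[0] = "J"
except TypeError as exc:
    print("TypeError:", exc)

# Concatenation creates a brand-new string; the original is unchanged
new_greeting = "J" + greeting[1:]
print(new_greeting)  # Output: Jello
print(greeting)      # Output: Hello
```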

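Strings are a sequence type, but they are more specialized than generic sequences such as lists:
* Strings are homogeneous: every element is a character.
* Each element is itself a string of length 1; Python has no separate character type.
* They provide many string-specific methods, such as `upper()`, `split()`, and `find()`.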
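A quick sketch of these points: strings support the usual sequence operations, every element is a one-character `str`, and there are methods that only strings have.

```python
word = "Python"

# Sequence behaviour shared with lists and tuples
print(len(word))    # Output: 6
print(word[0])      # Output: P
print(word[-1])     # Output: n
print(word[1:4])    # Output: yth

# Each element is itself a str of length 1
first = word[0]
print(type(first), len(first))  # Output: <class 'str'> 1

# String-specific methods that generic sequences do not have
print(word.upper())            # Output: PYTHON
print(word.startswith("Py"))   # Output: True
```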

### How computers encode characters (ASCII & Unicode)


#### ASCII
ASCII (American Standard Code for Information Interchange) is a character encoding standard that assigns a unique numerical value to each character in the English alphabet, as well as to other characters commonly used in computer systems, such as punctuation marks and control codes. In the ASCII encoding scheme, each character is represented by a 7-bit binary number, which can be easily translated into a decimal or hexadecimal value.

ASCII was first published in 1963 by the American Standards Association, and it quickly became the dominant character encoding scheme for computers and communication systems in the United States and other English-speaking countries. ASCII defines 128 codes (0 to 127): 95 printable characters, which cover the letters, digits, punctuation marks, and other symbols used in English, and 33 control codes for communication and formatting purposes.
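The split between printable characters and control codes can be checked with `str.isprintable()`, which treats the space (code 32) as printable and the ASCII control codes (0 to 31 and 127) as non-printable. A minimal sketch:

```python
# Split the 128 ASCII codes (0-127) into printable characters and control codes
printable = [code for code in range(128) if chr(code).isprintable()]
control = [code for code in range(128) if not chr(code).isprintable()]

print(len(printable), len(control))  # Output: 95 33
print(printable[0], printable[-1])   # Output: 32 126
```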

The ASCII encoding scheme has been widely adopted by computer systems and programming languages, and it is still relevant today: the first 128 Unicode code points are identical to ASCII, so plain English text, most source code, and many network protocols and file formats still consist entirely of ASCII characters. In a programming language such as Python, you can also work with ASCII codes directly, for example to convert a character into its numeric code and back.
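As a small illustration of working with ASCII codes in Python, the built-ins `chr()` and `ord()` (covered in more detail later in this section) convert between codes and characters:

```python
ascii_codes = [72, 101, 108, 108, 111]

# Turn a list of ASCII codes into text
hello_text = "".join(chr(code) for code in ascii_codes)
print(hello_text)  # Output: Hello

# And back from text to ASCII codes
print([ord(ch) for ch in hello_text])  # Output: [72, 101, 108, 108, 111]
```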

Here are some examples of ASCII codes for commonly used characters:

- The letter 'A' has an ASCII code of 65 (binary 01000001)
- The digit '0' has an ASCII code of 48 (binary 00110000)
- The space character has an ASCII code of 32 (binary 00100000)
- The exclamation mark has an ASCII code of 33 (binary 00100001)
- The dollar sign has an ASCII code of 36 (binary 00100100)
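The values in this list can be verified in Python: `ord()` gives the decimal code, and `format(code, "08b")` renders it as an 8-bit binary string (ASCII itself needs only 7 bits, so the leading bit is always 0).

```python
# Characters from the list above
chars = ["A", "0", " ", "!", "$"]

# Map each character to (decimal code, 8-bit binary string)
table = {ch: (ord(ch), format(ord(ch), "08b")) for ch in chars}

for ch, (code, bits) in table.items():
    print(repr(ch), code, bits)
# 'A' 65 01000001
# '0' 48 00110000
# ' ' 32 00100000
# '!' 33 00100001
# '$' 36 00100100
```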

Overall, ASCII is an important standard for character encoding in computer systems, and it paved the way for later encoding schemes. However, ASCII covers only English: it does not include accented letters or characters from non-Latin scripts. Encoding schemes such as Unicode were developed to fill this gap.
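The limitation is easy to see in Python: a word with an accented letter is not pure ASCII and cannot be encoded as strict ASCII. A minimal sketch (`str.isascii()` requires Python 3.7 or later):

```python
accented = "café"

# str.isascii() reports whether every character is ASCII
print(accented.isascii())  # Output: False

# 'é' has no ASCII code, so strict ASCII encoding raises an error
try:
    accented.encode("ascii")
except UnicodeEncodeError as exc:
    print("Cannot encode:", exc.object[exc.start:exc.end])  # Output: Cannot encode: é

# errors="replace" substitutes '?' for anything outside ASCII
print(accented.encode("ascii", errors="replace"))  # Output: b'caf?'
```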

#### Unicode

Unicode is a character encoding standard that was developed to support the representation of text in all the world's writing systems. Unlike earlier character encoding schemes such as ASCII, which covered only English, and its 8-bit extensions, which each added just one group of languages (such as Western European), Unicode includes characters from virtually all of the world's scripts, including Latin, Greek, Cyrillic, Arabic, Hebrew, Chinese, Japanese, and many others. Unicode assigns a unique code point to each character; that code point can then be stored using one of several encoding schemes, including UTF-8, UTF-16, and UTF-32.
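To make the difference between a code point and its encoding concrete, here is a sketch that encodes the same character with three encodings. The little-endian variants (`utf-16-le`, `utf-32-le`) are used as an assumption of convenience, because plain `utf-16` and `utf-32` prepend a byte-order mark.

```python
ch = "A"  # code point U+0041

# "-le" (little-endian) variants are used so that no byte-order mark is added
encodings = ["utf-8", "utf-16-le", "utf-32-le"]
encoded = {enc: ch.encode(enc) for enc in encodings}

for enc, data in encoded.items():
    print(enc, data, len(data))
# utf-8 b'A' 1
# utf-16-le b'A\x00' 2
# utf-32-le b'A\x00\x00\x00' 4
```

The code point is the same in every case; only the number and layout of bytes change.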

Unicode was first developed in the 1980s by a group of computer scientists who recognized the need for a universal character encoding standard that could support all the world's scripts. The Unicode Consortium, a non-profit organization, was formed in 1991 to oversee the development and maintenance of the Unicode standard.

The Unicode standard includes over 143,000 characters, including letters, digits, punctuation marks, mathematical symbols, and other characters used in various scripts around the world. Each character is assigned a unique code point, which is represented using a hexadecimal notation, such as U+0041 for the Latin letter 'A'.
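A short sketch of the U+ notation in Python: formatting `ord(ch)` as uppercase hexadecimal with at least four digits reproduces labels such as U+0041, and the standard-library `unicodedata.name()` returns the official character name. The helper `code_point_label` is a hypothetical name used only for this illustration.

```python
import unicodedata

def code_point_label(ch):
    """Hypothetical helper: format a character's code point as U+XXXX."""
    return f"U+{ord(ch):04X}"

for ch in ["A", "α", "€"]:
    print(code_point_label(ch), unicodedata.name(ch))
# U+0041 LATIN CAPITAL LETTER A
# U+03B1 GREEK SMALL LETTER ALPHA
# U+20AC EURO SIGN
```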

Unicode supports a variety of encoding schemes, which are used to represent Unicode characters as binary data. The most common encoding scheme is UTF-8, which uses one to four bytes to represent each character, depending on its code point. UTF-8 is widely used on the web and in many computer systems, because it is backwards-compatible with ASCII and can represent all the characters in the Unicode standard.
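The variable length of UTF-8 can be observed by encoding one character from each length class, and the ASCII compatibility by comparing the bytes of plain English text in both encodings:

```python
# One character from each UTF-8 length class: 1, 2, 3 and 4 bytes
samples = ["A", "é", "€", "😀"]
utf8 = {ch: ch.encode("utf-8") for ch in samples}

for ch, data in utf8.items():
    print(ch, data, len(data))
# A b'A' 1
# é b'\xc3\xa9' 2
# € b'\xe2\x82\xac' 3
# 😀 b'\xf0\x9f\x98\x80' 4

# ASCII compatibility: pure-ASCII text gives identical bytes in both encodings
print("Hello".encode("ascii") == "Hello".encode("utf-8"))  # Output: True
```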

In conclusion, Unicode is a character encoding standard that provides a unique code point for each character in all the world's writing systems. Unicode is widely used in computer systems and the web, and it has enabled the development of multilingual applications and the global exchange of information.

For a more detailed explanation, watch [this video](https://www.youtube.com/watch?v=MijmeoH9LT4).

[List of Unicode characters](https://www.compart.com/en/unicode/)

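**In Python 3, a `str` holds Unicode code points; UTF-8 is the default encoding for source files and for converting with `str.encode()` and `bytes.decode()`**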
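A minimal sketch of the default encoding at work: calling `encode()` and `decode()` with no argument uses UTF-8, and `len()` counts characters on the `str` but bytes on the resulting `bytes` object.

```python
text = "héllo"

data = text.encode()           # no argument: UTF-8 by default
print(data)                    # Output: b'h\xc3\xa9llo'
print(len(text), len(data))    # Output: 5 6  (characters vs. bytes)

# Decoding with the same default restores the original string
print(data.decode() == text)   # Output: True
```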

### `ord` & `chr` functions in Python

In Python, the `ord()` function is a built-in function that returns the integer Unicode code point of a given character. It takes a single argument, a string containing exactly one Unicode character, and returns the integer code point of that character.

Here's an example of how to use the `ord()` function in Python:

```python
# Get the Unicode code point of the letter 'A'
code_point = ord('A')
print(code_point)  # Output: 65
```

In this example, the `ord()` function is used to get the Unicode code point of the letter 'A', which is 65. The `ord()` function works with any Unicode character, including letters, digits, punctuation marks, and other symbols.
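A short sketch showing `ord()` on characters outside ASCII, and the error raised when the argument is not exactly one character:

```python
# ord() is not limited to ASCII
codes = {ch: ord(ch) for ch in ["α", "€", "😀"]}
print(codes)  # Output: {'α': 945, '€': 8364, '😀': 128512}

# The argument must be a string of exactly one character
try:
    ord("AB")
except TypeError as exc:
    print("TypeError:", exc)
```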

The `ord()` function is often used in conjunction with the `chr()` function, which is another built-in function that does the opposite of `ord()`. The `chr()` function takes an integer Unicode code point and returns the corresponding character. Here's an example:

```python
# Get the character corresponding to the Unicode code point 65
character = chr(65)
print(character)  # Output: A
```

In this example, the `chr()` function is used to get the character corresponding to the Unicode code point 65, which is 'A'.

Together, the `ord()` and `chr()` functions provide a convenient way to convert between Unicode characters and their corresponding code points, which is useful for tasks such as text processing, string manipulation, and internationalization.
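As one illustration of string manipulation with these two functions, the sketch below implements a simple Caesar cipher. The technique and the function name `caesar_shift` are chosen here for illustration only, and the sketch assumes that only the ASCII letters A-Z and a-z should be shifted; every other character passes through unchanged.

```python
def caesar_shift(text: str, shift: int) -> str:
    """Hypothetical helper: shift ASCII letters by `shift`, wrapping around the alphabet."""
    result = []
    for ch in text:
        if "a" <= ch <= "z":
            base = ord("a")
        elif "A" <= ch <= "Z":
            base = ord("A")
        else:
            result.append(ch)  # leave digits, punctuation, spaces, etc. as they are
            continue
        # Convert to a 0-25 offset, shift with wrap-around, convert back with chr()
        result.append(chr((ord(ch) - base + shift) % 26 + base))
    return "".join(result)

secret = caesar_shift("Hello, World!", 3)
print(secret)                    # Output: Khoor, Zruog!
print(caesar_shift(secret, -3))  # Output: Hello, World!
```

Subtracting `base` turns each letter into a position from 0 to 25, so the modulo keeps the result inside the alphabet for both positive and negative shifts.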


### Defining characters in a string by Unicode code point

In Python, you can define characters in a string by their Unicode code point using the Unicode escape sequence: a backslash, followed by the letter `u`, followed by the code point in hexadecimal.

Here is an example:

```python
# Define a string using Unicode escape sequence
string_with_unicode = '\u03B1 World'

# Print the string
print(string_with_unicode)  # Output: α World
```

**Note that the code point in a `\u` escape is written in hexadecimal**

**`\u` must be followed by exactly 4 hex digits**
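Besides `\u`, Python string literals support related escapes for writing characters by code point or by name. A short sketch of the common forms:

```python
# \u takes exactly 4 hex digits: fine for code points up to U+FFFF
alpha = "\u03b1"

# \U takes exactly 8 hex digits: needed above U+FFFF (e.g. emoji)
grin = "\U0001F600"

# \N{...} looks a character up by its official Unicode name
alpha_by_name = "\N{GREEK SMALL LETTER ALPHA}"

# \x takes exactly 2 hex digits (code points up to U+00FF)
letter_a = "\x41"

print(alpha, grin, letter_a)           # Output: α 😀 A
print(alpha == alpha_by_name == "α")   # Output: True
```

Hex digits in these escapes are case-insensitive, so `\u03b1` and `\u03B1` produce the same character.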



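The code point of the first character can be recovered with `ord()`, and `hex()` shows it in the same hexadecimal form used in the escape:

```python
# Decimal code point of 'α', the first character of the string
print(ord(string_with_unicode[0]))  # Output: 945

# The same code point in hexadecimal, matching the \u03B1 escape
print(hex(ord(string_with_unicode[0])))  # Output: 0x3b1
```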
