# Main Python built-in Data Types

Python has several built-in data types. Follow the links below for more information about each type, or see the [official documentation](https://docs.python.org/3/library/stdtypes.html) for the full list:

- [Numeric Type](http://thepythonguru.com/python-numbers/)  
    - Integer 
    - Float  
    - Complex Number  
- [Sequence Type](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range)
  - [List](http://thepythonguru.com/python-lists/)
  - [Tuple](http://thepythonguru.com/python-tuples/)
  - Range
- [Text Sequence Type](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)
  - [String](http://thepythonguru.com/python-strings/)
- [Set Type](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset)
  - Set
- [Mapping Type](https://docs.python.org/3/library/stdtypes.html#mapping-types-dict)
  - [Dictionary](http://thepythonguru.com/python-dictionaries/)
- Boolean Values
  - Bool


Useful type-related built-in function:
- `type()` *returns the type of an object.*  [read more](https://docs.python.org/3/library/functions.html#type)



To get the type of a variable, call the `type()` function and pass the variable as its argument.
Example:

**What will happen if I execute the following block of code?**

In [None]:
my_dog = "Nova"
type(my_dog)

str
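As a small sketch of the same idea applied to the other built-in types listed above, the loop below prints the class name of a few literal values. The `samples` list is made up for illustration.

```python
# Inspect the type of a few literal values with type()
samples = [42, 3.14, 2 + 3j, "Nova", [1, 2], (1, 2), range(3), {1, 2}, {"a": 1}, True, None]

for value in samples:
    # type(value).__name__ is the short class name, e.g. 'int'
    print(f"{value!r:>12} -> {type(value).__name__}")

type_names = [type(value).__name__ for value in samples]
```

Note that `None` has its own type, `NoneType`, and that `True` reports `bool` rather than `int`.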

# Type Casting

Type casting, also known as type conversion, is the process of converting a value from one data type to another. In Python, you can perform type casting using built-in functions such as `int()`, `float()`, `str()`, `list()`, `tuple()`, and `set()`.

Examples:

In [None]:
# Casting float to int
float_number = 3.14
int_number = int(float_number)  # Result: 3
print(int_number)

3

In [7]:
# Casting int to float
int_number = 5
float_number = float(int_number)  # Result: 5.0
print(float_number)

5.0


In [None]:
# Casting float to string
float_number = 42.0
string_number = str(float_number)
print(string_number) # "42.0"

42.0


### Question: What will happen if we cast string to `list`, `tuple` or `set`?

In [14]:
set(my_dog)

{'N', 'a', 'o', 'v'}
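For comparison, here is a minimal sketch of the same string cast to all three collection types. A string is a sequence of characters, so each conversion iterates over it character by character.

```python
my_dog = "Nova"

as_list = list(my_dog)    # one element per character, order preserved
as_tuple = tuple(my_dog)  # same elements, but immutable
as_set = set(my_dog)      # unique characters only; display order is not guaranteed

print(as_list)         # ['N', 'o', 'v', 'a']
print(as_tuple)        # ('N', 'o', 'v', 'a')
print(sorted(as_set))  # sorted() gives a stable order for display
```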

## Type Casting Pitfalls

Type casting is powerful, but it can also fail or behave in ways that are not obvious.

Two common issues:

1. **Invalid conversions**
   Trying to convert a non-numeric string to an `int` or `float` will raise a `ValueError`.

2. **Float to int truncation**
   Converting a `float` to `int` truncates the decimal part (it does *not* round).

When reading data from files or APIs, we often need to handle these cases carefully.

In [15]:
# Example 1: invalid conversion
text_values = ["10", "3.14", "abc"]

for t in text_values:
    print(f"Trying to convert {t!r} to int:")
    try:
        number = int(t)
        print("   Success:", number)
    except ValueError as e:
        print("   Failed with ValueError:", e)

print("\nFloat to int truncation:")
floats = [3.1, 3.5, 3.9, -2.7]

for f in floats:
    print(f"float: {f} -> int: {int(f)}")

Trying to convert '10' to int:
   Success: 10
Trying to convert '3.14' to int:
   Failed with ValueError: invalid literal for int() with base 10: '3.14'
Trying to convert 'abc' to int:
   Failed with ValueError: invalid literal for int() with base 10: 'abc'

Float to int truncation:
float: 3.1 -> int: 3
float: 3.5 -> int: 3
float: 3.9 -> int: 3
float: -2.7 -> int: -2
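One possible way to handle both pitfalls is sketched below. The helper name `to_int_or_none` is hypothetical, and returning `None` on failure is just one design choice; raising or logging may suit other cases better.

```python
import math

def to_int_or_none(text):
    """Hypothetical helper: return int(text), or None if text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

converted = [to_int_or_none(t) for t in ["10", "3.14", "abc"]]
print(converted)  # [10, None, None]

# int() truncates toward zero; math.floor() and round() behave differently
x = -2.7
print(int(x), math.floor(x), round(x))  # -2 -3 -3

# A decimal string can still reach int by going through float first
print(int(float("3.14")))  # 3
```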


## `None` and Missing Values

`None` is a special value in Python that represents “no value” or “missing value”.

- It is often used when:
  - A function does not return anything explicitly.
  - A value is unknown or not yet computed.
  - A dictionary key is missing and we use `dict.get()` without a default.

`None` is **falsy** in boolean contexts.

We typically compare to `None` using `is` or `is not`:

```python
if value is None:
    ...  # handle the missing value here
```


In [None]:
result = None

print("Initial result:", result)
print("Is result None?", result is None) # This is an expression that is being evaluated
print("Is bool(result) truthy?", bool(result))

record = {"name": "Nova", "age": 5}
print("\nAccessing record['country'] with .get():")
country = record.get("country")  # returns None by default
print("country:", country)
print("Is country None?", country is None)

print("\nUsing a default when the key is missing:")
country_with_default = record.get("country", "Unknown")
print("country_with_default:", country_with_default)

Initial result: None
Is result None? True
Is bool(result) truthy? False

Accessing record['country'] with .get():
country: None
Is country None? True

Using a default when the key is missing:
country_with_default: Unknown
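The first case in the list above, a function without an explicit `return`, can be sketched as follows. `log_message` is a made-up function for illustration. The second half shows why `is None` is safer than a truthiness check when `0` is a valid value.

```python
def log_message(message):
    """Hypothetical function with no return statement."""
    print("LOG:", message)

returned = log_message("hello")
print(returned is None)  # True

# Prefer `is None` over truthiness when 0 or "" are valid values
count = 0
if count is None:
    status = "missing"
else:
    status = "present"
print(status)  # present
```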


# Type Mutability

Mutability refers to the ability of an object to change its state or content after it has been created.  

- **Mutable objects**: These are objects whose content or state can be changed after they are created. Examples include **lists**, **sets**, and **dictionaries**. 

- **Immutable objects**: These are objects whose content or state cannot be changed after they are created. Examples include **integers**, **floats**, **strings**, **booleans**, **tuples**, and **frozen sets**.
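A minimal sketch of the difference, using a list (mutable) and a string (immutable); the variable names are chosen only for illustration:

```python
numbers = [1, 2, 3]
alias = numbers            # both names refer to the same list object
alias.append(4)            # the in-place change is visible through both names
print(numbers)             # [1, 2, 3, 4]

name = "Nova"
try:
    name[0] = "n"          # strings do not support item assignment
except TypeError as e:
    print("TypeError:", e)

lower_name = name.lower()  # string methods return a new string instead
print(name, lower_name)    # Nova nova
```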

# Booleans and Truthiness

Python has a special data type for logical values: **booleans**.

- A boolean can be either `True` or `False`.
- Booleans are the result of comparisons, like `3 > 2`, or used directly in `if` statements.

In Python, many other values are treated as **truthy** or **falsy** when used in a boolean context:

- Falsy values include:
  - `0`, `0.0`
  - Empty strings: `""`
  - Empty collections: `[]`, `{}`, `set()`, `()`
  - `None`
- Everything else is generally truthy.

Understanding truthiness is important for writing clean conditions in `if`, `while`, and comprehensions.


In [17]:
values = [0, 1, "", "hello", [], [1, 2], {}, {"key": "value"}, None]

for v in values:
    print(f"Value: {repr(v):>12} | bool(value): {bool(v)}")

Value:            0 | bool(value): False
Value:            1 | bool(value): True
Value:           '' | bool(value): False
Value:      'hello' | bool(value): True
Value:           [] | bool(value): False
Value:       [1, 2] | bool(value): True
Value:           {} | bool(value): False
Value: {'key': 'value'} | bool(value): True
Value:         None | bool(value): False
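Truthiness in conditions and comprehensions might look like the sketch below. The `describe` helper is hypothetical and only serves to show the `if not items` idiom.

```python
def describe(items):
    """Hypothetical helper: report whether a collection has any elements."""
    if not items:  # True for [], "", {}, set(), () and None
        return "empty"
    return f"{len(items)} item(s)"

print(describe([]))      # empty
print(describe([1, 2]))  # 2 item(s)
print(describe(""))      # empty

# Keep only the truthy (non-empty) strings
non_empty = [s for s in ["a", "", "b", ""] if s]
print(non_empty)  # ['a', 'b']
```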


# Strings -> `str`  
**Strings are sequences of characters. In Python, strings are immutable.** 
 
The string type is very important for data analysis. It has many useful methods for processing and parsing text, and the standard-library module `re` adds support for [regular expressions](http://www.rexegg.com/regex-quickstart.html).

Some common string functions and methods:

`len(string)`: Returns the length of the string.  
`string.upper()`: Returns a new string with all characters in uppercase.  
`string.lower()`: Returns a new string with all characters in lowercase.  
`string.replace(old, new)`: Returns a new string with all occurrences of old replaced with new.  
`string.split(sep)`: Returns a list of substrings separated by the specified separator sep. If no separator is provided, it splits on whitespace.  

In [1]:
test_string = "CorePython/Python_introduction_2.ipynb" # Define a test string

In [3]:
test_string

'CorePython/Python_introduction_2.ipynb'

In [4]:
test_string = test_string.replace(".ipynb", ".py")

In [5]:
test_string

'CorePython/Python_introduction_2.py'

In [7]:
test_string.split(".")[1]

'py'

In [None]:
# Slicing: string[start:stop:step]; defaults are start=0, stop=len(string), step=1 (stop is exclusive)
test_string

'CorePython/Python_introduction_2.py'

In [None]:
len(test_string) - 1

34

In [21]:
test_string[35 - 3: 35]

'.py'

In [24]:
test_string[-3:]

'.py'

In [33]:
test_string[::-1]

'yp.2_noitcudortni_nohtyP/nohtyPeroC'

## String formatting

- `.format()`
- f-strings
- String interpolation with the `%` operator

In [9]:
"string to format {} {}".format("here", 3)

'string to format here 3'

In [36]:
name = "Roberto"
string = "Here new string"

f"string to format {30} {name}"

'string to format 30 Roberto'
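The `%` operator is listed above but not demonstrated, so here is a short sketch of all three styles producing the same text. The values of `name` and `weight` are made up, and `.1f` is a format specification that keeps one decimal place.

```python
name = "Nova"
weight = 5.5

percent_style = "%s weighs %.1f kg" % (name, weight)
format_style = "{} weighs {:.1f} kg".format(name, weight)
f_string_style = f"{name} weighs {weight:.1f} kg"

print(percent_style)   # Nova weighs 5.5 kg
print(format_style)    # Nova weighs 5.5 kg
print(f_string_style)  # Nova weighs 5.5 kg
```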

## Comments and Docstrings

In [12]:
# This is a comment

def new_function():
    """This is a new function"""
    print("I'm a new function")

In [13]:
help(new_function)

Help on function new_function in module __main__:

new_function()
    This is a new function



### Style guidelines: PEP 8 for comments and PEP 257 for docstrings

# Collections

# List -> `list` or `[]`

**Lists are ordered, mutable sequences of elements.**  

`len(list)`: Returns the length of the list.  
`list.append(item)`: Adds an item to the end of the list.  
`list.extend(iterable)`: Appends the elements of an iterable (e.g., list, tuple, string) to the list.  
`list.insert(index, item)`: Inserts an item at the specified index.  
`list.remove(item)`: Removes the first occurrence of the specified item from the list.  
`list.pop(index)`: Removes and returns the item at the specified index. If no index is provided, it removes and returns the last item in the list.  
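A quick sketch of these methods in sequence, with the list contents after each step shown in the comments (the `pets` data is made up):

```python
pets = ["dog", "cat"]
pets.append("bird")           # ['dog', 'cat', 'bird']
pets.extend(["fish", "cat"])  # ['dog', 'cat', 'bird', 'fish', 'cat']
pets.insert(1, "hamster")     # ['dog', 'hamster', 'cat', 'bird', 'fish', 'cat']
pets.remove("cat")            # first 'cat' only: ['dog', 'hamster', 'bird', 'fish', 'cat']
last = pets.pop()             # returns 'cat'; list is ['dog', 'hamster', 'bird', 'fish']
first = pets.pop(0)           # returns 'dog'; list is ['hamster', 'bird', 'fish']

print(pets, len(pets))
```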

# Tuple -> `tuple` or `()`
**Tuples are ordered, immutable sequences of elements.**  

Like lists, tuples group two or more objects together. However, once created, a tuple cannot be modified: adding, deleting, or replacing elements is not allowed. This makes tuples useful for holding data that should stay unchanged after it is read.

`len(tuple)`: Returns the length of the tuple.  
`tuple.index(item)`: Returns the index of the first occurrence of the specified item in the tuple. Raises a ValueError if the item is not found.  
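The two operations above, plus tuple unpacking, can be sketched as follows. The `point` tuple is an arbitrary example.

```python
point = (3, 7, 3)

print(len(point))      # 3
print(point.index(3))  # 0, the first occurrence

try:
    point.index(99)    # 99 is not in the tuple
except ValueError as e:
    print("ValueError:", e)

# Unpacking assigns each element to its own name
x, y, z = point
print(x, y, z)         # 3 7 3
```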

In [None]:
# Convert a list to a tuple and explore its methods
# Try to apply the list methods to the tuple and explain

# Set -> `set`  
**Sets are unordered collections of unique elements.**  

`len(set)`: Returns the number of elements in the set.  
`set.add(item)`: Adds an item to the set.  
`set.remove(item)`: Removes the specified item from the set. Raises a KeyError if the item is not found.  
`set.discard(item)`: Removes the specified item from the set if it is present.  
`set.union(set2)`: Returns a new set containing all items from both sets.  
`set.intersection(set2)`: Returns a new set containing items present in both sets.  
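A minimal sketch of these methods, including the difference between `remove()` and `discard()` on a missing item. The sets `a` and `b` are arbitrary examples.

```python
a = {1, 2, 3}
b = {3, 4}

a.add(5)       # a is now {1, 2, 3, 5}
a.add(5)       # adding a duplicate changes nothing
a.discard(99)  # missing item: no error

try:
    a.remove(99)  # missing item: raises KeyError
    remove_failed = False
except KeyError:
    remove_failed = True

union = a.union(b)          # {1, 2, 3, 4, 5}
common = a.intersection(b)  # {3}
print(len(a), union, common, remove_failed)
```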


# Dictionary -> `dict` or `{}`

**Dictionaries are mutable collections of key-value pairs. Since Python 3.7, they preserve insertion order.**  

Dictionaries are Python's hash maps (hash tables). They store key-value relationships: keys must be unique and hashable, while values can repeat. Values can be any object, so a dictionary can *point* to other data structures such as tuples, lists, and even other dictionaries.

Note: `.keys()` and `.values()` iterate in the same insertion order, so element 'n' of one corresponds to element 'n' of the other as long as the dictionary is not modified between the two calls. When you need keys and values together, `.items()` is the clearest option.

`len(dict)`: Returns the number of key-value pairs in the dictionary.  
`dict.keys()`: Returns a view object displaying a list of all keys in the dictionary.  
`dict.values()`: Returns a view object displaying a list of all values in the dictionary.  
`dict.items()`: Returns a view object displaying a list of all key-value pairs in the dictionary as tuples.  
`dict.get(key, default)`: Returns the value associated with the specified key. If the key is not found, it returns the default value (or None if no default value is provided).  
`dict.update(dict2)`: Updates the dictionary with the key-value pairs from another dictionary, overwriting existing keys with new values.  
`dict.pop(key, default)`: Removes and returns the value associated with the specified key. If the key is not found, it returns the default value (or raises a KeyError if no default value is provided).  
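The methods above can be sketched on a small record. The `record` data is made up for illustration, and the views are wrapped in `list()` only to make them easy to print.

```python
record = {"name": "Nova", "age": 5}

record.update({"age": 6, "species": "dog"})  # overwrites 'age', adds 'species'

keys = list(record.keys())      # ['name', 'age', 'species']
values = list(record.values())  # ['Nova', 6, 'dog']
pairs = list(record.items())    # [('name', 'Nova'), ('age', 6), ('species', 'dog')]

country = record.get("country", "Unknown")  # key missing: returns the default
age = record.pop("age")                     # removes 'age' and returns 6
missing = record.pop("country", None)       # key missing: returns None, no KeyError

print(len(record), record)
```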

## Simple Data-Engineering Example: Parsing a CSV Row

In real data engineering tasks, we often receive data as **text** (for example, from CSV files).

We then need to:

1. Split the line into fields.
2. Convert some fields to the correct types (e.g., `int`, `float`).
3. Handle potential errors or missing values.

Here is a small example that parses a CSV-style row into a dictionary.


In [37]:
raw_line = "42, Nova, 5.5, dog"
print("Raw line:", raw_line)

parts = [p.strip() for p in raw_line.split(",")]
print("Parts:", parts)

# Unpack into named variables
id_str, name_str, weight_str, species_str = parts

try:
    record = {
        "id": int(id_str),
        "name": name_str,
        "weight": float(weight_str),
        "species": species_str
    }
    print("Parsed record:", record)
    print("Types:", {k: type(v) for k, v in record.items()})
except ValueError as e:
    print("Error while parsing:", e)

Raw line: 42, Nova, 5.5, dog
Parts: ['42', 'Nova', '5.5', 'dog']
Parsed record: {'id': 42, 'name': 'Nova', 'weight': 5.5, 'species': 'dog'}
Types: {'id': <class 'int'>, 'name': <class 'str'>, 'weight': <class 'float'>, 'species': <class 'str'>}
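To cover the third step, handling errors and missing values, the same logic could be wrapped in a function and applied to several rows. This is only a sketch: `parse_row` is a hypothetical helper, the sample rows are invented, and mapping empty fields and malformed rows to `None` is one of several reasonable policies. For real files, the standard-library `csv` module handles quoting and embedded commas.

```python
def parse_row(line):
    """Hypothetical helper: parse 'id, name, weight, species' into a dict.

    Empty fields become None; a malformed row returns None.
    """
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 4:
        return None  # wrong number of fields
    id_str, name_str, weight_str, species_str = parts
    try:
        return {
            "id": int(id_str),
            "name": name_str or None,
            "weight": float(weight_str) if weight_str else None,
            "species": species_str or None,
        }
    except ValueError:
        return None  # a numeric field could not be converted

rows = ["42, Nova, 5.5, dog", "43, Rex, , dog", "abc, Milo, 4.0, cat", "44, Luna"]
parsed = [parse_row(r) for r in rows]

for raw, rec in zip(rows, parsed):
    print(f"{raw!r} -> {rec}")
```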


> Content created by [**Carlos Cruz-Maldonado**](https://www.linkedin.com/in/carloscruzmaldonado/).  
> I am available to answer any questions or provide further assistance.   
> Feel free to reach out to me at any time.